### Task 1: Polynomial Regression - Model Comparison
**Objective**: Compare Linear Regression vs Polynomial Regression and understand when to use each

**Dataset**: `Task-Datasets/task1_polynomial_data.csv`

**Instructions**:
1. Load the dataset containing Experience_Years and Salary data (15 rows)
2. Visualize the data with a scatter plot
3. Build and train the following models:
   - Linear Regression
   - Polynomial Regression with degree=2
   - Polynomial Regression with degree=3
   - Polynomial Regression with degree=4
4. Create visualizations for each model showing:
   - Original data points
   - Regression line/curve
   - Proper title and labels
5. Make a prediction: What salary would you expect for someone with 8.5 years of experience using each model?
6. Compare the predictions and explain which model seems most appropriate and why

**Deliverable**: 
- Code with all four models
- Four separate visualizations
- Prediction comparison
- Brief written explanation (markdown cell) of which model is best
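
The model-building and prediction steps above can be sketched as follows. This is a minimal sketch, not a reference solution: the data is a synthetic stand-in for `task1_polynomial_data.csv` (the real file should be loaded with pandas), and the quadratic trend and noise level are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for task1_polynomial_data.csv (15 rows).
rng = np.random.default_rng(0)
experience = np.linspace(1, 15, 15).reshape(-1, 1)  # Experience_Years as a 2-D column
salary = 30000 + 1500 * experience.ravel() ** 2 + rng.normal(0, 5000, 15)

predictions = {}
for degree in (1, 2, 3, 4):
    # degree=1 is plain linear regression; higher degrees add x^2, x^3, ... columns.
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(experience, salary)
    predictions[degree] = float(model.predict([[8.5]])[0])

for degree, value in predictions.items():
    print(f"degree={degree}: predicted salary at 8.5 years = {value:,.0f}")
```

Wrapping `PolynomialFeatures` and `LinearRegression` in one pipeline means the same feature expansion is applied automatically at prediction time, which avoids forgetting to transform the query point.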

### Task 2: Support Vector Regression (SVR)
**Objective**: Implement SVR and understand the importance of feature scaling

**Dataset**: `Task-Datasets/task2_svr_data.csv`

**Instructions**:
1. Load the dataset with Temperature and Ice_Cream_Sales (20 rows)
2. **Baseline (without feature scaling)**:
   - Build and train a Linear Regression model
   - Visualize the results
3. **Without feature scaling**:
   - Build and train an SVR model with RBF kernel
   - Visualize the results
   - Note what happens to the predictions
4. **With proper feature scaling**:
   - Apply StandardScaler to both X and y
   - Build and train an SVR model with RBF kernel
   - Visualize the results (remember to inverse transform for visualization)
5. Make a prediction: What ice cream sales would you expect at 27°C?
   - Use both Linear Regression and properly scaled SVR
   - Compare the predictions
6. Explain why feature scaling is critical for SVR

**Deliverable**: 
- Code showing SVR with and without scaling
- Comparison with Linear Regression
- Visualizations
- Explanation of why scaling matters
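
The scaled and unscaled SVR steps above might look like the sketch below. The data is a synthetic stand-in for `task2_svr_data.csv`, and the linear trend and noise level are assumptions. With default settings, an unscaled SVR on a target measured in the hundreds tends to produce a much flatter curve than the scaled one, which is the effect the task asks you to observe.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in for task2_svr_data.csv (20 rows).
rng = np.random.default_rng(1)
temperature = np.linspace(15, 35, 20).reshape(-1, 1)  # Temperature
sales = 200 + 25 * (temperature.ravel() - 15) + rng.normal(0, 15, 20)  # Ice_Cream_Sales

# SVR trained directly on the raw values.
unscaled_svr = SVR(kernel="rbf").fit(temperature, sales)
unscaled_curve = unscaled_svr.predict(temperature)

# SVR trained on standardized X and y. StandardScaler expects 2-D input,
# so y is reshaped to a column and flattened again for fitting.
x_scaler, y_scaler = StandardScaler(), StandardScaler()
X_scaled = x_scaler.fit_transform(temperature)
y_scaled = y_scaler.fit_transform(sales.reshape(-1, 1)).ravel()
scaled_svr = SVR(kernel="rbf").fit(X_scaled, y_scaled)

# Predictions come back in scaled units; inverse_transform restores sales units.
scaled_curve = y_scaler.inverse_transform(
    scaled_svr.predict(X_scaled).reshape(-1, 1)
).ravel()

# Scale the query with the same x_scaler before predicting at 27 degrees.
query = x_scaler.transform([[27.0]])
scaled_pred = y_scaler.inverse_transform(
    scaled_svr.predict(query).reshape(-1, 1)
)[0, 0]
unscaled_pred = unscaled_svr.predict([[27.0]])[0]

print(f"Unscaled SVR at 27: {unscaled_pred:.1f}")
print(f"Scaled SVR at 27:   {scaled_pred:.1f}")
print(f"Prediction range, unscaled: {unscaled_curve.max() - unscaled_curve.min():.1f}")
print(f"Prediction range, scaled:   {scaled_curve.max() - scaled_curve.min():.1f}")
```

`unscaled_curve` and `scaled_curve` can be plotted against `temperature` to produce the two visualizations.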

### Task 3: Decision Tree Regression
**Objective**: Implement Decision Tree Regression and visualize its step-like predictions

**Dataset**: `Task-Datasets/task3_decision_tree_data.csv`

**Instructions**:
1. Load the dataset with Hours_Studied and Exam_Score (25 rows)
2. Build and train a Decision Tree Regressor (use random_state=0)
3. Create two visualizations:
   - **Standard resolution**: Plot original data and predictions
   - **High resolution**: Use np.arange with step 0.1 to show the step-like nature of decision trees
4. Compare with Linear Regression:
   - Build a Linear Regression model on the same data
   - Visualize both models on the same plot or separate plots
5. Make predictions:
   - Predict exam score for 23 hours of study
   - Compare Decision Tree vs Linear Regression predictions
6. Explain the advantage of Decision Tree for this type of data

**Deliverable**: 
- Decision Tree model with high-resolution visualization
- Comparison with Linear Regression
- Predictions
- Explanation of when Decision Trees are advantageous
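
The high-resolution step above can be sketched like this, assuming synthetic data in place of `task3_decision_tree_data.csv`. A dense grid built with `np.arange` exposes the plateaus in the tree's predictions, which a plot drawn only at the training points would hide.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for task3_decision_tree_data.csv (25 rows).
rng = np.random.default_rng(2)
hours = np.sort(rng.uniform(1, 30, 25)).reshape(-1, 1)  # Hours_Studied
score = np.clip(35 + 2 * hours.ravel() + rng.normal(0, 4, 25), 0, 100)  # Exam_Score

tree = DecisionTreeRegressor(random_state=0).fit(hours, score)

# Dense grid with step 0.1 across the observed range of study hours.
grid = np.arange(hours.min(), hours.max(), 0.1).reshape(-1, 1)
grid_pred = tree.predict(grid)

# Each leaf predicts a constant, so the curve is a staircase:
# the number of distinct levels cannot exceed the number of training rows.
n_levels = np.unique(grid_pred).size
print(f"{len(grid)} grid points, {n_levels} distinct predicted values")
print(f"Predicted score for 23 hours: {tree.predict([[23.0]])[0]:.1f}")
```

Plotting `grid` against `grid_pred` with a line, on top of a scatter of the original points, gives the high-resolution figure.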

## Part 2: Assignments

### Assignment 1: Comprehensive Model Comparison
**Objective**: Build and compare four regression models (Linear, Polynomial, SVR, Decision Tree) on the same dataset

**Scenario**: A company wants to predict salaries based on position levels. The salary structure follows an exponential growth pattern as employees move up the hierarchy.

**Dataset**: `Assignment-Dataset/assignment1_salary_prediction.csv`

**Dataset Description**:
- **Check Data Dictionary for details**
- 10 position levels with corresponding salaries
- Non-linear relationship (exponential growth)

**Tasks**:

#### 1. Data Exploration
- Load and examine the dataset
- Create a scatter plot to visualize the relationship
- Describe the pattern you observe

#### 2. Model 1: Linear Regression
- Build and train a Linear Regression model
- Visualize predictions
- Predict salary for position level 6.5

#### 3. Model 2: Polynomial Regression
- Test multiple polynomial degrees (2, 3, 4, 5, 6)
- For each degree:
  - Train the model
  - Visualize the fit
  - Predict salary for position level 6.5
- Compare results and identify the best degree

#### 4. Model 3: Support Vector Regression
- Apply proper feature scaling (StandardScaler)
- Build SVR with RBF kernel
- Visualize results (inverse transform for display)
- Predict salary for position level 6.5

#### 5. Model 4: Decision Tree Regression
- Build Decision Tree Regressor
- Create high-resolution visualization
- Predict salary for position level 6.5

#### 6. Model Comparison
- Create a comparison table with:
  - Model name
  - Prediction for level 6.5
  - Visual assessment (does it fit the data well?)
  - Pros and cons for this dataset
- Create a combined visualization showing all models

#### 7. Analysis and Recommendations
- Which model performs best for this salary prediction problem?
- Why does it perform better than others?
- What are the risks of each approach?
- Which model would you recommend for production use?

**Deliverable**:
- Complete implementation of all four models
- Individual visualizations for each model
- Combined comparison visualization
- Comparison table
- Comprehensive analysis report (markdown cells)

### Assignment 2: Multi-Feature Regression
**Objective**: Apply advanced regression techniques to multi-feature datasets

**Scenario**: A building management company wants to predict energy consumption based on environmental factors to optimize HVAC systems and reduce costs.

**Dataset**: `Assignment-Dataset/assignment2_energy_efficiency.csv`

**Dataset Description**:
- **Check Data Dictionary for details**
- 100 records with 4 features
- Features: Temperature, Humidity, Wind_Speed, Solar_Radiation
- Target: Energy_Consumption

**Tasks**:

#### 1. Data Preparation
- Load and explore the dataset
- Display statistical summary
- Check for missing values
- Split data: 80% training, 20% testing (random_state=42)

#### 2. Baseline: Multiple Linear Regression
- Build Multiple Linear Regression model
- Train on training set
- Make predictions on test set
- Calculate metrics:
  - R² score
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
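
A minimal sketch of the split, baseline fit, and three metrics is shown below. The DataFrame is a synthetic stand-in for `assignment2_energy_efficiency.csv`, and the coefficients used to generate the target are assumptions for illustration only. RMSE is computed as the square root of MSE, which works across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for assignment2_energy_efficiency.csv (100 rows, 4 features).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Temperature": rng.uniform(10, 35, 100),
    "Humidity": rng.uniform(30, 90, 100),
    "Wind_Speed": rng.uniform(0, 20, 100),
    "Solar_Radiation": rng.uniform(100, 900, 100),
})
df["Energy_Consumption"] = (
    50 + 3 * df["Temperature"] + 0.5 * df["Humidity"]
    - 1.2 * df["Wind_Speed"] + 0.05 * df["Solar_Radiation"]
    + rng.normal(0, 5, 100)
)

X = df.drop(columns="Energy_Consumption")
y = df["Energy_Consumption"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE = sqrt(MSE)

print(f"R2={r2:.3f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

Storing the three metrics in a dictionary per model makes the later comparison table straightforward to build.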

#### 3. Model 2: Support Vector Regression
- Apply StandardScaler to features
- Build SVR with RBF kernel
- Train and predict
- Calculate same metrics
- Compare with baseline

#### 4. Model 3: Decision Tree Regression
- Build Decision Tree Regressor
- Try different max_depth values (3, 5, 10, None)
- For each depth:
  - Train and predict
  - Calculate metrics
- Identify best max_depth
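
One way to organize the depth comparison is a loop that records train and test scores per depth. The sketch below uses `make_regression` as a stand-in dataset, which is an assumption; the selection rule (highest test R²) is also just one reasonable choice.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: 100 rows, 4 features.
X, y = make_regression(n_samples=100, n_features=4, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for depth in (3, 5, 10, None):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = {
        "train_r2": r2_score(y_train, tree.predict(X_train)),
        "test_r2": r2_score(y_test, tree.predict(X_test)),
    }

# Pick the depth with the highest test R2.
best_depth = max(results, key=lambda d: results[d]["test_r2"])

for depth, scores in results.items():
    print(f"max_depth={depth}: train R2={scores['train_r2']:.3f}, "
          f"test R2={scores['test_r2']:.3f}")
print(f"Best max_depth by test R2: {best_depth}")
```

Choosing a hyperparameter on the test set makes the reported test score optimistic; a separate validation split or cross-validation is a cleaner way to pick the depth.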

#### 5. Model Evaluation and Comparison
- Create comparison table with all metrics
- Visualize predictions vs actual values for each model:
  - Scatter plot (predicted vs actual)
  - Add diagonal line (perfect prediction)
- Create residual plots for each model

#### 6. Feature Importance Analysis (Decision Tree only)
- Extract feature importances from best Decision Tree
- Create bar plot showing importance of each feature
- Interpret which environmental factors most affect energy consumption
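
Feature importances are exposed on a fitted tree as `feature_importances_`. In the sketch below the data is synthetic and the target is deliberately dominated by `Temperature`, so the ranking here says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in; the target depends mostly on Temperature by construction.
rng = np.random.default_rng(0)
features = ["Temperature", "Humidity", "Wind_Speed", "Solar_Radiation"]
X = pd.DataFrame(rng.uniform(0, 1, size=(100, 4)), columns=features)
y = 10 * X["Temperature"] + 2 * X["Humidity"] + rng.normal(0, 0.3, 100)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# Importances are normalized to sum to 1; sort them for a readable bar plot.
importances = pd.Series(tree.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
print(importances)
```

Calling `importances.plot.barh()` on the sorted Series is one way to draw the required bar plot.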

#### 7. Business Insights
- Which model is most accurate for energy prediction?
- Which environmental factor has the biggest impact?
- Provide 3 recommendations for optimizing energy consumption
- Discuss trade-offs between model accuracy and interpretability

**Deliverable**:
- Complete preprocessing and model implementation
- Three trained models with performance metrics
- Comparison visualizations
- Feature importance analysis
- Business insights report (markdown cells)

### Assignment 3: Time Series Prediction with Polynomial Features
**Objective**: Apply regression techniques to time-series data with feature engineering

**Scenario**: A financial analyst wants to predict stock closing prices based on daily trading data to inform investment decisions.

**Dataset**: `Assignment-Dataset/assignment3_stock_prices.csv`

**Dataset Description**:
- **Check Data Dictionary for details**
- 90 days of trading data
- Features: Day, Opening_Price, High_Price, Low_Price, Volume
- Target: Closing_Price

**Tasks**:

#### 1. Data Exploration
- Load and examine the dataset
- Create time series plot showing opening and closing prices over time
- Calculate and display correlation matrix
- Identify which features are most correlated with closing price

#### 2. Feature Engineering
- Create new features:
  - `Price_Range` = High_Price - Low_Price
  - `Price_Change` = Closing_Price - Opening_Price, shifted by 1 day so each row uses only the previous day's change (avoids data leakage)
  - `Volume_MA` = Moving average of volume (window=5)
- Handle any NaN values from moving average
- Select final feature set for modeling
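
The three engineered features can be sketched with pandas as follows. The price series is a synthetic random walk standing in for `assignment3_stock_prices.csv`, and dropping the NaN rows is one possible way to handle them, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for assignment3_stock_prices.csv (90 trading days).
rng = np.random.default_rng(3)
n_days = 90
opening = 100 + np.cumsum(rng.normal(0, 1, n_days))
closing = opening + rng.normal(0, 0.8, n_days)
df = pd.DataFrame({
    "Day": np.arange(1, n_days + 1),
    "Opening_Price": opening,
    "High_Price": np.maximum(opening, closing) + rng.uniform(0, 1, n_days),
    "Low_Price": np.minimum(opening, closing) - rng.uniform(0, 1, n_days),
    "Volume": rng.integers(1_000, 5_000, n_days),
    "Closing_Price": closing,
})

df["Price_Range"] = df["High_Price"] - df["Low_Price"]

# shift(1) moves each value down one row, so day t holds day t-1's change.
# Without the shift, the feature would contain the target itself.
df["Price_Change"] = (df["Closing_Price"] - df["Opening_Price"]).shift(1)

# 5-day moving average; the first 4 rows have no full window and become NaN.
df["Volume_MA"] = df["Volume"].rolling(window=5).mean()

# Drop the leading rows that contain NaN values.
features = df.dropna().reset_index(drop=True)
print(features.shape)
```

Because the rolling window and the shift both leave NaN values at the start of the series, dropping them only removes the earliest days and keeps the temporal order intact.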

#### 3. Data Preparation
- Split data chronologically: use the last 20 days for testing and the earlier days for training (70 days, minus any rows dropped for NaN values)
- **Important**: Do not shuffle the data (maintain temporal order)
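
A chronological split can be done by slicing, or with `train_test_split` and `shuffle=False`. The small DataFrame below is a placeholder with made-up prices; only the splitting logic matters.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data: 90 days in temporal order.
df = pd.DataFrame({
    "Day": np.arange(1, 91),
    "Closing_Price": np.linspace(100, 120, 90),
})

n_test = 20

# Option 1: positional slicing keeps the last n_test rows for testing.
train = df.iloc[:-n_test]
test = df.iloc[-n_test:]

# Option 2: the same split via scikit-learn, with shuffling disabled.
train_alt, test_alt = train_test_split(df, test_size=n_test, shuffle=False)

print(f"Train days {train['Day'].min()}-{train['Day'].max()}, "
      f"test days {test['Day'].min()}-{test['Day'].max()}")
```

Either way, every training day precedes every test day, so the model is never evaluated on data older than what it was trained on.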

#### 4. Model 1: Multiple Linear Regression
- Build baseline model with original features
- Train and predict
- Calculate R², MAE, RMSE

#### 5. Model 2: Polynomial Regression
- Create polynomial features (degree=2) for numeric features
- Build and train model
- Make predictions
- Calculate metrics
- Compare with baseline

#### 6. Model 3: Decision Tree Regression
- Build Decision Tree with best max_depth (test values: 3, 5, 7, 10)
- Train and predict
- Calculate metrics
- Analyze feature importance

#### 7. Visualization and Analysis
- Create time series plot comparing:
  - Actual closing prices
  - Predictions from each model
  - Use different colors/styles for each
- Create comparison table with all metrics
- Plot residuals over time for each model

#### 8. Model Selection and Limitations
- Which model performs best on the test set?
- Discuss overfitting concerns
- What are the limitations of using regression for stock price prediction?
- What additional data or features might improve predictions?
- Would you recommend using these models for actual trading? Why or why not?

**Deliverable**:
- Feature engineering code
- Three regression models with proper time series handling
- Comprehensive visualizations
- Performance comparison table
- Critical analysis of limitations (markdown cells)

## Part 3: Assessment

### Real-World Project: Car Price Prediction System

**Objective**: Apply all learned concepts from Weeks 14-15 in a comprehensive machine learning project

**Scenario**: You are a data scientist at an automotive company that buys and sells used cars. The company wants to develop an intelligent pricing system that accurately predicts car prices based on various features to:
- Price vehicles competitively
- Identify undervalued cars for purchase
- Maximize profit margins
- Provide instant price estimates to customers

**Dataset**: `Assessment-Dataset/assessment_car_price_prediction.csv`

**Dataset Description**:
- **Check Data Dictionary for complete details**
- 200 records of used cars
- Mix of numerical and categorical features
- Features include: Brand, Year, Mileage, Engine_Size, Horsepower, Fuel_Type, Transmission, Previous_Owners, Accident_History, Service_Records
- Target: Price (in dollars)

---

### Phase 1: Data Understanding & Preprocessing (Week 14 Skills)

#### 1.1 Data Loading and Exploration
- Load the dataset and display basic information:
  - Shape (rows, columns)
  - Data types
  - First and last 5 rows
- Statistical summary for numerical features
- Check for missing values
- Identify categorical vs numerical features

#### 1.2 Exploratory Data Analysis (EDA)
- Create visualizations:
  - Distribution of target variable (Price) - histogram
  - Price distribution by Brand - box plot
  - Price distribution by Fuel_Type - box plot
  - Correlation heatmap for numerical features
  - Scatter plot: Mileage vs Price
  - Scatter plot: Year vs Price
- Identify key insights from visualizations

#### 1.3 Data Preprocessing

**A. Handle Categorical Variables:**
- Encode categorical features:
  - Brand (OneHotEncoder)
  - Fuel_Type (OneHotEncoder)
  - Transmission (OneHotEncoder or LabelEncoder)
  - Accident_History (LabelEncoder: Yes=1, No=0)
  - Service_Records (LabelEncoder: Yes=1, No=0)
- Handle dummy variable trap (drop first column for OneHotEncoded features)

**B. Train-Test Split:**
- Split data: 70% training, 30% testing
- Use random_state=42 for reproducibility

**C. Feature Scaling:**
- Identify which numerical features need scaling
- Apply StandardScaler to numerical features
- Fit on training data, transform both training and test sets
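
The fit-on-train, transform-both pattern looks like the sketch below. The two numeric columns are random placeholders for features such as Year and Mileage.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder numeric features (Year, Mileage) and target for 200 cars.
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(2005, 2023, 200),
    rng.uniform(5_000, 200_000, 200),
])
y = rng.uniform(3_000, 60_000, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scaler = StandardScaler()
# fit_transform learns the mean and standard deviation from the training set only.
X_train_scaled = scaler.fit_transform(X_train)
# transform reuses those training statistics, so no test information leaks in.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```

The scaled test set will usually not have exactly zero mean and unit variance, which is expected because it is scaled with the training statistics.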

**D. Validation:**
- Print shapes of X_train, X_test, y_train, y_test
- Verify no missing values remain
- Display first 5 rows of preprocessed training data

### Phase 2: Model Development (Week 15 Skills)

#### 2.1 Baseline Model: Multiple Linear Regression
- Build Multiple Linear Regression model
- Train on training set
- Make predictions on both training and test sets
- Calculate evaluation metrics:
  - R² score (train and test)
  - Mean Absolute Error (MAE)
  - Mean Squared Error (MSE)
  - Root Mean Squared Error (RMSE)
- Store results for comparison

#### 2.2 Model 2: Polynomial Regression
- Create polynomial features (test degrees: 2, 3)
- For each degree:
  - Transform features
  - Train model
  - Calculate all metrics
  - Check for overfitting (compare train vs test R²)
- Select best polynomial degree
- **Note**: Be careful with high degrees, as they may cause overfitting
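
The overfitting check above can be sketched by recording the number of generated features alongside train and test R² for each degree. The dataset here comes from `make_regression` with 8 features, which is an assumption; the real car dataset will have a different feature count after encoding.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in: 200 rows, 8 numeric features.
X, y = make_regression(n_samples=200, n_features=8, noise=20.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scores = {}
for degree in (1, 2, 3):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    scores[degree] = {
        # Number of columns after polynomial expansion.
        "n_features": model.named_steps["polynomialfeatures"].n_output_features_,
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }

for degree, s in scores.items():
    print(f"degree={degree}: {s['n_features']} features, "
          f"train R2={s['train_r2']:.3f}, test R2={s['test_r2']:.3f}")
```

With 8 inputs, degree 3 already produces more columns than there are training rows, so the model can fit the training set almost perfectly. A train R² near 1 paired with a clearly lower test R² is the pattern to watch for.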

#### 2.3 Model 3: Support Vector Regression
- Ensure features are properly scaled
- Build SVR with RBF kernel
- Try different parameters:
  - kernel='rbf', C=100, gamma='auto'
  - kernel='rbf', C=1000, gamma='scale'
- Train and evaluate each configuration
- Select best SVR model

#### 2.4 Model 4: Decision Tree Regression
- Build Decision Tree Regressor
- Test different hyperparameters:
  - max_depth: 3, 5, 10, None
  - min_samples_split: 2, 5, 10
  - min_samples_leaf: 1, 2, 5
- For each configuration:
  - Train model
  - Calculate metrics
  - Check for overfitting
- Select best Decision Tree model
- Extract and visualize feature importances

### Phase 3: Model Evaluation & Comparison

#### 3.1 Comprehensive Model Comparison
- Create comparison table with all models:

| Model | Train R² | Test R² | MAE | RMSE | Training Time |
|-------|----------|---------|-----|------|---------------|
| Multiple Linear Regression | ... | ... | ... | ... | ... |
| Polynomial Regression (degree X) | ... | ... | ... | ... | ... |
| SVR (best params) | ... | ... | ... | ... | ... |
| Decision Tree (best params) | ... | ... | ... | ... | ... |

- Analyze which model performs best
- Identify any overfitting issues

#### 3.2 Visualization - Predicted vs Actual
For each model, create:
- Scatter plot: Predicted vs Actual prices (test set)
- Add diagonal line representing perfect predictions
- Color points by prediction error magnitude
- Add R² score to plot title

#### 3.3 Residual Analysis
For each model:
- Calculate residuals (actual - predicted)
- Create residual plot (residuals vs predicted values)
- Plot histogram of residuals
- Analyze residual patterns:
  - Are residuals randomly distributed?
  - Is there any pattern indicating model limitations?
  - Are there outliers?

#### 3.4 Feature Importance Analysis
- For Decision Tree model:
  - Extract feature importances
  - Create horizontal bar plot
  - List top 10 most important features
- Interpret results:
  - Which features most influence car prices?
  - Are the results intuitive?
  - Any surprising findings?

### Phase 4: Model Selection & Business Application

#### 4.1 Final Model Selection
Based on your analysis, select the best model and justify your choice considering:
- Accuracy (Test R², RMSE)
- Overfitting concerns
- Interpretability
- Training/prediction speed
- Business requirements

Write a comprehensive justification (at least 200 words).

*Your justification here*

#### 4.2 Price Predictions for New Cars
Create 3 hypothetical cars with different characteristics:

**Car 1**: Budget sedan
- Brand: Toyota, Year: 2015, Mileage: 80000, Engine_Size: 1.5, Horsepower: 110
- Fuel_Type: Petrol, Transmission: Manual, Previous_Owners: 2
- Accident_History: No, Service_Records: Yes

**Car 2**: Luxury sedan
- Brand: BMW, Year: 2020, Mileage: 30000, Engine_Size: 3.0, Horsepower: 320
- Fuel_Type: Diesel, Transmission: Automatic, Previous_Owners: 1
- Accident_History: No, Service_Records: Yes

**Car 3**: Older vehicle with issues
- Brand: Ford, Year: 2012, Mileage: 150000, Engine_Size: 2.0, Horsepower: 150
- Fuel_Type: Petrol, Transmission: Manual, Previous_Owners: 4
- Accident_History: Yes, Service_Records: No

For each car:
- Preprocess the data correctly
- Make prediction using your best model
- Explain the predicted price
- Discuss confidence in the prediction
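
"Preprocess the data correctly" means new rows must pass through the same fitted encoders and scaler as the training data. One way to guarantee that is a single `Pipeline` with a `ColumnTransformer`, sketched below. The training data is synthetic, only four of the ten features are used to keep the example short, and `LinearRegression` stands in for whichever model you selected.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic training data using a subset of the dataset's columns.
rng = np.random.default_rng(7)
n = 200
train = pd.DataFrame({
    "Brand": rng.choice(["Toyota", "BMW", "Ford"], n),
    "Year": rng.integers(2010, 2023, n),
    "Mileage": rng.integers(10_000, 180_000, n),
    "Transmission": rng.choice(["Manual", "Automatic"], n),
})
# Made-up pricing rule, used only to generate a target for the example.
price = (
    5000 + 1200 * (train["Year"] - 2010) - 0.05 * train["Mileage"]
    + np.where(train["Brand"] == "BMW", 8000, 0) + rng.normal(0, 1000, n)
)

preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(drop="first"), ["Brand", "Transmission"]),
    ("scale", StandardScaler(), ["Year", "Mileage"]),
])
pipeline = Pipeline([("preprocess", preprocess), ("model", LinearRegression())])
pipeline.fit(train, price)

# New cars are passed as raw values; the pipeline applies the fitted transforms.
new_cars = pd.DataFrame([
    {"Brand": "Toyota", "Year": 2015, "Mileage": 80000, "Transmission": "Manual"},
    {"Brand": "BMW", "Year": 2020, "Mileage": 30000, "Transmission": "Automatic"},
    {"Brand": "Ford", "Year": 2012, "Mileage": 150000, "Transmission": "Manual"},
])
predicted = pipeline.predict(new_cars)
for label, value in zip(["Car 1", "Car 2", "Car 3"], predicted):
    print(f"{label}: {value:,.0f}")
```

If you preprocessed manually instead of using a pipeline, the equivalent is to call `transform` (never `fit_transform`) on the new rows with the already-fitted encoder and scaler.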

#### 4.3 Business Insights & Recommendations

Provide comprehensive analysis addressing:

**A. Key Findings:**
- What are the top 5 factors affecting car prices?
- Which car brands retain value best?
- How much does mileage affect price?
- Impact of accident history on pricing

**B. Business Recommendations:**
1. Inventory Management:
   - Which types of cars should the company prioritize buying?
   - Which features add most value?

2. Pricing Strategy:
   - How can the company identify underpriced cars in the market?
   - What price adjustments would maximize profit?

3. Customer Advisory:
   - What should customers know about factors affecting car value?
   - How can sellers maximize their car's value?

**C. Model Limitations:**
- What are the limitations of your chosen model?
- When might the model's predictions be unreliable?
- What additional data would improve predictions?

**D. Future Improvements:**
- How could the model be enhanced?
- What other machine learning techniques might work better?
- How should the model be maintained and updated?

Write at least 500 words addressing these points.

*Your comprehensive business analysis here*

---

## Bonus Challenges

If you want to go beyond the requirements:

### Bonus 1: Ensemble Methods
- Implement Random Forest Regressor
- Compare with Decision Tree
- Does ensemble improve performance?

### Bonus 2: Hyperparameter Tuning
- Use GridSearchCV or RandomizedSearchCV
- Optimize SVR or Decision Tree hyperparameters
- Document improvement achieved
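
A grid search over the Decision Tree hyperparameters listed in Phase 2 could be sketched as follows, using a synthetic dataset as a stand-in. The grid values are taken from section 2.4; the choice of 5 folds and R² scoring is an assumption.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in dataset.
X, y = make_regression(n_samples=200, n_features=6, noise=15.0, random_state=0)

param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

# Every combination (4 * 3 * 3 = 36) is evaluated with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best mean CV R2: {search.best_score_:.3f}")
```

In the actual project, fit the search on the training set only and keep the test set for the final evaluation of `search.best_estimator_`.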

### Bonus 3: Cross-Validation
- Implement k-fold cross-validation (k=5)
- Calculate average scores across folds
- Compare with simple train-test split
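
A 5-fold cross-validation run can be sketched with `cross_val_score`. The data is synthetic, and shuffling the folds with a fixed seed is an assumption that suits non-temporal data such as the car dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Shuffled 5-fold split with a fixed seed for reproducibility.
folds = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=folds, scoring="r2")

print("R2 per fold:", scores.round(3))
print(f"Mean R2: {scores.mean():.3f} (std {scores.std():.3f})")
```

The spread across folds indicates how much a single train-test split could over- or under-state performance.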

### Bonus 4: Outlier Detection and Handling
- Identify outliers in price data
- Test model performance with and without outliers
- Recommend outlier handling strategy
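
The task does not prescribe a detection method. The sketch below uses the common 1.5 × IQR rule as one possible choice, applied to a handful of made-up prices.

```python
import numpy as np

# Made-up prices with one obviously extreme value.
prices = np.array([8000, 9500, 10200, 11000, 12500,
                   13000, 14200, 15000, 16500, 95000])

# Quartiles and interquartile range.
q1, q3 = np.percentile(prices, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_mask = (prices < lower) | (prices > upper)

print(f"Bounds: [{lower:.0f}, {upper:.0f}]")
print("Outliers:", prices[outlier_mask])
```

Training once on all rows and once on `prices[~outlier_mask]` (with the matching feature rows) gives the with-and-without comparison the task asks for.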

### Bonus 5: Model Deployment Preparation
- Save your best model using joblib or pickle
- Create a function that takes raw car data and returns price prediction
- Write a simple CLI or function interface for predictions
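
Saving and reloading a model with `joblib` might look like the sketch below. The tiny model, the file name `car_price_model.joblib`, and the helper `predict_price` are all hypothetical; in the project you would save the full fitted pipeline so that preprocessing travels with the model.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny placeholder model; substitute your best fitted model or pipeline.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "car_price_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)


def predict_price(fitted_model, features):
    """Return a single prediction for one row of feature values."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    return float(fitted_model.predict(row)[0])


print(predict_price(restored, [5.0]))
```

Files created by `joblib` or `pickle` should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.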

### Bonus 6: Additional Regression Techniques
- Try Ridge Regression or Lasso Regression
- Implement Gradient Boosting Regressor
- Compare with your previous models

---

## Submission Guidelines

### Deliverables:
1. **This Jupyter Notebook** with:
   - All code cells executed and showing outputs
   - Clear markdown explanations for each section
   - Well-commented code
   - All visualizations displayed

2. **Code Quality Requirements**:
   - Use meaningful variable names
   - Add comments for complex operations
   - Follow consistent code style
   - Remove any debugging/test code

3. **Documentation Requirements (Report)**:
   - Executive summary at the beginning
   - Methodology explanation
   - Clear interpretation of results
   - Conclusions and recommendations


## Link to your publication

*Add your publication link here*

---

**Good luck with your assignments! Remember, the goal is not just to build models, but to understand when and why to use each regression technique, and how to apply them to solve real-world business problems.**

**Key Takeaways from Week 15:**
- Polynomial Regression: Best for non-linear relationships with smooth curves
- SVR: Powerful for non-linear patterns, requires feature scaling
- Decision Trees: Excellent for capturing complex, non-continuous patterns
- Model selection depends on data characteristics and business requirements
- Always validate on test data to check for overfitting

## Happy New Year 2026! 🎉