## Roadmap

1. **Load and Review the Dataset**
   - Load the dataset using Pandas
   - Display the first few rows and summary statistics
   - List the column names
   - **Resources**: [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)
   
2. **Basic Statistical Analysis**
   - Generate descriptive statistics
   - Plot histograms for key features
   - Plot bar charts, line graphs, and distribution plots
   - **Libraries**: Pandas, Matplotlib, Seaborn
   - **Resources**: [Matplotlib Documentation](https://matplotlib.org/stable/users/explain/quick_start.html), [Matplotlib Examples](https://matplotlib.org/stable/gallery/index.html)

3. **Feature Analysis**
   - Scatter plots of key features vs. house prices
   - Analyze correlations between features and the target variable (sale price)
   - **Libraries**: Pandas, Matplotlib, Seaborn
   - **Resources**:  [Seaborn Pairplot](https://seaborn.pydata.org/generated/seaborn.pairplot.html)


4. **Data Processing and Feature Engineering**
   - Handle missing values
   - Encode categorical variables
   - Normalize/scale numerical features
   - Feature selection using Recursive Feature Elimination (RFE)
   - **Libraries**: Pandas, Scikit-learn
   - **Resources**: [Scikit-learn Preprocessing](https://scikit-learn.org/stable/modules/preprocessing.html), [Handling Missing Data](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html)

5. **Modeling**
   - Train and evaluate baseline models (Linear Regression, Decision Tree)
   - Implement advanced models (XGBoost, ANN)
   - Tune hyperparameters using GridSearchCV
   - **Libraries and Methods**:
      - **Scikit-learn**: Linear Regression, Decision Trees, Support Vector Regression (SVR), Random Forest, Gradient Boosting, RFE, GridSearchCV
         - **Resources**: [Scikit-learn Documentation](https://scikit-learn.org/stable/index.html)
      - **XGBoost**: Gradient boosting models
         - **Resources**: [XGBoost Documentation](https://xgboost.readthedocs.io/en/stable/python/python_intro.html#plotting)
      - **TensorFlow**: ANN models
         - **Resources**: [TensorFlow Documentation](https://www.tensorflow.org/), [Keras Regression](https://www.tensorflow.org/tutorials/keras/regression)

6. **Evaluation and Comparison**
   - Evaluate models using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), both computed on log-transformed sale prices
   - Compare model performance
   - Discuss overfitting and underfitting
   - **Libraries**: Scikit-learn
   - **Mean Absolute Error (MAE)**:
     $$ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert $$
   - **Root Mean Squared Error (RMSE)**:
     $$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$
   - Here $n$ is the number of samples, $y_i$ the observed log sale price, and $\hat{y}_i$ the predicted log sale price.
   - **Resources**: [Model Evaluation Metrics](https://scikit-learn.org/stable/modules/model_evaluation.html)

7. **Visualization and Inference**
   - Visualize model performance
   - Draw inferences and conclusions
   - **Libraries**: Matplotlib, Seaborn
   - **Resources**: [Data Visualization with Seaborn](https://seaborn.pydata.org/tutorial.html), [Matplotlib Gallery](https://matplotlib.org/stable/gallery/index.html)
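
A minimal sketch of steps 1 to 3, assuming the data is a CSV file with a `SalePrice` target column. The column names and the inline CSV below are invented placeholders, not the project's real data; `io.StringIO` only stands in for a file path.

```python
import io

import pandas as pd

# Toy stand-in for the project's CSV file (values are invented for illustration).
# With the real data you would pass a file path to pd.read_csv instead.
CSV_TEXT = """LotArea,GrLivArea,Neighborhood,SalePrice
8000,1500,CollgCr,200000
9500,1250,Veenker,175000
11000,1800,CollgCr,225000
9000,1700,Crawfor,150000
14000,2200,NoRidge,260000
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 1: first rows, column names, and summary statistics
print(df.head())
print(df.columns.tolist())
print(df.describe())

# Step 3: correlation of each numeric feature with the target, strongest first
numeric = df.select_dtypes(include="number")  # drops text columns like Neighborhood
target_corr = (
    numeric.corr()["SalePrice"]
    .drop("SalePrice")
    .sort_values(ascending=False)
)
print(target_corr)
```

For the histograms in step 2, `df.hist()` (backed by Matplotlib) or Seaborn's `histplot` are common choices, and `seaborn.pairplot(df)` produces the pairwise scatter plots mentioned in step 3.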


#### Handling Missing Data
- **Tasks**:
  - Identify missing values
  - Impute missing values using mean, median, mode, or advanced techniques like KNN imputation
- **Libraries**: Pandas, Scikit-learn
- **Resources**: [Imputation Methods](https://scikit-learn.org/stable/modules/impute.html)
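
The two imputation routes can be sketched with scikit-learn's `SimpleImputer` and `KNNImputer`. The toy columns below are hypothetical. In a real workflow, fit the imputer on the training split only and reuse it on the test split to avoid leakage.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Toy frame with gaps; column names are hypothetical
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0, np.nan],
    "GrLivArea": [1700.0, 1260.0, np.nan, 1720.0, 2200.0],
})

# Identify missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Option 1: median imputation (robust to skewed features)
median_imputer = SimpleImputer(strategy="median")
df_median = pd.DataFrame(median_imputer.fit_transform(df), columns=df.columns)

# Option 2: KNN imputation, filling each gap from the 2 most similar rows.
# Distances are scale-sensitive, so real features are usually scaled first.
knn_imputer = KNNImputer(n_neighbors=2)
df_knn = pd.DataFrame(knn_imputer.fit_transform(df), columns=df.columns)

print(df_median)
print(df_knn)
```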

#### Encoding Categorical Variables
- **Tasks**:
  - Convert categorical variables to numerical using One-Hot Encoding or Label Encoding
- **Libraries**: Pandas, Scikit-learn
- **Resources**: [Categorical Feature Encoding](https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features)

#### Feature Selection
- **Tasks**:
  - Use Recursive Feature Elimination (RFE) to select important features
- **Libraries**: Scikit-learn
- **Resources**: [RFE Documentation](https://scikit-learn.org/stable/modules/feature_selection.html#rfe)
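
A minimal RFE sketch on synthetic data from `make_regression`, standing in for the real features. RFE repeatedly fits the estimator, ranks features by the magnitude of its coefficients, and drops the weakest until the requested number remain, so features should be on comparable scales before using a linear estimator. The choice of 4 features is arbitrary here.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 10 features, only 4 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Drop one feature per round (step=1) until 4 remain
selector = RFE(estimator=LinearRegression(), n_features_to_select=4, step=1)
selector.fit(X, y)

print(selector.support_)  # boolean mask of kept features
print(selector.ranking_)  # 1 = selected; higher = eliminated earlier
X_selected = selector.transform(X)
```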

#### Hyperparameter Tuning
- **Tasks**:
  - Optimize model parameters using GridSearchCV or RandomizedSearchCV
- **Libraries**: Scikit-learn
- **Resources**: [GridSearchCV Documentation](https://scikit-learn.org/stable/modules/grid_search.html)
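
A sketch of grid search over a decision tree on synthetic data. The grid values are illustrative guesses, not tuned recommendations. `RandomizedSearchCV` takes a similar call but samples a fixed number of candidates (`n_iter`) instead of trying every combination.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10],
}

# 5-fold CV over every combination (4 x 3 = 12 candidates), scored by RMSE.
# scikit-learn maximizes scores, so RMSE is exposed as its negative.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)           # best mean cross-validated RMSE
best_model = search.best_estimator_  # refit on all data with the best params
```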

#### Advanced Modeling Techniques
- **Support Vector Regression (SVR)**:
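  - **Description**: SVR applies the support vector machine approach to regression: it fits a function that keeps most training targets within a margin of width ε and predicts continuous values.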
  - **Library**: Scikit-learn
  - **Resources**: [SVR Documentation](https://scikit-learn.org/stable/modules/svm.html#svr)
- **Artificial Neural Networks (ANN)**:
  - **Description**: An ANN is a computational model loosely inspired by the structure and function of biological neural networks.
  - **Libraries**: TensorFlow, Keras
  - **Resources**: [TensorFlow ANN Tutorial](https://www.tensorflow.org/tutorials/keras/classification)
- **XGBoost**:
  - **Description**: XGBoost is a decision-tree-based ensemble machine learning algorithm built on the gradient boosting framework.
  - **Library**: XGBoost
  - **Resources**: [XGBoost Documentation](https://xgboost.readthedocs.io/en/stable/)
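
A minimal sketch of the SVR entry above on synthetic data. The hyperparameters (`C=100.0`, `epsilon=1.0`) are placeholder values that would normally come from the tuning step; the scaler sits inside a pipeline because SVR with an RBF kernel is sensitive to feature scale.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize inside the pipeline so the scaler is fit on the training split only
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=1.0))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(y_pred[:5])
```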

#### Model Evaluation Metrics
- **Tasks**:
  - Evaluate model performance using MAE, RMSE, and runtime
- **Libraries**: Scikit-learn
- **Resources**: [Model Evaluation Documentation](https://scikit-learn.org/stable/modules/model_evaluation.html)
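
A sketch of computing MAE and RMSE on log-transformed prices, both with scikit-learn and directly from the formulas in roadmap step 6. It assumes the natural log (`np.log`); if the project uses `np.log1p`, swap it in on both sides. The prices are made-up values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true and predicted sale prices
y_true = np.array([200_000.0, 150_000.0, 320_000.0])
y_pred = np.array([210_000.0, 140_000.0, 300_000.0])

# Compare on the log scale so errors reflect relative rather than absolute gaps
log_true, log_pred = np.log(y_true), np.log(y_pred)

mae = mean_absolute_error(log_true, log_pred)
rmse = np.sqrt(mean_squared_error(log_true, log_pred))

# The same quantities written out from the MAE and RMSE formulas
mae_manual = np.mean(np.abs(log_true - log_pred))
rmse_manual = np.sqrt(np.mean((log_true - log_pred) ** 2))

print(f"MAE (log): {mae:.4f}, RMSE (log): {rmse:.4f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and weights large misses more heavily.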

#### Runtime Evaluation
- **Tasks**:
  - Measure computational efficiency and model training time
- **Libraries**: `time` (Python standard library)
- **Resources**: [Measuring Runtime in Python](https://docs.python.org/3/library/time.html)
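
A sketch that times training of the two baseline models from roadmap step 5 with `time.perf_counter` and reports test RMSE alongside it. The synthetic data and the `max_depth=6` setting are assumptions; wall-clock timings vary between runs and machines, so repeat measurements before comparing models.

```python
import time

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for name, model in [
    ("Linear Regression", LinearRegression()),
    ("Decision Tree", DecisionTreeRegressor(max_depth=6, random_state=0)),
]:
    start = time.perf_counter()  # high-resolution wall-clock timer
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    results[name] = {"fit_seconds": fit_seconds, "rmse": rmse}

for name, r in results.items():
    print(f"{name}: fit {r['fit_seconds']:.4f}s, test RMSE {r['rmse']:.2f}")
```

`time.process_time` is an alternative when CPU time rather than elapsed time is the quantity of interest.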

#### Additional Resources
- **Python Documentation**: [Python Official Documentation](https://docs.python.org/3/)
- **NumPy Documentation**: [NumPy Official Documentation](https://numpy.org/doc/stable/)