This project is the final assignment of the 2023 Ensemble Learning class at CentraleSupélec, part of the Master in Data Sciences & Business Analytics. It consists of two parts:
- Predicting Airbnb prices in New York City using several ensemble methods seen in class.
- Implementing a decision tree from scratch in Python that handles both regression and classification tasks.
Airbnb has become a popular alternative to traditional hotels, allowing individuals to list their properties as rental places. However, determining the optimal price for an Airbnb listing can be challenging for hosts, especially in large cities like New York, where the number of listings is substantial. To help hosts set competitive prices and improve their occupancy rates, accurate prediction of Airbnb prices is crucial.

In this project, we predicted the price of Airbnb listings in New York City using ensemble learning techniques on a Kaggle dataset. Our goal was to train and tune the hyperparameters of 14 methods (Decision Trees, Random Forest, XGBoost, etc.) and combine their predictions using Stacking and Voting, two popular ensemble techniques, which we implemented ourselves and which outperformed Scikit-Learn's implementations. We evaluated the performance of our ensemble models using several metrics, such as MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error). We also implemented decision tree algorithms from scratch and compared them with Scikit-Learn's implementation, both on the Airbnb price prediction task and on 4 other datasets. The results of this project confirmed that Stacking, Voting and Boosting are valuable ensemble techniques which, combined with proper feature engineering, can provide useful insights to Airbnb hosts in New York City for better decision making.
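As background, a Voting ensemble for regression simply averages (optionally with weights) the predictions of the base models. The sketch below illustrates the idea only; it is not the repo's actual implementation, and the example predictions are made up:

```python
import numpy as np

def voting_predict(predictions, weights=None):
    """Combine regression predictions from several base models by
    (optionally weighted) averaging -- the core idea of a Voting regressor."""
    preds = np.asarray(predictions, dtype=float)  # shape: (n_models, n_samples)
    if weights is None:
        weights = np.ones(len(preds)) / len(preds)  # uniform weights
    return np.average(preds, axis=0, weights=np.asarray(weights, dtype=float))

# Example: three hypothetical base-model price predictions for two listings
p1, p2, p3 = [100.0, 200.0], [110.0, 190.0], [90.0, 210.0]
combined = voting_predict([p1, p2, p3])  # averages to [100., 200.]
```

Stacking goes one step further: instead of fixed weights, a meta-model is trained on the base models' out-of-fold predictions.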
- Clone the repo:

  ```shell
  git clone https://github.com/adel-R/Ensemble2023
  ```

- Install the packages listed in the `requirements.txt` file:

  - Unix/macOS:

    ```shell
    python -m pip install -r requirements.txt
    ```

  - Windows:

    ```shell
    py -m pip install -r requirements.txt
    ```
```python
# Gather all model scores into a single comparison table
Final_Results = pd.DataFrame(all_scores)
Final_Results = Final_Results[['Model', 'R2', 'MAE', 'MSE', 'RMSE',
                               'MAPE', 'error_ratio_rmse', 'error_ratio_mae']]
Final_Results.sort_values('R2', ascending=False)  # best R2 first
```
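For context, `all_scores` is presumably a list of per-model metric records collected earlier in the notebook. A self-contained sketch with hypothetical model names and numbers (not the project's actual results):

```python
import pandas as pd

# Hypothetical metric records; the real notebook collects one per tuned model.
all_scores = [
    {"Model": "XGBoost", "R2": 0.61, "MAE": 0.31, "MSE": 0.19, "RMSE": 0.44,
     "MAPE": 0.07, "error_ratio_rmse": 0.09, "error_ratio_mae": 0.06},
    {"Model": "Random Forest", "R2": 0.58, "MAE": 0.33, "MSE": 0.21, "RMSE": 0.46,
     "MAPE": 0.08, "error_ratio_rmse": 0.10, "error_ratio_mae": 0.07},
]
Final_Results = pd.DataFrame(all_scores)[
    ["Model", "R2", "MAE", "MSE", "RMSE", "MAPE",
     "error_ratio_rmse", "error_ratio_mae"]]
Final_Results = Final_Results.sort_values("R2", ascending=False)
```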
```python
run_classification(load_digits())
run_regression(load_diabetes())
```
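These helpers exercise the homemade tree on scikit-learn's toy datasets. As a point of comparison, a hypothetical baseline (`run_regression_sklearn` is not part of the repo) doing the same with scikit-learn's own tree might look like this:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def run_regression_sklearn(dataset, max_depth=5, seed=0):
    """Fit scikit-learn's DecisionTreeRegressor on a dataset bunch
    and report RMSE on a held-out 20% split."""
    X_train, X_test, y_train, y_test = train_test_split(
        dataset.data, dataset.target, test_size=0.2, random_state=seed)
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
    tree.fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_test, tree.predict(X_test)))

rmse = run_regression_sklearn(load_diabetes())
```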
```
.
│   Airbnb_Price_Prediction_Project.pdf
│   global_results.ipynb
│   README.md
│   requirements.txt
│
├───.ipynb_checkpoints
│       Airbnb_Price_Prediction_Project-First_Experimentations_Amine_Zaamoun-checkpoint.ipynb
│       draft_adel - Copy-checkpoint.ipynb
│       draft_adel-checkpoint.ipynb
│       global_results-checkpoint.ipynb
│       Untitled-checkpoint.ipynb
│
├───catboost_info
│   │   catboost_training.json
│   │   learn_error.tsv
│   │   time_left.tsv
│   │
│   └───learn
│           events.out.tfevents
│
├───dataset
│       AB_NYC_2019.csv
│       airbnb-listings.csv
│       name_tsne.csv
│       text_tsne.csv
│
├───decision_tree_from_scratch
│   │   Decision_Tree.py
│   │   Test.py
│   │   Test_of_homemade_decision_tree.ipynb
│   │   tree_classification_digits_dataset.png
│   │   tree_classification_iris_dataset.png
│   │   tree_regression_Airbnb_dataset.png
│   │   tree_regression_california_housing.png
│   │   tree_regression_diabetes_dataset.png
│   │
│   ├───.ipynb_checkpoints
│   │       homemade_decision_tree-checkpoint.ipynb
│   │
│   └───__pycache__
│           Decision_Tree.cpython-39.pyc
│
├───drafts
│   │   Airbnb_Price_Prediction_Project-First_Experimentations_Amine_Zaamoun.ipynb
│   │   decision_tree_scratch_draft.ipynb
│   │   draft_adel.ipynb
│   │
│   └───.ipynb_checkpoints
│           draft_adel-checkpoint.ipynb
│
├───functions
│   │   functions.py
│   │
│   └───__pycache__
│           functions.cpython-39.pyc
│
├───img
│       .gitignore
│       Airbnb_NYC-prices.jfif
│       decision_tree_from_scratch-viz.jfif
│       New_York_City_.png
│       plot_Airbnb_Price_NYC.png
│
├───models
│   │   adaboost_tuning.ipynb
│   │   bagging_tuning.ipynb
│   │   catboost_tuning.ipynb
│   │   decision_tree_from_scratch.ipynb
│   │   decision_tree_tuning.ipynb
│   │   extremely_randomized_forest_tuning.ipynb
│   │   lgbm_tuning.ipynb
│   │   random_forest_tuning.ipynb
│   │   sk_gradient_boosting_tuning.ipynb
│   │   sk_hist_gradient_boosting_tuning.ipynb
│   │   stacking_tuning.ipynb
│   │   tree_regression_Airbnb_dataset.png
│   │   voting_tuning.ipynb
│   │   xgboost_tuning.ipynb
│   │
│   ├───.ipynb_checkpoints
│   │       adaboost_tuning-checkpoint.ipynb
│   │       bagging_tuning-checkpoint.ipynb
│   │       catboost_tuning TO DO-checkpoint.ipynb
│   │       decision_tree_from_scratch-checkpoint.ipynb
│   │       decision_tree_from_scratch_tuning-checkpoint.ipynb
│   │       decision_tree_tuning-checkpoint.ipynb
│   │       draft_adel - Copy-checkpoint.ipynb
│   │       extremely_randomized_forest_tuning-checkpoint.ipynb
│   │       lgbm_tuning-checkpoint.ipynb
│   │       random_forest_tuning-checkpoint.ipynb
│   │       sk_gradient_boosting_tuning-checkpoint.ipynb
│   │       sk_hist_gradient_boosting_tuning-checkpoint.ipynb
│   │       stacking_tuning-checkpoint.ipynb
│   │       voting_tuning-checkpoint.ipynb
│   │       xgboost_tuning-checkpoint.ipynb
│   │
│   ├───saved_models
│   │       adaboost_params.json
│   │       bagging_params.json
│   │       catboost_params.json
│   │       decision_tree_params.json
│   │       extremely_randomized_forest_params.json
│   │       homemade_tree_params.json
│   │       lgbm_params.json
│   │       lgbm_tuned.txt
│   │       random_forest_params.json
│   │       sk_gradient_boosting_params.json
│   │       sk_hist_gradient_boosting_params.json
│   │       vote_params.json
│   │       xgb_model.json
│   │       xgb_params.json
│   │
│   └───saved_scores
│           homemade_decision_tree_score.json
│           homemade_stacking_scores.json
│           sk_stacking_scores.json
│
└───seq2vec_tsne
    │   nlp_tsne_embedding_of_texts.ipynb
    │
    └───.ipynb_checkpoints
            nlp_tsne_embedding of texts-checkpoint.ipynb
            nlp_tsne_embedding_of_texts-checkpoint.ipynb
```
The `global_results.ipynb` notebook summarizes all the results obtained and allows re-fitting all the saved models. The experimented models are tuned and saved in separate notebooks in the `models` folder. All the notebooks rely on helper functions stored in the `functions` folder. The decision tree algorithm coded from scratch, the t-SNE embeddings computed on textual data, and some draft notebooks are stored in separate folders.
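For illustration, embedding listing texts with t-SNE can be sketched as follows. The toy listing names, the TF-IDF vectorization step, and the parameter choices are assumptions for the example, not the repo's actual pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Hypothetical listing names; the repo embeds the real "name"/"text" columns.
names = ["Cozy room in Brooklyn", "Luxury loft in Manhattan",
         "Quiet studio near the park", "Sunny apartment in Queens",
         "Charming Brooklyn brownstone"]

# Vectorize the texts, then project to 2D so the coordinates can be
# saved (e.g. as name_tsne.csv) and used as numeric features.
vectors = TfidfVectorizer().fit_transform(names).toarray()
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
coords = tsne.fit_transform(vectors)  # shape: (5, 2)
```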