This repository contains the code and resources for predicting the country destination of new users on Airbnb. The project uses machine learning techniques to build a model that predicts the country where a new user will make their first travel booking.
The dataset used for this project is the "Airbnb New User Bookings" dataset obtained from the Kaggle competition hosted by Airbnb. It includes various user features such as age, gender, signup method, and more, along with the target variable "country_destination."
The goal of this project is to predict a categorical target variable ("country_destination") based on user features. Here's an overview of the approach:
- Data Preprocessing: Load and clean the dataset. Handle missing values and convert categorical variables into numerical format.
- Feature Engineering: Extract relevant features from the dataset, such as age and gender.
- Model Selection: Since the target variable is categorical, we use a classification algorithm such as a Random Forest classifier.
- Model Training: Train the chosen classification algorithm on the preprocessed data.
- Model Evaluation: Evaluate the model's performance using classification metrics like accuracy, precision, recall, and F1-score.
- Prediction and Submission: Use the trained model to predict the country destinations for new users in the test dataset and prepare a submission file.
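
The steps above can be sketched end to end as follows. This is a minimal illustration, not the repository's actual code: it assumes pandas and scikit-learn are installed, uses a small made-up DataFrame in place of the Kaggle files, and the `preprocess` helper, the hyperparameters, and the `id`/`country` submission columns are assumptions chosen for the example.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the Kaggle training file.
# Column names follow the features named above; the values are made up.
train = pd.DataFrame({
    "id": [f"u{i}" for i in range(12)],
    "age": [25, None, 41, 33, None, 29, 52, 38, 27, None, 45, 31],
    "gender": ["MALE", "FEMALE", "-unknown-", "FEMALE", "MALE", "FEMALE",
               "MALE", "-unknown-", "FEMALE", "MALE", "FEMALE", "MALE"],
    "signup_method": ["basic", "facebook", "basic", "basic", "facebook", "basic",
                      "basic", "facebook", "basic", "basic", "facebook", "basic"],
    "country_destination": ["NDF", "US", "NDF", "US", "NDF", "US",
                            "NDF", "US", "NDF", "US", "NDF", "US"],
})


def preprocess(df: pd.DataFrame, age_fill: float) -> pd.DataFrame:
    """Fill missing ages and one-hot encode the categorical columns."""
    features = df[["age", "gender", "signup_method"]].copy()
    features["age"] = features["age"].fillna(age_fill)
    return pd.get_dummies(features, columns=["gender", "signup_method"])


# Data preprocessing and feature engineering
age_median = train["age"].median()
X = preprocess(train, age_median)
y = train["country_destination"]

# Hold out part of the labelled data for evaluation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model selection and training
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Model evaluation: accuracy plus per-class precision, recall, and F1-score
val_pred = model.predict(X_val)
print("accuracy:", accuracy_score(y_val, val_pred))
print(classification_report(y_val, val_pred, zero_division=0))

# Prediction and submission: here the first rows of the toy data play the
# role of the unlabelled test set. Reindexing aligns the one-hot columns
# with those seen during training.
test = train.drop(columns="country_destination").head(4)
X_test = preprocess(test, age_median).reindex(columns=X.columns, fill_value=0)
submission = pd.DataFrame({"id": test["id"], "country": model.predict(X_test)})
# submission.to_csv("submission.csv", index=False)  # hypothetical output path
```

With the real dataset, the toy DataFrame would be replaced by the training and test files loaded with `pd.read_csv`, and the metrics printed here would only become meaningful at that scale.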
- data/: Directory containing the dataset files.
- model_selection/: Trained model files.
- README.md: This file, providing an overview of the project.
- and more
- Python 3.x
- Clone this repository.
- Install the required libraries.
- analyzing
- cleaning
- modeling
Planned improvements: refine the dataset and add further evaluation tests.
This project is licensed under the MIT License.
Feel free to contact us for any questions or collaborations.