# Travel Package Recommendation
The `Tourism.xlsx` file contains the last year of customer data from a travel package company. Each customer was pitched one of five travel packages: Basic, Standard, Deluxe, Super Deluxe, or King. Over the last year, only 18% of customers purchased the package they were pitched. Dig into the data and develop a recommendation model that tells the company's sales team which package to pitch based on a customer's attributes.
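The 18% figure is a purchase rate over all pitches. A natural first check is to break that rate down by pitched package. The sketch below does this with pandas. It assumes the purchase flag is a 0/1 column named `ProdTaken` and the pitched package is in a column named `ProductPitched`. Both names are assumptions, so verify them against the `"Data Dict"` sheet before use.

```python
from typing import Tuple

import pandas as pd


def pitch_conversion(df: pd.DataFrame,
                     pitched_col: str = "ProductPitched",  # assumed column name
                     taken_col: str = "ProdTaken",          # assumed column name
                     ) -> Tuple[float, pd.Series]:
    """Share of customers who bought the package they were pitched,
    overall and per pitched package (highest rate first).

    Assumes the purchase column is 0/1 with no missing values.
    """
    taken = df[taken_col].astype(bool)
    overall = float(taken.mean())
    by_package = (taken.groupby(df[pitched_col]).mean()
                  .sort_values(ascending=False))
    return overall, by_package


# Toy rows for illustration only; not real Tourism data.
toy = pd.DataFrame({
    "ProductPitched": ["Basic", "Basic", "Deluxe", "Deluxe", "King"],
    "ProdTaken": [1, 0, 0, 0, 1],
})
overall, by_package = pitch_conversion(toy)
print(overall)  # 0.4
print(by_package)
```

On the real sheet, a large spread between packages would suggest that some pitches succeed far more often than others, which is useful context for the recommendation model.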

## Schema
You can view the data schema on the `"Data Dict"` sheet of the `Tourism.xlsx` file. The raw data is contained on the `"Tourism"` sheet.

## Accepted Models
Please use a **nonlinear** model on this dataset (e.g., SVM, XGBoost, Neural Network, Random Forest). **Do not test multiple models**: pick one and stick with it. We want to see your ability to set a baseline model and improve it through tuning.
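The baseline-then-tune workflow can stay within a single model family. The sketch below uses scikit-learn's `RandomForestClassifier`, one of the listed options, chosen here only for illustration. It makes several assumptions. Synthetic data from `make_classification`, with five classes for the five packages, stands in for the encoded Tourism features. Macro-averaged F1 is the metric, on the assumption that every package should count equally. The grid values are arbitrary starting points. Here cross-validation on the training split plays the role of the validation set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cleaned, encoded Tourism features:
# 5 classes, one per package. Replace X and y with the real data.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 1) Baseline: the chosen model with default hyperparameters.
baseline = RandomForestClassifier(random_state=0)
baseline.fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test), average="macro")

# 2) Tuning: the same model family, with hyperparameters picked by
#    3-fold cross-validation on the training split only.
param_grid = {
    "n_estimators": [50, 150],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="f1_macro", cv=3)
search.fit(X_train, y_train)
tuned_f1 = f1_score(y_test, search.best_estimator_.predict(X_test),
                    average="macro")

print(f"baseline macro-F1: {baseline_f1:.3f}")
print(f"tuned macro-F1:    {tuned_f1:.3f} (best params: {search.best_params_})")
```

`GridSearchCV` only searches the hyperparameters of one estimator, so this approach respects the single-model rule. The held-out test split is used only for the final baseline-versus-tuned comparison.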

## Accepted Languages
**NOTE: Please develop your solution using Python. R will not be accepted.**

## Accepted Modules
You can use any module you want, but you will earn bonus points for using any of the following:
- PyTorch
- PySpark (DataFrames and/or SparkMLlib)
- Plotly
- Numba
- NVIDIA Rapids

## Solution Guide
The steps below will help you organize a solution. Please describe your insights using graphs **and** text cells or comments as you progress through the solution.
- Data Cleaning
  - Handle outliers, missing values, and any other invalid entries
- Perform Exploratory Data Analysis
  - **Univariate Analysis** of all features, identify dependent and independent variable(s)
  - **Bivariate Analysis** of each independent variable vs. the dependent variable
- Feature Engineering
  - Format the cleaned data for your model and split it into training, validation, and test sets.
- Model Baseline
  - Determine the scoring metric(s) you will use and explain why
  - Train a baseline model and report its performance
- Model Tuning 
  - Cross-validate the model and tune its hyperparameters
  - Choose the best model and explain why you chose it
  - Report its performance
- Conclusion
  - Summarize insights you made during data cleaning and EDA
  - Compare the performance of the baseline model to that of the tuned model
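The data cleaning and splitting steps could start from something like the sketch below. It assumes that median/mode imputation and clipping to the 1.5 * IQR fences are acceptable defaults. These are choices, not requirements of this brief. The `clean` and `three_way_split` helpers are hypothetical names. For a stricter setup, compute the medians, modes, and fences on the training split only, so that no test-set statistics leak into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def clean(df: pd.DataFrame, skip=()) -> pd.DataFrame:
    """Hypothetical cleaning pass: impute missing values and clip outliers.

    Columns listed in `skip` (target, IDs, numerically coded categories)
    are left untouched.
    """
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    num_cols = [c for c in numeric if c not in skip]
    other_cols = [c for c in out.columns if c not in numeric and c not in skip]

    for col in num_cols:
        # Median imputation is less sensitive to outliers than the mean.
        out[col] = out[col].fillna(out[col].median())
        # Clip to the 1.5 * IQR fences rather than dropping rows. When the
        # IQR is 0 (e.g. a mostly-constant or binary column), skip clipping
        # so minority values are not collapsed.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        if iqr > 0:
            out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    for col in other_cols:
        # Strip stray whitespace so " Salaried" and "Salaried" match,
        # then fill missing entries with the most frequent value.
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        if out[col].isna().any():
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def three_way_split(X, y, val_size=0.2, test_size=0.2, seed=0):
    """Stratified train/validation/test split (60/20/20 by default)."""
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # Convert val_size from a fraction of the full data to a fraction of X_rest.
    rel_val = val_size / (1.0 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=rel_val, stratify=y_rest, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```

Pass the target column and any ID or numerically coded categorical columns through `skip` so they are neither clipped nor imputed. Then apply `three_way_split` to the encoded features to get the three sets the guide asks for.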