# Taxi orders prediction using Machine Learning

The Sweet Lift Taxi company has collected historical data on taxi orders at airports. To attract more drivers during peak hours, we need to predict the number of taxi orders for the next hour. The task is to build a model for this prediction.

The RMSE metric on the test set should not be more than 48.
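As a quick illustration of the metric, here is a minimal sketch of how RMSE can be computed and compared against the threshold of 48. The helper name `rmse` and the toy numbers are assumptions made for this example and are not taken from the project data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# toy hourly order counts, invented for illustration only
actual = [100, 120, 80, 95]
predicted = [110, 115, 90, 95]

score = rmse(actual, predicted)
meets_target = score <= 48  # project requirement: RMSE not more than 48
print(f'RMSE: {score:.2f}, meets target: {meets_target}')
```

Because RMSE is expressed in the same units as the target, a value of 48 means the predictions are off by roughly 48 orders per hour in the root-mean-square sense.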

## Project instructions

1. Download the data and resample it by one hour.
2. Analyze the data.
3. Train different models with different hyperparameters. The test sample should be 10% of the initial dataset. 
4. Test the model using the test sample and provide a conclusion.
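Steps 1 and 3 can be sketched roughly as follows. This is only an illustration on synthetic data: the 10-minute sampling interval, the random values, and the choice of `shuffle=False` (one common way to keep a time series split chronological) are assumptions, not details stated in the instructions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# synthetic 10-minute order counts standing in for taxi.csv (values are invented)
idx = pd.date_range('2018-03-01', periods=6 * 24 * 10, freq='10min')
rng = np.random.default_rng(0)
raw = pd.DataFrame({'num_orders': rng.integers(0, 30, size=len(idx))}, index=idx)

# step 1: resample by one hour, summing the orders placed within each hour
hourly = raw.resample('1h').sum()

# step 3: hold out 10% as the test sample; shuffle=False keeps the rows in
# time order, so the test sample is the most recent part of the series
train, test = train_test_split(hourly, test_size=0.1, shuffle=False)

print(len(hourly), len(train), len(test))
print(train.index.max(), '<', test.index.min())
```

Keeping the split chronological avoids training on observations that come after the ones being predicted.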

## Data description

The data is stored in the file `taxi.csv`. The number of orders is in the `num_orders` column.
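Hourly resampling needs a datetime index, so it can help to parse the timestamps while reading the file. The sketch below uses a tiny in-memory stand-in for `taxi.csv`; the `datetime` column name and the three rows are assumptions made for this example, and only `num_orders` is confirmed by the description above.

```python
import io
import pandas as pd

# tiny in-memory stand-in for taxi.csv; the 'datetime' column name
# and the rows themselves are assumptions for illustration
sample_csv = io.StringIO(
    "datetime,num_orders\n"
    "2018-03-01 00:00:00,9\n"
    "2018-03-01 00:10:00,14\n"
    "2018-03-01 00:20:00,28\n"
)

# parse_dates=True asks pandas to parse the index column as dates
sample = pd.read_csv(sample_csv, index_col='datetime', parse_dates=True)

# sort by time and confirm the index is in chronological order
sample = sample.sort_index()
is_ordered = sample.index.is_monotonic_increasing
print(type(sample.index).__name__, is_ordered)
```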

<hr>

# Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#open_the_data">Open the data file and study the general information</a></li>
        <li><a href="#prepare_the_data">Data preparation</a></li>
        <li><a href="#model_analysis">Model analysis</a></li>
        <li><a href="#model_training">Model training</a></li>
        <li><a href="#model_testing">Model testing</a></li>
        <li><a href="#conclusion">Conclusion</a></li>
    </ol>
</div>
<br>
<hr>

<div id="open_the_data">
    <h2>Open the data file and study the general information</h2> 
</div>

In [None]:
# import pandas and numpy for data preprocessing and manipulation
import numpy as np
import pandas as pd

# matplotlib and seaborn for visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# import module for splitting and cross-validation using gridsearch
from sklearn.model_selection import train_test_split, GridSearchCV

# import metric to measure quality of model
from sklearn.metrics import mean_squared_error

# import machine learning models
from sklearn.linear_model import LinearRegression # import linear regression algorithm
from sklearn.ensemble import RandomForestRegressor # import random forest algorithm
from catboost import CatBoostRegressor, Pool # import catboost regressor
from lightgbm import LGBMRegressor # import lightgbm regressor
from xgboost import XGBRegressor # import xgboost regressor

from IPython.display import display

print('Project libraries have been successfully imported!')

In [None]:
# read the data: try the remote copy first, then fall back to a local file
try:
    df = pd.read_csv('https://code.s3.yandex.net/datasets/taxi.csv')
except Exception:
    # assumption: a local copy named `taxi.csv` (see the data description)
    # is available in the working directory
    df = pd.read_csv('taxi.csv')
print('Data has been read correctly!')

<div id="prepare_the_data">
    <h2>Data preparation</h2> 
</div>

<div id="model_analysis">
    <h2>Model analysis</h2> 
</div>

<div id="model_training">
    <h2>Model training</h2> 
</div>

<div id="model_testing">
    <h2>Model testing</h2> 
</div>

<div id="conclusion">
    <h2>Conclusion</h2> 
</div>

# Review checklist

- [x]  Jupyter Notebook is open
- [ ]  The code is error-free
- [ ]  The cells with the code have been arranged in order of execution
- [ ]  The data has been downloaded and prepared
- [ ]  The data has been analyzed
- [ ]  The model has been trained and hyperparameters have been selected
- [ ]  The models have been evaluated. Conclusion has been provided
- [ ] *RMSE* for the test set is not more than 48