# Block 5 - Industrialization of a machine learning algorithm and automation of decision-making processes - Getaround analysis

## Introduction

Getaround is an American online car-sharing service founded in 2009 in San Francisco and launched to the public in 2011. It is now the European leader in car sharing, connecting drivers who need a car with car owners.

### Problem statement

- 1 - Getaround has experienced problems with unsatisfied users because of late car returns by the previous driver. The company would like to optimize the minimum delay between two consecutive rentals.

- 2 - Getaround is currently working on pricing optimization. The company would like to suggest an optimal price to car owners depending on the characteristics of their car.

### Scope

To optimize the delay between two rentals, Getaround's product manager needs insights to define the right trade-off: solving the late check-out issue without hurting car owners' revenues. For this purpose, Getaround provided a dataset describing car shares, including whether a share was canceled, the check-out delay of ended shares, and information on the previous share when applicable.

To optimize rental prices for car owners, Getaround would like to develop an API that predicts an optimal price from a car's characteristics. For this purpose, the data science team provided a dataset describing cars and the rental prices currently applied to them.

### Aims and objectives

**Aim 1: Build a dashboard that helps the product manager make a decision about late check-outs.**

Objectives:
- 1 - Evaluate the proportion of late check-outs and their impact on the next driver.
- 2 - Evaluate the benefits of the new feature (a minimum delay between two rentals) for drivers and car owners.
- 3 - Evaluate the overall impact of the new feature on rentals and cancelations.
- 4 - Evaluate the proportion of problematic cases solved by the new feature and the proportion of rentals it would affect.

**Aim 2: Build an API that predicts an optimal rental price from a car's features.**

Objectives:
- 1 - Develop a machine learning model that predicts an optimal rental price.
- 2 - Develop an API for price prediction.
- 3 - Test the API functionality.
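Testing the API amounts to posting one car's features as JSON. The sketch below builds such a payload from a DataFrame row. It assumes the target column is named `rental_price_per_day`, as in the pricing dataset, while `row_to_payload` is a hypothetical helper and the car is invented. NumPy scalars are converted to plain Python values because `json.dumps`, which `requests` uses by default for its `json=` argument, rejects types such as `numpy.int64`.

```python
import json

import pandas as pd

# invented car record; only the column names come from the pricing dataset
cars = pd.DataFrame({
    "model_key": ["Renault"],
    "mileage": [50000],
    "engine_power": [120],
    "fuel": ["diesel"],
    "has_gps": [True],
    "rental_price_per_day": [110],
})


def row_to_payload(df: pd.DataFrame, position: int, target: str = "rental_price_per_day") -> dict:
    """Return the features of one row as a JSON-serializable dict (target removed)."""
    row = df.drop(columns=[target]).iloc[position]
    # .item() turns NumPy scalars (int64, bool_) into plain Python int/bool values
    return {key: value.item() if hasattr(value, "item") else value for key, value in row.items()}


payload = row_to_payload(cars, 0)
print(json.dumps(payload))
```

The resulting dict could then be sent with `requests.post(url, json=payload)`; the exact request schema of the deployed API is an assumption here.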

## Methods

### 1 - Library import

### 2 - File reading and basic exploration

The dataset used for the analysis of late check-outs contained 7 features describing 21,310 car shares. Some crucial information about previous shares and previous check-out delays was missing.
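The missing-value check can be sketched as a small helper. This is a minimal illustration on invented toy data, and `missing_value_summary` is a hypothetical name; the share of fully empty rows is divided by the number of rows, not the number of columns.

```python
import numpy as np
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> tuple[pd.Series, float]:
    """Return the % of missing values per column and the % of fully empty rows."""
    # the mean of a boolean mask is the fraction of True values
    percent_per_column = df.isna().mean() * 100
    # rows where every column is missing, relative to the number of ROWS
    percent_empty_rows = df.isna().all(axis=1).mean() * 100
    return percent_per_column, percent_empty_rows


# invented toy data mimicking two columns of the delay dataset
toy = pd.DataFrame({
    "previous_ended_rental_id": [np.nan, 505000.0, np.nan, np.nan],
    "delay_at_checkout_in_minutes": [12.0, np.nan, -5.0, np.nan],
})
per_column, empty_rows = missing_value_summary(toy)
print(per_column)
print(f"Fully empty rows: {empty_rows:.1f}%")
```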

The dataset used for price prediction described the characteristics of 4,843 cars, including the brand, the mileage, the engine power, several equipment options, and the rental price per day. It did not contain any missing values.

### 3 - Analysis of delay data

More than 90% of the rows did not contain any information about a previous share. These rows were kept for the analysis, on the assumption that they may correspond to occasional shares, new users of the service, or possibly unsatisfied owners who shared their car only once (45% of the dataset).
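The share of cars rented only once (plot 0 in the code) can be computed with `value_counts` instead of filtering the whole frame once per car. A minimal sketch on invented toy data that reuses the `car_id` and `state` columns of the delay dataset:

```python
import pandas as pd

# invented toy rentals: car 1 is rented three times, cars 2 and 3 once each
rentals = pd.DataFrame({
    "rental_id": [1, 2, 3, 4, 5],
    "car_id": [1, 1, 1, 2, 3],
    "state": ["ended", "ended", "canceled", "canceled", "ended"],
})

# number of rentals per car, computed in a single pass
rentals_per_car = rentals["car_id"].value_counts()
single_rental_cars = rentals_per_car[rentals_per_car == 1].index

# % of cars rented only once, and % of cancelations among their rentals
percent_single = (rentals_per_car == 1).mean() * 100
single = rentals[rentals["car_id"].isin(single_rental_cars)]
percent_canceled_single = (single["state"] == "canceled").mean() * 100

print(f"Cars rented only once: {percent_single:.1f}%")
print(f"Cancelations among those rentals: {percent_canceled_single:.1f}%")
```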

For convenience, several features were renamed. The data was then augmented with the check-out delay of the previous driver. Ended shares with no recorded check-out delay were dropped, as were shares that had a previous driver but no recorded delay for that previous share.
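These augmentation and filtering steps can also be written with a lookup table rather than a row-by-row loop: the check-out delay of each share is indexed by `rental_id`, and `Series.map` fetches it for every `previous_id`. A minimal sketch on invented toy data using the renamed columns. Unlike a loop that reads `.values[0]`, a `previous_id` absent from the data simply yields `NaN`, so the row is then discarded by the second filter.

```python
import numpy as np
import pandas as pd

# invented toy data using the renamed columns of the delay dataset
data = pd.DataFrame({
    "rental_id": [10, 11, 12, 13, 14],
    "state": ["ended", "ended", "canceled", "ended", "ended"],
    "delay": [30.0, -10.0, np.nan, np.nan, 5.0],
    "previous_id": [np.nan, 10.0, 10.0, np.nan, 13.0],
})

# look up the check-out delay of the previous rental by its id
delay_by_rental = data.set_index("rental_id")["delay"]
data["previous_delay"] = data["previous_id"].map(delay_by_rental)

# ended rentals without a recorded check-out delay
ended_without_delay = (data["state"] == "ended") & data["delay"].isna()
# rentals whose previous rental is known but has no recorded delay
previous_without_delay = data["previous_id"].notna() & data["previous_delay"].isna()
data = data.loc[~(ended_without_delay | previous_without_delay)].reset_index(drop=True)

print(data)
```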

To build the dashboard, the data for each figure was then computed and saved to a CSV file to be plotted later. The results of the analysis were published online at https://cnmgetaroundanalysis.herokuapp.com.
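Most of the threshold-based figures (plots 8 to 11 in the code) share one pattern: for each candidate threshold, compute a percentage for all cars and for cars with the `connect` check-in type, then stack both results in long format (`threshold`, `type`, `percent_total`). A minimal sketch of a reusable helper, assuming the renamed columns of the delay dataset; `sweep_thresholds` is a hypothetical name and the toy data is invented. The example metric mirrors plot 9: the share of canceled rentals whose previous rental was returned more than `threshold` minutes late.

```python
import numpy as np
import pandas as pd


def sweep_thresholds(data, thresholds, is_affected, base_mask=None):
    """Long-format % of rentals flagged by `is_affected`, for all cars and connect cars.

    `is_affected(data, threshold)` must return a boolean Series aligned on `data`.
    Percentages are relative to the rows selected by `base_mask` (all rows by default).
    """
    if base_mask is None:
        base_mask = pd.Series(True, index=data.index)
    denominator = base_mask.sum()
    is_connect = data["checkin_type"] == "connect"
    rows = []
    for threshold in thresholds:
        affected = base_mask & is_affected(data, threshold)
        rows.append({"threshold": threshold, "type": "all cars",
                     "percent_total": affected.sum() / denominator * 100})
        rows.append({"threshold": threshold, "type": "connect cars",
                     "percent_total": (affected & is_connect).sum() / denominator * 100})
    return pd.DataFrame(rows)


# invented toy rentals using the renamed columns
toy = pd.DataFrame({
    "state": ["canceled", "canceled", "ended", "canceled"],
    "checkin_type": ["connect", "mobile", "connect", "mobile"],
    "previous_delay": [45.0, 100.0, 300.0, np.nan],
})

# plot 9 metric: canceled rentals whose previous rental was more than t minutes late
plot9 = sweep_thresholds(
    toy,
    thresholds=np.arange(0, 450, 30),
    is_affected=lambda d, t: d["previous_delay"] > t,
    base_mask=toy["state"] == "canceled",
)
print(plot9.head(6))
```

Passing a different `is_affected` function (and, where needed, `base_mask`) would cover the other threshold-based figures without duplicating the loops.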

### 4 - Prediction of pricing

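After removing outliers (mileage more than 3 standard deviations above the mean, engine power more than 3 standard deviations away from the mean), the data was preprocessed for machine learning: numeric features were standardized and categorical features were one-hot encoded. A simple linear regression model was then trained, reaching R² scores of 0.69 on the training set and 0.71 on the test set. The preprocessor and the model were saved for use in the API. The API, published online at https://cnmgetaroundprediction.herokuapp.com/, was tested after its development.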

## Conclusion

Regarding the analysis of late check-outs, a dashboard is available online. It shows that a minimum delay of 1 hour between two rentals would solve 50% of problematic cases (cancelations following a late check-out) if applied to all cars, while affecting only 1.7% of all rentals.
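The trade-off above can be reproduced for a single threshold. The helper below mirrors the definitions used for plots 10 and 11 in the code: a problematic case is a canceled rental whose previous rental was returned late, and it counts as solved when that delay is below the threshold; a rental is affected when it is an ended rental scheduled less than `threshold` minutes after its previous rental (`time_delta`). `tradeoff` is a hypothetical name and the toy data is invented, so the printed percentages are not the dashboard's results.

```python
import numpy as np
import pandas as pd


def tradeoff(data: pd.DataFrame, threshold: float) -> tuple[float, float]:
    """Return (% of problematic cases solved, % of all rentals affected) for one threshold."""
    # problematic case: canceled rental whose previous rental was returned late
    problematic = (data["state"] == "canceled") & (data["previous_delay"] > 0)
    # solved: the previous delay would have been absorbed by the minimum delay
    solved = problematic & (data["previous_delay"] < threshold)
    # affected: ended rental scheduled less than `threshold` minutes after its previous rental
    affected = (
        (data["state"] == "ended")
        & data["previous_id"].notna()
        & (data["time_delta"] < threshold)
    )
    return solved.sum() / problematic.sum() * 100, affected.sum() / len(data) * 100


# invented toy rentals (not the Getaround data)
toy = pd.DataFrame({
    "state": ["canceled", "canceled", "ended", "ended", "ended"],
    "previous_id": [1.0, 2.0, 3.0, np.nan, 4.0],
    "previous_delay": [20.0, 200.0, -5.0, np.nan, 10.0],
    "time_delta": [0.0, 30.0, 30.0, np.nan, 120.0],
})
solved_pct, affected_pct = tradeoff(toy, threshold=60)
print(f"Problematic cases solved: {solved_pct:.0f}%, rentals affected: {affected_pct:.0f}%")
```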

Regarding price prediction, an API is also available online to suggest an optimal rental price based on a car's features. The machine learning model could be further improved in future work.

## Code

### 1 - Library import

In [None]:
### 1 - library import ### ----

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import pickle

import requests
from IPython.display import display

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots


### 2 - File reading and basic exploration

In [None]:
### 2 - file reading and basic exploration - import dataset ### ----

# load data
data = pd.read_excel("cnm_bloc5_get_around_delay_analysis.xlsx")
data_ml = pd.read_csv("cnm_bloc5_get_around_pricing_project.csv")


In [None]:
### 2 - file reading and basic exploration - get basic stats on data ### ----

# print shape of data
print("Data: Delays")
print()
print("Number of rows: {}".format(data.shape[0]))
print("Number of columns: {}".format(data.shape[1]))
print()

# display dataset
pd.set_option('display.max_columns', None)
print("Dataset display: ")
display(data.head())
print()

# display basic statistics
print("Basic statistics: ")
data_desc = data.describe(include='all')
display(data_desc)
print()

# display percentage of missing values in columns and rows
percent_nan_col = data.isnull().sum() / data.shape[0] * 100
print("Percentage of missing values per column:\n{}".format(percent_nan_col))
print()
# divide by the number of rows (shape[0]), not the number of columns
percent_nan_row = data[data.isnull().all(axis = 1)].shape[0] / data.shape[0] * 100
print("Percentage of rows fully filled with missing values: {}".format(percent_nan_row))


In [None]:
### 2 - file reading and basic exploration - get basic stats on data_ml ### ----

# print shape of data
print("Data: Prices")
print()
print("Number of rows: {}".format(data_ml.shape[0]))
print("Number of columns: {}".format(data_ml.shape[1]))
print()

# display dataset
pd.set_option('display.max_columns', None)
print("Dataset display: ")
display(data_ml.head())
print()

# display basic statistics
print("Basic statistics: ")
data_desc = data_ml.describe(include='all')
display(data_desc)
print()

# display percentage of missing values in columns and rows
percent_nan_col = data_ml.isnull().sum() / data_ml.shape[0] * 100
print("Percentage of missing values per column:\n{}".format(percent_nan_col))
print()
# divide by the number of rows (shape[0]), not the number of columns
percent_nan_row = data_ml[data_ml.isnull().all(axis = 1)].shape[0] / data_ml.shape[0] * 100
print("Percentage of rows fully filled with missing values: {}".format(percent_nan_row))


### 3 - Analysis of delay data

In [None]:
### 3 - analysis of delay data  - get and save data for plot 0 ### ----

# number of shares per car owner

# get unique cars
cars_unique = data["car_id"].unique()

# initialise dataframe to store data
data_plot0 = pd.DataFrame(cars_unique, columns = ["car_id"])

# fill data per car owner
data_plot0["count_total"] = [data.loc[data["car_id"] == car,:].shape[0] for car in data_plot0["car_id"]]

# get percent of cars with only one share
percent1car = data_plot0.loc[data_plot0["count_total"] == 1,:].shape[0] / data_plot0.shape[0] * 100

# get cause for sharing car only once
cars_one_share = data_plot0.loc[data_plot0["count_total"] == 1,"car_id"]
data_one_share = data.loc[data["car_id"].isin(cars_one_share),:]
percent_canceled = data_one_share.loc[data_one_share["state"] == "canceled",:].shape[0] / \
    data_one_share.shape[0] * 100

# report
print("Percentage of owners that rent their car only once: {}%".format(np.round(percent1car,1)))
print("Percentage of cancelations for owners that rent their car only once: {}%".format(np.round(percent_canceled,1)))


In [None]:
### 3 - analysis of delay data  - preprocess data ### ----

# rename columns 
data.rename(columns = {"delay_at_checkout_in_minutes": "delay", "previous_ended_rental_id": "previous_id",
    "time_delta_with_previous_rental_in_minutes": "time_delta"}, inplace = True)

# augment data with delay from previous rental
for i in range(0,data.shape[0]):
    if np.isnan(data.loc[i,"previous_id"]):
        data.loc[i,"previous_delay"] = np.nan
    else:
        previous_rental = data.loc[i,"previous_id"]
        data.loc[i,"previous_delay"] = data.loc[data["rental_id"] == previous_rental,"delay"].values[0]

# drop rows that contain a missing value for check-out delay (only for ended rentals)
# (these rows could not be analysed and would mess up analysis)
index_drop = data.loc[(data["delay"].isnull()) & (data["state"] == "ended"),:].index
data = data.drop(index_drop, axis = 0).reset_index(drop = True)

# drop rows that contain a missing value for previous delay (only if previous_id is known)
# (these rows could not be analysed and would mess up analysis)
index_drop = data.loc[(data["previous_delay"].isnull()) & (data["previous_id"].notnull()),:].index
data = data.drop(index_drop, axis = 0).reset_index(drop = True)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 1 ### ----

# proportion of late vs on-time check-outs

# get data
data_plot1 = data.loc[data["state"] == "ended",["state","delay"]]

# add column with encoded delay
data_plot1["checkout"] = data_plot1["delay"].apply(lambda x: "on time" if x <= 0 else "late")

# rename and save
data_plot1.to_csv("streamlit/cnm_bloc5_data_plot1.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 2 ### ----

# distribution of delays

# get data
data_plot2 = data_plot1.loc[data_plot1["delay"] > 0,:]

# drop outliers for plotting
upper_bound = data_plot2["delay"].mean() + 2 * data_plot2["delay"].std()
data_plot2 = data_plot2.loc[data_plot2["delay"] < upper_bound,:]

# rename and save
data_plot2.to_csv("streamlit/cnm_bloc5_data_plot2.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 3 ### ----

# proportion of cancelations after delayed check-outs

# get rental id of late check-outs
mask_late = (data["state"] == "ended") & (data["delay"] > 0)
late_checkouts = data.loc[mask_late,"rental_id"]

# get data
data_plot3 = data.loc[data["previous_id"].isin(late_checkouts),:]

# rename and save
data_plot3.to_csv("streamlit/cnm_bloc5_data_plot3.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 4 ### ----

# proportion of car drivers that would benefit from the new feature

# get data
data_plot4 = data.loc[:,["previous_id","previous_delay"]]

# get benefit per rental
data_plot4["benefit"] = "no benefit"
data_plot4.loc[(data_plot4["previous_id"].notnull()) & (data_plot4["previous_delay"] > 0),"benefit"] = "benefit"

# rename and save
data_plot4.to_csv("streamlit/cnm_bloc5_data_plot4.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 5 ### ----

# proportion of car owners that would benefit from the new feature

# get unique cars
cars_unique = data["car_id"].unique()

# initialise dataframe to store data
data_plot5 = pd.DataFrame(cars_unique, columns = ["car_id"])

# fill data per car owner
for i in range(0,data_plot5.shape[0]):
    # set masks
    mask_car = data["car_id"] == data_plot5.loc[i,"car_id"]
    mask_canceled = data["state"] == "canceled"
    mask_delay = data["previous_delay"] > 0
    # fill data
    data_plot5.loc[i,"count_total"] = data.loc[mask_car,:].shape[0]
    data_plot5.loc[i,"count_canceled_delay"] = data.loc[mask_car & mask_canceled & mask_delay,:].shape[0]

# get benefit per car owner
data_plot5["benefit"] = data_plot5["count_canceled_delay"].apply(lambda x: "benefit" if x > 0 else "no benefit")

# rename and save
data_plot5.to_csv("streamlit/cnm_bloc5_data_plot5.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 6 ### ----

# proportion of canceled rentals due to delay per car owner

# re-use data from previous plot
data_plot6 = data_plot5.copy()

# get percentage of cancelations due to previous delay for each car owner
data_plot6["percent_canceled_delay"] = data_plot6["count_canceled_delay"] / data_plot6["count_total"] * 100

# keep only beneficiaries
data_plot6 = data_plot6.loc[data_plot6["count_canceled_delay"] > 0,:]

# rename and save
data_plot6.to_csv("streamlit/cnm_bloc5_data_plot6.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 7 ### ----

# proportion of cancelations due to previous delay

# get data
data_plot7 = data.loc[data["state"] == "canceled",["state","previous_delay"]]

# add column with encoded cause for cancelation
data_plot7["cause"] = data_plot7["previous_delay"].apply(lambda x: "not known" if np.isnan(x) else "delay" if x > 0
    else "no delay")

# rename and save
data_plot7.to_csv("streamlit/cnm_bloc5_data_plot7.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 8 ### ----

# proportion of rentals affected by the new feature

# set thresholds
thresholds = np.arange(0,450,30)

# initialise variable to store results
data_plot81 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])
data_plot82 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])

# get percentage of rentals affected depending on the threshold (all cars)
for i in range(0,len(thresholds)):
    data_plot81.loc[i,"threshold"] = thresholds[i]
    data_plot81.loc[i,"type"] = "all cars"
    data_plot81.loc[i,"percent_total"] = data.loc[data["previous_delay"] > thresholds[i],:].shape[0] / \
        data.shape[0] * 100
    
# get percentage of rentals affected depending on the threshold (connect cars)
for i in range(0,len(thresholds)):
    data_plot82.loc[i,"threshold"] = thresholds[i]
    data_plot82.loc[i,"type"] = "connect cars"
    data_plot82.loc[i,"percent_total"] = data.loc[(data["checkin_type"] == "connect") & 
        (data["previous_delay"] > thresholds[i]),:].shape[0] / data.shape[0] * 100
    
# compile data
data_plot8 = pd.concat([data_plot81,data_plot82], axis = 0)

# rename and save
data_plot8.to_csv("streamlit/cnm_bloc5_data_plot8.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 9 ### ----

# proportion of cancelations concerned by the new feature

# set thresholds
thresholds = np.arange(0,450,30)

# initialise variable to store results
data_plot91 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])
data_plot92 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])

# get percentage of cancelations affected depending on the threshold (all cars)
for i in range(0,len(thresholds)):
    data_plot91.loc[i,"threshold"] = thresholds[i]
    data_plot91.loc[i,"type"] = "all cars"
    data_plot91.loc[i,"percent_total"] = data.loc[(data["state"] == "canceled") & \
        (data["previous_delay"] > thresholds[i]),:].shape[0] / \
        data.loc[data["state"] == "canceled",:].shape[0] * 100
    
# get percentage of cancelations affected depending on the threshold (connect cars)
for i in range(0,len(thresholds)):
    data_plot92.loc[i,"threshold"] = thresholds[i]
    data_plot92.loc[i,"type"] = "connect cars"
    data_plot92.loc[i,"percent_total"] = data.loc[(data["state"] == "canceled") & \
        (data["checkin_type"] == "connect") & (data["previous_delay"] > thresholds[i]),:].shape[0] / \
        data.loc[data["state"] == "canceled",:].shape[0] * 100
    
# compile data
data_plot9 = pd.concat([data_plot91,data_plot92], axis = 0)

# rename and save
data_plot9.to_csv("streamlit/cnm_bloc5_data_plot9.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 10 ### ----

# proportion of cancelations due to delay concerned by the new feature

# set thresholds
thresholds = np.arange(0,450,30)

# initialise variable to store results
data_plot101 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])
data_plot102 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])

# get percentage of cancelations fixed depending on the threshold (all cars)
mask = (data["state"] == "canceled") & (data["previous_delay"] > 0)
for i in range(0,len(thresholds)):
    data_plot101.loc[i,"threshold"] = thresholds[i]
    data_plot101.loc[i,"type"] = "all cars"
    data_plot101.loc[i,"percent_total"] = data.loc[mask & (data["previous_delay"] < thresholds[i]),:].shape[0] / \
        data.loc[mask,:].shape[0] * 100
    
# get percentage of cancelations fixed depending on the threshold (connect cars)
mask = (data["state"] == "canceled") & (data["previous_delay"] > 0)
for i in range(0,len(thresholds)):
    data_plot102.loc[i,"threshold"] = thresholds[i]
    data_plot102.loc[i,"type"] = "connect cars"
    data_plot102.loc[i,"percent_total"] = data.loc[mask & (data["checkin_type"] == "connect") & \
        (data["previous_delay"] < thresholds[i]),:].shape[0] / data.loc[mask,:].shape[0] * 100
    
# compile data
data_plot10 = pd.concat([data_plot101,data_plot102], axis = 0)

# rename and save
data_plot10.to_csv("streamlit/cnm_bloc5_data_plot10.csv", index = False)


In [None]:
### 3 - analysis of delay data  - get and save data for plot 11 ### ----

# proportion of car shares that would be negatively affected by the new feature

# set thresholds
thresholds = np.arange(0,450,30)

# initialise variable to store results
data_plot111 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])
data_plot112 = pd.DataFrame(index = range(0,len(thresholds)), columns = ["threshold","type","percent_total"])

# get percentage of car shares affected depending on the threshold (all cars)
mask = (data["state"] == "ended") & (data["previous_id"].notnull())
for i in range(0,len(thresholds)):
    data_plot111.loc[i,"threshold"] = thresholds[i]
    data_plot111.loc[i,"type"] = "all cars"
    data_plot111.loc[i,"percent_total"] = data.loc[mask & (data["time_delta"] < thresholds[i]),:].shape[0] / \
        data.shape[0] * 100
    
# get percentage of car shares affected depending on the threshold (connect cars)
mask = (data["state"] == "ended") & (data["previous_id"].notnull())
for i in range(0,len(thresholds)):
    data_plot112.loc[i,"threshold"] = thresholds[i]
    data_plot112.loc[i,"type"] = "connect cars"
    data_plot112.loc[i,"percent_total"] = data.loc[mask & (data["checkin_type"] == "connect") & \
        (data["time_delta"] < thresholds[i]),:].shape[0] / data.shape[0] * 100
    
# compile data
data_plot11 = pd.concat([data_plot111,data_plot112], axis = 0)

# rename and save
data_plot11.to_csv("streamlit/cnm_bloc5_data_plot11.csv", index = False)


### 4 - Prediction of pricing

In [None]:
### 4 - prediction of pricing - univariate analysis ### ----

# set figure to make subplots
fig1 = make_subplots(
    rows = 4,
    cols = 4,
    subplot_titles = (
        "A. Rental price", "B. Mileage", "C. Engine power", "",
        "D. Model key", "E. Paint color", "F. Car type", "G. Fuel",
        "H. Private parking", "I. GPS", "J. Air conditioning", "K. Automatic",
        "L. Connect", "M. Speed regulator", "N. Winter tires"),
    vertical_spacing = 0.12,
    horizontal_spacing = 0.05)

# plot distribution of each numeric variable
features_num = ["rental_price_per_day","mileage", "engine_power"]
for i in [0, 1, 2]:
    fig1.add_trace(go.Histogram(
        x = data_ml[features_num[i]],
        marker_color = px.colors.qualitative.Vivid[i]),
        row = 1, col = i+1)

# plot categorical variables
features_cat = ["model_key","paint_color","car_type","fuel"]
for i in [0, 1, 2, 3]:
    data_current = data_ml[features_cat[i]].value_counts()
    fig1.add_trace(go.Bar(
        x = data_current.index,
        y = data_current.values,
        marker_color = px.colors.qualitative.Vivid[3:]),
        row = 2, col = i+1)
features_cat = ["private_parking_available","has_gps","has_air_conditioning","automatic_car"]
for i in [0, 1, 2, 3]:
    data_current = data_ml[features_cat[i]].value_counts()
    fig1.add_trace(go.Bar(
        x = data_current.index,
        y = data_current.values,
        marker_color = px.colors.qualitative.Vivid[2:]),
        row = 3, col = i+1)
features_cat = ["has_getaround_connect","has_speed_regulator","winter_tires"]
for i in [0, 1, 2]:
    data_current = data_ml[features_cat[i]].value_counts()
    fig1.add_trace(go.Bar(
        x = data_current.index,
        y = data_current.values,
        marker_color = px.colors.qualitative.Vivid[2:]),
        row = 4, col = i+1)

# update layout
fig1.update_xaxes(tickfont = dict(size = 8))
fig1.update_yaxes(tickfont = dict(size = 8))
fig1.update_layout(
        margin = dict(l = 60, r= 50, t= 140),
        title_text = "Figure 1. Univariate analysis",
        title_x = 0.5,
        title_y = 0.95,
        title_font_size = 18,
        bargroupgap = 0.2,
        showlegend = False,
        plot_bgcolor = "rgba(0,0,0,0)",
        paper_bgcolor = "rgb(232,232,232)",
        width = 800,
        height = 1000)

fig1.show()


In [None]:
### 4 - prediction of pricing - correlation matrix ### ----

# get correlation matrix
features_num = ["rental_price_per_day","mileage", "engine_power"]
corr_matrix = data_ml.loc[:,features_num].corr().round(2)

# plot correlation matrix
fig2 = ff.create_annotated_heatmap(corr_matrix.values,
                                  x = corr_matrix.columns.tolist(),
                                  y = corr_matrix.index.tolist())

# update layout
fig2.update_layout(
        margin = dict(l = 180, b = 40, t = 80),
        title_text = "Figure 2. Correlation matrix",
        title_x = 0.5,
        title_y = 0.95,
        title_font_size = 18,   
        plot_bgcolor = "rgba(0,0,0,0)",
        paper_bgcolor = "rgb(232,232,232)",
        width = 800,
        height = 400)

fig2.show()


In [None]:
### 4 - prediction of pricing - preprocessing ### ----

# drop outliers in mileage
upper_bound = data_ml["mileage"].mean() + 3 * data_ml["mileage"].std()
data_ml = data_ml.loc[data_ml["mileage"] < upper_bound,:]

# drop outliers in engine power
lower_bound = data_ml["engine_power"].mean() - 3 * data_ml["engine_power"].std()
upper_bound = data_ml["engine_power"].mean() + 3 * data_ml["engine_power"].std()
data_ml = data_ml.loc[(data_ml["engine_power"] < upper_bound) & (data_ml["engine_power"] > lower_bound),:]

# drop the leftover index column saved in the CSV file
data_ml = data_ml.drop(["Unnamed: 0"], axis =1)


In [None]:
### 4 - prediction of pricing - preprocessing for machine learning ### ----

# separate target variable Y from features X
X = data_ml.drop(["rental_price_per_day"], axis = 1)
Y = data_ml["rental_price_per_day"]

# divide dataset into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 1)

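# create preprocessor object from pipelines for numeric and categorical features
# (numeric and boolean columns are scaled, the remaining string columns are one-hot encoded)
features_num = X.select_dtypes(include = ["number", "bool"]).columns
features_cat = X.drop(features_num, axis = 1).columns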
numeric_transformer = Pipeline(steps = [
    ("scaler", StandardScaler())
])
categorical_transformer = Pipeline(steps = [
    ("encoder", OneHotEncoder(drop = "first"))
])
preprocessor = ColumnTransformer(transformers = [
    ("num", numeric_transformer, features_num),
    ("cat", categorical_transformer, features_cat)
])

# scale numeric features, encode categorical features
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)


In [None]:
### 4 - prediction of pricing - train model and assess performance ### ----

# train model
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# make predictions on train and test sets
Y_train_pred = regressor.predict(X_train)
Y_test_pred = regressor.predict(X_test)

# assess performance and print report
r2_train = r2_score(Y_train, Y_train_pred)
r2_test = r2_score(Y_test, Y_test_pred)
print("R2 score on training set: ", r2_train)
print("R2 score on test set: ", r2_test)


In [None]:
### 4 - prediction of pricing - save transformer and model ### ----

# save transformer
with open("fastapi/preprocessor.pkl", "wb") as file:
    pickle.dump(preprocessor, file)

# save model
with open("fastapi/model.pkl", "wb") as file:
    pickle.dump(regressor, file)


In [None]:
### 4 - prediction of pricing - api testing ### ----

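# get example data from the dataset (features only, without the target)
datatest = data_ml.drop(["rental_price_per_day"], axis = 1).iloc[0,:].to_dict()

# post request and check the response status before reading the prediction
response = requests.post("https://cnmgetaroundprediction.herokuapp.com/predict", json = datatest)
response.raise_for_status()
print(response.json())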
