## Problem Approach

Based on the issue described and the dataset you provided, I first built a regression model to predict gasoline consumption for both gasoline types (SP98 and E10). Once the model was trained, I compared the predicted consumption of both types on a random sample of 10000 drives. Finally, to support the decision, I scraped gasoline prices for gas stations in the city of Madrid.

## Model Training

For this task, the machine learning algorithm selected was **Gradient Boosting**, with 15% of the data (a split of 0.15) held out for validation. Beforehand, data entries with distances above 50 km were filtered out, as I understood that Cobify's type of business focuses on trips up to that distance. The model inputs were: distance, speed, outside temperature, use of AC, presence of rain, sunny weather and type of gasoline.

The model, tuned by cross-validation, obtained a **mean squared error** of **0.16** on the validation set (note that, because of the low number of data entries, no separate test split was set aside, in order to maximize the data available for training).
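As an illustration of how a hold-out split and the mean squared error fit together, here is a minimal sketch using only the standard library. The helper names and the consumption values are hypothetical, and a constant mean predictor stands in for the real model. The source does not say which library was used for Gradient Boosting, so the regressor itself is left out.

```python
import random

def holdout_split(rows, val_fraction=0.15, seed=0):
    """Shuffle rows and hold out val_fraction of them for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical consumption values, not taken from the real dataset.
consumption = [4.2, 4.8, 5.1, 4.5, 5.6, 4.9, 4.4, 5.0, 4.7, 5.3,
               4.6, 5.2, 4.3, 4.9, 5.5, 4.8, 4.1, 5.0, 4.7, 5.4]

train, val = holdout_split(consumption)

# Stand-in predictor: always predict the training mean. The real model
# would be a fitted gradient boosting regressor instead.
baseline = sum(train) / len(train)
val_mse = mean_squared_error(val, [baseline] * len(val))
print(len(train), len(val), round(val_mse, 3))
```

With so few entries, the validation score depends noticeably on which rows end up in the held-out 15%, which is one reason to read the 0.16 figure with some caution.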

## Random Sampling

A random sample of **10000 drives** was generated. The model input variables were drawn at random as follows:
* Distance: float from 1 km to 50 km, with 1 decimal of precision.
* Speed: float from 14 km/h to 90 km/h, with 1 decimal of precision.
* Outside temperature: float from -5 °C to 40 °C, with 1 decimal of precision.
* Use of AC: boolean
* Rain conditions: boolean
* Sunny: boolean
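
The sampling described above could be sketched as follows. This assumes uniform draws within each range and independent booleans with probability 0.5, neither of which is stated in the original analysis. The field names are also hypothetical.

```python
import random

def sample_drive(rng):
    """Draw one synthetic drive within the ranges listed above."""
    return {
        "distance": round(rng.uniform(1, 50), 1),       # km
        "speed": round(rng.uniform(14, 90), 1),         # km/h
        "temp_outside": round(rng.uniform(-5, 40), 1),  # degrees Celsius
        "AC": rng.random() < 0.5,
        "rain": rng.random() < 0.5,
        "sun": rng.random() < 0.5,
    }

rng = random.Random(42)  # fixed seed so the sample is reproducible
drives = [sample_drive(rng) for _ in range(10_000)]

# Duplicate each drive once per gasoline type so both fuels are
# evaluated on exactly the same conditions.
paired = [dict(d, gas_type=g) for d in drives for g in ("E10", "SP98")]
print(len(drives), len(paired))
```

Pairing each drive with both gasoline types keeps every other input identical, so any difference in predicted consumption comes from the gasoline type alone.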

Gasoline consumption was predicted by the model for both types of gasoline on the same sample of 10000 drives. The following pairplot compares the consumption of both gasolines:

<img src="img/pairplot_consum.png" />


Based on the plot above, it seems that at short distances and low speeds E10 consumes more than SP98, and the effect is stronger at low outside temperatures. E10 appears to perform worse when the car is cold or does not have enough time to warm up, as on short or low-average-speed drives.

The model also shows a change of trend close to the lower limits. I believe this is a bias in the model caused by the small dataset provided (fewer than 400 entries): most likely there are not enough cases under those conditions.

In any case, this is pure gasoline consumption, and the two gasolines have different prices. The sections below show how the comparison looks once price is taken into account.

## Gasoline Prices

From [this website](https://www.dieselogasolina.com/), I scraped up-to-date prices of SP98 and E10 gasoline for all gas stations in the city of Madrid.

```python
import pandas as pd

# Load the scraped prices and show a random sample of 10 rows
df_prices = pd.read_pickle('data/prices')
df_prices.sample(10)
```

| Unnamed: 0 | name | direction | gas_type | price |
|---|---|---|---|---|
| 100 | Gasolinera CEPSA en MADRID (MADRID)CALLE CORAZ... | CALLE CORAZON DE MARIA, 76 | Sin plomo 95 | 1329 |
| 185 | Gasolinera CEPSA en MADRID (MADRID)BARRIO CENT... | BARRIO CENTRO-EMBAJADORES, 83 | Gasóleo C | 889 |
| 772 | Gasolinera CEPSA en MADRID (MADRID)AVENIDA CIU... | AVENIDA CIUDAD DE BARCELONA, 61 | Sin plomo 95 | 1354 |
| 439 | Gasolinera BALLENOIL en MADRID (MADRID)CALLE C... | CALLE CERRO DEL MURMULLO, 1 | Sin plomo 95 | 1179 |
| 530 | Gasolinera BP en MADRID (MADRID)CL PRINCIPE D... | CL PRINCIPE DE VERGARA 106 | Gasóleo A | 1159 |
| 160 | Gasolinera REPSOL en MADRID (MADRID)PLAZA SETU... | PLAZA SETUBAL, 4 | Sin plomo 95 | 1289 |
| 778 | Gasolinera SHELL en MADRID (MADRID)CALLE CARDE... | CALLE CARDENAL HERRERA ORIA, 81 | Sin plomo 98 | 1479 |
| 136 | Gasolinera REPSOL en MADRID (MADRID)CALLE REAL... | CALLE REAL DE ARGANDA, 74 | Sin plomo 98 | 1469 |
| 19 | Gasolinera REPSOL en MADRID (MADRID)PASEO SAN ... | PASEO SAN FRANCISCO DE SALES, 44 | Gasóleo A | 1179 |
| 612 | Gasolinera REPSOL en MADRID (MADRID)AVENIDA OP... | AVENIDA OPORTO, SN | Sin plomo 95 | 1289 |


Below is the distribution of the SP98-to-E10 price ratio for each gas station:

<img src="img/ratio_dist.png" />
<img src="img/ratio_box.png" />
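
A per-station ratio like the one plotted above could be computed roughly as follows. This is a sketch under assumptions: the rows, station names and prices are made up, and the `"E10"` and `"SP98"` labels are placeholders, since the scraped `gas_type` values use the site's own names (for example `Sin plomo 98`).

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows in the shape (station, gas_type, price).
# The station names and prices are made up for illustration.
rows = [
    ("Station A", "E10", 1289), ("Station A", "SP98", 1419),
    ("Station B", "E10", 1329), ("Station B", "SP98", 1439),
    ("Station C", "E10", 1179),  # no SP98 listed, so no ratio
]

def price_ratios(rows, premium="SP98", regular="E10"):
    """Return {station: premium price / regular price} for stations selling both."""
    by_station = defaultdict(dict)
    for station, gas_type, price in rows:
        by_station[station][gas_type] = price
    return {
        s: p[premium] / p[regular]
        for s, p in by_station.items()
        if premium in p and regular in p
    }

ratios = price_ratios(rows)
print({s: round(r, 3) for s, r in ratios.items()})
print(round(median(ratios.values()), 3))
```

Stations that list only one of the two gasolines are dropped, because a ratio is only meaningful where both prices come from the same station.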

## Which gasoline type is "cheaper"?

Based on the distribution of the SP98-to-E10 price ratio, let's show the same pairplot as before, this time taking actual cost into account rather than just consumption.

For a ratio of 1.09, i.e. SP98 being 1.09 times as expensive as E10:

<img src="img/pairplot_1_09.png" />

Under all conditions, E10 is cheaper to use.

Now let's check a very low price ratio, for example 1.06:

<img src="img/pairplot_1_04.png" />

At this extremely low ratio, a few cases start to appear, at short distances and low outside temperatures, where SP98 is "cheaper" than E10, but they are very rare.
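
The cost comparison behind these plots comes down to weighting each predicted consumption by the relative price. Here is a minimal sketch. The function name and the consumption figures are hypothetical, and the E10 price is normalised to 1 so that only the ratio matters.

```python
def cheaper_fuel(consumption_e10, consumption_sp98, price_ratio):
    """Return the cheaper fuel for one drive.

    price_ratio is the SP98 price divided by the E10 price. With the E10
    price normalised to 1, the cost of a drive is consumption * price.
    """
    cost_e10 = consumption_e10 * 1.0
    cost_sp98 = consumption_sp98 * price_ratio
    return "SP98" if cost_sp98 < cost_e10 else "E10"

# Hypothetical predicted consumptions (same units for both fuels).
print(cheaper_fuel(5.0, 4.9, 1.09))  # SP98 uses 2% less but costs 9% more
print(cheaper_fuel(5.6, 5.0, 1.06))  # SP98 uses about 11% less, costs 6% more
```

Put differently, SP98 only wins on a drive when E10's predicted consumption exceeds SP98's by more than the price ratio, which is why such cases are limited to the cold, short-distance corner of the sample.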

## Recommendation

Based on these results, I **recommend using E10** as the gasoline type for Cobify cars.

However, I strongly suggest **collecting a larger and more detailed dataset** to build a more precise predictor: for example, recording the type of drive (urban, mixed, highway) and including drives where the engine is at normal operating temperature.

## Future developments

By scraping prices, Cobify could be advised daily of the gas stations with the cheapest gasoline. The analysis could also be extended to identify which gas station companies are cheaper.