# Autolib: Electric Car Sharing Service

## Research Question
The research question is to identify the most popular hour of the day for picking up a shared electric car (Bluecar) in the city of Paris over the month of April 2018.

Further to this, there are bonus questions:
- What is the most popular hour for returning cars?
- What station is the most popular?
   - Overall?
   - At the most popular picking hour?
- What postal code is the most popular for picking up Blue cars? Does the most popular station belong to that postal code?
    - Overall?
    - At the most popular picking hour?

- Do the results change if you consider Utilib and Utilib 1.4 instead of Blue cars? 


The data and its description are available at the following links:
[http://bit.ly/autolib_dataset](http://bit.ly/autolib_dataset) and [Link](https://drive.google.com/a/moringaschool.com/file/d/13DXF2CFWQLeYxxHFekng8HJnH_jtbfpN/view?usp=sharing)




### Importing the necessary libraries for the project

In [36]:
import pandas as pd
import numpy as np

### Loading and previewing the data

In [156]:
#url=http://bit.ly/autolib_dataset

df=pd.read_csv('http://bit.ly/autolib_dataset')

In [65]:
df.head(10)

Unnamed: 0,Address,Cars,Bluecar counter,Utilib counter,Utilib 1.4 counter,Charge Slots,Charging Status,City,Displayed comment,ID,...,Scheduled at,Slots,Station type,Status,Subscription status,year,month,day,hour,minute
0,2 Avenue de Suffren,0,0,0,0,0,nonexistent,Paris,,paris-suffren-2,...,,2,station,ok,nonexistent,2018,4,8,11,43
1,145 Rue Raymond Losserand,6,6,0,0,0,operational,Paris,,paris-raymondlosserand-145,...,,0,station,ok,nonexistent,2018,4,6,7,24
2,2 Avenue John Fitzgerald Kennedy,3,3,0,2,0,operational,Le Bourget,,lebourget-johnfitzgeraldkennedy-2,...,,1,station,ok,nonexistent,2018,4,3,20,14
3,51 Rue Eugène Oudiné,3,3,1,0,1,operational,Paris,,paris-eugeneoudine-51,...,,2,station,ok,nonexistent,2018,4,4,4,37
4,6 avenue de la Porte de Champerret,3,3,0,0,0,nonexistent,Paris,,paris-portedechamperret-6,...,,3,station,ok,nonexistent,2018,4,8,17,23
5,8 Boulevard Voltaire,0,0,0,0,0,nonexistent,Paris,,paris-voltaire-8,...,,4,station,ok,nonexistent,2018,4,6,7,2
6,37 rue Leblanc,0,0,0,0,0,nonexistent,Paris,"Station en parking (niv -1), accès 37 rue Leb...",paris-citroencevennes-parking,...,,0,station,closed,nonexistent,2018,4,8,18,20
7,17 Rue des Luaps Prolongée,3,3,1,0,0,nonexistent,Nanterre,,nanterre-luaps-17,...,,0,station,ok,nonexistent,2018,4,4,22,13
8,34 avenue Jean Moulin,1,1,0,0,0,nonexistent,Paris,,paris-jeanmoulin-34,...,,4,station,ok,nonexistent,2018,4,2,22,58
9,41 boulevard de Rochechouart,6,6,0,0,0,nonexistent,Paris,,paris-anvers-parking,...,,0,station,ok,nonexistent,2018,4,4,15,2


### Access information on the dataframe

In [103]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 25 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Address              5000 non-null   object
 1   Cars                 5000 non-null   int64 
 2   Bluecar counter      5000 non-null   int64 
 3   Utilib counter       5000 non-null   int64 
 4   Utilib 1.4 counter   5000 non-null   int64 
 5   Charge Slots         5000 non-null   int64 
 6   Charging Status      5000 non-null   object
 7   City                 5000 non-null   object
 8   Displayed comment    111 non-null    object
 9   ID                   5000 non-null   object
 10  Kind                 5000 non-null   object
 11  Geo point            5000 non-null   object
 12  Postal code          5000 non-null   int64 
 13  Public name          5000 non-null   object
 14  Rental status        5000 non-null   object
 15  Scheduled at         47 non-null     object
 16  Slots 

## Data Cleaning

### Validity

Here, unnecessary columns are removed. The 'month' and 'year' columns are also removed because the sample, collected over 9 consecutive days, comes from a single month and year.

In [157]:
#dropping columns
df.drop(['Cars','Displayed comment','Geo point','Public name','Scheduled at','year','month'],axis=1,inplace=True)
df.head(2)

Unnamed: 0,Address,Bluecar counter,Utilib counter,Utilib 1.4 counter,Charge Slots,Charging Status,City,ID,Kind,Postal code,Rental status,Slots,Station type,Status,Subscription status,day,hour,minute
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,paris-suffren-2,STATION,75015,operational,2,station,ok,nonexistent,8,11,43
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,paris-raymondlosserand-145,STATION,75014,operational,0,station,ok,nonexistent,6,7,24
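
The claim that 'year' and 'month' hold a single value can be checked before the columns are dropped. The following is a minimal sketch on a toy frame; the frame `sample` and its values are illustrative assumptions, not rows taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the Autolib sample (values are illustrative only).
sample = pd.DataFrame({
    "year": [2018, 2018, 2018],
    "month": [4, 4, 4],
    "day": [8, 6, 3],
})

# A column with a single distinct value adds no information to the analysis.
distinct_counts = sample.nunique()
constant_columns = distinct_counts[distinct_counts == 1].index.tolist()
print(constant_columns)  # ['year', 'month']
```

Running the same `nunique()` check on the loaded dataframe would confirm whether dropping these two columns loses any information.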


### Accuracy
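As per the data description, charge slots can only be greater than 0 when the charging status is operational. The check below lists the records whose charging status is operational but which report no charge slots: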

In [158]:
#listing records whose charging status is operational although charge slots is not greater than 0
df[~(df['Charge Slots']>0)&(df['Charging Status']=='operational')]

Unnamed: 0,Address,Bluecar counter,Utilib counter,Utilib 1.4 counter,Charge Slots,Charging Status,City,ID,Kind,Postal code,Rental status,Slots,Station type,Status,Subscription status,day,hour,minute
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,paris-raymondlosserand-145,STATION,75014,operational,0,station,ok,nonexistent,6,7,24
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,lebourget-johnfitzgeraldkennedy-2,STATION,93350,operational,1,station,ok,nonexistent,3,20,14
17,182 Boulevard Voltaire,6,0,0,0,operational,Paris,paris-voltaire-182,STATION,75011,operational,0,station,ok,nonexistent,9,12,14
19,11 avenue René Coty,4,0,2,0,operational,Paris,paris-renecoty-11,STATION,75014,operational,0,station,ok,nonexistent,9,21,31
20,2 Place de Catalogne,3,0,0,0,operational,Paris,paris-catalogne-2,SPACE,75014,operational,0,full_station,ok,operational,5,6,16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4973,18 Rue Balard,5,0,0,0,operational,Paris,paris-balard-18,SPACE,75015,operational,0,full_station,ok,operational,9,13,34
4989,3 Rue Mongenot,4,0,0,0,operational,Saint-Mandé,saintmande-mongenot-3,STATION,94160,operational,0,station,ok,nonexistent,6,7,28
4990,47 boulevard de l'Hôpital,0,0,0,0,operational,Paris,paris-hopital-47,STATION,75013,operational,4,station,ok,nonexistent,3,21,13
4994,15 rue de Rocroy,0,0,0,0,operational,Paris,paris-rocroy-15,STATION,75010,operational,1,station,ok,nonexistent,1,20,49
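
The rule quoted from the data description would be broken by the opposite case: a record that reports charge slots while its charging status is not operational. The following is a minimal sketch of that check on toy records; the column names follow the raw dataset, while the frame `sample` and its values are illustrative assumptions.

```python
import pandas as pd

# Toy records; column names follow the raw dataset, values are illustrative.
sample = pd.DataFrame({
    "Charge Slots": [0, 1, 2, 1],
    "Charging Status": ["operational", "operational", "nonexistent", "broken"],
})

# Violations: charge slots reported although charging is not operational.
violations = sample[(sample["Charge Slots"] > 0)
                    & (sample["Charging Status"] != "operational")]
print(len(violations))  # 2
```

An empty result on the real dataframe would indicate that the data is consistent with the description on this point.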


### Completeness

As shown below, the data has no null values once the unnecessary columns have been dropped, so it is complete.

In [68]:
df.isnull().sum()

Address                0
Bluecar counter        0
Utilib counter         0
Utilib 1.4 counter     0
Charge Slots           0
Charging Status        0
City                   0
ID                     0
Kind                   0
Postal code            0
Rental status          0
Slots                  0
Station type           0
Status                 0
Subscription status    0
day                    0
hour                   0
minute                 0
dtype: int64

### Consistency
As shown below, there are no duplicate records.

In [61]:
df.duplicated().sum()

0

### Uniformity
The following code renames the columns so that they are lower-case, with the words of multi-word names separated by underscores.

In [159]:
df=df.rename(columns=str.lower)
df.head()

Unnamed: 0,address,bluecar counter,utilib counter,utilib 1.4 counter,charge slots,charging status,city,id,kind,postal code,rental status,slots,station type,status,subscription status,day,hour,minute
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,paris-suffren-2,STATION,75015,operational,2,station,ok,nonexistent,8,11,43
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,paris-raymondlosserand-145,STATION,75014,operational,0,station,ok,nonexistent,6,7,24
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,lebourget-johnfitzgeraldkennedy-2,STATION,93350,operational,1,station,ok,nonexistent,3,20,14
3,51 Rue Eugène Oudiné,3,1,0,1,operational,Paris,paris-eugeneoudine-51,STATION,75013,operational,2,station,ok,nonexistent,4,4,37
4,6 avenue de la Porte de Champerret,3,0,0,0,nonexistent,Paris,paris-portedechamperret-6,PARKING,75017,operational,3,station,ok,nonexistent,8,17,23


In [160]:
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
df.head(5)

Unnamed: 0,address,bluecar_counter,utilib_counter,utilib_1.4_counter,charge_slots,charging_status,city,id,kind,postal_code,rental_status,slots,station_type,status,subscription_status,day,hour,minute
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,paris-suffren-2,STATION,75015,operational,2,station,ok,nonexistent,8,11,43
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,paris-raymondlosserand-145,STATION,75014,operational,0,station,ok,nonexistent,6,7,24
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,lebourget-johnfitzgeraldkennedy-2,STATION,93350,operational,1,station,ok,nonexistent,3,20,14
3,51 Rue Eugène Oudiné,3,1,0,1,operational,Paris,paris-eugeneoudine-51,STATION,75013,operational,2,station,ok,nonexistent,4,4,37
4,6 avenue de la Porte de Champerret,3,0,0,0,nonexistent,Paris,paris-portedechamperret-6,PARKING,75017,operational,3,station,ok,nonexistent,8,17,23
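
The chained call in cell In [160] already lower-cases the names, so the renaming can be done in one step. The following is a minimal sketch that wraps it in a helper; the function name `normalise_columns` and the toy column list are assumptions made for illustration.

```python
import pandas as pd

def normalise_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with stripped, lower-cased, underscore-separated column names."""
    renamed = frame.copy()
    renamed.columns = (
        renamed.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    return renamed

# Empty toy frame: only the column names matter here.
raw = pd.DataFrame(columns=["Bluecar counter", "Utilib 1.4 counter", " Postal code"])
clean = normalise_columns(raw)
print(list(clean.columns))
# ['bluecar_counter', 'utilib_1.4_counter', 'postal_code']
```

Passing `regex=False` treats the space as a literal string rather than a pattern, and returning a copy leaves the original frame untouched.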


## Analysis

### Identify the most popular hour of the day for picking up a shared electric car (Bluecar) in the city of Paris over the month of April 2018.

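We assume that each counter reports the number of cars of that type available at the station at the recorded time. A record with a Bluecar counter of 0 is therefore treated as a sign that all Bluecars at that station had been picked up, and the hour with the most such records in Paris is taken as the most popular picking hour.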

In [151]:
df[(df['bluecar_counter'] == 0) & (df['city'] == 'Paris')]['hour'].value_counts().sort_values(ascending = False).head()

2     54
21    52
3     52
16    50
23    49
Name: hour, dtype: int64
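
Counting records with a counter of 0 is one proxy for pickups. An alternative, assuming the sample contains repeated snapshots of the same station, is to treat each drop in the counter between consecutive snapshots as cars picked up. The following is a minimal sketch of that idea on toy data; the function name `pickups_per_hour` and the toy values are assumptions, and this is not the method used in the cells above.

```python
import pandas as pd

def pickups_per_hour(frame: pd.DataFrame, counter: str = "bluecar_counter") -> pd.Series:
    """Sum the drops in `counter` between consecutive snapshots of each station, per hour."""
    ordered = frame.sort_values(["id", "day", "hour", "minute"])
    change = ordered.groupby("id")[counter].diff()
    # A negative change means cars left the station since the previous snapshot.
    picked = (-change).clip(lower=0).fillna(0)
    return picked.groupby(ordered["hour"]).sum().sort_values(ascending=False)

# Toy snapshots for two stations on one day (illustrative values).
toy = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "day": [1, 1, 1, 1, 1],
    "hour": [8, 9, 10, 8, 9],
    "minute": [0, 0, 0, 0, 0],
    "bluecar_counter": [5, 3, 4, 2, 0],
})
result = pickups_per_hour(toy)
print(result.idxmax(), result.max())  # 9 4.0
```

Each drop is attributed to the hour of the later snapshot. Summing the positive changes instead would estimate returns in the same way.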

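### What is the most popular hour for returning cars?

Again, we assume that a counter greater than 0 means cars have been returned to the station. The hour with the most records in which all three counters (Bluecar, Utilib, and Utilib 1.4) are greater than 0 is taken as the most popular return hour.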

In [166]:

df[(df['bluecar_counter'] > 0) & (df['utilib_counter'] > 0) & (df['utilib_1.4_counter'] > 0)]['hour'].value_counts().head(2)

14    5
1     5
Name: hour, dtype: int64

#### Most popular hour for picking up cars (all car types)

In [168]:

df[(df['bluecar_counter'] == 0) & (df['utilib_counter'] == 0) & (df['utilib_1.4_counter'] == 0)]['hour'].value_counts().head(2)

14    70
21    67
Name: hour, dtype: int64

### What station is the most popular?
- Overall?

In [176]:
df[(df['bluecar_counter'] == 0) & (df['utilib_counter'] == 0) & (df['utilib_1.4_counter'] == 0)&(df['kind']=='STATION')]['id'].value_counts().head(1)

guyancourt-philibertdelorme-9    8
Name: id, dtype: int64

### What station is the most popular?
- At the most popular picking hour?

In [177]:
df[(df['hour']==14)&(df['kind']=='STATION')]['id'].value_counts().head(1)

paris-aumale-28    3
Name: id, dtype: int64
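
The cell above counts every record at hour 14, whereas the overall question also required all three counters to be 0. The following is a minimal sketch that applies the same empty-counter condition at a given hour; the function name `empty_station_counts` and the toy data are assumptions made for illustration.

```python
import pandas as pd

def empty_station_counts(frame: pd.DataFrame, hour: int) -> pd.Series:
    """Count, per station id, the records at `hour` with no cars of any type."""
    empty = (
        (frame["bluecar_counter"] == 0)
        & (frame["utilib_counter"] == 0)
        & (frame["utilib_1.4_counter"] == 0)
    )
    at_hour = frame["hour"] == hour
    is_station = frame["kind"] == "STATION"
    return frame.loc[empty & at_hour & is_station, "id"].value_counts()

# Toy records (illustrative values).
toy = pd.DataFrame({
    "id": ["s1", "s1", "s2", "s2", "s3"],
    "kind": ["STATION"] * 5,
    "hour": [14, 14, 14, 9, 14],
    "bluecar_counter": [0, 0, 0, 0, 2],
    "utilib_counter": [0, 0, 0, 0, 0],
    "utilib_1.4_counter": [0, 0, 0, 0, 0],
})
counts = empty_station_counts(toy, hour=14)
print(counts.index[0], counts.iloc[0])  # s1 2
```

Naming each condition separately keeps the filter readable and makes it easy to reuse for another hour or station kind.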

### What postal code is the most popular for picking up Blue cars?
Does the most popular station belong to that postal code?

In [180]:
df[(df['bluecar_counter'] == 0)]['postal_code'].value_counts().head(1)

75008    94
Name: postal_code, dtype: int64

In [181]:
#to get the list of stations with the most popular postal code
df[df['postal_code']==75008]['id'].unique()

array(['paris-sainthonore-161', 'paris-courcelles-69',
       'paris-haussmann-61', 'paris-saintaugustin-9',
       'paris-concorde-parking', 'paris-rome-73', 'paris-berri-10',
       'paris-malesherbes-7', 'paris-monceau-43',
       'paris-constantinople-4', 'paris-chateaubriand-19',
       'paris-sainthonore-422', 'paris-hoche-31', 'paris-tronchet-19',
       'paris-courcelles-40', 'paris-batignolles-9', 'paris-liege-24',
       'paris-matignon-2', 'paris-georgev-44', 'paris-astorg-11',
       'paris-messine-12', 'paris-francois1er-7', 'paris-rome-46',
       'paris-madeleinetronchet-parking', 'paris-friedland-42',
       'paris-francois1er-38', 'paris-sainthonore-123', 'paris-artois-11',
       'paris-pierrecharron-62', 'paris-sainthonore-91', 'paris-dutuit-1',
       'paris-malesherbes-113', 'paris-laboetie-37'], dtype=object)

In [179]:
df[(df['id']=='guyancourt-philibertdelorme-9')& (df['postal_code']==75008)]

Unnamed: 0,address,bluecar_counter,utilib_counter,utilib_1.4_counter,charge_slots,charging_status,city,id,kind,postal_code,rental_status,slots,station_type,status,subscription_status,day,hour,minute
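
The empty result above shows that the most popular station has no records in postal code 75008. The same membership check can be written without hard-coding the station or the postal code. The following is a minimal sketch on toy data; the station ids and counts are illustrative assumptions.

```python
import pandas as pd

# Toy records: one station outside Paris, two inside 75008 (illustrative).
toy = pd.DataFrame({
    "id": ["st-a", "st-a", "st-a", "st-b", "st-b", "st-c", "st-c"],
    "postal_code": [78280, 78280, 78280, 75008, 75008, 75008, 75008],
    "bluecar_counter": [0, 0, 0, 0, 0, 0, 0],
})

empty = toy[toy["bluecar_counter"] == 0]
top_station = empty["id"].value_counts().idxmax()
top_postal_code = empty["postal_code"].value_counts().idxmax()

# Postal codes recorded for the top station.
station_codes = set(toy.loc[toy["id"] == top_station, "postal_code"])
print(top_station, top_postal_code, top_postal_code in station_codes)
# st-a 75008 False
```

A single station can top the ranking while several smaller stations together make a different postal code the most popular, which is the situation in this toy example.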


The most popular postal code for picking up a Bluecar at the most popular picking hour (14):

In [182]:
df[(df['bluecar_counter'] == 0)&(df['hour']==14)]['postal_code'].value_counts().head(1)

75009    6
Name: postal_code, dtype: int64

In [183]:
df[df['postal_code']==75009]['id'].unique()

array(['paris-anvers-parking', 'paris-londres-6', 'paris-rochechouart-31',
       'paris-juleslefebvre-1', 'paris-lafayette-56',
       'paris-adolphemax-6', 'paris-victormasse-35', 'paris-turgot-4',
       'paris-victoire-76', 'paris-provence-35', 'paris-rochambeau-6',
       'paris-aumale-28', 'paris-conservatoire-13', 'paris-trudaine-26',
       'paris-chateaudun-21', 'paris-faubourgmontmartre-31',
       'paris-haussmann-6', 'paris-clichy-23', 'paris-lafayette-7',
       'paris-milton-12', 'paris-madeleine-8', 'paris-chausseedantin-5',
       'paris-paulescudier-2'], dtype=object)

#### Checking on Utilib

In [174]:
df[(df['utilib_counter'] == 0)&(df['hour']==14)]['postal_code'].value_counts().head(1)

75015    14
Name: postal_code, dtype: int64

In [172]:
df[(df['id']=='guyancourt-philibertdelorme-9')& (df['postal_code']==75015)]

Unnamed: 0,address,bluecar_counter,utilib_counter,utilib_1.4_counter,charge_slots,charging_status,city,id,kind,postal_code,rental_status,slots,station_type,status,subscription_status,day,hour,minute


#### Checking on Utilib 1.4

In [175]:
df[(df['utilib_1.4_counter'] == 0)&(df['hour']==14)]['postal_code'].value_counts().head(1)

75015    14
Name: postal_code, dtype: int64

In [173]:
df[(df['id']=='guyancourt-philibertdelorme-9')& (df['postal_code']==75015)]

Unnamed: 0,address,bluecar_counter,utilib_counter,utilib_1.4_counter,charge_slots,charging_status,city,id,kind,postal_code,rental_status,slots,station_type,status,subscription_status,day,hour,minute


In [131]:
df[(df['id']=='guyancourt-philibertdelorme-9')]['postal_code'].value_counts()

78280    8
Name: postal_code, dtype: int64

#### Further summary

In [132]:
df.head()

Unnamed: 0,address,bluecar_counter,utilib_counter,utilib_1.4_counter,charge_slots,charging_status,city,id,kind,postal_code,rental_status,slots,station_type,status,subscription_status,day,hour,minute
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,paris-suffren-2,STATION,75015,operational,2,station,ok,nonexistent,8,11,43
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,paris-raymondlosserand-145,STATION,75014,operational,0,station,ok,nonexistent,6,7,24
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,lebourget-johnfitzgeraldkennedy-2,STATION,93350,operational,1,station,ok,nonexistent,3,20,14
3,51 Rue Eugène Oudiné,3,1,0,1,operational,Paris,paris-eugeneoudine-51,STATION,75013,operational,2,station,ok,nonexistent,4,4,37
4,6 avenue de la Porte de Champerret,3,0,0,0,nonexistent,Paris,paris-portedechamperret-6,PARKING,75017,operational,3,station,ok,nonexistent,8,17,23


To get the charging status counts for the records of kind STATION:

In [146]:
df[df['kind']=='STATION']['charging_status'].value_counts()

nonexistent    2415
operational    1863
broken          110
Name: charging_status, dtype: int64

In [150]:
df['station_type'].value_counts()

station         4615
full_station     382
subs_center        3
Name: station_type, dtype: int64

In [145]:
df.groupby(['charge_slots'])['slots'].sum()

charge_slots
0    5851
1    2485
2    1326
Name: slots, dtype: int64

### Recommendations
The findings above give a clearer picture of usage patterns in the electric car sharing service, specifically which stations and postal codes are most popular and at which hours of a typical day demand peaks.
