# Overview
In this week's independent project, you will work as a data scientist for an electric car-sharing service company. You have been tasked with processing station data to understand electric car usage over time by answering the following research question:

**Research Question**
* Identify the most popular hour of the day for picking up a shared electric car (Bluecar) in the city of Paris over the month of April 2018.

**Bonus Questions (Optional)**
* What is the most popular hour for returning cars?
* What station is the most popular?
    * Overall?
    * At the most popular picking hour?

* What postal code is the most popular for picking up Bluecars? Does the most popular station belong to that postal code?
    * Overall?
    * At the most popular picking hour?

* Do the results change if you consider Utilib and Utilib 1.4 instead of Bluecars?

## 1.0 Importing libraries to be used

In [571]:
# Importing pandas
import pandas as pd
# Importing numpy
import numpy as np

## 2.0 Loading the dataset

In [572]:
# Importing the autolib_dataset into the environment
df = pd.read_csv('/content/Autolib_dataset .csv')

### 2.1 Previewing the dataset

In [573]:
# Previewing the dataframe (the notebook display is truncated to the first and last rows)
df

Unnamed: 0,Address,Cars,Bluecar counter,Utilib counter,Utilib 1.4 counter,Charge Slots,Charging Status,City,Displayed comment,ID,Kind,Geo point,Postal code,Public name,Rental status,Scheduled at,Slots,Station type,Status,Subscription status,year,month,day,hour,minute
0,2 Avenue de Suffren,0,0,0,0,0,nonexistent,Paris,,paris-suffren-2,STATION,"48.857, 2.2917",75015,Paris/Suffren/2,operational,,2,station,ok,nonexistent,2018,4,8,11,43
1,145 Rue Raymond Losserand,6,6,0,0,0,operational,Paris,,paris-raymondlosserand-145,STATION,"48.83126, 2.313088",75014,Paris/Raymond Losserand/145,operational,,0,station,ok,nonexistent,2018,4,6,7,24
2,2 Avenue John Fitzgerald Kennedy,3,3,0,2,0,operational,Le Bourget,,lebourget-johnfitzgeraldkennedy-2,STATION,"48.938103, 2.4286035",93350,Le Bourget/John Fitzgerald Kennedy/2,operational,,1,station,ok,nonexistent,2018,4,3,20,14
3,51 Rue EugÃ¨ne OudinÃ©,3,3,1,0,1,operational,Paris,,paris-eugeneoudine-51,STATION,"48.8250327, 2.3725162",75013,Paris/EugÃ¨ne OudinÃ©/51,operational,,2,station,ok,nonexistent,2018,4,4,4,37
4,6 avenue de la Porte de Champerret,3,3,0,0,0,nonexistent,Paris,,paris-portedechamperret-6,PARKING,"48.8862632, 2.2874511",75017,Paris/Porte de Champerret/6,operational,,3,station,ok,nonexistent,2018,4,8,17,23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,8 avenue MÃ©nelotte,2,2,0,0,0,nonexistent,Colombes,,colombes-menelotte-8,STATION,"48.9246525, 2.259313",92700,Colombes/MÃ©nelotte/8,operational,,3,station,ok,nonexistent,2018,4,6,11,26
4996,37 rue de Dantzig,4,4,0,0,1,operational,Paris,,paris-dantzig-37,STATION,"48.8335103, 2.2987201",75015,Paris/Dantzig/37,operational,,2,station,ok,nonexistent,2018,4,4,16,56
4997,142 rue du Bac,1,1,0,0,1,operational,Paris,,paris-bac-142,STATION,"48.8508194, 2.3237968",75007,Paris/Bac/142,operational,,4,station,ok,nonexistent,2018,4,1,7,1
4998,2 avenue du Val de Fontenay,2,2,0,0,0,nonexistent,Fontenay-Sous-Bois,,fontenaysousbois-valdefontenay-2,STATION,"48.8528247, 2.4869085",94120,Fontenay-Sous-Bois/Val de Fontenay/2,operational,,3,station,ok,nonexistent,2018,4,4,17,27


In [574]:
# Inspecting the distinct values of 'Scheduled at'
df['Scheduled at'].unique()

array([nan, '2016-07-27T15:32:21', '2016-07-05T13:10:09',
       '2015-09-29T13:58:29', '2016-04-26T14:28:01',
       '2012-09-17T08:57:28', '2016-01-06T10:35:38',
       '2018-01-03T10:52:00', '2016-07-27T15:32:22'], dtype=object)

In [575]:
# Inspecting the distinct values of 'Status'
df['Status'].unique()

array(['ok', 'closed', 'scheduled'], dtype=object)

## 3.0 Verifying data integrity

In [576]:
# Checking for null values in the dataset
df.isnull().any()

Address                False
Cars                   False
Bluecar counter        False
Utilib counter         False
Utilib 1.4 counter     False
Charge Slots           False
Charging Status        False
City                   False
Displayed comment       True
ID                     False
Kind                   False
Geo point              False
Postal code            False
Public name            False
Rental status          False
Scheduled at            True
Slots                  False
Station type           False
Status                 False
Subscription status    False
year                   False
month                  False
day                    False
hour                   False
minute                 False
dtype: bool

In [577]:
# checking for the total number of null values per column in the dataset
df.isnull().sum()

Address                   0
Cars                      0
Bluecar counter           0
Utilib counter            0
Utilib 1.4 counter        0
Charge Slots              0
Charging Status           0
City                      0
Displayed comment      4889
ID                        0
Kind                      0
Geo point                 0
Postal code               0
Public name               0
Rental status             0
Scheduled at           4953
Slots                     0
Station type              0
Status                    0
Subscription status       0
year                      0
month                     0
day                       0
hour                      0
minute                    0
dtype: int64
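Raw null counts are easier to judge as a share of all rows. A minimal sketch, assuming only pandas, that reports the percentage of missing values per column; the helper name `null_share` and the toy frame are hypothetical:

```python
import pandas as pd

def null_share(frame):
    """Return the percentage of missing values per column, largest first."""
    return (frame.isnull().mean() * 100).round(1).sort_values(ascending=False)

# Hypothetical toy frame with two sparsely filled columns
toy = pd.DataFrame({
    "Address": ["a", "b", "c", "d"],
    "Displayed comment": [None, None, "x", "y"],
    "Scheduled at": [None, None, None, "2016-07-27T15:32:21"],
})
print(null_share(toy))
```

Calling `null_share(df)` on the notebook's frame would show how much of `Displayed comment` and `Scheduled at` is missing before those two columns are dropped later in the cleaning.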

In [578]:
# Checking for duplicates
df.duplicated().any()

False

In [579]:
# Checking which days of the month the data covers
days = list(df['day'].unique())
sorted(days)

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [580]:
# Checking whether any year other than 2018 is present
df['year'].unique() 

array([2018])

In [581]:
# Checking whether any month other than April is present
df['month'].unique()

array([4])

In [582]:
# Checking the hours at which the data was collected
hours = list(df['hour'].unique())
sorted(hours)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]

## 4.0 Data Cleaning

### 4.1 Validity

In [583]:
# Helper for previewing the first `number` rows of the current dataframe
def preview(number):
    return df.head(number)

In [584]:
# Procedure 1:
# Data Cleaning Action: Dropping irrelevant columns
# Explanation: The Cars column is redundant because it
# contains the same values as Bluecar counter
#
df.drop('Cars', axis=1, inplace=True)
df.drop(['year', 'month'], axis=1, inplace=True)  # all records are from April 2018
df.drop('minute', axis=1, inplace=True)  # only the day and hour are needed
# previewing the dataframe
preview(3)

Unnamed: 0,Address,Bluecar counter,Utilib counter,Utilib 1.4 counter,Charge Slots,Charging Status,City,Displayed comment,ID,Kind,Geo point,Postal code,Public name,Rental status,Scheduled at,Slots,Station type,Status,Subscription status,day,hour
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,,paris-suffren-2,STATION,"48.857, 2.2917",75015,Paris/Suffren/2,operational,,2,station,ok,nonexistent,8,11
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,,paris-raymondlosserand-145,STATION,"48.83126, 2.313088",75014,Paris/Raymond Losserand/145,operational,,0,station,ok,nonexistent,6,7
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,,lebourget-johnfitzgeraldkennedy-2,STATION,"48.938103, 2.4286035",93350,Le Bourget/John Fitzgerald Kennedy/2,operational,,1,station,ok,nonexistent,3,20
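Procedure 1 rests on the claim that `Cars` duplicates `Bluecar counter`. A minimal sketch of how that claim could be checked before the column is dropped, shown on a hypothetical toy frame; the helper name `columns_identical` is illustrative:

```python
import pandas as pd

def columns_identical(frame, left, right):
    """Return True if two columns of the same frame match in every row."""
    return bool((frame[left] == frame[right]).all())

# Hypothetical toy frame standing in for the Autolib data
toy = pd.DataFrame({
    "Cars": [0, 6, 3],
    "Bluecar counter": [0, 6, 3],
    "Utilib counter": [0, 0, 1],
})

# Only drop the column once the redundancy has been confirmed
if columns_identical(toy, "Cars", "Bluecar counter"):
    toy = toy.drop(columns="Cars")
print(list(toy.columns))  # ['Bluecar counter', 'Utilib counter']
```

On the notebook's data the check would be `columns_identical(df, 'Cars', 'Bluecar counter')`, run before the drop.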


In [585]:
# Procedure 2:
# Data Cleaning Action: Stripping leading and trailing spaces from the column names
# Explanation: Stray spaces make column selection error-prone
#
df.columns = df.columns.str.strip()
# previewing the dataframe
preview(3)

Unnamed: 0,Address,Bluecar counter,Utilib counter,Utilib 1.4 counter,Charge Slots,Charging Status,City,Displayed comment,ID,Kind,Geo point,Postal code,Public name,Rental status,Scheduled at,Slots,Station type,Status,Subscription status,day,hour
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,,paris-suffren-2,STATION,"48.857, 2.2917",75015,Paris/Suffren/2,operational,,2,station,ok,nonexistent,8,11
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,,paris-raymondlosserand-145,STATION,"48.83126, 2.313088",75014,Paris/Raymond Losserand/145,operational,,0,station,ok,nonexistent,6,7
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,,lebourget-johnfitzgeraldkennedy-2,STATION,"48.938103, 2.4286035",93350,Le Bourget/John Fitzgerald Kennedy/2,operational,,1,station,ok,nonexistent,3,20


In [586]:
# Procedure 3:
# Data Cleaning Action: Replacing ' ' with '_' in the column names
# Explanation: Makes columns easier to select
#
df.columns = df.columns.str.replace(' ', '_', regex=False)
# previewing the dataframe
preview(3)

Unnamed: 0,Address,Bluecar_counter,Utilib_counter,Utilib_1.4_counter,Charge_Slots,Charging_Status,City,Displayed_comment,ID,Kind,Geo_point,Postal_code,Public_name,Rental_status,Scheduled_at,Slots,Station_type,Status,Subscription_status,day,hour
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,,paris-suffren-2,STATION,"48.857, 2.2917",75015,Paris/Suffren/2,operational,,2,station,ok,nonexistent,8,11
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,,paris-raymondlosserand-145,STATION,"48.83126, 2.313088",75014,Paris/Raymond Losserand/145,operational,,0,station,ok,nonexistent,6,7
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,,lebourget-johnfitzgeraldkennedy-2,STATION,"48.938103, 2.4286035",93350,Le Bourget/John Fitzgerald Kennedy/2,operational,,1,station,ok,nonexistent,3,20
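Procedures 2 and 3 can be chained into one step. A minimal sketch, assuming only pandas; `clean_column_names` and the toy column names are hypothetical:

```python
import pandas as pd

def clean_column_names(frame):
    """Strip surrounding whitespace and replace inner spaces with underscores."""
    out = frame.copy()
    out.columns = out.columns.str.strip().str.replace(" ", "_", regex=False)
    return out

# Hypothetical toy frame with messy column names
toy = pd.DataFrame(columns=[" Bluecar counter", "Utilib 1.4 counter ", "Postal code"])
print(list(clean_column_names(toy).columns))
# ['Bluecar_counter', 'Utilib_1.4_counter', 'Postal_code']
```

Stripping first matters: replacing spaces before stripping would turn a trailing space into a trailing underscore.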


### 4.2 Accuracy

In [587]:
# The dataset has no calculated fields, so this step
# is skipped

### 4.3 Completeness

#### 4.3.1 Checking for null values

In [588]:
# Checking whether there are any null values in the dataset
df.isnull().any()

Address                False
Bluecar_counter        False
Utilib_counter         False
Utilib_1.4_counter     False
Charge_Slots           False
Charging_Status        False
City                   False
Displayed_comment       True
ID                     False
Kind                   False
Geo_point              False
Postal_code            False
Public_name            False
Rental_status          False
Scheduled_at            True
Slots                  False
Station_type           False
Status                 False
Subscription_status    False
day                    False
hour                   False
dtype: bool

In [589]:
# checking for the sum of the null values in every column
df.isnull().sum()

Address                   0
Bluecar_counter           0
Utilib_counter            0
Utilib_1.4_counter        0
Charge_Slots              0
Charging_Status           0
City                      0
Displayed_comment      4889
ID                        0
Kind                      0
Geo_point                 0
Postal_code               0
Public_name               0
Rental_status             0
Scheduled_at           4953
Slots                     0
Station_type              0
Status                    0
Subscription_status       0
day                       0
hour                      0
dtype: int64

#### 4.3.2 Dealing with null values

In [590]:
# Filtering out stations whose rental status is 'future' (not yet open)
df = df[df['Rental_status'] != 'future'].reset_index(drop=True)

In [591]:
# Filtering out records whose Kind is 'CENTER'
df = df[df['Kind'] != 'CENTER'].reset_index(drop=True)

In [592]:
# Keeping only stations that are operational
df = df[df['Rental_status'] == 'operational'].reset_index(drop=True)
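The row filters above can be folded into a single boolean mask, with `reset_index(drop=True)` renumbering the rows without keeping the old index as a column. Because the last filter keeps only `'operational'` rows, the separate `'future'` filter is already implied by it. A sketch under the assumption that only `Rental_status` and `Kind` drive the filtering; `keep_operational_stations` and the toy frame are hypothetical:

```python
import pandas as pd

def keep_operational_stations(frame):
    """Keep operational, non-'CENTER' rows and renumber the index from 0."""
    mask = (frame["Rental_status"] == "operational") & (frame["Kind"] != "CENTER")
    return frame[mask].reset_index(drop=True)

# Hypothetical toy frame: one CENTER row and one 'future' row should be removed
toy = pd.DataFrame({
    "Kind": ["STATION", "CENTER", "STATION", "PARKING"],
    "Rental_status": ["operational", "operational", "future", "operational"],
})
result = keep_operational_stations(toy)
print(result)
```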

In [593]:
# checking for any record where the rental status is future
df[(df['Rental_status'] == 'future')]

Unnamed: 0,Address,Bluecar_counter,Utilib_counter,Utilib_1.4_counter,Charge_Slots,Charging_Status,City,Displayed_comment,ID,Kind,Geo_point,Postal_code,Public_name,Rental_status,Scheduled_at,Slots,Station_type,Status,Subscription_status,day,hour


In [594]:
# checking for any record where there is 'CENTER' in the kind column
df[df['Kind'] == 'CENTER']

Unnamed: 0,Address,Bluecar_counter,Utilib_counter,Utilib_1.4_counter,Charge_Slots,Charging_Status,City,Displayed_comment,ID,Kind,Geo_point,Postal_code,Public_name,Rental_status,Scheduled_at,Slots,Station_type,Status,Subscription_status,day,hour


In [595]:
# checking for any stations that are not operational
df[~(df['Rental_status'] == 'operational')]

Unnamed: 0,Address,Bluecar_counter,Utilib_counter,Utilib_1.4_counter,Charge_Slots,Charging_Status,City,Displayed_comment,ID,Kind,Geo_point,Postal_code,Public_name,Rental_status,Scheduled_at,Slots,Station_type,Status,Subscription_status,day,hour


In [596]:
# checking for unique values in the status column
df['Status'].unique()

array(['ok'], dtype=object)

In [597]:
# Dropping 'Displayed_comment' and 'Scheduled_at', which are mostly null,
# and 'Rental_status' and 'Status', which now each hold a single value
df.drop(['Displayed_comment', 'Scheduled_at', 'Rental_status', 'Status'], axis=1, inplace=True)
# Previewing the dataset
preview(3)

Unnamed: 0,Address,Bluecar_counter,Utilib_counter,Utilib_1.4_counter,Charge_Slots,Charging_Status,City,ID,Kind,Geo_point,Postal_code,Public_name,Slots,Station_type,Subscription_status,day,hour
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,paris-suffren-2,STATION,"48.857, 2.2917",75015,Paris/Suffren/2,2,station,nonexistent,8,11
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,paris-raymondlosserand-145,STATION,"48.83126, 2.313088",75014,Paris/Raymond Losserand/145,0,station,nonexistent,6,7
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,lebourget-johnfitzgeraldkennedy-2,STATION,"48.938103, 2.4286035",93350,Le Bourget/John Fitzgerald Kennedy/2,1,station,nonexistent,3,20


### 4.4 Consistency

#### 4.4.1 Checking for duplicates

In [598]:
# Checking whether the dataset has any duplicated records
df.duplicated().any()

True

In [599]:
# Checking the sum of all duplicated records
df.duplicated().sum()

26

#### 4.4.2 Dealing with duplicates

In [600]:
# Dropping the duplicates
df = df.drop_duplicates()

In [601]:
# Checking whether any duplicates remain in the new dataframe
df.duplicated().any()

False
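The raw data had no duplicates, so these 26 rows became identical only after columns such as `minute` were dropped; they may be separate snapshots taken within the same hour rather than true repeats. A toy sketch of that effect (the frame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical snapshots of one station taken a few minutes apart
snapshots = pd.DataFrame({
    "Public_name": ["Station A", "Station A"],
    "Bluecar_counter": [1, 1],
    "day": [1, 1],
    "hour": [7, 7],
    "minute": [1, 40],
})

print(snapshots.duplicated().sum())                         # 0: minute differs
print(snapshots.drop(columns="minute").duplicated().sum())  # 1: rows now identical
```

Whether such rows should be dropped depends on whether each snapshot is meant to count once in the hourly means.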

### 4.5 Uniformity

In [602]:
# Checking for uniformity
preview(15)

Unnamed: 0,Address,Bluecar_counter,Utilib_counter,Utilib_1.4_counter,Charge_Slots,Charging_Status,City,ID,Kind,Geo_point,Postal_code,Public_name,Slots,Station_type,Subscription_status,day,hour
0,2 Avenue de Suffren,0,0,0,0,nonexistent,Paris,paris-suffren-2,STATION,"48.857, 2.2917",75015,Paris/Suffren/2,2,station,nonexistent,8,11
1,145 Rue Raymond Losserand,6,0,0,0,operational,Paris,paris-raymondlosserand-145,STATION,"48.83126, 2.313088",75014,Paris/Raymond Losserand/145,0,station,nonexistent,6,7
2,2 Avenue John Fitzgerald Kennedy,3,0,2,0,operational,Le Bourget,lebourget-johnfitzgeraldkennedy-2,STATION,"48.938103, 2.4286035",93350,Le Bourget/John Fitzgerald Kennedy/2,1,station,nonexistent,3,20
3,51 Rue EugÃ¨ne OudinÃ©,3,1,0,1,operational,Paris,paris-eugeneoudine-51,STATION,"48.8250327, 2.3725162",75013,Paris/EugÃ¨ne OudinÃ©/51,2,station,nonexistent,4,4
4,6 avenue de la Porte de Champerret,3,0,0,0,nonexistent,Paris,paris-portedechamperret-6,PARKING,"48.8862632, 2.2874511",75017,Paris/Porte de Champerret/6,3,station,nonexistent,8,17
5,8 Boulevard Voltaire,0,0,0,0,nonexistent,Paris,paris-voltaire-8,STATION,"48.8657658, 2.3664376",75011,Paris/Voltaire/8,4,station,nonexistent,6,7
6,17 Rue des Luaps ProlongÃ©e,3,1,0,0,nonexistent,Nanterre,nanterre-luaps-17,STATION,"48.88069, 2.21063",92000,Nanterre/Luaps/17,0,station,nonexistent,4,22
7,34 avenue Jean Moulin,1,0,0,0,nonexistent,Paris,paris-jeanmoulin-34,STATION,"48.8266807, 2.3237355",75014,Paris/Jean Moulin/34,4,station,nonexistent,2,22
8,41 boulevard de Rochechouart,6,0,0,0,nonexistent,Paris,paris-anvers-parking,PARKING,"48.88267, 2.34405",75009,Paris/Anvers/Parking,0,station,nonexistent,4,15
9,14 rue Censier,0,0,0,2,operational,Paris,paris-censier-14,STATION,"48.8411067, 2.3544235",75005,Paris/Censier/14,6,station,nonexistent,1,4


In [603]:
# The columns were judged uniform enough for this analysis, so no action is taken
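The previews show values such as `EugÃ¨ne OudinÃ©`, which look like UTF-8 text decoded as Latin-1. Assuming that mis-encoding is in the data itself rather than only in this export, a sketch like the following could repair the affected text columns; `fix_mojibake` is a hypothetical helper, and it leaves any value it cannot round-trip unchanged:

```python
import pandas as pd

def fix_mojibake(text):
    """Re-decode text whose UTF-8 bytes were read as Latin-1.

    Non-strings, and strings that do not round-trip cleanly, are returned unchanged.
    """
    if not isinstance(text, str):
        return text
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

addresses = pd.Series(["51 Rue EugÃ¨ne OudinÃ©", "8 avenue MÃ©nelotte", "142 rue du Bac"])
print(addresses.map(fix_mojibake).tolist())
# ['51 Rue Eugène Oudiné', '8 avenue Ménelotte', '142 rue du Bac']
```

It would be applied per text column, for example to `Address` and `Public_name`, with `Series.map`.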

## 5.0 Data analysis


### 5.1 Identify the most popular hour of the day for picking up a shared electric car (Bluecar) in the city of Paris over the month of April 2018.

In [605]:
# Creating a Paris-only dataframe, sorted by day
paris = df[df['City'] == 'Paris']
paris = paris.sort_values(by='day', ascending=True).reset_index(drop=True)
paris

Unnamed: 0,Address,Bluecar_counter,Utilib_counter,Utilib_1.4_counter,Charge_Slots,Charging_Status,City,ID,Kind,Geo_point,Postal_code,Public_name,Slots,Station_type,Subscription_status,day,hour
0,73 Rue de Rome,0,0,0,1,operational,Paris,paris-rome-73,STATION,"48.881834, 2.32056",75008,Paris/Rome/73,4,station,nonexistent,1,13
1,29 rue du Cotentin,0,0,0,1,operational,Paris,paris-cotentin-29,STATION,"48.8385459, 2.3126749",75015,Paris/Cotentin/29,4,station,nonexistent,1,19
2,9 boulevard Bourdon,1,0,0,2,operational,Paris,paris-bourdon-9,STATION,"48.8484226, 2.366004",75004,Paris/Bourdon/9,4,station,nonexistent,1,14
3,12 Avenue de Messine,0,0,1,1,operational,Paris,paris-messine-12,STATION,"48.87593, 2.314194",75008,Paris/Messine/12,4,station,nonexistent,1,16
4,143 boulevard Vincent Auriol,0,0,0,0,operational,Paris,paris-vincentauriol-143,SPACE,"48.8328237, 2.3624101",75013,Paris/Vincent Auriol/143,1,full_station,operational,1,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2609,41 avenue Bosquet,4,0,0,0,operational,Paris,paris-bosquet-41,STATION,"48.8580816, 2.3040495",75007,Paris/Bosquet/41,1,station,nonexistent,9,21
2610,47 rue de la Grange aux Belles,0,0,0,1,operational,Paris,paris-grangeauxbelles-47,STATION,"48.87569, 2.36801",75010,Paris/Grange aux Belles/47,3,station,nonexistent,9,18
2611,4 Rue de Montfaucon,0,0,0,0,nonexistent,Paris,paris-montfaucon-4,STATION,"48.85273, 2.335816",75006,Paris/Montfaucon/4,4,station,nonexistent,9,0
2612,15 avenue du GÃ©nÃ©ral Laperrine,0,0,0,0,operational,Paris,paris-generallaperrine-15,STATION,"48.8345778, 2.4076458",75012,Paris/Laperrine/15,5,station,nonexistent,9,8


In [606]:
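# Determining the most popular hour for picking up a car.
# Proxy: the hour with the lowest mean number of parked Bluecars
# is taken to be the hour when the most cars have been picked up.
group = paris['Bluecar_counter'].groupby(paris['hour'])
group.mean().sort_values(ascending=True).head(1)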

hour
20    1.475248
Name: Bluecar_counter, dtype: float64
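Reporting only the first row hides how close the runner-up hours are. A sketch using `Series.nsmallest` on hypothetical toy counts to list the leading few hours; on the notebook's data the equivalent call would presumably be `group.mean().nsmallest(3)`:

```python
import pandas as pd

# Hypothetical snapshot counts for three evening hours
toy = pd.DataFrame({
    "hour": [19, 19, 20, 20, 21, 21],
    "Bluecar_counter": [2, 1, 1, 1, 3, 3],
})

hourly = toy.groupby("hour")["Bluecar_counter"].mean()
# Lowest mean counts first, i.e. the strongest pick-up candidates under the proxy
print(hourly.nsmallest(3))
```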

### 5.2 What is the most popular hour for returning cars?

In [607]:
# Determining the most popular hour for returning a car.
# Proxy: the hour with the highest mean number of parked Bluecars.
group = paris['Bluecar_counter'].groupby(paris['hour'])
group.mean().sort_values(ascending=False).head(1)

hour
12    2.232143
Name: Bluecar_counter, dtype: float64
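Hourly means are only as reliable as the number of snapshots behind them. A sketch on hypothetical toy data that reports the count next to each mean, so that an hour backed by very few records is easy to spot:

```python
import pandas as pd

# Hypothetical toy data: hour 3 has a single snapshot
toy = pd.DataFrame({
    "hour": [12, 12, 12, 3],
    "Bluecar_counter": [2, 3, 1, 6],
})

summary = (
    toy.groupby("hour")["Bluecar_counter"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Here hour 3 ranks first on the strength of one snapshot, which is the situation the `count` column is meant to expose.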

### 5.3 What station is the most popular?


#### 5.3.1 Overall?

In [609]:
# Determining the most popular station for picking up a car
# (all cities; the lowest mean Bluecar count is the pick-up proxy)
group = df['Bluecar_counter'].groupby(df['Public_name'])
group.mean().sort_values(ascending=True).head(1)

Public_name
Paris/Maine/4    0.0
Name: Bluecar_counter, dtype: float64
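Car counts cannot be negative, so a mean of 0.0 is the floor and is likely shared by other stations; sorting and taking `head(1)` then reports just one of them. A sketch that lists every tied label instead; `all_minimum_labels` and the station values are hypothetical, and on the notebook's data the call would be `all_minimum_labels(group.mean())`:

```python
import pandas as pd

def all_minimum_labels(series):
    """Return every index label whose value equals the series minimum."""
    return list(series[series == series.min()].index)

# Hypothetical per-station mean counts with a tie at 0.0
station_means = pd.Series({"Station A": 0.0, "Station B": 0.0, "Station C": 1.5})
print(all_minimum_labels(station_means))  # ['Station A', 'Station B']
```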

#### 5.3.2 At the most popular picking hour?

In [610]:
# Note: this recomputes the most popular picking hour across all cities
# (not only Paris); it does not identify a station at that hour
group = df['Bluecar_counter'].groupby(df['hour'])
group.mean().sort_values(ascending=True).head(1)

hour
19    2.058511
Name: Bluecar_counter, dtype: float64
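The cell above recomputes the most popular hour over every city (19, versus 20 for Paris alone) rather than naming a station. A sketch of what the question asks: restrict to one hour, then rank stations by the same lowest-mean proxy. `top_station_at_hour` is a hypothetical name; on the notebook's data the call would presumably be `top_station_at_hour(paris, 20)`:

```python
import pandas as pd

def top_station_at_hour(frame, hour, column="Bluecar_counter"):
    """Station with the lowest mean count (pick-up proxy) among snapshots at `hour`."""
    at_hour = frame[frame["hour"] == hour]
    return at_hour.groupby("Public_name")[column].mean().idxmin()

# Hypothetical toy data: Station A looks empty at hour 9, but not at hour 20
toy = pd.DataFrame({
    "Public_name": ["Station A", "Station A", "Station B", "Station B"],
    "hour": [20, 9, 20, 20],
    "Bluecar_counter": [4, 0, 1, 2],
})
print(top_station_at_hour(toy, hour=20))  # Station B
```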

### 5.4 What postal code is the most popular for picking up Blue cars? Does the most popular station belong to that postal code?

In [611]:
# Determining the most popular Postal_code for picking up a car
group = df['Bluecar_counter'].groupby(df['Postal_code'])
group.mean().sort_values(ascending=True).head(1)

Postal_code
75001    0.659574
Name: Bluecar_counter, dtype: float64

In [612]:
# Checking whether the most popular station (Paris/Maine/4) lies in that postal code
df2 = df[(df['Public_name'] == 'Paris/Maine/4')]
df2['Postal_code']

885    75015
Name: Postal_code, dtype: int64
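The output above shows that `Paris/Maine/4` carries postal code 75015 while the lowest-mean postal code is 75001, so under this proxy the most popular station does not belong to the most popular postal code. A sketch of the same check as a reusable function; `station_in_postal_code` and the toy frame are hypothetical:

```python
import pandas as pd

def station_in_postal_code(frame, station, postal_code):
    """True if any snapshot of `station` carries `postal_code`."""
    codes = frame.loc[frame["Public_name"] == station, "Postal_code"]
    return bool((codes == postal_code).any())

# Hypothetical toy frame
toy = pd.DataFrame({
    "Public_name": ["Station A", "Station B"],
    "Postal_code": [75015, 75001],
})
print(station_in_postal_code(toy, "Station A", 75001))  # False
```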

## 6.0 Data Analysis 2

**Functions**

In [613]:
# Function for determining the most popular hour for picking up
# a car (the group with the lowest mean count)
def pop_hour():
    column = input("Which column do you want to select: ")
    by = input("Input the column to group by: ")
    grouped = df[column].groupby(df[by])
    return grouped.mean().sort_values(ascending=True).head(1)

In [614]:
# Function for determining the most popular station
# for picking up a car (the group with the lowest mean count)
def pop_station():
    column = input("Which column do you want to select: ")
    by = input("Input the column to group by: ")
    grouped = df[column].groupby(df[by])
    return grouped.mean().sort_values(ascending=True).head(1)

In [615]:
# Function for determining the most popular hour
# for returning a car (the group with the highest mean count)
def pop_return_hour():
    column = input("Which column do you want to select: ")
    by = input("Input the column to group by: ")
    grouped = df[column].groupby(df[by])
    return grouped.mean().sort_values(ascending=False).head(1)

In [616]:
# Function for determining the most popular postal code
# for picking up a car (the group with the lowest mean count)
def pop_postal_code():
    column = input("Which column do you want to select: ")
    by = input("Input the column to group by: ")
    grouped = df[column].groupby(df[by])
    return grouped.mean().sort_values(ascending=True).head(1)
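The four functions above differ only in sort direction, and reading arguments through `input()` makes reruns depend on what is typed. A sketch of one parameterised helper that could stand in for all four; `most_popular` and its `pickup` flag are hypothetical names. `idxmin`/`idxmax` return the same label as sorting and taking the first row whenever there is no tie:

```python
import pandas as pd

def most_popular(frame, column, by, pickup=True):
    """Group `column` by `by` and return the winning label.

    pickup=True returns the group with the lowest mean count (pick-up proxy);
    pickup=False returns the group with the highest mean count (return proxy).
    """
    means = frame.groupby(by)[column].mean()
    return means.idxmin() if pickup else means.idxmax()

# Hypothetical toy frame
toy = pd.DataFrame({
    "hour": [8, 8, 23, 23],
    "Utilib_counter": [1, 0, 0, 0],
})
print(most_popular(toy, "Utilib_counter", "hour", pickup=True))   # 23
print(most_popular(toy, "Utilib_counter", "hour", pickup=False))  # 8
```

The interactive calls in this section would then become, for example, `most_popular(df, "Utilib_counter", "hour", pickup=False)`.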

### Using Utilib_counter


### 6.1 What is the most popular hour for returning cars?

In [619]:
pop_return_hour()

Which column do you want to select: Utilib_counter
Input the column to group by: hour


hour
8    0.097938
Name: Utilib_counter, dtype: float64

### 6.2 What station is the most popular?

In [621]:
pop_station()

Which column do you want to select: Utilib_counter
Input the column to group by: Public_name


Public_name
Alfortville/Charles de Gaulle/16    0.0
Name: Utilib_counter, dtype: float64

### 6.2.1 The most popular picking hour

In [623]:
pop_hour()

Which column do you want to select: Utilib_counter
Input the column to group by: hour


hour
23    0.010638
Name: Utilib_counter, dtype: float64

### 6.3 What postal code is the most popular for picking up Utilib cars? Does the most popular station belong to that postal code?

In [624]:
pop_postal_code()

Which column do you want to select: Utilib_counter
Input the column to group by: Postal_code


Postal_code
92340    0.0
Name: Utilib_counter, dtype: float64

In [625]:
# Checking whether the most popular Utilib station lies in that postal code
df2 = df[(df['Public_name'] == 'Alfortville/Charles de Gaulle/16')]
df2['Postal_code']

1703    94140
2474    94140
2976    94140
4092    94140
4604    94140
Name: Postal_code, dtype: int64
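The runs above use only `Utilib_counter`, while the bonus question also asks about Utilib 1.4. A sketch that puts the three car types side by side so the answers can be compared directly; `compare_car_types` is hypothetical and applies the same mean-count proxy used throughout. Columns that are almost always 0 will tie at 0.0, so their pick-up label should be read with care:

```python
import pandas as pd

CAR_COLUMNS = ["Bluecar_counter", "Utilib_counter", "Utilib_1.4_counter"]

def compare_car_types(frame, by="hour", columns=CAR_COLUMNS):
    """Per car-type column, report the pick-up group (lowest mean count)
    and the return group (highest mean count)."""
    rows = {}
    for column in columns:
        means = frame.groupby(by)[column].mean()
        rows[column] = {"pickup": means.idxmin(), "return": means.idxmax()}
    return pd.DataFrame.from_dict(rows, orient="index")

# Hypothetical toy frame with two hours
toy = pd.DataFrame({
    "hour": [8, 8, 20, 20],
    "Bluecar_counter": [3, 3, 1, 1],
    "Utilib_counter": [0, 0, 1, 1],
    "Utilib_1.4_counter": [1, 0, 2, 2],
})
print(compare_car_types(toy))
```

Passing `by="Public_name"` or `by="Postal_code"` gives the station and postal-code versions of the same comparison.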