# Project Scope 

### Project Scope and Plan

In July 2019, the Department for Transport released data on the type and number of journeys made within
the United Kingdom each year. The annual number of car journeys climbed consistently from 2015 to
2018, when it reached a nine-year high of 986 trips per household per year [1]. A noteworthy 75% of
UK residents aged 17 and above held a driving licence, 76% of households owned at least one car,
and 77% of the total distance travelled each year was by car.

The growing congestion reflected in these statistics prompted a need to consider alternative
modes of transport. In response, the Mayor of London and the London Assembly introduced
the Mayor's Transport Strategy in 2018, which focuses on three primary objectives:

- Promoting Healthy Streets & Healthy People
- Enhancing the Public Transport Experience
- Developing New Homes and Jobs

### Objectives

#### How can we increase the uptake of cycling in London?
- Expanding cycling infrastructure so that residents live within 400m of the cycling network will increase the number of journeys completed by bike
- Separating bike lanes from main roads, cars and large vehicles will make cyclists feel safer and thus increase the number of journeys completed by bike
- Maintaining a sustainable supply and distribution of safe and affordable hire bikes in London will increase the number of journeys completed by bike

#### What are the main factors that determine whether people choose to cycle?
- More journeys are completed by bike in dry weather than in rain
- More journeys are completed by bike in the summer months than in the winter months
- The time of day has an impact on the number of journeys completed by bike
- More journeys are completed by bike in central London than outer London as journeys are typically shorter

#### What are the demographics of cyclists in these cities, and are there any underrepresented groups that can be engaged with to increase the uptake of cycling as a mode of transport?
- The majority of journeys completed by bike in London are completed within commuting hours
- Residents of ‘deprived’ areas of London complete fewer journeys by bike than those in ‘wealthy’ areas

#### What interventions and changes to the transport network have had the most impact on cycling engagement?

## 1. Prepare the Workstation

In [1]:
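# Install the libraries used in this notebook
# (openpyxl is required by pd.read_excel for .xlsx files; conda is not installed via pip and is not used here)
!pip install pandas matplotlib seaborn openpyxl
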

In [2]:
# Imports 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Load the data
central_london = pd.read_csv('Central London.csv')
inner_london = pd.read_csv('Inner London.csv')
outer_london = pd.read_csv('Outer London.csv')
biking_sites = pd.read_excel('Biking sites.xlsx')



In [4]:
# view the data
biking_sites.head()

Unnamed: 0,UnqID,ProgID,SurveyDescription,Easting,Northing,Location,Borough,Functional cycling area
0,CENCY001,CENCY,Central area cycle surveys,530251.49,178742.45,Millbank (south of Thorney Street),Westminster,Central
1,CENCY002,CENCY,Central area cycle surveys,533362.68,181824.45,Bishopsgate,City of London,Central
2,CENCY003,CENCY,Central area cycle surveys,532334.06,180520.37,Southwark Bridge,Southwark,Central
3,CENCY004,CENCY,Central area cycle surveys,532052.5,179677.64,Southwark Bridge Road,Southwark,Central
4,CENCY005,CENCY,Central area cycle surveys,533031.59,180213.46,Tooley Street,Southwark,Central


In [5]:
# View the data 
central_london.head()

Unnamed: 0,Survey wave (calendar quarter),Equivalent financial quarter,Site ID,Location,Survey date,Weather,Time,Period,Direction,Start hour,Start minute,Number of private cycles,Number of cycle hire bikes,Total cycles,Unnamed: 14,Unnamed: 15,Unnamed: 16
0,2014 Q1 (January-March),2013-14 Q4,CENCY001,Millbank (south of Thorney Street),"ven, 24/01/14",Dry,0600 - 0615,Early Morning (06:00-07:00),Northbound,6.0,0.0,0.0,0.0,0.0,,,
1,2014 Q1 (January-March),2013-14 Q4,CENCY001,Millbank (south of Thorney Street),"ven, 24/01/14",Dry,0615 - 0630,Early Morning (06:00-07:00),Northbound,6.0,15.0,15.0,0.0,15.0,,,
2,2014 Q1 (January-March),2013-14 Q4,CENCY001,Millbank (south of Thorney Street),"ven, 24/01/14",Dry,0630 - 0645,Early Morning (06:00-07:00),Northbound,6.0,30.0,35.0,0.0,35.0,,,
3,2014 Q1 (January-March),2013-14 Q4,CENCY001,Millbank (south of Thorney Street),"ven, 24/01/14",Dry,0645 - 0700,Early Morning (06:00-07:00),Northbound,6.0,45.0,59.0,2.0,61.0,,,
4,2014 Q1 (January-March),2013-14 Q4,CENCY001,Millbank (south of Thorney Street),"ven, 24/01/14",Dry,0700 - 0715,AM peak (07:00-10:00),Northbound,7.0,0.0,73.0,0.0,73.0,,,


In [6]:
# view the data
inner_london.head()

Unnamed: 0,Survey wave (year),Site ID,Location,Survey date,Weather,Time,Period,Direction,Start hour,Start minute,Number of private cycles,Number of cycle hire bikes,Total cycles
0,2015.0,INNCY001,Grove Road,"mer, 20/05/15",Dry,0600 - 0615,Early Morning (06:00-07:00),Northbound,6.0,0.0,1.0,0.0,1.0
1,2015.0,INNCY001,Grove Road,"mer, 20/05/15",Dry,0615 - 0630,Early Morning (06:00-07:00),Northbound,6.0,15.0,2.0,0.0,2.0
2,2015.0,INNCY001,Grove Road,"mer, 20/05/15",Dry,0630 - 0645,Early Morning (06:00-07:00),Northbound,6.0,30.0,2.0,0.0,2.0
3,2015.0,INNCY001,Grove Road,"mer, 20/05/15",Dry,0645 - 0700,Early Morning (06:00-07:00),Northbound,6.0,45.0,4.0,0.0,4.0
4,2015.0,INNCY001,Grove Road,"mer, 20/05/15",Dry,0700 - 0715,AM peak (07:00-10:00),Northbound,7.0,0.0,4.0,0.0,4.0


In [7]:
# view the data
outer_london.head()

Unnamed: 0,Survey wave (year),Site ID,Location,Survey date,Weather,Time,Period,Direction,Start hour,Start minute,Number of male cycles,Number of female cycles,Number of unknown cycles,Total cycles
0,2015,OUTCY001,High Road Leyton,"ven, 26/06/15",Dry,0600 - 0615,Early Morning (06:00-07:00),Northbound,6,0,2,1,0,3
1,2015,OUTCY001,High Road Leyton,"ven, 26/06/15",Dry,0615 - 0630,Early Morning (06:00-07:00),Northbound,6,15,3,0,0,3
2,2015,OUTCY001,High Road Leyton,"ven, 26/06/15",Dry,0630 - 0645,Early Morning (06:00-07:00),Northbound,6,30,2,0,0,2
3,2015,OUTCY001,High Road Leyton,"ven, 26/06/15",Dry,0645 - 0700,Early Morning (06:00-07:00),Northbound,6,45,4,0,0,4
4,2015,OUTCY001,High Road Leyton,"ven, 26/06/15",Dry,0700 - 0715,AM peak (07:00-10:00),Northbound,7,0,4,1,0,5
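
The `Survey date` values in the previews above carry a weekday abbreviation before the date (for example `ven, 24/01/14`), so they cannot be used directly to group counts by month or season. The following is a minimal sketch of one way to parse them, assuming every non-missing value has the form `<weekday>, dd/mm/yy`; the `sample` frame and the `Survey date parsed` and `Month` column names are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy frame mimicking the 'Survey date' format seen in the previews (not the real data)
sample = pd.DataFrame({"Survey date": ["ven, 24/01/14", "mer, 20/05/15", None]})

# Keep only the part after the weekday prefix, e.g. "24/01/14"
date_text = sample["Survey date"].str.split(", ").str[-1]

# Parse as day/month/two-digit year; anything that does not match becomes NaT
sample["Survey date parsed"] = pd.to_datetime(
    date_text, format="%d/%m/%y", errors="coerce"
)

# A month column is a convenient starting point for seasonal comparisons
sample["Month"] = sample["Survey date parsed"].dt.month

print(sample)
```

Using `errors="coerce"` means unexpected formats surface as missing dates rather than raising, so it is worth counting the `NaT` values afterwards to see how many rows did not fit the assumed pattern.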


## 2. Data Cleaning & Exploration

##### Biking Sites

In [8]:
# Determine the metadata of the data sets
print(biking_sites.shape)
print(biking_sites.columns)

(2023, 8)
Index(['UnqID', 'ProgID', 'SurveyDescription', 'Easting', 'Northing',
       'Location', 'Borough', 'Functional cycling area'],
      dtype='object')


In [16]:
biking_sites.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2023 entries, 0 to 2022
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   UnqID                    2023 non-null   object 
 1   ProgID                   2023 non-null   object 
 2   SurveyDescription        2023 non-null   object 
 3   Easting                  2023 non-null   float64
 4   Northing                 2023 non-null   float64
 5   Location                 2023 non-null   object 
 6   Borough                  2023 non-null   object 
 7   Functional cycling area  2021 non-null   object 
dtypes: float64(2), object(6)
memory usage: 126.6+ KB


In [9]:
# Missing Values for biking sites
missing_values_biking_sites = biking_sites.isnull().sum()

# Displaying the columns with missing values and their count
missing_values_biking_sites

UnqID                      0
ProgID                     0
SurveyDescription          0
Easting                    0
Northing                   0
Location                   0
Borough                    0
Functional cycling area    2
dtype: int64
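
Only two sites lack a `Functional cycling area`, so it is practical to look at those rows directly before deciding how to handle them. The sketch below shows the filtering pattern on a small stand-in frame; the `sites` data is invented for illustration, and only the column names come from the output above.

```python
import pandas as pd

# Toy stand-in for biking_sites; the rows are invented for illustration
sites = pd.DataFrame({
    "UnqID": ["SITE001", "SITE002", "SITE003"],
    "Borough": ["Westminster", "Southwark", "Camden"],
    "Functional cycling area": ["Central", None, "Inner"],
})

# Boolean mask selecting the rows where the area label is missing
missing_area = sites[sites["Functional cycling area"].isna()]

# Show the identifying columns so the gaps can be checked against other sources
print(missing_area[["UnqID", "Borough"]])
```

Applied to the real data, the same mask on `biking_sites` would list the two affected sites, whose boroughs may be enough to assign an area by hand.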

##### Outer London

In [10]:
# Determine the metadata of the data sets
print(outer_london.shape)
print(outer_london.columns)

(375660, 14)
Index(['Survey wave (year)', 'Site ID', 'Location', 'Survey date', 'Weather',
       'Time', 'Period', 'Direction', 'Start hour', 'Start minute',
       'Number of male cycles', 'Number of female cycles',
       'Number of unknown cycles', 'Total cycles'],
      dtype='object')


In [17]:
outer_london.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 375660 entries, 0 to 375659
Data columns (total 14 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   Survey wave (year)        375660 non-null  int64 
 1   Site ID                   375660 non-null  object
 2   Location                  375660 non-null  object
 3   Survey date               374492 non-null  object
 4   Weather                   374692 non-null  object
 5   Time                      375660 non-null  object
 6   Period                    375660 non-null  object
 7   Direction                 375660 non-null  object
 8   Start hour                375660 non-null  int64 
 9   Start minute              375660 non-null  int64 
 10  Number of male cycles     375660 non-null  int64 
 11  Number of female cycles   375660 non-null  int64 
 12  Number of unknown cycles  375660 non-null  int64 
 13  Total cycles              375660 non-null  int64 
dtypes: int64(7), object(7)

In [11]:
# Missing values for outer london
missing_values_outer_london = outer_london.isnull().sum()

# Display the columns with missing values and their count
missing_values_outer_london

Survey wave (year)             0
Site ID                        0
Location                       0
Survey date                 1168
Weather                      968
Time                           0
Period                         0
Direction                      0
Start hour                     0
Start minute                   0
Number of male cycles          0
Number of female cycles        0
Number of unknown cycles       0
Total cycles                   0
dtype: int64
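
Raw counts are easier to judge as a share of the data set: 1,168 and 968 missing values out of 375,660 rows are each roughly 0.3%. The helper below is a sketch of how the count and percentage could be reported together; `missing_summary` and the `toy` frame are hypothetical names introduced here, not part of the original notebook.

```python
import pandas as pd


def missing_summary(df):
    """Return the count and percentage of missing values for columns that have any."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that actually contain missing values
    return summary[summary["missing"] > 0]


# Toy frame for illustration only
toy = pd.DataFrame({
    "Survey date": ["a", None, "c", "d"],
    "Weather": ["Dry", "Wet", None, None],
    "Total cycles": [1, 2, 3, 4],
})

result = missing_summary(toy)
print(result)
```

The same function could be called on each of the four data frames, which avoids repeating the `isnull().sum()` cell for every data set.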

##### Inner London

In [12]:
# Determine the metadata of the data sets
print(inner_london.shape)
print(inner_london.columns)

(615168, 13)
Index(['Survey wave (year)', 'Site ID', 'Location', 'Survey date', 'Weather',
       'Time', 'Period', 'Direction', 'Start hour', 'Start minute',
       'Number of private cycles', 'Number of cycle hire bikes',
       'Total cycles'],
      dtype='object')


In [18]:
inner_london.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 615168 entries, 0 to 615167
Data columns (total 13 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   Survey wave (year)          523776 non-null  float64
 1   Site ID                     523776 non-null  object 
 2   Location                    523776 non-null  object 
 3   Survey date                 521024 non-null  object 
 4   Weather                     519102 non-null  object 
 5   Time                        523770 non-null  object 
 6   Period                      523770 non-null  object 
 7   Direction                   523776 non-null  object 
 8   Start hour                  523770 non-null  float64
 9   Start minute                523770 non-null  float64
 10  Number of private cycles    523776 non-null  float64
 11  Number of cycle hire bikes  523776 non-null  float64
 12  Total cycles                523776 non-null  float64
dtypes: float64(6), object(7)

In [13]:
# missing values for inner london
missing_values_inner_london = inner_london.isnull().sum()

# Display the columns with missing values and their count
missing_values_inner_london

Survey wave (year)            91392
Site ID                       91392
Location                      91392
Survey date                   94144
Weather                       96066
Time                          91398
Period                        91398
Direction                     91392
Start hour                    91398
Start minute                  91398
Number of private cycles      91392
Number of cycle hire bikes    91392
Total cycles                  91392
dtype: int64
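
The same count, 91,392, is missing from most columns, including `Site ID` and `Total cycles`. That pattern suggests, although it is not verified here, that those rows are entirely empty rather than partially recorded. The sketch below shows how this could be checked and how fully empty rows could be dropped; the `toy` frame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: rows 1 and 3 are completely empty, row 2 is only missing 'Weather'
toy = pd.DataFrame({
    "Site ID": ["INNCY001", np.nan, "INNCY002", np.nan],
    "Weather": ["Dry", np.nan, np.nan, np.nan],
    "Total cycles": [1.0, np.nan, 4.0, np.nan],
})

# True for rows where every column is missing
all_empty = toy.isnull().all(axis=1)
print("Fully empty rows:", all_empty.sum())

# Drop only the fully empty rows; partially filled rows are kept
cleaned = toy.dropna(how="all").reset_index(drop=True)
print(cleaned)
```

If the count of fully empty rows in `inner_london` matches 91,392, dropping them with `how="all"` would leave only the smaller gaps in `Survey date`, `Weather` and the time columns to deal with.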

##### Central London

In [14]:
# Determine the metadata of the data sets
print(central_london.shape)
print(central_london.columns)

(1048366, 17)
Index(['Survey wave (calendar quarter)', 'Equivalent financial quarter',
       'Site ID', 'Location', 'Survey date', 'Weather', 'Time', 'Period',
       'Direction', 'Start hour', 'Start minute', 'Number of private cycles',
       'Number of cycle hire bikes', 'Total cycles', 'Unnamed: 14',
       'Unnamed: 15', 'Unnamed: 16'],
      dtype='object')


In [19]:
central_london.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048366 entries, 0 to 1048365
Data columns (total 17 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   Survey wave (calendar quarter)  758163 non-null  object 
 1   Equivalent financial quarter    758163 non-null  object 
 2   Site ID                         758163 non-null  object 
 3   Location                        758163 non-null  object 
 4   Survey date                     748007 non-null  object 
 5   Weather                         746329 non-null  object 
 6   Time                            758163 non-null  object 
 7   Period                          758163 non-null  object 
 8   Direction                       758163 non-null  object 
 9   Start hour                      758163 non-null  float64
 10  Start minute                    758163 non-null  float64
 11  Number of private cycles        758099 non-null  float64
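 12  Number of cycle hire bikes      758099 non-null  float64
 13  Total cycles                    758163 non-null  float64
 14  Unnamed: 14                     0 non-null       float64
 15  Unnamed: 15                     0 non-null       float64
 16  Unnamed: 16                     0 non-null       float64
dtypes: float64(8), object(9)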

In [15]:
# missing values for central london
missing_values_central_london = central_london.isnull().sum()

# Display the columns with missing values and their count
missing_values_central_london

Survey wave (calendar quarter)     290203
Equivalent financial quarter       290203
Site ID                            290203
Location                           290203
Survey date                        300359
Weather                            302037
Time                               290203
Period                             290203
Direction                          290203
Start hour                         290203
Start minute                       290203
Number of private cycles           290267
Number of cycle hire bikes         290267
Total cycles                       290203
Unnamed: 14                       1048366
Unnamed: 15                       1048366
Unnamed: 16                       1048366
dtype: int64
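
The three `Unnamed` columns are missing in all 1,048,366 rows, so they carry no information. A minimal sketch of how such columns could be identified and removed is shown below; the `toy` frame is invented for illustration, and whether to drop the columns in the real data is a choice for the analysis rather than something this notebook has already done.

```python
import numpy as np
import pandas as pd

# Toy frame with one column that is empty in every row
toy = pd.DataFrame({
    "Site ID": ["CENCY001", "CENCY002", "CENCY003"],
    "Total cycles": [0.0, 15.0, np.nan],
    "Unnamed: 14": [np.nan, np.nan, np.nan],
})

# List the columns in which every value is missing
empty_cols = [col for col in toy.columns if toy[col].isnull().all()]
print("Entirely empty columns:", empty_cols)

# Drop columns that are missing in every row; other columns are untouched
trimmed = toy.dropna(axis=1, how="all")
print(trimmed.columns.tolist())
```

Printing `empty_cols` first keeps the step transparent: the columns being removed are named explicitly before anything is dropped.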