### Prepping Data Challenge: Travel Plans (Week 4)

The final introductory challenge for 2022 looks at how our students are getting to and from school. Are the students travelling in a sustainable manner? What's the most popular type of sustainable travel?

### Input
There are two inputs this week.

1. The same input as week 1.

2. Travel choices where each student has filled in how they got to school in the previous week. The students entered these themselves so there are some spelling mistakes to watch out for. 

### Requirements
 - Input the data sets
 - Join the data sets together based on their common field
 - Remove any fields you don't need for the challenge
 - Change the weekdays from separate columns to one column of weekdays and one of the pupil's travel choice
 - Group the travel choices together to remove spelling mistakes
 - Create a Sustainable (non-motorised) vs Non-Sustainable (motorised) data field 
   - Scooters are the child type rather than the motorised type
 - Total up the number of pupils travelling by each method of travel
 - Work out the % of trips taken by each method of travel each day
   - Round to 2 decimal places
 - Output the data

In [1]:
import pandas as pd
import numpy as np
import re

In [2]:
# Input the data sets.
df = pd.read_csv('wk4-Input.csv')
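
The cells here read only the travel choices file, so the join step from the requirements is not shown. Below is a minimal sketch of how it might look. The week 1 column names (`id`, `pupil first name`) and both toy frames are assumptions for illustration, not the real inputs.

```python
import pandas as pd

# Hypothetical stand-in for the week 1 input; the real file and columns are not shown in this notebook.
students = pd.DataFrame({"id": [1, 2, 3], "pupil first name": ["Ann", "Ben", "Cat"]})

# Toy version of the travel choices input.
travel = pd.DataFrame(
    {"Student ID": [1, 2, 3], "M": ["Car", "Walk", "Bicycle"], "Tu": ["Car", "Walk", "Carr"]}
)

# Inner join on the student identifier, which is assumed to be named differently in each table.
joined = travel.merge(students, left_on="Student ID", right_on="id", how="inner")

# Remove the fields the rest of the challenge does not need.
joined = joined.drop(columns=["id", "pupil first name"])
print(joined)
```

Since none of the week 1 fields are used in the output, reading the travel choices alone gives the same result as long as every student appears in both files.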

In [3]:
df.head()

Unnamed: 0,Student ID,M,Tu,W,Th,F
0,1,Car,Car,Car,Car,Bycycle
1,2,Bicycle,Bicycle,Bicycle,Walk,Walk
2,3,Car,Bicycle,Carr,Walk,Car
3,4,Scooter,Scooter,Scootr,Scooter,Scoter
4,5,Bycycle,Carr,Scoter,Walkk,Scoter


In [4]:
# Change the weekdays from separate columns to one column of weekdays and one of the pupil's travel choice
# (the travel choices are grouped in a later cell, once the spelling variants have been listed)
df_pivot = pd.melt(df, id_vars=['Student ID'], value_vars=['M','Tu','W','Th','F'], var_name='Weekday', value_name='Method of Travel')
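
To see what `pd.melt` does here, the following sketch reshapes a two-student, two-day toy frame (made-up values) from wide to long.

```python
import pandas as pd

# Toy wide table: one row per student, one column per weekday.
wide = pd.DataFrame({"Student ID": [1, 2], "M": ["Car", "Walk"], "Tu": ["Car", "Bicycle"]})

# One row per student per weekday; the old column names become the 'Weekday' values.
long = pd.melt(
    wide,
    id_vars=["Student ID"],
    value_vars=["M", "Tu"],
    var_name="Weekday",
    value_name="Method of Travel",
)
print(long)
```

The result has `len(wide) * len(value_vars)` rows, stacked one weekday at a time, which is why the first rows of `df_pivot` are all Monday.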

In [5]:
df_pivot.head()

Unnamed: 0,Student ID,Weekday,Method of Travel
0,1,M,Car
1,2,M,Bicycle
2,3,M,Car
3,4,M,Scooter
4,5,M,Bycycle


In [6]:
# List the distinct travel choices to spot the spelling mistakes
df_pivot['Method of Travel'].unique()

array(['Car', 'Bicycle', 'Scooter', 'Bycycle', 'Walk', 'Aeroplane',
       'Helicopter', 'Van', "Mum's Shoulders", 'Hopped', 'Carr', 'Walkk',
       "Dad's Shoulders", 'Skipped', 'Scootr', 'Scoter', 'Wallk',
       'Jumped', 'Helicopeter', 'WAlk', 'Waalk'], dtype=object)

In [7]:
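# Group the travel choices together to remove spelling mistakes.
# Each correct spelling maps to a regex matching every variant that starts the same way
# ('^Sc.*' rather than '^S.*' so that 'Skipped' is left alone).
spellcheck = {'Car':'^C.*','Bicycle':'^B.*','Walk':'^W.*','Scooter':'^Sc.*','Helicopter':'^He.*'}

for correct, pattern in spellcheck.items():
    df_pivot['Method of Travel'] = df_pivot['Method of Travel'].replace(to_replace = pattern, value = correct, regex = True)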
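
The same prefix-based clean-up can also be done in one call by passing a `{pattern: replacement}` dictionary to `replace`. This is a sketch on a toy Series, not the cell used above; the sample values are taken from the list of unique spellings.

```python
import pandas as pd

# A few of the spellings seen in the data.
travel = pd.Series(["Car", "Carr", "Bycycle", "WAlk", "Waalk", "Scoter", "Skipped", "Helicopeter"])

# Regex pattern -> corrected value. Anything that matches no pattern is left unchanged.
patterns = {
    r"^C.*": "Car",
    r"^B.*": "Bicycle",
    r"^W.*": "Walk",
    r"^Sc.*": "Scooter",
    r"^He.*": "Helicopter",
}

cleaned = travel.replace(regex=patterns)
print(cleaned.tolist())
```

Prefix matching is only safe because no two valid travel methods in this data share one of these prefixes. A new value such as "Coach" would be silently turned into "Car", so it is worth re-checking `unique()` after the replacement.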

In [8]:
df_pivot['Method of Travel'].unique()

array(['Car', 'Bicycle', 'Scooter', 'Walk', 'Aeroplane', 'Helicopter',
       'Van', "Mum's Shoulders", 'Hopped', "Dad's Shoulders", 'Skipped',
       'Jumped'], dtype=object)

In [9]:
# Create a Sustainable (non-motorised) vs Non-Sustainable (motorised) data field
# Every non-motorised method counts as sustainable, including 'Hopped' and 'Skipped'
Sus = ['Scooter','Bicycle','Walk',"Mum's Shoulders","Dad's Shoulders",'Hopped','Skipped','Jumped']

df_pivot["Sustainable?"] = np.where((df_pivot['Method of Travel'].isin(Sus)), 'Sustainable',"Non-Sustainable")
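
As a small illustration of the `isin` plus `np.where` pattern, here is a sketch on made-up values. The short `sustainable` list is for the example only.

```python
import numpy as np
import pandas as pd

methods = pd.Series(["Car", "Walk", "Scooter", "Van", "Bicycle"])

# Illustrative subset of non-motorised methods.
sustainable = ["Scooter", "Bicycle", "Walk"]

# isin() gives a boolean mask; np.where picks the first label where True, else the second.
label = np.where(methods.isin(sustainable), "Sustainable", "Non-Sustainable")
print(label.tolist())
```

Anything missing from the list falls into "Non-Sustainable" by default, so an uncorrected misspelling of a sustainable method would be misclassified. This is one reason to fix the spellings first.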

In [10]:
# Total up the number of pupils travelling by each method of travel on each day
df_pivot['Number of Trips'] = df_pivot.groupby(['Weekday', 'Method of Travel'])['Student ID'].transform('size')
# Total trips per day, used as the denominator
df_pivot['Trips per day'] = df_pivot.groupby(['Weekday'])['Student ID'].transform('size')
# Share of each day's trips taken by each method, as a fraction (0.42 = 42%) rounded to 2 decimal places
df_pivot["% of trips per day"] = (df_pivot['Number of Trips']/ df_pivot['Trips per day']).round(2)
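
`transform('size')` returns one value per original row rather than one per group, which is what lets the counts sit alongside the trip-level data. A toy sketch with four made-up Monday trips:

```python
import pandas as pd

trips = pd.DataFrame({
    "Student ID": [1, 2, 3, 4],
    "Weekday": ["M", "M", "M", "M"],
    "Method of Travel": ["Car", "Car", "Walk", "Car"],
})

# Group size broadcast back to every row of the group.
trips["Number of Trips"] = trips.groupby(["Weekday", "Method of Travel"])["Student ID"].transform("size")
trips["Trips per day"] = trips.groupby("Weekday")["Student ID"].transform("size")

# Fraction of the day's trips made by each method.
trips["% of trips per day"] = (trips["Number of Trips"] / trips["Trips per day"]).round(2)
print(trips)
```

Each Car row carries 3 trips out of 4 (0.75) and the Walk row 1 out of 4 (0.25). The repeated rows are collapsed in the next step.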

In [11]:
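# Collapse to one row per travel method and weekday; the Student ID count is dropped again in the output
df_pivot2 = df_pivot.groupby(["Sustainable?",'Method of Travel','Weekday','Number of Trips','Trips per day',"% of trips per day"])['Student ID'].count().reset_index()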
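
Because the counts have already been broadcast to every row, grouping on all six fields mainly serves to remove duplicate rows. One possible alternative, sketched here on a toy frame with made-up values, is `drop_duplicates` followed by a sort.

```python
import pandas as pd

# Toy trip-level rows after the counts and shares have been added.
trips = pd.DataFrame({
    "Sustainable?": ["Sustainable", "Non-Sustainable", "Non-Sustainable"],
    "Method of Travel": ["Walk", "Car", "Car"],
    "Weekday": ["M", "M", "M"],
    "Number of Trips": [1, 2, 2],
    "Trips per day": [3, 3, 3],
    "% of trips per day": [0.33, 0.67, 0.67],
})

# One row per distinct combination, ordered the way groupby would order the keys.
summary = (
    trips.drop_duplicates()
    .sort_values(["Sustainable?", "Method of Travel", "Weekday"])
    .reset_index(drop=True)
)
print(summary)
```

This avoids creating and then discarding a `Student ID` count column, at the cost of an explicit sort.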

In [12]:
df_pivot2.head()

Unnamed: 0,Sustainable?,Method of Travel,Weekday,Number of Trips,Trips per day,% of trips per day,Student ID
0,Non-Sustainable,Aeroplane,F,9,1000,0.01,9
1,Non-Sustainable,Aeroplane,M,9,1000,0.01,9
2,Non-Sustainable,Aeroplane,Th,9,1000,0.01,9
3,Non-Sustainable,Aeroplane,Tu,9,1000,0.01,9
4,Non-Sustainable,Aeroplane,W,9,1000,0.01,9


In [13]:
output = df_pivot2[["Sustainable?",'Method of Travel','Weekday','Number of Trips','Trips per day',"% of trips per day"]]

In [14]:
output.head(10)

Unnamed: 0,Sustainable?,Method of Travel,Weekday,Number of Trips,Trips per day,% of trips per day
0,Non-Sustainable,Aeroplane,F,9,1000,0.01
1,Non-Sustainable,Aeroplane,M,9,1000,0.01
2,Non-Sustainable,Aeroplane,Th,9,1000,0.01
3,Non-Sustainable,Aeroplane,Tu,9,1000,0.01
4,Non-Sustainable,Aeroplane,W,9,1000,0.01
5,Non-Sustainable,Car,F,254,1000,0.25
6,Non-Sustainable,Car,M,422,1000,0.42
7,Non-Sustainable,Car,Th,302,1000,0.3
8,Non-Sustainable,Car,Tu,364,1000,0.36
9,Non-Sustainable,Car,W,413,1000,0.41


In [15]:
#Output the data
output.to_csv('wk4-output.csv',index=False)