# Preppin' Data Week 4:
### The Prep School - Travel Plans

Note: I omitted some steps from this week's instructions because they were unnecessary to produce the final output.

### Import pandas

In [50]:
import pandas as pd

### Import data

In [51]:
df = pd.read_csv('PD 2021 WK 4 Input.csv')
df

Unnamed: 0,Student ID,M,Tu,W,Th,F
0,1,Car,Car,Car,Car,Bycycle
1,2,Bicycle,Bicycle,Bicycle,Walk,Walk
2,3,Car,Bicycle,Carr,Walk,Car
3,4,Scooter,Scooter,Scootr,Scooter,Scoter
4,5,Bycycle,Carr,Scoter,Walkk,Scoter
...,...,...,...,...,...,...
995,996,Walk,Walk,Wallk,Walk,WAlk
996,997,Car,Walk,Bicycle,Walk,Walk
997,998,Bicycle,Bicycle,Bicycle,Bicycle,Bicycle
998,999,Bicycle,Bicycle,Bicycle,Walk,Walk


### Reshape the data so the separate weekday columns become one Weekday column and one column holding each pupil's method of travel

In [52]:
df = df.melt(id_vars=['Student ID'], value_vars=['M', 'Tu', 'W', 'Th', 'F'], var_name='Weekday', value_name='Method of Travel')
df.head()

Unnamed: 0,Student ID,Weekday,Method of Travel
0,1,M,Car
1,2,M,Bicycle
2,3,M,Car
3,4,M,Scooter
4,5,M,Bycycle


### Clean the methods of travel to remove spelling mistakes

1. Find out what all of the spelling errors are using .value_counts()
    - you can also use .unique(). I like the .value_counts() output better

In [53]:
df['Method of Travel'].value_counts()

Car                1586
Walk               1035
Bicycle             710
Scooter             252
Scoter              252
Walkk               176
Carr                169
Van                 162
Bycycle             162
Wallk                84
WAlk                 84
Scootr               84
Helicopter           72
Mum's Shoulders      46
Aeroplane            45
Waalk                24
Dad's Shoulders      24
Helicopeter          24
Jumped                3
Hopped                3
Skipped               3
Name: Method of Travel, dtype: int64

2. Change the necessary values using DataFrame.replace()
    - you can also use a combination of .str.lower() and .replace() to eliminate the capitalization error in 'WAlk'. As there was only one error of this type, I replaced it manually.

In [54]:
df = df.replace('Scoter', 'Scooter')
df = df.replace('Walkk', 'Walk')
df = df.replace('Carr', 'Car')
df = df.replace('Wallk', 'Walk')
df = df.replace('WAlk', 'Walk')
df = df.replace('Bycycle', 'Bicycle')
df = df.replace('Scootr', 'Scooter')
df = df.replace('Waalk', 'Walk')
df = df.replace('Helicopeter', 'Helicopter')
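
The nine separate calls above could also be collapsed into a single dictionary passed to `.replace()`. This is only a sketch on a small made-up `sample` frame (not the real input), with the corrections copied from the `.value_counts()` output; the names `sample`, `corrections`, and `cleaned` are my own for illustration.

```python
import pandas as pd

# Small made-up sample standing in for the melted dataframe
sample = pd.DataFrame(
    {'Method of Travel': ['Carr', 'WAlk', 'Scoter', 'Bicycle', 'Helicopeter']}
)

# Misspelling -> correct spelling, taken from the value_counts() output
corrections = {
    'Scoter': 'Scooter', 'Scootr': 'Scooter',
    'Walkk': 'Walk', 'Wallk': 'Walk', 'WAlk': 'Walk', 'Waalk': 'Walk',
    'Carr': 'Car',
    'Bycycle': 'Bicycle',
    'Helicopeter': 'Helicopter',
}

# One pass over the column; values not in the dictionary are left unchanged
sample['Method of Travel'] = sample['Method of Travel'].replace(corrections)
cleaned = sample['Method of Travel'].tolist()
print(cleaned)
```

Keeping the corrections in one dictionary makes it easier to see which misspellings are handled and to add a new one later.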

3. Check to make sure all values have been changed.

In [55]:
df['Method of Travel'].value_counts()

Car                1755
Walk               1403
Bicycle             872
Scooter             588
Van                 162
Helicopter           96
Mum's Shoulders      46
Aeroplane            45
Dad's Shoulders      24
Hopped                3
Jumped                3
Skipped               3
Name: Method of Travel, dtype: int64

### Create a Sustainable column marking motorized options as non-sustainable and non-motorized options as sustainable

1. Create the new column via assignment, using replace to replace the motorized options with 'Non-Sustainable'

In [56]:
df['Sustainable'] = df['Method of Travel'].replace(['Car', 'Van', 'Helicopter', 'Aeroplane'], 'Non-Sustainable')

2. Replace the rest of the values in the column with the 'Sustainable' designation

In [57]:
df['Sustainable'] = df['Sustainable'].replace(['Walk', 'Bicycle', 'Scooter', "Mum's Shoulders", "Dad's Shoulders", 'Hopped', 'Jumped', 'Skipped'], 'Sustainable')

3. Check to make sure no values have been missed.

In [58]:
df['Sustainable'].value_counts()

Sustainable        2942
Non-Sustainable    2058
Name: Sustainable, dtype: int64
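
The two replace steps above can also be expressed as one membership test. The sketch below uses a made-up `sample` frame and assumes the same four motorized options; `motorized`, `is_motorized`, and `labels` are illustrative names.

```python
import pandas as pd

# Made-up sample standing in for the cleaned dataframe
sample = pd.DataFrame(
    {'Method of Travel': ['Car', 'Walk', 'Van', 'Hopped', 'Helicopter']}
)

# The motorized options used in the notebook
motorized = ['Car', 'Van', 'Helicopter', 'Aeroplane']

# True where the method is motorized, False otherwise
is_motorized = sample['Method of Travel'].isin(motorized)

# Translate the boolean flag into the two labels
sample['Sustainable'] = is_motorized.map(
    {True: 'Non-Sustainable', False: 'Sustainable'}
)
labels = sample['Sustainable'].tolist()
print(labels)
```

One trade-off with this version: any value not in `motorized` is labelled 'Sustainable', including a misspelling that slipped through cleaning, so the `.value_counts()` check on the cleaned column is still worth running first.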

### Total up the number of pupils traveling by each method of travel for each weekday

1. Group by the Sustainable, Method of Travel, and Weekday columns.
    - aggregate with .count()
    - reset the index so the group keys become ordinary columns instead of a MultiIndex
    
    This counts the number of student trips using the Student ID column.

In [61]:
gb = df.groupby(by=['Sustainable', 'Method of Travel', 'Weekday']).count().reset_index()
gb

Unnamed: 0,Sustainable,Method of Travel,Weekday,Student ID
0,Non-Sustainable,Aeroplane,F,9
1,Non-Sustainable,Aeroplane,M,9
2,Non-Sustainable,Aeroplane,Th,9
3,Non-Sustainable,Aeroplane,Tu,9
4,Non-Sustainable,Aeroplane,W,9
5,Non-Sustainable,Car,F,254
6,Non-Sustainable,Car,M,422
7,Non-Sustainable,Car,Th,302
8,Non-Sustainable,Car,Tu,364
9,Non-Sustainable,Car,W,413


2. Rename the Student ID column to 'Number of Trips'

In [62]:
gb = gb.rename(columns={'Student ID' : 'Number of Trips'})
gb.head()

Unnamed: 0,Sustainable,Method of Travel,Weekday,Number of Trips
0,Non-Sustainable,Aeroplane,F,9
1,Non-Sustainable,Aeroplane,M,9
2,Non-Sustainable,Aeroplane,Th,9
3,Non-Sustainable,Aeroplane,Tu,9
4,Non-Sustainable,Aeroplane,W,9
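
The count and rename steps can be combined by using `.size()` and naming the new column in `reset_index()`. This is a sketch on a tiny made-up frame, not the real data; `sample` and `trips` are illustrative names.

```python
import pandas as pd

# Tiny made-up frame with the same columns as the melted, labelled dataframe
sample = pd.DataFrame({
    'Student ID': [1, 2, 3, 1, 2, 3],
    'Weekday': ['M', 'M', 'M', 'Tu', 'Tu', 'Tu'],
    'Method of Travel': ['Car', 'Car', 'Walk', 'Car', 'Walk', 'Walk'],
    'Sustainable': ['Non-Sustainable', 'Non-Sustainable', 'Sustainable',
                    'Non-Sustainable', 'Sustainable', 'Sustainable'],
})

# size() counts rows per group; reset_index(name=...) names the count column
trips = (
    sample.groupby(['Sustainable', 'Method of Travel', 'Weekday'])
          .size()
          .reset_index(name='Number of Trips')
)
print(trips)
```

Unlike `.count()`, `.size()` counts rows regardless of missing values, so the two only agree when Student ID has no nulls.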


### Add a trips per day column

Each of the 1,000 pupils makes one trip per weekday, so the daily total is 1,000.

In [63]:
gb['Trips per day'] = 1000
gb.head()

Unnamed: 0,Sustainable,Method of Travel,Weekday,Number of Trips,Trips per day
0,Non-Sustainable,Aeroplane,F,9,1000
1,Non-Sustainable,Aeroplane,M,9,1000
2,Non-Sustainable,Aeroplane,Th,9,1000
3,Non-Sustainable,Aeroplane,Tu,9,1000
4,Non-Sustainable,Aeroplane,W,9,1000
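
Rather than typing the daily total by hand, it could be derived from the data with a grouped `transform`. The sketch below uses a small made-up `trips` frame, so the totals differ from the real 1,000.

```python
import pandas as pd

# Made-up aggregated frame: two methods on two weekdays
trips = pd.DataFrame({
    'Method of Travel': ['Car', 'Walk', 'Car', 'Walk'],
    'Weekday': ['M', 'M', 'Tu', 'Tu'],
    'Number of Trips': [3, 1, 2, 3],
})

# transform('sum') returns one value per original row:
# the total number of trips for that row's weekday
trips['Trips per day'] = (
    trips.groupby('Weekday')['Number of Trips'].transform('sum')
)
print(trips)
```

This keeps the column correct if the number of pupils changes or if some days have missing records.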


### Calculate the % of trips taken by each method of travel each day
1. Calculate via column division.
    - round to two decimal places
    - the result is a proportion, so 0.01 means 1%

In [65]:
gb['% of trips per day'] = round(gb['Number of Trips']/gb['Trips per day'], 2)
gb.head()

Unnamed: 0,Sustainable,Method of Travel,Weekday,Number of Trips,Trips per day,% of trips per day
0,Non-Sustainable,Aeroplane,F,9,1000,0.01
1,Non-Sustainable,Aeroplane,M,9,1000,0.01
2,Non-Sustainable,Aeroplane,Th,9,1000,0.01
3,Non-Sustainable,Aeroplane,Tu,9,1000,0.01
4,Non-Sustainable,Aeroplane,W,9,1000,0.01
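
If the value is wanted on a 0 to 100 scale instead of as a proportion, one option is to multiply before dividing. This is a sketch on two made-up rows using counts seen in the outputs above; the column name 'Percent (0-100)' is my own, not part of the challenge.

```python
import pandas as pd

# Two made-up rows with the same column names as gb
trips = pd.DataFrame({
    'Number of Trips': [9, 254],
    'Trips per day': [1000, 1000],
})

# Proportion, as in the notebook (Series.round instead of built-in round)
trips['% of trips per day'] = (
    trips['Number of Trips'] / trips['Trips per day']
).round(2)

# Same share expressed on a 0 to 100 scale, to one decimal place
trips['Percent (0-100)'] = (
    100 * trips['Number of Trips'] / trips['Trips per day']
).round(1)
print(trips)
```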


### Export dataframe to CSV

In [66]:
gb.to_csv('pandas_solution.csv', index=False)