### Prepping Data Challenge: Lift Your Spirits (Week 30)

### Input
There is one input this week. It records the time of each trip the lift takes, along with the floor where the passengers enter the lift and the floor where they leave it.

For simplicity, assume that the lift does not stop mid-journey to pick up new passengers, but completes its current trip before starting a new one.

### Requirements
 - Input the data
 - Create a TripID field based on the time of day
   - Assume all trips took place on 12th July 2021
 - Calculate how many floors the lift has to travel between trips
   - The order of floors is B, G, 1, 2, 3, etc.
 - Calculate which floor the majority of trips begin at - call this the Default Position
 - If every trip began from the same floor, how many floors would the lift need to travel to begin each journey?
   - e.g. if the default position of the lift were floor 2 and the trip was starting from the 4th floor, this would be 2 floors that the lift would need to travel
 - How does the average number of floors travelled between trips compare to the average travel from the default position?
 - Output the data
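
The floor ordering in the requirements can be sketched in plain Python before touching the data. This is only an illustration: the helper names `floor_to_int` and `floors_between` are hypothetical and not part of the challenge, and the mapping assumes the only non-numeric labels are `B` and `G`.

```python
def floor_to_int(label):
    """Map a floor label to its position in the order B, G, 1, 2, 3, ..."""
    special = {'B': -1, 'G': 0}  # assumption: B and G are the only non-numeric labels
    label = str(label).strip()
    return special[label] if label in special else int(label)


def floors_between(start, end):
    """Number of floors the lift travels to get from `start` to `end`."""
    return abs(floor_to_int(end) - floor_to_int(start))


# The example from the requirements: default position 2, trip starting on floor 4.
print(floors_between(2, 4))
# Basement to ground floor is a single floor.
print(floors_between('B', 'G'))
```

Placing `B` at -1 and `G` at 0 keeps the numbered floors equal to their own labels, so a distance is just an absolute difference.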

In [1]:
import pandas as pd

In [2]:
#Input the data
df = pd.read_csv('WK30-Input.csv')

In [3]:
df.head()

| | Hour | Minute | From | To |
|---|---|---|---|---|
| 0 | 0 | 1 | G | 8 |
| 1 | 0 | 2 | 4 | G |
| 2 | 0 | 2 | 11 | G |
| 3 | 0 | 3 | B | G |
| 4 | 0 | 4 | 1 | G |


In [4]:
#Create a TripID field based on the time of day
#Assume all trips took place on 12th July 2021
df['Trip_time'] = pd.to_datetime('2021-07-12 ' + df['Hour'].astype(str) + ':' + df['Minute'].astype(str), format='%Y-%m-%d %H:%M')
df = df.reset_index().sort_values(by=['Trip_time', 'index']).rename(columns={'index' : 'TripID'})

In [5]:
#Calculate how many floors the lift has to travel between trips
#The order of floors is B, G, 1, 2, 3, etc.
df = df.replace({'From':{'G':0, 'B':-1}, 'To':{'G':0, 'B':-1}})
df['From'] = df['From'].astype('int')
df['To'] = df['To'].astype('int')
df['Floors'] = abs(df['From'].shift(-1) - df['To'])

In [6]:
df.head()

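| | TripID | Hour | Minute | From | To | Trip_time | Floors |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 8 | 2021-07-12 00:01:00 | 4.0 |
| 1 | 1 | 0 | 2 | 4 | 0 | 2021-07-12 00:02:00 | 11.0 |
| 2 | 2 | 0 | 2 | 11 | 0 | 2021-07-12 00:02:00 | 1.0 |
| 3 | 3 | 0 | 3 | -1 | 0 | 2021-07-12 00:03:00 | 1.0 |
| 4 | 4 | 0 | 4 | 1 | 0 | 2021-07-12 00:04:00 | 10.0 |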
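
To see why `shift(-1)` gives the travel between trips, the sketch below repeats the calculation on the `From` and `To` values of the first five trips shown above. The `Next_From` column is a hypothetical helper added only for illustration.

```python
import pandas as pd

# From/To values of the first five trips, already converted to integers.
trips = pd.DataFrame({'From': [0, 4, 11, -1, 1], 'To': [8, 0, 0, 0, 0]})

# shift(-1) moves every value up one row, so row i holds the start floor of trip i + 1.
trips['Next_From'] = trips['From'].shift(-1)

# Distance from where this trip ends to where the next trip starts.
trips['Floors'] = (trips['Next_From'] - trips['To']).abs()

print(trips)
```

The last row has no following trip, so its `Floors` value is `NaN`. pandas skips `NaN` when taking a mean by default, so that row does not affect the later average.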


In [7]:
#Calculate which floor the majority of trips begin at - call this the Default Position
df['Default Position'] = df.groupby('From').count().Hour.idxmax()
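
An alternative way to find the most common starting floor is `value_counts()`, sketched here on made-up data rather than the challenge input. It also shows that "majority" here means the most frequent floor, which need not account for more than half of the trips.

```python
import pandas as pd

# Made-up start floors for illustration only (B = -1, G = 0).
starts = pd.Series([0, 4, 11, -1, 1, 0, 0, 4], name='From')

counts = starts.value_counts()         # number of trips starting on each floor
default_position = counts.idxmax()     # floor with the highest count
share = counts.max() / len(starts)     # fraction of trips starting there

print(default_position, share)
```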

In [8]:
#If every trip began from the same floor, how many floors would the lift need to travel to begin each journey?
#e.g. if the default position of the lift were floor 2 and the trip was starting from the 4th floor, 
#this would be 2 floors that the lift would need to travel
df['From Default floor'] = abs(df['From'] - df['Default Position'])

In [9]:
#How does the average floors travelled between trips compare to the average travel from the default position?
output = df.groupby('Default Position', as_index=False).agg({'From Default floor':'mean','Floors':'mean'})
output['Difference'] = output['From Default floor'] - output['Floors']
output = output.rename(columns={'From Default floor':'Avg travel from default position','Floors':'Avg travel between trips currently'})

In [10]:
output.head()

| | Default Position | Avg travel from default position | Avg travel between trips currently | Difference |
|---|---|---|---|---|
| 0 | 0 | 3.744692 | 4.364188 | -0.619497 |
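
The comparison can also be computed without grouping on a constant column. The sketch below does this on the first five trips only, so its numbers differ from the full result above. The default position of 0 is taken from the notebook output, and the variable names are hypothetical.

```python
import pandas as pd

# First five trips from the data, floors as integers (B = -1, G = 0).
toy = pd.DataFrame({'From': [0, 4, 11, -1, 1], 'To': [8, 0, 0, 0, 0]})
default_position = 0  # assumed here; the notebook derives it from the full data

# Travel from the end of one trip to the start of the next (NaN for the last trip).
between_trips = (toy['From'].shift(-1) - toy['To']).abs()
# Travel from the default position to the start of each trip.
from_default = (toy['From'] - default_position).abs()

avg_between = between_trips.mean()  # NaN is skipped by default
avg_default = from_default.mean()
difference = avg_default - avg_between

print(avg_default, avg_between, difference)
```

A negative difference means the lift would travel fewer floors on average if it returned to the default position after every trip.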


In [11]:
output.to_csv('wk30-output.csv', index=False)