## daily_system_reports.csv

The averages of all daily_unit_reports.csv entries for a given day, providing a system-wide snapshot.

 - *day*
 - *elevators_num_fixes*
 - *escalators_availability*
 - *escalators_num_fixes*
 - *elevators_num_inspections*
 - *elevators_num_breaks*
 - *escalators_num_inspections*
 - *elevators_num_units*: Number of elevators included in the daily system report.
 - *escalators_num_units*: Number of escalators included in the daily system report.
 - *escalators_num_breaks*
 - *elevators_broken_time_percentage*
 - *elevators_availability*
 - *escalators_broken_time_percentage*

In [16]:
import pandas as pd

In [35]:
data = pd.read_csv('C:\\Users\\606569\\Documents\\my_GitHub\\D.C-metro-analysis\\data\\daily_system_reports.csv')

In [36]:
data.head()

Unnamed: 0,elevators_num_fixes,escalators_availability,escalators_num_fixes,elevators_num_inspections,elevators_num_breaks,escalators_num_inspections,elevators_num_units,escalators_num_units,escalators_num_breaks,elevators_broken_time_percentage,elevators_availability,day,escalators_broken_time_percentage
0,0,0.934766,32,0,0,1,0,588,59,0.0,1.0,2013-06-01,0.036173
1,0,0.920018,73,0,0,35,0,588,61,0.0,1.0,2013-06-02,0.049931
2,0,0.935932,71,0,0,46,0,588,71,0.0,1.0,2013-06-03,0.032025
3,0,0.935348,59,0,0,63,0,588,58,0.0,1.0,2013-06-04,0.027606
4,0,0.936133,62,0,0,42,0,588,66,0.0,1.0,2013-06-05,0.031049


This df is quite big, so to make it more manageable I will split it into separate escalators and elevators df's instead of keeping both in one df.

In [37]:
data.columns

Index(['elevators_num_fixes', 'escalators_availability',
       'escalators_num_fixes', 'elevators_num_inspections',
       'elevators_num_breaks', 'escalators_num_inspections',
       'elevators_num_units', 'escalators_num_units', 'escalators_num_breaks',
       'elevators_broken_time_percentage', 'elevators_availability', 'day',
       'escalators_broken_time_percentage'],
      dtype='object')

In [38]:
escalators = data.drop(['elevators_num_fixes',
                        'elevators_num_inspections',
                        'elevators_num_breaks',
                        'elevators_num_units',
                        'elevators_broken_time_percentage',
                        'elevators_availability'], axis=1)

In [39]:
elevators = data.drop(['escalators_availability',
                       'escalators_num_fixes',
                       'escalators_num_inspections',
                       'escalators_num_units',
                       'escalators_num_breaks',
                       'escalators_broken_time_percentage'], axis=1)
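
The two `drop` calls above list every unwanted column by hand. Since the column names follow an `elevators_` / `escalators_` prefix pattern, the same split could be sketched by selecting on the prefix instead. The helper `split_by_prefix` and the small `sample` frame are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

def split_by_prefix(df, prefix, shared=("day",)):
    """Return a copy with the columns that start with `prefix`, plus shared ones."""
    keep = [c for c in df.columns if c.startswith(prefix) or c in shared]
    return df[keep].copy()

# tiny stand-in frame using the same naming pattern as daily_system_reports.csv
sample = pd.DataFrame({
    "elevators_num_fixes": [0, 0],
    "escalators_num_fixes": [32, 73],
    "day": ["2013-06-01", "2013-06-02"],
    "escalators_num_breaks": [59, 61],
})

escalators_sample = split_by_prefix(sample, "escalators_")
elevators_sample = split_by_prefix(sample, "elevators_")
```

Selecting by prefix keeps the original column order and does not need updating if more `elevators_*` or `escalators_*` columns are added later.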

In [40]:
escalators.head()

Unnamed: 0,escalators_availability,escalators_num_fixes,escalators_num_inspections,escalators_num_units,escalators_num_breaks,day,escalators_broken_time_percentage
0,0.934766,32,1,588,59,2013-06-01,0.036173
1,0.920018,73,35,588,61,2013-06-02,0.049931
2,0.935932,71,46,588,71,2013-06-03,0.032025
3,0.935348,59,63,588,58,2013-06-04,0.027606
4,0.936133,62,42,588,66,2013-06-05,0.031049


In [41]:
elevators.head()

Unnamed: 0,elevators_num_fixes,elevators_num_inspections,elevators_num_breaks,elevators_num_units,elevators_broken_time_percentage,elevators_availability,day
0,0,0,0,0,0.0,1.0,2013-06-01
1,0,0,0,0,0.0,1.0,2013-06-02
2,0,0,0,0,0.0,1.0,2013-06-03
3,0,0,0,0,0.0,1.0,2013-06-04
4,0,0,0,0,0.0,1.0,2013-06-05


# Cleaning the Data
- I'm going to start with just the escalators data and ignore the elevators data for now; if I have time I will do both.

I want to rename the day column to full_date, then break the full date down into separate year, month and day columns.

In [42]:
escalators = escalators.rename(columns = {'day' : 'full_date'})

In [43]:
from datetime import datetime

# converts 'full_date' from string to datetime
escalators['full_date'] = pd.to_datetime(escalators['full_date'])

In [44]:
escalators.dtypes

escalators_availability                     float64
escalators_num_fixes                          int64
escalators_num_inspections                    int64
escalators_num_units                          int64
escalators_num_breaks                         int64
full_date                            datetime64[ns]
escalators_broken_time_percentage           float64
dtype: object

In [45]:
# creating new (empty) columns, filled in below
escalators['year'] = ''
escalators['month'] = ''
escalators['day'] = ''
escalators['season'] = ''   # calendar quarter, used as a rough season (1 = Jan-Mar, 2 = Apr-Jun, 3 = Jul-Sep, 4 = Oct-Dec)

escalators['weekday'] = ''  # day of the week (0 = Monday ... 6 = Sunday)
escalators['workday'] = ''  # if it's a workday or not (0 = not a workday, 1 = a workday)

In [46]:
# fill in 'weekday' based on 'full_date' (0 = Monday ... 6 = Sunday)
escalators['weekday'] = escalators['full_date'].dt.dayofweek

# fill in 'year' column based on 'full_date'
escalators['year'] = escalators['full_date'].dt.year

# fill in 'month' column based on 'full_date'
escalators['month'] = escalators['full_date'].dt.month

# fill in 'season' column with the calendar quarter of 'full_date'
escalators['season'] = escalators['full_date'].dt.quarter

# fill in 'day' column based on 'full_date'
escalators['day'] = escalators['full_date'].dt.day
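
The `workday` column is created above but never filled, and `dt.quarter` gives calendar quarters rather than seasons. A minimal sketch of one way to derive both is shown here, under two assumptions that are mine rather than the notebook's: a workday is Monday through Friday (public holidays are ignored), and seasons are meteorological (winter = December to February, and so on). The helper `add_date_parts` and the `sample` frame are hypothetical.

```python
import pandas as pd

def add_date_parts(df, date_col="full_date"):
    """Return a copy of df with calendar columns derived from date_col."""
    dates = pd.to_datetime(df[date_col])
    out = df.copy()
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day"] = dates.dt.day
    out["weekday"] = dates.dt.dayofweek  # 0 = Monday ... 6 = Sunday
    # assumption: meteorological seasons
    # Dec-Feb -> 1 (winter), Mar-May -> 2, Jun-Aug -> 3, Sep-Nov -> 4 (fall)
    out["season"] = (dates.dt.month % 12) // 3 + 1
    # assumption: workday means Monday-Friday; holidays are not handled
    out["workday"] = (out["weekday"] < 5).astype(int)
    return out

# a Saturday, a Monday, and a Wednesday in December
sample = pd.DataFrame({"full_date": ["2013-06-01", "2013-06-03", "2013-12-25"]})
sample_parts = add_date_parts(sample)
```

Building the columns directly from the `.dt` accessor also avoids having to create empty placeholder columns first.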

In [47]:
escalators.head()

Unnamed: 0,escalators_availability,escalators_num_fixes,escalators_num_inspections,escalators_num_units,escalators_num_breaks,full_date,escalators_broken_time_percentage,year,month,day,season,weekday,workday
0,0.934766,32,1,588,59,2013-06-01,0.036173,2013,6,1,2,5,
1,0.920018,73,35,588,61,2013-06-02,0.049931,2013,6,2,2,6,
2,0.935932,71,46,588,71,2013-06-03,0.032025,2013,6,3,2,0,
3,0.935348,59,63,588,58,2013-06-04,0.027606,2013,6,4,2,1,
4,0.936133,62,42,588,66,2013-06-05,0.031049,2013,6,5,2,2,


In [57]:
escalators.escalators_num_fixes.max()

2856

In [63]:
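# NOTE: this passes the column's values to .loc as row labels, so .max() below is a
# column-wise max over whichever rows those labels happen to match, not the row with
# the highest broken-time percentage (idxmax() would give that row's label)
escalators.loc[escalators['escalators_broken_time_percentage']].max()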

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  """Entry point for launching an IPython kernel.


escalators_availability                         0.934766
escalators_num_fixes                                  32
escalators_num_inspections                             1
escalators_num_units                                 588
escalators_num_breaks                                 59
full_date                            2013-06-01 00:00:00
escalators_broken_time_percentage              0.0361732
year                                                2013
month                                                  6
day                                                    1
season                                                 2
weekday                                                5
dtype: object

In [59]:
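# NOTE: same issue as above: the fix counts are used as row labels, so the output is a
# column-wise max over the matched rows, not the single day with the most fixes
escalators.loc[escalators['escalators_num_fixes']].max()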

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  """Entry point for launching an IPython kernel.


escalators_availability                         0.962585
escalators_num_fixes                                 115
escalators_num_inspections                           129
escalators_num_units                                 616
escalators_num_breaks                                110
full_date                            2018-01-22 00:00:00
escalators_broken_time_percentage              0.0572754
year                                                2018
month                                                 12
day                                                   31
season                                                 4
weekday                                                6
dtype: object
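
The two `.loc[...]` calls above trigger the missing-label warning because a column's values are being used as row labels. If the goal is the single day with the highest value in a column, one common approach is `idxmax()`, which returns the index label of the maximum so that `.loc` can then pull that whole row. The sketch below uses a three-row stand-in frame built from the first rows shown earlier, not the full dataset.

```python
import pandas as pd

# stand-in frame: first three days from escalators.head()
sample = pd.DataFrame({
    "full_date": pd.to_datetime(["2013-06-01", "2013-06-02", "2013-06-03"]),
    "escalators_num_fixes": [32, 73, 71],
    "escalators_broken_time_percentage": [0.036173, 0.049931, 0.032025],
})

# idxmax() gives the index label of the largest value; .loc returns that row
worst_day = sample.loc[sample["escalators_broken_time_percentage"].idxmax()]
most_fixes = sample.loc[sample["escalators_num_fixes"].idxmax()]
```

Each result is a single row (a Series), so `worst_day["full_date"]` gives the date on which the maximum occurred.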