# Analysis for VEHICLE database: Dublin Bus

In [1]:
import pandas as pd

## Vehicles data

### Load file into dataframe

In [20]:
df_v = pd.read_csv("../DB/monthlyData/rt_vehicles_DB_2018.csv",skip_blank_lines=True,index_col=False)

### View vehicle data

In [21]:
df_v.head()

|   | DATASOURCE | DAYOFSERVICE | VEHICLEID | DISTANCE | MINUTES | LASTUPDATE | NOTE |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | DB | 23-NOV-18 00:00:00 | 3303848 | 286166 | 58849 | 04-DEC-18 08:03:09 | NaN |
| 1 | DB | 23-NOV-18 00:00:00 | 3303847 | 259545 | 56828 | 04-DEC-18 08:03:09 | NaN |
| 2 | DB | 28-FEB-18 00:00:00 | 2868329 | 103096 | 40967 | 08-MAR-18 10:35:59 | NaN |
| 3 | DB | 28-FEB-18 00:00:00 | 2868330 | 147277 | 43599 | 08-MAR-18 10:35:59 | NaN |
| 4 | DB | 28-FEB-18 00:00:00 | 2868331 | 224682 | 40447 | 08-MAR-18 10:35:59 | NaN |


### Properties of features

In [22]:
df_v.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272622 entries, 0 to 272621
Data columns (total 7 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   DATASOURCE    272622 non-null  object 
 1   DAYOFSERVICE  272622 non-null  object 
 2   VEHICLEID     272622 non-null  int64  
 3   DISTANCE      272622 non-null  int64  
 4   MINUTES       272622 non-null  int64  
 5   LASTUPDATE    272622 non-null  object 
 6   NOTE          0 non-null       float64
dtypes: float64(1), int64(3), object(3)
memory usage: 14.6+ MB


In [23]:
df_v.nunique()

DATASOURCE           1
DAYOFSERVICE       360
VEHICLEID         1152
DISTANCE        170498
MINUTES          57523
LASTUPDATE         360
NOTE                 0
dtype: int64
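
The non-null counts from `info()` and the unique counts from `nunique()` can be combined into one table, which makes columns that are entirely null or hold a single value easy to spot. The following is a minimal sketch on a small made-up frame (`sample`, `summary` and `drop_candidates` are illustrative names, not part of the original notebook):

```python
import pandas as pd

# Tiny made-up frame that mimics part of the column layout above
# (the values are for illustration only).
sample = pd.DataFrame({
    "DATASOURCE": ["DB", "DB", "DB"],
    "VEHICLEID": [3303848, 3303847, 3303848],
    "NOTE": [float("nan")] * 3,
})

# One row per column: non-null count, unique count and dtype.
summary = pd.DataFrame({
    "non_null": sample.notna().sum(),
    "n_unique": sample.nunique(),
    "dtype": sample.dtypes.astype(str),
})

# Columns that are entirely null or hold a single value carry no information.
mask = (summary["non_null"] == 0) | (summary["n_unique"] <= 1)
drop_candidates = summary.index[mask].tolist()

print(summary)
print(drop_candidates)
```

A rule like this only catches constant and empty columns. A column such as LASTUPDATE still needs a judgment about its relevance.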

**REVIEW OF FEATURES**

| Features        | Observation   | Decision  |
| :--------------- |:-------------|:-----|
| **DATASOURCE**      | Non-null column with a single value, DB; identifies the data source | **Drop** |
| **DAYOFSERVICE**    | Non-null column with 360 unique entries; the day of the year on which the vehicle was in service | **No action** |
| **VEHICLEID**       | Non-null column with 1152 unique entries; identifies the vehicle in service | **No action** |
| **DISTANCE**        | Non-null column with 170498 unique entries; distance travelled by the vehicle on the day, described as km, although values such as 286166 suggest a smaller unit (possibly metres). Can be binned into 10s of km | **No action** |
| **MINUTES**         | Non-null column with 57523 unique entries; time served by the vehicle on the day. Values such as 58849 exceed the 1440 minutes in a day, so the unit may be seconds rather than minutes. Can be binned into 10s of minutes | **No action** |
| **LASTUPDATE**      | Non-null column with 360 unique entries; timestamp of the last update to this data entry. Not relevant for vehicle information | **Drop** |
| **NOTE**            | Entirely null column | **Drop** |
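
The binning mentioned for DISTANCE and MINUTES is not carried out in this notebook. One way it could be done is with integer floor division, sketched below on the first three rows shown earlier. The bin width of 10 is in the column's raw unit and is an assumption, as are the names `BIN_WIDTH`, `DISTANCE_BIN` and `MINUTES_BIN`.

```python
import pandas as pd

# First three DISTANCE / MINUTES pairs from the head() output above.
sample = pd.DataFrame({
    "DISTANCE": [286166, 259545, 103096],
    "MINUTES": [58849, 56828, 40967],
})

# Hypothetical bin width, expressed in the raw unit of each column.
BIN_WIDTH = 10

# Floor each value to the lower edge of its bin.
sample["DISTANCE_BIN"] = sample["DISTANCE"] // BIN_WIDTH * BIN_WIDTH
sample["MINUTES_BIN"] = sample["MINUTES"] // BIN_WIDTH * BIN_WIDTH

print(sample)
```

Floor division keeps the result as an integer column. `pd.cut` would be an alternative if labelled or unequal-width bins were needed.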

In [24]:
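# Drop the columns marked "Drop" in the feature review
df_v.drop(columns=['DATASOURCE', 'LASTUPDATE', 'NOTE'], inplace=True)
# Convert the service date from string to datetime
df_v['DAYOFSERVICE'] = pd.to_datetime(df_v['DAYOFSERVICE'])
df_v.head()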

|   | DAYOFSERVICE | VEHICLEID | DISTANCE | MINUTES |
| :--- | :--- | :--- | :--- | :--- |
| 0 | 2018-11-23 | 3303848 | 286166 | 58849 |
| 1 | 2018-11-23 | 3303847 | 259545 | 56828 |
| 2 | 2018-02-28 | 2868329 | 103096 | 40967 |
| 3 | 2018-02-28 | 2868330 | 147277 | 43599 |
| 4 | 2018-02-28 | 2868331 | 224682 | 40447 |
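
The raw DAYOFSERVICE strings follow a fixed pattern such as `23-NOV-18 00:00:00`, so the conversion can also be given an explicit format instead of relying on inference. The sketch below assumes the pattern `%d-%b-%y %H:%M:%S` holds for every row and that month abbreviations are in English; `raw` and `parsed` are illustrative names.

```python
import pandas as pd

# Two date strings copied from the raw head() output.
raw = pd.Series(["23-NOV-18 00:00:00", "28-FEB-18 00:00:00"])

# Explicit format: day, abbreviated month name, two-digit year, time.
# Assumes every row follows this pattern; a non-matching row raises an error.
parsed = pd.to_datetime(raw, format="%d-%b-%y %H:%M:%S")

print(parsed)
```

An explicit format is typically faster on large columns and fails loudly on malformed rows rather than guessing.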


In [25]:
VEHICLEID_unique = df_v['VEHICLEID'].unique()
# Group rows by vehicle (default axis=0); axis=1 would try to group columns, not rows
group_vehicleID = df_v.groupby(by='VEHICLEID')
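
The grouped object is created here but not used further in this section. As a hedged illustration of what a per-vehicle grouping could produce, the sketch below aggregates four rows taken from the outputs in this notebook. The output column names (`days_in_service`, `first_day`, `last_day`, `total_distance`) are assumptions chosen for the example.

```python
import pandas as pd

# Four rows taken from the outputs shown in this notebook.
sample = pd.DataFrame({
    "DAYOFSERVICE": pd.to_datetime(
        ["2018-11-23", "2018-11-23", "2018-02-28", "2018-10-19"]
    ),
    "VEHICLEID": [3303848, 3303847, 2868329, 3303848],
    "DISTANCE": [286166, 259545, 103096, 26209],
    "MINUTES": [58849, 56828, 40967, 3909],
})

# One row per vehicle, using named aggregation.
per_vehicle = sample.groupby("VEHICLEID").agg(
    days_in_service=("DAYOFSERVICE", "nunique"),
    first_day=("DAYOFSERVICE", "min"),
    last_day=("DAYOFSERVICE", "max"),
    total_distance=("DISTANCE", "sum"),
)

print(per_vehicle)
```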


In [26]:
df_v.loc[df_v['VEHICLEID'] == VEHICLEID_unique[0]].sort_values(by=['DAYOFSERVICE'])

|   | DAYOFSERVICE | VEHICLEID | DISTANCE | MINUTES |
| :--- | :--- | :--- | :--- | :--- |
| 250722 | 2018-10-19 | 3303848 | 26209 | 3909 |
| 207495 | 2018-10-20 | 3303848 | 220081 | 50415 |
| 257566 | 2018-10-21 | 3303848 | 234549 | 45537 |
| 228050 | 2018-10-22 | 3303848 | 302517 | 65491 |
| 163915 | 2018-10-23 | 3303848 | 213884 | 58108 |
| ... | ... | ... | ... | ... |
| 262877 | 2018-12-20 | 3303848 | 307391 | 58295 |
| 260089 | 2018-12-21 | 3303848 | 228080 | 62226 |
| 272596 | 2018-12-29 | 3303848 | 171668 | 46215 |
| 266915 | 2018-12-30 | 3303848 | 292171 | 54587 |
