# 2019 Bay Wheels' Trip Data Exploration and Visualization

### Preliminary Wrangling

This document explores a dataset containing [information](https://www.lyft.com/bikes/bay-wheels/system-data) about individual rides made in [Bay Wheels](https://en.wikipedia.org/wiki/Bay_Wheels)' bike-sharing system, which covers the greater San Francisco Bay Area. The dataset is a collection of CSV files covering January through December 2019 and can be downloaded manually [here](https://s3.amazonaws.com/baywheels-data/index.html).

In [13]:
# import all packages and set plots to be embedded inline
import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

In [9]:
# folder containing the downloaded Bay Wheels CSV files
path = './data'
# collect the CSV filenames; sort them because glob returns them in arbitrary order
all_data = sorted(glob.glob(path + "/*.csv"))

# list that will hold one DataFrame per CSV file
li = []

# iterate through all the files
for filename in all_data:
    # read each file into a temporary DataFrame (the first row is the header)
    temp = pd.read_csv(filename, index_col=None, header=0)
    # append it to the list
    li.append(temp)

# concatenate all of the files into a single DataFrame
df = pd.concat(li, ignore_index=True)
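
The loop above does not record which file each row came from, which makes it harder to confirm that every month actually loaded. Below is a minimal sketch of an alternative loader. The function name `load_trip_csvs` and the added `source_file` column are hypothetical; `source_file` is not part of the Bay Wheels schema.

```python
import glob
import os

import pandas as pd


def load_trip_csvs(folder="./data", pattern="*.csv"):
    """Read every CSV in `folder` and stack them into one DataFrame.

    Files are sorted so the row order is reproducible, and a hypothetical
    `source_file` column records which file each row came from.
    """
    filenames = sorted(glob.glob(os.path.join(folder, pattern)))
    if not filenames:
        # fail loudly instead of letting pd.concat raise on an empty list
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    frames = [
        pd.read_csv(name).assign(source_file=os.path.basename(name))
        for name in filenames
    ]
    return pd.concat(frames, ignore_index=True)
```

With the data loaded this way, `df["source_file"].value_counts().sort_index()` gives a per-file row count, which makes a missing or truncated month easy to spot.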

In [11]:
# high level overview of data shape and composition
print(df.shape)
print(df.dtypes)

(375494, 14)
duration_sec                 int64
start_time                  object
end_time                    object
start_station_id           float64
start_station_name          object
start_station_latitude     float64
start_station_longitude    float64
end_station_id             float64
end_station_name            object
end_station_latitude       float64
end_station_longitude      float64
bike_id                      int64
user_type                   object
bike_share_for_all_trip     object
dtype: object
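
The dtypes above show `start_time` and `end_time` stored as `object` (strings) and the station IDs as `float64`. In pandas, an integer column that contains missing values is usually upcast to `float64`, so the IDs may have gaps. Below is a minimal sketch of a type-tidying step. The helper name `tidy_types` is hypothetical, and it assumes `bike_share_for_all_trip` only holds "Yes"/"No".

```python
import pandas as pd


def tidy_types(df):
    """Return a copy of the trip data with more specific dtypes.

    Assumes bike_share_for_all_trip holds only "Yes"/"No"; any other value
    (including a missing one) becomes <NA>.
    """
    out = df.copy()
    # timestamps are stored as strings such as "2019-02-15 17:33:06.4440"
    for col in ["start_time", "end_time"]:
        out[col] = pd.to_datetime(out[col])
    # nullable integers keep whole-number IDs while still allowing missing values
    for col in ["start_station_id", "end_station_id"]:
        out[col] = out[col].astype("Int64")
    # a low-cardinality label usually reads better (and uses less memory) as a category
    out["user_type"] = out["user_type"].astype("category")
    out["bike_share_for_all_trip"] = (
        out["bike_share_for_all_trip"]
        .map({"Yes": True, "No": False})
        .astype("boolean")
    )
    return out
```

Calling `df = tidy_types(df)` before the exploration means later steps can use the `.dt` accessor on the timestamps directly.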


In [12]:
# I prefer displaying the sample directly (rendered as a table) over print(df.sample(10))
df.sample(10)

|  | duration_sec | start_time | end_time | start_station_id | start_station_name | start_station_latitude | start_station_longitude | end_station_id | end_station_name | end_station_latitude | end_station_longitude | bike_id | user_type | bike_share_for_all_trip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93370 | 1100 | 2019-02-15 17:33:06.4440 | 2019-02-15 17:51:26.9280 | 8.0 | The Embarcadero at Vallejo St | 37.799953 | -122.398525 | 81.0 | Berry St at 4th St | 37.77588 | -122.39317 | 3227 | Subscriber | No |
| 187120 | 310 | 2019-01-31 17:01:37.5500 | 2019-01-31 17:06:47.6510 | 50.0 | 2nd St at Townsend St | 37.780526 | -122.390288 | 23.0 | The Embarcadero at Steuart St | 37.791464 | -122.391034 | 5354 | Subscriber | No |
| 85134 | 1377 | 2019-02-17 15:43:08.4870 | 2019-02-17 16:06:05.8870 | 71.0 | Broderick St at Oak St | 37.773063 | -122.439078 | 115.0 | Jackson Playground | 37.765026 | -122.398773 | 6453 | Subscriber | Yes |
| 178 | 451 | 2019-02-28 22:33:19.1090 | 2019-02-28 22:40:51.0530 | 197.0 | El Embarcadero at Grand Ave | 37.808848 | -122.24968 | 182.0 | 19th Street BART Station | 37.809013 | -122.268247 | 3512 | Subscriber | Yes |
| 232731 | 688 | 2019-01-25 19:02:25.6290 | 2019-01-25 19:13:54.5300 | 23.0 | The Embarcadero at Steuart St | 37.791464 | -122.391034 | 371.0 | Lombard St at Columbus Ave | 37.802746 | -122.413579 | 4557 | Subscriber | No |
| 165077 | 906 | 2019-02-05 07:39:12.5880 | 2019-02-05 07:54:19.4470 | 16.0 | Steuart St at Market St | 37.79413 | -122.39443 | 93.0 | 4th St at Mission Bay Blvd S | 37.770407 | -122.391198 | 3371 | Subscriber | No |
| 188727 | 292 | 2019-01-31 13:25:01.7160 | 2019-01-31 13:29:54.6970 | 341.0 | Fountain Alley at S 2nd St | 37.336188 | -121.889277 | 281.0 | 9th St at San Fernando St | 37.338395 | -121.880797 | 520 | Subscriber | Yes |
| 301759 | 437 | 2019-01-14 22:59:25.5730 | 2019-01-14 23:06:43.0200 | 239.0 | Bancroft Way at Telegraph Ave | 37.868813 | -122.258764 | 269.0 | Telegraph Ave at Carleton St | 37.86232 | -122.258801 | 1818 | Subscriber | No |
| 217322 | 133 | 2019-01-28 11:55:00.8300 | 2019-01-28 11:57:14.7850 | 106.0 | Sanchez St at 17th St | 37.763242 | -122.430675 | 120.0 | Mission Dolores Park | 37.76142 | -122.426435 | 2607 | Subscriber | Yes |
| 300097 | 341 | 2019-01-15 13:18:22.2070 | 2019-01-15 13:24:03.9770 | 274.0 | Oregon St at Adeline St | 37.857567 | -122.267558 | 246.0 | Berkeley Civic Center | 37.86906 | -122.270556 | 4784 | Subscriber | No |


In [None]:
# descriptive statistics for numeric variables
df.describe()
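
`df.describe()` summarizes `duration_sec` on its own, so it is also worth checking that the column agrees with the gap between `end_time` and `start_time`. In the sampled rows it appears to be the elapsed time with the fractional seconds dropped. For example, 17:33:06.444 to 17:51:26.928 is 1100.484 s, recorded as 1100. Below is a minimal sketch of that check. The helper name `duration_mismatches` and the 1-second tolerance are assumptions.

```python
import pandas as pd


def duration_mismatches(df, tolerance_sec=1.0):
    """Return rows whose duration_sec disagrees with end_time - start_time.

    tolerance_sec (an assumed threshold) absorbs the sub-second part that
    duration_sec appears to drop.
    """
    # pd.to_datetime is a no-op on columns that are already datetimes
    elapsed = (
        pd.to_datetime(df["end_time"]) - pd.to_datetime(df["start_time"])
    ).dt.total_seconds()
    gap = (elapsed - df["duration_sec"]).abs()
    return df[gap >= tolerance_sec]
```

An empty result from `duration_mismatches(df)` suggests `duration_sec` can be trusted as is. `df["duration_sec"].div(60).describe()` then gives the same summary in minutes, which is easier to read.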

**What is the structure of your dataset?**

**What is/are the main feature(s) of interest in your dataset?**

**What features in the dataset do you think will help support your investigation into your feature(s) of interest?**

### Univariate Exploration

**Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?**

**Of the features you investigated, were there any unusual distributions?  Did you perform any operations on the data to tidy, adjust, or change the form of the data?  If so, why did you do this?**
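
Trip durations are often long-tailed, with many short rides and a few very long ones. If `duration_sec` looks like that, a histogram with log-spaced bins usually shows its shape better than evenly spaced ones. Below is a sketch under that assumption. The helper names `log_bin_edges` and `plot_duration_hist` are hypothetical, and the `step` of 0.1 in log10 units is an arbitrary starting point.

```python
import numpy as np


def log_bin_edges(values, step=0.1):
    """Histogram bin edges spaced evenly in log10 space over the positive values."""
    positive = np.asarray(values, dtype=float)
    positive = positive[positive > 0]  # log10 is undefined for zero or negative values
    if positive.size == 0:
        raise ValueError("need at least one positive value")
    lo, hi = np.log10(positive.min()), np.log10(positive.max())
    n_bins = max(int(np.ceil((hi - lo) / step)), 1)
    edges = np.logspace(lo, hi, n_bins + 1)
    # pin the outer edges to the data so float round-off cannot drop the extremes
    edges[0], edges[-1] = positive.min(), positive.max()
    return edges


def plot_duration_hist(durations_sec, step=0.1, ax=None):
    """Draw a log-binned histogram of trip durations on a log-scaled x axis."""
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    ax.hist(durations_sec, bins=log_bin_edges(durations_sec, step))
    ax.set_xscale("log")
    ax.set_xlabel("Trip duration (seconds, log scale)")
    ax.set_ylabel("Number of trips")
    return ax
```

In the notebook this would be called as `plot_duration_hist(df["duration_sec"])`. A larger `step` gives coarser bins.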

### Bivariate Exploration

**Talk about some of the relationships you observed in this part of the investigation.  How did the feature(s) of interest vary with other features in the dataset?**

**Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?**
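
One possible starting point for the bivariate questions is comparing trip duration across `user_type`. The hypothetical helper `duration_by_group` below tabulates the trip count and the median and mean duration in minutes per group. The median sits next to the mean because a few very long trips can pull the mean upward.

```python
import pandas as pd


def duration_by_group(df, group="user_type"):
    """Per-group trip count and duration summary in minutes, largest group first."""
    minutes = df["duration_sec"] / 60
    return (
        minutes.groupby(df[group], observed=True)
        .agg(trips="count", median_min="median", mean_min="mean")
        .sort_values("trips", ascending=False)
    )
```

The same function works for other labels, e.g. `duration_by_group(df, "bike_share_for_all_trip")`. A plot of the same comparison could be drawn with `sns.boxplot(data=df, x="user_type", y="duration_sec")`.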

### Multivariate Exploration

**Talk about some of the relationships you observed in this part of the investigation.  Were there features that strengthened each other in terms of looking at your feature(s) of interest?**

**Were there any interesting or surprising interactions between features?**
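
One possible multivariate view crosses start weekday, start hour, and `user_type`: a weekday-by-hour count table for each user type. Below is a minimal sketch. The helper name `weekday_hour_counts` is hypothetical, and it assumes `start_time` strings parse with `pd.to_datetime`.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_hour_counts(df, user_type):
    """Weekday x hour-of-day trip counts for one user_type, ready for a heatmap."""
    subset = df[df["user_type"] == user_type]
    start = pd.to_datetime(subset["start_time"])
    table = pd.crosstab(
        start.dt.day_name(), start.dt.hour,
        rownames=["weekday"], colnames=["hour"],
    )
    # fill in weekdays/hours with no trips so every table has the same 7 x 24 shape
    return table.reindex(index=WEEKDAYS, columns=range(24), fill_value=0)
```

`sns.heatmap(weekday_hour_counts(df, "Subscriber"))` would draw one panel, using the label seen in the sample above. Repeating it for each value of `df["user_type"].unique()` allows the usage patterns to be compared side by side.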