# 2019 Bay Wheels' Trip Data Exploration and Visualization

### Preliminary Wrangling

This document explores a dataset containing [information](https://www.lyft.com/bikes/bay-wheels/system-data) about individual rides made in [Bay Wheels](https://en.wikipedia.org/wiki/Bay_Wheels)' bike-sharing system, which covers the greater San Francisco Bay Area. The dataset is a collection of CSV files covering January through December 2019, which can be downloaded manually [here](https://s3.amazonaws.com/baywheels-data/index.html).

In [27]:
# import all packages and set plots to be embedded inline
import glob
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

In [28]:
# path of the folder that contains all the CSV files
path = './data'
# collect every CSV filename in the folder, sorted so the files load in a stable order
all_data = sorted(glob.glob(path + "/*.csv"))

# list that will hold one DataFrame per CSV file
li = []

# iterate through all the files
for filename in all_data:
    # read each file into a temporary DataFrame
    temp = pd.read_csv(filename, index_col=None, header=0)
    # append it to the list
    li.append(temp)

# concatenate all of the CSVs into a single DataFrame; sort=True keeps the
# alphabetical column order shown below and avoids the pandas warning about
# files whose columns are not aligned
df = pd.concat(li, ignore_index=True, sort=True)


In [29]:
# save the combined dataframe to a csv file
df.to_csv('fordgobike_tripdata_2019.csv', index=False)

In [30]:
# high level overview of data shape and composition
print(df.shape)
print(df.dtypes)

(2506983, 15)
bike_id                      int64
bike_share_for_all_trip     object
duration_sec                 int64
end_station_id             float64
end_station_latitude       float64
end_station_longitude      float64
end_station_name            object
end_time                    object
rental_access_method        object
start_station_id           float64
start_station_latitude     float64
start_station_longitude    float64
start_station_name          object
start_time                  object
user_type                   object
dtype: object
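The station ID columns are `float64` even though the IDs are whole numbers, and the `describe()` counts below show fewer non-null station IDs than rides. A likely explanation, sketched here with two small hypothetical frames rather than the real monthly files: a missing value in a NumPy-backed integer column is stored as `NaN`, which forces the column to `float64`, and `pd.concat` fills any column that is absent from one file (such as the mostly empty `rental_access_method` in the sample) with `NaN`. The nullable `Int64` dtype at the end is one optional way to keep whole-number IDs alongside missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two monthly files. The second one has a ride with
# no start station and an extra column that the first one lacks.
jan = pd.DataFrame({"bike_id": [5535, 2466], "start_station_id": [400, 243]})
jul = pd.DataFrame({
    "bike_id": [567381, 986],
    "start_station_id": [np.nan, 90],
    "rental_access_method": ["app", np.nan],
})

# Same call pattern as the notebook: stack the rows, renumber the index and
# sort the union of the column names alphabetically.
combined = pd.concat([jan, jul], ignore_index=True, sort=True)
print(combined.dtypes)

# The NaN from the second frame turns the integer IDs into floats.
station_dtype_after_concat = combined["start_station_id"].dtype

# Optional: the nullable Int64 dtype stores whole numbers and missing values together.
combined["start_station_id"] = combined["start_station_id"].astype("Int64")
print(combined)
```

`bike_id` stays `int64` because neither frame has a missing value in it, which matches `bike_id` being the only all-integer ID column in the real `dtypes` output.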


In [31]:
# I prefer the rendered table that df.sample(10) displays over print(df.sample(10))
df.sample(10)

| | bike_id | bike_share_for_all_trip | duration_sec | end_station_id | end_station_latitude | end_station_longitude | end_station_name | end_time | rental_access_method | start_station_id | start_station_latitude | start_station_longitude | start_station_name | start_time | user_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 458374 | 5535 | No | 1834 | 400.0 | 37.804272 | -122.433537 | Buchanan St at North Point St | 2019-03-17 17:36:15.1920 | | 400.0 | 37.804272 | -122.433537 | Buchanan St at North Point St | 2019-03-17 17:05:40.4720 | Customer |
| 664462 | 2466 | Yes | 96 | 239.0 | 37.868813 | -122.258764 | Bancroft Way at Telegraph Ave | 2019-10-22 15:23:52.1830 | | 243.0 | 37.86936 | -122.254337 | Bancroft Way at College Ave | 2019-10-22 15:22:15.9210 | Subscriber |
| 641846 | 986 | No | 270 | 30.0 | 37.776598 | -122.395282 | San Francisco Caltrain (Townsend St at 4th St) | 2019-10-24 20:01:11.1160 | | 90.0 | 37.771058 | -122.402717 | Townsend St at 7th St | 2019-10-24 19:56:40.9310 | Subscriber |
| 1643729 | 3636 | No | 212 | 372.0 | 37.804037 | -122.262409 | Madison St at 17th St | 2019-09-03 13:13:28.9120 | | 7.0 | 37.804562 | -122.271738 | Frank H Ogawa Plaza | 2019-09-03 13:09:56.6000 | Subscriber |
| 2075063 | 3034 | No | 2390 | 15.0 | 37.795392 | -122.394203 | San Francisco Ferry Building (Harry Bridges Pl... | 2019-07-29 09:25:26.1420 | | 115.0 | 37.765026 | -122.398773 | Jackson Playground | 2019-07-29 08:45:35.4570 | Subscriber |
| 198750 | 3601 | No | 294 | 386.0 | 37.752105 | -122.419724 | 24th St at Bartlett St | 2019-02-22 07:34:48.0530 | | 142.0 | 37.745739 | -122.42214 | Guerrero Park | 2019-02-22 07:29:53.1280 | Customer |
| 918174 | 2817 | No | 1428 | 151.0 | 37.836182 | -122.28718 | 53rd St at Hollis St | 2019-04-17 09:00:39.9400 | | 244.0 | 37.873676 | -122.268487 | Shattuck Ave at Hearst Ave | 2019-04-17 08:36:51.9330 | Subscriber |
| 65062 | 10485 | | 1536 | 27.0 | 37.788059 | -122.391865 | Beale St at Harrison St | 2019-12-12 09:24:43.7150 | | 381.0 | 37.758238 | -122.426094 | 20th St at Dolores St | 2019-12-12 08:59:07.1660 | Customer |
| 750462 | 2119 | No | 922 | 460.0 | 37.769095 | -122.386333 | Terry Francois Blvd at Warriors Way | 2019-10-10 18:24:03.8710 | | 50.0 | 37.780526 | -122.390288 | 2nd St at Townsend St | 2019-10-10 18:08:41.8060 | Customer |
| 2308262 | 567381 | | 507 | 241.0 | 37.852477 | -122.270213 | Ashby BART Station | 2019-07-16 07:25:05 | app | | 37.840826 | -122.279685 | | 2019-07-16 07:16:37 | Subscriber |
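The sample shows `start_time` and `end_time` stored as text in two shapes: most rows carry fractional seconds (`2019-03-17 17:36:15.1920`), while the last row has none (`2019-07-16 07:25:05`). A minimal sketch, assuming pandas 2.0 or later (where `format="ISO8601"` is available), parses both shapes in one call and compares the trip length implied by the timestamps with `duration_sec`. The three rows are copied from the sample above; the one-second tolerance is an arbitrary choice for illustration.

```python
import pandas as pd

# Three rows copied from the sample output above (start, end, reported duration).
rides = pd.DataFrame({
    "start_time": ["2019-03-17 17:05:40.4720", "2019-10-22 15:22:15.9210", "2019-07-16 07:16:37"],
    "end_time": ["2019-03-17 17:36:15.1920", "2019-10-22 15:23:52.1830", "2019-07-16 07:25:05"],
    "duration_sec": [1834, 96, 507],
})

# format="ISO8601" (pandas >= 2.0) accepts ISO timestamps with or without
# fractional seconds, so both shapes parse in the same column.
for col in ["start_time", "end_time"]:
    rides[col] = pd.to_datetime(rides[col], format="ISO8601")

# Trip length implied by the timestamps, in seconds.
rides["implied_sec"] = (rides["end_time"] - rides["start_time"]).dt.total_seconds()

# Flag rows where the reported duration and the timestamps disagree by more
# than an (arbitrary) one-second tolerance.
rides["mismatch"] = (rides["implied_sec"] - rides["duration_sec"]).abs() > 1
print(rides[["duration_sec", "implied_sec", "mismatch"]])
```

In these three rows the reported `duration_sec` is within one second of the timestamp difference; running the same check on the full `df` would show whether that holds across the whole year.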


In [32]:
# descriptive statistics for numeric variables
df.describe()

| | bike_id | duration_sec | end_station_id | end_station_latitude | end_station_longitude | start_station_id | start_station_latitude | start_station_longitude |
|---|---|---|---|---|---|---|---|---|
| count | 2506983.0 | 2506983.0 | 2424081.0 | 2506983.0 | 2506983.0 | 2426249.0 | 2506983.0 | 2506983.0 |
| mean | 27898.33 | 807.6483 | 142.7044 | 37.76422 | -122.3459 | 146.5047 | 37.76506 | -122.3499 |
| std | 114606.7 | 1974.714 | 121.4296 | 0.2392885 | 0.7080417 | 122.3171 | 0.1363698 | 0.3089648 |
| min | 4.0 | 60.0 | 3.0 | 0.0 | -122.5143 | 3.0 | 0.0 | -122.5143 |
| 25% | 1952.0 | 359.0 | 43.0 | 37.77003 | -122.4117 | 47.0 | 37.76931 | -122.413 |
| 50% | 4420.0 | 571.0 | 101.0 | 37.78076 | -122.3981 | 105.0 | 37.78053 | -122.3983 |
| 75% | 9682.0 | 887.0 | 239.0 | 37.79587 | -122.2934 | 243.0 | 37.79539 | -122.2914 |
| max | 999941.0 | 912110.0 | 498.0 | 45.51 | 0.0 | 498.0 | 45.51 | 0.0 |
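The summary above contains coordinates that cannot belong to Bay Area stations: a minimum latitude of 0.0, a maximum latitude of 45.51 and a maximum longitude of 0.0. (Durations are also extreme at the top end: a median of 571 s against a maximum of 912,110 s, about 10.6 days.) A minimal sketch for flagging the coordinate outliers, assuming a hypothetical bounding box `BAY_AREA_BOX` chosen by eye rather than taken from Bay Wheels, and a hypothetical helper `outside_bay_area`:

```python
import pandas as pd

# Hypothetical bounding box around the Bay Area, chosen by eye; not an official
# Bay Wheels service area.
BAY_AREA_BOX = {"lat": (37.0, 38.5), "lon": (-123.0, -121.5)}


def outside_bay_area(frame: pd.DataFrame, prefix: str) -> pd.Series:
    """Return True for rows whose <prefix>_station_latitude/longitude fall outside BAY_AREA_BOX."""
    lat = frame[f"{prefix}_station_latitude"]
    lon = frame[f"{prefix}_station_longitude"]
    inside = lat.between(*BAY_AREA_BOX["lat"]) & lon.between(*BAY_AREA_BOX["lon"])
    return ~inside


# Toy rows: one station copied from the sample, plus two made-up pairs built
# from the extreme values that describe() reports (0.0 and 45.51).
demo = pd.DataFrame({
    "start_station_latitude": [37.804272, 0.0, 45.51],
    "start_station_longitude": [-122.433537, 0.0, -122.433537],
})
print(outside_bay_area(demo, "start"))
```

On the full data, `df[outside_bay_area(df, "start") | outside_bay_area(df, "end")]` would list the affected rides for inspection before deciding whether to drop them.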


**What is the structure of your dataset?**

**What is/are the main feature(s) of interest in your dataset?**

**What features in the dataset do you think will help support your investigation into your feature(s) of interest?**

### Univariate Exploration

**Discuss the distribution(s) of your variable(s) of interest.  Were there any unusual points?  Did you need to perform any transformations?**

**Of the features you investigated, were there any unusual distributions?  Did you perform any operations on the data to tidy, adjust, or change the form of the data?  If so, why did you do this?**
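One possible starting point for these questions: the summary statistics above show `duration_sec` running from 60 s to 912,110 s with a median of 571 s, so a histogram with linear bins would crowd almost every ride into the first bin. A common transformation is logarithmically spaced bins, which is well defined here because the minimum duration is positive. A minimal sketch, assuming a hypothetical helper `log_bins` and an arbitrary default of 50 bins; the toy durations are copied from the sample output:

```python
import numpy as np
import pandas as pd


def log_bins(values: pd.Series, n_bins: int = 50) -> np.ndarray:
    """Return n_bins + 1 bin edges spaced evenly in log10 from values.min() to values.max()."""
    edges = np.logspace(np.log10(values.min()), np.log10(values.max()), n_bins + 1)
    # Pin the outer edges to the exact data range so floating-point rounding
    # in logspace cannot push the largest value out of the last bin.
    edges[0], edges[-1] = values.min(), values.max()
    return edges


# Toy durations copied from the sample output; on the real data use df["duration_sec"].
durations = pd.Series([1834, 96, 270, 212, 2390, 294, 1428, 1536, 922, 507])
edges = log_bins(durations, n_bins=5)
counts, _ = np.histogram(durations, bins=edges)
print(np.round(edges, 1))
print(counts)
```

With matplotlib, the same edges can be passed as `plt.hist(df["duration_sec"], bins=log_bins(df["duration_sec"]))` followed by `plt.xscale("log")`, so that the bars appear equally wide on the logarithmic axis.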

### Bivariate Exploration

**Talk about some of the relationships you observed in this part of the investigation.  How did the feature(s) of interest vary with other features in the dataset?**

**Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?**

### Multivariate Exploration

**Talk about some of the relationships you observed in this part of the investigation.  Were there features that strengthened each other in terms of looking at your feature(s) of interest?**

**Were there any interesting or surprising interactions between features?**