# COGS 108 - Final Project 

# Overview

We analyzed 11 quarters worth of Uber movement data from 2016-2018, where each quarter corresponds to a season (Winter, Spring, Summer, Fall). We did this by identifying shared routes across these 11 separate datasets and then combining information from different years for each quarter. We were then able to compare one quarter’s travel times to another. Finally, we performed a t-test to determine if the mean travel times between quarters were statistically different from one another and other data analysis and visualization techniques to identify any potential patterns between travel time and season.


# Names

- Brendan Wong
- Pooja Yadav
- Kaila Lee
- Rajandeep Kaur
- Zoey Chesny

# Group Members IDs

- A15749312
- A13997099
- A12792644
- A13736425
- A13303136

# Research Question

Do various quarters/seasons of the year affect Uber travel times during peak commute hours in San Francisco?



## Background and Prior Work

Emissions from vehicles have been adding to the everlasting modern-day problems of air pollution and traffic. The introduction of electric vehicles along with a greater emphasis placed on walking, biking, and/or using micro-mobility resources for transportation has reduced the impact to the growing gas emission problem as well as traffic congestion. However, there is still a reliance on vehicles to get people to their destinations, which is what rideshare services alleviate. These programs allow for multiple people to join a ride going in a similar destination, serving as a carpool, which saves gas, time, and money. 

Many people have contemplated either buying a car or just continuing to use rideshare options, especially when Uber and Lyft constantly promote their services with discount codes. New vehicles cost tens of thousand dollars, and research shows that cars are not even used 95% of the time(Barter). On average, a car is in usage for 6 out of the 168 hours of the week. This number is so small and on top of that the cost of owning a car continues to increase because of the maintenance, insurance, and the possibility of crashes. 

Rideshare programs such as Uber and Lyft are almost ubiquitous in our modern day world. Studies have shown that rideshare services have increased 37% from 1.9 billion to 2.61 billion people from 2016 to 2017. Both Uber and Lyft claim that one of their driving principles revolves around reducing traffic congestion through minimizing car ownership and usage. In urban cities such as San Francisco, Los Angeles and New York, analyzing peak commute times would prove to be a sound indicator of whether these companies are alleviating the flow of traffic. Currently, there are few projects that examine the direct relationship between rideshare usage and traffic during commute times. 

According to the automobility report, studies have shown that Lyft and Uber are actually creating more traffic and congestion instead of reducing it. For example, the report noted that Uber and Lyft added 5.7 billion miles of driving the most populated cities, contributing to around a 160% increase in driving in urban cities.

To investigate, we would analyze accumulated data about traffic patterns during peak commute times in densely populated cities since the onset of rideshare popularity, in particular, San Francisco. We would also look at specific usage trends with regards to rideshare program data to draw correlations between mean commute time and season.


References (include links):
- 1) https://www.ucsusa.org/clean-vehicles/electric-vehicles/CA-air-quality-equity
- 2) https://www.reinventingparking.org/2013/02/cars-are-parked-95-of-time-lets-check.html

# Hypothesis


We hypothesize that the different seasons will impact Uber travel times. Specifically, we predict that driving travel times will increase during the winter months (December - February) due to the cold weather conditions. In contrast, we predict that Uber travel times during peak commute times will decrease during the summer and autumn months (June - November) as a result of warmer weather. 


# Dataset(s)

## Uber Movement Data

Most of our data was pulled from Uber's Movement Dataset. Each row is the aggregated mean and standard deviation of travel time and geometric travel time over the course of each quarter in the fiscal year. At the start of the project, quarter 1 of 2016 through quarter 3 of 2018 were available. 

### We used 11 different Uber Movement Datasets 
#### (one per quarter for q1-2016 to q3-2018)

- Dataset Name: Uber Movement Data
- Link to the dataset: https://movement.uber.com/explore/san_francisco/travel-times/query?lang=en-US&si=1277&ti=&ag=censustracts&dt[tpb]=ALL_DAY&dt[wd;]=1,2,3,4,5,6,7&dt[dr][sd]=2018-12-01&dt[dr][ed]=2018-12-31&cd=&sa;=&sdn=&lat.=37.7749295&lng.=-122.4547777&z.=12
- Number of observations:

|Year|Quarter|Number of Observations|
|---|---|---|
|2016|1|5684666|
|2016|2|6598363|
|2016|3|7428235|
|2016|4|7735670|
|2017|1|7590838|
|2017|2|7983524|
|2017|3|8410747|
|2017|4|8789139|
|2018|1|8941177|
|2018|2|9226295|
|2018|3|9613339|


- Features/variables Present: sourceid (source id), dstid (destination id), hod (hour of day), mean_travel_time (mean travel time), standard_deviation_travel_time (standard deviation travel time), geometric_mean_travel_time, geometric_standard_deviation_travel_time
- Features/variables we will use: sourceid (source id), dstid (destination id), hod (hour of day), mean_travel_time (mean travel time)


Uber Movement was an initiative spearheaded by Uber to ensure that their data is accessible and useful for cities to inform the future of urban mobility. This dataset, provided by Uber, describes the amount of traffic and movement between various cities around the world. We have access to data about Uber usage in our target cities including San Francisco, Los Angeles and New York. The variables that we can manipulate include the origin and destination locations for the trip in addition to the date-time range, which correspond directly to the average travel time, displayed in minutes. This tool allows us to filter by specific weekdays or weekends and time blocks during the day, which allows us to specifically look at Uber usage during peak commute times. The number of observations varies based on the origin and destination locations. 

For example, we selected a highly congested route in San Francisco during the weekdays during peak morning commute time from 7am-10am. We were able to download a CSV file to observe data about this specific route’s traffic details, which resulted in 1,164 trip results. The average travel time for this route was 41 min and 50 sec in comparison to the same route during non-peak hours including the afternoon and weekends, where the average travel time was 30 min and 51 sec. 

We can also view the mean and range of travel times that correspond to our designated route over the span of a week or other various periods of time.  The dataset provides bar charts that we can use to visualize the number of Uber drivers and compare it to our other datasets such as DataSF. Additionally, this dataset aligns with our ethical considerations as all data is completely anonymized and aggregated to maintain driver and ride privacy. 

## How we combine 11 datasets 

We will peform our analysis only on routes that are present in all 11 datasets so that we can cleanly compare across quarters, ensuring the route is present for each comparison. We will also limit our analysis data to 100 routes because working with 11 datasets that each have millions of entries will be very computationall expensive. 

# Setup

The cell below includes the packages used for our analysis

In [200]:
# Imports

#Display plots directly in notebook
%matplotlib inline

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn import linear_model
import patsy
import statsmodels.api as sm

# Data Cleaning

Most of our data was pulled from [Uber's Movement Dataset](https://movement.uber.com/?lang=en-US). Each row is the aggregated mean and standard deviation of travel time and geometric travel time over the course of each quarter in the fiscal year. At the start of the project, quarter 1 of 2016 through quarter 3 of 2018 were available. Sources and destinations are determined by 'sourceid' and 'dstid'. 

- 1) *Oberve the data:* check for missing values, look at shape of data, look at variables and observations
- 2) *Filter all 11 datasets:* We decided to choose the inner join of sources and destinations with over 10,000 rows as a source, and 5,000 as a destination. Furthermore we'd be tracking commute times so we'd be further filtering from rush hour in the Bay Area, all the time in San Francisco, but typically 7am - 10am and 3pm - 7pm.
- 3) *Observe filtered data:* Look at filtered data and decide how to clean it up. This involves reading in a filtered csv file to a dataframe and dropping unnecessary columns
- 4) *Find route method:* Create a method to find each route in a filtered dataframe: There are still multiple row entries for one route (a particular source and destination). This is because there are multiple hours of days corresponding to the one route. Here I take the average across these hours of the day and put them in one entry for that route. 
- 5) *Test find_route method*: I test the above method on the single df from step (3) to make sure the output is as expected
- 6) *Run find_route method on all filtered datasets*: save the new dataframes to a csv file so this code does not have to be run multiple times as it is very time consuming. 
- 7) *Create route_matching method*: This method find 100 routes that have the same source and destination across all 11 datasets and creates a new dataframe with route_name and corresponding mean travel time for each of the 4 quarters.

In [203]:
# STEP 1: Observe the data

# look at just 1 dataset to get a feel for the data and check for missing values

# 2018 quarter 3 weekdays
# will need to download separately and store in working dir
q3_2018_location = 'san_francisco-censustracts-2018-3-OnlyWeekdays-HourlyAggregate.csv'

uber_df = pd.read_csv(q3_2018_location)
uber_df.shape
# set contains almost 10 million rows of data

# check for missing values 
missing = uber_df.isnull().sum()
print("Missing Values:")
print(missing)

# sort by hours of the day 
uber_df.sort_values('hod').head()

Missing Values:
sourceid                                    0
dstid                                       0
hod                                         0
mean_travel_time                            0
standard_deviation_travel_time              0
geometric_mean_travel_time                  0
geometric_standard_deviation_travel_time    0
dtype: int64


Unnamed: 0,sourceid,dstid,hod,mean_travel_time,standard_deviation_travel_time,geometric_mean_travel_time,geometric_standard_deviation_travel_time
4806669,1498,1069,0,387.64,165.81,363.88,1.39
2732148,1742,1743,0,594.0,216.39,532.56,1.83
2732147,1743,1733,0,1319.04,326.51,1284.17,1.25
2732146,1772,1443,0,760.37,251.59,723.62,1.36
5281306,629,96,0,1102.59,329.58,1061.21,1.31


In [4]:
# STEP 2: Filter the Data

# filtered the data from files on my computer. I saved these files to a separate folder
# so we would not need to keep running this time consuming data cleanup 

start = time.time()

for root, dirs, files in os.walk("/Users/brendanwong/Desktop/DATACLEANUP"):
    for file in files:
        if file.endswith(".csv"):
            path = root +"/"+ file
            df = pd.read_csv(path)
            print(file + " shape: " + str(df.shape))
            
            df.sort_values('hod')
            df['sourceid'].value_counts();
            sources = df['sourceid'].value_counts() < 6000;
            sources = sources.reset_index()
            
            for index, items in sources.iterrows():
                if not items['sourceid']:
                    sources.drop(index, inplace=True)
            
            print("sources shape: " + str(sources.shape))
            
            for index, items in sources.iterrows():
                df = df[df.sourceid != items['index']]
                
            print(file + " shape: " + str(df.shape))
            
            df['dstid'].value_counts()
            destinations = df['dstid'].value_counts() < 3500
            destinations = destinations.reset_index()
            
            for index, items in destinations.iterrows():
                if not items['dstid']:
                    destinations.drop(index, inplace=True)
            
            print("destinations shape: " + str(destinations.shape))
            
            for index, items in destinations.iterrows():
                df = df[df.dstid != items['index']]
            
            print(file + " shape: " + str(df.shape))
            
            hours = [0,1,2,3,4,5,6,11,12,13,14,20,21,22,23,24]
            
            for hour in hours:
                df = df[df.hod != hour]
                
            print(file + " shape: " + str(df.shape))
            
            df.to_csv("filtered_" + file)
            print("FILTERED SHAPE: " + str(df.shape))
            print("\n\n")
end = time.time()


print("time:\n\n\n\n")

print(end-start)

(436433, 7)

In [58]:
# STEP 3 a: Observe 1 filtered datafile and decide how to clean it up

# read in filtered data into the dataframe for quarter 1, 2016
df1_loc = 'filtered_san_francisco-censustracts-2016-1-OnlyWeekdays-HourlyAggregate.csv'
df1 = pd.read_csv(df1_loc)
df1.head()

Unnamed: 0.1,Unnamed: 0,sourceid,dstid,hod,mean_travel_time,standard_deviation_travel_time,geometric_mean_travel_time,geometric_standard_deviation_travel_time
0,7,9,20,10,94.68,144.07,63.02,2.48
1,27,9,79,8,222.6,151.05,191.21,1.71
2,28,9,80,19,450.27,195.81,423.5,1.39
3,29,9,81,9,1424.28,366.28,1379.41,1.29
4,33,9,98,7,1191.91,302.11,1159.9,1.25


In [59]:
# STEP 3 b: drop unnecessary columns 
df1.drop(columns=['Unnamed: 0', 'hod', 'standard_deviation_travel_time', 'geometric_mean_travel_time', 'geometric_standard_deviation_travel_time'], inplace=True)
df1.head()

Unnamed: 0,sourceid,dstid,mean_travel_time
0,9,20,94.68
1,9,79,222.6
2,9,80,450.27
3,9,81,1424.28
4,9,98,1191.91


In [60]:
# STEP 4: method to find each route 

def find_route(df, year):
    # route_name : mean_travel_time (across all hours)
    route_data = {}
    for index, row in df.iterrows():
        route_name = str(row['sourceid']) + '-' + str(row['dstid'])
        if route_name not in route_data:
            to_add = (row['mean_travel_time'], 1)
            route_data[route_name] = to_add
        else: 
            count = route_data[route_name][1] + 1
            curr_sum = route_data[route_name][0] + row['mean_travel_time']
            to_add = (curr_sum, count)
            route_data[route_name] = to_add
    # now find the mean time for each route 
    route_names = [] 
    route_times = [] 
    years = []
    for route in route_data: 
        route_names.append(route)
        route_info = route_data[route]
        time = (route_info[0] / route_info[1])
        route_times.append(time)
        years.append(year)
    new_df = pd.DataFrame(list(zip(route_names, route_times)), columns=['route_name', 'mean_travel_time'])
    new_df['year'] = years
    new_df.set_index('route_name')
    return new_df 
    

In [61]:
# STEP 5: Test find_route method 

# this is the size before condensing into routes 
print('original size')
orig_size = df1.size
print(df1.size)

df1 = find_route(df1, 2016)

# this is the size of the new dataframe with routes condensed 
print('new size')
print(df1.size)
new_size = df1.size
rows_eliminated = orig_size - new_size

print('I eliminated this many rows: ' + str(rows_eliminated))
df1.head()


original size
1384290
new size
171615
I eliminated this many rows:1212675


Unnamed: 0,route_name,mean_travel_time,year
0,9.0-20.0,99.581111,2016
1,9.0-79.0,195.021111,2016
2,9.0-80.0,444.638889,2016
3,9.0-81.0,1473.006667,2016
4,9.0-98.0,1761.753333,2016
5,9.0-113.0,725.247778,2016
6,9.0-155.0,3621.33,2016
7,9.0-165.0,406.264444,2016
8,20.0-80.0,497.136667,2016
9,20.0-81.0,1449.917778,2016


In [174]:
# STEP 6: Run find_route method on all filtered dataset files 

import os
for root, dirs, files in os.walk("/Users/zoeychesny/Desktop/filtered_data"):
    cleaned_dfs = [] 
    for file in files:
        if file.endswith(".csv"):
            path = root + '/' + file
            df = pd.read_csv(path)

            print(file + " \noriginal shape: " + str(df.shape))

            #drop unnecessary columns 
            df.drop(columns=['hod', 'standard_deviation_travel_time', 'geometric_mean_travel_time', 'geometric_standard_deviation_travel_time'], inplace=True)
            
            
            # this is the size before condensing into routes 
            orig_size = df.size
            year = int(file[36:40])
            quarter = int(file[41])

            df = find_route(df, 2016)

            # this is the size of the new dataframe with routes condensed 
            print('new shape')
            print(df.shape)
            new_size = df.size
            rows_eliminated = orig_size - new_size

            print('I eliminated this many rows:' + str(rows_eliminated))
            
            df.to_csv("df_q" + str(quarter) + "_y"+ str(year) + ".csv")

            cleaned_dfs.append(df)

filtered_san_francisco-censustracts-2016-4-OnlyWeekdays-HourlyAggregate.csv 
original shape: (1258589, 8)
new shape
(163915, 3)
I eliminated this many rows:4542611



filtered_san_francisco-censustracts-2017-3-OnlyWeekdays-HourlyAggregate.csv 
original shape: (1525188, 8)
new shape
(199053, 3)
I eliminated this many rows:5503593



filtered_san_francisco-censustracts-2018-2-OnlyWeekdays-HourlyAggregate.csv 
original shape: (1950091, 8)
new shape
(256398, 3)
I eliminated this many rows:7031170



filtered_san_francisco-censustracts-2017-4-OnlyWeekdays-HourlyAggregate.csv 
original shape: (1740170, 8)
new shape
(228039, 3)
I eliminated this many rows:6276563



filtered_san_francisco-censustracts-2018-1-OnlyWeekdays-HourlyAggregate.csv 
original shape: (1847611, 8)
new shape
(240812, 3)
I eliminated this many rows:6668008



filtered_san_francisco-censustracts-2016-3-OnlyWeekdays-HourlyAggregate.csv 
original shape: (1154026, 8)
new shape
(149483, 3)
I eliminated this many rows:4167655



In [194]:
# STEP 7: create route_matching method 
find routes that are the same across all 11 quarters 

def route_matching(list_of_dfs):
    base = list_of_dfs[0]
    q1 = [0, 4, 8]
    q2 = [1, 5, 9]
    q3 = [2, 6, 10]
    q4 = [3, 7]
    all_routes = [] 
    q1_time = []
    q2_time = [] 
    q3_time = [] 
    q4_time = [] 
    
    quarters = [q1, q2, q3, q4]
    q_time = [q1_time, q2_time, q3_time, q4_time]
    
    route_count = 0 
    
    for index, row in base.iterrows():
        r_name = row['route_name']
           # now check all the other dfs for this route
        if is_route_in_all(r_name, list_of_dfs) == True:
            # find 100 routes across the datasets 
            if route_count > 100:
                break
            all_routes.append(r_name)
            route_count = route_count + 1
            q_num = 0 

                # iterate for each quarter 
            for i in range(4):
                time_sum = 0 
                for j in quarters[i]:
                    time_sum = time_sum + extract_route_time(list_of_dfs[j], r_name)
                time_mean = time_sum / len(quarters[i])
                q_time[i].append(time_mean)

        new_df = pd.DataFrame(columns=['route_name', 'Q1_mean_travel_time', 'Q2_mean_travel_time', 'Q3_mean_travel_time', 'Q4_mean_travel_time'])
        new_df['route_name'] = all_routes
        new_df['Q1_mean_travel_time'] = q1_time
        new_df['Q2_mean_travel_time'] = q2_time
        new_df['Q3_mean_travel_time'] = q3_time
        new_df['Q4_mean_travel_time'] = q4_time
    return new_df
    

def is_route_in_all(route, list_of_dfs):
    for df in list_of_dfs: 
        check = list(df['route_name'])
        if route not in check:
            return False
    return True

# route_time
def extract_route_time(df, route_name):
    row = df.loc[df['route_name'] == route_name]
    time_list = list(row['mean_travel_time'])
    return float(time_list[0])



## This is how the files are read so the above cleaning steps do not need to be run mutliple times 

In [204]:
# new dataframes from filtered data 

new_dfs = [] 

import os
for root, dirs, files in os.walk("/Users/zoeychesny/Desktop/COGS-108-Final-Project"):
    cleaned_dfs = [] 
    for file in files:
        if file.endswith(".csv"):
            if file[0:2] == 'df':
                path = root +"/"+ file
                df = pd.read_csv(path)
                df.drop(columns='Unnamed: 0', inplace=True)
                new_dfs.append(df)
                
is_route_in_all('9.0-20.0', new_dfs)




True

In [195]:
allquarter_df = route_matching(new_dfs)
print(allquarter_df.shape)
allquarter_df.head()

(101, 5)


Unnamed: 0,route_name,Q1_mean_travel_time,Q2_mean_travel_time,Q3_mean_travel_time,Q4_mean_travel_time
0,2363.0-2525.0,1229.138519,1221.216296,1259.281481,1252.651111
1,315.0-433.0,995.164444,1063.062963,1103.77963,1019.523333
2,314.0-443.0,1602.477407,1568.98037,1588.975926,1618.81
3,1733.0-1361.0,651.382593,736.626667,721.991481,667.976667
4,1743.0-1261.0,2392.205185,2555.788519,2500.797037,2447.785556


In [197]:
allquarter_df.to_csv('allquarter.csv')
allquarter_df.shape


(101, 5)

In [198]:
# read files 
new_dfs = [] 

import os
for root, dirs, files in os.walk("/Users/zoeychesny/Desktop/COGS-108-Final-Project"):
    cleaned_dfs = [] 
    for file in files:
        if file.endswith(".csv"):
            if file[0:2] == 'df':
                path = root +"/"+ file
                df = pd.read_csv(path)
                df.drop(columns='Unnamed: 0', inplace=True)
                new_dfs.append(df)
                
allquarter_df = pd.read_csv('allquarter.csv')

# Data Analysis & Results

Include cells that describe the steps in your data analysis.

Make a function that changes the quarter names to numeric values from 1-11, 1 being the earliest quarter and 11 being the latest, so they can be used for multiple linear regression.

In [None]:
def standardize_quarter(quarter):

    quarter = quarter.lower()
    
    quarter = quarter.strip()
    
    quarter = quarter.replace('q', '')
    quarter = quarter.replace('116', '1')
    quarter = quarter.replace('216', '2')
    quarter = quarter.replace('316', '3')
    quarter = quarter.replace('416', '4')
    quarter = quarter.replace('117', '5')
    quarter = quarter.replace('217', '6')
    quarter = quarter.replace('317', '7')
    quarter = quarter.replace('417', '8')
    quarter = quarter.replace('118', '9')
    quarter = quarter.replace('218', '10')
    quarter = quarter.replace('318', '11')
    
    quarter = quarter.strip()
    
    return int(quarter)

Apply function to data for all 4 routes.

In [None]:
for df_route in dfs_route_1:
    df_route['quarter'] = df_route['quarter'].apply(standardize_quarter)
for df_route in dfs_route_2:
    df_route['quarter'] = df_route['quarter'].apply(standardize_quarter)
for df_route in dfs_route_3:
    df_route['quarter'] = df_route['quarter'].apply(standardize_quarter)
for df_route in dfs_route_4:
    df_route['quarter'] = df_route['quarter'].apply(standardize_quarter)

Use linear models to check if there is a significant difference in mean drive time during each quarter over each route.

In [None]:
for df_route in dfs_route_1:
    outcome_route1, predictors_route1 = patsy.dmatrices('quarter ~ mean_travel_time', df_route)
    mod_route1 = sm.OLS(outcome_route1, predictors_route1)
    res_route1 = mod_route1.fit()

    p_value1 = res_route1.pvalues[1]
    
    if p_value1 < 0.01:
        print('There is a significant difference in mean drive time over route 1 during quarter', df_route['quarter'].iloc[0])

In [None]:
for df_route in dfs_route_2:
    outcome_route2, predictors_route2 = patsy.dmatrices('quarter ~ mean_travel_time', df_route)
    mod_route2 = sm.OLS(outcome_route2, predictors_route2)
    res_route2 = mod_route2.fit()

    p_value2 = res_route2.pvalues[1]
    
    if p_value2 < 0.01:
        print('There is a significant difference in mean drive time over route 2 during quarter', df_route['quarter'].iloc[0])

In [None]:
for df_route in dfs_route_3:
    outcome_route3, predictors_route3 = patsy.dmatrices('quarter ~ mean_travel_time', df_route)
    mod_route3 = sm.OLS(outcome_route3, predictors_route3)
    res_route3 = mod_route3.fit()

    p_value3 = res_route3.pvalues[1]
    
    if p_value3 < 0.01:
        print('There is a significant difference in mean drive time over route 3 during quarter', df_route['quarter'].iloc[0])

In [None]:
for df_route in dfs_route_4:
    outcome_route4, predictors_route4 = patsy.dmatrices('quarter ~ mean_travel_time', df_route)
    mod_route4 = sm.OLS(outcome_route4, predictors_route4)
    res_route4 = mod_route4.fit()

    p_value4 = res_route4.pvalues[1]
    
    if p_value4 < 0.01:
        print('There is a significant difference in mean drive time over route 4 during quarter', df_route['quarter'].iloc[0])

# Ethics & Privacy

## Privacy

Privacy can be guaranteed from both Uber and our analysis. User data from the dataset itself is anonymized and aggregated to ensure no personally identifiable information or user behavior can be surfaced. The dataset  only contains the average time from source to destination with no individual user data. The safe harbor method was already implemented in the data we were given, so we did not have to take further measures to anonymize the data. The geographic locations were encoded in numbered indices to ensure the specific geographical location of each route was kept private. There were no unique identifiers in the given dataset. 

## Ethics 

The goal of our project was not to make any general claims about how traffic is influenced by the season, but rather to apply methods and concepts we learned in class to real-life data as a data science investigation. We do not claim to know the travel time and traffic across the world or even in the US, but instead specifically analyzed the Uber travel patterns in San Francisco across the changing seasons from 2016-2018. The results of our data are not inherently harmful to anyone, but they could potentially impact the way Uber runs their company. While riders may want to get to their destinations as fast as possible, it still needs to be recognized that Uber cannot control traffic. No one wants increased travel times, so this could cause Uber to look into ways to improve their service especially in congested areas during peak commute times. However, it is important to realize that the results of our analysis are not intended to suggest anything about Uber as a company, but simply to analyze how the Uber travel times changes over the quarters. Our data is biased because it only accounts for Uber users, but this is acceptable in our analysis since we do not use our conclusions to make general claims about all people commuting in the San Francisco area. 

# Conclusion & Discussion

*Fill in your discussion information here*