# Divvy Bike Share Project Remake
This is a remake of my Divvy Bike Share project. I completed the original a year ago using R for my Google Data Analytics Certificate. I've learned a lot since then and wanted to see if I could improve on my prior work while demonstrating my ability to analyze data and build visualizations. I'll be using Python for this project because I love pandas 🐼

Here is a link to my original project using R: [https://rpubs.com/Peachtaco/bike_share_analysis](https://rpubs.com/Peachtaco/bike_share_analysis)

## Background
Before diving into the project, here is some background on the Divvy Bike Share program. Divvy is a bike-sharing program in Chicago that lets users rent bikes from stations around the city. The program is owned by the Chicago Department of Transportation, is operated by Lyft, and has been around since 2013. It has over 600 stations and 6,000 bikes, with the option to choose an electric bike or a classic bike. Divvy offers a variety of passes, including single ride, day pass, and annual membership. Customers who purchase a single ride or day pass are considered casual riders, and customers who purchase an annual membership are considered members. My task is to analyze the data and provide insight into how casual riders and members use the program differently.

## Prepare Data
Divvy trip data is public and available on their [website](https://divvy-tripdata.s3.amazonaws.com/index.html). I downloaded 12 datasets, one for each month from May 2022 to April 2023, read each CSV file into a dataframe, and concatenated them into a single dataframe. The resulting dataframe contained 5,859,061 rows and 13 columns.

A review of the columns showed that `ride_id` is a unique identifier for each ride and `rideable_type` is the type of bike used for the ride. Two columns hold datetime data for the start and end time of each ride. Eight columns hold location data: the start/end station name and ID, with the corresponding start/end latitude and longitude. `member_casual` indicates whether the rider is casual or a member.
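
The column descriptions above can be checked directly rather than read off by eye. The following is a minimal sketch on a toy dataframe; the rows and the name `trips` are made up for illustration and are not taken from the Divvy files.

```python
import pandas as pd

# Toy stand-in for the combined trip dataframe (values are made up)
trips = pd.DataFrame({
    "ride_id": ["A1", "B2", "C3"],
    "rideable_type": ["classic_bike", "electric_bike", "classic_bike"],
    "member_casual": ["member", "casual", "member"],
})

# True when no ride_id value repeats
ids_unique = trips["ride_id"].is_unique

# Distinct rider categories found in the column
rider_types = sorted(trips["member_casual"].unique())

print(ids_unique, rider_types)
```

On the real data, the same two checks would confirm (or contradict) that `ride_id` is unique and that `member_casual` only takes the two expected values.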

In [5]:
df.rideable_type.value_counts()

rideable_type
classic_bike     490758
electric_bike    476824
docked_bike       38525
Name: count, dtype: int64

In [2]:
# Importing libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import glob

In [8]:
df.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
Index: 5859061 entries, 0 to 426589
Data columns (total 13 columns):
 #   Column              Non-Null Count    Dtype  
---  ------              --------------    -----  
 0   ride_id             5859061 non-null  object 
 1   rideable_type       5859061 non-null  object 
 2   started_at          5859061 non-null  object 
 3   ended_at            5859061 non-null  object 
 4   start_station_name  5027052 non-null  object 
 5   start_station_id    5026920 non-null  object 
 6   end_station_name    4969400 non-null  object 
 7   end_station_id      4969259 non-null  object 
 8   start_lat           5859061 non-null  float64
 9   start_lng           5859061 non-null  float64
 10  end_lat             5853088 non-null  float64
 11  end_lng             5853088 non-null  float64
 12  member_casual       5859061 non-null  object 
dtypes: float64(4), object(9)
memory usage: 625.8+ MB


In [10]:
df.isnull().sum()

ride_id                    0
rideable_type              0
started_at                 0
ended_at                   0
start_station_name    832009
start_station_id      832141
end_station_name      889661
end_station_id        889802
start_lat                  0
start_lng                  0
end_lat                 5973
end_lng                 5973
member_casual              0
dtype: int64
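
Raw null counts are easier to judge as a share of all rows; for example, the `end_station_name` count above works out to roughly 15% of the 5,859,061 rides. Below is a small sketch of that calculation on a toy dataframe (the rows and the name `trips` are invented for illustration).

```python
import numpy as np
import pandas as pd

# Toy dataframe with a few missing values (made-up rows)
trips = pd.DataFrame({
    "ride_id": ["A1", "B2", "C3", "D4"],
    "end_station_name": ["Some St", None, None, "Other St"],
    "end_lat": [41.9, 41.8, np.nan, 41.7],
})

# isnull() gives booleans; their mean is the fraction missing per column
missing_pct = trips.isnull().mean().mul(100).round(1)

print(missing_pct)
```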

In [20]:
df[df['end_lat'].astype(str).str.len() < 7]

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
9,75F6A50A05E0AA18,electric_bike,2022-05-11 07:29:29,2022-05-11 07:30:57,Southport Ave & Waveland Ave,13235,N Southport Ave & W Newport Ave,20257.0,41.948067,-87.664012,41.94,-87.66,member
27,E90446F53B9301F7,electric_bike,2022-05-17 15:06:12,2022-05-17 15:10:57,Greenwood Ave & 47th St,TA1308000002,Prairie Ave & 47th St - midblock,814,41.809770,-87.599212,41.81,-87.62,member
66,92ACD45CFCD9364E,electric_bike,2022-05-04 08:03:15,2022-05-04 08:05:44,Lawler Ave & 50th St,400,Lawler Ave & 50th St,400,41.800000,-87.750000,41.80,-87.75,member
88,E00F0FB47A56B11C,electric_bike,2022-05-08 00:25:15,2022-05-08 00:25:25,Campbell Ave & Irving Park Rd,439,Campbell Ave & Irving Park Rd,439,41.950000,-87.690000,41.95,-87.69,casual
97,3B2E1C589DC7C89F,electric_bike,2022-05-26 08:18:05,2022-05-26 08:22:12,Desplaines St & Randolph St,15535,,,41.884453,-87.644598,41.89,-87.65,member
...,...,...,...,...,...,...,...,...,...,...,...,...,...
426531,DEBAA22493DB4C15,electric_bike,2023-04-11 17:32:15,2023-04-11 17:37:42,Richmond St & Diversey Ave,15645,Milwaukee Ave & Fullerton Ave,428,41.931986,-87.701222,41.92,-87.70,member
426550,BA983D5E50EFCF67,electric_bike,2023-04-26 07:22:12,2023-04-26 07:25:53,Richmond St & Diversey Ave,15645,Milwaukee Ave & Fullerton Ave,428,41.931900,-87.701285,41.92,-87.70,member
426556,1C78E5F5AA3CDC4E,electric_bike,2023-04-06 18:22:59,2023-04-06 18:28:43,Richmond St & Diversey Ave,15645,Milwaukee Ave & Fullerton Ave,428,41.931979,-87.701276,41.92,-87.70,member
426562,C74EC0CD8B9F2EDF,electric_bike,2023-04-30 00:27:59,2023-04-30 00:34:13,Central Ave & Lake St,16905,Menard Ave & Division St,305,41.887660,-87.765470,41.90,-87.77,casual
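
The string-length filter above depends on how floats happen to print, so it also catches values with three decimals (six characters). A numeric check is one possible alternative. This is only a sketch: the column `low_precision` and the two-decimal threshold are my assumptions, and the values are a small hand-picked sample.

```python
import numpy as np
import pandas as pd

# Small sample of end latitudes, including one missing value
coords = pd.DataFrame({"end_lat": [41.94, 41.948067, 41.8, np.nan]})

# A value is treated as low precision when rounding to 2 decimals leaves it unchanged
rounded = coords["end_lat"].round(2)
coords["low_precision"] = np.isclose(coords["end_lat"], rounded, rtol=0, atol=1e-9)

print(coords)
```

Missing coordinates come out as `False` here because `np.isclose` does not treat NaN as equal to NaN by default.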


In [14]:
df[df['end_station_name'] == 'Elizabeth St & Randolph St']

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
104963,573B3ED64C054907,electric_bike,2023-02-21 17:40:25,2023-02-21 17:57:11,Wells St & Concord Ln,TA1308000050,Elizabeth St & Randolph St,,41.912541,-87.635266,41.88,-87.66,member
104972,629697843FA7D39B,electric_bike,2023-02-28 13:49:29,2023-02-28 13:51:56,Morgan St & Lake St*,chargingstx4,Elizabeth St & Randolph St,,41.885493,-87.652319,41.88,-87.66,member
104980,6CD0CA16336B1692,classic_bike,2023-02-12 11:13:49,2023-02-12 11:26:24,Kingsbury St & Erie St,13265,Elizabeth St & Randolph St,,41.893808,-87.641697,41.88,-87.66,member
104987,7257EFB8821BCFB0,classic_bike,2023-02-10 19:28:50,2023-02-10 19:35:56,Sangamon St & Washington Blvd,13409,Elizabeth St & Randolph St,,41.883165,-87.651100,41.88,-87.66,member
104995,9E92733AA4941415,electric_bike,2023-02-11 12:48:06,2023-02-11 13:04:05,Wabash Ave & Grand Ave,TA1307000117,Elizabeth St & Randolph St,,41.891219,-87.626759,41.88,-87.66,member
...,...,...,...,...,...,...,...,...,...,...,...,...,...
267136,0155D474892754EE,classic_bike,2023-04-15 16:33:24,2023-04-15 16:51:04,Michigan Ave & Lake St,TA1305000011,Elizabeth St & Randolph St,23001,41.886022,-87.624398,41.88,-87.66,casual
267138,3654809DE55A7C6C,classic_bike,2023-04-01 18:05:36,2023-04-01 18:09:35,Ogden Ave & Race Ave,13194,Elizabeth St & Randolph St,23001,41.891795,-87.658751,41.88,-87.66,member
267174,16BE0F995252670D,classic_bike,2023-04-22 07:00:24,2023-04-22 07:16:21,Halsted St & 18th St,13099,Elizabeth St & Randolph St,23001,41.857506,-87.645991,41.88,-87.66,member
267176,3F9DE0540EF67244,classic_bike,2023-04-30 08:30:56,2023-04-30 08:45:21,Halsted St & 18th St,13099,Elizabeth St & Randolph St,23001,41.857506,-87.645991,41.88,-87.66,member


In [12]:
df2 = df[(df['end_station_id'].isnull()) & (~df['end_station_name'].isnull())]

In [13]:
df2.end_station_name.value_counts()

end_station_name
Elizabeth St & Randolph St    133
Stony Island Ave & 63rd St      8
Name: count, dtype: int64
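
Both stations above have a name but no ID, and the earlier output shows `Elizabeth St & Randolph St` with ID `23001` in other rows. One possible fix is to fill the missing IDs from a name-to-ID lookup built from the rows where both are present. The sketch below uses a toy dataframe; the second station and its ID `X100` are hypothetical, and the approach assumes each name maps to a single ID.

```python
import pandas as pd

# Toy rows: one station appears with and without an id; "X100" is a made-up id
trips = pd.DataFrame({
    "end_station_name": ["Elizabeth St & Randolph St", "Elizabeth St & Randolph St",
                         "Hypothetical St", None],
    "end_station_id": ["23001", None, "X100", None],
})

# Build a name -> id lookup from rows where both fields are present
known = trips.dropna(subset=["end_station_name", "end_station_id"])
name_to_id = (
    known.drop_duplicates("end_station_name")
         .set_index("end_station_name")["end_station_id"]
)

# Fill only the missing ids; rows without a station name stay missing
trips["end_station_id"] = trips["end_station_id"].fillna(
    trips["end_station_name"].map(name_to_id)
)

print(trips)
```

If a station name turns out to map to more than one ID in the real data, `drop_duplicates` would silently keep the first, so that case is worth checking before filling.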

In [6]:
# Read and concatenate data
df = pd.concat(map(pd.read_csv, glob.glob("data/*.csv")))
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5859061 entries, 0 to 426589
Data columns (total 13 columns):
 #   Column              Dtype  
---  ------              -----  
 0   ride_id             object 
 1   rideable_type       object 
 2   started_at          object 
 3   ended_at            object 
 4   start_station_name  object 
 5   start_station_id    object 
 6   end_station_name    object 
 7   end_station_id      object 
 8   start_lat           float64
 9   start_lng           float64
 10  end_lat             float64
 11  end_lng             float64
 12  member_casual       object 
dtypes: float64(4), object(9)
memory usage: 625.8+ MB
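
The `Index: 5859061 entries, 0 to 426589` line suggests that each monthly file kept its own row labels, so index values repeat across files. Passing `ignore_index=True` to `pd.concat` is one way to get a fresh, unique index. The sketch below uses two tiny in-memory CSVs as stand-ins for the monthly files; their contents are made up.

```python
import io
import pandas as pd

# Two tiny in-memory CSVs standing in for monthly files (made-up rows)
may = io.StringIO("ride_id,member_casual\nA1,member\nB2,casual\n")
june = io.StringIO("ride_id,member_casual\nC3,member\nD4,casual\n")

frames = [pd.read_csv(f) for f in (may, june)]

# Default concat keeps each file's own 0..n-1 labels, so labels repeat
stacked = pd.concat(frames)

# ignore_index=True assigns a new 0..N-1 index across all rows
renumbered = pd.concat(frames, ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

A unique index matters later on, because label-based operations such as `df.loc[...]` or index-aligned assignment can behave unexpectedly when labels repeat.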


In [3]:
df = pd.read_csv('tripdata.csv', index_col=False)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 426590 entries, 0 to 426589
Data columns (total 13 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   ride_id             426590 non-null  object 
 1   rideable_type       426590 non-null  object 
 2   started_at          426590 non-null  object 
 3   ended_at            426590 non-null  object 
 4   start_station_name  362776 non-null  object 
 5   start_station_id    362776 non-null  object 
 6   end_station_name    357960 non-null  object 
 7   end_station_id      357960 non-null  object 
 8   start_lat           426590 non-null  float64
 9   start_lng           426590 non-null  float64
 10  end_lat             426155 non-null  float64
 11  end_lng             426155 non-null  float64
 12  member_casual       426590 non-null  object 
dtypes: float64(4), object(9)
memory usage: 42.3+ MB


In [7]:
# Combine latitude and longitude into a single "lat, lng" string per endpoint
df['start_coord'] = df['start_lat'].astype(str) + ', ' + df['start_lng'].astype(str)
df['end_coord'] = df['end_lat'].astype(str) + ', ' + df['end_lng'].astype(str)

# Drop the original latitude and longitude columns
columns_to_drop = ['start_lat', 'start_lng', 'end_lat', 'end_lng']
df.drop(columns_to_drop, axis=1, inplace=True)
df.head(5)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,member_casual,start_coord,end_coord,duration
0,C809ED75D6160B2A,electric_bike,2021-05-30 11:58:15,2021-05-30 12:10:39,,,,,casual,"41.9, -87.63","41.89, -87.61",0 days 00:12:24
1,DD59FDCE0ACACAF3,electric_bike,2021-05-30 11:29:14,2021-05-30 12:14:09,,,,,casual,"41.88, -87.62","41.79, -87.58",0 days 00:44:55
2,0AB83CB88C43EFC2,electric_bike,2021-05-30 14:24:01,2021-05-30 14:25:13,,,,,casual,"41.92, -87.7","41.92, -87.7",0 days 00:01:12
3,7881AC6D39110C60,electric_bike,2021-05-30 14:25:51,2021-05-30 14:41:04,,,,,casual,"41.92, -87.7","41.94, -87.69",0 days 00:15:13
4,853FA701B4582BAF,electric_bike,2021-05-30 18:15:39,2021-05-30 18:22:32,,,,,casual,"41.94, -87.69","41.94, -87.7",0 days 00:06:53


In [8]:
# Convert start and end times to datetime and add a column with the ride duration
df['started_at'] = pd.to_datetime(df['started_at'])
df['ended_at'] = pd.to_datetime(df['ended_at'])
df['duration'] = df['ended_at'] - df['started_at']
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 531633 entries, 0 to 531632
Data columns (total 12 columns):
 #   Column              Non-Null Count   Dtype          
---  ------              --------------   -----          
 0   ride_id             531633 non-null  object         
 1   rideable_type       531633 non-null  object         
 2   started_at          531633 non-null  datetime64[ns] 
 3   ended_at            531633 non-null  datetime64[ns] 
 4   start_station_name  477889 non-null  object         
 5   start_station_id    477889 non-null  object         
 6   end_station_name    473439 non-null  object         
 7   end_station_id      473439 non-null  object         
 8   member_casual       531633 non-null  object         
 9   start_coord         531633 non-null  object         
 10  end_coord           531633 non-null  object         
 11  duration            531633 non-null  timedelta64[ns]
dtypes: datetime64[ns](2), object(9), timedelta64[ns](1)
memory usage: 48.7+ 
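
A `timedelta64` duration is awkward to summarize or plot, so a numeric version in minutes can help, along with a flag for rides that end at or before they start. The sketch below is illustrative only: the first two rows reuse timestamps from the sample output earlier, the third row is invented with reversed times, and the column names `duration_min` and `suspect` are my own.

```python
import pandas as pd

# Two rides from the sample output plus one invented ride with reversed times
trips = pd.DataFrame({
    "started_at": ["2022-05-11 07:29:29", "2022-05-17 15:06:12", "2022-05-08 00:25:25"],
    "ended_at":   ["2022-05-11 07:30:57", "2022-05-17 15:10:57", "2022-05-08 00:25:15"],
})
trips["started_at"] = pd.to_datetime(trips["started_at"])
trips["ended_at"] = pd.to_datetime(trips["ended_at"])
trips["duration"] = trips["ended_at"] - trips["started_at"]

# Convert the timedelta to float minutes
trips["duration_min"] = trips["duration"].dt.total_seconds() / 60

# Flag rides that end at or before they start
trips["suspect"] = trips["duration"] <= pd.Timedelta(0)

print(trips[["duration", "duration_min", "suspect"]])
```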

In [23]:
df[(df['end_coord'].str.len() < 14) & (~df['end_station_id'].isnull())].head(20)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,member_casual,start_coord,end_coord,duration
16513,5A4B53EBD5675F01,electric_bike,2021-05-26 19:03:29,2021-05-26 19:58:27,Wells St & Elm St,KA1504000135,N Carpenter St & W Lake St,20251.0,member,"41.90319483333333, -87.63481","41.89, -87.65",0 days 00:54:58
18196,DB06CF7E538F3119,electric_bike,2021-05-18 16:52:43,2021-05-18 17:00:29,Loomis St & Lexington St,13332,W Washington Blvd & N Peoria St,20247.0,casual,"41.87214183333333, -87.66140816666666","41.88, -87.65",0 days 00:07:46
18412,9E0B43DB0DFBFCAB,electric_bike,2021-05-27 18:30:33,2021-05-27 18:47:14,Michigan Ave & 18th St,13150,W Washington Blvd & N Peoria St,20247.0,member,"41.857817, -87.6244815","41.88, -87.65",0 days 00:16:41
21929,8712C2D30A998293,electric_bike,2021-05-14 18:33:27,2021-05-14 18:46:48,Larrabee St & Webster Ave,13193,N Sheffield Ave & W Wellington Ave,20256.0,casual,"41.9218425, -87.64400083333334","41.94, -87.65",0 days 00:13:21
21939,01F90B8C45155F37,electric_bike,2021-05-15 22:33:58,2021-05-15 22:49:11,Larrabee St & Webster Ave,13193,N Sheffield Ave & W Wellington Ave,20256.0,casual,"41.921817833333336, -87.6440605","41.94, -87.65",0 days 00:15:13
27176,FFF43995A2D44EDE,electric_bike,2021-05-25 13:40:33,2021-05-25 13:44:29,Damen Ave & Cortland St,13133,Damen Ave & Wabansia Ave,20.0,casual,"41.915980833333336, -87.6772375","41.91, -87.68",0 days 00:03:56
27177,22F4E2592D9AD6A2,electric_bike,2021-05-02 13:31:12,2021-05-02 14:14:40,Stockton Dr & Wrightwood Ave,13276,Damen Ave & Wabansia Ave,20.0,casual,"41.931339, -87.63870416666667","41.91, -87.68",0 days 00:43:28
27183,1DAE9B6DA6B5F175,electric_bike,2021-05-08 18:39:32,2021-05-08 18:52:51,Damen Ave & Cortland St,13133,Damen Ave & Wabansia Ave,20.0,member,"41.9159585, -87.67725383333334","41.91, -87.68",0 days 00:13:19
27186,CCFEF8AD7044D79B,electric_bike,2021-05-29 17:04:12,2021-05-29 17:19:50,Clark St & Lincoln Ave,13179,Damen Ave & Wabansia Ave,20.0,casual,"41.91566866666667, -87.63460616666667","41.91, -87.68",0 days 00:15:38
27187,EFE01C725F51BF6D,electric_bike,2021-05-18 21:59:59,2021-05-18 22:09:55,Dayton St & North Ave,13058,Damen Ave & Wabansia Ave,20.0,casual,"41.91059916666666, -87.64939466666667","41.91, -87.68",0 days 00:09:56


In [16]:
df.isnull().sum()

ride_id                   0
rideable_type             0
started_at                0
ended_at                  0
start_station_name    53744
start_station_id      53744
end_station_name      58194
end_station_id        58194
member_casual             0
start_coord               0
end_coord                 0
duration                  0
dtype: int64

In [10]:
df2 = df[df['start_coord'] == df['end_coord']]
df2['start_coord'].value_counts()

start_coord
41.892278, -87.612043                     1560
41.880958, -87.616743                     1114
41.8810317, -87.62408432                   522
41.963982, -87.638181                      521
41.926277, -87.630834                      490
                                          ... 
41.84198816666667, -87.61695566666667        1
41.786691, -87.655859                        1
41.924078, -87.635939                        1
41.831098833333336, -87.62695233333334       1
42.01, -87.72                                1
Name: count, Length: 1261, dtype: int64

In [15]:
df3 = df2[df2['end_station_name'].isnull()]
df3[~df3['start_station_name'].isnull()]

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,member_casual,start_coord,end_coord,duration
240537,056DF0F760882A83,electric_bike,2021-05-08 13:01:27,2021-05-08 13:08:22,W Oakdale Ave & N Broadway,20252.0,,,casual,"41.94, -87.64","41.94, -87.64",0 days 00:06:55
240750,6EBE7F1CA2B51F69,electric_bike,2021-05-24 18:59:03,2021-05-24 19:01:36,W Oakdale Ave & N Broadway,20252.0,,,casual,"41.94, -87.64","41.94, -87.64",0 days 00:02:33
241280,EF88F8BCD87FB162,electric_bike,2021-05-29 01:24:56,2021-05-29 01:30:18,W Oakdale Ave & N Broadway,20252.0,,,casual,"41.94, -87.64","41.94, -87.64",0 days 00:05:22
241663,3591254CC4DF11DA,electric_bike,2021-05-15 16:21:04,2021-05-15 16:24:58,S Wentworth Ave & W 111th St,20128.0,,,casual,"41.69, -87.63","41.69, -87.63",0 days 00:03:54
242911,B5177182781F6515,electric_bike,2021-05-09 16:31:44,2021-05-09 16:50:49,W Washington Blvd & N Peoria St,20247.0,,,casual,"41.88, -87.65","41.88, -87.65",0 days 00:19:05
242914,022853DB0F6F71CE,electric_bike,2021-05-20 20:36:07,2021-05-20 20:39:27,W Washington Blvd & N Peoria St,20247.0,,,casual,"41.88, -87.65","41.88, -87.65",0 days 00:03:20
247858,A9FFFC2FE362E35D,electric_bike,2021-05-21 19:03:00,2021-05-21 19:11:48,N Green St & W Lake St,20246.0,,,casual,"41.89, -87.65","41.89, -87.65",0 days 00:08:48
248498,6BF4A9DADE5DA0C6,electric_bike,2021-05-22 08:53:40,2021-05-22 08:53:54,N Green St & W Lake St,20246.0,,,member,"41.89, -87.65","41.89, -87.65",0 days 00:00:14
249939,BAB1913A74A2217B,electric_bike,2021-05-14 17:34:22,2021-05-14 17:34:35,N Green St & W Lake St,20246.0,,,casual,"41.89, -87.65","41.89, -87.65",0 days 00:00:13
251323,20E76D5484CBC920,electric_bike,2021-05-01 11:46:02,2021-05-01 11:46:26,Damen Ave & Wabansia Ave,20.0,,,casual,"41.91, -87.68","41.91, -87.68",0 days 00:00:24
