# Bikeshare data analysis

This is an analysis of bike trip data from Divvy, a bikeshare company in Chicago. The company's financial analysts have concluded that members are more profitable than casual riders. The study was commissioned by the marketing director, Lily Moreno, who wants to launch a campaign to convert casual riders into members. The analysis looks at how casual riders and members use the service differently.

This notebook documents the ask, prepare, process, analyze, and share phases of the data analysis cycle.

Some of the files used, especially those in the trip_data folder, are large, so the data is hosted at the link below. Download it to reproduce this analysis. https://drive.google.com/drive/folders/1eh7afaC4Q_8OqkOaKh-XfzoI_Ei1OF9V?usp=sharing

In [2]:
# Import the required modules.

import pandas as pd
from datetime import datetime 

In [3]:
# Load the data, one CSV file per month of 2021. The files are in the trip_data folder at the Google Drive link given in the README.

january_data = pd.read_csv('trip_data/january2021.csv')
feb_data = pd.read_csv('trip_data/feb2021.csv')
march_data = pd.read_csv('trip_data/march2021.csv')
april_data = pd.read_csv('trip_data/april2021.csv')
may_data = pd.read_csv('trip_data/may2021.csv')
june_data = pd.read_csv('trip_data/june2021.csv')
july_data = pd.read_csv('trip_data/july2021.csv')
august_data = pd.read_csv('trip_data/august2021.csv')
september_data = pd.read_csv('trip_data/september2021.csv')
october_data = pd.read_csv('trip_data/october2021.csv')
november_data = pd.read_csv('trip_data/november2021.csv')
december_data = pd.read_csv('trip_data/december2021.csv')
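
The twelve reads above follow one pattern, so they could also be written as a loop. The following is a minimal sketch rather than the notebook's own code: `MONTH_FILES` and `load_monthly_csvs` are hypothetical names introduced here, and the file names are assumed to match the ones used in the cell above.

```python
from pathlib import Path

import pandas as pd

# File names copied from the cell above, in calendar order.
MONTH_FILES = (
    "january2021.csv", "feb2021.csv", "march2021.csv", "april2021.csv",
    "may2021.csv", "june2021.csv", "july2021.csv", "august2021.csv",
    "september2021.csv", "october2021.csv", "november2021.csv",
    "december2021.csv",
)


def load_monthly_csvs(folder, file_names=MONTH_FILES):
    """Read each CSV in `folder` and return the dataframes in the given order.

    Hypothetical helper: pd.read_csv raises FileNotFoundError if a file
    is missing, so an incomplete download fails loudly.
    """
    folder = Path(folder)
    return [pd.read_csv(folder / name) for name in file_names]
```

Calling `load_monthly_csvs('trip_data')` would return a list of twelve dataframes that can be passed directly to `pd.concat`.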

In [4]:
# Combine the monthly data into one dataframe.
whole_year_data = pd.concat([
    january_data, feb_data, march_data, april_data, may_data, june_data,
    july_data, august_data, september_data, october_data, november_data,
    december_data,
])
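
As called above, `pd.concat` keeps each monthly file's own row labels, so the same label can appear once per month in the combined dataframe. The toy example below, with made-up ride IDs, illustrates the difference that the `ignore_index=True` option makes. It is a sketch of one possible choice, not a change the notebook itself makes.

```python
import pandas as pd

# Two toy frames standing in for two monthly files (made-up ride IDs).
first = pd.DataFrame({"ride_id": ["A1", "A2"]})
second = pd.DataFrame({"ride_id": ["B1", "B2"]})

# Default behaviour: each frame keeps its own row labels, so 0 and 1 repeat.
kept_labels = pd.concat([first, second])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
fresh_labels = pd.concat([first, second], ignore_index=True)

print(list(kept_labels.index))   # [0, 1, 0, 1]
print(list(fresh_labels.index))  # [0, 1, 2, 3]
```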

In [5]:
# Drop the columns that are not needed.
modified_year_data = whole_year_data.drop(['start_station_id', 'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng'], axis=1)

In [6]:
# Check the data types of the columns, then convert the timestamp columns to datetimes.

# print is needed here: a bare expression is only displayed when it is the last line of a cell.
print(modified_year_data.dtypes)
modified_year_data['started_at'] = pd.to_datetime(modified_year_data['started_at'])
modified_year_data['ended_at'] = pd.to_datetime(modified_year_data['ended_at'])
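
The conversion above lets pandas infer the timestamp layout. If the layout is known, it can be stated explicitly. The sketch below assumes the `YYYY-MM-DD HH:MM:SS` layout seen in the sample rows at the end of this section. The `raw` series is made-up illustration data, and `errors="coerce"` is an optional choice that turns unparseable values into `NaT` instead of raising an error.

```python
import pandas as pd

# One timestamp in the assumed layout and one deliberately invalid value.
raw = pd.Series(["2021-01-23 16:14:19", "not a timestamp"])

# An explicit format avoids guessing; errors="coerce" yields NaT on failure.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

print(parsed.isna().tolist())  # [False, True]
```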

In [7]:
# Create a duration column from the 'started_at' and 'ended_at' columns.

modified_year_data['duration'] = modified_year_data['ended_at'] - modified_year_data['started_at']

# Convert the resulting timedelta to a float number of seconds, which is easier to work with.
modified_year_data['duration'] = modified_year_data['duration'].dt.total_seconds()
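
As a quick check of the duration arithmetic, the sketch below repeats the two steps on a toy dataframe built from the first two sample rows shown at the end of this section. The name `trips` is introduced here for illustration only.

```python
import pandas as pd

# Start and end times of the first two trips in the sample output.
trips = pd.DataFrame({
    "started_at": pd.to_datetime(["2021-01-23 16:14:19", "2021-01-27 18:43:08"]),
    "ended_at": pd.to_datetime(["2021-01-23 16:24:44", "2021-01-27 18:47:12"]),
})

# Subtracting datetimes gives a timedelta; total_seconds() converts it to float seconds.
trips["duration"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds()

print(trips["duration"].tolist())  # [625.0, 244.0]
```

The results, 625.0 and 244.0 seconds, match the `duration` values in the sample output.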

In [12]:
# Extract the day, month, and year of each trip's start time.
modified_year_data["day"] = modified_year_data['started_at'].dt.day
modified_year_data["month"] = modified_year_data['started_at'].dt.month
modified_year_data["year"] = modified_year_data['started_at'].dt.year
modified_year_data.head()



Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual,duration
0,E19E6F1B8D4C42ED,electric_bike,2021-01-23 16:14:19,2021-01-23 16:24:44,California Ave & Cortez St,,member,625.0
1,DC88F20C2C55F27F,electric_bike,2021-01-27 18:43:08,2021-01-27 18:47:12,California Ave & Cortez St,,member,244.0
2,EC45C94683FE3F27,electric_bike,2021-01-21 22:35:54,2021-01-21 22:37:14,California Ave & Cortez St,,member,80.0
3,4FA453A75AE377DB,electric_bike,2021-01-07 13:31:13,2021-01-07 13:42:55,California Ave & Cortez St,,member,702.0
4,BE5E8EB4E7263A0B,electric_bike,2021-01-23 02:24:02,2021-01-23 02:24:45,California Ave & Cortez St,,casual,43.0
