# Case study: How does a bike-share navigate speedy success?

Cyclistic operates a bike-share program that features more than 5,800 bicycles and 600 docking stations.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno, the director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a solid opportunity to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

# Ask
Business task: **How do annual members and casual riders use Cyclistic bikes differently?**

# Prepare
**1.  Where is your data located?**

The dataset is hosted on Kaggle:
https://www.kaggle.com/datasets/godofoutcasts/cyclistic-bike-share-2023

**2. How is the data organized?**

The dataset consists of 12 CSV files, one for each month (January to December).

**3. Are there issues with bias or credibility in this data? Does your data ROCCC?**

The data is first-party data collected by Cyclistic, so the risk of bias is low and its credibility is high. It also meets the ROCCC criteria: it is reliable, original, comprehensive, current, and cited.

**4. How are you addressing licensing, privacy, security, and accessibility?**

The data is openly available and provided by the company; however, its use is covered by the company's license. The data does not include any personal information about the riders, which protects their privacy.

**5. How did you verify the data’s integrity?**

All twelve files were examined: the column names and data types are consistent across them.
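The consistency check described above can also be done programmatically. The following is a minimal sketch, not the method actually used in this notebook: the helper `check_schema_consistency` and the tiny example frames are hypothetical and only illustrate comparing column names and dtypes against the first file.

```python
import pandas as pd

def check_schema_consistency(frames):
    """Return the names of frames whose columns or dtypes differ from the first frame."""
    names = list(frames)
    reference = frames[names[0]]
    mismatched = []
    for name in names[1:]:
        frame = frames[name]
        same_columns = list(frame.columns) == list(reference.columns)
        # Only compare dtypes when the columns line up, otherwise the comparison is meaningless
        same_dtypes = same_columns and bool((frame.dtypes == reference.dtypes).all())
        if not (same_columns and same_dtypes):
            mismatched.append(name)
    return mismatched

# Tiny made-up frames standing in for the monthly files (assumption for illustration)
jan = pd.DataFrame({"ride_id": ["a1"], "start_lat": [41.9]})
feb = pd.DataFrame({"ride_id": ["b1"], "start_lat": [41.8]})
mar = pd.DataFrame({"ride_id": ["c1"], "start_lat": ["41.7"]})  # wrong dtype on purpose

print(check_schema_consistency({"jan": jan, "feb": feb, "mar": mar}))
```

In the notebook, the same helper could be called with the twelve monthly DataFrames once they are loaded; an empty list would indicate that all files share one schema.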


In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
data1=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202301-divvy-tripdata.csv')
data2=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202302-divvy-tripdata.csv')
data3=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202303-divvy-tripdata.csv')
data4=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202304_divvy_tripdata.csv')
data5=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202205-divvy-tripdata.csv')
data6=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202206-divvy-tripdata.csv')
data7=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202207-divvy-tripdata.csv')
data8=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202208-divvy-tripdata.csv')
data9=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202209-divvy-publictripdata.csv')
data10=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202210-divvy-tripdata.csv')
data11=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202211-divvy-tripdata.csv')
data12=pd.read_csv('/kaggle/input/cyclistic-bike-share-2023/202212-divvy-tripdata.csv')

In [None]:
data2.head()

In [None]:
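# Stack the 12 monthly frames into one; ignore_index gives the result a clean 0..n-1 index
df=pd.concat([data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12], ignore_index=True)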
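Reading the twelve files one by one works, but a loop is less error-prone if file names change. The sketch below is an alternative, not what this notebook does: `load_monthly_files` is a hypothetical helper, and the demo writes two made-up CSV files to a temporary folder so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_monthly_files(folder, pattern="*.csv"):
    """Read every CSV matching `pattern` in `folder` and stack them into one DataFrame."""
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder}")
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)

# Demo on two tiny made-up files written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"ride_id": ["a1", "a2"]}).to_csv(Path(tmp) / "month-01.csv", index=False)
    pd.DataFrame({"ride_id": ["b1"]}).to_csv(Path(tmp) / "month-02.csv", index=False)
    demo = load_monthly_files(tmp)

print(demo.shape)
```

On Kaggle, the call would presumably be `load_monthly_files('/kaggle/input/cyclistic-bike-share-2023')`, assuming that folder contains only the monthly trip files.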

In [None]:
df.info()

# Process

In [None]:
df.isna().sum()

In [None]:
df.shape

In [None]:
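# Percentage of missing values per column, using the actual row count instead of a hard-coded number
(df.isna().sum() / len(df)) * 100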

In [None]:
df.dropna(axis=0, inplace=True)

In [None]:
df['started_at']=pd.to_datetime(df['started_at'])
df['ended_at']=pd.to_datetime(df['ended_at'])

In [None]:
df.info()

In [None]:
colors = ['#e7f56a', '#75b2ce']
members = df['member_casual'].value_counts()
# Build the labels from the counted categories so each slice is labelled correctly
label_names = {'member': 'Annual members', 'casual': 'Casual riders'}
labels = [label_names.get(kind, kind) for kind in members.index]

# Plot a pie chart showing the share of rides by rider type
plt.pie(members, labels=labels, autopct='%1.1f%%', startangle=140, colors=colors)
plt.title('Share of rides: annual members vs. casual riders', fontsize = 12, fontweight="bold")
plt.show()
print(members)

In [None]:
# Extract time features with the vectorized .dt accessor
df['Hour'] = df['started_at'].dt.hour
df['Day'] = df['started_at'].dt.day_name()
df['Month'] = df['started_at'].dt.month

In [None]:
from datetime import timedelta

In [None]:
# Ride duration as a timedelta
df['Total_Ride_Time'] = (df['ended_at'] - df['started_at'])

In [None]:
# Convert the duration to minutes
df['Total_Ride_Time'] = (df['Total_Ride_Time'])/timedelta(minutes=1)

In [None]:
df['Total_Ride_Time'] = (df['Total_Ride_Time']).round(decimals = 1)

In [None]:
df.head()

In [None]:
df['Lat']=df['start_lat']-df['end_lat']

In [None]:
df['Lng']=df['start_lng']-df['end_lng']

In [None]:
# Straight-line separation between start and end points, in degrees
df['distance']=np.sqrt((df['Lat']**2)+(df['Lng']**2))

In [None]:
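# Convert degrees to km (about 111 km per degree of latitude).
# This is a rough approximation: a degree of longitude is shorter than 111 km
# away from the equator, so east-west distances are overstated.
df['distance']=df['distance']*111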

In [None]:
df['distance']=df['distance'].round(decimals=2)
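The distance above treats latitude and longitude degrees as equal in length, which is only approximately true. A more accurate alternative is the haversine great-circle distance. The sketch below is not part of the original analysis; `haversine_km` and `EARTH_RADIUS_KM` are names introduced here for illustration, and the formula assumes a spherical Earth.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# One degree of latitude vs. one degree of longitude at a latitude of 41.9 degrees
one_degree_lat = haversine_km(41.0, -87.6, 42.0, -87.6)
one_degree_lng = haversine_km(41.9, -88.0, 41.9, -87.0)
print(round(float(one_degree_lat), 1), round(float(one_degree_lng), 1))
```

Because the function uses NumPy operations, it should also accept whole columns, for example `haversine_km(df['start_lat'], df['start_lng'], df['end_lat'], df['end_lng'])`. The demo shows that one degree of longitude at this latitude covers noticeably less than 111 km.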

In [None]:
df.head()

In [None]:
month = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June', 7:'July', 8:'August', 9:'September', 10:'October', 11:'November', 12:'December'}

In [None]:
df['Month']=df['Month'].map(month)

In [None]:
sns.set_style('whitegrid')

In [None]:
plt.figure(figsize=(8,6))
sns.barplot(x='member_casual', y='distance', data=df, palette='viridis')

In [None]:

plt.figure(figsize=(8,6))
sns.barplot(x='member_casual', y='Total_Ride_Time', data=df, palette='viridis')

In [None]:
plt.figure(figsize=(10,6))
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
sns.countplot(x='Day', hue='member_casual', data=df, palette='viridis', order=order)
plt.tight_layout()

In [None]:
plt.figure(figsize=(12,6))
order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November','December']
sns.countplot(x='Month', hue='member_casual', data=df, palette='ocean', order=order)
plt.tight_layout()

In [None]:
sns.countplot(data=df, 
              x="rideable_type", 
              hue="member_casual")
plt.title("Bike types used by rider type",fontweight="bold")
plt.ylabel('Number of rides')
plt.show()

In [None]:
member_start = df[df['member_casual'] == 'member']['start_station_name'].value_counts().head(10)
casual_start = df[df['member_casual'] == 'casual']['start_station_name'].value_counts().head(10)
member_destination = df[df['member_casual'] == 'member']['end_station_name'].value_counts().head(10)
casual_destination = df[df['member_casual'] == 'casual']['end_station_name'].value_counts().head(10)


In [None]:

# Plot the 10 most-used start stations for members (member_start is computed above)
member_start.plot(kind='bar', figsize=(10, 6), color='skyblue')

plt.title('Top 10 start stations for members')
plt.xlabel('Start station')
plt.ylabel('Count')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to prevent clipping of labels

plt.show()

In [None]:
# Plot the 10 most-used end stations for members (member_destination is computed above)
member_destination.plot(kind='bar', figsize=(10, 6), color='skyblue')

plt.title('Top 10 destinations for members')
plt.xlabel('End station')
plt.ylabel('Count')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to prevent clipping of labels

plt.show()

In [None]:
# Plot the 10 most-used start stations for casual riders (casual_start is computed above)
casual_start.plot(kind='bar', figsize=(10, 6), color='skyblue')

plt.title('Top 10 start stations for casual riders')
plt.xlabel('Start station')
plt.ylabel('Count')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to prevent clipping of labels

plt.show()
print(casual_start)

In [None]:
# Plot the 10 most-used end stations for casual riders (casual_destination is computed above)
casual_destination.plot(kind='bar', figsize=(10, 6), color='skyblue')

plt.title('Top 10 destinations for casual riders')
plt.xlabel('End station')
plt.ylabel('Count')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to prevent clipping of labels

plt.show()

# Share
# Observations
1. Members use the bikes more on weekdays, for shorter trips, with a somewhat lower ride frequency.
2. Casual riders use the bikes more on weekends, for longer trips, with a higher ride frequency.
3. Usage is highest from May to August.
4. The top start stations for casual riders include Streeter Dr & Grand Ave, DuSable Lake Shore Dr & Monroe St, Michigan Ave & Oak St, Millennium Park, and DuSable Lake Shore Dr & North Blvd.
5. Classic bikes are the more popular bike type.
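The observations above are read off the charts. A small summary table can back them with numbers. The sketch below is an illustration only: `summarize_by_rider_type` is a hypothetical helper, the sample rows are made up, and the column names (`member_casual`, `Day`, `Total_Ride_Time`) are the ones created earlier in this notebook.

```python
import pandas as pd

def summarize_by_rider_type(rides):
    """Ride count, mean/median duration and weekend share per rider type."""
    weekend = rides["Day"].isin(["Saturday", "Sunday"])
    return rides.assign(is_weekend=weekend).groupby("member_casual").agg(
        rides=("Total_Ride_Time", "size"),
        mean_minutes=("Total_Ride_Time", "mean"),
        median_minutes=("Total_Ride_Time", "median"),
        weekend_share=("is_weekend", "mean"),  # fraction of rides on Saturday or Sunday
    )

# Made-up sample rows, not real Cyclistic data
sample = pd.DataFrame({
    "member_casual": ["member", "member", "casual", "casual"],
    "Day": ["Monday", "Saturday", "Saturday", "Sunday"],
    "Total_Ride_Time": [10.0, 12.0, 30.0, 20.0],
})
summary = summarize_by_rider_type(sample)
print(summary)
```

Calling `summarize_by_rider_type(df)` on the full data would give one row per rider type. The median is included because a few very long rides can pull the mean upward.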

# Act
# Recommendations
1. Offer summer plan packages and weekend discounts to persuade casual riders to buy memberships.
2. Run an ad campaign at the start stations most popular with casual riders to increase engagement with, or interest in, memberships.
3. Educate casual riders about the perks of membership for long trips.