## Case Background
As a junior data analyst on the marketing analytics team at Cyclistic, a bike-share company in Chicago, I am tasked with understanding how casual riders and annual members use Cyclistic bikes differently. Casual riders are customers who purchase single-ride or full-day passes, whereas annual members pay a yearly subscription for unlimited biking access. The marketing director believes that the company's future success depends on maximizing the number of annual memberships by converting casual riders into members. Pending executive approval, my team will design a new marketing strategy that pursues this idea.

## About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

## Business Background
Business Model:

- Product: a bike-share service with geotracked bikes that lock into a network of docking stations across Chicago
- Customer types and revenue model: members (annual subscribers) and casual riders (single-ride and full-day purchasers)
- Competitive advantages: bicycle variety (appealing to broad consumer segments) and pricing flexibility

Product Background:

- 5,824 bicycles and 692 docking stations
- More than 50% of riders select traditional bikes
- 8% of riders opt for the assistive bike options
- 30% of users bike to commute to work each day
- Users are more likely to ride for leisure than to commute
- Casual riders have chosen Cyclistic for their mobility needs

Stakeholders:

- Lily Moreno: The director of marketing and my manager. Moreno is responsible for developing campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

- Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. I joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals, as well as how I, as a junior data analyst, can help Cyclistic achieve them.

- Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

##### Moreno has assigned me a question to answer: 
### How do annual members and casual riders use Cyclistic bikes differently?


In [1]:
import os
import numpy as np
import pandas as pd

### Reading all CSV files in the folder

In [4]:
folder = r'/Users/soumyadipghorai/Downloads/bikeride'

# Keep only the CSV files; os.listdir can also return hidden files such as .DS_Store
files = sorted(file for file in os.listdir(folder) if file.endswith('.csv'))

# Read each monthly file and stack them into one DataFrame
all_data = pd.concat(
    (pd.read_csv(os.path.join(folder, file)) for file in files),
    ignore_index=True,
)

# Write the combined data once, after every file has been read
all_data.to_csv('Cyclist2022.csv', index=False)

In [2]:
df = pd.read_csv(r'/Users/soumyadipghorai/Downloads/Cyclist2022.csv')

In [8]:
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,550CF7EFEAE0C618,electric_bike,2022-08-07 21:34:15,2022-08-07 21:41:46,,,,,41.93,-87.69,41.94,-87.72,casual
1,DAD198F405F9C5F5,electric_bike,2022-08-08 14:39:21,2022-08-08 14:53:23,,,,,41.89,-87.64,41.92,-87.64,casual
2,E6F2BC47B65CB7FD,electric_bike,2022-08-08 15:29:50,2022-08-08 15:40:34,,,,,41.97,-87.69,41.97,-87.66,casual
3,F597830181C2E13C,electric_bike,2022-08-08 02:43:50,2022-08-08 02:58:53,,,,,41.94,-87.65,41.97,-87.69,casual
4,0CE689BB4E313E8D,electric_bike,2022-08-07 20:24:06,2022-08-07 20:29:58,,,,,41.85,-87.65,41.84,-87.66,casual


#### Deleting NULL values for better insights

In [9]:
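# Rows with at least one missing value (inspection only; the result is not stored)
df[df.isna().any(axis=1)]

# Drop every row that has a missing value in any column
df = df.dropna(how='any')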
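
Before dropping rows, it may help to see which columns hold the missing values and how many rows the drop removes. The sketch below uses a small made-up frame (the `sample` name and its values are illustrative, not taken from the Cyclistic data) to count missing values per column and compare row counts before and after `dropna(how='any')`.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the trip table; values are made up for illustration
sample = pd.DataFrame({
    'ride_id': ['A1', 'B2', 'C3', 'D4'],
    'start_station_name': [np.nan, 'Wood St & Chicago Ave', np.nan, 'DuSable Museum'],
    'end_lat': [41.94, 41.89, np.nan, 41.80],
    'member_casual': ['casual', 'member', 'casual', 'casual'],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows that survive when any missing value disqualifies the row
cleaned = sample.dropna(how='any')
print(f'{len(sample) - len(cleaned)} of {len(sample)} rows dropped')
```

In the `df.head()` output above, the rows with empty station fields are all `electric_bike` rides, so it may be worth checking whether the drop removes one bike type or rider type disproportionately.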

#### Converting started_at and ended_at to datetime format so that trip duration can be measured

In [4]:
df["started_at"] = pd.to_datetime(df["started_at"])
df["ended_at"] = pd.to_datetime(df["ended_at"])

In [5]:
df['ride_length'] = df['ended_at'] - df['started_at']

In [6]:
# Whole minutes per ride; astype('timedelta64[m]') is rejected by pandas 2.0 and later
df['trip_duration'] = df['ride_length'].dt.total_seconds() // 60

In [7]:
df['trip_duration'] = df['trip_duration'].astype('int')
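
As a quick check of the duration logic, here is a minimal sketch on three rides. The first two timestamps come from the processed sample output; the third is a hypothetical bad record whose end time precedes its start. The `rides` and `invalid` names are illustrative, and the sketch computes whole minutes through `dt.total_seconds()`.

```python
import pandas as pd

# Two rides from the sample output plus one hypothetical bad record
rides = pd.DataFrame({
    'started_at': pd.to_datetime(
        ['2022-08-05 16:13:36', '2022-08-11 23:30:11', '2022-08-01 10:00:00']),
    'ended_at': pd.to_datetime(
        ['2022-08-05 16:22:40', '2022-08-11 23:30:56', '2022-08-01 09:58:00']),
})

rides['ride_length'] = rides['ended_at'] - rides['started_at']

# Whole minutes, rounded down
rides['trip_duration'] = (rides['ride_length'].dt.total_seconds() // 60).astype('int')

# Rides that end before they start are candidates for removal
invalid = rides[rides['ride_length'] < pd.Timedelta(0)]

print(rides[['ride_length', 'trip_duration']])
print(len(invalid), 'ride(s) with negative length')
```

Rides shorter than one minute come out as 0, which matches the 45-second ride in the processed sample.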

### Adding a weekday column (0 = Monday through 6 = Sunday)

In [8]:
df['Day_of_week'] = df['started_at'].dt.weekday
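
The processed output further down also lists a `Month` column, but the cell that creates it is not shown. A plausible reconstruction, offered as an assumption rather than the original code, is `dt.month`. The sketch also adds a readable day name; the `trips` frame and the `day_name` column are hypothetical.

```python
import pandas as pd

# Two start times taken from the processed sample output
trips = pd.DataFrame({
    'started_at': pd.to_datetime(['2022-08-05 16:13:36', '2022-08-21 14:09:08']),
})

trips['Day_of_week'] = trips['started_at'].dt.weekday   # 0 = Monday, 6 = Sunday
trips['Month'] = trips['started_at'].dt.month           # assumed source of the Month column
trips['day_name'] = trips['started_at'].dt.day_name()   # hypothetical helper column

print(trips)
```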

#### Measure distance from latitude and longitude values

In [9]:
def haversine(lat1, lon1, lat2, lon2):
    """
    Great-circle distance in miles between two points on a sphere (haversine formula).
    Latitudes and longitudes are given in decimal degrees.
    """
    # Convert degrees to radians
    lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    # 3956 is the approximate radius of the Earth in miles
    return 3956 * 2 * np.arcsin(np.sqrt(a))

df['distance'] = haversine(df['start_lat'],df['start_lng'],df['end_lat'],df['end_lng'])
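
To sanity-check the formula, the sketch below repeats the function and applies it to one pair of coordinates from the processed sample output (DuSable Museum to Cottage Grove Ave & 51st St), plus a ride that starts and ends at the same point. The variable names `d` and `same` are illustrative.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; inputs in decimal degrees."""
    lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return 3956 * 2 * np.arcsin(np.sqrt(a))

# DuSable Museum -> Cottage Grove Ave & 51st St (coordinates from the sample output)
d = haversine(41.791568, -87.607852, 41.803038, -87.606615)
print(round(d, 3))

# Identical start and end coordinates give a distance of zero
same = haversine(41.922695, -87.697153, 41.922695, -87.697153)
print(same)
```

The result is the straight-line distance between start and end points, not the distance actually ridden. Round trips that return to the starting station therefore register as 0 miles even when the ride lasted over an hour, as in the processed sample.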

#### Viewing processed data

In [29]:
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,Month,ride_length,trip_duration,Day_of_week,distance
142,241C440C74CB31BB,classic_bike,2022-08-05 16:13:36,2022-08-05 16:22:40,DuSable Museum,KA1503000075,Cottage Grove Ave & 51st St,TA1309000067,41.791568,-87.607852,41.803038,-87.606615,casual,8,0 days 00:09:04,9,4,0.794504
271,53A7590B28ED25E2,classic_bike,2022-08-11 23:30:11,2022-08-11 23:30:56,California Ave & Milwaukee Ave,13084,California Ave & Milwaukee Ave,13084,41.922695,-87.697153,41.922695,-87.697153,casual,8,0 days 00:00:45,0,3,0.0
329,C34EE790A58C0434,classic_bike,2022-08-21 14:09:08,2022-08-21 15:10:46,California Ave & Division St,13256,California Ave & Division St,13256,41.903029,-87.697474,41.903029,-87.697474,casual,8,0 days 01:01:38,61,6,0.0
357,49259B4BA064D81B,electric_bike,2022-08-21 16:15:12,2022-08-21 16:29:30,Wood St & Chicago Ave,637,Wood St & Chicago Ave,637,41.895673,-87.672075,41.895634,-87.672069,casual,8,0 days 00:14:18,14,6,0.002734
422,BEE91D557E47FE83,classic_bike,2022-08-21 02:11:26,2022-08-21 03:44:04,California Ave & Milwaukee Ave,13084,California Ave & Milwaukee Ave,13084,41.922695,-87.697153,41.922695,-87.697153,casual,8,0 days 01:32:38,92,6,0.0


#### Saving processed dataset to a csv file

In [32]:
df.to_csv('CyclistCaseStudy.csv',index = False)
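
CSV files store every value as text, so the datetime and timedelta columns lose their types when the file is read back. A minimal sketch of reloading it, assuming the column names shown above; the in-memory buffer is a stand-in for `CyclistCaseStudy.csv` and holds only a subset of the columns.

```python
import io
import pandas as pd

# In-memory stand-in for CyclistCaseStudy.csv with a subset of the columns
csv_text = (
    'ride_id,started_at,ended_at,ride_length,trip_duration,member_casual\n'
    '241C440C74CB31BB,2022-08-05 16:13:36,2022-08-05 16:22:40,0 days 00:09:04,9,casual\n'
)

# parse_dates restores the datetime columns while reading
reloaded = pd.read_csv(io.StringIO(csv_text), parse_dates=['started_at', 'ended_at'])

# ride_length comes back as text, so convert it explicitly
reloaded['ride_length'] = pd.to_timedelta(reloaded['ride_length'])

print(reloaded.dtypes)
```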