Cyclistic's marketing strategy was to build general awareness and appeal to broader consumer segments. The introduction of flexible pricing plans was one approach that helped make these things possible. Pricing plans ranged from single-ride and full-day passes to annual memberships. Customers purchasing single-ride or full-day passes are called casual riders. Customers buying annual memberships are called Cyclistic members.

Although the pricing flexibility helps Cyclistic attract more customers than usual, Lily Moreno, the marketing director, believes that expanding the number of annual members will be critical to future growth. Cyclistic's finance analysts have also concluded that annual members are much more profitable than casual riders. Rather than creating a marketing campaign targeting all-new customers, Moreno believes there is a good chance to convert casual riders into members. She notes that casual riders already know the Cyclistic program and have chosen Cyclistic for their mobility needs.

# ASK

Moreno has tasked the marketing analytics team with designing effective strategies to maximize the number of annual memberships by converting Cyclistic's casual riders into annual members. Three questions will guide the marketing analytics team:

1. How do annual members and casual riders differ regarding Cyclistic bike usage?

2. Why would casual riders buy the Cyclistic annual membership?

3. How can digital media influence casual riders of Cyclistic to become annual members?

# Description

Only data from the previous twelve months, i.e., from August 2022 to July 2023, will be used for this project.

After downloading the source zip files, the first step is to create a folder and give the extracted files appropriate names. Each dataset is a CSV (comma-separated values) file. The preferred format for data imports and exports, however, is UTF-8 (Unicode Transformation Format, 8-bit) encoded CSV. This encoding supports many special characters, such as hieroglyphs and accented letters, and is backward compatible with ASCII. We therefore save each file as a CSV UTF-8 (comma delimited) file in a separate folder. Keeping the original downloads and the working files in different folders is a precaution against mixing them up.

# Credibility of data

The credibility and integrity of our data can be determined using the ROCCC system.

The data is reliable: it has a large sample size, reflecting the population size.

The data is original: we can locate the primary source.

The data is comprehensive: it is understandable and contains the information needed to answer the question. It is not free of gaps, however: the checks below show missing station fields and a small number of rides with invalid durations, which are handled during cleaning.

The data is current: it is relevant and up to date, which indicates that the source refreshes its data regularly.

The data is cited: the source has been vetted.

The data integrity and credibility are sufficient to provide reliable and comprehensive insights for the team's analysis.

# Limitations 

Data privacy issues prohibit using riders' personally identifiable information such as gender and age. Unfortunately, this means we cannot connect pass purchases to credit card numbers and determine a few identifiers, e.g., if casual riders live in areas with Cyclistic services available or if they have purchased multiple single passes.

In [1]:
import pandas as pd 
import numpy as np 

In [2]:
df=pd.concat([pd.read_csv(r'E:\divvy trips\202208-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202209-divvy-publictripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202210-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202211-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202212-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202301-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202302-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202303-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202304-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202305-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202306-divvy-tripdata.csv'),
              pd.read_csv(r'E:\divvy trips\202307-divvy-tripdata.csv')
             ])
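The twelve `read_csv` calls above can also be written as a loop over the folder. The sketch below is one possible way to do that, not the method used in this notebook: `load_trip_files` and its `pattern` argument are hypothetical names, and `ignore_index=True` is an added choice that gives every row a unique label, which the original concat does not do.

```python
from pathlib import Path

import pandas as pd


def load_trip_files(folder, pattern="*-divvy-*.csv"):
    """Read every monthly CSV in `folder` and stack them into one DataFrame.

    `load_trip_files` and `pattern` are illustrative names, not part of the
    original notebook.
    """
    paths = sorted(Path(folder).glob(pattern))  # sorted so months stay in order
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    frames = [pd.read_csv(path, encoding="utf-8") for path in paths]
    # ignore_index=True relabels rows 0..n-1 instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)
```

Calling `load_trip_files(r'E:\divvy trips')` would pick up both the `-tripdata` and the `-publictripdata` file names, since the pattern only requires `-divvy-` in the name.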

In [3]:
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,550CF7EFEAE0C618,electric_bike,2022-08-07 21:34:15,2022-08-07 21:41:46,,,,,41.93,-87.69,41.94,-87.72,casual
1,DAD198F405F9C5F5,electric_bike,2022-08-08 14:39:21,2022-08-08 14:53:23,,,,,41.89,-87.64,41.92,-87.64,casual
2,E6F2BC47B65CB7FD,electric_bike,2022-08-08 15:29:50,2022-08-08 15:40:34,,,,,41.97,-87.69,41.97,-87.66,casual
3,F597830181C2E13C,electric_bike,2022-08-08 02:43:50,2022-08-08 02:58:53,,,,,41.94,-87.65,41.97,-87.69,casual
4,0CE689BB4E313E8D,electric_bike,2022-08-07 20:24:06,2022-08-07 20:29:58,,,,,41.85,-87.65,41.84,-87.66,casual


In [4]:
df.shape

(5723606, 13)

In [5]:
df.isnull().sum()

ride_id                    0
rideable_type              0
started_at                 0
ended_at                   0
start_station_name    868772
start_station_id      868904
end_station_name      925008
end_station_id        925149
start_lat                  0
start_lng                  0
end_lat                 6102
end_lng                 6102
member_casual              0
dtype: int64

In [6]:
df.duplicated().sum()

0

# Prepare

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5723606 entries, 0 to 767649
Data columns (total 13 columns):
 #   Column              Dtype  
---  ------              -----  
 0   ride_id             object 
 1   rideable_type       object 
 2   started_at          object 
 3   ended_at            object 
 4   start_station_name  object 
 5   start_station_id    object 
 6   end_station_name    object 
 7   end_station_id      object 
 8   start_lat           float64
 9   start_lng           float64
 10  end_lat             float64
 11  end_lng             float64
 12  member_casual       object 
dtypes: float64(4), object(9)
memory usage: 611.3+ MB


In [8]:
df['started_at'] = pd.to_datetime(df['started_at'])
df['ended_at'] = pd.to_datetime(df['ended_at'])
print(df.info())
print("data type changed successfully")
print('\n')

<class 'pandas.core.frame.DataFrame'>
Index: 5723606 entries, 0 to 767649
Data columns (total 13 columns):
 #   Column              Dtype         
---  ------              -----         
 0   ride_id             object        
 1   rideable_type       object        
 2   started_at          datetime64[ns]
 3   ended_at            datetime64[ns]
 4   start_station_name  object        
 5   start_station_id    object        
 6   end_station_name    object        
 7   end_station_id      object        
 8   start_lat           float64       
 9   start_lng           float64       
 10  end_lat             float64       
 11  end_lng             float64       
 12  member_casual       object        
dtypes: datetime64[ns](2), float64(4), object(7)
memory usage: 611.3+ MB
None
data type changed successfully




# Process / Extracting Feature 

Create a new column that holds the calculated ride length in whole minutes

In [9]:
df['ride_length'] = (df['ended_at'] - df['started_at'])/pd.Timedelta(minutes=1)
df['ride_length'] = df['ride_length'].astype('int64')
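As a small illustration of the ride-length calculation, the sketch below applies the same idea to two rides taken from the tables in this notebook. It uses `.dt.total_seconds() / 60`, which is an equivalent way of expressing the division by `pd.Timedelta(minutes=1)`; the `rides` frame is a toy example. Note that `astype('int64')` truncates toward zero rather than rounding.

```python
import pandas as pd

# Two rides copied from the outputs in this notebook (toy frame for illustration)
rides = pd.DataFrame({
    "started_at": pd.to_datetime(["2022-08-07 21:34:15", "2022-11-06 01:58:11"]),
    "ended_at": pd.to_datetime(["2022-08-07 21:41:46", "2022-11-06 01:00:12"]),
})

# Duration in fractional minutes: 7.52 and -57.98
minutes = (rides["ended_at"] - rides["started_at"]).dt.total_seconds() / 60

# Casting to int64 truncates toward zero, so 7.52 -> 7 and -57.98 -> -57
rides["ride_length"] = minutes.astype("int64")
print(rides["ride_length"].tolist())  # [7, -57]
```

Because of the truncation, any ride shorter than 60 seconds ends up with a ride length of 0.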

In [10]:
df.sort_values(by= 'ride_length')

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,ride_length
417795,E137518FFE807752,electric_bike,2022-09-28 11:04:32,2022-09-21 06:31:11,Cornell Dr & Hayes Dr,653,,,41.780576,-87.585171,41.780000,-87.590000,member,-10353
154251,918F745F62CAC29E,classic_bike,2022-10-13 14:42:10,2022-10-13 11:53:28,Wilton Ave & Diversey Pkwy*,chargingstx0,Wilton Ave & Diversey Pkwy*,chargingstx0,41.932418,-87.652705,41.932418,-87.652705,member,-168
367827,8B6E5BA70093AAB7,electric_bike,2023-06-02 19:29:06,2023-06-02 18:28:51,,,Calumet Ave & 18th St,13102,41.860000,-87.620000,41.857618,-87.619411,casual,-60
174725,1BA46F9F216F5E17,electric_bike,2022-11-06 01:58:11,2022-11-06 01:00:12,Pine Grove Ave & Waveland Ave,TA1307000150,Broadway & Cornelia Ave,13278,41.949382,-87.646576,41.945529,-87.646439,casual,-57
246644,B5602D5BB3D517F6,electric_bike,2022-11-06 01:59:05,2022-11-06 01:02:03,Western Ave & Winnebago Ave,13068,California Ave & Milwaukee Ave,13084,41.915592,-87.687070,41.922695,-87.697153,member,-57
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
605139,F802275985D8FF81,docked_bike,2023-07-05 09:31:29,2023-08-02 04:56:32,McCormick Place,TA1305000004,,,41.851375,-87.618835,,,casual,40045
405002,7D4CB0DD5137CA9A,docked_bike,2022-10-01 15:04:38,2022-10-30 08:51:53,St. Louis Ave & Fullerton Ave,KA1504000090,,,41.924816,-87.714495,,,casual,41387
640106,F354C5AABB811F6D,docked_bike,2023-07-07 14:38:56,2023-08-06 04:07:41,Field Museum,13029,,,41.865312,-87.617867,,,casual,42568
614958,15B627A79323EEA0,docked_bike,2023-07-01 13:38:11,2023-07-31 04:06:43,Millennium Park,13008,,,41.881032,-87.624084,,,casual,42628


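Some ride_length values are negative, which means the recorded end time precedes the start time.
A ride should not last less than 1 minute, so such records are potentially false.
Rides shorter than 1 minute, including the negative ones, are removed, and the filter also removes rides longer than 480 minutes (8 hours).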


In [11]:
df1 = df[(df['ride_length'] < 1) | (df['ride_length'] > 480)]
df1 = df1.index
df = df.drop(df1, axis = 'index')
print(df.sort_values(by = 'ride_length'))
print('\n')
print('inconsistent values are deleted successfully')

                 ride_id  rideable_type          started_at  \
365132  7CA02490118563BB  electric_bike 2022-10-07 23:53:46   
246156  3547DF4B0E5C0EFB  electric_bike 2022-10-04 19:37:27   
644007  B9A5E89761413DF2   classic_bike 2023-06-30 15:08:04   
112724  F28C2254EC1323C1  electric_bike 2022-08-16 06:52:41   
371373  39AA7F2A64C21EDD   classic_bike 2023-05-07 12:45:45   
...                  ...            ...                 ...   
351590  88CB7FF29B13C73C  electric_bike 2022-08-10 11:50:35   
762983  3391636DF1BEAD02  electric_bike 2022-08-22 12:49:42   
192929  14701F05474937FC  electric_bike 2023-04-24 18:40:03   
425156  DAE15AFB990DBC2B  electric_bike 2022-10-04 11:14:24   
290873  F86271658A29F70E   classic_bike 2022-10-09 23:26:13   

                  ended_at             start_station_name start_station_id  \
365132 2022-10-07 23:54:51      Damen Ave & Charleston St            13288   
246156 2022-10-04 19:39:19                            NaN              NaN   
644007 20
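One thing worth checking in the step above: the concatenated frame keeps each monthly file's own index (the `df.info()` output shows 5,723,606 entries labelled 0 to 767649), so index labels repeat across months. Dropping by label then removes every row that shares a label with a flagged ride. The sketch below reproduces the effect on toy data and shows a boolean mask as an alternative; the frames `aug`, `sep` and `trips` are made up for illustration.

```python
import pandas as pd

# Two toy "monthly" frames; each keeps its own 0-based index, as in the concat above
aug = pd.DataFrame({"ride_id": ["A1", "A2"], "ride_length": [12, -5]})
sep = pd.DataFrame({"ride_id": ["S1", "S2"], "ride_length": [30, 9]})
trips = pd.concat([aug, sep])  # index labels are 0, 1, 0, 1

bad = (trips["ride_length"] < 1) | (trips["ride_length"] > 480)

# Dropping by label also removes S2, which shares label 1 with the invalid ride A2
by_label = trips.drop(trips[bad].index)

# A boolean mask judges each row on its own values
by_mask = trips[~bad]

print(len(by_label), len(by_mask))  # 2 3
```

If the row count after cleaning looks lower than expected, comparing it with `(~bad).sum()` on the full frame is a quick way to see whether this applies.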

In [12]:
from math import radians, cos, sin, asin, sqrt

In [13]:
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the Earth (specified in decimal degrees)
    """
    # Convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # Haversine formula
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
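    c = 2 * asin(sqrt(a))  # math.sqrt is enough because the inputs are scalars
    r = 6371  # Radius of Earth in kilometers
    return c * r


# Apply the function row by row to get the trip distance in kilometers
df['trip_distance'] = df.apply(lambda x: haversine(x['start_lng'], x['start_lat'], x['end_lng'], x['end_lat']), axis=1)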
print(df.filter(items = ['trip_distance']).head(100))
print('\n')
print('distance calculated')

     trip_distance
0         2.719286
1         3.335848
3         4.697735
4         1.386576
5         4.160656
..             ...
172       1.111949
173       4.447797
175       1.386808
176       1.386576
178       4.282782

[100 rows x 1 columns]


distance calculated
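Applying a Python function row by row can be slow on several million rides. The same haversine formula can be evaluated on whole columns at once with NumPy. The sketch below is one way to do that; `haversine_np` and `radius_km` are hypothetical names, and it is offered as an alternative rather than as what this notebook ran.

```python
import numpy as np


def haversine_np(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in kilometers for arrays of decimal-degree coordinates.

    `haversine_np` is an illustrative name, not part of the original notebook.
    """
    # Convert every input to a float array in radians
    lon1, lat1, lon2, lat2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lon1, lat1, lon2, lat2)
    )
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))
```

With the DataFrame from this notebook, the call would look like `df['trip_distance'] = haversine_np(df['start_lng'], df['start_lat'], df['end_lng'], df['end_lat'])`. Rows with a missing end coordinate yield `NaN`, as they do in the row-wise version.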


Extract the hour of day at which each ride started into a new column

In [14]:
df['start_hour'] = df['started_at'].dt.hour
print(df.filter(items = ['start_hour']))
print(df.sort_values(by = 'start_hour'))

        start_hour
0               21
1               14
3                2
4               20
5               13
...            ...
767644          15
767645          19
767647          13
767648          20
767649          18

[4514090 rows x 1 columns]
                 ride_id  rideable_type          started_at  \
333813  EE1E5CDBB2A8E23B  electric_bike 2022-08-01 00:08:57   
347817  2FC1C8911BC66745   classic_bike 2023-06-19 00:12:16   
131896  4DE36FB8ACA539A0   classic_bike 2023-06-22 00:02:52   
233982  CB49493C89625FCF  electric_bike 2022-11-20 00:12:18   
331095  9469B2D12614455D  electric_bike 2023-07-15 00:19:02   
...                  ...            ...                 ...   
474988  C43D2B5B87D1ADF3  electric_bike 2022-08-15 23:28:02   
509361  9D990E14234D32D5   classic_bike 2023-06-19 23:36:51   
217100  9FE587DD5515E004  electric_bike 2022-10-10 23:40:12   
474883  DED095A028364764  electric_bike 2022-08-27 23:41:06   
502588  043252E189B4E809   classic_bike 2023-06-23 

Extract the day of the week into a new column

In [15]:
df['start_day'] = df['started_at'].dt.dayofweek

In [16]:
day = {6: 'sunday', 0: 'monday', 1: 'tuesday', 2: 'wednesday', 3: 'thursday', 4: 'friday', 5: 'saturday'}

def map_day(x):
    try:
        return day[x]
    except KeyError:
        return "Invalid Day"  # Handle unexpected values

df['start_day'] = df['start_day'].apply(map_day)

Extract the month of the year into a new column

In [17]:
df['month_num'] = df['started_at'].dt.month
month = {1:'january', 2:'february', 3:'march', 4:'april', 5:'may', 6:'june', 7:'july', 8:'august', 9:'september', 10:'october', 11:'november', 12:'december'}
df['start_month'] = df['month_num'].apply(lambda y:month[y])
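The day and month lookups above can also be expressed with the pandas datetime accessors `dt.day_name()` and `dt.month_name()`, which return English names by default. The sketch below shows this on two timestamps taken from this notebook; lowercasing with `.str.lower()` is added here only to match the labels used in the dictionaries above.

```python
import pandas as pd

# Two start times copied from rows shown in this notebook
started = pd.Series(pd.to_datetime(["2022-08-07 21:34:15", "2022-11-21 17:56:05"]))

# Name of the weekday and month, lowercased to match the notebook's labels
start_day = started.dt.day_name().str.lower()
start_month = started.dt.month_name().str.lower()

print(start_day.tolist())    # ['sunday', 'monday']
print(start_month.tolist())  # ['august', 'november']
```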

In [18]:
df.isnull().sum()

ride_id                    0
rideable_type              0
started_at                 0
ended_at                   0
start_station_name    681467
start_station_id      681562
end_station_name      731222
end_station_id        731332
start_lat                  0
start_lng                  0
end_lat                  455
end_lng                  455
member_casual              0
ride_length                0
trip_distance            455
start_hour                 0
start_day                  0
month_num                  0
start_month                0
dtype: int64

In [19]:
df.dropna(inplace=True)

In [20]:
df.shape

(3411039, 19)

In [21]:
df.isnull().sum()

ride_id               0
rideable_type         0
started_at            0
ended_at              0
start_station_name    0
start_station_id      0
end_station_name      0
end_station_id        0
start_lat             0
start_lng             0
end_lat               0
end_lng               0
member_casual         0
ride_length           0
trip_distance         0
start_hour            0
start_day             0
month_num             0
start_month           0
dtype: int64

# Analyze and Visualize

In [22]:
import matplotlib.pyplot as plt
import seaborn as sns 

import plotly.express as ps


Total number of rides taken by members and by casual riders

In [23]:
df.sample(2)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,ride_length,trip_distance,start_hour,start_day,month_num,start_month
52472,85FFA4444AC2F036,electric_bike,2022-11-21 17:56:05,2022-11-21 17:59:48,Greenview Ave & Fullerton Ave,TA1307000001,Racine Ave & Fullerton Ave (Temp),TA1306000026,41.92544,-87.665882,41.925194,-87.657259,member,3,0.713922,17,monday,11,november
184087,58F94BD1CB470FCB,electric_bike,2022-10-06 08:53:53,2022-10-06 08:58:27,N Green St & W Lake St,20246.0,Franklin St & Lake St,TA1307000111,41.89,-87.65,41.885837,-87.6355,casual,4,1.286468,8,thursday,10,october


In [24]:
rides = df.groupby(['member_casual'])
rides = rides['ride_id'].count()
print(rides)

member_casual
casual    1278122
member    2132917
Name: ride_id, dtype: int64


In [25]:
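# Plot from the aggregated counts rather than passing the full DataFrame
fig = ps.pie(rides.reset_index(name='total_rides'), values='total_rides', names='member_casual', title='Rides Count')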

In [26]:
fig.show()

Types of bikes used by member and casual riders

In [27]:
bike_types = df.groupby(['member_casual', 'rideable_type'])
bike_types = bike_types['ride_id'].count().reset_index().set_index('member_casual')
print(bike_types)

               rideable_type  ride_id
member_casual                        
casual          classic_bike   629047
casual           docked_bike   100465
casual         electric_bike   548610
member          classic_bike  1331993
member         electric_bike   800924


In [28]:
data = pd.DataFrame([['casual', 'classic bike', 629047], ['casual', 'docked bike', 100465], ['casual', 'electric bike', 548610],
                    ['member', 'classic bike', 1331993], ['member', 'docked bike', 0], ['member', 'electric bike', 800924]],
                    columns = ['member_casual', 'rideable_type', 'total_rides'])
fig2= ps.bar(data, x = 'rideable_type', y = 'total_rides', color = 'member_casual', barmode = 'group', 
             title = 'bike types',
             color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})
fig2.show()
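The chart data above is typed in by hand, including the zero for members on docked bikes. A possible alternative is to build the table from the groupby result and fill in missing combinations automatically. The sketch below shows the idea on a toy frame; `trips`, `all_pairs` and the column name `total_rides` are illustrative.

```python
import pandas as pd

# Toy frame with the same columns the notebook groups on
trips = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3", "r4"],
    "member_casual": ["casual", "casual", "member", "member"],
    "rideable_type": ["docked_bike", "classic_bike", "classic_bike", "electric_bike"],
})

counts = trips.groupby(["member_casual", "rideable_type"])["ride_id"].count()

# Every rider type paired with every bike type, including pairs with no rides
all_pairs = pd.MultiIndex.from_product(
    [sorted(trips["member_casual"].unique()), sorted(trips["rideable_type"].unique())],
    names=["member_casual", "rideable_type"],
)

# Missing pairs (here: member on docked_bike) get a count of 0
data = counts.reindex(all_pairs, fill_value=0).reset_index(name="total_rides")
print(data)
```

The resulting `data` frame has the same three columns as the hand-built one, so it could be passed to the bar chart unchanged.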

Total trips taken in each month

In [30]:
month = df.groupby(['member_casual', 'start_month','month_num'])
month = month['ride_id'].count().reset_index().sort_values(by='month_num')
print(month)

   member_casual start_month  month_num  ride_id
4         casual     january          1    21257
16        member     january          1    85156
3         casual    february          2    23436
15        member    february          2    83837
7         casual       march          3    34667
19        member       march          3   113242
0         casual       april          4    87496
12        member       april          4   168708
20        member         may          5   225989
8         casual         may          5   141341
6         casual        june          6   175855
18        member        june          6   250383
5         casual        july          7   196037
17        member        july          7   260659
1         casual      august          8   221615
13        member      august          8   274417
11        casual   september          9   179421
23        member   september          9   254019
22        member     october         10   205832
10        casual    

In [31]:
data1 = pd.DataFrame([['casual','january',21257],['member', 'january', 85156],['casual','february', 23436],
['member','february',83837],['casual',  'march',34667],['member', 'march',113242],
['casual','april',87496],['member','april',168708],['member',  'may',225989],
['casual',  'may',141341],['casual' , 'june',175855],
['member' , 'june',250383],['casual','july',196037],
['member','july',260659],['casual','august',221615],
['member','august',274417],['casual','september',179421],
['member','september',254019],['member', 'october',205832],
['casual','october',119046],['casual','november',55443],
['member','november',137052],['casual','december',22508],
['member','december',73623]],
                     columns = ['member_casual', 'start_month', 'total_rides'])
ps.line(data1, x = 'start_month', y = 'total_rides', color = 'member_casual', line_shape = 'spline', markers = True, title = 'Trips taken in each month') 

Total trips taken in each day of week

In [32]:
week = df.groupby(['member_casual', 'start_day'])
week = week['ride_id'].count()
print(week)

member_casual  start_day
casual         friday       194476
               monday       152321
               saturday     261383
               sunday       199094
               thursday     168352
               tuesday      149273
               wednesday    153223
member         friday       305780
               monday       307277
               saturday     268456
               sunday       228233
               thursday     342732
               tuesday      337280
               wednesday    343159
Name: ride_id, dtype: int64


In [33]:
# Build the plotting data from the groupby result above so the chart matches the printed counts
data2 = week.reset_index(name='total_rides')

ps.line(data2, x = 'start_day', y = 'total_rides', color = 'member_casual', 
        markers = True, line_shape = 'spline', title = 'Trips taken in each day of week')
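Grouping by day name sorts the days alphabetically (friday, monday, saturday, ...), which makes a line chart harder to read. One way to restore calendar order is an ordered categorical. The sketch below uses three of the casual-rider counts printed above; `day_order` and `counts` are illustrative names.

```python
import pandas as pd

day_order = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Three casual-rider counts from the output above, in alphabetical order
counts = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual"],
    "start_day": ["friday", "monday", "saturday"],
    "total_rides": [194476, 152321, 261383],
})

# An ordered categorical sorts by position in day_order instead of alphabetically
counts["start_day"] = pd.Categorical(counts["start_day"], categories=day_order, ordered=True)
counts = counts.sort_values(["member_casual", "start_day"]).reset_index(drop=True)

print(counts["start_day"].tolist())  # ['monday', 'friday', 'saturday']
```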

Total trips taken in each hour of the day

In [35]:
hour = df.groupby(['member_casual', 'start_hour'])
hour = hour['ride_id'].count()
# Build the plotting data from the groupby result instead of hand-typed counts
data3 = hour.reset_index(name='rides_hour')

ps.line(data3, x = 'start_hour', y = 'rides_hour', color = 'member_casual',
        markers = True, title = 'Trips taken in a day', line_shape = 'spline')


Average distance traveled and duration of the ride across the year

In [36]:
duration = df.groupby(['member_casual'])
duration = round(duration['ride_length'].mean(), 2)
print("average duration:", duration)
print('\n')

distance = df.groupby(['member_casual'])
distance = round(distance['trip_distance'].mean(), 2)
print("average distance:", distance)

average duration: member_casual
casual    20.99
member    11.62
Name: ride_length, dtype: float64


average distance: member_casual
casual    2.17
member    2.09
Name: trip_distance, dtype: float64


In [39]:
# Use the averages computed above so the chart matches the printed values
data4 = duration.reset_index()

fig3 = ps.bar(data4, x = 'ride_length', y = 'member_casual', color = 'member_casual', title = 'Average ride Duration',
           color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})

fig3.show()    

# Use the averages computed above so the chart matches the printed values
data5 = distance.reset_index(name='distance')

fig4 = ps.bar(data5, x = 'distance', y = 'member_casual', color = 'member_casual', title = 'Average distance traveled',
           color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})

fig4.show()

Average distance traveled and duration of the ride across the week

In [40]:
duration_day = df.groupby(['member_casual', 'start_day'])
duration_day = round(duration_day['ride_length'].mean(), 2)
print("average duration in a week:", duration_day)

average duration in a week: member_casual  start_day
casual         friday       20.38
               monday       20.81
               saturday     23.85
               sunday       24.00
               thursday     18.37
               tuesday      19.10
               wednesday    17.90
member         friday       11.52
               monday       11.07
               saturday     13.13
               sunday       12.95
               thursday     11.21
               tuesday      11.10
               wednesday    11.08
Name: ride_length, dtype: float64


In [42]:
# Use the daily averages computed above so the chart matches the printed values
data6 = duration_day.reset_index()
fig5 = ps.bar(data6, x = 'start_day', y = 'ride_length', color = 'member_casual', title = 'Average ride Duration in a week',
              barmode = 'group')
fig5.show()

In [44]:
distance_day = df.groupby(['member_casual', 'start_day'])
distance_day = round(distance_day['trip_distance'].mean(), 2)
print("average distance in a week:", distance_day)

average distance in a week: member_casual  start_day
casual         friday       2.15
               monday       2.07
               saturday     2.28
               sunday       2.21
               thursday     2.17
               tuesday      2.09
               wednesday    2.14
member         friday       2.04
               monday       2.02
               saturday     2.18
               sunday       2.13
               thursday     2.09
               tuesday      2.08
               wednesday    2.11
Name: trip_distance, dtype: float64


In [46]:
# Use the daily averages computed above so the chart matches the printed values
data7 = distance_day.reset_index(name='distance')

fig6 = ps.bar(data7, x = 'start_day', y = 'distance', color = 'member_casual', 
              title = 'Average distance traveled in a week',
              barmode = 'group', color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})

fig6.show()

# Observation

Cyclistic members took more trips than casual riders from August 2022 to July 2023: 2,132,917 versus 1,278,122 in the cleaned data.

Casual riders have a higher mean trip duration (20.99 minutes versus 11.62 minutes) and a slightly higher mean distance traveled (2.17 km versus 2.09 km) than Cyclistic members, and the pattern holds on every day of the week. This suggests that most Cyclistic members use bike-share for purpose-oriented rides such as commuting or errands, whereas casual riders may use it for leisure and exercise.

Both groups recorded the most activity in the late afternoon, around 5 PM. Members also show a second peak at 8 AM, which indicates the use of bike-share services to commute to work.

Casual riders are busiest over the weekend, with the most rides on Saturday, which indicates recreational use. Members ride consistently throughout the week, with the most rides from Tuesday to Thursday and the fewest on Sunday.

Both groups take the most trips in summer, from June to September, with a peak in August. Ride counts fall from October through February. This decline is most likely due to the weather, as temperatures drop.

# Recommendations

Introducing a loyalty membership or reward points program would likely be the most enticing way to attract users.

Casual riders record longer trip durations and distances than members. We can therefore recommend the annual membership by showing casual riders their total spending on the rides they have taken, which may persuade them to convert.

Casual riders ride much more on weekends than on weekdays, so weekend promotions could help convert them to annual members.

Finally, we can promote reviews from our members to make casual riders aware of why they might need an annual membership. We can also host meetups at different locations to highlight our members and casual riders, with small rewards.

We can also offer casual riders an EMI (equated monthly installment) payment option if they need it.
