# Case Study 1: How Does a Bike-Share Navigate Speedy Success?

## Scenario
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director
of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore,
your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights,
your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives
must approve your recommendations, so they must be backed up with compelling data insights and professional data
visualizations.

### Business Task
The objective is to maximize the number of annual memberships by converting casual riders. 
Answering the following questions would give insights to possible ways that can be done.

1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

### All about the Dataset
[Cyclistic bikes Dataset](https://divvy-tripdata.s3.amazonaws.com/index.html)  
The datasets have a different name because Cyclistic is a fictional company.  
The data is organised per month for every year period in zipped files.  
I will be using a 12 month period between 2021-10 to 2022-09 for this project.
The data is made available by Motivate International Inc. [License](https://www.divvybikes.com/data-license-agreement)  
The dataset is current and downloaded on a local device for backup and usage.

### Cleaning the Dataset and Analyzing
* PostgreSQL server: created a database for the project with all data in one table using the import function on pgAdmin.    
* cleaned the dataset with SQL, dropped columns start_lng,end_lng, start_lat,end_lat as they are not needed for the analysis, removed null values.
* connected PostgreSQL server to Power BI for visualization and reporting

#### Import Libraries

In [37]:
# pip install session info(to generate a requirements.txt file)
# pip install ipython-sql
#import psycopg2
import os
import session_info
#import pandas as pd
#from sqlalchemy import create_engine

# loads ipython-sql 
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [3]:
# this shows the packages required for the project.
session_info.show()

#### Establish a connection to database 

In [39]:
# Format to connect to postgreSQL with ipython
# %sql dialect+driver://username:password@host:port/database

%sql postgresql://***:****@localhost/cyclistic_bikes
print("Database connection established")

Database connection established


#### Query Database

In [45]:
%%sql
SELECT count(*)
    FROM trip_data;

 * postgresql://postgres:***@localhost/cyclistic_bikes
1 rows affected.


count
4431356


In [41]:
%%sql 
SELECT *
    FROM trip_data
    LIMIT 2;

 * postgresql://postgres:***@localhost/cyclistic_bikes
2 rows affected.


ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual,duration
BEB2AFF259E60953,classic_bike,2022-08-27 07:32:02,2022-08-27 07:41:34,Morgan St & 18th St,Loomis St & Lexington St,member,9.5
1B7E975DFE3FCA86,classic_bike,2022-06-29 20:14:00,2022-06-29 20:21:00,Wabash Ave & Roosevelt Rd,Calumet Ave & 18th St,member,12.0


Remove redundant columns from the table 

In [None]:
%%sql
DELETE FROM trip_data
	WHERE end_station_name IS NULL;

In [None]:
%%sql	
ALTER TABLE trip_data
	DROP COLUMN end_station_id;

In [None]:
%%sql
ALTER TABLE trip_data
	ADD COLUMN duration DECIMAL(6,1);

Create a new column duration extracted from the date time columns on the table 

In [None]:
%%sql
UPDATE trip_data
	SET duration = subquery.duration 
	FROM (SELECT started_at,((EXTRACT(EPOCH FROM (ended_at - started_at)))/60) AS duration FROM trip_data) AS subquery
	WHERE trip_data.started_at = subquery.started_at;

In [None]:
DELETE FROM trip_data
	WHERE duration <=0;

##### Total Number of Riders Per Station

In [35]:
%%sql 
SELECT start_station_name AS "STATION NAME",
    COUNT(CASE WHEN member_casual = 'member' THEN 1 ELSE NULL END) AS "MEMBERS PER STATION",
    COUNT(CASE WHEN member_casual = 'casual' THEN 1 ELSE NULL END) AS "CASUALS PER STATION"
    FROM trip_data_2021_10
    GROUP BY 1
    ORDER BY 3 DESC
    LIMIT 3;

 * postgresql://postgres:***@localhost/cyclistic_bikes
3 rows affected.


STATION NAME,MEMBERS PER STATION,CASUALS PER STATION
Streeter Dr & Grand Ave,17164,58424
DuSable Lake Shore Dr & Monroe St,9589,32183
Millennium Park,9580,26900


In [36]:
%%sql
SELECT rideable_type AS "TYPE OF RIDE USED",
    COUNT(CASE WHEN member_casual = 'member' THEN 1 ELSE NULL END) AS "MEMBERS",
    COUNT(CASE WHEN member_casual = 'casual' THEN 1 ELSE NULL END) AS "CASUALS"
    FROM trip_data_2021_10    
    GROUP BY rideable_type

 * postgresql://postgres:***@localhost/cyclistic_bikes
3 rows affected.


TYPE OF RIDE USED,MEMBERS,CASUALS
classic_bike,1793577,939374
docked_bike,0,192018
electric_bike,1081649,870021


####  Connected Cleanded Data to Microsioft Power BI for visualization and Report 

In [None]:
casual = IF('public trip_data'[member_casual] = "casual" , 1,0) 