![](https://media.giphy.com/media/l0HluULNylbTu44Ao/giphy.gif)
## Introduction

Take an old bicycle. Paint it white. Leave it anywhere in the city. Tell people to use it. This was the first urban bike-sharing concept in history. Launched in Amsterdam in the 1960s, it was called the Witte Fietsenplan (the “white bicycle plan”). And it was not a great success.

In this notebook, we will answer the following questions:
- Which stations are the most popular?
- What are the peak hours of bike usage?
- How do holidays and events affect bike usage?
- What is the likely purpose of rentals for casual users and for subscribers?
- How does each station's popularity change over time?

Company: [Divvy](https://divvybikes.com)<br>
Location: Chicago

The dataset is provided by [Divvy](https://divvybikes.com), compiled and cleaned by [Chris](https://github.com/ca-ros) (visit [documentation](https://github.com/ca-ros/divvy-bikeshare/blob/master/data%20wrangling/README.md)).

In [14]:
# Import required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## PostgreSQL Integration 

In [4]:
## Install libraries
# pip install ipython-sql
# pip install sqlalchemy
# pip install psycopg2

# load ipython-sql
%load_ext sql

# Import required function
from sqlalchemy import create_engine

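# string format: "postgresql://username:password@host:port/database"
# Note: create_engine() is lazy, so no connection is opened until the first query runs;
# the message below only confirms that the engine object was created.
engine = create_engine('postgresql://postgres:password@localhost/postgres')

print('Connected')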

Connected
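
The connection string above follows the format `postgresql://username:password@host:port/database`. As a minimal sketch, the helper below assembles such a string from its parts. The function name `build_postgres_url`, its default values, and the sample password are hypothetical and not part of the original notebook. The one design choice worth noting is that the username and password are percent-encoded with `urllib.parse.quote_plus`, since characters such as `@` or `/` in a password would otherwise be misread as URL delimiters.

```python
from urllib.parse import quote_plus


def build_postgres_url(username, password, host="localhost", port=5432, database="postgres"):
    """Assemble a URL of the form postgresql://username:password@host:port/database.

    Hypothetical helper for illustration; 5432 is the default PostgreSQL port.
    """
    # quote_plus escapes characters such as '@' or '/' that would break URL parsing
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"postgresql://{user}:{secret}@{host}:{port}/{database}"


# Example with a made-up password containing special characters
url = build_postgres_url("postgres", "p@ss/word")
print(url)
```

The resulting string can be passed to `create_engine` in place of the hard-coded one.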


In [None]:
%%sql
SELECT * FROM trips LIMIT 5

 * postgresql://postgres:***@localhost/postgres
5 rows affected.


| ride_id | rideable_type | bike_id | start_time | end_time | trip_duration | start_station_id | start_station_name | end_station_id | end_station_name | user_type | gender | birth_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3940 |  | 914 | 2013-06-27 01:06:00 | 2013-06-27 09:46:00 | 31177 | 91 | Clinton St & Washington Blvd | 48 | Larrabee St & Kingsbury St | Subscriber | Male | 1982.0 |
| 4095 |  | 480 | 2013-06-27 12:06:00 | 2013-06-27 12:11:00 | 301 | 85 | Michigan Ave & Oak St | 85 | Michigan Ave & Oak St | Subscriber | Male | 1982.0 |
| 4113 |  | 711 | 2013-06-27 11:09:00 | 2013-06-27 11:11:00 | 140 | 88 | Racine Ave & Randolph St | 88 | Racine Ave & Randolph St | Subscriber | Male | 1982.0 |
| 4118 |  | 480 | 2013-06-27 12:11:00 | 2013-06-27 12:16:00 | 316 | 85 | Michigan Ave & Oak St | 28 | Larrabee St & Menomonee St | Customer |  |  |
| 4119 |  | 711 | 2013-06-27 11:12:00 | 2013-06-27 11:13:00 | 87 | 88 | Racine Ave & Randolph St | 88 | Racine Ave & Randolph St | Subscriber | Male | 1982.0 |


## Overview of data

In [None]:
data = pd.read_csv("trips.csv") 



### What do we see?
- pandas warned that columns 0, 1, and 11 (`ride_id`, `rideable_type`, and `gender`) contain mixed data types. To resolve this, we will explicitly declare these three columns as strings when reading the file.

In [5]:
data = pd.read_csv(
    "C:/Users/Chris/Documents/GitHub/large csv files/divvy-bikeshare/trips.csv", 
    dtype = {
        'ride_id': str, 
        'rideable_type': str, 
        'gender': str})

In [31]:
def overview(df):
    """Print a quick summary of df and return it unchanged."""
    print("The first 5 rows of data are:\n")
    print(df.head(5))
    print("\n\n\nDataset has {} rows and {} columns".format(df.shape[0], df.shape[1]))
    print("\n\n\nDatatype: \n")
    print(df.dtypes)
    print("\n\n\nThe number of null values for each column are: \n")
    print(df.isnull().sum())
    print("\n\n\nData summary: \n")
    print(df.describe())
    return df

# Run the overview on the DataFrame loaded above
data = overview(data)

The first 5 rows of data are:

  ride_id rideable_type  bike_id           start_time             end_time  \
0    3940           NaN    914.0  2013-06-27 01:06:00  2013-06-27 09:46:00   
1    4095           NaN    480.0  2013-06-27 12:06:00  2013-06-27 12:11:00   
2    4113           NaN    711.0  2013-06-27 11:09:00  2013-06-27 11:11:00   
3    4118           NaN    480.0  2013-06-27 12:11:00  2013-06-27 12:16:00   
4    4119           NaN    711.0  2013-06-27 11:12:00  2013-06-27 11:13:00   

   trip_duration  start_station_id            start_station_name  \
0          31177              91.0  Clinton St & Washington Blvd   
1            301              85.0         Michigan Ave & Oak St   
2            140              88.0      Racine Ave & Randolph St   
3            316              85.0         Michigan Ave & Oak St   
4             87              88.0      Racine Ave & Randolph St   

   end_station_id            end_station_name   user_type gender  birth_year  
0           

### What do we see?
- Four columns are stored as floats rather than integers: `bike_id`, `start_station_id`, `end_station_id`, and `birth_year`. We will convert them to integers.
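
One possible way to do this conversion is sketched below. It assumes the float columns contain missing values, as `birth_year` does for the Customer row in the query output above. In that case a plain `astype(int)` raises an error, whereas pandas' nullable `Int64` dtype keeps the gaps as `<NA>`. The small `sample` frame and the helper name `to_nullable_int` are hypothetical stand-ins for the full dataset and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical miniature frame mimicking the float columns of the trips data
sample = pd.DataFrame({
    "bike_id": [914.0, 480.0, None],
    "birth_year": [1982.0, None, 1982.0],
})


def to_nullable_int(df, columns):
    """Return a copy of df with the given float columns cast to nullable Int64."""
    out = df.copy()
    for col in columns:
        # "Int64" (capital I) is pandas' nullable integer dtype; NaN becomes <NA>
        out[col] = out[col].astype("Int64")
    return out


converted = to_nullable_int(sample, ["bike_id", "birth_year"])
print(converted.dtypes)
```

On the real data, the same call could be applied to `bike_id`, `start_station_id`, `end_station_id`, and `birth_year`.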

## Summary