# BikeShare

I use a Jupyter Notebook for this project so that I can comment on the code as I go, in the spirit of literate programming.

First, I import the packages the project needs.

In [1]:
import time
import pandas as pd
import numpy as np


Next, I build the `CITY_DATA` dictionary, which maps each city name to the CSV file provided for it.

In [2]:
CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

I inspect the structure of the data by printing the first five rows of the Chicago file.

In [3]:
df = pd.read_csv("chicago.csv")
print(df.head())

   Unnamed: 0           Start Time             End Time  Trip Duration  \
0     1423854  2017-06-23 15:09:32  2017-06-23 15:14:53            321   
1      955915  2017-05-25 18:19:03  2017-05-25 18:45:53           1610   
2        9031  2017-01-04 08:27:49  2017-01-04 08:34:45            416   
3      304487  2017-03-06 13:49:38  2017-03-06 13:55:28            350   
4       45207  2017-01-17 14:53:07  2017-01-17 15:02:01            534   

                   Start Station                   End Station   User Type  \
0           Wood St & Hubbard St       Damen Ave & Chicago Ave  Subscriber   
1            Theater on the Lake  Sheffield Ave & Waveland Ave  Subscriber   
2             May St & Taylor St           Wood St & Taylor St  Subscriber   
3  Christiana Ave & Lawrence Ave  St. Louis Ave & Balmoral Ave  Subscriber   
4         Clark St & Randolph St  Desplaines St & Jackson Blvd  Subscriber   

   Gender  Birth Year  
0    Male      1992.0  
1  Female      1992.0  
2    Male     

The following function asks the user to specify a city, month and day to analyze.
It returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter.
It keeps prompting until each entry is valid.

In [4]:
def get_filters():
    """Asks the user for a city, month and day, re-prompting until each entry is valid."""
    # get user input for city; the while loop handles invalid inputs
    # strip() and lower() make the check tolerant of stray spaces and capital letters
    city = input('What city data would you like to analyze? Enter "chicago", "new york city", "washington" or "all": ').strip().lower()
    while city not in ("chicago", "new york city", "washington", "all"):
        city = input('This is not a valid input. Please enter "chicago", "new york city", "washington" or "all": ').strip().lower()
    print('Thank You! We will analyze data for {} for you'.format(city))

    # get user input for month (all, january, february, ... , december)
    month = input('What month would you like to analyze? Enter the name of the month or "all" for no month filter: ').strip().lower()
    while month not in ('january', 'february', 'march', 'april',
                        'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'all'):
        month = input('This is not a valid input. Enter the name of the month or "all" for no month filter: ').strip().lower()
    print('Thank You! We will analyze data for {} for you'.format(month))

    # get user input for day of week (all, monday, tuesday, ... sunday)
    day = input('What day of the week would you like to analyze? Enter the name of the day or "all" for no day filter: ').strip().lower()
    while day not in ('monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday', 'all'):
        day = input('This is not a valid input. Enter the name of the day or "all" for no day filter: ').strip().lower()
    print('Thank You! We will analyze data for {} for you'.format(day))

    print('-'*40)
    return city, month, day
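
The three prompt loops in `get_filters` share one pattern: read a line, normalize it, and ask again until the value is in an allowed set. A minimal sketch of that pattern as a single helper follows. `prompt_choice` and its `read` parameter are hypothetical names introduced here for illustration; they are not part of the project code.

```python
def prompt_choice(question, valid, read=input):
    """Ask `question` until the normalized answer is one of `valid`.

    `read` defaults to the built-in input(); passing another callable
    lets the helper run without a keyboard (an assumption of this sketch).
    """
    answer = read(question).strip().lower()
    while answer not in valid:
        answer = read('This is not a valid input. ' + question).strip().lower()
    return answer


# Simulate a user who first mistypes the city, then enters it correctly.
scripted = iter(['Chicgo', '  Chicago '])
city_choice = prompt_choice('Which city? ',
                            ('chicago', 'new york city', 'washington'),
                            read=lambda _: next(scripted))
print(city_choice)
```

With a helper like this, each of the three prompts would reduce to one call with its own question and tuple of valid answers.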






Next, I load the data for the selected city and apply the month and day filters the user entered.

In [5]:
def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
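        (str) city - name of the city to analyze, or "all" to combine all cities
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # load the data file(s) into a dataframe
    # get_filters() also offers "all", which is not a key of CITY_DATA,
    # so in that case every city file is read and the rows are combined
    if city == 'all':
        df = pd.concat([pd.read_csv(path) for path in CITY_DATA.values()],
                       ignore_index=True, sort=False)
    else:
        df = pd.read_csv(CITY_DATA[city])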
    
    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    
    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    # dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0
    df['day_of_week'] = df['Start Time'].dt.day_name()
    
    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
        month = months.index(month) + 1
        
        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    
    return df

city, month, day = get_filters()
load_data(city, month, day).head()


What city data would you like to analyze? Enter "chicago", "new york city", "washington" or "all": new york city
Thank You! We will analyze data for new york city for you
What month would you like to analyze? Enter the name of the month or "all" for no month filter: january
Thank You! We will analyze data for january for you
What day of the week would you like to analyze? Enter the name of the day or "all" for no day filter: friday
Thank You! We will analyze data for friday for you
----------------------------------------


Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year,month,day_of_week
72,445709,2017-01-20 19:01:02,2017-01-20 19:06:32,330,Stanton St & Chrystie St,MacDougal St & Prince St,Subscriber,Male,1983.0,1,Friday
212,279381,2017-01-13 20:19:24,2017-01-13 20:28:35,551,W 87 St & Amsterdam Ave,11 Ave & W 59 St,Subscriber,Male,1994.0,1,Friday
214,432007,2017-01-20 09:20:14,2017-01-20 09:27:36,441,W 39 St & 9 Ave,Pershing Square South,Subscriber,Male,1980.0,1,Friday
274,252422,2017-01-13 08:06:10,2017-01-13 08:23:32,1041,Carmine St & 6 Ave,Broadway & E 22 St,Subscriber,Male,1979.0,1,Friday
290,261652,2017-01-13 11:48:49,2017-01-13 12:00:39,709,York St & Jay St,Rivington St & Chrystie St,Subscriber,Male,1968.0,1,Friday
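
To see the two filters in isolation, the sketch below applies the same steps to a tiny hand-made DataFrame instead of a CSV file. The sample rows and the `filter_trips` name are made up for illustration. It uses `dt.day_name()` because `dt.weekday_name` is not available in recent pandas versions.

```python
import pandas as pd

MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
          'july', 'august', 'september', 'october', 'november', 'december']


def filter_trips(df, month='all', day='all'):
    """Return the rows of df that match the month and weekday filters."""
    start = pd.to_datetime(df['Start Time'])
    keep = pd.Series(True, index=df.index)
    if month != 'all':
        # list index 0 is january, so add 1 to get the calendar month number
        keep &= start.dt.month == MONTHS.index(month) + 1
    if day != 'all':
        keep &= start.dt.day_name() == day.title()
    return df[keep]


# Illustrative rows only; these are not taken from the project files.
trips = pd.DataFrame({
    'Start Time': ['2017-01-13 08:06:10',   # a Friday in January
                   '2017-01-14 09:00:00',   # a Saturday in January
                   '2017-02-03 10:30:00'],  # a Friday in February
    'Trip Duration': [1041, 300, 620],
})
january_fridays = filter_trips(trips, month='january', day='friday')
print(january_fridays)
```

Only the first row passes both filters, which mirrors the January and Friday selection in the run above.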


In [6]:
def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    # df comes from load_data(), so 'Start Time' is already a datetime column
    # ToDo: here I will have to adjust for month (etc.) names vs. indices...!
    # display the most common month (January=1, December=12)
    popular_month = df['Start Time'].dt.month.mode()[0]
    print('Most Popular Start Month:', popular_month)

    # display the most common day of week (Monday=0, Sunday=6)
    popular_day = df['Start Time'].dt.dayofweek.mode()[0]
    print('Most Popular Start Day:', popular_day)

    # display the most common start hour (0-23)
    popular_hour = df['Start Time'].dt.hour.mode()[0]
    print('Most Popular Start Hour:', popular_hour)


# pass the filtered data in explicitly instead of reloading it inside the function
time_stats(load_data(city, month, day))
print('-'*40)





Calculating The Most Frequent Times of Travel...

Most Popular Start Month: 1
Most Popular Start Day: 4
Most Popular Start Hour: 8
----------------------------------------
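
The numeric results above are indices: month 1 is January, and day 4 counts from Monday = 0, so it is Friday. One possible way to print names instead is the standard `calendar` module, sketched here. The helper names are hypothetical, and the names returned follow the current locale (English by default).

```python
import calendar


def month_label(month_number):
    """Map 1..12 to a month name (calendar.month_name[0] is an empty string)."""
    return calendar.month_name[month_number]


def day_label(day_index):
    """Map a pandas dayofweek index (Monday=0 .. Sunday=6) to a day name."""
    return calendar.day_name[day_index]


print('Most Popular Start Month:', month_label(1))
print('Most Popular Start Day:', day_label(4))
```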


In [7]:
def station_stats(df):
    """Displays statistics on the most popular stations and trip."""
    print('\nCalculating The Most Popular Stations and Trip...\n')

    # display most commonly used start station
    popular_ss = df['Start Station'].mode()[0]
    print('Most popular start station', popular_ss)

    # display most commonly used end station
    popular_es = df['End Station'].mode()[0]
    print('Most popular end station', popular_es)

    # display most frequent combination of start station and end station trip
    popular_trip = df.groupby(['Start Station', 'End Station']).size().nlargest(1)
    print('Most popular combination', popular_trip)


# pass the filtered data in explicitly instead of reloading it inside the function
station_stats(load_data(city, month, day))
print('-'*40)




Calculating The Most Popular Stations and Trip...

Most popular start station Pershing Square North
Most popular end station Pershing Square North
Most popular combination Start Station    End Station    
E 25 St & 1 Ave  1 Ave & E 44 St    5
dtype: int64
----------------------------------------
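
`nlargest(1)` prints a whole Series, including its index header and dtype. If only the station pair and its count are wanted, one option is `idxmax()` on the grouped sizes, sketched below on made-up rows. `most_common_trip` is a hypothetical helper, not part of the project code.

```python
import pandas as pd


def most_common_trip(df):
    """Return ((start, end), count) for the most frequent station pair."""
    counts = df.groupby(['Start Station', 'End Station']).size()
    pair = counts.idxmax()  # tuple of (start station, end station)
    return pair, int(counts.max())


# Illustrative rows only.
sample_trips = pd.DataFrame({
    'Start Station': ['A', 'A', 'B', 'A'],
    'End Station':   ['B', 'B', 'A', 'C'],
})
(start, end), count = most_common_trip(sample_trips)
print('Most popular combination: {} -> {} ({} trips)'.format(start, end, count))
```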


In [8]:
def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""
    # 'Trip Duration' is given in seconds
    # display total travel time
    total_duration = df['Trip Duration'].sum()
    print('Total travel time: ', total_duration)

    # display mean travel time
    average_duration = df['Trip Duration'].mean()
    print('Average travel time: ', average_duration)


# pass the filtered data in explicitly instead of reloading it inside the function
trip_duration_stats(load_data(city, month, day))
print('-'*40)



Total travel time:  3612092
Average travel time:  700.0178294573643
----------------------------------------
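
The trip durations are in seconds (the first Chicago row lasts 321 seconds, from 15:09:32 to 15:14:53), so the totals above are hard to read at a glance. A small sketch of a formatter follows; `format_duration` is a hypothetical helper, and rounding to whole seconds is a choice made for this example.

```python
def format_duration(seconds):
    """Convert a number of seconds into a 'Dd Hh Mm Ss' string."""
    minutes, secs = divmod(int(round(seconds)), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return '{}d {}h {}m {}s'.format(days, hours, minutes, secs)


print('Total travel time: ', format_duration(3612092))              # 41d 19h 21m 32s
print('Average travel time: ', format_duration(700.0178294573643))  # 0d 0h 11m 40s
```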


In [9]:
def user_stats(df):
    """Displays statistics on bikeshare users."""
    # display counts of user types
    count_user_type = df.groupby(['User Type']).count()
    print('Count of user type: ', count_user_type)

    # display counts of gender, if the selected data has a Gender column
    if 'Gender' in df.columns:
        count_gender = df.groupby(['Gender']).count()
        print('Count of gender: ', count_gender)

    # display earliest, most recent, and most common year of birth,
    # if the selected data has a Birth Year column
    if 'Birth Year' in df.columns:
        earliest = df['Birth Year'].min()
        print('Earliest birth year: ', earliest)

        recent = df['Birth Year'].max()
        print('Most recent birth year: ', recent)

        common = df['Birth Year'].mode()[0]
        print('Most common birth year: ', common)


# pass the filtered data in explicitly instead of reloading it inside the function
user_stats(load_data(city, month, day))
print('-'*40)


Count of user type:              Unnamed: 0  Start Time  End Time  Trip Duration  Start Station  \
User Type                                                                    
Customer           114         114       114            114            114   
Subscriber        5021        5021      5021           5021           5021   

            End Station  Gender  Birth Year  month  day_of_week  
User Type                                                        
Customer            114       1           1    114          114  
Subscriber         5021    4964        4980   5021         5021  
Count of gender:          Unnamed: 0  Start Time  End Time  Trip Duration  Start Station  \
Gender                                                                   
Female        1094        1094      1094           1094           1094   
Male          3896        3896      3896           3896           3896   

        End Station  User Type  Birth Year  month  day_of_week  
Gender                
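
`groupby(...).count()` reports a count for every column, which is why the output above is so wide. A more compact alternative, sketched here on made-up rows, is `value_counts()` on the single column of interest. The check for a missing column rests on the assumption that not every city file has `Gender` or `Birth Year`. `count_values` is a hypothetical helper.

```python
import pandas as pd


def count_values(df, column):
    """Return value counts for `column` as a dict, or None if it is missing."""
    if column not in df.columns:
        return None
    # value_counts() skips missing entries by default
    return df[column].value_counts().to_dict()


# Illustrative rows only.
users = pd.DataFrame({
    'User Type': ['Subscriber', 'Subscriber', 'Customer'],
    'Gender': ['Male', 'Female', None],
})
print('Count of user type: ', count_values(users, 'User Type'))
print('Count of gender: ', count_values(users, 'Gender'))
print('Birth year counts: ', count_values(users, 'Birth Year'))
```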

In [None]:
def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()
