Bike Share Data
Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

In this project, you will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

The Datasets
Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:

Start Time (e.g., 2017-01-01 00:07:57)
End Time (e.g., 2017-01-01 00:20:53)
Trip Duration (in seconds - e.g., 776)
Start Station (e.g., Broadway & Barry Ave)
End Station (e.g., Sedgwick St & North Ave)
User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

Gender
Birth Year
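
As a rough sketch of what a file with these columns looks like once loaded, the snippet below parses a two-row inline sample with pandas. The first row reuses the example values listed above; the second row and the names `SAMPLE_CSV` and `sample_df` are invented for illustration, and the real files may carry additional columns.

```python
import io
import pandas as pd

# Inline stand-in for one of the city files; the second row is made up.
SAMPLE_CSV = """Start Time,End Time,Trip Duration,Start Station,End Station,User Type
2017-01-01 00:07:57,2017-01-01 00:20:53,776,Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber
2017-06-05 17:30:00,2017-06-05 17:45:00,900,Sedgwick St & North Ave,Broadway & Barry Ave,Customer
"""

# Parse both timestamp columns as datetimes while reading.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Start Time", "End Time"])

# Derive the fields that the time-of-travel statistics rely on.
sample_df["month"] = sample_df["Start Time"].dt.month
sample_df["day"] = sample_df["Start Time"].dt.day_name()
sample_df["hour"] = sample_df["Start Time"].dt.hour

print(sample_df[["Start Time", "month", "day", "hour"]])
```

Parsing the timestamps up front means the month, weekday, and hour can all be read through the `.dt` accessor without any string handling.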

Statistics Computed
You will learn about bike share use in Chicago, New York City, and Washington by computing a variety of descriptive statistics. In this project, you'll write code to provide the following information:

#1 Popular times of travel (i.e., the values that occur most often in the start time)

most common month
most common day of week
most common hour of day

#2 Popular stations and trip

most common start station
most common end station
most common trip from start to end (i.e., most frequent combination of start station and end station)

#3 Trip duration

total travel time
average travel time

#4 User info

counts of each user type
counts of each gender (only available for NYC and Chicago)
earliest, most recent, most common year of birth (only available for NYC and Chicago)
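
Most of the statistics listed above reduce to a handful of pandas calls. The sketch below applies them to a small invented DataFrame so each result can be checked by eye. The rows, the station names, and the helper `most_common_trip` are illustrative assumptions, not part of the project code.

```python
import pandas as pd

# Invented toy trips; column names follow the dataset description.
trips = pd.DataFrame({
    "Start Station": ["A", "A", "B", "A", "C"],
    "End Station":   ["B", "B", "A", "C", "A"],
    "Trip Duration": [600, 300, 900, 1200, 600],
    "User Type":     ["Subscriber", "Subscriber", "Customer", "Subscriber", "Customer"],
})


def most_common_trip(df):
    """Return the most frequent (start, end) station pair as a tuple."""
    pair_counts = df.groupby(["Start Station", "End Station"]).size()
    return pair_counts.idxmax()


top_start = trips["Start Station"].mode()[0]    # most common start station
top_trip = most_common_trip(trips)              # most common start/end pair
total_seconds = trips["Trip Duration"].sum()    # total travel time
mean_seconds = trips["Trip Duration"].mean()    # average travel time
user_counts = trips["User Type"].value_counts() # counts of each user type

print(top_start, top_trip, total_seconds, mean_seconds)
print(user_counts)
```

Grouping on both station columns is one way to find the most common trip; concatenating the two columns into a single string and taking its mode is an equally valid alternative.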

In [1]:
import time
import pandas as pd
import numpy as np

CITY_DATA = { 'chicago': '../Notebook1/chicago.csv',
              'new york city': '../Notebook1/new_york_city.csv',
              'washington': '../Notebook1/washington.csv' }

In [2]:


def get_filters():
    print('Hello! Let\'s explore some US bikeshare data!')
    # Validate the city selection; keep asking until a number from 1 to 3 is entered.
    cities = ['chicago','new york city','washington']
    while True:
        try:
            city_choice = int(input("What city would you like to see data for (type city number)? (1-Chicago, 2-New York, 3-Washington?): "))
        except ValueError:
            # Catch only non-numeric input so that Ctrl-C still interrupts the program.
            print("Invalid option, please retry and use only numbers!")
            continue
        if 1 <= city_choice <= 3:
            city = cities[city_choice-1]
            break
        print("Invalid option, please enter 1, 2, or 3!")

    # Default to 'all' so that choosing 'none' loads the whole data set unfiltered.
    month='all'
    day='all'
    data_filter = input("Would like a data filter by 'month', 'day', 'both', or 'none'?(Type the keyword specific to your option): ").lower()

    # Ask for a month only when the user chose 'month' or 'both'.
    if data_filter=='both' or data_filter=='month':
        while True:
            month = input("Enter a month filter from the following (january, february, march, april, may, june) if all type 'all': ").lower()
            if month in ['january', 'february', 'march', 'april', 'may', 'june','all']:
                break
            print("Invalid name, please re-enter month name!")


    # Ask for a day only when the user chose 'day' or 'both'.
    if data_filter=='both' or data_filter=='day':
        while True:
            day = input("Please enter a day filter from the following (sunday, monday, tuesday, wednesday, thursday, friday, saturday) if all type 'all': ").lower()
            if day in ['all','sunday','monday','tuesday','wednesday','thursday','friday','saturday']:
                break
            print("Invalid name, please re-enter day name!")

    
    print('-'*40)
    return city, month, day


def load_data(city, month, day):
    # Read the city's CSV file, add 'month' and 'day' columns derived from 'Start Time',
    # and apply the requested month and day filters.
    data = pd.read_csv(CITY_DATA[city])

    data['Start Time'] = pd.to_datetime(data['Start Time'])

    data['month'] = data['Start Time'].dt.month
    data['day'] = data['Start Time'].dt.day_name()
      
    if month != 'all':
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1
        data = data[data['month'] == month]

    
    if day != 'all':
        data = data[data['day'] == day.title()]
       
    return data


def time_stats(df):
    # Print the most common month, day of week, and start hour.

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    common_month = (df['month']).mode()[0]
    print("The most common month: "+str(common_month))

    common_day = (df['day']).mode()[0]
    print("The most common day of week: "+str(common_day))

    common_hour = (df['Start Time'].dt.hour).mode()[0]
    print("The most common start hour: "+str(common_hour))

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    # Print the most common start station, end station, and start/end combination.
    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    common_start_station = (df['Start Station'].mode()[0])
    print("Most common start station: "+str(common_start_station))

    common_end_station = (df['End Station'].mode()[0])
    print("Most common end station: "+str(common_end_station))

    most_common_start_end = (df['Start Station'] + ' *** End Station: ' + df['End Station']).mode()[0]
    print("Most common combination on start/end station trip: Start Station: "+str(most_common_start_end))

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    # Print the total and average trip duration, plus the number of trips.

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    total_travel_time = np.sum(df['Trip Duration'])
    print("Total travel time in seconds: "+str(total_travel_time)," seconds\n")
    print("Total travel time in hours: "+str(total_travel_time/(60*60))," hours\n")

    mean_travel_time = np.mean(df['Trip Duration'])
    print("Average travel time in seconds: "+str(mean_travel_time)," seconds\n")
    print("Average travel time in minutes: "+str(mean_travel_time/60)," minutes\n")

    print("\nTrips count: ",df.shape[0])

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    # Print user type counts and, where available, gender counts and birth year statistics.

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Count trips per user type; the labels come from the data instead of being
    # assumed from the sort order of the counts.
    user_types = df['User Type'].value_counts()
    user_types_dframe = user_types.to_frame(name='Count').T
    print("Display table of user types counts:\n\n",user_types_dframe,"\n")

    if 'Gender' in df:
        # value_counts() skips missing values, so trips without a gender are not counted.
        genders = df['Gender'].value_counts()
        genders_dframe = genders.to_frame(name='Count').T
        print("Display table of gender counts:\n\n",genders_dframe,"\n")
    else:
        print("Data about gender is not available for this city!\n")

    if 'Birth Year' in df:
        print("\nEarliest birth year: ",df['Birth Year'].min())
        print("\nMost recent birth year: ",df['Birth Year'].max())
        print("\nMost common birth year: ",df['Birth Year'].mode()[0])
    else:
        print("Data about birth years is not available for this city!\n")

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def sample(df):
    # Show the first five rows, then page through the data five rows at a time on request.
    print(df.head())
    start_index = 5
    while start_index < len(df):
        data_r = input("Would you like to see more sample data? (yes/no)").lower()
        if data_r != 'yes':
            break
        print(df.iloc[start_index:start_index+5])
        start_index += 5


In [3]:
def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)
        
        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)

        # Declining the sample must not end the session; fall through to the restart prompt.
        data_review = input("Would you like to see sample data? (yes/no)")
        if data_review.lower() =='yes':
            sample(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()

Hello! Let's explore some US bikeshare data!
What city would you like to see data for (type city number)? (1-Chicago, 2-New York, 3-Washington?): 1
Would like a data filter by 'month', 'day', 'both', or 'none'?(Type the keyword specific to your option): none
----------------------------------------

Calculating The Most Frequent Times of Travel...

The most common month: 6
The most common day of week: Tuesday
The most common start hour: 17

This took 0.0624847412109375 seconds.
----------------------------------------

Calculating The Most Popular Stations and Trip...

Most common start station: Streeter Dr & Grand Ave
Most common end station: Streeter Dr & Grand Ave
Most common combination on start/end station trip: Start Station: Lake Shore Dr & Monroe St *** End Station: Streeter Dr & Grand Ave

This took 0.32364439964294434 seconds.
----------------------------------------

Calculating Trip Duration...

Total travel time in seconds: 280871787  seconds

Total travel time in hours: 7