# __Python Project #1__ : __Exploring US Bikeshare Data__

## __Introduction__

In this project, we will make use of Python to explore data related to bike share systems for three major cities in the United States (Chicago, New York City, and Washington) 
* We will write code to import the data and utilize descriptive statistics to answer some interesting questions about. 
* We will write a script that utilizes a user input to create an interactive experience in the terminal to present these statistics.

![Bike Share Data](https://video.udacity-data.com/topher/2018/March/5aa7718d_divvy/divvy.jpg)

## __The Data__

In this project, we will use data provided by [Motivate](https://motivateco.com/), a bike share system provider for many major cities in the United States.
We will uncover bike share usage patterns by compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

### __The Datasets__

Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:

* Start Time (e.g., 2017-01-01 00:07:57)
* End Time (e.g., 2017-01-01 00:20:53)
* Trip Duration (in seconds - e.g., 776)
* Start Station (e.g., Broadway & Barry Ave)
* End Station (e.g., Sedgwick St & North Ave)
* User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

* Gender
* Birth Year

The raw data can be found here:
* [Chicago](https://ride.divvybikes.com/system-data)
* [New York City](https://ride.citibikenyc.com/system-data)
* [Washington](https://ride.capitalbikeshare.com/system-data)

Some data wrangling has been performed and condensed, cleaned data been provided by the Instructors. 

I carried a simillar operation on data provided for another bikeshare case study as the capstone project for __The Google Data Analytics Professional Certificate__, you can learn more about that by __[clicking here!](https://github.com/am-abdelfatah/Google-Data-Analytics-Capstone/blob/main/cyclistic-case-study.ipynb)__


## __Statistics Computed__

We are tasked to write an interactive script that will provide the following information upon user request and according to the filtering criteria provided by the user:

1. __Popular times of travel (i.e., occurs most often in the start time)__

* most common month
* most common day of week
* most common hour of day

2. __Popular stations and trip__

* most common start station
* most common end station
* most common trip from start to end (i.e., most frequent combination of start station and end station)

3. __Trip duration__

* total travel time
* average travel time

4. __User info__

* counts of each user type
* counts of each gender (only available for NYC and Chicago)
* earliest, most recent, most common year of birth (only available for NYC and Chicago)


## __The Coding Portion__
Cleaned files should be in the same category as the script, otherwise you will need to edit the working directory used within the code.

In [14]:
# First we will import the required libraries and packages
import time
import pandas as pd
import numpy as np

In [15]:
# We will define our data sets 

CITY_DATA = {'chicago': 'chicago.csv', 'new york city': 'new_york_city.csv', 'washington': 'washington.csv'}

We will write a function that gets the input mode of the script from the user, wether he wants to view the raw data, or filter for certain conditions (City, Month, and Day)

In [16]:
def get_output_mode():
    """
        This function takes a user input to determine the operation mode
        the user want to proceed with, to view raw data or aggregated data

    Returns:
        int : an integer value that represents the mode of operation, 1 to view raw data, or 2 to filter data
    """
    choice = 0
    while True:
        try:
            choice = int(input("\nWould you like to view the raw data or filter for the aggregated values?\n"
                               "press 1 to view raw data, or 2 to filter data: "))
            if choice in (1, 2):
                break
        except KeyboardInterrupt:
            print('Process was Interrupted by Keyboard Input')
            break

        except ValueError:
            print("Please enter a valid value that corresponds to your choice\n")
            continue
    return choice

If the user chose to view filtered data, this function will be applied, we are handling user inputs to ensure that our program runs without errors

In [17]:
def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """

    print("Hello there! Let's explore some US bikeshare data!\n",
          "We have data for Chicago, New York City and Washington\n",
          "So let's take a look!")

    city_message = ("\nPlease enter the name of the City you want to explore\n"
                    "We have data for 'Chicago', 'New York City',  and 'Washington': ")
    city_list = ('chicago', 'new york city', 'washington')

    month_message = ("\nGreat! Next, please enter the choice that corresponds to the month you want to filter by\n"
                     "\nYou can filter by: January,  February,  March,  April,  May, or June,\n additionally you can type 'All' to view data for all months: ")
    month_list = ('all', 'january', 'february',
                  'march', 'april', 'may', 'june')

    day_message = ("\nFinally, Please enter the weekday you wish to filter by\n"
                   "\n Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday,\n \n additionally you can type 'All' to view data for the entire week: ")
    day_list = ("all", "monday", "tuesday", "wednesday",
                "thursday", "friday", "saturday", "sunday")

    while True:

        # First, we get the user input for the city (chicago, new york city, washington).
        try:
            city_key = input(city_message)

            if city_key.lower() in city_list:
                print('\nLooks like you chose', city_key.title())
                city = city_key.lower()
                break  # Using a pass statment to go for the next check
            else:
                print(
                    "\nThis place sounds wonderful! but looks like we didn't go there yet, so please enter another value")
                continue

        except KeyboardInterrupt:
            print('Process was Interrupted by Keyboard Input')
            break

        except ValueError:
            print(
                "\nThis place sounds wonderful! but looks like we didn't go there yet, so please enter another value")
            continue

    while True:

        # Secondly, we get user input for the filtering month (all, january, february, ... , june)
        try:
            month_key = input(month_message).lower()

            if month_key in month_list:
                print('\nLooks like you chose to view',
                      month_key.title())
                month = month_key.lower()
                break
            else:
                print("\nLooks like you entered an incorrect value, please try again")
                continue

        except KeyboardInterrupt:
            print('Process was Interrupted by Keyboard Input')
            break

        except ValueError:
            print("\nLooks like you entered an incorrect value, please try again\n")
            continue

    while True:
        # Finally, we get  the input for filtering with a day of the week
        try:
            day_key = input(day_message).lower()

            if day_key in day_list:
                print('\nLooks like you chose', day_key.title())
                day = day_key.lower()
                break
            else:
                print("\nLooks like you entered an incorrect value, please try again")
                continue

        except KeyboardInterrupt:
            print('Process was Interrupted by Keyboard Input')
            break

        except ValueError:
            print("\nLooks like you entered an incorrect value, please try again")
            continue

    print('-'*40)
    return city, month, day

If the user chose to view raw data, this function will be applied, we are handling user inputs to ensure that our program runs without errors, while displaying 5 rows of raw data continously upon user request.

In [18]:
def view_rawdata():
    """This function prints 5 rows of raw data each time the user requests it to
    """
    print("\nWe will now display the raw data, 5 rows at a time")

    city_message = ("\nPlease enter the number corresponding to the City you want to explore\n"
                    "1 for Chicago,  2 for New York City,  or 3 for Washington: ")
    city_list = {1: "chicago", 2: "new york city", 3: "washington"}

    while True:

        # First, we get the user input for the city (chicago, new york city, washington).
        try:
            city_key = int(input(city_message))

            if city_key in city_list:
                print('\nLooks like you chose',
                      city_list[city_key].title())
                city = city_list[city_key]
                break
            else:
                print("\nLooks like you entered an incorrect value, please try again\n")
                continue

        except KeyboardInterrupt:
            print('Process was Interrupted by Keyboard Input')
            break

        except ValueError:
            print("Please enter a valid integer value that corresponds to your choice\n")
            continue

    # loading the data into a data frame
    df = pd.read_csv(CITY_DATA[city])
    row_count = 0

    # A loop to print out the raw data
    while True:
        #
        try:
            if (df.shape[0]-1) - row_count == 0:
                print("You already viewed all the data within the file!")
                break
            elif 0 < df.shape[0] - row_count < 5:
                print("\nYou have", df.shape[0] -
                      row_count, "rows of data remaining\n")
                print(df.iloc[row_count:df.shape[0]-1, :])
                break
            elif (row_count + 5) < df.shape[0]:
                print("\nYou have", df.shape[0] -
                      row_count, "rows of data remaining\n")
                print(df.iloc[row_count:row_count+5, :])
                row_count += 5

        except KeyboardInterrupt:
            print('Process was Interrupted by Keyboard Input')
            break

        except ValueError:
            print("Please enter a valid integer value that corresponds to your choice\n")
            continue

        view_more = input(
            '\nWould you like to view 5 more rows of data? Enter yes or no.\n')
        if view_more.lower() != 'yes':
            break
        else:
            continue

This function loads the data and applies the required filtering upon user request. We are ensuring that user inputs comply with our design and that the program runs without errors.

In [19]:
def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # loading the data into a data frame
    df = pd.read_csv(CITY_DATA[city.lower()])

    # formatting dates column for column manipulation
    # we change the type of column data to a more sutiable format
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extracting month and day data to check for filtering data
    # returns month as the coresponding integar
    df['month'] = df['Start Time'].dt.month
    # returns month as the a string for better sharing of the findings
    df['month name'] = df['Start Time'].dt.month_name(locale='English')
    # returns the week day as a string
    df['weekday'] = df['Start Time'].dt.day_name(locale='English')

    # The most popular month for traveling will be calculated now
    # because filtering by month will effectively nullify this value if calculated later
    popular_month = df['month name'].mode()[0]

    # If user requested filtering by month we use:
    if month != "all":
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        # We compensate for the indexing to get the actual value of the month
        month = months.index(month) + 1
        # we apply filtering to our data
        df = df[df['month'] == month]

    # If user requested filtering by day of the week we use:
    if day != "all":
        # .title() is used to adjust proper capitalization for the weekday name
        df = df[df['weekday'] == day.title()]
    return df, popular_month

We will now begin to compund statistics to provide information about __Popular times of travel (i.e., occurs most often in the start time)__:
* most common month
* most common day of week
* most common hour of day

In [20]:
def time_stats(df, city, month, day, popular_month):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # display the most common month for traveling
    if month == popular_month:
        print("\nYou are viewing data of {}, which is when our riders usage peak during the year".format(
            popular_month))
    elif month != 'All':
        print(
            "\nYou filtered by {}, It might interest you to know that our riders peak during {}.".format(month, popular_month))
    else:
        print("\nThe most popular riding month in {} is {}".format(
            city, popular_month))

    # display the most common day of week
    popular_weekday = df['weekday'].mode()[0]

    if day == popular_weekday:
        print("\nYou are viewing data of {}, which is the most common day for rides".format(
            popular_weekday))
    elif day != 'All':
        print(
            "\nYou filtered by {}, It might interest you to know that our riders peak during {}.".format(day, popular_weekday))
    else:
        print("\nThe most popular riding month in {} is {}", city, popular_weekday)

    # display the most common start hour

    df['hour'] = df['Start Time'].dt.hour
    popular_hour = df['hour'].mode()[0]
    print("Most of our riders based on the filtered view, start their journeys at {} O'Clock (24-h format)".format(popular_hour))

    print("\nThis took %s seconds." % (round((time.time() - start_time), 3)))
    print('-'*40)


We will now begin to compund statistics to provide information about,  __Popular stations and trip__:

* most common start station
* most common end station
* most common trip from start to end (i.e., most frequent combination of start station and end station)

In [21]:
def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    most_common_start_station = df['Start Station'].mode()[0]
    start_station_freq = df['Start Station'].value_counts().max()
    print("The most commonly used start station :", most_common_start_station,
          "with a count of", start_station_freq, "trips")

    # display most commonly used end station
    most_common_end_station = df['End Station'].mode()[0]
    end_station_freq = df['End Station'].value_counts().max()
    print("The most commonly used end station :", most_common_end_station,
          "with a count of", end_station_freq, "trips")

    # display most frequent combination of start station and end station trip
    most_common_trip = (df['Start Station'] + ', ending at ' +
                        df['End Station']).mode()
    print(most_common_trip)
    trip_freq = (df['Start Station'] + df['End Station']).value_counts().max()
    print("The most common trip performed is from '{}', with a count of {} trips".format(
        most_common_trip[0], trip_freq))

    print("\nThis took %s seconds." % (round((time.time() - start_time), 3)))
    print('-'*40)

We will now begin to compund statistics to provide information about __Trip duration__:

* total travel time
* average travel time

In [22]:
def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time
    total_travel = df['Trip Duration'].sum()
    print("\nOur riders enjoyed their bikes for :",
          round(total_travel/360, 3), "Hours, within the filtering criteria.")

    # display mean travel time
    mean_travel = df['Trip Duration'].mean()
    print("\nThe average rider spends ", round(
        mean_travel/60, 3), "minutes per trip.")

    # display the longest trip duration
    max_travel = df['Trip Duration'].max()
    print("\nThe longest trip took about", round(max_travel/60, 2), "minutes")

    print("\nTravel time for each user classification:\n")

    # display the total trip duration for each user type
    duration_grouping = df.groupby(['User Type']).sum()['Trip Duration']
    for index, user_trip in enumerate(duration_grouping):
        print("  {} spent {} minutes".format(
            duration_grouping.index[index], round(user_trip/60, 3)))

    print("\nThis took %s seconds." % (round((time.time() - start_time), 3)))
    print('-'*40)

We will now begin to compund statistics to provide information about __User info__:

* counts of each user type
* counts of each gender (only available for NYC and Chicago)
* earliest, most recent, most common year of birth (only available for NYC and Chicago)

In [23]:
def user_stats(df, city):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Display counts of user types
    user_types = df['User Type'].value_counts()
    print("The following users shared their rides with us:\n")
    for index, user_type in enumerate(user_types):
        print("  {}  {}s minutes".format(user_type, user_types.index[index]))

    if city in ('chicago', 'new york city'):
        # Display counts of gender
        user_genders = df['Gender'].value_counts()
        print("\nThey can be grouped into:\n")
        for index, user_gender in enumerate(user_genders):
            print("  {} {}s".format(
                user_gender, user_genders.index[index]))

        # Display earliest, most recent, and most common year of birth
        oldest = int(df['Birth Year'].min())
        youngest = int(df['Birth Year'].max())
        common = int(df['Birth Year'].mode())
        common_count = int(df['Birth Year'].value_counts().max())

        print("\nOur oldest rider was born in {} and still going strong, our youngest rider was born in {}, and our {} gang are making a strong presence with {} people".format(
            oldest, youngest, common, common_count))

    print("\nThis took %s seconds." % (round((time.time() - start_time), 3)))
    print('-'*40)

Now we will define the main function for the entire script, ensuring it runs continously until user asks to stop.

In [24]:
def main():
    """This is the main function used to apply the full script
    """
    while True:

        choice = get_output_mode()

        if choice == 2:
            city, month, day = get_filters()
            df, popular_month = load_data(city, month, day)
            time_stats(df, city, month, day, popular_month)
            station_stats(df)
            trip_duration_stats(df)
            user_stats(df, city)

        elif choice == 1:
            view_rawdata()

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()

Hello there! Let's explore some US bikeshare data!
 We have data for Chicago, New York City and Washington
 So let's take a look!

Looks like you chose Chicago

Looks like you chose to view March

Looks like you chose All
----------------------------------------

Calculating The Most Frequent Times of Travel...


You filtered by march, It might interest you to know that our riders peak during June.

You filtered by all, It might interest you to know that our riders peak during Friday.
Most of our riders based on the filtered view, start their journeys at 17 O'Clock (24-h format)

This took 0.008 seconds.
----------------------------------------

Calculating The Most Popular Stations and Trip...

The most commonly used start station : Clinton St & Washington Blvd with a count of 604 trips
The most commonly used end station : Clinton St & Washington Blvd with a count of 588 trips
0    Calumet Ave & 33rd St, ending at State St & 33...
dtype: object
The most common trip performed is from '