# Explore US BikeShare Data
## (Udacity Data Scientist Nanodegree Project)

## Overview

I use Python to explore bike share data for three major cities in the United States: Chicago, New York City, and Washington. I wrote code to import the data and answer questions about it by computing descriptive statistics. I also wrote a script that takes raw input to create an interactive experience in the terminal that presents these statistics to the user.

First, I import the time, NumPy, and pandas packages.

In [None]:
import time
import pandas as pd
import numpy as np

The CITY_DATA dictionary maps each city name to its CSV file and is used throughout the program.

In [None]:
CITY_DATA = {'chicago': 'chicago.csv',
            'new york city': 'new_york_city.csv',
            'washington': 'washington.csv'}

Function (1): get_filters() requests and validates the user's inputs (city, month, day) and returns them to be used as arguments when building the DataFrame in the next function.

In [None]:
def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.
    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')

    # Loop until the user enters a valid city name.
    while True:
        city = input(
                    "Please choose one of these cities to explore: Chicago, New York City, or Washington:\n"
                    ).lower()
        if city not in CITY_DATA:
            print("\nCity not found!")
            continue  
        else:
            print(
                f"\nYou have chosen to explore {city.title()}."
                )
            break
    # Validate the month and day filters; an invalid input restarts the corresponding loop.
    while True:
        month = input(
                     "Please choose a month from:\nJanuary, February, March, April, May, June, or type 'all' to explore all monthly data\n"
                     ).lower()
        month_list = ["january", "february", "march", "april", "may", "june", "all"]
        if month != 'all' and month not in month_list:
            print("\nInvalid Input!\n")
            continue
        else:
            print(
                 f"\nYou have chosen to explore {month.title()} from monthly data."
                 )
            break
         
    while True:
        day = input(
                   "Please choose a weekday, or type 'all' to explore all daily data\n"
                   ).lower()
        days_list = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday", "all"]
        if day != 'all' and day not in days_list:
            print("\nInvalid Input!\n")
            continue
        else:
            print(f"\nYou have chosen {day.title()} to explore daily data.")
            break

    print('-'*40)
    return city, month, day
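
The three validation loops in get_filters() follow the same pattern, so they could be folded into one helper. The following is only a sketch of that idea: `prompt_choice` and its `input_fn` parameter are hypothetical names that do not appear in the project, and `input_fn` exists only so the helper can be exercised without a live terminal.

```python
def prompt_choice(prompt, valid_options, input_fn=input):
    """Keep asking until the reply (stripped, lower-cased) is one of valid_options."""
    while True:
        reply = input_fn(prompt).strip().lower()
        if reply in valid_options:
            return reply
        print(f"Invalid input! Choose one of: {', '.join(valid_options)}")


# Scripted replies stand in for the user: the first is rejected, the second accepted.
replies = iter(["Boston", "  Chicago "])
city = prompt_choice(
    "Please choose a city:\n",
    ["chicago", "new york city", "washington"],
    input_fn=lambda _prompt: next(replies),
)
print(city)
```

With a helper like this, the month and day prompts would differ only in the prompt text and the list of valid options.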

Function (2): load_data() takes the three arguments (city, month, day) returned by the previous function and produces a DataFrame used by the remaining functions.

In [None]:
def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.
    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    print("Data is being prepared!")

    df = pd.read_csv(CITY_DATA[city])
    # Converting [Start Time] column to DateTime
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    # Creating [Month] and [Day] columns from [Start Time] column data.
    df['Start_Month'] = df['Start Time'].dt.month
    df['Start_Day'] = df['Start Time'].dt.day_name()
    # Apply the month filter if one was specified
    if month != 'all':
        # Use index of month +1 to get the correct month number
        months = ["january", "february", "march", "april", "may", "june"]
        month = months.index(month) + 1
        # Create a new df filtered by month
        df = df[df['Start_Month'] == month]
    # Apply the day filter if one was specified
    if day != 'all':
        # Create a new df filtered by day
        df = df[df['Start_Day'] == day.title()]

    print('-'*40)
    return df
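
To see the month and day filters in isolation, here is a minimal sketch that applies the same column derivation and boolean masks to a tiny in-memory DataFrame instead of a CSV file. The three timestamps are made up for illustration; only the column names follow load_data().

```python
import pandas as pd

# Three made-up trips standing in for a city's CSV file.
trips = pd.DataFrame({
    "Start Time": [
        "2017-01-02 09:00:00",
        "2017-06-04 17:30:00",
        "2017-06-05 08:15:00",
    ],
})
trips["Start Time"] = pd.to_datetime(trips["Start Time"])
trips["Start_Month"] = trips["Start Time"].dt.month       # 1..12
trips["Start_Day"] = trips["Start Time"].dt.day_name()    # e.g. "Sunday"

# Equivalent of choosing month="june" and day="sunday".
june_sundays = trips[(trips["Start_Month"] == 6) & (trips["Start_Day"] == "Sunday")]
print(june_sundays)
```

Only the second trip survives both filters, since 4 June 2017 was a Sunday.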

Function (3): time_stats() takes the DataFrame returned by the second function and reports the following statistics to the user:\
•	most common month\
•	most common day of week\
•	most common hour of day

In [None]:
def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # Display the most common month.
    common_month = df['Start_Month'].mode()[0]
    print(f"Most popular month is: {common_month}")

    # Display the most common day of the week.
    common_day = df['Start_Day'].mode()[0]
    print(f"Most popular day is: {common_day}")

    # Display the most common start hour, computed without adding a column to the filtered DataFrame.
    popular_hour = df['Start Time'].dt.hour.mode()[0]
    print(f"Most popular hour is: {popular_hour}:00 HRS")

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
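
time_stats() reports the month as a number and the hour as a 24-hour value. If friendlier labels are wanted, a sketch like the one below could translate them using only the standard library. `describe_month` and `describe_hour` are hypothetical helpers, not part of the project.

```python
import calendar


def describe_month(month_number):
    """Translate a month number (1-12) into its English name."""
    return calendar.month_name[month_number]


def describe_hour(hour):
    """Translate a 24-hour value (0-23) into a 12-hour clock label."""
    suffix = "AM" if hour < 12 else "PM"
    return f"{hour % 12 or 12}:00 {suffix}"


print(describe_month(6))   # June
print(describe_hour(17))   # 5:00 PM
```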

Function (4): station_stats() takes the DataFrame returned by the second function and reports the following statistics to the user:\
•	most common start station\
•	most common end station\
•	most common trip from start to end (i.e., most frequent combination of start station and end station)

In [None]:
def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # Display the most common start station from the [Start Station] column.
    most_popular_start_station = df['Start Station'].mode()[0]
    print(f"Most popular start station is: {most_popular_start_station}")

    # Display the most common end station from the [End Station] column.
    most_popular_end_station = df['End Station'].mode()[0]
    print(f"Most popular end station is: {most_popular_end_station}")

    # Display the most frequent start/end combination by joining both columns and taking the mode of the result.
    combination = df['Start Station'] + ' AND ' + df['End Station']
    most_frequent_start_and_end_stations = combination.mode()[0]
    print(
         f"Most frequent combination of start and end stations is: {most_frequent_start_and_end_stations}"
         )

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
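
An alternative to concatenating station names is to group by both columns and count the rows in each group, which also yields how many times the top trip occurred. This is a sketch of that swapped-in technique on made-up station names, not the approach used in station_stats().

```python
import pandas as pd

# Made-up trips: the A -> B pair occurs twice, every other pair once.
trips = pd.DataFrame({
    "Start Station": ["A", "A", "B", "A"],
    "End Station": ["B", "B", "A", "C"],
})

# Count rows per (start, end) pair; the result is a Series indexed by both columns.
pair_counts = trips.groupby(["Start Station", "End Station"]).size()

# idxmax() returns the (start, end) tuple with the highest count.
start, end = pair_counts.idxmax()
count = pair_counts.max()
print(f"Most frequent trip: {start} to {end} ({count} trips)")
```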

Function (5): trip_duration_stats() takes the DataFrame returned by the second function and reports the following statistics to the user:\
•	total travel time\
•	average travel time

In [None]:
def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # Calculating the total travel time from [Trip Duration] column
    total_travel_time = df['Trip Duration'].sum()
    print(f"Total travel time in seconds is: {total_travel_time}, "
          f"and in hours: {total_travel_time / 3600}")

    # Calculating the average travel time
    average_travel_time = df['Trip Duration'].mean()
    print(f"Average travel time in seconds is: {average_travel_time}, "
          f"and in hours: {average_travel_time / 3600}")

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
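
Large totals in seconds are hard to read, and fractional hours are not much better. One option, sketched here with the standard library's `timedelta`, is to print durations as days, hours, minutes, and seconds. `format_duration` is a hypothetical helper, and the two inputs are simply the total and average from one sample run.

```python
from datetime import timedelta


def format_duration(seconds):
    """Render a duration given in seconds as '[D days, ]H:MM:SS'."""
    return str(timedelta(seconds=round(seconds)))


print(format_duration(10750528))            # 124 days, 10:15:28
print(format_duration(1252.0996971814582))  # 0:20:52
```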

Function (6): user_stats() takes the DataFrame returned by the second function and reports the following statistics to the user:\
•	counts of each user type\
•	counts of each gender (only available for NYC and Chicago)\
•	earliest, most recent, most common year of birth (only available for NYC and Chicago)

In [None]:
def user_stats(df):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Extracting the counts of user types from [User Type] column
    print("User Types are:\n", df['User Type'].value_counts())

    # Extracting gender counts from [Gender] column if available in DF
    if 'Gender' in df:
        print("User gender counts are:\n", df['Gender'].value_counts())

    # Extracting earliest, most recent, and most common year of birth if [Birth Year] column is available
    if 'Birth Year' in df:
        earliest_birth_year = int(df['Birth Year'].min())
        print(f"Earliest birth year is: {earliest_birth_year}")

        recent_birth_year = int(df['Birth Year'].max())
        print(f"Most recent birth year is: {recent_birth_year}")

        common_birth_year = int(df['Birth Year'].mode()[0])
        print(f"Most common birth year is: {common_birth_year}")

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)

Function (7): show_raw_data() takes the DataFrame returned by the second function, shows the user the first 5 rows of data, and then asks whether to show 5 more rows or stop.

In [None]:
def show_raw_data(df):
    """Displays subsequent rows from the DataFrame based on the user's answer.
    Args:
        df - Pandas DataFrame containing city data filtered by month and day from load_data()
    """
    display_rows = 0
    answer = input("Would you like to view first 5 rows of data? [Yes/No]:\n").lower()
    # Show 5 rows at a time until the user stops answering 'yes' or the data runs out
    while answer == 'yes' and display_rows < len(df):
        print(df.iloc[display_rows:display_rows + 5])
        display_rows += 5
        if display_rows < len(df):
            answer = input("Would you like to view next 5 rows of data? [Yes/No]:\n").lower()
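
The paging logic can also be separated from the prompting by yielding fixed-size slices from a generator. The sketch below uses a plain list so it runs without any data files; `iter_chunks` is a hypothetical helper, and for a DataFrame the slice inside it would typically be written with `df.iloc[start:start + size]`.

```python
def iter_chunks(rows, size=5):
    """Yield consecutive slices of at most `size` items until rows is exhausted."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Twelve placeholder rows produce chunks of 5, 5, and 2.
chunks = list(iter_chunks(list(range(12))))
for chunk in chunks:
    print(chunk)
```

Because the generator stops on its own when the data runs out, the calling loop only has to decide whether the user wants the next chunk.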

The main function controls the flow of all the functions above, calling them in sequence.

In [1]:
def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)
        show_raw_data(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n').lower()
        if restart != 'yes':
            break


if __name__ == "__main__":
    main()

Hello! Let's explore some US bikeshare data!
Write a city name: Chicago, New York City or Washington!new york city
Do you want to filter as month, day, all or none?all
Which month? January, Feburary, March, April, May or June?june
Which day? Monday, Tuesday, Wednesday, Thursday, Friday, Saturday or Sundaysunday
new york city
june
sunday
----------------------------------------

Calculating The Most Frequent Times of Travel...

6
Sunday
17

This took 0.07598376274108887 seconds.
----------------------------------------

Calculating The Most Popular Stations and Trip...

West St & Chambers St
12 Ave & W 40 St
Yankee Ferry Terminal to Yankee Ferry Terminal

This took 0.008941411972045898 seconds.
----------------------------------------

Calculating Trip Duration...

10750528
1252.0996971814582

This took 0.006983518600463867 seconds.
----------------------------------------

Calculating User Stats...

Subscriber    6637
Customer      1949
Name: User Type, dtype: int64
Male      4785
Fema