# Project: Explore US Bikeshare Data

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#script">Python Script</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

> ### Overview
>Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

>In this project, we will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. We will compare the system usage between three large cities: Chicago, New York City, and Washington.

>### Dataset Description

>Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:

>- Start Time (e.g., 2017-01-01 00:07:57).
>- End Time (e.g., 2017-01-01 00:20:53).
>- Trip Duration (in seconds - e.g., 776).
>- Start Station (e.g., Broadway & Barry Ave).
>- End Station (e.g., Sedgwick St & North Ave).
>- User Type (Subscriber or Customer).

>The Chicago and New York City files also have the following two columns:
>- Gender.
>- Birth Year.

>We will learn about bike share use in Chicago, New York City, and Washington by computing a variety of descriptive statistics. In this project, we'll write code to provide the following information:

>**Popular times of travel (i.e., occurs most often in the start time)**
>- Most common month.
>- Most common day of week.
>- Most common hour of day.

>**Popular stations and trip**
>- Most common start station.
>- Most common end station.
>- Most common trip from start to end (i.e., most frequent combination of start station and end station).

>**Trip duration**
>- Total travel time.
>- Average travel time.


>**User info**
>- Counts of each user type.
>- Counts of each gender (only available for NYC and Chicago).
>- Earliest, most recent, most common year of birth (only available for NYC and Chicago).

>### Software
> To complete this project, we will need the following software:
>- Python.
>- The NumPy and pandas libraries installed (the `time` module is part of Python's standard library).
>- A code editor and terminal, such as Spyder or Jupyter Notebook.

<a id='script'></a>
## Python Script 

> First, we need to import the libraries that we will use.

In [1]:
# Use this cell to set up import statements 
#   for all of the packages that you plan to use.
import time
import pandas as pd
import numpy as np

> To keep the code short, let's define some lookup variables that the functions below will use.

In [2]:
CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }
months = ['january', 'february', 'march', 'april', 'may', 'june', 'all']
days = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday', 'all']

> At the beginning, we will ask the user to specify a city, month, and day to analyze.

> Returns:
>- (str) city - name of the city to analyze.
>- (str) month - name of the month to filter by, or "all" to apply no month filter.
>- (str) day - name of the day of week to filter by, or "all" to apply no day filter.

In [3]:
def get_filters():
    
    print('Hello! Let\'s explore some US bikeshare data!')
    # get user input for city (chicago, new york city, washington). 
    city=input('Would you like to see data for chicago, new york city, washington? ').lower()
    while city not in CITY_DATA:
        print('Wrong City')
        city=input('Would you like to see data for chicago, new york city, washington? ').lower()
    
    # get user input for month (all, january, february, ... , june)
    month=input('Which month? all, january, february, march, april, may, june ').lower()
    while month not in months:
        print('Wrong Month')
        month=input('Which month? all, january, february, march, april, may, june ').lower()
    
    # get user input for day of week (all, monday, tuesday, ... sunday)
    day=input('Which day? all, monday, tuesday, wednesday, thursday, friday, saturday, sunday ').lower()
    while day not in days:
        print('Wrong Day')
        day=input('Which day? all, monday, tuesday, wednesday, thursday, friday, saturday, sunday ').lower()
    print('-'*40)
    
    return city, month, day
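
> The three prompt loops above follow the same ask, validate, repeat pattern. As a minimal sketch, that pattern could be factored into one helper. The names `ask_choice` and `read` are hypothetical and are not part of the original script; `read` exists only so that the example can run without keyboard input.

```python
def ask_choice(prompt, choices, read=input):
    """Keep asking until the trimmed, lower-cased reply is one of `choices`.

    `ask_choice` and `read` are illustrative names, not part of the script.
    """
    while True:
        answer = read(prompt).strip().lower()
        if answer in choices:
            return answer
        print(f"'{answer}' is not a valid choice. Options: {', '.join(choices)}")


# Simulated replies stand in for keyboard input so the sketch runs unattended.
replies = iter(["Boston", "  Chicago "])
city = ask_choice("City? ", ["chicago", "new york city", "washington"],
                  read=lambda _prompt: next(replies))
print(city)
```

> In the interactive script, the call would simply omit `read`, so that the built-in `input` is used.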

> After that, let's load the data for the city the user chose and filter it by month and day, if applicable.

>Args:
>- (str) city - name of the city to analyze.
>- (str) month - name of the month to filter by, or "all" to apply no month filter.
>- (str) day - name of the day of week to filter by, or "all" to apply no day filter.

>Returns:
>- df - Pandas DataFrame containing city data filtered by month and day.

In [4]:
def load_data(city, month, day):
    
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.dayofweek
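    # map pandas weekday numbers (Monday=0) to names; they must match day.title() in the filter below
    day_names = {0:'Monday',1:'Tuesday',2:'Wednesday',3:'Thursday',4:'Friday',5:'Saturday',6:'Sunday'}
    df['day_of_week'] = df['day_of_week'].apply(lambda x: day_names[x])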
    df['hour'] = df['Start Time'].dt.hour


    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1
    
        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    
    return df
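
> To see how the derived columns and the two filters behave, here is a small sketch on a few hypothetical rows (not taken from the real CSV files). It uses the pandas accessor `dt.day_name()` as one possible alternative to mapping weekday numbers by hand.

```python
import pandas as pd

# Hypothetical sample rows, made up for illustration only.
sample = pd.DataFrame({
    "Start Time": ["2017-01-02 08:00:00", "2017-01-07 09:30:00",
                   "2017-02-06 17:15:00", "2017-02-11 12:00:00"],
})
sample["Start Time"] = pd.to_datetime(sample["Start Time"])

# Derive the same helper columns that load_data creates.
sample["month"] = sample["Start Time"].dt.month
sample["day_of_week"] = sample["Start Time"].dt.day_name()
sample["hour"] = sample["Start Time"].dt.hour

# A boolean mask keeps only the rows that satisfy the condition.
january = sample[sample["month"] == 1]
saturdays = sample[sample["day_of_week"] == "Saturday"]

print(january)
print(saturdays)
```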

>**Now let's display some statistics from the dataset**

>1. Displays statistics on the most frequent times of travel.

In [5]:
def time_stats(df):

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

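    # display the most common month (1 = January ... 6 = June)
    print('Most common month:', df['month'].mode()[0])
    
    # display the most common day of week
    print('Most common day of week:', df['day_of_week'].mode()[0])
    
    # display the most common start hour (0-23)
    print('Most common start hour:', df['hour'].mode()[0])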
    
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
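
> A note on `mode()`: it returns a Series rather than a single value, because several values can tie for most frequent. The sketch below, on made-up hours, shows what `mode()[0]` picks in that case.

```python
import pandas as pd

# Made-up start hours in which 8 and 17 are tied for most frequent.
hours = pd.Series([8, 17, 8, 17, 9])

modes = hours.mode()           # every tied value, sorted in ascending order
first_mode = hours.mode()[0]   # what the script prints: the smallest tied value
counts = hours.value_counts()  # full frequency table, useful for checking ties

print(modes.tolist())
print(first_mode)
print(counts)
```

> So when two hours are equally common, the script reports only the smaller one; `value_counts()` is one way to inspect the tie.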

>2. Displays statistics on the most popular stations and trip.

In [6]:
def station_stats(df):

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    print('Most common start station:', df['Start Station'].mode()[0])
    
    # display most commonly used end station
    print('Most common end station:', df['End Station'].mode()[0])
    
    # display most frequent combination of start station and end station trip
    frequent_trip = df['Start Station'] + ' to ' + df['End Station']
    print('Most common trip:', frequent_trip.mode()[0])
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
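
> The script finds the most common trip by joining the two station names into one string. As an alternative sketch (not the approach used above), the same question can be answered by grouping on both columns, which also gives the number of times the trip occurred. The station names below are made up.

```python
import pandas as pd

# Made-up trips for illustration only.
trips = pd.DataFrame({
    "Start Station": ["A St", "A St", "B Ave", "A St"],
    "End Station":   ["B Ave", "B Ave", "A St", "C Rd"],
})

# Count the rows for every (start, end) pair.
pair_counts = trips.groupby(["Start Station", "End Station"]).size()

top_pair = pair_counts.idxmax()   # (start, end) tuple with the highest count
top_count = pair_counts.max()     # how many times that trip occurred

print(top_pair, top_count)
```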

>3. Displays statistics on the total and average trip duration.

In [7]:
def trip_duration_stats(df):

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time (Trip Duration is in seconds)
    print('Total travel time (seconds):', df['Trip Duration'].sum())
    
    # display mean travel time
    print('Average travel time (seconds):', df['Trip Duration'].mean())
    
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
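
> Since Trip Duration is stored in seconds, the totals can be hard to read. Here is a minimal sketch of a formatter; `format_duration` is a hypothetical helper and is not part of the original script.

```python
def format_duration(seconds):
    """Turn a number of seconds into an 'Hh MMm SSs' string.

    Hypothetical helper, shown for illustration only.
    """
    seconds = int(round(seconds))
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# 776 seconds is the sample Trip Duration from the dataset description.
print(format_duration(776))
```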

>4. Displays statistics on bikeshare users.

In [8]:
def user_stats(df,city):

    print('\nCalculating User Stats...\n')
    start_time = time.time()

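    # display counts of user types
    print(df['User Type'].value_counts())
    
    # Gender and Birth Year are only available for Chicago and New York City
    if city != 'washington':
        # display counts of gender
        print(df['Gender'].value_counts())
    
        # display earliest, most recent, and most common year of birth
        print('Earliest year of birth:', df['Birth Year'].min())
        print('Most recent year of birth:', df['Birth Year'].max())
        print('Most common year of birth:', df['Birth Year'].mode()[0])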
    
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
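
> The function above decides whether to report gender and birth year from the city name. As a hedged alternative, one could check which columns are actually present and skip missing values. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up riders; this frame has no Gender column, like the Washington file.
riders = pd.DataFrame({
    "User Type": ["Subscriber", "Customer", "Subscriber"],
    "Birth Year": [1989.0, None, 1992.0],
})

# Check for the column itself instead of comparing city names.
has_gender = "Gender" in riders.columns
if has_gender:
    print(riders["Gender"].value_counts())

if "Birth Year" in riders.columns:
    # Drop missing values, then convert to int so years print as 1989, not 1989.0.
    years = riders["Birth Year"].dropna()
    earliest = int(years.min())
    most_recent = int(years.max())
    print(earliest, most_recent)
```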

> At the end, we ask whether the user would like to see the raw data, five rows at a time.

In [9]:
def display_data(df):
    raw = input('Would you like to display raw data? Enter yes or no ')
    if raw.lower()== 'yes' :
        count = 0
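        while True:
            print(df.iloc[count:count+5])
            count += 5
            # stop once every row has been shown
            if count >= len(df):
                print('No more raw data to display.')
                break
            ask = input('Would you like to display more raw data? Enter yes or no ')
            if ask.lower() != 'yes':
                break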
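
> To illustrate the paging with `iloc`, here is a small sketch on a made-up 12-row frame. It also shows that slicing past the end does not raise an error but returns an empty DataFrame, which is why a paging loop may want to stop once the offset reaches `len(df)`.

```python
import pandas as pd

# Made-up frame with 12 rows, so the last page is shorter than 5 rows.
df = pd.DataFrame({"Trip Duration": range(12)})

# Split the frame into pages of 5 rows each.
chunks = [df.iloc[start:start + 5] for start in range(0, len(df), 5)]
sizes = [len(chunk) for chunk in chunks]

# Slicing beyond the last row is allowed and yields an empty frame.
past_end = df.iloc[15:20]

print(sizes)
print(past_end.empty)
```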

> Now, let's collect all the previous functions together.

> After that, we ask whether the user would like to restart.

In [10]:
def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df,city)
        display_data(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break

> Finally, we will run the code and present the results in the terminal.

In [None]:
if __name__ == "__main__":
    main()

Hello! Let's explore some US bikeshare data!


<a id='conclusions'></a>
## Conclusions

>In this project, we used Python to explore bike share data for three major cities in the United States: Chicago, New York City, and Washington. We wrote code to import the data and answer interesting questions about it by computing descriptive statistics. We also wrote a script that takes raw input to create an interactive terminal experience for presenting these statistics.

>Through this project, we are able to answer the following questions about the bike share data:
>- What month occurs most often in the start time?
>- What day of the week (Monday, Tuesday, etc.) occurs most often in the start time?
>- What hour of the day occurs most often in the start time?
>- What is the total trip duration and average trip duration?
>- What is the most frequently used start station and most frequently used end station?
>- What is the most common trip (i.e., the combination of start station and end station that occurs the most often)?
>- What are the counts of each user type?
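>- What are the counts of each gender (Chicago and New York City only)?
>- What are the earliest, most recent, and most common years of birth (Chicago and New York City only)?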