# Project 1: Explore US Bikeshare Data

## Project Overview

### Overview

In this project, you will make use of Python to explore data related to bike share systems for three major cities in the United States—Chicago, New York City, and Washington. You will write code to import the data and answer interesting questions about it by computing descriptive statistics. You will also write a script that takes in raw input to create an interactive experience in the terminal to present these statistics.

## Project Details

### Bike Share Data

Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

In this project, you will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

### The Datasets
Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core **six (6)** columns:

* Start Time (e.g., 2017-01-01 00:07:57)
* End Time (e.g., 2017-01-01 00:20:53)
* Trip Duration (in seconds - e.g., 776)
* Start Station (e.g., Broadway & Barry Ave)
* End Station (e.g., Sedgwick St & North Ave)
* User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

* Gender
* Birth Year

The original files are much larger and messier, and you don't need to download them, but they can be accessed here if you'd like to see them (Chicago, New York City, Washington). These files had more columns and they differed in format in many cases. Some data wrangling has been performed to condense these files to the above core six columns to make your analysis and the evaluation of your Python skills more straightforward. In the Data Wrangling course that comes later in the Data Analyst Nanodegree program, students learn how to wrangle the dirtiest, messiest datasets, so don't worry, you won't miss out on learning this important skill!

### Statistics Computed
You will learn about bike share use in Chicago, New York City, and Washington by computing a variety of descriptive statistics. In this project, you'll write code to provide the following information:

**#1 Popular times of travel** (i.e., occurs most often in the start time)
* most common month
* most common day of week
* most common hour of day

**#2 Popular stations and trip**
* most common start station
* most common end station
* most common trip from start to end (i.e., most frequent combination of start station and end station)

**#3 Trip duration**
* total travel time
* average travel time

**#4 User info**
* counts of each user type
* counts of each gender (only available for NYC and Chicago)
* earliest, most recent, most common year of birth (only available for NYC and Chicago)

### The Files
To answer these questions using Python, you will need to write a Python script. To help guide your work in this project, a template with helper code and comments is provided in a **bikeshare.py** file, and you will do your scripting in there also. You will need the three city dataset files too:

* chicago.csv
* new_york_city.csv
* washington.csv

All four of these files are zipped up in the **Bikeshare** file in the resource tab in the sidebar on the left side of this page. You may download and open up that zip file to do your project work on your local machine.

Some versions of this project also include a Project Workspace page in the classroom where the bikeshare.py file and the city dataset files are all included, and you can do all your work with them there.

In [1]:
import time
import pandas as pd
import numpy as np

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')
     # get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs
    cities= ['chicago', 'new york city', 'washington']
    
    while True:
       city = input('Please choose a City among chicago, new york city, washington\nInput Requirements: Enter the full name of city\n>').lower()
       if city in cities:
           break
       else:
           print ('PLEASE CHECK YOUR INPUT')
       
       
     # get user input for month (all, january, february, ... , june)
    months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'agu', 'sep', 'oct', 'nov', 'dic', 'all']
    while True:
        month = input (' Select a month of the year among all: \n Input Requirements: Enter the first 3 letters of the month or "all" to choose every month\n>').lower()
        if month in months:
            break
        else: 
            print ('PLEASE CHECK YOUR INPUT')

    # get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun', 'all']
    while True:
        day = input ('Please, enter a day of the week from de following list: Mon, tue, wed, thu, fri, sat, sun, all \n> ').lower()
        if day in days:
             break
        else: 
            print ('PLEASE CHECK YOUR INPUT')
            
            
    print('-'*40)
    return city, month, day


def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    df = pd.read_csv(CITY_DATA[city])
    
       
    df['End Time'] = pd.to_datetime(df['End Time'], format='%Y-%m-%d %H:%M:%S')
    df['Start Time'] = pd.to_datetime(df['Start Time'], format='%Y-%m-%d %H:%M:%S')
    df['Month'] = df['Start Time'].dt.month_name().str.slice(stop=3).str.lower()
    df ['Day'] = df['Start Time'].dt.day_name().str.slice (stop=3).str.lower()
    df['Time'] = df['Start Time'].dt.time
       
    return df


def time_stats(df):
    """Displays statistics on the most frequent times of travel."""
    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # display the most common month
    most_common_month = df['Month'].mode()[0]
    print ('the most common month is:{}'.format(most_common_month))
    # display the most common day of week
    most_common_day = df['Day'].mode()[0]
    print ('the most common day is:{}'.format(most_common_day))
    # display the most common start hour
    most_common_starhr = df['Time'].mode()[0]
    print ('the most start hour is: '+ str(most_common_starhr))
    
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    common_startst= df['Start Station'].mode()[0]
    print ('The most common start station is: '+ common_startst)
    
    # display most commonly used end station
    common_endst = df['End Station'].mode()[0]
    print ('The most common end station is:  '+ common_endst)

    # display most frequent combination of start station and end station trip
    df['sttoend'] = df['Start Station'].str.cat(df['End Station'], sep=' to ')
    comb = df['sttoend'].mode()[0]
    print ('the most frequent combination of start station and end station trip is: '+ comb)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time
    Total_duration = df ['Trip Duration'].sum()
    print ('The Total Travel Time is: '+ str(Total_duration))

    # display mean travel time
    Mean_travel = df['Trip Duration'].mean()
    print ('The average travel time is: '+ str (Mean_travel))

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()
     
    try:
      # Display counts of user types   
       count_user_types = df.groupby(['User Type']).size()
       print ('The counts of user types are: ' + str (count_user_types))
      # Display counts of gender
       gender = df.groupby('Gender').size()
       print(f"\nThe count of user by gender is: "+ str (gender))
      # Display earliest, most recent, and most common year of birth
       earliest = int (df['Birth Year'].min())
       most_recent = int(df['Birth Year'].max())
       most_common_yb = int(df['Birth Year'].mode()[0])
       
       print(f"\nThe earliest year of birth: {earliest}\nThe most recent year of birth: {most_recent}\nThe most common year of birth: {most_common_yb}") 
    
    # Display earliest, most recent, and most common year of birth
       print("\nThis took %s seconds." % (time.time() - start_time))
       print('-'*40)

    except KeyError:
       pass
   
def display_raw_data(df):
    """
    Displays 5 rows of data from the csv file for the selected city.
    Args:
        param1 (df): The data frame you wish to work with.
    Returns:
        None.

    """
    index=0
    user_input=input("Would you like to display 5 rows of raw data?. Please enter 'yes' or 'no'.").lower()
    if user_input == ['yes','y','yep','yea']:
       user_input = True
    else:
        user_input = False

        print(df.iloc[index:index+5])
        index += 5
        user_input = input('would you like to display more 5 rows of raw data? ').lower()
        
def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)
        display_raw_data(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
	main()

Hello! Let's explore some US bikeshare data!
Please choose a City among chicago, new york city, washington
Input Requirements: Enter the full name of city
>chicago
 Select a month of the year among all: 
 Input Requirements: Enter the first 3 letters of the month or "all" to choose every month
>feb
Please, enter a day of the week from de following list: Mon, tue, wed, thu, fri, sat, sun, all 
> tue
----------------------------------------

Calculating The Most Frequent Times of Travel...

the most common month is:jun
the most common day is:tue
the most start hour is: 16:49:13

This took 0.28983116149902344 seconds.
----------------------------------------

Calculating The Most Popular Stations and Trip...

The most common start station is: Streeter Dr & Grand Ave
The most common end station is:  Streeter Dr & Grand Ave
the most frequent combination of start station and end station trip is: Lake Shore Dr & Monroe St to Streeter Dr & Grand Ave

This took 0.609827995300293 seconds.
------