# Intro

In this project, you will use Python to explore [Divvy Data](https://divvy-tripdata.s3.amazonaws.com/index.html) from bike share systems in Chicago, New York City, and Washington.


- You will write code to import the data and answer questions by computing descriptive statistics. 

- You will also write a script that takes in raw input to create an interactive experience in the terminal to present these statistics.
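
As a rough illustration of the interactive part, the sketch below keeps prompting until the user types a valid choice. The helper name `ask_choice`, the `CITIES` list, and the `input_fn` parameter (added only so the function can be exercised without a real terminal) are assumptions for this example, not part of the project template.

```python
CITIES = ['chicago', 'new york city', 'washington']


def ask_choice(prompt, valid_choices, input_fn=input):
    """Prompt repeatedly until the answer matches one of valid_choices.

    Matching ignores case and surrounding whitespace.
    """
    while True:
        answer = input_fn(prompt).strip().lower()
        if answer in valid_choices:
            return answer
        # Tell the user what went wrong before asking again.
        print(f"'{answer}' is not a valid choice. Options: {', '.join(valid_choices)}")
```

In a script this might be called as `city = ask_choice('Which city? ', CITIES)`; the same helper could then be reused for the month and day prompts.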

# Requirements

### Files to submit

All you need to submit for this project is two files:

- `bikeshare.py`: Your code
- `readme.txt`: A list of any websites, books, or other resources you consulted while solving the tasks in the project

There is no need for you to include any data files with your submission.

### Rubric Guidelines

See the [rubric](https://review.udacity.com/#!/rubrics/1379/view) for the review criteria.

# Practice Problems

In [1]:
import numpy as np
import pandas as pd

In [2]:
filename = 'chicago.csv'

# load data file into a dataframe
df = pd.read_csv(filename)
print(df.info())
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Unnamed: 0     300000 non-null  int64  
 1   Start Time     300000 non-null  object 
 2   End Time       300000 non-null  object 
 3   Trip Duration  300000 non-null  int64  
 4   Start Station  300000 non-null  object 
 5   End Station    300000 non-null  object 
 6   User Type      300000 non-null  object 
 7   Gender         238948 non-null  object 
 8   Birth Year     238981 non-null  float64
dtypes: float64(1), int64(2), object(6)
memory usage: 20.6+ MB
None


Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year
0,1423854,2017-06-23 15:09:32,2017-06-23 15:14:53,321,Wood St & Hubbard St,Damen Ave & Chicago Ave,Subscriber,Male,1992.0
1,955915,2017-05-25 18:19:03,2017-05-25 18:45:53,1610,Theater on the Lake,Sheffield Ave & Waveland Ave,Subscriber,Female,1992.0
2,9031,2017-01-04 08:27:49,2017-01-04 08:34:45,416,May St & Taylor St,Wood St & Taylor St,Subscriber,Male,1981.0
3,304487,2017-03-06 13:49:38,2017-03-06 13:55:28,350,Christiana Ave & Lawrence Ave,St. Louis Ave & Balmoral Ave,Subscriber,Male,1986.0
4,45207,2017-01-17 14:53:07,2017-01-17 15:02:01,534,Clark St & Randolph St,Desplaines St & Jackson Blvd,Subscriber,Male,1975.0


## 1. Compute the Most Popular Start Hour

In [3]:
# convert the Start Time column to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
print(df['Start Time'].head(),'\n')

# extract hour from the Start Time column to create an hour column
# note: df['Start Time'].hour raises AttributeError on a Series;
# datetime fields are reached through the .dt accessor instead
df['hour'] = df['Start Time'].dt.hour
print(df['hour'].head(),'\n')

# find the most common hour (from 0 to 23)
popular_hour = df['hour'].mode()[0]
print('Most Frequent Start Hour:', popular_hour)

0   2017-06-23 15:09:32
1   2017-05-25 18:19:03
2   2017-01-04 08:27:49
3   2017-03-06 13:49:38
4   2017-01-17 14:53:07
Name: Start Time, dtype: datetime64[ns] 

0    15
1    18
2     8
3    13
4    14
Name: hour, dtype: int64 

Most Frequent Start Hour: 17


In [4]:
df['Start Time'].dt.dayofweek

0         4
1         3
2         2
3         0
4         1
         ..
299995    5
299996    4
299997    6
299998    6
299999    1
Name: Start Time, Length: 300000, dtype: int64

## 2. Display a Breakdown of User Types

In [5]:
# print value counts for each user type
user_types = df['User Type'].value_counts()

print(user_types)

Subscriber    238889
Customer       61110
Dependent          1
Name: User Type, dtype: int64
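
Raw counts can be hard to compare across filters, so it may help to show each user type's share as well. The sketch below uses a small made-up `Series` in place of the real `User Type` column; `value_counts(normalize=True)` returns fractions instead of counts.

```python
import pandas as pd

# Made-up sample standing in for df['User Type'].
user_type = pd.Series(
    ['Subscriber', 'Subscriber', 'Subscriber', 'Customer'],
    name='User Type',
)

counts = user_type.value_counts()                # absolute number of trips per type
shares = user_type.value_counts(normalize=True)  # fraction of all trips per type

# Put both views side by side, aligned on the user type.
summary = pd.DataFrame({'count': counts, 'share': shares})
print(summary)
```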


In [6]:
df['Start Time'].dt.day_name()

0            Friday
1          Thursday
2         Wednesday
3            Monday
4           Tuesday
            ...    
299995     Saturday
299996       Friday
299997       Sunday
299998       Sunday
299999      Tuesday
Name: Start Time, Length: 300000, dtype: object
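
The same `mode()` approach used for the start hour should carry over to the month and the day of week. The following is a minimal sketch on a three-row made-up sample (the helper name `most_common` is an assumption for this example); on the real data you would pass the corresponding columns of `df`.

```python
import pandas as pd

# Tiny made-up sample standing in for the 'Start Time' column.
sample = pd.DataFrame({'Start Time': [
    '2017-06-23 15:09:32',  # Friday
    '2017-06-30 15:40:00',  # Friday
    '2017-01-04 08:27:49',  # Wednesday
]})
sample['Start Time'] = pd.to_datetime(sample['Start Time'])


def most_common(series):
    """Return the most frequent value; on a tie, the first in sorted order."""
    return series.mode()[0]


popular_month = most_common(sample['Start Time'].dt.month)
popular_day = most_common(sample['Start Time'].dt.day_name())
popular_hour = most_common(sample['Start Time'].dt.hour)
print(popular_month, popular_day, popular_hour)
```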

In [7]:
df['month'] = df['Start Time'].dt.month
df['day_of_week'] = df['Start Time'].dt.day_name()
df[(df['month']==6) & (df['day_of_week']=='Friday')]

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year,hour,month,day_of_week
0,1423854,2017-06-23 15:09:32,2017-06-23 15:14:53,321,Wood St & Hubbard St,Damen Ave & Chicago Ave,Subscriber,Male,1992.0,15,6,Friday
21,1420915,2017-06-23 12:21:01,2017-06-23 12:32:54,713,Rush St & Cedar St,Halsted St & Willow St,Subscriber,Male,1985.0,12,6,Friday
27,1186035,2017-06-09 07:48:55,2017-06-09 07:52:30,215,Canal St & Madison St,Clark St & Randolph St,Subscriber,Male,1974.0,7,6,Friday
31,1539334,2017-06-30 10:56:50,2017-06-30 11:40:20,2610,McCormick Place,Adler Planetarium,Customer,,,10,6,Friday
33,1187843,2017-06-09 09:08:19,2017-06-09 09:15:09,410,Halsted St & Diversey Pkwy,Sheffield Ave & Waveland Ave,Subscriber,Male,1984.0,9,6,Friday
...,...,...,...,...,...,...,...,...,...,...,...,...
299834,1067394,2017-06-02 11:03:39,2017-06-02 11:17:51,852,Hermitage Ave & Polk St,900 W Harrison St,Subscriber,Male,1971.0,11,6,Friday
299835,1535028,2017-06-30 05:51:35,2017-06-30 06:06:44,909,Clinton St & Lake St,Wolcott Ave & Polk St,Subscriber,Male,1985.0,5,6,Friday
299882,1199218,2017-06-09 18:32:26,2017-06-09 18:36:54,268,Clinton St & Madison St,Clinton St & Lake St,Subscriber,Male,1991.0,18,6,Friday
299906,1418526,2017-06-23 08:56:40,2017-06-23 09:00:35,235,Loomis St & Lexington St,Wolcott Ave & Polk St,Subscriber,Female,1992.0,8,6,Friday


## 3. Load and Filter the Dataset

In [8]:
CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - pandas DataFrame containing city data filtered by month and day
    """
    
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()


    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month)+1
    
        # filter by month to create the new dataframe
        df = df[df['month']==month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week']==day.title()]
    
    return df
    
df = load_data('chicago', 'march', 'friday') 
df

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year,month,day_of_week
37,395803,2017-03-24 15:35:55,2017-03-24 15:46:10,615,Dearborn St & Erie St,State St & Van Buren St,Subscriber,Male,1989.0,3,Friday
93,395735,2017-03-24 15:32:04,2017-03-24 15:52:53,1249,Sedgwick St & Webster Ave,Western Ave & Winnebago Ave,Subscriber,Female,1964.0,3,Friday
175,395402,2017-03-24 15:10:29,2017-03-24 15:19:44,555,Franklin St & Monroe St,Aberdeen St & Monroe St,Subscriber,Male,1987.0,3,Friday
190,393400,2017-03-24 12:29:30,2017-03-24 12:48:56,1166,Southport Ave & Wellington Ave,Lake Shore Dr & North Blvd,Subscriber,Female,1984.0,3,Friday
198,427496,2017-03-31 08:25:53,2017-03-31 08:39:09,796,Clinton St & Jackson Blvd,Racine Ave (May St) & Fulton St,Subscriber,Male,1983.0,3,Friday
...,...,...,...,...,...,...,...,...,...,...,...
299816,333246,2017-03-10 17:40:53,2017-03-10 17:44:59,246,Wells St & Walton St,Rush St & Cedar St,Subscriber,Female,1992.0,3,Friday
299839,392682,2017-03-24 11:17:50,2017-03-24 11:51:44,2034,Lake Shore Dr & Monroe St,Streeter Dr & Grand Ave,Customer,,,3,Friday
299860,290125,2017-03-03 12:19:29,2017-03-03 12:32:58,809,Aberdeen St & Monroe St,Clark St & 9th St (AMLI),Subscriber,Male,1975.0,3,Friday
299865,288513,2017-03-03 07:26:48,2017-03-03 07:31:22,274,Damen Ave & Melrose Ave,Lincoln Ave & Roscoe St,Subscriber,Female,1981.0,3,Friday


In [9]:
df['User Type'].value_counts()

Subscriber    5243
Customer       570
Name: User Type, dtype: int64
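
One thing to watch in `load_data`: `months.index(month)` raises a bare `ValueError` when the month name is not in the list (for example `'july'` or `'March'` with a capital letter). A possible way to make that failure clearer is a small helper like the one below; the names `MONTHS` and `month_number` are assumptions for this sketch.

```python
MONTHS = ['january', 'february', 'march', 'april', 'may', 'june']


def month_number(month):
    """Translate a month name into 1-6, or return None for 'all'.

    Raises ValueError with a readable message for anything else.
    """
    month = month.strip().lower()
    if month == 'all':
        return None
    if month not in MONTHS:
        raise ValueError(f"month must be 'all' or one of {MONTHS}, got {month!r}")
    return MONTHS.index(month) + 1
```

Inside `load_data`, the result could be compared against `df['month']` only when it is not `None`.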

In [10]:
df_2 = pd.read_csv('washington.csv')
df_3 = pd.read_csv('new_york_city.csv')
print(df_2.info())
print(df_3.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 7 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Unnamed: 0     300000 non-null  int64  
 1   Start Time     300000 non-null  object 
 2   End Time       300000 non-null  object 
 3   Trip Duration  300000 non-null  float64
 4   Start Station  300000 non-null  object 
 5   End Station    300000 non-null  object 
 6   User Type      300000 non-null  object 
dtypes: float64(1), int64(1), object(5)
memory usage: 16.0+ MB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Unnamed: 0     300000 non-null  int64  
 1   Start Time     300000 non-null  object 
 2   End Time       300000 non-null  object 
 3   Trip Duration  300000 non-null  int64  
 4   Start Station  300000 non-null  

In [11]:
df_3[pd.to_datetime(df_3['Start Time']).dt.month==3]['Birth Year'].mode()

0    1986.0
dtype: float64

In [12]:
df_3[pd.to_datetime(df_3['Start Time']).dt.month==3]['Birth Year'].min()

1885.0

In [13]:
df_3[pd.to_datetime(df_3['Start Time']).dt.month==3]['Birth Year'].max()

2001.0
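
The `info()` output above shows that the Washington file has only 7 columns and no `Birth Year`, so birth-year statistics need a guard before they run on every city. A minimal sketch, assuming a helper named `birth_year_stats` (not part of the project template):

```python
import pandas as pd


def birth_year_stats(df):
    """Return earliest, most recent and most common birth year.

    Returns None when the column is missing or holds no values.
    """
    if 'Birth Year' not in df.columns:
        return None
    years = df['Birth Year'].dropna()  # ignore trips without a recorded birth year
    if years.empty:
        return None
    return {
        'earliest': int(years.min()),
        'most_recent': int(years.max()),
        'most_common': int(years.mode()[0]),
    }


# Made-up frames: one with the column, one without (like washington.csv).
with_years = pd.DataFrame({'Birth Year': [1986.0, 1986.0, 1975.0, None]})
without_years = pd.DataFrame({'Trip Duration': [321, 1610]})
print(birth_year_stats(with_years))
print(birth_year_stats(without_years))
```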

In [15]:
df['month'].isin([2,3])

37        True
93        True
175       True
190       True
198       True
          ... 
299816    True
299839    True
299860    True
299865    True
299898    True
Name: month, Length: 5813, dtype: bool

In [18]:
df['day_of_week'].str.lower()

37        friday
93        friday
175       friday
190       friday
198       friday
           ...  
299816    friday
299839    friday
299860    friday
299865    friday
299898    friday
Name: day_of_week, Length: 5813, dtype: object

# Further Study

## Building a CLI app


- [Compare Argparse, Click, and Docopt for building CLI apps with Python](https://realpython.com/comparing-python-command-line-parsing-libraries-argparse-docopt-click/)


- [Compare the popular libraries for building CLI apps with Python](https://codeburst.io/building-beautiful-command-line-interfaces-with-python-26c7e1bb54df)


- [Packaging a CLI app using Python and shell scripts](https://medium.com/@trstringer/the-easy-and-nice-way-to-do-cli-apps-in-python-5d9964dc950d)


- [Building a CLI app with Click using Python](https://www.youtube.com/watch?v=kNke39OZ2k0):
    - how to add more interactivity
    - define each part of the function as a command, an argument, or an option (flag)
    - write help information for commands, arguments, and options


- [Readme template for apps](https://github.com/dbader/readme-template)


# Packaging and Setup

## File organization

```Shell



```


## `setup.py` with `setuptools`

```Python
from setuptools import setup

setup(
    name='appname',
    version='1.0',
    packages=['myappfolder'],  # list of package directories to include
    scripts=['scriptname'],    # list of executable scripts to install
)
```

## Exit code with `sys.exit(exit_code)`
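
`sys.exit(code)` raises `SystemExit`, and the shell receives `code` as the process exit status; by convention `0` means success and a nonzero value signals a failure. The sketch below shows one way this could look for the bikeshare script. The constants, `KNOWN_CITIES`, and the function names are hypothetical and chosen only for illustration.

```python
import sys

EXIT_OK = 0         # conventional success status
EXIT_BAD_INPUT = 2  # hypothetical status for an unrecognized city

KNOWN_CITIES = {'chicago', 'new york city', 'washington'}


def exit_code_for(city):
    """Map a city name to the exit status the script should report."""
    return EXIT_OK if city.strip().lower() in KNOWN_CITIES else EXIT_BAD_INPUT


def run(city):
    """Report problems on stderr, then end the process with the matching status."""
    code = exit_code_for(city)
    if code != EXIT_OK:
        print(f"Unknown city: {city!r}", file=sys.stderr)
    sys.exit(code)  # raises SystemExit; the shell sees `code` as the exit status
```

Keeping the decision (`exit_code_for`) separate from the actual exit (`run`) makes the logic easy to check without terminating the interpreter.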




- Commands (hello, goodbye)
- Arguments (name)
- Options/Flags (`--greeting=<str>`, `--caps`)

    
## Additional features:

- Version Printing (-v/--version)
- Automated Help Messages
- Error Handling
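
To make the pieces listed above concrete, here is a minimal sketch using `argparse` from the standard library (one of the libraries compared in the first link). The program name `greet`, the version string, and the function names `build_parser` and `render` are assumptions for this example. `argparse` generates `-h/--help` automatically and, on invalid arguments, prints a usage error and exits with status 2.

```python
import argparse


def build_parser():
    """Build a parser with two commands, one argument and two options each."""
    parser = argparse.ArgumentParser(
        prog='greet', description='Greet someone from the terminal.')
    # Version printing: -v/--version prints the string and exits.
    parser.add_argument('-v', '--version', action='version', version='%(prog)s 1.0')

    # Commands: hello, goodbye
    subparsers = parser.add_subparsers(dest='command', required=True)
    for command, default_greeting in (('hello', 'Hello'), ('goodbye', 'Goodbye')):
        sub = subparsers.add_parser(command, help=f'say {command}')
        sub.add_argument('name', help='who to greet')                 # argument
        sub.add_argument('--greeting', default=default_greeting,
                         help='word to use instead of the default')   # option
        sub.add_argument('--caps', action='store_true',
                         help='print the message in upper case')      # flag
    return parser


def render(argv):
    """Parse argv (a list of strings) and return the message to print."""
    args = build_parser().parse_args(argv)
    message = f'{args.greeting}, {args.name}!'
    return message.upper() if args.caps else message
```

A script entry point could then call `print(render(sys.argv[1:]))`, so that `greet hello Ada --caps` would print the message in upper case.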