# **Explore US Bikeshare Data**
>In this project, you will make use of Python to explore data related to bike share systems for three major cities in the United States—Chicago, New York City, and Washington. You will write code to import the data and answer interesting questions about it by computing descriptive statistics. You will also write a script that takes in raw input to create an interactive experience in the terminal to present these statistics.

>The Datasets
Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same six (6) columns:
1. Start Time (e.g., 2017-01-01 00:07:57)
2. End Time (e.g., 2017-01-01 00:20:53)
3. Trip Duration (in seconds - e.g., 776)
4. Start Station (e.g., Broadway & Barry Ave)
5. End Station (e.g., Sedgwick St & North Ave)
6. User Type (Subscriber or Customer)
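
As a quick sanity check on these columns, the sketch below parses the example timestamps from the list above and compares the elapsed time with the example Trip Duration. It builds a one-row DataFrame in memory instead of reading a data file; the variable names are illustrative, and the User Type value is picked arbitrarily.

```python
import pandas as pd

# One illustrative row built from the example values listed above
# (an in-memory stand-in, not a read of the real data files).
sample = pd.DataFrame({
    'Start Time': ['2017-01-01 00:07:57'],
    'End Time': ['2017-01-01 00:20:53'],
    'Trip Duration': [776],
    'Start Station': ['Broadway & Barry Ave'],
    'End Station': ['Sedgwick St & North Ave'],
    'User Type': ['Subscriber'],  # arbitrary choice for the example
})

# Parse the timestamp strings into datetime values
sample['Start Time'] = pd.to_datetime(sample['Start Time'])
sample['End Time'] = pd.to_datetime(sample['End Time'])

# Elapsed time in seconds between the start and the end of the trip
elapsed_seconds = (sample['End Time'] - sample['Start Time']).dt.total_seconds()

print('Elapsed seconds:', elapsed_seconds.iloc[0])
print('Trip Duration  :', sample['Trip Duration'].iloc[0])
```

For this example row the two values should agree, which suggests Trip Duration is simply End Time minus Start Time expressed in seconds.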

> The Chicago and New York City files also have the following two extra columns, which are not in the Washington file:
1. Gender
2. Birth Year
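
Because Gender and Birth Year are missing from the Washington file, any code that reports on them needs a guard. One possible approach, sketched here on small in-memory frames, is to check for the column itself instead of the city name. The helper `gender_counts` and both sample frames are hypothetical and exist only for illustration.

```python
import pandas as pd

def gender_counts(df):
    """Return gender counts, or None when the frame has no Gender column."""
    if 'Gender' not in df.columns:
        return None
    # value_counts() skips missing values by default
    return df['Gender'].value_counts()

# Tiny stand-ins for the real files (values are made up for illustration)
chicago_like = pd.DataFrame({
    'User Type': ['Subscriber', 'Customer', 'Subscriber'],
    'Gender': ['Male', None, 'Female'],
    'Birth Year': [1992.0, None, 1988.0],
})
washington_like = pd.DataFrame({
    'User Type': ['Subscriber', 'Customer'],
})

print(gender_counts(chicago_like))
print(gender_counts(washington_like))
```

Checking the column instead of the city keeps the guard correct even if a new city file with different columns is added later.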



## Understanding Datasets

### 1. Chicago City Dataset

In [1]:
import pandas as pd
df_chicago = pd.read_csv('chicago.csv')
df_chicago

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year
0,1423854,2017-06-23 15:09:32,2017-06-23 15:14:53,321,Wood St & Hubbard St,Damen Ave & Chicago Ave,Subscriber,Male,1992.0
1,955915,2017-05-25 18:19:03,2017-05-25 18:45:53,1610,Theater on the Lake,Sheffield Ave & Waveland Ave,Subscriber,Female,1992.0
2,9031,2017-01-04 08:27:49,2017-01-04 08:34:45,416,May St & Taylor St,Wood St & Taylor St,Subscriber,Male,1981.0
3,304487,2017-03-06 13:49:38,2017-03-06 13:55:28,350,Christiana Ave & Lawrence Ave,St. Louis Ave & Balmoral Ave,Subscriber,Male,1986.0
4,45207,2017-01-17 14:53:07,2017-01-17 15:02:01,534,Clark St & Randolph St,Desplaines St & Jackson Blvd,Subscriber,Male,1975.0
...,...,...,...,...,...,...,...,...,...
299995,64825,2017-01-21 13:18:00,2017-01-21 13:27:50,590,Orleans St & Elm St (*),Sheffield Ave & Webster Ave,Subscriber,Male,1965.0
299996,695993,2017-04-28 19:32:19,2017-04-28 19:51:18,1139,Ashland Ave & Blackhawk St,Clark St & Elm St,Customer,,
299997,159685,2017-02-12 09:59:01,2017-02-12 10:21:49,1368,Ravenswood Ave & Lawrence Ave,Stockton Dr & Wrightwood Ave,Subscriber,Female,1988.0
299998,564681,2017-04-16 17:07:15,2017-04-16 17:19:00,705,Sheffield Ave & Willow St,Clark St & Chicago Ave,Customer,,


In [2]:
df_chicago.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Unnamed: 0     300000 non-null  int64  
 1   Start Time     300000 non-null  object 
 2   End Time       300000 non-null  object 
 3   Trip Duration  300000 non-null  int64  
 4   Start Station  300000 non-null  object 
 5   End Station    300000 non-null  object 
 6   User Type      300000 non-null  object 
 7   Gender         238948 non-null  object 
 8   Birth Year     238981 non-null  float64
dtypes: float64(1), int64(2), object(6)
memory usage: 20.6+ MB


In [3]:
df_chicago.sample(5)

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year
192104,618054,2017-04-21 16:33:42,2017-04-21 16:40:16,394,Kingsbury St & Kinzie St,Morgan St & Lake St,Subscriber,Female,1982.0
205739,666036,2017-04-25 18:43:45,2017-04-25 19:08:22,1477,Canal St & Adams St,Leavitt St & Armitage Ave,Subscriber,Male,1980.0
239104,102772,2017-01-30 07:20:53,2017-01-30 07:32:22,689,Artesian Ave & Hubbard St,Racine Ave (May St) & Fulton St,Subscriber,Male,1972.0
72826,987,2017-01-01 14:47:19,2017-01-01 14:56:45,566,Seeley Ave & Roscoe St,Southport Ave & Waveland Ave,Subscriber,Female,1972.0
270038,724594,2017-05-04 05:45:34,2017-05-04 05:50:15,281,Green St & Madison St,Wacker Dr & Washington St,Subscriber,Male,1992.0


In [4]:
df_chicago.describe()

Unnamed: 0.1,Unnamed: 0,Trip Duration,Birth Year
count,300000.0,300000.0,238981.0
mean,776345.8,936.23929,1980.858223
std,448146.4,1548.792767,11.003329
min,4.0,60.0,1899.0
25%,387136.8,393.0,1975.0
50%,777103.5,670.0,1984.0
75%,1164065.0,1125.0,1989.0
max,1551500.0,86224.0,2016.0


In [5]:
df_chicago.describe()['Trip Duration']

count    300000.000000
mean        936.239290
std        1548.792767
min          60.000000
25%         393.000000
50%         670.000000
75%        1125.000000
max       86224.000000
Name: Trip Duration, dtype: float64

In [6]:
df_chicago.isnull().sum()

Unnamed: 0           0
Start Time           0
End Time             0
Trip Duration        0
Start Station        0
End Station          0
User Type            0
Gender           61052
Birth Year       61019
dtype: int64

In [7]:
df_chicago['End Station'].value_counts()

Streeter Dr & Grand Ave         7512
Clinton St & Washington Blvd    4166
Lake Shore Dr & Monroe St       4016
Clinton St & Madison St         4014
Lake Shore Dr & North Blvd      3863
                                ... 
Woodlawn Ave & 75th St             1
Seeley Ave & Garfield Blvd         1
Cicero Ave & Quincy St             1
Perry Ave & 69th St                1
Stony Island Ave & 82nd St         1
Name: End Station, Length: 572, dtype: int64

In [8]:
df_chicago['End Station'].unique()

array(['Damen Ave & Chicago Ave', 'Sheffield Ave & Waveland Ave',
       'Wood St & Taylor St', 'St. Louis Ave & Balmoral Ave',
       'Desplaines St & Jackson Blvd', 'Canal St & Taylor St',
       'Wood St & Hubbard St', 'Larrabee St & Armitage Ave',
       'Halsted St & Blackhawk St (*)', 'Clinton St & Washington Blvd',
       'Wilton Ave & Belmont Ave', 'Clark St & Schiller St',
       'Ada St & Washington Blvd', 'Clark St & Elm St',
       'Racine Ave (May St) & Fulton St',
       'Sheffield Ave & Wrightwood Ave', 'Daley Center Plaza',
       'Marshfield Ave & Cortland St', 'Burnham Harbor',
       'Halsted St & Wrightwood Ave', 'Halsted St & Willow St',
       'Lake Shore Dr & Belmont Ave',
       'Orleans St & Chestnut St (NEXT Apts)',
       'Halsted St & Roosevelt Rd', 'Halsted St & Roscoe St',
       'Peoria St & Jackson Blvd', 'Clark St & Randolph St',
       'Mies van der Rohe Way & Chestnut St',
       'Financial Pl & Congress Pkwy', 'Adler Planetarium',
       'Dearborn Pk

In [9]:
df_chicago['Start Station'].value_counts()

Streeter Dr & Grand Ave         6911
Clinton St & Washington Blvd    4306
Lake Shore Dr & Monroe St       4289
Clinton St & Madison St         3744
Canal St & Adams St             3443
                                ... 
Lawndale Ave & 23rd St             1
Laramie Ave & Kinzie St            1
Halsted St & 51st St               1
Shields Ave & 43rd St              1
Racine Ave & 65th St               1
Name: Start Station, Length: 568, dtype: int64

In [11]:
df_chicago['Gender'].value_counts()

Male      181190
Female     57758
Name: Gender, dtype: int64

In [12]:
df_chicago['Gender'].unique()

array(['Male', 'Female', nan], dtype=object)

In [13]:
df_chicago['User Type'].value_counts()

Subscriber    238889
Customer       61110
Dependent          1
Name: User Type, dtype: int64

In [14]:
df_chicago['User Type'].unique()

array(['Subscriber', 'Customer', 'Dependent'], dtype=object)

In [15]:
df_chicago['Birth Year'].unique()

array([1992., 1981., 1986., 1975., 1990., 1983.,   nan, 1984., 1979.,
       1993., 1964., 1961., 1985., 1967., 1987., 1980., 1989., 1968.,
       1974., 1970., 1965., 1977., 1972., 1988., 1966., 1991., 1969.,
       1978., 1960., 1963., 1955., 1998., 1994., 1959., 1982., 1971.,
       1954., 1957., 1973., 1976., 1956., 1962., 1952., 1997., 1946.,
       1953., 1949., 1996., 1995., 1951., 1950., 1944., 1999., 1958.,
       1901., 1942., 1948., 1939., 2000., 1945., 1899., 1947., 1900.,
       2002., 1940., 1918., 1930., 1916., 1941., 2016., 1934., 2001.,
       2003., 1938., 1921., 1943., 1906., 1909., 2004., 1931.])

In [16]:
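# average only the numeric columns; recent pandas versions no longer drop non-numeric columns automatically
df_chicago.groupby('Gender').mean(numeric_only=True)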

Unnamed: 0_level_0,Unnamed: 0,Trip Duration,Birth Year
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,785847.666799,781.515946,1982.441255
Male,728266.292753,673.21175,1980.353756


In [17]:
pd.pivot_table(df_chicago,index = 'Gender',columns=['User Type'], values = 'Trip Duration').T

Gender,Female,Male
User Type,Unnamed: 1_level_1,Unnamed: 2_level_1
Customer,1268.444444,757.060976
Dependent,,311.0
Subscriber,781.44006,673.175786


### 2. New York City Dataset

In [18]:
df_new_york_city = pd.read_csv('new_york_city.csv')
df_new_york_city

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year
0,5688089,2017-06-11 14:55:05,2017-06-11 15:08:21,795,Suffolk St & Stanton St,W Broadway & Spring St,Subscriber,Male,1998.0
1,4096714,2017-05-11 15:30:11,2017-05-11 15:41:43,692,Lexington Ave & E 63 St,1 Ave & E 78 St,Subscriber,Male,1981.0
2,2173887,2017-03-29 13:26:26,2017-03-29 13:48:31,1325,1 Pl & Clinton St,Henry St & Degraw St,Subscriber,Male,1987.0
3,3945638,2017-05-08 19:47:18,2017-05-08 19:59:01,703,Barrow St & Hudson St,W 20 St & 8 Ave,Subscriber,Female,1986.0
4,6208972,2017-06-21 07:49:16,2017-06-21 07:54:46,329,1 Ave & E 44 St,E 53 St & 3 Ave,Subscriber,Male,1992.0
...,...,...,...,...,...,...,...,...,...
299995,3273600,2017-04-24 17:51:12,2017-04-24 17:59:28,495,W 25 St & 6 Ave,W 38 St & 8 Ave,Subscriber,Male,1977.0
299996,3418509,2017-04-28 12:02:29,2017-04-28 12:19:04,994,W 27 St & 7 Ave,W 52 St & 5 Ave,Subscriber,Male,1967.0
299997,5034995,2017-05-31 09:11:10,2017-05-31 09:24:16,785,3 Ave & E 72 St,W 44 St & 5 Ave,Subscriber,Male,1972.0
299998,78227,2017-01-05 08:31:37,2017-01-05 08:51:01,1164,Columbia St & Kane St,Barclay St & Church St,Subscriber,Male,1964.0


In [19]:
df_new_york_city.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Unnamed: 0     300000 non-null  int64  
 1   Start Time     300000 non-null  object 
 2   End Time       300000 non-null  object 
 3   Trip Duration  300000 non-null  int64  
 4   Start Station  300000 non-null  object 
 5   End Station    300000 non-null  object 
 6   User Type      299308 non-null  object 
 7   Gender         270791 non-null  object 
 8   Birth Year     271780 non-null  float64
dtypes: float64(1), int64(2), object(6)
memory usage: 20.6+ MB


In [20]:
df_new_york_city.sample(5)

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type,Gender,Birth Year
229173,5509036,2017-06-08 15:26:57,2017-06-08 15:29:17,139,Fulton St & William St,Liberty St & Broadway,Subscriber,Male,1981.0
209250,1755570,2017-03-08 19:02:28,2017-03-08 19:09:30,421,E 33 St & 5 Ave,Pershing Square North,Subscriber,Male,1964.0
160267,4617988,2017-05-21 16:16:52,2017-05-21 16:44:00,1628,E 97 St & Madison Ave,Central Park West & W 76 St,Customer,,
179644,1806033,2017-03-09 21:21:31,2017-03-09 21:29:46,494,Rivington St & Chrystie St,Peck Slip & Front St,Subscriber,Male,1980.0
273958,4721099,2017-05-23 21:46:09,2017-05-23 22:19:06,1977,Broadway & W 53 St,W 100 St & Broadway,Customer,,


In [21]:
df_new_york_city.describe()

Unnamed: 0.1,Unnamed: 0,Trip Duration,Birth Year
count,300000.0,300000.0,271780.0
mean,3407026.0,899.6842,1978.254309
std,1965617.0,5710.016,11.848045
min,33.0,61.0,1885.0
25%,1707416.0,368.0,1970.0
50%,3405756.0,609.0,1981.0
75%,5108762.0,1054.0,1988.0
max,6816152.0,2155775.0,2001.0


In [22]:
df_new_york_city.describe()['Trip Duration']

count    3.000000e+05
mean     8.996842e+02
std      5.710016e+03
min      6.100000e+01
25%      3.680000e+02
50%      6.090000e+02
75%      1.054000e+03
max      2.155775e+06
Name: Trip Duration, dtype: float64

In [23]:
df_new_york_city.isnull().sum()

Unnamed: 0           0
Start Time           0
End Time             0
Trip Duration        0
Start Station        0
End Station          0
User Type          692
Gender           29209
Birth Year       28220
dtype: int64

In [24]:
df_new_york_city['End Station'].value_counts()

Pershing Square North    3077
Broadway & E 22 St       2343
E 17 St & Broadway       2316
W 21 St & 6 Ave          2036
West St & Chambers St    1968
                         ... 
Van Vorst Park              1
Paulus Hook                 1
6 Ave & Spring St           1
Gowanus Tech Station        1
NYCBS Depot BAL - DYR       1
Name: End Station, Length: 646, dtype: int64

In [25]:
df_new_york_city['End Station'].unique()

array(['W Broadway & Spring St', '1 Ave & E 78 St',
       'Henry St & Degraw St', 'W 20 St & 8 Ave', 'E 53 St & 3 Ave',
       'Bond St & Fulton St', 'Lafayette Ave & Fort Greene Pl',
       'Broadway & Battery Pl', 'Central Park S & 6 Ave',
       'E 25 St & 2 Ave', 'Little West St & 1 Pl',
       'Liberty St & Broadway', 'Columbus Ave & W 72 St',
       'E 47 St & Park Ave', 'Bushwick Ave & Powers St',
       'W 17 St & 8 Ave', 'Johnson St & Gold St', 'E 11 St & 2 Ave',
       'E 72 St & York Ave', 'W 45 St & 6 Ave', '1 Ave & E 68 St',
       'Washington Pl & Broadway', 'Suffolk St & Stanton St',
       'Plaza St West & Flatbush Ave', 'Mott St & Prince St',
       'Columbia St & Degraw St', 'W 38 St & 8 Ave', '9 Ave & W 45 St',
       'Grand St & Greene St', 'St Marks Pl & 2 Ave',
       'Broadway & E 14 St', 'Rivington St & Chrystie St',
       'Milton St & Franklin St', 'W 54 St & 9 Ave', 'E 17 St & Broadway',
       'Bayard St & Baxter St', 'N 8 St & Driggs Ave', '8 Ave & W 31 St

In [26]:
df_new_york_city['Start Station'].value_counts()

Pershing Square North    3069
E 17 St & Broadway       2089
Broadway & E 22 St       2082
W 21 St & 6 Ave          2019
West St & Chambers St    1968
                         ... 
NYCBS Depot BAL - DYR       2
Bressler                    2
NYCBS Depot - PIT           1
6 Ave & Spring St           1
Gowanus Tech Station        1
Name: Start Station, Length: 643, dtype: int64

In [28]:
df_new_york_city['Gender'].value_counts()

Male      204008
Female     66783
Name: Gender, dtype: int64

In [29]:
df_new_york_city['Gender'].unique()

array(['Male', 'Female', nan], dtype=object)

In [30]:
df_new_york_city['User Type'].value_counts()

Subscriber    269149
Customer       30159
Name: User Type, dtype: int64

In [31]:
df_new_york_city['User Type'].unique()

array(['Subscriber', 'Customer', nan], dtype=object)

In [32]:
df_new_york_city['Birth Year'].unique()

array([1998., 1981., 1987., 1986., 1992., 1982., 1984.,   nan, 1955.,
       1971., 1993., 1983., 1972., 1997., 1979., 1988., 1978., 1965.,
       1975., 1960., 1951., 1995., 1974., 1968., 1985., 1976., 1990.,
       1954., 1994., 1973., 1980., 1966., 1956., 1963., 1989., 1977.,
       1991., 1942., 1996., 1999., 1961., 1948., 1957., 1962., 1959.,
       1967., 1964., 1969., 1953., 1958., 1946., 1970., 1952., 1950.,
       1947., 2000., 1900., 1941., 1945., 1949., 1939., 1932., 1940.,
       1944., 1901., 2001., 1934., 1943., 1930., 1885., 1927., 1935.,
       1910., 1886., 1917., 1936., 1938., 1923., 1899., 1926., 1893.,
       1937., 1931., 1933., 1912., 1918., 1895., 1928., 1907., 1921.,
       1913., 1888.])

In [33]:
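# average only the numeric columns; recent pandas versions no longer drop non-numeric columns automatically
df_new_york_city.groupby('Gender').mean(numeric_only=True)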

Unnamed: 0_level_0,Unnamed: 0,Trip Duration,Birth Year
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,3473363.0,886.734543,1979.266001
Male,3314283.0,773.506745,1977.932325


In [34]:
pd.pivot_table(df_new_york_city,index = 'Gender',columns=['User Type'], values = 'Trip Duration').T

Gender,Female,Male
User Type,Unnamed: 1_level_1,Unnamed: 2_level_1
Customer,1723.018681,3317.624811
Subscriber,862.977073,738.272233


### 3. Washington City Dataset

In [35]:
df_washington = pd.read_csv('washington.csv')
df_washington

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type
0,1621326,2017-06-21 08:36:34,2017-06-21 08:44:43,489.066,14th & Belmont St NW,15th & K St NW,Subscriber
1,482740,2017-03-11 10:40:00,2017-03-11 10:46:00,402.549,Yuma St & Tenley Circle NW,Connecticut Ave & Yuma St NW,Subscriber
2,1330037,2017-05-30 01:02:59,2017-05-30 01:13:37,637.251,17th St & Massachusetts Ave NW,5th & K St NW,Subscriber
3,665458,2017-04-02 07:48:35,2017-04-02 08:19:03,1827.341,Constitution Ave & 2nd St NW/DOL,M St & Pennsylvania Ave NW,Customer
4,1481135,2017-06-10 08:36:28,2017-06-10 09:02:17,1549.427,Henry Bacon Dr & Lincoln Memorial Circle NW,Maine Ave & 7th St SW,Subscriber
...,...,...,...,...,...,...,...
299995,945535,2017-04-26 03:12:14,2017-04-26 03:41:19,1745.528,Lincoln Memorial,Jefferson Dr & 14th St SW,Customer
299996,1495781,2017-06-11 09:48:52,2017-06-11 10:22:31,2018.450,Key Blvd & N Quinn St,5th & K St NW,Subscriber
299997,12860,2017-01-04 14:33:00,2017-01-04 14:43:00,583.897,17th & K St NW / Farragut Square,7th & F St NW/Portrait Gallery,Subscriber
299998,977621,2017-04-28 07:17:47,2017-04-28 07:56:31,2324.170,Jefferson Dr & 14th St SW,Washington & Independence Ave SW/HHS,Customer


In [36]:
df_washington.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 7 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Unnamed: 0     300000 non-null  int64  
 1   Start Time     300000 non-null  object 
 2   End Time       300000 non-null  object 
 3   Trip Duration  300000 non-null  float64
 4   Start Station  300000 non-null  object 
 5   End Station    300000 non-null  object 
 6   User Type      300000 non-null  object 
dtypes: float64(1), int64(1), object(5)
memory usage: 16.0+ MB


In [37]:
df_washington.sample(5)

Unnamed: 0.1,Unnamed: 0,Start Time,End Time,Trip Duration,Start Station,End Station,User Type
55949,532451,2017-03-20 13:06:00,2017-03-20 13:20:00,803.363,Jefferson Memorial,Ohio Dr & West Basin Dr SW / MLK & FDR Memorials,Customer
154065,318987,2017-02-20 10:47:00,2017-02-20 11:15:00,1664.259,Lincoln Park / 13th & East Capitol St NE,Anacostia Ave & Benning Rd NE / River Terrace,Subscriber
123475,626192,2017-03-29 16:02:00,2017-03-29 16:08:00,352.345,4th & E St SW,USDA / 12th & Independence Ave SW,Subscriber
5997,440325,2017-03-06 17:05:00,2017-03-06 17:12:00,427.093,Market Square / King St & Royal St,Henry St & Pendleton St,Subscriber
61692,1571372,2017-06-17 08:23:27,2017-06-17 08:26:44,197.178,11th & H St NE,6th & H St NE,Subscriber


In [38]:
df_washington.describe()

Unnamed: 0.1,Unnamed: 0,Trip Duration
count,300000.0,300000.0
mean,875404.4,1237.28
std,505933.5,5461.997
min,7.0,60.01
25%,436393.8,410.623
50%,875063.5,706.5015
75%,1313148.0,1229.427
max,1751446.0,1235662.0


In [39]:
df_washington.describe()['Trip Duration']

count    3.000000e+05
mean     1.237280e+03
std      5.461997e+03
min      6.001000e+01
25%      4.106230e+02
50%      7.065015e+02
75%      1.229427e+03
max      1.235662e+06
Name: Trip Duration, dtype: float64

In [40]:
df_washington.isnull().sum()

Unnamed: 0       0
Start Time       0
End Time         0
Trip Duration    0
Start Station    0
End Station      0
User Type        0
dtype: int64

In [41]:
df_washington['End Station'].value_counts()

Columbus Circle / Union Station         6048
Jefferson Dr & 14th St SW               5218
Lincoln Memorial                        5036
Massachusetts Ave & Dupont Circle NW    4483
15th & P St NW                          3733
                                        ... 
Reston Pkwy & Spectrum Dr                  4
Broschart & Blackwell Rd                   3
Shady Grove Hospital                       2
Nebraska Ave/AU East Campus                2
Columbia Pike & S Taylor St                1
Name: End Station, Length: 479, dtype: int64

In [42]:
df_washington['End Station'].unique()

array(['15th & K St NW', 'Connecticut Ave & Yuma St NW', '5th & K St NW',
       'M St & Pennsylvania Ave NW', 'Maine Ave & 7th St SW',
       'Eastern Market Metro / Pennsylvania Ave & 7th St SE',
       '8th & H St NW', 'Potomac & Pennsylvania Ave SE', '15th & P St NW',
       'Lynn & 19th St North', '37th & O St NW / Georgetown University',
       '5th St & Massachusetts Ave NW', '1st & Rhode Island Ave NW',
       '1st & M St NE', 'Lincoln Memorial', 'North Capitol St & G Pl NE',
       'New York Ave & 15th St NW', '15th & East Capitol St NE',
       '6th & S Ball St', 'Lamont & Mt Pleasant NW',
       'Smithsonian-National Mall / Jefferson Dr & 12th St SW',
       'Congress Heights Metro', 'Oklahoma Ave & D St NE',
       'Calvert St & Woodley Pl NW', "L'Enfant Plaza / 7th & C St SW",
       'Arlington Blvd & Fillmore St', '11th & M St NW',
       'Massachusetts Ave & Dupont Circle NW',
       '2nd St & Massachusetts Ave NE', 'Georgetown Harbor / 30th St NW',
       '12th & Army N

In [43]:
df_washington['Start Station'].value_counts()

Columbus Circle / Union Station         5656
Lincoln Memorial                        5043
Jefferson Dr & 14th St SW               5022
Massachusetts Ave & Dupont Circle NW    3946
15th & P St NW                          3519
                                        ... 
Reston Pkwy & Spectrum Dr                  4
Shady Grove Hospital                       4
Columbia Pike & S Taylor St                3
Key West Ave & Great Seneca Hwy            3
Columbus Ave & Gramercy Blvd               2
Name: Start Station, Length: 479, dtype: int64

In [45]:
df_washington['User Type'].value_counts()

Subscriber    220786
Customer       79214
Name: User Type, dtype: int64

In [46]:
df_washington['User Type'].unique()

array(['Subscriber', 'Customer'], dtype=object)

## Problem 1
> Compute the Most Popular Start Hour by using pandas to load chicago.csv into a dataframe 

In [47]:
import pandas as pd

filename = 'chicago.csv'

# load data file into a dataframe
df = pd.read_csv(filename)

# convert the Start Time column to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])

# extract hour from the Start Time column to create an hour column
df['hour'] = df['Start Time'].dt.hour

# find the most common hour (from 0 to 23)
popular_hour = df['hour'].value_counts().head(1).index[0]
    
print('Most Frequent Start Hour:', popular_hour)

Most Frequent Start Hour: 17


## Problem 2
>Display a Breakdown of User Types by using pandas to load chicago.csv into a dataframe 

In [48]:
import pandas as pd

filename = 'chicago.csv'

# load data file into a dataframe
df = pd.read_csv(filename)

# print value counts for each user type
user_types = df['User Type'].value_counts()

print(user_types)

Subscriber    238889
Customer       61110
Dependent          1
Name: User Type, dtype: int64


## Problem 3
>Load and filter the dataset with the following steps:
 1. Load the dataset for the specified city. Index the global CITY_DATA dictionary object to get the corresponding filename for the given city name.
 2. Create month and day_of_week columns. Convert the "Start Time" column to datetime and extract the month number and weekday name into separate columns using the .dt accessor of the datetime column.
 3. Filter by month. Since the month parameter is given as the name of the month, you'll need to first convert this to the corresponding month number. Then, select rows of the dataframe that have the specified month and reassign this as the new dataframe.
 4. Filter by day of week. Select rows of the dataframe that have the specified day of week and reassign this as the new dataframe. (Note: Capitalize the day parameter with the title() method to match the title case used in the day_of_week column!)


In [49]:
import pandas as pd

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """

    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    # day_name() replaces the weekday_name attribute, which was removed from pandas
    df['day_of_week'] = df['Start Time'].dt.day_name()

    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1

        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]

    return df
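
To see the filtering logic in action without the CSV files, the sketch below applies the same month and day filters to five rows copied from the Chicago preview shown earlier. The helper name `filter_trips` is hypothetical: it mirrors load_data but takes a DataFrame instead of a city name, and it assumes a pandas version that provides `dt.day_name()`.

```python
import io
import pandas as pd

# Five rows taken from the Chicago preview shown earlier (subset of columns)
SAMPLE_CSV = """Start Time,Trip Duration,Start Station
2017-06-23 15:09:32,321,Wood St & Hubbard St
2017-05-25 18:19:03,1610,Theater on the Lake
2017-01-04 08:27:49,416,May St & Taylor St
2017-03-06 13:49:38,350,Christiana Ave & Lawrence Ave
2017-01-17 14:53:07,534,Clark St & Randolph St
"""

MONTHS = ['january', 'february', 'march', 'april', 'may', 'june']

def filter_trips(df, month='all', day='all'):
    """Filter a trips DataFrame by month name and day-of-week name."""
    df = df.copy()  # leave the caller's frame untouched
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()
    if month != 'all':
        # month names map to 1-based month numbers
        df = df[df['month'] == MONTHS.index(month) + 1]
    if day != 'all':
        # day_name() returns title-case names such as 'Friday'
        df = df[df['day_of_week'] == day.title()]
    return df

trips = pd.read_csv(io.StringIO(SAMPLE_CSV))
january = filter_trips(trips, month='january')
fridays = filter_trips(trips, day='friday')
print(len(january), len(fridays))
```

With these five rows, the January filter keeps two trips and the Friday filter keeps one (the 2017-06-23 trip).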

## Full Project 
>The following code contains all the functions, plus a main function that runs each of them.


In [50]:
import time
import pandas as pd
import numpy as np

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')
    # get user input for city (chicago, new york city, washington);
    # the while loop re-prompts until the input is valid
    cities = list(CITY_DATA.keys())
    city_shortcuts = ['chi', 'new', 'wash']
    while True:
        input_city = input("Enter City from the following  cities (chicago, new york city, washington) : ").lower()
        if input_city in cities:
            city = input_city
            break
        elif input_city in city_shortcuts:
            # map the shortcut to the full name at the same position
            city = cities[city_shortcuts.index(input_city)]
            print('Lazy User ^_^')
            break
        else:
            print("Invalid input, let's start again.")

    # get user input for month (all, january, february, ... , june)
    months = ['all', 'january', 'february', 'march', 'april', 'may', 'june']
    month_shortcuts = ['all', 'jan', 'feb', 'mar', 'apr', 'may', 'jun']
    while True:
        input_month = input("Enter Month from the following (all,january, february, march, april, may, june) : ").lower()
        if input_month in months:
            month = input_month
            break
        elif input_month in month_shortcuts:
            month = months[month_shortcuts.index(input_month)]
            print('Lazy User ^_^')
            break
        else:
            print("Invalid input, let's start again.")

    # get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['all', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
    day_shortcuts = ['all', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
    while True:
        input_day = input("Enter Day from the following (all, monday, tuesday, wednesday, thursday, friday, saturday, sunday) : ").lower()
        if input_day in days:
            day = input_day
            break
        elif input_day in day_shortcuts:
            day = days[day_shortcuts.index(input_day)]
            print('Lazy User ^_^')
            break
        else:
            print("Invalid input, let's start again.")

    print('-'*40)
    return city, month, day

def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """

    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()

    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1

        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]

    return df

def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()
    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    # extract hour, month and day of week from the Start Time column
    df['hour'] = df['Start Time'].dt.hour
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.day_name()
    # display the most common month
    popular_month = df['month'].mode()[0]
    print('Most Common Month', popular_month)
    # display the most common day of week
    popular_day_of_week = df['day_of_week'].mode()[0]
    print('Most Common Day in Week', popular_day_of_week)
    # display the most common start hour
    popular_hour = df['hour'].mode()[0]
    print('Most Common Hours in Day', popular_hour)
    
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()
    # display most commonly used start station
    popular_start_station = df['Start Station'].mode()[0]
    print('Most Common Start Station', popular_start_station)
    # display most commonly used end station
    popular_end_station = df['End Station'].mode()[0]
    print('Most Common End Station', popular_end_station)
    # display most frequent combination of start station and end station trip;
    # count each (start, end) pair, because pairing the two most popular
    # stations does not necessarily give the most frequent trip
    popular_trip = df.groupby(['Start Station', 'End Station']).size().idxmax()
    print('Most Common Combination of Start Station and End Station:', popular_trip)
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time (in seconds)
    total_travel_time = df['Trip Duration'].sum()
    print('Total Travel Time', total_travel_time)
    # display mean travel time (in seconds)
    mean_travel_time = df['Trip Duration'].mean()
    print('Mean Travel Time', mean_travel_time)
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df,city):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # display counts of user types
    user_types_count = df['User Type'].value_counts()
    print(' Count of User Types :\n', user_types_count)
    # Gender and Birth Year are not available in the Washington file
    if city != 'washington':
        # display counts of gender
        gender_types_count = df['Gender'].value_counts()
        print(' Count of Gender :\n', gender_types_count)
        # display earliest, most recent, and most common year of birth
        print(' Earliest Birth Year :', df['Birth Year'].min())
        print(' Most Recent Birth Year :', df['Birth Year'].max())
        print(' Most Common Birth Year :', df['Birth Year'].mode()[0])

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df,city)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()


Hello! Let's explore some US bikeshare data!
Enter City from the following  cities (chicago, new york city, washington) : chicago
Enter Month from the following (all,january, february, march, april, may, june) : june
Enter Day from the following (all, monday, tuesday, wednesday, thursday, friday, saturday, sunday) : friday
----------------------------------------

Calculating The Most Frequent Times of Travel...

Most Common Month 6
Most Common Day in Week Friday
Most Common Hours in Day 17

This took 0.02492833137512207 seconds.
----------------------------------------

Calculating The Most Popular Stations and Trip...

Most Common Start Station Streeter Dr & Grand Ave
Most Common End Station Streeter Dr & Grand Ave
Most Common Combination of Start Station and End Station: ('Streeter Dr & Grand Ave', 'Streeter Dr & Grand Ave')

This took 0.0035741329193115234 seconds.
----------------------------------------

Calculating Trip Duration...

Total Travel Time 16904624
Mean Travel Time