# Pandas Essentials: Loading and Grokking Your Data

The notebook exercises below provide practice in loading and grokking your data. We focus on three months of data (April through June 2016) from the [Bay Area Bike Share](http://www.bayareabikeshare.com/open-data) program.

## Imports

In the cell below, import the pandas library.

## Loading data

In the cell below, load the "data/babs_station_data.csv" into a variable called `stations_df`.  This represents all the stations within the Bay Area Bike Share program.

In the cell below, load the "data/babs_weather_april_thru_june_2016.csv" into a variable called `weather_df`.  This represents weather data for April through June, 2016.

In the cell below, load the "babs_trips_april_thru_june_2016.csv" into a variable called `trips_df`.  This represents all rides for April through June, 2016.
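As a rough sketch of the import and loading steps: pandas is conventionally imported as `pd`, and `pd.read_csv` turns a CSV file into a data frame. The example below is self-contained, so it reads from an in-memory string (`csv_text`, a stand-in holding three rows copied from the station listing later in this notebook) rather than from the real file. In the notebook itself you would pass the file path instead.

```python
import io

import pandas as pd

# Stand-in for "data/babs_station_data.csv": three rows copied from the
# station listing shown later in this notebook.
csv_text = """station_id,name,lat,long,dockcount,landmark,installation
2,San Jose Diridon Caltrain Station,37.329732,-121.901782,27,San Jose,8/6/2013
3,San Jose Civic Center,37.330698,-121.888979,15,San Jose,8/5/2013
4,Santa Clara at Almaden,37.333988,-121.894902,11,San Jose,8/6/2013
"""

# read_csv accepts a path or any file-like object. In the notebook:
#   stations_df = pd.read_csv("data/babs_station_data.csv")
stations_df = pd.read_csv(io.StringIO(csv_text))
print(stations_df)
```

The same call, with a different path and variable name, should cover the weather and trips files as well.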

## Getting the shape of your data

In the cell below, write one line of code to determine the total number of stations in the Bay Area Bike Share program. [You should get 67].

67

In the cell below, write one line of code to determine the total number of columns within the `weather_df` data frame.  [You should get 24].

24

In the cell below, write one line of code to determine the total number of rides in the `trips_df` data frame.  [You should get 83537].

83537
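One possible approach to the three questions above, sketched on a small hypothetical frame (`df` is a stand-in, not one of the notebook's data sets): `len(df)` gives the row count, and `df.shape` gives a `(rows, columns)` tuple.

```python
import pandas as pd

# Hypothetical stand-in frame with 3 rows and 2 columns.
df = pd.DataFrame({"station_id": [2, 3, 4], "dockcount": [27, 15, 11]})

n_rows = len(df)       # number of rows
n_cols = df.shape[1]   # shape is a (rows, columns) tuple
print(n_rows, n_cols)  # prints: 3 2
```

`len(df.columns)` is an equivalent way to count columns.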

## Peeking at your data

In the cell below, take a peek at the *first few* rows of the `stations_df` data frame.

| | station_id | name | lat | long | dockcount | landmark | installation |
|---|---|---|---|---|---|---|---|
| 0 | 2 | San Jose Diridon Caltrain Station | 37.329732 | -121.901782 | 27 | San Jose | 8/6/2013 |
| 1 | 3 | San Jose Civic Center | 37.330698 | -121.888979 | 15 | San Jose | 8/5/2013 |
| 2 | 4 | Santa Clara at Almaden | 37.333988 | -121.894902 | 11 | San Jose | 8/6/2013 |
| 3 | 5 | Adobe on Almaden | 37.331415 | -121.8932 | 19 | San Jose | 8/5/2013 |
| 4 | 6 | San Pedro Square | 37.336721 | -121.894074 | 15 | San Jose | 8/7/2013 |


In the cell below, take a peek at the *first few* rows of the `weather_df` data frame.

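| | PDT | Max TemperatureF | Mean TemperatureF | Min TemperatureF | Max Dew PointF | MeanDew PointF | Min DewpointF | Max Humidity | Mean Humidity | Min Humidity | ... | Mean VisibilityMiles | Min VisibilityMiles | Max Wind SpeedMPH | Mean Wind SpeedMPH | Max Gust SpeedMPH | PrecipitationIn | CloudCover | Events | WindDirDegrees | ZIP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-04-01 | 63 | 58 | 53 | 48 | 47 | 46 | 83 | 70 | 56 | ... | 10.0 | 10.0 | 20 | 13 | 23.0 | 0 | 4 | | 264 | 94107 |
| 1 | 2016-04-02 | 69 | 60 | 51 | 50 | 47 | 46 | 89 | 70 | 51 | ... | 10.0 | 10.0 | 22 | 8 | 24.0 | 0 | 6 | | 270 | 94107 |
| 2 | 2016-04-03 | 63 | 57 | 51 | 51 | 48 | 46 | 89 | 75 | 60 | ... | 10.0 | 8.0 | 22 | 9 | 24.0 | 0 | 6 | | 268 | 94107 |
| 3 | 2016-04-04 | 68 | 61 | 54 | 52 | 50 | 48 | 86 | 68 | 49 | ... | 9.0 | 2.0 | 16 | 6 | 20.0 | 0 | 4 | | 228 | 94107 |
| 4 | 2016-04-05 | 79 | 64 | 48 | 52 | 46 | 39 | 89 | 60 | 31 | ... | 10.0 | 7.0 | 16 | 6 | 18.0 | 0 | 1 | | 277 | 94107 |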


In the cell below, take a peek at the *last few* rows of the `trips_df` data frame.

| | Trip ID | Duration | Start Date | Start Station | Start Terminal | End Date | End Station | End Terminal | Bike # | Subscriber Type | Zip Code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 83532 | 1260323 | 429 | 2016-06-29 23:19:00 | 2nd at Townsend | 61 | 2016-06-29 23:27:00 | Townsend at 7th | 65 | 292 | Subscriber | 94107 |
| 83533 | 1260324 | 368 | 2016-06-29 23:20:00 | Howard at 2nd | 63 | 2016-06-29 23:26:00 | San Francisco Caltrain 2 (330 Townsend) | 69 | 468 | Subscriber | 94107 |
| 83534 | 1260325 | 267 | 2016-06-29 23:26:00 | Clay at Battery | 41 | 2016-06-29 23:30:00 | Grant Avenue at Columbus Avenue | 73 | 132 | Subscriber | 94133 |
| 83535 | 1260326 | 682 | 2016-06-29 23:40:00 | San Francisco City Hall | 58 | 2016-06-29 23:52:00 | 2nd at South Park | 64 | 549 | Subscriber | 94107 |
| 83536 | 1260327 | 658 | 2016-06-29 23:41:00 | San Francisco City Hall | 58 | 2016-06-29 23:52:00 | 2nd at South Park | 64 | 348 | Subscriber | 94102 |
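A minimal sketch of peeking at both ends of a data frame, using a hypothetical ten-row frame (`df` and its `value` column are made up for illustration): `head()` and `tail()` each return five rows by default and accept an optional row count.

```python
import pandas as pd

# Hypothetical stand-in frame with ten rows, indexed 0 through 9.
df = pd.DataFrame({"value": range(10)})

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail()    # last 5 rows by default
first_three = df.head(3) # pass a number to change the row count

print(first_rows)
print(last_rows)
```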


## Understanding indexes, columns and data types

In the cell below, write one line of code to display summary information for the `stations_df` data frame.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67 entries, 0 to 66
Data columns (total 7 columns):
station_id      67 non-null int64
name            67 non-null object
lat             67 non-null float64
long            67 non-null float64
dockcount       67 non-null int64
landmark        67 non-null object
installation    67 non-null object
dtypes: float64(2), int64(2), object(3)
memory usage: 3.7+ KB
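A summary like the one above typically comes from `DataFrame.info()`, which prints the index range, each column's non-null count and dtype, and an estimate of memory usage. The sketch below runs it on a small stand-in frame built from three of the station rows shown earlier. Newer pandas releases lay the column listing out slightly differently from the output above, so expect the same information in a different arrangement.

```python
import pandas as pd

# Stand-in frame built from three of the station rows shown earlier.
stations_df = pd.DataFrame(
    {
        "station_id": [2, 3, 4],
        "name": [
            "San Jose Diridon Caltrain Station",
            "San Jose Civic Center",
            "Santa Clara at Almaden",
        ],
        "dockcount": [27, 15, 11],
    }
)

# info() prints its summary directly and returns None,
# so there is no need to wrap it in print().
stations_df.info()
```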


In the cell below, output just the column names in the `trips_df` data frame.

Index([u'Trip ID', u'Duration', u'Start Date', u'Start Station',
       u'Start Terminal', u'End Date', u'End Station', u'End Terminal',
       u'Bike #', u'Subscriber Type', u'Zip Code'],
      dtype='object')

In the cell below, output just the data types of all columns in the `trips_df` data frame.

Trip ID             int64
Duration            int64
Start Date         object
Start Station      object
Start Terminal      int64
End Date           object
End Station        object
End Terminal        int64
Bike #              int64
Subscriber Type    object
Zip Code           object
dtype: object
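Both of the outputs above come from attributes rather than methods, so no parentheses are needed. A small sketch, using a stand-in frame with two of the trips shown earlier: `columns` holds the column labels and `dtypes` maps each column to its data type. The `u'...'` prefixes in the column listing above suggest the notebook was run under Python 2; under Python 3 the labels print without them.

```python
import pandas as pd

# Stand-in frame with two of the trips shown earlier.
trips_df = pd.DataFrame(
    {
        "Trip ID": [1260323, 1260324],
        "Duration": [429, 368],
        "Start Station": ["2nd at Townsend", "Howard at 2nd"],
    }
)

print(trips_df.columns)  # column labels only
print(trips_df.dtypes)   # one dtype per column
```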

## Generating descriptive statistics for your data

Using the `trips_df` data frame and the `describe()` method, determine the average trip duration. [You should get about 832.6 seconds, or roughly 13.9 minutes].

| | Trip ID | Duration | Start Terminal | End Terminal | Bike # |
|---|---|---|---|---|---|
| count | 83537.0 | 83537.0 | 83537.0 | 83537.0 | 83537.0 |
| mean | 1203567.0 | 832.598884 | 58.626154 | 58.625818 | 416.862169 |
| std | 33073.07 | 2406.203818 | 16.187076 | 16.206403 | 163.308049 |
| min | 1145294.0 | 60.0 | 2.0 | 2.0 | 9.0 |
| 25% | 1175413.0 | 354.0 | 50.0 | 50.0 | 317.0 |
| 50% | 1203965.0 | 520.0 | 62.0 | 61.0 | 427.0 |
| 75% | 1232189.0 | 737.0 | 70.0 | 70.0 | 538.0 |
| max | 1260327.0 | 85900.0 | 89.0 | 89.0 | 878.0 |
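To pull a single figure out of a table like the one above, one option is to index the result of `describe()` by row label and column name. The sketch below does this on a stand-in frame holding only the five trip durations shown in the tail of `trips_df`, so its mean differs from the full-data answer.

```python
import pandas as pd

# Durations (in seconds) of the five trips shown in the tail of trips_df.
trips_df = pd.DataFrame({"Duration": [429, 368, 267, 682, 658]})

stats = trips_df.describe()  # count, mean, std, min, quartiles, max
mean_seconds = stats.loc["mean", "Duration"]
mean_minutes = mean_seconds / 60

print(mean_seconds, mean_minutes)
```

By default `describe()` summarizes only the numeric columns, which is why the text columns are absent from the table above.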


Using the `trips_df` data frame and the `describe()` method, determine the total number of unique start stations and the most popular start station.  [You should get 72 unique start stations, with the most popular being San Francisco Caltrain (Townsend at 4th), freq=6437].
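One way to approach this, sketched on a three-row stand-in frame built from trips shown earlier: passing `include="all"` to `describe()` adds the `unique`, `top`, and `freq` rows for non-numeric columns. The numbers printed by this sketch describe only the stand-in, not the full data set.

```python
import pandas as pd

# Stand-in frame: three of the trips shown in the tail of trips_df.
trips_df = pd.DataFrame(
    {
        "Duration": [682, 658, 429],
        "Start Station": [
            "San Francisco City Hall",
            "San Francisco City Hall",
            "2nd at Townsend",
        ],
    }
)

# include="all" summarizes text columns alongside numeric ones.
summary = trips_df.describe(include="all")

n_unique = summary.loc["unique", "Start Station"]  # distinct start stations
top_station = summary.loc["top", "Start Station"]  # most frequent value
top_count = summary.loc["freq", "Start Station"]   # how often it occurs

print(n_unique, top_station, top_count)
```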

| | Trip ID | Duration | Start Date | Start Station | Start Terminal | End Date | End Station | End Terminal | Bike # | Subscriber Type | Zip Code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 83537.0 | 83537.0 | 83537 | 83537 | 83537.0 | 83537 | 83537 | 83537.0 | 83537.0 | 83537 | 83509.0 |
| unique | | | 42236 | 72 | | 42245 | 72 | | | 2 | 1744.0 |
| top | | | 2016-04-15 08:49:00 | San Francisco Caltrain (Townsend at 4th) | | 2016-05-17 08:21:00 | San Francisco Caltrain (Townsend at 4th) | | | Subscriber | 94107.0 |
| freq | | | 15 | 6437 | | 13 | 7205 | | | 74163 | 6707.0 |
| mean | 1203567.0 | 832.598884 | | | 58.626154 | | | 58.625818 | 416.862169 | | |
| std | 33073.07 | 2406.203818 | | | 16.187076 | | | 16.206403 | 163.308049 | | |
| min | 1145294.0 | 60.0 | | | 2.0 | | | 2.0 | 9.0 | | |
| 25% | 1175413.0 | 354.0 | | | 50.0 | | | 50.0 | 317.0 | | |
| 50% | 1203965.0 | 520.0 | | | 62.0 | | | 61.0 | 427.0 | | |
| 75% | 1232189.0 | 737.0 | | | 70.0 | | | 70.0 | 538.0 | | |
