# Ebike Exploration

The Ford GoBike bike-share program in San Francisco has published some of its [usage data](https://www.fordgobike.com/system-data) as CSV (comma-separated values) files.

Let's explore using a popular Python data framework known as [`pandas`](https://pandas.pydata.org).

In [None]:
import os
import pandas as pd

Because CSV is such a common format, `pandas` can read it straight into one of its data structures.

In [None]:
# Read in data for the month of November 2018
bikes = pd.read_csv('201811-fordgobike-tripdata.csv')
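If you don't have the data file handy, here is a minimal sketch of the same call on an in-memory CSV. The rows and station names are made up for illustration, and `duration_sec` is an assumed column name; only `start_station_name` and `member_birth_year` appear elsewhere in this notebook.

```python
import io

import pandas as pd

# Made-up rows for illustration; not taken from the real data set.
csv_text = """duration_sec,start_station_name,member_birth_year
598,Example St at 1st St,1985
1204,Sample Ave at 2nd St,1992
311,Example St at 1st St,
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# The empty birth year becomes NaN, so that column is read as float.
print(sample.dtypes)
```

The first line of the text becomes the column names, and each remaining line becomes one row.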

Our variable `bikes` now refers to a [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/dsintro.html). For the purposes of this workshop, think of it as a spreadsheet with rows and columns.

We can see how many rows our `DataFrame` has by using the standard `len` function.

In [None]:
len(bikes)
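As a small aside, `DataFrame.shape` reports rows and columns together. The sketch below uses a tiny made-up frame rather than the real data.

```python
import pandas as pd

# A tiny made-up frame: 3 rows, 2 columns.
sample = pd.DataFrame({
    "start_station_name": ["Example St", "Sample Ave", "Example St"],
    "member_birth_year": [1985, 1992, 1978],
})

n_rows, n_cols = sample.shape  # (rows, columns)
print(len(sample), n_rows, n_cols)
```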

Let's take a look at the first few rows (five by default) using the `DataFrame.head` method.

In [None]:
bikes.head()

We can quickly get summary statistics (count, mean, min, max, and so on) for all the numerical columns in the `DataFrame`.

In [None]:
bikes.describe()
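To make the "numerical" part concrete, here is a sketch on a made-up frame with one numeric and one text column. The column `duration_sec` and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up data: one numeric column and one text column.
sample = pd.DataFrame({
    "duration_sec": [300, 600, 900],
    "start_station_name": ["Example St", "Sample Ave", "Example St"],
})

# By default, describe() summarizes only the numeric columns.
summary = sample.describe()
print(summary)

# include="all" adds the non-numeric columns (unique, top, freq).
full = sample.describe(include="all")
print(full)
```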

Because the column names are valid Python identifiers, you can access a specific column by using dot notation. Each column is represented by another data structure known as a [`Series`](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Series.html).

In [None]:
bikes.start_station_name.head()
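Dot notation is a shortcut for bracket notation, which also works for names that are not valid identifiers. A minimal sketch, where the `trip notes` column is hypothetical and exists only to show a name containing a space:

```python
import pandas as pd

# Made-up frame; "trip notes" is a hypothetical column with a space in its name.
sample = pd.DataFrame({
    "start_station_name": ["Example St", "Sample Ave"],
    "trip notes": ["short ride", "long ride"],
})

by_dot = sample.start_station_name         # works: valid identifier
by_bracket = sample["start_station_name"]  # always works
notes = sample["trip notes"]               # brackets required here

print(type(by_dot).__name__)
print(by_dot.equals(by_bracket))
```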

A `Series` has a wonderful method called [`value_counts`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html) that counts how often each value occurs and sorts the result with the most frequent value first.

In [None]:
starting_stations = bikes.start_station_name.value_counts()
starting_stations.head()
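Here is a small sketch of what `value_counts` returns, using made-up station names so the counts are easy to check by eye.

```python
import pandas as pd

# Made-up station names: "A" appears 3 times, "B" twice, "C" once.
stations = pd.Series(["A", "B", "A", "C", "A", "B"])

counts = stations.value_counts()
print(counts)

# normalize=True returns proportions instead of raw counts.
shares = stations.value_counts(normalize=True)
print(shares)
```

The result is itself a `Series`: the distinct values become the index and the counts become the values.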

[`Matplotlib`](https://matplotlib.org/) is a great data visualization and plotting library. It plays well with `pandas` data structures: the `.plot` methods on a `Series` or `DataFrame` use it by default.

In [None]:
starting_stations.head(5).plot.bar()

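Chaining method calls is very common: each method returns a new object, and the next method is called directly on that result.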

In [None]:
bikes.member_gender.value_counts().plot.pie()
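To see what a chain is doing, it can help to compare it with the same steps written one at a time. This sketch skips the plotting step and uses made-up gender values.

```python
import pandas as pd

# Made-up values for illustration.
genders = pd.Series(["Male", "Female", "Male", "Other", "Male", "Female"])

# Step by step: each intermediate result gets its own name.
counts = genders.value_counts()
top_two = counts.head(2)
stepwise = top_two.index.tolist()

# Chained: the same operations in a single expression.
chained = genders.value_counts().head(2).index.tolist()

print(stepwise, chained)
```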

Vectorization is part of what makes `pandas` and its underlying library `NumPy` so fast. Instead of looping over values one at a time in Python, an operation is applied to the whole column at once.

In [None]:
2018 - bikes.member_birth_year
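As a rough illustration, the vectorized subtraction gives the same result as an explicit Python loop, just without writing the loop. The birth years below are made up.

```python
import pandas as pd

# Made-up birth years.
birth_years = pd.Series([1985, 1992, 1978])

# Vectorized: one expression applied to every element.
ages_vectorized = 2018 - birth_years

# Equivalent explicit loop, shown only for comparison.
ages_loop = pd.Series([2018 - year for year in birth_years])

print(ages_vectorized.tolist())
```

On three values the difference is invisible, but on a column with many rows the vectorized form is typically much faster.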

You can easily assign a new `Series` to a `DataFrame` as a column, and `pandas` uses the index labels (or keys) to line the values up with the right rows.

In [None]:
bikes['member_age'] = 2018 - bikes.member_birth_year
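The alignment is by label, not by position. A minimal sketch with made-up data, where the new `Series` is in a different order and is missing one label:

```python
import pandas as pd

# Made-up frame with explicit index labels 10, 11, 12.
df = pd.DataFrame({"member_birth_year": [1985, 1992, 1978]}, index=[10, 11, 12])

# Ages listed out of order, with no entry for label 11.
ages = pd.Series([40, 33], index=[12, 10])

# Assignment matches on index labels; the missing label becomes NaN.
df["member_age"] = ages
print(df)
```

Row `10` gets 33 and row `12` gets 40 even though the `Series` listed them in the opposite order.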

In [None]:
bikes.member_age.plot.hist()

## Learn More
* [Introduction to NumPy](https://teamtreehouse.com/library/introduction-to-numpy) course on Treehouse
* [Introduction to Pandas](https://teamtreehouse.com/library/introduction-to-pandas) course on Treehouse