In [None]:
%matplotlib notebook

# Bike Trippin

For this assignment, you will be taking "Cycle Share" data from Seattle and creating charts to determine which gender borrows and uses bikes more often.

* Import your dependencies and then import your data into a pandas data frame from the CSV within the 'Resources' folder
* Check for null or NaN values and remove them
* Split up your data into groups based upon the gender column
    * NOTE: There will be a garbage row with a gender of 'stoptime' which you will have to remove!
* Chart your data using a bar graph, giving it both a title and labels for the axes

In [None]:
# Import Dependencies
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [None]:
# Import our data into pandas from CSV
bike_trip_data_path = 'Resources/trip.csv'
bike_trips_df = pd.read_csv(bike_trip_data_path, low_memory=False)

bike_trips_df.head()

In [None]:
# Get the last 5 rows 
bike_trips_df.tail()

In [None]:
# Check for null/NaN values: info() lists the non-null count of each column.
bike_trips_df.info()
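
`info()` reports non-null counts, so missing values have to be inferred by comparing each count with the total row count. A minimal sketch of a more direct check, using a small hypothetical DataFrame in place of `trip.csv` (the column values are made up), counts the missing cells per column with `isnull().sum()`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample loosely shaped like trip.csv; the values are invented.
trips = pd.DataFrame({
    "tripduration": [985.9, 926.4, np.nan, 1020.0],
    "gender": ["Male", None, "Female", "Female"],
    "birthyear": [1960.0, np.nan, 1985.0, 1990.0],
})

# isnull() marks every missing cell as True; sum() counts the Trues per column.
null_counts = trips.isnull().sum()
print(null_counts)
```

On the real data, the same call would be `bike_trips_df.isnull().sum()`.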

In [None]:
# Create a clean DataFrame by dropping every row that contains a null value.
clean_bike_trips_df = bike_trips_df.dropna()
clean_bike_trips_df.head(10)
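
Calling `dropna()` with no arguments removes a row if *any* column is missing, so trips that only lack a birth year are discarded too. A sketch on hypothetical sample data compares that with `dropna(subset=["gender"])`, which only drops rows whose gender is missing; whether that narrower filter suits this assignment is a judgment call, since only the gender column is charted:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; two rows are missing a birth year, one is missing a gender.
trips = pd.DataFrame({
    "tripduration": [985.9, 926.4, 1020.0, 640.2],
    "gender": ["Male", np.nan, "Female", "Other"],
    "birthyear": [1960.0, np.nan, np.nan, 1990.0],
})

# Default behaviour: drop every row with a NaN in any column.
any_nan_dropped = trips.dropna()
# Narrower alternative: drop only rows whose 'gender' is missing.
gender_only_dropped = trips.dropna(subset=["gender"])

print(len(trips), len(any_nan_dropped), len(gender_only_dropped))
```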

In [None]:
# Check for null values again.
clean_bike_trips_df.info()

In [None]:
# Split up our data into groups based upon 'gender'
gender_groups = clean_bike_trips_df.groupby('gender')

# Count how many trips were taken by each gender
gender_trips = gender_groups['tripduration'].count()
gender_trips
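
Counting the rows of each group is the same tally that `value_counts()` produces in one step. A sketch on hypothetical sample data checks that the two approaches agree; note that `count()` skips rows where `tripduration` is NaN, which makes no difference once the nulls have been dropped:

```python
import pandas as pd

# Hypothetical sample data with no missing values.
trips = pd.DataFrame({
    "tripduration": [985.9, 926.4, 1020.0, 640.2, 300.0],
    "gender": ["Male", "Female", "Male", "Other", "Male"],
})

# groupby + count, as in the notebook cell above (result is sorted by gender).
by_groupby = trips.groupby("gender")["tripduration"].count()
# value_counts tallies rows per gender directly (result is sorted by count).
by_value_counts = trips["gender"].value_counts()

# Sort both by gender label before comparing the counts.
print(by_groupby.sort_index().tolist() == by_value_counts.sort_index().tolist())
```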

In [None]:
# Drop the garbage 'stoptime' group by its label rather than its position;
# errors="ignore" skips the drop if that group is not present
gender_trips = gender_trips.drop("stoptime", errors="ignore")

# Chart our data, give it a title, and label the axes
gender_chart = gender_trips.plot(kind="bar", title="Bike Trips by Gender")
gender_chart.set_xlabel("Gender")
gender_chart.set_ylabel("Number of Trips Taken")

plt.tight_layout()
plt.show()
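
An alternative, assuming the garbage row can be recognized by its `gender` value alone, is to filter it out with a boolean mask before grouping, so the grouped counts never contain it. The sketch below uses hypothetical sample data; `matplotlib.use("Agg")` is only there so it runs without a display and should be left out inside the notebook, where `%matplotlib notebook` already selects the backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, including one garbage row with gender 'stoptime'.
trips = pd.DataFrame({
    "tripduration": [985.9, 926.4, 1020.0, 640.2, 0.0],
    "gender": ["Male", "Female", "Male", "Other", "stoptime"],
})

# Keep only rows with a real gender value, then group and count.
valid = trips[trips["gender"] != "stoptime"]
gender_trips = valid.groupby("gender")["tripduration"].count()

# Bar chart with a title and axis labels; tight_layout() runs before any show().
ax = gender_trips.plot(kind="bar", title="Bike Trips by Gender")
ax.set_xlabel("Gender")
ax.set_ylabel("Number of Trips Taken")
plt.tight_layout()
```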

# Bonus!

You will now take the same base data frame as before and write code that creates an individual pie chart for each bike. For this part of the activity, chart the total 'Trip Duration' of each bike, broken down by gender. Bonus points if you can do this without using loc or iloc to filter the original data frame! You can still use loc to filter the grouped data, though.
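
One way to avoid filtering the original data frame is to aggregate first and reshape afterwards. A minimal sketch, on hypothetical sample data, sums `tripduration` per (bike, gender) pair and then uses `unstack()` to turn the gender level into columns, giving one row per bike that can feed a pie chart:

```python
import pandas as pd

# Hypothetical sample data: two bikes ridden by different genders.
trips = pd.DataFrame({
    "bikeid": ["SEA00001", "SEA00001", "SEA00001", "SEA00002", "SEA00002"],
    "gender": ["Male", "Female", "Male", "Female", "Other"],
    "tripduration": [600.0, 300.0, 900.0, 450.0, 150.0],
})

# Total trip duration per (bike, gender) pair: a Series with a two-level index.
duration_sums = trips.groupby(["bikeid", "gender"])["tripduration"].sum()

# unstack() moves 'gender' into the columns: one row per bike, one column per gender.
# fill_value=0 covers genders that never rode a given bike.
per_bike = duration_sums.unstack(fill_value=0)
print(per_bike)
```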

In [None]:
# Group our data based upon 'bikeid' and 'gender'
bike_groups = clean_bike_trips_df.groupby(['bikeid','gender'])

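# Sum only 'tripduration' for each group; summing every column would also total
# meaningless numbers such as trip IDs and birth years, and text columns may be
# concatenated or raise an error depending on the pandas version.
# Double brackets keep a DataFrame, which the pie chart below needs for y='tripduration'.
sum_it_up = bike_groups[['tripduration']].sum()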
sum_it_up.head(12)

In [None]:
# Make a variable called bike_id and store a 'bikeid' in it
bike_id = "SEA00001"

# Select the per-gender trip duration totals for the 'bikeid' above
just_one_bike = sum_it_up.loc[bike_id]

just_one_bike
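
`loc` on the first index level works here; `xs()` is another way to take the same cross-section and names the index level explicitly. A sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data, grouped the same way as in the notebook.
trips = pd.DataFrame({
    "bikeid": ["SEA00001", "SEA00001", "SEA00002"],
    "gender": ["Male", "Female", "Female"],
    "tripduration": [600.0, 300.0, 450.0],
})
sum_it_up = trips.groupby(["bikeid", "gender"])[["tripduration"]].sum()

bike_id = "SEA00001"
# xs() selects one label from the 'bikeid' level; by default that level is dropped,
# leaving 'gender' as the index of the result.
just_one_bike = sum_it_up.xs(bike_id, level="bikeid")
print(just_one_bike)
```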

In [None]:
# Create a pie chart based upon the trip duration of that single bike
bike_pie = just_one_bike.plot(kind="pie", y='tripduration', title=("Trips of " + bike_id))
bike_pie.set_ylabel("Trip Duration")

plt.axis("equal")
plt.show()
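
To produce a pie chart for every bike rather than one hard-coded ID, the summed Series can be iterated one bike at a time with `groupby(level="bikeid")`, which never filters the original data frame. A sketch on hypothetical sample data follows; `matplotlib.use("Agg")` only lets it run without a display. The real trip data likely contains many bikes, so this loop would create many figures:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: two bikes ridden by different genders.
trips = pd.DataFrame({
    "bikeid": ["SEA00001", "SEA00001", "SEA00002", "SEA00002"],
    "gender": ["Male", "Female", "Female", "Other"],
    "tripduration": [600.0, 300.0, 450.0, 150.0],
})
sums = trips.groupby(["bikeid", "gender"])["tripduration"].sum()

axes = []
# Each iteration yields one bike ID and that bike's per-gender totals.
for bike_id, durations in sums.groupby(level="bikeid"):
    fig, ax = plt.subplots()
    # droplevel removes 'bikeid' so the wedge labels are just the genders.
    durations.droplevel("bikeid").plot(kind="pie", ax=ax, title="Trips of " + bike_id)
    ax.set_ylabel("Trip Duration")
    ax.axis("equal")
    axes.append(ax)
    plt.close(fig)  # release each figure; with many bikes, open figures use a lot of memory
```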