## Unit 5 | Assignment - The Power of Plots

## Background

What good is data without a good plot to tell the story?

So, let's take what you've learned about Matplotlib and apply it to some real-world situations. For this assignment, you'll need to complete **1 of 2** Data Challenges. As always, it's your choice which one you complete. Consider choosing the one most relevant to your future career.

## Option 1: Pyber

![Ride](Images/Ride.png)

The ride sharing bonanza continues! Seeing the success of notable players like Uber and Lyft, you've decided to join a fledgling ride sharing company of your own. In your latest capacity, you'll be acting as Chief Data Strategist for the company. In this role, you'll be expected to offer data-backed guidance on new opportunities for market differentiation.

You've since been given access to the company's complete recordset of rides. This contains information about every active driver and historic ride, including details like city, driver count, individual fares, and city type.

Your objective is to build a [Bubble Plot](https://en.wikipedia.org/wiki/Bubble_chart) that showcases the relationship between four key variables:

* Average Fare ($) Per City
* Total Number of Rides Per City
* Total Number of Drivers Per City
* City Type (Urban, Suburban, Rural)
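
As a rough illustration of how those four variables can share one chart, here is a minimal sketch built on `Axes.scatter`. It is not the reference solution: the `summary` table, its column names, the toy values, the color assigned to each city type, the axis assignment (rides on x, average fare on y), and the size factor of 10 are all assumptions made for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-city summary; every value here is invented for illustration.
summary = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "type": ["Urban", "Urban", "Suburban", "Rural"],
    "avg_fare": [24.5, 26.1, 31.0, 38.2],
    "ride_count": [30, 25, 15, 5],
    "driver_count": [46, 55, 21, 4],
})

# Pyber colors; which color goes with which city type is an assumption.
colors = {"Urban": "lightcoral", "Suburban": "lightskyblue", "Rural": "gold"}

fig, ax = plt.subplots()
for city_type, color in colors.items():
    subset = summary[summary["type"] == city_type]
    ax.scatter(
        subset["ride_count"],            # x: total rides per city
        subset["avg_fare"],              # y: average fare per city
        s=subset["driver_count"] * 10,   # bubble area tracks driver count (factor is arbitrary)
        color=color,
        alpha=0.7,
        edgecolor="black",
        linewidths=1,
        label=city_type,
    )

ax.set_title("Pyber Ride Sharing Data (toy values)")
ax.set_xlabel("Total Number of Rides (Per City)")
ax.set_ylabel("Average Fare ($)")
ax.legend(title="City Type")
```

Drawing one `scatter` call per city type keeps the legend simple, since each call contributes exactly one labeled entry.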

In addition, you will be expected to produce the following three pie charts:

* % of Total Fares by City Type
* % of Total Rides by City Type
* % of Total Drivers by City Type
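
Each of these charts comes down to one aggregation per city type. Below is a minimal sketch, not the reference solution, that uses small made-up tables in place of the real CSVs. The column names follow the ride and city data used in this notebook, while the numbers, the color assigned to each city type, and the choice to pull out the Urban wedge are assumptions. Note that driver totals are taken from the city-level table: in a ride-level merge each city's `driver_count` repeats once per ride, so summing it there would overcount.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-ins for the ride-level and city-level tables (values invented).
ride_df = pd.DataFrame({
    "city": ["A", "A", "B", "C"],
    "fare": [20.0, 30.0, 25.0, 25.0],
    "ride_id": [1, 2, 3, 4],
})
city_df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "driver_count": [40, 10, 5],
    "type": ["Urban", "Suburban", "Rural"],
})
merged = ride_df.merge(city_df, on="city", how="left")

# One total per city type for each chart
fares_by_type = merged.groupby("type")["fare"].sum()
rides_by_type = merged.groupby("type")["ride_id"].count()
drivers_by_type = city_df.groupby("type")["driver_count"].sum()  # city-level, not ride-level

# Share of total fares, in percent (pie() computes the same shares internally)
fare_pct = fares_by_type / fares_by_type.sum() * 100

# Pyber colors; the type-to-color mapping is an assumption.
colors = {"Urban": "lightcoral", "Suburban": "lightskyblue", "Rural": "gold"}

fig, ax = plt.subplots()
ax.pie(
    fares_by_type,
    labels=fares_by_type.index,
    colors=[colors[t] for t in fares_by_type.index],
    explode=[0.1 if t == "Urban" else 0 for t in fares_by_type.index],
    autopct="%1.1f%%",   # wedge percentages
    shadow=True,
    startangle=140,
)
ax.set_title("% of Total Fares by City Type")
```

The rides and drivers charts follow the same pattern, with `rides_by_type` or `drivers_by_type` passed to `ax.pie` instead.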

As final considerations:

* You must use the Pandas library and Jupyter Notebook.
* You must use the Matplotlib library.
* You must include a written description of three observable trends based on the data.
* You must use proper labeling of your plots, including aspects like: Plot Titles, Axes Labels, Legend Labels, Wedge Percentages, and Wedge Labels.
* Remember when making your plots to consider aesthetics!
  * You must stick to the Pyber color scheme (Gold, Light Sky Blue, and Light Coral) in producing your plot and pie charts.
  * When making your Bubble Plot, experiment with effects like `alpha`, `edgecolor`, and `linewidths`.
  * When making your Pie Chart, experiment with effects like `shadow`, `startangle`, and `explode`.
* You must include an exported Markdown version of your Notebook called `README.md` in your GitHub repository.
* See [Example Solution](Pyber/Pyber_Example.pdf) for a reference on expected format.


In [1]:

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import csv

    # Paths to the raw CSV files
    ride_csv = "../raw_data/ride_data.csv"
    city_csv = "../raw_data/city_data.csv"

    # Read each CSV into a DataFrame
    ride_df = pd.read_csv(ride_csv)
    city_df = pd.read_csv(city_csv)

    # Merge ride-level and city-level data on the shared "city" column
    data = pd.merge(ride_df, city_df, on='city', how='outer')
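
Because the merge attaches city-level rows to ride-level rows, it can help to check that it behaved as expected. The sketch below uses the `validate` and `indicator` parameters of `pd.merge` on small made-up tables; both parameters are part of pandas, but using them here is a suggestion, not something the cell above does, and the toy values are assumptions.

```python
import pandas as pd

# Toy tables with the same column names as the real data (values invented).
ride_df = pd.DataFrame({
    "city": ["A", "A", "B"],
    "fare": [10.0, 12.5, 30.0],
    "ride_id": [1, 2, 3],
})
city_df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "driver_count": [5, 2, 7],
    "type": ["Urban", "Rural", "Suburban"],
})

checked = pd.merge(
    ride_df,
    city_df,
    on="city",
    how="outer",
    validate="many_to_one",  # raises MergeError if a city appears twice in city_df
    indicator=True,          # adds a "_merge" column: both / left_only / right_only
)

# Rows that exist in only one of the two tables
unmatched = checked[checked["_merge"] != "both"]
print(unmatched[["city", "_merge"]])
```

In this toy case, city `C` has no rides, so it shows up as the only unmatched row.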

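In [2]:

    # Give each city a numeric id, numbered in order of first appearance
    unique_cities = data["city"].unique()
    cityid_df = pd.DataFrame(unique_cities).reset_index()
    # Join the ids back onto the ride-level data (column 0 holds the city name)
    data_with_cityid = pd.merge(data, cityid_df, left_on='city', right_on=0, how='left')
    data_with_cityid = data_with_cityid.rename(columns={0: "city1", 'index': 'city id'})
    # Shift the ids so that they start at 101
    data_with_cityid['city id'] = data_with_cityid['city id'] + 101
    data_with_cityid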

|  | city | date | fare | ride_id | driver_count | type | city id | city1 |
|---|---|---|---|---|---|---|---|---|
| 0 | Sarabury | 2016-01-16 13:49:27 | 38.35 | 5403689035038 | 46 | Urban | 101 | Sarabury |
| 1 | Sarabury | 2016-07-23 07:42:44 | 21.76 | 7546681945283 | 46 | Urban | 101 | Sarabury |
| 2 | Sarabury | 2016-04-02 04:32:25 | 38.03 | 4932495851866 | 46 | Urban | 101 | Sarabury |
| 3 | Sarabury | 2016-06-23 05:03:41 | 26.82 | 6711035373406 | 46 | Urban | 101 | Sarabury |
| 4 | Sarabury | 2016-09-30 12:48:34 | 30.30 | 6388737278232 | 46 | Urban | 101 | Sarabury |
| 5 | Sarabury | 2016-08-04 00:25:52 | 27.20 | 2429366407526 | 46 | Urban | 101 | Sarabury |
| 6 | Sarabury | 2016-07-25 10:44:01 | 17.73 | 4467299640441 | 46 | Urban | 101 | Sarabury |
| 7 | Sarabury | 2016-06-22 16:24:01 | 23.94 | 6153395712431 | 46 | Urban | 101 | Sarabury |
| 8 | Sarabury | 2016-01-27 17:46:45 | 16.39 | 8220809448298 | 46 | Urban | 101 | Sarabury |
| 9 | Sarabury | 2016-04-26 11:31:30 | 21.80 | 5969441875705 | 46 | Urban | 101 | Sarabury |
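
The same kind of id can also be produced in one step with `pd.factorize`, which numbers values in order of first appearance, just as `unique()` does. This is an alternative sketch rather than the method used in the cell above; the short `data` frame is a made-up stand-in that reuses a few city names from this notebook.

```python
import pandas as pd

# Toy ride-level data; only the "city" column matters for this example.
data = pd.DataFrame({
    "city": ["Sarabury", "Sarabury", "Alvarezhaven", "Sarabury", "Clarkstad"],
})

# codes holds 0, 1, 2, ... per distinct city; uniques holds the city names in that order
codes, uniques = pd.factorize(data["city"])

# Shift the codes so that ids start at 101
data["city id"] = codes + 101
print(data)
```

This avoids the helper DataFrame, the second merge, and the duplicate `city1` column.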


In [3]:

    data_with_cityid = data_with_cityid.set_index('city id')
    data_with_cityid

| city id | city | date | fare | ride_id | driver_count | type | city1 |
|---|---|---|---|---|---|---|---|
| 101 | Sarabury | 2016-01-16 13:49:27 | 38.35 | 5403689035038 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-07-23 07:42:44 | 21.76 | 7546681945283 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-04-02 04:32:25 | 38.03 | 4932495851866 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-06-23 05:03:41 | 26.82 | 6711035373406 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-09-30 12:48:34 | 30.30 | 6388737278232 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-08-04 00:25:52 | 27.20 | 2429366407526 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-07-25 10:44:01 | 17.73 | 4467299640441 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-06-22 16:24:01 | 23.94 | 6153395712431 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-01-27 17:46:45 | 16.39 | 8220809448298 | 46 | Urban | Sarabury |
| 101 | Sarabury | 2016-04-26 11:31:30 | 21.80 | 5969441875705 | 46 | Urban | Sarabury |


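In [5]:

    # Create the GroupBy object based on the "city" column
    fare_by_city = data_with_cityid.groupby(["city"])

    # Average the numeric columns (fare, ride_id, driver_count) for each city.
    # numeric_only=True skips text columns such as date and type, which
    # newer pandas versions otherwise raise an error on.
    fare_by_city.mean(numeric_only=True)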

| city | fare | ride_id | driver_count |
|---|---|---|---|
| Alvarezhaven | 23.928710 | 5.351586e+12 | 21.0 |
| Alyssaberg | 20.609615 | 3.536678e+12 | 67.0 |
| Anitamouth | 37.315556 | 4.195870e+12 | 16.0 |
| Antoniomouth | 23.625000 | 5.086800e+12 | 21.0 |
| Aprilchester | 21.981579 | 4.574788e+12 | 49.0 |
| Arnoldview | 25.106452 | 5.021952e+12 | 41.0 |
| Campbellport | 33.711333 | 5.805424e+12 | 26.0 |
| Carrollbury | 36.606000 | 4.274615e+12 | 4.0 |
| Carrollfort | 25.395517 | 4.759008e+12 | 55.0 |
| Clarkstad | 31.051667 | 6.682745e+12 | 21.0 |
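
Averaging every numeric column also averages `ride_id`, which is an identifier, so that column of the result carries no meaning. A possible alternative is named aggregation, which picks one statistic per column. The sketch below is an assumption about how the per-city summary could be built, not the notebook's own next step: the output column names (`avg_fare`, `ride_count`, `city_type`) are hypothetical, and the toy `data` frame only mimics the merged table's columns.

```python
import pandas as pd

# Toy stand-in for the merged ride-level table (values invented).
data = pd.DataFrame({
    "city": ["A", "A", "B"],
    "fare": [10.0, 20.0, 30.0],
    "ride_id": [1, 2, 3],
    "driver_count": [5, 5, 2],
    "type": ["Urban", "Urban", "Rural"],
})

city_summary = data.groupby("city").agg(
    avg_fare=("fare", "mean"),               # average fare per city
    ride_count=("ride_id", "count"),         # number of rides per city
    driver_count=("driver_count", "first"),  # constant within a city, so take the first value
    city_type=("type", "first"),             # likewise constant within a city
)
print(city_summary)
```

Taking `first` for `driver_count` and `type` relies on those values being identical on every row of a given city, which holds when they come from a single city-level record.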
