### Project overview 

- Import your data into a Pandas DataFrame.
- Merge your DataFrames.
- Create a bubble chart that showcases the average fare versus the total number of rides with bubble size based on the total number of drivers for each city type, including urban, suburban, and rural.
- Determine the mean, median, and mode for the following:

  1. The total number of rides for each city type.
  2. The average fares for each city type.
  3. The total number of drivers for each city type.

- Create box-and-whisker plots that visualize each of the following to determine if there are any outliers:

  1. The number of rides for each city type.
  2. The fares for each city type.
  3. The number of drivers for each city type.

- Create a pie chart that visualizes each of the following data for each city type:

  1. The percent of total fares.
  2. The percent of total rides.
  3. The percent of total drivers.
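
As a rough sketch of where the steps above lead, the per-city-type totals and percent shares can be computed with `groupby`. The tiny `rides` and `cities` frames below are hypothetical stand-ins for the real CSV data, and the names `summary` and `percent` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample rows; the real project reads these from CSV files.
rides = pd.DataFrame({
    "city": ["A", "A", "B", "C"],
    "fare": [10.0, 20.0, 30.0, 40.0],
    "type": ["Urban", "Urban", "Suburban", "Rural"],
})
cities = pd.DataFrame({
    "city": ["A", "B", "C"],
    "driver_count": [5, 3, 2],
    "type": ["Urban", "Suburban", "Rural"],
})

# Ride counts and fares come from the ride-level rows.
summary = rides.groupby("type").agg(
    total_rides=("fare", "count"),
    average_fare=("fare", "mean"),
    total_fares=("fare", "sum"),
)

# Drivers are listed once per city, so sum them from the city-level table.
summary["total_drivers"] = cities.groupby("type")["driver_count"].sum()

# Percent share of each total by city type, as a pie chart would show it.
totals = summary[["total_fares", "total_rides", "total_drivers"]]
percent = 100 * totals / totals.sum()
```

Summing `driver_count` from the city-level table rather than from a merged ride-level table avoids counting the same drivers once per ride.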

In [1]:
# Add Matplotlib inline magic command
%matplotlib inline
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Files to load
city_data_to_load = "Resources/city_data.csv"
ride_data_to_load = "Resources/ride_data.csv"

In [3]:
# Read the city data file and store it in a pandas DataFrame.
city_data_df = pd.read_csv(city_data_to_load)
city_data_df.head(10)

Unnamed: 0,city,driver_count,type
0,Richardfort,38,Urban
1,Williamsstad,59,Urban
2,Port Angela,67,Urban
3,Rodneyfort,34,Urban
4,West Robert,39,Urban
5,West Anthony,70,Urban
6,West Angela,48,Urban
7,Martinezhaven,25,Urban
8,Karenberg,22,Urban
9,Barajasview,26,Urban


In [4]:
# Read the ride data file and store it in a pandas DataFrame.
ride_data_df = pd.read_csv(ride_data_to_load)
ride_data_df.head(10)

Unnamed: 0,city,date,fare,ride_id
0,Lake Jonathanshire,2019-01-14 10:14:22,13.83,5739410935873
1,South Michelleport,2019-03-04 18:24:09,30.24,2343912425577
2,Port Samanthamouth,2019-02-24 04:29:00,33.44,2005065760003
3,Rodneyfort,2019-02-10 23:22:03,23.44,5149245426178
4,South Jack,2019-03-06 04:28:35,34.58,3908451377344
5,South Latoya,2019-03-11 12:26:48,9.52,1994999424437
6,New Paulville,2019-02-27 11:17:56,43.25,793208410091
7,Simpsonburgh,2019-04-26 00:43:24,35.98,111953927754
8,South Karenland,2019-01-08 03:28:48,35.09,7995623208694
9,North Jasmine,2019-03-09 06:26:29,42.81,5327642267789


### Inspect the DataFrames for nulls and data types

In [5]:
# Use df.count() to list each column name with its number of non-null rows.
city_data_df.count()

city            120
driver_count    120
type            120
dtype: int64

In [6]:
# Get the number of null values in each column.
city_data_df.isnull().sum()

city            0
driver_count    0
type            0
dtype: int64

In [7]:
# Get the data types of each column.
# Inspect to make sure they are the correct type - especially driver_count
city_data_df.dtypes

city            object
driver_count     int64
type            object
dtype: object
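
The three checks above (non-null counts, null counts, and data types) can also be gathered into one table. The following is a minimal sketch: the helper name `inspect_frame` is hypothetical, and the sample frame deliberately contains one missing city name that is not in the real data.

```python
import pandas as pd

def inspect_frame(df):
    """Return non-null counts, null counts, and dtypes, one row per column."""
    return pd.DataFrame({
        "non_null": df.count(),
        "nulls": df.isnull().sum(),
        "dtype": df.dtypes.astype(str),
    })

# Hypothetical sample with one deliberately missing city name.
sample_df = pd.DataFrame({
    "city": ["Richardfort", "Williamsstad", None],
    "driver_count": [38, 59, 67],
    "type": ["Urban", "Urban", "Urban"],
})

report = inspect_frame(sample_df)
print(report)
```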

In [8]:
# Get the unique values of the type of city - they are stored in an array.
city_data_df["type"].unique()

array(['Urban', 'Suburban', 'Rural'], dtype=object)

In [9]:
# Get the number of data points from the Urban cities.
sum(city_data_df["type"]=="Urban")

66

In [10]:
# Get the number of data points from the Rural cities.
sum(city_data_df["type"]=="Rural")

18

In [11]:
# Get the number of data points from the Suburban cities.
sum(city_data_df["type"]=="Suburban")

36
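
The three counts above can likely be obtained in a single call with `Series.value_counts()`. This is a small sketch on a hypothetical sample frame (`city_sample_df`, with placeholder city names), not the real `city_data_df`.

```python
import pandas as pd

# Hypothetical sample standing in for city_data_df.
city_sample_df = pd.DataFrame({
    "city": ["City 1", "City 2", "City 3", "City 4", "City 5"],
    "type": ["Urban", "Urban", "Urban", "Suburban", "Rural"],
})

# Count how many cities fall under each city type.
type_counts = city_sample_df["type"].value_counts()
print(type_counts)
```

Applied to the real data, the same call on `city_data_df["type"]` should report the 66, 36, and 18 shown above.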

In [12]:
# Get the columns and the rows that are not null.
ride_data_df.count()

city       2375
date       2375
fare       2375
ride_id    2375
dtype: int64

In [13]:
# Get the number of null values in each column.
ride_data_df.isnull().sum()

city       0
date       0
fare       0
ride_id    0
dtype: int64

In [14]:
# Get the data types of each column.
ride_data_df.dtypes

city        object
date        object
fare       float64
ride_id      int64
dtype: object
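
Note that `date` is reported as `object`, meaning the timestamps are stored as strings. If later analysis needs date arithmetic or grouping by month, one option is to convert the column with `pd.to_datetime`. This is a sketch on a two-row sample (`ride_sample_df` is a hypothetical name), not a step the original notebook performs.

```python
import pandas as pd

# Two rows copied from the ride data preview above.
ride_sample_df = pd.DataFrame({
    "city": ["Lake Jonathanshire", "South Michelleport"],
    "date": ["2019-01-14 10:14:22", "2019-03-04 18:24:09"],
    "fare": [13.83, 30.24],
})

# Parse the string timestamps into datetime64 values.
ride_sample_df["date"] = pd.to_datetime(ride_sample_df["date"])

# Datetime columns expose components such as the month through the .dt accessor.
months = ride_sample_df["date"].dt.month
```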

### Merge the DataFrames

In [17]:
# pd.merge() accepts how="left", "right", "inner", or "outer", depending on
# which rows we want to keep. The default is "inner".
#
# The column the two DataFrames have in common is city, so we merge on city.
# With how="left", every row of ride_data_df is kept and the matching
# columns from city_data_df are appended to it.

# In the pyber_data_df DataFrame, the four columns from ride_data_df
# come first after the index. The driver_count and type columns from city_data_df are added at the end.

# Combine the data into a single dataset
pyber_data_df = pd.merge(ride_data_df, city_data_df, how="left", on="city")

# Display the DataFrame
pyber_data_df.head(100)

Unnamed: 0,city,date,fare,ride_id,driver_count,type
0,Lake Jonathanshire,2019-01-14 10:14:22,13.83,5739410935873,5,Urban
1,South Michelleport,2019-03-04 18:24:09,30.24,2343912425577,72,Urban
2,Port Samanthamouth,2019-02-24 04:29:00,33.44,2005065760003,57,Urban
3,Rodneyfort,2019-02-10 23:22:03,23.44,5149245426178,34,Urban
4,South Jack,2019-03-06 04:28:35,34.58,3908451377344,46,Urban
...,...,...,...,...,...,...
95,Williamsstad,2019-03-29 21:43:04,35.58,4747780858464,59,Urban
96,North Jason,2019-03-08 20:06:21,25.21,3180078116241,6,Urban
97,Christopherfurt,2019-04-17 02:25:49,23.30,7819057680096,41,Urban
98,Liumouth,2019-01-11 17:15:17,36.78,1464517778842,69,Urban


Unnamed: 0,city,date,fare,ride_id,driver_count,type
0,Lake Jonathanshire,2019-01-14 10:14:22,13.83,5739410935873,5,Urban
1,Lake Jonathanshire,2019-04-07 20:51:11,31.25,4441251834598,5,Urban
2,Lake Jonathanshire,2019-03-09 23:45:55,19.89,2389495660448,5,Urban
3,Lake Jonathanshire,2019-04-07 18:09:21,24.28,7796805191168,5,Urban
4,Lake Jonathanshire,2019-01-02 14:14:50,13.89,424254840012,5,Urban
...,...,...,...,...,...,...
95,Rodneyfort,2019-04-12 23:29:11,27.81,5669400941983,34,Urban
96,Rodneyfort,2019-03-15 10:18:45,30.83,4175051895472,34,Urban
97,Rodneyfort,2019-05-05 03:19:39,22.24,5393286273394,34,Urban
98,Rodneyfort,2019-01-12 06:17:59,7.05,4239149293035,34,Urban
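
To make the effect of `how=` more concrete, here is a minimal sketch comparing a left and an inner merge on small sample frames. The city `"Nowhere"` and its fare are hypothetical rows added to show an unmatched key. The `validate` and `indicator` arguments are optional checks that the original notebook does not use.

```python
import pandas as pd

rides = pd.DataFrame({
    "city": ["Rodneyfort", "Rodneyfort", "Nowhere"],  # "Nowhere" is hypothetical
    "fare": [23.44, 27.81, 9.99],
})
cities = pd.DataFrame({
    "city": ["Rodneyfort", "Richardfort"],
    "driver_count": [34, 38],
    "type": ["Urban", "Urban"],
})

# Left merge: keep every ride, even when its city has no match in cities.
# validate="many_to_one" raises an error if a city appears twice in cities.
left = pd.merge(rides, cities, how="left", on="city",
                validate="many_to_one", indicator=True)

# Inner merge (the default): keep only rides whose city exists in both frames.
inner = pd.merge(rides, cities, how="inner", on="city")

# Rides whose city was not found in the city table.
unmatched = left[left["_merge"] == "left_only"]
```

With a left merge, unmatched rides keep their row but get `NaN` in `driver_count` and `type`, so checking `unmatched` is a quick way to spot cities missing from the city table.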
