# PyBer Analysis
---
This project is an exploratory look at data collected from the ride-sharing company PyBer. The requirements for this analysis are listed below.

- [x] Import your data into a Pandas DataFrame.
- [ ] Merge your DataFrames.
- [ ] Create a bubble chart that showcases the average fare versus the total number of rides with bubble size based on the total number of drivers for each city type, including urban, suburban, and rural.
- [ ] Determine the mean, median, and mode for the following:
    - [ ] The total number of rides for each city type.
    - [ ] The average fares for each city type.
    - [ ] The total number of drivers for each city type.
- [ ] Create box-and-whisker plots that visualize each of the following to determine if there are any outliers:
    - [ ] The number of rides for each city type.
    - [ ] The fares for each city type.
    - [ ] The number of drivers for each city type.
- [ ] Create a pie chart that visualizes each of the following data for each city type:
    - [ ] The percent of total fares.
    - [ ] The percent of total rides.
    - [ ] The percent of total drivers.
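The summary-statistics requirement can be sketched with pandas. The snippet counts rides per city and then summarizes those counts separately for each city type.

It assumes a merged DataFrame with `city`, `fare`, and `type` columns. The tiny dataset and the `mean_median_mode` helper are hypothetical and exist only for illustration. The same pattern should carry over to average fares or driver counts if you swap the aggregated column.

```python
import pandas as pd

# Hypothetical miniature merged dataset; city names and fares are made up.
pyber_data_df = pd.DataFrame({
    "city": ["U1", "U1", "U1", "U2", "U2", "U2", "U3", "R1", "R1"],
    "fare": [10.0, 12.0, 14.0, 20.0, 22.0, 24.0, 30.0, 40.0, 44.0],
    "type": ["Urban"] * 7 + ["Rural"] * 2,
})

# Count rides per city, keeping each city's type in the index.
rides_per_city = pyber_data_df.groupby(["type", "city"])["fare"].count()

def mean_median_mode(series):
    """Hypothetical helper: return the mean, median, and first mode of a Series."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().iloc[0],  # mode() can return several values
    }

# Summarize the per-city ride counts separately for each city type.
ride_stats = {
    city_type: mean_median_mode(counts)
    for city_type, counts in rides_per_city.groupby(level="type")
}
print(ride_stats)
```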


First, we import the libraries used throughout the analysis.

```python
# import Matplotlib dependencies for graphing
%matplotlib inline
import matplotlib.pyplot as plt

# import pandas for DataFrame creation and management
import pandas as pd

# import NumPy for array analysis
import numpy as np

# import os to build file paths that work on any operating system
import os
```

Next, we read in the city and ride CSV files.

```python
# declare the file paths
city_data_file = os.path.join("Resources", "city_data.csv")
ride_data_file = os.path.join("Resources", "ride_data.csv")

# read each CSV into its own DataFrame
city_data_df = pd.read_csv(city_data_file)
ride_data_df = pd.read_csv(ride_data_file)

# display the ride data
ride_data_df
```

| | city | date | fare | ride_id |
|---|---|---|---|---|
| 0 | Lake Jonathanshire | 2019-01-14 10:14:22 | 13.83 | 5739410935873 |
| 1 | South Michelleport | 2019-03-04 18:24:09 | 30.24 | 2343912425577 |
| 2 | Port Samanthamouth | 2019-02-24 04:29:00 | 33.44 | 2005065760003 |
| 3 | Rodneyfort | 2019-02-10 23:22:03 | 23.44 | 5149245426178 |
| 4 | South Jack | 2019-03-06 04:28:35 | 34.58 | 3908451377344 |
| ... | ... | ... | ... | ... |
| 2370 | Michaelberg | 2019-04-29 17:04:39 | 13.38 | 8550365057598 |
| 2371 | Lake Latoyabury | 2019-01-30 00:05:47 | 20.76 | 9018727594352 |
| 2372 | North Jaime | 2019-02-10 21:03:50 | 11.11 | 2781339863778 |
| 2373 | West Heather | 2019-05-07 19:22:15 | 44.94 | 4256853490277 |
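`pd.read_csv` accepts a file-like object as well as a path. That means the loading step can be tried without the `Resources` folder by feeding it in-memory text.

This is a minimal sketch. The CSV text is hypothetical and only mimics the `city`, `driver_count`, and `type` columns of `city_data.csv`. It also shows that `read_csv` infers each column's type, which is what the checks below inspect.

```python
import io

import pandas as pd

# Hypothetical CSV text shaped like Resources/city_data.csv (values are made up).
csv_text = """city,driver_count,type
Sample City A,12,Urban
Sample City B,3,Rural
"""

# StringIO stands in for a file on disk; read_csv infers a dtype per column.
sample_city_df = pd.read_csv(io.StringIO(csv_text))
print(sample_city_df.dtypes)
```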


Before analyzing anything, we check the city data. We confirm that each column has the expected data type, count the rows, and make sure there are no null values.

```python
# look at the data types of each column
city_data_df.dtypes
```

```
city            object
driver_count     int64
type            object
dtype: object
```

```python
# check the number of data points
city_data_df.count()
```

```
city            120
driver_count    120
type            120
dtype: int64
```

```python
# make sure there are no missing values
city_data_df.isnull().sum()
```

```
city            0
driver_count    0
type            0
dtype: int64
```

```python
# get the unique values of the type of city
city_data_df["type"].unique()
```

```
array(['Urban', 'Suburban', 'Rural'], dtype=object)
```

```python
# count the number of cities of each type
urban_count = (city_data_df["type"] == "Urban").sum()
print(urban_count)

suburban_count = (city_data_df["type"] == "Suburban").sum()
print(suburban_count)

rural_count = (city_data_df["type"] == "Rural").sum()
print(rural_count)
```

```
66
36
18
```
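`Series.value_counts()` can tally every city type in one call, replacing the three separate boolean sums. Here is a minimal sketch, assuming a stand-in `city_data_df` built from made-up rows:

```python
import pandas as pd

# Hypothetical stand-in for city_data_df; the rows are made up.
city_data_df = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F"],
    "driver_count": [5, 72, 12, 8, 3, 2],
    "type": ["Urban", "Urban", "Urban", "Suburban", "Suburban", "Rural"],
})

# value_counts() counts each distinct type, sorted from most to least common.
type_counts = city_data_df["type"].value_counts()
print(type_counts)
```

On the real city data this should reproduce the 66/36/18 split shown above.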


Next, we run the same checks on the ride data.

```python
# check the data types
ride_data_df.dtypes
```

```
city        object
date        object
fare       float64
ride_id      int64
dtype: object
```
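The `date` column is stored as `object` (plain strings). Sorting by time or grouping by month would therefore need it parsed first.

One possible approach is `pd.to_datetime`, sketched here on two rows copied from the ride data above. The explicit format string is an assumption based on the `YYYY-MM-DD HH:MM:SS` layout of those sample rows.

```python
import pandas as pd

# Two rows copied from the ride data shown above.
rides = pd.DataFrame({
    "city": ["Lake Jonathanshire", "South Michelleport"],
    "date": ["2019-01-14 10:14:22", "2019-03-04 18:24:09"],
    "fare": [13.83, 30.24],
})

# Parse the date strings into datetime64 values (format assumed from the samples).
rides["date"] = pd.to_datetime(rides["date"], format="%Y-%m-%d %H:%M:%S")
print(rides.dtypes)

# Once parsed, the .dt accessor exposes date parts such as the month.
print(rides["date"].dt.month.tolist())
```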

```python
# check the number of rows
ride_data_df.count()
```

```
city       2375
date       2375
fare       2375
ride_id    2375
dtype: int64
```

```python
# make sure all the rows have data
ride_data_df.isnull().sum()
```

```
city       0
date       0
fare       0
ride_id    0
dtype: int64
```
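The same three checks run on both DataFrames, so they could be bundled into one function. `check_frame` is a hypothetical helper, not part of the original notebook. The demo frame deliberately contains one missing fare so the report shows a non-zero null count.

```python
import pandas as pd

def check_frame(df):
    """Hypothetical helper: report dtype, non-null count, and null count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.count(),
        "nulls": df.isnull().sum(),
    })

# Made-up frame with one missing fare to show what a failed check looks like.
demo_df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "fare": [13.83, None, 33.44],
})

report = check_frame(demo_df)
print(report)
```

Calling `check_frame(city_data_df)` and `check_frame(ride_data_df)` should then summarize both datasets in two lines.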

Now that we have confirmed both datasets are complete and correctly typed, we can merge them.

```python
# Combine the data into a single dataset, matching each ride to its city
pyber_data_df = pd.merge(ride_data_df, city_data_df, how="left", on="city")

# Display the DataFrame
pyber_data_df.head()
```

| | city | date | fare | ride_id | driver_count | type |
|---|---|---|---|---|---|---|
| 0 | Lake Jonathanshire | 2019-01-14 10:14:22 | 13.83 | 5739410935873 | 5 | Urban |
| 1 | South Michelleport | 2019-03-04 18:24:09 | 30.24 | 2343912425577 | 72 | Urban |
| 2 | Port Samanthamouth | 2019-02-24 04:29:00 | 33.44 | 2005065760003 | 57 | Urban |
| 3 | Rodneyfort | 2019-02-10 23:22:03 | 23.44 | 5149245426178 | 34 | Urban |
| 4 | South Jack | 2019-03-06 04:28:35 | 34.58 | 3908451377344 | 46 | Urban |
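A left merge has two failure modes worth guarding against:

- It silently keeps rides whose city is missing from the city table, leaving `driver_count` and `type` as `NaN`.
- A duplicated city in the city table would repeat the matching rides.

`pd.merge` offers two parameters that can catch these. `validate="many_to_one"` raises `pandas.errors.MergeError` if the right-hand keys are not unique, and `indicator=True` adds a `_merge` column recording where each row was found.

The sketch uses made-up frames. `Ghost Town` is a hypothetical city with no match in the city table.

```python
import pandas as pd

# Made-up rides; "Ghost Town" is a hypothetical city absent from the city table.
rides = pd.DataFrame({
    "city": ["Lake Jonathanshire", "Rodneyfort", "Ghost Town"],
    "fare": [13.83, 23.44, 9.99],
})
cities = pd.DataFrame({
    "city": ["Lake Jonathanshire", "Rodneyfort"],
    "driver_count": [5, 34],
    "type": ["Urban", "Urban"],
})

# validate="many_to_one" requires each city to appear once in `cities`;
# indicator=True adds a "_merge" column ("both", "left_only", ...).
merged = pd.merge(rides, cities, how="left", on="city",
                  validate="many_to_one", indicator=True)

# Rides whose city had no match are flagged as "left_only".
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["city", "fare"]])

# A duplicated city key makes the many-to-one check fail loudly.
dup_cities = pd.concat([cities, cities.iloc[[0]]])
rejected = False
try:
    pd.merge(rides, dup_cities, how="left", on="city", validate="many_to_one")
except pd.errors.MergeError as err:
    rejected = True
    print("merge rejected:", err)
```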
