## Pyber Data Analysis

The analysis shows that:

1. Urban riders generate the majority of revenue, more than 60% of total fares. Suburban riders still account for a significant share, slightly over 30%, because suburban fares are higher on average.

2. Urban cities account for the majority of rides, and urban rides have a lower average fare than suburban and rural rides.

3. Although rural and suburban fares make up almost 40% of total fares, rural and suburban drivers make up less than 20% of total drivers.
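
The percentages quoted above are shares of a per-type total. As a rough sketch of that arithmetic, the snippet below computes fare, ride, and driver shares plus the average fare per city type. The two small DataFrames and the `percent_share` helper are hypothetical stand-ins invented for illustration; they are not the Pyber data, and the numbers they produce are not the figures reported above.

```python
import pandas as pd

# Hypothetical toy data, NOT the Pyber dataset: one row per ride.
rides = pd.DataFrame({
    "type": ["Urban", "Urban", "Urban", "Suburban", "Suburban", "Rural"],
    "fare": [20.0, 25.0, 15.0, 30.0, 30.0, 40.0],
})

# Hypothetical per-city driver counts: one row per city.
cities = pd.DataFrame({
    "type": ["Urban", "Urban", "Suburban", "Rural"],
    "driver_count": [50, 30, 15, 5],
})


def percent_share(series):
    """Return each entry's share of the series total, in percent."""
    return 100 * series / series.sum()


fares_by_type = rides.groupby("type")["fare"]
drivers_by_type = cities.groupby("type")["driver_count"].sum()

# One row per city type; columns align on the shared "type" index.
summary = pd.DataFrame({
    "fare_share_pct": percent_share(fares_by_type.sum()),
    "ride_share_pct": percent_share(fares_by_type.count()),
    "avg_fare": fares_by_type.mean(),
    "driver_share_pct": percent_share(drivers_by_type),
})

print(summary.round(1))
```

Driver counts are summed from the per-city table rather than from the merged ride table, since a merged table repeats each city's `driver_count` once per ride and would inflate the totals.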

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np 

In [None]:
city_data_to_load = "data/city_data.csv"
ride_data_to_load = "data/ride_data.csv"

In [None]:
citydata_csv = pd.read_csv(city_data_to_load)
ridedata_csv = pd.read_csv(ride_data_to_load)

In [None]:
merged_df = pd.merge(ridedata_csv, citydata_csv, on="city", how = "left")
merged_df.head()

In [None]:
# Split the merged data by city type, then group each subset by city
rural_df = merged_df[merged_df["type"] == "Rural"].groupby("city")
suburban_df = merged_df[merged_df["type"] == "Suburban"].groupby("city")
urban_df = merged_df[merged_df["type"] == "Urban"].groupby("city")

In [None]:
x_rural_rides = rural_df["ride_id"].count()
y_rural_fare_avg = rural_df["fare"].mean()
rural_drivers = rural_df["driver_count"].mean()

In [None]:
x_suburban_rides = suburban_df["ride_id"].count()
y_suburban_fare_avg = suburban_df["fare"].mean()
suburban_drivers = suburban_df["driver_count"].mean()

In [None]:
x_urban_rides = urban_df["ride_id"].count()
y_urban_fare_avg = urban_df["fare"].mean()
urban_drivers = urban_df["driver_count"].mean()

In [None]:
plt.scatter(x_urban_rides, y_urban_fare_avg, label= "Urban", s=urban_drivers * 10, facecolor=["coral"], edgecolor="black", marker="o", alpha= 0.75, linewidth=1.5)
plt.scatter(x_suburban_rides, y_suburban_fare_avg, label= "Suburban", s=suburban_drivers * 10, facecolor=["lightskyblue"], edgecolor="black", marker="o", alpha= 0.75, linewidth=1.5)
plt.scatter(x_rural_rides, y_rural_fare_avg, label= "Rural", s=rural_drivers * 10, facecolor=["gold"], edgecolor="black", marker="o", alpha= 0.75, linewidth=1.5)

plt.xlabel("Total Number of Rides (Per City)")
plt.ylabel("Average Fare ($)")
plt.title("Pyber Ride Sharing Data (2016)")

plt.grid()

legend = plt.legend(title="City Types")

# Use a uniform marker size in the legend. legend_handles needs matplotlib 3.7+;
# older releases expose the same list as legend.legendHandles.
for handle in legend.legend_handles:
    handle.set_sizes([20])

plt.text(42, 35, "Note: \nCircle size correlates with driver count per city")

plt.savefig("Images/Ridesharing.png")


plt.show()




In [None]:
fares_city = merged_df.groupby(["type"])
total_farecity = fares_city["fare"].sum()

# Call plt.title(); assigning to it would overwrite the function
plt.title("% of Total Fares by City Type")
labels=["Rural", "Suburban", "Urban"]
explode = (0,0,0.1)
colors = ["gold", "lightskyblue", "lightcoral"]

plt.pie(total_farecity, explode=explode, labels=labels, colors=colors, autopct="%1.1f%%", shadow=True, startangle=160)

plt.axis("auto")

plt.savefig("Images/Ridesharepie.png")

plt.show()

In [None]:
# Count rides per city type
total_ridecity = fares_city["ride_id"].count()

# Call plt.title(); assigning to it would overwrite the function
plt.title("% of Total Rides by City Type")
labels=["Rural", "Suburban", "Urban"]
explode = (0,0,0.1)
colors = ["gold", "lightskyblue", "lightcoral"]

plt.pie(total_ridecity, explode=explode, labels=labels, colors=colors, autopct="%1.1f%%", shadow=True, startangle=160)

plt.axis("auto")

# Save under a separate file name (chosen here) so the fares pie chart is not overwritten
plt.savefig("Images/Ridesharepie_rides.png")

plt.show()


In [None]:
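# Sum drivers per city type from the city table (one row per city)
drivers_citytype = citydata_csv.groupby(["type"])
total_drivercity = drivers_citytype["driver_count"].sum()

# Call plt.title(); assigning to it would overwrite the function
plt.title("% of Total Drivers by City Type")
labels=["Rural", "Suburban", "Urban"]
explode = (0,0,0.1)
colors = ["gold", "lightskyblue", "lightcoral"]

plt.pie(total_drivercity, explode=explode, labels=labels, colors=colors, autopct="%1.1f%%", shadow=True, startangle=160)

plt.axis("auto")

# Save under a separate file name (chosen here) so the other pie charts are not overwritten
plt.savefig("Images/Ridesharepie_drivers.png")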

plt.show()


