# Bubble Map Visualization

## Setup

- **The Dataset**: let's explore, from a graphical point of view, this [Kaggle Dataset](https://www.kaggle.com/datafiniti/fast-food-restaurants) about fast food restaurants.

- **A new library**: during this exercise, we will learn how to use `Plotly`, a very powerful graphics library! Let's install it: https://pypi.org/project/plotly/ (Plotly Express, imported below as `plotly.express`, ships with the `plotly` package).

Once that's done, let's import the libraries we need: `numpy`, `pandas`, and `plotly`.


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

## Bubble Map 101

The goal of this exercise is to plot a bubble map from the `FastFoodRestaurants.csv` dataset.
Your challenge is to reproduce this map:

<br>
<img src="https://wagon-public-datasets.s3.amazonaws.com/data-science-images/02-Data-Toolkit/03-Data-Visualization/020307-fastfood-restaurants-us.png" height="100%" width="100%">
Beautiful, isn't it? 🙂

### Specs:

- Each bubble should represent a `city`,
- A bubble's size should depend on the number of fast food restaurants in the city,
- The graph should be zoomed in on the U.S. map,
- Hovering over a bubble should display the name of the `city`,
- The graph should have a relevant title.

*Here is the documentation of bubble maps 👉 https://plot.ly/python/bubble-maps/ enjoy!*

❗ The dataset gives the latitude & longitude of each restaurant, but our bubble map needs one latitude & longitude per city. To fill this gap, let's compute them ourselves: we assume that a city's coordinates are the median latitude and the median longitude of all fast food restaurants recorded in that city.
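
To see why the median is a reasonable choice here, the sketch below compares it with the mean on a tiny made-up sample. The city name and coordinates are invented for illustration, and one row deliberately carries a bogus `(0, 0)` location.

```python
import pandas as pd

# Toy data (invented): three restaurants in one city,
# the last one with an obviously wrong coordinate.
toy = pd.DataFrame({
    "city": ["Springfield", "Springfield", "Springfield"],
    "latitude": [39.80, 39.78, 0.0],
    "longitude": [-89.65, -89.64, 0.0],
})

# One coordinate pair per city, computed two different ways
center_median = toy.groupby("city")[["latitude", "longitude"]].median()
center_mean = toy.groupby("city")[["latitude", "longitude"]].mean()

print(center_median)  # stays close to the two plausible restaurants
print(center_mean)    # dragged towards the bad (0, 0) point
```

With an odd number of rows the median is simply the middle value, so a single bad record barely moves the city's position, while the mean shifts by many degrees.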


------------------------------------------------------

❓ Import the `FastFoodRestaurants.csv` dataset

In [None]:
file = 'https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/FastFoodRestaurants.csv'

In [None]:
restaurants_df = pd.read_csv(file)
restaurants_df.head()

Unnamed: 0,address,city,country,keys,latitude,longitude,name,postalCode,province,websites
0,324 Main St,Massena,US,us/ny/massena/324mainst/-1161002137,44.9213,-74.89021,McDonald's,13662,NY,"http://mcdonalds.com,http://www.mcdonalds.com/..."
1,530 Clinton Ave,Washington Court House,US,us/oh/washingtoncourthouse/530clintonave/-7914...,39.53255,-83.44526,Wendy's,43160,OH,http://www.wendys.com
2,408 Market Square Dr,Maysville,US,us/ky/maysville/408marketsquaredr/1051460804,38.62736,-83.79141,Frisch's Big Boy,41056,KY,"http://www.frischs.com,https://www.frischs.com..."
3,6098 State Highway 37,Massena,US,us/ny/massena/6098statehighway37/-1161002137,44.95008,-74.84553,McDonald's,13662,NY,"http://mcdonalds.com,http://www.mcdonalds.com/..."
4,139 Columbus Rd,Athens,US,us/oh/athens/139columbusrd/990890980,39.35155,-82.09728,OMG! Rotisserie,45701,OH,"http://www.omgrotisserie.com,http://omgrotisse..."


❓ Explore the DataFrame - check the shape, missing values, data types, etc. (❗ make sure `latitude` and `longitude` columns are floats)

In [None]:
print("Shape of the dataframe:", restaurants_df.shape)
print("Columns and their data types 👇")
restaurants_df.dtypes

Shape of the dataframe: (10000, 10)
Columns and their data types 👇


address        object
city           object
country        object
keys           object
latitude      float64
longitude     float64
name           object
postalCode     object
province       object
websites       object
dtype: object
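
Here both coordinate columns are already `float64`, so nothing needs fixing. If they had been loaded as text instead, one possible fix is sketched below on a hypothetical frame named `raw` (the values, including the `"not available"` entry, are invented for illustration).

```python
import pandas as pd

# Hypothetical frame where the coordinates were read as strings
raw = pd.DataFrame({
    "latitude": ["44.9213", "39.53255", "not available"],
    "longitude": ["-74.89021", "-83.44526", "-83.79141"],
})

coords = ["latitude", "longitude"]
# errors="coerce" turns unparseable entries into NaN instead of raising
raw[coords] = raw[coords].apply(pd.to_numeric, errors="coerce")

print(raw.dtypes)
print(raw.isnull().sum())
```

Any row that ends up with a `NaN` coordinate would then have to be dropped or corrected before plotting.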

In [None]:
print("Missing values per column 👇")
restaurants_df.isnull().sum()

Missing values per column 👇


address         0
city            0
country         0
keys            0
latitude        0
longitude       0
name            0
postalCode      0
province        0
websites      465
dtype: int64

❓ Group the data by city and create an aggregated DataFrame with:

* the `count` of restaurants per city
* the `median` of latitude and longitude

In [None]:
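by_city_df = (
    restaurants_df.groupby('city')
    # count the restaurants and take the median coordinates of each city
    .agg({'city': 'count', 'latitude': 'median', 'longitude': 'median'})
    # rename the count first, so reset_index can restore 'city' as a column
    .rename({'city': 'restaurant count'}, axis=1)
    .reset_index()
    .sort_values('restaurant count', ascending=False)
)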
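
One caveat: the grouping above keys on `city` alone, so two cities with the same name in different states end up in a single bubble placed somewhere between them. The sketch below is a possible variation, not part of the exercise specs. It groups on both `city` and `province`, using invented toy rows and a hypothetical output column named `restaurant_count`.

```python
import pandas as pd

# Toy rows (invented): the same city name appears in two states
toy = pd.DataFrame({
    "city": ["Athens", "Athens", "Athens"],
    "province": ["OH", "OH", "GA"],
    "latitude": [39.35, 39.33, 33.95],
    "longitude": [-82.10, -82.08, -83.38],
})

by_city_state = (
    toy.groupby(["city", "province"])
    .agg(
        restaurant_count=("latitude", "count"),  # rows per (city, state)
        latitude=("latitude", "median"),
        longitude=("longitude", "median"),
    )
    .reset_index()
)

print(by_city_state)
```

Whether this matters for the final map depends on how many city names repeat across states in the real dataset, which is worth checking before choosing one approach.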

❓ Plot the data as a bubble map with `Plotly` 🗺️

In [None]:
fig = px.scatter_geo(by_city_df, lat='latitude', lon='longitude',
                     hover_name="city", size="restaurant count",
                     scope="usa", title='Fastfood Restaurants Across the U.S.')
fig.show()

## Bubble Map with animation

How about plotting the evolution of fastfood restaurants in the US?

The dataset `Datafiniti_Fast_Food_Restaurants.csv` has a `dateAdded` column. Let's assume this dataset comes from the UberEats database, and that we want to see how the fast food restaurants available on the UberEats app evolved over time!
We want to get this:
<br>
<img src="https://wagon-public-datasets.s3.amazonaws.com/data-science-images/02-Data-Toolkit/03-Data-Visualization/020307-animated-bubble-map.png" height="100%" width="100%">

👉 The slider should show the evolution year by year 🙂, with each bubble representing a new restaurant added that year.

Let's plot this! 💪

❓ Load the `Datafiniti_Fast_Food_Restaurants.csv` dataset

In [None]:
file = "https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/Datafiniti_Fast_Food_Restaurants.csv"

In [None]:
restaurant_df = pd.read_csv(file)
restaurant_df.head()

Unnamed: 0,id,dateAdded,dateUpdated,address,categories,city,country,keys,latitude,longitude,name,postalCode,province,sourceURLs,websites
0,AVwcmSyZIN2L1WUfmxyw,2015-10-19T23:47:58Z,2018-06-26T03:00:14Z,800 N Canal Blvd,American Restaurant and Fast Food Restaurant,Thibodaux,US,us/la/thibodaux/800ncanalblvd/1780593795,29.814697,-90.814742,SONIC Drive In,70301,LA,https://foursquare.com/v/sonic-drive-in/4b7361...,https://locations.sonicdrivein.com/la/thibodau...
1,AVwcmSyZIN2L1WUfmxyw,2015-10-19T23:47:58Z,2018-06-26T03:00:14Z,800 N Canal Blvd,Fast Food Restaurants,Thibodaux,US,us/la/thibodaux/800ncanalblvd/1780593795,29.814697,-90.814742,SONIC Drive In,70301,LA,https://foursquare.com/v/sonic-drive-in/4b7361...,https://locations.sonicdrivein.com/la/thibodau...
2,AVwcopQoByjofQCxgfVa,2016-03-29T05:06:36Z,2018-06-26T02:59:52Z,206 Wears Valley Rd,Fast Food Restaurant,Pigeon Forge,US,us/tn/pigeonforge/206wearsvalleyrd/-864103396,35.803788,-83.580553,Taco Bell,37863,TN,https://www.yellowpages.com/pigeon-forge-tn/mi...,"http://www.tacobell.com,https://locations.taco..."
3,AVweXN5RByjofQCxxilK,2017-01-03T07:46:11Z,2018-06-26T02:59:51Z,3652 Parkway,Fast Food,Pigeon Forge,US,us/tn/pigeonforge/3652parkway/93075755,35.782339,-83.551408,Arby's,37863,TN,http://www.yellowbook.com/profile/arbys_163389...,"http://www.arbys.com,https://locations.arbys.c..."
4,AWQ6MUvo3-Khe5l_j3SG,2018-06-26T02:59:43Z,2018-06-26T02:59:43Z,2118 Mt Zion Parkway,Fast Food Restaurant,Morrow,US,us/ga/morrow/2118mtzionparkway/1305117222,33.562738,-84.321143,Steak 'n Shake,30260,GA,https://foursquare.com/v/steak-n-shake/4bcf77a...,http://www.steaknshake.com/locations/23851-ste...


❓ Format the `dateAdded` column to just include the year as an integer

In [None]:
restaurant_df['dateAdded'] = pd.to_datetime(restaurant_df['dateAdded']).dt.year
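
To make the conversion concrete, here is the same step on three timestamps copied from the `dateAdded` values shown in the preview above. The names `sample`, `parsed` and `years` are only used for this illustration.

```python
import pandas as pd

# Three ISO 8601 timestamps taken from the preview of the dataset
sample = pd.Series([
    "2015-10-19T23:47:58Z",
    "2016-03-29T05:06:36Z",
    "2018-06-26T02:59:43Z",
])

parsed = pd.to_datetime(sample)  # the trailing "Z" is parsed as UTC
years = parsed.dt.year           # integer year of each timestamp

print(years.tolist())
```

Because the result is an integer column, sorting it later puts the years in chronological order.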

❓ Plot the animated bubble map 🍿 Check `Plotly` [documentation](https://plot.ly/python/bubble-maps/) for help

Here are some specs for the animated map:

* The years on the timeline need to be in order
* Each bubble should display the `name` of the restaurant on hover
* The map should be zoomed in on the U.S.
* The graph should have a relevant title

In [None]:
fig = px.scatter_geo(restaurant_df.sort_values(by=['dateAdded']),
                     lat="latitude", lon="longitude",
                     hover_name="name",
                     animation_frame="dateAdded",
                     title="Evolution Of Fastfood Restaurants Across the U.S.",
                     scope="usa")
fig.show()

# 🏁 Congrats on completing the challenge!

Remember to `git add`, `git commit` and `git push` your code 💪