# google-covid-19-mobility-data-process-world

In this notebook I refine the process from `google-covid-19-mobility-data-process-v1` to get the highest-resolution data available for a map of the world.

In [1]:
import pandas as pd
import requests
import simplejson as json
import numpy as np

---

## Load a reduced CSV containing the world entries

In [3]:
worldDf = pd.read_csv("./output-data/world.csv")

In [4]:
worldDf.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
0,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-15,1.0,6.0,-2.0,-1.0,2.0,1.0
1,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-16,-2.0,5.0,2.0,-2.0,2.0,1.0
2,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-17,-3.0,2.0,4.0,-3.0,2.0,1.0
3,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-18,-3.0,2.0,1.0,-2.0,2.0,1.0
4,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-19,-3.0,1.0,0.0,-1.0,2.0,1.0


---

**`sub_region_1` contains the regions**

In [5]:
worldDf["sub_region_1"].unique()

array(['Abu Dhabi', 'Ajman', 'Dubai', ..., 'Matabeleland North Province',
       'Matabeleland South Province', 'Midlands Province'], dtype=object)

---

## Get the latitude and longitude coordinates for each unique `place_id`

In [7]:
uniquePlaceIdsDf = worldDf[["place_id"]].drop_duplicates()

Access the Google Maps API (the Place Details endpoint) to get coordinates for each `place_id`.

In [9]:
with open('./secrets/googleapikey.txt', 'r') as f:
    key = f.read()

In [10]:
def get_lat_long(place_id):
    # Strip the trailing newline left over from reading the key file
    API_KEY = key.rstrip("\n")
    url = "https://maps.googleapis.com/maps/api/place/details/json"
    # Let requests build and URL-encode the query string
    params = {"place_id": place_id, "key": API_KEY, "fields": "geometry"}

    response = requests.get(url, params=params).text
    response_json = json.loads(response)

    # Coordinates live at result -> geometry -> location; fall through to
    # (None, None) whenever any level of that path is missing
    if "result" in response_json:
        result = response_json["result"]
        if "geometry" in result:
            geometry = result["geometry"]
            if "location" in geometry:
                location = geometry["location"]
                return location["lat"], location["lng"]

    return None, None
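
The lookup reads the coordinates from the `result` → `geometry` → `location` path of the response and falls back to `(None, None)` when any level is missing. A more compact way to express that check is sketched below. The helper name `extract_lat_lng` and both payloads are hypothetical and written by hand for illustration; they are not real API output.

```python
def extract_lat_lng(response_json):
    """Return (lat, lng) from a Place Details style payload, or (None, None)."""
    # Each .get() falls back to an empty dict, so a missing level yields None
    location = response_json.get("result", {}).get("geometry", {}).get("location")
    if location is None:
        return None, None
    return location["lat"], location["lng"]


# Hand-written payloads for illustration only
found = {"result": {"geometry": {"location": {"lat": 23.4677, "lng": 53.7369}}}}
missing = {"status": "NOT_FOUND"}

print(extract_lat_lng(found))    # (23.4677, 53.7369)
print(extract_lat_lng(missing))  # (None, None)
```

Keeping the parsing separate from the HTTP request also makes it possible to test it without an API key.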

In [11]:
uniquePlaceIdsDf.loc[:, "lat"], uniquePlaceIdsDf.loc[:, "lng"] = zip(*uniquePlaceIdsDf['place_id'].map(get_lat_long))

In [12]:
uniquePlaceIdsDf.head()

Unnamed: 0,place_id,lat,lng
0,ChIJGczaTT5mXj4RBNmakTvGr4s,23.4677,53.7369
462,ChIJHwyp6rZXXz4RerixWbtcrRE,25.4052,55.5136
924,ChIJRcbZaklDXz4R6SkAK7_QznQ,24.9822,55.4029
1386,ChIJX7kokD0y9D4RvDyz2xuxwaY,25.4111,56.2482
1848,ChIJpwnSTA5x9j4RD-KEpgxnnrk,25.6741,55.9804


I'll save these coordinates as a CSV file for later use.

In [13]:
uniquePlaceIdsDf.to_csv("./output-data/world-sub-region-1-lat-lng.csv", index=False)

In [14]:
len(uniquePlaceIdsDf)

1871

---

## Merge the coordinates with the original world data frame

In [15]:
worldMergeDf = pd.merge(worldDf, uniquePlaceIdsDf, on='place_id', how='outer')

In [16]:
worldMergeDf.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline,lat,lng
0,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-15,1.0,6.0,-2.0,-1.0,2.0,1.0,23.4677,53.7369
1,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-16,-2.0,5.0,2.0,-2.0,2.0,1.0,23.4677,53.7369
2,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-17,-3.0,2.0,4.0,-3.0,2.0,1.0,23.4677,53.7369
3,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-18,-3.0,2.0,1.0,-2.0,2.0,1.0,23.4677,53.7369
4,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-19,-3.0,1.0,0.0,-1.0,2.0,1.0,23.4677,53.7369
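
Because the coordinates table has one row per `place_id`, the merge copies the same `lat` and `lng` onto every daily row for that place, as the repeated values above show. A minimal sketch of that behaviour on toy data follows; the IDs, dates and coordinates are made up.

```python
import pandas as pd

# Toy stand-ins for worldDf and uniquePlaceIdsDf (all values are made up)
mobility = pd.DataFrame({
    "place_id": ["A", "A", "B"],
    "date": ["2020-02-15", "2020-02-16", "2020-02-15"],
    "parks": [-2.0, 2.0, 5.0],
})
coords = pd.DataFrame({
    "place_id": ["A", "B"],
    "lat": [10.0, 20.0],
    "lng": [30.0, 40.0],
})

# One coordinate row per place_id, so the row count of mobility is preserved
merged = pd.merge(mobility, coords, on="place_id", how="outer")
print(merged)
```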


---

## Calculate 7-day rolling averages for each location

In [17]:
def add_rolling_average(df):
    df.loc[:, "retail-average"] = df["retail_and_recreation_percent_change_from_baseline"].rolling(window=7, center=True).mean()
    df.loc[:, "grocery-average"] = df["grocery_and_pharmacy_percent_change_from_baseline"].rolling(window=7, center=True).mean()
    df.loc[:, "parks-average"] = df["parks_percent_change_from_baseline"].rolling(window=7, center=True).mean()
    df.loc[:, "transit-average"] = df["transit_stations_percent_change_from_baseline"].rolling(window=7, center=True).mean()
    df.loc[:, "workplace-average"] = df["workplaces_percent_change_from_baseline"].rolling(window=7, center=True).mean()
    df.loc[:, "residential-average"] = df["residential_percent_change_from_baseline"].rolling(window=7, center=True).mean()
    
    return df

In [21]:
worldAverageDf = worldMergeDf.groupby("place_id").apply(add_rolling_average)

In [22]:
worldAverageDf.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,...,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline,lat,lng,retail-average,grocery-average,parks-average,transit-average,workplace-average,residential-average
0,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-15,1.0,...,2.0,1.0,23.4677,53.7369,,,,,,
1,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-16,-2.0,...,2.0,1.0,23.4677,53.7369,,,,,,
2,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-17,-3.0,...,2.0,1.0,23.4677,53.7369,,,,,,
3,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-18,-3.0,...,2.0,1.0,23.4677,53.7369,-1.714286,3.285714,2.285714,-2.428571,1.428571,1.0
4,AE,United Arab Emirates,Abu Dhabi,,,AE-AZ,,ChIJGczaTT5mXj4RBNmakTvGr4s,2020-02-19,-3.0,...,2.0,1.0,23.4677,53.7369,-2.0,3.0,2.857143,-2.571429,1.571429,1.0
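
The NaN averages in the first three rows are expected. With `window=7` and `center=True`, pandas only produces a value where three rows exist on each side of the current one, so the first three and last three rows of every place are left empty. A small sketch on a made-up series of nine values:

```python
import pandas as pd

# Nine made-up daily values standing in for one place's series
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

# Centered 7-day window: needs 3 values before and 3 after each position
centered = values.rolling(window=7, center=True).mean()
print(centered.to_list())
# [nan, nan, nan, 4.0, 5.0, 6.0, nan, nan, nan]
```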


---

## Round the average figures to 1 decimal place for a smaller final file size

In [23]:
worldRoundedDf = worldAverageDf.round({
    'retail-average': 1,
    'grocery-average': 1,
    'parks-average': 1,
    'transit-average': 1,
    'workplace-average': 1,
    'residential-average': 1
})

---

## Remove any rows with a NaN `place_id`

In [25]:
worldNotNanDf = worldRoundedDf[worldRoundedDf["place_id"].notna()]

---

## Convert the data into a list of Python dictionaries so it can be exported as JSON

In [26]:
def create_list_for_json(df):
    outputList = []
    listOfPlaceIds = df["place_id"].drop_duplicates().to_list()
    groupByPlaceId = df.groupby("place_id")
    
    for place_id in listOfPlaceIds:
        thisDf = groupByPlaceId.get_group(place_id)
        parksList = thisDf["parks-average"].to_list()
        
        # Some places have only NaN values in the parks column; skip them
        if np.isnan(parksList).all():
            continue
        
        myDict = {}
        myDict["lng"] = thisDf.iloc[0]["lng"]
        myDict["lat"] = thisDf.iloc[0]["lat"]

        # parks_percent_change_from_baseline
        myDict["parks"] = parksList

        outputList.append(myDict)
        
    return outputList

In [28]:
worldList = create_list_for_json(worldNotNanDf)

In [29]:
len(worldList)

1345
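
The list is shorter than the 1871 unique `place_id` values found earlier, mainly because places whose parks series is entirely NaN are skipped. The filter can be sketched in isolation as follows; the place names and values are made up.

```python
import numpy as np

# Made-up parks averages for two hypothetical places
parks_by_place = {
    "place-with-data": [np.nan, 2.3, 2.9, np.nan],
    "place-without-data": [np.nan, np.nan, np.nan, np.nan],
}

# Keep a place only if at least one of its values is not NaN
kept = [
    place
    for place, values in parks_by_place.items()
    if not np.isnan(values).all()
]
print(kept)  # ['place-with-data']
```

A series with some NaN values is kept as it is, so the leading and trailing gaps from the centered rolling window survive into the output.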

Also get a list of the dates for reference, taking them from the first `place_id`.

In [30]:
dateList = worldNotNanDf[worldNotNanDf["place_id"] == "ChIJGczaTT5mXj4RBNmakTvGr4s"]["date"].to_list()
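
Using one place's dates as the reference for every series assumes that all places cover the same dates in the same order. One way to check that assumption is sketched here on toy data. The frame below is a made-up stand-in for `worldNotNanDf`, and the variable names are illustrative.

```python
import pandas as pd

# Toy stand-in for worldNotNanDf; IDs and dates are made up
df = pd.DataFrame({
    "place_id": ["A", "A", "B", "B"],
    "date": ["2020-02-15", "2020-02-16", "2020-02-15", "2020-02-16"],
})

reference_dates = df[df["place_id"] == "A"]["date"].to_list()

# One boolean per place: True when its dates match the reference list exactly
matches = df.groupby("place_id")["date"].apply(
    lambda dates: dates.to_list() == reference_dates
)
print(matches.all())
```

If this prints `False`, the positions in each `parks` list would not line up with the shared date list.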

Add the data and the dates to a new dictionary for export (despite its name, `exportDf` is a plain dictionary, not a data frame).

In [32]:
exportDf = {}

In [33]:
exportDf["data"] = worldList

In [34]:
exportDf["dates"] = dateList

In [35]:
with open("./public/data/world-parks.json", "w") as outfile: 
    json.dump(exportDf, outfile, ignore_nan=True)
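
`ignore_nan=True` is a `simplejson` option that writes NaN values as `null`, which keeps the file valid JSON. For comparison, here is a sketch of the same effect using only the standard library. The helper `nan_to_none` and the sample values are hypothetical, and the standard module is imported as `std_json` so it does not replace the notebook's `json` alias for `simplejson`.

```python
import json as std_json
import math


def nan_to_none(values):
    """Replace float NaN entries with None so they serialise as null."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]


# Made-up export structure mirroring the "data" and "dates" keys
export = {
    "data": [{"lat": 1.5, "lng": 2.5, "parks": nan_to_none([float("nan"), 2.3, 2.9])}],
    "dates": ["2020-02-15", "2020-02-16", "2020-02-17"],
}

# allow_nan=False makes dumps raise ValueError if any NaN slipped through
text = std_json.dumps(export, allow_nan=False)
print(text)
```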