### Task 1: Visualizing Human Mobility Data from GeoDS Lab

In [44]:
import pandas as pd
import numpy as np
import wget, os, json
import geopandas as gpd
import matplotlib.pyplot as plt

Data loaded from SafeGraph via the GeoDS Lab at UW-Madison (<a href="https://github.com/GeoDS/COVID19USFlows">GitHub</a>).

In [51]:
# check whether the file is present in the working directory; download it if not

file = "weekly_ct2ct_2021_07_05_0.csv"
if not os.path.exists(file):
    url = "https://raw.githubusercontent.com/GeoDS/COVID19USFlows-WeeklyFlows-Ct2021/master/weekly_flows/ct2ct/2021_07_05/weekly_ct2ct_2021_07_05_0.csv"
    wget.download(url)
else:
    print(f"file exists in path:{os.path.join(os.getcwd(), file)}")

file exists in path:C:\Users\Sidrcs\Documents\Github\Geospatial_BigData_Analytics\BigdataVisualization\weekly_ct2ct_2021_07_05_0.csv
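
If the `wget` package is unavailable, the same check-then-download pattern can be sketched with the standard library alone. `ensure_file` is a hypothetical helper name introduced here for illustration; it is not part of the original notebook.

```python
import os
import urllib.request


def ensure_file(path, url):
    """Download url to path unless path already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

Calling `ensure_file(file, url)` with the names from the cell above would skip the download on every run after the first.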


In [6]:
# read and display the downloaded csv
flow_df = pd.read_csv("weekly_ct2ct_2021_07_05_0.csv")
flow_df.head()

To visualize the flows in <a href="https://kepler.gl/demo">kepler.gl</a>, the SafeGraph data must first be preprocessed. The <code>value_counts()</code> method lists the unique origin geoids and how often each occurs. The most frequent origin in this file is <code>12095017103</code>, a census tract in <b>Orange County, FL</b>.

In [11]:
flow_df.geoid_o.value_counts()

12095017103    2438
48157673101    2026
48439114103    1490
48339692001    1438
48157673200    1366
               ... 
1047957200       31
1097007700       30
23009966400      28
1087231700       14
2188000200       11
Name: geoid_o, Length: 2179, dtype: int64
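
The most frequent origin can also be picked out programmatically instead of being read off the printed counts. A minimal sketch, assuming a small made-up frame (`toy_df`, with illustrative values only) in place of `flow_df`:

```python
import pandas as pd

# Toy stand-in for flow_df; the rows are illustrative, not real flow records.
toy_df = pd.DataFrame(
    {"geoid_o": [12095017103, 12095017103, 48157673101, 1047957200]}
)

counts = toy_df["geoid_o"].value_counts()  # rows per origin, sorted descending
top_origin = counts.idxmax()               # geoid with the highest row count
print(top_origin, counts.max())
```

`idxmax()` returns the index label of the largest count, which here is the geoid itself.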

In [12]:
flow_df.dtypes  # inspect the data type of each column

geoid_o                             int64
geoid_d                             int64
lng_o                             float64
lat_o                             float64
lng_d                             float64
lat_d                             float64
number_devices_primary_daytime      int64
date_range                         object
visitor_flows                       int64
pop_flows                         float64
dtype: object

In [27]:
# cast origin and destination geoids to strings, zero-padded to 11 digits
flow_df['geoid_o'] = flow_df['geoid_o'].astype(str).str.zfill(11)
flow_df['geoid_d'] = flow_df['geoid_d'].astype(str).str.zfill(11)

In [28]:
# to confirm conversion
flow_df.dtypes

geoid_o                            object
geoid_d                            object
lng_o                             float64
lat_o                             float64
lng_d                             float64
lat_d                             float64
number_devices_primary_daytime      int64
date_range                         object
visitor_flows                       int64
pop_flows                         float64
dtype: object
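
The padding width of 11 follows from the census tract GEOID layout: 2 digits of state FIPS code, 3 of county, and 6 of tract. Reading the CSV as integers drops the leading zero for states such as Alabama (`01`), which is why the shortest values in the counts above have only 10 digits. A minimal sketch of the padding and of splitting a geoid into its parts, using a hypothetical `toy_df` rather than the real data:

```python
import pandas as pd

# Toy stand-in: one Alabama tract (leading zero lost) and one Florida tract.
toy_df = pd.DataFrame({"geoid_o": [1001020100, 12095017103]})

# Restore the fixed 11-character width.
toy_df["geoid_o"] = toy_df["geoid_o"].astype(str).str.zfill(11)

# Slice the fixed-width string into its components.
toy_df["state"] = toy_df["geoid_o"].str[:2]    # state FIPS
toy_df["county"] = toy_df["geoid_o"].str[:5]   # state + county FIPS
toy_df["tract"] = toy_df["geoid_o"].str[5:]    # tract code within the county
print(toy_df)
```

With the geoids stored as fixed-width strings, a county-level filter reduces to a prefix match such as `str.startswith("12095")`.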

In [29]:
flow_df.head(5)

Unnamed: 0,geoid_o,geoid_d,lng_o,lat_o,lng_d,lat_d,number_devices_primary_daytime,date_range,visitor_flows,pop_flows
0,1001020100,1001020100,-86.490076,32.477185,-86.490076,32.477185,227,07/05/21 - 07/11/21,74,691.0
1,1001020100,1001020200,-86.490076,32.477185,-86.473375,32.474248,227,07/05/21 - 07/11/21,70,654.0
2,1001020100,1001020300,-86.490076,32.477185,-86.46019,32.475428,227,07/05/21 - 07/11/21,156,1458.0
3,1001020100,1001020400,-86.490076,32.477185,-86.443624,32.472001,227,07/05/21 - 07/11/21,91,850.0
4,1001020100,1001020500,-86.490076,32.477185,-86.422661,32.458833,227,07/05/21 - 07/11/21,425,3973.0


Slice the data to the flows that leave a single origin geoid, the most frequent one in the dataset, and arrive at destination geoids within Orange County, FL (FIPS prefix <code>12095</code>).

In [42]:
# slice flow_df: most frequent origin tract, destinations within Orange County (12095)
one_flow_df = flow_df[(flow_df["geoid_o"].str.startswith("12095017103")) & (flow_df["geoid_d"].str.startswith("12095"))]

In [43]:
# export to CSV for kepler.gl visualization; index=False avoids writing an unnamed index column
one_flow_df.to_csv('Kepler_viz_data.csv', index=False)
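
One caveat if the exported file is later read back into pandas: without an explicit dtype, the geoid columns are parsed as integers again and lose their leading zeros. A minimal sketch, using an in-memory buffer in place of `Kepler_viz_data.csv` and a made-up row:

```python
import io

import pandas as pd

# Toy stand-in for one_flow_df; the row is illustrative only.
toy_df = pd.DataFrame(
    {"geoid_o": ["01001020100"], "geoid_d": ["01001020200"], "visitor_flows": [70]}
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False)

# Default parsing infers integers, so the leading zero disappears.
buffer.seek(0)
as_int = pd.read_csv(buffer)

# Declaring the geoid columns as strings keeps the 11-character codes intact.
buffer.seek(0)
as_str = pd.read_csv(buffer, dtype={"geoid_o": str, "geoid_d": str})

print(as_int["geoid_o"].iloc[0], as_str["geoid_o"].iloc[0])
```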