# COVID-19 and Weather Patterns

## Imports

- *data_processing*, a local module, for loading the project datasets

- *os* for interfacing with the operating system
- *pathlib* for interfacing with the file system
- *zipfile* for managing archive files

- *numpy* for array processing
- *pandas* for tabular processing
- *tensorflow* for tensor processing
- *keras* for building neural network models on top of TensorFlow

- *matplotlib* for visualization
- *seaborn* for enhanced visualization

In [5]:
# Custom
import data_processing

# File System
import os
from pathlib import Path
from zipfile import ZipFile

# Processing
import numpy
import pandas
import tensorflow
from tensorflow import keras

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

tensorflow.__version__

'2.7.0'

## Introduction

The goal of this work is to determine whether weather patterns should be included as supporting input data when predicting new daily COVID-19 cases within a given geographic area. Using census, weather, and COVID-19 datasets provided by the Urban Sustain project, the authors attempt to quantify the correlation between particular weather patterns and COVID-19 transmission events.

## Defining Terms

***Urban Sustain Project*** - A joint effort between researchers at Colorado State University; Arizona State University; the University of California, Irvine; and the University of Maryland, Baltimore County.

## Loading Data

**DEVELOPER NOTE:** Download the five required datasets from Urban Sustain and place them in the ```data/``` directory at the root of the cloned repository. The same datasets are also available in a shared OneDrive folder. The code below expects the archives at the relative path ```../data/``` with respect to this notebook.

We'll begin by defining a path to our data directory and a list of the datasets that we expect to find there.

In [None]:
dataPath = '../data/'
expectedDatasets = [
    'covid_county.Colorado.zip',
    'neon_2d_wind.Colorado.zip',
    'neon_barometric_pressure.Colorado.zip',
    'neon_single_asp_air_temperature.Colorado.zip',
    'svi_county_GISJOIN.Colorado.zip'
]

Next, we will attempt to extract each of these archived datasets into a subdirectory within the data directory.

In [3]:
for datasetName in expectedDatasets:
    archivePath = dataPath + datasetName
    try:
        with ZipFile(archivePath, 'r') as currentzip:
            # Drop the trailing '.zip' to name the target subdirectory,
            # e.g. 'covid_county.Colorado.zip' -> 'covid_county.Colorado'
            targetDirectory = dataPath + Path(datasetName).stem
            # Create the subdirectory if it does not already exist
            Path(targetDirectory).mkdir(exist_ok=True)
            currentzip.extractall(targetDirectory)
    except FileNotFoundError:
        print("Unable to open " + datasetName + " at path " + archivePath)
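
Because the `except` branch above only prints a message, a missing archive does not stop the notebook. A quick check like the following can confirm that every dataset was actually extracted before moving on. This is a minimal sketch: the helper name `find_missing_files` is hypothetical, and the three file names are assumed from the paths passed to the loader in the Preprocessing section.

```python
from pathlib import Path

# File names assumed from the loader calls in the Preprocessing section.
EXPECTED_FILES = ('data.json', 'fieldLabels.json', 'linkedGeometry.json')


def find_missing_files(data_path, dataset_names, expected_files=EXPECTED_FILES):
    """Return the expected extracted files that are not present on disk.

    Hypothetical helper: each archive 'name.zip' is assumed to have been
    extracted into the directory data_path/name.
    """
    missing = []
    for dataset_name in dataset_names:
        # 'covid_county.Colorado.zip' -> 'covid_county.Colorado'
        target_directory = Path(data_path) / Path(dataset_name).stem
        for file_name in expected_files:
            candidate = target_directory / file_name
            if not candidate.is_file():
                missing.append(str(candidate))
    return missing
```

Calling `find_missing_files(dataPath, expectedDatasets)` returns an empty list when every expected file is present, and otherwise lists the paths that still need attention.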

## File Information


## Preprocessing

Load each extracted dataset into a DataFrame:

In [4]:
expected_directories = {
    'covid_county': os.path.join(dataPath, 'covid_county.Colorado'),
    'neon_2d_wind': os.path.join(dataPath, 'neon_2d_wind.Colorado'),
    'neon_single_asp_air_temperature': os.path.join(dataPath, 'neon_single_asp_air_temperature.Colorado'),
    'neon_barometric_pressure': os.path.join(dataPath, 'neon_barometric_pressure.Colorado'),
    'svi_county_GISJOIN': os.path.join(dataPath, 'svi_county_GISJOIN.Colorado')
}
covid_df = data_processing.load_flattened_datasets(
    os.path.join(expected_directories['covid_county'], 'data.json'),
    os.path.join(expected_directories['covid_county'], 'fieldLabels.json'),
    os.path.join(expected_directories['covid_county'], 'linkedGeometry.json'),
    join_on_key='GISJOIN')
wind_df = data_processing.load_flattened_datasets(
    os.path.join(expected_directories['neon_2d_wind'], 'data.json'),
    os.path.join(expected_directories['neon_2d_wind'], 'fieldLabels.json'),
    os.path.join(expected_directories['neon_2d_wind'], 'linkedGeometry.json'),
    join_on_key='site')
air_temp_df = data_processing.load_flattened_datasets(
    os.path.join(expected_directories['neon_single_asp_air_temperature'], 'data.json'),
    os.path.join(expected_directories['neon_single_asp_air_temperature'], 'fieldLabels.json'),
    os.path.join(expected_directories['neon_single_asp_air_temperature'], 'linkedGeometry.json'),
    join_on_key='site')
air_pressure_df = data_processing.load_flattened_datasets(
    os.path.join(expected_directories['neon_barometric_pressure'], 'data.json'),
    os.path.join(expected_directories['neon_barometric_pressure'], 'fieldLabels.json'),
    os.path.join(expected_directories['neon_barometric_pressure'], 'linkedGeometry.json'),
    join_on_key='site')
county_df = data_processing.load_flattened_datasets(
    os.path.join(expected_directories['svi_county_GISJOIN'], 'data.json'),
    os.path.join(expected_directories['svi_county_GISJOIN'], 'fieldLabels.json'),
    os.path.join(expected_directories['svi_county_GISJOIN'], 'linkedGeometry.json'),
    join_on_key='GISJOIN')

print(covid_df.info)
print(wind_df.info)
print(air_temp_df.info)
print(air_pressure_df.info)
print(county_df.info)



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39936 entries, 0 to 39935
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   GISJOIN                 39936 non-null  object
 1   dateString              39936 non-null  object
 2   county                  39936 non-null  object
 3   state                   39936 non-null  object
 4   totalCaseCount          39936 non-null  int64 
 5   newCaseCount            39936 non-null  int64 
 6   totalDeathCount         39936 non-null  int64 
 7   newDeathCount           39936 non-null  int64 
 8   _id.$oid                39936 non-null  object
 9   epoch_time.$numberLong  39936 non-null  object
dtypes: int64(4), object(6)
memory usage: 3.0+ MB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1391744 entries, 0 to 1391743
Data columns (total 19 columns):
 #   Column                  Non-Null Count    Dtype  
---  ------                  -------------- 

<bound method DataFrame.info of                                    startDateTime           endDateTime  \
site                                                                     
ARIK_DP1.00001.001_200.000  2016-03-06T03:00:00Z  2016-03-06T03:30:00Z   
ARIK_DP1.00001.001_200.000  2016-03-05T18:30:00Z  2016-03-05T19:00:00Z   
ARIK_DP1.00001.001_200.000  2016-03-05T23:30:00Z  2016-03-06T00:00:00Z   
ARIK_DP1.00001.001_200.000  2016-03-06T05:30:00Z  2016-03-06T06:00:00Z   
ARIK_DP1.00001.001_200.000  2016-03-06T16:00:00Z  2016-03-06T16:30:00Z   
...                                          ...                   ...   
STER_DP1.00001.001_000.030  2017-01-01T08:00:00Z  2017-01-01T08:30:00Z   
STER_DP1.00001.001_000.030  2017-01-01T04:30:00Z  2017-01-01T05:00:00Z   
STER_DP1.00001.001_000.030  2017-01-01T15:30:00Z  2017-01-01T16:00:00Z   
STER_DP1.00001.001_000.030  2017-01-01T17:00:00Z  2017-01-01T17:30:00Z   
STER_DP1.00001.001_000.030  2017-01-01T18:00:00Z  2017-01-01T18:30:00Z   

     

<bound method DataFrame.info of                                    startDateTime           endDateTime  \
site                                                                     
ARIK_DP1.00002.001_200.000  2016-03-05T15:00:00Z  2016-03-05T15:30:00Z   
ARIK_DP1.00002.001_200.000  2016-03-05T16:00:00Z  2016-03-05T16:30:00Z   
ARIK_DP1.00002.001_200.000  2016-03-05T15:30:00Z  2016-03-05T16:00:00Z   
ARIK_DP1.00002.001_200.000  2016-03-05T14:00:00Z  2016-03-05T14:30:00Z   
ARIK_DP1.00002.001_200.000  2016-03-05T14:30:00Z  2016-03-05T15:00:00Z   
...                                          ...                   ...   
WLOU_DP1.00002.001_200.000  2021-05-30T07:00:00Z  2021-05-30T07:30:00Z   
WLOU_DP1.00002.001_200.000  2021-05-30T09:00:00Z  2021-05-30T09:30:00Z   
WLOU_DP1.00002.001_200.000  2021-05-30T10:00:00Z  2021-05-30T10:30:00Z   
WLOU_DP1.00002.001_200.000  2021-05-31T07:30:00Z  2021-05-31T08:00:00Z   
WLOU_DP1.00002.001_200.000  2021-05-31T16:00:00Z  2021-05-31T16:30:00Z   

     

<bound method DataFrame.info of                            startDateTime           endDateTime  staPresMean  \
site                                                                          
ARIK_DP1.00004.001  2016-03-05T14:30:00Z  2016-03-05T15:00:00Z     88.53888   
ARIK_DP1.00004.001  2016-03-06T01:30:00Z  2016-03-06T02:00:00Z     87.83592   
ARIK_DP1.00004.001  2016-03-06T09:00:00Z  2016-03-06T09:30:00Z     87.43784   
ARIK_DP1.00004.001  2016-03-06T14:30:00Z  2016-03-06T15:00:00Z     87.35237   
ARIK_DP1.00004.001  2016-03-06T16:30:00Z  2016-03-06T17:00:00Z     87.30535   
...                                  ...                   ...          ...   
WLOU_DP1.00004.001  2021-05-30T16:00:00Z  2021-05-30T16:30:00Z     71.86388   
WLOU_DP1.00004.001  2021-05-30T19:00:00Z  2021-05-30T19:30:00Z     71.76332   
WLOU_DP1.00004.001  2021-05-30T23:00:00Z  2021-05-30T23:30:00Z     71.80660   
WLOU_DP1.00004.001  2021-05-31T04:30:00Z  2021-05-31T05:00:00Z     72.03967   
WLOU_DP1.00004.001  
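
The NEON weather frames above hold half-hourly readings, while the COVID-19 frame reports counts by date, so one plausible next step is to reduce each weather series to daily values before comparing the two. The sketch below does this for `staPresMean` on a small synthetic frame. The helper name `daily_mean`, the use of the mean as the daily aggregate, and the sample values are all assumptions for illustration, not the authors' method.

```python
import pandas


def daily_mean(frame, value_column, time_column='startDateTime'):
    """Average a measurement per site and calendar day (UTC).

    Hypothetical helper: assumes `frame` is indexed by 'site' and stores
    ISO 8601 timestamps as strings, as in the output above.
    """
    flat = frame.reset_index()  # move the 'site' index into a column
    flat['date'] = pandas.to_datetime(flat[time_column], utc=True).dt.date
    return flat.groupby(['site', 'date'])[value_column].mean().reset_index()


# Synthetic sample with made-up pressure values, for illustration only.
sample = pandas.DataFrame(
    {
        'startDateTime': [
            '2016-03-05T14:30:00Z',
            '2016-03-05T15:00:00Z',
            '2016-03-06T01:30:00Z',
        ],
        'staPresMean': [88.0, 89.0, 87.0],
    },
    index=pandas.Index(['ARIK_DP1.00004.001'] * 3, name='site'),
)

print(daily_mean(sample, 'staPresMean'))
```

The result has one row per site and day, which is closer in shape to the per-date COVID-19 records. Relating NEON sites to counties is a separate step that this sketch does not attempt.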

## References

## About this Notebook

**Authors:** Kyle Bassignani, Jeff Borgerson, and Christian Westbrook  
**Updated On:** 2021-11-12