# Load and combine timeseries data 

**Index**
1. [Timeseries-Estonia](#Timeseries-Estonia)
2. [Timeseries Estonian Counties](#Timeseries-Estonian-Counties)

This notebook shows how to load previously generated Estonia-wide and county-level timeseries into a File Geodatabase. The data is provided by the [Maa-amet geoportaal](https://geoportaal.maaamet.ee/eng/), [Eesti Statistika](https://www.stat.ee/en) and [Terviseamet via DigiLugu](https://www.terviseamet.ee/et/koroonaviirus/avaandmed).

In [1]:
import arcpy
import os
from aglearn import remap as rm
from pprint import pprint

Set variables

In [2]:
os.getcwd()

'C:\\Users\\Markus.Benninghoff\\Notebooks\\corona'

In [3]:
path = os.path.join(os.getcwd(), r'gisdata')
gdbPath = os.path.join(path, r'covid.gdb')
tmpPath = os.path.join(path, r'temp.gdb')
arcpy.env.overwriteOutput = True
arcpy.env.workspace = gdbPath

## Timeseries Estonia

Here, the timeseries created from the raw open data ([see here](01_Download_and_Preprocess_Data.ipynb#Timeseries-Estonia)) is loaded into the File Geodatabase.

In [4]:
eetstable = "timeseries_estonia"
result = arcpy.conversion.TableToTable(r"data/cov_ts_eesti.csv", gdbPath, eetstable)

The conversion tool automatically treats the numeric columns as float values (they end up as ```Double``` fields). If this is not desired, [```FieldMappings```](https://desktop.arcgis.com/en/arcmap/10.3/analyze/arcpy-classes/fieldmappings.htm) need to be created beforehand.

### Add demographic information

#### Get total population based on Ruutkaart2018

The data is stored in the point and polygon shapefiles. An arcpy cursor can be used to load it. (Alternatively, the [SummaryStatistics](https://pro.arcgis.com/en/pro-app/tool-reference/analysis/summary-statistics.htm) tool could be used.)

In [5]:
mkrkfn = r"maakond_ruutkaart_"

Loop through all counties in the Feature Class, calculate the total population (```eePopTotal2018```) and fill two dictionaries:

- ```eePopCounty2018```: population per county, with MKOOD as the key
- ```eeShapeCounty```: the geometry object of each county, also keyed by MKOOD

In [6]:
eePopTotal2018 = 0
eePopCounty2018 = {}
eeShapeCounty = {}
with arcpy.da.SearchCursor(mkrkfn, ["MKOOD", "TOTAL", "SHAPE@"]) as sCursor:
    for row in sCursor:
        mkood = row[0]
        pop = row[1]
        shape = row[2]
        eePopTotal2018 = int(eePopTotal2018 + pop)
        eePopCounty2018[mkood] = int(pop)
        eeShapeCounty[mkood] = shape

In [7]:
print('Total population of Estonia, according to underlying dataset was {}'.format(int(eePopTotal2018)))
print('It divides into the respective Counties , as follows:')
for mkood in eePopCounty2018.keys():
    print('{:<20} : {}'.format(rm.MKOOD_MNIMI[mkood], eePopCounty2018[mkood] )) # look up the County name from aglearn module

Total population of Estonia, according to underlying dataset was 1316490
It divides into the respective Counties , as follows:
Harju maakond        : 582296
Hiiu maakond         : 9428
Ida-Viru maakond     : 134299
Jõgeva maakond       : 29413
Järva maakond        : 32034
Lääne maakond        : 20737
Lääne-Viru maakond   : 61275
Põlva maakond        : 26228
Pärnu maakond        : 86643
Rapla maakond        : 34615
Saare maakond        : 33709
Tartu maakond        : 152072
Valga maakond        : 28191
Viljandi maakond     : 48249
Võru maakond         : 37301
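
As a quick consistency check, the county figures printed above can be summed and compared with the reported total. The following is a minimal sketch that runs without arcpy; `county_pop` is an illustrative name that is not part of the notebook, and its values are copied from the output above.

```python
# County populations copied from the printed output above.
# `county_pop` is a hypothetical helper dictionary, not a notebook variable.
county_pop = {
    "Harju maakond": 582296,
    "Hiiu maakond": 9428,
    "Ida-Viru maakond": 134299,
    "Jõgeva maakond": 29413,
    "Järva maakond": 32034,
    "Lääne maakond": 20737,
    "Lääne-Viru maakond": 61275,
    "Põlva maakond": 26228,
    "Pärnu maakond": 86643,
    "Rapla maakond": 34615,
    "Saare maakond": 33709,
    "Tartu maakond": 152072,
    "Valga maakond": 28191,
    "Viljandi maakond": 48249,
    "Võru maakond": 37301,
}

# The sum of all counties should equal the nationwide total.
total = sum(county_pop.values())
print(total)  # 1316490

# County with the largest population.
largest = max(county_pop, key=county_pop.get)
print(largest)  # Harju maakond
```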


#### Calculate cases relative to Population

Using the total population, the number of positive cases can now be put into perspective.

In [8]:
result = arcpy.AddField_management(eetstable, 'cumulativePosPer10K', 'Double')
result = arcpy.CalculateField_management(eetstable, 
                                "cumulativePosPer10K",
                                "round((!cumulativePositive! / {eePop})*10000,1)".format(eePop=eePopTotal2018),
                                "PYTHON3")
result = arcpy.AddField_management(eetstable, 'testedPerfPerPop10K', 'Double')
result = arcpy.CalculateField_management(eetstable, 
                                "testedPerfPerPop10K", 
                                "round((!testsPerformed! / {eePop})*10000,1)".format(eePop=eePopTotal2018), 
                                "PYTHON3")
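
The expression passed to `CalculateField` is plain arithmetic: a count divided by the population, scaled to 10,000 inhabitants and rounded to one decimal. A minimal sketch of the same formula in pure Python follows; `per_10k` is a hypothetical helper name, and the sample values are taken from the Järva maakond row for 2020-05-01 in the county table further down in this notebook.

```python
def per_10k(count, population):
    """Return `count` per 10,000 inhabitants, rounded to one decimal."""
    return round((count / population) * 10000, 1)

# Järva maakond, 2020-05-01: 13 cumulative positives, 1458 tests performed,
# population 32034 (values from the county-level data in this notebook).
jarva_pop = 32034
print(per_10k(13, jarva_pop))    # 4.1
print(per_10k(1458, jarva_pop))  # 455.1
```

Both results agree with the `cumulativePosPer10K` and `testedPerfPerPop10K` values shown for that row.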

In [9]:
ts_fields = arcpy.ListFields(eetstable)
ts_fns = [] # saving the fieldnames here
print("{:<25} {:<10} {:<7}".format('Fieldname:', 'Fieldtype:', 'Length:'))
for field in ts_fields:
    ts_fns.append(field.name)
    print("{:<25} {:<10} {:<7}".format(field.name, field.type, field.length))
ts_fns.remove('OBJECTID') # no need for OBJECTID later

Fieldname:                Fieldtype: Length:
OBJECTID                  OID        4      
StatisticsDate            Date       8      
negativeTests             Double     8      
confirmedCases            Double     8      
testsPerDay               Double     8      
cumulativeNegative        Double     8      
cumulativePositive        Double     8      
testsPerformed            Double     8      
positiveTestsPerc         Double     8      
positiveTestsPercCum      Double     8      
lastFeature               Double     8      
cumulativePosPer10K       Double     8      
testedPerfPerPop10K       Double     8      


## Timeseries Estonian Counties

For the nationwide dataset of Estonia, a table without spatial data was sufficient. The county data will be displayed on a map, hence a Polygon Feature Class is needed.

In [10]:
mktsfs = "timeseries_county"

### Create empty FeatureClass

The SpatialReference is copied from the summarized dataset created earlier (click here to jump back: [full](02a_Download_and_Analyze_Demographic_Data.ipynb#county_full) and [simplified](02b_Create_Simplified_Polygons.ipynb#county_simple) counties).

In [11]:
sr = arcpy.Describe(mkrkfn).spatialReference

In [12]:
result = arcpy.CreateFeatureclass_management(gdbPath, mktsfs, "POLYGON", "", "", "", sr)

Add all the fields that are present in the previous timeseries table, plus two fields identifying the counties.

In [13]:
for field in ts_fields[1:]:
    arcpy.AddField_management(mktsfs, field.name, field.type)
    
arcpy.AddField_management(mktsfs, 'MKOOD', 'TEXT')
ts_fns.append('MKOOD')
arcpy.AddField_management(mktsfs, 'County', 'TEXT')
ts_fns.append('County')

### Fill FeatureClass

Pandas is used to read the CSV file and fill the FeatureClass.

(Alternative workflow: ```TableToTable``` + [```AddJoin```](https://pro.arcgis.com/en/pro-app/tool-reference/data-management/add-join.htm) + [```CopyFeatures```](https://pro.arcgis.com/en/pro-app/tool-reference/data-management/copy-features.htm)).

In [14]:
import pandas as pd

Read the CSV file into a DataFrame

In [15]:
df = pd.read_csv('data/ts_maakond.csv')

# restore the MKOOD, which was wrongly imported as an integer, by looking it up from the County name
df['MKOOD'] = df['County'].map(rm.MNIMI_MKOOD)

# get the population from the previously created dictionary
df['Population2018'] = df['MKOOD'].map(eePopCounty2018)

# calculate two new columns (same formulas as the CalculateField calls for the Estonia timeseries)
df['cumulativePosPer10K'] = round((df['cumulativePositive'] / df['Population2018'])*10000,1)
df['testedPerfPerPop10K'] = round((df['testsPerformed'] / df['Population2018'])*10000,1)
df

Unnamed: 0,StatisticsDate,negativeTests,confirmedCases,testsPerDay,cumulativeNegative,cumulativePositive,testsPerformed,lastFeature,County,positiveTestsPerc,positiveTestsPercCum,MKOOD,Population2018,cumulativePosPer10K,testedPerfPerPop10K
0,2020-02-05,0.0,0.0,0.0,0.0,0.0,0.0,,Tartu maakond,,,0079,152072,0.0,0.0
1,2020-02-06,0.0,0.0,0.0,0.0,0.0,0.0,,Tartu maakond,,,0079,152072,0.0,0.0
2,2020-02-07,0.0,0.0,0.0,0.0,0.0,0.0,,Tartu maakond,,,0079,152072,0.0,0.0
3,2020-02-08,0.0,0.0,0.0,0.0,0.0,0.0,,Tartu maakond,,,0079,152072,0.0,0.0
4,2020-02-09,0.0,0.0,0.0,0.0,0.0,0.0,,Tartu maakond,,,0079,152072,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1315,2020-04-28,21.0,0.0,21.0,1273.0,12.0,1285.0,,Järva maakond,0.0000,0.0093,0052,32034,3.7,401.1
1316,2020-04-29,125.0,1.0,126.0,1398.0,13.0,1411.0,,Järva maakond,0.0079,0.0092,0052,32034,4.1,440.5
1317,2020-04-30,15.0,0.0,15.0,1413.0,13.0,1426.0,,Järva maakond,0.0000,0.0091,0052,32034,4.1,445.2
1318,2020-05-01,32.0,0.0,32.0,1445.0,13.0,1458.0,,Järva maakond,0.0000,0.0089,0052,32034,4.1,455.1
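
The MKOOD codes are strings such as `0079` and `0052`, so reading them as integers drops the leading zeros. The notebook restores them through the name lookup in `rm.MNIMI_MKOOD`. Purely as an illustration, and assuming every code is exactly four digits wide (an assumption that the source does not state), the codes could also be restored by zero-padding. `restore_mkood` is a hypothetical helper name.

```python
def restore_mkood(value, width=4):
    """Turn an integer-like county code back into a zero-padded string.

    Assumes all codes share the same width (four digits here), which is
    an assumption made for this sketch only.
    """
    return str(int(value)).zfill(width)

print(restore_mkood(79))    # 0079
print(restore_mkood(52.0))  # 0052
```

The dictionary lookup used in the notebook does not depend on that width assumption, which may be why it was chosen there.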


Create a list of field names that occur in both datasets.

In [16]:
sFields = list(set(ts_fns).intersection(list(df.columns)))
sFields.append('SHAPE@')
sFields

['confirmedCases',
 'MKOOD',
 'cumulativePositive',
 'cumulativePosPer10K',
 'positiveTestsPerc',
 'positiveTestsPercCum',
 'testsPerformed',
 'testsPerDay',
 'cumulativeNegative',
 'lastFeature',
 'County',
 'testedPerfPerPop10K',
 'StatisticsDate',
 'negativeTests',
 'SHAPE@']
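
A `set` intersection does not preserve order, which is why the list above appears shuffled. This is harmless here because the same list is used both to open the cursor and to build each row. If a stable order is preferred, a list comprehension over the original field list is one option. The sketch below uses shortened sample lists (`ts_fns_sample` and `df_columns_sample` are illustrative names, not notebook variables).

```python
# Shortened stand-ins for the notebook's `ts_fns` and `df.columns`.
ts_fns_sample = ['StatisticsDate', 'negativeTests', 'confirmedCases', 'MKOOD', 'County']
df_columns_sample = ['Unnamed: 0', 'StatisticsDate', 'negativeTests',
                     'confirmedCases', 'County', 'MKOOD', 'Population2018']

# Keep the order of the feature class fields, drop names missing from the DataFrame.
available = set(df_columns_sample)
shared_fields = [name for name in ts_fns_sample if name in available]
shared_fields.append('SHAPE@')  # geometry token goes last, as in the notebook

print(shared_fields)
# ['StatisticsDate', 'negativeTests', 'confirmedCases', 'MKOOD', 'County', 'SHAPE@']
```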

Add new data with the ```InsertCursor```.

In [17]:
# the with statement releases the cursor once all rows are inserted
with arcpy.da.InsertCursor(mktsfs, sFields) as iCursor:
    for indx, row in df.iterrows():
        irow = [row[field] for field in sFields[:-1]]  # values for every field except 'SHAPE@'
        irow.append(eeShapeCounty[row.MKOOD])  # for the last field 'SHAPE@', get the geometry from the dictionary
        iCursor.insertRow(irow)
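
The row assembly can be tried out without a geodatabase. The sketch below mirrors the loop above on a plain dictionary and additionally converts `NaN` to `None`, which arcpy cursors write as a null value. Whether empty cells should become nulls is a design choice that the source does not prescribe. `build_row` and the placeholder geometry string are hypothetical.

```python
import math

def build_row(record, fields, shapes):
    """Assemble one insert row; the last entry of `fields` must be 'SHAPE@'."""
    values = []
    for name in fields[:-1]:
        value = record[name]
        # Replace float NaN (pandas' marker for missing data) with None.
        if isinstance(value, float) and math.isnan(value):
            value = None
        values.append(value)
    # Look up the geometry by county code for the 'SHAPE@' slot.
    values.append(shapes[record['MKOOD']])
    return values

# Sample record modelled on the Järva maakond row; the geometry is a stand-in.
record = {'MKOOD': '0052', 'County': 'Järva maakond',
          'cumulativePositive': 13.0, 'lastFeature': float('nan')}
fields = ['MKOOD', 'County', 'cumulativePositive', 'lastFeature', 'SHAPE@']
shapes = {'0052': '<geometry placeholder>'}

row = build_row(record, fields, shapes)
print(row)
# ['0052', 'Järva maakond', 13.0, None, '<geometry placeholder>']
```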