**In Progress.**  This notebook focuses on continuous-time COVID-19 trend calculations.  The calculations feed interactive data products.
<br>

# Preliminaries

!rm -rf *.log
!rm -rf *.pdf

<br>

## Libraries

In [1]:
import pandas as pd
import numpy as np

import logging

import os
import pathlib
import sys

import zipfile
import requests
import io

<br>

## Paths

In [2]:
child = os.getcwd()
parent = str(pathlib.Path(child).parent)

In [3]:
root = os.path.join(child, 'warehouse')
warehouse = os.path.join(root, 'trends')

<br>

Appending paths

In [4]:
sys.path.append(parent)

<br>

## Logging

In [5]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

<br>

## Custom Classes

<br>

Import

In [6]:
import algorithms.base.delta
import algorithms.base.differences
import algorithms.base.quantiles
import algorithms.misc.doublet

import atlantic.base.directories

<br>

Set-up directories

In [7]:
directories = atlantic.base.directories.Directories()
directories.cleanup(listof=[warehouse])
directories.create(listof=[warehouse])

<br>
<br>

# Curves

## The Data

In [8]:
datauri = os.path.join(parent, 'warehouse', 'baselines.csv')

parse_dates = ['datetimeobject']
baselines = pd.read_csv(filepath_or_buffer=datauri, header=0, encoding='utf-8', parse_dates=parse_dates)

<br>

**Daily Positive Test Rate**

In [9]:
series = (baselines.positiveIncrease / baselines.testIncrease).fillna(value=0).values
series = np.where(np.isinf(series), 0, series)
series

array([0., 0., 0., ..., 1., 1., 1.])

In [10]:
baselines.loc[:, 'dailyPositiveTestRate'] = 100 * series
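
The two guards in the cells above cover days without tests. A minimal sketch on a toy frame (the values are hypothetical; only the column names mirror `baselines`) shows which guard handles which case: `0/0` yields `NaN` and is caught by `fillna`, whereas `x/0` yields `inf` and is caught by `np.where`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names mirror those of `baselines`
toy = pd.DataFrame({'positiveIncrease': [0.0, 5.0, 2.0, 10.0],
                    'testIncrease': [0.0, 0.0, 40.0, 50.0]})

# 0/0 -> NaN (replaced by fillna); 5/0 -> inf (replaced by np.where)
ratio = (toy.positiveIncrease / toy.testIncrease).fillna(value=0).values
ratio = np.where(np.isinf(ratio), 0, ratio)

# Expressed as a percentage, as in the notebook
daily_rate = 100 * ratio
```

Note that both undefined cases are mapped to 0, so a 0 in `dailyPositiveTestRate` can mean either "no positives" or "no tests".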

<br>

**Preview**

In [11]:
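# DataFrame.info() prints to stdout and returns None; capture its text in a buffer
buffer = io.StringIO()
baselines.info(buf=buffer)
logger.info('\n{}'.format(buffer.getvalue()))

INFO:__main__: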


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17576 entries, 0 to 17575
Data columns (total 28 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   datetimeobject             17576 non-null  datetime64[ns]
 1   STUSPS                     17576 non-null  object        
 2   hospitalizedCurrently      17576 non-null  float64       
 3   icuCurrently               17576 non-null  float64       
 4   deathIncrease              17576 non-null  float64       
 5   deathCumulative            17576 non-null  float64       
 6   positiveIncrease           17576 non-null  float64       
 7   positiveCumulative         17576 non-null  float64       
 8   testIncrease               17576 non-null  float64       
 9   testCumulative             17576 non-null  float64       
 10  icuIncrease                17576 non-null  float64       
 11  icuCumulative              17576 non-null  float64       
 12  hosp

<br>

**Periods, Places**

* Case Hopkins: periods $\rightarrow$ $\small{array([3,  7,  9, 13, 15, 18, 21])}$

In [12]:
periods = np.concatenate((np.array([1]), np.arange(3, 6, 1), np.arange(7, 10, 1), np.arange(13, 16, 1), 
                          np.array([18]), np.arange(19, 24, 2)))
periods

array([ 1,  3,  4,  5,  7,  8,  9, 13, 14, 15, 18, 19, 21, 23])

In [13]:
placestype = 'STUSPS'
placestype

'STUSPS'

In [14]:
places = baselines[placestype].unique()
places

array(['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA',
       'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA',
       'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY',
       'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
       'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'PR'], dtype=object)

<br>
<br>

## Positive Test Rate

### Periodic

<br>

First, the **Positive Test Rates** for varying periods are evaluated via

$\qquad \qquad \rho_{\tau, \Delta} = 100 * \Large{ \frac{P_{\tau} - P_{\tau - \Delta}}{T_{\tau} - T_{\tau - \Delta}} }$

wherein

<table style="width:45%; text-align: left; border: 0px solid black; float:left; margin-left: 60px">
    <tr>
        <th style="width:20%">Variable</th><th>Description</th> 
    </tr>
    <tr>
        <td>$\tau$</td><td>date</td>
    </tr>
    <tr>
        <td>$\Delta$</td><td>days</td>
    </tr>
    <tr>
        <td>$\rho_{\tau, \Delta}$</td>
        <td>The positive test rate on date $\tau$ w.r.t. starting date $\tau$ - $\Delta$</td>
    </tr>
    <tr>
      <td>$P_{\tau}$</td><td>The cumulative number of positive cases by date $\tau$.</td>
    </tr>
    <tr>
      <td>$P_{\tau - \Delta}$</td>
      <td>The cumulative number of positive cases by starting date $\tau$ - $\Delta$</td>
    </tr>
    <tr>
      <td>$T_{\tau}$</td><td>The cumulative number of tests by date $\tau$.</td>
    </tr>
    <tr>
      <td>$T_{\tau - \Delta}$</td><td>The cumulative number of tests by starting date $\tau$ - $\Delta$.</td>
    </tr>
</table>
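
The internals of the custom `Doublet` class are not shown in this notebook. As a rough illustration of the formula only, the sketch below computes $\rho_{\tau, \Delta}$ with plain pandas; the toy values and the function name `positive_test_rate` are hypothetical, and the handling of a zero denominator (set to `NaN`) is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative counts for a single place
toy = pd.DataFrame({
    'datetimeobject': pd.date_range('2020-06-01', periods=5, freq='D'),
    'STUSPS': 'AL',
    'positiveCumulative': [10.0, 14.0, 20.0, 29.0, 35.0],
    'testCumulative': [100.0, 150.0, 200.0, 300.0, 350.0]})


def positive_test_rate(frame: pd.DataFrame, delta: int) -> pd.Series:
    """rho = 100 * (P_t - P_{t - delta}) / (T_t - T_{t - delta}), per place."""
    grouped = frame.sort_values('datetimeobject').groupby('STUSPS')

    # Differences of the cumulative series over `delta` days, within each place
    numerator = grouped['positiveCumulative'].diff(periods=delta)
    denominator = grouped['testCumulative'].diff(periods=delta)

    # Assumption: a period without tests has an undefined rate
    rate = 100 * numerator / denominator
    return rate.replace([np.inf, -np.inf], np.nan)


rates = positive_test_rate(toy, delta=3)
```

The first $\Delta$ dates of each place have no starting date to compare with, hence their rates are `NaN`.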


In [15]:
numerator = 'positiveCumulative'
denominator = 'testCumulative'

doublet = algorithms.misc.doublet.Doublet(blob=baselines, periods=periods, places=places, placestype=placestype)
ptr = doublet.exc(numerator=numerator, denominator=denominator)
ptr.rename(columns={'rates': 'positiveTestRate'}, inplace=True)

<br>
<br>

The **Tests/Case** for varying periods is evaluated via

$\qquad \qquad \text{tpc}_{_{\tau, \Delta}} = \Large{\frac{100}{\rho_{\tau, \Delta}}}$

and, similar to previous definitions,

<table style="width:45%; text-align: left; border: 0px solid black; float:left; margin-left: 60px">
    <tr>
        <th style="width:20%">Variable</th><th>Description</th> 
    </tr>
    <tr>
        <td>$\tau$</td><td>date</td>
    </tr>
    <tr>
        <td>$\Delta$</td><td>days</td>
    </tr>
    <tr>
        <td>$\rho_{\tau, \Delta}$</td>
        <td>The positive test rate on date $\tau$ w.r.t. starting date $\tau$ - $\Delta$</td>
    </tr>
    <tr>
      <td>$\text{tpc}_{_{\tau, \Delta}}$</td><td>The tests per case value on date $\tau$ w.r.t. starting date $\tau$ - $\Delta$.</td>
    </tr>
</table>
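
A small worked example of the definition, using hypothetical rates: a positive test rate of 5% corresponds to 100/5 = 20 tests per positive case. The convention of reporting 0 where the rate is not positive follows the notebook's own calculation.

```python
import pandas as pd

# Hypothetical positive test rates, in percent
rho = pd.Series([0.0, 5.0, 12.5, 100.0])

# Divide only where the rate is positive; report 0 elsewhere
tests_per_case = pd.Series(0.0, index=rho.index)
positive = rho > 0
tests_per_case[positive] = 100 / rho[positive]
```

Here a 0 is a placeholder for "undefined", not a literal count of tests per case.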




In [16]:
# Tests per case: 100 / rate where the rate is positive, otherwise 0
ptr.loc[:, 'testsPerCase'] = np.where(ptr['positiveTestRate'] > 0, ptr['positiveTestRate'].rdiv(100), 0)

<br>
<br>

**Write**

In [17]:
ptr.to_csv(path_or_buf=os.path.join(warehouse, 'ptrPeriodic.csv'), header=True, index=False, encoding='utf-8')

In [18]:
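# DataFrame.info() prints to stdout and returns None; capture its text in a buffer
buffer = io.StringIO()
ptr.info(buf=buffer)
logger.info('\n{}'.format(buffer.getvalue()))

INFO:__main__: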


<class 'pandas.core.frame.DataFrame'>
Int64Index: 238472 entries, 0 to 238471
Data columns (total 5 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   datetimeobject    238472 non-null  datetime64[ns]
 1   STUSPS            238472 non-null  object        
 2   period            238472 non-null  object        
 3   positiveTestRate  238472 non-null  float64       
 4   testsPerCase      238472 non-null  float64       
dtypes: datetime64[ns](1), float64(2), object(2)
memory usage: 10.9+ MB


<br>
<br>

### Running Medians Across Varying Days

In [19]:
event = 'dailyPositiveTestRate'

# Focus on
base = baselines[['datetimeobject', 'STUSPS', event]].copy()

# Pivot -> such that each field is a place, and each instance of a field is a date in time
segment = base.pivot(index='datetimeobject', columns='STUSPS', values=event)

# Quantiles
quantiles = algorithms.base.quantiles.Quantiles(data=segment, places=places, placestype=placestype)
# Note: `periods` already begins with 1, so the concatenation repeats period 1
matrix = quantiles.exc(periods=np.concatenate((np.array([1]), periods)), quantile=0.5, fieldname=(event + 'Median'))
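
The pivot step reshapes the long table (one row per date and place) into a wide one (one row per date, one column per place). A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical long-form data: one row per (date, place)
long = pd.DataFrame({
    'datetimeobject': pd.to_datetime(['2020-06-01', '2020-06-01', '2020-06-02', '2020-06-02']),
    'STUSPS': ['AL', 'AK', 'AL', 'AK'],
    'dailyPositiveTestRate': [4.0, 2.5, 5.0, 3.0]})

# Wide form: the index holds the dates, each column holds one place
wide = long.pivot(index='datetimeobject', columns='STUSPS', values='dailyPositiveTestRate')
```

In this layout a running statistic can be computed down each column, i.e., per place over time.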


In [20]:
matrix.rename(columns={'dailyPositiveTestRateMedian': 'dailyPTRM'}, inplace=True)

In [21]:
# Tests per case: 100 / rate where the rate is positive, otherwise 0
matrix.loc[:, 'dailyTPCM'] = np.where(matrix['dailyPTRM'] > 0, matrix['dailyPTRM'].rdiv(100), 0)

<br>

**Write**

In [22]:
matrix[['datetimeobject', 'STUSPS', 'period', 'dailyPTRM', 'dailyTPCM']
      ].to_csv(path_or_buf=os.path.join(warehouse, 'ptrDaily.csv'), header=True, index=False, encoding='utf-8')

In [23]:
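# DataFrame.info() prints to stdout and returns None; capture its text in a buffer
buffer = io.StringIO()
matrix.info(buf=buffer)
logger.info('\n{}'.format(buffer.getvalue()))

INFO:__main__: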


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256048 entries, 0 to 256047
Data columns (total 5 columns):
 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   datetimeobject  256048 non-null  datetime64[ns]
 1   STUSPS          256048 non-null  object        
 2   dailyPTRM       256048 non-null  float64       
 3   period          256048 non-null  object        
 4   dailyTPCM       256048 non-null  float64       
dtypes: datetime64[ns](1), float64(2), object(2)
memory usage: 9.8+ MB


<br>
<br>

## Observations/100K

### Running Medians Across Varying Days

For each `increase/100K` series, the running median is

$\qquad \qquad \hat{\mu}_{\tau, \Delta} = \text{median}(S_{\tau - \Delta}, \; \ldots, \; S_{\tau -1}, \;  S_{\tau})$

wherein

<table style="width:45%; text-align: left; border: 0px solid black; float:left; margin-left: 60px">
    <tr>
        <th style="width:20%">Variable</th><th>Description</th> 
    </tr>
    <tr>
        <td>$\tau$</td><td>date</td>
    </tr>
    <tr>
        <td>$\Delta$</td><td>days</td>
    </tr>
    <tr>
        <td>$\hat{\mu}_{\tau, \Delta}$</td>
        <td>The median on date $\tau$ over the window that starts on date $\tau - \Delta$</td>
    </tr>
    <tr>
      <td>$S$</td><td>A series, e.g., $(deaths \: increase)/100K$</td>
    </tr>
    <tr>
      <td>$S_{\tau}$</td>
      <td>The series data value on date $\tau$</td>
    </tr>
</table>
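
The custom `Quantiles` class is not shown here, so the following is only a sketch of the formula as written, using `pandas` rolling windows on hypothetical values. It assumes the window spans $S_{\tau - \Delta}, \ldots, S_{\tau}$, i.e., $\Delta + 1$ observations; the class may use a different window convention.

```python
import pandas as pd

# Hypothetical daily values per 100K; one column per place, as after the pivot
segment = pd.DataFrame(
    {'AL': [1.0, 3.0, 2.0, 8.0, 4.0], 'AK': [0.0, 0.0, 5.0, 1.0, 2.0]},
    index=pd.date_range('2020-06-01', periods=5, freq='D', name='datetimeobject'))

delta = 2

# Assumption: the window covers delta + 1 observations, per the formula above
running = segment.rolling(window=delta + 1).median()
```

Dates with fewer than $\Delta + 1$ preceding observations have no complete window, hence `NaN`.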

<br>


In [24]:
medians: pd.DataFrame = pd.DataFrame()

for event in ['positiveIncreaseRate', 'testIncreaseRate', 'deathIncreaseRate', 'icuIncreaseRate', 'hospitalizedIncreaseRate']:

    # Focus on
    base = baselines[['datetimeobject', 'STUSPS', event]].copy()

    # Pivot -> such that each field is a place, and each instance of a field is a date in time
    segment = base.pivot(index='datetimeobject', columns='STUSPS', values=event)

    # Quantiles
    quantiles = algorithms.base.quantiles.Quantiles(data=segment, places=places, placestype=placestype)
    values = quantiles.exc(periods=periods, quantile=0.5, fieldname=(event + 'Median'))

    # Structuring
    if medians.empty:
        medians = values
    else:
        medians = medians.merge(values, how='inner', on=['datetimeobject', 'STUSPS', 'period'])


In [25]:
names = {i: i.replace('IncreaseRateMedian', 'IRM') for i in 
         ['positiveIncreaseRateMedian', 'testIncreaseRateMedian', 'deathIncreaseRateMedian', 
          'icuIncreaseRateMedian', 'hospitalizedIncreaseRateMedian']}

medians.rename(columns=names, inplace=True)

medians = medians[['datetimeobject', 'STUSPS', 'period', 'positiveIRM', 'testIRM', 'deathIRM', 'icuIRM', 'hospitalizedIRM']]

<br>
<br>

**Write**

In [26]:
medians.to_csv(path_or_buf=os.path.join(warehouse, 'medians.csv'), header=True, index=False, encoding='utf-8')

In [27]:
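# DataFrame.info() prints to stdout and returns None; capture its text in a buffer
buffer = io.StringIO()
medians.info(buf=buffer)
logger.info('\n{}'.format(buffer.getvalue()))

INFO:__main__: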


<class 'pandas.core.frame.DataFrame'>
Int64Index: 238472 entries, 0 to 238471
Data columns (total 8 columns):
 #   Column           Non-Null Count   Dtype         
---  ------           --------------   -----         
 0   datetimeobject   238472 non-null  datetime64[ns]
 1   STUSPS           238472 non-null  object        
 2   period           238472 non-null  object        
 3   positiveIRM      238472 non-null  float64       
 4   testIRM          238472 non-null  float64       
 5   deathIRM         238472 non-null  float64       
 6   icuIRM           238472 non-null  float64       
 7   hospitalizedIRM  238472 non-null  float64       
dtypes: datetime64[ns](1), float64(5), object(2)
memory usage: 16.4+ MB


<br>
<br>

### Running Percentage Change w.r.t. Time Periods

The **running percentage change** w.r.t. defined **periods** for each cumulative [observations per 100K] value type is


$\qquad \qquad \text{pc}_{\tau, \Delta} = 100 * \Large{ \frac{C_{\tau} \; - \; C_{\tau - \Delta}}{C_{\tau - \Delta}} }$

wherein

<table style="width:45%; text-align: left; border: 0px solid black; float:left; margin-left: 60px">
    <tr>
        <th style="width:20%">Variable</th><th>Description</th> 
    </tr>
    <tr>
        <td>$\tau$</td><td>date</td>
    </tr>
    <tr>
        <td>$\Delta$</td><td>days</td>
    </tr>
    <tr>
        <td>$\text{pc}_{\tau, \Delta}$</td>
        <td>The percentage change on date $\tau$ w.r.t. initial date $\tau$ - $\Delta$</td>
    </tr>
    <tr>
      <td>$C_{\tau}$</td><td>The cumulative value on date $\tau$.</td>
    </tr>
    <tr>
      <td>$C_{\tau - \Delta}$</td>
      <td>The cumulative value on initial date $\tau$ - $\Delta$.</td>
    </tr>
</table>
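
As with the other custom classes, the internals of `Delta` are not shown. The sketch below illustrates the formula only, on hypothetical cumulative values; treating a zero baseline $C_{\tau - \Delta}$ as undefined (`NaN`) is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative values per 100K; one column per place
segment = pd.DataFrame(
    {'AL': [10.0, 12.0, 15.0, 18.0], 'AK': [0.0, 0.0, 2.0, 3.0]},
    index=pd.date_range('2020-06-01', periods=4, freq='D', name='datetimeobject'))

delta = 2

# C_{tau - delta}: the same series shifted forward by delta days
previous = segment.shift(periods=delta)

# pc = 100 * (C_tau - C_{tau - delta}) / C_{tau - delta}
change = 100 * (segment - previous) / previous

# Assumption: a zero baseline leaves the percentage change undefined
change = change.replace([np.inf, -np.inf], np.nan)
```

For `AL`, the value rises from 10 to 15 over two days, a 50% change; for `AK`, the baseline is 0 throughout, so no percentage change is defined.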

<br>


In [28]:
percentages = pd.DataFrame()

for event in ['deathRate', 'positiveRate', 'testRate', 'icuRate', 'hospitalizedRate', 'hospitalizedCurrentlyRate']:

    # Focus on
    base = baselines[['datetimeobject', 'STUSPS', event]]
        
    # Pivot -> such that each field is a place, and each instance of a field is a date in time
    segment = base.pivot(index='datetimeobject', columns='STUSPS', values=event)

    # The percentage differences
    delta = algorithms.base.delta.Delta(data=segment, places=places, placestype=placestype)
    dataset = delta.exc(periods=periods, fieldname=(event + 'Delta'))
        
        
    # Include the variable the delta calculations are based on
    dataset = dataset.merge(base, how='left', on=['datetimeobject', 'STUSPS'])


    if percentages.empty:
        percentages = dataset
    else:
        percentages = percentages.merge(dataset, how='inner', on=['datetimeobject', 'STUSPS', 'period'])


In [29]:
percentages = percentages[['datetimeobject', 'STUSPS', 'period', 'deathRateDelta', 'deathRate', 
                           'positiveRateDelta', 'positiveRate', 'testRateDelta', 'testRate',
                            'icuRateDelta', 'icuRate', 'hospitalizedRateDelta', 'hospitalizedRate',
                          'hospitalizedCurrentlyRateDelta', 'hospitalizedCurrentlyRate']]

<br>
<br>

**Write**

In [30]:
percentages.to_csv(path_or_buf=os.path.join(warehouse, 'percentages.csv'), header=True, index=False, encoding='utf-8')

In [31]:
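# DataFrame.info() prints to stdout and returns None; capture its text in a buffer
buffer = io.StringIO()
percentages.info(buf=buffer)
logger.info('\n{}\n'.format(buffer.getvalue()))

INFO:__main__: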



<class 'pandas.core.frame.DataFrame'>
Int64Index: 238472 entries, 0 to 238471
Data columns (total 15 columns):
 #   Column                          Non-Null Count   Dtype         
---  ------                          --------------   -----         
 0   datetimeobject                  238472 non-null  datetime64[ns]
 1   STUSPS                          238472 non-null  object        
 2   period                          238472 non-null  object        
 3   deathRateDelta                  238472 non-null  float64       
 4   deathRate                       238472 non-null  float64       
 5   positiveRateDelta               238472 non-null  float64       
 6   positiveRate                    238472 non-null  float64       
 7   testRateDelta                   238472 non-null  float64       
 8   testRate                        238472 non-null  float64       
 9   icuRateDelta                    238472 non-null  float64       
 10  icuRate                         238472 non-null  float64

<br>

## Clean-up

In [32]:
!rm -rf *.log
!rm -rf *.pdf

<br>
<br>

## End

In [33]:
%%bash

date +"%Y-%m-%d %T"

2021-02-10 17:36:54
