In [1]:
import os
os.chdir('../..')

In [2]:
import epicas

## Load Dynamic (Time-Series) Data

Time-series data is data that is updated throughout the outbreak season. A time series can be the target variable that we are trying to predict (e.g., COVID-19 cases), or another variable that changes over time and can be used as a training feature.

Dynamic data must therefore include a date column. A location column is optional: if there is only one time series (that is, data for a single location), no location column is needed. In the following example, we will look at the dataset `jhu_data.xz`, which looks like this.

| FIPS  | date       | confirmed_cases | confirmed_cases_norm |
|-------|------------|-----------------|----------------------|
| 1001  | 2020-02-15 | 0               | 0                    |
| 1001  | 2020-02-16 | 0               | 0                    |
| ...   | ...        | ...             | ...                  |
| 56045 | 2021-08-08 | 13              | 187                  |

In this case, the column `FIPS` is our location column, `date` is our date column, `confirmed_cases` records incidence, and `confirmed_cases_norm` is an extra feature. Let's import this!

To import this, we use a special class called `StructuredData`.


In [3]:
jhu = epicas.StructuredData(
        'demo/datasets/covid.xz',
        location = 'FIPS',
        date = 'date',
        incidence = 'confirmed_cases',
        )

Great! But how does it look? Let's find out.

In [4]:
print(jhu)

StructuredData['location', 'date', 'incidence', 'confirmed_cases_norm']

Variables: {'static': None, 'time_series': ['location', 'date', 'incidence', 'confirmed_cases_norm']}

         location       date  incidence  confirmed_cases_norm
0            1001 2020-02-15        0.0                   0.0
1            1001 2020-02-16        0.0                   0.0
2            1001 2020-02-17        0.0                   0.0
3            1001 2020-02-18        0.0                   0.0
4            1001 2020-02-19        0.0                   0.0
...           ...        ...        ...                   ...
1659783     56045 2021-08-04       10.0                 144.0
1659784     56045 2021-08-05       10.0                 144.0
1659785     56045 2021-08-06        9.0                 129.0
1659786     56045 2021-08-07       13.0                 187.0
1659787     56045 2021-08-08       13.0                 187.0

[1659788 rows x 4 columns]




Awesome! As you can see, the columns are also re-labeled for readability and consistency.
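
For intuition, the relabeling is roughly what you would get from a plain Pandas rename. The snippet below is only a sketch on a tiny frame whose three rows are copied from the table above. It is not Epicas's actual implementation, and the `raw` and `relabeled` names are just for illustration.

```python
import pandas as pd

# Three rows taken from the JHU table shown earlier.
raw = pd.DataFrame({
    "FIPS": [1001, 1001, 56045],
    "date": ["2020-02-15", "2020-02-16", "2021-08-08"],
    "confirmed_cases": [0, 0, 13],
    "confirmed_cases_norm": [0, 0, 187],
})

# Map the user-supplied column names onto standard labels.
# Assumption: this mirrors the relabeling seen in the printed output;
# how Epicas does it internally is not shown in this tutorial.
relabeled = raw.rename(columns={"FIPS": "location", "confirmed_cases": "incidence"})

# Parse the date strings into real timestamps.
relabeled["date"] = pd.to_datetime(relabeled["date"])

print(list(relabeled.columns))
```

Columns that are not named in the mapping (here `confirmed_cases_norm`) keep their original labels, which matches what we saw in the printout.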

Now, let's try to import something else. How about mobility data?

In [5]:
mobility = epicas.StructuredData(
        'demo/datasets/mobility.csv.gz',
        location = 'FIPS',
        date = 'date'
        )

Done!

## Load Static Data
Static data is data that does not change, or is presumed to stay the same, throughout the outbreak season, e.g., population data, asthma mortality, etc.

This seems trickier, doesn't it? Actually NO, it is even easier. *Hint:* static data does not have as many columns! The process is still *(almost)* the same, except you do not specify a date column.

In [6]:
population = epicas.StructuredData(
        'Reichlab_Population.csv',
        location = 'location',
        usecols = ['location', 'population']
        )

Notice that I used a Pandas argument, namely `usecols`, to import only a subset of columns. In fact, since Epicas is built on top of Pandas, feel free to try other `read_csv` arguments from [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).

How does it look though? Let's print it out...

In [7]:
print(population)

StructuredData['location', 'population']

Variables: {'static': ['location', 'population'], 'time_series': None}

      location  population
0            1   4903185.0
1            2    731545.0
2            4   7278717.0
3            5   3017804.0
4            6  39512223.0
...        ...         ...
3194     56037     42343.0
3195     56039     23464.0
3196     56041     20226.0
3197     56043      7805.0
3198     56045      6927.0

[3199 rows x 2 columns]
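
If `usecols` is new to you, here is what it does in plain Pandas, without Epicas involved at all. This is a minimal sketch: the CSV text is an in-memory stand-in for a file, and the `area_km2` column with its values is made up purely so there is something to leave out.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file.
# The "area_km2" column and its values are invented for illustration.
csv_text = """location,population,area_km2
1,4903185,100
2,731545,200
"""

# usecols tells read_csv to keep only the listed columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["location", "population"])

print(subset.columns.tolist())
```

Only the two requested columns survive, and the extra one is never loaded into memory.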




Nicely done! Well, we're almost there, except now we have to merge them together. I wonder what kind of evil AutoML forces its users to merge DataFrames in a sketchy way...

## Merge StructuredData(s) together

Intuitively, when we merge two (or more) things together, we just add them together, right? Indeed, this is also why I love Python: it is so intuitive. For example:

This is Python:
```
'hello' + 'world'
```
And this is C:
```
char src[] = " World";
char dest[30] = "Hello";
strncat(dest, src, 6);
```

![A Python vs. C meme](https://i.redd.it/o114bghz4pa31.jpg)

Ok, I digressed. Wait, no, I actually did not! This is similar to how Epicas works!

In [8]:
merged = jhu + population + mobility

What?! Well, let's print it out.

In [9]:
print(merged)

StructuredData['location', 'date', 'incidence', 'confirmed_cases_norm', 'population', 'fb_movement_change', 'fb_stationary']

Variables: {'static': ['location', 'population'], 'time_series': ['confirmed_cases_norm', 'location', 'fb_movement_change', 'date', 'incidence', 'fb_stationary']}

         location       date  incidence  confirmed_cases_norm  population  \
0            1001 2020-02-15        0.0                   0.0     55869.0   
1            1001 2020-02-16        0.0                   0.0     55869.0   
2            1001 2020-02-17        0.0                   0.0     55869.0   
3            1001 2020-02-18        0.0                   0.0     55869.0   
4            1001 2020-02-19        0.0                   0.0     55869.0   
...           ...        ...        ...                   ...         ...   
1344613     56045 2021-08-02        9.0                 129.0      6927.0   
1344614     56045 2021-08-03       10.0                 144.0      6927.0   
1344615     56045

Wow, that was *\*kinda\** effortless... Actually, you can do more things with `StructuredData`: some of its hidden features are listed in the Epicas documentation, and some are coming soon in future versions.
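
Curious how a `+` can merge tables at all? In Python, any class can define `__add__`. Below is a toy sketch of the idea using a hypothetical `ToyData` class wrapped around a Pandas DataFrame. It is not the Epicas source code, and the choice of an inner join on the shared `location`/`date` columns is my assumption for the sake of the example. The numbers are taken from the printouts above.

```python
import pandas as pd


class ToyData:
    """Hypothetical stand-in for a mergeable table; not the Epicas class."""

    def __init__(self, df):
        self.df = df

    def __add__(self, other):
        # Join on whichever key columns both tables share.
        keys = [
            c for c in ("location", "date")
            if c in self.df.columns and c in other.df.columns
        ]
        # Assumption: an inner join, so only matching rows are kept.
        return ToyData(self.df.merge(other.df, on=keys, how="inner"))


# A small time-series table (has a date column).
toy_cases = ToyData(pd.DataFrame({
    "location": [1001, 1001, 56045],
    "date": pd.to_datetime(["2020-02-15", "2020-02-16", "2021-08-08"]),
    "incidence": [0.0, 0.0, 13.0],
}))

# A small static table (no date column).
toy_population = ToyData(pd.DataFrame({
    "location": [1001, 56045],
    "population": [55869.0, 6927.0],
}))

toy_merged = toy_cases + toy_population
print(toy_merged.df)
```

Because the static table has no `date` column, the join falls back to `location` alone, so each location's population is repeated on every date, just like the `population` column in the merged printout.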

Thank you for reading and I hope you enjoy effortless forecasting with Epicas!