## Run the fire expansion and merging algorithm

`Fire_Forward` reads in the preprocessed data created in the `Ingest` notebook and runs the expansion and merging algorithm over the requested time range.

In [1]:
import datetime
import pandas as pd
import geopandas as gpd

import FireMain, FireTime, FireObj, FireConsts, postprocess
from utils import timed

region = ["CONUS",]  # note you don't need the shape in here, just the name
tst = [2023, 8, 28, 'AM']  # start timestep: [year, month, day, 'AM' or 'PM']
ted = [2023, 9, 6, 'AM']   # end timestep, same format

In [2]:
allfires, allpixels = FireMain.Fire_Forward(tst=tst, ted=ted, restart=False, region=region)

2024-02-12 14:39:16,358 - FireLog - INFO - func:read_preprocessed took: 4.63 ms
2024-02-12 14:39:16,366 - FireLog - INFO - func:read_preprocessed took: 4.82 ms
2024-02-12 14:39:16,371 - FireLog - INFO - func:read_preprocessed took: 4.10 ms
2024-02-12 14:39:16,375 - FireLog - INFO - func:read_preprocessed took: 2.76 ms
2024-02-12 14:39:16,380 - FireLog - INFO - func:read_preprocessed took: 3.76 ms
2024-02-12 14:39:16,386 - FireLog - INFO - func:read_preprocessed took: 4.36 ms
2024-02-12 14:39:16,392 - FireLog - INFO - func:read_preprocessed took: 4.95 ms
2024-02-12 14:39:16,396 - FireLog - INFO - func:read_preprocessed took: 2.63 ms
2024-02-12 14:39:16,400 - FireLog - INFO - func:read_preprocessed took: 2.66 ms
2024-02-12 14:39:16,403 - FireLog - INFO - func:read_preprocessed took: 1.43 ms
2024-02-12 14:39:16,406 - FireLog - INFO - func:read_preprocessed took: 2.74 ms
2024-02-12 14:39:16,409 - FireLog - INFO - func:read_preprocessed took: 1.27 ms
2024-02-12 14:39:16,412 - FireLog - INFO

Based on the logs that Eli shared, at `[2023, 9, 6, 'AM']` we expect:

```python
actual_fids_expand = 260
actual_fids_new = 131
actual_fids_merged = 6
actual_fids_invalid = 44
```

# Concepts
`Fire_Forward` outputs two dataframes: an `allpixels` dataframe with one row per pixel and an `allfires` geodataframe with one row per fire per timestep `t`.

The core concept is that if a dataframe backs the `allfires` and `Fire` objects, there are well-defined ways to serialize them to disk whenever you like (aka no more pickles!).

Here's a bit of an overview of the lifecycle of each of these dataframes:

## allpixels:

- At the start of `Fire_Forward` all of the preprocessed pixel data is loaded and concatenated into one long dataframe.
- Each row represents a fire pixel and there is a unique id per row.
- As `Fire_Forward` iterates through the timesteps of interest the `allpixels` dataframe is updated in place.
- Each `Fire` object treats `allpixels` as the source of truth. It does not hold pixel data itself; it refers to subsets of the `allpixels` dataframe to return `n_pixels` or `newpixels`.
- Merging fires at a particular `t` can update the `allpixels` rows from an earlier timestep.
- When `Fire_Forward` is complete, the `allpixels` object can be serialized to CSV (or any tabular format), optionally partitioned into files by `t`.
- This dataframe can be used:
    - together with `allfires_gdf` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
    - independently to write the `nplist` output file for largefires

## allfires_gdf:

- At the start of `Fire_Forward` a new geodataframe object is initialized. It has a column for each of the `Fire` attributes that take a non-trivial amount of time to compute (`ftype`, `hull`, `fline`...).
- As `Fire_Forward` iterates through the timesteps of interest, it writes a row for every fire that is burning (aka has new pixels) at that `t`.
- So each row contains the information about one fire at one `t`. The index is a MultiIndex of `(fid, t)`.
- Merging fires at a particular `t` updates the `mergeid` on the existing rows (_I am not totally confident this part is correct_).
- When `Fire_Forward` is complete, the `allfires_gdf` object can be serialized to geoparquet (the best choice, since it contains multiple geometry columns), optionally partitioned into files by `t`.
- This geodataframe can be used:
    - together with `allpixels` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
    - independently to write all the snapshot and largefires output files.
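
A minimal sketch of the `(fid, t)` layout described above, using a plain pandas dataframe so it runs without geopandas. The geometry columns are left out, and the `n_pixels` column and the toy values are assumptions for illustration; the `mergeid` update mirrors the behaviour described in the list rather than the actual implementation.

```python
import pandas as pd

t0 = pd.Timestamp("2023-08-28 00:00")
t1 = pd.Timestamp("2023-08-28 12:00")

# One row per fire per timestep, indexed by (fid, t).
index = pd.MultiIndex.from_tuples(
    [(1, t0), (2, t0), (2, t1)], names=["fid", "t"]
)
allfires_df = pd.DataFrame(
    {"n_pixels": [1, 1, 5], "mergeid": [1, 2, 2]}, index=index
)

# Look up one fire at one t.
row = allfires_df.loc[(2, t1)]

# Look up every fire that was burning at t1.
burning_at_t1 = allfires_df.xs(t1, level="t")

# Suppose fire 1 merged into fire 2 at t1: update `mergeid` on all of
# fire 1's existing rows, including those from earlier timesteps.
is_fire1 = allfires_df.index.get_level_values("fid") == 1
allfires_df.loc[is_fire1, "mergeid"] = 2
```

Keeping `t` in the index makes "everything at this timestep" a single `xs` call, which is also the natural unit for partitioning the output into files.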

Side note: I like that in this branch the `allpixels` dataframe is referenced by all the `Fire` objects but isn't copied around. This is different from how it works in `preprocess`, where each `Fire` object (at each `t`) has its own dataframe. It is also different from the original version of this algorithm, where each `Fire` object (at each `t`) holds a bunch of lists.

## Serialize to disk

- allpixels -> one file for each t (one row for each pixel).
- allfires -> one geoparquet file to hold all information about each fire at each time (one row for each burning fire at each t).
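
To make the "one file for each t" layout concrete, here is a rough pandas sketch that writes a toy pixel table to one CSV per timestep and reads it back. It is not what `postprocess.save_allpixels` does internally: the helper names, the file naming scheme, the `pixel_id` index label, and the rule that noon maps to `PM` are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy pixel table; column names are assumptions for illustration.
allpixels = pd.DataFrame(
    {
        "fid": [1, 2, 2],
        "t": pd.to_datetime(
            ["2023-09-05 12:00", "2023-09-05 12:00", "2023-09-06 00:00"]
        ),
        "x": [0.0, 5.0, 5.1],
    }
)


def save_by_t(df, out_dir):
    """Write one CSV per timestep (hypothetical helper and file names)."""
    out_dir = Path(out_dir)
    paths = []
    for t, group in df.groupby("t"):  # groups come back sorted by t
        half = "AM" if t.hour < 12 else "PM"
        path = out_dir / f"allpixels_{t.strftime('%Y%m%d')}_{half}.csv"
        group.to_csv(path, index_label="pixel_id")
        paths.append(path)
    return paths


def read_all(paths):
    """Concatenate the per-timestep files back into one long dataframe."""
    frames = [
        pd.read_csv(p, index_col="pixel_id", parse_dates=["t"]) for p in paths
    ]
    return pd.concat(frames).sort_index()


with tempfile.TemporaryDirectory() as tmp:
    paths = save_by_t(allpixels, tmp)
    file_names = [p.name for p in paths]
    restored = read_all(paths)
```

Partitioning by `t` means a new ingest file only adds new files, and a merge that changes earlier rows only requires rewriting the affected timesteps.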

In [3]:
postprocess.save_allpixels(allpixels, ted, region)

2024-02-12 14:40:57,040 - FireLog - INFO - func:save_allpixels took: 135.02 ms


In [4]:
postprocess.save_allfires_gdf(allfires.gdf, ted, region)

2024-02-12 14:40:57,221 - FireLog - INFO - func:save_allfires_gdf took: 178.22 ms


## Read from disk

In [6]:
allpixels = postprocess.read_allpixels(ted, region)

2024-02-08 11:14:06,330 - FireLog - INFO - func:read_allpixels took: 60.85 ms


In [7]:
allfires_gdf = postprocess.read_allfires_gdf(ted, region)

2024-02-08 11:14:06,854 - FireLog - INFO - func:read_allfires_gdf took: 107.25 ms


## Pick out the large fires

Let's compare the existing object-oriented approach with the new geodataframe approach.

In [8]:
%%time
import FireGpkg_sfs

large_fires_original = FireGpkg_sfs.find_largefires(allfires)

CPU times: user 37.3 ms, sys: 36 µs, total: 37.4 ms
Wall time: 36.4 ms


In [9]:
large_fires_new = postprocess.find_largefires(allfires_gdf)

2024-02-08 11:14:12,797 - FireLog - INFO - func:find_largefires took: 10.53 ms


In [10]:
assert set(large_fires_original) == set(large_fires_new), "The large fires should match"

## Rehydrate the latest allfires

For instance, let's rehydrate just the last `allfires` object.

This should be equivalent to the `allfires` object that we generated at the top of this notebook.

In [11]:
a = FireObj.Allfires.rehydrate(ted, region)
a

2024-02-08 11:14:24,835 - FireLog - INFO - func:read_allfires_gdf took: 107.92 ms
2024-02-08 11:14:24,879 - FireLog - INFO - func:read_allpixels took: 43.51 ms
2024-02-08 11:14:27,170 - FireLog - INFO - func:rehydrate took: 2.44 sec


<Allfires at t=[2023, 9, 6, 'AM'] with n_fires=4342>