## Run the fire expansion and merging algorithm

`Fire_Forward` reads in the preprocessed data created in the `Ingest` notebook and runs the fire expansion and merging algorithm over the requested time range.

In [1]:
import datetime
import pandas as pd
import geopandas as gpd

import FireMain, FireTime, FireObj, FireConsts, postprocess
from utils import timed

region = ["WesternUS",]  # note you don't need the shape in here, just the name
tst = [2023, 8, 28, 'AM']
ted = [2023, 9, 6, 'AM']

Since we want to use precisely the files that we just created in the Ingest notebook, we will set `read_location` to "local".

In [2]:
allfires, allpixels = FireMain.Fire_Forward(tst=tst, ted=ted, restart=False, region=region, read_location="local")

2024-03-26 17:50:54,194 - FireLog - INFO - func:read_preprocessed took: 16.23 ms
2024-03-26 17:50:54,208 - FireLog - INFO - func:read_preprocessed took: 13.37 ms
2024-03-26 17:50:54,222 - FireLog - INFO - func:read_preprocessed took: 13.02 ms
2024-03-26 17:50:54,233 - FireLog - INFO - func:read_preprocessed took: 11.00 ms
2024-03-26 17:50:54,246 - FireLog - INFO - func:read_preprocessed took: 12.45 ms
2024-03-26 17:50:54,260 - FireLog - INFO - func:read_preprocessed took: 12.89 ms
2024-03-26 17:50:54,276 - FireLog - INFO - func:read_preprocessed took: 15.45 ms
2024-03-26 17:50:54,281 - FireLog - INFO - func:read_preprocessed took: 4.81 ms
2024-03-26 17:50:54,287 - FireLog - INFO - func:read_preprocessed took: 5.07 ms
2024-03-26 17:50:54,291 - FireLog - INFO - func:read_preprocessed took: 4.31 ms
2024-03-26 17:50:54,297 - FireLog - INFO - func:read_preprocessed took: 4.76 ms
2024-03-26 17:50:54,301 - FireLog - INFO - func:read_preprocessed took: 4.14 ms
2024-03-26 17:50:54,307 - FireLog

# Concepts
`Fire_Forward` returns two objects: `allpixels`, a dataframe with one row per pixel, and `allfires`, which is backed by a geodataframe (`allfires.gdf`) with one row per fire per `t`.

The core concept is that if a dataframe backs the `allfires` and `Fire` objects, there are well-defined ways to serialize them to disk whenever you like (aka no more pickles!).

Here's a bit of an overview of the lifecycle of each of these dataframes:

## allpixels:

- At the start of `Fire_Forward`, all of the preprocessed pixel data is loaded and concatenated into one long dataframe.
- Each row represents a fire pixel, and each row has a unique id.
- As `Fire_Forward` iterates through the timesteps of interest, the `allpixels` dataframe is updated in place.
- Each `Fire` object treats the `allpixels` dataframe as the source of truth. It holds no pixel data of its own and instead selects subsets of `allpixels` to return `n_pixels` or `newpixels`.
- Merging fires at a particular `t` can update rows of `allpixels` that belong to an earlier timestep.
- When `Fire_Forward` is complete, the `allpixels` dataframe can be serialized to csv (or any tabular format), optionally partitioned into files by `t`.
- This dataframe can be used:
    - together with `allfires_gdf` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
    - independently to write the `nplist` output file for largefires
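
As a rough illustration of the shared-dataframe idea in the list above, here is a minimal sketch in which a fire holds only a reference to `allpixels` and computes `n_pixels` and `newpixels` on demand. The `FireView` class, the column names, and the integer timesteps are assumptions made for this example; they are not the actual `FireObj` implementation.

```python
import pandas as pd

# Toy stand-in for allpixels: one row per fire pixel, unique index per row.
# Column names and integer timesteps are illustrative assumptions.
allpixels = pd.DataFrame(
    {
        "fid": [1, 1, 2, 2, 2],
        "t": [0, 1, 0, 1, 1],
        "x": [0.0, 0.1, 5.0, 5.1, 5.2],
        "y": [0.0, 0.1, 5.0, 5.1, 5.2],
    }
)


class FireView:
    """Hypothetical stand-in for a Fire object; it holds no pixel data."""

    def __init__(self, fid, t, pixels):
        self.fid = fid
        self.t = t
        self._pixels = pixels  # reference to the shared frame, not a copy

    @property
    def pixels(self):
        # All pixels assigned to this fire up to and including t.
        df = self._pixels
        return df[(df["fid"] == self.fid) & (df["t"] <= self.t)]

    @property
    def newpixels(self):
        # Only the pixels detected at the current t.
        df = self._pixels
        return df[(df["fid"] == self.fid) & (df["t"] == self.t)]

    @property
    def n_pixels(self):
        return len(self.pixels)


fire = FireView(fid=1, t=1, pixels=allpixels)
before = fire.n_pixels

# Merge fire 2 into fire 1 by relabeling rows in place. Rows from the
# earlier timestep (t=0) are updated too.
allpixels.loc[allpixels["fid"] == 2, "fid"] = 1
after = fire.n_pixels
```

Because `fire` only points at the shared frame, the in-place merge is visible through it immediately, with no per-fire copies to keep in sync.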

## allfires_gdf:

- At the start of `Fire_Forward`, a new geodataframe object is initialized. It has a column for each of the `Fire` attributes that take a non-trivial amount of time to compute (`ftype`, `hull`, `fline`...).
- As `Fire_Forward` iterates through the timesteps of interest, it writes a row for every fire that is burning (aka has new pixels) at that `t`.
- Each row therefore contains the information about one fire at one `t`. The index is a MultiIndex of `(fid, t)`.
- Merging fires at a particular `t` updates the `mergeid` on the existing rows (_this part I am not totally confident is correct_).
- When `Fire_Forward` is complete, the `allfires_gdf` object can be serialized to geoparquet (the best choice, since it contains multiple geometry columns), optionally partitioned into files by `t`.
- This geodataframe can be used:
    - together with `allpixels` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
    - independently to write all the snapshot and largefires output files.
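
To make the `(fid, t)` indexing described above concrete, here is a small sketch using a plain pandas dataframe in place of the geodataframe. The records, the `n_pixels` column, and the integer timesteps are made up for illustration, and the geometry columns are left out entirely.

```python
import pandas as pd

# Toy records: one row per burning fire per timestep. Integer timesteps and
# the n_pixels column are assumptions; geometry columns are omitted.
records = [
    {"fid": 1, "t": 0, "n_pixels": 3, "mergeid": 1},
    {"fid": 2, "t": 0, "n_pixels": 2, "mergeid": 2},
    {"fid": 1, "t": 1, "n_pixels": 5, "mergeid": 1},
    {"fid": 2, "t": 1, "n_pixels": 4, "mergeid": 2},
]
fires = pd.DataFrame.from_records(records).set_index(["fid", "t"]).sort_index()

# Look up one fire at one t.
row = fires.loc[(1, 1)]

# Full history of fire 2 across timesteps.
history = fires.xs(2, level="fid")

# Merge fire 2 into fire 1: update mergeid on every existing row of fire 2.
is_fire_2 = fires.index.get_level_values("fid") == 2
fires.loc[is_fire_2, "mergeid"] = 1

# State of every fire at the latest t, for example as a starting point
# for rehydration.
latest_t = fires.index.get_level_values("t").max()
latest = fires.xs(latest_t, level="t")
```

Keeping `t` in the index means a snapshot at any timestep is a single cross-section, and a merge is a column update rather than a restructuring of objects.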

Side note: I like that in this branch the `allpixels` dataframe is referenced by all the `Fire` objects but isn't copied around. This differs from `preprocess`, where each `Fire` object (at each `t`) has its own dataframe, and from the original version of this algorithm, where each `Fire` object (at each `t`) holds a bunch of lists.

## Serialize to disk

- allpixels -> one file for each t (one row for each pixel).
- allfires -> one geoparquet file to hold all information about each fire at each time (one row for each burning fire at each t).

In [3]:
postprocess.save_allpixels(allpixels, tst, ted, region)

2024-03-26 17:56:42,704 - FireLog - INFO - func:save_allpixels took: 213.57 ms


'data/FEDSoutput-s3-conus/WesternUS/2023/allpixels_20230906_AM.csv'

In [4]:
postprocess.save_allfires_gdf(allfires.gdf, tst, ted, region)

2024-03-26 17:56:43,958 - FireLog - INFO - func:save_allfires_gdf took: 191.19 ms


'data/FEDSoutput-s3-conus/WesternUS/2023/allfires_20230906_AM.parq'
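
For readers curious about what partitioning by `t` could look like, the following is a minimal sketch that writes one CSV per timestep and reads them back. The helpers `save_partitioned` and `read_partitioned`, the file naming, and the toy columns are hypothetical; this is not how `postprocess.save_allpixels` is implemented.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_partitioned(allpixels, outdir):
    """Write one CSV per timestep (hypothetical helper)."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for t, group in allpixels.groupby("t"):
        path = outdir / f"allpixels_{t}.csv"
        # Keep the unique per-pixel index so rows can be matched up later.
        group.to_csv(path, index_label="pixel_id")
        paths.append(path)
    return paths


def read_partitioned(outdir):
    """Read every per-timestep CSV back into one long dataframe."""
    files = sorted(Path(outdir).glob("allpixels_*.csv"))
    frames = [pd.read_csv(f, index_col="pixel_id") for f in files]
    return pd.concat(frames).sort_index()


# Toy data; the column names and values are illustrative only.
allpixels = pd.DataFrame(
    {
        "fid": [1, 1, 2],
        "t": ["20230905_PM", "20230906_AM", "20230906_AM"],
        "frp": [1.5, 2.0, 0.7],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    paths = save_partitioned(allpixels, tmp)
    restored = read_partitioned(tmp)
```

One file per `t` keeps each new ingest step cheap to append, at the cost of a concatenation when the full history is needed.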

## Read from disk

In [5]:
allpixels = postprocess.read_allpixels(tst, ted, region, location="local")

2024-03-26 17:56:51,373 - FireLog - INFO - func:read_allpixels took: 59.02 ms


In [6]:
allfires_gdf = postprocess.read_allfires_gdf(tst, ted, region, location="local")

2024-03-26 17:56:52,055 - FireLog - INFO - func:read_allfires_gdf took: 84.43 ms


## Pick out the large fires

Let's compare the existing object-oriented approach with the new geodataframe approach.

In [7]:
%%time
import FireGpkg_sfs

large_fires_original = FireGpkg_sfs.find_largefires(allfires)

CPU times: user 12 ms, sys: 2 µs, total: 12 ms
Wall time: 11.6 ms


In [8]:
large_fires_new = postprocess.find_largefires(allfires_gdf)

2024-03-26 17:56:55,685 - FireLog - INFO - func:find_largefires took: 5.46 ms


In [9]:
assert set(large_fires_original) == set(large_fires_new), "The large fires should match"

## Rehydrate the latest allfires

This is pretty experimental, but at least in theory you should be able to rehydrate the `allfires` object from the saved `allfires_gdf` and `allpixels` files. If this works as expected, it lets you pick up from a particular `t` and run another step of `Fire_Forward`.

The result should be equivalent to the `allfires` object that we generated at the top of this notebook.

In [3]:
a = FireObj.Allfires.rehydrate(tst, ted, region, include_dead=False, read_location="local")
a

2024-03-26 17:59:07,794 - FireLog - INFO - func:read_allfires_gdf took: 46.65 ms
2024-03-26 17:59:07,837 - FireLog - INFO - func:read_allpixels took: 43.00 ms
2024-03-26 17:59:08,381 - FireLog - INFO - func:rehydrate took: 633.98 ms


<Allfires at t=[2023, 9, 6, 'AM'] with n_fires=864>