## Run the fire expansion and merging algorithm

`Fire_Forward` reads in the preprocessed data created in the `Ingest` notebook and runs the fire expansion and merging algorithm over each timestep from `tst` to `ted`.

```python
import datetime
import pandas as pd
import geopandas as gpd

import FireMain, FireTime, FireObj, FireConsts, FireVector
from utils import timed

region = ["CONUS",]  # only the region name is needed here, not its shape
tst = [2023, 8, 28, 'AM']  # start timestep: [year, month, day, 'AM' or 'PM']
ted = [2023, 9, 6, 'AM']  # end timestep
```
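The timesteps `tst` and `ted` are `[year, month, day, 'AM'/'PM']` lists, and later cells turn each one into a file stem such as `20230828_AM`. The sketch below is a hypothetical, pure-Python stand-in for `FireTime.t2dt`, `FireTime.dt2t` and `FireTime.t_generator`. It assumes half-day steps, with 'AM' at 00:00, 'PM' at 12:00 and an inclusive end; the real `FireTime` module may differ.

```python
import datetime


def t2dt_sketch(t):
    """Hypothetical stand-in for FireTime.t2dt: [y, m, d, 'AM'/'PM'] -> datetime.

    Assumes 'AM' maps to 00:00 and 'PM' to 12:00.
    """
    year, month, day, ampm = t
    hour = 0 if ampm == "AM" else 12
    return datetime.datetime(year, month, day, hour)


def dt2t_sketch(dt):
    """Inverse of t2dt_sketch under the same AM/PM assumption."""
    return [dt.year, dt.month, dt.day, "AM" if dt.hour < 12 else "PM"]


def t_generator_sketch(tst, ted):
    """Yield every half-day step from tst to ted (inclusive end is an assumption)."""
    dt, dt_end = t2dt_sketch(tst), t2dt_sketch(ted)
    while dt <= dt_end:
        yield dt2t_sketch(dt)
        dt += datetime.timedelta(hours=12)


def t_to_stem(t):
    """File-name stem used by the serialization cells, e.g. '20230828_AM'."""
    return f"{t[0]}{t[1]:02}{t[2]:02}_{t[3]}"


steps = list(t_generator_sketch([2023, 8, 28, "AM"], [2023, 9, 6, "AM"]))
print(len(steps), t_to_stem(steps[0]), t_to_stem(steps[-1]))  # 19 20230828_AM 20230906_AM
```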

```python
allfires, allpixels, allfires_gdf = FireMain.Fire_Forward(tst=tst, ted=ted, restart=False, region=region)
```

```
2024-02-05 16:01:11,680 - FireLog - INFO - func:read_preprocessed took: 5.93 ms
2024-02-05 16:01:11,686 - FireLog - INFO - func:read_preprocessed took: 5.16 ms
2024-02-05 16:01:11,692 - FireLog - INFO - func:read_preprocessed took: 5.03 ms
2024-02-05 16:01:11,699 - FireLog - INFO - func:read_preprocessed took: 6.04 ms
2024-02-05 16:01:11,707 - FireLog - INFO - func:read_preprocessed took: 7.27 ms
2024-02-05 16:01:11,712 - FireLog - INFO - func:read_preprocessed took: 5.00 ms
2024-02-05 16:01:11,719 - FireLog - INFO - func:read_preprocessed took: 5.98 ms
2024-02-05 16:01:11,723 - FireLog - INFO - func:read_preprocessed took: 3.22 ms
2024-02-05 16:01:11,726 - FireLog - INFO - func:read_preprocessed took: 3.07 ms
2024-02-05 16:01:11,729 - FireLog - INFO - func:read_preprocessed took: 2.07 ms
2024-02-05 16:01:11,734 - FireLog - INFO - func:read_preprocessed took: 5.27 ms
2024-02-05 16:01:11,737 - FireLog - INFO - func:read_preprocessed took: 1.93 ms
2024-02-05 16:01:11,740 - FireLog - INFO
```

`Fire_Forward` returns the `allfires` object plus two dataframes: `allpixels`, with one row per pixel, and `allfires_gdf`, a geodataframe with one row per fire per `t`.

The core idea is that backing the `allfires` and `Fire` objects with dataframes gives well-defined ways to serialize their state to disk at any point (no more pickles!).

Here is an overview of the lifecycle of each of these dataframes:

## allpixels

- At the start of `Fire_Forward` all of the preprocessed pixel data is loaded and concatenated into one long dataframe.
- Each row represents a fire pixel and there is a unique id per row.
- As `Fire_Forward` iterates through the timesteps of interest the `allpixels` dataframe is updated in place.
- Each `Fire` object treats `allpixels` as the source of truth: it does not hold pixel data itself but selects subsets of `allpixels` to return `n_pixels` or `newpixels`.
- Merging fires at a particular `t` can update rows of `allpixels` that belong to earlier timesteps.
- When `Fire_Forward` is complete, the `allpixels` object can be serialized to CSV (or any tabular format), optionally partitioned into one file per `t`.
- This dataframe can be used:
    - together with `allfires_gdf` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
    - independently to write the `nplist` output file for largefires.
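The relationship between `Fire` objects and `allpixels` can be sketched with a minimal, hypothetical `FireView` class. It is not `FireObj.Fire`; it only illustrates the pattern of keeping a reference to one shared dataframe and deriving `n_pixels` and `newpixels` from it, including how an in-place merge on that dataframe is immediately visible to every object. The `fid` and `t` column names are assumptions.

```python
import pandas as pd


class FireView:
    """Hypothetical, stripped-down stand-in for FireObj.Fire.

    It stores a fire id, a timestep and a *reference* to the shared allpixels
    frame; pixel-based properties are computed from that frame on demand.
    The "fid" and "t" column names are assumptions for illustration.
    """

    def __init__(self, fid, t, allpixels):
        self.fid = fid
        self.t = t
        self.allpixels = allpixels  # shared reference, never copied

    @property
    def pixels(self):
        # Every pixel assigned to this fire up to and including t
        ap = self.allpixels
        return ap[(ap["fid"] == self.fid) & (ap["t"] <= self.t)]

    @property
    def n_pixels(self):
        return len(self.pixels)

    @property
    def newpixels(self):
        # Only the pixels detected at exactly this timestep
        ap = self.allpixels
        return ap[(ap["fid"] == self.fid) & (ap["t"] == self.t)]


allpixels = pd.DataFrame(
    {
        "fid": [1, 1, 2, 1],
        "t": pd.to_datetime(["2023-08-28 00:00", "2023-08-28 12:00",
                             "2023-08-28 12:00", "2023-08-29 00:00"]),
    },
    index=pd.Index(["a", "b", "c", "d"], name="uuid"),
)
noon = pd.Timestamp("2023-08-28 12:00")
f1 = FireView(1, noon, allpixels)
f2 = FireView(2, noon, allpixels)
print(f1.n_pixels, len(f1.newpixels))  # 2 1

# Merging fire 2 into fire 1 rewrites rows in place; both views see it at once
allpixels.loc[allpixels["fid"] == 2, "fid"] = 1
print(f1.n_pixels, f2.n_pixels)  # 3 0
```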

## allfires_gdf

- At the start of `Fire_Forward` a new geodataframe object is initialized. It has a column for each of the `Fire` attributes that take a non-trivial amount of time to compute (`ftype`, `hull`, `fline`...).
- As `Fire_Forward` iterates through the timesteps of interest, it writes a row for every fire that is burning (i.e., has new pixels) at that `t`.
- So each row holds the information about one fire at one `t`. The index is a MultiIndex of `(fid, t)`.
- Merging fires at a particular `t` updates the `mergeid` on the existing rows (_I am not totally confident this part is correct_).
- When `Fire_Forward` is complete, the `allfires_gdf` object can be serialized to GeoParquet, optionally partitioned into one file per `t`. GeoParquet is the best choice here because the geodataframe contains multiple geometry columns.
- This geodataframe can be used:
    - together with `allpixels` to rehydrate the `allfires` object at the latest `t` in order to run `Fire_Forward` on one new ingest file.
    - independently to write all the snapshot and largefires output files.
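A hypothetical sketch of the `(fireID, t)` layout and of updating `mergeid` on existing rows after a merge, in plain pandas with the geometry columns left out. The `fireID` level name follows the column used when picking out large fires; `record_merge` is an illustration only and not how `Fire_Forward` implements merging (the notes themselves flag the merge behavior as uncertain).

```python
import pandas as pd

# Hypothetical rows Fire_Forward might record: one per burning fire per t.
# Geometry columns (hull, fline, ...) are left out so this runs with plain pandas.
ts = pd.to_datetime(["2023-08-28 00:00", "2023-08-28 12:00"])
gdf = pd.DataFrame(
    {
        "fireID": [1, 2, 1, 2],
        "t": [ts[0], ts[0], ts[1], ts[1]],
        "farea": [1.0, 0.5, 2.0, 0.8],
        "mergeid": [1, 2, 1, 2],
    }
).set_index(["fireID", "t"]).sort_index()


def record_merge(gdf, source_fid, target_fid):
    """Hypothetical bookkeeping: point every existing row of source_fid at target_fid."""
    is_source = gdf.index.get_level_values("fireID") == source_fid
    gdf.loc[is_source, "mergeid"] = target_fid
    return gdf


gdf = record_merge(gdf, source_fid=2, target_fid=1)
print(gdf.loc[(2, ts[1]), "mergeid"])  # 1
```

Because each row is keyed by `(fireID, t)`, a merge only touches existing rows; no per-object state has to be rewritten.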

Side note: I like that in this branch all the `Fire` objects reference the same `allpixels` dataframe without copying it. This differs from `preprocess`, where each `Fire` object (at each `t`) has its own dataframe, and from the original version of this algorithm, where each `Fire` object (at each `t`) holds a bunch of lists.

## Serialize to disk

- `allpixels` -> one CSV file (with a `.txt` extension) for each `t` (one row for each pixel).
- `allfires_gdf` -> one GeoParquet file holding all information about each fire at each time (one row for each burning fire at each `t`).
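The per-`t` partitioning can also be done in a single `groupby` pass instead of filtering `allpixels` once per timestep. This is a sketch on toy data; `write_partitioned` and `read_partitioned` are hypothetical helpers, and the sketch assumes the `t` column holds timestamps where times before noon are 'AM' and the rest 'PM'.

```python
import os
import tempfile

import pandas as pd


def write_partitioned(allpixels, outdir):
    """Write one CSV per timestep, named like the notebook's 20230828_AM.txt files.

    A single groupby pass replaces re-filtering the whole frame for every t.
    Assumes timestamps before noon are 'AM' and the rest 'PM'.
    """
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for dt, pixels in allpixels.groupby("t"):
        ampm = "AM" if dt.hour < 12 else "PM"
        path = os.path.join(outdir, f"{dt.strftime('%Y%m%d')}_{ampm}.txt")
        pixels.to_csv(path)  # the "uuid" index becomes the first column
        paths.append(path)
    return paths


def read_partitioned(paths):
    """Concatenate the per-t files back into one allpixels frame."""
    return pd.concat(
        [pd.read_csv(p, index_col="uuid", parse_dates=["t"]) for p in paths]
    )


allpixels = pd.DataFrame(
    {
        "t": pd.to_datetime(["2023-08-28 00:00", "2023-08-28 12:00", "2023-08-28 12:00"]),
        "fid": [1, 1, 2],
    },
    index=pd.Index(["a", "b", "c"], name="uuid"),
)
with tempfile.TemporaryDirectory() as outdir:
    paths = write_partitioned(allpixels, outdir)
    roundtrip = read_partitioned(paths)
print([os.path.basename(p) for p in paths])  # ['20230828_AM.txt', '20230828_PM.txt']
print(roundtrip["fid"].tolist())  # [1, 1, 2]
```

Unlike looping over `FireTime.t_generator`, `groupby` produces no file for a timestep without pixels, so a reader that expects one file per `t` would need to tolerate missing files.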

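```python
%%time
import os

outdir = f"out/{region[0]}"
os.makedirs(outdir, exist_ok=True)  # to_csv does not create missing directories

# One CSV per timestep, named like 20230828_AM.txt
for t in FireTime.t_generator(tst, ted):
    pixels = allpixels[allpixels["t"] == FireTime.t2dt(t)]
    filepath = f"{outdir}/{t[0]}{t[1]:02}{t[2]:02}_{t[3]}.txt"
    pixels.to_csv(filepath)
```

```
CPU times: user 159 ms, sys: 27 µs, total: 159 ms
Wall time: 169 ms
```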


```python
%%time
t = ted
allfires_gdf.to_parquet(f"out/{region[0]}/allfires_{t[0]}{t[1]:02}{t[2]:02}_{t[3]}.parq")
```

```
CPU times: user 112 ms, sys: 51.9 ms, total: 164 ms
Wall time: 165 ms
```


## Read from disk

```python
%%time
allpixels = pd.concat([
    pd.read_csv(f"out/{region[0]}/{t[0]}{t[1]:02}{t[2]:02}_{t[3]}.txt", index_col="uuid", parse_dates=["t"])
    for t in FireTime.t_generator(tst, ted)
])
```

```
CPU times: user 82.1 ms, sys: 3.96 ms, total: 86.1 ms
Wall time: 85 ms
```


```python
%%time
t = ted
allfires_gdf = gpd.read_parquet(f"out/{region[0]}/allfires_{t[0]}{t[1]:02}{t[2]:02}_{t[3]}.parq")
```

```
CPU times: user 91.3 ms, sys: 43.8 ms, total: 135 ms
Wall time: 120 ms
```


## Pick out the large fires

Let's compare the existing object-oriented approach with the new geodataframe approach for finding large fires.

```python
%%time
import FireGpkg_sfs

large_fire_ids = FireGpkg_sfs.find_largefires(allfires)
```

```
CPU times: user 38.7 ms, sys: 64 µs, total: 38.8 ms
Wall time: 38.5 ms
```


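```python
%%time
t = ted
dt = FireTime.t2dt(t)
gdf = allfires_gdf.reset_index()

# Only consider fires that were burning within the last 20 days
gdf = gdf[gdf.t >= dt - datetime.timedelta(days=20)]
# Sort by time so keep="last" picks each fire's most recent row
last_seen = gdf.sort_values("t").drop_duplicates("fireID", keep="last")
# Large fires: last-seen area above 4 and not flagged invalid
last_large = last_seen[(last_seen.farea > 4) & (last_seen.invalid == False)]
large_fires = last_large.fireID.values
```

```
CPU times: user 8.3 ms, sys: 7 µs, total: 8.31 ms
Wall time: 7.75 ms
```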


```python
assert set(large_fires) == set(large_fire_ids), "The large fires should match"
```
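The dataframe-based selection can be checked on toy data. `find_largefires_sketch` is a hypothetical wrapper around the same steps as the large-fire cell above, with an explicit sort by `t` so that `keep="last"` does not depend on the row order of `allfires_gdf`. The area threshold of 4 and the 20-day window are copied from that cell; `farea` is in whatever units the real data uses.

```python
import datetime

import pandas as pd


def find_largefires_sketch(gdf, dt, min_area=4, lookback_days=20):
    """Return the ids of large, valid fires seen within lookback_days of dt.

    gdf is indexed by (fireID, t) and has 'farea' and 'invalid' columns.
    The defaults mirror the values hard-coded in the notebook cell.
    """
    df = gdf.reset_index()
    df = df[df["t"] >= dt - datetime.timedelta(days=lookback_days)]
    # Sort by time so keep="last" really is each fire's most recent row
    last_seen = df.sort_values("t").drop_duplicates("fireID", keep="last")
    is_large = last_seen["farea"] > min_area
    is_valid = ~last_seen["invalid"].astype(bool)
    return set(last_seen.loc[is_large & is_valid, "fireID"].tolist())


# Toy data: fire 1 grows past the threshold, fire 2 is large but flagged
# invalid at its latest t, and fire 3 was last seen outside the 20-day window.
gdf = pd.DataFrame({
    "fireID": [1, 1, 2, 2, 3],
    "t": pd.to_datetime(["2023-09-01", "2023-09-05", "2023-09-02", "2023-09-05", "2023-08-01"]),
    "farea": [3.0, 6.0, 5.0, 5.5, 50.0],
    "invalid": [False, False, False, True, False],
}).set_index(["fireID", "t"])

print(find_largefires_sketch(gdf, pd.Timestamp("2023-09-06")))  # {1}
```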

## Rehydrate the latest allfires

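```python
@timed
def rehydrate_allfires(t, allpixels, gdf, notdead=True):
    """Rebuild the Allfires object at timestep t from allpixels and allfires_gdf."""
    dt = FireTime.t2dt(t)
    dt_dead = dt - datetime.timedelta(days=FireConsts.limoffdays)

    # Fires that started by t, using only rows recorded at or before t
    # (the second index level is t)
    gdf_ = gdf[(gdf.t_st <= dt) & (gdf.index.get_level_values(1) <= dt)]
    if notdead:
        # Drop fires that have been inactive for longer than limoffdays
        gdf_ = gdf_[gdf_.t_ed >= dt_dead]

    a = FireObj.Allfires(t)
    for fid, gdf_fid in gdf_.groupby(level=0):
        # The Fire object reads its pixels from the shared allpixels frame
        f = FireObj.Fire(fid, t, allpixels)
        dt_st = gdf_fid.t_st.min()
        dt_ed = gdf_fid.t_ed.max()
        f.t_st = FireTime.dt2t(dt_st)
        f.t_ed = FireTime.dt2t(dt_ed)

        # Restore the expensive-to-compute attributes from the fire's latest row
        gdf_fid_t = gdf_fid.loc[(fid, dt_ed)]
        for k in ("hull", "ftype", "fline", "invalid"):
            if k in gdf_fid_t.index:
                setattr(f, k, gdf_fid_t[k])

        if f.isignition:
            a.fids_new.append(fid)
        else:
            a.fids_expanded.append(fid)
        a.fires[fid] = f
    return a
```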

For instance, let's rehydrate just the last `allfires` object.

This should be equivalent to the `allfires` object generated at the top of this notebook.

```python
a = rehydrate_allfires(ted, allpixels, allfires_gdf)
a
```

```
2024-02-05 16:15:51,863 - FireLog - INFO - func:rehydrate_allfires took: 2.01 sec

<Allfires at t=[2023, 9, 6, 'AM'] with n_fires=4342>
```

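What does it look like to rehydrate the object for every timestep?

Rehydration takes longer as more data accumulates, but the time per step should level off as long as `notdead=True`, because fires that have been inactive for more than `FireConsts.limoffdays` days are filtered out before any `Fire` objects are built.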
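To see why the cost should level off, here is a toy sketch in plain pandas. The helper `candidate_fire_ids` is hypothetical: it applies a time-window filter similar to the one in `rehydrate_allfires` (a fire must have started by `dt` and, with `notdead=True`, must have been active within `limoffdays` of `dt`) and also ignores rows recorded after `dt`. The toy data assumes each row's `t_ed` equals its own `t`, and `LIMOFFDAYS = 3` is an arbitrary stand-in for `FireConsts.limoffdays`.

```python
import pandas as pd

# Toy allfires_gdf: fire i ignites on day i and burns on days i and i + 1.
# Assumption: each row's t_ed equals the row's own t (the fire was active then).
D0 = pd.Timestamp("2023-08-28")
rows = []
for fid in range(10):
    start = D0 + pd.Timedelta(days=fid)
    for k in range(2):
        t = start + pd.Timedelta(days=k)
        rows.append({"fireID": fid, "t": t, "t_st": start, "t_ed": t})
gdf = pd.DataFrame(rows).set_index(["fireID", "t"])


def candidate_fire_ids(gdf, dt, limoffdays, notdead=True):
    """Hypothetical helper: ids of fires a rehydration at dt would rebuild."""
    # Fires that have started by dt, using only rows recorded at or before dt
    keep = (gdf["t_st"] <= dt) & (gdf.index.get_level_values("t") <= dt)
    if notdead:
        # Drop fires inactive for longer than limoffdays
        keep &= gdf["t_ed"] >= dt - pd.Timedelta(days=limoffdays)
    return set(gdf[keep].index.get_level_values("fireID").tolist())


LIMOFFDAYS = 3  # toy value, not FireConsts.limoffdays
days = [D0 + pd.Timedelta(days=d) for d in range(10)]
window_counts = [len(candidate_fire_ids(gdf, dt, LIMOFFDAYS)) for dt in days]
all_counts = [len(candidate_fire_ids(gdf, dt, LIMOFFDAYS, notdead=False)) for dt in days]
print(window_counts)  # [1, 2, 3, 4, 5, 5, 5, 5, 5, 5]
print(all_counts)     # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With the window, the number of fires to rebuild stops growing once old fires start aging out; without it, every fire seen so far is rebuilt at each step.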

```python
%%time
for t in FireTime.t_generator(tst, ted):
    a = rehydrate_allfires(t, allpixels, allfires_gdf)
```

```
2024-01-31 09:12:51,583 - FireLog - INFO - func:rehydrate_allfires took: 184.15 ms
2024-01-31 09:12:51,927 - FireLog - INFO - func:rehydrate_allfires took: 342.98 ms
2024-01-31 09:12:52,421 - FireLog - INFO - func:rehydrate_allfires took: 493.79 ms
2024-01-31 09:12:53,004 - FireLog - INFO - func:rehydrate_allfires took: 582.32 ms
2024-01-31 09:12:53,661 - FireLog - INFO - func:rehydrate_allfires took: 655.69 ms
2024-01-31 09:12:54,438 - FireLog - INFO - func:rehydrate_allfires took: 776.19 ms
2024-01-31 09:12:55,279 - FireLog - INFO - func:rehydrate_allfires took: 840.69 ms
2024-01-31 09:12:56,288 - FireLog - INFO - func:rehydrate_allfires took: 1.01 sec
2024-01-31 09:12:57,338 - FireLog - INFO - func:rehydrate_allfires took: 1.05 sec
2024-01-31 09:12:58,476 - FireLog - INFO - func:rehydrate_allfires took: 1.14 sec
2024-01-31 09:12:59,727 - FireLog - INFO - func:rehydrate_allfires took: 1.25 sec
2024-01-31 09:13:01,028 - FireLog - INFO - func:rehydrate_allfires took: 1.30 sec
2024-01-3

CPU times: user 21 s, sys: 28.3 ms, total: 21 s
Wall time: 21 s
```


## TO DO

- [ ] figure out if the merge behavior is correct
- [ ] what to do with invalid fires?
- [ ] do we really need allpixels when rehydrating?
- [ ] should the allfires object hold on to the gdf and update it itself?
- [ ] do the snapshot files have rows merged based on the fid?
- [ ] figure out how to write the desired output files solely from the allfires geodataframe and the allpixels dataframe. I think it's important not to reinstantiate the objects for this.