## Generate outputs

What we want out of this algorithm is a snapshot of all the fires at a given `t`, and a time series of each fire across time.
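The two views can be illustrated on a hypothetical toy table. The column names `fireID`, `t` and `farea` mirror columns used later in this notebook, but the data is made up and the table is a plain pandas frame rather than the real `allfires_gdf`.

```python
import pandas as pd

# Hypothetical stand-in for allfires_gdf: one row per (fire, time step).
toy_fires = pd.DataFrame(
    {
        "fireID": [1, 1, 1, 2, 2],
        "t": pd.to_datetime(
            ["2023-08-28 00:00", "2023-08-28 12:00", "2023-08-29 00:00",
             "2023-08-28 12:00", "2023-08-29 00:00"]
        ),
        "farea": [1.0, 2.5, 4.0, 0.5, 0.7],
    }
)

# Snapshot: every fire at a single time step.
toy_snapshot = toy_fires[toy_fires["t"] == pd.Timestamp("2023-08-28 12:00")]

# Time series: each fire's rows, ordered by time.
toy_timeseries = {
    fid: grp.sort_values("t") for fid, grp in toy_fires.groupby("fireID")
}

print(toy_snapshot[["fireID", "farea"]])
print(toy_timeseries[1]["farea"].tolist())
```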

In [1]:
import os
import datetime
import pandas as pd
import geopandas as gpd

import FireTime, FireObj, FireConsts, FireGpkg_sfs, postprocess
from utils import timed

region = ["CONUS",]  # note you don't need the shape in here, just the name
tst = [2023, 8, 28, 'AM']
ted = [2023, 9, 6, 'AM']
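The `tst`/`ted` lists appear to encode a date plus an `'AM'`/`'PM'` half-day flag. The sketch below assumes `'AM'` means 00:00, `'PM'` means 12:00, and that the algorithm steps in half-day increments; `t_to_datetime` and `half_day_steps` are hypothetical helpers, and `FireTime.t2dt` remains the real conversion in this codebase.

```python
import datetime


def t_to_datetime(t):
    """Hypothetical: convert [year, month, day, 'AM'|'PM'] to a datetime.

    Assumes 'AM' -> 00:00 and 'PM' -> 12:00 (not taken from FireTime).
    """
    year, month, day, ampm = t
    hour = 0 if ampm == "AM" else 12
    return datetime.datetime(year, month, day, hour)


def half_day_steps(tst, ted):
    """Hypothetical: list every half-day step from tst to ted, inclusive."""
    step = datetime.timedelta(hours=12)
    current, end = t_to_datetime(tst), t_to_datetime(ted)
    steps = []
    while current <= end:
        steps.append(current)
        current += step
    return steps


steps = half_day_steps([2023, 8, 28, "AM"], [2023, 9, 6, "AM"])
print(len(steps), steps[0], steps[-1])
```

Under these assumptions the window above covers nine days, i.e. 19 half-day time steps counting both ends.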

## Read from disk

In [2]:
allpixels = postprocess.read_allpixels(ted, region)

2024-02-08 11:15:49,718 - FireLog - INFO - func:read_allpixels took: 50.33 ms


In [3]:
allfires_gdf = postprocess.read_allfires_gdf(ted, region)

2024-02-08 11:15:49,836 - FireLog - INFO - func:read_allfires_gdf took: 114.56 ms


## Write the large fires to disk


In [4]:
large_fires = postprocess.find_largefires(allfires_gdf)

2024-02-08 11:15:50,320 - FireLog - INFO - func:find_largefires took: 5.05 ms


First, we'll use the `allpixels` object to create the `nplist` layer.

In [5]:
%%time
for fid in large_fires:
    data = allpixels[allpixels["fid"] == fid]
    postprocess.save_fire_nplist(data, region, fid)

2024-02-08 11:15:51,336 - FireLog - INFO - func:save_fire_nplist took: 89.43 ms
2024-02-08 11:15:51,345 - FireLog - INFO - func:save_fire_nplist took: 8.58 ms
2024-02-08 11:15:51,372 - FireLog - INFO - func:save_fire_nplist took: 25.35 ms
2024-02-08 11:15:51,396 - FireLog - INFO - func:save_fire_nplist took: 23.43 ms
2024-02-08 11:15:51,411 - FireLog - INFO - func:save_fire_nplist took: 13.33 ms
2024-02-08 11:15:51,428 - FireLog - INFO - func:save_fire_nplist took: 15.06 ms
2024-02-08 11:15:51,440 - FireLog - INFO - func:save_fire_nplist took: 11.35 ms
2024-02-08 11:15:51,456 - FireLog - INFO - func:save_fire_nplist took: 14.26 ms
2024-02-08 11:15:51,468 - FireLog - INFO - func:save_fire_nplist took: 11.53 ms
2024-02-08 11:15:51,480 - FireLog - INFO - func:save_fire_nplist took: 10.89 ms
2024-02-08 11:15:51,492 - FireLog - INFO - func:save_fire_nplist took: 10.09 ms
2024-02-08 11:15:51,502 - FireLog - INFO - func:save_fire_nplist took: 9.68 ms
2024-02-08 11:15:51,513 - FireLog - INFO -

CPU times: user 947 ms, sys: 48.3 ms, total: 995 ms
Wall time: 993 ms


The rest of the layers will be created directly from `allfires_gdf`.

First, let's do a naive version without the merge fixups.

In [6]:
%%time
gdf = allfires_gdf.reset_index().copy()
for fid, data in gdf[gdf["fireID"].isin(large_fires)].groupby("fireID"):
    postprocess.save_fire_layers(data, region, fid)

2024-02-08 11:16:02,681 - FireLog - INFO - func:save_fire_layers took: 47.14 ms
2024-02-08 11:16:02,708 - FireLog - INFO - func:save_fire_layers took: 26.18 ms
2024-02-08 11:16:02,739 - FireLog - INFO - func:save_fire_layers took: 30.27 ms
2024-02-08 11:16:02,762 - FireLog - INFO - func:save_fire_layers took: 22.13 ms
2024-02-08 11:16:02,784 - FireLog - INFO - func:save_fire_layers took: 21.66 ms
2024-02-08 11:16:02,805 - FireLog - INFO - func:save_fire_layers took: 20.11 ms
2024-02-08 11:16:02,825 - FireLog - INFO - func:save_fire_layers took: 19.58 ms
2024-02-08 11:16:02,846 - FireLog - INFO - func:save_fire_layers took: 19.59 ms
2024-02-08 11:16:02,870 - FireLog - INFO - func:save_fire_layers took: 23.79 ms
2024-02-08 11:16:02,892 - FireLog - INFO - func:save_fire_layers took: 21.04 ms
2024-02-08 11:16:02,916 - FireLog - INFO - func:save_fire_layers took: 23.36 ms
2024-02-08 11:16:02,936 - FireLog - INFO - func:save_fire_layers took: 19.17 ms
2024-02-08 11:16:02,958 - FireLog - INFO

CPU times: user 1.24 s, sys: 37.5 ms, total: 1.28 s
Wall time: 1.27 s


Now let's do the merge as well.

In [8]:
%%time
postprocess.save_large_fires_layers(allfires_gdf, region, large_fires)

2024-02-08 11:16:23,361 - FireLog - INFO - func:save_fire_layers took: 32.18 ms
2024-02-08 11:16:23,374 - FireLog - INFO - func:merge_rows took: 12.28 ms
2024-02-08 11:16:23,395 - FireLog - INFO - func:save_fire_layers took: 20.44 ms
2024-02-08 11:16:23,425 - FireLog - INFO - func:save_fire_layers took: 28.62 ms
2024-02-08 11:16:23,447 - FireLog - INFO - func:save_fire_layers took: 20.81 ms
2024-02-08 11:16:23,463 - FireLog - INFO - func:merge_rows took: 14.82 ms
2024-02-08 11:16:23,484 - FireLog - INFO - func:save_fire_layers took: 20.26 ms
2024-02-08 11:16:23,493 - FireLog - INFO - func:merge_rows took: 8.65 ms
2024-02-08 11:16:23,514 - FireLog - INFO - func:save_fire_layers took: 20.05 ms


93 rows that potentially need a merge


2024-02-08 11:16:23,536 - FireLog - INFO - func:save_fire_layers took: 21.44 ms
2024-02-08 11:16:23,562 - FireLog - INFO - func:save_fire_layers took: 24.34 ms
2024-02-08 11:16:23,591 - FireLog - INFO - func:save_fire_layers took: 27.60 ms
2024-02-08 11:16:23,613 - FireLog - INFO - func:save_fire_layers took: 21.28 ms
2024-02-08 11:16:23,641 - FireLog - INFO - func:save_fire_layers took: 26.85 ms
2024-02-08 11:16:23,664 - FireLog - INFO - func:save_fire_layers took: 22.40 ms
2024-02-08 11:16:23,687 - FireLog - INFO - func:save_fire_layers took: 22.21 ms
2024-02-08 11:16:23,710 - FireLog - INFO - func:save_fire_layers took: 21.65 ms
2024-02-08 11:16:23,734 - FireLog - INFO - func:save_fire_layers took: 23.07 ms
2024-02-08 11:16:23,758 - FireLog - INFO - func:save_fire_layers took: 22.77 ms
2024-02-08 11:16:23,789 - FireLog - INFO - func:save_fire_layers took: 30.14 ms
2024-02-08 11:16:23,834 - FireLog - INFO - func:merge_rows took: 44.53 ms
2024-02-08 11:16:23,875 - FireLog - INFO - fun

CPU times: user 1.42 s, sys: 66.3 ms, total: 1.49 s
Wall time: 1.48 s


## Merge Experiments

These rows need some merge help:

In [7]:
merge_needed = (gdf.mergeid != gdf.fireID) & (gdf.invalid == False)
print(f"{merge_needed.sum()} rows that potentially need a merge")

# we'll set the "fireID" to "mergeid" in those spots
gdf.loc[merge_needed, "fireID"] = gdf.loc[merge_needed, "mergeid"]

93 rows that potentially need a merge
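A toy version of that relabelling shows why `merge_rows` is needed afterwards: once `fireID` is overwritten with `mergeid`, one fire can own several rows with the same `t`. The data is hypothetical, and it assumes `mergeid` holds the ID of the fire a row was merged into.

```python
import pandas as pd

# Hypothetical toy table: fire 2 was absorbed into fire 1 at 12:00,
# so that row's mergeid points at 1.
toy = pd.DataFrame(
    {
        "fireID": [1, 1, 2, 2],
        "mergeid": [1, 1, 2, 1],
        "invalid": [False, False, False, False],
        "t": pd.to_datetime(
            ["2023-08-28 00:00", "2023-08-28 12:00",
             "2023-08-28 00:00", "2023-08-28 12:00"]
        ),
    }
)

# Same selection as above: valid rows whose mergeid differs from fireID.
toy_merge_needed = (toy["mergeid"] != toy["fireID"]) & ~toy["invalid"]
toy.loc[toy_merge_needed, "fireID"] = toy.loc[toy_merge_needed, "mergeid"]

# Fire 1 now has two rows at 12:00; keep=False flags both of them.
toy_dupes = toy[toy.duplicated(subset=["fireID", "t"], keep=False)]
print(toy_merge_needed.sum())
print(toy_dupes)
```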


I have two different ideas for how to merge rows:

1) The first version of the `merge_rows` function uses a unary union to join the hulls, then recalculates the fline from the merged hull (the ftype is simply copied from the first row).
2) The second version of the `merge_rows` function uses code that is more similar to the existing merge function. It sums the count, area, and length columns, takes weighted means of `pixden` and `meanFRP`, and combines each geometry column into a single (possibly multi-part) geometry.
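One practical difference between the two ideas: if the fire area in the first version is derived from the merged hull (as a `FireObj.Fire` property presumably does), overlapping hulls are counted once, whereas summing per-row `farea` as the second version does counts any overlap twice. A small shapely sketch with made-up boxes:

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Two hypothetical hulls for the same fire at the same t, overlapping in a 1 x 1 square.
hulls = [box(0, 0, 2, 2), box(1, 1, 3, 3)]

union_area = unary_union(hulls).area      # overlap counted once
summed_area = sum(h.area for h in hulls)  # overlap counted twice
print(union_area, summed_area)  # 7.0 8.0

# Disjoint hulls union into a MultiPolygon, and the two areas then agree.
disjoint = [box(0, 0, 1, 1), box(5, 5, 6, 6)]
disjoint_union = unary_union(disjoint)
print(disjoint_union.geom_type, disjoint_union.area)  # MultiPolygon 2.0
```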

In [8]:
@timed
def merge_rows(data):
    """For a subset of allfires data containing only one fire, merge any
    rows that have the same `t`
    """
    from shapely.ops import unary_union

    fid = data["fireID"].iloc[0]  # every row in `data` belongs to one fire
    dd = FireGpkg_sfs.getdd("all")
    output = data.set_index("t").copy()

    # clean up any merges that are needed
    for dt, rows in data[data.t.duplicated(keep=False)].groupby("t"):
        f = FireObj.Fire(fid, FireTime.dt2t(dt), allpixels)
        f.t_st = FireTime.dt2t(rows["t_st"].min())
        f.hull = unary_union(rows["hull"].values)

        # this might be doing more work than it needs to
        f.updatefline()

        # ftype is unused in the output files
        f.ftype = rows.ftype.iloc[0]

        for k, tp in dd.items():
            if tp == "datetime64[ns]":
                output.loc[dt, k] = FireTime.t2dt(getattr(f, k))
            else:
                output.loc[dt, k] = getattr(f, k)

    # keep one row per t; drop_duplicates() ignores the index, so it could
    # keep rows that differ in columns outside dd or drop identical rows at different t
    output = output[~output.index.duplicated(keep="first")]

    for k, tp in dd.items():
        output[k] = output[k].astype(tp)

    return output.reset_index()

In [9]:
@timed
def merge_rows(data):
    """For a subset of allfires data containing only one fire, merge any
    rows that have the same `t`
    """
    from shapely.ops import unary_union

    output = data.drop_duplicates(subset=["t"]).set_index("t").copy()

    # clean up any merges that are needed
    for dt, rows in data[data.t.duplicated(keep=False)].groupby("t"):
        # first get the weighted sums for pixden and meanFRP
        pixweight = (rows["pixden"] * rows["farea"]).sum()
        FRPweight = (rows["meanFRP"] * rows["n_pixels"]).sum()

        for col in ["n_pixels", "n_newpixels", "farea", "fperim", "flinelen"]:
            output.loc[dt, col] = rows[col].sum()

        output.loc[dt, "t_st"] = rows["t_st"].min()
        output.loc[dt, "pixden"] = pixweight / output.loc[dt, "farea"]
        output.loc[dt, "meanFRP"] = FRPweight / output.loc[dt, "n_pixels"]

        # union each geometry column separately: rows.dissolve() only unions
        # the active geometry column and takes the first value of the others
        for col in ["hull", "fline", "nfp"]:
            geoms = rows[col].dropna().tolist()
            output.loc[dt, col] = unary_union(geoms) if geoms else None

    return output.reset_index()
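The weighting in the second version can be sanity-checked on made-up numbers. If `pixden` is pixels per unit area (an assumption about that column), the `farea`-weighted mean should equal total pixels over total area, and `meanFRP` should be the pixel-weighted average FRP.

```python
import pandas as pd

# Hypothetical rows for one fire that share the same t.
toy_rows = pd.DataFrame(
    {
        "n_pixels": [10, 30],
        "farea": [2.0, 6.0],
        "pixden": [5.0, 5.0],
        "meanFRP": [20.0, 40.0],
    }
)

n_pixels = toy_rows["n_pixels"].sum()
farea = toy_rows["farea"].sum()

# Same weights as the second merge_rows above:
# pixden weighted by farea, meanFRP weighted by n_pixels.
pixden = (toy_rows["pixden"] * toy_rows["farea"]).sum() / farea
meanFRP = (toy_rows["meanFRP"] * toy_rows["n_pixels"]).sum() / n_pixels

print(n_pixels, farea, pixden, meanFRP)  # 40 8.0 5.0 35.0
```

Here the merged `pixden` (5.0) matches 40 pixels over 8.0 area units, and the brighter, larger part pulls `meanFRP` above the simple average of 30.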

In [10]:
%%time
for fid, data in gdf[gdf["fireID"].isin(large_fires)].groupby("fireID"):

    # merge any rows that have the same t
    if data.t.duplicated().any():
        data = merge_rows(data)

    output_dir = f"out/{region[0]}/fires/{fid}"
    os.makedirs(output_dir, exist_ok=True)

    # each output layer takes its geometry from a different allfires column
    layer_geometry = {"perimeter": "hull", "fireline": "fline", "newfirepix": "nfp"}
    for layer, geom_col in layer_geometry.items():
        columns = list(FireGpkg_sfs.getdd(layer))
        subset = data[columns].copy()
        subset["geometry"] = data[geom_col]
        if layer == "fireline":
            # drop time steps that have no fire line geometry
            subset = subset.dropna(subset=["geometry"])
        subset = subset.set_geometry("geometry")

        subset.to_file(f"{output_dir}/{layer}.fgb", driver="FlatGeobuf")

2024-02-06 13:25:55,474 - FireLog - INFO - func:merge_rows took: 9.33 ms
2024-02-06 13:25:55,555 - FireLog - INFO - func:merge_rows took: 13.59 ms
2024-02-06 13:25:55,585 - FireLog - INFO - func:merge_rows took: 8.72 ms
2024-02-06 13:25:55,895 - FireLog - INFO - func:merge_rows took: 67.29 ms
2024-02-06 13:25:55,966 - FireLog - INFO - func:merge_rows took: 7.64 ms
2024-02-06 13:25:56,057 - FireLog - INFO - func:merge_rows took: 17.19 ms
2024-02-06 13:25:56,104 - FireLog - INFO - func:merge_rows took: 23.05 ms
2024-02-06 13:25:56,184 - FireLog - INFO - func:merge_rows took: 8.15 ms
2024-02-06 13:25:56,532 - FireLog - INFO - func:merge_rows took: 12.45 ms
2024-02-06 13:25:56,560 - FireLog - INFO - func:merge_rows took: 7.71 ms


CPU times: user 1.12 s, sys: 36.2 ms, total: 1.16 s
Wall time: 1.16 s


## Experiments

Does it make a big difference whether you filter to the large fires before grouping or after?

In [11]:
%%time
for fid, data in gdf[gdf["fireID"].isin(large_fires)].groupby("fireID"):
    f = fid

CPU times: user 22.6 ms, sys: 376 µs, total: 23 ms
Wall time: 21.8 ms


In [12]:
%%time
for fid, data in gdf.groupby("fireID"):
    if fid in large_fires:
        f = fid

CPU times: user 564 ms, sys: 13 ms, total: 577 ms
Wall time: 572 ms


In [13]:
%%time
for fid in large_fires:
    data = gdf[gdf["fireID"] == fid]
    f = fid

CPU times: user 28.7 ms, sys: 276 µs, total: 29 ms
Wall time: 27.7 ms


In [14]:
%%time
for fid in large_fires:
    data = allfires_gdf.loc[fid]
    f = fid

CPU times: user 20.2 ms, sys: 2.82 ms, total: 23 ms
Wall time: 22.4 ms
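The gap between filtering first (~22 ms) and grouping first (~572 ms) is consistent with grouping the whole table materializing every group, most of which are then thrown away. A self-contained sketch on synthetic data reproduces the comparison; the table, its size, and the function names are made up, and the exact timings will depend on how many fires and rows there are.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic table: many small fires, only a few of them "large".
rng = np.random.default_rng(0)
toy_gdf = pd.DataFrame({"fireID": rng.integers(0, 2_000, size=50_000)})
toy_gdf["farea"] = rng.random(len(toy_gdf))
toy_large_fires = toy_gdf["fireID"].drop_duplicates().head(5).tolist()


def filter_then_group():
    # Filter once, then build groups only for the surviving rows.
    subset = toy_gdf[toy_gdf["fireID"].isin(toy_large_fires)]
    return {fid: len(d) for fid, d in subset.groupby("fireID")}


def group_then_filter():
    # Build every group, then discard most of them.
    return {
        fid: len(d)
        for fid, d in toy_gdf.groupby("fireID")
        if fid in toy_large_fires
    }


def mask_per_fire():
    # One full boolean scan of the table per requested fire.
    return {
        fid: len(toy_gdf[toy_gdf["fireID"] == fid]) for fid in toy_large_fires
    }


# All three approaches select the same rows; only the cost differs.
assert filter_then_group() == group_then_filter() == mask_per_fire()
for fn in (filter_then_group, group_then_filter, mask_per_fire):
    print(fn.__name__, timeit.timeit(fn, number=3))
```

The per-fire mask scales with the number of large fires times the table length, so it stays cheap only while `large_fires` is short. Turning `large_fires` into a `set` would also make the `fid in large_fires` check constant-time, although the group construction dominates here.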
