# Experiments

This notebook contains some more experimental code.

## Merge Experiments

When creating the large fire outputs, clusters that were previously treated as separate fires can merge into one fire at a certain timestep. When that happens, the target fireID is stored in the `mergeid` column. So we need to post-process the rows and merge any that share the same fireID/mergeid and `t`.
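
To make the problem concrete, here is a minimal sketch on a made-up table. The column names match the ones used in this notebook, but the values are hypothetical and the frame is a plain `DataFrame` rather than the real `allfires_gdf`. It relabels absorbed rows with their target fire's ID and then lists the rows that now share a `(fireID, t)` pair.

```python
import pandas as pd

# Toy stand-in for allfires_gdf.reset_index(); all values are made up.
toy = pd.DataFrame({
    "fireID":  [1, 1, 2, 2, 3],
    "mergeid": [1, 1, 2, 1, 3],
    "t":       [0, 1, 0, 1, 0],
    "invalid": [False, False, False, False, False],
})

# Rows whose fire was absorbed into a different fire at this timestep.
merge_needed = (toy["mergeid"] != toy["fireID"]) & ~toy["invalid"]

# Relabel those rows with the ID of the fire they merged into.
toy.loc[merge_needed, "fireID"] = toy.loc[merge_needed, "mergeid"]

# After relabelling, the same (fireID, t) pair can appear more than once.
dupes = toy[toy.duplicated(subset=["fireID", "t"], keep=False)]
print(dupes)
```

The rows in `dupes` are the ones that need to be collapsed into a single row per fire and timestep.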

In [1]:
import datetime
import pandas as pd
import geopandas as gpd

from fireatlas import FireMain, FireTime, FireObj, FireConsts, postprocess, preprocess
from fireatlas.utils import timed

tst = [2023, 8, 28, 'AM']
ted = [2023, 9, 6, 'AM']
region = ('WesternUS',[-125.698046875,31.676476158707615,
                       -101.00078125,49.51429477264348])
list_of_ts = list(FireTime.t_generator(tst, ted))

In [3]:
allpixels = postprocess.read_allpixels(tst, ted, region, location="local")
allfires_gdf = postprocess.read_allfires_gdf(tst, ted, region, location="local")

2024-03-26 18:09:37,669 - FireLog - INFO - func:read_allpixels took: 45.98 ms
2024-03-26 18:09:37,744 - FireLog - INFO - func:read_allfires_gdf took: 74.35 ms


In [4]:
large_fires = postprocess.find_largefires(allfires_gdf)

2024-03-26 18:09:40,297 - FireLog - INFO - func:find_largefires took: 6.41 ms


We'll start by looking for the places where we need to perform this merge.

In [5]:
gdf = allfires_gdf.reset_index()

merge_needed = (gdf.mergeid != gdf.fireID) & (gdf.invalid == False)
print(f"{merge_needed.sum()} rows that potentially need a merge")

# we'll set the "fireID" to "mergeid" in those spots
gdf.loc[merge_needed, "fireID"] = gdf.loc[merge_needed, "mergeid"]

45 rows that potentially need a merge


Here are two different ideas of how to merge rows:

1) The first version of the `merge_rows` function uses a unary union to join the hull and then recalculates the fline and the ftype. This is inspired by Lisa's single-fire workflow.
2) The second version of the `merge_rows` function uses code that is more similar to the existing merge function. It constructs a MultiGeometry out of the various geometry objects.
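
The cumulative union behind the first idea can be sketched on its own with `shapely`. The three squares below are hypothetical per-timestep footprints, not real fire hulls. Each output geometry is the union of the current footprint and everything before it.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Hypothetical footprints for three consecutive timesteps (unit squares).
footprints = [box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)]

cumulative = []
running = None
for geom in footprints:
    # Union the new footprint into everything seen so far.
    running = geom if running is None else unary_union([running, geom])
    cumulative.append(running)

# The area can only grow or stay the same from one timestep to the next.
areas = [g.area for g in cumulative]
print(areas)
```

Unlike a helper that overwrites its input list, this version builds a new list and leaves `footprints` untouched.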

In [None]:
data_per_fid = {}

for fid, data in gdf[gdf["fireID"].isin(large_fires)].groupby("fireID"):
    # merge any rows that have the same t
    if data.t.duplicated().any():
        data_per_fid[fid] = merge_rows(data)

In [88]:
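def merge_rows(data):
    """Merge rows of a single fire that share the same `t` by dissolving hulls."""
    from shapely.ops import unary_union

    def cumunion(x):
        # make each geometry the union of itself and all earlier ones (in place)
        for i in range(1, len(x)):
            x[i] = unary_union([x[i-1], x[i]])
        return x

    # summarize by t. make sure you dissolve the hull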
    merged_t = data.set_geometry("hull").dissolve(by='t', aggfunc={
        'meanFRP': lambda x: (data.loc[x.index, 'meanFRP'] * data.loc[x.index, 'n_newpixels']).sum() / data.loc[x.index, 'n_newpixels'].sum(),
        'n_newpixels': 'sum',
        'duration': 'max'
    }).reset_index()

    # calculate cumulative sum of n_newpixels
    merged_t['n_pixels'] = merged_t['n_newpixels'].cumsum()

    # combine the geometries from previous days
    merged_t['hull'] = cumunion(merged_t['hull'].tolist())
    merged_t["fline"] = cumunion(data.set_geometry("fline").dissolve(by='t')["fline"].tolist())

    # do the rest of the calculations
    merged_t['farea'] = merged_t['hull'].area / 10**6  # convert to km^2
    merged_t['pixden'] = merged_t['n_pixels'] / merged_t['farea']

    # reorder columns
    col_order = ['t', 'duration', 'n_pixels', 'n_newpixels', 'meanFRP', 'pixden', 'farea', 'hull', 'fline']
    merged_t = merged_t[col_order]
    return merged_t

In [None]:
def merge_rows(data):
    """For a subset of allfires data containing only one fire, merge any
    rows that have the same `t`.
    """
    # start from one row per t; rows with a duplicated t are overwritten below
    output = data.drop_duplicates(subset=["t"]).set_index("t").copy()

    # clean up any merges that are needed
    for dt, rows in data[data.t.duplicated(keep=False)].groupby("t"):
        # first get the weighted sums for pixden and meanFRP
        pixweight = (rows["pixden"] * rows["farea"]).sum()
        FRPweight = (rows["meanFRP"] * rows["n_pixels"]).sum()
        
        for col in ["n_pixels", "n_newpixels", "farea", "fperim", "flinelen"]:
            output.loc[dt, col] = rows[col].sum()

        output.loc[dt, "t_st"] = rows["t_st"].min()
        output.loc[dt, "pixden"] = pixweight / output.loc[dt, "farea"]
        output.loc[dt, "meanFRP"] = FRPweight / output.loc[dt, "n_pixels"]

        dissolved = rows.dissolve()
        for col in ["hull", "fline", "nfp"]:
            output.loc[dt, col] = dissolved[col].item()
        
    return output.reset_index()
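
Both versions combine `meanFRP` as a pixel-weighted mean rather than a plain average. The sketch below checks that arithmetic on made-up numbers. It weights by `n_newpixels`, as the first version does, and uses a plain `DataFrame` with hypothetical values.

```python
import pandas as pd

# Made-up rows: two clusters at t=0 that need merging, one at t=1.
rows = pd.DataFrame({
    "t": [0, 0, 1],
    "meanFRP": [10.0, 20.0, 5.0],
    "n_newpixels": [1, 3, 2],
})

# Sum FRP * weight and the weights per timestep, then divide.
weighted = (
    rows.assign(frp_sum=rows["meanFRP"] * rows["n_newpixels"])
        .groupby("t")[["frp_sum", "n_newpixels"]]
        .sum()
)
weighted["meanFRP"] = weighted["frp_sum"] / weighted["n_newpixels"]
print(weighted["meanFRP"])
```

If a timestep has no new pixels at all, the weights sum to zero and the division yields `NaN`, so that case may need separate handling.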

## Groupby experiments

Does it make a big difference whether you filter down to the large fires before the groupby rather than inside the loop afterwards?
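
For anyone without the fire data at hand, the same comparison can be tried on synthetic data. This is only a sketch: the frame size, the ID range, and the `wanted` set are arbitrary assumptions, and timings on the real `gdf` may look quite different.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for gdf: 100,000 rows spread over 1,000 fire IDs.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "fireID": rng.integers(0, 1_000, size=100_000),
    "value": rng.random(100_000),
})
wanted = set(range(0, 1_000, 50))  # hypothetical "large fire" IDs


def filter_first(df, ids):
    # Keep only the wanted rows, then group.
    return [fid for fid, _ in df[df["fireID"].isin(ids)].groupby("fireID")]


def filter_after(df, ids):
    # Group everything, then skip unwanted groups inside the loop.
    return [fid for fid, _ in df.groupby("fireID") if fid in ids]


for fn in (filter_first, filter_after):
    start = time.perf_counter()
    result = fn(toy, wanted)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {elapsed:.4f} s, {len(result)} groups")
```

Both functions visit the same set of fires, so any difference in the printed times comes from how much data the groupby has to process.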

In [None]:
%%time
for fid, data in gdf[gdf["fireID"].isin(large_fires)].groupby("fireID"):
    f = fid

In [None]:
%%time
for fid, data in gdf.groupby("fireID"):
    if fid in large_fires:
        f = fid

In [None]:
%%time
for fid in large_fires:
    data = gdf[gdf["fireID"] == fid]
    f = fid

In [None]:
%%time
for fid in large_fires:
    data = allfires_gdf.loc[fid]
    f = fid