## Ingest

This notebook explains all the preprocessing steps that can run in parallel before you get to `Fire_Forward`. In practice, many of these steps happen automatically and the output lives on s3.

In [1]:
# If you haven't installed the fireatlas code yet, uncomment the following line and run this cell.

# !pip install -e .. -q

# After this runs, restart the notebook kernel.


In [2]:
from fireatlas import preprocess
from fireatlas import FireTime

tst = [2023, 8, 28, 'AM']
ted = [2023, 9, 6, 'AM']
region = ('WesternUS',[-125.698046875,31.676476158707615,
                       -101.00078125,49.51429477264348])
list_of_ts = list(FireTime.t_generator(tst, ted))
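To make the shape of `list_of_ts` concrete, here is a minimal sketch of a half-day step generator. It is a stand-in for illustration, not the `FireTime.t_generator` implementation: the name `half_day_steps` is hypothetical, and the sketch assumes that both the start and end steps are included.

```python
from datetime import date, timedelta


def half_day_steps(start, end):
    """Yield [year, month, day, 'AM' | 'PM'] steps from start to end.

    Hypothetical stand-in for FireTime.t_generator.
    Assumption: both endpoints are included.
    """
    day = date(*start[:3])
    half = start[3]
    last = (date(*end[:3]), end[3])
    # 'AM' sorts before 'PM', so tuple comparison orders the steps correctly.
    while (day, half) <= last:
        yield [day.year, day.month, day.day, half]
        if half == "AM":
            half = "PM"
        else:
            half = "AM"
            day += timedelta(days=1)


steps = list(half_day_steps([2023, 8, 28, "AM"], [2023, 9, 6, "AM"]))
daily = steps[::2]  # every other step: one entry per calendar day
```

Under that assumption the range has 19 half-day steps, and because it starts on an `AM` step, `steps[::2]` keeps only the ten `AM` entries, one per calendar day.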

## Once per region

Preprocess the region to remove static flare sources. Save the resulting "swiss cheese" shape to a geojson file for later use.

In [3]:
preprocess.preprocess_region(region, force=True)

2024-03-28 08:34:48,936 - fireatlas.FireLog - INFO - func:preprocess_region took: 51.88 sec


'data/FEDSpreprocessed/WesternUS/WesternUS.json'

## Once per input file

Next, process each NRT file into half-day files. First we'll get all the times that are of interest. This could also be done by looking at all the files that exist and seeing which have not been preprocessed yet.

In [4]:
%%time
for sat in ["SNPP", "NOAA20"]:
    for t in list_of_ts[::2]:
        preprocess.preprocess_NRT_file(t, sat)

2024-03-28 08:34:50,150 - fireatlas.FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023240.txt
2024-03-28 08:34:51,725 - fireatlas.FireLog - INFO - func:preprocess_input_file took: 1.58 sec
2024-03-28 08:34:51,743 - fireatlas.FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023241.txt
2024-03-28 08:34:53,277 - fireatlas.FireLog - INFO - func:preprocess_input_file took: 1.53 sec
2024-03-28 08:34:53,292 - fireatlas.FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023242.txt
2024-03-28 08:34:54,602 - fireatlas.FireLog - INFO - func:preprocess_input_file took: 1.31 sec
2024-03-28 08:34:54,619 - fireatlas.FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023243.txt
2024-03-28 08:34:55,781 - fireatlas.FireLog - INFO - func:preprocess_input_file took: 1.16 sec
2024-03-28 08:34:55,798 - fireatlas.FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023244.txt
2024-03-28 08:34:57,284 - fireatlas

CPU times: user 16.2 s, sys: 626 ms, total: 16.9 s
Wall time: 30 s
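The alternative mentioned above, inspecting which input files have not been preprocessed yet, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names, the directory layout, and especially the `_AM`/`_PM` output naming rule are hypothetical and do not reflect how fireatlas actually names its files.

```python
from pathlib import Path


def expected_outputs(raw_file, out_dir):
    """Hypothetical naming rule: one AM and one PM output per daily input file."""
    stem = Path(raw_file).stem
    return [Path(out_dir) / f"{stem}_{half}.txt" for half in ("AM", "PM")]


def find_unprocessed(raw_dir, out_dir):
    """Return the daily input files that are missing at least one half-day output."""
    return sorted(
        raw_file
        for raw_file in Path(raw_dir).glob("*.txt")
        if not all(out.exists() for out in expected_outputs(raw_file, out_dir))
    )
```

Looping over the result of `find_unprocessed` instead of over a fixed list of times would skip inputs that already have both half-day outputs.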


## Once per region and t

Do initial filtering and clustering using the preprocessed region and the half day files. For this notebook we will read all the data from local storage. In practice some or all of it will likely be available on s3.

Note: for the purpose of timing I am running this in a for loop, but each of these steps could run in a separate process, fully in parallel.

In [5]:
%%time
for t in list_of_ts:
    preprocess.preprocess_region_t(t, region=region, read_location="local", force=True)

2024-03-28 08:35:19,185 - fireatlas.FireLog - INFO - func:read_region took: 256.38 ms
2024-03-28 08:35:19,186 - fireatlas.FireLog - INFO - filtering and clustering 2023-8-28 AM, VIIRS, WesternUS
2024-03-28 08:35:19,267 - fireatlas.FireLog - INFO - func:read_preprocessed_input took: 81.11 ms
2024-03-28 08:35:19,342 - fireatlas.FireLog - INFO - func:read_preprocessed_input took: 73.67 ms
2024-03-28 08:36:30,064 - fireatlas.FireLog - INFO - func:do_clustering took: 34.24 ms
2024-03-28 08:36:30,122 - fireatlas.FireLog - INFO - func:preprocess_region_t took: 1.19 min
2024-03-28 08:36:30,340 - fireatlas.FireLog - INFO - func:read_region took: 217.18 ms
2024-03-28 08:36:30,341 - fireatlas.FireLog - INFO - filtering and clustering 2023-8-28 PM, VIIRS, WesternUS
2024-03-28 08:36:30,413 - fireatlas.FireLog - INFO - func:read_preprocessed_input took: 71.06 ms
2024-03-28 08:36:30,486 - fireatlas.FireLog - INFO - func:read_preprocessed_input took: 72.15 ms
2024-03-28 08:36:58,294 - fireatlas.FireLo

CPU times: user 5min 34s, sys: 698 ms, total: 5min 34s
Wall time: 5min 56s
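Since each time step is independent, the loop above could be replaced by a process pool. The following is a minimal sketch using only the standard library; the helper name `run_steps` and its parameters are hypothetical, and this is not how fireatlas schedules the work in production.

```python
from concurrent.futures import ProcessPoolExecutor


def run_steps(worker, steps, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Call worker(step) for every step concurrently; return results in input order.

    With ProcessPoolExecutor, worker must be picklable: a module-level
    function, or a functools.partial wrapping one.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises the first worker exception.
        return list(pool.map(worker, steps))
```

With the call from this notebook, the worker would be `functools.partial(preprocess.preprocess_region_t, region=region, read_location="local", force=True)` and the steps would be `list_of_ts`. The region file and the half-day files must already exist before these steps start, and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard when run as a script.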
