## Ingest

This notebook explains all the preprocessing steps that can happen in parallel before you even get to `Fire_Forward`. In practice many of these steps happen automatically and the output lives on S3.

In [1]:
#!pip install -e ..

In [2]:
from fireatlas import preprocess
from fireatlas import FireTime

tst = [2023, 8, 28, 'AM']
ted = [2023, 9, 6, 'AM']
region = ('WesternUS',[-125.698046875,31.676476158707615,
                       -101.00078125,49.51429477264348])
list_of_ts = list(FireTime.t_generator(tst, ted))
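To make the shape of `list_of_ts` concrete, here is a minimal sketch of a half-day step generator. It is not the `FireTime.t_generator` implementation; `half_day_steps` is a hypothetical stand-in, and it assumes that both endpoints are included and that each step is a `[year, month, day, ampm]` list like `tst` and `ted`.

```python
from datetime import date, timedelta

def half_day_steps(start, end):
    """Yield [year, month, day, ampm] steps from start to end.

    Assumption: both endpoints are included. This is an illustrative
    stand-in, not the fireatlas implementation.
    """
    year, month, day, ampm = start
    current = date(year, month, day)
    end_key = (date(*end[:3]), end[3])
    # "AM" sorts before "PM", so tuple comparison orders the steps correctly.
    while (current, ampm) <= end_key:
        yield [current.year, current.month, current.day, ampm]
        if ampm == "AM":
            ampm = "PM"
        else:
            ampm = "AM"
            current += timedelta(days=1)

steps = list(half_day_steps([2023, 8, 28, "AM"], [2023, 9, 6, "AM"]))
print(len(steps))
print(steps[:3])
```

Under these assumptions the range above spans nine full days plus one final AM step.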

## Once per region

Preprocess the region to get rid of static flare sources. Save the resulting "swiss cheese" shape to a GeoJSON file for later use.

In [2]:
preprocess.preprocess_region(region, force=True)

2024-03-26 17:44:44,226 - FireLog - INFO - func:preprocess_region took: 43.18 sec


'data/FEDSpreprocessed/WesternUS/WesternUS.json'
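To illustrate what a "swiss cheese" shape looks like in GeoJSON, here is a rough sketch that cuts small square holes out of a bounding box. It does not reproduce what `preprocess_region` does internally: the helper names, the hole size, and the flare coordinates are all made up for illustration, and overlapping holes are not handled.

```python
import json

def square_ring(lon, lat, half_width, clockwise):
    """Closed square ring around (lon, lat).

    GeoJSON (RFC 7946) winds exterior rings counterclockwise and holes clockwise.
    """
    w = half_width
    ring = [
        [lon - w, lat - w],
        [lon + w, lat - w],
        [lon + w, lat + w],
        [lon - w, lat + w],
        [lon - w, lat - w],
    ]
    return ring[::-1] if clockwise else ring

def region_with_holes(bbox, flare_points, half_width=0.01):
    """Build a GeoJSON Polygon for bbox with one square hole per flare point."""
    west, south, east, north = bbox
    exterior = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    holes = [square_ring(lon, lat, half_width, clockwise=True)
             for lon, lat in flare_points]
    return {"type": "Polygon", "coordinates": [exterior] + holes}

# Hypothetical flare locations, for illustration only.
flares = [(-103.5, 32.0), (-119.2, 35.4)]
shape = region_with_holes([-125.7, 31.7, -101.0, 49.5], flares)
print(json.dumps(shape)[:80])
```

Any fire pixel that falls inside one of the holes is then outside the region polygon, so a simple point-in-polygon filter would drop it.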

## Once per input file

Next, process each NRT file into half-day files. First we'll get all the times that are of interest. This could also be done by looking at all the files that exist and seeing which have not been preprocessed yet.

In [3]:
%%time
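for sat in ["SNPP", "NOAA20"]:
    # list_of_ts alternates AM/PM; [::2] keeps one entry per day,
    # matching the one-file-per-day NRT inputs seen in the log below
    for t in list_of_ts[::2]:
        preprocess.preprocess_NRT_file(t, sat)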

2024-03-26 17:44:44,438 - FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023240.txt
2024-03-26 17:44:46,009 - FireLog - INFO - func:preprocess_input_file took: 1.57 sec
2024-03-26 17:44:46,025 - FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023241.txt
2024-03-26 17:44:47,602 - FireLog - INFO - func:preprocess_input_file took: 1.58 sec
2024-03-26 17:44:47,614 - FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023242.txt
2024-03-26 17:44:49,022 - FireLog - INFO - func:preprocess_input_file took: 1.41 sec
2024-03-26 17:44:49,036 - FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023243.txt
2024-03-26 17:44:50,382 - FireLog - INFO - func:preprocess_input_file took: 1.35 sec
2024-03-26 17:44:50,396 - FireLog - INFO - preprocessing SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023244.txt
2024-03-26 17:44:51,943 - FireLog - INFO - func:preprocess_input_file took: 1.55 sec
2024-03-26 17:44:51,959 - FireLog - INFO

CPU times: user 17.7 s, sys: 536 ms, total: 18.2 s
Wall time: 30.6 s
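The alternative mentioned above, checking which input files have not been preprocessed yet, could be sketched roughly as follows. The SNPP file name pattern is taken from the log output (year followed by a three-digit day of year); the helper names and the idea of tracking finished files in a set are assumptions made for illustration, not part of `fireatlas`.

```python
from datetime import date

def snpp_nrt_filename(t):
    """Build the SNPP NRT file name for a [year, month, day, ampm] step."""
    year, month, day = t[:3]
    day_of_year = date(year, month, day).timetuple().tm_yday
    return f"SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_{year}{day_of_year:03d}.txt"

def pending_files(time_steps, already_done):
    """Return file names for time_steps that are not in already_done.

    Order follows time_steps, and each file name appears once even
    though AM and PM steps map to the same daily file.
    """
    pending = []
    for t in time_steps:
        name = snpp_nrt_filename(t)
        if name not in already_done and name not in pending:
            pending.append(name)
    return pending

example_steps = [[2023, 8, 28, "AM"], [2023, 8, 28, "PM"], [2023, 8, 29, "AM"]]
done = {"SUOMI_VIIRS_C2_Global_VNP14IMGTDL_NRT_2023240.txt"}
print(pending_files(example_steps, done))
```

In a real run the `done` set would come from listing whatever output location holds the preprocessed files.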


## Once per region and t

Do initial filtering and clustering using the preprocessed region and the half-day files. For this notebook we will read all the data from local storage. In practice some or all of it will likely be available on S3.

Note: for the purpose of timing I am running this in a for loop, but each of these steps could run in a separate process, fully in parallel.

In [4]:
%%time
for t in list_of_ts:
    preprocess.preprocess_region_t(t, sensor="VIIRS", region=region, read_location="local", force=True)

2024-03-26 17:45:15,067 - FireLog - INFO - func:read_region took: 209.79 ms
2024-03-26 17:45:15,068 - FireLog - INFO - filtering and clustering 2023-8-28 AM, VIIRS, WesternUS
2024-03-26 17:45:15,130 - FireLog - INFO - func:read_preprocessed_input took: 61.76 ms
2024-03-26 17:45:15,189 - FireLog - INFO - func:read_preprocessed_input took: 58.41 ms
2024-03-26 17:46:08,501 - FireLog - INFO - func:do_clustering took: 411.77 ms
2024-03-26 17:46:08,562 - FireLog - INFO - func:preprocess_region_t took: 53.70 sec
2024-03-26 17:46:08,747 - FireLog - INFO - func:read_region took: 184.20 ms
2024-03-26 17:46:08,748 - FireLog - INFO - filtering and clustering 2023-8-28 PM, VIIRS, WesternUS
2024-03-26 17:46:08,843 - FireLog - INFO - func:read_preprocessed_input took: 94.73 ms
2024-03-26 17:46:08,938 - FireLog - INFO - func:read_preprocessed_input took: 94.19 ms
2024-03-26 17:46:30,698 - FireLog - INFO - func:do_clustering took: 18.09 ms
2024-03-26 17:46:30,745 - FireLog - INFO - func:preprocess_regi

CPU times: user 4min 23s, sys: 540 ms, total: 4min 23s
Wall time: 4min 35s
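As the note above says, the time steps are independent, so the loop could be replaced by a pool of worker processes. Here is a minimal sketch using the standard library's `concurrent.futures`. `describe_step` is a hypothetical stand-in for the real per-step work, and `run_steps` is an illustrative helper, not part of `fireatlas`; whether `preprocess.preprocess_region_t` can be passed to it unchanged is an assumption I have not verified.

```python
from concurrent.futures import ProcessPoolExecutor

def describe_step(t):
    """Stand-in for the real per-step work; just formats the time step."""
    year, month, day, ampm = t
    return f"{year}-{month}-{day} {ampm}"

def run_steps(func, steps, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Apply func to every step in a worker pool; results keep input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, steps))

if __name__ == "__main__":
    # The guard keeps worker processes from re-running the demo on import.
    demo_steps = [[2023, 8, 28, "AM"], [2023, 8, 28, "PM"], [2023, 8, 29, "AM"]]
    for label in run_steps(describe_step, demo_steps):
        print(label)
```

The function handed to a process pool has to be picklable, which in practice means defined at module level. Inside a notebook this can depend on the platform's process start method, so importing the function from a module is the safer choice.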
