**Name**:
Build_ML_train df

**Description**:  
In this notebook, we will build the dataset for training the ML model.

**Date created**:  
`2024-12-26`

**Author**:  
Asaf Vanunu

---

- First, we will load our libraries and the DataFrames.

In [1]:
import os
import pandas as pd
import geopandas as gpd
import numpy as np
import ML_toolbox as MLT ## Custom module
from rasterio.plot import show
import itertools

In [2]:
main_path = os.getcwd() ## get the current working directory
temporal_df_path = "temporal_df"
for file in os.listdir(main_path): ## loop through the files in the current working directory
    if file.endswith(".csv"): ## check if the file is a csv file
        NC_VIIRS_df = pd.read_csv(file) ## read the csv file
temporal_df_csv_list = [os.path.join(temporal_df_path, file) for file in os.listdir(temporal_df_path) if file.endswith(".csv")]
for file in temporal_df_csv_list: ## loop through the files in the temporal_df folder
    if file.endswith(".csv"): ## check if the file is a csv file
        file_name = os.path.basename(file) ## get the name of the file
        if file_name.startswith("no_4"): ## check if the file is the no_4 csv file
            no_4_df = pd.read_csv(file) ## read the no_4 csv file
        elif file_name.startswith("temporal"): ## check if the file is the temporal csv file
            temporal_df = pd.read_csv(file)## read the temporal csv file

- So now we have 3 dfs:
- NetCDF df with matching VIIRS files
- Temporal df with matching GOES times
- df of GOES files that do not have 4 temporal files

- Now we will filter the files that do not have 4 temporal files out of our df.

In [3]:
NC_VIIRS_df = NC_VIIRS_df[~np.isin(NC_VIIRS_df["GOES_date_time"], no_4_df["No_4_files"])].reset_index(drop=True)
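The filter above keeps only the rows whose timestamp is not in the "incomplete" list. A minimal sketch of the same idea on toy data (the two small DataFrames and their values are made up for illustration; only the column names come from the notebook):

```python
import pandas as pd

# Toy stand-ins for NC_VIIRS_df and no_4_df (values are illustrative only)
scenes = pd.DataFrame(
    {"GOES_date_time": ["2022-01-01 07:31", "2022-01-01 09:11", "2022-01-01 18:46"]}
)
incomplete = pd.DataFrame({"No_4_files": ["2022-01-01 09:11"]})

# True where the scene timestamp appears in the incomplete list
is_incomplete = scenes["GOES_date_time"].isin(incomplete["No_4_files"])

# Negate the mask to keep only complete scenes, then renumber the rows
kept = scenes[~is_incomplete].reset_index(drop=True)
print(kept["GOES_date_time"].tolist())
```

`Series.isin` is used here in place of `np.isin`; for plain string columns both give the same boolean mask.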

In [4]:
NC_VIIRS_df[:5]

Unnamed: 0,GOES_file_name,GOES_date_time,MCMI,FDC,ACM,VIIRS_file,VIIRS_file_full_path
0,s202201010731.nc,2022-01-01 07:31,F:\ML_project\GOES_16\MCMI\OR_ABI-L2-MCMIPC-M6...,F:\ML_project\GOES_16\FDC\OR_ABI-L2-FDCC-M6_G1...,F:\ML_project\GOES_16\ACM\OR_ABI-L2-ACMC-M6_G1...,VNP14IMG.A2022001.0724.002.2024075110909.nc,F:\ML_project\east_us\VIIRS\VIIRS_fire\VNP14IM...
1,s202201010911.nc,2022-01-01 09:11,F:\ML_project\GOES_16\MCMI\OR_ABI-L2-MCMIPC-M6...,F:\ML_project\GOES_16\FDC\OR_ABI-L2-FDCC-M6_G1...,F:\ML_project\GOES_16\ACM\OR_ABI-L2-ACMC-M6_G1...,VNP14IMG.A2022001.0906.002.2024075110907.nc,F:\ML_project\east_us\VIIRS\VIIRS_fire\VNP14IM...
2,s202201011846.nc,2022-01-01 18:46,F:\ML_project\GOES_16\MCMI\OR_ABI-L2-MCMIPC-M6...,F:\ML_project\GOES_16\FDC\OR_ABI-L2-FDCC-M6_G1...,F:\ML_project\GOES_16\ACM\OR_ABI-L2-ACMC-M6_G1...,VNP14IMG.A2022001.1842.002.2024075110906.nc,F:\ML_project\east_us\VIIRS\VIIRS_fire\VNP14IM...
3,s202201012021.nc,2022-01-01 20:21,F:\ML_project\GOES_16\MCMI\OR_ABI-L2-MCMIPC-M6...,F:\ML_project\GOES_16\FDC\OR_ABI-L2-FDCC-M6_G1...,F:\ML_project\GOES_16\ACM\OR_ABI-L2-ACMC-M6_G1...,VNP14IMG.A2022001.2018.002.2024075110907.nc,F:\ML_project\east_us\VIIRS\VIIRS_fire\VNP14IM...
4,s202201012031.nc,2022-01-01 20:31,F:\ML_project\GOES_16\MCMI\OR_ABI-L2-MCMIPC-M6...,F:\ML_project\GOES_16\FDC\OR_ABI-L2-FDCC-M6_G1...,F:\ML_project\GOES_16\ACM\OR_ABI-L2-ACMC-M6_G1...,VNP14IMG.A2022001.2024.002.2024075110907.nc,F:\ML_project\east_us\VIIRS\VIIRS_fire\VNP14IM...


- Now we can load our VIIRS files

In [5]:
ML_project_path = "F:\\ML_project"
AOI_list = ["mexico", "east_us"] ## list of AOIs
for dir in os.listdir(ML_project_path): ## loop through the directories in the ML_project folder
    if dir in AOI_list: ## check if the directory is in the AOI_list
        print(f"Now working in {dir} directory")
        AOI_path = os.path.join(ML_project_path, dir) ## get the path of the AOI directory
        for sub_dir in os.listdir(AOI_path): ## loop through the directories in the AOI directory
            if sub_dir == "VIIRS": ## check if the directory is the VIIRS directory
                print(f"Now working in {sub_dir} directory")
                VIIRS_path = os.path.join(AOI_path, sub_dir)
                for sub_VIIRS_dir in os.listdir(VIIRS_path):## loop through the directories in the VIIRS directory
                    if sub_VIIRS_dir == "VIIRS_points": ## check if the directory is the VIIRS_points directory
                        print(f"Now working in {sub_VIIRS_dir} directory")
                        VIIRS_points_path = os.path.join(VIIRS_path, sub_VIIRS_dir) ## get the path of the VIIRS_points directory
                        for file in os.listdir(VIIRS_points_path): ## loop through the files in the VIIRS_points directory
                            if file.endswith(".shp"):
                                print(f"Now working in {file} file")
                                VIIRS_file_path = os.path.join(VIIRS_points_path, file)
                                if dir == "mexico":
                                    mexico_VIIRS_gdf = gpd.read_file(VIIRS_file_path)
                                elif dir == "east_us":
                                    east_us_VIIRS_gdf = gpd.read_file(VIIRS_file_path)
    


Now working in east_us directory
Now working in VIIRS directory
Now working in VIIRS_points directory
Now working in VIIRS_points_east_US.shp file
Now working in mexico directory
Now working in VIIRS directory
Now working in VIIRS_points directory
Now working in VIIRS_points_mexico.shp file
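The nested directory walk above could probably be flattened with `pathlib`. The sketch below assumes the same `<AOI>/VIIRS/VIIRS_points/*.shp` layout; `find_viirs_shapefiles` is a hypothetical helper, not part of `ML_toolbox`, and the temporary folder only stands in for the real project drive.

```python
import tempfile
from pathlib import Path

def find_viirs_shapefiles(project_root, aoi_names):
    """Return {aoi: [shapefile paths]} for each AOI that has VIIRS point files."""
    root = Path(project_root)
    found = {}
    for aoi in aoi_names:
        matches = sorted((root / aoi / "VIIRS" / "VIIRS_points").glob("*.shp"))
        if matches:
            found[aoi] = matches
    return found

# Build a throwaway directory tree that mimics the assumed layout
with tempfile.TemporaryDirectory() as tmp:
    for aoi in ("mexico", "east_us"):
        folder = Path(tmp) / aoi / "VIIRS" / "VIIRS_points"
        folder.mkdir(parents=True)
        (folder / f"VIIRS_points_{aoi}.shp").touch()

    found = find_viirs_shapefiles(tmp, ["mexico", "east_us"])
    names = {aoi: [p.name for p in paths] for aoi, paths in found.items()}

print(names)
```

Each path in the returned dictionary could then be passed to `gpd.read_file`, which avoids keeping one variable per AOI.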


- Now we will concat the VIIRS files into one gdf

In [6]:
VIIRS = pd.concat([mexico_VIIRS_gdf, east_us_VIIRS_gdf], ignore_index=True)
VIIRS[:5]

Unnamed: 0,latitude,longitude,Fire_file,Unique_dat,Unique_tim,DATE,TIME,DATE_TIME,FRP,FP_MAD_DT,...,FP_T4,FP_T5,FP_con,FP_con_str,scan_line,grid_sampl,View_Zenit,night/day,region,geometry
0,28.68441,-106.08828,VNP14IMG.A2022001.0906.002.2024075110907.nc,A2022001,912,2022-01-01,09:12,2022-01-01 09:12,1.660624,-0.422919,...,307.34314,283.351349,8,n,2772,4281,33.149998,Night,mexico,POINT (-106.08828 28.68441)
1,23.640375,-103.911507,VNP14IMG.A2022001.2018.002.2024075110907.nc,A2022001,2024,2022-01-01,20:24,2022-01-01 20:24,10.240311,0.511257,...,341.236176,303.78421,8,n,5708,2173,31.449999,Day,mexico,POINT (-103.91151 23.64038)
2,23.96534,-104.134377,VNP14IMG.A2022001.2018.002.2024075110907.nc,A2022001,2024,2022-01-01,20:24,2022-01-01 20:24,13.236159,0.489004,...,336.16922,302.682037,8,n,5812,2203,30.51,Day,mexico,POINT (-104.13438 23.96534)
3,24.011993,-104.437927,VNP14IMG.A2022001.2018.002.2024075110907.nc,A2022001,2024,2022-01-01,20:24,2022-01-01 20:24,3.556983,0.610675,...,334.978088,298.806671,8,n,5838,2261,28.699999,Day,mexico,POINT (-104.43793 24.01199)
4,24.008282,-104.43737,VNP14IMG.A2022001.2018.002.2024075110907.nc,A2022001,2024,2022-01-01,20:24,2022-01-01 20:24,15.063047,0.701251,...,339.781769,304.147705,8,n,5837,2261,28.699999,Day,mexico,POINT (-104.43737 24.00828)


* Now we can create a fire index column for VIIRS

In [7]:
## Create a new column called "Fire_index" in the VIIRS dataframe
# this is the normalized difference between the brightness temperature of the two channels I4 and I5
# I4 is 3.74 um and I5 is 11.45 um
VIIRS["Fire_index"] = (VIIRS["FP_T4"] - VIIRS["FP_T5"])/(VIIRS["FP_T4"] + VIIRS["FP_T5"])
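As a quick check of the formula, here is a small sketch of the normalized difference on plain arrays. The helper name `normalized_difference` and the temperatures are made up for illustration.

```python
import numpy as np

def normalized_difference(t_mid, t_long):
    """(t_mid - t_long) / (t_mid + t_long), computed element-wise."""
    t_mid = np.asarray(t_mid, dtype=float)
    t_long = np.asarray(t_long, dtype=float)
    return (t_mid - t_long) / (t_mid + t_long)

# A pixel that is hotter in the mid-infrared channel, and one with no contrast
fi = normalized_difference([330.0, 300.0], [300.0, 300.0])
print(fi)
```

Equal brightness temperatures give an index of 0, and the index grows as the mid-infrared channel gets hotter than the thermal channel.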

* Now we will crop GOES bands to matching VIIRS and create a fire index

In [8]:
B7 = MLT.crop_GOES_using_VIIRS(GOES_path=NC_VIIRS_df["MCMI"].iloc[0], GOES_band=7, VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0])
B14 = MLT.crop_GOES_using_VIIRS(GOES_path=NC_VIIRS_df["MCMI"].iloc[0], GOES_band=14, VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0])

In [9]:
## open cloud mask file
cloud_mask = MLT.crop_GOES_using_VIIRS(GOES_path=NC_VIIRS_df["ACM"].iloc[0], GOES_band="ACM", VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0])
cloud_mask_values = cloud_mask.values[0]

In [10]:
## all MCMI bands
all_bands = MLT.crop_GOES_using_VIIRS(GOES_path=NC_VIIRS_df["MCMI"].iloc[0], GOES_band="all", VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0]) ## named all_bands so the built-in all() is not shadowed

In [11]:
FI = (B7.values[0] - B14.values[0])/(B7.values[0] + B14.values[0])

In [12]:
B7.values[0].shape

(1095, 1744)

* We can also open the cloud mask

* Now we will filter the VIIRS gdf so it will match the GOES image we are going to use

In [13]:
VIIRS_file = NC_VIIRS_df["VIIRS_file"].iloc[0]
VIIRS_filter = VIIRS[VIIRS["Fire_file"] == VIIRS_file]

* Now we will use the rasterize function to create an array for the VIIRS gdf

In [14]:
rasterize_VIIRS = MLT.rasterize_VIIRS(cropped_GOES_image=B7, filter_VIIRS_gdf=VIIRS_filter,
                                      rasterize_type="count", number_of_VIIRS_points=1, VIIRS_band = None)
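The internals of `MLT.rasterize_VIIRS` are not shown in this notebook. As a rough sketch of what a "count" rasterization does, the example below assumes the VIIRS points have already been converted to row/column indices on the GOES grid, and counts how many points fall in each cell. The helper `count_points_per_cell` and the toy indices are hypothetical.

```python
import numpy as np

def count_points_per_cell(rows, cols, shape):
    """Count how many points land in each (row, col) cell of a grid."""
    counts = np.zeros(shape, dtype=np.int32)
    # np.add.at accumulates correctly even when the same cell appears twice
    np.add.at(counts, (np.asarray(rows), np.asarray(cols)), 1)
    return counts

# Three points: two share cell (0, 1), one falls in cell (2, 3)
counts = count_points_per_cell(rows=[0, 0, 2], cols=[1, 1, 3], shape=(3, 4))

# Cells with at least one point (assumed reading of a threshold of 1)
fire_mask = counts >= 1
print(counts)
```

Plain fancy-index assignment (`counts[rows, cols] += 1`) would count a repeated cell only once, which is why `np.add.at` is used here.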

* Now we will use the rasterize VIIRS to get the actual locations of the GOES fire pixels

In [15]:
GOES_fp_pixel_list = MLT.get_GOES_actual_fire_pixel_locations(GOES_Fire_Index_array=FI, rasterize_VIIRS=rasterize_VIIRS)

In [16]:
len(GOES_fp_pixel_list)

35

* Here we will create a list of pixel locations that are fires or fire neighbors. That means we can't use them for labeling non fire pixels
* We will get the locations of: VIIRS fire pixels and neighbors, GOES fire pixels and neighbors, and nan values
* We also make sure that the list doesn't have duplicates

In [16]:
VIIRS_to_kill = MLT.VIIRS_locations_to_kill(rasterize_VIIRS=rasterize_VIIRS)
GOES_to_kill = MLT.GOES_locations_to_kill(GOES_fp_list=GOES_fp_pixel_list,GOES_Fire_Index_array=FI)
nan_kill_list = MLT.nan_locations_to_kill(GOES_Fire_Index_array=FI)
kill_list = GOES_to_kill + VIIRS_to_kill + nan_kill_list ## combine the GOES_to_kill, VIIRS_to_kill and nan_loc_list
corrected_kill = [loc for loc, _ in itertools.groupby(sorted(kill_list))] ## remove duplicates (sort first: groupby only merges adjacent equal items)
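A note on de-duplication: `itertools.groupby` only merges equal items that sit next to each other, so an unsorted list can keep its duplicates. The sketch below compares three approaches on made-up pixel locations.

```python
import itertools

# Toy (row, col) locations with non-adjacent duplicates
locations = [(2, 3), (5, 1), (2, 3), (5, 1), (0, 0)]

# groupby alone: nothing is adjacent, so nothing is removed
consecutive_only = [loc for loc, _ in itertools.groupby(locations)]

# Sorting first puts equal items side by side, so groupby removes them
sorted_then_grouped = [loc for loc, _ in itertools.groupby(sorted(locations))]

# dict.fromkeys keeps first-seen order; items must be hashable (tuples, not lists)
order_preserving = list(dict.fromkeys(locations))

print(len(consecutive_only), sorted_then_grouped, order_preserving)
```

If the locations are stored as lists rather than tuples, the sorted `groupby` version still works, while the `dict.fromkeys` version needs them converted with `tuple(...)` first.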

* After that we can label the non fire pixels from the GOES image

In [17]:
non_fire_pixels = MLT.get_random_non_fire_pixels(GOES_Fire_Index_array=FI, number_of_non_fire_pixels=1000, corrected_kill_list=corrected_kill)
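How `MLT.get_random_non_fire_pixels` samples is not shown here. One simple way to draw random pixel locations while avoiding an exclusion list is sketched below; `sample_non_fire_pixels`, the grid size, and the seed are all assumptions for illustration.

```python
import numpy as np

def sample_non_fire_pixels(shape, excluded, n, seed=0):
    """Draw n distinct (row, col) locations that are not in `excluded`."""
    rng = np.random.default_rng(seed)
    excluded = set(excluded)
    candidates = [
        (r, c)
        for r in range(shape[0])
        for c in range(shape[1])
        if (r, c) not in excluded
    ]
    if n > len(candidates):
        raise ValueError("not enough eligible pixels to sample from")
    # Sample indices without replacement so no location is picked twice
    picks = rng.choice(len(candidates), size=n, replace=False)
    return [candidates[i] for i in picks]

picks = sample_non_fire_pixels((4, 5), excluded=[(0, 0), (1, 1)], n=6, seed=42)
print(picks)
```

Enumerating every candidate is fine for a toy grid. For a full cropped scene, rejection sampling against a set of excluded locations would likely use less memory.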

* Now we can create a df of fire pixels to train

In [18]:
df_fire_pixels = MLT.get_fire_pixel_values_in_all_bands(pixel_location_list=GOES_fp_pixel_list,
                                            MCMI_path=NC_VIIRS_df["MCMI"].iloc[0],
                                            FDC_path=NC_VIIRS_df["FDC"].iloc[0],
                                            ACM_path=NC_VIIRS_df["ACM"].iloc[0],
                                            VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0],
                                            GOES_date_time=NC_VIIRS_df["GOES_date_time"].iloc[0],
                                            rasterize_VIIRS=rasterize_VIIRS,
                                            cloud_probability_list=[3,4])

in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 274, 741 all of the neighbors are clouds
in pixel 274, 741 all of the neighbors are clouds
in pixel 274, 741 all of the neighbors are clouds
in pixel 274, 741 all of the neighbors are clouds
in pixel 274, 741 all of the neighbors are clouds
in pixel 295, 730 all of the neighbors are clouds
in pixel 295, 730 all of the neighbors are clouds
in pixel 295, 730 all of the neighbors are clouds
in pixel 295, 730 all of the neighbors are clouds
in pixel 295, 730 all of the neighbors are clouds


In [19]:
df_fire_pixels

Unnamed: 0,t0_MCMI_file,t0_GOES_date_time,row,col,VIIRS_fp_max,VIIRS_fp_sum,t0_B01_value,t0_B01_mean,t0_B01_median,t0_B01_std,...,t0_tFI_mean,t0_FI_median,t0_FI_std,t0_FI_min,t0_FI_max,t0_FDC_value,t0_ACM_value,is_day,is_night,is_day_night
0,s202201010731.nc,2022-01-01 07:31,245,804,2.0,2.0,0.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,-999.0,-999.0,-999.0,240.0,3.0,0,1,0
1,s202201010731.nc,2022-01-01 07:31,273,739,1.0,2.0,0.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,-999.0,-999.0,-999.0,200.0,3.0,0,1,0
2,s202201010731.nc,2022-01-01 07:31,274,741,1.0,1.0,0.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,-999.0,-999.0,-999.0,200.0,3.0,0,1,0
3,s202201010731.nc,2022-01-01 07:31,295,730,2.0,4.0,0.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,-999.0,-999.0,-999.0,100.0,3.0,0,1,0
4,s202201010731.nc,2022-01-01 07:31,303,805,1.0,1.0,0.0,0.0,0.0,0.0,...,-0.001789,-0.001742,0.000901,-0.003022,-0.000345,100.0,1.0,0,1,0
5,s202201010731.nc,2022-01-01 07:31,308,806,1.0,1.0,0.0,0.0,0.0,0.0,...,-0.000726,-0.000663,0.000377,-0.001214,-0.000123,100.0,0.0,0,1,0
6,s202201010731.nc,2022-01-01 07:31,326,1003,1.0,1.0,0.0,0.0,0.0,0.0,...,-0.002071,-0.001897,0.001101,-0.003993,-0.000654,100.0,1.0,0,1,0
7,s202201010731.nc,2022-01-01 07:31,332,497,1.0,1.0,0.0,0.0,0.0,0.0,...,-0.001148,-0.000702,0.001541,-0.003858,0.000871,100.0,1.0,0,1,0
8,s202201010731.nc,2022-01-01 07:31,331,823,2.0,2.0,0.0,0.0,0.0,0.0,...,-0.001019,-0.000889,0.000573,-0.00186,-0.000386,100.0,0.0,0,1,0
9,s202201010731.nc,2022-01-01 07:31,334,882,1.0,1.0,0.0,0.0,0.0,0.0,...,-0.002587,-0.002755,0.001049,-0.003918,-0.000729,100.0,2.0,0,1,0


* Now for the same fire pixels we will take their temporal data

In [None]:
df_temporal_fire_pixels = MLT.get_temporal_fire_pixel_values_in_all_bands(temporal_df=temporal_df,
                                                                          pixel_location_list=GOES_fp_pixel_list,
                                                                          VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0],
                                                                          GOES_date_time=NC_VIIRS_df["GOES_date_time"].iloc[0],
                                                                          temporal_images=4,
                                                                          cloud_probability_list=[3,4])

in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 245, 804 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds
in pixel 273, 739 all of the neighbors are clouds


* Now we can concat the two dfs and add a label column

In [21]:
df_fire_pixels_concat_temporal = pd.concat([df_fire_pixels, df_temporal_fire_pixels], axis=1)

In [22]:
df_fire_pixels_concat_temporal["fire_label"] = 1
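Concatenating with `axis=1` lines rows up by index label, not by position, so both frames need the same index. A small sketch with made-up values shows the aligned case and what happens when the indexes differ.

```python
import pandas as pd

# Toy stand-ins for the current-time and temporal feature tables
current = pd.DataFrame({"row": [245, 273], "t0_FI_value": [0.01, 0.02]})
temporal = pd.DataFrame({"t1_FI_value": [0.011, 0.019]})

# Same default index (0, 1): columns are placed side by side
combined = pd.concat([current, temporal], axis=1)
combined["fire_label"] = 1

# Shifted index (1, 2): pandas takes the union of labels and fills gaps with NaN
shifted = temporal.copy()
shifted.index = [1, 2]
misaligned = pd.concat([current, shifted], axis=1)

print(combined.shape, misaligned.shape)
```

Calling `reset_index(drop=True)` on both frames before the concat is a cheap way to guard against this when the row order is known to match.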

In [23]:
df_fire_pixels_concat_temporal[:5]

Unnamed: 0,t0_MCMI_file,t0_GOES_date_time,row,col,VIIRS_fp_max,VIIRS_fp_sum,t0_B01_value,t0_B01_mean,t0_B01_median,t0_B01_std,...,t4_B16_max,t4_FI_value,t4_FI_mean,t4_FI_median,t4_FI_std,t4_FI_min,t4_FI_max,t4_FDC_value,t4_ACM_value,fire_label
0,s202201010731.nc,2022-01-01 07:31,245,804,2.0,2.0,0.0,-999.0,-999.0,-999.0,...,-999.0,0.013518,-999.0,-999.0,-999.0,-999.0,-999.0,240.0,3.0,1
1,s202201010731.nc,2022-01-01 07:31,273,739,1.0,2.0,0.0,-999.0,-999.0,-999.0,...,-999.0,0.00954,-999.0,-999.0,-999.0,-999.0,-999.0,100.0,3.0,1
2,s202201010731.nc,2022-01-01 07:31,274,741,1.0,1.0,0.0,-999.0,-999.0,-999.0,...,-999.0,0.010895,-999.0,-999.0,-999.0,-999.0,-999.0,240.0,3.0,1
3,s202201010731.nc,2022-01-01 07:31,295,730,2.0,4.0,0.0,-999.0,-999.0,-999.0,...,268.74057,0.006053,0.004273,0.004349,0.000522,0.003545,0.004847,100.0,3.0,1
4,s202201010731.nc,2022-01-01 07:31,303,805,1.0,1.0,0.0,0.0,0.0,0.0,...,270.503174,-0.002857,-0.002898,-0.00283,0.000714,-0.004262,-0.002012,100.0,1.0,1


* Now we will do the same for the non fire pixels

In [24]:
df_non_fire_pixels = MLT.get_fire_pixel_values_in_all_bands(pixel_location_list=non_fire_pixels,
                                            MCMI_path=NC_VIIRS_df["MCMI"].iloc[0],
                                            FDC_path=NC_VIIRS_df["FDC"].iloc[0],
                                            ACM_path=NC_VIIRS_df["ACM"].iloc[0],
                                            VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0],
                                            GOES_date_time=NC_VIIRS_df["GOES_date_time"].iloc[0],
                                            rasterize_VIIRS=rasterize_VIIRS)

in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds


In [25]:
df_temporal_non_fire_pixels = MLT.get_temporal_fire_pixel_values_in_all_bands(temporal_df=temporal_df,
                                                                          pixel_location_list=non_fire_pixels,
                                                                          VIIRS_path=NC_VIIRS_df["VIIRS_file_full_path"].iloc[0],
                                                                          GOES_date_time=NC_VIIRS_df["GOES_date_time"].iloc[0],
                                                                          temporal_images=4)

in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 207, 643 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 230, 908 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 269, 617 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds
in pixel 92, 1017 all of the neighbors are clouds


* Now we can concat them and label them

In [26]:
df_non_fire_pixels_concat_temporal = pd.concat([df_non_fire_pixels, df_temporal_non_fire_pixels], axis=1)

In [27]:
df_non_fire_pixels_concat_temporal["fire_label"] = 0 ## add a column called "fire_label" and set it to 0

* Finally we can concat the fire and non fire pixels to a df

In [28]:
train_df = pd.concat([df_fire_pixels_concat_temporal, df_non_fire_pixels_concat_temporal]).reset_index(drop=True)
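After stacking the two classes it can be useful to check how many rows each label has. The sketch below uses tiny made-up frames; only the `fire_label` column name comes from the notebook.

```python
import pandas as pd

# Toy labelled tables: 2 fire rows and 3 non fire rows
fire = pd.DataFrame({"t0_FI_value": [0.012, 0.009], "fire_label": 1})
non_fire = pd.DataFrame({"t0_FI_value": [-0.002, -0.001, -0.003], "fire_label": 0})

# Stack row-wise and renumber so the index is unique
train = pd.concat([fire, non_fire]).reset_index(drop=True)

# Rows per class
counts = train["fire_label"].value_counts().to_dict()
print(counts)
```

A large gap between the two counts is a sign of class imbalance, which may need handling when the model is trained.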

In [None]:
train_df[:5]

Unnamed: 0,t0_MCMI_file,t0_GOES_date_time,row,col,VIIRS_fp_max,VIIRS_fp_sum,t0_B01_value,t0_B01_mean,t0_B01_median,t0_B01_std,...,t4_B16_max,t4_FI_value,t4_FI_mean,t4_FI_median,t4_FI_std,t4_FI_min,t4_FI_max,t4_FDC_value,t4_ACM_value,fire_label
0,s202201010731.nc,2022-01-01 07:31,245,804,2.0,2.0,0.0,-999.0,-999.0,-999.0,...,-999.0,0.013518,-999.0,-999.0,-999.0,-999.0,-999.0,240.0,3.0,1
1,s202201010731.nc,2022-01-01 07:31,273,739,1.0,2.0,0.0,-999.0,-999.0,-999.0,...,-999.0,0.00954,-999.0,-999.0,-999.0,-999.0,-999.0,100.0,3.0,1
2,s202201010731.nc,2022-01-01 07:31,274,741,1.0,1.0,0.0,-999.0,-999.0,-999.0,...,-999.0,0.010895,-999.0,-999.0,-999.0,-999.0,-999.0,240.0,3.0,1
3,s202201010731.nc,2022-01-01 07:31,295,730,2.0,4.0,0.0,-999.0,-999.0,-999.0,...,268.74057,0.006053,0.004273,0.004349,0.000522,0.003545,0.004847,100.0,3.0,1
4,s202201010731.nc,2022-01-01 07:31,303,805,1.0,1.0,0.0,0.0,0.0,0.0,...,270.503174,-0.002857,-0.002898,-0.00283,0.000714,-0.004262,-0.002012,100.0,1.0,1


# Now we can do it all in one function

In [17]:
no_VIIRS_points_GOES_files_list = [] ## create an empty list to store the GOES files with no VIIRS points
error_list = [] ## create an empty list to store the errors
ml_df_list = [] ## create an empty list to store the ml_df

for i in range(len(NC_VIIRS_df[:500])): ## loop through the first 500 rows of the NC_VIIRS_df
    MCMI_path = NC_VIIRS_df["MCMI"].iloc[i] ## get the MCMI path
    FDC_path = NC_VIIRS_df["FDC"].iloc[i] ## get the FDC path
    ACM_path = NC_VIIRS_df["ACM"].iloc[i] ## get the ACM path
    VIIRS_path = NC_VIIRS_df["VIIRS_file_full_path"].iloc[i] ## get the VIIRS path
    GOES_date_time = NC_VIIRS_df["GOES_date_time"].iloc[i] ## get the GOES date time
    VIIRS_file = NC_VIIRS_df["VIIRS_file"].iloc[i] ## get the VIIRS file
    VIIRS_filter = VIIRS[VIIRS["Fire_file"] == VIIRS_file] ## filter the VIIRS dataframe
    
    if len(VIIRS_filter) == 0: ## if the VIIRS length is 0
        print(f"No VIIRS points for {GOES_date_time}")
        no_VIIRS_points_GOES_files_list.append(GOES_date_time)
        continue ## skip the iteration
    else: ## if the VIIRS length is greater than 0
        try: ## try the following code
            ## create the ml_df
            ml_df = MLT.create_ML_training_df(MCMI_path=MCMI_path,
                                      FDC_path=FDC_path,
                                      ACM_path=ACM_path,
                                      VIIRS_path=VIIRS_path,
                                      GOES_date_time=GOES_date_time,
                                      filter_VIIRS=VIIRS_filter,
                                      temporal_df=temporal_df,
                                      VIIRS_threshold=1,
                                      number_of_temporal_GOES_images=4,
                                      number_of_non_fire_pixels=500,
                                      cloud_probability_list=[3,4])
            ml_df_list.append(ml_df) ## append the ml_df to the ml_df_list
            print(f"{i+1} out of {len(NC_VIIRS_df)} completed") ## print the progress
        except Exception as e: ## if there is an error
            print(f"Error: {e} in {GOES_date_time}") ## print the error
            error_list.append(GOES_date_time) ## append the GOES date time to the error list
            continue ## skip the iteration
            

Now working of GOES time stamp: 2022-01-01 07:31
rasterize VIIRS is done for GOES time stamp: 2022-01-01 07:31
GOES fire pixel list is done for GOES time stamp: 2022-01-01 07:31
list of locations not to sample is done for GOES time stamp: 2022-01-01 07:31
Starting to genrate 500 random non-fire pixels
Genrated 500 random non-fire pixels for GOES time stamp: 2022-01-01 07:31
staring to genrate fire pixel values for GOES time stamp: 2022-01-01 07:31
Starting to get the fire pixel values for GOES time stamp: 2022-01-01 07:31
done. Now starting working on the temporal data
done. Now starting working on the non-fire pixels
done. Now starting working on the temporal data
done. df is ready for GOES time stamp: 2022-01-01 07:31
1 out of 4449 completed
Now working of GOES time stamp: 2022-01-01 09:11
rasterize VIIRS is done for GOES time stamp: 2022-01-01 09:11
GOES fire pixel list is done for GOES time stamp: 2022-01-01 09:11
list of locations not to sample is done for GOES time stamp: 2022-01
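The loop above follows a common batch pattern: skip items with no input, collect results, and record failures without stopping the run. A stripped-down sketch of that pattern is shown below; `run_batch` and `toy_build` are hypothetical stand-ins, not functions from `ML_toolbox`.

```python
def run_batch(items, build):
    """Apply `build` to each (key, payload); track skipped and failed keys."""
    results, skipped, failed = [], [], []
    for key, payload in items:
        if not payload:  # nothing to process for this key
            skipped.append(key)
            continue
        try:
            results.append(build(payload))
        except Exception as exc:  # keep going, but remember what failed and why
            failed.append((key, repr(exc)))
    return results, skipped, failed

def toy_build(points):
    """Stand-in for the real per-scene builder; fails on negative input."""
    if any(p < 0 for p in points):
        raise ValueError("negative value")
    return sum(points)

results, skipped, failed = run_batch(
    [("a", [1, 2]), ("b", []), ("c", [-1])], toy_build
)
print(results, skipped, failed)
```

Storing the error message next to the key, rather than the key alone, makes it easier to see afterwards why a scene was dropped.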

In [18]:
ml_df_concat = pd.concat(ml_df_list).reset_index(drop=True) ## concatenate the ml_df_list

In [19]:
out_dir = r'C:\Users\asaf_rs\Dropbox\Fire_Detection\python_ML_project\create_ML_df\train_df' ## output directory

In [20]:
os.makedirs(out_dir, exist_ok=True) ## create the output directory if it does not exist
ml_df_concat.to_csv(os.path.join(out_dir, "train_df.csv"), index=False) ## save the ml_df_concat to a csv file

In [22]:
ml_df_concat.to_pickle(os.path.join(out_dir, "train_df.pkl")) ## save the ml_df_concat to a pickle file
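Saving both a CSV and a pickle is reasonable because the two formats behave differently on reload. The sketch below writes a toy frame to a temporary folder and reads it back. It assumes a datetime column only to make the difference visible; the real training table may store timestamps as strings.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {
        "t0_GOES_date_time": pd.to_datetime(["2022-01-01 07:31"]),
        "fire_label": [1],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "train_df")
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists

    csv_path = os.path.join(out_dir, "train_df.csv")
    pkl_path = os.path.join(out_dir, "train_df.pkl")
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    from_csv = pd.read_csv(csv_path)    # dtypes are re-inferred from text
    from_pkl = pd.read_pickle(pkl_path)  # dtypes are restored as saved

print(from_csv.dtypes["t0_GOES_date_time"], from_pkl.dtypes["t0_GOES_date_time"])
```

The CSV is portable and human-readable, while the pickle preserves dtypes exactly. Pickle files should only be loaded from trusted sources.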

In [29]:
len(ml_df_concat[ml_df_concat["fire_label"] == 0])

243000