#### UTILITY SCRIPT TO CONVERT IMAGES INTO AGGREGATED DATA

This notebook, built for all collaborators, converts the satellite image data of each forest plot of a given campaign into a single value per spectral band function.
The output is a dataframe with one row per forest plot and one column per spectral band function.

As shown in [this diagram](shematic_diagram_of_images_acquisition.png), taken from the project presentation, the principle is to cut a reduced square (approximately the size of a plot, 25 x 25 meters) out of the centre of the global satellite image (200 x 200 meters) and to aggregate it with a function such as the mean of all its pixel values.
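The principle described above can be illustrated on a toy image. This is only a minimal sketch: the helper name `central_window_mean` is hypothetical and does not appear in the notebook, and it assumes that one array cell corresponds to one unit of the window size.

```python
import numpy as np

def central_window_mean(image, window):
    """Mean of the `window` x `window` square centred in a 2-D image (hypothetical helper)."""
    arr = np.asarray(image, dtype=float)
    height, width = arr.shape
    top = (height - window) // 2   # first row of the central square
    left = (width - window) // 2   # first column of the central square
    return float(arr[top:top + window, left:left + window].mean())

# toy 5 x 5 image: zeros with a 3 x 3 block of ones in the centre
toy = np.zeros((5, 5))
toy[1:4, 1:4] = 1.0

centre_mean = central_window_mean(toy, 3)  # only the block of ones
full_mean = central_window_mean(toy, 5)    # the whole image, 9 ones out of 25 cells
print(centre_mean, full_mean)
```

For non-negative sizes, the integer division gives the same start index as the `int((width - DIM_AGREGATION)/2)` expression used in the cells below.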

In [1]:
import pandas as pd
import numpy as np
import json
import itertools
import copy


Size of the data square in the global image :

In [2]:
DIM_AGREGATION = 25 # X 25 meters

Import the initial list of forest plots (parcelles):

In [3]:
df = pd.read_excel('DataFrames/data_parcelles_with_gps.xlsx')

In [4]:
df_base = df[['PARCELLE','LFI']]

----------------

#### Option 1 / Import and convert a single JSON file of satellite images:

In [29]:
LFI = 1 # define the campaign

First_parcelle = '98383' #the name of the first parcelle of the dict

In [30]:
Extension = '_pb2' # for a file of problematic plots, fill in a suffix such as "_pb1"

In [31]:
with open(f"Results_Images_Stock/Images_LANDSAT_LFI{LFI}{Extension}.json", 'r') as openfile:
 
    data = json.load(openfile)

Verifications (length, LFI and dimensions):

In [32]:
len(data)

4

Try with a parcelle:

In [33]:
data[First_parcelle]['LFI']

'LFI1'

In [34]:
np.shape(data[First_parcelle]['IMAGES_SAT']['NDVI'])

(201, 201)
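The checks above look at the first plot only. As a hedged sketch, the same checks could be run over every record of the file. The helper `check_records` is hypothetical, and the record layout (`'LFI'` and `'IMAGES_SAT'` keys) is assumed from the cells above.

```python
import numpy as np

def check_records(data, expected_lfi, band='NDVI'):
    """Return the set of image shapes found and the keys whose campaign differs (hypothetical helper)."""
    shapes = set()
    wrong_campaign = []
    for key, record in data.items():
        if record.get('LFI') != expected_lfi:
            wrong_campaign.append(key)
        shapes.add(np.shape(record['IMAGES_SAT'][band]))
    return shapes, wrong_campaign

# tiny made-up records, for illustration only
demo = {
    'a': {'LFI': 'LFI1', 'IMAGES_SAT': {'NDVI': [[0.1, 0.2], [0.3, 0.4]]}},
    'b': {'LFI': 'LFI2', 'IMAGES_SAT': {'NDVI': [[0.5, 0.6], [0.7, 0.8]]}},
}
shapes, wrong = check_records(demo, 'LFI1')
print(shapes, wrong)
```

More than one shape in `shapes` would mean that a window computed from the first image is not centred for every plot.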

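Campaign filter: keep only the plots of the campaign (LFI) read from the file. An error here means the file does not correspond to the expected campaign.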

In [35]:
LFI = data[First_parcelle]['LFI']
df_result = df_base.loc[df_base['LFI']==LFI,:].reset_index(drop=True)

Script and loop :

In [36]:
# define the variables and pre-allocate the result arrays
# (NumPy arrays are used to reduce computing time)

nb_parc = len(df_result)
problematic_parcelles = []
list_parcelle = df_result['PARCELLE'].tolist()

width = np.shape(data[First_parcelle]['IMAGES_SAT']['NDVI'])[0]
height = np.shape(data[First_parcelle]['IMAGES_SAT']['NDVI'])[1]
range_w = np.arange(int((width - DIM_AGREGATION)/2), int((width + DIM_AGREGATION)/2))
range_h = np.arange(int((height - DIM_AGREGATION)/2), int((height + DIM_AGREGATION)/2))
ndvi_arr = np.empty(nb_parc)
evi_arr = np.empty(nb_parc)
ndmi_arr = np.empty(nb_parc)
ndwi_arr = np.empty(nb_parc)
dswi_arr = np.empty(nb_parc)

MAPPING = {
    'NDVI' : ndvi_arr,
    'EVI' : evi_arr,
    'NDMI' : ndmi_arr,
    'NDWI' : ndwi_arr,
    'DSWI' : dswi_arr
}

# main loop with mean aggregation for each forest plot :

for ind, parc in enumerate(list_parcelle):
    try:
        for func_name, arr in MAPPING.items():
            pixel_values = data[str(parc)]['IMAGES_SAT'][func_name]
            arr[ind] = round(np.mean([pixel_values[tupl[0]][tupl[1]] for tupl in list(itertools.product(range_w,range_h))]),4)
    except Exception:
        problematic_parcelles.append(parc) # keep a list of the problematic forest plots
        for _ , arr in MAPPING.items():
            arr[ind] = np.nan

# writing results :
df_result['NDVI'] = ndvi_arr
df_result['EVI'] = evi_arr
df_result['NDMI'] = ndmi_arr
df_result['NDWI'] = ndwi_arr
df_result['DSWI'] = dswi_arr
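The loop above builds every index pair with `itertools.product`. A possible alternative, shown here only as a sketch, is to slice the central window directly with NumPy. The function `aggregate_plots` and the `BANDS` list are hypothetical names. Unlike the cell above, the window is computed per image rather than from the first plot.

```python
import numpy as np
import pandas as pd

BANDS = ['NDVI', 'EVI', 'NDMI', 'NDWI', 'DSWI']

def aggregate_plots(data, plots, window, bands=BANDS):
    """Mean of the central window per plot and band; NaN when a plot cannot be read."""
    rows, missing = [], []
    for parc in plots:
        record = data.get(str(parc))  # None when the plot is absent from the file
        try:
            row = {}
            for band in bands:
                img = np.asarray(record['IMAGES_SAT'][band], dtype=float)
                top = (img.shape[0] - window) // 2
                left = (img.shape[1] - window) // 2
                crop = img[top:top + window, left:left + window]
                row[band] = round(float(crop.mean()), 4)
        except (TypeError, KeyError, IndexError, ValueError):
            missing.append(parc)
            row = {band: np.nan for band in bands}
        rows.append(row)
    return pd.DataFrame(rows, columns=bands), missing

# made-up data: plot 1 is present, plot 2 is absent
demo = {'1': {'IMAGES_SAT': {b: np.ones((5, 5)).tolist() for b in BANDS}}}
table, missing = aggregate_plots(demo, [1, 2], window=3)
print(table)
print(missing)
```

The returned table has one row per plot, in the order of `plots`, so its columns can be assigned to `df_result` in the same way as the arrays above.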

Check the number of plots with NaN data:

In [37]:
df_result['DSWI'].isnull().sum()

2399

In [38]:
len(problematic_parcelles)

2399

Export :

In [39]:
df_result.to_excel(f'./DATA_aggregated/Data_from_satellites_images_{LFI}{Extension}.xlsx')

------------------

#### Option 2 / Import and convert multiple JSON files of satellite images:

Same as above, but with several part files for one campaign (the number of parts depends on the user's choice when exporting the satellite images to JSON in the previous step).
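When the parts fit in memory together, one option is to merge them into a single dictionary and then reuse the single-file workflow. The sketch below is an assumption-laden illustration: `load_parts` is a hypothetical helper, the `_part{i}.json` naming is taken from the cells of this section, and a plot is assumed to appear in only one part (a later part would overwrite an earlier one).

```python
import json
import tempfile
from pathlib import Path

def load_parts(folder, stem, nb_parts):
    """Merge <stem>_part1.json ... <stem>_part<nb_parts>.json into one dict (hypothetical helper)."""
    merged = {}
    unreadable = []
    for i in range(1, nb_parts + 1):
        path = Path(folder) / f"{stem}_part{i}.json"
        try:
            with open(path, 'r') as openfile:
                merged.update(json.load(openfile))
        except (OSError, json.JSONDecodeError):
            unreadable.append(path.name)  # missing or corrupted part
    return merged, unreadable

# demo with two tiny made-up files in a temporary folder; part 3 does not exist
with tempfile.TemporaryDirectory() as tmp:
    for i, key in enumerate(['51', '384'], start=1):
        content = json.dumps({key: {'LFI': 'LFI4'}})
        (Path(tmp) / f"demo_part{i}.json").write_text(content)
    merged, unreadable = load_parts(tmp, 'demo', 3)

print(len(merged), unreadable)
```

Collecting the unreadable file names, instead of only printing a message, makes it possible to check afterwards which parts are missing.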

Parameters :

In [229]:
nb_parts = 8 # number of part files

LFI_name = 'LFI4'

LFI = 4

width = 200 # width of the initial images (note: the Option 1 check returned a shape of (201, 201))

height = 200 # height of the initial images


Loop converting each part file into a dataframe, collected in a list of 'nb_parts' dataframes:

In [173]:
df_result_base = df_base.loc[df_base['LFI']==LFI_name,:].reset_index(drop=True)
dataframes_results = []
nb_parc = len(df_result_base)
list_parcelle = df_result_base['PARCELLE'].tolist()
range_w = np.arange(int((width - DIM_AGREGATION)/2), int((width + DIM_AGREGATION)/2))
range_h = np.arange(int((height - DIM_AGREGATION)/2), int((height + DIM_AGREGATION)/2))


for i in range(nb_parts):
    try:
        with open(f"Results_Images_Stock/Images_LANDSAT_LFI{LFI}_part{i+1}.json", 'r') as openfile:
            data = json.load(openfile)
        print(f'Opening file part {i+1} with lenght {len(data)} ...')

        empty_parcelles = []
        df_result = copy.copy(df_result_base)
        ndvi_arr = np.empty(nb_parc)
        evi_arr = np.empty(nb_parc)
        ndmi_arr = np.empty(nb_parc)
        ndwi_arr = np.empty(nb_parc)
        dswi_arr = np.empty(nb_parc)
        
        MAPPING = {
            'NDVI' : ndvi_arr,
            'EVI' : evi_arr,
            'NDMI' : ndmi_arr,
            'NDWI' : ndwi_arr,
            'DSWI' : dswi_arr
        }

        print('Conversion and aggregation...')
        try:
            for ind, parc in enumerate(list_parcelle):
                try:
                    for func_name, arr in MAPPING.items():
                        pixel_values = data[str(parc)]['IMAGES_SAT'][func_name]
                        arr[ind] = round(np.mean([pixel_values[tupl[0]][tupl[1]] for tupl in list(itertools.product(range_w,range_h))]),4)
                except:
                    empty_parcelles.append(parc)
                    for _ , arr in MAPPING.items():
                        arr[ind] = np.nan
        except Exception:
            print(f'Conversion\'s problem with file part {i+1} ...')

        print('Creation of a dataframe...')
                
        df_result['NDVI'] = ndvi_arr
        df_result['EVI'] = evi_arr
        df_result['NDMI'] = ndmi_arr
        df_result['NDWI'] = ndwi_arr
        df_result['DSWI'] = dswi_arr

        dataframes_results.append(df_result)

        print(f"Difference of empty data for NDVI : {len(empty_parcelles) - df_result['NDVI'].isnull().sum()}")
        print(f"Difference of empty data for EVI : {len(empty_parcelles) - df_result['EVI'].isnull().sum()}")
        print(f"Difference of empty data for NDMI : {len(empty_parcelles) - df_result['NDMI'].isnull().sum()}")
        print(f"Difference of empty data for NDWI : {len(empty_parcelles) - df_result['NDWI'].isnull().sum()}")
        print(f"Difference of empty data for DSWI : {len(empty_parcelles) - df_result['DSWI'].isnull().sum()}")

    except Exception:
        print(f'Opening problem with file part {i+1} ...')
    print('Done...')
print('Finish...')              

Opening file part 1 with lenght 25 ...
Conversion and aggregation...
Creation of a dataframe...
Difference of empty data for NDVI : 0
Difference of empty data for EVI : 0
Difference of empty data for NDMI : 0
Difference of empty data for NDWI : 0
Difference of empty data for DSWI : 0
Done...
Opening file part 2 with lenght 25 ...
Conversion and aggregation...
Creation of a dataframe...
Difference of empty data for NDVI : 0
Difference of empty data for EVI : 0
Difference of empty data for NDMI : 0
Difference of empty data for NDWI : 0
Difference of empty data for DSWI : 0
Done...
Opening file part 3 with lenght 25 ...
Conversion and aggregation...
Creation of a dataframe...
Difference of empty data for NDVI : 0
Difference of empty data for EVI : 0
Difference of empty data for NDMI : 0
Difference of empty data for NDWI : 0
Difference of empty data for DSWI : 0
Done...
Opening file part 4 with lenght 25 ...
Conversion and aggregation...
Creation of a dataframe...
Difference of empty data 

Join the 'nb_parts' dataframes:

In [220]:
nb_lines = len(df_result_base)
# start from NaN so that cells filled by no part stay missing
# (np.empty would leave arbitrary values in them)
result_arr = np.full((nb_lines, 5), np.nan)

for i in range(nb_parts):
    c = 0  # number of values copied from this part
    arr = dataframes_results[i].iloc[:, 2:].to_numpy()
    for line in range(nb_lines):
        for col in range(5):
            if not np.isnan(arr[line][col]):
                result_arr[line][col] = arr[line][col]
                c += 1
    print(f'Join n°{i} : {c} data copied ...')
    
    

Join n°0 : 115 data copied ...
Join n°1 : 120 data copied ...
Join n°2 : 100 data copied ...
Join n°3 : 105 data copied ...
Join n°4 : 120 data copied ...
Join n°5 : 110 data copied ...
Join n°6 : 110 data copied ...
Join n°7 : 130 data copied ...
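The element-by-element join can also be written with a boolean mask. The following is a sketch of that idea on made-up arrays, not the notebook's own code; `join_parts` is a hypothetical name. As in the loop, a later part overwrites an earlier one wherever both hold a value.

```python
import numpy as np

def join_parts(arrays):
    """Overlay the non-NaN cells of each array onto a NaN-filled result (hypothetical helper)."""
    result = np.full(arrays[0].shape, np.nan)
    counts = []
    for arr in arrays:
        mask = ~np.isnan(arr)        # cells that this part actually filled
        result[mask] = arr[mask]
        counts.append(int(mask.sum()))
    return result, counts

# two made-up parts, each filling a different cell of the first row
part_a = np.array([[1.0, np.nan], [np.nan, np.nan]])
part_b = np.array([[np.nan, 2.0], [np.nan, np.nan]])
joined, counts = join_parts([part_a, part_b])
print(joined)
print(counts)
```

With the notebook's variables, the input list would be built from `dataframes_results[i].iloc[:, 2:].to_numpy()` for each part, and `counts` plays the role of the `c` counter.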


Creation of the result dataframe:

In [230]:
df_result = copy.copy(df_result_base)
df_result[['NDVI','EVI','NDMI','NDWI','DSWI']] = result_arr
df_result

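|  | PARCELLE | LFI | NDVI | EVI | NDMI | NDWI | DSWI |
|---|---|---|---|---|---|---|---|
| 0 | 51 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| 1 | 384 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| 2 | 1239 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| 3 | 1419 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| 4 | 1431 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2398 | 164918 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| 2399 | 164922 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| 2400 | 164999 | LFI4 | NaN | NaN | NaN | NaN | NaN |
| 2401 | 165003 | LFI4 | NaN | NaN | NaN | NaN | NaN |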


Test with a parcelle :

In [231]:
df_result.loc[df_result['PARCELLE']==12298,:]

|  | PARCELLE | LFI | NDVI | EVI | NDMI | NDWI | DSWI |
|---|---|---|---|---|---|---|---|
| 37 | 12298 | LFI4 | 0.1773 | -0.0027 | 0.0684 | -0.1201 | 0.1701 |


Final export :

In [232]:
df_result.to_excel(f'./DATA_aggregated/Data_from_satellites_images_{LFI_name}.xlsx')