# Notebook to prepare MERIT data 




- Date: Feb 2024
- Author: G. Sainton


In this notebook, do not use the LEO data. The data must be retrieved from /Data/thnguyen/data/process_FI/ on the CALER server, where Hang stored the original Yamazaki data.

Message from Hang:
- /Data/thnguyen/data/LEO/DOWN_xxx_DATA contains GIEMS-D3 data (Aires et al., 2017) and is not related to MERIT. At that time, GIEMS was downscaled monthly to 90 m using HydroSHEDS data.
  xxx is the month index counted from January 1993 (001 = January 1993), so, for example, 013 is January 1994.
  xxx ranges from 001 to 180, covering the 180 months of the period 1993-2007.

- There are several Yamazaki folders and they are quite messy. However, when he copied the folder to us, Yamazaki designated the "good" directory for each variable (slope, HAND, etc.), and I put those paths in `HydroCell.m`.
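Hang's naming rule for the GIEMS-D3 folders can be captured in two small helpers. This is a minimal sketch, assuming the index is 1-based (001 = January 1993), as implied by the example 013 = January 1994; the function names are hypothetical and not part of the existing MATLAB or Python code.

```python
from datetime import date


def giems_month_index_to_date(idx: int) -> date:
    """Return the first day of the month encoded by a DOWN_xxx_DATA index.

    Assumes index 1 is January 1993 and index 180 is December 2007.
    """
    if not 1 <= idx <= 180:
        raise ValueError(f"Index {idx} outside the 001-180 range")
    months_since_start = idx - 1
    year = 1993 + months_since_start // 12
    month = months_since_start % 12 + 1
    return date(year, month, 1)


def date_to_giems_folder(year: int, month: int) -> str:
    """Return the hypothetical folder name (DOWN_xxx_DATA) for a given month."""
    idx = (year - 1993) * 12 + month
    if not 1 <= idx <= 180:
        raise ValueError(f"{year}-{month:02d} is outside 1993-2007")
    return f"DOWN_{idx:03d}_DATA"


print(giems_month_index_to_date(13))   # 1994-01-01
print(date_to_giems_folder(2007, 12))  # DOWN_180_DATA
```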



Dependencies:
- MERIT_DP.m: preprocessing code written by Ranuy Fan during his M1 internship. This code uses:

  - HydroCell.m   
  - WindowCell.m






In [1]:
import os, sys
from pprint import pprint

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd 
import geopandas as gpd

# display helpers for images and dataframes in the notebook
from IPython.display import Image, display

# remove warnings
import warnings
warnings.filterwarnings('ignore')

# standard library and local helper libraries
import json
from libPeatland import *
from libLandsat8 import *

ModuleNotFoundError: No module named 'gdal'

## Define AOI

Make sure that the AOI defined here is the same as the one used in the other notebook.

In [None]:
# Region of interest: lon/lat bounding box in decimal degrees
AOI = {"xmin": -103.1, "ymin": 47.9, "xmax": -79.2, "ymax": 60.5} # Hudson Bay
#AOI ={"xmin": 58.5, "ymin": 54.8, "xmax": 90.1, "ymax": 70.7} # Russia
#AOI = {"xmin": 24, "ymin": 62.9, "xmax": 29.06, "ymax": 68.1} # Finland

# Parameters
plot_maps = True
verbose = False
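One way to keep the AOI identical across notebooks, sketched here as an assumption rather than what the notebooks currently do, is to store it in a small JSON file that both notebooks read. The helper names and the file location below are hypothetical; in practice both notebooks would point to one fixed, shared path.

```python
import json
import tempfile
from pathlib import Path


def save_aoi(aoi: dict, path: Path) -> None:
    """Write the AOI bounding box to a JSON file."""
    path.write_text(json.dumps(aoi, indent=2))


def load_aoi(path: Path) -> dict:
    """Read an AOI back and check that its bounds are ordered."""
    aoi = json.loads(path.read_text())
    if not (aoi["xmin"] < aoi["xmax"] and aoi["ymin"] < aoi["ymax"]):
        raise ValueError(f"Invalid AOI bounds: {aoi}")
    return aoi


# Round trip with the Hudson Bay AOI, written to a hypothetical temp file
hudson_bay = {"xmin": -103.1, "ymin": 47.9, "xmax": -79.2, "ymax": 60.5}
aoi_file = Path(tempfile.gettempdir()) / "aoi_hudson_bay.json"
save_aoi(hudson_bay, aoi_file)
print(load_aoi(aoi_file) == hudson_bay)  # True
```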

In [None]:
# path2merit: root of the original MERIT data (v0.4) distributed by Yamazaki
# Make sure the MERIT volume from CALER is mounted first
path2merit = "/home/gsainton/MERIT/Yamazaki/MERIT/v0.4_original_data_distributed"

# Directory with the width data
width_data_dir = os.path.join(path2merit, "width")

# Check if the directory exists
if os.path.exists(width_data_dir):
    print("Directory exists: ", width_data_dir)
else:
    sys.exit(f"Directory not found: {width_data_dir}. Check that it exists or that the volume is properly mounted")
# Get the list of files in the directory
list_files = os.listdir(width_data_dir)

# Number of files in the directory
print("Number of files in the directory: ", len(list_files))



In [None]:

# Create a dataframe with the list of files
df_merit = pd.DataFrame(columns=["file_name"])

df_merit["file_name"] = list_files 
df_merit["file_path"] = df_merit["file_name"]
df_merit["file_type"] = df_merit["file_name"].apply(lambda x: x.split(".")[-2])

# Split the file name to extract the latitude
df_merit["file_name"] = df_merit["file_name"].apply(lambda x: x.split("/")[-1])
# if the fist character is a "s", the latitude is negative
df_merit["lat_start"] = df_merit["file_name"].apply(lambda x: -int(x[1:3]) if x[0] == "s" else int(x[1:3]))
# if the 4th character is a "w", the longitude is negative
df_merit["lon_start"] = df_merit["file_name"].apply(lambda x: -int(x[4:7]) if x[3] == "w" else int(x[4:7]))


display(df_merit.head())
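The character slicing above assumes tile names such as `n45w100...`. The same parsing can be written with a regular expression that makes the expected format explicit. This is a sketch: the pattern and the example file names (e.g. `n45w100_wth.tif`) are assumptions about the naming, not read from the CALER directory, and `parse_tile_name` is a hypothetical helper.

```python
import re

# Assumed tile-name pattern: N/S + 2-digit latitude, then E/W + 3-digit longitude,
# e.g. "n45w100", followed by any suffix (the suffix is not checked).
TILE_RE = re.compile(r"^([ns])(\d{2})([ew])(\d{3})", re.IGNORECASE)


def parse_tile_name(file_name: str) -> tuple[int, int]:
    """Return (lat_start, lon_start) in degrees from a tile file name."""
    match = TILE_RE.match(file_name)
    if match is None:
        raise ValueError(f"Not a tile name: {file_name!r}")
    ns, lat, ew, lon = match.groups()
    # Southern latitudes and western longitudes are negative
    lat_start = -int(lat) if ns.lower() == "s" else int(lat)
    lon_start = -int(lon) if ew.lower() == "w" else int(lon)
    return lat_start, lon_start


# Hypothetical file names, for illustration only
print(parse_tile_name("n45w100_wth.tif"))  # (45, -100)
print(parse_tile_name("S10E020_wth.tif"))  # (-10, 20)
```

Unlike the character test in the cell above, the regex also accepts upper-case hemisphere letters (`S`, `W`) and raises an explicit error on files that do not follow the pattern. Applied to the dataframe, `df_merit["file_name"].map(parse_tile_name)` returns a Series of `(lat, lon)` tuples.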

In [None]:
# Keep only the 5x5 degree tiles fully contained in the AOI (tiles that only partly overlap it are dropped)
df_merit = df_merit[(df_merit["lat_start"] >= AOI["ymin"]) & (df_merit["lat_start"]+5 <= AOI["ymax"])]
df_merit = df_merit[(df_merit["lon_start"] >= AOI["xmin"]) & (df_merit["lon_start"]+5 <= AOI["xmax"])]

# Order by latitude and longitude
df_merit = df_merit.sort_values(by=["lat_start", "lon_start"])

print("Number of files in the AOI: ", df_merit.shape[0])

print(AOI)
display(df_merit.head())
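The filter above keeps only tiles lying entirely inside the AOI, so with `ymin = 47.9` the tiles starting at latitude 45 are dropped even though they cover part of the region. If full coverage is needed, an intersection test keeps every overlapping tile. The sketch below compares both criteria on a synthetic grid; it assumes 5-degree tiles indexed by their south-west corner (as implied by the `+5` in the cell above), and `select_tiles` is a hypothetical helper.

```python
import pandas as pd

TILE_SIZE = 5  # degrees; assumes square tiles indexed by their south-west corner


def select_tiles(df: pd.DataFrame, aoi: dict, mode: str = "intersects") -> pd.DataFrame:
    """Select tiles from a dataframe with lat_start/lon_start columns.

    mode="contained": keep tiles lying entirely inside the AOI (criterion used above).
    mode="intersects": keep every tile that overlaps the AOI, even partially.
    """
    lat0, lon0 = df["lat_start"], df["lon_start"]
    lat1, lon1 = lat0 + TILE_SIZE, lon0 + TILE_SIZE
    if mode == "contained":
        mask = (lat0 >= aoi["ymin"]) & (lat1 <= aoi["ymax"]) & (lon0 >= aoi["xmin"]) & (lon1 <= aoi["xmax"])
    elif mode == "intersects":
        mask = (lat1 > aoi["ymin"]) & (lat0 < aoi["ymax"]) & (lon1 > aoi["xmin"]) & (lon0 < aoi["xmax"])
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return df[mask].sort_values(by=["lat_start", "lon_start"])


# Synthetic 5-degree grid around the Hudson Bay AOI (not the real file list)
aoi = {"xmin": -103.1, "ymin": 47.9, "xmax": -79.2, "ymax": 60.5}
grid = pd.DataFrame(
    [(lat, lon) for lat in range(40, 70, 5) for lon in range(-110, -70, 5)],
    columns=["lat_start", "lon_start"],
)
print(len(select_tiles(grid, aoi, "contained")))   # 8
print(len(select_tiles(grid, aoi, "intersects")))  # 24
```

With the intersection criterion the resulting mosaic covers the whole AOI and can be cropped to the exact bounds afterwards.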