Run this notebook first to set up the folder structure that the rest of the notebooks in this repository depend on.

Before running the rest of this notebook, download the DSWx-HLS CalVal database from [this link](https://search.earthdata.nasa.gov/search/granules?p=C2603501575-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&q=opera%20CalVal%20Database&tl=1699902781.803!3!!), extract the zip file (`opera-calval-database-dswx.zip`), and move the extracted folder (`DB`) into the `data/` folder. The repository folder structure should then look like this:

```
    .
    ├── data
    │   ├─ DB # Place the CalVal database here
    │   │  ├─ 1_1
    │   │  ├─ 1_5 
    │   │  ├─ ...
    │   │  ├─ 4_42
    │   │  ├─ ...
    │   │  └─ validation_table.geojson
    │   ├─ new_validation_table.csv 
    │   └─ validation_table.csv 
    ├── notebooks
    │   ├─ 0-Setup-folder-structures.ipynb # This notebook
    │   └─ ...    
    ├── environment.yml
    └── README.md       
```
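
The extract-and-move step can also be scripted. The following is a minimal sketch, assuming the archive unpacks to a top-level `DB` folder as described above. The function name `extract_calval_db` is illustrative and not part of this repository.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_calval_db(zip_path, data_dir):
    """Extract the archive and place its DB folder at data_dir / 'DB'.

    Assumes the zip contains a top-level 'DB' folder (hypothetical helper).
    """
    zip_path, data_dir = Path(zip_path), Path(data_dir)
    target = data_dir / 'DB'
    if target.exists():
        raise FileExistsError(f'{target} already exists')
    data_dir.mkdir(parents=True, exist_ok=True)

    # Extract into a scratch folder inside data_dir so the final move
    # stays on the same filesystem
    with tempfile.TemporaryDirectory(dir=data_dir) as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        extracted = Path(tmp) / 'DB'
        if not extracted.is_dir():
            raise FileNotFoundError("No top-level 'DB' folder in the archive")
        shutil.move(str(extracted), str(target))
    return target
```

From the `notebooks/` folder, a call might look like `extract_calval_db('opera-calval-database-dswx.zip', '../data')`, with the first argument adjusted to wherever the zip file was downloaded.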

In [12]:
from pathlib import Path
import pandas as pd

In [11]:
data_path = Path('../data')
calval_db_path = data_path / 'DB'

# Check that the DB folder has been placed in the correct directory
assert calval_db_path.exists(), "Folder does not exist! Make sure that the CalVal database has been downloaded and placed within the 'data/' folder"

In [16]:
df = pd.read_csv(data_path / 'validation_table.csv')
df.set_index('site_name', inplace=True)
df.head()

site_name,planet_id,dswx_id,hls_id,dswx_urls,validation_dataset_url,water_stratum,geometry
4_21,20210903_150800_60_2458,OPERA_L3_DSWx-HLS_T18UXG_20210902T154154Z_2023...,HLS.L30.T18UXG.2021245T154154.v2.0,https://opera-pst-rs-pop1.s3.us-west-2.amazona...,https://opera-calval-database-dswx.s3.us-west-...,3.0,"POLYGON ((-71.870513357149 55.11001696376937, ..."
4_11,20210903_152641_60_105c,OPERA_L3_DSWx-HLS_T19UDA_20210902T154911Z_2023...,HLS.S30.T19UDA.2021245T154911.v2.0,https://opera-pst-rs-pop1.s3.us-west-2.amazona...,https://opera-calval-database-dswx.s3.us-west-...,3.0,POLYGON ((-69.17307071901621 54.40592422230064...
1_31,20210904_093422_44_1065,OPERA_L3_DSWx-HLS_T33JYG_20210905T082559Z_2023...,HLS.S30.T33JYG.2021248T082559.v2.0,https://opera-pst-rs-pop1.s3.us-west-2.amazona...,https://opera-calval-database-dswx.s3.us-west-...,1.0,POLYGON ((17.282441488515342 -29.9714135761361...
3_28,20210906_101112_28_225a,OPERA_L3_DSWx-HLS_T30TYN_20210905T105621Z_2023...,HLS.S30.T30TYN.2021248T105621.v2.0,https://opera-pst-rs-pop1.s3.us-west-2.amazona...,https://opera-calval-database-dswx.s3.us-west-...,2.0,POLYGON ((-0.0438908706972531 43.0523272022019...
1_37,20210909_000649_94_222b,OPERA_L3_DSWx-HLS_T54JTM_20210908T003848Z_2023...,HLS.L30.T54JTM.2021251T003848.v2.0,https://opera-pst-rs-pop1.s3.us-west-2.amazona...,https://opera-calval-database-dswx.s3.us-west-...,1.0,POLYGON ((138.25958887036043 -30.3281075679621...
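
The `site_name` index is what links each folder in `DB` to its `planet_id`, so a folder without a matching row would make a `df.loc[...]` lookup raise a `KeyError`. The following is a minimal sketch of a check for such folders. The helper name `find_unmatched_sites` is illustrative and not part of this repository.

```python
from pathlib import Path


def find_unmatched_sites(db_path, site_names):
    """Return names of site folders in db_path that are absent from site_names."""
    known = set(site_names)
    return sorted(p.name for p in Path(db_path).iterdir()
                  if p.is_dir() and p.name not in known)
```

In this notebook the call would be `find_unmatched_sites(calval_db_path, df.index)`. An empty list means every site folder has a row in the table.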


In [31]:
# Reorganize each site folder in the database by its Planet image ID
for site_dir in [x for x in calval_db_path.glob('*') if x.is_dir()]:
    # Look up the Planet image ID that belongs to this site
    planet_id = df.loc[site_dir.name].planet_id

    planet_data_dir = data_path / planet_id
    cropped_planet_data_dir = data_path / 'planet_images_cropped' / planet_id

    planet_data_dir.mkdir(parents=True, exist_ok=True)
    cropped_planet_data_dir.mkdir(parents=True, exist_ok=True)

    # First move the dswx folder out of the database
    source_path = site_dir / 'dswx'
    destination_path = planet_data_dir / 'dswx'
    if source_path.exists():
        source_path.rename(destination_path)

    # Move the remaining files into the cropped Planet image folder
    for x in list(site_dir.glob('*')):
        x.rename(cropped_planet_data_dir / x.name)

    # Remove the now-empty site folder
    site_dir.rmdir()
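
Because the loop above moves files and deletes the site folders, it cannot be re-run on the same `DB` folder. The following is a minimal sketch of a dry run that lists the same moves without touching the disk. The helper name `plan_moves` and its `planet_ids` argument (a mapping from site folder name to `planet_id`) are illustrative and not part of this repository.

```python
from pathlib import Path


def plan_moves(db_path, data_path, planet_ids):
    """List (source, destination) pairs for the reorganization without moving anything.

    planet_ids maps a site folder name to its planet_id (hypothetical argument).
    """
    db_path, data_path = Path(db_path), Path(data_path)
    moves = []
    for site_dir in sorted(p for p in db_path.iterdir() if p.is_dir()):
        planet_id = planet_ids[site_dir.name]
        cropped_dir = data_path / 'planet_images_cropped' / planet_id
        for item in sorted(site_dir.iterdir()):
            if item.name == 'dswx':
                # The dswx folder goes directly under the Planet image ID
                moves.append((item, data_path / planet_id / 'dswx'))
            else:
                # Everything else goes to the cropped Planet image folder
                moves.append((item, cropped_dir / item.name))
    return moves
```

In this notebook the mapping could be built with `df['planet_id'].to_dict()`, for example `plan_moves(calval_db_path, data_path, df['planet_id'].to_dict())`, and the returned pairs inspected before running the real loop.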