# Data collection and preprocessing

## Setup

```python
%load_ext autoreload
%autoreload 2
```

```python
import sys
from pathlib import Path
import geopandas as gpd

# Add the project root to the Python path to import the modules
project_root = Path().absolute().parent
sys.path.append(str(project_root))
```

```python
from utils.geography_helpers import create_rotated_square_aoi

# Create a square area of interest (AOI) of size 4 km, centred on (lat, lon)
# and rotated by 48°, and save it as GeoJSON
create_rotated_square_aoi(
    lat=31.513,
    lon=34.449,
    size_km=4,
    angle_deg=48,
    filename="../utils/AOI_bboxes/aoi_shifa.geojson",
)

gdf = gpd.read_file("../utils/AOI_bboxes/aoi_shifa.geojson")
print(gdf.geometry[0].wkt)
```

```text
POLYGON ((34.45033358953366 31.48755359582277, 34.47444640417723 31.51433358953367, 34.44766641046633 31.538446404177233, 34.42355359582277 31.511666410466333, 34.45033358953366 31.48755359582277))
```
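
The source of `create_rotated_square_aoi` is not shown here. Below is a hypothetical, stdlib-only sketch of one way to build such an AOI. It lays out the corners in a local east/north frame in kilometres, rotates them counterclockwise by `angle_deg`, and converts them to degrees with an equirectangular approximation (about 111.32 km per degree of latitude, scaled by cos(lat) for longitude). Judging from the printed WKT, the project's helper appears to use roughly 111 km per degree on both axes, so it will not reproduce those coordinates exactly. `rotated_square_geojson` is an illustrative name, not part of the project.

```python
import json
import math


def rotated_square_geojson(lat, lon, size_km, angle_deg):
    """Hypothetical sketch (not utils.geography_helpers): square AOI of side
    size_km centred on (lat, lon), rotated counterclockwise by angle_deg,
    returned as a GeoJSON Feature."""
    km_per_deg_lat = 111.32  # approximation that ignores Earth's flattening
    km_per_deg_lon = 111.32 * math.cos(math.radians(lat))
    half = size_km / 2
    theta = math.radians(angle_deg)

    ring = []
    # Corners in a local east/north frame (km), counterclockwise from bottom-left
    for x, y in [(-half, -half), (half, -half), (half, half), (-half, half)]:
        xr = x * math.cos(theta) - y * math.sin(theta)
        yr = x * math.sin(theta) + y * math.cos(theta)
        ring.append([lon + xr / km_per_deg_lon, lat + yr / km_per_deg_lat])
    ring.append(ring[0])  # GeoJSON polygon rings must be closed

    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


if __name__ == "__main__":
    feature = rotated_square_geojson(lat=31.513, lon=34.449, size_km=4, angle_deg=48)
    print(json.dumps({"type": "FeatureCollection", "features": [feature]}))
```

Building the square in kilometres before converting keeps all four sides close to `size_km` at any latitude. Rotating directly in degree space would stretch the square along one axis.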


## A/ Sentinel-1 data


The goal is to download 8 pre-conflict (reference) and 6 post-conflict (post) Sentinel-1 images (GRD-HD; polarization: VV+VH).

The images are grouped by orbit direction: ascending (relative orbit 87) and descending (relative orbit 94).

Here are the product names from [ASF Data Search Vertex](https://search.asf.alaska.edu):

Reference images:

- `S1A_IW_GRDH_1SDV_20230427T154057_20230427T154122_048284_05CE71_87D9`: 27 April 2023, ascending (relative orbit 87)
- `S1A_IW_GRDH_1SDV_20230428T034428_20230428T034453_048291_05CEB1_527A`: 28 April 2023, descending (relative orbit 94)
- `S1A_IW_GRDH_1SDV_20230509T154057_20230509T154122_048459_05D442_3CB6`: 9 May 2023, ascending (relative orbit 87)
- `S1A_IW_GRDH_1SDV_20230510T034429_20230510T034454_048466_05D480_D2F1`: 10 May 2023, descending (relative orbit 94)
- `S1A_IW_GRDH_1SDV_20230521T154058_20230521T154123_048634_05D970_718E`: 21 May 2023, ascending (relative orbit 87)
- `S1A_IW_GRDH_1SDV_20230522T034430_20230522T034455_048641_05D9AE_4C24`: 22 May 2023, descending (relative orbit 94)
- `S1A_IW_GRDH_1SDV_20230602T154059_20230602T154124_048809_05DEA1_D36D`: 2 June 2023, ascending (relative orbit 87)
- `S1A_IW_GRDH_1SDV_20230603T034430_20230603T034455_048816_05DEDF_69FD`: 3 June 2023, descending (relative orbit 94)

Date of the UNOSAT labels: 3 May 2024.

Post-conflict images (acquired as close as possible to the labeling date, to limit the number of buildings destroyed after labeling that would still carry a negative label):

- `S1A_IW_GRDH_1SDV_20240503T154102_20240503T154127_053709_068641_952F`: 3 May 2024, ascending (relative orbit 87)
- `S1A_IW_GRDH_1SDV_20240504T034434_20240504T034459_053716_06868A_1FEE`: 4 May 2024, descending (relative orbit 94)
- `S1A_IW_GRDH_1SDV_20240515T154102_20240515T154127_053884_068C8C_2024`: 15 May 2024, ascending (relative orbit 87)
- `S1A_IW_GRDH_1SDV_20240516T034433_20240516T034458_053891_068CCC_19DD`: 16 May 2024, descending (relative orbit 94)
- `S1A_IW_GRDH_1SDV_20240527T154102_20240527T154127_054059_0692A2_DF93`: 27 May 2024, ascending (relative orbit 87)
- `S1A_IW_GRDH_1SDV_20240528T034434_20240528T034459_054066_0692E4_E9D2`: 28 May 2024, descending (relative orbit 94)
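
Products from the same relative orbit (track) share the same viewing geometry, and the track can be recovered from the product name alone. The sketch below is a hypothetical helper, `parse_s1_name`, which is not part of the project. It splits a name into its fields following the Sentinel-1 product naming convention. It then derives the relative orbit from the absolute orbit (the seventh underscore-separated field) using ESA's formula for Sentinel-1A, (absolute - 73) mod 175 + 1. Sentinel-1B uses an offset of 27. The start and stop times in the name are in UTC.

```python
import re
from datetime import datetime

# Sentinel-1 product name: mission, beam mode, product type + resolution,
# level/class/polarisation, start and stop times, absolute orbit, datatake ID, unique ID
S1_NAME = re.compile(
    r"^(?P<mission>S1[AB])_(?P<mode>[A-Z0-9]{2})_(?P<product>[A-Z]{3})(?P<resolution>[FHM_])_"
    r"(?P<level>\d)(?P<klass>[SA])(?P<pol>[SDHV]{2})_"
    r"(?P<start>\d{8}T\d{6})_(?P<stop>\d{8}T\d{6})_"
    r"(?P<abs_orbit>\d{6})_(?P<datatake>[0-9A-F]{6})_(?P<uid>[0-9A-F]{4})$"
)

# Offsets of ESA's absolute-to-relative orbit conversion (175 orbits per repeat cycle)
ORBIT_OFFSET = {"S1A": 73, "S1B": 27}


def relative_orbit(mission, abs_orbit):
    """Relative orbit (track) number from the absolute orbit number."""
    return (abs_orbit - ORBIT_OFFSET[mission]) % 175 + 1


def parse_s1_name(name):
    """Hypothetical helper: split a Sentinel-1 product name into its fields."""
    match = S1_NAME.match(name)
    if match is None:
        raise ValueError(f"Not a Sentinel-1 product name: {name!r}")
    info = match.groupdict()
    info["start"] = datetime.strptime(info["start"], "%Y%m%dT%H%M%S")
    info["stop"] = datetime.strptime(info["stop"], "%Y%m%dT%H%M%S")
    info["abs_orbit"] = int(info["abs_orbit"])
    info["rel_orbit"] = relative_orbit(info["mission"], info["abs_orbit"])
    return info
```

For example, `parse_s1_name("S1A_IW_GRDH_1SDV_20230427T154057_20230427T154122_048284_05CE71_87D9")["rel_orbit"]` gives 87. This makes it straightforward to group the downloaded files by track before comparing reference and post images.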



Each image then goes through the following steps:

1. Subsetting: keep only the area of interest in Gaza City
2. Preprocessing:
    - orbit correction
    - border and thermal noise removal
    - radiometric calibration
    - terrain correction
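
SNAP's Graph Processing Tool (`gpt`) runs chains of operators described in an XML graph. The project's `preprocess_single_sentinel_zip.py` is not shown here, so the following is only a hypothetical sketch of how the steps above could map onto standard SNAP operators (`Subset`, `Apply-Orbit-File`, `Remove-GRD-Border-Noise`, `ThermalNoiseRemoval`, `Calibration`, `Terrain-Correction`). The operator order follows the list above and is an assumption. Only the input and output files, the subset region (a WKT polygon such as the AOI printed earlier) and the GeoTIFF output format are set, so every other parameter keeps SNAP's default. `build_snap_graph` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Operator chain in the order of the list above (assumed, not taken from the project script)
DEFAULT_STEPS = (
    "Subset",
    "Apply-Orbit-File",
    "Remove-GRD-Border-Noise",
    "ThermalNoiseRemoval",
    "Calibration",
    "Terrain-Correction",
)


def build_snap_graph(zip_path, aoi_wkt, out_path, steps=DEFAULT_STEPS):
    """Return a linear SNAP GPT graph as an XML string: Read -> steps -> Write."""
    graph = ET.Element("graph", id="Graph")
    ET.SubElement(graph, "version").text = "1.0"

    chain = [("Read", {"file": zip_path})]
    for op in steps:
        # Only the subset region is configured; other operators keep SNAP defaults
        params = {"geoRegion": aoi_wkt, "copyMetadata": "true"} if op == "Subset" else {}
        chain.append((op, params))
    chain.append(("Write", {"file": out_path, "formatName": "GeoTIFF"}))

    previous = None
    for op, params in chain:
        node = ET.SubElement(graph, "node", id=op)
        ET.SubElement(node, "operator").text = op
        sources = ET.SubElement(node, "sources")
        if previous is not None:
            # Each node consumes the output of the node before it
            ET.SubElement(sources, "sourceProduct", refid=previous)
        parameters = ET.SubElement(node, "parameters")
        for key, value in params.items():
            ET.SubElement(parameters, key).text = value
        previous = op

    return ET.tostring(graph, encoding="unicode")
```

Saving the returned string to `graph.xml` and running `/Applications/esa-snap/bin/gpt graph.xml` would execute the whole chain in a single call. Writing the graph programmatically keeps the AOI and file paths out of a hand-edited XML file.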

The script that downloads a single product (as a zip file) is run with:

```console
python scripts/download_single_sentinel_file.py <url>
```

The script that preprocesses the resulting zip file is run with:

```console
python preprocess_single_sentinel_zip.py <path/to/zipfile.zip> <path/to/aoi.geojson> <output_directory> <path/to/snap/gpt>
```

Both scripts have been combined in the `download_process_single_sentinel_file.py` script.

For a single URL, downloading, preprocessing and deleting the raw file are done by running:

```console
cd scripts

python download_process_single_sentinel_file.py <url.zip> ../utils/AOI_bboxes/aoi_shifa.geojson ../data/preprocessed/sentinel --snap_gpt_path /Applications/esa-snap/bin/gpt
```

To process all the URLs, run the cells in `scripts/download_process_sentinel.ipynb`.
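
The notebook itself is not reproduced here. A minimal, hypothetical equivalent is a loop that calls the combined script once per URL, using the same arguments as the single-URL command above. The sketch assumes it is run from inside `scripts/` and that the caller supplies the product download URLs, which are not listed in this section. `build_command` and `process_all` are illustrative names.

```python
import subprocess
import sys

# Paths taken from the single-URL command above
AOI = "../utils/AOI_bboxes/aoi_shifa.geojson"
OUT_DIR = "../data/preprocessed/sentinel"
GPT = "/Applications/esa-snap/bin/gpt"


def build_command(url, aoi=AOI, out_dir=OUT_DIR, gpt=GPT):
    """Command line for one product, mirroring the single-URL invocation."""
    return [
        sys.executable,
        "download_process_single_sentinel_file.py",
        url,
        aoi,
        out_dir,
        "--snap_gpt_path",
        gpt,
    ]


def process_all(urls, dry_run=False):
    """Run the combined script once per URL; check=True stops at the first failure.
    With dry_run=True the commands are only built and returned."""
    commands = [build_command(url) for url in urls]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Stopping at the first failure makes problems visible immediately. For a long batch, catching `subprocess.CalledProcessError` and collecting the failed URLs for a retry would be a reasonable alternative.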

## B/ UNOSAT labels

The script `scripts/download_footprints_UNOSAT` automates the collection of:

- building footprints from [Ballinger (2024)](https://github.com/oballinger/PWTT/tree/main?tab=readme-ov-file)
- building damage labels from [UNOSAT](https://unosat.org/products/4047)

Running the following command fetches the data and saves it to `data/raw`:

```console
python scripts/download_footprints_UNOSAT
```

```python
from utils.preprocessing_UNOSAT_helpers import convert_layer_to_long_format

gdb_path = "../data/raw/labels/25_02_2025/UNOSAT_GazaStrip_CDA_25February2025.gdb"
layer_name = "Damage_Sites_GazaStrip_20250225"
# output_path = "../data/raw/labels/gaza_unosat_labels.geojson"

unosat_labels_df = convert_layer_to_long_format(gdb_path, layer_name)
```

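
`convert_layer_to_long_format` is a project helper whose internals are not shown. Assume, as its name suggests, that the UNOSAT layer stores one set of columns per assessment (wide format) and that the helper stacks them into one row per site and assessment. The reshaping step can then be sketched with `pandas.wide_to_long`. The column names `damage_class_N` and `sensor_date_N` and the toy values below are hypothetical placeholders, not UNOSAT's actual schema or data.

```python
import pandas as pd

# Toy wide table: one row per damage site, one column pair per assessment
# (hypothetical column names and values)
wide = pd.DataFrame(
    {
        "site_id": [1, 2],
        "damage_class_1": ["Destroyed", "Moderate Damage"],
        "sensor_date_1": ["2024-05-03", "2024-05-03"],
        "damage_class_2": ["Destroyed", None],
        "sensor_date_2": ["2025-02-25", None],
    }
)

# Stack the numbered column pairs into rows: one row per (site, assessment)
long_df = (
    pd.wide_to_long(
        wide,
        stubnames=["damage_class", "sensor_date"],
        i="site_id",
        j="assessment",
        sep="_",
    )
    .reset_index()
    # A site not assessed in a given round has no class: drop that row
    .dropna(subset=["damage_class"])
)
long_df["sensor_date"] = pd.to_datetime(long_df["sensor_date"])
print(long_df)
```

In long format, selecting the labels of one assessment date reduces to a simple filter on `sensor_date`.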

## C/ Building footprints

The building footprints from [Ballinger (2024)](https://github.com/oballinger/PWTT/tree/main?tab=readme-ov-file) are downloaded together with the UNOSAT labels by the same script, `scripts/download_footprints_UNOSAT` (see B/ above), and saved to `data/raw`.