# Data access

## Data preparation

First, we need to download the Sentinel-2 images containing ships that will make up the segmentation dataset.

To select the images and, in a subsequent step, generate the segmentation masks, we use CSV files of AIS records taken at the acquisition date of the corresponding Sentinel-2 images. Each file contains the following fields:

| Field          | Description |
| :------------- | :---------- |
| Type of mobile | Type of target the message was received from (Class A AIS vessel, Class B AIS vessel, etc.) |
| MMSI           | MMSI number of the vessel |
| Latitude       | Latitude of the vessel in degrees (from -90° to 90°) |
| Longitude      | Longitude of the vessel in degrees (from -180° to 180°) |
| Heading        | Heading of the vessel in degrees (from 0° to 359°) |
| Width          | Width of the vessel (meters) |
| Length         | Length of the vessel (meters) |

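The function `download_boat_tile` below encapsulates the whole process of downloading a fixed-size image centered on a ship. The target size is 256 x 256 pixels; note that `image_size` defaults to 128 in the code below, so it must be passed explicitly to obtain that size. The function `bbox_from_centroid` generates the bounding box corresponding to a fixed-size image, so that all tiles share the same dimensions and the database is AI-ready. This bounding box is then used to download the Sentinel-2 L2A image for the right date.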
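To give an intuition for what such a bounding box represents, here is a minimal sketch that approximates a box of `width` x `height` pixels around a point with a simple metres-to-degrees conversion. `approx_bbox_from_centroid` is a hypothetical helper written for illustration only; it is not how `bbox_from_centroid` is implemented in `eotdl`, whose output format and coordinate handling may differ, and the approximation degrades near the poles.

```python
import math

def approx_bbox_from_centroid(lat, lon, pixel_size=10, width=128, height=128):
    """Approximate a (min_lon, min_lat, max_lon, max_lat) box around a point.

    Hypothetical helper for illustration; not eotdl's implementation.
    """
    # Half of the tile extent in metres along each axis
    half_width_m = width * pixel_size / 2
    half_height_m = height * pixel_size / 2

    # One degree of latitude spans roughly 111,320 m
    metres_per_deg_lat = 111_320.0
    # One degree of longitude shrinks with the cosine of the latitude
    metres_per_deg_lon = metres_per_deg_lat * math.cos(math.radians(lat))

    dlat = half_height_m / metres_per_deg_lat
    dlon = half_width_m / metres_per_deg_lon
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)

# Made-up coordinates: a 128 x 128 tile at 10 m per pixel covers about 1280 m per side
bbox = approx_bbox_from_centroid(lat=55.0, lon=10.0)
print(bbox)
```

With a 10 m pixel size, doubling `width` and `height` to 256 would double the side of the box to about 2560 m.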

In [3]:
from eotdl.access import download_sentinel_imagery
from eotdl.tools import bbox_from_centroid
from pathlib import Path

def rename_image(image_name, output_dir, date):
    image_path = Path(output_dir) / f"{image_name}.tif"

    # bug in the naming convention in some cases, cf issue #241
    downloaded_image_path = Path(output_dir) / f"{image_name}_{date}.tif"
    if downloaded_image_path.exists():
        # Rename image
        downloaded_image_path.rename(image_path)

        # Rename json
        downloaded_json_path = Path(output_dir) / f"{image_name}_{date}.json"
        json_path = Path(output_dir) / f"{image_name}.json"
        downloaded_json_path.rename(json_path)

    return image_path


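def download_boat_tile(row, date, output_dir, sensor="sentinel-2-l2a", image_size=128):
    """Download the tile centered on the ship described by `row`.

    Returns the image name, or None if the row is skipped or nothing was downloaded.
    """
    # Keep only Class A/B vessels at least 15 m long, with no missing field
    if row["Type of mobile"] in ["Class A", "Class B"] and row["Length"] >= 15 and not row.isnull().values.any():

        # Get the bounding box (x is given the latitude and y the longitude)
        y = row["Longitude"]
        x = row["Latitude"]
        bbox = bbox_from_centroid(x=x, y=y, pixel_size=10, width=image_size, height=image_size)
        image_name = f"ship_{int(row['MMSI'])}_{date}"

        # Download the image
        download_sentinel_imagery(
            output=output_dir,
            time_interval=date,
            bounding_box=bbox,
            sensor=sensor,
            name=image_name
        )

        # Normalize the file names (see rename_image)
        image_path = rename_image(image_name, output_dir, date)

        # Check that the image is downloaded
        if not image_path.exists():
            image_name = None

        return image_name

    # Row filtered out: no image for this ship
    return None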
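The filtering condition at the top of `download_boat_tile` can be tried in isolation, without downloading anything. The sketch below is only an illustration: `is_eligible` is a hypothetical helper that mirrors that condition, and the sample rows (MMSI numbers, coordinates, dimensions) are made up.

```python
import pandas as pd

def is_eligible(row, min_length=15):
    """Hypothetical helper mirroring the filter used in download_boat_tile."""
    return bool(
        row["Type of mobile"] in ["Class A", "Class B"]
        and row["Length"] >= min_length
        and not row.isnull().values.any()
    )

# Made-up sample rows using the CSV fields described earlier
sample = pd.DataFrame({
    "Type of mobile": ["Class A", "Class B", "Base Station", "Class A"],
    "MMSI": [111111111, 222222222, 333333333, 444444444],
    "Latitude": [55.1, 55.2, 55.3, 55.4],
    "Longitude": [10.1, 10.2, 10.3, 10.4],
    "Heading": [90.0, 180.0, 0.0, None],  # last row has a missing value
    "Width": [10.0, 4.0, 0.0, 12.0],
    "Length": [80.0, 12.0, 0.0, 95.0],
})

# Row 0 passes; row 1 is too short; row 2 is not a vessel; row 3 has a missing field
eligible = sample.apply(is_eligible, axis=1)
print(eligible.tolist())
```

Note that a single missing value anywhere in the row is enough to skip the ship, even if the missing field is not needed for the download itself.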
            


In this case, we have chosen two files whose associated Sentinel-2 images we want to download:

In [4]:
ais_csv = {
    # date: ais_csv_path
    "2022-08-12": "data/ais/crop_S2A_MSIL1C_20220812T103031_N0400_R108_T32UNF_20220812T155113.SAFE.csv",
    "2022-08-25": "data/ais/crop_S2A_MSIL1C_20220825T103641_N0400_R008_T32VPH_20220825T173926.SAFE.csv",
}

sensor = 'sentinel-2-l2a'
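The dates used as keys above match the sensing date embedded in each file name. As a hedged illustration, the date could be derived from the name rather than typed by hand; `sensing_date_from_name` is a hypothetical helper, not part of `eotdl`, and it assumes the file name keeps the standard Sentinel-2 product name, in which the first timestamp is the sensing start time.

```python
import re
from datetime import datetime

def sensing_date_from_name(file_name):
    """Hypothetical helper: extract the sensing date (YYYY-MM-DD) from a file name."""
    # Matches e.g. "_MSIL1C_20220812T103031_" and captures the 8-digit date
    match = re.search(r"_MSIL\d[AC]_(\d{8})T\d{6}_", file_name)
    if match is None:
        raise ValueError(f"No Sentinel-2 sensing time found in {file_name!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").strftime("%Y-%m-%d")

name = "crop_S2A_MSIL1C_20220812T103031_N0400_R108_T32UNF_20220812T155113.SAFE.csv"
date = sensing_date_from_name(name)
print(date)
```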

## Data download

In [5]:
import pandas as pd
from functools import partial

for date, ais_csv_path in ais_csv.items():
    print(f"Downloading images for {Path(ais_csv_path).name} :")
    ais_df = pd.read_csv(ais_csv_path)
    output_dir = Path("data/sentinel_2") 

    # Download boat images found in the csv
    download_function = partial(download_boat_tile, date=date, output_dir=output_dir, sensor=sensor)
    ais_df["ImageId"] = ais_df.apply(download_function, axis=1)

    # Remove lines for which no image is found
    ais_df = ais_df.dropna(axis=0, subset=["ImageId"])

    # Save csv with the updated image paths
    ais_df.to_csv(output_dir / Path(ais_csv_path).name, index=False)

Downloading images for crop_S2A_MSIL1C_20220812T103031_N0400_R108_T32UNF_20220812T155113.SAFE.csv :
Downloading images for crop_S2A_MSIL1C_20220825T103641_N0400_R008_T32VPH_20220825T173926.SAFE.csv :
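The loop above relies on a simple pattern: apply a function row by row, store its result in `ImageId`, then drop the rows where it returned `None`. The sketch below reproduces that pattern without any network access; `fake_download` is a hypothetical stand-in for `download_boat_tile`, and the sample values are made up.

```python
from functools import partial

import pandas as pd

def fake_download(row, date):
    """Hypothetical stand-in for download_boat_tile: a name, or None when skipped."""
    if row["Length"] < 15:
        return None
    return f"ship_{int(row['MMSI'])}_{date}"

# Made-up rows: the second ship is too short and yields no image
df = pd.DataFrame({"MMSI": [111111111, 222222222], "Length": [80.0, 12.0]})

# Bind the date once, then apply the function to every row
df["ImageId"] = df.apply(partial(fake_download, date="2022-08-12"), axis=1)

# Keep only the rows for which an image name was produced
kept = df.dropna(axis=0, subset=["ImageId"])
print(kept)
```

Because rows without an image are dropped before saving, the CSV written to the output directory lists only the ships for which a tile exists.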
