# Uploading MOASA Flight Data
This notebook illustrates the steps involved in uploading the Met Office Atmospheric Survey Aircraft flight data to the Clean Air data store.


It is written to run on an internal Met Office system and reads data from a particular folder location with a specific structure. As a result, the code that locates the data is not generalised and is not expected to be reusable. However, the steps that extract the metadata and upload the data are likely to be helpful to anyone looking to do the same with their own data.


## Config

In [1]:
AIRCRAFT_DATA_LOCATION = '/project/obr/CleanAir'
OBJECT_STORE_BUCKET = 'caf-data'  # Use this for uploading to the live data storage location
# OBJECT_STORE_BUCKET = 'caf-test'  # Use this for testing
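
Rather than commenting and uncommenting the bucket line, the target bucket could be picked up from an environment variable. This is a minimal sketch: the variable name `CAF_OBJECT_STORE_BUCKET` and the helper `choose_bucket` are hypothetical and not part of the clean_air package. It defaults to the test bucket, so forgetting to set the variable cannot write to live storage.

```python
import os
from typing import Mapping, Optional

# The two buckets named in the config cell
LIVE_BUCKET = 'caf-data'
TEST_BUCKET = 'caf-test'


def choose_bucket(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the bucket to upload to, defaulting to the test bucket.

    Hypothetical helper: reads CAF_OBJECT_STORE_BUCKET (an assumed variable name)
    and only accepts the two known bucket names.
    """
    env = os.environ if env is None else env
    bucket = env.get('CAF_OBJECT_STORE_BUCKET', TEST_BUCKET)
    if bucket not in (LIVE_BUCKET, TEST_BUCKET):
        raise ValueError(f"Unknown bucket {bucket!r}")
    return bucket


OBJECT_STORE_BUCKET = choose_bucket()
```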

## Imports & Helper Functions

In [2]:
from pathlib import Path
from typing import Dict, Generator


def find_aircraft_data() -> Generator[Path, None, None]:
    """
    Yields the paths of the netcdf files we're interested in.
    Uses a generator, so results can be processed as they are found rather than after traversing every folder.

    Assumes data is organised in this directory structure: /project/obr/CleanAir/{dataset_name}/processed/

    Specifically, looks for netcdf files in a directory called 'processed' and skips any file whose path
    contains 'old' or 'OLD', passes through a directory named 'raw', or contains '_pbp_'.
    """
    data_file_path = Path(AIRCRAFT_DATA_LOCATION)
    for ds_path in data_file_path.glob('**/processed/*.nc'):
        path_str = str(ds_path)
        if any(bad_str in path_str for bad_str in {'old', 'OLD', '/raw/', '_pbp_'}):
            continue

        yield ds_path


def get_dataset_name(ds_path: Path) -> str:
    """
    Gets the name to use for the dataset, based on the path.
    Assumes data is organised in this directory structure: /project/obr/CleanAir/{dataset_name}/processed/
    """
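    # Absolute paths: parts[4] is {dataset_name} under /project/obr/CleanAir.
    # Relative paths: the dataset directory is the parent of 'processed'.
    return ds_path.parts[4] if ds_path.is_absolute() else ds_path.parent.parent.name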

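The two helpers above can be exercised without access to /project/obr by rebuilding the expected layout in a temporary directory. This is a sketch: `find_nc_files`, `dataset_name` and `is_excluded` are hypothetical stand-ins that take the search root as a parameter instead of reading `AIRCRAFT_DATA_LOCATION`. They apply the exclusion filter to the path relative to that root, so a temporary or home directory whose name happens to contain 'old' cannot exclude everything. `dataset_name` uses `parents[1]`, which gives the same result as `parts[4]` for files directly under /project/obr/CleanAir/{dataset_name}/processed/.

```python
import tempfile
from pathlib import Path
from typing import Generator, List

# Substrings that exclude a file, as in find_aircraft_data
EXCLUDED = ('old', 'OLD', '/raw/', '_pbp_')


def is_excluded(rel_path: Path) -> bool:
    # Prefix with '/' so '/raw/' also matches a top-level 'raw' directory
    rel_str = '/' + rel_path.as_posix()
    return any(bad in rel_str for bad in EXCLUDED)


def find_nc_files(root: Path) -> Generator[Path, None, None]:
    """Yield .nc files in any 'processed' directory below root, skipping excluded paths."""
    for nc_path in root.glob('**/processed/*.nc'):
        if not is_excluded(nc_path.relative_to(root)):
            yield nc_path


def dataset_name(nc_path: Path) -> str:
    """The directory containing 'processed', i.e. {dataset_name}."""
    return nc_path.parents[1].name


def demo() -> List[str]:
    """Build a small fake layout and return the dataset names that survive the filter."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ('M285/processed/flight_v0.nc',        # kept
                    'M286/processed/flight_old.nc',       # 'old' in the file name
                    'M287/processed/flight_pbp_v0.nc',    # '_pbp_' in the file name
                    'M289/raw/processed/flight_v0.nc'):   # under a 'raw' directory
            nc_file = root / rel
            nc_file.parent.mkdir(parents=True, exist_ok=True)
            nc_file.touch()
        return sorted(dataset_name(p) for p in find_nc_files(root))


print(demo())  # ['M285']
```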

## Find & Load the Data

In [3]:
import warnings
import iris

cubes: Dict[str, iris.cube.CubeList] = {}  # iris.load returns a CubeList
paths: Dict[str, Path] = {}
loading_errors: Dict[Path, Exception] = {}  # keyed by file path

for path in find_aircraft_data():

    ds_name = get_dataset_name(path)
    try:
        # Temporarily hide iris warnings from the output, as they get in the way
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", UserWarning)
            cubes[ds_name] = iris.load(str(path))
            paths[ds_name] = path  # Only store this if iris loaded the data successfully

    except Exception as e:
        loading_errors[path] = e
        print(f"Failed to load {path} due to {repr(e)}")
print(f"Loaded {len(cubes)}/{len(cubes) + len(loading_errors)} datasets")


Failed to load /project/obr/CleanAir/M285/processed/clean_air_moasa_data_20210330_M285_v0.nc due to AttributeError("type object 'str' has no attribute 'str'")
Failed to load /project/obr/CleanAir/M286/processed/clean_air_moasa_data_20210331_M286_v0.nc due to AttributeError("type object 'str' has no attribute 'str'")
Failed to load /project/obr/CleanAir/M272/processed/clean_air_moasa_data_20201007_M272_v0.nc due to AttributeError("type object 'str' has no attribute 'str'")
Failed to load /project/obr/CleanAir/M280/processed/clean_air_moasa_data_20210309_M280_v0.nc due to AttributeError("type object 'str' has no attribute 'str'")
Failed to load /project/obr/CleanAir/M281/processed/clean_air_moasa_data_20210317_M281_v0.nc due to AttributeError("type object 'str' has no attribute 'str'")
Failed to load /project/obr/CleanAir/M254 - test flight/processed/clean_air_moasa_data_20191015_M254_v0.nc due to AttributeError("type object 'str' has no attribute 'str'")
Failed to load /project/obr/Clea
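
Every load failure visible above raises the same `AttributeError`, so a per-file listing hides how few distinct problems there are. A short sketch that groups failures by their `repr` using `collections.Counter`; `summarise_errors` is a hypothetical helper, and the example input reuses two of the paths and the error message from the output above. In the notebook it would be called as `summarise_errors(loading_errors)`.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple


def summarise_errors(errors: Dict[Path, Exception]) -> List[Tuple[str, int]]:
    """Count failures by exception repr, most common first."""
    counts = Counter(repr(e) for e in errors.values())
    return counts.most_common()


# Example input taken from the failures printed above
example_errors = {
    Path('/project/obr/CleanAir/M285/processed/clean_air_moasa_data_20210330_M285_v0.nc'):
        AttributeError("type object 'str' has no attribute 'str'"),
    Path('/project/obr/CleanAir/M286/processed/clean_air_moasa_data_20210331_M286_v0.nc'):
        AttributeError("type object 'str' has no attribute 'str'"),
}

for message, count in summarise_errors(example_errors):
    print(f"{count} x {message}")
```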

## Extract Metadata

In [4]:
from clean_air.models import Metadata
from clean_air.data.extract_metadata import extract_metadata

metadata_dict: Dict[str, Metadata] = {}
extraction_errors: Dict[str, Exception] = {}
for ds_name, cube in cubes.items():
    try:
        metadata_dict[ds_name] = extract_metadata(cube, ds_name, ['clean_air:type=aircraft'], [], [])
    except Exception as e:
        extraction_errors[ds_name] = e
        print(f"Failed to extract metadata for {ds_name} due to {repr(e)}")

print(f"Converted {len(metadata_dict)}/{len(cubes)}")

  ret = geos_linearring_from_py(shell)
IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M319 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M247 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M294 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M313 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M295 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M314 due to TypeError("Expect polygon, received <class 'tuple'>")


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M300 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M296 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M256 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M315 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M282 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M297 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M257 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M316 due to TypeError("Expect polygon, received <class 'tuple'>")


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M323 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M258 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M324 due to TypeError("Expect polygon, received <class 'tuple'>")


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M266 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M325 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M292 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M252 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M311 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M326 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M293 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M253 due to TypeError("Expect polygon, received <class 'tuple'>")


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M312 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M320 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M262 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M277 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M322 due to TypeError("Expect polygon, received <class 'tuple'>")


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M278 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M264 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M279 due to TypeError("Expect polygon, received <class 'tuple'>")


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M265 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M251 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M273 due to TypeError("Expect polygon, received <class 'tuple'>")


IllegalArgumentException: Points of LinearRing do not form a closed linestring


Failed to extract metadata for M275 due to ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')
Failed to extract metadata for M276 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M283 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M302 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M298 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M317 due to TypeError("Expect polygon, received <class 'tuple'>")
Failed to extract metadata for M299 due to TypeError("Expect polygon, received <class 'tuple'>")
Converted 0/43
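
Many of the extraction failures are geometry errors: GEOS reports that the points of a LinearRing do not form a closed linestring, or that `GEOSGeom_createLinearRing_r` returned a NULL pointer. We have not confirmed how `extract_metadata` builds its footprint, but common causes are a ring whose last point differs from its first, or one with too few distinct points. Below is a dependency-free sketch that closes a ring and rejects obviously degenerate ones before they reach a geometry library; `close_ring` is a hypothetical helper. It does not check for collinear points (zero area), and the `TypeError("Expect polygon, received <class 'tuple'>")` failures look like a separate problem that it does not address.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def close_ring(points: Sequence[Point]) -> List[Point]:
    """Return a closed copy of a ring, appending the first point if needed.

    Raises ValueError if there are fewer than 3 distinct points, since a closed
    LinearRing needs at least 4 coordinates with the first equal to the last.
    """
    ring = [tuple(p) for p in points]
    distinct = len(set(ring))
    if distinct < 3:
        raise ValueError(f"Need at least 3 distinct points, got {distinct}")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring explicitly
    return ring
```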


## Create DataSets

In [40]:
from clean_air.models import DataSet

datasets: Dict[str, DataSet] = {}

for ds_name, metadata in metadata_dict.items():
    ds_file = paths[ds_name]
    ds = DataSet([ds_file], metadata)
    datasets[ds_name] = ds
    print(f"Created {ds}")
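
Because each stage skips anything that fails, a dataset can silently drop out between loading and upload. A sketch that reports, for each consecutive pair of stages, which names went missing, using set differences; `report_dropouts` is hypothetical. In the notebook it could be called as `report_dropouts({'loaded': cubes.keys(), 'extracted': metadata_dict.keys(), 'created': datasets.keys()})`.

```python
from typing import Dict, Iterable, List


def report_dropouts(stages: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """For each consecutive pair of stages, list names present in the first but missing from the second.

    Relies on dict insertion order to define the order of the stages.
    """
    names = list(stages)
    report: Dict[str, List[str]] = {}
    for earlier, later in zip(names, names[1:]):
        missing = set(stages[earlier]) - set(stages[later])
        report[f"{earlier} -> {later}"] = sorted(missing)
    return report


# Example using dataset names from the extraction output above
print(report_dropouts({'loaded': ['M319', 'M247', 'M300'], 'extracted': [], 'created': []}))
```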



## Upload DataSets

In [42]:
from clean_air.data.storage import create_dataset_store

dataset_store = create_dataset_store(OBJECT_STORE_BUCKET, anon=False)

for ds_name, ds in datasets.items():
    print(f"Uploading {ds}...", end="")
    dataset_store.put(ds)
    print(" Successful")

print("")
print(f"Uploaded {len(datasets)} datasets to {OBJECT_STORE_BUCKET}")


Uploaded 0 datasets to caf-data
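
As written, the upload loop stops at the first `put` that raises, so any remaining datasets are not attempted. A sketch that keeps going, records failures and counts only successful uploads; `upload_all` is hypothetical and accepts any callable, so `dataset_store.put` could be passed in, assuming only that it raises an exception when an upload fails. Usage in the notebook would be `uploaded, upload_errors = upload_all(datasets, dataset_store.put)`.

```python
from typing import Callable, Dict, Mapping, Tuple, TypeVar

T = TypeVar('T')


def upload_all(items: Mapping[str, T], put: Callable[[T], object]) -> Tuple[int, Dict[str, Exception]]:
    """Call put() on every item, continuing past failures.

    Returns the number of successful uploads and a dict of name -> exception.
    """
    failures: Dict[str, Exception] = {}
    uploaded = 0
    for name, item in items.items():
        try:
            put(item)
        except Exception as e:  # keep going so one bad dataset doesn't block the rest
            failures[name] = e
        else:
            uploaded += 1
    return uploaded, failures
```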
