# Uploading MOASA Flight Data
This notebook illustrates the steps involved in uploading the Met Office Atmospheric Survey Aircraft flight data to the Clean Air data store.


It is written to run on an internal Met Office system and to access data in a particular folder location with a specific structure. As a result, the code that locates the data is not generalised and is not expected to be reusable. However, the steps where the metadata is extracted and the data is uploaded are likely to be helpful to those looking to do something similar with their own data.

## Config

In [26]:
import io

AIRCRAFT_DATA_LOCATION = '/project/obr/CleanAir'

## Imports & Helper Functions

In [2]:
import warnings
from pathlib import Path
from typing import Dict, Generator, Tuple

import iris
from clean_air.data.extract_metadata import extract_metadata


def find_aircraft_data() -> Generator[Path, None, None]:
    """
    Yields the paths of the netcdf files we're interested in.
    Uses a generator, so we can process results more efficiently, not having to traverse all the folders before getting any results.

    Assumes data is organised in this directory structure: /project/obr/CleanAir/{dataset_name}/processed/

    Specifically looks for netcdf files in a directory called 'processed' and filters out files with 'old', 'OLD' or '_pbp_'
    in the path, or files within directories named 'raw'.
    """
    data_file_path = Path(AIRCRAFT_DATA_LOCATION)
    for ds_path in data_file_path.glob('**/processed/*.nc'):
        path_str = str(ds_path)
        if any(bad_str in path_str for bad_str in {'old', 'OLD', '/raw/', '_pbp_'}):
            continue

        yield ds_path


def get_dataset_name(ds_path: Path) -> str:
    """
    Gets the name to use for the dataset, based on the path.
    Assumes data is organised in this directory structure: /project/obr/CleanAir/{dataset_name}/processed/
    """
    return ds_path.parts[4] if ds_path.is_absolute() else str(ds_path.parent.stem)


ERROR 1: PROJ: proj_create_from_database: Open of /home/h04/twilson/.conda/envs/cap_env/share/proj failed
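
To see the path filter in isolation, here is a minimal sketch that runs the same glob pattern and substring test against a throwaway directory tree. The dataset names (`M270`, `M271_old`, `M272`) and the helper `find_processed_netcdf` are hypothetical and made up for illustration; only the `**/processed/*.nc` pattern and the excluded substrings come from the cell above.

```python
import tempfile
from pathlib import Path
from typing import Iterator

# Same substrings as the notebook's filter
EXCLUDED_SUBSTRINGS = ('old', 'OLD', '/raw/', '_pbp_')


def find_processed_netcdf(root: Path) -> Iterator[Path]:
    """Yield netCDF files in 'processed' directories, skipping excluded paths."""
    for nc_path in sorted(root.glob('**/processed/*.nc')):
        # Test only the part below root, so the temp dir name cannot trigger a match
        relative = nc_path.relative_to(root).as_posix()
        if any(bad in relative for bad in EXCLUDED_SUBSTRINGS):
            continue
        yield nc_path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: one good file per dataset plus files that should be skipped
    for rel in ('M270/processed/flight.nc',
                'M270/processed/flight_pbp_v1.nc',
                'M271_old/processed/flight.nc',
                'M272/raw/processed/flight.nc',
                'M272/processed/flight.nc'):
        file_path = root / rel
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()

    kept = [p.relative_to(root).as_posix() for p in find_processed_netcdf(root)]
    print(kept)
```

Because the test is a plain substring match, any path that merely contains `old` (for example a directory called `threshold`) would also be skipped, which is worth keeping in mind when adapting the filter to another folder layout.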


## Find & Load the Data

In [3]:
cubes = {}
loading_errors: Dict[str, Exception] = {}


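import os
import sys
import traceback
from contextlib import redirect_stderr, redirect_stdout

# Write progress to a log file rather than the notebook; set to False to print to the notebook instead
LOG_TO_FILE = True
print_stream = open(f'{os.getcwd()}/logfile.txt', 'w') if LOG_TO_FILE else sys.stdout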

with redirect_stdout(print_stream), redirect_stderr(print_stream):
    for path in find_aircraft_data():

        ds_name = get_dataset_name(path)
        try:
            # Uncomment the filter below to hide iris warnings from the output, as they get in the way
            with warnings.catch_warnings():
                # warnings.simplefilter("ignore", UserWarning)
                print(f"Loading {path}")
                cubes[ds_name] = iris.load(str(path))
                print(f"Loaded {path}")

        except Exception as e:
            loading_errors[str(path)] = e
            print(traceback.format_exc())
            print(f"Failed to load {path} due to {repr(e)}")
        print("---")
    print(f"Loaded {len(cubes)}/{len(cubes) + len(loading_errors)} datasets")

# Close the log file, but never close the notebook's own stdout
if print_stream is not sys.stdout:
    print_stream.close()
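
The redirect-and-collect pattern used in this cell can be tried on its own without iris. The sketch below assumes a hypothetical `load_all` helper and a stand-in `fake_loader` in place of `iris.load`; neither is part of the notebook. Opening the log file in a `with` block is a variation on the cell above, chosen so the file is closed even if something unexpected raises.

```python
import tempfile
import traceback
from contextlib import redirect_stderr, redirect_stdout
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def load_all(names: Iterable[str], loader: Callable[[str], object],
             log_path: Path) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Run loader on each name, writing progress to log_path and collecting failures."""
    loaded: Dict[str, object] = {}
    errors: Dict[str, Exception] = {}
    # The with block restores stdout/stderr and closes the log file on exit
    with open(log_path, 'w') as log, redirect_stdout(log), redirect_stderr(log):
        for name in names:
            try:
                print(f"Loading {name}")
                loaded[name] = loader(name)
                print(f"Loaded {name}")
            except Exception as e:
                errors[name] = e
                print(traceback.format_exc())
                print(f"Failed to load {name} due to {repr(e)}")
            print("---")
        print(f"Loaded {len(loaded)}/{len(loaded) + len(errors)} datasets")
    return loaded, errors


def fake_loader(name: str) -> str:
    """Stand-in for iris.load: fails for names containing 'bad'."""
    if 'bad' in name:
        raise ValueError(f"cannot read {name}")
    return name.upper()


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / 'logfile.txt'
    loaded, errors = load_all(['a.nc', 'bad.nc', 'c.nc'], fake_loader, log_file)
    log_text = log_file.read_text()

print(sorted(loaded), sorted(errors))
print(log_text.splitlines()[-1])
```

Keeping the failures in a dictionary, rather than stopping at the first one, means a single unreadable file does not prevent the remaining datasets from loading.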

## Filter Data

In [28]:
for ds_name, cube_list in cubes.items():
    try:
        # Should solve: ValueError('The cube must contain x and y axes.')
        problem_cube = cube_list.extract_cube(iris.Constraint(name="POPS_bin_boundaries"))
        cube_list.remove(problem_cube)
    except Exception as e:
        # Datasets without a POPS_bin_boundaries cube raise ConstraintMismatchError here, which is expected
        print(repr(e))


ConstraintMismatchError("Got 0 cubes for constraint Constraint(name='POPS_bin_boundaries'), expecting 1.")
[the same message repeats once for each dataset without a POPS_bin_boundaries cube; output truncated]

## Extract Metadata

In [33]:
metadata = []
extraction_errors: Dict[str, Exception] = {}
for ds_name, cube in cubes.items():
    try:
        metadata.append(extract_metadata(cube, ds_name, ['clean_air:type=aircraft'], [], []))
    except Exception as e:
        extraction_errors[ds_name] = e
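        # print(f"Failed to convert {ds_name} due to {repr(e)}")
        if repr(e) == "ValueError('GEOSGeom_createLinearRing_r returned a NULL pointer')":
            # Known failure: GEOS could not build a closed ring from this dataset's coordinates.
            # The offending cubes are not filtered out yet, so the dataset is recorded as failed and skipped.
            pass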
    else:
        print(f"{ds_name} worked!")

print(f"Converted {len(metadata)}/{len(cubes)}")

IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring


wind flow vector north component for the AIMMS instrument / (m s-1) (time: 8581)
    Dimension coordinates:
        time                                                             x
    Auxiliary coordinates:
        aircraft GPS height measured by the AIMMS                        x
        latitude                                                         x
        longitude                                                        x
    Attributes:
        ACRONYMS                                                    'AIMMS: Aircraft Integrated Meteorological Measurement System, AQ: Air...
        ADDRESS                                                     'Met Office, FitzRoy Road, Exeter, EX1 3PB'
        CITATION                                                    'Please cite use of these data as follows:  (1) Met Office, Observation...
        COMMENT                                                     'clean_air_MOASA_data_yyyymmdd_Mnnn_vn.nc where yyyymmdd is the flight...
        CO

IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring


wind flow vector north component for the AIMMS instrument / (m s-1) (time: 7201)
    Dimension coordinates:
        time                                                             x
    Auxiliary coordinates:
        aircraft GPS height measured by the AIMMS                        x
        latitude                                                         x
        longitude                                                        x
    Attributes:
        ACRONYMS                                                    'AIMMS: Aircraft Integrated Meteorological Measurement System, AQ: Air...
        ADDRESS                                                     'Met Office, FitzRoy Road, Exeter, EX1 3PB'
        CITATION                                                    'Please cite use of these data as follows:  (1) Met Office, Observation...
        COMMENT                                                     'clean_air_MOASA_data_yyyymmdd_Mnnn_vn.nc where yyyymmdd is the flight...
        CO

IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring
IllegalArgumentException: Points of LinearRing do not form a closed linestring


corrected blue (450nm) scattering (by gas and particles, with dark count subtracted) over 0 - 170 degrees, smoothed to 15s, measured by the Nephelometer / (Mm-1) (time: 10273)
    Dimension coordinates:
        time                                                                                                                                                           x
    Auxiliary coordinates:
        aircraft GPS height measured by the AIMMS                                                                                                                      x
        latitude                                                                                                                                                       x
        longitude                                                                                                                                                      x
    Attributes:
        ACRONYMS                                                              
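
The repeated GEOS message in the output above says that a LinearRing was built from points whose first and last coordinates differ. How `extract_metadata` builds its geometry is not shown in this notebook, so the following is only a sketch of the general check, using plain (longitude, latitude) tuples, made-up coordinates, and hypothetical helper names `is_closed_ring` and `close_ring`. It is not the library's fix.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def is_closed_ring(points: Sequence[Point]) -> bool:
    """A ring is treated as closed when it is non-empty and its first point equals its last."""
    return bool(points) and points[0] == points[-1]


def close_ring(points: Sequence[Point]) -> List[Point]:
    """Return a copy of points with the first point appended if the ring is open."""
    ring = list(points)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring


# Made-up (longitude, latitude) points standing in for a flight track
track = [(-3.47, 50.73), (-3.40, 50.80), (-3.30, 50.75)]
print(is_closed_ring(track))

closed = close_ring(track)
print(is_closed_ring(closed))
```

Whether closing the ring is appropriate depends on what the geometry is meant to represent; for a flight track, a bounding box or convex hull may be a better summary than a ring through the raw points.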