# Uploading AQUM Hourly O3 Data

Similar to the 'uploading_moasa_flight_data' notebook, this notebook explains the steps taken to upload some model data to the Clean Air Data store.

However, in this case we first download the raw netCDF files from the object store, and then re-upload them, together with their associated metadata profiles, to the live data store in a sibling location.

## Imports

In [2]:
import s3fs
import iris
import os
from typing import Dict
from pathlib import Path

## Discover the Data
Setting `anon=False` means you must set up your access credentials before you can read the data; see the 'access_datastore' notebook for help.
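Before connecting, it can help to check whether credentials are available. The sketch below assumes they are supplied through the standard AWS environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`), which the botocore library underneath s3fs reads automatically. The 'access_datastore' notebook may configure them differently (for example via `~/.aws/credentials`), and the helper `have_env_credentials` is hypothetical, not part of s3fs.

```python
import os
from typing import Mapping, Optional


def have_env_credentials(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if both standard AWS credential variables are non-empty.

    Only environment variables are checked; botocore can also pick up
    credentials from ~/.aws/credentials, which this helper does not inspect.
    """
    env = os.environ if env is None else env
    return bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))


if not have_env_credentials():
    print("AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are not set; "
          "s3fs will fall back to other credential sources (e.g. ~/.aws/credentials)")
```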

In [14]:
client_kwargs = {"endpoint_url": "https://caf-o.s3-ext.jc.rl.ac.uk"}
fs = s3fs.S3FileSystem(anon=False, client_kwargs=client_kwargs)
files = fs.ls('aqum-hourly-o3')

## Download the Data
These files are large, so we only download those that aren't already in our local `temp_holder/` directory or in the `caf-data` store.

Use `number_of_files` to specify how many files you want to download at a time, rather than doing them all at once.
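The selection rule described above can be written as a pure function, which makes it easy to check without touching the object store. This is a minimal sketch: `select_files_to_download` is a hypothetical helper, and, like the cell below, it assumes each dataset in `caf-data` is stored under an entry named after the netCDF file's stem (e.g. `aqum_hourly_o3_20190326`).

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def select_files_to_download(remote_files: Iterable[str],
                             local_names: Set[str],
                             uploaded_ids: Set[str],
                             limit: int) -> List[str]:
    """Pick at most `limit` remote files that are neither local nor uploaded."""
    selected: List[str] = []
    for remote in remote_files:
        if len(selected) >= limit:
            break  # stop once we have enough files for this batch
        p = PurePosixPath(remote)  # object-store keys always use '/'
        if p.name in local_names:
            continue  # already downloaded to temp_holder/
        if p.stem in uploaded_ids:
            continue  # a dataset with this ID already exists in caf-data
        selected.append(remote)
    return selected
```

Running it repeatedly with the same `limit` works through the bucket a batch at a time, since each pass skips what is already local or uploaded.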

In [7]:
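number_of_files = 2
local_dir = Path('./temp_holder')
local_dir.mkdir(exist_ok=True)  # make sure the download directory exists

# Dataset IDs already present in caf-data (one entry per dataset)
uploaded_data = {Path(entry).name for entry in fs.ls('caf-data')}
# Files already downloaded to temp_holder/
local_files = set(os.listdir(local_dir))

downloaded = 0
for file in files:
    if downloaded >= number_of_files:
        break  # only fetch number_of_files per run
    if Path(file).name not in local_files and Path(file).stem not in uploaded_data:
        fs.get(file, './temp_holder/')
        downloaded += 1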

## Load the Data
From here on the notebook is mostly identical to 'uploading_moasa_flight_data'.

In [8]:
import warnings
import iris
import iris.cube

# iris.load returns a CubeList, since one file may contain several cubes
cubes: Dict[str, iris.cube.CubeList] = {}
paths: Dict[str, Path] = {}
loading_errors: Dict[str, Exception] = {}

for path in Path('./temp_holder').glob('*.nc'):
    ds_name = path.stem
    try:
        # Temporarily hide iris warnings from the output, as they get in the way
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", UserWarning)
            cubes[ds_name] = iris.load(str(path))
        paths[ds_name] = path  # Only store this if iris loaded the data successfully
        print(f'loaded path {path}')

    except Exception as e:
        loading_errors[ds_name] = e
        print(f"Failed to load {path} due to {repr(e)}")
print(f"Loaded {len(cubes)}/{len(cubes) + len(loading_errors)} datasets")


loaded path temp_holder/aqum_hourly_o3_20190401.nc
loaded path temp_holder/aqum_hourly_o3_20190326.nc
loaded path temp_holder/aqum_hourly_o3_20190327.nc
loaded path temp_holder/aqum_hourly_o3_20190328.nc
loaded path temp_holder/aqum_hourly_o3_20190330.nc
loaded path temp_holder/aqum_hourly_o3_20190329.nc
loaded path temp_holder/aqum_hourly_o3_20190331.nc
Loaded 7/7 datasets


## Extract Metadata

In [9]:
from clean_air.models import Metadata
from clean_air.data.extract_metadata import extract_metadata

metadata_dict: Dict[str, Metadata] = {}
extraction_errors: Dict[str, Exception] = {}
for ds_name, cube in cubes.items():
    try:
        metadata_dict[ds_name] = extract_metadata(cube, ds_name, 
            ['clean_air:type=gridded', 
            'clean_air:horizontal_coverage=UK', 
            'clean_air:horizontal_resolution=0.11 degree', 
            'clean_air:vertical_resolution=63 Levels'], [], ['PP'], 'AQUM Forecast')
    except Exception as e:
        extraction_errors[ds_name] = e
        print(f"Failed to extract metadata for {ds_name} due to {repr(e)}")

print(f"Converted {len(metadata_dict)}/{len(cubes)}")

Converted 7/7
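The `clean_air:...=...` strings passed to `extract_metadata` above follow a `key=value` pattern, so a typo (such as a missing `=`) is easy to miss. As a hedged sketch, the hypothetical helper below, which is not part of `clean_air`, parses such a list into a dictionary and rejects malformed or duplicated entries before extraction is run.

```python
from typing import Dict, List


def parse_keywords(keywords: List[str]) -> Dict[str, str]:
    """Split 'namespace:key=value' strings into a {key: value} dict."""
    parsed: Dict[str, str] = {}
    for kw in keywords:
        key, sep, value = kw.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"keyword {kw!r} is not of the form 'key=value'")
        if key in parsed:
            raise ValueError(f"keyword {key!r} is given more than once")
        parsed[key] = value
    return parsed
```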
## Explore the Metadata
Use this cell to inspect any metadata details you're interested in before uploading.

In [24]:
name = 'aqum_hourly_o3_20190326'

if name in metadata_dict:
    print(metadata_dict[name].extent.temporal.bounds)
else:
    print(f"No metadata found for {name}")

ScalarBounds(lower=real_datetime(2019, 3, 26, 0, 59, 59, 999987), upper=real_datetime(2019, 3, 27, 0, 0))
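Since each file name ends in a date, one quick sanity check is that the extracted temporal bounds fall within that day. The sketch below assumes names end in `_YYYYMMDD` and that the bound values compare like standard `datetime` objects, as the `real_datetime` output above suggests; `date_from_dataset_name` and `bounds_within_day` are hypothetical helpers. The comparisons are inclusive, so an upper bound of exactly midnight on the following day still passes.

```python
from datetime import date, datetime, timedelta


def date_from_dataset_name(name: str) -> date:
    """Parse the trailing _YYYYMMDD stamp, e.g. 'aqum_hourly_o3_20190326'."""
    stamp = name.rsplit("_", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d").date()


def bounds_within_day(name: str, lower: datetime, upper: datetime) -> bool:
    """True if [lower, upper] lies inside the day named in `name` (inclusive)."""
    day_start = datetime.combine(date_from_dataset_name(name), datetime.min.time())
    day_end = day_start + timedelta(days=1)
    return day_start <= lower and upper <= day_end
```

In the notebook it could be applied to every dataset with, for example, `bounds_within_day(name, b.lower, b.upper)` where `b = metadata_dict[name].extent.temporal.bounds`.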


## Create DataSets

In [10]:
from clean_air.models import DataSet

datasets: Dict[str, DataSet] = {}

for ds_name, metadata in metadata_dict.items():
    ds_file = paths[ds_name]
    ds = DataSet([ds_file], metadata)
    datasets[ds_name] = ds
    print(f"Created {ds.metadata.id}")

Created aqum_hourly_o3_20190401
Created aqum_hourly_o3_20190326
Created aqum_hourly_o3_20190327
Created aqum_hourly_o3_20190328
Created aqum_hourly_o3_20190330
Created aqum_hourly_o3_20190329
Created aqum_hourly_o3_20190331


## Upload DataSets


In [13]:
from clean_air.data.storage import create_dataset_store

dataset_store = create_dataset_store('caf-data', anon=False)

uploaded = 0
for ds in datasets.values():
    try:
        print(f"Uploading {ds.metadata.id}...", end="")
        dataset_store.put(ds)
        uploaded += 1
        print("... Successful")
    except Exception as e:
        print(f"... Failed due to {repr(e)}")

print("")
# Count only the datasets that were actually uploaded
print(f"Uploaded {uploaded} datasets to caf-data")

Uploading aqum_hourly_o3_20190401...... Successful
Uploading aqum_hourly_o3_20190326...... Successful
Uploading aqum_hourly_o3_20190327...... Successful
Uploading aqum_hourly_o3_20190328...... Successful
Uploading aqum_hourly_o3_20190330...... Successful
Uploading aqum_hourly_o3_20190329...... Successful
Uploading aqum_hourly_o3_20190331...... Successful

Uploaded 7 datasets to caf-data
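When a batch contains failures, it is useful to keep the failed IDs so they can be retried rather than only counted. The sketch below records successes and failures separately. `InMemoryStore` is a hypothetical stand-in for the object returned by `create_dataset_store` (only its `put` method is mimicked), and plain string IDs stand in for `DataSet` objects.

```python
from typing import Dict, Iterable, List, Tuple


class InMemoryStore:
    """Hypothetical stand-in for a dataset store; fails on chosen IDs."""

    def __init__(self, fail_on: Iterable[str] = ()):
        self.items: Dict[str, bool] = {}
        self.fail_on = set(fail_on)

    def put(self, ds_id: str) -> None:
        if ds_id in self.fail_on:
            raise IOError(f"simulated failure for {ds_id}")
        self.items[ds_id] = True


def upload_all(store, ids: Iterable[str]) -> Tuple[List[str], Dict[str, Exception]]:
    """Upload each ID, returning (succeeded, {failed_id: error})."""
    succeeded: List[str] = []
    failed: Dict[str, Exception] = {}
    for ds_id in ids:
        try:
            store.put(ds_id)
            succeeded.append(ds_id)
        except Exception as e:  # keep going; report failures at the end
            failed[ds_id] = e
    return succeeded, failed
```

Passing `list(failed)` back into `upload_all` retries only the datasets that did not make it.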
