# Extraction

This notebook provides commands to quickly extract catchment-averaged data, or spatial subsets of gridded data, from the model inputs or outputs.

The following aspects are covered:
    
    1. Import required libraries
    2. Extraction of aggregated catchment data to pandas dataframes
    3. Extract gridded datasets
    4. Exercises
        4.1 Extract your own catchment
        4.2 Save the array to netCDF

### 1. Import required libraries

In [None]:
%matplotlib inline
import os

from matplotlib import pyplot as plt

from awrams.utils import config_manager, extents
from awrams.utils import datetools as dt
from awrams.utils.gis import ShapefileDB
from awrams.utils.io.data_mapping import SplitFileManager
from awrams.utils.processing.extract import extract_from_filemanager

In [None]:
sys_settings = config_manager.get_system_profile().get_settings()
base_data_path = sys_settings['DATA_PATHS']['BASE_DATA']
catchment_shapefile = os.path.join(base_data_path, 'spatial/shapefiles/Final_list_all_attributes.shp')

catchments = ShapefileDB(catchment_shapefile)

### 2. Extract and spatially aggregate catchments

In [None]:
training_folder = sys_settings['CLIMATE_DATASETS']['TRAINING']['FORCING']['PATH']
training_folder

In [None]:
# Capture files of the same variable
_, var_name = sys_settings['CLIMATE_DATASETS']['TRAINING']['FORCING']['MAPPING']['precip']

data_path = os.path.join(training_folder, var_name)
pattern = os.path.join(data_path, var_name + '*')

# SplitFileManager needs the full path to the data files
sfm = SplitFileManager.open_existing(data_path, pattern, var_name)

In [None]:
# Specify the period and the parent extent, then collate all catchment extents
period = dt.dates('jul 2010 - jun 2011')

georef = sfm.get_extent()

station_ids = ['204007', '421103', '003303']
extent_map = {sid: catchments.get_extent_by_field('StationID', sid, georef)
              for sid in station_ids}

In [None]:
# Extract the data
df = extract_from_filemanager(sfm, extent_map, period)
df

In [None]:
ax = plt.figure(figsize=(18, 6)).gca()
df.plot(ax=ax)

### 3. Extract catchment gridded data

This step requires `osgeo.ogr` (part of the GDAL Python bindings) to process shapefiles.

In [None]:
## Specify date

period = dt.dates('jul 2010')

In [None]:
# Specify catchment

catchment = extent_map['421103']
catchment.cell_count

In [None]:
data = sfm.get_data(period, catchment)

sfm.close_all()

# The extracted data covers the bounding rectangle that contains the catchment,
# so its grid can hold more cells than catchment.cell_count
data.shape, catchment.cell_count

In [None]:
plt.figure(figsize=(6, 8))
im = plt.imshow(data.mean(axis=0), interpolation='none', cmap=plt.get_cmap('Blues'))
plt.colorbar(im)

In [None]:
type(data)
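
The bounding-rectangle behaviour noted above matters when you compute statistics yourself. The sketch below uses a small synthetic array and assumes that `data` is a NumPy masked array of shape (time, rows, cols) in which cells outside the catchment are masked; check the output of `type(data)` to confirm this on your installation. The values, the mask and all variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for `data`: 2 time steps on a 3 x 4 bounding rectangle
values = np.arange(24, dtype=float).reshape(2, 3, 4)

# Hypothetical catchment mask: True marks cells OUTSIDE the catchment
outside = np.array([[True, False, False, True],
                    [False, False, False, False],
                    [True, True, False, True]])

# Repeat the 2-D mask for every time step and attach it to the values
mask_3d = np.broadcast_to(outside, values.shape).copy()
masked = np.ma.masked_array(values, mask=mask_3d)

# Number of cells inside the catchment (compare with catchment.cell_count)
n_cells = int((~outside).sum())

# Flatten the spatial axes, then average: masked cells are ignored
series = masked.reshape(masked.shape[0], -1).mean(axis=1)

# Naive mean over the whole rectangle, including cells outside the catchment
box_mean = values.reshape(values.shape[0], -1).mean(axis=1)

print(masked.shape, n_cells)
print(series, box_mean)
```

With a masked array the per-time-step mean only uses cells inside the catchment, whereas a plain array averages over the whole rectangle, so the two results differ whenever the catchment does not fill its bounding box.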

In [None]:
## To do: overlay the catchment shapefile on the grid for visualisation purposes
## For now, this can be done with a little extra Python

### 4. Exercises

#### 4.1 Extract any extent from your own shapefile
Put together the extent definition from the fundamentals and carry out the extraction process shown above.

#### 4.2 Save data to a netCDF file
Follow: http://pyhogs.github.io/intro_netcdf4.html