# Exposures and Centroids (using pandas and cartopy)

Prepared by G. Aznar Siguan

In these exercises we will work with the `Exposures` and `Centroids` classes of climada. 

# Exposures class

Looking at the documentation, in the `Attributes` section, we see that this class contains the following attributes:

 * tag (Tag): information about the source data
 * ref_year (int): reference year
 * value_unit (str): unit of the exposures values
 * id (np.array): an id for each exposure
 * coord (np.array or Coordinates): 2d array with lat in first column and lon in second, or Coordinates instance. "lat" and "lon" are descriptors of the latitude and longitude respectively.
 * value (np.array): a value for each exposure
 * impact_id (np.array): impact function id corresponding to each exposure
 * deductible (np.array, default): deductible value for each exposure
 * cover (np.array, default): cover value for each exposure
 * category_id (np.array, optional): category id for each exposure (when defined)
 * region_id (np.array, optional): region id for each exposure (when defined)
 * assigned (dict, optional): for a given hazard, id of the centroid(s) affecting each exposure. Filled in 'assign' method.

Some of the variables are *optional*. This means that climada also works without these variables. For instance, the `region_id` and `category_id` values only provide additional information. The `assigned` variable can be computed if not provided.

Other variables are *default*. These are the ones that receive default values if not provided. The default values are assigned in the `check()` method, and this is automatically called when reading a file. `cover` receives the value of the exposure as default, whilst `deductible` receives zero values.
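
The default rule described above can be sketched with plain numpy. This is only an illustration under stated assumptions: `fill_defaults` is a hypothetical helper written for this notebook, not climada's `check()` implementation.

```python
import numpy as np


def fill_defaults(value, cover=None, deductible=None):
    """Hypothetical helper mimicking the defaults described in the text."""
    value = np.asarray(value, dtype=float)
    # cover defaults to the exposure value itself
    cover = value.copy() if cover is None else np.asarray(cover, dtype=float)
    # deductible defaults to zero for every exposure
    if deductible is None:
        deductible = np.zeros_like(value)
    else:
        deductible = np.asarray(deductible, dtype=float)
    return cover, deductible


cover, deductible = fill_defaults([100.0, 250.0, 40.0])
print('cover:', cover)
print('deductible:', deductible)
```

Values that are passed in explicitly are kept unchanged; only the missing ones are filled.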

In [None]:
from climada.entity import Exposures

help(Exposures)

We start by building the path of the Excel file we will use and storing it in the variable `ENT_TEST_XLS`. You may want to have a look at this file, which is located at the path printed below:

In [None]:
import os
from climada.util import SOURCE_DIR
ENT_TEST_XLS = os.path.join(SOURCE_DIR, 'entity/test/data', 'demo_today.xlsx')
print(ENT_TEST_XLS)

Internally, climada uses pandas DataFrames to read the Excel data.

## On pandas

A short overview of the pandas `DataFrame` capabilities.

Reading an Excel file returns a `DataFrame`:

In [None]:
import pandas as pd
# the sheet which contains the exposures data is assets
dfr = pd.read_excel(ENT_TEST_XLS, 'assets') # Dataframe
dfr

Selecting columns and rows:

To select one column, index the `DataFrame` with the name of the column. To select rows, use slicing as with numpy arrays. To select multiple columns, pass a list of column names:

In [None]:
print('Latitude \n', dfr[:10]['Latitude'])
print()
print('Latitude and Longitude \n',dfr[:10][['Latitude', 'Longitude']])

Notice that a single returned column is a `Series`, and a pandas `Series` stores its data in a numpy array.
Adding `.values` to the end of the `Series` returns that array:

In [None]:
print('Column type:', type(dfr[:10]['Latitude']))
print('Internal numpy array:', dfr[:10]['Latitude'].values)
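
The relation between a `Series` and its numpy array can also be checked on a small made-up table, without reading the Excel file. The `toy` frame and its numbers are invented for this sketch; only the column names follow the `assets` sheet.

```python
import numpy as np
import pandas as pd

# Made-up table that reuses two column names of the 'assets' sheet
toy = pd.DataFrame({'Latitude': [26.9, 26.5, 25.8],
                    'Longitude': [-80.1, -80.3, -80.2]})

lat_series = toy['Latitude']          # a single column is a pandas Series
lat_array = lat_series.values         # numpy array holding the column data
lat_array2 = lat_series.to_numpy()    # equivalent call in recent pandas versions

print(type(lat_series).__name__, type(lat_array).__name__)
print(np.array_equal(lat_array, lat_array2))
```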

What are the maximum and minimum latitudes and longitudes?

In [None]:
print('Min lat lon: \n', dfr[['Latitude', 'Longitude']].min())
print()
print('Max lat lon: \n', dfr[['Latitude', 'Longitude']].max())

Which damage functions (`DamageFunID`) are used? Use `value_counts()` to answer this:

In [None]:
dfr['DamageFunID'].value_counts()

Logical (boolean) indexing is also supported.

Example: return the exposures that satisfy both conditions:
- latitude in [26.5, max(Latitude) - 0.1] 
- longitude in [min(Longitude) + 0.1, -80.5]

In [None]:
# Latitudes in range
sel_lat = (dfr['Latitude'] <= dfr['Latitude'].max() - 0.1) & (dfr['Latitude'] >= 26.5)
# Longitudes in range
sel_lon = (dfr['Longitude'] <= -80.5) & (dfr['Longitude'] >= dfr['Longitude'].min() + 0.1)
# Latitude, Longitude and Value of selected exposures
dfr[['Latitude', 'Longitude', 'Value']][sel_lat & sel_lon]
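
The same kind of selection can be written with `Series.between()` and `.loc`. The sketch below uses a small made-up table so that the result is easy to check by eye; the numbers are invented and are not taken from the demo file.

```python
import pandas as pd

# Made-up exposures; only the column names follow the 'assets' sheet
toy = pd.DataFrame({
    'Latitude':  [25.0, 26.6, 26.8, 27.0],
    'Longitude': [-81.0, -80.8, -80.4, -80.7],
    'Value':     [10.0, 20.0, 30.0, 40.0],
})

# between() includes both end points by default
sel_lat = toy['Latitude'].between(26.5, toy['Latitude'].max() - 0.1)
sel_lon = toy['Longitude'].between(toy['Longitude'].min() + 0.1, -80.5)

# .loc selects rows by boolean mask and columns by name in a single step
selected = toy.loc[sel_lat & sel_lon, ['Latitude', 'Longitude', 'Value']]
print(selected)
```

Combining the two masks with `&` keeps only the rows where both conditions hold. The parentheses around each comparison in the cell above are needed because `&` binds more tightly than `<=` and `>=`.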

pandas also provides plotting functions:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
dfr[['Value']].plot.hist(bins = 3)
plt.xlabel('Value')
plt.ylabel('N')
plt.title('Value histogram');

## On cartopy

Cartopy makes it possible to draw geographical data on maps in different projections.

The instructions `plt.axes(projection=ccrs.PlateCarree())` and `plt.subplot(projection=ccrs.PlateCarree())` set up a `GeoAxes` instance. This is a subclass of the `matplotlib.axes.Axes` class that represents a map projection. As such, it exposes a variety of map-related methods, such as the `coastlines()` method, which adds coastlines to the map.

A list of the available projections to be used with matplotlib can be found on the [Cartopy projection list](http://scitools.org.uk/cartopy/docs/v0.15/crs/projections.html#cartopy-projections) page. PlateCarree is the equirectangular projection.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

plt.figure(figsize=(12, 9))     # set figure size
gs = plt.GridSpec(1, 2)         # define subplots
ax1 = plt.subplot(gs[0, 0], projection=ccrs.PlateCarree()) # axis with PlateCarree projection
ax2 = plt.subplot(gs[0, 1], projection=ccrs.Robinson())    # axis with Robinson projection

ax1.coastlines() # add coast lines to first axis
ax2.coastlines() # add coast lines to second axis

Once you have the map just the way you want it, data can be added to it in exactly the same way as with normal matplotlib axes. Here is an example with the Exposures data:

In [None]:

import pandas as pd
from mpl_toolkits.axes_grid1 import make_axes_locatable
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER

# Data
dfr = pd.read_excel(ENT_TEST_XLS, 'assets') 

# Generate axis with equirectangular projection
ax = plt.axes(projection=ccrs.PlateCarree()) 

# Set axis labels and values format
grid = ax.gridlines(draw_labels=True)
grid.xlabels_top = grid.ylabels_right = False
grid.xformatter = LONGITUDE_FORMATTER
grid.yformatter = LATITUDE_FORMATTER

# Add coastlines with 10m resolution
ax.coastlines(resolution='10m') 

# 2d histogram plot
hex_bin = ax.hexbin(dfr['Longitude'].values, dfr['Latitude'].values, C=dfr['Value'].values, gridsize=15)

# set axis limits: set_extent expects [lon_min, lon_max, lat_min, lat_max]
extent = [dfr['Longitude'].values.min() - 1.5, dfr['Longitude'].values.max() + 1.5,
          dfr['Latitude'].values.min() - 1.5, dfr['Latitude'].values.max() + 1.5]
ax.set_extent(extent, ccrs.PlateCarree())

# Create colorbar in this axis
cbax = make_axes_locatable(ax).append_axes('right', size="6.5%", pad=0.1, axes_class=plt.Axes)
cbar = plt.colorbar(hex_bin, cax=cbax, orientation='vertical')
cbar.set_label('USD')

More examples here: http://scitools.org.uk/cartopy/docs/v0.15/gallery.html

## Back to Exposures

Just like the `ImpactFuncSet` class, `Exposures` can be filled at instantiation or with the `read()` method. An info log appears for every optional variable that has not been filled. By default, the minimum logging level shown is INFO. To modify the logging level or other configuration parameters, create a `climada.conf` file with the same structure as the default `defaults.conf` file and place it in the climada folder (see [Configuration options](https://github.com/davidnbresch/climada_python/blob/master/README.md)).

`Exposures` has a `tag` variable that stores the name(s) of the loaded file(s) and a description of each, if one is provided.

In [None]:
exp = Exposures(ENT_TEST_XLS) # Without description
print('Tag information:')
print(exp.tag)
print()
exp = Exposures(ENT_TEST_XLS, 'Exposures in Florida.') # With description
print('Tag information:')
print(exp.tag)

The values of the exposures can be visualized using the `plot()` function. This function accepts the keyword arguments (`kwargs`) of the [matplotlib hexbin](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hexbin.html) function, so that different options can be easily set. Moreover, the plot functions return the figure and axes, so that they can be modified afterwards.

### EXERCISE

1. Get the mean and standard deviation of the exposure values.
2. Get the coordinates where the minimum and maximum exposure values are reached. 
3. Plot the values with the default settings.
4. Plot the values with a gridsize of 10 and a colormap ("cmap") of your choice: https://matplotlib.org/examples/color/colormaps_reference.html.

In [None]:
# Put your code here





In [None]:
%matplotlib inline
import numpy as np

# 1. the value variable is a numpy array
print('Mean value:', exp.value.mean())
print('Standard deviation:', exp.value.std())

# 2. latitude and longitude can be directly accessed (they are properties of the class)
print('Coordinates maximum value: (lat, lon) = (', exp.lat[np.argmax(exp.value)],',', exp.lon[np.argmax(exp.value)], ')')
print('Coordinates minimum value: (lat, lon) = (', exp.lat[np.argmin(exp.value)],',', exp.lon[np.argmin(exp.value)], ')')

# 3.
exp.plot() # Default configuration

# 4.
exp.plot(gridsize=10, cmap='rainbow')  # Decrease number of bins and change colormap

We have loaded the Excel file without specifying the names of the variables it contains. This is possible because climada defines default variable names for each file type.

To see the names of the variables that have been read, use the `get_def_file_var_names()` method. Under the key `col_name`, the returned dictionary contains a key for each `Exposures` variable, whose value is the corresponding variable name in the Excel file:

In [None]:
exp.get_def_file_var_names('.xls')

Trying to import an Excel file with other variable names produces an ERROR log message and raises a `KeyError`, as follows:

In [None]:
from climada.util import DATA_DIR # folder containing all the test and demo data
WS_EXP = os.path.join(DATA_DIR, 'demo', 'WS_Europe.xls')

try:
    exp_eu = Exposures(WS_EXP)
except KeyError:
    print('Error caught.')
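
Why a `KeyError` appears can be reproduced with pandas alone. In the sketch below, the `col_name` mapping and the `toy` table are hypothetical stand-ins invented for illustration; they are not climada's actual defaults or the content of the file.

```python
import pandas as pd

# Hypothetical mapping: internal variable name -> expected column name
col_name = {'lat': 'Latitude', 'lon': 'Longitude', 'imp': 'DamageFunID'}

# Made-up table that stores the impact function id under another column name
toy = pd.DataFrame({'Latitude': [47.4], 'Longitude': [8.5], 'VulnCurveID': [1]})

try:
    imp_id = toy[col_name['imp']]   # 'DamageFunID' is not a column of toy
except KeyError as err:
    print('Missing column:', err)

# Pointing the mapping at the existing column makes the lookup succeed
col_name['imp'] = 'VulnCurveID'
imp_id = toy[col_name['imp']].values
print(imp_id)
```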

To load an Excel file with different variable names, we can modify the default names and pass the resulting dictionary to the `read()` method.

### EXERCISE

Read the `climada/test/data/demo/WS_Europe.xls` exposures data and plot the exposures values. Notice that this file contains the corresponding impact function id under the variable `VulnCurveID`.

In [None]:
# Put your code here:





In [None]:
# Solution
# 1. Define exposures instance
exp_eu = Exposures()

# 2. Define new variable names:
# Retrieve default variable names
var_names = exp_eu.get_def_file_var_names('.xls')
# Modify variable name
var_names['col_name']['imp'] = 'VulnCurveID'

# 3. Read with new variable names:
description = 'Europe exposures winter storm'
exp_eu.read(WS_EXP, description, var_names)
exp_eu.plot() # plot default
exp_eu.plot(pop_name=False) # plot without populated places names

Two `Exposures` instances can be appended. Since one geographical coordinate can have several exposure values, repeated coordinates are allowed in the `Exposures` class.

In [None]:
print('Number of exposures in Florida: ', exp.id.size)
print('Number of exposures in EU: ', exp_eu.id.size)
exp.append(exp_eu)
print('Number exposures Florida + EU: ', exp.id.size)

Several files can also be read jointly by providing a list of file names or the name of a folder.

In [None]:
from climada.util import ENT_DEMO_MAT
file_names = [ENT_DEMO_MAT, ENT_TEST_XLS]
exp_all = Exposures(file_names)

# Centroids class

The `Centroids` class contains geographical coordinates. This class is used to define the grid (regular or irregular) where the hazard events are going to be resampled. The attributes are the following:

* tag (Tag): information about the source
* coord (np.array or Coordinates): 2d array with lat in first column and lon in second, or Coordinates instance. "lat" and "lon" are descriptors of the latitude and longitude respectively.
* id (np.array): an id for each centroid
* region_id (np.array, optional): region id for each centroid (when defined)
* dist_coast (np.array, optional): distance to coast in km
* admin0_name (str, optional): admin0 country name
* admin0_iso3 (str, optional): admin0 ISO3 country name

The `coord` variable is the one containing the coordinates. It is an instance of the `GridPoints` class (defined in the `util` package), which offers additional functionality, such as checking whether a grid is regular. `coord` can also be used simply as a 2d numpy array.

Climada classes can read different file formats. To check which ones, execute the function `get_sup_file_format()`. This method can be executed on the class directly, since it is a static method:
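
To illustrate working with coordinates as a 2d array, here is a minimal sketch of a regularity check in plain numpy. The function `is_regular` and its definition of "regular" (a complete latitude/longitude mesh with constant spacing) are assumptions made for this example; they are not taken from the `GridPoints` implementation.

```python
import numpy as np


def is_regular(coord):
    """Assumed definition: complete lat/lon mesh with constant spacing."""
    coord = np.asarray(coord, dtype=float)
    lat = np.unique(coord[:, 0])   # first column: latitudes
    lon = np.unique(coord[:, 1])   # second column: longitudes
    n_unique = np.unique(coord, axis=0).shape[0]
    # every (lat, lon) combination must appear exactly once
    if lat.size * lon.size != coord.shape[0] or n_unique != coord.shape[0]:
        return False
    # the spacing must be constant along each axis
    lat_ok = np.allclose(np.diff(lat), np.diff(lat)[:1])
    lon_ok = np.allclose(np.diff(lon), np.diff(lon)[:1])
    return bool(lat_ok and lon_ok)


mesh = np.array([[lat, lon] for lat in (0.0, 1.0, 2.0) for lon in (10.0, 11.0)])
scattered = np.array([[0.0, 10.0], [1.0, 11.0], [3.0, 10.5]])
print(is_regular(mesh), is_regular(scattered))
```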

In [None]:
%matplotlib inline
from climada.hazard import Centroids

Centroids.get_sup_file_format()

### EXERCISE

As with the `Exposures` and `ImpactFuncSet` classes, do the following:
 - read climada's GLB_CENTROIDS_MAT. This file contains a grid of centroids over the whole Earth and is used as default data when no centroids are provided to climada.
 - plot the centroids

In [None]:
# Put your code here:





In [None]:
# Solution:
from climada.util import GLB_CENTROIDS_MAT

glb_cent = Centroids(GLB_CENTROIDS_MAT)

glb_cent.plot() # Plot with default settings

fig, ax = glb_cent.plot(s=0.9) # Change marker size
ax.set_xlim(-12, 25)      # Zoom Europe
ax.set_ylim(36, 72)

When computing the impacts, the values defined in `Exposures` are interpolated to the `Centroids` coordinates. This is done with the `assign()` method of the Exposures. Currently climada supports nearest-neighbor interpolation with two different distance definitions:

In [None]:
# the interpolation module contains the interpolation functions
from climada.util.interpolation import METHOD, DIST_DEF
print('Methods', METHOD)
print('Distances', DIST_DEF)

The `approx` implementation is fast for small amounts of data, whilst `haversine` is faster for large amounts of data. `haversine` is used in most cases, for instance in the `assign()` method.

The Earth centroids `GLB_CENTROIDS_MAT` contain 1,656,093 points, and the European entities for winter storms contain 6,187 points. Let's measure how much time it takes to compute the nearest neighbors:
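
To make the two distance definitions concrete, here is a brute-force sketch in pure Python. It assumes that `approx` refers to a flat-earth (equirectangular) approximation, which is a guess on our side; the functions below are illustrative and are not climada's implementation, and the sketch says nothing about the relative speed of the real methods.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def equirect_km(lat1, lon1, lat2, lon2):
    """Flat-earth approximation, reasonable only for nearby points."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


def nearest_index(point, centroids, dist=haversine_km):
    """Index of the centroid closest to point, found by exhaustive search."""
    return min(range(len(centroids)), key=lambda i: dist(*point, *centroids[i]))


# Made-up coordinates as (lat, lon) pairs
centroids = [(26.0, -80.0), (26.5, -80.5), (47.0, 8.0)]
exposure = (26.4, -80.4)
print(nearest_index(exposure, centroids))
print(nearest_index(exposure, centroids, dist=equirect_km))
```

For points that are close together, both distances select the same neighbor; they only start to differ noticeably over long distances or near the poles.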

In [None]:
# Time the nearest-neighbor search between the global centroids
# and the European winter storm exposures.
import time
time0 = time.time()
# The results are the closest centroids indexes for each exposure
interp_val = glb_cent.coord.resample_nn(exp_eu.coord)
timef = time.time()
print('The resampling took', timef - time0, 'seconds')
print('Result size:', interp_val.size)
print('Closest centroid of the first exposure:', glb_cent.lat[interp_val[0]], glb_cent.lon[interp_val[0]])

Two `Centroids` instances can be appended as well with the `append()` method. Since there cannot be two centroids with the same coordinates, a check is performed to remove duplicates.
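
The difference between appending with and without duplicate removal can be sketched with numpy arrays of coordinates. The arrays below are made up, and the approach (exact comparison of coordinate pairs with `np.unique`) is an assumption for illustration, not necessarily how `append()` performs its check.

```python
import numpy as np

# Made-up (lat, lon) pairs; the first row of cent_b repeats a row of cent_a
cent_a = np.array([[13.0, -59.5], [13.1, -59.5]])
cent_b = np.array([[13.1, -59.5], [13.2, -59.5]])

# Plain stacking keeps repeated coordinates (as allowed for exposures)
stacked = np.vstack([cent_a, cent_b])

# Keep only the first occurrence of each coordinate pair, in original order
_, first_idx = np.unique(stacked, axis=0, return_index=True)
merged = stacked[np.sort(first_idx)]

print(stacked.shape[0], merged.shape[0])
```

Note that this comparison is exact: two coordinates that differ only by rounding error would both be kept.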

### EXERCISE

Read, append and plot the following two centroids files: `HAZ_DEMO_MAT` and `BRB_CENT`.

In [None]:
from climada.util import HAZ_DEMO_MAT, SOURCE_DIR
BRB_CENT = os.path.join(SOURCE_DIR, 'hazard/centroids/test/data', 'centr_brb_test.mat')

# Put your code here





In [None]:
# SOLUTION:
from climada.util import HAZ_DEMO_MAT, SOURCE_DIR
BRB_CENT = os.path.join(SOURCE_DIR, 'hazard/centroids/test/data', 'centr_brb_test.mat')

# Option 1: Read individually and append
fl_centr = Centroids(HAZ_DEMO_MAT)
brb_centr = Centroids(BRB_CENT)
fl_centr.append(brb_centr)
fl_centr.plot()

# Option 2: Read both together
all_centr = Centroids([HAZ_DEMO_MAT, BRB_CENT])
all_centr.plot(s=5);

### EXERCISE

From the last generated centroids, get the id of the centroid closest to (lat, lon) = (13.7, 60).

In [None]:
# Put your code here





In [None]:
# SOLUTION:
print('Closest centroid id:', all_centr.get_nearest_id(13.7, 60))