# Variable values and conflict data

## Preparations

In [1]:
from copro import utils, pipeline

import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import os, sys
import warnings
warnings.simplefilter("ignore")

For better reproducibility, the version numbers of all key packages are provided.

In [2]:
utils.show_versions()

Python version: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 01:53:57) [MSC v.1916 64 bit (AMD64)]
copro version: 0.0.6b
geopandas version: 0.8.0
xarray version: 0.15.1
rasterio version: 1.1.0
pandas version: 1.0.3
numpy version: 1.18.1
scikit-learn version: 0.23.2
matplotlib version: 3.2.1
seaborn version: 0.11.0
rasterstats version: 0.14.0


In the cfg-file, all the settings for the analysis are defined. By 'parsing' (i.e. reading) it, all settings and file paths are known to the model. This is a simple way to make the code independent of the input data and settings.
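As a rough illustration of what parsing such a file involves, the sketch below reads an INI-style settings string with Python's standard `configparser`. It assumes the cfg-file follows that layout. Only the `('general', 'input_dir')` and `('pre_calc', 'XY')` entries are taken from queries made later in this notebook; `output_dir` and all values are hypothetical.

```python
import configparser

# Hypothetical cfg content. Only ('general', 'input_dir') and ('pre_calc', 'XY')
# appear in this notebook; 'output_dir' and all values are assumptions.
example_cfg = """
[general]
input_dir = ./example_data
output_dir = ./OUT

[pre_calc]
XY =
"""

parser = configparser.RawConfigParser()
parser.read_string(example_cfg)

# Every setting is retrieved by section and key, so the analysis code
# itself never needs hard-coded paths.
input_dir = parser.get('general', 'input_dir')
xy_file = parser.get('pre_calc', 'XY')  # empty string: no pre-computed file given

print(input_dir)
print(repr(xy_file))
```

Changing a path or option then only requires editing the cfg-file, not the code.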

In [3]:
settings_file = 'example_settings.cfg'

Based on this cfg-file, the set-up of the run can be initialized. Among other things, the cfg-file specifies the output folder, which is created during this step.

In [4]:
config, out_dir = utils.initiate_setup(settings_file)

saving output to folder C:\Users\hoch0001\Documents\_code\copro\example\OUT

no conflict file was specified, hence downloading data from http://ucdp.uu.se/downloads/ged/ged201-csv.zip to C:\Users\hoch0001\Documents\_code\copro\example\example_data\UCDP\ged201-csv.zip



To be able to continue from the previous notebook, some of its output has to be read in again.

In [5]:
conflict_gdf = gpd.read_file(os.path.join(out_dir, 'selected_conflicts.shp'))
selected_polygons_gdf = gpd.read_file(os.path.join(out_dir, 'selected_polygons.shp'))

## Read the files and store the data

This is an essential part of the code. Here, we go through all model years as specified in the cfg-file and do the following:

1. Determine, as a 0/1 classifier, whether a conflict took place in each geographical unit (here: water province);
2. Loop through the files with climate or environmental variables, and compute the mean variable value per geographical unit (here: water province).

This information is stored in an XY-array, which is then split into two arrays. The X-array holds all climate/environmental variable values per polygon per year, while the Y-array holds the binary classifier indicating whether a conflict took place. If any variable contains no data for a given water province, that data point is dropped entirely.
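The flagging, dropping and splitting described above can be sketched with pandas and numpy. This is a minimal illustration on made-up values, not copro's internal implementation. The column names mirror the variables listed in the log output, while `polygons_with_conflict` and the column ordering are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical records for one year; values are made up for illustration.
records = pd.DataFrame({
    'poly_ID': [1, 2, 3, 4],
    'total_evaporation': [0.8, 1.1, np.nan, 0.5],
    'precipitation': [1.2, 0.9, 0.7, 0.4],
    'temperature': [298.0, 301.5, 299.2, 303.1],
})

# Assumed set of polygons with at least one recorded conflict in this year.
polygons_with_conflict = {2, 4}
records['conflict'] = records['poly_ID'].isin(polygons_with_conflict).astype(int)

# Drop every data point (row) in which any variable has no data.
XY = records.dropna().to_numpy()

# Split: the variable columns form X, the last column is the 0/1 label Y.
# (poly_ID in column 0 is left out of X in this sketch.)
X = XY[:, 1:-1]
Y = XY[:, -1].astype(int)

print(X.shape, Y)
```

Polygon 3 is removed because its evaporation value is missing, so three data points remain.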

Since no npy-file with a pre-computed XY-array is specified in the cfg-file (the setting queried below is empty), the provided input files are read.

In [6]:
config.get('pre_calc', 'XY')

''

Now let's get to it:

In [7]:
X, Y = pipeline.create_XY(config, selected_polygons_gdf, conflict_gdf)

{'poly_ID': Series([], dtype: float64), 'poly_geometry': Series([], dtype: float64), 'total_evaporation': Series([], dtype: float64), 'precipitation': Series([], dtype: float64), 'temperature': Series([], dtype: float64), 'irr_water_demand': Series([], dtype: float64), 'conflict': Series([], dtype: int32)}

INFO: reference run
reading data for period from 2000 to 2015


entering year 2000

listing the geometry of all geographical units
calculating mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2000
calculating mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2000
calculating mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\temperature_monthAvg_outp

calculating mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2009
calculating mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2009
calculating mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\irrWaterDemand.nc for year 2009

entering year 2010

listing the geometry of all geographical units
calculating mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2010
calculating mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\example_data\precipitation_monthTot_output_

At the end of this function, the resulting XY-array is by default stored in the input directory. It is saved before being split into X and Y, which keeps storage simple. This is handy because reading the files and storing the data does not need to be repeated, at least as long as the settings do not change!

In [8]:
os.path.isfile(os.path.join(os.path.abspath(config.get('general', 'input_dir')), 'XY.npy'))

True
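The idea of reusing a stored array instead of re-reading all input files can be sketched as follows. This is a generic caching pattern built on `numpy.save` and `numpy.load`, not copro's actual code. The helper `load_or_create_XY` and the stand-in function `expensive_read` are hypothetical.

```python
import os
import tempfile
import numpy as np

def load_or_create_XY(npy_path, create_func):
    """Return the cached array if npy_path exists; otherwise create and store it."""
    if os.path.isfile(npy_path):
        return np.load(npy_path)
    XY = create_func()
    np.save(npy_path, XY)
    return XY

calls = []

def expensive_read():
    # Stand-in for the slow reading of all input files (hypothetical).
    calls.append(1)
    return np.arange(6, dtype=float).reshape(3, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, 'XY.npy')
    first = load_or_create_XY(npy_path, expensive_read)   # computes and saves
    second = load_or_create_XY(npy_path, expensive_read)  # loads from disk

print(len(calls))
```

A cache like this is only valid while the settings stay the same. After changing them, the stored file should be removed or ignored so that the data is read afresh.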