# Variable values and conflict data

## Preparations

In [1]:
from copro import utils, pipeline

%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import os, sys
import warnings
warnings.simplefilter("ignore")

For better reproducibility, the version numbers of all key packages are provided.

In [2]:
utils.show_versions()

Python version: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 01:53:57) [MSC v.1916 64 bit (AMD64)]
copro version: 0.0.7b
geopandas version: 0.8.0
xarray version: 0.15.1
rasterio version: 1.1.0
pandas version: 1.0.3
numpy version: 1.18.1
scikit-learn version: 0.23.2
matplotlib version: 3.2.1
seaborn version: 0.11.0
rasterstats version: 0.14.0


### The configurations-file (cfg-file)

In the cfg-file, all the settings for the analysis are defined. By 'parsing' (i.e. reading) it, all settings and file paths are known to the model. This is a simple way to make the code independent of the input data and settings.

**Note** that the cfg-file can be stored anywhere; it does not have to be in the same directory as the model data (as it is in this example). If the cfg-file uses relative paths, make sure to update them when you move the cfg-file to a different folder.
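How such a file is parsed can be illustrated with Python's standard `configparser` module. That CoPro relies on exactly this parser internally is an assumption here, although the `config.get(section, option)` calls used in this notebook are consistent with it. The file content below is a made-up excerpt, not the real `example_settings.cfg`; only the names `[general] output_dir` and `[pre_calc] XY` are taken from this notebook.

```python
import configparser

# Hypothetical excerpt of a cfg-file; the values are illustrative only.
cfg_text = """
[general]
output_dir = ./OUT

[pre_calc]
XY =
"""

# Parse the settings from a string (a real run would read them from a file).
config_sketch = configparser.RawConfigParser()
config_sketch.read_string(cfg_text)

# Every setting is accessed by section and option name and returned as a string.
output_dir = config_sketch.get('general', 'output_dir')
xy_file = config_sketch.get('pre_calc', 'XY')  # empty string: no file specified

print(repr(output_dir), repr(xy_file))
```

Because all paths live in the cfg-file, switching to other input data only requires editing that file, not the code.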

In [3]:
settings_file = 'example_settings.cfg'

Based on this cfg-file, the set-up of the run can be initialized. One part of the cfg-file is the specification and creation of an output folder.

In [4]:
config, out_dir = utils.initiate_setup(settings_file)


#### CoPro version 0.0.7b ####
#### For information about the model, please visit https://copro.readthedocs.io/ ####
#### Copyright (2020-2020): Jannis M. Hoch, Sophie de Bruin, Niko Wanders ####
#### Contact via: j.m.hoch@uu.nl ####
#### The model can be used and shared under the MIT license ####

INFO: verbose mode on: False
INFO: saving output to folder C:\Users\hoch0001\Documents\_code\copro\example\OUT


To be able to continue from the previous notebook, some output has to be read in again.

In [5]:
conflict_gdf = gpd.read_file('conflicts.shp')
selected_polygons_gdf = gpd.read_file('polygons.shp')

## Read the files and store the data

This is an essential part of the code. Here, we go through all model years as specified in the cfg-file and do the following:

1. Get a 0/1 classifier indicating whether a conflict took place in a geographical unit (here a water province) or not;
2. Loop through various files with climate or environmental variables, and get the mean variable value per geographical unit (here a water province).

This information is stored in an XY-array, which is then split into two arrays. The X-array holds all climate/environmental variable values per polygon per year, while the Y-array holds the binary classifier indicating whether conflict took place or not. If a variable contains no data for a given water province, that data point is dropped entirely.
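The two steps and the subsequent split can be sketched with plain NumPy for a single year and a single variable. This is a simplified stand-in, not CoPro's implementation: the toy rasters, the polygon IDs, the set of conflict polygons, and the column layout (variables first, classifier last) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy data: one variable on a 3x4 grid and a matching grid of
# polygon IDs (0, 1, 2) stating which polygon each cell belongs to.
values = np.array([[1.0, 2.0, 5.0, 5.0],
                   [3.0, 4.0, 5.0, 5.0],
                   [np.nan, np.nan, 7.0, 9.0]])
polygon_ids = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1],
                        [2, 2, 1, 1]])
all_polygons = np.array([0, 1, 2])
conflict_polygons = [1]  # hypothetical: conflict was recorded in polygon 1 only


def polygon_mean(cell_values):
    """Mean of the valid cells of one polygon, NaN if there are none."""
    valid = cell_values[~np.isnan(cell_values)]
    return valid.mean() if valid.size else np.nan


# Mean variable value per polygon.
means = np.array([polygon_mean(values[polygon_ids == i]) for i in all_polygons])

# 0/1 classifier per polygon.
conflict = np.isin(all_polygons, conflict_polygons).astype(int)

# Combine into one XY-array with one row per polygon.
XY = np.column_stack([means, conflict])

# Drop every row that contains missing data (polygon 2 has no valid cells).
XY = XY[~np.isnan(XY).any(axis=1)]

# Split into variable values (X) and the binary classifier (Y).
X = XY[:, :-1]
Y = XY[:, -1].astype(int)
```

Here polygon 2 is removed because all of its cells are missing, so `X` and `Y` end up with one row each for polygons 0 and 1.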

Since we did not specify an npy-file in the cfg-file, the provided files are read per year.

In [6]:
config.get('pre_calc', 'XY')

''

Now let's get to it:

In [7]:
X, Y = pipeline.create_XY(config, selected_polygons_gdf, conflict_gdf)

INFO: reading data for period from 2000 to 2015
INFO: entering year 2000
INFO: entering year 2001
INFO: entering year 2002
INFO: entering year 2003
INFO: entering year 2004
INFO: entering year 2005
INFO: entering year 2006
INFO: entering year 2007
INFO: entering year 2008
INFO: entering year 2009
INFO: entering year 2010
INFO: entering year 2011
INFO: entering year 2012
INFO: entering year 2013
INFO: entering year 2014
INFO: entering year 2015
INFO: all data read
INFO: saving XY data by default to file C:\Users\hoch0001\Documents\_code\copro\example\OUT\XY.npy


At the end of this function, the resulting XY-array (i.e. before splitting, to make it easier) is by default stored to the output directory. This is handy because the file reading and data storing do not need to be repeated, at least as long as the settings do not change!

In [8]:
os.path.isfile(os.path.join(os.path.abspath(config.get('general', 'output_dir')), 'XY.npy'))

True

**Note** that the XY.npy can be stored anywhere as long as its location is correctly specified in the cfg-file.
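The read-or-compute behaviour described in this section can be sketched as follows. The function `load_or_create_XY` and its arguments are hypothetical names introduced for illustration and are not part of CoPro; the sketch only assumes that the array is written and read with NumPy's `np.save` and `np.load`.

```python
import os
import tempfile

import numpy as np


def load_or_create_XY(xy_setting, out_dir, create_func):
    """Return the XY-array, reading it from file if a path is given.

    xy_setting  -- path to an npy-file, or '' if none was specified
    out_dir     -- folder to which a newly created array is saved
    create_func -- callable that builds the array from the input files
    """
    if xy_setting != '':
        # A file was specified: skip the file reading and load the array.
        return np.load(xy_setting)
    # No file specified: build the array and save it for later runs.
    XY = create_func()
    np.save(os.path.join(out_dir, 'XY.npy'), XY)
    return XY


# First run: no npy-file specified, so the array is created and saved.
out_dir = tempfile.mkdtemp()
first = load_or_create_XY('', out_dir,
                          lambda: np.array([[2.5, 0.0], [6.0, 1.0]]))

# Later run: the saved file is specified, so nothing is recomputed.
saved_file = os.path.join(out_dir, 'XY.npy')
second = load_or_create_XY(saved_file, out_dir, lambda: None)
```

The saved array is only valid for the settings it was created with; after changing the settings, the file should be recreated.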