# Samples matrix and target values

In this notebook, we will show how CoPro reads the samples matrix and target values needed to establish a machine-learning model.

## Preparations

Start with loading the required packages.

In [1]:
from copro import utils, pipeline, data

%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import os, sys
import warnings
warnings.simplefilter("ignore")

For better reproducibility, the version numbers of all key packages are provided.

In [2]:
utils.show_versions()

Python version: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 01:53:57) [MSC v.1916 64 bit (AMD64)]
copro version: 0.0.7b
geopandas version: 0.8.0
xarray version: 0.15.1
rasterio version: 1.1.0
pandas version: 1.0.3
numpy version: 1.18.1
scikit-learn version: 0.23.2
matplotlib version: 3.2.1
seaborn version: 0.11.0
rasterstats version: 0.14.0


To run this notebook, some of the previously saved data need to be loaded.

In [3]:
conflict_gdf = gpd.read_file('conflicts.shp')
selected_polygons_gdf = gpd.read_file('polygons.shp')

### The configuration file (cfg-file)

To continue the simulation with the same settings as in the previous notebook, the cfg-file must be read again and the model re-initialised.

In [4]:
settings_file = 'example_settings.cfg'

In [5]:
config, out_dir, root_dir = utils.initiate_setup(settings_file)


#### CoPro version 0.0.7b ####
#### For information about the model, please visit https://copro.readthedocs.io/ ####
#### Copyright (2020-2020): Jannis M. Hoch, Sophie de Bruin, Niko Wanders ####
#### Contact via: j.m.hoch@uu.nl ####
#### The model can be used and shared under the MIT license ####

INFO: verbose mode on: True
INFO: saving output to folder C:\Users\hoch0001\Documents\_code\copro\example\./OUT
DEBUG: remove files in folder C:\Users\hoch0001\Documents\_code\copro\example\OUT
DEBUG: sparing XY.npy


## Read the files and store the data

### Background

This is an essential part of the code. For a machine-learning model to work, it requires a samples matrix (X), representing the 'drivers' of conflict, and target values (Y) representing the conflicts themselves. By fitting a machine-learning model, a relation between X and Y is established, which in turn can be used to make projections.

Additional information can be found on [scikit-learn](https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics).
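As a minimal sketch of this fit-then-predict idea, the following uses scikit-learn's `RandomForestClassifier` on synthetic data. The classifier choice, the three random 'drivers' and the rule that generates Y are assumptions for illustration only. This notebook does not show which classifier CoPro actually uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins: 200 rows (polygon-years) with 3 hypothetical drivers
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 3))
# Hypothetical target: conflict is more likely when the first driver is high
Y = X[:, 0] + 0.5 * rng.normal(size=200) > 0.5

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, Y)                    # establish the relation between X and Y
# Apply the fitted relation; for brevity the first five training rows are
# reused here, whereas projections would use new, unseen X-data
Y_pred = clf.predict(X[:5])
p_conflict = clf.predict_proba(X[:5])[:, 1]  # probability of the True class
```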

Since CoPro simulates conflict risk not only globally but also spatially explicitly for the provided polygons, it must also be possible to associate each polygon with its corresponding data points in X and Y.
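One simple way to keep that association is to carry the polygon ID along as a column of X. Per-row predictions can then be aggregated per polygon. The sketch below uses hypothetical toy data. The IDs, values and predictions are made up, and the aggregation shown (fraction of years with predicted conflict) is only one possible choice.

```python
import numpy as np

# Hypothetical toy data: 3 polygons over 2 years, so 6 rows.
# Column 0 holds the polygon ID; geometry is omitted to keep the array numeric.
X = np.array([
    [101, 0.2],
    [102, 1.5],
    [103, 0.7],
    [101, 0.3],
    [102, 1.8],
    [103, 0.6],
])
Y_pred = np.array([False, True, False, False, True, True])

poly_ids = X[:, 0].astype(int)
# Fraction of years in which each polygon was predicted to be in conflict
risk_per_polygon = {
    int(pid): float(Y_pred[poly_ids == pid].mean())
    for pid in np.unique(poly_ids)
}
print(risk_per_polygon)
```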

### Implementation

CoPro iterates over all model years specified in the cfg-file. For each year, it loops over all polygons remaining after the selection procedure (see previous notebook) and does the following to obtain the X-data:

1. Assign an ID to the polygon and retrieve its geometry;
2. Calculate the mean value per polygon from each input file specified in the 'data' section of the cfg-file.
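The X-data loop can be sketched as follows under simplifying assumptions. Each yearly input is a small in-memory grid, and each polygon is already rasterised to a boolean mask on that grid. All data are hypothetical. In practice, polygon means are typically computed as zonal statistics with a package such as rasterstats (listed among the package versions above). The log output below also shows that CoPro log-transforms these means. That transform is omitted here because its exact form is not shown in this notebook.

```python
import numpy as np

# Hypothetical input: one 2-D grid per year (e.g. yearly mean precipitation),
# with NaN marking cells without data
grids = {
    2000: np.array([[1.0, 2.0, np.nan], [4.0, 5.0, 6.0]]),
    2001: np.array([[2.0, 3.0, np.nan], [5.0, 6.0, 7.0]]),
}
# Hypothetical polygons, each rasterised to a boolean mask on the same grid
polygon_masks = {
    1: np.array([[True, True, True], [False, False, False]]),
    2: np.array([[False, False, False], [True, True, True]]),
}

rows = []
for year, grid in grids.items():                 # loop over model years
    for poly_id, mask in polygon_masks.items():  # loop over selected polygons
        # mean of all valid (non-NaN) grid cells inside the polygon
        rows.append((year, poly_id, np.nanmean(grid[mask])))
```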

To obtain the Y-data:

1. Assign a Boolean value indicating whether or not a conflict took place in the polygon; the number of casualties or conflicts per year is not relevant in this case.
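A minimal sketch of the Y-data step follows. It assumes the conflict events have already been matched to polygon IDs through a spatial join, which is not shown here. The IDs and fatality counts are hypothetical. Only the presence of at least one event in a polygon matters, not how many events or casualties there were.

```python
# Hypothetical conflict records for one year: (polygon ID, number of fatalities)
conflicts_2001 = [(102, 3), (102, 40), (105, 1)]
selected_polygons = [101, 102, 103, 104, 105]

# Only presence matters: the set drops duplicate polygons and ignores counts
polygons_with_conflict = {pid for pid, _ in conflicts_2001}
Y_2001 = [pid in polygons_with_conflict for pid in selected_polygons]
```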

This information is stored in an X-array and a Y-array. The X-array has 2+n columns: the first two hold the polygon ID and geometry, and n denotes the number of variables (features) used as model input. The Y-array has only one column.
In both arrays, the number of rows equals the number of years times the number of polygons. If a row contains a missing value, the entire row is removed from the XY-array.
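The row count can be checked against the log output of the `create_XY` call below. It reports 4384 data points including missing values for the 16 model years 2000 to 2015. This corresponds to 4384 / 16 = 274 polygons. After rows with missing values are removed, 4272 data points remain, so 112 rows were dropped. The sketch below shows the mechanics of stacking years and dropping incomplete rows with NumPy. Sizes and values are hypothetical, and the geometry column is left out so that the array stays numeric (1+n columns instead of 2+n).

```python
import numpy as np

n_years, n_polygons, n_vars = 3, 4, 2   # hypothetical sizes
n_rows = n_years * n_polygons           # one row per polygon per year

rng = np.random.default_rng(seed=1)
poly_id = np.tile(np.arange(n_polygons), n_years)  # IDs repeat every year
features = rng.normal(size=(n_rows, n_vars))
features[[2, 7], 1] = np.nan                       # two missing values
Y = rng.random(n_rows) > 0.8

# Keep only rows without any missing feature; apply the same mask to Y
valid = ~np.isnan(features).any(axis=1)
X_clean = np.column_stack([poly_id[valid], features[valid]])
Y_clean = Y[valid]
```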

Note that the sample values can still vary considerably depending on their units, measurement method, etc. In the next notebook, the X-data will be scaled so that the different variables in the samples matrix become comparable.
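As an illustration only, z-score standardisation puts every variable on a common scale, with mean 0 and standard deviation 1 per column. The next notebook covers which scaling method CoPro applies, and it may differ. scikit-learn's `StandardScaler` performs the same operation as the scaling line below. The values are hypothetical.

```python
import numpy as np

# Two hypothetical variables with very different magnitudes
X = np.array([
    [0.001, 250.0],
    [0.004, 900.0],
    [0.002, 400.0],
])

# z-score standardisation: each column gets mean 0 and standard deviation 1
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```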

Since we did not specify a pre-calculated npy-file in the cfg-file, the provided files are read per year.

In [6]:
config.get('pre_calc', 'XY')

''
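The `config.get(section, option)` call above matches the API of Python's standard `configparser.ConfigParser`. Assuming CoPro's `config` object behaves like one, the choice between loading a pre-calculated file and reading the input files can be sketched as follows. The cfg excerpt and the `source` messages are hypothetical. `demo_cfg` is named so that it does not overwrite the notebook's `config`.

```python
import configparser

# Hypothetical excerpt of a cfg-file; only the [pre_calc] section matters here
cfg_text = """
[pre_calc]
XY =
"""

demo_cfg = configparser.ConfigParser()
demo_cfg.read_string(cfg_text)

xy_file = demo_cfg.get('pre_calc', 'XY')
if xy_file:
    source = 'load pre-calculated XY-array from ' + xy_file
else:
    # empty string: nothing pre-calculated, so the input files are read per year
    source = 'read input files per year'
```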

In [7]:
X, Y = pipeline.create_XY(config, out_dir, root_dir, selected_polygons_gdf, conflict_gdf)

{'poly_ID': Series([], dtype: float64), 'poly_geometry': Series([], dtype: float64), 'total_evaporation': Series([], dtype: float64), 'precipitation': Series([], dtype: float64), 'temperature': Series([], dtype: float64), 'irr_water_demand': Series([], dtype: float64), 'conflict_t-1': Series([], dtype: float64), 'conflict': Series([], dtype: bool)}

INFO: reading data for period from 2000 to 2015
DEBUG: determining matrix with neighboring polygons
INFO: entering year 2000
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2000
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_ye

INFO: entering year 2001
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2001
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2001
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2001
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

INFO: entering year 2002
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2002
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2002
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2002
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

INFO: entering year 2003
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2003
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2003
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2003
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2004
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2004
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2004
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/irrWaterDemand.nc for ye

INFO: entering year 2005
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2005
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2005
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2005
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

INFO: entering year 2006
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2006
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2006
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2006
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2007
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/irrWaterDemand.nc for year 2007
DEBUG: ... done.
DEBUG: computing log-transformed count of conflicts at t-1
DEBUG: searching neighbors of watprovID 24
DEBUG: watprovID 24 has neighbor 1317 with 1 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 24 is 1.0
DEBUG: searching neighbors of watprovID 43
DEBUG: watprovID 43 has neighbor 44 with 4 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 43 is 4.0
DEBUG: searching neighbors of watprovID 44
DEBUG: watprovID 44 has neighbor 43 with 3 conflict(

INFO: entering year 2008
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2008
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2008
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2008
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

INFO: entering year 2009
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2009
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2009
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2009
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

INFO: entering year 2010
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2010
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2010
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2010
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

INFO: entering year 2011
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2011
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2011
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2011
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

DEBUG: watprovID 1312 has neighbor 1224 with 19 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 1312 is 19.0
DEBUG: searching neighbors of watprovID 1315
DEBUG: watprovID 1315 has neighbor 108 with 4 conflict(s) in previous timestep
DEBUG: watprovID 1315 has neighbor 1311 with 2 conflict(s) in previous timestep
DEBUG: watprovID 1315 has neighbor 1316 with 20 conflict(s) in previous timestep
DEBUG: watprovID 1315 has neighbor 1317 with 6 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 1315 is 32.0
DEBUG: searching neighbors of watprovID 1316
DEBUG: watprovID 1316 has neighbor 103 with 3 conflict(s) in previous timestep
DEBUG: watprovID 1316 has neighbor 108 with 4 conflict(s) in previous timestep
DEBUG: watprovID 1316 has neighbor 1311 with 2 conflict(s) in previous timestep
DEBUG: watprovID 1316 has neighbor 1315 with 228 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 1

DEBUG: watprovID 1306 has neighbor 89 with 3 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 1306 is 3.0
DEBUG: searching neighbors of watprovID 1311
DEBUG: watprovID 1311 has neighbor 108 with 4 conflict(s) in previous timestep
DEBUG: watprovID 1311 has neighbor 1313 with 3 conflict(s) in previous timestep
DEBUG: watprovID 1311 has neighbor 1315 with 184 conflict(s) in previous timestep
DEBUG: watprovID 1311 has neighbor 1316 with 10 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 1311 is 201.0
DEBUG: searching neighbors of watprovID 1313
DEBUG: watprovID 1313 has neighbor 17 with 1 conflict(s) in previous timestep
DEBUG: watprovID 1313 has neighbor 24 with 5 conflict(s) in previous timestep
DEBUG: watprovID 1313 has neighbor 1311 with 4 conflict(s) in previous timestep
DEBUG: watprovID 1313 has neighbor 1315 with 184 conflict(s) in previous timestep
DEBUG: watprovID 1313 has neighbor 1317 with 29 conflict(s

DEBUG: watprovID 426 has neighbor 1015 with 8 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 426 is 8.0
DEBUG: searching neighbors of watprovID 783
DEBUG: watprovID 783 has neighbor 784 with 157 conflict(s) in previous timestep
DEBUG: watprovID 783 has neighbor 981 with 7 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 783 is 164.0
DEBUG: searching neighbors of watprovID 784
DEBUG: watprovID 784 has neighbor 783 with 27 conflict(s) in previous timestep
DEBUG: watprovID 784 has neighbor 981 with 7 conflict(s) in previous timestep
DEBUG: watprovID 784 has neighbor 987 with 3 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 784 is 37.0
DEBUG: searching neighbors of watprovID 785
DEBUG: watprovID 785 has neighbor 82 with 9 conflict(s) in previous timestep
DEBUG: watprovID 785 has neighbor 1015 with 8 conflict(s) in previous timestep
DEBUG: watprovID 785 has neighbor 1018 wit

INFO: entering year 2014
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2014
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2014
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2014
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

DEBUG: watprovID 784 has neighbor 768 with 1 conflict(s) in previous timestep
DEBUG: watprovID 784 has neighbor 783 with 11 conflict(s) in previous timestep
DEBUG: watprovID 784 has neighbor 981 with 7 conflict(s) in previous timestep
DEBUG: watprovID 784 has neighbor 987 with 1 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 784 is 20.0
DEBUG: searching neighbors of watprovID 785
DEBUG: watprovID 785 has neighbor 82 with 2 conflict(s) in previous timestep
DEBUG: watprovID 785 has neighbor 786 with 10 conflict(s) in previous timestep
DEBUG: watprovID 785 has neighbor 787 with 2 conflict(s) in previous timestep
DEBUG: watprovID 785 has neighbor 1015 with 24 conflict(s) in previous timestep
DEBUG: watprovID 785 has neighbor 1018 with 12 conflict(s) in previous timestep
DEBUG: total number of conflicts at t-1 for watprovID 785 is 50.0
DEBUG: searching neighbors of watprovID 786
DEBUG: watprovID 786 has neighbor 770 with 1 conflict(s) in previous time

INFO: entering year 2015
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2015
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2015
DEBUG: ... done.
DEBUG: calculating log-transformed mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2015
DEBUG: ... done.
DEBUG: calculating log-transformed mean irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro

INFO: all data read
INFO: saving XY data by default to file C:\Users\hoch0001\Documents\_code\copro\example\./OUT\XY.npy
DEBUG: number of data points including missing values: 4384
DEBUG: number of data points excluding missing values: 4272
DEBUG: a fraction of 15.94 percent in the data corresponds to conflicts.


Depending on sample and file size, obtaining the X-array and Y-array can be time-consuming. Therefore, CoPro automatically stores a combined XY-array as an npy-file unless specified otherwise in the cfg-file.
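The sketch below shows such a round trip with `np.save` and `np.load`. It assumes a layout in which Y is stored as the last column of the combined array. This mirrors the column order printed by `create_XY` above but is not necessarily CoPro's exact file format. The geometry column holds non-numeric objects, represented here by placeholder strings. The array therefore has `dtype=object` and must be loaded with `allow_pickle=True`. Distinct variable names keep the notebook's `X` and `Y` from being overwritten.

```python
import os
import tempfile

import numpy as np

# Hypothetical miniature X (ID, geometry placeholder, one variable) and Y
X_demo = np.array([[1, 'POLYGON A', 0.5],
                   [2, 'POLYGON B', 1.2]], dtype=object)
Y_demo = np.array([True, False])

# Combine into one XY-array with Y as the last column
XY_demo = np.column_stack([X_demo, Y_demo])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'XY.npy')
    np.save(path, XY_demo)
    # object arrays are pickled on save, so loading requires allow_pickle=True
    XY_loaded = np.load(path, allow_pickle=True)

# Split the loaded array back into X and a boolean Y
X_back, Y_back = XY_loaded[:, :-1], XY_loaded[:, -1].astype(bool)
```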

In [11]:
os.path.isfile(os.path.join(os.path.abspath(config.get('general', 'output_dir')), 'XY.npy'))

True