# Samples matrix and target values

In this notebook, we will show how CoPro reads the samples matrix and target values needed to establish a machine-learning model.

## Preparations

Start by loading the required packages.

In [1]:
from copro import utils, pipeline, data

%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import os, sys
import warnings
warnings.simplefilter("ignore")

For better reproducibility, the version numbers of all key packages are provided.

In [2]:
utils.show_versions()

Python version: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 01:53:57) [MSC v.1916 64 bit (AMD64)]
copro version: 0.0.7b
geopandas version: 0.8.0
xarray version: 0.15.1
rasterio version: 1.1.0
pandas version: 1.0.3
numpy version: 1.18.1
scikit-learn version: 0.23.2
matplotlib version: 3.2.1
seaborn version: 0.11.0
rasterstats version: 0.14.0


To be able to run this notebook on its own, some of the previously saved data needs to be loaded.

In [3]:
conflict_gdf = gpd.read_file('conflicts.shp')
selected_polygons_gdf = gpd.read_file('polygons.shp')

### The configurations-file (cfg-file)

To continue the simulation with the same settings as in the previous notebook, the cfg-file has to be read again and the model has to be initialised.

In [5]:
settings_file = 'example_settings.cfg'

In [6]:
config, out_dir, root_dir = utils.initiate_setup(settings_file)


#### CoPro version 0.0.7b ####
#### For information about the model, please visit https://copro.readthedocs.io/ ####
#### Copyright (2020-2020): Jannis M. Hoch, Sophie de Bruin, Niko Wanders ####
#### Contact via: j.m.hoch@uu.nl ####
#### The model can be used and shared under the MIT license ####

INFO: verbose mode on: True
INFO: saving output to folder C:\Users\hoch0001\Documents\_code\copro\example\./OUT
DEBUG: remove files in folder C:\Users\hoch0001\Documents\_code\copro\example\OUT
DEBUG: sparing clf.pkl
DEBUG: sparing XY.npy


## Read the files and store the data

### Background

This is an essential part of the code. A machine-learning model requires a samples matrix (X), representing the 'drivers' of conflict, and target values (Y), representing the conflicts themselves. Fitting the model establishes a relation between X and Y, which in turn can be used to make projections.

Additional information can be found on [scikit-learn](https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics).
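The relation between X and Y can be illustrated with a minimal scikit-learn sketch. The data below are synthetic, and the choice of `RandomForestClassifier` is only an assumption for illustration; this section does not state which estimator CoPro fits.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Hypothetical samples matrix: 200 data points, 4 driver variables.
X = rng.normal(size=(200, 4))
# Hypothetical Boolean target: conflict (1) or no conflict (0).
Y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)

# Fitting establishes the relation between X and Y ...
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, Y)

# ... which can then be applied to samples the model has not seen.
X_new = rng.normal(size=(5, 4))
Y_pred = clf.predict(X_new)
```

`Y_pred` holds one predicted class per row of `X_new`, which is the sense in which a fitted model can be used for projections.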

Since CoPro simulates conflict risk not only globally but also in a spatially explicit way for the provided polygons, each polygon must also be associated with its corresponding data points in X and Y.

### Implementation

CoPro goes through all model years as specified in the cfg-file. For each year, CoPro loops over all polygons remaining after the selection procedure (see previous notebook) and does the following to obtain the X-data:

1. Assign an ID to the polygon and retrieve its geometry information;
2. Calculate the mean value per polygon from each input file specified in the cfg-file in section 'data'.
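The two steps above can be sketched with plain NumPy. This is not CoPro's implementation: it assumes the polygons have already been rasterised onto the same grid as the input data, and all values and IDs are made up.

```python
import numpy as np

# Hypothetical 4x4 raster of one driver variable for one year.
values = np.array([
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 4.0, 8.0, 8.0],
])

# Hypothetical rasterised polygons: each cell holds the ID of the polygon covering it.
polygon_ids = np.array([
    [15, 15, 16, 16],
    [15, 15, 16, 16],
    [17, 17, 18, 18],
    [17, 17, 18, 18],
])

# Step 1: take each polygon ID; step 2: average the cells inside that polygon.
mean_per_polygon = {
    int(pid): float(values[polygon_ids == pid].mean())
    for pid in np.unique(polygon_ids)
}
```

Repeating this for every input file yields one value per polygon and variable, that is, one row of sample values per polygon and year.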

And to obtain the Y-data:

1. Assign a Boolean value indicating whether a conflict took place in a polygon or not; the number of casualties or conflicts per year is not relevant in this case.
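A minimal pandas sketch of this Boolean assignment is given below. The table layout, the `casualties` column, and the helper `conflict_boolean` are hypothetical and only serve to show that counts and casualties do not affect the result.

```python
import pandas as pd

# Hypothetical conflict records: one row per conflict, with polygon ID, year, and casualties.
conflicts = pd.DataFrame({
    "watprovID": [15, 15, 17, 17, 17],
    "year": [2000, 2000, 2000, 2001, 2001],
    "casualties": [3, 120, 1, 40, 7],
})
polygon_ids = [15, 16, 17]


def conflict_boolean(conflicts, polygon_ids, year):
    """Return True per polygon if at least one conflict occurred there in `year`."""
    hit = set(conflicts.loc[conflicts["year"] == year, "watprovID"])
    return pd.Series([pid in hit for pid in polygon_ids],
                     index=polygon_ids, name="conflict")


y_2000 = conflict_boolean(conflicts, polygon_ids, 2000)
y_2001 = conflict_boolean(conflicts, polygon_ids, 2001)
```

Polygon 15 is flagged `True` for 2000 regardless of whether it saw one conflict or two, and polygon 16 is `False` in both years.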

This information is stored in an X-array and a Y-array. The X-array has 2+n columns, where n denotes the number of samples provided and the two additional columns hold the polygon ID and geometry. The Y-array has only one column.
In both arrays, the number of rows equals the number of years times the number of polygons. If a row contains a missing value, the entire row is removed from the XY-array.
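The resulting layout can be sketched with a small pandas table. The column names `poly_ID`, `poly_geometry`, and `conflict` follow the keys printed further down in this notebook; the values, the number of years, and the number of polygons are invented for illustration.

```python
import numpy as np
import pandas as pd

years = [2000, 2001]
polygon_ids = [15, 16, 17]

# One row per (year, polygon) pair: 2 years x 3 polygons = 6 rows.
rows = []
for year in years:
    for pid in polygon_ids:
        rows.append({
            "poly_ID": pid,
            "poly_geometry": f"geometry of {pid}",  # placeholder for the real geometry
            "precipitation": 1.0 + pid / 100,       # made-up sample values
            "temperature": 290.0 + (year - 2000),
            "conflict": pid == 17,
        })
XY = pd.DataFrame(rows)

# Simulate a missing value in one sample.
XY.loc[1, "precipitation"] = np.nan

# A row with any missing value is dropped from X and Y together.
XY_clean = XY.dropna()

X = XY_clean.drop(columns="conflict").to_numpy()  # 2 + n columns (here n = 2)
Y = XY_clean["conflict"].to_numpy()               # single column
```

Dropping rows from the combined table rather than from X alone keeps X and Y aligned, so each remaining sample row still has its matching target value.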

Note that the sample values can still vary widely depending on their units, measurement, etc. In the next notebook, the X-data will be scaled so that the different values in the samples matrix can be compared.

Since we did not specify a pre-calculated npy-file in the cfg-file, the provided files are read per year.

In [7]:
config.get('pre_calc', 'XY')

''
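The empty string returned above is what tells the model to read the input files year by year. A sketch of such a check follows, assuming `config` behaves like a standard-library `configparser` object; the helper `load_or_flag` is hypothetical and not part of CoPro, and how CoPro resolves the file path is not shown here.

```python
import configparser
import numpy as np


def load_or_flag(config):
    """Return a pre-computed XY array, or None if the data must be read per year."""
    # An empty entry means no pre-calculated file was specified.
    xy_path = config.get("pre_calc", "XY")
    if xy_path == "":
        return None
    # allow_pickle is an assumption, needed only if the array stores Python objects.
    return np.load(xy_path, allow_pickle=True)


# Hypothetical minimal cfg content mirroring the empty setting shown above.
config = configparser.RawConfigParser()
config.read_string("[pre_calc]\nXY =\n")

XY = load_or_flag(config)
```

Here `XY` is `None`, which corresponds to the case in this notebook where the provided files are read per year.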

In [8]:
X, Y = pipeline.create_XY(config, out_dir, root_dir, selected_polygons_gdf, conflict_gdf)

{'poly_ID': Series([], dtype: float64), 'poly_geometry': Series([], dtype: float64), 'total_evaporation': Series([], dtype: float64), 'precipitation': Series([], dtype: float64), 'temperature': Series([], dtype: float64), 'irr_water_demand': Series([], dtype: float64), 'conflict_t-1': Series([], dtype: bool), 'conflict': Series([], dtype: bool)}

INFO: reading data for period from 2000 to 2015
DEBUG: finding touching neighbours for identifier watprovID 15
DEBUG: finding touching neighbours for identifier watprovID 16
DEBUG: finding touching neighbours for identifier watprovID 17
DEBUG: finding touching neighbours for identifier watprovID 18
DEBUG: finding touching neighbours for identifier watprovID 24
DEBUG: finding touching neighbours for identifier watprovID 25
DEBUG: finding touching neighbours for identifier watprovID 29
DEBUG: finding touching neighbours for identifier watprovID 30
DEBUG: finding touching neighbours for identifier watprovID 31
DEBUG: finding touching neighbours f

DEBUG: finding touching neighbours for identifier watprovID 786
DEBUG: finding touching neighbours for identifier watprovID 787
DEBUG: finding touching neighbours for identifier watprovID 797
DEBUG: finding touching neighbours for identifier watprovID 798
DEBUG: finding touching neighbours for identifier watprovID 799
DEBUG: finding touching neighbours for identifier watprovID 800
DEBUG: finding touching neighbours for identifier watprovID 801
DEBUG: finding touching neighbours for identifier watprovID 802
DEBUG: finding touching neighbours for identifier watprovID 803
DEBUG: finding touching neighbours for identifier watprovID 804
DEBUG: finding touching neighbours for identifier watprovID 805
DEBUG: finding touching neighbours for identifier watprovID 824
DEBUG: finding touching neighbours for identifier watprovID 827
DEBUG: finding touching neighbours for identifier watprovID 849
DEBUG: finding touching neighbours for identifier watprovID 850
DEBUG: finding touching neighbours for i

DEBUG: finding touching neighbours for identifier watprovID 1593
DEBUG: finding touching neighbours for identifier watprovID 1594
DEBUG: finding touching neighbours for identifier watprovID 1595
DEBUG: finding touching neighbours for identifier watprovID 1596
DEBUG: finding touching neighbours for identifier watprovID 1597
INFO: entering year 2000
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2000
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2000
DEBUG: ... done.
DEBUG: calculating mean temperature per aggregation unit from file C:\Users\hoch0001\D

DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2002
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2002
DEBUG: ... done.
DEBUG: calculating mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2002
DEBUG: ... done.
DEBUG: calculating sum irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/irrWaterDemand.nc for year 2002
DEBUG: ... done.
DEBUG: c

DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2004
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2004
DEBUG: ... done.
DEBUG: calculating mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2004
DEBUG: ... done.
DEBUG: calculating sum irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/irrWaterDemand.nc for year 2004
DEBUG: ... done.
DEBUG: c

INFO: entering year 2006
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2006
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2006
DEBUG: ... done.
DEBUG: calculating mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2006
DEBUG: ... done.
DEBUG: calculating sum irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/irrWaterDemand.nc for year 2006


DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2008
DEBUG: ... done.
DEBUG: calculating mean temperature per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/temperature_monthAvg_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2008
DEBUG: ... done.
DEBUG: calculating sum irr_water_demand per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/irrWaterDemand.nc for year 2008
DEBUG: ... done.
DEBUG: computing log-transformed count of conflicts at t-1
searching neighbors of watprovID 24
...neighbors are [  17 1313 1317]
searching neighbors of watprovID 42
...neighbors are [40 41 44 76]
searching neighbors of watprovID 43
...neighbors are [44 46 83]
searching neighbors of watprovID 47


...neighbors are [ 67  70  72 962 969 972 973]
searching neighbors of watprovID 976
...neighbors are [766 780 782 959 970 977 978]
searching neighbors of watprovID 977
...neighbors are [780 782 783 976 978 983 984 986]
searching neighbors of watprovID 978
...neighbors are [969 970 976 977 979 986]
searching neighbors of watprovID 979
...neighbors are [960 962 969 978 984 986]
searching neighbors of watprovID 994
...neighbors are [  53   54   55   99  100  991  992  993 1014]
searching neighbors of watprovID 1005
...neighbors are [1006 1225]
searching neighbors of watprovID 1006
...neighbors are [ 998 1005 1007 1008 1009 1016]
searching neighbors of watprovID 1012
...neighbors are [ 991 1010 1011 1013 1014 1017 1020]
searching neighbors of watprovID 1015
...neighbors are [ 426  770  785  786 1008 1016 1018 1020]
searching neighbors of watprovID 1016
...neighbors are [1006 1007 1008 1013 1015 1020]
searching neighbors of watprovID 1018
...neighbors are [  82  785  787 1015 1020]
searchin

DEBUG: ... done.
DEBUG: computing log-transformed count of conflicts at t-1
searching neighbors of watprovID 43
...neighbors are [44 46 83]
searching neighbors of watprovID 44
...neighbors are [40 42 43 46 47]
searching neighbors of watprovID 47
...neighbors are [ 40  44  45  46 765 766 959 970]
searching neighbors of watprovID 72
...neighbors are [  69   70   71   73  973  975 1302 1304]
searching neighbors of watprovID 81
...neighbors are [ 49  52  82 773 785 787]
searching neighbors of watprovID 82
...neighbors are [  50   52   59   80   81  774  779  785  787 1014 1018 1019 1020]
searching neighbors of watprovID 103
...neighbors are [ 104  106  107  108 1221 1311 1316]
searching neighbors of watprovID 105
...neighbors are [106 107]
searching neighbors of watprovID 106
...neighbors are [103 105 107 108]
searching neighbors of watprovID 108
...neighbors are [ 103  106 1311 1315 1316]
searching neighbors of watprovID 202
...neighbors are [204 205 207]
searching neighbors of watprovID 

...neighbors are [ 103  108 1221 1310 1313 1315 1316]
searching neighbors of watprovID 1313
...neighbors are [  16   17   24 1311 1312 1315 1317]
searching neighbors of watprovID 1315
...neighbors are [ 108 1311 1313 1316 1317]
searching neighbors of watprovID 1316
...neighbors are [ 103  108 1311 1315]
searching neighbors of watprovID 1317
...neighbors are [  17   24 1313 1315]
INFO: entering year 2013
DEBUG: getting the geometry of all geographical units
DEBUG: calculating log-transformed mean total_evaporation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/totalEvaporation_monthTot_output_2000_2015_Africa_yearmean.nc for year 2013
DEBUG: ... done.
DEBUG: calculating log-transformed mean precipitation per aggregation unit from file C:\Users\hoch0001\Documents\_code\copro\example\./example_data\hydro/precipitation_monthTot_output_2000-01-31_to_2015-12-31_Africa_yearmean.nc for year 2013
DEBUG: ... done.
DEBUG: calculating mean tempe

...neighbors are [768 776 781 782 783 981 987]
searching neighbors of watprovID 785
...neighbors are [  49   81   82  773  777  786  787 1015 1018]
searching neighbors of watprovID 786
...neighbors are [ 770  777  785 1015]
searching neighbors of watprovID 787
...neighbors are [  81   82  785 1018]
searching neighbors of watprovID 863
...neighbors are [ 56  60  66 866 869]
searching neighbors of watprovID 969
...neighbors are [ 67  68  70 962 970 975 978 979]
searching neighbors of watprovID 970
...neighbors are [ 40  47  67  68 959 969 976 978]
searching neighbors of watprovID 971
...neighbors are [ 973  974 1302]
searching neighbors of watprovID 972
...neighbors are [ 962  973  975 1522 1527]
searching neighbors of watprovID 973
...neighbors are [  72  971  972  974  975 1302 1523 1527]
searching neighbors of watprovID 975
...neighbors are [ 67  70  72 962 969 972 973]
searching neighbors of watprovID 976
...neighbors are [766 780 782 959 970 977 978]
searching neighbors of watprovID

INFO: all data read
INFO: saving XY data by default to file C:\Users\hoch0001\Documents\_code\copro\example\./OUT\XY.npy
DEBUG: number of data points including missing values: 4384
DEBUG: number of data points excluding missing values: 4272
DEBUG: a fraction of 15.94 percent in the data corresponds to conflicts.


Depending on sample and file size, obtaining the X-array and Y-array can be time-consuming. Therefore, CoPro automatically stores a combined XY-array as an npy-file unless specified otherwise in the cfg-file.

In [9]:
os.path.isfile(os.path.join(os.path.abspath(config.get('general', 'output_dir')), 'XY.npy'))

True
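Storing and reloading such a combined array can be sketched with NumPy as follows. The array content is invented, and the layout with the target values in the last column is an assumption; the actual column order of CoPro's `XY.npy` is not documented in this notebook.

```python
import os
import tempfile
import numpy as np

# Hypothetical combined array: three sample columns followed by the target column.
XY = np.array([
    [0.2, 1.5, 290.1, 0],
    [0.4, 1.1, 291.3, 1],
    [0.3, 0.9, 289.7, 0],
    [0.5, 1.3, 292.0, 0],
])

with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "XY.npy")
    np.save(path, XY)           # store once ...
    XY_loaded = np.load(path)   # ... and reload in later runs

# Assumed layout: last column is Y, everything before it is X.
X, Y = XY_loaded[:, :-1], XY_loaded[:, -1]

# Share of data points that correspond to conflicts, in percent.
conflict_fraction = 100 * Y.mean()
```

The same kind of calculation gives the conflict share that CoPro reports in its log output after reading all data.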