# Analyzing the `pinochet` dataset

Taken from <https://github.com/danilofreire/pinochet>

In [5]:
!tree -L 2 ../pinochet

../pinochet
├── _config.yml
├── data
│   ├── pinochet.csv
│   ├── pinochet.RData
│   └── pinochet.xlsx
├── docs
│   ├── 404.html
│   ├── articles
│   ├── authors.html
│   ├── docsearch.css
│   ├── docsearch.js
│   ├── index.html
│   ├── LICENSE-text.html
│   ├── link.svg
│   ├── pkgdown.css
│   ├── pkgdown.js
│   ├── pkgdown.yml
│   └── reference
├── figures
│   ├── map.png
│   └── time-trend.png
├── manuscript
│   ├── article
│   ├── _config.yml
│   └── online-appendix
└── README.md

8 directories, 18 files


In [7]:
!cat ../pinochet/README.md

# Deaths and Disappearances in the Pinochet Regime: A New Dataset

[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/pinochet)](https://cran.r-project.org/package=pinochet) 
[![Travis-CI Build
Status](https://travis-ci.org/danilofreire/pinochet.svg?branch=package)](https://travis-ci.org/danilofreire/pinochet)
[![DOI](https://zenodo.org/badge/103286196.svg)](https://zenodo.org/badge/latestdoi/103286196)
[![](http://cranlogs.r-pkg.org/badges/grand-total/pinochet?color=blue)](https://cran.r-project.org/package=pinochet)

This Github repository contains data and documented R code for [Deaths and Disappearances in the Pinochet Regime: A New Dataset](https://doi.org/10.31235/osf.io/vqnwu) by Freire et al (2019). We coded the personal details of 2,398 victims named in the Chilean Truth Commission Report along with information about the perpetrators and geographical coordinates for all identifiable atrocity locations. The dataset covers from 1973 to 1990 and includes 59 indic

In [12]:
!head -2 ../pinochet/data/pinochet.csv 

individual_id,group_id,start_date_daily,end_date_daily,start_date_monthly,end_date_monthly,last_name,first_name,minor,age,male,occupation,occupation_detail,victim_affiliation,victim_affiliation_detail,violence,method,interrogation,torture,mistreatment,targeted,press,war_tribunal,number_previous_arrests,perpetrator_affiliation,perpetrator_affiliation_detail,nationality,place_1,start_location_1,latitude_1,longitude_1,exact_coordinates_1,place_2,location_2,latitude_2,longitude_2,exact_coordinates_2,place_3,end_location_3,latitude_3,longitude_3,exact_coordinates_3,place_4,end_location_4,latitude_4,longitude_4,exact_coordinates_4,place_5,end_location_5,latitude_5,longitude_5,exact_coordinates_5,place_6,end_location_6,latitude_6,longitude_6,exact_coordinates_6,page,additional_comments
1,1,1973-09-12,1973-09-12,1973-09-01,1973-09-01,Corredera Reyes,Mercedes del Pilar,1,NA,0,School Student,high school,NA,NA,Killed,Gun,NA,NA,NA,NA,0,0,NA,NA,NA,Chilean,In Public,Calle Gran Avenida,-33.501342,-7

In [13]:
pinopath = "../pinochet/data/pinochet.csv"

In [204]:
import pandas as pd
import numpy as np

df = pd.read_csv(pinopath, sep=',', header=0)
df.head()

Unnamed: 0,individual_id,group_id,start_date_daily,end_date_daily,start_date_monthly,end_date_monthly,last_name,first_name,minor,age,...,latitude_5,longitude_5,exact_coordinates_5,place_6,end_location_6,latitude_6,longitude_6,exact_coordinates_6,page,additional_comments
0,1,1,1973-09-12,1973-09-12,1973-09-01,1973-09-01,Corredera Reyes,Mercedes del Pilar,1.0,,...,,,,,,,,,159,
1,2,2,1973-09-11,1973-09-12,1973-09-01,1973-09-01,Torres Torres,Benito Heriberto,0.0,57.0,...,,,,,,,,,159-60,
2,3,3,1973-09-12,1973-09-12,1973-09-01,1973-09-01,Lira Morales,Juan Manuel,0.0,23.0,...,,,,,,,,,160,
3,4,4,1973-09-12,1973-09-14,1973-09-01,1973-09-01,Fontela Alonso,Alberto Mariano,0.0,26.0,...,,,,,,,,,160,
4,5,5,1973-09-12,1973-09-12,1973-09-01,1973-09-01,Quintilliano Cardozo,Tulio Roberto,0.0,29.0,...,,,,,,,,,160-61,


In [98]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2398 entries, 0 to 2397
Data columns (total 59 columns):
individual_id                     2398 non-null int64
group_id                          2398 non-null int64
start_date_daily                  2232 non-null object
end_date_daily                    2249 non-null object
start_date_monthly                2289 non-null object
end_date_monthly                  2307 non-null object
last_name                         2398 non-null object
first_name                        2398 non-null object
minor                             2331 non-null float64
age                               1625 non-null float64
male                              2333 non-null float64
occupation                        1820 non-null object
occupation_detail                 1732 non-null object
victim_affiliation                1453 non-null object
victim_affiliation_detail         1327 non-null object
violence                          2393 non-null object
method      

In [99]:
df.describe()

Unnamed: 0,individual_id,group_id,minor,age,male,interrogation,torture,mistreatment,press,war_tribunal,...,exact_coordinates_3,latitude_4,longitude_4,exact_coordinates_4,latitude_5,longitude_5,exact_coordinates_5,latitude_6,longitude_6,exact_coordinates_6
count,2398.0,2398.0,2331.0,1625.0,2333.0,561.0,796.0,685.0,2398.0,2398.0,...,466.0,111.0,111.0,111.0,32.0,32.0,32.0,6.0,6.0,6.0
mean,1199.5,750.821101,0.040326,29.677674,0.956708,0.151515,0.35804,0.334307,0.03628,0.039199,...,0.706009,-31.901536,-70.992123,0.63964,-28.514519,-67.730817,0.71875,-33.658462,-70.584126,0.166667
std,692.387295,590.077072,0.196765,10.551773,0.203557,0.35887,0.479725,0.472092,0.187026,0.194109,...,0.456078,7.25484,3.380823,0.482282,14.31513,17.045161,0.456803,0.085408,0.030511,0.408248
min,1.0,1.0,0.0,0.22,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,-40.65,-102.55278,0.0,-40.65847,-73.155567,0.0,-33.69333,-70.646406,0.0
25%,600.25,312.25,0.0,22.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,-33.594037,-71.227613,0.0,-33.585206,-71.23861,0.0,-33.69333,-70.57167,0.0
50%,1199.5,603.5,0.0,27.0,1.0,0.0,0.0,0.0,0.0,0.0,...,1.0,-33.484124,-70.671424,1.0,-33.27833,-70.694232,1.0,-33.69333,-70.57167,0.0
75%,1798.75,1001.0,0.0,35.0,1.0,0.0,1.0,1.0,0.0,0.0,...,1.0,-33.323599,-70.622579,1.0,-23.110638,-69.523302,1.0,-33.69333,-70.57167,0.0
max,2398.0,2722.0,1.0,85.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,23.6345,-58.46361,1.0,42.73388,25.48583,1.0,-33.484124,-70.57167,1.0


In [100]:
df.columns

Index(['individual_id', 'group_id', 'start_date_daily', 'end_date_daily',
       'start_date_monthly', 'end_date_monthly', 'last_name', 'first_name',
       'minor', 'age', 'male', 'occupation', 'occupation_detail',
       'victim_affiliation', 'victim_affiliation_detail', 'violence', 'method',
       'interrogation', 'torture', 'mistreatment', 'targeted', 'press',
       'war_tribunal', 'number_previous_arrests', 'perpetrator_affiliation',
       'perpetrator_affiliation_detail', 'nationality', 'place_1',
       'start_location_1', 'latitude_1', 'longitude_1', 'exact_coordinates_1',
       'place_2', 'location_2', 'latitude_2', 'longitude_2',
       'exact_coordinates_2', 'place_3', 'end_location_3', 'latitude_3',
       'longitude_3', 'exact_coordinates_3', 'place_4', 'end_location_4',
       'latitude_4', 'longitude_4', 'exact_coordinates_4', 'place_5',
       'end_location_5', 'latitude_5', 'longitude_5', 'exact_coordinates_5',
       'place_6', 'end_location_6', 'latitude_6', 'lon

## Codebook

Column descriptions are available in the _codebook_ at <https://osf.io/8fkxq/>. The text is publicly available online, so there is no need to duplicate that information here.

However, some columns contain information that explains the nature of this dataset.

What kinds of violent methods did the perpetrators use?

In [101]:
df['method'].unique().tolist()

['Gun',
 nan,
 'Hung',
 'Denied Medical Treatment',
 'Beaten',
 'Asphyxiated',
 'Torture',
 'Jumped',
 'Cardio-respiratory Arrest',
 'Gun and Acute Loss of Blood',
 'Immersion',
 'Gun and Asphyxiated',
 'Cardio-respiratory Arrest and Torture',
 'Denied Medical Treatment and Torture',
 'Poisoned',
 'Knife',
 'Gun and Knife',
 'Burned',
 'Bomb',
 'Gun and Bomb',
 'Tear Gas',
 'Gun and Torture',
 'Beaten and Immersion',
 'Electrocuted',
 'Acute Loss of Blood',
 'Intentional Car Crash']

How many minors appear in the report?

In [102]:
df[df['minor'] == 1.0].count()['minor']

94

What is the average age of the people in the report?

In [103]:
df['age'].mean()

29.677673846153848

How old was the youngest person in the report?

In [104]:
print(f'{df["age"].min() * 12} months')  # values are stored in years; express them in months.

2.64 months


How are occupations distributed?

In [107]:
df[['occupation', 'individual_id']].groupby(['occupation']).count().sort_values(by='individual_id', ascending=False)

Unnamed: 0_level_0,individual_id
occupation,Unnamed: 1_level_1
Blue Collar,896
White Collar,429
Non-military Government,151
University Student,140
School Student,88
Military,59
Unemployed,15
Housewife,8
White Collar and University Student,7
Blue Collar and White Collar,5


_Blue collar_ refers to manual laborers, according to [Wikipedia](https://en.wikipedia.org/wiki/Blue-collar_worker), while _White collar_ refers to [administrative jobs](https://en.wikipedia.org/wiki/White-collar_worker).

What is the political affiliation of the perpetrators of the recorded events?

In [106]:
df[['perpetrator_affiliation', 'individual_id']].groupby(['perpetrator_affiliation']).count().sort_values(by='individual_id', ascending=False)

Unnamed: 0_level_0,individual_id
perpetrator_affiliation,Unnamed: 1_level_1
Regime,2056
Opposition,125


How many events were reported in the press?

In [119]:
df[['press','individual_id']].groupby('press').count()

Unnamed: 0_level_0,individual_id
press,Unnamed: 1_level_1
0,2311
1,87


## Geographic information

For our analysis we are interested in the geospatial information in the dataset, so we will work with the following columns:

In [37]:
columns_geo = []
for c in df.columns:
    if any(x in c for x in ("location", "place", "longitude", "latitude", "coordinates")):
        columns_geo.append(c)
        
df[columns_geo].head()

Unnamed: 0_level_0,place_1,start_location_1,latitude_1,longitude_1,exact_coordinates_1,place_2,location_2,latitude_2,longitude_2,exact_coordinates_2,...,place_5,end_location_5,latitude_5,longitude_5,exact_coordinates_5,place_6,end_location_6,latitude_6,longitude_6,exact_coordinates_6
individual_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,In Public,Calle Gran Avenida,-33.501342,-70.654242,0.0,In Hospital,Medical Legal Institute (by the Barros Luco Ho...,-33.484124,-70.646406,1.0,...,,,,,,,,,,
2,Home,Santiago,-33.44889,-70.66927,0.0,In Custody,Towards the 26th police station,-33.447846,-70.73953,0.0,...,,,,,,,,,,
3,In Public,La Legua shantytown,-33.48722,-70.63556,0.0,In Hospital,Barros Luco Hospital,-33.484124,-70.646406,1.0,...,,,,,,,,,,
4,In Custody,Tacna Regiment,-33.596477,-70.704573,1.0,,,,,,...,,,,,,,,,,
5,In Custody,Military Academy,-33.411545,-70.584206,1.0,,,,,,...,,,,,,,,,,


The _codebook_ states the following (emphasis mine).


> `exact_coordinates_n` : We matched the event sites with coordinates of latitude and longitude.
As the report does not have the precise location of all events, we used the closest reference
available. This is a dummy variable stating whether coordinates are precise (street level) or
not. 1 = yes. There are six variables in the dataset, each pertaining to one location where the
individual was found or taken to.
>
> `location_n` : Where the individual was seen or found. There are up to 6 locations, so we
coded them as location_1 to location_6 . The same pattern repeats in the variables below.
The compilation of the location_n variable was based completely on information given in
the Truth Report. However, since this information was in a string format (e.g. intersection
of Calle Grecia and Avenida Rosa), creating a new variable incorporating each location’s
latitude and longitude was necessary to pursue further analysis of the trends in deaths and
disappearances. The format chosen was **decimal coordinates**.
>
> `place_n` : Place where the individual was spotted/reported to be seen. (in chronological order,
from 1 to 6 places). 
>
>     Categories:
>            – Home; Work; University; In custody; In public; In hospital; Unknown

We will use the first location listed, that is `n=1`, with `shapely` and `folium`. To do so, we first need to isolate these points by dropping the columns with `n>1`.

In [129]:
dfn1 = df.drop(axis=1,
    labels=[
        'place_2', 'location_2', 'latitude_2', 'longitude_2',
        'exact_coordinates_2', 'place_3', 'end_location_3', 'latitude_3',
        'longitude_3', 'exact_coordinates_3', 'place_4', 'end_location_4',
        'latitude_4', 'longitude_4', 'exact_coordinates_4', 'place_5',
        'end_location_5', 'latitude_5', 'longitude_5', 'exact_coordinates_5',
        'place_6', 'end_location_6', 'latitude_6', 'longitude_6',
        'exact_coordinates_6',])
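The explicit list above can also be derived from the column names. The following is a minimal sketch, assuming that every column belonging to locations 2 to 6 ends in `_2` ... `_6` and that no other column does (this holds for the column list printed earlier). The helper name `later_location_columns` is hypothetical.

```python
import re

# Matches column names ending in _2 ... _6, i.e. locations after the first one.
LATER_LOCATION = re.compile(r"_[2-6]$")

def later_location_columns(columns):
    """Return the columns that describe locations 2 to 6 (hypothetical helper)."""
    return [c for c in columns if LATER_LOCATION.search(c)]

# A few column names taken from the dataset header.
sample = ["age", "place_1", "latitude_1", "place_2", "location_2",
          "end_location_3", "exact_coordinates_6", "page"]
print(later_location_columns(sample))
```

With the real dataframe this could be used as `df.drop(columns=later_location_columns(df.columns))`, which should drop the same columns as the explicit list.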

In [130]:
dfn1.head()

Unnamed: 0,individual_id,group_id,start_date_daily,end_date_daily,start_date_monthly,end_date_monthly,last_name,first_name,minor,age,...,perpetrator_affiliation,perpetrator_affiliation_detail,nationality,place_1,start_location_1,latitude_1,longitude_1,exact_coordinates_1,page,additional_comments
0,1,1,1973-09-12,1973-09-12,1973-09-01,1973-09-01,Corredera Reyes,Mercedes del Pilar,1.0,,...,,,Chilean,In Public,Calle Gran Avenida,-33.501342,-70.654242,0.0,159,
1,2,2,1973-09-11,1973-09-12,1973-09-01,1973-09-01,Torres Torres,Benito Heriberto,0.0,57.0,...,Regime,Policemen,Chilean,Home,Santiago,-33.44889,-70.66927,0.0,159-60,
2,3,3,1973-09-12,1973-09-12,1973-09-01,1973-09-01,Lira Morales,Juan Manuel,0.0,23.0,...,Regime,Military,Chilean,In Public,La Legua shantytown,-33.48722,-70.63556,0.0,160,
3,4,4,1973-09-12,1973-09-14,1973-09-01,1973-09-01,Fontela Alonso,Alberto Mariano,0.0,26.0,...,Regime,Military (Tacna Regiment),Chilean,In Custody,Tacna Regiment,-33.596477,-70.704573,1.0,160,
4,5,5,1973-09-12,1973-09-12,1973-09-01,1973-09-01,Quintilliano Cardozo,Tulio Roberto,0.0,29.0,...,Regime,Military,Chilean,In Custody,Military Academy,-33.411545,-70.584206,1.0,160-61,


In [131]:
import folium
print(folium.__version__)

0.10.0


In [141]:
dfn1[['latitude_1', 'longitude_1']].describe()

Unnamed: 0,latitude_1,longitude_1
count,2109.0,2109.0
mean,-33.997637,-70.943345
std,5.127197,2.486779
min,-53.16383,-77.03687
25%,-36.60626,-71.850008
50%,-33.48694,-70.69517
75%,-33.440641,-70.64909
max,48.85661,2.35222


In [156]:
dfn1[['latitude_1', 'longitude_1']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2398 entries, 0 to 2397
Data columns (total 2 columns):
latitude_1     2109 non-null float64
longitude_1    2109 non-null float64
dtypes: float64(2)
memory usage: 37.6 KB


In [174]:
dfn1[['latitude_1', 'longitude_1']].isnull().sum()  # count the null values that we will drop next

latitude_1     289
longitude_1    289
dtype: int64

In [178]:
dfn1 = dfn1.dropna(subset=['latitude_1', 'longitude_1'])
dfn1[['latitude_1', 'longitude_1']].isnull().sum()  # verify that no null values remain

latitude_1     0
longitude_1    0
dtype: int64

Goals

1. Draw the points for location N1
    - style the icons by type of violence, age, and occupation through a radio button
    - others
2. Draw the path followed on each journey when N is greater than 1
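The second goal needs, for each record, the ordered list of coordinates it passed through. A minimal sketch of that step follows, assuming rows are named tuples such as those produced by `itertuples()` on the full dataframe `df` (not `dfn1`, which no longer has the `n>1` columns). The helper name `row_path` is hypothetical.

```python
import math
from collections import namedtuple

def row_path(row, max_locations=6):
    """Collect the (latitude, longitude) pairs of a row in order, skipping missing ones."""
    path = []
    for n in range(1, max_locations + 1):
        lat = getattr(row, f"latitude_{n}", None)
        lon = getattr(row, f"longitude_{n}", None)
        if lat is None or lon is None:
            continue  # the row has no such column
        if math.isnan(lat) or math.isnan(lon):
            continue  # the location was not recorded
        path.append((lat, lon))
    return path

# First record of the dataset: two known locations, the third one missing.
Row = namedtuple("Row", ["latitude_1", "longitude_1", "latitude_2",
                         "longitude_2", "latitude_3", "longitude_3"])
row = Row(-33.501342, -70.654242, -33.484124, -70.646406,
          float("nan"), float("nan"))
print(row_path(row))
```

A path with more than one point could then be drawn with something like `folium.PolyLine(locations=row_path(row)).add_to(pinomap)`.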

Let's plot the first five events

In [236]:
import itertools
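# Center the map on the first remaining row; .iloc is positional, so it still works if index label 0 was dropped.
pinomap = folium.Map(location=[dfn1['latitude_1'].iloc[0], dfn1['longitude_1'].iloc[0]], zoom_start=10)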

for row in itertools.islice(dfn1.itertuples(), 5): # iterating over the first 5 elements only
    (
        folium.Marker(
            location=[
                row.latitude_1,
                row.longitude_1],
            tooltip = f'{row.first_name} {row.last_name}. Type of violence: {row.method}'
        ).add_to(pinomap)
    )
pinomap

We can improve this map in the following ways
- Add a search box by name
- Differentiate events by the `method` attribute
- Add a descriptive text

### Add a search box by name

This only works if our dataframe contains a geometry property, which can be obtained using `geopandas`.

In [232]:
# ToDo

In [233]:

# add a searchbox bound to the Description of lines
# statesearch = Search(
#     layer=pinomap,
#     geom_type='Point',
#     placeholder='Escribe un nombre',
#     collapsed=False,
#     search_label='full_name',
#     weight=3
# ).add_to(pinomap)
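As an alternative to `geopandas`, the geometry can be built by hand as a plain GeoJSON dictionary. The sketch below is one possible approach, not the notebook's final implementation. The helper `rows_to_geojson` is hypothetical, as is the `full_name` property (chosen to match the `search_label` in the commented cell above). Note that GeoJSON orders coordinates as longitude, then latitude.

```python
from collections import namedtuple

def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection with one Point feature per row."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            # GeoJSON expects [longitude, latitude], the reverse of folium markers.
            "geometry": {
                "type": "Point",
                "coordinates": [row.longitude_1, row.latitude_1],
            },
            "properties": {"full_name": f"{row.first_name} {row.last_name}"},
        })
    return {"type": "FeatureCollection", "features": features}

# First record of the dataset, used here only as sample input.
Row = namedtuple("Row", ["first_name", "last_name", "latitude_1", "longitude_1"])
sample = [Row("Mercedes del Pilar", "Corredera Reyes", -33.501342, -70.654242)]
collection = rows_to_geojson(sample)
print(collection["features"][0]["properties"]["full_name"])
```

The resulting dictionary could be added to the map with `folium.GeoJson(rows_to_geojson(dfn1.itertuples()))`, and that layer (rather than the map itself) passed as `layer=` to `folium.plugins.Search`. As far as I know, `Search` expects a GeoJson, TopoJson, FeatureGroup, or MarkerCluster layer.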



### Add a descriptive text

The idea is for the _hover_ to show a more descriptive text, so we will rely on helper functions.

In [None]:
import typing as T

def row_desc_mapper(row: T.NamedTuple) -> str:
    raise NotImplementedError("To Do")

### Modificar el ícono

In [231]:
import typing as T

def row_icon_mapper(row: T.NamedTuple) -> folium.Icon:
    raise NotImplementedError("To Do")
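The icon mapper could start from a simple lookup on the `method` column. The sketch below returns only a color name so that it runs without `folium`. The color assignments are arbitrary assumptions, and `method_to_color` is a hypothetical helper. Combined methods such as `'Gun and Knife'` take the color of the first matching keyword.

```python
import math

# Arbitrary keyword-to-color choices; all are color names that folium.Icon accepts.
METHOD_COLORS = {
    "Gun": "red",
    "Torture": "purple",
    "Beaten": "orange",
    "Bomb": "black",
}
DEFAULT_COLOR = "gray"

def method_to_color(method):
    """Map a value of the `method` column to a marker color."""
    if method is None or (isinstance(method, float) and math.isnan(method)):
        return DEFAULT_COLOR  # method not recorded
    for keyword, color in METHOD_COLORS.items():
        if keyword in method:
            return color
    return DEFAULT_COLOR  # method without an assigned color

print(method_to_color("Gun and Knife"))
print(method_to_color(float("nan")))
```

Inside `row_icon_mapper`, the result could be used as `folium.Icon(color=method_to_color(row.method))`.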