# Preparing the source data
The visualization application reads its data from a MongoDB cluster, so the worldwide terrorist attacks dataset is first prepared and then inserted into the database.
1. Select the data for the target period. To save space, only the last available year is used.
2. Clean the data.
3. Insert the data into the MongoDB cluster.
--------------------------------------------------------------------------------------------------
Eduardo Bustos Miranda, 02/05/2018

In [65]:
import pandas as pd
import numpy as np
from pymongo import MongoClient
import pprint

## 1. Selecting the target data
The selection and export of the 2015 data is not shown here, to avoid attaching the original CSV (which is very large).
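
Since the selection step is not shown, the following is only a minimal sketch of how it could look with pandas, assuming the full dataset has the same `iyear` column that appears later in this notebook. The function name `select_year` and the commented file name `full_dataset.csv` are hypothetical; they do not come from the original export.

```python
import pandas as pd


def select_year(df, year=None, year_col="iyear"):
    """Return the rows of df for one year (defaults to the latest year present)."""
    if year is None:
        year = df[year_col].max()
    return df[df[year_col] == year].reset_index(drop=True)


# Tiny in-memory stand-in for the full dataset (illustrative values only)
sample = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "iyear": [2013, 2014, 2015, 2015],
})
last_year = select_year(sample)
print(last_year)

# With the real file, the same idea would be (hypothetical file name):
# full = pd.read_csv("full_dataset.csv", encoding="ISO-8859-1")
# select_year(full, 2015).to_csv("terrorismDB.csv", index=False)
```

Passing `index=False` keeps the row index out of the exported file; without it, the index is read back as an `Unnamed: 0` column.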

## 2. Cleaning the data
A basic cleanup is performed so that only the data of interest is stored.

In [32]:
df = pd.read_csv('terrorismDB.csv', encoding='ISO-8859-1')


In [33]:
# Show the first records to inspect the structure of the dataset.
df.head()

| | Unnamed: 0 | eventid | iyear | imonth | iday | approxdate | extended | resolution | country | country_txt | ... | addnotes | scite1 | scite2 | scite3 | dbsource | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 140969 | 201412030034 | 2015 | 1 | 3 | 2015-01-03 00:00:00 | 0 | NaN | 95 | Iraq | ... | NaN | "Iraq: Roundup of Security Incidents 31 Decemb... | NaN | NaN | START Primary Collection | -9 | -9 | 0 | -9 | NaN |
| 1 | 141657 | 201412220095 | 2015 | 1 | 1 | NaN | 0 | NaN | 28 | Bosnia-Herzegovina | ... | NaN | "Bosnian imam attacked 7 times over call to st... | "Attacks Silence Bosnian Imam Who Took On Isla... | "Can Moderate Bosnian Voices Avert The Danger ... | START Primary Collection | -9 | -9 | 0 | -9 | NaN |
| 2 | 142009 | 201501010001 | 2015 | 1 | 1 | NaN | 0 | NaN | 95 | Iraq | ... | NaN | "Iraq: Roundup of Security Incidents 31 Decemb... | NaN | NaN | START Primary Collection | -9 | -9 | 0 | -9 | NaN |
| 3 | 142010 | 201501010002 | 2015 | 1 | 1 | NaN | 0 | NaN | 198 | Sweden | ... | NaN | "BBC News - Sweden protest after three mosque ... | "Sweden Hit by Third Mosque Arson Attack Since... | "Swedish police investigate third mosque arson... | START Primary Collection | -9 | -9 | 0 | -9 | NaN |
| 4 | 142011 | 201501010003 | 2015 | 1 | 1 | 2015-01-01 00:00:00 | 0 | NaN | 113 | Libya | ... | NaN | "Libya: Open Source Highlights 1 January 2015,... | NaN | NaN | START Primary Collection | 0 | 0 | 0 | 0 | NaN |


In [34]:
# Show the original columns
print('Dataset columns:')
for i in df.columns:
    print("\t{}".format(i))

Dataset columns:
	Unnamed: 0
	eventid
	iyear
	imonth
	iday
	approxdate
	extended
	resolution
	country
	country_txt
	region
	region_txt
	provstate
	city
	latitude
	longitude
	specificity
	vicinity
	location
	summary
	crit1
	crit2
	crit3
	doubtterr
	alternative
	alternative_txt
	multiple
	success
	suicide
	attacktype1
	attacktype1_txt
	attacktype2
	attacktype2_txt
	attacktype3
	attacktype3_txt
	targtype1
	targtype1_txt
	targsubtype1
	targsubtype1_txt
	corp1
	target1
	natlty1
	natlty1_txt
	targtype2
	targtype2_txt
	targsubtype2
	targsubtype2_txt
	corp2
	target2
	natlty2
	natlty2_txt
	targtype3
	targtype3_txt
	targsubtype3
	targsubtype3_txt
	corp3
	target3
	natlty3
	natlty3_txt
	gname
	gsubname
	gname2
	gsubname2
	gname3
	gsubname3
	motive
	guncertain1
	guncertain2
	guncertain3
	individual
	nperps
	nperpcap
	claimed
	claimmode
	claimmode_txt
	claim2
	claimmode2
	claimmode2_txt
	claim3
	claimmode3
	claimmode3_txt
	compclaim
	weaptype1
	weaptype1_txt
	weapsubtype1
	weapsubtype1_txt
	wea

In [46]:
# Select the columns of interest
columnasInteres = ['eventid', 'iyear', 'imonth', 'iday', 'approxdate', 'country_txt', 'region_txt', 'city', 'latitude', 'longitude', 'scite1', 'attacktype1_txt', 'nkill']
# Keep only those columns; df_interes is used by the cells that follow
df_interes = df[columnasInteres]
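
When only a few columns are needed, they can also be selected while the file is being read. This is a minimal sketch, assuming the CSV has a header row with these names; the helper `read_columns` is hypothetical, and the in-memory CSV stands in for `terrorismDB.csv`.

```python
import io

import pandas as pd


def read_columns(source, columns, encoding="ISO-8859-1"):
    """Read only the given columns from a CSV path or file-like object."""
    # usecols makes pandas skip every other column while parsing;
    # the final [columns] restores the requested column order
    return pd.read_csv(source, usecols=columns, encoding=encoding)[columns]


# Small in-memory CSV standing in for the real file (illustrative values only)
csv_text = "eventid,iyear,extended,city,nkill\n1,2015,0,Baghdad,2\n2,2015,0,Uppsala,0\n"
subset = read_columns(io.StringIO(csv_text), ["eventid", "city", "nkill"])
print(subset.columns.tolist())
```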

In [47]:
df_interes.head()

| | eventid | iyear | imonth | iday | approxdate | country_txt | region_txt | city | latitude | longitude | scite1 | attacktype1_txt | nkill |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201412030034 | 2015 | 1 | 3 | 2015-01-03 00:00:00 | Iraq | Middle East & North Africa | Baghdad | 33.349705 | 44.514869 | "Iraq: Roundup of Security Incidents 31 Decemb... | Bombing/Explosion | 2.0 |
| 1 | 201412220095 | 2015 | 1 | 1 | NaN | Bosnia-Herzegovina | Eastern Europe | Trnovi | 45.183961 | 15.828342 | "Bosnian imam attacked 7 times over call to st... | Armed Assault | 0.0 |
| 2 | 201501010001 | 2015 | 1 | 1 | NaN | Iraq | Middle East & North Africa | Baghdad | 33.341992 | 44.276368 | "Iraq: Roundup of Security Incidents 31 Decemb... | Bombing/Explosion | 1.0 |
| 3 | 201501010002 | 2015 | 1 | 1 | NaN | Sweden | Western Europe | Uppsala | 59.857979 | 17.639822 | "BBC News - Sweden protest after three mosque ... | Facility/Infrastructure Attack | 0.0 |
| 4 | 201501010003 | 2015 | 1 | 1 | 2015-01-01 00:00:00 | Libya | Middle East & North Africa | Benghazi | 32.116136 | 20.066488 | "Libya: Open Source Highlights 1 January 2015,... | Bombing/Explosion | NaN |


## 3. Inserting the data into the MongoDB cluster
The data of interest is now inserted into the MongoDB cluster.

In [112]:
# Connect to the database
client = MongoClient('localhost', 27017)
db = client.terrorismDB

In [115]:
# Run a quick test to check that the connection works
collection = db.test
x = collection.find()

print('Test terrorismDB:')
for i in x:
    print(i)

Test terrorismDB:
{'_id': ObjectId('5ae9b8f95971ba82771359b9'), 'author': 'Eduardo'}


In [129]:
# Insert the data into the database
# Build a list of documents to insert
print('Reading the data')
toInsert = []
for index, row in df_interes.iterrows():
    register = {}
    for col in columnasInteres:
        register[col] = row[col]

    toInsert.append(register)

# Insert the documents
print('Inserting the data')
collection = db.data
collection.insert_many(toInsert)

Reading the data
Inserting the data


<pymongo.results.InsertManyResult at 0x266e3166fc0>
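
The row-by-row loop above can also be written with `DataFrame.to_dict`. The sketch below is one possible alternative, not the method used in this notebook; the helper name `to_mongo_records` is hypothetical. It also replaces missing values (such as the `NaN` in `nkill`) with `None`, which MongoDB stores as `null`, and converts NumPy scalars to plain Python values, since PyMongo cannot encode types such as `numpy.int64`.

```python
import numpy as np
import pandas as pd


def to_mongo_records(df, columns):
    """Convert the selected columns of df into a list of plain dicts."""
    records = []
    for raw in df[columns].to_dict("records"):
        doc = {}
        for key, value in raw.items():
            if pd.isna(value):
                doc[key] = None           # missing value -> null in MongoDB
            elif isinstance(value, np.generic):
                doc[key] = value.item()   # NumPy scalar -> Python scalar
            else:
                doc[key] = value
        records.append(doc)
    return records


# Illustrative data shaped like df_interes
sample = pd.DataFrame({
    "eventid": [201412030034, 201501010003],
    "city": ["Baghdad", "Benghazi"],
    "nkill": [2.0, float("nan")],
})
docs = to_mongo_records(sample, ["eventid", "city", "nkill"])
print(docs)

# The list can then be passed to collection.insert_many(docs)
```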

In [137]:
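# Check the insertion
# count_documents({}) counts every document in the collection
# (Cursor.count() was removed in PyMongo 4)
print('Total records: {}'.format(collection.count_documents({})))

Total records: 0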
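
Because the count shown above is 0 even though `insert_many` returned a result, it can help to compare both numbers in one step. As a hedged sketch, the hypothetical helper `insert_and_count` compares the number of `_id` values returned by `insert_many` with the number of documents reported by `count_documents`; it assumes a PyMongo 3.7+ collection, or any object that offers the same two methods.

```python
def insert_and_count(collection, documents):
    """Insert documents and return (ids returned, documents now in the collection)."""
    result = collection.insert_many(documents)
    inserted = len(result.inserted_ids)       # one _id per inserted document
    stored = collection.count_documents({})   # empty filter matches everything
    return inserted, stored


# Usage with the objects from this notebook (requires a running MongoDB):
# inserted, stored = insert_and_count(db.data, toInsert)
# print('Inserted: {}, stored: {}'.format(inserted, stored))
```

If the two numbers differ, or `stored` is 0, the `collection` variable may be pointing at a different collection than the one that received the insert.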
