# ***Global Terrorism Dataset - Model Training***
---
*Martín García*

*David Melo*

*Juan Andrés Ruiz*

### **Environment**

In [None]:
import os
print(os.getcwd())
try:
    os.chdir('../../GlobalTerrorismAnalysis_ETL')
except FileNotFoundError:
    print("""
        You may have already run this cell two or more times, or the directory may be wrong.
        Did you already run this cell and it worked? Remember not to run it again.
        Are you in the wrong directory? You can change it.
        Your current directory is:
        """)
print(os.getcwd())

### **Libraries**

In [2]:
import pandas as pd
from src.database.db_operations import creating_engine
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

### **Reading the dataset**

In [3]:
engine = creating_engine()
query = 'SELECT * FROM fact_table'

11/03/2024 09:58:59 PM Engine created. You can now connect to the database.


In [4]:
df = pd.read_sql_query(query, engine)
df

| Unnamed: 0 | eventid | extended | multiple | success | suicide | nkill | property | ishostkid | nwound | id_location | id_date | id_attack | id_perpetrator | id_disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201707070031 | 0 | 0.0 | 1 | 0 | 1.0 | 0 | 0.0 | 0.0 | 1536Karachi | 19700101 | 28153.05111999 | 201707070031Unknown0 | 5 |
| 1 | 201707070032 | 1 | 0.0 | 1 | 0 | 2.0 | 0 | 1.0 | 1.0 | 14711Sapele | 19700101 | 614147.05111999 | 201707070032Unknown0 | 5 |
| 2 | 201707080002 | 0 | 0.0 | 1 | 0 | 4.0 | 1 | 0.0 | 7.0 | 6010Arish | 19700101 | 3360.06111999 | 201707080002Unknown0 | 5 |
| 3 | 201707080003 | 0 | 0.0 | 1 | 0 | 0.0 | 0 | 0.0 | 1.0 | 1536Panjgur | 19700101 | 314153.06111999 | 201707080003Unknown0 | 5 |
| 4 | 201707080012 | 1 | 0.0 | 1 | 0 | 20.0 | 0 | 1.0 | 0.0 | 9510Jurf al-Sakhar | 19700101 | 61495.0131110 | 201707080012Asa'ib Ahl al-Haqq0 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 88374 | 201707070019 | 0 | 1.0 | 1 | 0 | 1.0 | 999 | 0.0 | 5.0 | 2149Luhansk | 19700101 | 31214.06111999 | 201707070019Unknown0 | 5 |
| 88375 | 201707070020 | 0 | 1.0 | 1 | 0 | 0.0 | 999 | 0.0 | 2.0 | 2149Luhansk | 19700101 | 314214.06111999 | 201707070020Unknown0 | 5 |
| 88376 | 201707070021 | 0 | 0.0 | 1 | 0 | 0.0 | 1 | 0.0 | 0.0 | 20910Baykan district | 19700101 | 32209.061110 | 201707070021Kurdistan Workers' Party (PKK)0 | 5 |
| 88377 | 201707070026 | 0 | 0.0 | 1 | 0 | 0.0 | 1 | 0.0 | 0.0 | 2171Dallas | 19700101 | 714217.08111999 | 201707070026Anti-LGBT extremists0 | 5 |


### **AI model - Let's train**

In [5]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, LassoCV, LassoLars, LassoLarsIC, LassoLarsCV, Ridge, RidgeCV
from xgboost import XGBRegressor
from sklearn.metrics import r2_score

In [6]:
X = df.drop(['eventid', 'id_location', 'id_date', 'id_attack', 'id_perpetrator', 'id_disorder', 'nkill', 'property', 'multiple'], axis=1)
y = df['nkill']

models = {
    'Linear Regression': LinearRegression(),
    'Lasso': Lasso(),
    'Ridge': Ridge(),
    'RidgeCV': RidgeCV(),
    'LassoCV': LassoCV(), 
    'LassoLars': LassoLars(),
    'LassoLarsCV': LassoLarsCV(),
    'LassoLarsIC': LassoLarsIC(),
    'XGBRegressor': XGBRegressor(),  # Most accurate so far
}

def train_regression_model(model, X, y):
    """Fit a regression model and return its R² score on a held-out test set."""
    # Hold out 20% of the rows for evaluation; the fixed seed keeps the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model.fit(X_train, y_train)

    y_predict = model.predict(X_test)

    # r2_score is the coefficient of determination, not a classification accuracy
    return r2_score(y_test, y_predict)

for model_name, model in models.items():
    score = train_regression_model(model, X, y)
    print(f'Model: {model_name}')
    print(f'Accuracy: {score:.4f}')  # the value printed here is the R² score

Model: Linear Regression
Accuracy: 0.1510
Model: Lasso
Accuracy: 0.1262
Model: Ridge
Accuracy: 0.1510
Model: RidgeCV
Accuracy: 0.1510
Model: LassoCV
Accuracy: 0.1257
Model: LassoLars
Accuracy: 0.1262
Model: LassoLarsCV
Accuracy: 0.1510
Model: LassoLarsIC
Accuracy: 0.1510
Model: XGBRegressor
Accuracy: 0.1571
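
The value printed as "Accuracy" above comes from `r2_score`, so it is the coefficient of determination R² of a regression, not the share of correct predictions. The dependency-free sketch below shows what that quantity measures. The helper `r2` and the sample values are hypothetical and are not taken from the dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the targets
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical casualty counts, for illustration only
y_true = [0.0, 1.0, 2.0, 4.0, 20.0]
y_pred = [1.0, 1.0, 3.0, 3.0, 10.0]

score = r2(y_true, y_pred)
# A model that always predicts the mean of y_true scores 0
baseline = r2(y_true, [sum(y_true) / len(y_true)] * len(y_true))
print(f'R2: {score:.4f}, mean baseline: {baseline:.4f}')
```

Read this way, the best score above (0.1571) means the model explains roughly 16% of the variance of `nkill` on the test split, only slightly better than always predicting the mean.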


### **Conclusion**

After reviewing the correlation matrix (see 002_GlobalTerrorismEDA.ipynb), we could not find a strong, useful correlation among most of the *facts*. Given the scores of the models trained above, none of which exceeded 0.16, and the correlation analysis, we conclude that a useful predictive model cannot be built from this data.
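
The correlation check referred to above lives in the EDA notebook and is not reproduced here. As a rough sketch of that kind of check, assuming the same feature columns used for `X` and a handful of made-up rows (the `toy` frame is hypothetical, not data from `fact_table`), each feature can be ranked by its Pearson correlation with `nkill`:

```python
import pandas as pd

# Hypothetical rows that only mimic the fact-table columns used as features
toy = pd.DataFrame({
    'extended':  [0, 1, 0, 0, 1, 0],
    'success':   [1, 1, 1, 0, 1, 1],
    'suicide':   [0, 0, 1, 0, 0, 0],
    'ishostkid': [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    'nwound':    [0.0, 1.0, 7.0, 1.0, 0.0, 5.0],
    'nkill':     [1.0, 2.0, 4.0, 0.0, 20.0, 1.0],
})

# Pearson correlation of every feature with the target, strongest first
corr_with_target = (
    toy.corr()['nkill']
    .drop('nkill')
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_target)
```

On the real data, replacing `toy` with `df` restricted to the same columns would give the ranking that the conclusion relies on. Coefficients close to zero across the board would be consistent with the low scores reported above.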