# Predicting Terrorist Attacks
## Objective
To build a classifier that can predict the group responsible for individual terrorist attacks around the world.
## Introduction
The classifier is trained using the [Global Terrorism Database (GTD)](http://apps.start.umd.edu/gtd/downloads/dataset/Codebook.pdf). The dataset consists of nearly 200,000 terrorist attacks including bombings, assassinations, and kidnappings that have occured since 1970. Each of these attacks includes information on over 45 variables including location, type of weapon used, and nationality of the perpetrator.

This classifer is focused on using information about an attack to predict the responsible group.

## Preparing the Data
Before training, the data must be prepared by selecting certain features to remove or augment. The following features from the dataset are kept (reference the [GTD](http://apps.start.umd.edu/gtd/downloads/dataset/Codebook.pdf) for more information on each feature):
- <b>Numerical Features (Bucketized)</b><br/>
nkill, nkillus, nkillter, nwound, nwoundus, nperps

- <b>Categorical Features</b><br/>
crit1, crit2, crit3, attacktype1, attacktype2, attacktype3, weaptype1, weapsubtype1, weaptype2, weapsubtype2, weaptype3, weapsubtype3, weaptype4, weapsubtype4, natlty1, natlty2, natlty3, targtype1, targtype2, targtype3, claimed, doubtterr, country, multiple, success, suicide<br/>

- <b>Label</b><br/>
gnome


In [1]:
# make necessary imports
import tensorflow as tf
import pandas as pd
import xlrd
import csv

In [2]:
# define global variables
columns_to_use = ['country', 'crit1', 'crit2', 'crit3', 'doubtterr', 'multiple', 'success', 'suicide', 'attacktype1', 'attacktype2', 'attacktype3', 'targtype1', 'natlty1', 'targtype2', 'natlty2', 'targtype3', 'natlty3', 'nperps', 'claimed', 'weaptype1', 'weapsubtype1', 'weaptype2', 'weapsubtype2', 'weaptype3', 'weapsubtype3', 'weaptype4', 'weapsubtype4', 'nkill', 'nkillus', 'nkillter', 'nwound', 'nwoundus', 'gname']
input_file = 'data/gtd1993_0617dist.xlsx'
csv_file = 'data/gtd1993_0617dist.csv'

In [3]:
# convert the excel data file to a csv
wb = xlrd.open_workbook(input_file)
sh = wb.sheets()[0]
output_csv = open(csv_file, 'w')
wr = csv.writer(output_csv, quoting=csv.QUOTE_ALL)

for rownum in range(sh.nrows):
    wr.writerow(sh.row_values(rownum))

output_csv.close()

## Constructing the Deep Neural Network
Now the neural network must be constructed. It is a deep neural network that uses a mix of numerical, bucketized columns and categorical columns.

In [4]:
# initialize the numerical columns
num_killed = tf.feature_column.numeric_column(key='nkill')
num_wounded = tf.feature_column.numeric_column(key='nwound')
num_us_killed = tf.feature_column.numeric_column(key='nkillus')
num_us_wounded = tf.feature_column.numeric_column(key='nwoundus')
num_perps = tf.feature_column.numeric_column(key='nperps')
num_perps_killed = tf.feature_column.numeric_column(key='nkillter')

In [5]:
# bucketize each of the numerical columns
num_killed = tf.feature_column.bucketized_column(source_column=num_killed, boundaries=[5, 25, 50])
num_wounded = tf.feature_column.bucketized_column(source_column=num_wounded, boundaries=[5, 25, 50])
num_us_killed = tf.feature_column.bucketized_column(source_column=num_us_killed, boundaries=[5, 25, 50])
num_us_wounded = tf.feature_column.bucketized_column(source_column=num_us_wounded, boundaries=[5, 25, 50])
num_perps = tf.feature_column.bucketized_column(source_column=num_perps, boundaries=[1, 3])
num_perp_killed = tf.feature_column.bucketized_column(source_column=num_perps_killed, boundaries=[1, 3])

In [6]:
# initialize the categorical columns
terror_criteria1 = tf.feature_column.categorical_column_with_identity(key='crit1', num_buckets=2)
terror_criteria2 = tf.feature_column.categorical_column_with_identity(key='crit2', num_buckets=2)
terror_criteria3 = tf.feature_column.categorical_column_with_identity(key='crit3', num_buckets=2)
attack_type1 = tf.feature_column.categorical_column_with_identity(key='attacktype1', num_buckets=9)
attack_type2 = tf.feature_column.categorical_column_with_identity(key='attacktype2', num_buckets=9)
attack_type3 = tf.feature_column.categorical_column_with_identity(key='attacktype3', num_buckets=9)
weapon_type1 = tf.feature_column.categorical_column_with_identity(key='weaptype1', num_buckets=13)
weapon_subtype1 = tf.feature_column.categorical_column_with_identity(key='weapsubtype1', num_buckets=30)
weapon_type2 = tf.feature_column.categorical_column_with_identity(key='weaptype2', num_buckets=13)
weapon_subtype2 = tf.feature_column.categorical_column_with_identity(key='weapsubtype2', num_buckets=30)
weapon_type3 = tf.feature_column.categorical_column_with_identity(key='weaptype3', num_buckets=13)
weapon_subtype3 = tf.feature_column.categorical_column_with_identity(key='weapsubtype3', num_buckets=30)
weapon_type4 = tf.feature_column.categorical_column_with_identity(key='weaptype4', num_buckets=13)
weapon_subtype4 = tf.feature_column.categorical_column_with_identity(key='weapsubtype4', num_buckets=30)
target_nationality1 = tf.feature_column.categorical_column_with_identity(key='natlty1', num_buckets=10004)
target_nationality2 = tf.feature_column.categorical_column_with_identity(key='natlty2', num_buckets=10004)
target_nationality3 = tf.feature_column.categorical_column_with_identity(key='natlty3', num_buckets=10004)
target_type1 = tf.feature_column.categorical_column_with_identity(key='targtype1', num_buckets=22)
target_type2 = tf.feature_column.categorical_column_with_identity(key='targtype2', num_buckets=22)
target_type3 = tf.feature_column.categorical_column_with_identity(key='targtype3', num_buckets=22)
responsibility_claimed = tf.feature_column.categorical_column_with_identity(key='claimed', num_buckets=2)
terrorism_doubt = tf.feature_column.categorical_column_with_identity(key='doubtterr', num_buckets=2)
country_occurred = tf.feature_column.categorical_column_with_identity(key='country', num_buckets=1004)
multiple_incidents = tf.feature_column.categorical_column_with_identity(key='multiple', num_buckets=2)
was_successful = tf.feature_column.categorical_column_with_identity(key='success', num_buckets=2)
was_suicide = tf.feature_column.categorical_column_with_identity(key='suicide', num_buckets=2)

In [7]:
# convert each of the categorical columns to indicator functions
terror_criteria1 = tf.feature_column.indicator_column(terror_criteria1)
terror_criteria2 = tf.feature_column.indicator_column(terror_criteria2)
terror_criteria3 = tf.feature_column.indicator_column(terror_criteria3)
attack_type1 = tf.feature_column.indicator_column(attack_type1)
attack_type2 = tf.feature_column.indicator_column(attack_type2)
attack_type3 = tf.feature_column.indicator_column(attack_type3)
weapon_type1 = tf.feature_column.indicator_column(weapon_type1)
weapon_subtype1 = tf.feature_column.indicator_column(weapon_subtype1)
weapon_type2 = tf.feature_column.indicator_column(weapon_type2)
weapon_subtype2 = tf.feature_column.indicator_column(weapon_subtype2)
weapon_type3 = tf.feature_column.indicator_column(weapon_type3)
weapon_subtype3 = tf.feature_column.indicator_column(weapon_subtype3)
weapon_type4 = tf.feature_column.indicator_column(weapon_type4)
weapon_subtype4 = tf.feature_column.indicator_column(weapon_subtype4)
target_nationality1 = tf.feature_column.indicator_column(target_nationality1)
target_nationality2 = tf.feature_column.indicator_column(target_nationality2)
target_nationality3 = tf.feature_column.indicator_column(target_nationality3)
target_type1 = tf.feature_column.indicator_column(target_type1)
target_type2 = tf.feature_column.indicator_column(target_type2)
target_type3 = tf.feature_column.indicator_column(target_type3)
responsibility_claimed = tf.feature_column.indicator_column(responsibility_claimed)
terrorism_doubt = tf.feature_column.indicator_column(terrorism_doubt)
country_occurred = tf.feature_column.indicator_column(country_occurred)
multiple_incidents = tf.feature_column.indicator_column(multiple_incidents)
was_successful = tf.feature_column.indicator_column(was_successful)
was_suicide = tf.feature_column.indicator_column(was_suicide)

In [8]:
# construct a list of each of the deep columns
deep_columns = [
    num_killed,
    num_us_killed,
    num_us_wounded,
    num_perps,
    num_perp_killed,
    terror_criteria1,
    terror_criteria2,
    terror_criteria3,
    attack_type1,
    attack_type2,
    attack_type3,
    weapon_type1,
    weapon_subtype1,
    weapon_type2,
    weapon_subtype2,
    weapon_type3,
    weapon_subtype3,
    weapon_type4,
    weapon_subtype4,
    target_nationality1,
    target_nationality2,
    target_nationality3,
    target_type1,
    target_type2,
    target_type3,
    responsibility_claimed,
    terrorism_doubt,
    country_occurred,
    multiple_incidents,
    was_successful,
    was_suicide
]

In [9]:
# initializing the DNN
dnn = tf.estimator.DNNClassifier(
    n_classes=100,
    feature_columns=deep_columns,
    hidden_units=[100, 50],
    activation_fn="relu",
    dropout=None)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/xm/l9l7xyqx2qs1f_7yfkshv9v40000gn/T/tmpyx_9221j', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x11fe7b518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


## Training the Network
Now the network must be trained on the GTD data from 1993.

In [10]:
# a function to load in the training data
def load_data(label_name):
    train = pd.read_csv(csv_file, usecols=columns_to_use)
    train_x, train_y = train, train.pop(label_name)
    
    # replace all NaN values with -1
    train_x.fillna(-1, inplace=True)
    
    # convert the features into a tensor
    train_final_x = {}
    for key in train_x:
        train_final_x[key] = tf.convert_to_tensor(train_x[key].astype(int))
    
    # return the features and labels as tensors
    return train_final_x, tf.convert_to_tensor(train_y)
    
training_data = load_data(label_name='gname')

In [11]:
# train the network on the training set
dnn.train(input_fn=lambda: training_data)

INFO:tensorflow:Calling model_fn.


TypeError: 'str' object is not callable

## TODO
- K-fold splitting of the data
- Testing out which epoch and batch sizes are optimal
- Graph accuracy over epoch count of training set and test set