# Team 5 - Global Terrorism


### What is your data about?
Information on 180,000 terrorist attacks around the world from 1970 through 2017.
- It includes systematic data on domestic as well as international terrorist incidents.
- It excludes plots and conspiracies that were never attempted; attempted attacks are included whether or not they succeeded.

### Target Audience: 
- Business investors who want to find a safe place to invest their money without fear of terrorist attacks.

### Data details
- Incident location
- Date and time the incident happened
- Incident information
  - Attack type
  - Attack target
  - Perpetrator group
  - Incident Motive

### Evaluation metrics
Based on two metrics:
- Casualties: the combined number of killed and wounded victims
- Damaged property value: the estimated value of damaged property, in USD
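As a minimal sketch of how the two metrics could be computed per country, the cell below uses a small hypothetical DataFrame; the column names follow the renamed columns used later in this notebook, and the values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy incidents; real values come from the GTD file loaded below
incidents = pd.DataFrame({
    'Country': ['A', 'A', 'B'],
    'Killed': [2, 0, 5],
    'Wounded': [3, 1, 0],
    'DamagedPropertyValue': [1000.0, 0.0, 250.0],  # USD
})

# Metric 1: casualties = killed + wounded
incidents['Casualties'] = incidents['Killed'] + incidents['Wounded']

# Metric 2: damaged property value; both metrics are summed per country
metrics = incidents.groupby('Country')[['Casualties', 'DamagedPropertyValue']].sum()
print(metrics)
```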

### Hypotheses:
- Which countries are targeted most by terrorism?
- Where is the most dangerous place to stay in terms of terrorism?
- Where would we suffer the least damage from a terrorist attack?




### Exploration ideas:
- The rise of terrorism around the world
- Top 10 target nationalities by casualties
- Top 10 countries most affected by terrorism
- Top 10 terrorist activities by attack type and target
- The most common motives for terrorism in the world
- Top 10 terrorist groups with the most attacks
- Growth of damaged property value and casualties for US targets
- Which US targets were attacked most?




# Import data

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

sns.set_style("whitegrid")



In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
# Load the dataset (the CSV is Latin-1 encoded)
terr = pd.read_csv('drive/My Drive/FTMLE - Tonga/Week_3/assignments/datasets/05-global-terrorism/terrorism.csv',encoding='latin-1')


In [0]:
# Show a summary of the data
terr.info()


In [0]:
# Show a sample
terr.sample(10)

# Clean data

### Check data duplication

In [0]:
# Check if ID column is unique
terr['eventid'].nunique() == terr.shape[0]
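The check above only answers yes or no. As a small sketch (with hypothetical event IDs), `duplicated(keep=False)` can also list the offending rows if the check ever fails:

```python
import pandas as pd

# Hypothetical event IDs; 201701010002 appears twice
events = pd.DataFrame({'eventid': [201701010001, 201701010002, 201701010002]})

# nunique() == number of rows is True only when every ID is distinct
is_unique = events['eventid'].nunique() == events.shape[0]

# duplicated(keep=False) marks every row that shares its ID with another row
dupes = events[events['eventid'].duplicated(keep=False)]
print(is_unique, len(dupes))
```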

### Rename used columns

In [0]:
# Rename used columns
terr.rename(columns={'iyear':'Year','imonth':'Month','propvalue':'DamagedPropertyValue',
                       'iday':'Day','country_txt':'Country','latitude':'Latitude','longitude':'Longitude',
                       'region_txt':'Region','attacktype1_txt':'AttackType','city':'City',
                       'target1':'Target','nkill':'Killed','nkillus':'KilledUS',
                       'nwound':'Wounded','nwoundus':'WoundedUS','eventid':'ID',
                       'gname':'Group','targtype1_txt':'TargetType','natlty1_txt':'TargetNationality',
                       'weaptype1_txt':'WeaponType','motive':'Motive'},inplace=True)


### Get used columns

In [0]:
# Get columns
terr=terr[['Year','Month','Day','Country',
           'Region','City','WoundedUS',
           'KilledUS','AttackType','ID','Latitude','Longitude',
           'Killed','Wounded','Target','DamagedPropertyValue',
           'Group','TargetType','WeaponType',
           'Motive','TargetNationality']]


In [0]:
# Drop data before 1997
terr.drop(labels=terr[terr['Year'] < 1997].index, axis=0, inplace=True)


In [0]:
# Show a summary of the data 
terr.info()

### Manipulate data

In [0]:
# Check for missing data
terr.isnull().sum()


In [0]:
# Fill missing City, Target and Motive values with "Unknown"
# (assigning the result back avoids chained inplace operations on a column)
terr['City'] = terr['City'].fillna("Unknown")
terr['Target'] = terr['Target'].fillna("Unknown")
terr['Region'] = terr['Region'].astype('category')
terr['Motive'] = terr['Motive'].fillna("Unknown")

In [0]:
# Fill missing Killed/Wounded counts (including the US-only counts) with 0 and store them as integers
for col in ['Killed', 'Wounded', 'KilledUS', 'WoundedUS']:
    terr[col] = terr[col].fillna(0).astype(int)


In [0]:
# Fill missing target nationalities with "No data"
terr['TargetNationality'] = terr['TargetNationality'].fillna("No data").astype('category')

# Fill missing damaged property values with 0 and treat negative values as 0
terr['DamagedPropertyValue'] = terr['DamagedPropertyValue'].fillna(0)
terr['DamagedPropertyValue'] = terr['DamagedPropertyValue'].apply(lambda x: 0 if x < 0 else x)
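Replacing negative values with zero can also be done with the vectorised `Series.clip`, which gives the same result as an element-wise lambda and is usually faster on large columns. A sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical property values: one missing entry and one negative placeholder
values = pd.Series([np.nan, -99.0, 5000.0])

# Same result as fillna(0) followed by apply(lambda x: 0 if x < 0 else x)
cleaned = values.fillna(0).clip(lower=0)
print(cleaned.tolist())
```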


In [0]:
# Change data type
terr['Country'] = terr['Country'].astype('category')
terr['City'] = terr['City'].astype('category')
terr['Group'] = terr['Group'].astype('category')
terr['WeaponType'] = terr['WeaponType'].astype('category')
terr['Target'] = terr['Target'].astype('category')


In [0]:
# Inspect rows with missing coordinates
terr[terr['Latitude'].isnull()]

In [0]:
# Check for missing data
terr.info()


#### Add casualties column

In [0]:
# Casualties = killed + wounded
terr['Casualities']= terr['Killed']+terr['Wounded']


In [0]:
# Flag an incident as successful if it caused casualties or damaged property
terr['IsSuccessful'] = (terr['Casualities'] > 0) | (terr['DamagedPropertyValue'] > 0)
terr['IsSuccessful'].value_counts()
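The flag is a logical OR of two conditions. A minimal sketch on hypothetical rows (the column name `Casualities` is spelled as in this notebook):

```python
import pandas as pd

# Hypothetical incidents
df = pd.DataFrame({'Casualities': [0, 3, 0], 'DamagedPropertyValue': [0.0, 0.0, 120.0]})

# "Successful" here means: caused casualties OR damaged property
is_successful = (df['Casualities'] > 0) | (df['DamagedPropertyValue'] > 0)
print(is_successful.tolist())
```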

In [0]:
# Countries with the lowest mean casualties per incident
terr.groupby('Country')['Casualities'].mean().sort_values().head(40)

In [0]:
# Drop the original event ID and renumber the rows from 0 (the index serves as the new ID)
terr.drop(columns=["ID"], inplace=True)
terr.reset_index(drop=True, inplace=True)


In [0]:
terr.sample(1)

In [0]:
# Export to csv file
terr.to_csv('/content/sample_data/clean_terrorism.csv')

## Data exploration

### The rise of terrorism around the world

In [0]:
# Number of terrorist incidents worldwide per year, 1997 to 2017
plt.figure(figsize=(15,6))
ax = sns.countplot(x='Year', data=terr, color='red')
ax.set_xticklabels(ax.get_xticklabels(), rotation=0, ha="center")
plt.tight_layout()
plt.show()

### Top 10 target nationalities by casualties

In [0]:
# Select the 10 target nationalities with the most casualties
# (summing only the Casualities column avoids summing text/categorical columns)
top10_casuality = terr.groupby('TargetNationality')['Casualities'].sum().nlargest(10)
# Plot
top10_casuality.plot.bar(color="red", figsize=(20,6))
plt.ylabel('Casualties (people)')
plt.title('10 nationalities with the most victims')
plt.show()
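A sketch of the "sum one column, keep the top N groups" pattern on hypothetical data; `nlargest` combines the sort and the head in one call:

```python
import pandas as pd

# Hypothetical incidents
df = pd.DataFrame({
    'TargetNationality': ['Iraq', 'Iraq', 'Nigeria', 'Peru'],
    'Casualities': [10, 5, 12, 1],
})

# Sum only the column of interest, then keep the largest N groups
top2 = df.groupby('TargetNationality')['Casualities'].sum().nlargest(2)
print(top2)
```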

### Top 10 countries most affected by terrorism

In [0]:
# Top 10 countries by total damaged property value, rounded up to USD millions
top10country = (terr.groupby('Country', as_index=False, observed=True)['DamagedPropertyValue']
                .sum()
                .sort_values('DamagedPropertyValue', ascending=False)
                .head(10))
top10country['DamagedPropertyValue'] = np.ceil(top10country['DamagedPropertyValue'] / 1e6)
ax = top10country.plot(kind='bar', x='Country', y='DamagedPropertyValue', color='red',
                       figsize=(15,6), legend=False, rot=90)
ax.set_ylabel("Damaged property value (USD millions)")
plt.tight_layout()
plt.show()

### Top 10 terrorist activities by attack type and target


In [0]:
# Display the number of cases for the 10 most common target types
# (value_counts() is already descending; sort afterwards so barh draws the largest bar on top)
plt.figure(figsize=(20,6))
terr['TargetType'].value_counts().head(10).sort_values().plot.barh(color="red")
plt.xlabel('Cases')
plt.ylabel('Type')
plt.title('Top 10 target types of terrorism')
plt.show()
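Because `value_counts()` already returns counts in descending order, the order of `sort_values()` and `head()` decides whether you get the most or the least common categories. A sketch with hypothetical target types:

```python
import pandas as pd

target_types = pd.Series(['Police'] * 3 + ['Military'] * 2 + ['Business'])

counts = target_types.value_counts()          # descending: Police, Military, Business
least_common = counts.sort_values().head(2)   # ascending first -> the smallest counts
most_common = counts.head(2).sort_values()    # top 2, re-ordered for a horizontal bar plot
print(list(least_common.index), list(most_common.index))
```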

In [0]:
# Number of incidents per attack type
plt.figure(figsize=(15,6))
ax = sns.countplot(x="AttackType", data=terr, color='red')
ax.set_xticklabels(ax.get_xticklabels(), rotation=30, ha="center")
plt.tight_layout()
plt.show()

### The most common motives for terrorism in the world
#### Simple word filter

In [0]:
import re
import wordcloud


# Pick out text from "Motive" column
motive_text = terr['Motive'].tolist()

# Keep only alphanumeric characters and whitespace
regex = re.compile(r'[^A-Za-z0-9\s]+')
whole_text = ' '.join(motive_text)
whole_text = re.sub(regex, '', whole_text)
whole_text
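A small sketch of the cleaning step on two hypothetical motive strings; writing the pattern as a raw string (`r'...'`) avoids Python's invalid-escape warning for `\s`:

```python
import re

# Drop every character that is not a letter, digit, or whitespace
regex = re.compile(r'[^A-Za-z0-9\s]+')

motives = ["Retaliation for the arrest of a leader.", "Unknown"]
cleaned = re.sub(regex, '', ' '.join(motives))
print(cleaned)
```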

In [0]:
def calculate_frequencies(file_contents):
    # Words that carry little meaning for the motive analysis (a set for fast lookups)
    uninteresting_words = {"the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my",
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them",
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being",
    "have", "may", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how",
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just",
    "in", "not", "for", "should", "would", "so", "shall", "on", "thou", "thee", "thy", "than", "s", "d", "o", "ll", "unknown", "specific",
    "motive", "attack", "however", "noted", "claimed", "stated", "sources", "targeted", "incident", "carried", "responsibility", "out",
    "part", "violence", "because", "larger", "victims", "between", "trend", "against", "suspected", "members", "forces", "speculated",
    "attacks", "group", "related", "security", "also", "recent", "victim", "response", "occured", "minority", "state", "accused", "government"}

    # Count how often each remaining word appears in the text passed in
    result = {}
    for word in file_contents.split():
        if word.lower() not in uninteresting_words:
            result[word] = result.get(word, 0) + 1

    # Render the word cloud and return it as an image array
    cloud = wordcloud.WordCloud(scale=5)
    cloud.generate_from_frequencies(result)
    return cloud.to_array()
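The counting loop can also be expressed with `collections.Counter`, which is a `dict` subclass and so can be passed wherever a word-to-frequency dictionary is expected. A minimal sketch; the helper name `word_frequencies` is hypothetical:

```python
from collections import Counter


def word_frequencies(text, stop_words):
    """Count words whose lower-cased form is not in stop_words, keeping the original case."""
    return Counter(w for w in text.split() if w.lower() not in stop_words)


freqs = word_frequencies("Retaliation for police raids and police arrests", {"for", "and"})
print(freqs.most_common(1))
```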

In [0]:
# Display wordcloud image
myimage = calculate_frequencies(whole_text)
plt.figure(figsize = (15,8))
plt.imshow(myimage, interpolation = 'nearest', aspect='auto')

plt.axis('off')
plt.show()

#### NLP word filter

In [0]:
# Install dependencies
!pip install nltk
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
 

In [0]:
import matplotlib.pyplot as plt
import pandas as pd
import string
from nltk import WordNetLemmatizer

from wordcloud import WordCloud, STOPWORDS
from nltk.corpus import stopwords
from nltk import pos_tag, sent_tokenize, word_tokenize, BigramAssocMeasures,\
    BigramCollocationFinder, TrigramAssocMeasures, TrigramCollocationFinder


In [0]:
def get_bitrigrams(full_text, threshold=30):
    if isinstance(full_text, str):
        text = full_text
    else:
        text = " ".join(full_text)
    bigram_measures = BigramAssocMeasures()
    finder = BigramCollocationFinder.from_words(text.split())
    finder.apply_freq_filter(3)
    bigrams = {" ".join(words): "_".join(words)
               for words in finder.above_score(bigram_measures.likelihood_ratio, threshold)}
    return bigrams


def replace_bitrigrams(text, bigrams):
    if isinstance(text, str):
        texts = [text]
    else:
        texts = text
    new_texts = []
    for t in texts:
        t_new = t
        for k, v in bigrams.items():
            t_new = t_new.replace(" " + k + " ", " " + v + " ")
        new_texts.append(t_new)
    if len(new_texts) == 1:
        return new_texts[0]
    else:
        return new_texts


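def process_text(text, lemmatizer, translate_table, stopwords):
    # Tokenise sentence by sentence, drop stop words and verbs, and lemmatise the rest
    processed_text = ""
    for sentence in sent_tokenize(text):
        # Replace punctuation with spaces before tokenising and POS-tagging
        tagged_sentence = pos_tag(word_tokenize(sentence.translate(translate_table)))
        for word, tag in tagged_sentence:
            word = word.lower()
            # Penn Treebank verb tags all start with 'V' (VB, VBD, VBG, VBN, VBP, VBZ)
            if word not in stopwords and tag[0] != 'V':
                processed_text += lemmatizer.lemmatize(word) + " "
    return processed_text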


def get_all_processed_texts(texts, lemmatizer, translate_table, stopwords):
    # Clean every document with the arguments passed in, then join frequent bigrams with "_"
    processed_texts = [process_text(doc, lemmatizer, translate_table, stopwords) for doc in texts]
    bigrams = get_bitrigrams(processed_texts)
    very_processed_texts = replace_bitrigrams(processed_texts, bigrams)
    return " ".join(very_processed_texts)
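`get_bitrigrams` relies on NLTK's `BigramCollocationFinder`, which scores word pairs with a likelihood-ratio test. As a dependency-free simplification (raw pair frequency instead of that statistical score), the sketch below detects frequent bigrams and joins them with underscores, mirroring the output format of `get_bitrigrams` and `replace_bitrigrams`. The helper names and sample texts are hypothetical:

```python
from collections import Counter


def frequent_bigrams(texts, min_count=2):
    """Map 'w1 w2' -> 'w1_w2' for adjacent word pairs seen at least min_count times."""
    counts = Counter()
    for t in texts:
        words = t.split()
        counts.update(zip(words, words[1:]))
    return {f"{a} {b}": f"{a}_{b}" for (a, b), n in counts.items() if n >= min_count}


def join_bigrams(text, bigrams):
    # Pad with spaces so pairs at the very start or end of the text also match
    padded = f" {text} "
    for k, v in bigrams.items():
        padded = padded.replace(f" {k} ", f" {v} ")
    return padded.strip()


docs = ["police station attack", "police station bombing", "market bombing"]
bg = frequent_bigrams(docs)
print(bg, join_bigrams(docs[0], bg))
```

A raw-count threshold is easier to reason about but favours pairs of very common words; the likelihood-ratio score used in the notebook is designed to discount those.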


In [0]:
wordnet_lemmatizer = WordNetLemmatizer()
uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "may","has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just", \
    "in", "not", "for", "should", "would", "so", "shall", "on", "thou", "thee", "thy", "than", "s","d", "o", "ll", "unknown","specific", \
    "motive","attack","however","noted","claimed","stated","sources","noted","targeted","incident","carried","responsibility","out",\
    "part","violence","because","larger","part","larger","victims","between","trend","against","suspected","members","forces","speculated",\
    "attacks","group","related","security","also","recent","victim","response","occured","minority","state","accused","government"]
stop = uninteresting_words
stop.extend(stopwords.words('english'))
stop = set(stop)
translate_table = dict((ord(char), " ") for char in string.punctuation)


In [0]:
# Exclude placeholder motives that only state that the motive is unknown
unknown_motives = ['Unknown',
                   'The specific motive for the attack is unknown.',
                   'The specific motive for the attack is unknown..',
                   'The specific motive for the attack is unknown or was not reported.',
                   'The specific motive for the attack is unknown']
motive = terr.loc[~terr['Motive'].isin(unknown_motives), 'Motive']
print(f"Number of incidents with a known motive: {len(motive)}")


In [0]:
def use_ngrams_only(texts, lemmatizer, translate_table, stopwords):
    # Clean every document, then keep only the detected bigrams (joined with "_")
    processed_texts = [process_text(doc, lemmatizer, translate_table, stopwords) for doc in texts]
    bigrams = get_bitrigrams(processed_texts)
    indexed_texts = []
    for doc in processed_texts:
        current_doc = []
        for k, v in bigrams.items():
            # The leading space lets a bigram at the very start of the document match too
            current_doc += [v] * (" " + doc).count(" " + k + " ")
        indexed_texts.append(" ".join(current_doc))
    return " ".join(indexed_texts)

In [0]:
# Named motive_cloud so it does not shadow the wordcloud module used by calculate_frequencies
motive_cloud = WordCloud(stopwords=STOPWORDS, width=1200, height=600, scale=5, collocations=False, max_words=100).\
    generate(use_ngrams_only(motive.tolist(), wordnet_lemmatizer, translate_table, stop))
plt.figure(figsize = (15,8),facecolor='k')
plt.imshow(motive_cloud.to_array(), interpolation = 'nearest', aspect='auto')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()




### Top 10 terrorist groups with the most attacks

In [0]:
# Top 10 terrorist groups with the most attacks (excluding unattributed "Unknown" incidents)
terr_group = terr.loc[terr['Group'] != 'Unknown', 'Group'].value_counts().head(10)
# Top 10 terrorist groups with the most attacks on US targets
us_terr_incidents = terr[terr['TargetNationality'] == 'United States']
terr_group_us = us_terr_incidents.loc[us_terr_incidents['Group'] != 'Unknown', 'Group'].value_counts().head(10)
# Plot
plt.figure(figsize=(15,15))
plt.subplot(2,1,1)
terr_group.plot(kind="bar", color="red")
plt.title("Top 10 groups with the most attacks")

plt.subplot(2,1,2)
terr_group_us.plot(kind="bar", color="red")
plt.title("Top 10 groups with the most attacks on US targets")
plt.tight_layout()
plt.show()

### Growth of damaged property value and casualties for US targets




In [0]:
# Find damaged property value of US
us_terr_incidents = terr[terr['TargetNationality'] == 'United States']
us_property_damage = us_terr_incidents.groupby('Year', as_index=False)['DamagedPropertyValue'].sum()
us_property_damage['DamagedPropertyValue'] = us_property_damage['DamagedPropertyValue'].divide(1e6).apply(np.ceil)

# US casualties per year
us_casuality = us_terr_incidents.groupby('Year', as_index=False)['Casualities'].sum()['Casualities']


# Draw plot
years = us_property_damage['Year'].tolist()
us_pd = us_property_damage['DamagedPropertyValue'].tolist()

fig, ax = plt.subplots(2,1,figsize = (15,12))
ax[0].bar(years,us_pd,color = "red")
ax[0].set(xlabel='Year',ylabel='Millions',title='Damaged Property Value by year',xticks=years)
ax[0].set_xticklabels(years, rotation=90, ha="center")
ax[0].grid(False)
ax[1].bar(years,us_casuality,color = "red")
ax[1].set(xlabel='Year',ylabel='People',title='Casualties by year',xticks=years)
ax[1].set_xticklabels(years, rotation=90, ha="center")
ax[1].grid(False)
plt.subplots_adjust(hspace = 0.5)


plt.show()
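The cell above runs two separate groupbys and relies on both producing the same years in the same order. As a sketch of an alternative, pandas named aggregation computes both per-year totals in a single groupby, so the rows are aligned by construction. The data below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical US-target incidents
us = pd.DataFrame({
    'Year': [2001, 2001, 2002],
    'DamagedPropertyValue': [2_500_000.0, 500_000.0, 0.0],
    'Casualities': [10, 2, 0],
})

# One groupby for both metrics keeps the years aligned row by row
per_year = us.groupby('Year').agg(
    DamageMillions=('DamagedPropertyValue', 'sum'),
    Casualties=('Casualities', 'sum'),
)
# Round damage up to whole millions, as the plotting cell does
per_year['DamageMillions'] = np.ceil(per_year['DamageMillions'] / 1e6)
print(per_year)
```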

### Which US targets were attacked most?





In [0]:
# Number of incidents by target type and attack type
pd_and_at_map = us_terr_incidents.groupby('TargetType')['AttackType'].value_counts().unstack().fillna(0)
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(pd_and_at_map, annot=True, fmt=".0f", cmap="Reds")
plt.title("Incidents by target type and attack type")
plt.show()
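The same incident-count table can be built with `pd.crosstab`, which counts every (row, column) pair and fills absent pairs with 0. A sketch with hypothetical incidents:

```python
import pandas as pd

# Hypothetical US-target incidents
incidents = pd.DataFrame({
    'TargetType': ['Business', 'Business', 'Police'],
    'AttackType': ['Bombing/Explosion', 'Armed Assault', 'Bombing/Explosion'],
})

# Count of incidents per (TargetType, AttackType); pairs that never occur become 0
table = pd.crosstab(incidents['TargetType'], incidents['AttackType'])
print(table)
```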

In [0]:
# Total damaged property value (USD millions, rounded up) by target type and attack type
pd_per_at = us_terr_incidents.groupby(['TargetType','AttackType'], as_index=False)['DamagedPropertyValue'].sum()
pd_per_at['DamagedPropertyValue'] = pd_per_at['DamagedPropertyValue'].divide(1e6).apply(np.ceil)
p_d_per_at = pd_per_at.pivot(index='TargetType',columns='AttackType',values='DamagedPropertyValue').fillna(0)
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(p_d_per_at, annot=True, fmt=".0f", cmap="Reds")
plt.title("Damaged property value (USD millions) by target type and attack type")
plt.show()

In [0]:
# Total casualties by target type and attack type
c_per_at = us_terr_incidents.groupby(['TargetType','AttackType'], as_index=False)['Casualities'].sum()
p_c_per_at = c_per_at.pivot(index='TargetType',columns='AttackType',values='Casualities').fillna(0)
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(p_c_per_at, annot=True, fmt=".0f", cmap="Reds")
plt.title("Casualties by target type and attack type")
plt.show()
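The groupby, sum, and pivot steps above can also be collapsed into one `pivot_table` call; `fill_value=0` plays the role of the trailing `fillna(0)`. A sketch with hypothetical incidents (column name spelled as in the notebook):

```python
import pandas as pd

# Hypothetical US-target incidents
incidents = pd.DataFrame({
    'TargetType': ['Business', 'Business', 'Police'],
    'AttackType': ['Bombing/Explosion', 'Bombing/Explosion', 'Armed Assault'],
    'Casualities': [4, 6, 3],
})

# Sum casualties per (TargetType, AttackType) and reshape in one step
table = incidents.pivot_table(index='TargetType', columns='AttackType',
                              values='Casualities', aggfunc='sum', fill_value=0)
print(table)
```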

## Visualizing Geo Data

In [0]:
# Install dependencies (GitHub no longer serves the unauthenticated git:// protocol, so use https)
!pip install git+https://github.com/geopandas/geopandas.git
!apt install proj-bin libproj-dev libgeos-dev
!pip install git+https://github.com/ResidentMario/geoplot.git

In [0]:
# Import libraries
import geopandas as gpd
import geoplot as gplt

In [0]:
# Import Geo Data
geo_data = gpd.read_file('/content/drive/My Drive/FTMLE - Tonga/Data/geo_data/ne_10m_admin_0_countries.shp')

In [0]:
# List the continents available in the shapefile
geo_data['CONTINENT'].unique()

In [0]:
# Select only relevant data, i.e. country and geometry, for countries in Asia
# geo_data = geo_data[['SOVEREIGNT', 'geometry']]
geo_data = geo_data[(geo_data['CONTINENT'] == 'Asia')][['SOVEREIGNT', 'geometry']]
geo_data.columns = ['country', 'geometry']

In [0]:
# Check the country names kept in the geo data
geo_data['country'].unique()

In [0]:
# Work on a copy of the cleaned terrorism data so that later changes do not modify terr
terr_data = terr.copy()


In [0]:

# Print countries in the terrorism data that have no match in the geo data
geo_countries = set(geo_data['country'])
for c in terr_data['Country'].unique():
  if c not in geo_countries:
    print(c)
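The same check can be written as a set difference, which also returns the unmatched names as a sorted list. A sketch with hypothetical country names:

```python
# Hypothetical country names from each dataset
terr_countries = {'Iraq', 'United States', 'Hong Kong'}
geo_countries = {'Iraq', 'United States of America', 'China'}

# Names present in the terrorism data but missing from the shapefile
unmatched = sorted(terr_countries - geo_countries)
print(unmatched)
```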


In [0]:
# Map unmatched country names to the names used in the shapefile
replace_country = {'United States': 'United States of America',
                   'Northern Ireland': 'United Kingdom',
                   'Great Britain': 'United Kingdom',
                   'Macau':'China',
                   'Hong Kong':'China',
                   'St. Lucia': 'Saint Lucia',
                   'Czech Republic': 'Czechia'}

# Convert the categorical column to plain strings first so the names can be replaced freely
terr_data['Country'] = terr_data['Country'].astype(str).replace(replace_country)

# Keep only countries found in the geo data and incidents in the Middle East & North Africa region
terr_data = terr_data[terr_data['Country'].isin(geo_data['country'])]
terr_data = terr_data[terr_data['Region'] == 'Middle East & North Africa']


In [0]:
# Get incidents per country (the original ID column was dropped earlier, so count rows instead)
plot_data = terr_data.groupby('Country', observed=True).size().to_frame('IncidentCount')
print(plot_data)

In [0]:
# Merge the two datasets to add geometry info
plot_data = plot_data.merge(geo_data, how='right', left_index=True, right_on='country')
plot_data.dropna(inplace=True)
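To see what the right merge does, here is a sketch with a hypothetical incident table and a stand-in shape table (the strings `'g1'`, `'g2'`, `'g3'` replace real geometries). Every shape row is kept, and countries with no incidents get `NaN`, which is why the cell drops missing rows afterwards:

```python
import pandas as pd

# Hypothetical incident counts indexed by country, and stand-in shapes
counts = pd.DataFrame({'IncidentCount': [120, 30]}, index=['Iraq', 'Syria'])
shapes = pd.DataFrame({'country': ['Iraq', 'Syria', 'Japan'], 'geometry': ['g1', 'g2', 'g3']})

# how='right' keeps every shape row; countries without incidents get NaN counts
merged = counts.merge(shapes, how='right', left_index=True, right_on='country')
print(merged)
```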


In [0]:
# Convert the Pandas DataFrame to a GeoPandas DataFrame
plot_data = gpd.GeoDataFrame(plot_data, geometry = 'geometry')

In [0]:
# Plot
gplt.choropleth(plot_data, hue='IncidentCount', cmap='Reds', figsize=(10,10), legend=True)

# Label the five countries with the most incidents
plot_data['coords'] = plot_data['geometry'].apply(lambda x: x.representative_point().coords[:])
plot_data['coords'] = [coords[0] for coords in plot_data['coords']]
top_5_data = plot_data.nlargest(5, 'IncidentCount')
for _, data in top_5_data.iterrows():
  plt.text(x=data['coords'][0], y=data['coords'][1],
           s=data['country'], ha='center', color='red')
  plt.text(x=data['coords'][0], y=data['coords'][1] - 2,
           s=f"Incidents: {int(data['IncidentCount'])}", ha='center', color='red')
plt.show()
