## RAND Database of Worldwide Terrorism Incidents

The RAND Database of Worldwide Terrorism Incidents (RDWTI) is a compilation of data from 1968 through 2009.

This legacy RAND project developed and maintained a database of terrorism incidents stretching back to 1968, which provides comprehensive information on international and domestic terrorism. Over the years, many public and private sponsors have contributed to the maintenance of the RDWTI and its predecessors, the RAND Terrorism Chronology and the RAND-MIPT Terrorism Incident Database.

The data can be found here: https://www.rand.org/nsrd/projects/terrorism-incidents.html

With over 40,000 incidents of terrorism coded and detailed, the quality and completeness of the RDWTI were remarkable for its time. RAND staff conducted extensive research on candidate terrorist attacks, drawing on staff with regional expertise, relevant language skills, and in-country field work experience.

1. Please navigate to this website and download the data.
2. Save the data as a CSV file.
3. Read the CSV file into a pandas DataFrame.

In [1]:
import pandas as pd

In [2]:
df_RAND = pd.read_csv("RAND_Database.csv", encoding='latin-1')

## Pandas groupby and get_group
Please review pandas groupby and get_group methods here: https://www.geeksforgeeks.org/python-pandas-dataframe-groupby/

These methods will be very helpful in completing your homework!
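
As a quick illustration of the two methods, here is a minimal sketch on a made-up frame. The column names mirror the RAND data, but the rows and the `toy` name are invented for this example.

```python
import pandas as pd

# Toy rows (invented for illustration; not taken from the RAND data).
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "B"],
    "Weapon": ["Explosives", "Firearms", "Firearms", "Firearms", "Explosives"],
    "Fatalities": [3, 0, 1, 2, 5],
})

grouped = toy.groupby("Country")

# Aggregate one column per group.
fatalities_by_country = grouped["Fatalities"].sum()

# Pull out the rows that belong to a single group.
country_b = grouped.get_group("B")

print(fatalities_by_country)
print(country_b)
```

`groupby` is lazy: nothing is computed until an aggregation such as `sum()` or a selection such as `get_group()` is requested.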


Part 1: Please use the data to provide evidence that answers the questions below. Use visualizations/plots where appropriate to convey your evidence more completely.

1. What are the top 5 countries where terrorism incidents occurred?

In [3]:
top5_countries = df_RAND['Country'].value_counts().head(5).index
top5_countries

Index(['Iraq', 'West Bank/Gaza', 'Afghanistan', 'Thailand', 'Colombia'], dtype='object')
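
Since Part 1 asks for plots where appropriate, a ranking like this one could be shown as a horizontal bar chart. The sketch below uses invented toy data and assumes matplotlib is installed; the `Agg` backend line is only there so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data (invented); with the real data, start from df_RAND['Country'].
toy = pd.DataFrame({"Country": ["A", "B", "A", "C", "A", "B"]})
counts = toy["Country"].value_counts().head(5)

fig, ax = plt.subplots()
# Sort ascending so the largest bar ends up at the top of the chart.
counts.sort_values().plot(kind="barh", ax=ax)
ax.set_xlabel("Number of incidents")
ax.set_ylabel("Country")
ax.set_title("Countries with the most incidents (toy data)")
fig.tight_layout()
```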


2. In those top 5 countries, what were the common perpetrators, weapons and objectives?

In [4]:
df_top5 = df_RAND[df_RAND['Country'].isin(top5_countries)]
common_perp = df_top5['Perpetrator'].value_counts().head(6)
common_perp

Unknown                                          14694
Taliban                                            957
Revolutionary Armed Forces of Colombia (FARC)      608
Hamas (Islamic Resistance Movement)                453
National Liberation Army of Colombia (ELN)         273
Tanzim Qa'idat al-Jihad fi Bilad al-Rafidayn       208
Name: Perpetrator, dtype: int64

In [5]:
weapons = df_top5['Weapon'].value_counts().head(5)
weapons

Explosives                    8340
Firearms                      7580
Unknown                       1216
Remote-detonated explosive     898
Fire or Firebomb               460
Name: Weapon, dtype: int64
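
The counts above pool all five countries together. A per-country breakdown is one way to see whether a single country dominates the totals. The following is a minimal sketch on invented toy data; with the real data the same call could be applied to `df_top5`.

```python
import pandas as pd

# Toy rows (invented for illustration).
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Weapon": ["Explosives", "Explosives", "Firearms", "Firearms", "Firearms"],
})

# For each country, find the weapon that appears most often.
top_weapon = toy.groupby("Country")["Weapon"].agg(lambda s: s.value_counts().idxmax())
print(top_weapon)
```

Swapping `"Weapon"` for `"Perpetrator"` gives the equivalent breakdown for perpetrators.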

3. What were the top 10 most deadly attacks?

In [6]:
top10_fatalities = df_RAND.nlargest(10, 'Fatalities')
top10_fatalities

Unnamed: 0,Date,City,Country,Perpetrator,Weapon,Injuries,Fatalities,Description
12718,11-Sep-01,New York City,United States,Al Qaeda,Other,2261,2749,Hijacked American Airlines Flight 11 from Bost...
35403,14-Aug-07,Sinjar,Iraq,Unknown,Explosives,1500,500,Four truck bombs hit a poor rural area near th...
8062,11-Jan-98,,Algeria,Armed Islamic Group (GIA),Firearms,0,400,A group of 100 armed men opened fire on civili...
19357,01-Sep-04,Beslan,Russia,Riyad us-Saliheyn Martyrs' Brigade,Firearms,727,331,A group of thirty to thirty-five (sources vari...
4060,23-Jun-85,Montreal,Canada,Other,Explosives,0,329,CANADA. An Air-India Boeing 747 en route from...
6796,12-Mar-93,Bombay,India,Other,Explosives,1200,317,INDIA. Within three hours in the city of Bomb...
5393,21-Dec-88,Lockerbie,United Kingdom,Other,Explosives,12,270,"UNITED KINGDOM. SCOTLAND. Pan Am Flight 103,..."
12535,11-Aug-01,Luanda,Angola,National Union for the Total Independence of A...,Remote-detonated explosive,165,252,A train carrying refugees was derailed by an e...
3466,23-Oct-83,Beirut,Lebanon,Hizballah,Explosives,81,241,LEBANON. The buildings housing the U.S. Marin...
18121,21-Feb-04,Lira,Uganda,Lord's Resistance Army (LRA),Fire or Firebomb,60,239,The Lords Resistance Army (LRA) killed more th...



4. How often was "kidnapping" or some derivation of the word mentioned in all of the incident reports?

In [7]:
kidnapping_count = len(df_RAND[df_RAND['Description'].str.contains(r'kidnap|kidnapped|kidnapper|kidnapping|kidnappee|abduct|abduction|traffick|trafficking', case=False, na=False, regex=True)])
kidnapping_count

2880

5. When kidnapping was mentioned, how often did the incident result in fatalities?

In [8]:
fatalities_kidnapping = len(df_RAND.query('Description.str.contains(r"kidnap|kidnapped|kidnapper|kidnapping|kidnappee|abduct|abduction|traffick|trafficking", case=False, na=False, regex=True) and Fatalities > 0'))
fatalities_kidnapping

939

6. When kidnapping was mentioned, how often was "ransom" mentioned?

In [9]:
ransom_kidnapping = len(df_RAND.query('Description.str.contains(r"kidnap|kidnapped|kidnapper|kidnapping|kidnappee|abduct|abduction|traffick|trafficking", case=False, na=False, regex=True) and Description.str.contains("ransom", case=False, na=False, regex=True)'))
ransom_kidnapping

158
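
Questions 4 through 6 all reuse the same kidnapping filter, so it can be computed once as a boolean mask and then combined with other conditions. This is a sketch on invented toy rows; `KIDNAP_PATTERN`, `is_kidnap`, and `mentions_ransom` are names introduced here for illustration.

```python
import pandas as pd

# Toy rows (invented for illustration).
toy = pd.DataFrame({
    "Description": [
        "Gunmen kidnapped two engineers and demanded a ransom.",
        "A bomb exploded outside a bank.",
        "Rebels abducted a farmer, who was later found dead.",
        None,
    ],
    "Fatalities": [0, 2, 1, 0],
})

# Stems are enough: "kidnap" also matches "kidnapped", "kidnapping", and so on.
KIDNAP_PATTERN = r"kidnap|abduct|traffick"

desc = toy["Description"]
is_kidnap = desc.str.contains(KIDNAP_PATTERN, case=False, na=False, regex=True)
mentions_ransom = desc.str.contains("ransom", case=False, na=False)

# Summing a boolean mask counts the rows where it is True.
kidnapping_count = int(is_kidnap.sum())
fatal_kidnappings = int((is_kidnap & (toy["Fatalities"] > 0)).sum())
ransom_kidnappings = int((is_kidnap & mentions_ransom).sum())
```

Dividing `fatal_kidnappings` or `ransom_kidnappings` by `kidnapping_count` turns the raw counts into the "how often" proportions the questions ask about.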

7. In all of the incidents, how often were "students" mentioned as perpetrators?

In [10]:
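# Note: this counts every description that mentions "student", whether as perpetrator, victim, or bystander.
student_perp = len(df_RAND.query('Description.str.contains("student", case=False, na=False, regex=True)'))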
student_perp

452


8. What was the first incident where a "suicide bomber" was mentioned?

In [11]:
# head(1) returns the first match in file order, so this relies on the rows being sorted by date.
suicide_bomber = df_RAND.query('Description.str.contains("suicide bomber", case=False, na=False, regex=True)').head(1)
suicide_bomber

Unnamed: 0,Date,City,Country,Perpetrator,Weapon,Injuries,Fatalities,Description
3889,06-Feb-85,,Lebanon,Unknown,Explosives,10,0,LEBANON. Ten Israeli soldiers were wounded in...
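
Taking the first matching row only gives the earliest incident if the file is in chronological order. To check that, the `Date` strings can be parsed and sorted explicitly. One catch: with a two-digit year, `%y` maps 00 to 68 onto 2000 to 2068, so dates from 1968 would land in 2068. The sketch below shifts such years back a century. The helper name `parse_rdwti_date`, the `last_year` cutoff (taken from the 1968 to 2009 coverage stated above), and the toy rows are assumptions for illustration, and `%b` assumes English month abbreviations.

```python
from datetime import datetime

import pandas as pd


def parse_rdwti_date(text, last_year=2009):
    """Parse 'DD-Mon-YY', moving years that land after last_year back a century."""
    parsed = datetime.strptime(text, "%d-%b-%y")
    if parsed.year > last_year:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


# Toy rows (invented), deliberately out of chronological order.
toy = pd.DataFrame({
    "Date": ["11-Sep-01", "06-Feb-85", "22-Jul-68"],
    "Description": ["third", "second", "first"],
})

toy["ParsedDate"] = toy["Date"].map(parse_rdwti_date)
earliest = toy.sort_values("ParsedDate").iloc[0]
print(earliest["Date"], earliest["Description"])
```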


9. How often were "priests" or "clergy" mentioned?

In [12]:
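# Note: "priests" matches the plural only; r"priest|clergy" would also count the singular "priest".
priests_clergy = len(df_RAND.query('Description.str.contains(r"priests|clergy", case=False, na=False, regex=True)'))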
priests_clergy

23

10. Name all the incidents where a woman or women were identified as terrorists. Not the victims, the terrorists.

In [13]:
!pip install spacy
!python -m spacy download en_core_web_sm



In [14]:
import spacy
nlp = spacy.load("en_core_web_sm")

# Words that mark a woman, and words that mark a perpetrator role.
FEMALE_WORDS = {"woman", "women", "female"}
ROLE_WORDS = {"terrorist", "attacker", "insurgent", "militant"}

def extract_info(description):
    """Return True if a female word shares a dependency subtree with a perpetrator-role word."""
    if not isinstance(description, str):
        return False
    doc = nlp(description)
    for token in doc:
        if token.text.lower() in FEMALE_WORDS:
            # Look at every word governed by the female word's syntactic head.
            subtree_words = {t.text.lower() for t in token.head.subtree}
            if subtree_words & ROLE_WORDS:
                return True
    return False

df_RAND['IsFemaleTerrorist'] = df_RAND['Description'].apply(extract_info)

female_terrorists = df_RAND[df_RAND['IsFemaleTerrorist']]

female_terrorists

Unnamed: 0,Date,City,Country,Perpetrator,Weapon,Injuries,Fatalities,Description,IsFemaleTerrorist
1793,13-Oct-77,,Spain,Popular Front for the Liberation of Palestine ...,Unknown,0,1,SPAIN. A Lufthansa 737 Boeing jet was hijacke...,True
3309,08-Mar-83,Bonn,Federal Republic of Germany,Other,Explosives,0,0,FEDERAL REPUBLIC OF GERMANY. No one was injur...,True
5302,19-Aug-88,,Lebanon,Other,Explosives,0,1,LEBANON. A suicide attack on an Israeli convo...,True
6892,06-Jul-93,Jerusalem,Israel,Other,Knives & sharp objects,1,0,ISRAEL. A suspected Palestinian terrorist sta...,True
17732,05-Dec-03,Yessentuki,Russia,Black Widows,Explosives,165,46,A suicide bomber killed at least forty-six peo...,True
19694,14-Oct-04,Najaf,Iraq,Unknown,Explosives,0,0,Iraqi police arrested a woman wearing an explo...,True
37708,21-Apr-08,Baqubah,Iraq,Al Qaeda,Explosives,4,3,A female suicide attacker detonated her explos...,True
37773,29-Apr-08,Baqubah,Iraq,Al Qaeda,Explosives,5,1,One member of Al-Sahwa forces was killed and a...,True
37916,17-May-08,Baqubah,Iraq,Al Qaeda,Explosives,15,1,A female suicide attacker detonated her explos...,True
37956,21-May-08,Rutba,Iraq,Al Qaeda,Explosives,3,5,A female suicide attacker detonated her explos...,True


## Part 2
Using the incident reports, investigate the motivations behind the attacks and categorize them as economic, political, religious, or some combination of those categories. How would you do it? Is it a dictionary of economic words? The mention of money? The mention of religion? The word "liberation"? What's the signal here? Could you use vectors? You could use research or domain knowledge about perpetrators and their stated goals. You will have to resolve or define new categories. For example, is the Irish Republican Army religious, political, or "RP" as a new category? Is Hamas religious, political, or both? Find the signals and create a new column that categorizes the incidents.

There is no "right" answer here, just a way for you as an emerging data scientist to think about how to parse data and categorize it. Just declare categories and justify your logic! I am as interested in your reasoning as I am in your code!
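
One of the options raised above, using domain knowledge about perpetrators, can be sketched as a lookup table. Everything in this example is a placeholder: the group names, the category labels, and the `PERPETRATOR_MOTIVES` name are hypothetical, and a real table would need to be researched and justified entry by entry.

```python
import pandas as pd

# Hypothetical analyst-curated labels; group names and categories are placeholders.
PERPETRATOR_MOTIVES = {
    "Group A": "Political",
    "Group B": "Religious/Political",
}

# Toy rows (invented for illustration).
toy = pd.DataFrame({"Perpetrator": ["Group A", "Group B", "Unknown", "Group C"]})

# map() leaves unlisted perpetrators as NaN; mark them so a text-based
# method can handle them separately.
toy["MotiveFromPerpetrator"] = (
    toy["Perpetrator"].map(PERPETRATOR_MOTIVES).fillna("Unlabeled")
)
print(toy)
```

A lookup like this only covers incidents with a known perpetrator, so it would need to be paired with a description-based signal for the many rows labeled "Unknown".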


In [15]:
econ_words = ["economy", "market", "inflation", "supply and demand", "GDP", "monetary policy", "fiscal policy", "trade", "investment", "commerce", "capitalism", "taxation", "unemployment", "consumer", "entrepreneur", "macroeconomics", "microeconomics", "demand curve", "supply curve", "inflation rate", "interest rates", "budget deficit", "exchange rate", "economic growth", "globalization", "business cycle", "monopoly", "oligopoly", "economic inequality", "commodity"]
pol_words = ["politics", "government", "democracy", "election", "policy", "voting", "law", "legislation", "congress", "president", "parliament", "ideology", "partisanship", "lobbying", "constitution", "policy-making", "political party", "bureaucracy", "separation of powers", "international relations", "diplomacy", "human rights", "civil rights", "voter turnout", "political ideology", "gerrymandering", "public opinion", "political science", "lobbyist", "foreign policy"]
rel_words = ["religion", "faith", "spirituality", "church", "temple", "mosque", "belief", "worship", "god", "prayer", "theology", "sacred", "doctrine", "spiritual", "clergy", "worshipper", "clergyman", "revelation", "scripture", "denomination", "pilgrimage", "ritual", "divine", "prophet", "afterlife", "paganism", "spiritualism", "synagogue", "blessing", "sacrifice"]

In [16]:
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')

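def classify_motive(description):
    """Label a description with every category whose keywords it contains."""
    if not isinstance(description, str):
        return None
    description = description.lower()
    keyword_lists = {"Economic": econ_words, "Political": pol_words, "Religious": rel_words}
    # Collect all matching categories so a later list cannot overwrite an earlier match.
    # Keywords are lowercased because the lists contain entries such as "GDP".
    motives = [label for label, words in keyword_lists.items()
               if any(word.lower() in description for word in words)]
    # Combined labels look like "Economic/Political"; no match returns None.
    return "/".join(motives) if motives else None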

df_RAND['Motive'] = df_RAND['Description'].apply(classify_motive)

df_RAND.to_csv("classified_data.csv", index=False)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/rakkshetsinghaal/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
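
One caveat with plain substring checks such as `word in description` is that short keywords match inside longer words: "law" is found in "lawyer", for example. A possible refinement is to match keywords only at word boundaries. The sketch below uses a small invented keyword list, and `build_matcher` is a helper name introduced here for illustration.

```python
import re


def build_matcher(words):
    """Compile one case-insensitive regex that matches any keyword as a whole word or phrase."""
    # Longest keywords first, so multi-word phrases are tried before their parts.
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


# Toy keyword list (a small subset, for illustration only).
toy_pol_words = ["law", "election", "foreign policy"]
pol_matcher = build_matcher(toy_pol_words)

# A substring test fires on "lawyer"; the word-boundary regex does not.
plain_hit = "law" in "a lawyer was shot".lower()
boundary_hit = bool(pol_matcher.search("A lawyer was shot"))
real_hit = bool(pol_matcher.search("Protest against the new election law"))
print(plain_hit, boundary_hit, real_hit)
```

The trade-off is that word boundaries also stop a stem from matching its inflections ("election" no longer matches "elections"), so plural and derived forms have to be listed explicitly.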


In [None]:
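#Couldn't make this work as it didn't stop running but looks interesting!
# A likely cause: the original version re-parsed all 90 keywords with nlp() for every description.
# Parsing the keywords once, outside the function, removes that repeated work.
# Note: en_core_web_sm ships without word vectors, so similarity() is more meaningful
# with a vectors model such as en_core_web_md.
"""
econ_docs = [nlp(keyword) for keyword in econ_words]
pol_docs = [nlp(keyword) for keyword in pol_words]
rel_docs = [nlp(keyword) for keyword in rel_words]

def classify_motive(description):
    if isinstance(description, str):
        doc = nlp(description.lower())

        motive_scores = {
            "Economic": max(doc.similarity(keyword_doc) for keyword_doc in econ_docs),
            "Political": max(doc.similarity(keyword_doc) for keyword_doc in pol_docs),
            "Religious": max(doc.similarity(keyword_doc) for keyword_doc in rel_docs)
        }

        assigned_motive = max(motive_scores, key=motive_scores.get)
        return assigned_motive
    else:
        return None

df_RAND['Motive'] = df_RAND['Description'].apply(classify_motive)

df_RAND.to_csv("classified_data.csv", index=False)
"""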