In [1]:
import pandas as pd
import numpy as np
import os
import wrangle
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt


- What groups are most likely to be attacked?
- What groups are most active?
- What are the most common methods of attack?
- What locations are most prone to being attacked?
- Are any terrorist groups gaining momentum or falling off?
- Can we tell when a “power vacuum” is opening up?
- Has the methodology of attacks changed over time?
- What nationalities are most at risk?
- Are property structures being targeted as a way to cause more damage?
- How successful are the attacks?
- What groups are most likely to launch suicide attacks?
- How has the frequency of attacks changed over the years?
- Is there a relationship between property being involved in an attack and the casualty count?

- relationship between type of attack and number wounded
- relationship between group and success
- relationship between type of attack and success
- relationship between day and type of attack

Initial Hypotheses:
- Terrorism has changed over time, and the fall of some terrorist groups leaves room for other groups grasping for power.
- Certain groups are more likely to be targeted than others.
- There may be a relationship between structures being targeted and high casualty rates.
- Capital and other major cities are much more likely to be attacked than other areas.
- The methods of attack have changed over time, e.g. there has probably been a shift from weapons like guns to other methods like explosives/IEDs.
- Terrorist groups have grown more violent over time.
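One way the weapon-shift hypothesis could be checked is to look at each weapon type's share of attacks per year. The sketch below uses a small made-up frame; the `weap_type` column name comes from this notebook, but the values `firearms` and `explosives` are assumptions about how the prepared data is labeled.

```python
import pandas as pd

# Toy stand-in for the prepared frame; the values here are invented for illustration.
toy = pd.DataFrame({
    "year": [2001, 2001, 2001, 2001, 2017, 2017, 2017, 2017],
    "weap_type": ["firearms", "firearms", "firearms", "explosives",
                  "explosives", "explosives", "explosives", "firearms"],
})

# Rows are years, columns are weapon types; normalize="index" turns counts
# into the share of that year's attacks using each weapon type.
weapon_share = pd.crosstab(toy["year"], toy["weap_type"], normalize="index")
print(weapon_share)
```

Swapping `toy` for the real `df` would give one row per year, which can then be plotted as lines to see whether any weapon type gains or loses share.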

In [2]:
df = wrangle.get_terrorism_data()

In [3]:
df = wrangle.prep_df(df)

## Has the number of terrorist attacks changed over the years?

In [59]:
px.histogram(df, x = 'year')

- A quick visualization shows that the number of attacks increased drastically from 2010 to 2014, then declined somewhat after 2014.
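To put numbers behind the histogram, the yearly counts and their year-over-year change can be tabulated directly. This is a minimal sketch on a made-up `toy` frame; only the `year` column name is taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the prepared frame (the real one comes from wrangle.prep_df).
toy = pd.DataFrame(
    {"year": [2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012, 2012, 2013, 2013]}
)

# Count attacks per year, ordered chronologically.
attacks_per_year = toy["year"].value_counts().sort_index()

# Year-over-year percentage change makes rises and declines explicit.
yoy_change = attacks_per_year.pct_change().mul(100).round(1)

summary = pd.DataFrame({"attacks": attacks_per_year, "yoy_pct": yoy_change})
print(summary)
```

On the real data, the year with the largest count and the first negative `yoy_pct` after it would confirm or refine the 2010 to 2014 reading of the chart.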

## How has the activity of major terrorist groups changed over time?

In [93]:
# create a df of the top 20 groups with their number of attacks
top_groups = pd.DataFrame(df.atk_group.value_counts().head(20))

In [81]:
# create a list of the top 20 groups
top_list= list(top_groups.index)

In [94]:
# create a new df of the top 20 groups that contains all the info of the original df
df_top = df[df['atk_group'].isin(top_list)]

In [95]:
df_top.atk_group.value_counts()

unknown                                        27631
taliban                                         4809
islamic state of iraq and the levant (isil)     3700
al-shabaab                                      1559
kurdistan workers' party (pkk)                   949
tehrik-i-taliban pakistan (ttp)                  870
al-qaida in iraq                                 495
sinai province of the islamic state              367
khorasan chapter of the islamic state            271
baloch republican army (bra)                     265
muslim extremists                                176
baloch liberation front (blf)                    165
baloch liberation army (bla)                     147
al-nusrah front                                  130
hamas (islamic resistance movement)              129
lashkar-e-jhangvi                                119
houthi extremists (ansar allah)                  117
gunmen                                           106
janjaweed                                     

In [102]:
# create a histogram of the top groups over the time frame
color_discrete_map = {'unknown': '#006ba4'}
fig = px.histogram(
    df_top, 
    x = 'atk_group', 
    animation_frame="year", 
    range_y = [0,500],
    range_x = [0,21],
    color_discrete_map = color_discrete_map,
    title="attacks by group by year"
)
fig.update_layout(
  template='ggplot2')
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 3000

fig.show()

- There were many unclaimed attacks until about 2013, when terrorist groups began to claim responsibility more often. This could be due to the rise of terrorist groups like ISIL (ISIS), or to terrorist groups claiming responsibility for attacks they didn't actually launch. Claimed attacks increased drastically around 2015, in particular for major groups like ISIL.
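The animation is hard to read frame by frame, so the share of attacks attributed to `unknown` in each year could be computed as a single series. The sketch below runs on invented rows; `atk_group` and the label `unknown` match the value counts shown earlier, while the numbers are purely illustrative.

```python
import pandas as pd

# Toy stand-in for the prepared frame; rows are invented for illustration.
toy = pd.DataFrame({
    "year": [2012, 2012, 2012, 2012, 2015, 2015, 2015, 2015],
    "atk_group": ["unknown", "unknown", "unknown", "taliban",
                  "unknown", "taliban", "al-shabaab", "taliban"],
})

# Flag unattributed attacks, then average the flag within each year
# to get the fraction of that year's attacks with no known group.
unknown_share = toy["atk_group"].eq("unknown").groupby(toy["year"]).mean()
print(unknown_share)
```

A falling `unknown_share` over the real years would support the observation that attribution increased; the same pattern applied to the `claimed` column would test the claim about responsibility directly.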

In [4]:

df2 = df.copy()

In [11]:
df.shape

(44309, 29)

In [13]:
df2.shape

(15718, 29)

In [5]:
# keep only attacks with more than one fatality (killed > 1, so the minimum is 2)
df2 = df2[df2['killed'] > 1]

In [99]:
color_discrete_map = {'unknown': '#006ba4'}
fig = px.bar(
    df2, 
    x = 'atk_group',
    y="killed", 
    animation_frame="year", 
    range_y = [0,500],
    range_x = [0,30],
    color_discrete_map = color_discrete_map,
    title="killed by group by year"
)
fig.update_layout(
  template='ggplot2')
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 3000

fig.show()

In [7]:
df2.killed.max()

670.0

In [100]:
color_discrete_map = {'unknown': '#006ba4'}
fig = px.bar(
    df2, 
    x = 'atk_group',
    y="suicide", 
    animation_frame="year", 
    range_y = [0,150],
    title="suicides by group by year"
)
fig.update_layout(
  template='ggplot2')
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 3000

fig.show()

In [9]:
df2.killed.min()

2.0

In [16]:
df.T

Unnamed: 0,0,1,3,6,7,8,10,11,12,13,...,62713,62714,62716,62717,62718,62719,62720,62721,62722,62723
eventid,200101010004,200101030001,200101070003,200101080002,200101100004,200101110003,200101210006,200101220006,200101230007,200101240001,...,201712310004,201712310006,201712310008,201712310009,201712310012,201712310013,201712310018,201712310020,201712310022,201712310029
year,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,...,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017
month,1,1,1,1,1,1,1,1,1,1,...,12,12,12,12,12,12,12,12,12,12
day,1,3,7,8,10,11,21,22,23,24,...,31,31,31,31,31,31,31,31,31,31
country,turkey,turkey,turkey,turkey,turkey,israel,iran,afghanistan,turkey,turkey,...,iraq,afghanistan,somalia,afghanistan,iraq,somalia,afghanistan,afghanistan,somalia,syria
region,middle east & north africa,middle east & north africa,middle east & north africa,middle east & north africa,middle east & north africa,middle east & north africa,middle east & north africa,south asia,middle east & north africa,middle east & north africa,...,middle east & north africa,south asia,sub-saharan africa,south asia,middle east & north africa,sub-saharan africa,south asia,south asia,sub-saharan africa,middle east & north africa
provstate,istanbul,istanbul,istanbul,istanbul,istanbul,jerusalem,tehran,kabul,mersin,diyarbakir,...,saladin,nangarhar,banaadir,logar,diyala,bakool,faryab,faryab,middle shebelle,lattakia
city,istanbul,istanbul,istanbul,istanbul,istanbul,jerusalem,tehran,kabul,unknown,diyarbakir,...,farhatiyah,jalalabad,mogadishu,mohammad agha district,muqdadiyah,wajid,kohistan district,maymana,ceelka geelow,jableh
latitude,41.106178,41.106178,41.106178,41.106178,41.106178,31.771599,35.724533,34.516895,36.806853,37.922218,...,34.031331,34.417122,2.059819,34.217806,33.953167,3.810951,35.315467,35.921051,2.359673,35.407278
longitude,28.689863,28.689863,28.689863,28.689863,28.689863,35.2034,51.40519,69.147011,34.628893,40.184376,...,44.070106,70.449593,45.326115,69.109316,44.921906,43.246506,64.815508,64.774544,45.385034,35.942679


In [17]:
cat_vars = ['eventid', 'country', 'region', 'provstate', 'city', 'latitude', 'longitude', 'attack_type', 'target', 'targ_desc', 'targeted_group', 'tg_desc',  'nationality', 'atk_group', 'weap_type', 'weap_sub']

In [18]:
no_cat_df = df.copy()

In [19]:
no_cat_df = no_cat_df.drop(cat_vars, axis = 1)

In [35]:
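# pairwise Pearson correlations of the numeric columns of df (not no_cat_df);
# numeric_only=True is needed in pandas >= 2.0, where text columns otherwise raise an error
df_corr = df.corr(numeric_only=True)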

In [38]:
df_corr

Unnamed: 0,eventid,year,month,day,latitude,longitude,success,suicide,claimed,killed,us_killed,ter_killed,wounded,us_wounded,ter_wounded,property
eventid,1.0,0.999938,-0.015498,0.024791,-0.041354,-0.033832,-0.05188,-0.049521,0.209843,-0.020442,-0.05674,0.06796,-0.060933,-0.040911,0.046265,0.006321
year,0.999938,1.0,-0.026174,0.024339,-0.041426,-0.033686,-0.051914,-0.049519,0.209627,-0.020497,-0.05677,0.067914,-0.060927,-0.040963,0.046199,0.006544
month,-0.015498,-0.026174,1.0,0.011464,0.007059,-0.012999,0.006637,0.000978,0.015211,0.005313,0.004033,0.00321,0.000991,0.005668,0.005327,-0.02091
day,0.024791,0.024339,0.011464,1.0,0.006672,-0.003852,-0.005098,-5.9e-05,0.001122,0.001142,-0.001098,-0.009457,5.4e-05,0.000582,0.000472,-0.001794
latitude,-0.041354,-0.041426,0.007059,0.006672,1.0,0.097969,0.003165,0.054753,-0.036397,0.014548,0.012813,0.026118,0.041569,0.009825,0.019972,-0.014237
longitude,-0.033832,-0.033686,-0.012999,-0.003852,0.097969,1.0,-0.029247,-0.028273,0.041805,-0.018889,0.026991,0.048203,-0.046423,0.006886,0.062187,0.135494
success,-0.05188,-0.051914,0.006637,-0.005098,0.003165,-0.029247,1.0,-0.065233,0.003101,0.041815,0.018195,-0.082838,0.061762,0.008639,-0.025648,-0.074838
suicide,-0.049521,-0.049519,0.000978,-5.9e-05,0.054753,-0.028273,-0.065233,1.0,0.056008,0.255126,0.032363,0.199619,0.249128,0.0292,0.001817,-0.014074
claimed,0.209843,0.209627,0.015211,0.001122,-0.036397,0.041805,0.003101,0.056008,1.0,0.046911,0.006591,0.059687,0.022185,-0.003492,0.027205,0.034388
killed,-0.020442,-0.020497,0.005313,0.001142,0.014548,-0.018889,0.041815,0.255126,0.046911,1.0,0.043627,0.337327,0.481725,0.015513,0.160843,-0.006919


In [41]:
df_corr.killed

eventid       -0.020442
year          -0.020497
month          0.005313
day            0.001142
latitude       0.014548
longitude     -0.018889
success        0.041815
suicide        0.255126
claimed        0.046911
killed         1.000000
us_killed      0.043627
ter_killed     0.337327
wounded        0.481725
us_wounded     0.015513
ter_wounded    0.160843
property      -0.006919
Name: killed, dtype: float64

- The number of civilians killed (`killed`) is moderately correlated with the number of people wounded (r ≈ 0.48) and the number of terrorists killed (r ≈ 0.34), and weakly correlated with the number of terrorists wounded (r ≈ 0.16). There is also a modest positive correlation between suicide attacks and the number of civilians killed (r ≈ 0.26).
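Casualty counts are heavily skewed (the largest `killed` value above is 670), and a few extreme events can dominate a Pearson coefficient. A rank-based (Spearman) correlation is one possible cross-check. The sketch below compares the two on an invented frame with one outlier; the column names follow the notebook, the values do not.

```python
import pandas as pd

# Toy stand-in with one extreme event, to mimic a skewed casualty column.
toy = pd.DataFrame({
    "killed":  [2, 3, 5, 8, 670],
    "wounded": [1, 4, 6, 9, 12],
    "suicide": [0, 0, 1, 0, 1],
})

# Pearson measures linear association; Spearman works on ranks,
# so it is much less sensitive to a single very large value.
pearson = toy.corr(method="pearson")["killed"]
spearman = toy.corr(method="spearman")["killed"]

comparison = pd.DataFrame({"pearson": pearson, "spearman": spearman})
print(comparison.round(3))
```

If the two coefficients differ a lot on the real data, the relationship is probably monotonic but not linear, or it is driven by a small number of mass-casualty events.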

In [42]:
df_corr.wounded

eventid       -0.060933
year          -0.060927
month          0.000991
day            0.000054
latitude       0.041569
longitude     -0.046423
success        0.061762
suicide        0.249128
claimed        0.022185
killed         0.481725
us_killed      0.020293
ter_killed     0.090090
wounded        1.000000
us_wounded     0.054648
ter_wounded    0.102240
property      -0.031939
Name: wounded, dtype: float64

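- The `wounded` column shows a similar pattern: its strongest correlations are with `killed` (r ≈ 0.48) and `suicide` (r ≈ 0.25), while its correlations with `ter_killed` (r ≈ 0.09) and `ter_wounded` (r ≈ 0.10) are weak.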

In [43]:
df_corr.property

eventid        0.006321
year           0.006544
month         -0.020910
day           -0.001794
latitude      -0.014237
longitude      0.135494
success       -0.074838
suicide       -0.014074
claimed        0.034388
killed        -0.006919
us_killed      0.012320
ter_killed    -0.010550
wounded       -0.031939
us_wounded     0.004495
ter_wounded   -0.012116
property       1.000000
Name: property, dtype: float64
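The `property` correlations with `killed` (-0.007) and `wounded` (-0.032) above are close to zero, so a linear coefficient shows little support for the property/casualty hypothesis. A grouped summary is another way to look at the same question. The sketch below assumes `property` is a 0/1 flag, which is an assumption about the prepared data, and uses invented rows.

```python
import pandas as pd

# Toy stand-in; property is assumed to be 0 (no property damage) or 1 (property damage).
toy = pd.DataFrame({
    "property": [0, 0, 0, 1, 1, 1],
    "killed":   [2, 4, 6, 1, 3, 5],
    "wounded":  [0, 2, 4, 1, 1, 4],
})

# Total casualties per attack, then summary statistics for each property group.
toy["casualties"] = toy["killed"] + toy["wounded"]
by_property = toy.groupby("property")["casualties"].agg(["count", "mean", "median"])
print(by_property)
```

Comparing the median alongside the mean helps here because a handful of very large attacks can pull the mean of either group.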
