### In this notebook, I engineer features for a model that predicts whether a country will experience a spike in terrorism two years ahead.

I 'shift' time-sensitive features so that past data is included in each row.  

The features I 'shift' from past to present are each country's average temperature per year and terrorist activity per year. Each observation (a terrorist incident at a given point in time) then carries features describing the country's average temperature and terrorist activity in earlier years.
  
I also 'shift' the terrorist activity per year from future rows into the current row to create target variables for my model. Each observation (a terrorist incident at a given point in time) then carries a feature describing the country's terrorist activity one or two years into the future.
  
"Terrorist activity" for a given country refers to the number of people who participated in terrorist incidents from that country in a given year.  

I use this metric to determine if 'terrorist activity' in a given country has increased or not in a given year. Increases are denoted as 1, no increases are denoted as 0. My target variable is this "spike" feature for a country two years down the road.
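
To make the target construction concrete, here is a minimal sketch on an invented country-year table. The column names (`country`, `year`, `activity`) and all values are assumptions for illustration only, and the spike rule is the simple "did activity rise" version described above, not the thresholded variants computed later in this notebook.

```python
import pandas as pd

# Toy country-year table; names and values are invented for illustration.
yearly = pd.DataFrame({
    "country": ["A"] * 5 + ["B"] * 5,
    "year": [2000, 2001, 2002, 2003, 2004] * 2,
    "activity": [10, 12, 12, 30, 25, 5, 4, 9, 9, 20],
}).sort_values(["country", "year"])

# Spike: 1.0 if activity rose versus the previous row for that country, else 0.0.
# The first year of each country has no predecessor and stays NaN.
change = yearly.groupby("country")["activity"].diff()
yearly["spike"] = (change > 0).astype(float).where(change.notna())

# Target: the spike flag two calendar years ahead. Re-keying the lookup by
# (year - 2) and merging matches on the actual year, so gaps in the data
# produce NaN instead of silently pulling a value from the wrong year.
future = yearly[["country", "year", "spike"]].rename(columns={"spike": "spike_in_2y"})
future = future.assign(year=future["year"] - 2)
yearly = yearly.merge(future, on=["country", "year"], how="left")

print(yearly)
```

The last two years of each country have no row two years ahead, so their target is NaN and they would be dropped before training.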

In [36]:
import numpy as np  # needed for np.nan in the helper functions below
import pandas as pd

In [6]:
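df = pd.read_csv('/home/ubuntu/.jupyter/Notebooks/terrorsave.csv')
# For each row, take the largest current_year_count seen for the same perpo_new
# in that year or any earlier year (only data up to the row's own year is used).
count_year_max_unbias = df.apply(lambda row: df[(df.iyear <= row.iyear) & (df.perpo_new == row.perpo_new)].current_year_count.max(), axis = 1)
count_year_max_unbias.shape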

(68141,)
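
The row-wise `apply` above filters the whole frame once per row, which gets slow at this size. A possible faster equivalent, sketched on invented toy data under the assumption that only `perpo_new`, `iyear`, and `current_year_count` matter: take the maximum per country-year, run a cumulative maximum over the years within each country, and merge the result back onto the incident rows.

```python
import pandas as pd

# Toy incident-level data; values are invented for illustration.
toy = pd.DataFrame({
    "perpo_new": ["A", "A", "A", "A", "B", "B"],
    "iyear": [2000, 2000, 2001, 2002, 2001, 2003],
    "current_year_count": [7, 7, 3, 9, 4, 2],
})

# One row per (country, year), ordered by year within each country.
per_year = (
    toy.groupby(["perpo_new", "iyear"], as_index=False)["current_year_count"]
       .max()
       .sort_values(["perpo_new", "iyear"])
)

# Running maximum over the current and all earlier years of the same country.
per_year["count_year_max_unbias"] = (
    per_year.groupby("perpo_new")["current_year_count"].cummax()
)

# Attach the running maximum to every incident row.
toy = toy.merge(
    per_year[["perpo_new", "iyear", "count_year_max_unbias"]],
    on=["perpo_new", "iyear"],
    how="left",
)
print(toy)
```

I have not run this against the full dataset, so it is worth comparing its output with the `apply` version on a sample before swapping it in.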

In [7]:
df.shape

(68141, 162)

In [8]:
df['count_year_max_unbias'] = count_year_max_unbias

In [11]:
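def shift_temp(row, num):
    """Return avgtemp for the same perpo_new `num` years before this row's iyear."""
    try:
        return df[(df.iyear == row.iyear - num) & (df.perpo_new == row.perpo_new)].avgtemp.iloc[0]
    except IndexError:
        # No row exists for that country in that earlier year.
        return np.nan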

In [20]:
df['tempone'] = df.apply(lambda x: shift_temp(x, 1), axis =1)
df['temptwo'] = df.apply(lambda x: shift_temp(x, 2), axis =1)
df['tempthree'] = df.apply(lambda x: shift_temp(x, 3), axis =1)
df['tempfour'] = df.apply(lambda x: shift_temp(x, 4), axis =1)
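
The same lag lookup can be written as a merge instead of a row-wise `apply`. This is a minimal sketch on invented toy data; `add_lag` is a hypothetical helper, not something defined elsewhere in this notebook. It keeps the first row per country-year (mirroring the `.iloc[0]` in `shift_temp`), re-keys that lookup by `iyear + lag`, and left-merges it back.

```python
import pandas as pd

def add_lag(frame, column, lag, new_name):
    """Hypothetical helper: attach `column` from `lag` years earlier
    for the same perpo_new, or NaN when that year is missing."""
    lookup = (
        frame.drop_duplicates(["perpo_new", "iyear"])[["perpo_new", "iyear", column]]
             .rename(columns={column: new_name})
             # A value observed in year y is the "lag years ago" value for year y + lag.
             .assign(iyear=lambda d: d["iyear"] + lag)
    )
    return frame.merge(lookup, on=["perpo_new", "iyear"], how="left")

# Toy data with a repeated year (2001) and a missing year (2002).
toy = pd.DataFrame({
    "perpo_new": ["A", "A", "A", "A"],
    "iyear": [2000, 2001, 2001, 2003],
    "avgtemp": [10.0, 11.0, 11.0, 12.5],
})

toy = add_lag(toy, "avgtemp", 1, "tempone")
toy = add_lag(toy, "avgtemp", 2, "temptwo")
print(toy)
```

Passing a negative `lag` pulls values from future years instead, which is the direction needed for target columns.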

In [22]:
# df[['avgtemp','tempone','iyear','perpo_new']].sort_values(['perpo_new','iyear'])

In [21]:
print(df.tempone.isnull().sum())
print(df.temptwo.isnull().sum())
print(df.tempthree.isnull().sum())
print(df.tempfour.isnull().sum())

9719
4382
5409
6968


In [25]:
df['tempdiffone'] = df.avgtemp - df.tempone
df['tempdifftwo'] = df.avgtemp - df.temptwo
df['tempdiffthree'] = df.avgtemp - df.tempthree
df['tempdifffour'] = df.avgtemp - df.tempfour

In [62]:
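def t_year(y, x):
    """Return 1 if the change x is at least one fifth of the running maximum y, else 0."""
    try:
        # Note: a NaN in x or y makes the comparison False, so the result is 0, not NaN.
        if y/5 <= x:
            return 1
        else:
            return 0
    except TypeError:
        # Non-numeric input.
        return np.nan

def t_20(x):
    """Return 1 if the change x is at least 20, else 0."""
    try:
        # Note: a NaN in x makes the comparison False, so the result is 0, not NaN.
        if 20 <= x:
            return 1
        else:
            return 0
    except TypeError:
        # Non-numeric input.
        return np.nan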
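
One thing to watch with the two functions above: comparing against NaN evaluates to False rather than raising, so a missing change is labelled 0 instead of NaN. If keeping missing values missing is preferred, a vectorized variant could look like the sketch below. This deliberately differs from the behaviour above; the series names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy inputs: year-over-year change and the running maximum for each row.
diff = pd.Series([30.0, 5.0, np.nan, 20.0])
running_max = pd.Series([100.0, 100.0, 100.0, 100.0])

# Fixed threshold: change of at least 20; NaN stays NaN.
t_20_vec = (diff >= 20).astype(float).where(diff.notna())

# Relative threshold: change of at least one fifth of the running maximum.
both_present = diff.notna() & running_max.notna()
t_year_vec = (diff >= running_max / 5).astype(float).where(both_present)

print(pd.DataFrame({"t_20": t_20_vec, "t_year": t_year_vec}))
```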

In [65]:
# diffone and difftwo are not created in this notebook; presumably they come from the saved CSV.
df['t1_year'] = df.apply(lambda row: t_year(row.count_year_max_unbias, row.diffone), axis = 1)
df['t2_year'] = df.apply(lambda row: t_year(row.count_year_max_unbias, row.difftwo), axis = 1)
df['t1_20'] = df.diffone.map(t_20)
df['t2_20'] = df.difftwo.map(t_20)

In [67]:
# Each helper looks up a spike flag for the same perpo_new `num` years after this row's iyear.
# An IndexError means no row exists for that country in that future year.
def t1_year_f(row, num):
    try:
        return df[(df.iyear == row.iyear + num) & (df.perpo_new == row.perpo_new)].t1_year.iloc[0]
    except IndexError:
        return np.nan

def t2_year_f(row, num):
    try:
        return df[(df.iyear == row.iyear + num) & (df.perpo_new == row.perpo_new)].t2_year.iloc[0]
    except IndexError:
        return np.nan

def t1_20_f(row, num):
    try:
        return df[(df.iyear == row.iyear + num) & (df.perpo_new == row.perpo_new)].t1_20.iloc[0]
    except IndexError:
        return np.nan

def t2_20_f(row, num):
    try:
        return df[(df.iyear == row.iyear + num) & (df.perpo_new == row.perpo_new)].t2_20.iloc[0]
    except IndexError:
        return np.nan

In [68]:
df['t1_year_f1'] = df.apply(lambda row: t1_year_f(row, 1), axis = 1)
df['t2_year_f1'] = df.apply(lambda row: t2_year_f(row, 1), axis = 1)
df['t1_20_f1'] = df.apply(lambda row: t1_20_f(row, 1), axis = 1)
df['t2_20_f1'] = df.apply(lambda row: t2_20_f(row, 1), axis = 1)

df['t1_year_f2'] = df.apply(lambda row: t1_year_f(row, 2), axis = 1)
df['t2_year_f2'] = df.apply(lambda row: t2_year_f(row, 2), axis = 1)
df['t1_20_f2'] = df.apply(lambda row: t1_20_f(row, 2), axis = 1)
df['t2_20_f2'] = df.apply(lambda row: t2_20_f(row, 2), axis = 1)

In [70]:
print(df.t1_year_f1.value_counts())
print(df.t2_year_f1.value_counts())
print(df.t1_20_f1.value_counts())
print(df.t2_20_f1.value_counts())

print(df.t1_year_f2.value_counts())
print(df.t2_year_f2.value_counts())
print(df.t1_20_f2.value_counts())
print(df.t2_20_f2.value_counts())

0.0    43820
1.0    14839
Name: t1_year_f1, dtype: int64
0.0    34681
1.0    23978
Name: t2_year_f1, dtype: int64
0.0    39961
1.0    18698
Name: t1_20_f1, dtype: int64
0.0    30694
1.0    27965
Name: t2_20_f1, dtype: int64
0.0    39706
1.0    12899
Name: t1_year_f2, dtype: int64
0.0    38149
1.0    14456
Name: t2_year_f2, dtype: int64
0.0    35907
1.0    16698
Name: t1_20_f2, dtype: int64
0.0    35484
1.0    17121
Name: t2_20_f2, dtype: int64
