# Geospatial Model
This notebook illustrates how we calculate our 9 scoring equations in 3 different ways:
- No transformation (Normal)
- Logged
- BoxCoxed

Loading the rain and wind data at the location of each tweet, along with the calculated distance from Irma's eye at that moment.

In [1]:
import pandas as pd
tweets = pd.read_csv("geospatial.csv.gzip", compression="gzip", index_col="tweet_id")
tweets.head()



tweet_id,windval,rainval,distance,irma_text
906668545542680576,8.176039,0.05046,386.017545,1
906668555185291265,16.473644,0.708585,431.548735,1
906668556493889536,6.929129,0.004562,357.25023,0
906668570079309830,7.795705,0.004562,364.639244,0
906668576056246278,9.647847,0.0,327.269226,0


Calculating the scores without performing any transformations (other than Min-Max Scaling).

In [2]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

wind = "windval"
rain = "rainval"
dist = "distance"

normal_scores = tweets.copy()[["irma_text"]]

normal_scores["GIS_Normal1"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[wind] + 1) * (tweets[rain] + 1) / (tweets[dist] + 1))
)
normal_scores["GIS_Normal2"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[rain] + 1) / (tweets[dist] + 1))
)
normal_scores["GIS_Normal3"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[wind] + 1) / (tweets[dist] + 1))
)
normal_scores["GIS_Normal4"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[wind] + 1) * (tweets[rain] + 1) / np.sqrt(tweets[dist] + 1))
)
normal_scores["GIS_Normal5"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[rain] + 1) / np.sqrt(tweets[dist] + 1))
)
normal_scores["GIS_Normal6"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[wind] + 1) / np.sqrt(tweets[dist] + 1))
)
normal_scores["GIS_Normal7"] = MinMaxScaler().fit_transform(
    pd.DataFrame(
        (tweets[wind] + 1) * (tweets[rain] + 1) / np.power(tweets[dist] + 1, 1.0 / 3)
    )
)
normal_scores["GIS_Normal8"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[rain] + 1) / np.power(tweets[dist] + 1, 1.0 / 3))
)
normal_scores["GIS_Normal9"] = MinMaxScaler().fit_transform(
    pd.DataFrame((tweets[wind] + 1) / np.power(tweets[dist] + 1, 1.0 / 3))
)

normal_scores.head()

tweet_id,irma_text,GIS_Normal1,GIS_Normal2,GIS_Normal3,GIS_Normal4,GIS_Normal5,GIS_Normal6,GIS_Normal7,GIS_Normal8,GIS_Normal9
906668545542680576,1,4.5e-05,3.9e-05,0.00098,0.000861,0.000479,0.01883,0.002299,0.000964,0.050441
906668555185291265,1,0.000128,7.2e-05,0.001707,0.00265,0.001242,0.035138,0.007269,0.003137,0.096463
906668556493889536,0,4e-05,4.2e-05,0.000911,0.00073,0.00047,0.016756,0.001918,0.000892,0.044193
906668570079309830,0,4.3e-05,4e-05,0.000995,0.000808,0.000456,0.018548,0.002135,0.000867,0.049167
906668576056246278,0,5.9e-05,4.8e-05,0.00136,0.001046,0.000527,0.024121,0.002717,0.000986,0.06289
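
The nine equations in the cell above follow a 3 x 3 pattern: three numerators (wind times rain, rain alone, wind alone) divided by the distance raised to the power 1, 1/2, or 1/3. The sketch below reproduces that pattern in a loop. It is only an illustration: `raw_scores` and `min_max` are hypothetical helpers that do not appear in the notebook, `min_max` stands in for `MinMaxScaler` with its default `[0, 1]` range, and the input is just the five rows printed by `tweets.head()`, so the scaled values will not match the full-dataset output shown above.

```python
import numpy as np
import pandas as pd

# The five rows shown by tweets.head(), used here only as toy input.
sample = pd.DataFrame(
    {
        "windval": [8.176039, 16.473644, 6.929129, 7.795705, 9.647847],
        "rainval": [0.05046, 0.708585, 0.004562, 0.004562, 0.0],
        "distance": [386.017545, 431.548735, 357.25023, 364.639244, 327.269226],
    }
)


def raw_scores(df, prefix="GIS_Normal"):
    """Return the nine unscaled scores (hypothetical helper, not in the notebook)."""
    w = df["windval"] + 1
    r = df["rainval"] + 1
    d = df["distance"] + 1
    numerators = [w * r, r, w]           # equations 1/4/7, 2/5/8, 3/6/9
    exponents = [1.0, 1.0 / 2, 1.0 / 3]  # equations 1-3, 4-6, 7-9
    out = {}
    k = 1
    for p in exponents:
        for num in numerators:
            out[f"{prefix}{k}"] = num / np.power(d, p)
            k += 1
    return pd.DataFrame(out)


def min_max(df):
    """Column-wise min-max scaling to [0, 1] (hypothetical stand-in for MinMaxScaler)."""
    return (df - df.min()) / (df.max() - df.min())


scaled = min_max(raw_scores(sample))
print(scaled.round(4))
```

Writing the formulas this way makes it easier to check that each column uses the intended numerator and exponent, at the cost of hiding the individual expressions that the original cell spells out.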


Calculating the scores after taking the `log10` of them (Min-Max Scaling is then applied to the logged values).

In [3]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

wind = "windval"
rain = "rainval"
dist = "distance"

logged_scores = tweets.copy()[["irma_text"]]

logged_scores["GIS_Logged1"] = MinMaxScaler().fit_transform(
    np.log10(pd.DataFrame((tweets[wind] + 1) * (tweets[rain] + 1) / (tweets[dist] + 1)))
)
logged_scores["GIS_Logged2"] = MinMaxScaler().fit_transform(
    np.log10(pd.DataFrame((tweets[rain] + 1) / (tweets[dist] + 1)))
)
logged_scores["GIS_Logged3"] = MinMaxScaler().fit_transform(
    np.log10(pd.DataFrame((tweets[wind] + 1) / (tweets[dist] + 1)))
)
logged_scores["GIS_Logged4"] = MinMaxScaler().fit_transform(
    np.log10(
        pd.DataFrame(
            (tweets[wind] + 1) * (tweets[rain] + 1) / np.sqrt(tweets[dist] + 1)
        )
    )
)
logged_scores["GIS_Logged5"] = MinMaxScaler().fit_transform(
    np.log10(pd.DataFrame((tweets[rain] + 1) / np.sqrt(tweets[dist] + 1)))
)
logged_scores["GIS_Logged6"] = MinMaxScaler().fit_transform(
    np.log10(pd.DataFrame((tweets[wind] + 1) / np.sqrt(tweets[dist] + 1)))
)
logged_scores["GIS_Logged7"] = MinMaxScaler().fit_transform(
    np.log10(
        pd.DataFrame(
            (tweets[wind] + 1)
            * (tweets[rain] + 1)
            / np.power(tweets[dist] + 1, 1.0 / 3)
        )
    )
)
logged_scores["GIS_Logged8"] = MinMaxScaler().fit_transform(
    np.log10(pd.DataFrame((tweets[rain] + 1) / np.power(tweets[dist] + 1, 1.0 / 3)))
)
logged_scores["GIS_Logged9"] = MinMaxScaler().fit_transform(
    np.log10(pd.DataFrame((tweets[wind] + 1) / np.power(tweets[dist] + 1, 1.0 / 3)))
)

logged_scores.head()

tweet_id,irma_text,GIS_Logged1,GIS_Logged2,GIS_Logged3,GIS_Logged4,GIS_Logged5,GIS_Logged6,GIS_Logged7,GIS_Logged8,GIS_Logged9
906668545542680576,1,0.231712,0.0763,0.301459,0.273537,0.059194,0.39949,0.294819,0.04951,0.459384
906668555185291265,1,0.31032,0.11263,0.3556,0.385362,0.120975,0.490196,0.423547,0.126224,0.572431
906668556493889536,0,0.22296,0.079453,0.294468,0.257713,0.058326,0.382931,0.275397,0.046278,0.43698
906668570079309830,0,0.229384,0.077476,0.302932,0.267441,0.056862,0.397344,0.286806,0.045116,0.455028
906668576056246278,0,0.252085,0.087474,0.333301,0.292455,0.063939,0.435107,0.312998,0.050474,0.497309
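
Two properties of the logged scores are worth checking, and the sketch below does so on the five rows printed by `tweets.head()` using equation 2 as an example. First, `log10` is monotonic, so the ranking of tweets is the same before and after the transformation; only the spacing between scores changes. Second, the `+ 1` offsets keep every score strictly positive, which matters because the last sample row has `rainval` equal to 0. The `min_max` helper is a hypothetical stand-in for `MinMaxScaler`, and the five-row input means the numbers will differ from the full-dataset output above.

```python
import numpy as np

# rainval and distance for the five rows shown by tweets.head().
rain = np.array([0.05046, 0.708585, 0.004562, 0.004562, 0.0])
dist = np.array([386.017545, 431.548735, 357.25023, 364.639244, 327.269226])


def min_max(x):
    """Scale a 1-D array to [0, 1] (hypothetical stand-in for MinMaxScaler)."""
    return (x - x.min()) / (x.max() - x.min())


score = (rain + 1) / (dist + 1)  # equation 2, strictly positive
logged = np.log10(score)

scaled_raw = min_max(score)
scaled_log = min_max(logged)

# log10 is monotonic, so sorting by either version gives the same order.
same_order = np.array_equal(np.argsort(score), np.argsort(logged))

# Without the +1 offset, the dry tweet (rain == 0) would produce log10(0) = -inf.
with np.errstate(divide="ignore"):
    unshifted = np.log10(rain / dist)

print(same_order)
print(np.isfinite(logged).all(), np.isfinite(unshifted).all())
```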


Calculating the scores by applying the BoxCox transformation to them.

In [4]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from scipy.stats import boxcox

wind = "windval"
rain = "rainval"
dist = "distance"

boxcox_scores = tweets.copy()[["irma_text"]]

boxcox_scores["GIS_Box1"] = MinMaxScaler().fit_transform(
    pd.DataFrame(
        boxcox((tweets[wind] + 1) * (tweets[rain] + 1) / (tweets[dist] + 1))[0]
    )
)
boxcox_scores["GIS_Box2"] = MinMaxScaler().fit_transform(
    pd.DataFrame(boxcox((tweets[rain] + 1) / (tweets[dist] + 1))[0])
)
boxcox_scores["GIS_Box3"] = MinMaxScaler().fit_transform(
    pd.DataFrame(boxcox((tweets[wind] + 1) / (tweets[dist] + 1))[0])
)
boxcox_scores["GIS_Box4"] = MinMaxScaler().fit_transform(
    pd.DataFrame(
        boxcox((tweets[wind] + 1) * (tweets[rain] + 1) / np.sqrt(tweets[dist] + 1))[0]
    )
)
boxcox_scores["GIS_Box5"] = MinMaxScaler().fit_transform(
    pd.DataFrame(boxcox((tweets[rain] + 1) / np.sqrt(tweets[dist] + 1))[0])
)
boxcox_scores["GIS_Box6"] = MinMaxScaler().fit_transform(
    pd.DataFrame(boxcox((tweets[wind] + 1) / np.sqrt(tweets[dist] + 1))[0])
)
boxcox_scores["GIS_Box7"] = MinMaxScaler().fit_transform(
    pd.DataFrame(
        boxcox(
            (tweets[wind] + 1)
            * (tweets[rain] + 1)
            / np.power(tweets[dist] + 1, 1.0 / 3)
        )[0]
    )
)
boxcox_scores["GIS_Box8"] = MinMaxScaler().fit_transform(
    pd.DataFrame(boxcox((tweets[rain] + 1) / np.power(tweets[dist] + 1, 1.0 / 3))[0])
)
boxcox_scores["GIS_Box9"] = MinMaxScaler().fit_transform(
    pd.DataFrame(boxcox((tweets[wind] + 1) / np.power(tweets[dist] + 1, 1.0 / 3))[0])
)


boxcox_scores.head()

tweet_id,irma_text,GIS_Box1,GIS_Box2,GIS_Box3,GIS_Box4,GIS_Box5,GIS_Box6,GIS_Box7,GIS_Box8,GIS_Box9
906668545542680576,1,0.229472,0.16294,0.144114,0.247141,0.137116,0.111887,0.260335,0.117006,0.140851
906668555185291265,1,0.307625,0.232266,0.179218,0.353581,0.262437,0.163757,0.382458,0.274648,0.221007
906668556493889536,0,0.22078,0.169156,0.139822,0.232338,0.13523,0.103924,0.242369,0.109756,0.128148
906668570079309830,0,0.22716,0.165263,0.145024,0.241431,0.132044,0.110832,0.252908,0.107137,0.138309
906668576056246278,0,0.249712,0.184798,0.164349,0.264921,0.147355,0.130499,0.277258,0.11916,0.164559
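
In the cell above, each `boxcox(...)[0]` call keeps only the transformed values and discards the second return value, the fitted lambda. When `lmbda` is not supplied, `scipy.stats.boxcox` estimates it by maximum likelihood, so each of the nine `GIS_Box` columns is transformed with its own lambda. The sketch below, run on the five rows printed by `tweets.head()` with equation 6 as an example, recovers that lambda and reapplies the Box-Cox formula by hand. The variable names are illustrative, and a lambda fitted on five rows is not the one the notebook obtained on the full dataset.

```python
import numpy as np
from scipy.stats import boxcox

# windval and distance for the five rows shown by tweets.head().
wind = np.array([8.176039, 16.473644, 6.929129, 7.795705, 9.647847])
dist = np.array([386.017545, 431.548735, 357.25023, 364.639244, 327.269226])

score = (wind + 1) / np.sqrt(dist + 1)  # equation 6, strictly positive

# With lmbda omitted, boxcox returns (transformed values, fitted lambda).
transformed, lam = boxcox(score)

# Box-Cox definition: (x**lam - 1) / lam, or log(x) when lam == 0.
if lam == 0:
    manual = np.log(score)
else:
    manual = (score**lam - 1) / lam

print("fitted lambda:", lam)
print("matches manual formula:", np.allclose(transformed, manual))
```

Keeping the fitted lambda alongside each column would make it possible to apply the same transformation to new tweets later, which the `[0]` indexing in the original cell does not allow.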
