<a href="https://colab.research.google.com/github/kmsekgothe/load-shortfall-regression-predict-api/blob/master/KaggleNotebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Team 12 - Advanced Classification Predict

© Explore Data Science Academy

---

### Introduction: 
---

### Predict Overview




<a id="cont"></a>

## Table of Contents

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Loading Data</a>

<a href=#three>3. Exploratory Data Analysis (EDA)</a>

<a href=#four>4. Data Engineering</a>

<a href=#five>5. Modeling</a>

<a href=#six>6. Model Performance</a>

<a href=#seven>7. Model Explanations</a>

 <a id="one"></a>
## 1. Importing Packages

In this section we import the necessary libraries needed for Data Analysis, Data Manipulation, Data Visualization and Model Building.

In [1]:
# Libraries for data analysis and data manipulation
import numpy as np   # numerical arrays
import pandas as pd  # tabular data (pandas DataFrame)

# Libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rc
from wordcloud import WordCloud
%matplotlib inline

# Libraries for text preprocessing
import re
import string
from nltk.corpus import stopwords
from nltk.tokenize import TreebankWordTokenizer
from nltk.stem import WordNetLemmatizer

# Other utilities
import requests
from time import sleep

# Libraries for feature engineering and model building
import sklearn
from sklearn import metrics, preprocessing
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, RandomizedSearchCV, GridSearchCV

# Libraries for model evaluation
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score, roc_auc_score)

# Suppress cell warnings
import warnings
warnings.filterwarnings("ignore")

In [None]:
# !pip install wordcloud

<a id="two"></a>
## 2. Loading the Data
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

- Load the train and test datasets.
- Concatenate the datasets so that data engineering only has to be done once (for convenience).
- Split the combined DataFrame back into train and test sets later, when needed.
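The concatenate-then-split workflow described above can be sketched with `pd.concat` and a marker column. This is a minimal sketch on two tiny hypothetical DataFrames (`train_demo`, `test_demo`) standing in for `train.csv` and `test_with_no_labels.csv`; the `is_train` column name is an assumption, not something taken from the original notebook.

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test_with_no_labels.csv:
# the test set has no 'sentiment' column.
train_demo = pd.DataFrame({
    "sentiment": [1, -1],
    "message": ["Climate change is REAL", "Global warming is a hoax"],
    "tweetid": [101, 102],
})
test_demo = pd.DataFrame({
    "message": ["New study on sea levels"],
    "tweetid": [201],
})

# Tag each row with its origin (hypothetical 'is_train' column) so the
# combined frame can be split again later.
combined = pd.concat(
    [train_demo.assign(is_train=True), test_demo.assign(is_train=False)],
    ignore_index=True,
)

# Data engineering happens once, on the combined frame (lower-casing as an example).
combined["message"] = combined["message"].str.lower()

# Split back into train and test and drop the helper column. Concatenation
# introduced NaN sentiment values for the test rows, which turned the column
# into floats, so the train labels are cast back to int.
train_clean = (combined[combined["is_train"]]
               .drop(columns="is_train")
               .astype({"sentiment": int}))
test_clean = combined[~combined["is_train"]].drop(columns=["is_train", "sentiment"])
```

The marker column is a design choice: splitting on it is robust even if rows are reordered or filtered during engineering, unlike splitting by row position.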

In [14]:
df = pd.read_csv('train.csv')                     # labelled training data
df_test = pd.read_csv('test_with_no_labels.csv')  # unlabelled test data
pd.set_option('display.max_columns', None)   # show every column
pd.set_option('display.max_colwidth', None)  # show the full tweet text

Perform basic analysis on the dataframe.

In [15]:
# Basic Train Analysis

df.shape # train DataFrame has 15 819 rows and 3 columns

(15819, 3)

In [None]:
df.head()

In [None]:
df.info()

In [21]:
df['sentiment'].unique()

15819

In [None]:
df['sentiment'].value_counts().plot(kind='bar')
plt.title('Class Frequency')
plt.xlabel('Class')
plt.ylabel('Frequency')
plt.show()

In [None]:
unique, counts = np.unique(df['sentiment'], return_counts=True)
# np.unique returns the classes in sorted order (-1, 0, 1, 2) together with their counts
unique_counts_dict = {'Unique Count': {f"Class {label}": count for label, count in zip(unique, counts)}}
unique_count = pd.DataFrame(data=unique_counts_dict)
unique_count.sort_values(by='Unique Count', ascending=False)

Class Description:

- 2 News: the tweet links to factual news about climate change
- 1 Pro: the tweet supports the belief in man-made climate change
- 0 Neutral: the tweet neither supports nor refutes the belief in man-made climate change
- -1 Anti: the tweet rejects the belief in man-made climate change
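To make plots and tables easier to read, the numeric codes can be mapped to the names in the class description. A minimal sketch, where `SENTIMENT_NAMES` is a hypothetical name and a small hand-made Series stands in for `df['sentiment']`:

```python
import pandas as pd

# Label-to-name mapping taken from the class description.
SENTIMENT_NAMES = {2: "News", 1: "Pro", 0: "Neutral", -1: "Anti"}

# Hypothetical labels standing in for df['sentiment'].
sentiment = pd.Series([1, 1, 2, 0, -1, 1, 2])

# Replace the numeric codes with readable names before counting.
named_counts = sentiment.map(SENTIMENT_NAMES).value_counts()
print(named_counts)
```

Applied to `df['sentiment']`, the same `map` call would give the class frequency bar plot readable axis labels instead of bare codes.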

Note the imbalance here: there are over 8000 observations in class 1 and only 1296 observations in class -1.
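One way to quantify this imbalance is to look at class proportions, the ratio between the largest and smallest class, and per-class weights. The sketch below uses made-up labels, not the real dataset counts. The weight formula `n_samples / (n_classes * class_count)` is the one scikit-learn documents for its `class_weight="balanced"` option, which `LogisticRegression` accepts.

```python
import pandas as pd

# Hypothetical, deliberately imbalanced labels (not the real dataset counts).
labels = pd.Series([1] * 8 + [2] * 4 + [0] * 3 + [-1] * 1)

# Share of each class in the data.
proportions = labels.value_counts(normalize=True)

# Imbalance ratio: size of the largest class divided by the smallest.
counts = labels.value_counts()
imbalance_ratio = counts.max() / counts.min()

# "Balanced" weights: rare classes get larger weights, common classes smaller ones.
balanced_weights = len(labels) / (labels.nunique() * counts)
print(balanced_weights.sort_index())
```

Here class 1 makes up half the data, so it receives a weight of 0.5, while the single class -1 example receives a weight of 4.0.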

In [None]:
# Basic Test Analysis

df_test.shape # test DataFrame has 10 546 rows and 2 columns

In [None]:
df_test.head()

In [None]:
df_test.info()

<a id="three"></a>
## 3. Data Preprocessing
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

#### Data Cleaning and Formatting
 

Before we can perform **Exploratory Data Analysis** (EDA) in section 4, we need to make sure that the data is in a usable format.
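Two formatting problems are visible in the printed rows below: sequences such as `Ã¢â‚¬â„¢` appear where an apostrophe `’` is expected (which looks like UTF-8 text that was decoded as Windows-1252, twice), and the tweets contain `https://t.co/...` links, which the `url` column suggests were split out of `message`. The helpers below are a hypothetical sketch of both steps; `repair_mojibake` and `split_urls` are illustrative names, and how the original notebook handled this is not shown here.

```python
import re

# Matches http/https links such as the t.co URLs seen in the tweets.
URL_PATTERN = re.compile(r"https?://\S+")


def repair_mojibake(text: str, max_rounds: int = 2) -> str:
    """Heuristically undo UTF-8 text that was mis-decoded as cp1252.

    Each round re-encodes as cp1252 and decodes as UTF-8. If either step
    fails, or nothing changes, the text is assumed to be clean already.
    """
    for _ in range(max_rounds):
        try:
            repaired = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break
        if repaired == text:
            break
        text = repaired
    return text


def split_urls(text: str) -> tuple[str, list[str]]:
    """Return the tweet with links removed, plus the list of removed links."""
    urls = URL_PATTERN.findall(text)
    cleaned = URL_PATTERN.sub("", text).strip()
    return cleaned, urls
```

Usage would be along the lines of `df['message'] = df['message'].apply(repair_mojibake)`. The repair is only a heuristic; a dedicated library such as `ftfy` covers many more encoding mix-ups.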

In [None]:
df.head()

In [13]:
# Print every row in full for a closer look at the raw tweets (note: this prints all 15 819 rows)
for i, row in df.iterrows():
    print(i)
    print(row)
    print("\n")

0
sentiment                                                                                                                         1
message      PolySciMajor EPA chief doesn't think carbon dioxide is main cause of global warming and.. wait, what!?   via @mashable
tweetid                                                                                                                      625221
url                                                                                                     ['https://t.co/yeLvcEFXkC']
Name: 0, dtype: object


1
sentiment                                                                 1
message      It's not like we lack evidence of anthropogenic global warming
tweetid                                                              126103
url                                                                      []
Name: 1, dtype: object


292
sentiment                                                       0
message      Stand up for all of the successes of climate change:
tweetid                                                    424186
url                                                            []
Name: 292, dtype: object


293
sentiment                                                                                                                                             1
message      RT @Fusion: America is about to become the only country in the world with a leader who doesnÃ¢â‚¬â„¢t think global warming is real.Ã¢â‚¬Â¦
tweetid                                                                                                                                          150591
url                                                                                                                                                  []
Name: 293, dtype: object


506
sentiment                                                                                                                                          -1
message      RT @DineshDSouza: THOUGHT FOR THE DAY: If gender is a social construct--which is to say 'all in your head'-- maybe climate change is too
tweetid                                                                                                                                        408538
url                                                                                                                                                []
Name: 506, dtype: object


507
sentiment                                       1
message      11 terrifying climate change facts  
tweetid                                    551452
url                   ['https://t.co/JgUosRxVlf']
Name: 507, dtype: object


779
sentiment                                                                                                                                         0
message      RT @TheCosby: How you gonna post this without an @, an IG, a snap, her location, her views on climate change &amp; BLM, her type, wha…
tweetid                                                                                                                                       58911
url                                                                                                                                              []
Name: 779, dtype: object


780
sentiment                                                                                                 0
message      The 'skeptical environmentalist' takes on climate change in the controversial doc, Cool It    
tweetid                                                                                               46988
url                                              


964
sentiment                                                                                                                               1
message      We can chose to be on the right side of history on climate change or we can be catastrophically wrong. It's real. AÃ¢â‚¬Â¦  
tweetid                                                                                                                            205010
url                                                                                                           ['https://t.co/qB9uXqmcur']
Name: 964, dtype: object


965
sentiment                                                                                                1
message      I just joined @NatGeoChannel @21CF in combating climate change. Take action #BeforeTheFlood  
tweetid                                                                                             990854
url                                                                            ['https://t.


1242
sentiment                                                                                                                         2
message      RT @YahooNews: Billionaire climate change activist says heÃ¢â‚¬â„¢ll spend whatever it takes to fight Trump    Ã¢â‚¬Â¦
tweetid                                                                                                                      193082
url                                                                            ['https://t.co/9PxJKpE4hR', 'https://t.co/8Ã¢â‚¬Â¦']
Name: 1242, dtype: object


1243
sentiment                                                                                                                                      1
message      RT @JonRiley7: I've always thought this! How can you distrust scientists (on stuff like climate change) when you can see science i…
tweetid                                                                                                             


1464
sentiment                                                                                                                 1
message      RT @climatehawk1: Beliefs about #climate change may reflect failure to understand what it is   #globalwarming…
tweetid                                                                                                              328186
url                                                                                             ['https://t.co/gTUU0vMloe']
Name: 1464, dtype: object


1465
sentiment                                                                                                                         1
message      RT @NRDC: Scott Pruitt’s statement is at odds with the established scientific consensus on climate change.   via @nyt…
tweetid                                                                                                                       71176
url                                       


1750
sentiment                                                                                                                                                         1
message      RT @cieriapoland: this halloween gets a lot scarier when you consider that bc of global warming it's 80 degrees in October &amp; we are destroyÃ¢â‚¬Â¦
tweetid                                                                                                                                                      134271
url                                                                                                                                                              []
Name: 1750, dtype: object


1751
sentiment                                                           1
message      RT @Glinner: This from a fucking climate change denier  
tweetid                                                        423266
url                                       ['https://t.co/WdqZkb51WN'


1975
sentiment                                                                                                                  1
message      RT @chrislhayes: This is the absolute perfect distillation of the right's central position on climate change.  
tweetid                                                                                                               193832
url                                                                                              ['https://t.co/5aEmeinGJa']
Name: 1975, dtype: object


1976
sentiment                                                                                                                1
message      RT @naretevduorp: HRC reacts to Trump's reckless rollback of Obama's environmental/global warming measures.  
tweetid                                                                                                             914755
url                                                              


2210
sentiment                                                                                                                    1
message      Been watching #BeforeTheFlood this evening, I hope it helps inspire people to act to stop climate change - now!  
tweetid                                                                                                                 557537
url                                                                                                ['https://t.co/DiFF9GObCN']
Name: 2210, dtype: object


2211
sentiment                                                                                                                                                1
message      RT @KennethBerlin: Our work to solve climate change, one of the greatest challenges humanity has ever faced, has never been easy. WhatÃ¢â‚¬Â¦
tweetid                                                                                                             


2396
sentiment                                                                                                                        1
message      RT @peta: The meat industry is one of the biggest causes of climate change. Make the green choice and #GoVegan2017!  
tweetid                                                                                                                     663504
url                                                                                                    ['https://t.co/wPlrIHZ8R0']
Name: 2396, dtype: object


2397
sentiment                                                                                                                    2
message      RT @EcoInternet3: Study: Some tree species unable to adapt to #climate change: Duluth News Tribune   #environment
tweetid                                                                                                                 246071
url                          


2574
sentiment                                                                                                                          1
message      Global temperature development, here 2016 included.\nGetting harder to deny man-made climate change\n@NTNU @eptntnu \n 
tweetid                                                                                                                       672860
url                                                                                                      ['https://t.co/Ge7mhqlzES']
Name: 2574, dtype: object


2575
sentiment                                                                                         1
message      RT @dogsrool_: Middle of November and the high is 88Ã‚Â° but climate change is fake!!!
tweetid                                                                                       94204
url                                                                                              []
Na

2853
sentiment                                                                                                                1
message      RT @algore: I'm optimistic about climate change. But people like you have to speak up for solutions:  Ã¢â‚¬Â¦
tweetid                                                                                                             791009
url                                                                                     ['https://t.co/gwT6xJVIUPÃ¢â‚¬Â¦']
Name: 2853, dtype: object


2854
sentiment                                                                                                      1
message      Australian governments used climate change weather destruction to attack renewables #ParisAgreement
tweetid                                                                                                   809696
url                                                                                                           []
Name: 2854, dtype:


3116
sentiment                                                                   0
message      Roses you carnation pink All global warming red trumpet earth me
tweetid                                                                353744
url                                                                        []
Name: 3116, dtype: object


3117
sentiment                                                                                                                                              -1
message      RT @Education4Libs: Facts: There are 2 genders, global warming is made up, the pay gap isn't real, women have equal rights, guns save lives…
tweetid                                                                                                                                            952071
url                                                                                                                                                    []
Name: 

3393
sentiment                                                                                                                                       1
message      RT @JoyAnnReid: @nytimes @robreiner Urban communities who have now been told their country will no longer fight climate change, and…
tweetid                                                                                                                                    786572
url                                                                                                                                            []
Name: 3393, dtype: object


3394
sentiment                                                                                                                              0
message      RT @naturalretreat1: RT @NewScienceWrld Jill Pelto's watercolors illustrate the strange beauty of climate change   hÃ¢â‚¬Â¦
tweetid                                                                                                 


3637
sentiment                                                                                           2
message      RT @Telegraph: Coffee killing fungus was not driven by climate change, scientists find  
tweetid                                                                                        630903
url                                                                       ['https://t.co/HZM6uAzFbG']
Name: 3637, dtype: object


3638
sentiment                                                                                                                                               1
message      RT @LaurenWern: Stein voters say climate change is important to them. But they couldn't even 'compromise' to stop a climate change denier f…
tweetid                                                                                                                                            117005
url                                                             


3912
sentiment                                                                  -1
message      Jina made up global warming, we hate them, they steal our jobs  
tweetid                                                                117619
url                                               ['https://t.co/GCrtLSSbNs']
Name: 3912, dtype: object


3913
sentiment                                                                                                                                   1
message      RT @ajplus: The City of Chicago is posting the climate change data and info that the EPA has deleted from its website under the…
tweetid                                                                                                                                 80236
url                                                                                                                                        []
Name: 3913, dtype: object


sentiment                                                                                                               2
message      RT @latimes: China is now looking to California – not Trump – to help lead the fight against climate change…
tweetid                                                                                                            301662
url                                                                                                                    []
Name: 4189, dtype: object


4190
sentiment                                                                                                                                                            1
message      RT @LastWeekTonight: If you donÃ¢â‚¬â„¢t believe man-made global warming is a a silly issue, give to the Natural Resources Defense Council (https:Ã¢â‚¬Â¦
tweetid                                                                                                                                          



4460
sentiment                                                                                                                        2
message      As Trump heads to G-7 mtg, European leaders lobby him on climate change - just as conservatives feared: @evanhalper  
tweetid                                                                                                                     964117
url                                                                                                    ['https://t.co/2sKbqYG5a9']
Name: 4460, dtype: object


4461
sentiment                                                                                                                                               1
message      Trump will make climate change, income inequality, prisons, surveillance worse but democrats also failed to adequately address these issues.
tweetid                                                                                                                         

4736
sentiment                                                                                                                                             1
message      RT @libshipwreck: We should really start naming hurricanes after oil companies and politicians who pretend that climate change isn't real.
tweetid                                                                                                                                          392519
url                                                                                                                                                  []
Name: 4736, dtype: object


4737
sentiment                                                                                                                        2
message      Wisconsin’s Department of Natural Resources site no longer says humans cause climate change – The Verge   #wtf #spot…
tweetid                                                                                     


5008
sentiment                                                                                    0
message      @TheEconomist Will you guys do this post-facto analysis of climate change models?
tweetid                                                                                 472983
url                                                                                         []
Name: 5008, dtype: object


5009
sentiment                                                                             0
message      RT @GRANDJIMIN: Literally y'all could blame Namjoon for global warming too
tweetid                                                                          752860
url                                                                                  []
Name: 5009, dtype: object


5010
sentiment                                                                           2
message      Record-breaking climate change pushes world into ‘uncharted territory’


5252
sentiment                                                                                                                                                         1
message      RT @margokingston1: I'm guessing we need to align with China on trade &amp; climate change now. No choice but to distance ourselves from USA, yÃ¢â‚¬Â¦
tweetid                                                                                                                                                      592683
url                                                                                                                                                              []
Name: 5252, dtype: object


5253
sentiment                                                                                                                         2
message      RT @PolarVortex: A rogue national park is tweeting out climate change facts in defiance of Donald Trump   @BadlandsNPS
tweetid       

5430
sentiment                                                                         0
message      @femmetron9000 idk all I know is that climate change........girl......
tweetid                                                                      840763
url                                                                              []
Name: 5430, dtype: object


5431
sentiment                                                                                                                                                                                                                      1
message      RT @ManMet80: #ImWithHer @HillaryClinton because we are #StrongerTogether \n\nÃ¢Ëœâ‚¬Ã¯Â¸ï†fight climate change\nÃ°Å¸â€™Æ’Ã°Å¸ï†Â½women's rights \nÃ°Å¸â€˜ï†Ã°Å¸ï†Â½gunsense\nÃ°Å¸â€¡ÂºÃ°Å¸â€¡Â¸VeteransÃ¢â‚¬Â¦
tweetid                                                                                                                                                                         

5681
sentiment                                                                                                                                                   1
message      RT @thecultureofme: hates muslims\nhates women\nhates POC\nhates LGBTQ\nhates disabled\ndoesnÃ¢â‚¬â„¢t believe in climate change\nleads in polls
tweetid                                                                                                                                                 98325
url                                                                                                                                                        []
Name: 5681, dtype: object


5682
sentiment                                                                                                    2
message      Netherlands bets €1m on global #climate #adaptation centre | Climate Home - climate change news  
tweetid                                                                                                 3215

5917
sentiment                                                                                                                2
message      RT @BostonGlobe: Obama urges more action to be taken on climate change during farewell speech. Watch live:  …
tweetid                                                                                                              26713
url                                                                                           ['https://t.co/ReZCW5JJQ3…']
Name: 5917, dtype: object


5918
sentiment                                                                                                               1
message      RT @PattyArquette: The White House page on climate change has already been removed from the website. #Resist
tweetid                                                                                                             18866
url                                                                                                     


6189
sentiment                                                                                                     0
message      Great to find experts using data rather than dogma in analysis of climate change. .@AlexEpstein…  
tweetid                                                                                                   59192
url                                                                                 ['https://t.co/HE6u0xAnkv']
Name: 6189, dtype: object


6190
sentiment                                                                                                                                   1
message      RT @StephenSchlegel: she's thinking about how she's going to die because your husband doesn't believe in climate change  Ã¢â‚¬Â¦
tweetid                                                                                                                                397555
url                                                         



6382
sentiment                                                                              2
message      Now I Get It: The hot debate over the Paris Agreement on climate change    
tweetid                                                                           770760
url                               ['https://t.co/eQBj13Z2wG', 'https://t.co/kAyHMj5y9p']
Name: 6382, dtype: object


6383
sentiment                                                                                                                                           0
message      RT @AliIngersoll4: Several information pages have disappeared from the White House website including civil rights, climate change, and… 
tweetid                                                                                                                                        274936
url                                                                                                                                                []
Name



6544
sentiment                                                                                                                                    1
message      RT @camrako: False it would've avoided the iceberg due to advances in technology and cause climate change the iceberg would've b…
tweetid                                                                                                                                 504383
url                                                                                                                                         []
Name: 6544, dtype: object


6545
sentiment                                                                                                  2
message      RT @mashable: In Trump's America, climate change research is surely 'a waste of your money'    
tweetid                                                                                               988739
url                                                   ['https

Name: 6833, dtype: object


6834
sentiment                                                                                        0
message      Niggas asked me what my inspiration was, I told'em global warming, you feel me? #cozy
tweetid                                                                                     251729
url                                                                                             []
Name: 6834, dtype: object


6835
sentiment                                                                                                                                               1
message      RT @rachelkilburg: There are people who believe in weather reports determined by a ground hog but don't believe in climate change determine…
tweetid                                                                                                                                            753464
url                                                                         

sentiment                                                                                                                                               1
message      RT @nadezhdakrups: @CNN As predicted, a bunch of science illiterate morons foolishly asserting this is proof that global warming isn't happ…
tweetid                                                                                                                                            858667
url                                                                                                                                                    []
Name: 7099, dtype: object


7100
sentiment                                                                                                                                              -1
message      Dear global warming, \nWhy couldn't you be real? \n\nSigned, \nA very confused person wondering why it's snowing in North Carolina right now
tweetid                                    

7274
sentiment                                                                                                     1
message      RT @LisaBloom: We will be the only developed nation in the world led by a climate change denier.  
tweetid                                                                                                  335117
url                                                                                 ['https://t.co/tR1DclGWEz']
Name: 7274, dtype: object


7275
sentiment                                                                                                               0
message      By storing water for irrigation and drinking purpose,how is he battling climate change? Slightly stretched  
tweetid                                                                                                            227112
url                                                                                           ['https://t.co/XJ0mtfzhsH']
Name: 7275, dtype: object


Name: 7529, dtype: object


7530
sentiment                                                                                   1
message      RT @fivefifths: Here's a reminder that we completely blew it on climate change  
tweetid                                                                                978995
url                                                               ['https://t.co/UvJYWGtzuc']
Name: 7530, dtype: object


7531
sentiment                                                                                                                                   1
message      RT @AVSTlN: The Titanic wouldn't sink in 2016 because global warming is real the icebergs are melting and the bees are dying we…
tweetid                                                                                                                                 85246
url                                                                                                                                 

sentiment                                                                                                                      1
message      RT @EndeavourSci: Dear god what a relief to have a leader say climate change is a fact. #JustinTrudeau  #yycchamber
tweetid                                                                                                                   258340
url                                                                                                                           []
Name: 7792, dtype: object


7793
sentiment                                                                                              1
message      RT @lexi4prez: you believe he rose from the dead but you don't believe in climate change?  
tweetid                                                                                           142300
url                                                                          ['https://t.co/PVH0C1hw65']
Name: 7793, dtype: object


779

8052
sentiment                                                                                                    2
message      RT @PatriotByGod: Trump to drop climate change garbage from environmental reviews: Bloomberg -\n 
tweetid                                                                                                 221494
url                                                                                ['https://t.co/cAX4wi76zM']
Name: 8052, dtype: object


8053
sentiment                                                                                             1
message      RT @Salon: When China calls out Donald Trump on climate change, you know itÃ¢â‚¬â„¢s bad  
tweetid                                                                                          694337
url                                                                         ['https://t.co/qx1Xep7k82']
Name: 8053, dtype: object

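Several messages and URLs above contain sequences such as `Ã¢â‚¬Â¦` and `itÃ¢â‚¬â„¢s`. These are typical of UTF-8 text that was wrongly decoded as Windows-1252 twice, and they stand for `…` and `it’s`. The output does not show whether the notebook repairs them. The following is a minimal sketch, assuming the damage is always UTF-8 read as cp1252. It reverses one layer per round and stops when a round fails or changes nothing. `fix_mojibake` is a hypothetical helper. A message that mixes real non-cp1252 characters with mojibake is left unchanged. The third-party `ftfy` package (`ftfy.fix_text`) handles such cases more thoroughly.

```python
def fix_mojibake(text: str, max_rounds: int = 3) -> str:
    """Undo UTF-8 text that was (possibly repeatedly) mis-decoded as cp1252.

    Assumption: the corruption is always UTF-8 bytes read as Windows-1252.
    """
    for _ in range(max_rounds):
        try:
            # Re-encode with the codec that was wrongly used, then decode correctly
            repaired = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not reversible (or already clean): keep the current text
        if repaired == text:
            break  # pure ASCII: nothing left to undo
        text = repaired
    return text


if __name__ == "__main__":
    # Build a doubly mis-decoded ellipsis instead of copying it from the data
    bad = "\u2026".encode("utf-8").decode("cp1252").encode("utf-8").decode("cp1252")
    print(bad, "->", fix_mojibake(bad))
```

Applied column-wise, this would be `df["message"].apply(fix_mojibake)`.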

8054
sentiment                                                            

Name: 8291, dtype: object


8292
sentiment                                                                                    2
message      Trump begins tearing up Obama's years of progress on tackling climate change\n\n 
tweetid                                                                                  23601
url                                                                ['https://t.co/pdCuep2pd6']
Name: 8292, dtype: object


8293
sentiment                                                                                                                               1
message      @FunjabiAtheist as far as climate change, it's going to happen either way because the us military is the largest contributor
tweetid                                                                                                                            663452
url                                                                                                                                    []
Name: 8293, dtype: object

Name: 8531, dtype: object


8532
sentiment                                                                                                                              1
message      RT @LorenRaeDeJ: No. Adm Mullen has been saying climate change threat since Bush admin. DOD has prioritized for decades.  …
tweetid                                                                                                                           594043
url                                                                                                              ['https://t.co/7sqI1…']
Name: 8532, dtype: object


8533
sentiment                                                                           -1
message      good morning everyone except not those annoying orange climate change ppl
tweetid                                                                         101252
url                                                                                 []
Name: 8533, dtype: object


8534
senti


8792
sentiment                                                                                                                                               1
message      RT @EricHolthaus: To me, our emotional/psychological response is *the* story on climate change. It defines how (and if) we will solve the p…
tweetid                                                                                                                                              6658
url                                                                                                                                                    []
Name: 8792, dtype: object


8793
sentiment                                                                   0
message      Hopefully your climate changes to 1488Ã‚Â° in the near future.  
tweetid                                                                126630
url                                               ['https://t.co/SpBYhp3nUV']
Name: 8793, dtype: object


8794


Name: 9075, dtype: object


9076
sentiment                                                                                     -1
message      RT @GajaPolicy: Global warmists brace for snow dump on climate change narrative    
tweetid                                                                                    14989
url                                       ['https://t.co/HcuwYRuhTA', 'https://t.co/BMv91WERye']
Name: 9076, dtype: object


9077
sentiment                                                                                                                       2
message      RT @RAKingham: How disastrous would climate change be for peace and security? | ABC Radio Australia   via @sharethis
tweetid                                                                                                                    278410
url                                                                                                   ['https://t.co/OVFiO3pxtg']
Name: 9077, dtype: object


9328
sentiment                                                                                                                                              1
message      @RICgallery @MCOcreate @JustinTrudeau needs to come to the US and smack some sense into Donald Trump. He believes climate change is a hoax.
tweetid                                                                                                                                           502585
url                                                                                                                                                   []
Name: 9328, dtype: object


9329
sentiment                                                                                        2
message      RT @RogueNASA: Energy Department climate office bans use of phrase ‘climate change’  
tweetid                                                                                      87463
url                                                  

Name: 9614, dtype: object


9615
sentiment                                                                                                                                     -1
message      RT @omnologos: Hypocrisy never far from alarmists. @EdwardJDavey now wants debate, when climate change minister he complained abou…
tweetid                                                                                                                                   453263
url                                                                                                                                           []
Name: 9615, dtype: object


9616
sentiment                                                                                                                                             1
message      RT @SenSanders: We have a president-elect who doesn't believe in climate change. Millions of people are going to have to say: Mr. TÃ¢â‚¬Â¦
tweetid                                           

9873
sentiment                                                         1
message      But, no one believes in climate change still ����‍♀️  
tweetid                                                      940503
url                                     ['https://t.co/rM7rIyHhkT']
Name: 9873, dtype: object


9874
sentiment                                                                                                                                         1
message      RT @ElinVidevall: The Swedish government took the opportunity to mock Trump with this picture when signing a law about climate change…
tweetid                                                                                                                                       37155
url                                                                                                                                              []
Name: 9874, dtype: object


9875
sentiment                                                        

Name: 10104, dtype: object


10105
sentiment                                                                                                                  1
message      RT @dellcam: NYTimes was leaked climate change report @realDonaldTrump would've undoubtedly tried to cover up:…
tweetid                                                                                                               364597
url                                                                                                                       []
Name: 10105, dtype: object


10106
sentiment                                                                                                                          1
message      @realDonaldTrump Lots of empty seats. Big excitement in DC. Over 200,000 people took to the streets for climate change.
tweetid                                                                                                                       590197
url                            

Name: 10334, dtype: object


10335
sentiment                                                                                                               2
message      RT @Jezebel: Energy Department won't disclose names of employees who worked on climate change to Trump team…
tweetid                                                                                                            603357
url                                                                                                                    []
Name: 10335, dtype: object


10336
sentiment                                                                                                                              1
message      RT @CitiesSun: Civic engagement vodeo on climate change @RichardMunang @HElHaiteCop22 @UNEP   #COP22 @NiliMajumder @Ã¢â‚¬Â¦
tweetid                                                                                                                           118177
url                            

Name: 10549, dtype: object


10550
sentiment                                                                                                                                    -1
message      @Heritage Let me guess! Subsidies, 'entitlement' programs, propping up third world dictators, oh yea, and the climate change hoax!
tweetid                                                                                                                                  269112
url                                                                                                                                          []
Name: 10550, dtype: object


10551
sentiment                                                                                                                                                    1
message      RT @kurteichenwald: China is now explaining the history of climate change science/politics 2 underscore to Trump they didnt do a hoax. GodÃ¢â‚¬Â¦
tweetid                             

10795
sentiment                                                                    2
message      Synthetic grass 'to replace garden lawns' due to climate change  
tweetid                                                                 936802
url                                                ['https://t.co/f1zqjWTohp']
Name: 10795, dtype: object


10796
sentiment                                                                                         1
message      Deniers were wrong about a 'pause'. And climate change could be about to accelerate   
tweetid                                                                                      391843
url                                                                     ['https://t.co/owSaCt8F8P']
Name: 10796, dtype: object


10797
sentiment                                                                                                                0
message      RT @AIFam16: Negative thinking destroys your brain cells and causes glob

Name: 11053, dtype: object


11054
sentiment                                                                                                                         2
message      RT @makGauBalak: India diverts Rs 56,700 crore from the fight against climate change to Goods and Service Tax regime  
tweetid                                                                                                                      648244
url                                                                                                     ['https://t.co/sbNy1AysbR']
Name: 11054, dtype: object


11055
sentiment                                                                                              -1
message      RT @PrisonPlanet: Obama uses private jet, 14 car convoy to get to 'climate change' speech.  
tweetid                                                                                            154451
url                                                                           ['http

11303
sentiment                                                                                                                                          1
message      still can't get over the fact that someone who doesn't believe in climate change holds one of the most powerful positions on the planet
tweetid                                                                                                                                       594153
url                                                                                                                                               []
Name: 11303, dtype: object


11304
sentiment                                                                                                                             0
message      RT @nctdailyjokes: johnny: girls are so hot\njohnny: boys are so hot\njohnny: why is everyone so hot?\nten: global warming
tweetid                                                                                    

Name: 11575, dtype: object


11576
sentiment                                                                                                                                     1
message      @neiltyson And somewhere out there a genius climate change denier will think 'then how come it's snowing instead of being hot?!' 🙄
tweetid                                                                                                                                   24121
url                                                                                                                                          []
Name: 11576, dtype: object


11577
sentiment                                                                                                                      1
message      Government has failed \n'Australian business woefully unprepared for climate change post Paris agreement' #auspol  
tweetid                                                                                         

sentiment                                                                                           1
message      Think narendramodi takes climate change seriously? Think again. #GST #ParisClimateDeal  
tweetid                                                                                         78502
url                                                                       ['https://t.co/CV9mlCryP1']
Name: 11817, dtype: object


11818
sentiment                                                                                                                                                       1
message      RT @1followernodad: parent: I'd do anything for my children!\n\nScientist: here's how to stave off climate change so your children can stay oÃ¢â‚¬Â¦
tweetid                                                                                                                                                    345949
url                                                                    

Name: 12046, dtype: object


12047
sentiment                                                                                         1
message      And white people elected a man who doesn't believe in global warming. Remember that.  
tweetid                                                                                      509453
url                                                                     ['https://t.co/3cezFpZ55p']
Name: 12047, dtype: object


12048
sentiment                                                                                                  1
message      It does not cost more to deal with climate change. It costs more to ignore it'. ~ John Kerry…  
tweetid                                                                                               319033
url                                                                              ['https://t.co/q77tXtPiHj']
Name: 12048, dtype: object


12049
sentiment                                                  

Name: 12310, dtype: object


12311
sentiment                                                                                                                                                     2
message      RT @thenation: Pentagon officials now consider climate change in every decision, from readying troops for battle to testing weapons https:/Ã¢â‚¬Â¦
tweetid                                                                                                                                                  137102
url                                                                                                                                                          []
Name: 12311, dtype: object


12312
sentiment                                                                                                                                           1
message      RT @BrookingsInst: The most effective intervention to climate change isn’t in the Paris Agreement, write @RebeccaWinthrop &amp;

Name: 12567, dtype: object


12568
sentiment                                                                                                                                               1
message      RT @nav_bhangu: You really wanna tell me why we elected a president who thinks climate change is a hoax, yet its 60+ degrees in NOVEMBER????
tweetid                                                                                                                                             54274
url                                                                                                                                                    []
Name: 12568, dtype: object


12569
sentiment                                                                                                                             1
message      RT @SethMacFarlane: Are we looking at an America that officially believes climate change is a hoax? Sorry, everybody else.
tweetid                                   

Name: 12824, dtype: object


12825
sentiment                                                                                                               0
message      Gonna go out on a limb here and assume these science buffs believe that global warming doesn't exist either.
tweetid                                                                                                            942076
url                                                                                                                    []
Name: 12825, dtype: object


12826
sentiment                                                                                                                                                  1
message      Do u know? Pak stands in top 10most vulnerable climate change countries!We need some serious mitigation &amp; adaption strategy @PUANConference
tweetid                                                                                                                         

13083
sentiment                                                                 2
message      RT @RobGeog: The 10 species most at risk from climate change  
tweetid                                                              198264
url                                             ['https://t.co/k1WN6CDi3n']
Name: 13083, dtype: object


13084
sentiment                                                                                                                                   1
message      RT @StephenSchlegel: she's thinking about how she's going to die because your husband doesn't believe in climate change  Ã¢â‚¬Â¦
tweetid                                                                                                                                102166
url                                                                                                            ['https://t.co/SjoFoNÃ¢â‚¬Â¦']
Name: 13084, dtype: object


13085
sentiment                                           

Name: 13330, dtype: object


13331
sentiment                                                                                                                       2
message      Pentagon video about the future of cities predicts inequality, climate change, scarcity, crumbling infrastructure:  
tweetid                                                                                                                    801251
url                                                                                                   ['https://t.co/A6DsoGKbXD']
Name: 13331, dtype: object


13332
sentiment                                                                                                                          -1
message      RT @don_holleran: Tree-huggin' hippie! Man walking BAREFOOT across America to protest climate change, 'save the earth'… 
tweetid                                                                                                                        490176
url     

sentiment                                                                                                  0
message      RT @IndieWire: Watch Leonardo DiCaprio's climate change doc #BeforeTheFlood for free online    
tweetid                                                                                               112708
url                                                   ['https://t.co/g3iUV8yU0u', 'https://t.co/LVXS17ILSn']
Name: 13568, dtype: object


13569
sentiment                                                                                                                                         1
message      RT @ApplegateCA49: Another problem is that the Republican Party is ignoring climate change while simultaneously taking money from the…
tweetid                                                                                                                                      688740
url                                                                                  

Name: 13808, dtype: object


13809
sentiment                              1
message        global warming is real.  
tweetid                           444294
url          ['https://t.co/KYwPD10vH2']
Name: 13809, dtype: object


13810
sentiment                                                              2
message      RT @FT: Saudi Arabia will stick to climate change pledges  
tweetid                                                           367403
url                                          ['https://t.co/Cq2sVO7X1f']
Name: 13810, dtype: object


13811
sentiment                                                                                            1
message      RT @Europarl_EN: The EU will keep leading the fight on climate change #ParisAgreement    
tweetid                                                                                         699229
url                                             ['https://t.co/pTkFmC0TwA', 'https://t.co/YN3PPDGgTR']
Name: 13811, dtype: object


Name: 14072, dtype: object


14073
sentiment                                                                                                                 2

In [3]:
df.head()

Unnamed: 0,sentiment,message,tweetid
0,1,"PolySciMajor EPA chief doesn't think carbon dioxide is main cause of global warming and.. wait, what!? https://t.co/yeLvcEFXkC via @mashable",625221
1,1,It's not like we lack evidence of anthropogenic global warming,126103
2,2,RT @RawStory: Researchers say we have three years to act on climate change before it’s too late https://t.co/WdT0KdUr2f https://t.co/Z0ANPT…,698562
3,1,#TodayinMaker# WIRED : 2016 was a pivotal year in the war on climate change https://t.co/44wOTxTLcD,573736
4,1,"RT @SoyNovioDeTodas: It's 2016, and a racist, sexist, climate change denying bigot is leading in the polls. #ElectionNight",466954


In [4]:
# Create a new 'url' column holding the URLs embedded in each message

def extract_urls(text):
    url_pattern = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    # re.findall returns one tuple of groups per match; group 0 of each tuple is the full URL
    matches = re.findall(url_pattern, text)
    return str([match[0] for match in matches])  # stored as the string form of a list

df['url']  = df['message'].apply(extract_urls)
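If the URLs are only needed as lists, pandas' vectorised `Series.str.findall` returns real Python lists rather than their string form, which removes the need to strip brackets and quotes later. A sketch, assuming the simpler pattern `https?://\S+` is good enough for tweet links (it is not the pattern used above) and using made-up messages:

```python
import pandas as pd

# Toy messages standing in for df['message'] (hypothetical data)
messages = pd.Series([
    "Researchers say we have three years https://t.co/abc https://t.co/def",
    "It's not like we lack evidence of anthropogenic global warming",
])

# Each cell becomes a list of URL strings (an empty list when there are none)
urls = messages.str.findall(r"https?://\S+")
print(urls.tolist())
```

`urls.str[0]` then gives the first URL of each row, or `NaN` for rows without one.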

In [5]:
# Replace every URL in the message text with a space
url_pattern = r'http[s]?://(?:[A-Za-z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9A-Fa-f][0-9A-Fa-f]))+'
df['message'] = df['message'].replace(to_replace=url_pattern, value=r' ', regex=True)
df

Unnamed: 0,sentiment,message,tweetid,url
0,1,"PolySciMajor EPA chief doesn't think carbon dioxide is main cause of global warming and.. wait, what!? via @mashable",625221,['https://t.co/yeLvcEFXkC']
1,1,It's not like we lack evidence of anthropogenic global warming,126103,[]
2,2,RT @RawStory: Researchers say we have three years to act on climate change before it’s too late …,698562,"['https://t.co/WdT0KdUr2f', 'https://t.co/Z0ANPT…']"
3,1,#TodayinMaker# WIRED : 2016 was a pivotal year in the war on climate change,573736,['https://t.co/44wOTxTLcD']
4,1,"RT @SoyNovioDeTodas: It's 2016, and a racist, sexist, climate change denying bigot is leading in the polls. #ElectionNight",466954,[]
...,...,...,...,...
15814,1,"RT @ezlusztig: They took down the material on global warming, LGBT rights, and health care. But now they're hocking Melania's QVC. https://…",22001,[]
15815,2,RT @washingtonpost: How climate change could be breaking up a 200-million-year-old relationship,17856,['https://t.co/rPFGvb2pLq']
15816,0,notiven: RT: nytimesworld :What does Trump actually believe about climate change? Rick Perry joins other aides in …,384248,['https://t.co/0Mp2']
15817,-1,RT @sara8smiles: Hey liberals the climate change crap is a hoax that ties to #Agenda2030.\nThe Climate is Being Changed byÃ¢â‚¬Â¦,819732,[]


In [7]:
# Why is there still a URL in the message column? This retweet was truncated to 'https://…', which the URL pattern above does not match
df['message'][15814]

"RT @ezlusztig: They took down the material on global warming, LGBT rights, and health care. But now they're hocking Melania's QVC. https://…"

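The strict URL pattern needs at least one ASCII URL character after `://`, and the ellipsis `…` (U+2026) left by Twitter's truncation is not one of them. A quick check of that, alongside a hypothetical broader pattern `https?://\S*` that also accepts an empty or truncated remainder (the tweet text is shortened from row 15814):

```python
import re

# The URL pattern used in In [5]
url_pattern = r'http[s]?://(?:[A-Za-z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9A-Fa-f][0-9A-Fa-f]))+'
# Hypothetical broader pattern: everything up to the next whitespace, possibly nothing
loose_pattern = r'https?://\S*'

tweet = "But now they're hocking Melania's QVC. https://\u2026"

print(re.findall(url_pattern, tweet))   # the strict pattern finds nothing
print(re.sub(loose_pattern, ' ', tweet))
```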
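In [None]:
# Strip the surrounding brackets from the stringified list
df['url'] = df['url'].astype(str).str[1:-1]

In [None]:
# Remove the quote characters, leaving a comma-separated string of URLs
df['url'] = df['url'].str.replace("'", "")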

In [None]:
df

In [10]:
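# Extract extra context from the URLs, i.e. the titles of the linked web pages

def extract_web_title(url):
    """Return the <title> text of the page at `url`, or None if `url` is empty or the page has no title."""
    if len(url) > 0:
        # The browser string must be sent under the 'User-Agent' header key
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0'}
        response = requests.get(url, headers=headers, timeout=10)  # send a GET request
        url_text = response.text
        start, end = url_text.find('<title>'), url_text.find('</title>')
        if start != -1 and end != -1:
            return url_text[start + 7 : end]
    return None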

In [12]:
extract_web_title('https://t.co/yeLvcEFXkC')

'EPA chief denies carbon dioxide is main cause of global warming and... wait, what!?'
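Slicing between `'<title>'` and `'</title>'` misses title tags written in upper case or with attributes, and leaves HTML entities such as `&amp;` escaped. A sketch of a more tolerant parser that works on HTML you already have; `title_from_html` is a hypothetical helper and the HTML is made up so the example runs offline:

```python
import html
import re

# Matches <title ...>...</title> case-insensitively, across line breaks
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)

def title_from_html(page):
    """Return the unescaped, whitespace-trimmed page title, or None if absent."""
    match = TITLE_RE.search(page)
    if match is None:
        return None
    return html.unescape(match.group(1)).strip()

sample = '<html><head><TITLE lang="en">EPA chief &amp; carbon dioxide\n</TITLE></head></html>'
print(title_from_html(sample))
print(title_from_html('<html><body>no title here</body></html>'))
```

For messy real-world pages, a full HTML parser is more robust than a regular expression.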

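In [11]:
# Fetch the title of the first linked page in each row (rows without URLs give None).
# Run this after the bracket- and quote-stripping cells: before them, each cell held the
# stringified list, which raised InvalidSchema: No connection adapters were found for "['https://t.co/yeLvcEFXkC']"
# Note: this sends one HTTP request per tweet, so it is slow on the full dataset.
df['url'] = df['url'].apply(lambda urls: extract_web_title(urls.split(', ')[0]))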
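The `InvalidSchema` error is `requests` rejecting the literal text `['https://t.co/yeLvcEFXkC']`, which is the string form of a Python list rather than a URL. A sketch of recovering a usable URL with the standard library's `ast.literal_eval`, accepting either the stringified list or the bracket-stripped, comma-separated form; `first_url` is a hypothetical helper, and keeping only the first URL is an assumption (some rows contain several, some truncated with `…`):

```python
import ast

def first_url(cell):
    """Return the first URL in a stringified list or comma-separated string, or ''."""
    if cell.startswith('['):
        urls = ast.literal_eval(cell)  # safely parse "['https://...']" into a list
    else:
        urls = [u.strip() for u in cell.split(',') if u.strip()]
    return urls[0] if urls else ''

print(first_url("['https://t.co/yeLvcEFXkC']"))
print(first_url("[]"))
print(first_url("https://t.co/WdT0KdUr2f, https://t.co/Z0ANPT…"))
```

Applied as `df['url'].apply(first_url)`, this yields plain URL strings that `requests.get` accepts.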


#### Clean data

In [None]:
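# Remove URLs, @mentions, '#' symbols and retweet markers

def clean_data(tweet):
    pattern_url = r'http[s]?://(?:[A-Za-z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9A-Fa-f][0-9A-Fa-f]))+'
    tweet = re.sub(pattern_url, '', tweet)   # URLs
    tweet = re.sub(r'@\w+', '', tweet)       # @mentions, including handles with underscores
    tweet = re.sub(r'#', '', tweet)          # keep the hashtag word, drop the '#'
    tweet = re.sub(r'\bRT\s+', '', tweet)    # 'RT' retweet marker, only as a whole word
    return tweet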
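Twitter handles may contain underscores (for example `@FEMA_Brock` in the data), which the character class `@[A-Za-z0-9]+` does not cover, so part of the handle is left behind. A quick comparison with `@\w+`, which also matches underscores (and, in Python 3, Unicode letters), on a tweet taken from the dataset:

```python
import re

tweet = "RT @altNOAA: Two words that @realDonaldTrump or @FEMA_Brock has 100% failed to use"

narrow = re.sub(r'@[A-Za-z0-9]+', '', tweet)   # letters and digits only
wide = re.sub(r'@\w+', '', tweet)               # also removes underscores

print(narrow)
print(wide)
```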

In [None]:
df['clean_tweet'] = df['message'].apply(clean_data)

In [None]:
# Remove punctuation (string.punctuation covers ASCII punctuation only)

def remove_punctuation(tweet):
    return ''.join([ch for ch in tweet if ch not in string.punctuation])

In [None]:
df['clean_tweet'] = df['clean_tweet'].apply(remove_punctuation)
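`string.punctuation` holds only ASCII punctuation, so typographic characters common in tweets, such as the right single quote `’` and the ellipsis `…`, pass through `remove_punctuation` unchanged. A sketch using `str.translate`, which deletes characters in a single pass; `remove_punctuation_fast` is a hypothetical name and the extra character set is an assumption, not an exhaustive list:

```python
import string

# ASCII punctuation plus some typographic characters seen in tweets (assumed list)
EXTRA_PUNCTUATION = '\u2018\u2019\u201c\u201d\u2026'
PUNCT_TABLE = str.maketrans('', '', string.punctuation + EXTRA_PUNCTUATION)

def remove_punctuation_fast(tweet):
    """Delete every character listed in PUNCT_TABLE."""
    return tweet.translate(PUNCT_TABLE)

text = "Researchers say it\u2019s too late\u2026 really?!"
print(remove_punctuation_fast(text))
```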

In [None]:
# Make all the text lower case to remove some noise from capitalisation

def remove_cap(tweet):
    return tweet.lower()

df['clean_tweet'] = df['clean_tweet'].apply(remove_cap)

In [None]:
tokeniser = TreebankWordTokenizer()
df['clean_tweet'] = df['clean_tweet'].apply(tokeniser.tokenize)

In [None]:
df.head()

In [None]:
# Lemmatize the words in each tokenised tweet (requires the NLTK 'wordnet' corpus, e.g. nltk.download('wordnet'))

def lemma(words, lemmatizer):
    return [lemmatizer.lemmatize(word) for word in words]   

In [None]:
lemmatizer = WordNetLemmatizer()
df['clean_tweet'] = df['clean_tweet'].apply(lemma, args=(lemmatizer, ))

In [None]:
df.head()

In [None]:
# Join each list of tokens back into a single space-separated string
df['clean_tweet'] = df['clean_tweet'].map(' '.join)

In [None]:
# Remove stopwords

#def remove_stop_words(tokens):    
#    return [t for t in tokens if t not in stopwords.words('english')]

#df['clean_tweet'] = df['clean_tweet'].apply(remove_stop_words)
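The commented-out filter calls `stopwords.words('english')` once per token, rebuilding the word list every time. A sketch that builds a set once; it swaps in scikit-learn's built-in `ENGLISH_STOP_WORDS` so it runs without downloading NLTK corpora (the two lists differ, so results will not match the NLTK version exactly; with NLTK the equivalent is `set(stopwords.words('english'))`):

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

STOP_WORDS = set(ENGLISH_STOP_WORDS)  # built once, constant-time membership tests

def remove_stop_words(tokens):
    """Drop tokens that appear in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(['the', 'climate', 'is', 'changing']))
```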

#### Data Cleaning and Formatting Summary

- 

<a id="four"></a>
## 4. Exploratory Data Analysis (EDA)
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

Exploratory Data Analysis (EDA) is the process of performing initial investigations on data to discover patterns, spot anomalies and check assumptions with the help of summary statistics and graphical representations.
The following section analyses the given data and provides an overview of it.
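As a starting point for the EDA, a sketch of two quick summaries: how the tweets are spread across sentiment classes, and the most common words within each class. The toy frame is hypothetical; on the real data you would pass `df` with its `sentiment` and `clean_tweet` columns (assumed here to hold token lists):

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for df after cleaning and tokenising
toy = pd.DataFrame({
    'sentiment': [1, 1, 2, -1, 0],
    'clean_tweet': [
        ['climate', 'change', 'is', 'real'],
        ['fight', 'climate', 'change'],
        ['epa', 'climate', 'program', 'cut'],
        ['climate', 'change', 'hoax'],
        ['global', 'warming', 'shill'],
    ],
})

# Share of tweets in each sentiment class
print(toy['sentiment'].value_counts(normalize=True))

def top_words(token_lists, n=3):
    """Return the n most common words across a collection of token lists."""
    counts = Counter(word for tokens in token_lists for word in tokens)
    return counts.most_common(n)

# Most common words per sentiment class
for label, group in toy.groupby('sentiment'):
    print(label, top_words(group['clean_tweet']))
```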

In [None]:
# Visualize the most frequent words in the cleaned tweets
# (clean_tweet may hold token lists or joined strings, so handle both)
all_words = " ".join(" ".join(t) if isinstance(t, list) else str(t) for t in df['clean_tweet'])
wordcloud = WordCloud(width=800, height=500, random_state=42, max_font_size=100).generate(all_words)

# plot the word cloud
plt.figure(figsize=(15, 8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

#### Key Insights

- 


<a id="five"></a>
## 5. Data Engineering
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

Create the feature matrix and the target vector, then separate a validation set from the training data.

In [None]:
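# Feature extraction: bag-of-words counts of unigrams, bigrams and trigrams
# max_df=0.90       -> ignore n-grams that appear in more than 90% of tweets
# min_df=2          -> ignore n-grams that appear in fewer than 2 tweets
# max_features=1000 -> keep the 1000 most frequent remaining n-grams
# Note: fitting on all of df before splitting lets the validation rows shape the vocabulary.
vector = CountVectorizer(max_df=0.90, min_df=2, max_features=1000, stop_words=None, ngram_range=(1, 3))
X = vector.fit_transform(df['message'])
y = df['sentiment']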

In [None]:
# create targets and features dataset
#y =  df['sentiment']
#X = df.drop('sentiment', axis=1)

X: the features, or independent variables (IVs), which are used to predict the dependent variable.

y: the target, or dependent variable (DV), which we want to predict: the sentiment class of each tweet.

In [None]:
# Split the training data into train and validation sets before bringing in the truly unseen test data.
# Shuffle and stratify so both parts keep the same class proportions.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=50, stratify=y)
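Whether to shuffle and stratify matters here. With `shuffle=False`, `train_test_split` takes the last 20% of rows as the validation set and ignores `random_state`; with `stratify`, each class keeps its share. A toy check on made-up labels that are sorted by class:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels sorted by class: 40 x 1, 30 x 2, 20 x 0, 10 x -1
y_sorted = np.array([1] * 40 + [2] * 30 + [0] * 20 + [-1] * 10)
X_sorted = np.arange(len(y_sorted)).reshape(-1, 1)

# Unshuffled split: the validation set is just the tail of the data
_, _, _, y_val_tail = train_test_split(X_sorted, y_sorted, test_size=0.2, shuffle=False)
# Stratified split: each class keeps its share in the validation set
_, _, _, y_val_strat = train_test_split(X_sorted, y_sorted, test_size=0.2,
                                        random_state=50, stratify=y_sorted)

print(np.unique(y_val_tail, return_counts=True))
print(np.unique(y_val_strat, return_counts=True))
```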

### Feature Scaling

In [None]:
#scaler = preprocessing.MinMaxScaler()
#X_scaled = scaler.fit_transform(X)

<a id="six"></a>
## 6. Modelling
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

This section takes us through the machine learning process. We train and evaluate a number of classification algorithms and select the best-performing model for this project. We compare the models using classification metrics (precision, recall and F1 score) as well as the time taken to train and test each model. This informs our model selection decision.
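A minimal sketch of such a comparison loop, assuming macro-averaged F1 as the selection metric and three common text classifiers (logistic regression, multinomial naive Bayes and a linear SVM) as candidates; the tiny corpus is made up so the block runs on its own, and its scores mean nothing:

```python
import time

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

train_text = ["climate change is real", "fight climate change now", "climate change is a hoax",
              "global warming is a scam", "epa cuts climate program", "new report on sea ice"]
train_y = [1, 1, -1, -1, 2, 2]
val_text = ["climate change is real and urgent", "warming is a hoax", "report on epa program"]
val_y = [1, -1, 2]

vec = CountVectorizer()
X_tr = vec.fit_transform(train_text)  # learn the vocabulary on training text only
X_va = vec.transform(val_text)        # reuse it for the validation text

candidates = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'MultinomialNB': MultinomialNB(),
    'LinearSVC': LinearSVC(),
}

results = {}
for name, clf in candidates.items():
    start = time.perf_counter()
    clf.fit(X_tr, train_y)
    val_pred = clf.predict(X_va)
    elapsed = time.perf_counter() - start
    results[name] = (f1_score(val_y, val_pred, average='macro'), elapsed)
    print(f"{name}: macro F1 = {results[name][0]:.3f}, fit+predict time = {elapsed:.4f}s")
```

Fitting the vectoriser on the training text only and transforming the validation text keeps the comparison free of leakage.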

#### Model 1 - Logistic Regression

In [None]:
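# Training
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)  # raise the iteration cap to give the lbfgs solver room to converge
model.fit(X_train, y_train)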

#### Model 2

#### Model 3

#### Model 4 

<a id="six"></a>
## 6. Model Performance
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

In [None]:
pred = model.predict(X_test)

In [None]:
from sklearn.metrics import classification_report

print('Classification Report')
print(classification_report(y_test, pred))
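The report's summary rows average the per-class scores in two ways: `macro avg` weights every class equally, while `weighted avg` weights each class by its support, so a minority class the model handles badly pulls the macro figure down more. A small worked example with made-up labels:

```python
from sklearn.metrics import f1_score

# Hypothetical labels: class 1 dominates, class -1 is a minority
y_true = [1, 1, 1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, 1, 1, 1, 1, -1]

macro = f1_score(y_true, y_pred, average='macro')        # plain mean of the per-class F1 scores
weighted = f1_score(y_true, y_pred, average='weighted')  # per-class F1 weighted by support (6 and 2)
print(round(macro, 4), round(weighted, 4))
```

Here the per-class F1 scores are 12/13 for class 1 and 2/3 for class -1, giving a macro average of about 0.795 and a weighted average of about 0.859.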

<a id="seven"></a>
## 7. Model Explanations
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>



This section discusses the inner workings of the best-performing model in a simple way.

##### Logistic Regression

## Conclusion

Given two datasets, train and test data, we were tasked with following the data science process to classify tweets by their sentiment towards climate change.

We first set out to understand the data and its space: the tweet messages, the four sentiment classes (-1, 0, 1 and 2) and the noise in the text, such as retweet markers, @mentions, hashtags, URLs and mis-encoded characters.

We cleaned the tweets by removing URLs, @mentions, hashtag symbols, retweet markers and punctuation, then lowercased, tokenised and lemmatised the text. All of this formed part of the iterative process of data preprocessing, exploratory data analysis and data engineering.

Essentially, we set out to understand and transform the data so that we could build an appropriate model that best predicts the sentiment of a tweet.

We then converted the messages into bag-of-words counts of unigrams to trigrams, trained a logistic regression classifier and evaluated it on a held-out validation set with a classification report (per-class precision, recall and F1 score). Finally, we retrained the model on the full training data and used it to predict the sentiment of the unseen test tweets for the Kaggle submission.

#### Kaggle submission file
This section creates the Kaggle submission file in csv format.

In [None]:
df.head()

In [None]:
df_test.head()

In [None]:
x_train_final = df['message']
x_test_final = df_test['message']

In [None]:
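# Fit the vectoriser on the training messages only, then map the test messages onto the same vocabulary
vector = CountVectorizer(max_df=0.90, min_df=2, max_features=1000, stop_words=None, ngram_range=(1, 3))
x_train_final = vector.fit_transform(df['message'])
x_test_final = vector.transform(df_test['message'])  # transform, not fit_transform: columns must match training
y = df['sentiment']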
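The vectoriser must be fitted on the training messages only and then applied to the test messages with `transform`. Refitting it on the test messages builds a different vocabulary, so column j no longer means the same n-gram in train and test, and the model's predictions become meaningless even when the shapes happen to match. A toy check with made-up sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["climate change is real", "climate change is a hoax", "global warming is real"]
test_docs = ["sea ice is melting", "climate change hoax"]

vec = CountVectorizer()
X_train_toy = vec.fit_transform(train_docs)  # learn the vocabulary from training text
X_test_toy = vec.transform(test_docs)        # map test text onto the same columns

refit = CountVectorizer().fit(test_docs)     # what refitting on the test text would do

print(X_train_toy.shape[1], X_test_toy.shape[1])             # same number of columns
print(sorted(vec.vocabulary_) == sorted(refit.vocabulary_))  # prints False: different vocabularies
```

Words seen only in the test text, such as `sea`, `ice` and `melting`, are simply ignored by `transform`.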

In [None]:
model.fit(x_train_final, y)

In [None]:
predict_final = model.predict(x_test_final)

In [None]:
daf = pd.DataFrame(predict_final, columns=['sentiment'])
daf.head()

In [None]:
df_test_final = pd.read_csv('test_with_no_labels.csv')

In [None]:
output = pd.DataFrame({"TweetID":df_test_final['tweetid']})
submission = output.join(daf)        
submission.to_csv("submission.csv", index=False)

In [None]:
submission