# Sentiment Analysis on StockTwits Tweets

## Objective

In this notebook, we aim to build a sentiment analysis model. The dataset was scraped from https://stocktwits.com/ and contains tweets related to FAANG stocks. Each tweet was labeled by its author as either bearish or bullish, which indicates the author's sentiment at the time of writing.

## Columns Description 


-    id : The tweet ID on the website

-    text: The tweet text body

-    time: The time the tweet was posted

-    sentiment: Bearish (-) or Bullish (+)

## Data Wrangling

### Importing Libraries

In [None]:
!pip install utils

Collecting utils
  Downloading utils-1.0.1-py2.py3-none-any.whl (21 kB)
Installing collected packages: utils
Successfully installed utils-1.0.1


In [None]:
!pip install unidecode

Collecting unidecode
  Downloading Unidecode-1.3.4-py3-none-any.whl (235 kB)

In [None]:
!pip install contractions

Collecting contractions
  Downloading contractions-0.1.72-py2.py3-none-any.whl (8.3 kB)
Collecting textsearch>=0.0.21
  Downloading textsearch-0.0.21-py2.py3-none-any.whl (7.5 kB)
Collecting pyahocorasick
  Downloading pyahocorasick-1.4.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)
Collecting anyascii
  Downloading anyascii-0.3.1-py3-none-any.whl (287 kB)
Installing collected packages: pyahocorasick, anyascii, textsearch, contractions
Successfully installed anyascii-0.3.1 contractions-0.1.72 pyahocorasick-1.4.4 textsearch-0.0.21


In [None]:
import re

import contractions
import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
import seaborn as sns
import unidecode
from sklearn.metrics import classification_report, confusion_matrix

# To plot visualizations inline with the notebook
%matplotlib inline

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Loading Data

In [None]:
df= pd.read_csv('/content/drive/MyDrive/data/FAANG.csv')
data= df.copy()
df.head(5)

Unnamed: 0,symbol,message,datetime,user,message_id,Date,Time,label
0,AAPL,qq next 60min confirm start rally aapl coming ...,2015-12-21 18:37:24,191996.0,47148173.0,2015-12-21,18:37:24,1
1,AAPL,aapl watching gap fill 169 20,2018-11-24 07:02:32,1665234.0,146068732.0,2018-11-24,07:02:32,1
2,AAPL,aapl weekly options gamblers lose,2014-07-22 21:48:13,71738.0,24904954.0,2014-07-22,21:48:13,1
3,AAPL,aapl,2020-01-27 07:07:03,1229493.0,191978042.0,2020-01-27,07:07:03,0
4,AAPL,key levels watch aapl,2014-06-27 15:19:47,106412.0,24190263.0,2014-06-27,15:19:47,1


In [None]:
df.shape

(2566858, 8)

In [None]:
df.rename(columns = {'label':'old_label'}, inplace = True)

### NULL Values

In [None]:
df.drop(columns=['message_id', 'user'], inplace=True)

In [None]:
df=df.dropna()
df.shape

(2566858, 6)

In [None]:
df = df.drop_duplicates()
df.shape

(2542198, 6)

### Text Processing




#### Text Cleaning & Contractions Expanding

In [None]:
# Add apostrophe-less contractions to the library's built-in list, which is available here:
# https://github.com/kootenpv/contractions/blob/master/contractions/data/contractions_dict.json
contractions.add('isnt', 'is not')
contractions.add('arent', 'are not')
contractions.add('doesnt', 'does not')
contractions.add('dont', 'do not')
contractions.add('didnt', 'did not')
contractions.add('cant', 'can not')
contractions.add('couldnt', 'could not')
contractions.add('hadnt', 'had not')
contractions.add('hasnt', 'has not')
contractions.add('havent', 'have not')
contractions.add('shouldnt', 'should not')
contractions.add('wasnt', 'was not')
contractions.add('werent', 'were not')
contractions.add('wont', 'will not')
contractions.add('wouldnt', 'would not')
contractions.add('cannot', 'can not')
contractions.add('can\'t', 'can not')
contractions.add( "can't've", "can not have")

In [None]:
def preprocess(doc):
    doc = unidecode.unidecode(doc) # transliterate unicode text into the closest ASCII representation
    doc = contractions.fix(doc) # expand contractions
    doc = re.sub(r'[\t\n]', ' ', doc) # replace newlines and tabs with a space
    doc = re.sub(r'@[A-Za-z0-9_]+', '', doc) # remove mentions
    doc = re.sub(r'#[A-Za-z0-9_]+', '', doc) # remove hashtags
    doc = re.sub(r'https?://[^ ]+', '', doc) # remove http(s) URLs
    doc = re.sub(r'www\.[^ ]+', '', doc) # remove URLs starting with www. (dot escaped to match a literal '.')
    doc = re.sub(r'[^A-Za-z]+', ' ', doc) # replace every run of non-alphabetic characters with a space
    doc = re.sub(r' +', ' ', doc) # collapse multiple spaces into one
    doc = doc.strip().lower() # strip leading/trailing spaces and lowercase the text
    return doc
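
To see what the regular-expression steps do, the sketch below applies essentially the same substitutions to a made-up message. It leaves out the `unidecode` and `contractions` calls so that it runs with the standard library alone; `clean_message` and the sample text are illustrative names, not part of the notebook.

```python
import re

def clean_message(doc):
    """Regex-only subset of the cleaning steps (hypothetical helper)."""
    doc = re.sub(r'[\t\n]', ' ', doc)          # newlines and tabs -> space
    doc = re.sub(r'@[A-Za-z0-9_]+', '', doc)   # drop @mentions
    doc = re.sub(r'#[A-Za-z0-9_]+', '', doc)   # drop #hashtags
    doc = re.sub(r'https?://[^ ]+', '', doc)   # drop http(s) URLs
    doc = re.sub(r'www\.[^ ]+', '', doc)       # drop bare www. URLs (literal dot)
    doc = re.sub(r'[^A-Za-z]+', ' ', doc)      # keep letters only
    doc = re.sub(r' +', ' ', doc)              # collapse repeated spaces
    return doc.strip().lower()

# Made-up message for illustration
sample = "@trader1 $AAPL looks strong!\nTarget 180 #bullish https://example.com/chart"
cleaned = clean_message(sample)
print(cleaned)
```

Note that digits and the `$` of a cashtag are removed by the letters-only step, so price targets disappear while the ticker itself survives as a plain word.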

In [None]:
df['processed'] = df['message'].apply(preprocess)

In [None]:
df['segmented'] = df['processed'].apply(lambda x: x.split()) 

#### Stemming and lemmatization

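Lemmatization reduces inflected or derived words to their root form (lemma), which makes them easier to handle and embed. The cells below use NLTK's WordNet lemmatizer together with each word's part-of-speech tag.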

In [None]:
from nltk.corpus import wordnet
# Map the POS tag from NLTK to the character codes accepted by the WordNet lemmatizer
def get_wordnet_pos(word): 
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}

    return tag_dict.get(tag, wordnet.NOUN)

In [None]:
from nltk.stem import WordNetLemmatizer
# Lemmatize all words in a list of words using their POS
def lemmatizerHelper(words):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in words]

In [None]:
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [None]:
df['stemmed'] = df['segmented'].apply(lemmatizerHelper) # lemmatize each token (despite the column name, no stemming is applied)


In [None]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [None]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

#### Stop Words

In [None]:
from string import ascii_lowercase

stop_words = set(nltk.corpus.stopwords.words('english'))
exclude_words = set(("not", "no"))
new_stop_words = stop_words.difference(exclude_words)

# adding single characters to new_stop_words
for c in ascii_lowercase:
    new_stop_words.add(c)
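
The effect of keeping "not" and "no" out of the stop list can be illustrated with a tiny hand-written list. The list and the sample tokens below are assumptions for illustration; the notebook itself uses NLTK's full English list.

```python
from string import ascii_lowercase

# Tiny stand-in for NLTK's English stop word list (assumption, for illustration only)
toy_stop_words = {"the", "is", "a", "this", "not", "no", "to"}

# Keep negations, since they flip the sentiment of the words that follow
keep = {"not", "no"}
toy_filter = (toy_stop_words - keep) | set(ascii_lowercase)

tokens = ["this", "is", "not", "a", "good", "entry", "x"]
filtered = [t for t in tokens if t not in toy_filter]
print(filtered)
```

Had "not" been filtered out, the message would reduce to "good entry" and lose its negative meaning.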

In [None]:
df['stopRemoved'] = df['stemmed'].apply(lambda words: [word for word in words if word not in new_stop_words])

#### Tokenization

In [None]:
negationWords = ['not', 'no', 'never']

# Replaces each run of negation words in a tokenized list with 'not' concatenated
# to the next non-negation word (a bigram written as a single token).
# For example, ['never', 'no', 'not', 'happy', 'journey'] becomes ['nothappy', 'journey'].
# A negation at the very end of the list has no following word and is dropped.
def bigramNegationWords(words):
    l = []
    metNegation = False
    bigram = ''
    for w in words:
        if w in negationWords:
            if metNegation == False:
                bigram += 'not'
                metNegation = True
            else:
                continue
        else:
            if metNegation == True:
                bigram += w
                l.append(bigram)
                metNegation = False
                bigram = ''
            else:
                l.append(w)
    return l
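
As a quick check of this behaviour, here is a compact re-statement of the same logic under a hypothetical name, `merge_negations`, run on a few hand-made token lists. It is a sketch meant to mirror `bigramNegationWords`, not a replacement for it.

```python
NEGATIONS = {"not", "no", "never"}

def merge_negations(words):
    """Collapse a run of negation words into 'not' glued to the next token."""
    merged = []
    pending = False  # True while a negation is waiting for its next word
    for w in words:
        if w in NEGATIONS:
            pending = True
        elif pending:
            merged.append("not" + w)
            pending = False
        else:
            merged.append(w)
    return merged

print(merge_negations(["never", "no", "not", "happy", "journey"]))
print(merge_negations(["buy", "not", "sell"]))
# A trailing negation has nothing to attach to and is dropped
print(merge_negations(["good", "not"]))
```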


In [None]:
df['negated'] = df['stopRemoved'].apply(bigramNegationWords)


In [None]:
df = df[df['negated'].map(len) > 1]

In [None]:
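# Tokens excluded from the counts (ticker symbols and the 'qq' token)
skipped_tokens = {'amzn', 'fb', 'goog', 'qq', 'aapl', 'nflx'}

def convToDict(words):
    # Build a word -> frequency dictionary for one tokenized message
    freq = dict()
    for word in words:
        if word in skipped_tokens:
            continue
        freq[word] = freq.get(word, 0) + 1
    return freq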
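
The same word-frequency dictionary can also be built with `collections.Counter` from the standard library. The sketch below is one possible equivalent; `count_words` and `SKIPPED` are hypothetical names, and the token list is made up.

```python
from collections import Counter

# Same tokens the notebook skips when counting
SKIPPED = {"amzn", "fb", "goog", "qq", "aapl", "nflx"}

def count_words(words):
    """Count every token that is not in the skipped set."""
    return dict(Counter(w for w in words if w not in SKIPPED))

counts = count_words(["aapl", "watch", "gap", "fill", "gap"])
print(counts)
```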


In [None]:
df['words'] = df['negated'].apply(convToDict)

In [None]:
df.head()

Unnamed: 0,symbol,message,datetime,Date,Time,old_label,processed,segmented,stemmed,stopRemoved,negated,words
0,AAPL,qq next 60min confirm start rally aapl coming ...,2015-12-21 18:37:24,2015-12-21,18:37:24,1,qq next min confirm start rally aapl coming al...,"[qq, next, min, confirm, start, rally, aapl, c...","[qq, next, min, confirm, start, rally, aapl, c...","[qq, next, min, confirm, start, rally, aapl, c...","[qq, next, min, confirm, start, rally, aapl, c...","{'next': 1, 'min': 1, 'confirm': 1, 'start': 1..."
1,AAPL,aapl watching gap fill 169 20,2018-11-24 07:02:32,2018-11-24,07:02:32,1,aapl watching gap fill,"[aapl, watching, gap, fill]","[aapl, watch, gap, fill]","[aapl, watch, gap, fill]","[aapl, watch, gap, fill]","{'watch': 1, 'gap': 1, 'fill': 1}"
2,AAPL,aapl weekly options gamblers lose,2014-07-22 21:48:13,2014-07-22,21:48:13,1,aapl weekly options gamblers lose,"[aapl, weekly, options, gamblers, lose]","[aapl, weekly, option, gambler, lose]","[aapl, weekly, option, gambler, lose]","[aapl, weekly, option, gambler, lose]","{'weekly': 1, 'option': 1, 'gambler': 1, 'lose..."
4,AAPL,key levels watch aapl,2014-06-27 15:19:47,2014-06-27,15:19:47,1,key levels watch aapl,"[key, levels, watch, aapl]","[key, level, watch, aapl]","[key, level, watch, aapl]","[key, level, watch, aapl]","{'key': 1, 'level': 1, 'watch': 1}"
5,AAPL,aapl loads cash hand great service business lo...,2018-11-01 23:39:14,2018-11-01,23:39:14,1,aapl loads cash hand great service business lo...,"[aapl, loads, cash, hand, great, service, busi...","[aapl, load, cash, hand, great, service, busin...","[aapl, load, cash, hand, great, service, busin...","[aapl, load, cash, hand, great, service, busin...","{'load': 1, 'cash': 1, 'hand': 1, 'great': 1, ..."


In [None]:
df.to_csv("stop.csv")

In [None]:
import pandas as pd
df= pd.read_csv('/content/stop.csv')

In [None]:
import ast

# The dictionaries were written to the CSV as strings, so parse them back into dicts
df['words'] = df['words'].apply(ast.literal_eval)


In [None]:
d=df.iloc[2400000:,:]

#### GloVe Embedding

Treating each word as a separate feature produces a very high-dimensional representation, which is impractical to train on because of memory limits and which ignores the words' context and their relations to each other. Pre-trained word embeddings such as GloVe overcome this by representing each word in a dense, low-dimensional, continuous vector space. The objective of any word embedding model is to encode the context of a word and its relationship to the other words in the corpus in its vector representation. Semantically and/or syntactically similar words should be close to each other in the embedding space.
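
The idea that similar words lie close together can be made concrete with cosine similarity. The sketch below uses made-up 3-dimensional vectors (an assumption for illustration; the GloVe vectors loaded below have 200 dimensions) and plain Python, so it runs without downloading the model.

```python
import math

# Made-up 3-dimensional vectors; not taken from any real embedding model
toy_vectors = {
    "rally": [0.9, 0.1, 0.0],
    "surge": [0.8, 0.2, 0.1],
    "crash": [-0.7, 0.3, 0.1],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_close = cosine(toy_vectors["rally"], toy_vectors["surge"])
sim_far = cosine(toy_vectors["rally"], toy_vectors["crash"])
print(sim_close > sim_far)
```

In this toy setup "rally" and "surge" point in nearly the same direction, while "rally" and "crash" point in roughly opposite directions.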



In [None]:
import gensim.downloader as api
wv = api.load('glove-twitter-200')

In [None]:
def wvContains(word):
    # True if the embedding model has a vector for this word
    return word in wv

In [None]:
def doc2vec(x):
    # Frequency-weighted average of the GloVe vectors of the words in one message
    word_dict = x
    sv = np.zeros(200)
    s_freq = 0
    for word, freq in word_dict.items():
        if wvContains(word):
            sv += (wv[word] * freq)
            s_freq += freq
        elif word[0:3] == 'not' and word[0:7] != 'nothing':
            # An unknown word that begins with 'not' is treated as one of our negation bigrams.
            # Skip characters after 'not' until the remainder is a known word.
            end = 3
            while (end < len(word)) and (not wvContains(word[end:])):
                end += 1
            # Guard against running off the end of the word when no suffix is known
            if end < len(word):
                sv += (wv['not'] + wv[word[end:]]) * freq
                s_freq += 2 * freq
        else:
            # Or it can be an elongated word like
            # ummmm, loveee, omggg, ahhhhhhhhhhh
            # so we drop trailing characters until wv recognizes it or only one character is left
            end = len(word) - 1
            while (end > 1) and (not wvContains(word[0:end])):
                end -= 1

            if wvContains(word[0:end]):
                sv += (wv[word[0:end]] * freq)
                s_freq += freq
    if s_freq != 0:
        return (1 / s_freq) * sv
    else:
        return np.zeros(200)
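
The weighting used here can be followed by hand on a toy example. The sketch below uses made-up 2-dimensional vectors in place of the GloVe model and covers only the known-word and negation-bigram cases; `toy_wv` and `toy_doc_vector` are hypothetical names for illustration.

```python
# Made-up 2-dimensional vectors standing in for the 200-dimensional GloVe model
toy_wv = {"not": [0.0, -1.0], "good": [1.0, 1.0], "buy": [1.0, 0.0]}

def toy_doc_vector(word_freq, dim=2):
    """Frequency-weighted average; a 'notX' bigram counts as 'not' plus 'X'."""
    total = [0.0] * dim
    weight = 0
    for word, freq in word_freq.items():
        if word in toy_wv:
            parts = [word]
        elif word.startswith("not") and word[3:] in toy_wv:
            parts = ["not", word[3:]]
        else:
            continue  # unknown word: contributes nothing
        for part in parts:
            total = [t + x * freq for t, x in zip(total, toy_wv[part])]
            weight += freq
    return [t / weight for t in total] if weight else total

vec = toy_doc_vector({"buy": 2, "notgood": 1, "zzz": 1})
print(vec)
```

Here "buy" contributes twice, "notgood" contributes the vectors of "not" and "good" once each, and "zzz" is ignored, so the sum is divided by a total weight of 4.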

In [None]:
d = d.copy() # work on an independent copy of the slice to avoid pandas' SettingWithCopyWarning
d['Vec'] = d['words'].apply(doc2vec)


In [None]:
columns_names = ['v_' + str(i) for i in range(200)]


In [None]:
d

Unnamed: 0.1,Unnamed: 0,symbol,message,datetime,Date,Time,old_label,processed,segmented,stemmed,stopRemoved,negated,words,Vec
2400000,2474669,NFLX,280 puts active nflx week,2019-09-10 18:53:23,2019-09-10,18:53:23,0,puts active nflx week,"['puts', 'active', 'nflx', 'week']","['put', 'active', 'nflx', 'week']","['put', 'active', 'nflx', 'week']","['put', 'active', 'nflx', 'week']","{'put': 1, 'active': 1, 'week': 1}","[0.26471332708994544, 0.13295367235938707, -0...."
2400001,2474670,NFLX,yy great timing nflx pops getting amzn ramp er...,2013-10-22 14:34:33,2013-10-22,14:34:33,0,yy great timing nflx pops getting amzn ramp er,"['yy', 'great', 'timing', 'nflx', 'pops', 'get...","['yy', 'great', 'timing', 'nflx', 'pop', 'get'...","['yy', 'great', 'timing', 'nflx', 'pop', 'get'...","['yy', 'great', 'timing', 'nflx', 'pop', 'get'...","{'yy': 1, 'great': 1, 'timing': 1, 'pop': 1, '...","[0.029629714787006378, 0.07460200041532516, -0..."
2400002,2474671,NFLX,nflx almost red hurry 39 cheap fade,2019-10-17 14:29:06,2019-10-17,14:29:06,1,nflx almost red hurry cheap fade,"['nflx', 'almost', 'red', 'hurry', 'cheap', 'f...","['nflx', 'almost', 'red', 'hurry', 'cheap', 'f...","['nflx', 'almost', 'red', 'hurry', 'cheap', 'f...","['nflx', 'almost', 'red', 'hurry', 'cheap', 'f...","{'almost': 1, 'red': 1, 'hurry': 1, 'cheap': 1...","[-0.11042977999895812, -0.22278900295495987, -..."
2400003,2474672,NFLX,nflx also worst loser open gt 50 gt 1m sh volume,2015-09-01 19:04:42,2015-09-01,19:04:42,0,nflx also worst loser open gt gt m sh volume,"['nflx', 'also', 'worst', 'loser', 'open', 'gt...","['nflx', 'also', 'bad', 'loser', 'open', 'gt',...","['nflx', 'also', 'bad', 'loser', 'open', 'gt',...","['nflx', 'also', 'bad', 'loser', 'open', 'gt',...","{'also': 1, 'bad': 1, 'loser': 1, 'open': 1, '...","[0.2027560027781874, 0.0752242561429739, -0.29..."
2400004,2474673,NFLX,nflx right 39 interested,2013-01-04 15:10:31,2013-01-04,15:10:31,0,nflx right interested,"['nflx', 'right', 'interested']","['nflx', 'right', 'interested']","['nflx', 'right', 'interested']","['nflx', 'right', 'interested']","{'right': 1, 'interested': 1}","[-0.01574750244617462, 0.024080000817775726, -..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2490240,2566853,NFLX,nflx sister owns kinda thinking telling sell b...,2019-01-11 20:51:22,2019-01-11,20:51:22,1,nflx sister owns kind of thinking telling sell...,"['nflx', 'sister', 'owns', 'kind', 'of', 'thin...","['nflx', 'sister', 'own', 'kind', 'of', 'think...","['nflx', 'sister', 'kind', 'think', 'tell', 's...","['nflx', 'sister', 'kind', 'think', 'tell', 's...","{'sister': 1, 'kind': 1, 'think': 1, 'tell': 1...","[-0.08799006789922714, 0.24729885134313787, -0..."
2490241,2566854,NFLX,nflx bought 123 shares think hit 175 tomorrow,2017-07-17 19:34:14,2017-07-17,19:34:14,1,nflx bought shares think hit tomorrow,"['nflx', 'bought', 'shares', 'think', 'hit', '...","['nflx', 'bought', 'share', 'think', 'hit', 't...","['nflx', 'bought', 'share', 'think', 'hit', 't...","['nflx', 'bought', 'share', 'think', 'hit', 't...","{'bought': 1, 'share': 1, 'think': 1, 'hit': 1...","[-0.24048399925231934, 0.20263120383024216, -0..."
2490242,2566855,NFLX,quot vsfinancials quot investcorrectly netflix...,2015-06-10 13:02:32,2015-06-10,13:02:32,1,quot vsfinancials quot investcorrectly netflix...,"['quot', 'vsfinancials', 'quot', 'investcorrec...","['quot', 'vsfinancials', 'quot', 'investcorrec...","['quot', 'vsfinancials', 'quot', 'investcorrec...","['quot', 'vsfinancials', 'quot', 'investcorrec...","{'quot': 2, 'vsfinancials': 1, 'investcorrectl...","[0.03935225556294123, 0.03956099801386396, -0...."
2490243,2566856,NFLX,mgt 32 million volume aapl 41 nflx 11 msft 8 31,2016-05-12 15:37:16,2016-05-12,15:37:16,0,mgt million volume aapl nflx msft,"['mgt', 'million', 'volume', 'aapl', 'nflx', '...","['mgt', 'million', 'volume', 'aapl', 'nflx', '...","['mgt', 'million', 'volume', 'aapl', 'nflx', '...","['mgt', 'million', 'volume', 'aapl', 'nflx', '...","{'mgt': 1, 'million': 1, 'volume': 1, 'msft': 1}","[0.1455872468650341, 0.10423700325191021, -0.1..."


In [None]:
ll = d['Vec'].tolist()

In [None]:

dd = pd.DataFrame(ll, columns=columns_names)
dd.head()

Unnamed: 0,v_0,v_1,v_2,v_3,v_4,v_5,v_6,v_7,v_8,v_9,...,v_190,v_191,v_192,v_193,v_194,v_195,v_196,v_197,v_198,v_199
0,0.264713,0.132954,-0.00646,-0.04986,0.133807,-0.011167,0.597623,-0.08933,0.363961,0.235621,...,-0.1337,0.240207,0.062491,0.10345,-0.138041,0.08837,0.045853,0.054475,0.1837,-0.166913
1,0.02963,0.074602,-0.170545,0.02258,0.120418,-0.028043,0.143164,0.057698,0.073655,-0.183993,...,-0.130126,-0.322808,0.097581,-0.014098,-0.001201,-0.040933,0.401866,0.081739,0.03037,0.145732
2,-0.11043,-0.222789,-0.243266,0.006902,0.139992,-0.08453,0.85048,-0.151036,-0.221724,0.14162,...,0.096739,-0.038134,0.037926,0.079073,0.281922,-0.057446,0.086784,0.079761,-0.071045,0.009373
3,0.202756,0.075224,-0.296068,-0.025922,-0.058225,-0.014915,0.118075,-0.009478,-0.087972,0.141763,...,-0.227032,0.038666,0.020885,0.045871,-0.061893,-0.159254,0.018282,0.125994,-0.028411,0.0784
4,-0.015748,0.02408,-0.209634,-0.196135,-0.40158,0.213411,0.70846,0.113946,-0.033445,-0.0099,...,0.074279,0.07069,0.04411,-0.234288,-0.008565,-0.152487,0.306515,0.129835,0.30007,0.16041


In [None]:
dd

Unnamed: 0,v_0,v_1,v_2,v_3,v_4,v_5,v_6,v_7,v_8,v_9,...,v_190,v_191,v_192,v_193,v_194,v_195,v_196,v_197,v_198,v_199
0,0.264713,0.132954,-0.006460,-0.049860,0.133807,-0.011167,0.597623,-0.089330,0.363961,0.235621,...,-0.133700,0.240207,0.062491,0.103450,-0.138041,0.088370,0.045853,0.054475,0.183700,-0.166913
1,0.029630,0.074602,-0.170545,0.022580,0.120418,-0.028043,0.143164,0.057698,0.073655,-0.183993,...,-0.130126,-0.322808,0.097581,-0.014098,-0.001201,-0.040933,0.401866,0.081739,0.030370,0.145732
2,-0.110430,-0.222789,-0.243266,0.006902,0.139992,-0.084530,0.850480,-0.151036,-0.221724,0.141620,...,0.096739,-0.038134,0.037926,0.079073,0.281922,-0.057446,0.086784,0.079761,-0.071045,0.009373
3,0.202756,0.075224,-0.296068,-0.025922,-0.058225,-0.014915,0.118075,-0.009478,-0.087972,0.141763,...,-0.227032,0.038666,0.020885,0.045871,-0.061893,-0.159254,0.018282,0.125994,-0.028411,0.078400
4,-0.015748,0.024080,-0.209634,-0.196135,-0.401580,0.213411,0.708460,0.113946,-0.033445,-0.009900,...,0.074279,0.070690,0.044110,-0.234288,-0.008565,-0.152487,0.306515,0.129835,0.300070,0.160410
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90240,-0.087990,0.247299,-0.183080,0.052172,-0.274659,0.173652,0.739991,0.088795,0.028619,0.020155,...,-0.035734,-0.026501,0.001091,-0.093426,-0.018038,-0.035438,0.335147,0.178616,-0.063469,-0.170219
90241,-0.240484,0.202631,-0.038354,-0.086466,-0.085950,-0.126009,0.761126,0.132512,-0.060203,-0.018399,...,-0.108705,0.003035,0.086636,-0.005193,-0.013035,-0.101702,0.122594,0.187395,0.280612,0.114742
90242,0.039352,0.039561,-0.095834,-0.090618,-0.120433,-0.058841,0.135815,-0.021365,0.079635,-0.269837,...,0.192453,-0.400697,0.196571,0.023006,-0.021819,0.130162,0.002788,-0.180794,0.047788,-0.178562
90243,0.145587,0.104237,-0.156570,0.187275,-0.119288,-0.118217,0.089074,0.009472,0.120813,-0.073668,...,0.182307,0.227796,0.344645,0.120832,0.142152,0.081576,0.109588,0.208395,0.038671,-0.045056


In [None]:
all_pred=[]

In [None]:
p= pd.read_csv('all_pred')
all_pred= p['0'].tolist()

In [None]:
import joblib
loaded_model = joblib.load('/content/drive/MyDrive/data/SentAnalysis_model.sav')
pred= loaded_model.predict(dd)

In [None]:
v= pd.DataFrame(pred)

In [None]:
v.value_counts()

1    71624
0    18621
dtype: int64

In [None]:
all_pred=all_pred + pred.tolist()
v= pd.DataFrame(all_pred)
v.to_csv('all_pred')

In [None]:
len(all_pred)

2490245
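
The cells above handle one slice of the data at a time (here the rows from 2,400,000 onward) and append that slice's predictions to a running list saved on disk. A generic version of that batching pattern might look like the following sketch; `batch_bounds`, the batch size, and the stand-in `fake_predict` are assumptions for illustration, not the notebook's code or model.

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, stop) index pairs that cover range(n_rows) in order."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)

def fake_predict(rows):
    # Stand-in for a real model's predict: labels a row 1 if it is even, else 0
    return [1 if r % 2 == 0 else 0 for r in rows]

rows = list(range(10))
all_predictions = []
for start, stop in batch_bounds(len(rows), batch_size=4):
    # Predictions are appended in row order, so they line up with the data
    all_predictions += fake_predict(rows[start:stop])

print(list(batch_bounds(10, 4)))
print(len(all_predictions) == len(rows))
```

Checking that the number of accumulated predictions equals the number of rows, as the `len(all_pred)` cell does, is a cheap safeguard before assigning them back as a column.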

In [None]:
df['predicted_label']= all_pred

In [None]:
df[['words', 'datetime' ,'Date', 'symbol', 'old_label', 'predicted_label']].to_csv("Final_FAANG.csv")