# Sentiment Analysis and Text Embeddings Workflow

This notebook documents the workflow for processing, analyzing, and embedding text data for sentiment analysis. The steps include:

1. **Sentiment Analysis**:
    - Apply VADER sentiment analysis to the cleaned text data.
2. **Text Embeddings**:
    - Generate text embeddings with TF-IDF, Bag of Words (BoW), Word2Vec, BERT, and MiniLM (SentenceTransformer).
    - Save the embeddings for further analysis.
3. **Data Export**: Save processed datasets to CSV files for reuse.
4. **External Analysis**: Run computationally intensive tasks, such as RoBERTa-based sentiment analysis and text embeddings, in Google Colab (`./colabs/roberta_sentiments_and_embedings.ipynb`).

The results and intermediate outputs are stored in dedicated folders (`../data/data_cleaned`, `../data/sentiments`, and `../data/embedings`) for easy access and reproducibility.

In [17]:
# Standard libraries
import pandas as pd
import os

# Natural Language Processing libraries
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('stopwords')
from nltk.stem import WordNetLemmatizer 

# Enable auto-reload for modules during development
%load_ext autoreload
%autoreload 2

# Set display options for Pandas to show all columns
pd.set_option('display.max_columns', None)

# Custom libraries
from nlp_scripts import nlp
from nlp_scripts import eda

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Arturo\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Arturo\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Arturo\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Arturo\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [2]:
# Data folder
data_folder = '../data/data_cleaned'

# Reading the dataset
news_dict = nlp.load_dataframes_news_from_folder(data_folder)
fake_news_dict = nlp.load_dataframes_fake_news_from_folder(data_folder)


In [3]:
# Keys for the dictionaries
news_keys = list(news_dict.keys())
fake_news_keys = list(fake_news_dict.keys())

# Display the keys for each dictionary
print("News keys:", news_keys)
print("Fake news keys:", fake_news_keys)

News keys: ['AAPL', 'AMZN', 'BAC', 'GME', 'GS', 'NVDA', 'TSLA']
Fake news keys: ['testing', 'training']


In [4]:
# Create the lemmatizer before cleaning the text
lemmatizer = WordNetLemmatizer()

# Clean the text data in the news dataset
news_dict_cleaned = nlp.text_processing(news_dict, lemmatizer)

# Clean the text data in the fake news dataset
fake_news_dict_cleaned = nlp.text_processing(fake_news_dict, lemmatizer)

In [5]:
# Let's check the cleaned data
news_dict_cleaned['AAPL'].head(10)

Unnamed: 0,date,title,source,text_column_tokens,text_without_puntutation,lemmatizers,without_stop_words,text_column
0,2019-12-11,One of the top Apple 5 analysts predicts next ...,Markets Insider,"[one, of, the, top, apple, 5, analysts, predic...","[one, of, the, top, apple, 5, analysts, predic...","[one, of, the, top, apple, 5, analyst, predict...","[one, top, apple, 5, analyst, predicts, next, ...",one top apple 5 analyst predicts next year 5g ...
1,2019-12-11,Microsoft Stock Poised for a Positive 2020,Markets Insider,"[microsoft, stock, poised, for, a, positive, 2...","[microsoft, stock, poised, for, a, positive, 2...","[microsoft, stock, poise, for, a, positive, 2020]","[microsoft, stock, poise, positive, 2020]",microsoft stock poise positive 2020
2,2019-12-11,"Wednesdays Vital Data: Peloton, Netflix and Tesla",Markets Insider,"[wednesdays, vital, data, :, peloton, ,, netfl...","[wednesdays, vital, data, peloton, netflix, an...","[wednesday, vital, data, peloton, netflix, and...","[wednesday, vital, data, peloton, netflix, tesla]",wednesday vital data peloton netflix tesla
3,2019-12-11,Apple CEO Tim Cook is striking back at critics...,Markets Insider,"[apple, ceo, tim, cook, is, striking, back, at...","[apple, ceo, tim, cook, is, striking, back, at...","[apple, ceo, tim, cook, be, strike, back, at, ...","[apple, ceo, tim, cook, strike, back, critic, ...",apple ceo tim cook strike back critic say inno...
4,2019-12-11,"The fully upgraded Mac Pro costs 50,000, but y...",Markets Insider,"[the, fully, upgraded, mac, pro, costs, 50,000...","[the, fully, upgraded, mac, pro, costs, 50,000...","[the, fully, upgraded, mac, pro, cost, 50,000,...","[fully, upgraded, mac, pro, cost, 50,000, add,...","fully upgraded mac pro cost 50,000 add wheel 4..."
5,2019-12-11,"Apple's pricey new 6,000 screen for the Mac Pr...",Markets Insider,"[apple, 's, pricey, new, 6,000, screen, for, t...","[apple, pricey, new, 6,000, screen, for, the, ...","[apple, pricey, new, 6,000, screen, for, the, ...","[apple, pricey, new, 6,000, screen, mac, pro, ...","apple pricey new 6,000 screen mac pro clean sp..."
6,2019-12-11,"3 Restaurant Stocks, 2 Buys and 1 Warning",Markets Insider,"[3, restaurant, stocks, ,, 2, buys, and, 1, wa...","[3, restaurant, stocks, 2, buys, and, 1, warning]","[3, restaurant, stock, 2, buy, and, 1, warn]","[3, restaurant, stock, 2, buy, 1, warn]",3 restaurant stock 2 buy 1 warn
7,2019-12-11,Wednesday Apple Rumors: Do Not Expect a Price ...,Markets Insider,"[wednesday, apple, rumors, :, do, not, expect,...","[wednesday, apple, rumors, do, not, expect, a,...","[wednesday, apple, rumor, do, not, expect, a, ...","[wednesday, apple, rumor, not, expect, price, ...",wednesday apple rumor not expect price increas...
8,2019-12-11,Stock Market Today: Federal Reserve in Focus; ...,Markets Insider,"[stock, market, today, :, federal, reserve, in...","[stock, market, today, federal, reserve, in, f...","[stock, market, today, federal, reserve, in, f...","[stock, market, today, federal, reserve, focus...",stock market today federal reserve focus iphon...
9,2019-12-11,"Dow Jones Today: Federal Reserve Holds, Boosti...",Markets Insider,"[dow, jones, today, :, federal, reserve, holds...","[dow, jones, today, federal, reserve, holds, b...","[dow, jones, today, federal, reserve, hold, bo...","[dow, jones, today, federal, reserve, hold, bo...",dow jones today federal reserve hold boost stock


### VADER Test

In [6]:
# Add polarity scores to the news DataFrames
news_dict_sent = nlp.apply_vader_polarity_score(news_dict_cleaned, text_column='text_column')

# Add polarity scores to the fake news DataFrames
fake_news_dict_sent = nlp.apply_vader_polarity_score(fake_news_dict_cleaned, text_column='text_column')
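
The body of `nlp.apply_vader_polarity_score` is not shown in this notebook. As a rough sketch of what such a step could look like, the function below takes any VADER-style scorer and renames its keys to the column names that appear in the output (`compound_score`, `positive_score`, `neutral_score`, `negative_score`). The names `vader_columns` and `fake_scorer` are hypothetical, and the stand-in scorer only exists so the example runs without downloading a lexicon.

```python
from typing import Callable, Dict


def vader_columns(text: str, scorer: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Score one cleaned text and rename VADER's keys to the notebook's column names."""
    raw = scorer(text)  # VADER-style scorers return the keys 'neg', 'neu', 'pos' and 'compound'
    return {
        "compound_score": raw["compound"],
        "positive_score": raw["pos"],
        "neutral_score": raw["neu"],
        "negative_score": raw["neg"],
    }


# Hypothetical stand-in scorer (assumption): returns fixed values instead of real VADER scores.
def fake_scorer(text: str) -> Dict[str, float]:
    return {"neg": 0.0, "neu": 0.526, "pos": 0.474, "compound": 0.5574}


row = vader_columns("microsoft stock poise positive 2020", fake_scorer)
print(row)
```

With NLTK, one option is to pass `SentimentIntensityAnalyzer().polarity_scores` from `nltk.sentiment.vader` as `scorer`, after running `nltk.download('vader_lexicon')`. Whether `nlp_scripts` uses NLTK's implementation or the standalone `vaderSentiment` package is not visible here.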


In [7]:
# Let's check the first rows of the AAPL DataFrame with the VADER scores
news_dict_sent['AAPL'].head(10)

Unnamed: 0,date,title,source,text_column_tokens,text_without_puntutation,lemmatizers,without_stop_words,text_column,compound_score,positive_score,neutral_score,negative_score
0,2019-12-11,One of the top Apple 5 analysts predicts next ...,Markets Insider,"[one, of, the, top, apple, 5, analysts, predic...","[one, of, the, top, apple, 5, analysts, predic...","[one, of, the, top, apple, 5, analyst, predict...","[one, top, apple, 5, analyst, predicts, next, ...",one top apple 5 analyst predicts next year 5g ...,0.2023,0.114,0.886,0.0
1,2019-12-11,Microsoft Stock Poised for a Positive 2020,Markets Insider,"[microsoft, stock, poised, for, a, positive, 2...","[microsoft, stock, poised, for, a, positive, 2...","[microsoft, stock, poise, for, a, positive, 2020]","[microsoft, stock, poise, positive, 2020]",microsoft stock poise positive 2020,0.5574,0.474,0.526,0.0
2,2019-12-11,"Wednesdays Vital Data: Peloton, Netflix and Tesla",Markets Insider,"[wednesdays, vital, data, :, peloton, ,, netfl...","[wednesdays, vital, data, peloton, netflix, an...","[wednesday, vital, data, peloton, netflix, and...","[wednesday, vital, data, peloton, netflix, tesla]",wednesday vital data peloton netflix tesla,0.296,0.306,0.694,0.0
3,2019-12-11,Apple CEO Tim Cook is striking back at critics...,Markets Insider,"[apple, ceo, tim, cook, is, striking, back, at...","[apple, ceo, tim, cook, is, striking, back, at...","[apple, ceo, tim, cook, be, strike, back, at, ...","[apple, ceo, tim, cook, strike, back, critic, ...",apple ceo tim cook strike back critic say inno...,0.0,0.16,0.617,0.222
4,2019-12-11,"The fully upgraded Mac Pro costs 50,000, but y...",Markets Insider,"[the, fully, upgraded, mac, pro, costs, 50,000...","[the, fully, upgraded, mac, pro, costs, 50,000...","[the, fully, upgraded, mac, pro, cost, 50,000,...","[fully, upgraded, mac, pro, cost, 50,000, add,...","fully upgraded mac pro cost 50,000 add wheel 4...",0.0,0.0,1.0,0.0
5,2019-12-11,"Apple's pricey new 6,000 screen for the Mac Pr...",Markets Insider,"[apple, 's, pricey, new, 6,000, screen, for, t...","[apple, pricey, new, 6,000, screen, for, the, ...","[apple, pricey, new, 6,000, screen, for, the, ...","[apple, pricey, new, 6,000, screen, mac, pro, ...","apple pricey new 6,000 screen mac pro clean sp...",0.6597,0.351,0.649,0.0
6,2019-12-11,"3 Restaurant Stocks, 2 Buys and 1 Warning",Markets Insider,"[3, restaurant, stocks, ,, 2, buys, and, 1, wa...","[3, restaurant, stocks, 2, buys, and, 1, warning]","[3, restaurant, stock, 2, buy, and, 1, warn]","[3, restaurant, stock, 2, buy, 1, warn]",3 restaurant stock 2 buy 1 warn,-0.1027,0.0,0.682,0.318
7,2019-12-11,Wednesday Apple Rumors: Do Not Expect a Price ...,Markets Insider,"[wednesday, apple, rumors, :, do, not, expect,...","[wednesday, apple, rumors, do, not, expect, a,...","[wednesday, apple, rumor, do, not, expect, a, ...","[wednesday, apple, rumor, not, expect, price, ...",wednesday apple rumor not expect price increas...,-0.2411,0.0,0.803,0.197
8,2019-12-11,Stock Market Today: Federal Reserve in Focus; ...,Markets Insider,"[stock, market, today, :, federal, reserve, in...","[stock, market, today, federal, reserve, in, f...","[stock, market, today, federal, reserve, in, f...","[stock, market, today, federal, reserve, focus...",stock market today federal reserve focus iphon...,0.0,0.0,1.0,0.0
9,2019-12-11,"Dow Jones Today: Federal Reserve Holds, Boosti...",Markets Insider,"[dow, jones, today, :, federal, reserve, holds...","[dow, jones, today, federal, reserve, holds, b...","[dow, jones, today, federal, reserve, hold, bo...","[dow, jones, today, federal, reserve, hold, bo...",dow jones today federal reserve hold boost stock,0.4019,0.278,0.722,0.0
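
To read the `compound_score` column as a class, a common convention from the VADER documentation treats values of at least 0.05 as positive, values of at most -0.05 as negative, and everything in between as neutral. The sketch below applies that convention to three compound scores taken from the table above. The function name `vader_label` and the default threshold are assumptions for illustration; this notebook does not set any threshold itself.

```python
def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse sentiment class."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores of rows 0, 3 and 6 in the AAPL output above.
examples = [0.2023, 0.0, -0.1027]
labels = [vader_label(c) for c in examples]
print(labels)
```

Row 3 is a useful reminder that a compound score of 0.0 does not always mean the text carries no sentiment words: its positive (0.16) and negative (0.222) shares are both non-zero.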


In [8]:
# Save the VADER-scored news DataFrames to CSV files to avoid re-running the cleaning and scoring steps during development
output_folder = '../data/data_cleaned'
os.makedirs(output_folder, exist_ok=True)
for key, df in news_dict_sent.items():
    output_path = os.path.join(output_folder, f'news_{key}_vader.csv')
    df.to_csv(output_path, index=False)

In [9]:
# Let's check the first rows of the fake news training DataFrame
fake_news_dict_sent['training'].head(10)

Unnamed: 0,fake_news,title,text_column_tokens,text_without_puntutation,lemmatizers,without_stop_words,text_column,compound_score,positive_score,neutral_score,negative_score
0,0,donald trump sends out embarrassing new year‚s...,"[donald, trump, sends, out, embarrassing, new,...","[donald, trump, sends, out, embarrassing, new,...","[donald, trump, sends, out, embarrass, new, ev...","[donald, trump, sends, embarrass, new, eve, me...",donald trump sends embarrass new eve message d...,-0.5994,0.0,0.55,0.45
1,0,drunk bragging trump staffer started russian c...,"[drunk, bragging, trump, staffer, started, rus...","[drunk, bragging, trump, staffer, started, rus...","[drunk, bragging, trump, staffer, start, russi...","[drunk, bragging, trump, staffer, start, russi...",drunk bragging trump staffer start russian col...,-0.34,0.0,0.745,0.255
2,0,sheriff david clarke becomes an internet joke ...,"[sheriff, david, clarke, becomes, an, internet...","[sheriff, david, clarke, becomes, an, internet...","[sheriff, david, clarke, becomes, an, internet...","[sheriff, david, clarke, becomes, internet, jo...",sheriff david clarke becomes internet joke thr...,-0.1027,0.186,0.593,0.22
3,0,trump is so obsessed he even has obama‚s name ...,"[trump, is, so, obsessed, he, even, has, obama...","[trump, is, so, obsessed, he, even, has, name,...","[trump, be, so, obsess, he, even, have, name, ...","[trump, obsess, even, name, cod, website, image]",trump obsess even name cod website image,-0.25,0.0,0.75,0.25
4,0,pope francis just called out donald trump duri...,"[pope, francis, just, called, out, donald, tru...","[pope, francis, just, called, out, donald, tru...","[pope, francis, just, call, out, donald, trump...","[pope, francis, call, donald, trump, christmas...",pope francis call donald trump christmas speech,0.0,0.0,1.0,0.0
5,0,racist alabama cops brutalize black boy while ...,"[racist, alabama, cops, brutalize, black, boy,...","[racist, alabama, cops, brutalize, black, boy,...","[racist, alabama, cop, brutalize, black, boy, ...","[racist, alabama, cop, brutalize, black, boy, ...",racist alabama cop brutalize black boy handcuf...,-0.836,0.0,0.47,0.53
6,0,fresh off the golf course,"[fresh, off, the, golf, course]","[fresh, off, the, golf, course]","[fresh, off, the, golf, course]","[fresh, golf, course]",fresh golf course,0.3182,0.535,0.465,0.0
7,0,trump said some insanely racist stuff inside t...,"[trump, said, some, insanely, racist, stuff, i...","[trump, said, some, insanely, racist, stuff, i...","[trump, say, some, insanely, racist, stuff, in...","[trump, say, insanely, racist, stuff, inside, ...",trump say insanely racist stuff inside oval of...,-0.6124,0.0,0.636,0.364
8,0,former cia director slams trump over un bullying,"[former, cia, director, slams, trump, over, un...","[former, cia, director, slams, trump, over, un...","[former, cia, director, slam, trump, over, un,...","[former, cia, director, slam, trump, un, bully...",former cia director slam trump un bullying,-0.7579,0.0,0.435,0.565
9,0,brand-new pro-trump ad features so much a** ki...,"[brand-new, pro-trump, ad, features, so, much,...","[brand-new, pro-trump, ad, features, so, much,...","[brand-new, pro-trump, ad, feature, so, much, ...","[brand-new, pro-trump, ad, feature, much, kiss...",brand-new pro-trump ad feature much kiss make ...,-0.128,0.231,0.496,0.273


In [10]:
# Save the VADER-scored fake news DataFrames to CSV files to avoid re-running the cleaning and scoring steps during development
for key, df in fake_news_dict_sent.items():
    output_path = os.path.join(output_folder, f'fake_news_{key}_vader.csv')
    df.to_csv(output_path, index=False)

### RoBERTa

🚨 RoBERTa pipelines are computationally intensive and require significant memory, so this part of the analysis was performed in Google Colab. The corresponding notebook is at `./colabs/roberta_sentiments_and_embedings.ipynb`, and its results are stored in the `../data/sentiments` folder. The text embeddings (`TF-IDF Vectorizer`, `Word2Vec Embeddings`, `Transformer-based (BERT) Embeddings`, `Bag of Words (BoW)`, and `MiniLM (SentenceTransformer)`) are stored in the `../data/embedings` folder.
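
The embedding code itself lives in the Colab notebook and is not reproduced here. To make the simplest of the listed techniques concrete, the following dependency-free sketch builds Bag of Words count vectors from cleaned text. It is an illustration only: the Colab notebook most likely relies on a library vectorizer, and the names `build_vocabulary` and `bow_vector` as well as the `max_features=1000` cap are assumptions.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(texts: List[str], max_features: int = 1000) -> Dict[str, int]:
    """Index the most frequent tokens across all texts, most frequent first."""
    counts = Counter(token for text in texts for token in text.split())
    return {token: i for i, (token, _) in enumerate(counts.most_common(max_features))}


def bow_vector(text: str, vocabulary: Dict[str, int]) -> List[int]:
    """Count how often each vocabulary token occurs in one text."""
    vector = [0] * len(vocabulary)
    for token in text.split():
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector


# Two cleaned headlines taken from the `text_column` output earlier in this notebook.
texts = ["microsoft stock poise positive 2020", "3 restaurant stock 2 buy 1 warn"]
vocabulary = build_vocabulary(texts)
vectors = [bow_vector(t, vocabulary) for t in texts]
print(len(vocabulary), vectors[0])
```

Every document is mapped to a vector of the same length (the vocabulary size), which is why a fixed length per row is a natural thing to store next to each embedding.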

In [11]:
# Data folder
data_folder = '../data/sentiments'

# Reading the dataset with the sentiments
news_dict_sent = nlp.load_dataframes_news_from_folder(data_folder, startswith = 'news_', key_place = 1)
fake_news_dict_sent = nlp.load_dataframes_fake_news_from_folder(data_folder, endswith = 'g_roberta.csv', key_place = 2)

In [12]:
# Modifying sentiment labels from LABEL_0, LABEL_1 and LABEL_2 to 0, 1 and 2

news_dict_sent = nlp.convert_sentiment_labels(news_dict_sent)
fake_news_dict_sent = nlp.convert_sentiment_labels(fake_news_dict_sent)
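
The implementation of `nlp.convert_sentiment_labels` is not shown in this notebook. A minimal sketch of the conversion described in the comment above, under the assumption that labels arrive as strings such as `LABEL_2`, could look like this. The helper name `label_to_int` is hypothetical. Integers are passed through unchanged so the cell can be re-run on data that was already converted.

```python
import numbers
from typing import Union


def label_to_int(label: Union[str, int]) -> int:
    """Map 'LABEL_0' / 'LABEL_1' / 'LABEL_2' to 0 / 1 / 2; leave integers unchanged."""
    if isinstance(label, numbers.Integral):
        return int(label)  # already converted, e.g. when reloading a saved CSV
    prefix, _, index = label.partition("_")
    if prefix != "LABEL" or not index.isdigit():
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return int(index)


converted = [label_to_int(x) for x in ["LABEL_0", "LABEL_1", "LABEL_2", 2]]
print(converted)
```

In the outputs in this notebook, rows dominated by `roberta_neg` carry `roberta_sentiment` 0 and rows dominated by `roberta_pos` carry 2, which suggests the order 0 = negative, 1 = neutral, 2 = positive for that column.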

In [13]:
# Save the DataFrames with converted sentiment labels to CSV files so they can be reloaded without repeating the conversion

os.makedirs(data_folder, exist_ok=True)
for key, df in news_dict_sent.items():
    output_path = os.path.join(data_folder, f'news_{key}_roberta_ready.csv')
    df.to_csv(output_path, index=False)

for key, df in fake_news_dict_sent.items():
    output_path = os.path.join(data_folder, f'fake_news_{key}_roberta_ready.csv')
    df.to_csv(output_path, index=False)

In [14]:
# Let's check the first rows of the AAPL DataFrame with the RoBERTa sentiments
news_dict_sent['AAPL'].head(2)

Unnamed: 0,date,title,source,text_column_tokens,text_without_puntutation,lemmatizers,without_stop_words,text_column,compound_score,positive_score,neutral_score,negative_score,roberta_neg,roberta_neu,roberta_pos,roberta_sentiment,joy,anger,fear,sadness,disgust,surprise,neutral,sentiment_label,sentiment_score
0,2019-12-11,One of the top Apple 5 analysts predicts next ...,Markets Insider,"['one', 'of', 'the', 'top', 'apple', '5', 'ana...","['one', 'of', 'the', 'top', 'apple', '5', 'ana...","['one', 'of', 'the', 'top', 'apple', '5', 'ana...","['one', 'top', 'apple', '5', 'analyst', 'predi...",one top apple 5 analyst predicts next year 5g ...,0.2023,0.114,0.886,0.0,0.006927,0.285839,0.707234,2,0.049675,0.010868,0.014051,0.310383,0.001212,0.391046,0.222765,2,0.783929
1,2019-12-11,Microsoft Stock Poised for a Positive 2020,Markets Insider,"['microsoft', 'stock', 'poised', 'for', 'a', '...","['microsoft', 'stock', 'poised', 'for', 'a', '...","['microsoft', 'stock', 'poise', 'for', 'a', 'p...","['microsoft', 'stock', 'poise', 'positive', '2...",microsoft stock poise positive 2020,0.5574,0.474,0.526,0.0,0.003582,0.209597,0.786821,2,0.188979,0.010624,0.009068,0.010865,0.003118,0.064486,0.71286,2,0.88264


In [15]:
# Let's check the fake news training rows whose cleaned text is missing (NaN)
fake_news_dict_sent['training'][fake_news_dict_sent['training']['text_column'].isna()]

Unnamed: 0,fake_news,title,text_column_tokens,text_without_puntutation,lemmatizers,without_stop_words,text_column,compound_score,positive_score,neutral_score,negative_score,roberta_neg,roberta_neu,roberta_pos,roberta_sentiment,joy,anger,fear,sadness,disgust,surprise,neutral,sentiment_label,sentiment_score
320,0,‚racist,['‚racist'],[],[],[],,0.0,0.0,0.0,0.0,0.239502,0.528189,0.232309,1,0.014371,0.023309,0.051689,0.157752,0.056065,0.099989,0.596827,0,0.63503
986,0,no,['no'],['no'],['no'],[],,0.0,0.0,0.0,0.0,0.239502,0.528189,0.232309,1,0.014371,0.023309,0.051689,0.157752,0.056065,0.099989,0.596827,1,0.472969
1707,0,once again,"['once', 'again']","['once', 'again']","['once', 'again']",[],,0.0,0.0,0.0,0.0,0.239502,0.528189,0.232309,1,0.014371,0.023309,0.051689,0.157752,0.056065,0.099989,0.596827,1,0.568483
8871,0,so,['so'],['so'],['so'],[],,0.0,0.0,0.0,0.0,0.239502,0.528189,0.232309,1,0.014371,0.023309,0.051689,0.157752,0.056065,0.099989,0.596827,1,0.567773
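
All four rows above have an empty `without_stop_words` list, so their cleaned `text_column` was most likely an empty string before it was written to disk. By default, `pandas.read_csv` parses empty fields as `NaN`, which would explain the missing values after reloading. The identical RoBERTa probabilities across the four rows are also consistent with the model having scored the same empty input each time. The sketch below reproduces the round trip on a tiny made-up CSV and shows two ways to get empty strings back.

```python
import io

import pandas as pd

# Made-up two-row CSV (assumption): the first row has an empty cleaned text.
csv_text = "title,text_column\nno,\nfresh off the golf course,fresh golf course\n"

# Default behaviour: the empty field becomes NaN.
default_read = pd.read_csv(io.StringIO(csv_text))
n_missing = int(default_read["text_column"].isna().sum())

# Option 1: replace NaN with an empty string after loading.
filled = default_read["text_column"].fillna("").tolist()

# Option 2: tell pandas not to treat empty fields as missing.
kept_empty = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(n_missing, filled, kept_empty["text_column"].tolist())
```

Either option keeps downstream string operations from failing on `NaN`. Dropping such rows before modelling is another reasonable choice, since they carry no text.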


In [16]:
# Data folder
data_folder = '../data/embedings'

# Reading the dataset with the embeddings
news_dict_emb = nlp.load_dataframes_news_from_folder(data_folder, startswith = 'news_', key_place = 1)
fake_news_dict_emb = nlp.load_dataframes_fake_news_from_folder(data_folder, endswith = 'g_embedings.csv', key_place = 2)

In [18]:
# Modifying sentiment labels from LABEL_0, LABEL_1 and LABEL_2 to 0, 1 and 2

news_dict_emb = nlp.convert_sentiment_labels(news_dict_emb)
fake_news_dict_emb = nlp.convert_sentiment_labels(fake_news_dict_emb)

In [19]:
# Save the embedding DataFrames with converted sentiment labels to CSV files so they can be reloaded without repeating the conversion

os.makedirs(data_folder, exist_ok=True)
for key, df in news_dict_emb.items():
    output_path = os.path.join(data_folder, f'news_{key}_embedings_ready.csv')
    df.to_csv(output_path, index=False)

for key, df in fake_news_dict_emb.items():
    output_path = os.path.join(data_folder, f'fake_news_{key}_embedings_ready.csv')
    df.to_csv(output_path, index=False)

In [20]:
# Let's check the first rows of the AAPL DataFrame with the embeddings
news_dict_emb['AAPL'].head(2)

Unnamed: 0,date,title,source,text_column_tokens,text_without_puntutation,lemmatizers,without_stop_words,text_column,compound_score,positive_score,neutral_score,negative_score,roberta_neg,roberta_neu,roberta_pos,roberta_sentiment,joy,anger,fear,sadness,disgust,surprise,neutral,sentiment_label,sentiment_score,tfidf_embedding,tfidf_length,word2vec_embedding,word2vec_length,bert_embedding,bert_length,bow_embedding,bow_length,miniLM_embedding,miniLM_shape
0,2019-12-11,One of the top Apple 5 analysts predicts next ...,Markets Insider,"['one', 'of', 'the', 'top', 'apple', '5', 'ana...","['one', 'of', 'the', 'top', 'apple', '5', 'ana...","['one', 'of', 'the', 'top', 'apple', '5', 'ana...","['one', 'top', 'apple', '5', 'analyst', 'predi...",one top apple 5 analyst predicts next year 5g ...,0.2023,0.114,0.886,0.0,0.006927,0.285839,0.707234,2,0.049675,0.010868,0.014051,0.310383,0.001212,0.391046,0.222765,2,0.783929,[[0. 0. 0. ... 0. 0. 0.]],15385,[-0.62547964 0.6510129 -0.06076336 0.396168...,100,[-1.91022187e-01 -2.50518322e-01 5.03991604e-...,768,[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...,1000,"[-0.015043850056827068, -0.038697242736816406,...",384
1,2019-12-11,Microsoft Stock Poised for a Positive 2020,Markets Insider,"['microsoft', 'stock', 'poised', 'for', 'a', '...","['microsoft', 'stock', 'poised', 'for', 'a', '...","['microsoft', 'stock', 'poise', 'for', 'a', 'p...","['microsoft', 'stock', 'poise', 'positive', '2...",microsoft stock poise positive 2020,0.5574,0.474,0.526,0.0,0.003582,0.209597,0.786821,2,0.188979,0.010624,0.009068,0.010865,0.003118,0.064486,0.71286,2,0.88264,[[0. 0. 0. ... 0. 0. 0.]],15385,[-0.38084295 0.79433244 -0.03659805 0.511174...,100,[-1.51426733e-01 -2.74093986e-01 9.13794488e-...,768,[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 ...,1000,"[-0.03231772780418396, -0.01976119354367256, 0...",384


In [21]:
news_dict_emb['AAPL']['tfidf_embedding'][0]

'[[0. 0. 0. ... 0. 0. 0.]]'
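
The value above is a string, not an array, and it contains a literal `...`. NumPy summarises arrays with more than 1000 elements when they are converted to text, so this suggests the TF-IDF vectors (`tfidf_length` is 15385) were truncated when the CSV was written and cannot be recovered from this file. The shorter columns do not appear to be affected. As a hedged sketch, the hypothetical helper `parse_embedding` below turns the two untruncated string formats seen in the output back into lists of floats and refuses truncated cells.

```python
import json
from typing import List


def parse_embedding(cell: str) -> List[float]:
    """Turn a stringified flat embedding from the CSV back into a list of floats."""
    if "..." in cell:
        # NumPy summarised the array when it became text; the middle values are lost.
        raise ValueError("embedding was truncated when saved and cannot be recovered")
    if "," in cell:
        # Python-list style, as in the miniLM column: "[-0.015, -0.038]"
        return [float(x) for x in json.loads(cell)]
    # NumPy style, as in the word2vec column: "[-0.62547964 0.6510129 -0.06076336]"
    return [float(x) for x in cell.replace("[", " ").replace("]", " ").split()]


word2vec = parse_embedding("[-0.62547964 0.6510129 -0.06076336]")
minilm = parse_embedding("[-0.015043850056827068, -0.038697242736816406]")
print(word2vec, minilm)

try:
    parse_embedding("[[0. 0. 0. ... 0. 0. 0.]]")
    truncated = False
except ValueError:
    truncated = True
print(truncated)
```

To avoid the truncation in the first place, one option is to store embeddings in a binary format such as `np.save`, or to serialise each vector with `json.dumps(vector.tolist())` before writing the CSV.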

In [22]:
# Let's check the first rows of the fake news training DataFrame with the embeddings
fake_news_dict_emb['training'].head(2)

Unnamed: 0,fake_news,title,text_column_tokens,text_without_puntutation,lemmatizers,without_stop_words,text_column,compound_score,positive_score,neutral_score,negative_score,roberta_neg,roberta_neu,roberta_pos,roberta_sentiment,joy,anger,fear,sadness,disgust,surprise,neutral,sentiment_label,sentiment_score,tfidf_embedding,tfidf_length,word2vec_embedding,word2vec_length,bert_embedding,bert_length,bow_embedding,bow_length,miniLM_embedding,miniLM_shape
0,0,donald trump sends out embarrassing new year‚s...,"['donald', 'trump', 'sends', 'out', 'embarrass...","['donald', 'trump', 'sends', 'out', 'embarrass...","['donald', 'trump', 'sends', 'out', 'embarrass...","['donald', 'trump', 'sends', 'embarrass', 'new...",donald trump sends embarrass new eve message d...,-0.5994,0.0,0.55,0.45,0.77113,0.218655,0.010215,0,0.002908,0.088452,0.109574,0.088292,0.12651,0.109849,0.474416,0,0.971903,[[0. 0. 0. ... 0. 0. 0.]],13835,[ 0.06867641 0.17917332 0.35817477 0.017245...,100,[-2.65257388e-01 2.40806729e-01 1.68568522e-...,768,[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...,1000,"[0.045953016728162766, 0.08347233384847641, 0....",384
1,0,drunk bragging trump staffer started russian c...,"['drunk', 'bragging', 'trump', 'staffer', 'sta...","['drunk', 'bragging', 'trump', 'staffer', 'sta...","['drunk', 'bragging', 'trump', 'staffer', 'sta...","['drunk', 'bragging', 'trump', 'staffer', 'sta...",drunk bragging trump staffer start russian col...,-0.34,0.0,0.745,0.255,0.514261,0.467208,0.018531,0,0.01022,0.437211,0.237583,0.022953,0.014639,0.015434,0.261959,0,0.493148,[[0. 0. 0. ... 0. 0. 0.]],13835,[ 0.02361201 0.14580518 0.29309553 0.105892...,100,[-3.74765366e-01 -1.62393257e-01 -1.42092645e-...,768,[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...,1000,"[-0.06502026319503784, -0.015925128012895584, ...",384
