# Financial Sentiment Analysis (Single)

In this program, I run a sentiment analysis of a single company based on financial news articles.

The company that I am targeting is Nvidia [NVDA]

## Fetching News Articles

The first step is to fetch the news articles.  

I am using `NewsAPI` to get articles quickly and easily. Then, I use `pandas` to put the articles into a dataframe, where I can collect and read the data easier.  

**Filtering articles:**  
Filter articles that only exist  
- `NewsAPI` sometimes fetches articles that were removed  

**Extracting the data:**  
Extract only the necessary data from the articles
- Title
- Description
- Content

All others can be discarded.  

Both of these steps are part of the cleaning data step that is next in text preprocessing.

In [489]:
import os
from dotenv import load_dotenv

In [490]:
# get path to the environment file
env_path = '../config/.env'
load_dotenv(env_path)

True

In [491]:
# import newsapi package
from newsapi import NewsApiClient

In [492]:
# init newsapi
newsapi = NewsApiClient(api_key=os.getenv('NEWS_API_KEY'))

In [493]:
# fetch all articles that mention Nvidia
all_articles = newsapi.get_everything(q='Nvidia',
                                      language='en')

In [494]:
import pandas as pd
pd.__version__

'2.2.3'

In [495]:
# place all_articles into a dataframe
all_articles_df = pd.DataFrame(all_articles['articles'])
all_articles_df

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': None, 'name': 'Yahoo Entertainment'}",Lawrence Bonk,DOJ subpoenas NVIDIA as part of antitrust prob...,The DOJ has sent subpoenas to NVIDIA and other...,https://consent.yahoo.com/v2/collectConsent?se...,,2024-09-04T15:34:35Z,"If you click 'Accept all', we and our partners..."
1,"{'id': None, 'name': 'Gizmodo.com'}",Kyle Barr,The Leaked Nvidia RTX 5090 Has So Many Cores I...,Get ready to watch the lights on your block di...,https://gizmodo.com/the-leaked-nvidia-rtx-5090...,https://gizmodo.com/app/uploads/2024/09/Nvidia...,2024-09-27T13:35:22Z,The GeForce RTX 4090 is already so big that an...
2,"{'id': None, 'name': '[Removed]'}",,[Removed],[Removed],https://removed.com,,2024-08-29T23:30:37Z,[Removed]
3,"{'id': 'business-insider', 'name': 'Business I...",Emma Cosgrove,Nvidia might actually lose in this key part of...,"As AI matures, Nvidia, Groq, and Cerebras focu...",https://www.businessinsider.com/nvidia-may-los...,https://i.insider.com/66d0c408392a3bda9f2349e3...,2024-09-01T13:00:02Z,Justin Sullivan/Getty\r\n<ul><li>Inference mad...
4,"{'id': 'business-insider', 'name': 'Business I...",Eugene Kim,This chart shows one potential advantage AWS's...,"AI chip investments by Amazon, Google, and Mic...",https://www.businessinsider.com/aws-ai-chips-w...,https://i.insider.com/6622c44b23b29110d3011ce1...,2024-09-26T09:00:02Z,Noah Berger/Getty Images\r\n<ul><li>Big tech c...
...,...,...,...,...,...,...,...,...
95,"{'id': None, 'name': 'Windows Central'}",jez@windowscentral.com (Jez Corden),Apple and Microsoft battle for the #1 market c...,"It's not a hot week for NVIDIA bag holders, as...",https://www.windowscentral.com/microsoft/apple...,https://cdn.mos.cms.futurecdn.net/L9j55HyDmDHb...,2024-09-04T12:04:09Z,What you need to know\r\n<ul><li>NVIDIA is a g...
96,"{'id': None, 'name': 'Windows Central'}",kevinokemwa@outlook.com (Kevin Okemwa),Elon Musk's xAI developed Colossus in just 122...,Elon Musk announced the world's most powerful ...,https://www.windowscentral.com/software-apps/e...,https://cdn.mos.cms.futurecdn.net/dtcAQq732vmY...,2024-09-05T10:21:23Z,What you need to know\r\n<ul><li>Elon Musk's x...
97,"{'id': None, 'name': 'Windows Central'}",kevinokemwa@outlook.com (Kevin Okemwa),Investors claim OpenAI is 'uniquely' positione...,A new report claims more investors are joining...,https://www.windowscentral.com/software-apps/i...,https://cdn.mos.cms.futurecdn.net/3SNC5vCe8Ci2...,2024-09-20T15:43:27Z,What you need to know\r\n<ul><li>Despite recen...
98,"{'id': None, 'name': 'Windows Central'}",windowscentral@futurenet.com (WC Staff),What you need to know about AI PCs and how to ...,"AI PCs are a diverse group, and they can come ...",https://www.windowscentral.com/hardware/nvidia...,https://cdn.mos.cms.futurecdn.net/W3JErBHAjSu7...,2024-09-10T13:47:30Z,There has been a surge in AI applications that...


In [496]:
# filter articles function
# only filters valid articles
# valid meaning: article exists and description of article exists
def filter_removed_articles(articles):
    return [article for article in articles if article.get('title') != '[Removed]']

In [497]:
# filter the all_articles
valid_articles = filter_removed_articles(all_articles['articles'])

In [498]:
valid_articles_df = pd.DataFrame(valid_articles)
valid_articles_df

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': None, 'name': 'Yahoo Entertainment'}",Lawrence Bonk,DOJ subpoenas NVIDIA as part of antitrust prob...,The DOJ has sent subpoenas to NVIDIA and other...,https://consent.yahoo.com/v2/collectConsent?se...,,2024-09-04T15:34:35Z,"If you click 'Accept all', we and our partners..."
1,"{'id': None, 'name': 'Gizmodo.com'}",Kyle Barr,The Leaked Nvidia RTX 5090 Has So Many Cores I...,Get ready to watch the lights on your block di...,https://gizmodo.com/the-leaked-nvidia-rtx-5090...,https://gizmodo.com/app/uploads/2024/09/Nvidia...,2024-09-27T13:35:22Z,The GeForce RTX 4090 is already so big that an...
2,"{'id': 'business-insider', 'name': 'Business I...",Emma Cosgrove,Nvidia might actually lose in this key part of...,"As AI matures, Nvidia, Groq, and Cerebras focu...",https://www.businessinsider.com/nvidia-may-los...,https://i.insider.com/66d0c408392a3bda9f2349e3...,2024-09-01T13:00:02Z,Justin Sullivan/Getty\r\n<ul><li>Inference mad...
3,"{'id': 'business-insider', 'name': 'Business I...",Eugene Kim,This chart shows one potential advantage AWS's...,"AI chip investments by Amazon, Google, and Mic...",https://www.businessinsider.com/aws-ai-chips-w...,https://i.insider.com/6622c44b23b29110d3011ce1...,2024-09-26T09:00:02Z,Noah Berger/Getty Images\r\n<ul><li>Big tech c...
4,"{'id': 'business-insider', 'name': 'Business I...",Emma Cosgrove,Nvidia CEO Jensen Huang says the payback on AI...,Nvidia CEO Jensen Huang promised immediate ret...,https://www.businessinsider.com/nvidia-swift-r...,https://i.insider.com/66cfaf8a43b5e59d16b64ee9...,2024-08-29T00:56:26Z,Jensen Huang said demand will continue to be s...
...,...,...,...,...,...,...,...,...
92,"{'id': None, 'name': 'Windows Central'}",jez@windowscentral.com (Jez Corden),Apple and Microsoft battle for the #1 market c...,"It's not a hot week for NVIDIA bag holders, as...",https://www.windowscentral.com/microsoft/apple...,https://cdn.mos.cms.futurecdn.net/L9j55HyDmDHb...,2024-09-04T12:04:09Z,What you need to know\r\n<ul><li>NVIDIA is a g...
93,"{'id': None, 'name': 'Windows Central'}",kevinokemwa@outlook.com (Kevin Okemwa),Elon Musk's xAI developed Colossus in just 122...,Elon Musk announced the world's most powerful ...,https://www.windowscentral.com/software-apps/e...,https://cdn.mos.cms.futurecdn.net/dtcAQq732vmY...,2024-09-05T10:21:23Z,What you need to know\r\n<ul><li>Elon Musk's x...
94,"{'id': None, 'name': 'Windows Central'}",kevinokemwa@outlook.com (Kevin Okemwa),Investors claim OpenAI is 'uniquely' positione...,A new report claims more investors are joining...,https://www.windowscentral.com/software-apps/i...,https://cdn.mos.cms.futurecdn.net/3SNC5vCe8Ci2...,2024-09-20T15:43:27Z,What you need to know\r\n<ul><li>Despite recen...
95,"{'id': None, 'name': 'Windows Central'}",windowscentral@futurenet.com (WC Staff),What you need to know about AI PCs and how to ...,"AI PCs are a diverse group, and they can come ...",https://www.windowscentral.com/hardware/nvidia...,https://cdn.mos.cms.futurecdn.net/W3JErBHAjSu7...,2024-09-10T13:47:30Z,There has been a surge in AI applications that...


In [499]:
# extract article essentials function
# extract only the title, description, and content from the articles
def extract_article_essentials(articles):
    return [{'title': article['title'], 'content': article['content']} for article in articles]

In [500]:
extracted_articles = extract_article_essentials(valid_articles)

In [501]:
extracted_articles_df = pd.DataFrame(extracted_articles)
extracted_articles_df

Unnamed: 0,title,content
0,DOJ subpoenas NVIDIA as part of antitrust prob...,"If you click 'Accept all', we and our partners..."
1,The Leaked Nvidia RTX 5090 Has So Many Cores I...,The GeForce RTX 4090 is already so big that an...
2,Nvidia might actually lose in this key part of...,Justin Sullivan/Getty\r\n<ul><li>Inference mad...
3,This chart shows one potential advantage AWS's...,Noah Berger/Getty Images\r\n<ul><li>Big tech c...
4,Nvidia CEO Jensen Huang says the payback on AI...,Jensen Huang said demand will continue to be s...
...,...,...
92,Apple and Microsoft battle for the #1 market c...,What you need to know\r\n<ul><li>NVIDIA is a g...
93,Elon Musk's xAI developed Colossus in just 122...,What you need to know\r\n<ul><li>Elon Musk's x...
94,Investors claim OpenAI is 'uniquely' positione...,What you need to know\r\n<ul><li>Despite recen...
95,What you need to know about AI PCs and how to ...,There has been a surge in AI applications that...


## Preprocess Text
***This is a crucial***  
Proprocessing helps clean and normalize the text data making it more suitable for analysis.  

After getting the articles, I can now preprocess the text in the articles.  

### Data Cleaning 
**Identify and remove noise:**  
We want to first remove all noise from the data.  
- Punction
- Extra whitespace

**Text normalization:**  
- Stopwords
    - Remove common/irrelevent words that are unlikely to convey much sentiment.  
- Capital letters
    - All letters should be the same case so all words are treated the same in the tokenization process.  

**Data masking:**  
Data masking is not needed in this context.  

Clean text should result.  


In [502]:
# import re package (regular expressions)
import re

In [503]:
import string

In [504]:
# import nltk packages (Natural Language Toolkit)
import nltk
from nltk.corpus import stopwords

In [505]:
# download nltk data packages
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/justinhoang/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [506]:
# get all stopwords from data package
stop_words = set(stopwords.words("english"))
print(stop_words)

{"doesn't", "don't", 'shan', 'yourself', 'this', 'same', 't', 'couldn', 'about', 'how', 'hadn', 'hers', 'm', 'o', 'we', 'that', 'did', 'should', "you've", "hasn't", 'mightn', 'into', "shan't", 'while', 'wouldn', 'in', 'further', 'myself', 'by', "isn't", 'under', 'before', 'a', 'herself', 'his', 'up', 've', 'can', 's', "aren't", 'other', 'on', "she's", "it's", "won't", 'shouldn', 'why', 'be', "hadn't", 'where', 'through', 'at', 'down', 'if', 'their', 'having', "mustn't", 'i', 'd', 'doesn', "you'll", 'each', 'because', 'isn', 'll', 'which', 'then', 'will', "that'll", 'of', "didn't", 'you', 'between', 'being', 'been', 'after', 'most', 'these', 'have', 'to', 'during', 'ain', 'doing', 'any', 'won', 'no', 'himself', 'whom', 'there', 'hasn', 'she', 'off', 'with', 'itself', 'here', 'both', 'me', 'above', 'ours', "you're", 'not', 're', 'needn', 'what', 'my', 'now', 'is', 'your', 'only', 'are', 'own', 'so', 'who', 'him', "shouldn't", 'when', "should've", 'aren', 'for', 'some', 'over', 'very', 'o

In [507]:
# clean text function
# cleans the data (text)
def clean_text(text):
    # remove extra whitespace
    cleaned_text = re.sub(r'\s+', ' ', text).strip()
    # remove HTML tags
    cleaned_text = re.sub(r'<[^>]+>', '', cleaned_text)
    # remove patterns
    cleaned_text = re.sub(r'/\w+\s+\w+', '', cleaned_text)
    # remove URLs
    cleaned_text = re.sub(r'http\S+|www\S+|https\S+', '', cleaned_text, flags=re.MULTILINE)
    # remove punctuation
    cleaned_text = ''.join([char for char in text if char not in string.punctuation])
    
    # lower case all text
    cleaned_text = text.lower()
    # remove stop words from text
    cleaned_text = ' '.join([word for word in cleaned_text.split() if word not in stop_words])
    
    return cleaned_text

In [508]:
columns_to_clean = ['title', 'content']

In [509]:
cleaned_articles_df = extracted_articles_df.copy()

In [510]:
for column in columns_to_clean:
    cleaned_articles_df[column] = extracted_articles_df[column].apply(clean_text)

In [511]:
cleaned_articles_df.columns = columns_to_clean

In [512]:
cleaned_articles_df

Unnamed: 0,title,content
0,doj subpoenas nvidia part antitrust probe rega...,"click 'accept all', partners, including 239 pa..."
1,leaked nvidia rtx 5090 many cores actually scares,geforce rtx 4090 already big pc builder worth ...
2,nvidia might actually lose key part ai chip bu...,justin sullivan/getty <ul><li>inference made 4...
3,chart shows one potential advantage aws's ai c...,noah berger/getty images <ul><li>big tech clou...
4,nvidia ceo jensen huang says payback ai spendi...,jensen huang said demand continue stronger sup...
...,...,...
92,apple microsoft battle #1 market cap crown nvi...,need know <ul><li>nvidia global tech company r...
93,elon musk's xai developed colossus 122 days — ...,need know <ul><li>elon musk's xai team launche...
94,investors claim openai 'uniquely' positioned b...,need know <ul><li>despite recent bankruptcy cl...
95,need know ai pcs choose one,surge ai applications everything generating im...
