# Financial Sentiment Analysis (Single)

In this program, I run a sentiment analysis of a single company based on financial news articles.

The company I am targeting is Nvidia [NVDA].

## Fetching News Articles

The first step is to fetch the news articles.  

I am using `NewsAPI` to get articles quickly and easily. Then, I use `pandas` to put the articles into a dataframe, where I can collect and read the data more easily.  

**Filtering articles:**  
Keep only the articles that still exist  
- `NewsAPI` sometimes returns articles that have since been removed; their title is `[Removed]`  

**Extracting the data:**  
Extract only the necessary fields from each article:
- Title
- Description
- Content

All other fields can be discarded.

In [46]:
import os
from newsapi.newsapi_client import NewsApiClient

In [47]:
# init newsapi
newsapi = NewsApiClient(api_key=os.getenv('NEWS_API_KEY'))
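`os.getenv` returns `None` when the variable is unset, so a missing key would most likely only show up later as a failed request. A minimal sketch of an earlier check, assuming a hypothetical helper named `require_env` (it is not part of `newsapi`):

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, the client could be created as `NewsApiClient(api_key=require_env('NEWS_API_KEY'))`, which fails immediately with a readable message if the key is missing.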

In [48]:
# fetch articles
all_articles = newsapi.get_everything(q='Nvidia',
                                      language='en')

In [49]:
import pandas as pd
pd.__version__

'2.2.3'

In [52]:
all_articles_df = pd.DataFrame(all_articles['articles'])
print(all_articles_df[['title']].head())

                                               title
0  DOJ subpoenas NVIDIA as part of antitrust prob...
1                                          [Removed]
2  Nvidia might actually lose in this key part of...
3  Nvidia CEO Jensen Huang says the payback on AI...
4  Stock market today: Dow hits record high while...


In [53]:
# keep only valid articles
# valid meaning: the article still exists (a removed article has the title '[Removed]')
def filter_removed_articles(articles):
    return [article for article in articles if article.get('title') != '[Removed]']

In [54]:
valid_articles = filter_removed_articles(all_articles['articles'])

In [55]:
valid_articles_df = pd.DataFrame(valid_articles)
print(valid_articles_df[['title']].head())

                                               title
0  DOJ subpoenas NVIDIA as part of antitrust prob...
1  Nvidia might actually lose in this key part of...
2  Nvidia CEO Jensen Huang says the payback on AI...
3  Stock market today: Dow hits record high while...
4  Nvidia Hit With DOJ Subpoena In Escalating Ant...


In [56]:
# extract only the title, description, and content from the articles
def extract_article_essentials(articles):
    keys = ('title', 'description', 'content')
    return [{key: article[key] for key in keys} for article in articles]

In [57]:
articles = extract_article_essentials(valid_articles)

In [58]:
# create a data frame of the articles with essentials only
articles_df = pd.DataFrame(articles)
print(articles_df[['title']].head())

                                               title
0  DOJ subpoenas NVIDIA as part of antitrust prob...
1  Nvidia might actually lose in this key part of...
2  Nvidia CEO Jensen Huang says the payback on AI...
3  Stock market today: Dow hits record high while...
4  Nvidia Hit With DOJ Subpoena In Escalating Ant...
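The filter and extract steps can also be done directly on a dataframe instead of on the list of dictionaries. The following is only a sketch of that alternative: `sample_articles` is hand-written placeholder data that mimics the shape of the `NewsAPI` records, not real output, and `essentials_df` is a name chosen for illustration.

```python
import pandas as pd

# Hand-written sample records that mimic the shape of NewsAPI articles (not real data).
sample_articles = [
    {'title': 'Example headline A', 'description': 'Summary A',
     'content': 'Body A', 'url': 'https://example.com/a'},
    {'title': '[Removed]', 'description': '[Removed]',
     'content': '[Removed]', 'url': 'https://removed.com'},
    {'title': 'Example headline B', 'description': 'Summary B',
     'content': 'Body B', 'url': 'https://example.com/b'},
]

df = pd.DataFrame(sample_articles)

# Drop removed articles with a boolean mask, then keep only the three needed columns.
essentials_df = (
    df.loc[df['title'] != '[Removed]', ['title', 'description', 'content']]
      .reset_index(drop=True)
)
print(essentials_df)
```

Doing both steps in one `.loc` call avoids building intermediate lists, at the cost of being a little less explicit than the two helper functions above.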


## Preprocess Text

***This is a crucial step.***  
Preprocessing helps clean and standardize the text data, making it more suitable for analysis.  

After getting the articles, I can now preprocess the text in the articles.  

#### Remove noise
We first want to remove all noise from the data.  
This includes punctuation and capital letters.
- All letters should be the same case so that all words are treated the same in the tokenization process.
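As a rough illustration of the noise-removal step, the sketch below lowercases a string and strips punctuation using only the standard library. The function name `remove_noise` and the sample sentence are assumptions made for this example; the notebook may end up using a different approach.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def remove_noise(text):
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(_PUNCT_TABLE)


print(remove_noise("Nvidia's stock HIT a record high!"))
# prints: nvidias stock hit a record high
```

Note that `string.punctuation` covers ASCII characters only, so typographic quotes or dashes that appear in news text would pass through unchanged and might need separate handling.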

#### Tokenization



In [59]:
# import nltk package (Natural Language Toolkit)
import nltk

In [60]:
# open the NLTK data downloader (interactive when called with no arguments)
nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


True