## Sentiment Analysis in Financial Markets
### Data Preprocessing

This project analyzes news articles, financial reports, and social media posts to gauge market sentiment. It applies natural language processing (NLP) techniques to study how sentiment relates to stock price movements.
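As a rough illustration of what "gauging sentiment" can mean in practice, the sketch below scores headlines with a tiny word list. The lexicon, the `sentiment_score` helper, and the headlines are all hypothetical; a real pipeline would more likely rely on a curated financial lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon, for illustration only
POSITIVE = {"beat", "growth", "surge", "record", "upgrade"}
NEGATIVE = {"miss", "decline", "lawsuit", "recall", "downgrade"}


def sentiment_score(text: str) -> float:
    """Return (positive - negative) / matched words, in [-1, 1]; 0.0 if nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


# Made-up headlines, not taken from the scraped data
headlines = [
    "Tesla deliveries surge to a record",
    "Analysts downgrade Apple after sales decline",
    "Microsoft announces a new office",
]
scores = [sentiment_score(h) for h in headlines]
```

Scores near 1 indicate mostly positive wording, scores near -1 mostly negative wording, and 0.0 means no lexicon word was found.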

### Stocks to Focus On

- Tesla, Inc. (TSLA)

- Apple Inc. (AAPL)

- Amazon.com Inc. (AMZN)

- Alphabet Inc. (GOOG)

- Microsoft Corporation (MSFT)

## Data Collection

### Stock Market History Data

In [2]:
import yfinance as yf


In [4]:
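tickers = ['TSLA', 'AAPL', 'MSFT', 'GOOG', 'AMZN']  # Tickers to download; more can be added later
# Download daily price history, with columns grouped by ticker
data = yf.download(tickers, start='2010-01-01', end='2024-01-01', group_by='ticker')
data.to_csv('data/stocks/df_stocks.csv')  # Assumes the data/stocks/ directory already exists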
data.head()

[*********************100%%**********************]  5 of 5 completed


| Date | TSLA Open | TSLA High | TSLA Low | TSLA Close | TSLA Adj Close | TSLA Volume | GOOG Open | GOOG High | GOOG Low | GOOG Close | ... | AMZN Low | AMZN Close | AMZN Adj Close | AMZN Volume | AAPL Open | AAPL High | AAPL Low | AAPL Close | AAPL Adj Close | AAPL Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-04 | NaN | NaN | NaN | NaN | NaN | NaN | 15.61522 | 15.678981 | 15.547723 | 15.610239 | ... | 6.657 | 6.695 | 6.695 | 151998000 | 7.6225 | 7.660714 | 7.585 | 7.643214 | 6.479 | 493729600 |
| 2010-01-05 | NaN | NaN | NaN | NaN | NaN | NaN | 15.620949 | 15.637387 | 15.480475 | 15.541497 | ... | 6.5905 | 6.7345 | 6.7345 | 177038000 | 7.664286 | 7.699643 | 7.616071 | 7.656429 | 6.490199 | 601904800 |
| 2010-01-06 | NaN | NaN | NaN | NaN | NaN | NaN | 15.588072 | 15.588072 | 15.102393 | 15.149715 | ... | 6.5825 | 6.6125 | 6.6125 | 143576000 | 7.656429 | 7.686786 | 7.526786 | 7.534643 | 6.386966 | 552160000 |
| 2010-01-07 | NaN | NaN | NaN | NaN | NaN | NaN | 15.178109 | 15.193053 | 14.760922 | 14.797037 | ... | 6.44 | 6.5 | 6.5 | 220604000 | 7.5625 | 7.571429 | 7.466071 | 7.520714 | 6.375156 | 477131200 |
| 2010-01-08 | NaN | NaN | NaN | NaN | NaN | NaN | 14.744733 | 15.024933 | 14.672753 | 14.994298 | ... | 6.4515 | 6.676 | 6.676 | 196610000 | 7.510714 | 7.571429 | 7.466429 | 7.570714 | 6.41754 | 447610800 |
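Because `group_by='ticker'` yields two-level columns (ticker, then price field), the CSV written by `to_csv` carries two header rows plus a row holding the index name. The following is a minimal sketch of reading such a file back and selecting data from it, assuming pandas' documented `header` and `index_col` parameters. The small frame and its values are illustrative stand-ins for the downloaded data, and an in-memory buffer replaces `data/stocks/df_stocks.csv`.

```python
import io

import pandas as pd

nan = float("nan")

# Stand-in for the yfinance result: (Ticker, Price) columns and a named date index
columns = pd.MultiIndex.from_product(
    [["TSLA", "AAPL"], ["Open", "Close"]], names=["Ticker", "Price"]
)
index = pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date")
sample = pd.DataFrame(
    [[nan, nan, 7.62, 7.64], [nan, nan, 7.66, 7.65]],
    index=index,
    columns=columns,
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
sample.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] restores both column levels; index_col=0 restores the dates
restored = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)

aapl_close = restored[("AAPL", "Close")]           # one ticker, one field
all_close = restored.xs("Close", axis=1, level=1)  # Close for every ticker
```

Selecting with a `(ticker, field)` tuple returns a single series, while `xs` on the second level returns one column per ticker, which is convenient for cross-ticker comparisons.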


In [40]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3522 entries, 2010-01-04 to 2023-12-29
Data columns (total 30 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   (AMZN, Open)       3522 non-null   float64
 1   (AMZN, High)       3522 non-null   float64
 2   (AMZN, Low)        3522 non-null   float64
 3   (AMZN, Close)      3522 non-null   float64
 4   (AMZN, Adj Close)  3522 non-null   float64
 5   (AMZN, Volume)     3522 non-null   int64  
 6   (GOOG, Open)       3522 non-null   float64
 7   (GOOG, High)       3522 non-null   float64
 8   (GOOG, Low)        3522 non-null   float64
 9   (GOOG, Close)      3522 non-null   float64
 10  (GOOG, Adj Close)  3522 non-null   float64
 11  (GOOG, Volume)     3522 non-null   int64  
 12  (TSLA, Open)       3400 non-null   float64
 13  (TSLA, High)       3400 non-null   float64
 14  (TSLA, Low)        3400 non-null   float64
 15  (TSLA, Close)      3400 non-null   float64
 16  (TSLA,
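The `info()` output reports 3400 non-null TSLA rows against 3522 for AMZN and GOOG, which is consistent with Tesla only starting to trade in mid-2010. One possible way to handle this, sketched below on a small illustrative frame (the values are made up), is to count the gaps per ticker and trim each ticker's history to its first valid observation.

```python
import pandas as pd

nan = float("nan")

# Illustrative stand-in: TSLA has no data on the first two business days
columns = pd.MultiIndex.from_product(
    [["AMZN", "TSLA"], ["Close", "Volume"]], names=["Ticker", "Price"]
)
index = pd.date_range("2010-06-25", periods=4, freq="B", name="Date")
sample = pd.DataFrame(
    [
        [6.0, 100.0, nan, nan],
        [6.1, 110.0, nan, nan],
        [6.2, 120.0, 1.6, 500.0],
        [6.3, 130.0, 1.5, 400.0],
    ],
    index=index,
    columns=columns,
)

# Number of dates with no closing price, per ticker
missing_per_ticker = sample.xs("Close", axis=1, level="Price").isna().sum()

# One frame per ticker, starting at that ticker's first valid close
per_ticker = {}
for ticker in sample.columns.get_level_values("Ticker").unique():
    frame = sample[ticker]
    per_ticker[ticker] = frame.loc[frame["Close"].first_valid_index():]
```

Trimming per ticker keeps the full history for the older listings, whereas calling `dropna()` on the combined frame would discard every date before the newest ticker began trading.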

## Textual Data
### News Articles, Financial Reports, Social Media

#### CNBC Articles

Source: https://console.apify.com/

The CNBC Scraper on Apify was used to scrape 10,000 articles, capturing each article's URL, title, date, authors, description, keywords, and text.

In [15]:
import os

from apify_client import ApifyClient
import pandas as pd

# Initialize the ApifyClient with an API token read from the environment
# rather than hard-coding it (APIFY_API_TOKEN is an assumed variable name)
client = ApifyClient(os.environ["APIFY_API_TOKEN"])

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://www.cnbc.com/world/?region=world" }],
    "maxArticlesPerCrawl": 10000,
    "onlyNewArticlesPerDomain": False,
}

# Run the Actor and wait for it to finish
run = client.actor("Z7CFOUx1GWDlcF7nO").call(run_input=run_input)
# Fetch results from the run's dataset
items = client.dataset(run["defaultDatasetId"]).iterate_items()
# Convert the results to a pandas DataFrame
df_text = pd.DataFrame(items)

# Keep only the columns used for analysis
df_text = df_text[['url', 'softTitle','title','date', 'author', 'description', 'keywords', 'text']]

df_text.head()

| | url | softTitle | title | date | author | description | keywords | text |
|---|---|---|---|---|---|---|---|---|
| 0 | https://www.cnbc.com/select/best-balance-trans... | 11 best balance transfer cards with 0% APR of ... | 11 best balance transfer cards with 0% APR of ... | 2019-10-15T05:00:00.000Z | [https://www.facebook.com/CNBC, Jason Stauffer... | We analyzed 101 balance transfer cards using a... | Approved for Apple,Select_Cards,Select_Monetiz... | Who's this for: The Citi Simplicity® Card has ... |
| 1 | https://www.cnbc.com/select/the-best-credit-ca... | The best credit cards for building credit of J... | The best credit cards for building credit of J... | 2019-10-29T15:24:00.000Z | [https://www.facebook.com/CNBC, Benji Stawski,... | We analyzed 29 credit cards that are marketed ... | Select: Credit Cards,Select_Monetized,Credit c... | If you lack a credit history or have poor cred... |
| 2 | https://www.cnbc.com/select/best-cash-back-cre... | The best cash-back credit cards of January 2024 | The best cash-back credit cards of January 2024 | 2019-10-15T05:00:00.000Z | [https://www.facebook.com/CNBC, Alexandria Whi... | We analyzed 50 of the most popular cash-back c... | Select: Credit Cards,Consumer spending,Persona... | Who's this for? The Citi Double Cash® Card is ... |
| 3 | https://www.cnbc.com/select/personal-loan-lend... | Do you need a large personal loan? These lende... | Do you need a large personal loan? These lende... | 2022-05-06T19:12:25.000Z | [https://www.facebook.com/CNBC, Jasmin Suknanan] | Select analyzed key factors like interest rate... | Select_Monetized,Consumer spending,Personal fi... | Personal loans are a common way to pay for lar... |
| 4 | https://www.cnbc.com/select/best-credit-card-s... | The best credit card sign-up bonuses of Januar... | The best credit card sign-up bonuses of Januar... | 2019-11-12T19:30:46.000Z | [https://www.facebook.com/CNBC, Elizabeth Grav... | We analyzed the most popular credit cards avai... | Select: Credit Cards,Consumer spending,Persona... | Terms apply to American Express benefits and o... |


In [20]:
df_text.to_csv('data/textual/df_text.csv', index=False)  # index=False avoids an extra unnamed index column on reload
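The sample rows shown earlier are mostly personal-finance pieces, so the scraped articles will likely need filtering before they can be related to the five target stocks. The sketch below shows one possible approach using only the standard library: parse the ISO-style `date` strings and keep articles whose title or description mentions a company. The `COMPANY_TERMS` map, both helper functions, and the two sample records are hypothetical. The `keywords` column is deliberately skipped, since tags such as `Approved for Apple` in the sample output would trigger false matches.

```python
import re
from datetime import datetime, timezone

# Hypothetical term map; the terms are illustrative, not an exhaustive list
COMPANY_TERMS = {
    "TSLA": {"tesla"},
    "AAPL": {"apple"},
    "AMZN": {"amazon"},
    "GOOG": {"alphabet", "google"},
    "MSFT": {"microsoft"},
}


def parse_cnbc_date(value: str) -> datetime:
    """Parse timestamps such as '2019-10-15T05:00:00.000Z' as UTC datetimes."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def tickers_mentioned(article: dict) -> list[str]:
    """Return tickers whose terms appear as whole words in the title or description."""
    text = f"{article.get('title', '')} {article.get('description', '')}"
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [ticker for ticker, terms in COMPANY_TERMS.items() if words & terms]


# Made-up records shaped like rows of df_text
articles = [
    {"title": "Tesla shares jump after delivery report", "description": "",
     "date": "2023-01-03T14:30:00.000Z"},
    {"title": "The best cash-back credit cards", "description": "We analyzed popular cards",
     "date": "2019-10-15T05:00:00.000Z"},
]

relevant = [
    (parse_cnbc_date(a["date"]).date().isoformat(), tickers_mentioned(a))
    for a in articles
    if tickers_mentioned(a)
]
```

Reducing each timestamp to a calendar date makes it straightforward to join article-level signals with the daily price rows later on.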