# Stock and News Data Collection Notebook

This notebook collects stock data and related news articles from several sources and saves them for later analysis. The workflow includes:

1. **Data Collection**:
    - Using the Alpha Vantage API to gather stock data and news articles.
    - Fetching news articles from Markets Insider.
    - Retrieving news articles from Google News.
    - Using Yahoo Finance to gather daily stock data.

2. **Data Saving**:
    - Saving the collected data as CSV files in `../data/data_to_clean/` for further cleaning and analysis.

3. **Date Range**:
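    - The Alpha Vantage, Google News, and Yahoo Finance requests cover the last 3 years (3 × 365 days) up to the current date; the Markets Insider collector is called without a date range.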

4. **Stock Symbols**:
    - The analysis focuses on the following stock symbols: TSLA, AAPL, AMZN, NVDA, GS, BAC, and GME.

In [14]:
# Library imports

# Standard library imports
import warnings
from datetime import datetime, timedelta

# Local project imports (collection helpers for each data source)
from nlp_scripts import data_collection as coll

# Enable auto-reload for modules during development
%load_ext autoreload
%autoreload 2

# Silence all warnings to keep cell output readable (this also hides deprecation notices)
warnings.filterwarnings("ignore")


In [15]:
# List of stock symbols to analyze, with the matching company names (same order)
stock_symbols = ["TSLA", "AAPL", "AMZN", "NVDA", "GS", "BAC", "GME"]
stock_names = ["Tesla", "Apple", "Amazon", "NVIDIA", "Goldman Sachs", "Bank of America", "GameStop"]

# Date range: the last 3 * 365 days up to now, in YYYYMMDDTHHMM format
# (recorded run: start 20220406T1537, end 20250405T1537)
now = datetime.today()  # single reference time so both ends use the same clock reading
end_date = now.strftime('%Y%m%dT%H%M')
start_date = (now - timedelta(days=3*365)).strftime('%Y%m%dT%H%M')
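
The cell above formats both ends of the window as `YYYYMMDDTHHMM`. Because the window is 3 × 365 = 1,095 days rather than three calendar years, it shifts by a day when a leap day falls inside it: the recorded run ending on 5 April 2025 starts on 6 April 2022, because 29 February 2024 lies in between. A minimal sketch that reproduces those values from a fixed reference time and converts a compact timestamp to the `YYYY-MM-DD` form used, for example, by `yfinance` and by Google News search operators. `three_year_window`, `to_iso_date`, and `COMPACT_FMT` are hypothetical helpers, not part of `nlp_scripts`:

```python
from datetime import datetime, timedelta

# Compact timestamp format used for start_date / end_date in this notebook
COMPACT_FMT = "%Y%m%dT%H%M"


def three_year_window(now: datetime) -> tuple[str, str]:
    """Return (start, end) compact timestamps for the 3 * 365-day window ending at `now`."""
    start = now - timedelta(days=3 * 365)
    return start.strftime(COMPACT_FMT), now.strftime(COMPACT_FMT)


def to_iso_date(compact: str) -> str:
    """Convert a compact timestamp such as '20220406T1537' to '2022-04-06'."""
    return datetime.strptime(compact, COMPACT_FMT).strftime("%Y-%m-%d")


# Reproduce the recorded run; 1,095 days back from 5 April 2025 crosses 29 Feb 2024.
start, end = three_year_window(datetime(2025, 4, 5, 15, 37))
print(start, end)          # 20220406T1537 20250405T1537
print(to_iso_date(start))  # 2022-04-06
```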

In [3]:
print(f"Start date: {start_date}")
print(f"End date: {end_date}")

Start date: 20220406T1537
End date: 20250405T1537


### Alpha Vantage

In [None]:
# Getting stock data and news articles for the specified symbols and date range
stock_and_news_data_dict_alpha = coll.collect_data_alpha_vantage(stock_symbols, start_date, end_date)
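
How `coll.collect_data_alpha_vantage` talks to Alpha Vantage isn't shown in this notebook. As a hedged illustration only, two Alpha Vantage endpoints that provide data of the kind stored under the `stocks` and `news` keys are `TIME_SERIES_DAILY` and `NEWS_SENTIMENT`; the latter takes `time_from`/`time_to` in the same `YYYYMMDDTHHMM` format as `start_date`/`end_date`. The sketch only builds the request URLs (it sends nothing); `alpha_vantage_urls` and `YOUR_KEY` are placeholders, and plan-specific limits should be checked in the Alpha Vantage documentation:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def alpha_vantage_urls(symbol: str, start: str, end: str, api_key: str) -> dict[str, str]:
    """Build (but do not send) one daily-price request and one news request for `symbol`."""
    daily = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "outputsize": "full",  # "compact" would return only the latest 100 data points
        "apikey": api_key,
    }
    news = {
        "function": "NEWS_SENTIMENT",
        "tickers": symbol,
        "time_from": start,  # YYYYMMDDTHHMM, e.g. 20220406T1537
        "time_to": end,
        "limit": 1000,  # the endpoint defaults to 50 results per request
        "apikey": api_key,
    }
    return {
        "stocks": f"{BASE_URL}?{urlencode(daily)}",
        "news": f"{BASE_URL}?{urlencode(news)}",
    }


urls = alpha_vantage_urls("TSLA", "20220406T1537", "20250405T1537", api_key="YOUR_KEY")
print(urls["news"])
```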

In [None]:
# Saving the collected data to CSV files
for symbol, data in stock_and_news_data_dict_alpha.items():
    data["news"].to_csv(f"../data/data_to_clean/news_{symbol}_alpha_vantage.csv", index=False)
    data["stocks"].to_csv(f"../data/data_to_clean/stocks_{symbol}_alpha_vantage.csv", index=False)
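
The save loops in this notebook write into `../data/data_to_clean/`, and pandas' `to_csv` raises an error if that folder does not exist. A minimal sketch of a reusable helper (hypothetical, not in `nlp_scripts`) that creates the folder first and writes one CSV per key; it only assumes each value has a pandas-style `to_csv(path, index=False)` method:

```python
from pathlib import Path
from typing import Any, Mapping


def save_frames(frames: Mapping[str, Any], out_dir: str, pattern: str) -> list[Path]:
    """Write frames[key] to out_dir / pattern.format(key=key) for every key.

    Each value only needs a pandas-style to_csv(path, index=...) method.
    Returns the paths that were written.
    """
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    written = []
    for key, frame in frames.items():
        path = directory / pattern.format(key=key)
        frame.to_csv(path, index=False)
        written.append(path)
    return written
```

For example, the news half of the Alpha Vantage loop could be written as `save_frames({k: v["news"] for k, v in stock_and_news_data_dict_alpha.items()}, "../data/data_to_clean", "news_{key}_alpha_vantage.csv")`, keeping the existing file names.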


### Markets Insider

In [None]:
# Getting news data from Markets Insider
news_data_dict_market = coll.get_news_from_markets_insider(stock_symbols)
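
Unlike the Alpha Vantage, Google News, and Yahoo Finance calls, `coll.get_news_from_markets_insider` receives only the symbols, not `start_date`/`end_date`, so its articles may fall outside the three-year window. If that matters downstream, the articles can be filtered after collection. A minimal stdlib sketch, assuming each article is a dict with an ISO-formatted date under a hypothetical `date` key (the real column name in the returned DataFrames isn't shown here); the sample articles are toy inputs:

```python
from datetime import datetime

COMPACT_FMT = "%Y%m%dT%H%M"  # same format as start_date / end_date in this notebook


def within_window(articles, start, end, date_key="date"):
    """Keep articles whose ISO date (e.g. '2023-01-15') lies in [start, end].

    `date_key` is a hypothetical field name; adjust it to the collector's output.
    """
    lo = datetime.strptime(start, COMPACT_FMT)
    hi = datetime.strptime(end, COMPACT_FMT)
    return [a for a in articles if lo <= datetime.fromisoformat(a[date_key]) <= hi]


articles = [
    {"date": "2021-12-31", "title": "too old"},
    {"date": "2023-01-15", "title": "inside the window"},
    {"date": "2025-04-06", "title": "after end_date"},
]
print([a["title"] for a in within_window(articles, "20220406T1537", "20250405T1537")])
# ['inside the window']
```

Date-only strings parse as midnight, so an article dated on the start day itself (here 2022-04-06) is excluded because it precedes the 15:37 start time; compare dates only if that edge matters. With a pandas DataFrame, the same idea becomes a boolean mask on a parsed date column.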

In [None]:
# Saving the collected data to CSV files
for symbol, news_market in news_data_dict_market.items():
    news_market.to_csv(f"../data/data_to_clean/news_{symbol}_markets_insider.csv", index=False)


### Google News

In [None]:
# Getting news articles from Google for the specified stock names and date range
news_data_dict_google = coll.get_google_news_articles(stock_names, start_date, end_date, number_of_articles=10)
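
`coll.get_google_news_articles` receives company names rather than tickers, plus the compact `start_date`/`end_date` strings and `number_of_articles=10`; its internals (and whether that limit applies per company) aren't shown here. One commonly used way to query Google News by date is its public RSS search feed with the `after:`/`before:` search operators, which take `YYYY-MM-DD` dates. This is not an officially documented API and may change; the sketch only builds the URL, and `google_news_rss_url` is a hypothetical helper:

```python
from datetime import datetime
from urllib.parse import urlencode

COMPACT_FMT = "%Y%m%dT%H%M"  # same format as start_date / end_date in this notebook


def google_news_rss_url(query: str, start: str, end: str) -> str:
    """Build a Google News RSS search URL limited to [start, end] via after:/before:."""
    after = datetime.strptime(start, COMPACT_FMT).strftime("%Y-%m-%d")
    before = datetime.strptime(end, COMPACT_FMT).strftime("%Y-%m-%d")
    params = {
        # Quote the name so multi-word companies ("Bank of America") match as a phrase
        "q": f'"{query}" after:{after} before:{before}',
        "hl": "en-US",
        "gl": "US",
        "ceid": "US:en",
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


print(google_news_rss_url("Goldman Sachs", "20220406T1537", "20250405T1537"))
```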

In [None]:
# Saving the collected data to CSV files
for key, news_google in news_data_dict_google.items():
    news_google.to_csv(f"../data/data_to_clean/news_{key}_google_search.csv", index=False)


### Yahoo Finance

In [16]:
# Getting daily stock data from Yahoo Finance for the specified symbols and date range
stock_data_dict = coll.get_yahoo_stock_data(stock_symbols, start_date, end_date)

✅ Retrieved stock data for TSLA from Yahoo Finance
✅ Retrieved stock data for AAPL from Yahoo Finance
✅ Retrieved stock data for AMZN from Yahoo Finance
✅ Retrieved stock data for NVDA from Yahoo Finance
✅ Retrieved stock data for GS from Yahoo Finance
✅ Retrieved stock data for BAC from Yahoo Finance
✅ Retrieved stock data for GME from Yahoo Finance
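
The output above shows one download progress bar and one ✅ line per ticker, which suggests that `coll.get_yahoo_stock_data` fetches the symbols one at a time (an inference from the log, not from its source). A sketch of that pattern that also records failures instead of stopping at the first error; `fetch_each` is hypothetical and takes the per-symbol download function as a parameter, so it can be exercised without network access:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_each(
    symbols: list[str], fetch_one: Callable[[str], T], source: str
) -> tuple[dict[str, T], dict[str, str]]:
    """Call fetch_one(symbol) for every symbol, continuing after failures.

    Returns (results, errors): successful results keyed by symbol, and the
    error message for each symbol that failed.
    """
    results: dict[str, T] = {}
    errors: dict[str, str] = {}
    for symbol in symbols:
        try:
            results[symbol] = fetch_one(symbol)
        except Exception as exc:  # network and parsing errors differ by data source
            errors[symbol] = str(exc)
            print(f"❌ Could not retrieve stock data for {symbol} from {source}: {exc}")
        else:
            print(f"✅ Retrieved stock data for {symbol} from {source}")
    return results, errors
```

With the `yfinance` package installed, `fetch_one` could be something like `lambda s: yf.download(s, start="2022-04-06", end="2025-04-05", progress=False)`; `progress=False` suppresses the per-download progress bars.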

In [17]:
stock_data_dict["TSLA"].head()

| Price | Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|---|
| Ticker | | TSLA | TSLA | TSLA | TSLA | TSLA |
| 0 | 2022-05-09 | 262.369995 | 281.876678 | 260.383331 | 278.816681 | 90810300 |
| 1 | 2022-05-10 | 266.679993 | 275.119995 | 258.083344 | 273.103333 | 84401700 |
| 2 | 2022-05-11 | 244.666672 | 269.92334 | 242.399994 | 265.0 | 97224600 |
| 3 | 2022-05-12 | 242.666672 | 253.220001 | 226.666672 | 233.666672 | 140313000 |
| 4 | 2022-05-13 | 256.529999 | 262.450012 | 250.523331 | 257.82666 | 92150700 |
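
The `head()` output shows two column levels: the price field (`Price`: Date, Close, High, ...) above the ticker (`Ticker`: TSLA). When a frame with two-level columns is written with `to_csv`, each level becomes its own header row, so the saved CSVs would need `header=[0, 1]` when read back. Since each frame here appears to hold a single ticker, one option is to drop the ticker level before saving. A minimal pandas sketch; `flatten_price_columns` is a hypothetical helper, and the sample frame reuses values from the output above:

```python
import pandas as pd


def flatten_price_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse two-level (Price, Ticker) columns to single-level price names."""
    if not isinstance(df.columns, pd.MultiIndex):
        return df
    flat = df.copy()
    flat.columns = flat.columns.get_level_values(0)  # keep 'Date', 'Close', ...
    flat.columns.name = None  # drop the leftover 'Price' axis label
    return flat


# A two-row frame shaped like the head() output above
columns = pd.MultiIndex.from_tuples(
    [("Date", ""), ("Close", "TSLA"), ("Volume", "TSLA")], names=["Price", "Ticker"]
)
sample = pd.DataFrame(
    [["2022-05-09", 262.369995, 90810300], ["2022-05-10", 266.679993, 84401700]],
    columns=columns,
)
print(flatten_price_columns(sample).columns.tolist())  # ['Date', 'Close', 'Volume']
```

This is only safe when a frame holds one ticker; with several tickers in one frame, dropping the ticker level would produce duplicate column names.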


In [18]:
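# Saving the collected data to CSV files
# With two-level (Price/Ticker) columns, as in the head() output above, each column level is written as its own header row
for symbol, stock_yahoo in stock_data_dict.items():
    stock_yahoo.to_csv(f"../data/data_to_clean/stock_{symbol}_yahoo.csv", index=False)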