# POC

Methodology for POC:

1. Choose "AAPL" as the test ticker.
2. Scrape news for Apple stock from Yahoo Finance via scrapling.
3. Use ChatGPT to rate the sentiment of each news item on a scale of 1 to 5:

- 1 -> Negative
- 2 -> Moderate negative
- 3 -> Neutral
- 4 -> Moderate positive
- 5 -> Positive

4. Use ChatGPT to determine the stock that has the highest cointegration with Apple.

5. Use Selenium to load more news articles.


# Table of Contents

1. [Select Stocks](#select-stocks)
2. [Extract News](#extract-news)
3. [Perplexity Results](#perplexity-results-gpt-4o)
   - 3.1 [Prompt](#prompt)
   - 3.2 [Response](#response)
   - 3.3 [Cointegration](#cointegration)
   - 3.4 [Correlation](#correlation)
4. [Selenium](#selenium)
5. [Playwright](#playwright)


In [98]:
import pandas as pd
import random
import sys
from pathlib import Path
import scrapling
from scrapling import Fetcher
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
import re
from datetime import datetime, timedelta
import yfinance as yf
import numpy as np

repo_dir = Path.cwd().as_posix()

if repo_dir not in sys.path:
    sys.path.append(repo_dir)

%load_ext autoreload
%autoreload 2

from src.utils import utils, yahoo_utils
from src.gen_data import GenData
from src.sentiment import SentimentRater
from src.cointegrate import CoIntegrate


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# [Select Stocks](#table-of-contents)

- Test "AAPL", "NVDA", and "PG" for POC.
- If time permits, also test one randomly selected stock per GICS sector: ['PWR', 'INCY', 'CDW', 'CEG', 'BAC', 'LIN', 'BWA', 'ARE', 'CHTR', 'KDP', 'PSX']
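
A minimal sketch of picking one random ticker per sector. It is not the code behind `utils.gen_stock_list()`, which is not shown in this notebook. The helper `pick_one_per_sector` and the `SECTOR_TICKERS` mapping are hypothetical. The mapping only reuses tickers that appear in this notebook, whereas a real run would presumably draw from the full S&P 500 list.

```python
import random

# Illustrative sector -> tickers mapping (assumption: a small subset built from
# tickers mentioned in this notebook, not the full S&P 500 universe).
SECTOR_TICKERS = {
    "Information Technology": ["AAPL", "MSFT", "NVDA", "CRM", "CDW"],
    "Communication Services": ["GOOGL", "META", "CHTR"],
    "Energy": ["XOM", "PSX"],
    "Consumer Discretionary": ["HD", "BWA"],
    "Consumer Staples": ["PG", "KDP"],
}


def pick_one_per_sector(sector_tickers, seed=None):
    """Return {sector: ticker} with one randomly chosen ticker per sector."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return {
        sector: rng.choice(tickers) for sector, tickers in sector_tickers.items()
    }


picks = pick_one_per_sector(SECTOR_TICKERS, seed=42)
print(picks)
```

Passing a fixed `seed` keeps the selection reproducible between runs, which helps when comparing results across notebook sessions.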


In [33]:
stock_list = utils.gen_stock_list()
stock_list

{'Industrials': 'PWR',
 'Health Care': 'INCY',
 'Information Technology': 'CDW',
 'Utilities': 'CEG',
 'Financials': 'BAC',
 'Materials': 'LIN',
 'Consumer Discretionary': 'BWA',
 'Real Estate': 'ARE',
 'Communication Services': 'CHTR',
 'Consumer Staples': 'KDP',
 'Energy': 'PSX'}

In [36]:
utils.get_gics_sector(
    ["AAPL", "MSFT", "NVDA", "GOOGL", "META", "XOM", "CRM", "HD", "PG"]
)

{'AAPL': 'Information Technology',
 'MSFT': 'Information Technology',
 'NVDA': 'Information Technology',
 'GOOGL': 'Communication Services',
 'META': 'Communication Services',
 'XOM': 'Energy',
 'CRM': 'Information Technology',
 'HD': 'Consumer Discretionary',
 'PG': 'Consumer Staples'}

# [Extract News](#table-of-contents)

- https://finance.yahoo.com/quote/AAPL/news/
- Scrapling cannot scrape the page fully, since more news items are loaded only when the user scrolls to the bottom of the page.


In [98]:
# Extract news related to AAPL from Yahoo Finance
news_info = yahoo_utils.extract_news_briefs(
    "https://finance.yahoo.com/quote/AAPL/news/"
)
news_info

[2025-03-08 21:01:29] INFO: Fetched (200) <GET https://finance.yahoo.com/quote/AAPL/news/> (referer: https://www.google.com/search?q=yahoo)


[{'publisher': 'Motley Fool',
  'period': datetime.timedelta(seconds=7200),
  'title': 'Could Apple Be a Millionaire-Maker Stock?',
  'content': 'During its rise to become a dominant tech enterprise, Apple (NASDAQ: AAPL) has certainly made its early investors some serious money. In the past two decades, shares have generated a total return of greater than 18,000%, boosted by the introduction of popular hardware devices and services. Could Apple be a millionaire-maker stock?'},
 {'publisher': 'Insider Monkey',
  'period': datetime.timedelta(seconds=43200),
  'title': 'Apple Inc. (AAPL) Unveils AI-Powered MacBook Air with M4 Chip',
  'content': 'We recently compiled a list of the 10 Hot AI News Updates Analysts are Monitoring. In this article, we are going to take a look at where Apple Inc. (NASDAQ:AAPL) stands against the other AI stocks. According to Goldman Sachs, the AI-powered rally in Chinese equities could boost other markets as well. Consequently, the firm has […]'},
 {'publisher

# [Perplexity Results (GPT-4o)](#table-of-contents)


## [Prompt](#table-of-contents)

The list of dictionaries is attached to the following prompt in Perplexity:

```
You are a stock analysis expert.

I've attached list of dictionaries containing news info specifically:

1. publisher = Source where news are generated
2. period = Amount of time lapsed after news release
3. title = News title
4. content = News content in brief

Each dictionary is supposed to be related to Apple stock.

Your task:
1. Review the 'title' and 'content' keys in each dictionary to access the sentiment related to Apple stock.
2. Use ONLY following rating:

    1 = Negative
    2 = Moderate negative
    3 = Neutral
    4 = Moderate positive
    5 = Positive

3. If the news are not related to a Apple stock, please give a rating of 3.
4. Please review all dictionary items in the attached list and provide the corresponding rating.
5. After reviewing all dictionary items, kindly provide an overall sentiment rating for Apple stock.
```
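
To attach the scraped list to a prompt it has to be turned into text first. The sketch below shows one way to do that, assuming items shaped like the `news_info` output above. The `period` values are `datetime.timedelta` objects, which the `json` module cannot serialize by default, so `default=str` is used as a fallback. The helper name `news_to_json` and the placeholder content string are made up for illustration.

```python
import json
from datetime import timedelta


def news_to_json(news_items):
    """Serialize a list of news dicts to a JSON string for attaching to a prompt."""
    # default=str converts values json cannot handle (here: timedelta) to strings,
    # e.g. timedelta(seconds=7200) -> "2:00:00".
    return json.dumps(news_items, default=str, ensure_ascii=False, indent=2)


# One item in the same shape as the scraped output (content shortened by hand).
sample = [
    {
        "publisher": "Motley Fool",
        "period": timedelta(seconds=7200),
        "title": "Could Apple Be a Millionaire-Maker Stock?",
        "content": "(brief omitted)",
    }
]

payload = news_to_json(sample)
print(payload)
```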


## [Response](#table-of-contents)

Response from Perplexity (GPT-4o):

```
Sentiment Ratings for Each News Item
Title: "Could Apple Be a Millionaire-Maker Stock?"
Content: Highlights Apple's historical success and potential for future gains.
Rating: 5 (Positive)

Title: "Apple Inc. (AAPL) Unveils AI-Powered MacBook Air with M4 Chip"
Content: Discusses the launch of an innovative product, indicating progress in AI.
Rating: 5 (Positive)

Title: "Apple (AAPL) Beats Stock Market Upswing: What Investors Need to Know"
Content: Reports a positive trading session for Apple stock.
Rating: 4 (Moderate positive)

Title: "Apple Delays Siri Upgrade Indefinitely as AI Concerns Escalate"
Content: Focuses on delays in Apple's AI projects, raising concerns.
Rating: 2 (Moderate negative)

Title: "Apple Is Pushing Back AI-Driven Siri Features to 2026, Report Says"
Content: Similar to the previous item, emphasizes delays in Siri enhancements.
Rating: 2 (Moderate negative)

Title: "Apple Says Some AI-Powered Enhancements to Siri to Be Delayed"
Content: Mentions delays in Siri updates amid slumping iPhone sales.
Rating: 2 (Moderate negative)

Title: "Sector Update: Tech Stocks Advance Late Afternoon"
Content: General update on tech stocks without specific focus on Apple.
Rating: 3 (Neutral)

Title: "Magnificent Seven Stocks: Nvidia Reverses Higher; Amazon, Meta, Tesla Slide"
Content: Discusses multiple tech stocks, with a brief mention of Apple’s mixed performance in 2025.
Rating: 3 (Neutral)

Title: "Apple says some AI improvements to Siri delayed to 2026"
Content: Reiterates delays in Siri's AI features, consistent with earlier reports.
Rating: 2 (Moderate negative)

Title: "Apple's First Foldable iPhone Could Launch in 2026 With Premium Pricing and AI Features"
Content: Speculates on a high-priced foldable iPhone with advanced features, suggesting innovation but no immediate impact.
Rating: 4 (Moderate positive)

Title: "Apple Inc. (AAPL): Jim Cramer Cautions – ‘I Could See the Stock Going Lower!’"
Content: Warns of potential downside for Apple stock, reflecting negativity.
Rating: 1 (Negative)

Title: "Apple Secures Local Content Certifications in Indonesia but Faces Additional Regulatory Hurdles"
Content: Notes progress in regulatory compliance but highlights challenges ahead.
Rating: 3 (Neutral)

Title: "Is Apple Inc. (AAPL) Among the Best Warren Buffett Stock to Buy Right Now?"
Content: Discusses Apple's position as a strong investment option under Warren Buffett’s philosophy, implying positivity.
Rating: 4 (Moderate positive)

Title: "The Mag 7 Stocks Are No Longer Magnificent. Where the Charts Say They Go Next."
Content: Broad commentary on tech stocks, including Apple, with no clear sentiment focus on Apple itself.
Rating: 3 (Neutral)

Title: "Broadcom Shares Jump as AI Growth Fuels Upbeat Forecast"
Content: Primarily about Broadcom, with indirect relevance to Apple as a client company.
Rating: 3 (Neutral)

Title: "Major Breakthrough: New Certificates Propel Apple iPhone 16 into Indonesia"
Content: Highlights progress in regulatory certification for iPhone sales in Indonesia, indicating positive developments for Apple’s market expansion.
Rating: 4 (Moderate positive)

Title: "In Rotten Market, Top Funds Harvest Apple Stock — And 10 Others"
Content: Indicates institutional confidence in Apple stock despite broader market challenges, reflecting positivity.
Rating: 4 (Moderate positive)

Title: "Apple's iPhone 16 Still on Hold in Indonesia Despite Key Certification Win"
Content: Acknowledges regulatory progress but emphasizes ongoing hurdles for product launch in Indonesia.
Rating: 3 (Neutral)

Title: "Cloud AI Today - Cirrascale Launches Inference Cloud For Scalable AI Integration"
Content: Not related to Apple stock specifically but mentions general AI developments in the industry.
Rating: 3 (Neutral)

Title: "38% of Berkshire Hathaway's Portfolio Is Invested in These 3 Unstoppable Dividend Stocks"
Content: General commentary on Warren Buffett’s portfolio with no specific focus on Apple stock sentiment.
Rating: 3 (Neutral)

Overall Sentiment Rating for Apple Stock
The individual ratings suggest a mix of sentiments:

Positive/Moderate Positive Ratings (5 or 4): 8 items

Neutral Ratings (3): 8 items

Negative/Moderate Negative Ratings (1 or 2): 4 items

The overall sentiment leans toward moderately positive for Apple stock, reflecting optimism tempered by some concerns over delays and market challenges.

Final Overall Sentiment Rating:
4 (Moderate positive)
```
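
As a sanity check, the 20 per-item ratings in the response can be tallied in code. The ratings below are copied in order from the response. A recount gives 7 positive, 8 neutral and 5 negative items rather than the 8 / 8 / 4 the model reported, and the mean rating is 3.15. Rounding the mean to the nearest label is only one possible aggregation rule and is an assumption here, not how the model derived its overall rating.

```python
from collections import Counter
from statistics import mean

LABELS = {
    1: "Negative",
    2: "Moderate negative",
    3: "Neutral",
    4: "Moderate positive",
    5: "Positive",
}

# Per-item ratings copied in order from the response above (20 items).
ratings = [5, 5, 4, 2, 2, 2, 3, 3, 2, 4, 1, 3, 4, 3, 3, 4, 4, 3, 3, 3]


def bucket(rating):
    """Group a 1-5 rating into the three buckets used in the model's summary."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


tally = Counter(bucket(r) for r in ratings)
avg = mean(ratings)
overall = LABELS[round(avg)]  # assumption: overall label = rounded mean

print(dict(tally))
print(f"mean rating = {avg:.2f} -> {overall}")
```

Under this rule the overall label comes out as Neutral rather than Moderate positive, so aggregating the per-item ratings ourselves may be safer than relying on the summary the model gives.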


## [Cointegration](#table-of-contents)

ChatGPT was not able to identify the stock with the highest cointegration with Apple stock:

```
The query about which S&P 500 stock has the highest cointegration value with Apple stock cannot be answered directly using the provided search results. Cointegration analysis requires statistical computations based on historical price data of Apple stock and other S&P 500 stocks. This involves time-series analysis to determine the long-term equilibrium relationship between Apple and other stocks.

If you have access to historical price data for Apple and other S&P 500 stocks, you can calculate cointegration values using statistical software or programming languages such as Python or R. The process typically involves:

Collecting Data: Obtain historical price data for Apple (AAPL) and other S&P 500 stocks over a specific period.

Stationarity Testing: Use tests like the Augmented Dickey-Fuller (ADF) test to ensure the time series are stationary.

Cointegration Testing: Apply tests such as the Engle-Granger two-step method or Johansen test to identify stocks with high cointegration values with Apple.

If you provide relevant price data or specify tools available, I can guide you further on how to perform this analysis.
```
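
The two-step Engle-Granger procedure mentioned in the response can be sketched with NumPy alone. This is a simplified illustration on synthetic data, not the logic inside `CoIntegrate`, whose implementation is not shown in this notebook. Step 1 regresses one series on the other. Step 2 runs a plain Dickey-Fuller regression on the residuals, with no lag augmentation and no p-value. Note that the price series themselves are expected to be non-stationary, and it is the residual that is tested for stationarity. The function name `engle_granger_stat` is hypothetical. For a full test with p-values, `statsmodels.tsa.stattools.coint` would be the more usual choice.

```python
import numpy as np


def engle_granger_stat(y, x):
    """Return the Dickey-Fuller t-statistic of the residuals from regressing y on x.

    More negative values point towards cointegration. No lag augmentation and no
    p-value are computed, so this is only a rough screening statistic.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    # Step 1: OLS of y on a constant and x; the residual is the "spread".
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    # Step 2: Dickey-Fuller regression  delta_e[t] = rho * e[t-1] + u[t].
    lagged = resid[:-1]
    delta = np.diff(resid)
    rho = (lagged @ delta) / (lagged @ lagged)
    err = delta - rho * lagged
    sigma2 = (err @ err) / (len(delta) - 1)
    std_err = np.sqrt(sigma2 / (lagged @ lagged))
    return rho / std_err


# Synthetic example: y_coint shares x's random-walk trend, y_indep does not.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))
y_coint = 2.0 * x + rng.normal(size=500)
y_indep = np.cumsum(rng.normal(size=500))

stat_coint = engle_granger_stat(y_coint, x)
stat_indep = engle_granger_stat(y_indep, x)
print(f"cointegrated pair: {stat_coint:.2f}")
print(f"independent pair:  {stat_indep:.2f}")
```

To rank candidates against Apple, the same statistic could be computed for each ticker's price series and sorted, with the most negative values first.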


## [Correlation](#table-of-contents)

GPT-4o stated that Microsoft has the highest degree of correlation with Apple:

```
The stock most correlated with Apple (AAPL) among S&P 500 companies is Microsoft (MSFT). Historically, Apple and Microsoft have shown a high degree of correlation due to their positions as leading technology companies and their exposure to similar market dynamics, such as consumer electronics, cloud computing, and software services.

While specific numerical correlation values for S&P 500 stocks are not provided in the search results, Microsoft is widely recognized as one of Apple’s closest peers in terms of performance trends and market behavior. If you need precise correlation coefficients, you would need to calculate them using historical price data for Apple and other S&P 500 stocks over a specific period. Tools like Python, R, or financial platforms can help generate a correlation matrix for this purpose.
```
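
As the response suggests, correlation is straightforward to compute ourselves from price data. The sketch below uses the five daily closes (2025-03-05 to 2025-03-11) from the `yf.download` output that appears later in this notebook. MSFT is not part of that download, so it is not included. Four daily returns are far too few to draw conclusions, so this only illustrates the calculation. The helper names are hypothetical.

```python
import numpy as np

# Daily closes copied from the 5-day yf.download output later in this notebook.
closes = {
    "AAPL": [235.740005, 235.330002, 239.070007, 227.479996, 220.839996],
    "GOOGL": [172.820969, 172.151733, 173.660004, 165.869995, 164.039993],
    "NVDA": [117.300003, 110.57, 112.690002, 106.980003, 108.760002],
    "META": [656.469971, 627.929993, 625.659973, 597.98999, 605.710022],
}


def daily_returns(prices):
    """Simple day-over-day returns from a sequence of closing prices."""
    p = np.asarray(prices, dtype=float)
    return p[1:] / p[:-1] - 1.0


def correlation_with(target, closes_by_ticker):
    """Pearson correlation of each ticker's daily returns with the target's."""
    base = daily_returns(closes_by_ticker[target])
    return {
        ticker: float(np.corrcoef(base, daily_returns(prices))[0, 1])
        for ticker, prices in closes_by_ticker.items()
        if ticker != target
    }


corr = correlation_with("AAPL", closes)
best = max(corr, key=corr.get)
for ticker, value in sorted(corr.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ticker}: {value:+.3f}")
print("most correlated with AAPL in this sample:", best)
```

Returns are used instead of raw prices because trending price levels tend to look correlated even when the day-to-day moves are unrelated.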


# [Selenium](#table-of-contents)

- Selenium is not suitable here: it occasionally runs into ReadTimeoutError, which breaks the code, so we cannot scroll far enough to load the required news.
- Running it in Jupyter Notebook causes the timeout error because of the notebook's async operation.
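
For reference, the scrolling step itself could look like the sketch below. It only relies on the driver's `execute_script` method, so it is exercised here with a small stand-in class (`FakeDriver`, hypothetical) instead of a real browser. Whether it avoids the ReadTimeoutError described above is untested. The function name and the default pause are assumptions.

```python
import time


def scroll_until_stable(driver, max_scrolls=5, pause=2.0):
    """Scroll to the bottom until the page height stops growing.

    Stops after max_scrolls attempts at the latest. Returns the number of
    scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give lazily loaded news items time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return scrolls


class FakeDriver:
    """Stand-in for a WebDriver, used only to exercise the loop without a browser."""

    def __init__(self, heights):
        self._heights = list(heights)
        self._i = 0

    def execute_script(self, script):
        if script.startswith("return"):
            return self._heights[self._i]
        # Any other script is treated as a scroll that may load more content.
        self._i = min(self._i + 1, len(self._heights) - 1)
        return None


n_scrolls = scroll_until_stable(FakeDriver([1000, 2000, 3000, 3000]), pause=0.0)
print(n_scrolls)
```

With a real browser, the `driver` created by `webdriver.Chrome()` would be passed in place of `FakeDriver`, and `max_scrolls` plays the role of `scroll_limit`.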


In [None]:
url = "https://finance.yahoo.com/quote/AAPL/news/"
scroll_limit = 5

driver = webdriver.Chrome()
driver.get(url)

# Locate the <div> elements that wrap each news item (find_elements returns a list)
parent_divs = driver.find_elements(By.XPATH, "//div[@class='content yf-82qtw3']")

# parent_divs[0].get_attribute("class")

# # Print the title of the first news item
# title_element = parent_divs[0].find_element(By.XPATH, ".//h3[contains(@class, 'yf-82qtw3')]")
# title_element.text

ReadTimeoutError: HTTPConnectionPool(host='localhost', port=37797): Read timed out. (read timeout=120)

# [Playwright](#table-of-contents)


In [None]:
<div class="content yf-82qtw3"><a class="subtle-link fin-size-small titles noUnderline yf-1xqzjha"
        data-ylk="elm:hdln;elmt:link;itc:0;ct:story;slk:Shareholders%20are%20showing%20signs%20of%20DEI%20fatigue%20as%20activists%20push%20for%20more%20votes;sec:qsp-news;subsec:all;cpos:1;g:a42c9cba-3739-4dae-8402-e5acaa0ad944"
        href="https://finance.yahoo.com/news/shareholders-are-showing-signs-of-dei-fatigue-as-activists-push-for-more-votes-140136546.html"
        aria-label="Shareholders are showing signs of DEI fatigue as activists push for more votes"
        title="Shareholders are showing signs of DEI fatigue as activists push for more votes" data-rapid_p="23"
        data-v9y="1">
        <h3 class="clamp  yf-82qtw3">Shareholders are showing signs of DEI fatigue as activists push for more votes</h3>
        <p class="clamp  yf-82qtw3">Champions and critics of DEI policies are again pushing American companies to either
            bolster or diminish their diversity policies. But so far this year, none of the proposals are garnering
            shareholder support.</p>
    </a>
    <div class="footer yf-82qtw3">
        <div class="publishing yf-1weyqlp">Yahoo Finance <i aria-hidden="true" data-svelte-h="svelte-sg6bkb">•</i> 13
            hours ago</div>
        <div class="taxonomy-links yf-13tszyc"><a data-testid="ticker-container"
                class="ticker x-small hover2 border streaming yf-1fqyif7" aria-label="DE"
                data-ylk="elm:qte;elmt:link;itc:0;sec:qsp-news;slk:DE;subsec:all;cpos:1" href="/quote/DE/" title="DE"
                data-rapid_p="24" data-v9y="1">
                <div class="name yf-1fqyif7"><span class="symbol yf-1fqyif7">DE </span> </div> <fin-streamer
                    class="percentChange yf-1fqyif7" data-symbol="DE" data-field="regularMarketChangePercent"
                    data-trend="txt" data-pricehint="2" data-tstyle="default" active=""><span
                        class="d60f3b00 c4af00a5">+3.48%</span></fin-streamer>
            </a> <a data-testid="ticker-container" class="ticker x-small hover2 border streaming yf-1fqyif7"
                aria-label="COST" data-ylk="elm:qte;elmt:link;itc:0;sec:qsp-news;slk:COST;subsec:all;cpos:1"
                href="/quote/COST/" title="COST" data-rapid_p="25" data-v9y="1">
                <div class="name yf-1fqyif7"><span class="symbol yf-1fqyif7">COST </span> </div> <fin-streamer
                    class="percentChange yf-1fqyif7" data-symbol="COST" data-field="regularMarketChangePercent"
                    data-trend="txt" data-pricehint="2" data-tstyle="default" active=""><span
                        class="d60f3b00 d5882186">-6.07%</span></fin-streamer>
            </a> <a data-testid="ticker-container" class="ticker x-small hover2 border streaming yf-1fqyif7"
                aria-label="AAPL" data-ylk="elm:qte;elmt:link;itc:0;sec:qsp-news;slk:AAPL;subsec:all;cpos:1"
                href="/quote/AAPL/" title="AAPL" data-rapid_p="26" data-v9y="1">
                <div class="name yf-1fqyif7"><span class="symbol yf-1fqyif7">AAPL </span> </div> <fin-streamer
                    class="percentChange yf-1fqyif7" data-symbol="AAPL" data-field="regularMarketChangePercent"
                    data-trend="txt" data-pricehint="2" data-tstyle="default" active=""><span
                        class="d60f3b00 c4af00a5">+1.59%</span></fin-streamer>
            </a> </div>
    </div>
</div>

In [2]:
df = pd.read_csv("./data/sentiment.csv")
df

Unnamed: 0,pub_date,ticker,publisher,period,title,content,prosusai,yiyanghkust,ziweichen,aventiq_ai
0,2025-03-11 22:10:00,AAPL,Yahoo Finance,1 hour ago,Trump's DOJ wants a Google breakup but is will...,President Trump and his predecessor Joe Biden ...,3,3,3,4
1,2025-03-11 05:10:00,AAPL,Yahoo Finance,18 hours ago,"Nvidia, Tesla slammed as 'Magnificent 7' names...","The ""Magnificent Seven"" stocks were at the cen...",1,1,1,1
2,2025-03-11 22:44:00,AAPL,Barrons.com,26 minutes ago,"These Stocks Are Moving the Most Today: Tesla,...",Tesla stock dips after tumbling 15% in Monday’...,1,1,1,1
3,2025-03-11 22:10:00,AAPL,TipRanks,1 hour ago,"DOJ ‘not calling off the dogs’ on Google, says...",After Google (GOOGL) and the Department of Jus...,2,3,3,1
4,2025-03-11 22:10:00,AAPL,Benzinga,1 hour ago,Is The Data Center REIT Honeymoon Period Over?,Data center REITs generated solid profits for ...,4,5,3,1
...,...,...,...,...,...,...,...,...,...,...
475,2024-12-11 23:11:00,PG,Reuters,3 months ago,"Exclusive-As tariffs loom, Gillette-razor make...",Procter & Gamble has overhauled its supply cha...,5,3,3,5
476,2024-12-11 23:11:00,PG,Zacks,3 months ago,Procter & Gamble (PG) Stock Slides as Market R...,"In the most recent trading session, Procter & ...",1,3,3,1
477,2024-12-11 23:11:00,PG,Motley Fool,3 months ago,3 Dividend King Stocks Near All-Time Highs Tha...,Paying up for quality companies can be a winni...,5,5,3,5
478,2024-12-11 23:11:00,PG,Business Wire,3 months ago,Braun IPLs Light Up Group Chats with a Major D...,"BOSTON, December 03, 2024--Group chats; love t...",3,3,3,3
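
The `pub_date` column appears to be derived from the relative `period` text ("18 hours ago", "3 months ago") and the time of scraping. A minimal sketch of that conversion follows. It is not the code used to build `sentiment.csv`, which is not shown here. The helper names, the 30-day month and the scrape time of 23:10 are all assumptions, although they are consistent with the rows above.

```python
import re
from datetime import datetime, timedelta

# Assumption: months and years are approximated as 30 and 365 days.
_UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}
_PATTERN = re.compile(
    r"(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago", re.IGNORECASE
)


def parse_period(text):
    """Convert text such as '18 hours ago' into a timedelta."""
    match = _PATTERN.search(text)
    if match is None:
        # Wordings like 'yesterday' are not handled in this sketch.
        raise ValueError(f"Unrecognised period: {text!r}")
    count, unit = int(match.group(1)), match.group(2).lower()
    return timedelta(seconds=count * _UNIT_SECONDS[unit])


def estimate_pub_date(period_text, scraped_at):
    """Estimate the publication time by subtracting the period from the scrape time."""
    return scraped_at - parse_period(period_text)


scraped_at = datetime(2025, 3, 11, 23, 10)  # assumed scrape time
for period in ["1 hour ago", "18 hours ago", "26 minutes ago"]:
    print(period, "->", estimate_pub_date(period, scraped_at))
```

Because the period text is coarse, the estimated timestamps are only as precise as the unit shown: every "3 months ago" item receives the same `pub_date`.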


In [None]:
tickers = ["AAPL", "META", "GOOGL", "NVDA"]

df1 = yf.download(
    tickers, start="2025-01-01", end=None, group_by="ticker", rounding=True
)
df1

[*********************100%***********************]  4 of 4 completed


Ticker,META,META,META,META,META,AAPL,AAPL,AAPL,AAPL,AAPL,GOOGL,GOOGL,GOOGL,GOOGL,GOOGL,NVDA,NVDA,NVDA,NVDA,NVDA
Price,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume
Date,,,,,,,,,,,,,,,,,,,,
2025-01-02,589.72,604.91,587.82,599.24,12682300,248.66,248.83,241.55,243.58,55740700,190.43,191.78,187.28,189.21,20370800,136.0,138.88,134.63,138.31,198247200
2025-01-03,604.76,609.5,596.41,604.63,11436800,243.09,243.91,241.62,243.09,40244100,191.15,192.99,189.76,191.57,18596200,140.01,144.9,139.73,144.47,229322500
2025-01-06,611.83,630.99,605.62,630.2,14560800,244.04,247.06,242.93,244.73,45045600,193.76,197.99,193.63,196.64,29563600,148.59,152.16,147.82,149.43,265377400
2025-01-07,631.7,632.1,608.23,617.89,12071500,242.71,245.28,241.08,241.94,40856000,196.88,200.77,194.38,195.27,26487200,153.03,153.13,140.01,140.14,351782200
2025-01-08,613.4,616.44,602.79,610.72,10085800,241.65,243.44,239.79,242.43,37628900,192.35,196.06,192.16,193.73,24864800,142.58,143.95,137.56,140.11,227349900
2025-01-10,610.03,629.91,597.34,615.86,19256600,239.75,239.9,232.74,236.59,61710900,194.08,196.29,190.09,191.82,26665200,137.45,139.92,134.22,135.91,207602500
2025-01-13,607.1,608.97,598.45,608.33,10785000,233.27,234.41,229.47,234.14,49630700,189.85,190.96,187.14,190.79,21823700,129.99,133.49,129.51,133.23,204808900
2025-01-14,605.07,605.49,588.55,594.25,13598000,234.49,235.86,232.21,233.02,39435300,191.02,191.76,188.09,189.44,17174900,136.05,136.38,130.05,131.76,195590500
2025-01-15,610.21,622.46,606.5,617.12,15524300,234.38,238.7,234.17,237.61,39832000,192.87,196.13,191.64,195.33,21776000,133.65,136.45,131.29,136.24,185217300
2025-01-16,611.5,616.25,609.71,611.3,8355300,237.09,237.75,227.78,228.01,71759100,193.92,195.26,192.59,192.69,17815400,138.64,138.75,133.49,133.57,209235600


In [60]:
df1.stack(level=0, future_stack=True)

,Price,Open,High,Low,Close,Volume
Date,Ticker,,,,,
2025-01-02,META,589.72,604.91,587.82,599.24,12682300
2025-01-02,AAPL,248.66,248.83,241.55,243.58,55740700
2025-01-02,GOOGL,190.43,191.78,187.28,189.21,20370800
2025-01-02,NVDA,136.00,138.88,134.63,138.31,198247200
2025-01-03,META,604.76,609.50,596.41,604.63,11436800
...,...,...,...,...,...,...
2025-03-10,NVDA,109.90,111.85,105.46,106.98,366487400
2025-03-11,META,595.05,614.93,593.33,605.71,17333300
2025-03-11,AAPL,223.81,225.84,217.45,220.84,76087700
2025-03-11,GOOGL,164.91,166.75,161.37,164.04,37720000


In [6]:
df = yf.download(tickers, period="5d", group_by="ticker")
df

[*********************100%***********************]  4 of 4 completed


Ticker,GOOGL,GOOGL,GOOGL,GOOGL,GOOGL,AAPL,AAPL,AAPL,AAPL,AAPL,NVDA,NVDA,NVDA,NVDA,NVDA,META,META,META,META,META
Price,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume
Date,,,,,,,,,,,,,,,,,,,,
2025-03-05,170.323845,173.580089,168.865517,172.820969,30954900,235.419998,236.550003,229.229996,235.740005,47227600,117.580002,118.279999,114.510002,117.300003,284337900,641.0,659.5,637.659973,656.469971,13770000
2025-03-06,170.33382,174.608895,170.303856,172.151733,28302000,234.440002,237.860001,233.160004,235.330002,45170400,113.529999,115.349998,110.220001,110.57,321181900,648.0,650.559998,624.109985,627.929993,13446500
2025-03-07,171.062988,174.768727,170.074137,173.660004,27385800,235.110001,241.369995,234.759995,239.070007,46273600,111.25,113.480003,107.559998,112.690002,341755500,625.369995,634.789978,600.609985,625.659973,21375700
2025-03-10,168.259995,168.460007,163.690002,165.869995,43604000,235.539993,236.160004,224.220001,227.479996,72071200,109.900002,111.849998,105.459999,106.980003,366487400,608.070007,610.109985,586.869995,597.98999,21999900
2025-03-11,164.910004,166.75,161.369995,164.039993,37720000,223.809998,225.839996,217.449997,220.839996,76087700,106.989998,112.239998,104.769997,108.760002,353708300,595.049988,614.929993,593.330017,605.710022,17333300


In [8]:
df.stack(level=0, future_stack=True)

,Price,Open,High,Low,Close,Volume
Date,Ticker,,,,,
2025-03-05,GOOGL,170.323845,173.580089,168.865517,172.820969,30954900
2025-03-05,AAPL,235.419998,236.550003,229.229996,235.740005,47227600
2025-03-05,NVDA,117.580002,118.279999,114.510002,117.300003,284337900
2025-03-05,META,641.0,659.5,637.659973,656.469971,13770000
2025-03-06,GOOGL,170.33382,174.608895,170.303856,172.151733,28302000
2025-03-06,AAPL,234.440002,237.860001,233.160004,235.330002,45170400
2025-03-06,NVDA,113.529999,115.349998,110.220001,110.57,321181900
2025-03-06,META,648.0,650.559998,624.109985,627.929993,13446500
2025-03-07,GOOGL,171.062988,174.768727,170.074137,173.660004,27385800
2025-03-07,AAPL,235.110001,241.369995,234.759995,239.070007,46273600


In [48]:
a = (
    df.stack(level=0, future_stack=True)
    .rename_axis(["Date", "Ticker"])
    .reset_index(level=1)
)
a

Price,Ticker,Open,High,Low,Close,Volume
Date,,,,,,
2025-03-05,GOOGL,170.323845,173.580089,168.865517,172.820969,30954900
2025-03-05,AAPL,235.419998,236.550003,229.229996,235.740005,47227600
2025-03-05,NVDA,117.580002,118.279999,114.510002,117.300003,284337900
2025-03-05,META,641.0,659.5,637.659973,656.469971,13770000
2025-03-06,GOOGL,170.33382,174.608895,170.303856,172.151733,28302000
2025-03-06,AAPL,234.440002,237.860001,233.160004,235.330002,45170400
2025-03-06,NVDA,113.529999,115.349998,110.220001,110.57,321181900
2025-03-06,META,648.0,650.559998,624.109985,627.929993,13446500
2025-03-07,GOOGL,171.062988,174.768727,170.074137,173.660004,27385800
2025-03-07,AAPL,235.110001,241.369995,234.759995,239.070007,46273600


In [None]:
a = a.reset_index().sort_values(by=["Ticker", "Date"], ascending=True).set_index("Date")
# a.columns.name = None
a

Price,Ticker,Open,High,Low,Close,Volume
Date,,,,,,
2025-03-05,AAPL,235.419998,236.550003,229.229996,235.740005,47227600
2025-03-06,AAPL,234.440002,237.860001,233.160004,235.330002,45170400
2025-03-07,AAPL,235.110001,241.369995,234.759995,239.070007,46273600
2025-03-10,AAPL,235.539993,236.160004,224.220001,227.479996,72071200
2025-03-11,AAPL,223.809998,225.839996,217.449997,220.839996,76087700
2025-03-05,GOOGL,170.323845,173.580089,168.865517,172.820969,30954900
2025-03-06,GOOGL,170.33382,174.608895,170.303856,172.151733,28302000
2025-03-07,GOOGL,171.062988,174.768727,170.074137,173.660004,27385800
2025-03-10,GOOGL,168.259995,168.460007,163.690002,165.869995,43604000
2025-03-11,GOOGL,164.910004,166.75,161.369995,164.039993,37720000


In [50]:
a.columns.name = None
a

,Ticker,Open,High,Low,Close,Volume
Date,,,,,,
2025-03-05,AAPL,235.419998,236.550003,229.229996,235.740005,47227600
2025-03-06,AAPL,234.440002,237.860001,233.160004,235.330002,45170400
2025-03-07,AAPL,235.110001,241.369995,234.759995,239.070007,46273600
2025-03-10,AAPL,235.539993,236.160004,224.220001,227.479996,72071200
2025-03-11,AAPL,223.809998,225.839996,217.449997,220.839996,76087700
2025-03-05,GOOGL,170.323845,173.580089,168.865517,172.820969,30954900
2025-03-06,GOOGL,170.33382,174.608895,170.303856,172.151733,28302000
2025-03-07,GOOGL,171.062988,174.768727,170.074137,173.660004,27385800
2025-03-10,GOOGL,168.259995,168.460007,163.690002,165.869995,43604000
2025-03-11,GOOGL,164.910004,166.75,161.369995,164.039993,37720000


In [14]:
ticker = yf.Ticker("AAPL")
ticker.news

[{'id': 'ee08c516-d812-48ab-9787-b73c34782bf7',
  'content': {'id': 'ee08c516-d812-48ab-9787-b73c34782bf7',
   'contentType': 'STORY',
   'title': "Wall Street's souring view on the US stock market isn't just about the economy — it's about AI too",
   'description': '',
   'summary': "As economic uncertainty has spawned a stock market sell-off, one of investors' favorite trades of the past two years is going through its own rerating of expectations.",
   'pubDate': '2025-03-11T17:44:49Z',
   'displayTime': '2025-03-11T17:44:49Z',
   'isHosted': True,
   'bypassModal': False,
   'previewUrl': None,
   'thumbnail': {'originalUrl': 'https://s.yimg.com/os/creatr-uploaded-images/2025-03/88bc1fd0-fb15-11ef-8dbd-ea4aea580f37',
    'originalWidth': 4500,
    'originalHeight': 3001,
    'caption': '',
    'resolutions': [{'url': 'https://s.yimg.com/uu/api/res/1.2/CAOKVOjTmaybJSdSciq_vQ--~B/aD0zMDAxO3c9NDUwMDthcHBpZD15dGFjaHlvbg--/https://s.yimg.com/os/creatr-uploaded-images/2025-03/88bc1fd0-fb1

In [18]:
ticker.history()

,Open,High,Low,Close,Volume,Dividends,Stock Splits
Date,,,,,,,
2025-02-12 00:00:00-05:00,231.199997,236.960007,230.679993,236.869995,45243300,0.0,0.0
2025-02-13 00:00:00-05:00,236.910004,242.339996,235.570007,241.529999,53614100,0.0,0.0
2025-02-14 00:00:00-05:00,241.25,245.550003,240.990005,244.600006,40896200,0.0,0.0
2025-02-18 00:00:00-05:00,244.149994,245.179993,241.839996,244.470001,48822500,0.0,0.0
2025-02-19 00:00:00-05:00,244.660004,246.009995,243.160004,244.869995,32204200,0.0,0.0
2025-02-20 00:00:00-05:00,244.940002,246.779999,244.289993,245.830002,32316900,0.0,0.0
2025-02-21 00:00:00-05:00,245.949997,248.690002,245.220001,245.550003,53197400,0.0,0.0
2025-02-24 00:00:00-05:00,244.929993,248.860001,244.419998,247.100006,51326400,0.0,0.0
2025-02-25 00:00:00-05:00,248.0,250.0,244.910004,247.039993,48013300,0.0,0.0
2025-02-26 00:00:00-05:00,244.330002,244.979996,239.130005,240.360001,44433600,0.0,0.0


In [19]:
tickers = yf.Tickers("MSFT AAPL GOOG")
tickers.history()

[*********************100%***********************]  3 of 3 completed


Price,Close,Close,Close,Dividends,Dividends,Dividends,High,High,High,Low,Low,Low,Open,Open,Open,Stock Splits,Stock Splits,Stock Splits,Volume,Volume,Volume
Ticker,AAPL,GOOG,MSFT,AAPL,GOOG,MSFT,AAPL,GOOG,MSFT,AAPL,...,MSFT,AAPL,GOOG,MSFT,AAPL,GOOG,MSFT,AAPL,GOOG,MSFT
Date,,,,,,,,,,,,,,,,,,,,,
2025-02-12,236.869995,185.218979,408.221466,0.0,0.0,0.0,236.960007,186.617395,409.928036,230.679993,...,403.560798,231.199997,185.01921,406.395111,0.0,0.0,0.0,45243300,17632300,19121700
2025-02-13,241.529999,187.666199,409.718475,0.0,0.0,0.0,242.339996,187.776074,410.177546,235.570007,...,405.546817,236.910004,185.718406,406.185551,0.0,0.0,0.0,53614100,12729300,23891700
2025-02-14,244.600006,186.657349,407.612671,0.0,0.0,0.0,245.550003,187.935891,408.091721,240.990005,...,405.067786,241.25,186.617401,406.973967,0.0,0.0,0.0,40896200,12714200,22758500
2025-02-18,244.470001,185.588562,408.820282,0.0,0.0,0.0,245.179993,187.566304,409.778352,241.839996,...,405.686551,244.149994,187.226695,407.183549,0.0,0.0,0.0,48822500,19796000,21423100
2025-02-19,244.869995,186.917053,413.940002,0.0,0.0,0.0,246.009995,187.146787,414.658563,243.160004,...,406.834255,244.660004,185.973121,407.063806,0.0,0.0,0.0,32204200,13120500,24114200
2025-02-20,245.830002,186.427612,416.130005,0.0,0.0,0.83,246.779999,186.907062,419.309998,244.289993,...,412.540009,244.940002,186.287772,415.290009,0.0,0.0,0.0,32316900,12063800,23508700
2025-02-21,245.550003,181.373367,408.209991,0.0,0.0,0.0,248.690002,187.256664,418.049988,245.220001,...,407.890015,245.949997,187.076861,417.339996,0.0,0.0,0.0,53197400,19520800,27524800
2025-02-24,247.100006,180.98381,404.0,0.0,0.0,0.0,248.860001,184.879366,409.369995,244.419998,...,399.320007,244.929993,183.590841,408.51001,0.0,0.0,0.0,51326400,18734000,26443700
2025-02-25,247.039993,177.168152,397.899994,0.0,0.0,0.0,250.0,180.554293,401.920013,244.910004,...,396.700012,248.0,179.949986,401.100006,0.0,0.0,0.0,48013300,20832500,29387400
2025-02-26,240.360001,174.50119,399.730011,0.0,0.0,0.0,244.979996,177.877349,403.600006,239.130005,...,394.25,244.330002,176.743646,398.01001,0.0,0.0,0.0,44433600,23637200,19619000


In [33]:
a = np.arange(155)
batch_size = 20

for idx in range(0, len(a), batch_size):
    print(f"a[{idx} : {idx+batch_size}] : {a[idx : idx+batch_size]}")

a[0 : 20] : [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
a[20 : 40] : [20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
a[40 : 60] : [40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59]
a[60 : 80] : [60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
a[80 : 100] : [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
a[100 : 120] : [100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 118 119]
a[120 : 140] : [120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
 138 139]
a[140 : 160] : [140 141 142 143 144 145 146 147 148 149 150 151 152 153 154]


In [73]:
co_integrate = CoIntegrate()
co_integrate.run()

batch 1 : ['MMM', 'AOS', 'ABT', 'ABBV', 'ACN', 'ADBE', 'AMD', 'AES', 'AFL', 'A']


[*********************100%***********************]  10 of 10 completed


batch 2 : ['APD', 'ABNB', 'AKAM', 'ALB', 'ARE', 'ALGN', 'ALLE', 'LNT', 'ALL', 'GOOGL']


[*********************100%***********************]  10 of 10 completed


batch 3 : ['GOOG', 'MO', 'AMZN', 'AMCR', 'AEE', 'AEP', 'AXP', 'AIG', 'AMT', 'AWK']


[*********************100%***********************]  10 of 10 completed


batch 4 : ['AMP', 'AME', 'AMGN', 'APH', 'ADI', 'ANSS', 'AON', 'APA', 'APO', 'AAPL']


[*********************100%***********************]  10 of 10 completed


batch 5 : ['AMAT', 'APTV', 'ACGL', 'ADM', 'ANET', 'AJG', 'AIZ', 'T', 'ATO', 'ADSK']


[*********************100%***********************]  9 of 10 completed


batch 6 : ['ADP', 'AZO', 'AVB', 'AVY', 'AXON', 'BKR', 'BALL', 'BAC', 'BAX', 'BDX']


[*********************100%***********************]  10 of 10 completed


batch 7 : ['BBY', 'TECH', 'BIIB', 'BLK', 'BX', 'BK', 'BA', 'BKNG', 'BWA', 'BSX']


[*********************100%***********************]  10 of 10 completed


batch 8 : ['BMY', 'AVGO', 'BR', 'BRO', 'BLDR', 'BG', 'BXP', 'CHRW', 'CDNS', 'CZR']


[*********************100%***********************]  10 of 10 completed


batch 9 : ['CPT', 'CPB', 'COF', 'CAH', 'KMX', 'CCL', 'CARR', 'CAT', 'CBOE', 'CBRE']


[*********************100%***********************]  9 of 10 completed


batch 10 : ['CDW', 'CE', 'COR', 'CNC', 'CNP', 'CF', 'CRL', 'SCHW', 'CHTR', 'CVX']


[*********************100%***********************]  10 of 10 completed


batch 11 : ['CMG', 'CB', 'CHD', 'CI', 'CINF', 'CTAS', 'CSCO', 'C', 'CFG', 'CLX']


[*********************100%***********************]  10 of 10 completed


batch 12 : ['CME', 'CMS', 'KO', 'CTSH', 'CL', 'CMCSA', 'CAG', 'COP', 'ED', 'STZ']


[*********************100%***********************]  10 of 10 completed


batch 13 : ['CEG', 'COO', 'CPRT', 'GLW', 'CPAY', 'CTVA', 'CSGP', 'COST', 'CTRA', 'CRWD']


[*********************100%***********************]  10 of 10 completed


batch 14 : ['CCI', 'CSX', 'CMI', 'CVS', 'DHR', 'DRI', 'DVA', 'DAY', 'DECK', 'DE']


[*********************100%***********************]  10 of 10 completed


batch 15 : ['DELL', 'DAL', 'DVN', 'DXCM', 'FANG', 'DLR', 'DFS', 'DG', 'DLTR', 'D']


[*********************100%***********************]  10 of 10 completed


batch 16 : ['DPZ', 'DOV', 'DOW', 'DHI', 'DTE', 'DUK', 'DD', 'EMN', 'ETN', 'EBAY']


[*********************100%***********************]  10 of 10 completed


batch 17 : ['ECL', 'EIX', 'EW', 'EA', 'ELV', 'EMR', 'ENPH', 'ETR', 'EOG', 'EPAM']


[*********************100%***********************]  9 of 10 completed


batch 18 : ['EQT', 'EFX', 'EQIX', 'EQR', 'ERIE', 'ESS', 'EL', 'EG', 'EVRG', 'ES']


[*********************100%***********************]  10 of 10 completed


batch 19 : ['EXC', 'EXPE', 'EXPD', 'EXR', 'XOM', 'FFIV', 'FDS', 'FICO', 'FAST', 'FRT']


[*********************100%***********************]  9 of 10 completed


batch 20 : ['FDX', 'FIS', 'FITB', 'FSLR', 'FE', 'FI', 'FMC', 'F', 'FTNT', 'FTV']


[*********************100%***********************]  10 of 10 completed


batch 21 : ['FOXA', 'FOX', 'BEN', 'FCX', 'GRMN', 'IT', 'GE', 'GEHC', 'GEV', 'GEN']


[*********************100%***********************]  10 of 10 completed


batch 22 : ['GNRC', 'GD', 'GIS', 'GM', 'GPC', 'GILD', 'GPN', 'GL', 'GDDY', 'GS']


[*********************100%***********************]  9 of 10 completed


batch 23 : ['HAL', 'HIG', 'HAS', 'HCA', 'DOC', 'HSIC', 'HSY', 'HES', 'HPE', 'HLT']


[*********************100%***********************]  10 of 10 completed


batch 24 : ['HOLX', 'HD', 'HON', 'HRL', 'HST', 'HWM', 'HPQ', 'HUBB', 'HUM', 'HBAN']


[*********************100%***********************]  10 of 10 completed


batch 25 : ['HII', 'IBM', 'IEX', 'IDXX', 'ITW', 'INCY', 'IR', 'PODD', 'INTC', 'ICE']


[*********************100%***********************]  10 of 10 completed


batch 26 : ['IFF', 'IP', 'IPG', 'INTU', 'ISRG', 'IVZ', 'INVH', 'IQV', 'IRM', 'JBHT']


[*********************100%***********************]  8 of 10 completed


batch 27 : ['JBL', 'JKHY', 'J', 'JNJ', 'JCI', 'JPM', 'JNPR', 'K', 'KVUE', 'KDP']


[*********************100%***********************]  10 of 10 completed


batch 28 : ['KEY', 'KEYS', 'KMB', 'KIM', 'KMI', 'KKR', 'KLAC', 'KHC', 'KR', 'LHX']


[*********************100%***********************]  10 of 10 completed


batch 29 : ['LH', 'LRCX', 'LW', 'LVS', 'LDOS', 'LEN', 'LII', 'LLY', 'LIN', 'LYV']


[*********************100%***********************]  10 of 10 completed


batch 30 : ['LKQ', 'LMT', 'L', 'LOW', 'LULU', 'LYB', 'MTB', 'MPC', 'MKTX', 'MAR']


[*********************100%***********************]  10 of 10 completed


batch 31 : ['MMC', 'MLM', 'MAS', 'MA', 'MTCH', 'MKC', 'MCD', 'MCK', 'MDT', 'MRK']


[*********************100%***********************]  10 of 10 completed


batch 32 : ['META', 'MET', 'MTD', 'MGM', 'MCHP', 'MU', 'MSFT', 'MAA', 'MRNA', 'MHK']


[*********************100%***********************]  10 of 10 completed


batch 33 : ['MOH', 'TAP', 'MDLZ', 'MPWR', 'MNST', 'MCO', 'MS', 'MOS', 'MSI', 'MSCI']


[*********************100%***********************]  10 of 10 completed


batch 34 : ['NDAQ', 'NTAP', 'NFLX', 'NEM', 'NWSA', 'NWS', 'NEE', 'NKE', 'NI', 'NDSN']


[*********************100%***********************]  10 of 10 completed


batch 35 : ['NSC', 'NTRS', 'NOC', 'NCLH', 'NRG', 'NUE', 'NVDA', 'NVR', 'NXPI', 'ORLY']


[*********************100%***********************]  10 of 10 completed


batch 36 : ['OXY', 'ODFL', 'OMC', 'ON', 'OKE', 'ORCL', 'OTIS', 'PCAR', 'PKG', 'PLTR']


[*********************100%***********************]  10 of 10 completed


batch 37 : ['PANW', 'PARA', 'PH', 'PAYX', 'PAYC', 'PYPL', 'PNR', 'PEP', 'PFE', 'PCG']


[*********************100%***********************]  10 of 10 completed


batch 38 : ['PM', 'PSX', 'PNW', 'PNC', 'POOL', 'PPG', 'PPL', 'PFG', 'PG', 'PGR']


[*********************100%***********************]  9 of 10 completed


batch 39 : ['PLD', 'PRU', 'PEG', 'PTC', 'PSA', 'PHM', 'PWR', 'QCOM', 'DGX', 'RL']


[*********************100%***********************]  10 of 10 completed


batch 40 : ['RJF', 'RTX', 'O', 'REG', 'REGN', 'RF', 'RSG', 'RMD', 'RVTY', 'ROK']


[*********************100%***********************]  10 of 10 completed


batch 41 : ['ROL', 'ROP', 'ROST', 'RCL', 'SPGI', 'CRM', 'SBAC', 'SLB', 'STX', 'SRE']


[*********************100%***********************]  9 of 10 completed


batch 42 : ['NOW', 'SHW', 'SPG', 'SWKS', 'SJM', 'SW', 'SNA', 'SOLV', 'SO', 'LUV']


[*********************100%***********************]  10 of 10 completed


batch 43 : ['SWK', 'SBUX', 'STT', 'STLD', 'STE', 'SYK', 'SMCI', 'SYF', 'SNPS', 'SYY']


[*********************100%***********************]  10 of 10 completed


batch 44 : ['TMUS', 'TROW', 'TTWO', 'TPR', 'TRGP', 'TGT', 'TEL', 'TDY', 'TFX', 'TER']


[*********************100%***********************]  10 of 10 completed


batch 45 : ['TSLA', 'TXN', 'TPL', 'TXT', 'TMO', 'TJX', 'TSCO', 'TT', 'TDG', 'TRV']


[*********************100%***********************]  10 of 10 completed


batch 46 : ['TRMB', 'TFC', 'TYL', 'TSN', 'USB', 'UBER', 'UDR', 'ULTA', 'UNP', 'UAL']


[*********************100%***********************]  9 of 10 completed


batch 47 : ['UPS', 'URI', 'UNH', 'UHS', 'VLO', 'VTR', 'VLTO', 'VRSN', 'VRSK', 'VZ']


[*********************100%***********************]  10 of 10 completed


batch 48 : ['VRTX', 'VTRS', 'VICI', 'V', 'VST', 'VMC', 'WRB', 'GWW', 'WAB', 'WBA']


[*********************100%***********************]  10 of 10 completed


batch 49 : ['WMT', 'DIS', 'WBD', 'WM', 'WAT', 'WEC', 'WFC', 'WELL', 'WST', 'WDC']


[*********************100%***********************]  10 of 10 completed


batch 50 : ['WY', 'WMB', 'WTW', 'WDAY', 'WYNN', 'XEL', 'XYL', 'YUM', 'ZBRA', 'ZBH']


[*********************100%***********************]  9 of 10 completed


batch 51 : ['ZTS']


[*********************100%***********************]  1 of 1 completed


len(df) :            Ticker    Open    High     Low   Close     Volume
Date                                                        
2020-01-02   ADBE  330.00  334.48  329.17  334.43  1990100.0
2020-01-03   ADBE  329.17  332.98  328.69  331.81  1577600.0
2020-01-06   ADBE  328.29  333.91  328.19  333.71  1874700.0
2020-01-07   ADBE  334.15  334.79  332.31  333.39  2500800.0
2020-01-08   ADBE  333.81  339.23  333.40  337.87  2248500.0
...           ...     ...     ...     ...     ...        ...
2025-03-05    WTW  331.94  336.28  331.94  334.50   629100.0
2025-03-06    WTW  333.03  333.16  328.89  330.43   636800.0
2025-03-07    WTW  330.46  332.05  327.14  330.89   463800.0
2025-03-10    WTW  330.50  335.87  327.52  332.00   508300.0
2025-03-11    WTW  333.52  334.19  325.22  326.52   578700.0

[653304 rows x 6 columns]


Unnamed: 0_level_0,Ticker,Open,High,Low,Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2020-01-02,ADBE,330.00,334.48,329.17,334.43,1990100.0
2020-01-03,ADBE,329.17,332.98,328.69,331.81,1577600.0
2020-01-06,ADBE,328.29,333.91,328.19,333.71,1874700.0
2020-01-07,ADBE,334.15,334.79,332.31,333.39,2500800.0
2020-01-08,ADBE,333.81,339.23,333.40,337.87,2248500.0
...,...,...,...,...,...,...
2025-03-05,WTW,331.94,336.28,331.94,334.50,629100.0
2025-03-06,WTW,333.03,333.16,328.89,330.43,636800.0
2025-03-07,WTW,330.46,332.05,327.14,330.89,463800.0
2025-03-10,WTW,330.50,335.87,327.52,332.00,508300.0



In [83]:
df = pd.read_parquet("./data/ohlcv_5y.parquet")
df

Unnamed: 0_level_0,Ticker,Open,High,Low,Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2020-01-02,ADBE,330.00,334.48,329.17,334.43,1990100.0
2020-01-03,ADBE,329.17,332.98,328.69,331.81,1577600.0
2020-01-06,ADBE,328.29,333.91,328.19,333.71,1874700.0
2020-01-07,ADBE,334.15,334.79,332.31,333.39,2500800.0
2020-01-08,ADBE,333.81,339.23,333.40,337.87,2248500.0
...,...,...,...,...,...,...
2025-03-05,WTW,331.94,336.28,331.94,334.50,629100.0
2025-03-06,WTW,333.03,333.16,328.89,330.43,636800.0
2025-03-07,WTW,330.46,332.05,327.14,330.89,463800.0
2025-03-10,WTW,330.50,335.87,327.52,332.00,508300.0


In [84]:
df.isna().sum()

Ticker       0
Open      6846
High      6846
Low       6846
Close     6846
Volume    6846
dtype: int64

In [87]:
# Tickers that have at least one row with a missing Open price.
df_nan = df.loc[df["Open"].isna(), :]
df_nan["Ticker"].unique()

array(['ABNB', 'CARR', 'CEG', 'GEV', 'GEHC', 'KVUE', 'OTIS', 'PLTR', 'SW',
       'SOLV', 'VLTO'], dtype=object)

In [None]:
# CEG rows with a non-missing Open price.
a = df.loc[(df["Ticker"] == "CEG") & df["Open"].notna(), :]
a

Unnamed: 0_level_0,Ticker,Open,High,Low,Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2022-01-19,CEG,36.88,41.73,36.88,40.76,129900.0
2022-01-20,CEG,39.74,46.94,39.74,43.67,153400.0
2022-01-21,CEG,41.73,48.48,41.68,43.67,329500.0
2022-01-24,CEG,43.59,43.67,40.64,41.54,110000.0
2022-01-25,CEG,41.25,43.59,40.11,40.66,23500.0
...,...,...,...,...,...,...
2025-03-05,CEG,230.22,233.10,223.00,228.89,3840200.0
2025-03-06,CEG,219.78,222.36,204.33,207.30,7710800.0
2025-03-07,CEG,208.10,216.14,203.10,212.54,4986200.0
2025-03-10,CEG,203.61,208.24,195.56,202.21,7689500.0
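
The first non-missing CEG row above is dated 2022-01-19, while the dataset starts on 2020-01-02. One plausible reading is that a ticker's NaN rows are padding for the dates before it started trading; this is an assumption that has only been checked for CEG here. The sketch below shows one way to get the first priced date per ticker. The helper name `first_valid_dates` and the toy data are hypothetical, and the frame is assumed to have the same layout as `df` (a date index plus a `Ticker` column).

```python
import numpy as np
import pandas as pd


def first_valid_dates(ohlcv: pd.DataFrame, col: str = "Open") -> pd.Series:
    """Return the earliest index value with a non-missing `col`, per ticker."""
    # Drop rows without a price, then move the date index into a column.
    valid = ohlcv.dropna(subset=[col]).reset_index()
    date_col = valid.columns[0]  # the former index ("Date" in this notebook)
    return valid.groupby("Ticker")[date_col].min()


# Toy frame (made-up numbers): "BBB" has no prices on the first two dates.
dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"] * 2).rename("Date")
toy = pd.DataFrame(
    {
        "Ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
        "Open": [10.0, 10.5, 11.0, np.nan, np.nan, 36.88],
    },
    index=dates,
)

first = first_valid_dates(toy)
print(first)
```

Running `first_valid_dates(df)` on the real data and filtering for dates later than the dataset start would list the late-starting tickers alongside their first trading day.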


In [None]:
b = a.loc[(a["Open"] > 180) & (a["Open"] < 200), :]
b

Unnamed: 0_level_0,Ticker,Open,High,Low,Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2024-03-07,CEG,180.51,180.81,175.01,178.36,1967800.0
2024-03-26,CEG,188.61,197.48,183.23,184.15,4228700.0
2024-03-27,CEG,186.80,188.83,181.10,183.63,2452100.0
2024-03-28,CEG,183.77,185.56,182.09,183.59,2050800.0
2024-04-01,CEG,184.76,187.53,182.15,184.71,1977500.0
...,...,...,...,...,...,...
2024-09-11,CEG,180.91,187.50,178.53,186.61,2846900.0
2024-09-12,CEG,186.36,191.02,183.56,189.28,2154900.0
2024-09-13,CEG,190.79,197.13,188.53,195.31,2613600.0
2024-09-16,CEG,194.33,198.85,192.03,198.57,2264000.0


In [112]:
b.index

DatetimeIndex(['2024-03-07', '2024-03-26', '2024-03-27', '2024-03-28',
               '2024-04-01', '2024-04-02', '2024-04-03', '2024-04-04',
               '2024-04-05', '2024-04-08', '2024-04-09', '2024-04-10',
               '2024-04-11', '2024-04-12', '2024-04-15', '2024-04-16',
               '2024-04-17', '2024-04-18', '2024-04-19', '2024-04-22',
               '2024-04-23', '2024-04-24', '2024-04-25', '2024-04-26',
               '2024-04-29', '2024-04-30', '2024-05-01', '2024-05-02',
               '2024-05-03', '2024-05-06', '2024-05-07', '2024-05-08',
               '2024-06-10', '2024-07-18', '2024-07-19', '2024-07-22',
               '2024-07-23', '2024-07-24', '2024-07-31', '2024-08-01',
               '2024-08-07', '2024-08-08', '2024-08-09', '2024-08-12',
               '2024-08-13', '2024-08-14', '2024-08-15', '2024-08-16',
               '2024-08-19', '2024-08-20', '2024-08-21', '2024-08-22',
               '2024-08-23', '2024-08-26', '2024-08-27', '2024-08-28',
      

In [113]:
a.loc[b.index, ["Open"]]

Unnamed: 0_level_0,Open
Date,Unnamed: 1_level_1
2024-03-07,180.51
2024-03-26,188.61
2024-03-27,186.80
2024-03-28,183.77
2024-04-01,184.76
...,...
2024-09-11,180.91
2024-09-12,186.36
2024-09-13,190.79
2024-09-16,194.33


In [114]:
len(a.index), len(b.index)

(788, 64)

In [126]:
# Dates present in both frames, sorted chronologically before indexing.
c = set(a.index) & set(b.index)
a.loc[sorted(c), :]

Unnamed: 0_level_0,Ticker,Open,High,Low,Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2024-03-07,CEG,180.51,180.81,175.01,178.36,1967800.0
2024-03-26,CEG,188.61,197.48,183.23,184.15,4228700.0
2024-03-27,CEG,186.80,188.83,181.10,183.63,2452100.0
2024-03-28,CEG,183.77,185.56,182.09,183.59,2050800.0
2024-04-01,CEG,184.76,187.53,182.15,184.71,1977500.0
...,...,...,...,...,...,...
2024-09-11,CEG,180.91,187.50,178.53,186.61,2846900.0
2024-09-12,CEG,186.36,191.02,183.56,189.28,2154900.0
2024-09-13,CEG,190.79,197.13,188.53,195.31,2613600.0
2024-09-16,CEG,194.33,198.85,192.03,198.57,2264000.0


In [57]:
df.loc[df["Ticker"] == "BRK.B", :]

Unnamed: 0,Ticker,Open,High,Low,Close,Volume,Adj Close
93888,BRK.B,,,,,,
93889,BRK.B,,,,,,
93890,BRK.B,,,,,,
93891,BRK.B,,,,,,
93892,BRK.B,,,,,,
...,...,...,...,...,...,...,...
95187,BRK.B,,,,,,
95188,BRK.B,,,,,,
95189,BRK.B,,,,,,
95190,BRK.B,,,,,,


In [61]:
df["Adj Close"].value_counts()

Series([], Name: count, dtype: int64)

In [67]:
yf.download("BF.B", start="2020-01-01")

[*********************100%***********************]  1 of 1 completed

1 Failed download:
['BF.B']: YFPricesMissingError('possibly delisted; no price data found  (1d 2020-01-01 -> 2025-03-12)')


Price,Adj Close,Close,High,Low,Open,Volume
Ticker,BF.B,BF.B,BF.B,BF.B,BF.B,BF.B
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
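
The empty results for `BRK.B` and `BF.B` are most likely a symbol-format issue rather than a delisting: Yahoo Finance writes share classes with a hyphen (`BRK-B`, `BF-B`), whereas many S&P 500 constituent lists use a dot. A minimal sketch of a normalizer follows; the function name `to_yahoo_symbol` is hypothetical and is not part of `yfinance` or of `src.utils`.

```python
def to_yahoo_symbol(ticker: str) -> str:
    """Convert a dotted share-class ticker (e.g. 'BRK.B') to Yahoo's hyphen form.

    Assumption: inputs are plain US tickers. Symbols that carry an exchange
    suffix after a dot (e.g. 'RY.TO') must NOT be passed through this helper.
    """
    return ticker.strip().upper().replace(".", "-")


symbols = [to_yahoo_symbol(t) for t in ["BRK.B", "BF.B", "AAPL"]]
print(symbols)
```

The converted symbols can then be passed to `yf.download` in place of the dotted ones.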


In [79]:
from itertools import combinations

a = ["a", "b", "c", "d", "e"]

for ticker1, ticker2 in combinations(a, 2):
    print(ticker1, ticker2)

a b
a c
a d
a e
b c
b d
b e
c d
c e
d e
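
`combinations(a, 2)` yields every unordered pair exactly once, which is what a pairwise test between tickers needs. The number of pairs grows quadratically, so it is worth estimating before running over the full universe. The sketch below checks the count against the closed form n(n-1)/2; the figure 501 is the ticker count implied by the batch log above (50 batches of 10 plus one batch of 1).

```python
from itertools import combinations
from math import comb

tickers = ["a", "b", "c", "d", "e"]

# Each unordered pair appears once: (a, b) is produced, (b, a) is not.
pairs = list(combinations(tickers, 2))
n_pairs = len(pairs)
print(n_pairs)

# Closed form for the number of pairs: n * (n - 1) / 2.
n_full = 501  # tickers implied by the batch log (50 * 10 + 1)
n_full_pairs = comb(n_full, 2)
print(n_full_pairs)
```

With roughly 125,000 pairs, the cost of a single pairwise test dominates the runtime, so pre-filtering candidates may be worth considering.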
