# 13F Holdings Analysis: AI Stocks

This notebook analyzes institutional investor behavior in AI stocks through SEC 13F filings.

**Key questions:**
- Which hedge funds have the largest holdings in AI stocks?
- How has institutional ownership of AI stocks evolved over time?
- Which specific AI stocks do top funds favor?

## Setup

In [None]:
try:
    import pandas as pd
    from pathlib import Path
    from ai_stocks import AI_STOCKS, ALL_AI_TICKERS
    import json
    print('Packages and dependencies loaded successfully.')
except ImportError as e:
    print(f'Failed to load package or dependency: {e}')
    # re-raise so later cells do not fail with a confusing NameError
    raise

## Load 13F Data

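SEC 13F filings are distributed as bulk TSV files. Each quarter is stored in a directory such as `data/raw/2023q4_form13f/`, and this notebook reads three files from it:
- `INFOTABLE.tsv` - holdings (issuer name, CUSIP, value, shares)
- `COVERPAGE.tsv` - filer metadata (filing manager name, report quarter)
- `SUBMISSION.tsv` - submission metadata (filing date, submission type, period of report)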

First, we build a dictionary from company name to ticker using the provided `data/reference/company-tickers.json` file. We then narrow it to the tickers listed in `ALL_AI_TICKERS`.

In [None]:
DATA_DIR = Path("data/raw")

with open('data/reference/company-tickers.json', 'r') as ticker_map_file:
    sec_tickers = json.load(ticker_map_file)

    # the file is stored as a dictionary of numbers to mappings: 
    # {"0":{"cik_str":1045810,"ticker":"NVDA","title":"NVIDIA CORP"},"1":{"cik_str":320193,"ticker":"AAPL"}, ...}

    # build a dictionary from the company name to the ticker symbol
    name_to_ticker = {entry['title'].upper(): entry['ticker'] for entry in sec_tickers.values()}

# restrict the lookup to AI tickers so that later name matching scans ~50 entries instead of ~10,000
ai_name_to_ticker = {name: ticker for name, ticker in name_to_ticker.items() if ticker in ALL_AI_TICKERS}

ai_name_to_ticker

In [None]:
def list_quarters():
    """List available quarters of 13F data."""
    # extract the name of each directory
    quarters = [d.name for d in DATA_DIR.iterdir() if d.is_dir()]
    
    return sorted(quarters)

print('Available quarters:', list_quarters())

## Loading quarter data

The raw data is organized into one directory per quarter, so we load and analyze it quarter by quarter.  
We read only the columns relevant to our analysis:
- INFOTABLE.tsv: `ACCESSION_NUMBER`, `INFOTABLE_SK`, `NAMEOFISSUER`, `CUSIP`, `FIGI`, `VALUE`, `SSHPRNAMT`, `SSHPRNAMTTYPE`, `PUTCALL`
- COVERPAGE.tsv: `ACCESSION_NUMBER`, `REPORTCALENDARORQUARTER`, `FILINGMANAGER_NAME`
- SUBMISSION.tsv: `ACCESSION_NUMBER`, `FILING_DATE`, `SUBMISSIONTYPE`, `PERIODOFREPORT`
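
To illustrate why the column and dtype choices matter, here is a minimal sketch of selective loading on an in-memory TSV. The two rows, the accession numbers, and the `TITLEOFCLASS` column are made-up stand-ins for a real `INFOTABLE.tsv`. The sketch only shows how `usecols` and `dtype` interact.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt shaped like INFOTABLE.tsv; all values are illustrative.
sample_tsv = (
    "ACCESSION_NUMBER\tINFOTABLE_SK\tNAMEOFISSUER\tTITLEOFCLASS\tCUSIP\tVALUE\n"
    "0000000000-24-000001\t1\tNVIDIA CORP\tCOM\t67066G104\t1000\n"
    "0000000000-24-000001\t2\tAPPLE INC\tCOM\t037833100\t\n"
)

dtypes = {"ACCESSION_NUMBER": "string", "CUSIP": "string", "VALUE": "Int64"}

df = pd.read_table(
    io.StringIO(sample_tsv),
    dtype=dtypes,
    usecols=list(dtypes),  # columns not listed here are dropped while reading
)

print(df.dtypes)
print(df)
```

Reading `CUSIP` as a string keeps the leading zero in `037833100`. The nullable `Int64` dtype lets the empty `VALUE` field load as `<NA>` instead of forcing the column to float.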

In [None]:
def load_quarter(quarter: str) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
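    """
    Load holdings, cover-page, and amendment data for a specific quarter.
    
    Returns:
        holdings_reports: INFOTABLE rows from original 13F-HR submissions
        filings: COVERPAGE data (manager name, report quarter)
        amendment_reports: INFOTABLE rows from 13F-HR/A amendments
    """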
    quarter_dir = DATA_DIR / quarter
    
    infotable_datatypes = {
        'ACCESSION_NUMBER': 'string',
        'INFOTABLE_SK': 'Int64',
        'NAMEOFISSUER': 'string',
        'CUSIP': 'string',
        'FIGI': 'string',
        'VALUE': 'Int64',
        'SSHPRNAMT': 'Int64',
        'SSHPRNAMTTYPE': 'string',
        'PUTCALL': 'string',
    }

    filings_datatypes = {
        'ACCESSION_NUMBER': 'string',
        'REPORTCALENDARORQUARTER': 'string',
        'FILINGMANAGER_NAME': 'string'
    }

    submissions_datatypes = {
        'ACCESSION_NUMBER': 'string',
        'FILING_DATE': 'string',
        'SUBMISSIONTYPE': 'string',
        'PERIODOFREPORT': 'string'
    }

    # read tables and rename columns for readability

    holdings = pd.read_table(
        quarter_dir / 'INFOTABLE.tsv',  
        dtype=infotable_datatypes,
        usecols=infotable_datatypes.keys()
    ).rename(columns={'SSHPRNAMT': 'SHARES', 'NAMEOFISSUER': 'COMPANY_NAME'})

    filings = pd.read_table(
        quarter_dir / 'COVERPAGE.tsv',
        dtype=filings_datatypes,
        usecols=filings_datatypes.keys(),
        parse_dates=['REPORTCALENDARORQUARTER'],
        date_format='%d-%b-%Y'
    ).rename(columns={'FILINGMANAGER_NAME': 'MANAGER_NAME'})

    raw_submissions = pd.read_table(
        quarter_dir / 'SUBMISSION.tsv',
        dtype=submissions_datatypes,
        usecols=submissions_datatypes.keys(),
        parse_dates=['FILING_DATE', 'PERIODOFREPORT'],
        date_format='%d-%b-%Y'
    )

    # ignore notices and their amendments
    relevant_submissions = raw_submissions[raw_submissions['SUBMISSIONTYPE'].isin(['13F-HR', '13F-HR/A'])]

    holdings = pd.merge(holdings, relevant_submissions, on=['ACCESSION_NUMBER'])

    # create a mask to split into holding reports and amendments
    is_amendment: pd.Series = holdings['SUBMISSIONTYPE'] == '13F-HR/A'

    # copy so that callers can add columns without a SettingWithCopyWarning
    holdings_reports = holdings[~is_amendment].copy()
    amendment_reports = holdings[is_amendment].copy()
    
    return holdings_reports, filings, amendment_reports


# Load most recent quarter
quarters = list_quarters()
sample_holdings, sample_filings, sample_amendments = load_quarter(quarters[-1])
print(f'Loaded {len(sample_holdings):,} holdings and {len(sample_amendments):,} amended holdings; read {len(sample_filings):,} cover pages')
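
The filtering and splitting inside `load_quarter` can be checked on a toy table. This is a rough sketch only: the accession numbers and values are invented, and the submission type is placed directly on the holdings rows instead of being merged in from `SUBMISSION.tsv`.

```python
import pandas as pd

# Hypothetical rows; "13F-NT" marks a notice, which carries no holdings of interest.
toy = pd.DataFrame({
    "ACCESSION_NUMBER": ["A-1", "A-1", "A-2", "A-3"],
    "SUBMISSIONTYPE": ["13F-HR", "13F-HR", "13F-HR/A", "13F-NT"],
    "VALUE": [100, 200, 150, 0],
})

# keep holdings reports and their amendments, dropping notices
relevant = toy[toy["SUBMISSIONTYPE"].isin(["13F-HR", "13F-HR/A"])]

# a boolean mask and its negation partition the remaining rows
is_amendment = relevant["SUBMISSIONTYPE"] == "13F-HR/A"
reports = relevant[~is_amendment]
amendments = relevant[is_amendment]

print(len(reports), len(amendments))
```

Because the mask and its negation cover every remaining row exactly once, the row counts of `reports` and `amendments` always sum to the row count of `relevant`.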

### Inspect the data

In [None]:
sample_holdings.head().style.format(thousands=',', subset=['VALUE'])

In [None]:
sample_filings.head()

## Map Holdings to AI Stocks

13F filings identify securities by CUSIP, but our reference file maps company names to tickers and contains no CUSIPs. We therefore match on the issuer name to identify AI stocks.  
This is where the dictionary of AI stock tickers proves useful.

In [None]:
def match_ai_stock_to_ticker(issuer_name: str) -> str | None:
    """Map issuer name to stock ticker."""
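    if pd.isna(issuer_name):
        return None
    
    issuer_upper = issuer_name.strip().upper()
    if not issuer_upper:
        # an empty string is a substring of every name, so reject it early
        return None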
    
    # prioritize an exact match over a substring match
    if issuer_upper in ai_name_to_ticker:
        return ai_name_to_ticker[issuer_upper]
    
    for company_name, ticker in ai_name_to_ticker.items():
        if issuer_upper in company_name or company_name in issuer_upper:
            return ticker
    
    return None
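
To get a feel for how the two-step matching behaves, the sketch below applies the same logic to a tiny lookup. `toy_lookup` and `toy_match` are hypothetical stand-ins written for this illustration. The real `ai_name_to_ticker` comes from the reference file, and its exact spellings may differ.

```python
# Hypothetical lookup with made-up spellings of two company names.
toy_lookup = {"NVIDIA CORP": "NVDA", "ALPHABET INC.": "GOOGL"}


def toy_match(issuer_name, lookup):
    """Exact match first, then substring match in either direction."""
    issuer_upper = issuer_name.strip().upper()
    if not issuer_upper:
        # "" is a substring of every string, so it would match the first entry
        return None
    if issuer_upper in lookup:
        return lookup[issuer_upper]
    for company_name, ticker in lookup.items():
        if issuer_upper in company_name or company_name in issuer_upper:
            return ticker
    return None


print(toy_match("Nvidia Corp", toy_lookup))         # exact match after upper-casing
print(toy_match("NVIDIA CORPORATION", toy_lookup))  # lookup name is a substring of the issuer
print(toy_match("ALPHABET INC", toy_lookup))        # issuer is a substring of the lookup name
print(toy_match("CORP", toy_lookup))                # false positive: "CORP" is inside "NVIDIA CORP"
print(toy_match("MICROSOFT CORP", toy_lookup))      # no match, returns None
```

The fourth call shows the main risk of substring matching: a short or generic issuer name can match an unrelated company. It is worth spot-checking the matched `COMPANY_NAME` values for each ticker.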

Next, the `filter_to_ai_stocks` function:
1. Adds a `TICKER` column holding the matched AI stock ticker, or `None` if the issuer is not an AI stock
2. Returns only the rows whose `TICKER` is set

In [None]:
def filter_to_ai_stocks(df: pd.DataFrame) -> pd.DataFrame:
    """
    Adds the AI ticker column to the given dataframe and returns a new 
    DataFrame representing only the entries related to AI stocks.
    """

    if 'COMPANY_NAME' not in df.columns:
        raise LookupError('COMPANY_NAME column not found in dataframe.')
    
    # assign returns a new DataFrame, leaving the caller's data untouched
    with_ticker = df.assign(TICKER=df['COMPANY_NAME'].apply(match_ai_stock_to_ticker))
    return with_ticker.dropna(subset=['TICKER'])

# Apply matcher and filter
sample_ai_holdings = filter_to_ai_stocks(sample_holdings)

print(f'Found {len(sample_ai_holdings):,} AI stock holdings ({len(sample_ai_holdings)/len(sample_holdings)*100:.1f}% of all holdings)')

sample_ai_holdings.head().style.format(thousands=',', subset=['VALUE'])

## Aggregate by Stock

We first aggregate by stock to see which AI stocks have the largest total reported value across institutional holders.

In [None]:
# Total institutional holdings by AI stock
agg_ai_by_stock = sample_ai_holdings.groupby('TICKER').agg({'VALUE': 'sum', 'COMPANY_NAME': 'first', 'CUSIP': 'first'}).sort_values('VALUE', ascending=False)
agg_ai_by_stock.head().style.format(thousands=',', subset=['VALUE'])
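
As a small worked example of this aggregation, the sketch below runs the same kind of `groupby` on three invented rows. It uses named aggregation, and the output column names `TOTAL_VALUE` and `N_POSITIONS` are introduced here purely for illustration.

```python
import pandas as pd

# Hypothetical holdings; note the two different spellings of the same issuer.
toy_holdings = pd.DataFrame({
    "TICKER": ["NVDA", "NVDA", "MSFT"],
    "COMPANY_NAME": ["NVIDIA CORP", "NVIDIA CORPORATION", "MICROSOFT CORP"],
    "VALUE": [100, 250, 300],
})

by_stock = (
    toy_holdings.groupby("TICKER")
    .agg(
        TOTAL_VALUE=("VALUE", "sum"),           # total reported value per ticker
        N_POSITIONS=("VALUE", "count"),         # number of reported positions
        COMPANY_NAME=("COMPANY_NAME", "first"), # first spelling seen in the group
    )
    .sort_values("TOTAL_VALUE", ascending=False)
)

print(by_stock)
```

Since filers spell issuer names differently, `'first'` simply keeps whichever spelling appears first within each ticker group. It is a display label, not a canonical name.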

## Aggregate holdings by fund

We can also identify which institutional investors hold the most AI stock by value. To do so, we join the AI holdings to the cover pages on `ACCESSION_NUMBER` and aggregate by filing manager.

In [None]:
agg_ai_by_fund = pd.merge(sample_ai_holdings, sample_filings, on=['ACCESSION_NUMBER']).groupby('MANAGER_NAME').agg({'VALUE': 'sum'}).sort_values('VALUE', ascending=False)
agg_ai_by_fund.head().style.format(thousands=',', subset=['VALUE'])
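
The join-then-aggregate step can be sketched on toy tables as follows. The fund names and accession numbers are invented. The `validate="many_to_one"` argument is an optional safeguard added in this sketch and is not part of the cell above.

```python
import pandas as pd

# Hypothetical AI holdings: two positions in filing A-1, one in filing A-2.
toy_ai_holdings = pd.DataFrame({
    "ACCESSION_NUMBER": ["A-1", "A-1", "A-2"],
    "TICKER": ["NVDA", "MSFT", "NVDA"],
    "VALUE": [100, 50, 400],
})

# Hypothetical cover pages: one row per filing.
toy_filings = pd.DataFrame({
    "ACCESSION_NUMBER": ["A-1", "A-2"],
    "MANAGER_NAME": ["FUND ALPHA", "FUND BETA"],
})

merged = pd.merge(
    toy_ai_holdings,
    toy_filings,
    on="ACCESSION_NUMBER",
    how="inner",
    validate="many_to_one",  # raises if a filing has more than one cover page row
)

by_fund = merged.groupby("MANAGER_NAME")["VALUE"].sum().sort_values(ascending=False)
print(by_fund)
```

Without such a check, a duplicated cover-page row would silently double every holding attached to that filing and inflate the fund's total.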

## Summary

This analysis demonstrates:
- Loading SEC 13F bulk data and matching to AI stocks
- Identifying AI stocks with the highest institutional holdings
- Ranking funds by the total value of their AI stock holdings in a given quarter

**Extensions:**
- Tracking institutional ownership trends over time
- Calculating AI stock value as percentage of total portfolio value
- Analyzing behavior around key events for certain stocks (e.g. Gemini launch for GOOG)
- Visualizing trends
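
As a starting point for the portfolio-percentage extension, here is one possible sketch on invented data. It assumes a table that already has a manager name and a `TICKER` column that is `None` for non-AI holdings. The name `AI_SHARE` is introduced only for this example.

```python
import pandas as pd

# Hypothetical holdings across two funds; TICKER is None for non-AI positions.
toy = pd.DataFrame({
    "MANAGER_NAME": ["FUND ALPHA", "FUND ALPHA", "FUND BETA", "FUND BETA"],
    "TICKER": ["NVDA", None, "MSFT", None],
    "VALUE": [200, 800, 300, 300],
})

# total reported value per fund, and the part of it held in AI stocks
total_value = toy.groupby("MANAGER_NAME")["VALUE"].sum()
ai_value = toy[toy["TICKER"].notna()].groupby("MANAGER_NAME")["VALUE"].sum()

# division aligns on MANAGER_NAME; funds with no AI holdings become NaN, then 0
ai_share = (ai_value / total_value).fillna(0.0).rename("AI_SHARE")

print(ai_share.sort_values(ascending=False))
```

Ranking by share instead of absolute value highlights smaller funds that are heavily concentrated in AI stocks, which the raw totals tend to hide behind the largest managers.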