# 13F Holdings Analysis: AI Stocks

This notebook analyzes institutional investor behavior in AI stocks through SEC 13F filings.

**Key questions:**
- Which hedge funds have the largest holdings in AI stocks?
- How has institutional ownership of AI stocks evolved over time?
- Which specific AI stocks do top funds favor?

## Setup

In [None]:
try:
    import json
    from pathlib import Path

    import pandas as pd

    from ai_stocks import AI_STOCKS, ALL_AI_TICKERS
    print('Packages and dependencies loaded successfully.')
except ImportError as e:
    print(f'Failed to load package or dependency: {e}')
    raise  # stop here: every later cell depends on these imports

## Load 13F Data

SEC 13F filings are distributed as bulk TSV files. Quarters are stored in directories like `data/raw/2023q4_form13f/` with two key files:
- `INFOTABLE.tsv` - holdings (CUSIP, value, shares)
- `COVERPAGE.tsv` - filer metadata (fund name, filing date)

First, we build a ticker lookup dictionary using the provided JSON file. From this, we create a lookup dictionary specifically for AI stock tickers.

In [None]:
DATA_DIR = Path("data/raw")

with open('data/reference/company-tickers.json', 'r') as ticker_map_file:
    sec_tickers = json.load(ticker_map_file)

    # The file maps string indices to company records:
    # {"0":{"cik_str":1045810,"ticker":"NVDA","title":"NVIDIA CORP"},"1":{"cik_str":320193,"ticker":"AAPL"}, ...}

    # Build a dictionary from the upper-cased company name to the ticker symbol.
    # If a title appears more than once, the last entry wins.
    name_to_ticker = {entry['title'].upper(): entry['ticker'] for entry in sec_tickers.values()}

# Restrict the lookup to AI tickers so that later name matching scans ~50 entries instead of ~10,000
ai_name_to_ticker = {name: ticker for name, ticker in name_to_ticker.items() if ticker in ALL_AI_TICKERS}

ai_name_to_ticker

In [None]:
def list_quarters():
    """List available quarters of 13F data."""
    # extract the name of each directory
    quarters = [d.name for d in DATA_DIR.iterdir() if d.is_dir()]
    
    return sorted(quarters)

print('Available quarters:', list_quarters())
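
`sorted()` orders the directory names lexicographically, which matches chronological order only while every name follows the `2023q4_form13f` pattern mentioned earlier. A minimal sketch of a stricter alternative, assuming that naming pattern holds: parse each name into a `pandas.Period` and sort on that. The helpers `parse_quarter` and `sort_quarters` are hypothetical names, not part of this notebook.

```python
import re
from typing import Optional

import pandas as pd

# Assumed directory naming, e.g. "2023q4_form13f"
QUARTER_DIR_PATTERN = re.compile(r"^(\d{4})q([1-4])_form13f$")


def parse_quarter(dirname: str) -> Optional[pd.Period]:
    """Return the quarter encoded in a directory name, or None if it does not match."""
    match = QUARTER_DIR_PATTERN.match(dirname)
    if match is None:
        return None
    year, quarter = match.groups()
    return pd.Period(f"{year}Q{quarter}", freq="Q")


def sort_quarters(dirnames: list[str]) -> list[str]:
    """Keep only names that parse as quarters, ordered chronologically."""
    parsed = [(parse_quarter(name), name) for name in dirnames]
    valid = [(period, name) for period, name in parsed if period is not None]
    return [name for _, name in sorted(valid, key=lambda pair: pair[0])]


names = ["2023q4_form13f", "2022q1_form13f", "scratch"]
ordered = sort_quarters(names)
print(ordered)
```

Directories that do not match the pattern (here `scratch`) are dropped rather than sorted into an arbitrary position.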

In [None]:
def load_quarter(quarter: str) -> pd.DataFrame:
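    """
    Load holdings data for a specific quarter.

    Args:
        quarter: name of a quarter directory under DATA_DIR, e.g. '2023q4_form13f'

    Returns:
        holdings: INFOTABLE data, one row per reported holding
    """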
    quarter_dir = DATA_DIR / quarter
    
    holdings = pd.read_table(
        quarter_dir / 'INFOTABLE.tsv',  
        dtype={
            'ACCESSION_NUMBER': 'string',
            'INFOTABLE_SK': 'Int64',
            'NAMEOFISSUER': 'string',
            'CUSIP': 'string',
            'FIGI': 'string',
            'VALUE': 'Int64',
            'PUTCALL': 'string',
            'INVESTMENTDISCRETION': 'string',
            'OTHERMANAGER': 'string',
            'VOTING_AUTH_SOLE': 'Int64',
            'VOTING_AUTH_SHARED': 'Int64',
            'VOTING_AUTH_NONE': 'Int64',
        }
    )
    
    return holdings

In [None]:
# Load most recent quarter
quarters = list_quarters()
sample_holdings = load_quarter(quarters[-1])
print(f'Loaded {len(sample_holdings):,} holdings')

### Inspect the data
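
One thing to check when inspecting the data is that identifier columns kept their exact text. The sketch below uses a small in-memory TSV (the `VALUE` figures are made up for illustration) to show why `load_quarter` reads `CUSIP` as a string: left to type inference, a purely numeric CUSIP loses its leading zero.

```python
import io

import pandas as pd

# Tiny stand-in for INFOTABLE.tsv; the VALUE numbers are invented
tsv = (
    "NAMEOFISSUER\tCUSIP\tVALUE\n"
    "APPLE INC\t037833100\t1000\n"
    "MICROSOFT CORP\t594918104\t2000\n"
)

# Default type inference parses the all-digit CUSIPs as integers
naive = pd.read_table(io.StringIO(tsv))

# Explicit dtypes keep the identifier exactly as written in the file
typed = pd.read_table(io.StringIO(tsv), dtype={"CUSIP": "string", "VALUE": "Int64"})

print(naive["CUSIP"].tolist())  # [37833100, 594918104]
print(typed["CUSIP"].tolist())  # ['037833100', '594918104']
```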

In [None]:
sample_holdings.head()

## Map Holdings to AI Stocks

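13F filings identify securities by CUSIP, but the reference file maps company names, not CUSIPs, to tickers, so we match on issuer name to identify AI stocks.  
This is where the dictionary of AI stock tickers proves useful: each issuer name is compared against ~50 entries rather than ~10,000.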

In [None]:
def match_ai_stock_to_ticker(issuer_name: str) -> str | None:
    """Map an issuer name to an AI stock ticker, or None if there is no match."""
    if pd.isna(issuer_name):
        return None

    issuer_upper = issuer_name.strip().upper()
    if not issuer_upper:
        # an empty string is a substring of every name, so it would match anything
        return None

    # prioritize an exact match over a substring match
    if issuer_upper in ai_name_to_ticker:
        return ai_name_to_ticker[issuer_upper]

    # fall back to a substring match in either direction
    for company_name, ticker in ai_name_to_ticker.items():
        if issuer_upper in company_name or company_name in issuer_upper:
            return ticker

    return None

We can now use `match_ai_stock_to_ticker` to add a `TICKER` column to the DataFrame, holding the ticker of the matching AI stock, or `None` if the issuer is not an AI stock.
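
To see how the two-stage match behaves before running it on a full quarter, here is a self-contained sketch on a toy DataFrame. The two-entry `toy_lookup` and the helper `match_with_lookup` are hypothetical stand-ins for `ai_name_to_ticker` and the matcher above; the logic is the same exact-then-substring comparison.

```python
import pandas as pd

# Hypothetical two-entry lookup standing in for ai_name_to_ticker
toy_lookup = {
    "NVIDIA CORP": "NVDA",
    "ADVANCED MICRO DEVICES INC": "AMD",
}


def match_with_lookup(issuer_name, lookup):
    """Exact match first, then substring match in either direction."""
    if pd.isna(issuer_name):
        return None
    issuer_upper = issuer_name.upper()
    if issuer_upper in lookup:
        return lookup[issuer_upper]
    for company_name, ticker in lookup.items():
        if issuer_upper in company_name or company_name in issuer_upper:
            return ticker
    return None


toy = pd.DataFrame({
    "NAMEOFISSUER": [
        "NVIDIA CORP",             # exact match
        "NVIDIA CORPORATION",      # lookup name is a substring of the issuer
        "Advanced Micro Devices",  # issuer is a substring of the lookup name
        "EXXON MOBIL CORP",        # no match
        "CORP",                    # false positive: substring of "NVIDIA CORP"
    ]
})
toy["TICKER"] = toy["NAMEOFISSUER"].apply(lambda name: match_with_lookup(name, toy_lookup))
print(toy)
```

The last row shows the cost of matching in both directions: a very short or generic issuer string can match an unrelated company, so spot-checking the matched names is worthwhile.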

In [None]:
def add_ai_ticker_to_dataframe(df: pd.DataFrame) -> None:
    """
    Add the AI ticker column to the given dataframe, modifying it in place.
    """
    if 'NAMEOFISSUER' not in df.columns:
        raise LookupError('Issuer name column (NAMEOFISSUER) not found in dataframe.')

    df['TICKER'] = df['NAMEOFISSUER'].apply(match_ai_stock_to_ticker)

# Apply matcher
add_ai_ticker_to_dataframe(sample_holdings)

# Verify that the column was added
if 'TICKER' not in sample_holdings.columns:
    raise LookupError('Adding AI ticker to sample_holdings failed.')

sample_holdings.head()

Now that `sample_holdings` has been augmented, let us consider only the entries with AI stocks.

In [None]:
# Filter to AI stocks
ai_holdings = sample_holdings[sample_holdings['TICKER'].notna()]

print(f'Found {len(ai_holdings):,} AI holdings ({len(ai_holdings)/len(sample_holdings)*100:.1f}% of total)')

## Aggregate by Stock

We first aggregate by ticker to see which AI stocks have the highest total reported value across institutional filers.
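
As a quick illustration of the aggregation step, the sketch below runs the same kind of `groupby` on a toy frame. The values are made up, and the output column names `TOTAL_VALUE` and `POSITIONS` are hypothetical, chosen only for this example.

```python
import pandas as pd

# Toy holdings: the row with no ticker stands in for a non-AI stock
toy_holdings = pd.DataFrame({
    "TICKER": ["NVDA", "NVDA", "AMD", None],
    "VALUE": [100, 250, 80, 999],
})

by_stock = (
    toy_holdings.groupby("TICKER")  # rows with a missing TICKER are dropped by default
    .agg(TOTAL_VALUE=("VALUE", "sum"), POSITIONS=("VALUE", "size"))
    .sort_values("TOTAL_VALUE", ascending=False)
)
print(by_stock)
```

Because `groupby` drops missing keys by default, the unmatched row never reaches the totals, which is the same effect the explicit `notna()` filter has in this notebook.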

In [None]:
# Total institutional holdings by AI stock
agg_ai_holdings = ai_holdings.groupby('TICKER').agg({'VALUE': 'sum', 'NAMEOFISSUER': 'first', 'CUSIP': 'first'})

# Sort by dollar value
agg_ai_holdings.sort_values("VALUE", ascending=False, inplace=True)
agg_ai_holdings.head().style.format(thousands=',')

## Summary

This analysis demonstrates:
- Loading SEC 13F bulk data and matching to AI stocks
- Identifying AI stocks with the highest institutional holdings

**Extensions:**
- Identifying the funds with the largest AI stock holdings in a given quarter
- Tracking institutional ownership trends over time
- Calculating AI stock value as percentage of total portfolio value
- Analyzing behavior around key events for certain stocks (e.g. Gemini launch for GOOG)
- Visualizing trends
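
The fund-level extensions could start from a join between the holdings and the cover page. The following is a rough sketch on made-up data, not a finished implementation. It assumes both files share an `ACCESSION_NUMBER` key and that the fund name lives in a `FILINGMANAGER_NAME` column; the latter is an assumption about `COVERPAGE.tsv` that should be checked against the actual file. The fund names and values are invented.

```python
import pandas as pd

# Toy holdings, already tagged with a TICKER column (None = not an AI stock)
holdings = pd.DataFrame({
    "ACCESSION_NUMBER": ["A1", "A1", "A2", "A2", "A2"],
    "TICKER": ["NVDA", None, "NVDA", "AMD", None],
    "VALUE": [500, 500, 300, 100, 1600],
})

# Toy cover page; FILINGMANAGER_NAME is an assumed column name
cover = pd.DataFrame({
    "ACCESSION_NUMBER": ["A1", "A2"],
    "FILINGMANAGER_NAME": ["FUND ALPHA", "FUND BETA"],
})

# Value counted only where the holding matched an AI stock
holdings["AI_VALUE"] = holdings["VALUE"].where(holdings["TICKER"].notna(), 0)

# Per-filing totals, then AI value as a share of the whole portfolio
per_filing = holdings.groupby("ACCESSION_NUMBER", as_index=False).agg(
    TOTAL_VALUE=("VALUE", "sum"),
    AI_VALUE=("AI_VALUE", "sum"),
)
per_filing["AI_SHARE"] = per_filing["AI_VALUE"] / per_filing["TOTAL_VALUE"]

# Attach fund names and rank by AI holdings
ranked = (
    per_filing.merge(cover, on="ACCESSION_NUMBER", how="left")
    .sort_values("AI_VALUE", ascending=False)
    .reset_index(drop=True)
)
print(ranked[["FILINGMANAGER_NAME", "AI_VALUE", "AI_SHARE"]])
```

Each accession number is one filing, so a fund that files more than once in a quarter would appear as several rows and would need de-duplicating before ranking.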