# 13F Holdings Analysis: AI Stocks

This notebook analyzes institutional investor behavior in AI stocks through SEC 13F filings.

**Key questions:**
- Which hedge funds have the largest holdings in AI stocks?
- How has institutional ownership of AI stocks evolved over time?
- Which specific AI stocks do top funds favor?

## Setup

In [12]:
try:
    import pandas as pd
    from pathlib import Path
    from ai_stocks import AI_STOCKS, ALL_AI_TICKERS
    import json
    print('Packages and dependencies loaded successfully.')
except ImportError as e:
    print(f'Failed to load package or dependency: {e}')

Packages and dependencies loaded successfully.


## Load 13F Data

SEC 13F filings are distributed as bulk TSV files. Each quarter is stored in its own directory, such as `data/raw/2023q4/`, with two key files:
- `INFOTABLE.tsv` - holdings (CUSIP, value, shares)
- `COVERPAGE.tsv` - filer metadata (fund name, filing date)

First, we build a lookup from company name to ticker using `data/reference/company-tickers.json`. We then narrow it to the AI stock tickers only.

In [13]:
DATA_DIR = Path("data/raw")

with open('data/reference/company-tickers.json', 'r') as ticker_map_file:
    sec_tickers = json.load(ticker_map_file)

    # The file maps index strings to company records, for example:
    # {"0":{"cik_str":1045810,"ticker":"NVDA","title":"NVIDIA CORP"},"1":{"cik_str":320193,"ticker":"AAPL"}, ...}

    # Map each upper-cased company name to its ticker symbol
    name_to_ticker = {entry['title'].upper(): entry['ticker'] for entry in sec_tickers.values()}

# Keep only the AI tickers so that later name matching scans ~50 names instead of ~10,000
ai_name_to_ticker = {name: ticker for name, ticker in name_to_ticker.items() if ticker in ALL_AI_TICKERS}

ai_name_to_ticker

{'NVIDIA CORP': 'NVDA',
 'MICROSOFT CORP': 'MSFT',
 'AMAZON COM INC': 'AMZN',
 'BROADCOM INC.': 'AVGO',
 'META PLATFORMS, INC.': 'META',
 'TESLA, INC.': 'TSLA',
 'ORACLE CORP': 'ORCL',
 'PALANTIR TECHNOLOGIES INC.': 'PLTR',
 'ADVANCED MICRO DEVICES INC': 'AMD',
 'MICRON TECHNOLOGY INC': 'MU',
 'SALESFORCE, INC.': 'CRM',
 'LAM RESEARCH CORP': 'LRCX',
 'APPLIED MATERIALS INC /DE': 'AMAT',
 'INTEL CORP': 'INTC',
 'QUALCOMM INC/DE': 'QCOM',
 'ARISTA NETWORKS, INC.': 'ANET',
 'KLA CORP': 'KLAC',
 'SERVICENOW, INC.': 'NOW',
 'ADOBE INC.': 'ADBE',
 'DELL TECHNOLOGIES INC.': 'DELL',
 'SNOWFLAKE INC.': 'SNOW',
 'SUPER MICRO COMPUTER, INC.': 'SMCI',
 'DUOLINGO, INC.': 'DUOL',
 'C3.AI, INC.': 'AI'}

In [14]:
def list_quarters():
    """List available quarters of 13F data."""
    # extract the name of each directory
    quarters = [d.name for d in DATA_DIR.iterdir() if d.is_dir()]
    
    return sorted(quarters)

print('Available quarters:', list_quarters())

Available quarters: ['2013q2', '2013q3', '2013q4', '2014q1', '2014q2', '2014q3', '2014q4', '2015q1', '2015q2', '2015q3', '2015q4', '2016q1', '2016q2', '2016q3', '2016q4', '2017q1', '2017q2', '2017q3', '2017q4', '2018q1', '2018q2', '2018q3', '2018q4', '2019q1', '2019q2', '2019q3', '2019q4', '2020q1', '2020q2', '2020q3', '2020q4', '2021q1', '2021q2', '2021q3', '2021q4', '2022q1', '2022q2', '2022q3', '2022q4', '2023q1', '2023q2', '2023q3', '2023q4']
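
The labels returned by `list_quarters()` sort correctly as strings, but trend analysis is easier with numeric (year, quarter) pairs. A minimal sketch, assuming each directory name starts with a label like `2023q4`; `parse_quarter` is a hypothetical helper, not part of the notebook:

```python
import re

# Assumption: a directory name starts with a label such as "2023q4".
# Any text after the label is ignored.
_QUARTER_RE = re.compile(r"^(\d{4})q([1-4])", re.IGNORECASE)


def parse_quarter(label: str) -> tuple[int, int]:
    """Return (year, quarter) parsed from a quarter directory name."""
    match = _QUARTER_RE.match(label)
    if match is None:
        raise ValueError(f"not a quarter label: {label!r}")
    return int(match.group(1)), int(match.group(2))


# Sorting on the parsed pair gives chronological order.
labels = ["2014q1", "2023q4", "2013q2"]
ordered = sorted(labels, key=parse_quarter)
print(ordered)
```

Raising on an unrecognized name, rather than skipping it, makes a stray directory under `data/raw` visible early.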


## Loading quarter data
The data is published quarterly, so we load one quarter at a time.  
We read only the columns relevant to our analysis:
- INFOTABLE.tsv: `ACCESSION_NUMBER`, `INFOTABLE_SK`, `NAMEOFISSUER`, `CUSIP`, `FIGI`, `VALUE`, `SSHPRNAMT`, `SSHPRNAMTTYPE`, `PUTCALL`
- COVERPAGE.tsv: `ACCESSION_NUMBER`, `REPORTCALENDARORQUARTER`, `FILINGMANAGER_NAME`

In [15]:
def load_quarter(quarter: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """
    Load holdings and filing metadata for a specific quarter.
    
    Returns:
        holdings: INFOTABLE data
        filings: COVERPAGE data
    """
    quarter_dir = DATA_DIR / quarter
    
    infotable_datatypes = {
        'ACCESSION_NUMBER': 'string',
        'INFOTABLE_SK': 'Int64',
        'NAMEOFISSUER': 'string',
        'CUSIP': 'string',
        'FIGI': 'string',
        'VALUE': 'Int64',
        'SSHPRNAMT': 'Int64',
        'SSHPRNAMTTYPE': 'string',
        'PUTCALL': 'string',
    }

    filings_datatypes = {
        'ACCESSION_NUMBER': 'string',
        'REPORTCALENDARORQUARTER': 'string',
        'FILINGMANAGER_NAME': 'string'
    }

    holdings = pd.read_table(
        quarter_dir / 'INFOTABLE.tsv',  
        dtype=infotable_datatypes,
        usecols=infotable_datatypes.keys()
    )

    filings = pd.read_table(
        quarter_dir / 'COVERPAGE.tsv',
        dtype=filings_datatypes,
        usecols=filings_datatypes.keys(),
        parse_dates=['REPORTCALENDARORQUARTER'],
        date_format='%d-%b-%Y'
    )
    
    return holdings, filings


# Load most recent quarter
quarters = list_quarters()
sample_holdings, sample_filings = load_quarter(quarters[-1])
print(f'Loaded {len(sample_holdings):,} holdings from {len(sample_filings):,} filings')

Loaded 2,886,468 holdings from 9,196 filings


### Inspect the data

In [29]:
sample_holdings.head().style.format(thousands=',', subset=['VALUE'])

| | ACCESSION_NUMBER | INFOTABLE_SK | NAMEOFISSUER | CUSIP | FIGI | VALUE | SSHPRNAMT | SSHPRNAMTTYPE | PUTCALL | TICKER |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000051762-23-000005 | 94734396 | ABBOTT LABORATORIES | 002824100 | BBG00KTDT9Q6 | 1515058 | 15643 | SH | | |
| 1 | 0000051762-23-000005 | 94734397 | ABBVIE INC | 00287Y109 | BBG00KTDTBZ1 | 57330720 | 384615 | SH | | |
| 2 | 0000051762-23-000005 | 94734398 | ADOBE INC | 00724F101 | BBG00GQ6RYG1 | 8060499 | 15808 | SH | | ADBE |
| 3 | 0000051762-23-000005 | 94734399 | ADVANCED MICRO DEVICES INC | 007903107 | BBG00KTDTC25 | 2301934 | 22388 | SH | | AMD |
| 4 | 0000051762-23-000005 | 94734400 | AES CORPORATION | 00130H105 | BBG00J9TZDP1 | 153596 | 10105 | SH | | |


In [None]:
sample_filings.head()

| | ACCESSION_NUMBER | REPORTCALENDARORQUARTER | FILINGMANAGER_NAME |
|---|---|---|---|
| 0 | 0000051762-23-000005 | 2023-09-30 | RNC CAPITAL MANAGEMENT LLC |
| 1 | 0001214659-23-016755 | 2023-09-30 | MayTech Global Investments, LLC |
| 2 | 0001398344-23-023299 | 2019-03-31 | WEST PACES ADVISORS INC. |
| 3 | 0001398344-23-023286 | 2017-03-31 | WEST PACES ADVISORS INC. |
| 4 | 0001398344-23-023293 | 2018-06-30 | WEST PACES ADVISORS INC. |
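
Holdings and filing metadata share the `ACCESSION_NUMBER` column, so attaching a fund name to each holding is a join on that key. A minimal sketch on made-up rows (the accession numbers, fund names, and values below are illustrative only), assuming each accession number appears once in the cover page data:

```python
import pandas as pd

# Toy frames that reuse the notebook's column names; the rows are invented.
holdings = pd.DataFrame({
    "ACCESSION_NUMBER": ["A-1", "A-1", "B-1"],
    "NAMEOFISSUER": ["NVIDIA CORP", "ADOBE INC", "NVIDIA CORP"],
    "VALUE": [100, 50, 300],
})
filings = pd.DataFrame({
    "ACCESSION_NUMBER": ["A-1", "B-1"],
    "FILINGMANAGER_NAME": ["FUND A", "FUND B"],
})

# Many holdings rows map to one filing; validate= raises if that assumption fails.
merged = holdings.merge(
    filings, on="ACCESSION_NUMBER", how="left", validate="many_to_one"
)

# Total reported value per fund, largest first.
by_fund = (
    merged.groupby("FILINGMANAGER_NAME")["VALUE"].sum().sort_values(ascending=False)
)
print(by_fund)
```

The sample output above shows the same manager filing for several report periods within one quarter's data, so a real per-fund total would probably also need to filter on `REPORTCALENDARORQUARTER`.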


## Map Holdings to AI Stocks

13F filings identify securities by CUSIP rather than by ticker. Our reference file maps company names, not CUSIPs, to tickers, so we match on the issuer name to identify AI stocks.  
This is where the `ai_name_to_ticker` dictionary proves useful.

In [None]:
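def match_ai_stock_to_ticker(issuer_name: str) -> str | None:
    """Map an issuer name to an AI stock ticker, or None if there is no match."""
    if pd.isna(issuer_name):
        return None
    
    issuer_upper = issuer_name.strip().upper()
    
    # an empty string is a substring of every name, so reject it up front
    if not issuer_upper:
        return None
    
    # prioritize an exact match over a substring match
    if issuer_upper in ai_name_to_ticker:
        return ai_name_to_ticker[issuer_upper]
    
    # fall back to a substring match in either direction (permissive: may yield false positives)
    for company_name, ticker in ai_name_to_ticker.items():
        if issuer_upper in company_name or company_name in issuer_upper:
            return ticker
    
    return None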

Now we can use `match_ai_stock_to_ticker` to add a `TICKER` column to the DataFrame. Each entry holds the ticker of the matched AI stock, or `None` if the issuer is not an AI stock.

In [None]:
def add_ai_ticker_to_dataframe(df: pd.DataFrame) -> None:
    """
    Add the AI ticker column to the given dataframe, modifying it in place.
    """

    if 'NAMEOFISSUER' not in df.columns:
        raise LookupError('Issuer name column (NAMEOFISSUER) not found in dataframe.')
    
    df['TICKER'] = df['NAMEOFISSUER'].apply(match_ai_stock_to_ticker)

# Apply matcher
add_ai_ticker_to_dataframe(sample_holdings)

# Verify that the column was added
if 'TICKER' in sample_holdings.columns:
    print('Ticker column added to sample_holdings successfully.')
else:
    raise LookupError('Adding AI ticker to sample_holdings failed.')

Now that `sample_holdings` has been augmented, let us consider only the entries with AI stocks.

In [None]:
# Filter to AI stocks
ai_holdings = sample_holdings[sample_holdings['TICKER'].notna()]

print(f'Found {len(ai_holdings):,} AI stock holdings ({len(ai_holdings)/len(sample_holdings)*100:.1f}% of all holdings)')

## Aggregate by Stock

We first aggregate by stock to see which AI stocks account for the largest total reported value across institutional filers.

In [None]:
# Total institutional holdings by AI stock
agg_ai_holdings = ai_holdings.groupby('TICKER').agg({'VALUE': 'sum', 'NAMEOFISSUER': 'first', 'CUSIP': 'first'})

# Sort by dollar value
agg_ai_holdings.sort_values("VALUE", ascending=False, inplace=True)
agg_ai_holdings.head().style.format(thousands=',', subset=['VALUE'])
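
One caveat when comparing `VALUE` across quarters: in the sample output, 15,643 shares of Abbott Laboratories carry a value of 1,515,058, roughly $97 per share, so this quarter's values appear to be in whole dollars. Our understanding, which should be checked against the SEC data set documentation, is that filings made before January 2023 reported values in thousands of dollars. A minimal sketch of a unit conversion under that assumption; the cutoff date and the `filed` argument are assumptions, and the notebook does not currently load a filing date:

```python
from datetime import date

# Assumption: filings submitted on or after this date report whole dollars,
# and earlier filings report thousands of dollars. Verify before relying on it.
DOLLAR_REPORTING_START = date(2023, 1, 3)


def value_in_dollars(value: int, filed: date) -> int:
    """Convert a reported VALUE to whole dollars based on the filing date."""
    if filed >= DOLLAR_REPORTING_START:
        return value
    return value * 1000


print(value_in_dollars(1515058, date(2023, 11, 1)))
print(value_in_dollars(1515, date(2022, 11, 1)))
```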


## Summary

This analysis demonstrates:
- Loading SEC 13F bulk data and matching to AI stocks
- Identifying AI stocks with the highest institutional holdings

**Extensions:**
- Calculating funds with the greatest AI stock holdings during a given quarter
- Tracking institutional ownership trends over time
- Calculating AI stock value as percentage of total portfolio value
- Analyzing behavior around key events for certain stocks (e.g. Gemini launch for GOOG)
- Visualizing trends
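
As a starting point for the portfolio-percentage extension, the AI share of each fund's reported value is the sum over rows with a `TICKER` divided by the sum over all rows. A minimal sketch on made-up data, where the `FUND` column is a stand-in for `FILINGMANAGER_NAME` after joining the cover page data:

```python
import pandas as pd

# Toy holdings; TICKER is None for non-AI stocks, as in the notebook.
holdings = pd.DataFrame({
    "FUND": ["A", "A", "A", "B", "B"],
    "TICKER": ["NVDA", None, "MSFT", "NVDA", None],
    "VALUE": [300, 500, 200, 100, 900],
})

# Total reported value per fund, and the part held in AI stocks.
total = holdings.groupby("FUND")["VALUE"].sum()
ai_total = holdings[holdings["TICKER"].notna()].groupby("FUND")["VALUE"].sum()

# Funds with no AI holdings are absent from ai_total, so fill their share with 0.
ai_share = (ai_total / total).fillna(0.0).sort_values(ascending=False)
print(ai_share)
```

This sketch treats every row alike; whether option positions (rows with a non-empty `PUTCALL`) should count toward either sum is a modeling choice left open here.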