# SEC EDGAR Filing Collector Demo

This notebook demonstrates how to use the SEC EDGAR collector module to retrieve company filings from the SEC.

In [1]:
import sys
import pandas as pd
from pathlib import Path

# Add the project root to the Python path
project_root = Path.cwd().parent.parent
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))
project_root

PosixPath('/data/home/eak/learning/nganga_ai/tumkwe-invest/tumkwe-invest')

In [2]:
# Import the SEC EDGAR module
from tumkwe_invest.datacollection.collectors.sec_edgar import (
    get_cik_by_ticker,
    get_recent_filings,
    download_filing_document
)

## Check if SEC_USER_AGENT is Properly Configured

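The SEC requires every automated request to carry a User-Agent header that identifies the requester, typically a name and a contact email address. Requests without one may be rejected.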

In [3]:
# Check SEC_USER_AGENT from the config
from tumkwe_invest.datacollection.config import SEC_USER_AGENT

if not SEC_USER_AGENT or SEC_USER_AGENT == "Your Name (your.email@example.com)":
    print("WARNING: SEC_USER_AGENT is not properly configured.")
    print("The SEC requires a valid user-agent with your name and email.")
    print("Please update the SEC_USER_AGENT in the config file.")
else:
    print(f"SEC_USER_AGENT is configured as: {SEC_USER_AGENT}")

SEC_USER_AGENT is configured as: TumkweInvest myname@example.com
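
To make the requirement concrete, here is a minimal sketch of attaching the user-agent to a request using only the standard library. The helper name `build_sec_request` and the sample contact string are illustrative assumptions, not part of the collector module, which may set the header differently. The request is built but never sent.

```python
from urllib.request import Request


def build_sec_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a request that carries the identifying header.

    Hypothetical helper for illustration; not part of tumkwe_invest.
    """
    if not user_agent.strip():
        raise ValueError("A non-empty user-agent string is required")
    return Request(url, headers={"User-Agent": user_agent})


# Sample values; replace the contact string with your own name and email.
req = build_sec_request(
    "https://www.sec.gov/files/company_tickers.json",
    "Jane Doe jane.doe@example.com",
)

# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

Rejecting an empty string up front gives a clearer error than a refused request later.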


## Get CIK Numbers for Companies

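Let's retrieve the CIK (Central Index Key) numbers for some companies. The CIK is the numeric identifier the SEC assigns to each filer, and EDGAR indexes filings by CIK rather than by ticker symbol.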

In [4]:
# Define ticker symbols for some popular companies
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA']

In [5]:
# Get CIK numbers for all tickers
cik_results = {}

for ticker in tickers:
    cik = get_cik_by_ticker(ticker)
    cik_results[ticker] = cik
    if cik:
        print(f"{ticker}: CIK = {cik}")
    else:
        print(f"{ticker}: CIK not found")

Could not find CIK for AAPL
AAPL: CIK not found
Could not find CIK for MSFT
MSFT: CIK not found
Could not find CIK for GOOGL
GOOGL: CIK not found
Could not find CIK for AMZN
AMZN: CIK not found
Could not find CIK for TSLA
TSLA: CIK not found
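
In the run recorded above, none of the lookups returned a CIK. The SEC publishes a ticker-to-CIK mapping at `https://www.sec.gov/files/company_tickers.json`, and the sketch below shows how a lookup against an already-parsed copy of that file might work, which can help when checking the mapping independently. The function name `find_cik` and the two-row sample payload are assumptions for illustration; `get_cik_by_ticker` may be implemented differently.

```python
from typing import Optional


def find_cik(ticker: str, payload: dict) -> Optional[str]:
    """Return the 10-digit, zero-padded CIK for a ticker, or None if absent.

    Hypothetical helper; `payload` is the parsed company_tickers.json content.
    """
    wanted = ticker.strip().upper()
    for entry in payload.values():
        if entry.get("ticker", "").upper() == wanted:
            # The submissions API expects the CIK padded to 10 digits.
            return str(entry["cik_str"]).zfill(10)
    return None


# Tiny sample in the shape of company_tickers.json (index -> record).
sample_payload = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}

print(find_cik("aapl", sample_payload))  # 0000320193
print(find_cik("XXXX", sample_payload))  # None
```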


## Get Recent SEC Filings

Now let's fetch recent SEC filings for Apple.

In [6]:
# Get recent filings for Apple
apple_filings = get_recent_filings('AAPL', filing_types=['10-K', '10-Q', '8-K'], count=10)
print(f"Retrieved {len(apple_filings)} recent filings for Apple")

Could not find CIK for AAPL
Retrieved 0 recent filings for Apple


In [7]:
# Display the filings in a DataFrame
if apple_filings:
    filings_df = pd.DataFrame([
        {
            "company_symbol": filing.company_symbol,
            "filing_type": filing.filing_type,
            "filing_date": filing.filing_date,
            "accession_number": filing.accession_number,
            "url": filing.url
        } for filing in apple_filings
    ])
    display(filings_df)
else:
    print("No filings found for Apple")

No filings found for Apple
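
For context on what a call like `get_recent_filings` has to do: EDGAR's submissions endpoint (`https://data.sec.gov/submissions/CIK##########.json`) returns recent filings as parallel arrays under `filings.recent`. The sketch below filters such arrays by form type and caps the result count. The helper `select_recent` is hypothetical, and the sample values (dates and accession numbers) are made-up placeholders, not real filings.

```python
from typing import Dict, List


def select_recent(recent: Dict[str, list], filing_types: List[str], count: int) -> List[dict]:
    """Pick up to `count` filings whose form type is in `filing_types`.

    Hypothetical helper; `recent` mirrors the `filings.recent` object,
    where each key holds a list and index i describes the i-th filing.
    """
    wanted = set(filing_types)
    rows: List[dict] = []
    for form, date, accession in zip(
        recent["form"], recent["filingDate"], recent["accessionNumber"]
    ):
        if len(rows) >= count:
            break
        if form in wanted:
            rows.append(
                {"filing_type": form, "filing_date": date, "accession_number": accession}
            )
    return rows


# Placeholder data in the same shape as `filings.recent` (values are invented).
sample_recent = {
    "form": ["8-K", "4", "10-Q", "10-K"],
    "filingDate": ["2024-05-02", "2024-04-15", "2024-02-01", "2023-11-03"],
    "accessionNumber": [
        "0000000000-24-000004",
        "0000000000-24-000003",
        "0000000000-24-000002",
        "0000000000-23-000001",
    ],
}

for row in select_recent(sample_recent, ["10-K", "10-Q"], count=10):
    print(row)
```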


## Download and Analyze a Specific Filing

Let's download and examine a specific filing's content.

In [8]:
# Download the most recent 10-Q filing for Apple (if available)
recent_10q = None
for filing in apple_filings:
    if filing.filing_type == '10-Q':
        recent_10q = filing
        break

if recent_10q:
    print(f"Found 10-Q filing from {recent_10q.filing_date}")
    filing_text = download_filing_document(recent_10q)
    
    if filing_text:
        # Show a preview of the text
        print("\nPreview of 10-Q filing (first 1000 characters):")
        print(f"\n{filing_text[:1000]}...")
        
        # Show total length of the document
        print(f"\nTotal length of document: {len(filing_text)} characters")
    else:
        print("Failed to download filing text")
else:
    print("No 10-Q filing found for Apple in the recent filings list")

No 10-Q filing found for Apple in the recent filings list
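
Downloading a filing comes down to fetching a URL under the EDGAR archive. As a rough sketch, that URL can be assembled from the CIK (without leading zeros), the accession number (without dashes), and the primary document's file name. The helper `build_document_url` is an assumption for illustration, the sample accession number and file name are placeholders, and the collector's own `filing.url` may be produced differently.

```python
def build_document_url(cik: str, accession_number: str, primary_document: str) -> str:
    """Assemble an EDGAR archive URL for one document of a filing.

    Hypothetical helper for illustration; not part of tumkwe_invest.
    """
    cik_no_padding = str(int(cik))  # "0000320193" -> "320193"
    accession_no_dashes = accession_number.replace("-", "")
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{cik_no_padding}/{accession_no_dashes}/{primary_document}"
    )


# Placeholder accession number and file name; only the CIK is Apple's real one.
url = build_document_url("0000320193", "0000000000-24-000001", "example-10q.htm")
print(url)
```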


## Get Filings for Multiple Companies

Let's get a specific type of filing for multiple companies: the two most recent 10-K annual reports for each of the first three tickers.

In [9]:
# Get the two most recent 10-K (annual report) filings for the first three tickers
all_10k_filings = {}

for ticker in tickers[:3]:  # Limit to first 3 tickers to avoid rate limiting
    filings = get_recent_filings(ticker, filing_types=['10-K'], count=2)
    all_10k_filings[ticker] = filings
    print(f"Retrieved {len(filings)} recent 10-K filings for {ticker}")

Could not find CIK for AAPL
Retrieved 0 recent 10-K filings for AAPL
Could not find CIK for MSFT
Retrieved 0 recent 10-K filings for MSFT
Could not find CIK for GOOGL
Retrieved 0 recent 10-K filings for GOOGL
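
Limiting the ticker list is one way to stay under the SEC's fair-access limit of 10 requests per second. Another is to space requests out explicitly. The sketch below is a minimal throttle under that assumption; the `Throttle` class is hypothetical and not part of the collector module, which may already apply its own delay. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Pause if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        # Record when this request is considered to have started.
        self._last = now + delay
        return delay


# Usage: call wait() before each request (5 per second here, well under the limit).
throttle = Throttle(max_per_second=5)
for ticker in ["AAPL", "MSFT", "GOOGL"]:
    throttle.wait()
    # A call such as get_recent_filings(ticker, ...) would go here.
    print(f"Ready to request filings for {ticker}")
```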


In [10]:
# Create a consolidated DataFrame of all 10-K filings
all_filings_list = []

for ticker, filings in all_10k_filings.items():
    for filing in filings:
        all_filings_list.append({
            "company_symbol": filing.company_symbol,
            "filing_type": filing.filing_type,
            "filing_date": filing.filing_date,
            "accession_number": filing.accession_number
        })

# Display as DataFrame
if all_filings_list:
    all_filings_df = pd.DataFrame(all_filings_list)
    display(all_filings_df)
else:
    print("No filings found")

No filings found
