# SEC Summarizer

This notebook illustrates how to use the API to interact with the SEC Summarizer tool. This tool allows you to fetch the business description of a company from its 10-K filing, and to produce a summary using a summarization model.

You can run this notebook in two ways: serve the app locally from your terminal, or run it in a Docker container. The first option is considerably faster.

## Option 1 - Serve the app locally (preferred)

Open a terminal window in the repo root directory and run:

```bash
source .venv/bin/activate  # Activate the virtual environment where the dependencies are installed
export EDGAR_IDENTITY="anonymous@user.com"  # Set your EDGAR identity
uvicorn sec_summarizer.api.main:app --host 0.0.0.0 --port 8000
```

This starts the API server at http://localhost:8000. Open http://localhost:8000/docs to see the available endpoints.

## Option 2 - Run the app in a Docker container

The app needs to download the summarization models from the Internet, and they are typically over 1 GB in size. Before running the app in a Docker container, you might therefore have to free up disk space that Docker can use by cleaning up some of your Docker resources (see [docker system prune](https://docs.docker.com/reference/cli/docker/system/prune/)).

Make sure you have specified your EDGAR identity with an email address in an `.env` file, following the instructions in `README.md`. Then, open a terminal window in the repo root directory and run:

```bash
docker build -t sec-summarizer-app .  # Build the Docker image
docker run -d -p 8000:8000 --env-file .env sec-summarizer-app  # Run the Docker container
```

This starts the container as a background process and serves the API at http://localhost:8000. Open http://localhost:8000/docs to see the available endpoints.

## The database

The business description summaries are saved in a SQLite database. By default, the database file is named `sec_summarizer.db` and is located in the repo root directory. You can change this by setting the `DATABASE_URL` environment variable.
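
To check what the API has written, the database file can also be opened directly with Python's built-in `sqlite3` module. The sketch below is only an illustration and not part of the tool: the helper name `list_tables` is hypothetical, and it assumes the default file `sec_summarizer.db` in the repo root. It opens the file read-only so that inspecting it cannot modify it.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of the user tables in a SQLite database file."""
    # Open read-only via a URI so a missing file raises an error
    # instead of being silently created as an empty database.
    connection = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    # Skip SQLite's internal bookkeeping tables (e.g. sqlite_sequence).
    return [name for (name,) in rows if not name.startswith("sqlite_")]
```

Once the cells below have been run, `list_tables("sec_summarizer.db")` should include the `companies` and `filings` tables mentioned in this notebook. If the file does not exist yet, the call raises `sqlite3.OperationalError`.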

## Test the API

In [1]:
import requests

In [2]:
base_url = "http://localhost:8000"

Create a list of companies for which to fetch data and produce summaries.

In [3]:
companies = [
    {"name": "Apple Inc.", "ticker": "AAPL"},
    {"name": "Microsoft Corp.", "ticker": "MSFT"},
    {"name": "Alphabet Inc. (Google)", "ticker": "GOOGL"},
    {"name": "Amazon.com Inc.", "ticker": "AMZN"},
    {"name": "Meta Platforms Inc. (Facebook)", "ticker": "META"},
    {"name": "Tesla Inc.", "ticker": "TSLA"},
    {"name": "NVIDIA Corp.", "ticker": "NVDA"},
    {"name": "JPMorgan Chase & Co.", "ticker": "JPM"},
    {"name": "Netflix Inc.", "ticker": "NFLX"},
    {"name": "The Walt Disney Co.", "ticker": "DIS"},
]

Create these companies in the database. This will create a new entry in the table `companies` for each company.

In [4]:
# create the companies in the database
for company in companies:
    response = requests.post(f"{base_url}/companies", json=company)
    if response.status_code == 200:
        print(f"Created company: {response.json()}")
    else:
        print(f"Failed to create company: {response.status_code} - {response.text}")

Created company: {'id': 1, 'name': 'Apple Inc.', 'ticker': 'AAPL'}
Created company: {'id': 2, 'name': 'Microsoft Corp.', 'ticker': 'MSFT'}
Created company: {'id': 3, 'name': 'Alphabet Inc. (Google)', 'ticker': 'GOOGL'}
Created company: {'id': 4, 'name': 'Amazon.com Inc.', 'ticker': 'AMZN'}
Created company: {'id': 5, 'name': 'Meta Platforms Inc. (Facebook)', 'ticker': 'META'}
Created company: {'id': 6, 'name': 'Tesla Inc.', 'ticker': 'TSLA'}
Created company: {'id': 7, 'name': 'NVIDIA Corp.', 'ticker': 'NVDA'}
Created company: {'id': 8, 'name': 'JPMorgan Chase & Co.', 'ticker': 'JPM'}
Created company: {'id': 9, 'name': 'Netflix Inc.', 'ticker': 'NFLX'}
Created company: {'id': 10, 'name': 'The Walt Disney Co.', 'ticker': 'DIS'}


List the companies in the database to verify that they were created successfully.

In [5]:
response = requests.get(f"{base_url}/companies")
if response.status_code == 200:
    print("Companies in the database:")
    for company in response.json():
        print(company)
else:
    print(f"Failed to retrieve companies: {response.status_code} - {response.text}")

Companies in the database:
{'id': 1, 'name': 'Apple Inc.', 'ticker': 'AAPL'}
{'id': 2, 'name': 'Microsoft Corp.', 'ticker': 'MSFT'}
{'id': 3, 'name': 'Alphabet Inc. (Google)', 'ticker': 'GOOGL'}
{'id': 4, 'name': 'Amazon.com Inc.', 'ticker': 'AMZN'}
{'id': 5, 'name': 'Meta Platforms Inc. (Facebook)', 'ticker': 'META'}
{'id': 6, 'name': 'Tesla Inc.', 'ticker': 'TSLA'}
{'id': 7, 'name': 'NVIDIA Corp.', 'ticker': 'NVDA'}
{'id': 8, 'name': 'JPMorgan Chase & Co.', 'ticker': 'JPM'}
{'id': 9, 'name': 'Netflix Inc.', 'ticker': 'NFLX'}
{'id': 10, 'name': 'The Walt Disney Co.', 'ticker': 'DIS'}


Now that we have the companies in the database, we can fetch their business descriptions from their 10-K filings. This creates a new entry in the `filings` table for each company.

In [6]:
for company in companies:
    ticker = company["ticker"]
    response = requests.post(f"{base_url}/filings/{ticker}")
    if response.status_code == 200:
        print(f"10-K filings for {ticker}:")
        print(response.json())
    else:
        print(f"Failed to retrieve filings for {ticker}: {response.status_code} - {response.text}")

10-K filings for AAPL:
{'id': 1, 'company_id': 1, 'filing_type': '10-K', 'filing_date': '2024-11-01T00:00:00', 'business_description': 'Item 1.    Business\nCompany Background\nThe Company designs, manufactures and markets smartphones, personal computers, tablets, wearables and accessories, and sells a variety of related services. The Company’s fiscal year is the 52- or 53-week period that ends on the last Saturday of September.\nProducts\niPhone\niPhone® is the Company’s line of smartphones based on its iOS operating system. The iPhone line includes iPhone 16 Pro, iPhone 16, iPhone 15, iPhone 14 and iPhone SE®.\nMac\nMac® is the Company’s line of personal computers based on its macOS® operating system. The Mac line includes laptops MacBook Air® and MacBook Pro®, as well as desktops iMac®, Mac mini®, Mac Studio® and Mac Pro®.\niPad\niPad® is the Company’s line of multipurpose tablets based on its iPadOS® operating system. The iPad line includes iPad Pro®, iPad Air®, iPad and iPad mini®

Now that the business descriptions are in the database, we can produce summaries using the summarization model. This updates the `filings` table with the summaries produced by the model. The default model is `facebook/bart-large-cnn`, but you can specify a different one (we will see how to do this later). At the moment, only Hugging Face models are supported, but the tool is designed to be extensible, so support for other providers could be added in the future.

The models are downloaded and cached locally. Because the business descriptions in the 10-K filings are typically quite long, they are split into chunks and each chunk is summarized separately. The chunk summaries are then concatenated to form the final summary.
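
The chunk-and-concatenate idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function names are hypothetical, chunks are cut by character count on whitespace (the real tool may split by tokens or sentences), and `summarize_chunk` stands in for the call to the summarization model.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def summarize_long_text(
    text: str,
    summarize_chunk: Callable[[str], str],
    max_chars: int = 1000,  # assumed chunk size, chosen only for illustration
) -> str:
    """Summarize each chunk separately and join the chunk summaries."""
    chunk_summaries = [
        summarize_chunk(chunk) for chunk in split_into_chunks(text, max_chars)
    ]
    return " ".join(chunk_summaries)
```

Because every chunk contributes its own summary, the final summary grows with the length of the business description, which is consistent with the multi-thousand-character summaries shown later in this notebook.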

The first time you run this, downloading the model will take some time (especially if you are using the Docker container).

In [7]:
ticker = "GOOGL"  # company ticker for which to summarize filings
response = requests.patch(f"{base_url}/filings/{ticker}/summarize")  # by default, the model is facebook/bart-large-cnn

Check the response, and verify that a summary was produced.

In [8]:
response.json()

{'id': 3,
 'company_id': 3,
 'filing_type': '10-K',
 'filing_date': '2025-02-05T00:00:00',
 'business_description': 'ITEM 1.BUSINESS\nOverview\nAs our founders Larry and Sergey wrote in the original founders\' letter, "Google is not a conventional company. We do not intend to become one." That unconventional spirit has been a driving force throughout our history, inspiring us to tackle big problems and invest in moonshots. It led us to be a pioneer in the development of AI and, since 2016, an AI-first company. We continue this work under the leadership of Alphabet and Google CEO, Sundar Pichai.\nAlphabet is a collection of businesses — the largest of which is Google. We report Google in two segments, Google Services and Google Cloud, and all non-Google businesses collectively as Other Bets. Supporting these businesses, we have centralized certain AI-related research and development which is reported in Alphabet-level activities. Alphabet\'s structure is about helping each of our busine

The business summary is stored in the `filings` database table. Verify it using the following endpoint.

In [10]:
ticker = "GOOGL"
response = requests.get(f"{base_url}/filings/{ticker}")
if response.status_code == 200:
    print(f"Business summary for {ticker}:")
    print(response.json()["business_summary"])

Business summary for GOOGL:
Alphabet is a collection of businesses — the largest of which is Google. We report Google in two segments, Google Services and Google Cloud, and all non-Google businesses collectively as Other Bets. We are focused on building an even more helpful Google for everyone. Google has invested more than $150 billion in research and development in the last five years. In 2023, we took a significant step on our journey to make AI more helpful for everyone. In 2024, we launched Gemini 2.0, our most capable model yet. Gemini is powering AI features across our products and services that are helping people everyday. Google's approach to AI must be both bold and responsible. We aim to build the most advanced, safe, and responsible AI through a full stack of robust AI-optimized infrastructure. We are using Gemini 2.0 in new research prototypes, including Project Astra, which explores future capabilities of a universal AI assistant. Gemini for Google Cloud provides pre-pack

Verify that the summary is shorter than the business description.

In [11]:
print(f"Length of business summary: {len(response.json()['business_summary'])}")
print(f"Length of business description: {len(response.json()['business_description'])}")

Length of business summary: 5079
Length of business description: 30617
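
The two lengths printed above can be turned into a single compression figure. The helper below is a small illustrative sketch (the name `compression_ratio` is hypothetical) that uses the lengths from this run.

```python
def compression_ratio(summary_length: int, description_length: int) -> float:
    """Return the summary length as a fraction of the description length."""
    if description_length <= 0:
        raise ValueError("description_length must be positive")
    return summary_length / description_length


# Lengths taken from the output of the cell above.
ratio = compression_ratio(5079, 30617)
print(f"The summary is {ratio:.1%} of the description length")  # 16.6%
```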


Now let's produce a summary using a different model. The model must be available on Hugging Face, and its name must be passed as a string prefixed with "huggingface-". Let's try the `t5-small` model.

As before, the first time you run this, downloading the model will take some time (especially if you are using the Docker container).
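
If you want to try several models, a small client-side helper can add the "huggingface-" prefix for you. This is a sketch for convenience only: the helper name `summarize_params` is hypothetical, and it relies solely on the `model` query parameter and the prefix convention described above.

```python
from typing import Dict, Optional

HUGGINGFACE_PREFIX = "huggingface-"


def summarize_params(model_name: Optional[str] = None) -> Dict[str, str]:
    """Build the query parameters for the summarize endpoint."""
    if model_name is None:
        # No "model" parameter is sent, so the API uses its default model.
        return {}
    if model_name.startswith(HUGGINGFACE_PREFIX):
        # Already prefixed: pass it through unchanged.
        return {"model": model_name}
    return {"model": f"{HUGGINGFACE_PREFIX}{model_name}"}
```

The result can be passed straight to `requests.patch(..., params=summarize_params("t5-small"))`.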

In [12]:
model = "huggingface-t5-small"
ticker = "GOOGL"
response = requests.patch(
    f"{base_url}/filings/{ticker}/summarize",
    params={"model": model},
)

Check the response, and verify that a summary (different from the previous one) was produced.

In [13]:
response.json()

{'id': 3,
 'company_id': 3,
 'filing_type': '10-K',
 'filing_date': '2025-02-05T00:00:00',
 'business_description': 'ITEM 1.BUSINESS\nOverview\nAs our founders Larry and Sergey wrote in the original founders\' letter, "Google is not a conventional company. We do not intend to become one." That unconventional spirit has been a driving force throughout our history, inspiring us to tackle big problems and invest in moonshots. It led us to be a pioneer in the development of AI and, since 2016, an AI-first company. We continue this work under the leadership of Alphabet and Google CEO, Sundar Pichai.\nAlphabet is a collection of businesses — the largest of which is Google. We report Google in two segments, Google Services and Google Cloud, and all non-Google businesses collectively as Other Bets. Supporting these businesses, we have centralized certain AI-related research and development which is reported in Alphabet-level activities. Alphabet\'s structure is about helping each of our busine

Verify that the updated summary is in the database.

In [14]:
ticker = "GOOGL"
response = requests.get(f"{base_url}/filings/{ticker}")
if response.status_code == 200:
    print(f"Business summary for {ticker}:")
    print(response.json()["business_summary"])

Business summary for GOOGL:
we continue this work under the leadership of Alphabet and Google CEO, Sundar Pichai . our mission is to organize the world’s information and make it universally accessible and useful . Google Cloud is continually innovating and building new products and features to help our customers, partners, customers, and communities . we have invested more than $150 billion in research and development in the last five years in support of these efforts . in 2023, we launched Gemini 2.0, our natively multimodal AI model . we aim to build the most advanced, safe, and responsible AI through a full stack of robust AI-optimized infrastructure, including data centers, chips, and a global fiber network . we are driving efficiencies in our data centers while making significant hardware and model improvements . our vertex AI platform gives developers the ability to train, tune, augment, test, and deploy applications using Gemini, Imagen, Veo, and other generative AI models . Gem

Let's produce a summary for a different company. This time, we will use the default model, `facebook/bart-large-cnn`.

In [15]:
ticker = "NFLX"
response = requests.patch(f"{base_url}/filings/{ticker}/summarize")  # by default, the model is facebook/bart-large-cnn

In [17]:
ticker = "NFLX"
response = requests.get(f"{base_url}/filings/{ticker}")
if response.status_code == 200:
    print(f"Business summary for {ticker}:")
    print(response.json()["business_summary"])
    print(f"Length of business summary: {len(response.json()['business_summary'])}")
    print(f"Length of business description: {len(response.json()['business_description'])}")

Business summary for NFLX:
Netflix, Inc. is one of the world’s leading entertainment services with approximately 302 million paid memberships in over 190 countries. Members can play, pause and resume watching as much as they want, anytime, anywhere. Our membership growth exhibits a seasonal pattern that reflects variations when consumers buy internet-connected screens and when they tend to increase their viewing. Historically, the fourth quarter represents our greatest streaming membership growth. Our ability to provide our members with content they can watch depends on studios, content providers and other rights holders licensing rights. We view our employees and our culture as key to our success. As of December 31, 2024, we had approximately 14,000 full-time employees. We believe an important component of our success is our company culture. Netflix's culture is detailed in a "Culture Memo" located on our website. Fostering a work environment that is culturally diverse, inclusive and 

As a final test case, let's showcase some more API functionality. First, try to delete a company from the database. The request fails with a 400 error because the company still has a filing associated with it. You need to delete the filing first, and then you can delete the company.

In [18]:
ticker = "AAPL"
response = requests.delete(f"{base_url}/companies/{ticker}")
if response.status_code == 200:
    print(f"Deleted company: {response.json()}")
else:
    print(f"Failed to delete company: {response.status_code} - {response.text}")

Failed to delete company: 400 - {"detail":"Cannot delete company AAPL with existing filings."}


First, delete the filing associated with the company.

In [19]:
ticker = "AAPL"
response = requests.delete(f"{base_url}/filings/{ticker}")
if response.status_code == 200:
    print(f"Deleted filings for {ticker}: {response.json()}")
else:
    print(f"Failed to delete filings for {ticker}: {response.status_code} - {response.text}")

Deleted filings for AAPL: {'detail': 'Filing for AAPL deleted.'}


Verify the filing was deleted.

In [20]:
ticker = "AAPL"
response = requests.get(f"{base_url}/filings/{ticker}")
if response.status_code == 200:
    print(f"Filings for {ticker}: {response.json()}")
else:
    print(f"Failed to retrieve filings for {ticker}: {response.status_code} - {response.text}")

Failed to retrieve filings for AAPL: 404 - {"detail":"Filing for AAPL not found."}


Now you can delete the company.

In [21]:
ticker = "AAPL"
response = requests.delete(f"{base_url}/companies/{ticker}")
if response.status_code == 200:
    print(f"Deleted company: {response.json()}")
else:
    print(f"Failed to delete company: {response.status_code} - {response.text}")

Deleted company: {'detail': 'Company AAPL deleted.'}
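
The two-step deletion above (filing first, then company) can be wrapped in one helper. This is a hedged sketch rather than part of the API: the function name is hypothetical, and `http_delete` is any callable that behaves like `requests.delete` (it takes a URL and returns an object with a `status_code`). Treating a 404 on the filing as "nothing to delete" is an assumption.

```python
from typing import Any, Callable, List, Tuple


def delete_company_with_filing(
    ticker: str,
    base_url: str,
    http_delete: Callable[[str], Any],
) -> List[Tuple[str, int]]:
    """Delete a company's filing, then the company.

    Returns the (url, status_code) pair of every request that was sent.
    """
    results: List[Tuple[str, int]] = []

    filing_url = f"{base_url}/filings/{ticker}"
    filing_status = http_delete(filing_url).status_code
    results.append((filing_url, filing_status))
    # Assumption: 404 means there was no filing, so the company can still
    # be deleted. Any other failure stops here and leaves the company alone.
    if filing_status not in (200, 404):
        return results

    company_url = f"{base_url}/companies/{ticker}"
    results.append((company_url, http_delete(company_url).status_code))
    return results
```

In this notebook it could be called as `delete_company_with_filing("MSFT", base_url, requests.delete)`. Passing the HTTP function in as an argument also makes the helper easy to exercise with a fake in tests.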


Fetch the list of companies to verify that the company was deleted successfully.

In [22]:
response = requests.get(f"{base_url}/companies/")
if response.status_code == 200:
    print("Companies in the database:")
    for company in response.json():
        print(company)

Companies in the database:
{'id': 2, 'name': 'Microsoft Corp.', 'ticker': 'MSFT'}
{'id': 3, 'name': 'Alphabet Inc. (Google)', 'ticker': 'GOOGL'}
{'id': 4, 'name': 'Amazon.com Inc.', 'ticker': 'AMZN'}
{'id': 5, 'name': 'Meta Platforms Inc. (Facebook)', 'ticker': 'META'}
{'id': 6, 'name': 'Tesla Inc.', 'ticker': 'TSLA'}
{'id': 7, 'name': 'NVIDIA Corp.', 'ticker': 'NVDA'}
{'id': 8, 'name': 'JPMorgan Chase & Co.', 'ticker': 'JPM'}
{'id': 9, 'name': 'Netflix Inc.', 'ticker': 'NFLX'}
{'id': 10, 'name': 'The Walt Disney Co.', 'ticker': 'DIS'}
