[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dgunning/edgartools/blob/main/notebooks/10k-business-description-python.ipynb)

# Extract Business Description from 10-K Item 1 with Python -- Free, No API Key

Use **edgartools** to extract the business description (Item 1) and other sections from SEC 10-K annual reports in Python -- completely free, no API key or paid subscription required. Item 1 is the company's official description of its business, products, and competitive landscape.

**What you'll learn:**
- Extract Item 1 (Business) from any 10-K filing
- Access all 10-K sections (Risk Factors, MD&A, Financial Statements)
- Get clean text for NLP and analysis
- Compare business descriptions across companies

## Install edgartools

In [None]:
!pip install -U edgartools

## Setup

The SEC requires all automated tools to identify themselves. Replace the email below with your own -- any valid email works.

In [None]:
from edgar import *

# The SEC requires you to identify yourself (any email works)
set_identity("your.name@example.com")

## Get the Business Description in 3 Lines

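Every 10-K filing has a structured set of sections, and Item 1 is the business description. Start by loading the latest filing and parsing it into a `TenK` object; the section lookups in the following cells build on it:

In [None]:
# Get Apple's latest 10-K and parse it into a TenK object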
filing = Company("AAPL").get_filings(form="10-K")[0]
tenk = filing.obj()
tenk

## Browse Available Sections

A 10-K filing contains standardized sections (Items 1 through 16). View what's available:

In [None]:
# List all sections in this 10-K
tenk.sections

## Read Item 1: Business Description

Item 1 covers the company's business, products, services, and competitive landscape. Look it up under the `business` key and call `.text()` to get a plain string:

In [None]:
# Get the business description section
business = tenk.sections['business']
business_text = business.text()

print(f"Item 1 length: {len(business_text):,} characters")
print("\nFirst 1500 characters:\n")
print(business_text[:1500])
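
Slicing at a fixed character count can cut a word in half and keeps the line breaks of the original layout. A minimal sketch of a tidier preview, using only the standard library; `tidy_preview` and the sample string are hypothetical names introduced here for illustration, not part of edgartools:

```python
import textwrap


def tidy_preview(text: str, width: int = 300) -> str:
    """Collapse runs of whitespace and truncate at a word boundary."""
    # textwrap.shorten normalizes whitespace and appends the placeholder
    # only when the text had to be cut to fit within `width`.
    return textwrap.shorten(text, width=width, placeholder=" ...")


# Stand-in text; in the notebook you would pass business_text instead.
sample = (
    "Item 1.   Business\n\nCompany Background\n"
    "The Company designs,  manufactures and markets smartphones."
)
print(tidy_preview(sample, width=60))
```

In the notebook, `tidy_preview(business_text, width=1500)` would play the same role as `business_text[:1500]`.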

## Read Other Key Sections

Access the other 10-K sections the same way, by section key:

| Section Key | Item | Description |
|-------------|------|-------------|
| `business` | Item 1 | Business description |
| `risk_factors` | Item 1A | Risk factors |
| `mda` | Item 7 | Management's Discussion and Analysis |
| `financial_statements` | Item 8 | Financial statements and supplementary data |

In [None]:
# Read Risk Factors (Item 1A)
risk_factors = tenk.sections['risk_factors']
risk_text = risk_factors.text()

print(f"Risk Factors length: {len(risk_text):,} characters")
print("\nFirst 1000 characters:\n")
print(risk_text[:1000])

In [None]:
# Read MD&A (Item 7)
mda = tenk.sections['mda']
mda_text = mda.text()

print(f"MD&A length: {len(mda_text):,} characters")
print("\nFirst 1000 characters:\n")
print(mda_text[:1000])
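
Not every filing necessarily exposes every key, so a lookup helper that returns `None` for a missing section can keep a batch job running. The sketch below is an assumption-laden illustration: `section_text` and `FakeSection` are hypothetical, and it relies on the sections container yielding its keys when iterated, which is how the notebook's own loops treat it.

```python
from typing import Optional


def section_text(sections, key: str) -> Optional[str]:
    """Return the text stored under `key`, or None when the key is absent."""
    # Assumption: iterating `sections` yields the section keys.
    if key not in set(sections):
        return None
    return sections[key].text()


class FakeSection:
    """Stand-in for a parsed section; only mimics the .text() method."""

    def __init__(self, body: str) -> None:
        self._body = body

    def text(self) -> str:
        return self._body


fake_sections = {"business": FakeSection("We sell widgets.")}
print(section_text(fake_sections, "business"))
print(section_text(fake_sections, "mda"))
```

With a real filing the call would be `section_text(tenk.sections, 'mda')`, under the same assumption about iteration.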

## Get Section Sizes Across the Filing

Compare the relative sizes of each section to understand where the company focuses its disclosure:

In [None]:
# Measure each section
sections = tenk.sections
print(f"10-K Section Sizes for {filing.company}:\n")

for key in sections:
    try:
        section = sections[key]
        text = section.text()
        words = len(text.split())
        print(f"  {key:30s} {words:>8,} words  ({len(text):>10,} chars)")
    except Exception:
        print(f"  {key:30s}  (not available)")
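
Raw word counts are easier to compare once they are expressed as a share of the total. A small sketch that works on a plain dictionary of section key to text; `word_shares` and the demo data are hypothetical and stand in for text collected from the loop above:

```python
def word_shares(texts: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (key, word_count, percent_of_total) rows, largest first."""
    counts = {key: len(text.split()) for key, text in texts.items()}
    total = sum(counts.values())
    rows = [
        (key, n, (100.0 * n / total) if total else 0.0)
        for key, n in counts.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Tiny stand-in texts; real section text would come from section.text().
demo = {
    "business": "one two three",
    "risk_factors": "one two three four five six",
    "mda": "one",
}
for key, n, pct in word_shares(demo):
    print(f"  {key:30s} {n:>8,} words  {pct:5.1f}%")
```

To use it with a filing, collect `{key: sections[key].text()}` for the keys that load successfully and pass that dictionary in.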

## Extract the Full Filing Text

For NLP pipelines that need the entire document, use `.text()` on the filing itself:

In [None]:
# Get the full text of the filing
full_text = filing.text()
print(f"Full 10-K text: {len(full_text):,} characters ({len(full_text.split()):,} words)")
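
A full 10-K is usually too long to hand to a model or embedding step in one piece, so pipelines often split it into overlapping windows first. The following is a minimal sketch of word-based chunking; `chunk_words` is a hypothetical helper, and the `size` and `overlap` defaults are arbitrary illustrative values rather than recommendations:

```python
def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a chunk has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


numbers = " ".join(str(i) for i in range(10))
for chunk in chunk_words(numbers, size=4, overlap=1):
    print(chunk)
```

In the notebook, `chunk_words(full_text)` would produce the windows for the whole filing.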

## Get the Filing as Markdown

For cleaner text with preserved headings, use `.markdown()`:

In [None]:
# Get a markdown representation
md = filing.markdown()
print(md[:2000])
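
If the markdown output marks headings with leading `#` characters, the document can be split into heading and body pairs with a regular expression. That heading style is an assumption about the output rather than something this notebook confirms, and `split_by_heading` is a hypothetical helper. Text before the first heading is dropped, and a `#` at the start of a line inside a code block or table would be misread as a heading.

```python
import re

# Matches ATX-style headings such as "## Item 1. Business".
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def split_by_heading(md: str) -> list[tuple[str, str]]:
    """Return (heading_title, body_text) pairs in document order."""
    matches = list(HEADING.finditer(md))
    parts = []
    for i, match in enumerate(matches):
        # The body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(md)
        parts.append((match.group(2), md[match.end():end].strip()))
    return parts


sample_md = (
    "# Apple Inc.\n\nIntro text.\n\n"
    "## Item 1. Business\n\nThe Company designs products.\n\n"
    "## Item 1A. Risk Factors\n\nRisks here.\n"
)
for title, body in split_by_heading(sample_md):
    print(f"{title!r}: {len(body)} chars")
```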

## Compare Business Descriptions Across Companies

Extract Item 1 from multiple companies to compare their business models:

In [None]:
# Compare business description sizes across tech companies
tickers = ["AAPL", "MSFT", "GOOG", "AMZN"]

for ticker in tickers:
    try:
        # Fetch and parse inside the try so one failing ticker does not stop the loop
        c = Company(ticker)
        f = c.get_filings(form="10-K")[0]
        tk = f.obj()
        biz = tk.sections['business'].text()
        words = len(biz.split())
        # Rough first sentence: cuts at the first period, so "Inc." or "Item 1." ends it early
        first_sentence = biz.strip().split('.')[0] + '.'
        print(f"\n{ticker} ({c.name}) -- {words:,} words")
        print(f"  {first_sentence[:120]}")
    except Exception as e:
        print(f"\n{ticker}: Could not extract Item 1 ({e})")
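
Splitting on the first period stops at abbreviations such as "Inc.", which are common in the opening line of Item 1. Here is a sketch of a slightly more careful first-sentence helper; `first_sentence` and the `ABBREVIATIONS` tuple are hypothetical and deliberately short, so treat it as a heuristic rather than a real sentence tokenizer:

```python
import re

# Illustrative list only; extend it for the filings you work with.
ABBREVIATIONS = ("Inc.", "Corp.", "Co.", "Ltd.", "No.", "U.S.")


def first_sentence(text: str) -> str:
    """Return the text up to the first sentence end that is not an abbreviation."""
    collapsed = " ".join(text.split())
    # A sentence end is ., ! or ? followed by a capitalized word or end of text.
    for match in re.finditer(r"[.!?](?=\s+[A-Z]|$)", collapsed):
        candidate = collapsed[:match.end()]
        if not candidate.endswith(ABBREVIATIONS):
            return candidate
    return collapsed


print(first_sentence("Apple Inc. designs, manufactures and markets smartphones. It also sells services."))
print(first_sentence("Founded by Acme Corp. Engineers in 1990. It grew."))
```

A heading such as "Item 1. Business" at the start of the text would still be returned as "Item 1.", so strip section headings first if they appear in the extracted text.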

## Search Within the Filing

Use `.search()` to find specific terms within the 10-K document:

In [None]:
# Search for specific terms in the filing
results = filing.search("artificial intelligence")
print(f"Matches for 'artificial intelligence': {len(results)}")

# Show the first few matches
for r in results[:3]:
    print(f"\n  ...{r}...")
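
When you already hold a section as a plain string, a keyword-in-context search over that string gives you control over how much surrounding text is shown. This sketch is independent of `filing.search()` and makes no claim about how that method works; `keyword_in_context` is a hypothetical helper built on the standard `re` module:

```python
import re


def keyword_in_context(text: str, phrase: str, window: int = 60) -> list[str]:
    """Return each case-insensitive match with `window` characters on either side."""
    snippets = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        # Collapse whitespace so snippets print on a single line.
        snippets.append(" ".join(text[start:end].split()))
    return snippets


sample_text = (
    "We invest in Artificial Intelligence research. "
    "Our artificial intelligence features ship on device."
)
for snippet in keyword_in_context(sample_text, "artificial intelligence", window=10):
    print(f"  ...{snippet}...")
```

In the notebook, `keyword_in_context(business_text, "artificial intelligence")` would restrict the search to Item 1.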

## Why EdgarTools?

EdgarTools is free and open-source. Compare how each approach extracts a 10-K section:

**With edgartools (free, no API key):**
```python
tenk = Company("AAPL").get_filings(form="10-K")[0].obj()
business = tenk.sections['business'].text()  # Clean text, ready for NLP
```

**Typical paid API approach ($50+/month, API key required):**
```python
from sec_api import ExtractorApi
api = ExtractorApi(api_key="YOUR_PAID_API_KEY")
text = api.get_section(filing_url, "1", "text")  # Costs per extraction
```

With edgartools, 10-K sections are parsed into Python objects with named accessors -- no URL construction, no API key, no per-request fees.

## Quick Reference

```python
from edgar import *
set_identity("your.name@example.com")

# Get a 10-K filing
filing = Company("AAPL").get_filings(form="10-K")[0]
tenk = filing.obj()           # Typed TenK object

# Access sections
tenk.sections                 # List all available sections
tenk.sections['business']     # Item 1: Business
tenk.sections['risk_factors'] # Item 1A: Risk Factors
tenk.sections['mda']          # Item 7: MD&A

# Get text
tenk.sections['business'].text()  # Clean text
filing.text()                     # Full filing text
filing.markdown()                 # Markdown format

# Search
filing.search("revenue growth")
```

## What's Next

You've learned how to extract business descriptions and sections from 10-K filings with Python. Here are related tutorials:

- [Extract Filing Text for NLP with Python](https://colab.research.google.com/github/dgunning/edgartools/blob/main/notebooks/sec-filing-text-nlp-python.ipynb)
- [Download and Parse 10-K Annual Reports](https://colab.research.google.com/github/dgunning/edgartools/blob/main/notebooks/download-10k-annual-report-python.ipynb)
- [Extract Revenue and Earnings from SEC Filings](https://colab.research.google.com/github/dgunning/edgartools/blob/main/notebooks/extract-revenue-earnings-python.ipynb)
- [Extract Financial Statements from SEC Filings](https://colab.research.google.com/github/dgunning/edgartools/blob/main/notebooks/financial-statements-sec-python.ipynb)

**Resources:**
- [EdgarTools Documentation](https://edgartools.readthedocs.io/)
- [GitHub Repository](https://github.com/dgunning/edgartools)
- [PyPI Package](https://pypi.org/project/edgartools/)

---

## Support EdgarTools

If you found this tutorial helpful, here are a few ways to support the project:

- **Star the repo** -- [github.com/dgunning/edgartools](https://github.com/dgunning/edgartools) -- it helps others discover edgartools
- **Visit edgartools.io** -- [edgartools.io](https://www.edgartools.io/) -- for more tutorials, articles, and updates
- **Report issues** -- found a bug or have a feature idea? [Open an issue](https://github.com/dgunning/edgartools/issues)
- **Share this notebook** -- know someone who works with SEC data? Send them the Colab link

*edgartools is free, open-source, and community-driven. No API key or paid subscription required.*