# User Guide: Quick Start

Welcome to the User Guide for `sec-parser`! This guide is designed to walk you through the fundamental steps needed to install and use the library for parsing SEC EDGAR HTML documents into semantic elements and trees. Whether you're a financial analyst, a data scientist, or someone interested in SEC filings, this guide provides examples and code snippets to help you get started.

This guide is interactive, allowing you to engage with the code and concepts as you learn. You can run and modify all the code examples shown here by cloning the repository and opening [user_guide.ipynb](https://github.com/alphanome-ai/sec-parser/blob/main/docs/source/notebooks/user_guide.ipynb) in Jupyter.

Alternatively, you can run the notebook directly in your browser using one of the following services:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/alphanome-ai/sec-parser/blob/main/docs/source/notebooks/user_guide.ipynb)
[![My Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/alphanome-ai/sec-parser/main?filepath=docs/source/notebooks/user_guide.ipynb)
[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/kernels/welcome?src=https://github.com/alphanome-ai/sec-parser/blob/main/docs/source/notebooks/user_guide.ipynb)
[![Open in SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/https://github.com/alphanome-ai/sec-parser/blob/main/docs/source/notebooks/user_guide.ipynb)

Let's get started!

## Getting Started

This section walks you through installing the `sec-parser` package and using it to extract the "Segment Operating Performance" section as a semantic tree from Apple's latest 10-Q filing.

### Installation

First, install the `sec-parser` package using pip:

In [2]:
try:
    import sec_parser
except ImportError:
    !pip install -q sec-parser
    import sec_parser

To run the example code in this guide, you'll also need the `sec-downloader` package:

In [3]:
try:
    import sec_downloader
except ImportError:
    !pip install -q sec-downloader
    import sec_downloader

### Usage

Once the necessary packages are installed, start by downloading the filing from the SEC EDGAR website:

In [4]:
from sec_downloader import Downloader

# Initialize the downloader with your company name and email
dl = Downloader("MyCompanyName", "email@example.com")

# Download the latest 10-Q filing for Apple
html = dl.get_latest_html("10-Q", "AAPL")

> **Note**
The company name and email address are used to form a user-agent string that complies with SEC EDGAR's fair access policy for programmatic downloading. [Source](https://www.sec.gov/os/webmaster-faq#code-support)
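The sketch below illustrates the idea. It builds a `User-Agent` value in the "company name followed by contact email" form that the SEC's webmaster FAQ describes. It then attaches that value to a standard-library `urllib.request.Request` without sending anything. The `build_user_agent` helper is hypothetical and is not part of `sec-downloader`. The exact header that `Downloader` sends may differ.

```python
from urllib.request import Request


def build_user_agent(company_name: str, email: str) -> str:
    """Hypothetical helper: join a company name and a contact email into one header value."""
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    return f"{company_name} {email}"


user_agent = build_user_agent("MyCompanyName", "email@example.com")

# Attach the header to a request object; nothing is sent over the network here.
req = Request("https://www.sec.gov/", headers={"User-Agent": user_agent})

# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # MyCompanyName email@example.com
```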

Now, we can parse the filing into semantic elements and arrange them into a tree structure:

In [5]:
import sec_parser as sp

# Parse the HTML into a list of semantic elements
elements = sp.Edgar10QParser().parse(html)

# Construct a semantic tree to allow for easy filtering by section
tree = sp.TreeBuilder().build(elements)

# Find section "Segment Operating Performance"
section = [n for n in tree.nodes if n.text.startswith("Segment")][0]

# Preview the tree
print("\n".join(sp.render(section).split("\n")[:13]) + "...")

TitleElement: Segment Operating Performance
├── TextElement: The following table sho... (dollars in millions):
├── TableElement: 414 characters.
├── TitleElement[L1]: Americas
│   └── TextElement: Americas net sales decr... net sales of Services.
├── TitleElement[L1]: Europe
│   └── TextElement: The weakness in foreign...er net sales of iPhone.
├── TitleElement[L1]: Greater China
│   └── TextElement: The weakness in the ren...er net sales of iPhone.
├── TitleElement[L1]: Japan
│   └── TextElement: The weakness in the yen..., Home and Accessories.
└── TitleElement[L1]: Rest of Asia Pacific
    ├── TextElement: The weakness in foreign...lower net sales of Mac....
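The rendered tree above shows how a flat list of elements becomes a nested structure. The sketch below folds such a list into a tree under two assumptions:

- A title nests under the nearest preceding title that has a lower level.
- Every other element nests under the most recent title.

This is a simplified model, not the `TreeBuilder` implementation, which may apply additional rules. The `Element`, `Node`, `build_tree`, and `render` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str       # e.g. "Title", "Text", "Table"
    text: str
    level: int = 0  # only meaningful for titles in this toy model


@dataclass
class Node:
    element: Element
    children: list[Node] = field(default_factory=list)


def build_tree(elements: list[Element]) -> list[Node]:
    roots: list[Node] = []
    open_titles: list[Node] = []  # chain of enclosing titles, outermost first
    for el in elements:
        node = Node(el)
        if el.kind == "Title":
            # Close any open titles at the same or a deeper level.
            while open_titles and open_titles[-1].element.level >= el.level:
                open_titles.pop()
        # Attach to the innermost open title, or to the top level if there is none.
        siblings = open_titles[-1].children if open_titles else roots
        siblings.append(node)
        if el.kind == "Title":
            open_titles.append(node)
    return roots


def render(nodes: list[Node], depth: int = 0) -> list[str]:
    # Produce one indented line per node, depth-first.
    lines: list[str] = []
    for n in nodes:
        lines.append("  " * depth + f"{n.element.kind}: {n.element.text}")
        lines.extend(render(n.children, depth + 1))
    return lines


flat = [
    Element("Title", "Segment Operating Performance", level=0),
    Element("Text", "intro paragraph"),
    Element("Table", "segment table"),
    Element("Title", "Americas", level=1),
    Element("Text", "Americas paragraph"),
    Element("Title", "Europe", level=1),
    Element("Text", "Europe paragraph"),
]
print("\n".join(render(build_tree(flat))))
```

In the printed outline, "Americas" and "Europe" sit under "Segment Operating Performance", much like the `[L1]` titles in the real output.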


### Advanced Usage

Parsing is organized into a pipeline of processing steps. You can modify, add, or remove steps as needed. Each step takes a list of elements as input and returns a list of elements as output, and the output of one step becomes the input of the next.
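The step-pipeline idea can be illustrated with plain Python functions. This is a simplified model with several differences from `sec-parser`:

- The real steps are objects such as `TextClassifier`, not plain functions.
- The real elements are semantic element classes, not the `(type, text)` tuples used here.
- `drop_empty`, `classify_titles`, `classify_text`, and `run_pipeline` are hypothetical names, and the classification rules are made up for illustration.

```python
from functools import reduce
from typing import Callable, List, Tuple

Element = Tuple[str, str]  # (type name, text) in this toy model
Step = Callable[[List[Element]], List[Element]]


def drop_empty(elements: List[Element]) -> List[Element]:
    # Remove elements that carry no text.
    return [(kind, text) for kind, text in elements if text.strip()]


def classify_titles(elements: List[Element]) -> List[Element]:
    # Hypothetical rule: an unclassified, all-uppercase string is a title.
    return [
        ("Title", text) if kind == "Unclassified" and text.isupper() else (kind, text)
        for kind, text in elements
    ]


def classify_text(elements: List[Element]) -> List[Element]:
    # Whatever is still unclassified becomes plain text.
    return [("Text", text) if kind == "Unclassified" else (kind, text) for kind, text in elements]


def run_pipeline(steps: List[Step], elements: List[Element]) -> List[Element]:
    # Feed the output of each step into the next one.
    return reduce(lambda acc, step: step(acc), steps, elements)


raw = [
    ("Unclassified", "SEGMENT OPERATING PERFORMANCE"),
    ("Unclassified", ""),
    ("Unclassified", "Net sales decreased."),
]
print(run_pipeline([drop_empty, classify_titles, classify_text], raw))
# Removing a step is just editing the list:
print(run_pipeline([drop_empty, classify_text], raw))
```

Order matters in this model. If `classify_text` ran before `classify_titles`, nothing would be left unclassified for the title rule to pick up.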

In [6]:
steps = sp.Edgar10QParser.get_default_steps()

for i, step in enumerate(steps, 1):
    print(f"Step {i}: {step.__class__.__name__}")

Step 1: ImageClassifier
Step 2: TableClassifier
Step 3: TextClassifier
Step 4: HighlightedTextClassifier
Step 5: TitleClassifier


The following example replaces the text element classifier with a custom classifier. The custom classifier turns every element that contains text into our custom `MyElement` type and leaves all other cases to the parent class:

In [7]:
from sec_parser.processing_steps import TextClassifier


# Create a custom element class
class MyElement(sp.TextElement):
    pass


# Create a custom parsing step
class MyClassifier(TextClassifier):
    def _process_element(self, element, context):
        if element.text != "":
            return MyElement.create_from_element(element)

        # Let the parent class handle the other cases
        return super()._process_element(element, context)


# Replace the default text parsing step with our custom one
steps = [MyClassifier() if isinstance(step, TextClassifier) else step for step in steps]
for i, step in enumerate(steps, 1):
    print(f"Step {i}: {step.__class__.__name__}")

Step 1: ImageClassifier
Step 2: TableClassifier
Step 3: MyClassifier
Step 4: HighlightedTextClassifier
Step 5: TitleClassifier


As demonstrated above, our custom classifier is now integrated into the pipeline. 

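There's an additional caveat. Without an "allowlist" of types, `MyClassifier` would also reclassify elements that earlier steps have already classified. For example, a `TableElement` contains text, so it would become a `MyElement`. To prevent this, we pass `types_to_process={sp.NotYetClassifiedElement}`. With that allowlist, the step processes only elements that have not been classified yet and bypasses all other types.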
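The effect of the allowlist can be modeled in a few lines of plain Python. The classes below (`Element`, `NotYetClassified`, `Table`, `Text`, `MyElement`, `ToyClassifier`) are hypothetical stand-ins, not `sec-parser` classes. Only the idea of a `types_to_process` allowlist comes from the code in this guide, and the real step's internals may differ.

```python
class Element:
    def __init__(self, text: str) -> None:
        self.text = text


class NotYetClassified(Element): ...
class Table(Element): ...
class Text(Element): ...
class MyElement(Text): ...


class ToyClassifier:
    """Hypothetical step: turns any element that has text into MyElement."""

    def __init__(self, types_to_process=None):
        # None means "no allowlist": every element is a candidate.
        self.types_to_process = types_to_process

    def process(self, elements):
        result = []
        for el in elements:
            allowed = self.types_to_process is None or isinstance(el, tuple(self.types_to_process))
            if allowed and el.text != "":
                result.append(MyElement(el.text))
            else:
                result.append(el)  # bypass: keep the element unchanged
        return result


elements = [Table("table cells"), NotYetClassified("Net sales decreased.")]

print([type(e).__name__ for e in ToyClassifier().process(elements)])
# ['MyElement', 'MyElement']  (the table was overwritten)
print([type(e).__name__ for e in ToyClassifier(types_to_process={NotYetClassified}).process(elements)])
# ['Table', 'MyElement']  (the table is left alone)
```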


In [8]:
def get_steps():
    return [
        MyClassifier(types_to_process={sp.NotYetClassifiedElement})
        if isinstance(step, TextClassifier)
        else step
        for step in sp.Edgar10QParser.get_default_steps()
    ]


elements = sp.Edgar10QParser(get_steps).parse(html)
tree = sp.TreeBuilder().build(elements)
section = [n for n in tree.nodes if n.text.startswith("Segment")][0]
print("\n".join(sp.render(section).split("\n")[:13]) + "...")

TitleElement: Segment Operating Performance
├── MyElement: The following table sho... (dollars in millions):
├── TableElement: 414 characters.
├── TitleElement[L1]: Americas
│   └── MyElement: Americas net sales decr... net sales of Services.
├── TitleElement[L1]: Europe
│   └── MyElement: The weakness in foreign...er net sales of iPhone.
├── TitleElement[L1]: Greater China
│   └── MyElement: The weakness in the ren...er net sales of iPhone.
├── TitleElement[L1]: Japan
│   └── MyElement: The weakness in the yen..., Home and Accessories.
└── TitleElement[L1]: Rest of Asia Pacific
    ├── MyElement: The weakness in foreign...lower net sales of Mac....


For more examples and advanced usage, see the [**Developer Guide**](https://sec-parser.readthedocs.io/en/latest/notebooks/developer_guide.html) and the [**Documentation**](https://sec-parser.rtfd.io). If you're interested in contributing, check out our [**Contribution Guide**](https://github.com/alphanome-ai/sec-parser/blob/main/CONTRIBUTING.md).

## What's Next?

You've successfully parsed an SEC filing into semantic elements and arranged them into a tree structure. From here, you can analyze the data further with any analytics or AI tool of your choice.

For a tailored experience, consider using our free and open-source library for AI-powered financial analysis: 

[**Explore sec-ai on GitHub**](https://github.com/alphanome-ai/sec-ai)

```bash
pip install sec-ai
```