## Getting Started
The first step is to install the `sec-api` Python package, which provides access to the `ExtractorApi`.

In [None]:
#pip install sec-api


In [16]:
API_KEY = '' #Replace with your own api key from https://sec-api.io/
from sec_api import ExtractorApi
extractorApi = ExtractorApi(API_KEY)

We define a `pprint` helper function that wraps long, single-line text into multiple lines. It makes the extracted text sections easier to read, especially when running the code in a Jupyter notebook.

In [17]:

# helper function to pretty print long, single-line text to multi-line text
def pprint(text, line_length=100):
  words = text.split(' ')
  lines = []
  current_line = ''
  for word in words:
    if len(current_line + ' ' + word) <= line_length:
      current_line += ' ' + word
    else:
      lines.append(current_line.strip())
      current_line = word
  if current_line:
    lines.append(current_line.strip())
  print('\n'.join(lines))
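
The standard library's `textwrap` module offers comparable wrapping. The following is a minimal sketch, assuming the same 100-character default width; `wrap_text` is a hypothetical name that does not appear in the original notebook. By default `textwrap.fill` replaces newlines with spaces, so its output can differ from `pprint` on text that already contains line breaks.

```python
import textwrap

def wrap_text(text, line_length=100):
    """Return text wrapped so that no line exceeds line_length characters."""
    return textwrap.fill(text, width=line_length)

sample = "We design, develop, manufacture, sell and lease high-performance fully electric vehicles."
print(wrap_text(sample, 40))
```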

## Extract "Item 1 - Business" from 10-K Filings
We will begin by extracting the business section (Item 1) from a 10-K filing using the `.get_section(filing_url, section_id, output_type)` function. This function allows us to specify the URL of the 10-K filing, the ID of the item section to be extracted, and the desired output type (HTML or text), and returns the extracted section. [Refer to the documentation for a complete list of all 10-K item section IDs](https://sec-api.io/docs/sec-filings-item-extraction-api#request-parameters).

As an example, let's extract Item 1 as text from Tesla's 10-K filing. In this item, we find a description of the company’s business, including its main products and services, what subsidiaries it owns, and what markets it operates in. This section may also include information about recent events, competition the company faces, regulations that apply to it, labor issues, special operating costs, or seasonal factors. This is a good place to start to understand how the company operates.

In [14]:
# URL of Tesla's 10-K filing
filing_10_k_url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/tsla-10k_20201231.htm'

In [18]:
# extract text section "Item 1 - Business" from 10-K
item_1_text = extractorApi.get_section(filing_10_k_url, '1', 'text')

print('Extracted Item 1 (Text)')
print('-----------------------')
pprint(item_1_text[0:1500])
print('... cut for brevity')
print('-----------------------')

Extracted Item 1 (Text)
-----------------------
ITEM 1. 

BUSINESS

##TABLE_END

Overview 

We design, develop, manufacture, sell and lease
high-performance fully electric vehicles and energy generation and storage systems, and offer
services related to our sustainable energy products. We generally sell our products directly to
customers, including through our website and retail locations. We also continue to grow our
customer-facing infrastructure through a global network of vehicle service centers, Mobile Service
technicians, body shops, Supercharger stations and Destination Chargers to accelerate the widespread
adoption of our products. We emphasize performance, attractive styling and the safety of our users
and workforce in the design and manufacture of our products and are continuing to develop full
self-driving technology for improved safety. We also strive to lower the cost of ownership for our
customers through continuous efforts to reduce manufacturing costs and by offering fi
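
When more than one item is needed from the same filing, the `.get_section` call can be wrapped in a small loop. This is only a sketch: `extract_sections` is a hypothetical helper, and it assumes the object passed in exposes `get_section(filing_url, section_id, output_type)` as described above. Each section still costs one API request.

```python
def extract_sections(extractor, filing_url, section_ids, output_type='text'):
    """Fetch several item sections from one filing, keyed by section ID."""
    sections = {}
    for section_id in section_ids:
        # One API request per section, using the same call as in the cell above
        sections[section_id] = extractor.get_section(filing_url, section_id, output_type)
    return sections

# Usage (needs a valid API key, so it is left commented out):
# sections = extract_sections(extractorApi, filing_10_k_url, ['1', '7'])
# pprint(sections['1'][0:500])
```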

## Extract "Item 7 - Management’s Discussion and Analysis of Financial Condition and Results of Operations"
Item 7 gives the company’s perspective on the business results of the past financial year. This section, known as the MD&A for short, allows company management to tell its story in its own words. The MD&A presents the company’s operations and financial results, including information about the company’s liquidity and capital resources and any known trends or uncertainties that could materially affect the company’s results. This section may also discuss management’s views of key business risks and what it is doing to address them.

In [5]:
# extract text section "Item 7 - MD&A" from 10-K
item_7_text = extractorApi.get_section(filing_10_k_url, '7', 'text')

print('Extracted Item 7 (Text)')
print('-----------------------')
pprint(item_7_text[0:1500])
print('... cut for brevity')
print('-----------------------')

Extracted Item 7 (Text)
-----------------------
ITEM 7. 

MANAGEMENT&#8217;S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF
OPERATIONS

##TABLE_END

The following discussion and analysis should be read in conjunction with
the consolidated financial statements and the related notes included elsewhere in this Annual Report
on Form 10-K. For discussion related to changes in financial condition and the results of operations
for fiscal year 2018-related items, refer to Part II, Item 7. Management's Discussion and Analysis
of Financial Condition and Results of Operations in our Annual Report on Form 10-K for fiscal year
2019, which was filed with the Securities and Exchange Commission on February 13, 2020.

Overview
and 2020 Highlights

Our mission is to accelerate the world&#8217;s transition to sustainable
energy. We design, develop, manufacture, lease and sell high-performance fully electric vehicles,
solar energy generation systems and energy storage products. We also offe
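
The extracted text above still contains HTML character references such as `&#8217;` (a right single quotation mark). A minimal sketch of decoding them with the standard library's `html.unescape` follows; `decode_entities` is a hypothetical helper name, and whether to decode before storing the text is a design choice the original notebook does not make.

```python
import html

def decode_entities(text):
    """Convert HTML character references such as &#8217; into Unicode characters."""
    return html.unescape(text)

print(decode_entities("MANAGEMENT&#8217;S DISCUSSION AND ANALYSIS"))
```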

## Store Items in MySQL Database
API queries are expensive, so to avoid repeated calls we store the extracted results in a persistent database. The database will hold information on the top 50 companies in the NASDAQ.

In [21]:
#pip install mysql-connector-python

Note: you may need to restart the kernel to use updated packages.


In [24]:
#pip install mysql-connector-python-rf

Note: you may need to restart the kernel to use updated packages.


In [9]:
import mysql.connector
import os
from dotenv import load_dotenv, find_dotenv

In [10]:
load_dotenv(find_dotenv())
db_host = os.environ.get("db_host")
db_user = os.environ.get("db_user")
db_password = os.environ.get("db_password")
db_name = os.environ.get("db_name")
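
`os.environ.get` returns `None` for a variable that is not set, which only surfaces later as a confusing connection error. The sketch below fails early instead; `require_env` is a hypothetical helper, and the variable names are the ones used in the cell above.

```python
import os

def require_env(names):
    """Return the named environment variables as a dict, raising if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}

# Usage, after load_dotenv(find_dotenv()):
# settings = require_env(["db_host", "db_user", "db_password", "db_name"])
```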

In [11]:
# Save the items to text files first.
os.makedirs("tesla_items", exist_ok=True)  # make sure the output folder exists
with open("tesla_items/tesla_item1.txt", "w") as file:
    file.write(item_1_text)

In [21]:
with open("tesla_items/tesla_item7.txt", "w") as file:
    file.write(item_7_text)

In [3]:
with open('tesla_items/tesla_item1.txt', 'r') as file:
    item_1_text = file.read()

In [4]:
with open('tesla_items/tesla_item7.txt', 'r') as file:
    item_7_text = file.read()
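
The write-then-read cells above can be folded into one helper that only calls the API when no saved copy exists. This is a sketch under assumptions: `load_or_fetch` is a hypothetical name, and `fetch` is any zero-argument function that returns the section text.

```python
from pathlib import Path

def load_or_fetch(path, fetch):
    """Return the text cached at path; if the file is absent, call fetch() and save its result."""
    cache_file = Path(path)
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text

# Usage (the lambda only runs when the file does not exist yet):
# item_1_text = load_or_fetch(
#     "tesla_items/tesla_item1.txt",
#     lambda: extractorApi.get_section(filing_10_k_url, '1', 'text'),
# )
```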


Our `SEC_10K_Filings` database will contain a single table that stores each company's name, stock symbol, and rank on the NASDAQ 100, together with the Item 1 and Item 7 sections of its 10-K filing and the filing URL. Printing the length of both items shows that Tesla's Item 1 and Item 7 sections are tens of thousands of characters long. To store text of this size, we use MySQL's `MEDIUMTEXT` data type, which holds up to 16,777,215 bytes, more than enough for our purposes.

In [5]:
print(len(item_1_text),len(item_7_text))


43996 99016
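
MySQL measures the `MEDIUMTEXT` limit in bytes, and non-ASCII characters such as ’ take several bytes in UTF-8, so a character count slightly understates the stored size. The sketch below checks the encoded length instead; `fits_mediumtext` is a hypothetical helper, and it assumes the column uses a UTF-8 character set.

```python
MEDIUMTEXT_MAX_BYTES = 16_777_215  # 2**24 - 1

def fits_mediumtext(text, encoding="utf-8"):
    """Return True if text, once encoded, fits in a MySQL MEDIUMTEXT column."""
    return len(text.encode(encoding)) <= MEDIUMTEXT_MAX_BYTES

# A string as long as Tesla's Item 7 section is far below the limit
print(fits_mediumtext("x" * 99016))
```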


In [12]:
companies_schema = """
CREATE TABLE IF NOT EXISTS companies (
    stock_symbol VARCHAR(5) PRIMARY KEY,
    name VARCHAR(65),
    rank_pos INT UNIQUE,
    item1 MEDIUMTEXT,
    item7 MEDIUMTEXT,
    url TEXT
    );
"""
table_schemas = [("companies", companies_schema)]

In [15]:
connection = None
try:
    # Create the initial connection
    connection = mysql.connector.connect(
        host=db_host,
        user=db_user,
        password=db_password,
        auth_plugin='mysql_native_password',
        use_pure = False
    )
    cursor = connection.cursor()

    # Create the database if it doesn't exist
    cursor.execute(f"CREATE DATABASE IF NOT EXISTS {db_name}")
    print(f"Database '{db_name}' created or already exists.")
    cursor.execute(f"USE {db_name}")

    # Create tables
    for table_name, table_schema in table_schemas:
        cursor.execute(table_schema)
        print(f"Table '{table_name}' created successfully.")

    # Insert Tesla's sections into the database.
    # Placeholders (%s) let the connector escape quotes inside the text.
    cursor.execute(
        "INSERT INTO companies VALUES (%s, %s, %s, %s, %s, %s)",
        ("TSLA", "Tesla Inc", 8, item_1_text, item_7_text, filing_10_k_url),
    )

    # Commit the transaction
    connection.commit()

    # Close the cursor and the connection
    cursor.close()
    connection.close()
except mysql.connector.Error as e:
    print(f"MySQL Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
finally:
    if connection is not None and connection.is_connected():
        connection.close()
    print("Database operations completed.")


Database 'ECMDatabase' created or already exists.
Table 'companies' created successfully.
Database operations completed.
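
Filing text often contains both single and double quotes, which is why long text is safer to insert through query placeholders than through string formatting. The sketch below illustrates the idea with the standard library's `sqlite3` module as a stand-in, so that it runs without a MySQL server. The three-column table is a simplified, hypothetical version of `companies`. Note that `sqlite3` uses `?` placeholders, whereas `mysql.connector` uses `%s`.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (stock_symbol TEXT PRIMARY KEY, name TEXT, item1 TEXT)")

# Text containing both quote characters, which would break a query built by string formatting
sample_text = 'Management\'s view of "full self-driving" technology'

# Placeholders keep the data separate from the SQL statement
conn.execute("INSERT INTO companies VALUES (?, ?, ?)", ("TSLA", "Tesla Inc", sample_text))
conn.commit()

stored = conn.execute(
    "SELECT item1 FROM companies WHERE stock_symbol = ?", ("TSLA",)
).fetchone()[0]
print(stored == sample_text)
conn.close()
```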
