# SEC Edgar Data Analysis

## Introduction

In this notebook, we analyze SEC Edgar data: the financial reports that companies file with the U.S. Securities and Exchange Commission (SEC). The dataset contains the following columns:

- `cik`: The Central Index Key for the filing entity.
- `name`: The name of the entity.
- `ticker`: The ticker symbol of the entity.
- `sic`: The Standard Industrial Classification code for the filing.
- `adsh`: The Accession Number for the submission.
- `countryba`: The ISO country code for the filing's business address.
- `stprba`: The region for the filing's business address.
- `cityba`: The city for the filing's business address.
- `zipba`: The zip code for the filing's business address.
- `bas1`: The street address for the filing's business address.
- `form`: The submission type of the filing.
- `period`: The period end date.
- `fy`: The fiscal year focus, i.e. the fiscal year the report covers.
- `fp`: The fiscal period focus (Q1, Q2, Q3, FY).
- `filed`: The date the report was filed.
- `accepted`: The date and time the report was accepted.
- `prevrpt`: A flag indicating that the submission was subsequently amended.
- `detail`: A flag indicating that the XBRL submission contains quantitative disclosures in the footnotes and schedules at the required detail level.
- `instance`: The file name of the XBRL instance document.
- `nciks`: The number of Central Index Keys of registrants included in the filing.
- `aciks`: The additional Central Index Keys of co-registrants included in the filing.
- `year`: The year of the filing.
- `quarter`: The quarter of the filing.
- `month`: The month of the filing.
- `day`: The day of the filing.
- `hour`: The hour of the filing.

We will use this data to look up companies by ticker, retrieve their filings, and extract the financial facts they report to the SEC.
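
The last five columns read like calendar parts of a filing timestamp. The source does not say which date they come from, so the following is only a minimal sketch, assuming they are derived from the `accepted` timestamp. The helper name `split_timestamp` is hypothetical.

```python
from datetime import datetime


def split_timestamp(accepted: str) -> dict:
    """Split a 'YYYY-MM-DD HH:MM:SS' timestamp into calendar parts.

    Assumption: `year`, `quarter`, `month`, `day`, and `hour` come from the
    `accepted` timestamp; swap in `filed` if that matches your data better.
    """
    ts = datetime.strptime(accepted, "%Y-%m-%d %H:%M:%S")
    return {
        "year": ts.year,
        "quarter": (ts.month - 1) // 3 + 1,  # calendar quarter, 1 through 4
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
    }


parts = split_timestamp("2023-11-03 18:08:27")
print(parts)
```

Note that the quarter computed here is the calendar quarter of the timestamp, which can differ from the fiscal period in `fp`.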

## Libraries

We will use the following libraries in this notebook:

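- `pandas` for data manipulation.
- `requests` for making HTTP requests.
- `numpy` for numerical operations.
- `calendar` for calendar operations.
- `json` for reading and writing JSON.
- `csv` for writing CSV files.
- `zipfile` for reading zip archives.
- `logging` for logging operations.
- `os` for file operations.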

## Custom Functions

We will define the following custom functions in this notebook:

- `edgar_functions.py`: This file contains custom functions for analyzing the SEC Edgar data.

Let's load the data and take a look at the first few rows.



In [None]:
# Initialize the environment

import pandas as pd
import requests
import json
import os
import csv
import zipfile
import logging

# The SEC asks automated clients to identify themselves in the User-Agent header.
headers = {"User-Agent": "amr@bashconsultants.com"}  # Replace with your own email address

def cik_ticker(ticker, headers=headers):
    """Return the zero-padded 10-digit CIK for a ticker symbol."""
    # SEC tickers are upper case and use "-" instead of "." (e.g. BRK.B -> BRK-B)
    ticker = ticker.upper().replace(".", "-")
    ticker_json = requests.get(
        "https://www.sec.gov/files/company_tickers.json", headers=headers
    ).json()

    for company in ticker_json.values():
        if company["ticker"] == ticker:
            cik = str(company["cik_str"]).zfill(10)
            return cik

    raise ValueError(f"Ticker {ticker} not found in SEC database")
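
The lookup logic can be tried without a network call. The sketch below applies the same normalization and zero padding to a small hand-written mapping. The helper name `find_cik` is hypothetical, and `sample_tickers` only mirrors the assumed shape of `company_tickers.json`.

```python
def find_cik(ticker: str, ticker_json: dict) -> str:
    """Return the zero-padded 10-digit CIK for `ticker` from an already-loaded mapping."""
    ticker = ticker.upper().replace(".", "-")
    for company in ticker_json.values():
        if company["ticker"] == ticker:
            return str(company["cik_str"]).zfill(10)
    raise ValueError(f"Ticker {ticker} not found")


# Hand-written sample in the assumed shape of company_tickers.json
sample_tickers = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 1067983, "ticker": "BRK-B", "title": "BERKSHIRE HATHAWAY INC"},
}

print(find_cik("aapl", sample_tickers))
print(find_cik("brk.b", sample_tickers))
```

Passing the mapping in as an argument also makes it possible to download the ticker file once and reuse it for many lookups.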

In [None]:
# Select the ticker you want to get the CIK for and run the function

ticker = "ccs"
cik_id = cik_ticker(ticker)

In [None]:
# get the json data for the company with the CIK based on the ticker

def get_submission_data_for_ticker(ticker, headers=headers, only_filings_df=False):
    """
    Get the submission data for a given ticker. The JSON response contains keys such as: 'cik', 'entityType', 'sic', 'sicDescription', 'insiderTransactionForOwnerExists', 'insiderTransactionForIssuerExists', 'name', 'tickers', 'exchanges', 'ein', 'description', 'website', 'investorWebsite', 'category', 'fiscalYearEnd', 'stateOfIncorporation', 'stateOfIncorporationDescription', 'addresses', 'phone', 'flags', 'formerNames', 'filings'

    Args:
        ticker (str): The ticker symbol of the company.
        headers (dict): HTTP headers sent with each request, including the User-Agent.
        only_filings_df (bool): If True, return only the recent filings as a DataFrame.

    Returns:
        dict or pd.DataFrame: The full submissions JSON, or a DataFrame of the recent filings.

    Raises:
        ValueError: If the ticker is not found in the SEC ticker list.
    """
    cik = cik_ticker(ticker, headers=headers)
    url = f"https://data.sec.gov/submissions/CIK{cik}.json"
    company_json = requests.get(url, headers=headers).json()
    if only_filings_df:
        # "recent" holds one list per column, which pandas turns into one row per filing
        return pd.DataFrame(company_json["filings"]["recent"])
    else:
        return company_json
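
The `filings` → `recent` part of the response is column-oriented: each key maps to a list, and position `i` in every list belongs to the same filing. The sketch below shows, in plain Python, roughly what building a DataFrame from it amounts to. The helper name `columns_to_rows` and the sample values are hypothetical.

```python
def columns_to_rows(recent: dict) -> list:
    """Turn a dict of equal-length columns into a list of per-filing dicts."""
    keys = list(recent)
    return [dict(zip(keys, values)) for values in zip(*recent.values())]


# Hypothetical sample with three of the columns found under filings -> recent
sample_recent = {
    "accessionNumber": ["0001234567-24-000002", "0001234567-24-000001"],
    "form": ["10-Q", "10-K"],
    "reportDate": ["2024-03-31", "2023-12-31"],
}

rows = columns_to_rows(sample_recent)
for row in rows:
    print(row)
```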

In [None]:
# Fetch the full submissions JSON for the ticker and print it
submission_data = get_submission_data_for_ticker(ticker, only_filings_df=False)
print(submission_data)


In [None]:
json.dumps(submission_data)  # submission_data is a dict here, so serialize it with json

In [None]:
def export_to_json(data, cik_id, filename):
    if isinstance(data, pd.DataFrame):
        # to_json() returns a JSON string; parse it back into Python objects
        # so that json.dump below does not encode it a second time
        data = json.loads(data.to_json())
    with open(f'company-{filename}-{cik_id}.json', 'w') as json_file:
        json.dump(data, json_file, indent=3)
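
A DataFrame's `to_json()` already returns a JSON string, so it helps to know what `json.dump` does when it is handed a string instead of a dict or list. The sketch below uses a hypothetical record to show the difference between encoding once and encoding twice.

```python
import json

record = {"cik": "0001234567", "name": "Example Corp"}  # hypothetical record

# Serializing the object once gives ordinary JSON
once = json.dumps(record)

# Serializing the resulting string again wraps it in quotes and escapes it
twice = json.dumps(once)

print(once)
print(twice)

# Reading back: one decode is enough for `once`, while `twice` needs two
decoded_once = json.loads(once)
decoded_twice = json.loads(twice)
print(type(decoded_once).__name__, type(decoded_twice).__name__)
```

A file written from a double-encoded string still parses, but it loads as one long string rather than as structured data.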

In [None]:
# submission_data is a dict when only_filings_df=False; convert only if it is a DataFrame
if isinstance(submission_data, pd.DataFrame):
    submission_data = json.loads(submission_data.to_json())

In [None]:
data_dict = submission_data  # replace with your actual data
filename = "submissions"
export_to_json(data_dict, cik_id, filename)

In [None]:
def get_filtered_filings(
    ticker, ten_k=True, just_accession_numbers=False, headers=headers
):
    company_filings_df = get_submission_data_for_ticker(
        ticker, only_filings_df=True, headers=headers
    )
    if ten_k:
        df = company_filings_df[company_filings_df["form"] == "10-K"]
    else:
        df = company_filings_df[company_filings_df["form"] == "10-Q"]
    if just_accession_numbers:
        df = df.set_index("reportDate")
        accession_df = df["accessionNumber"]
        return accession_df
    else:
        return df
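
Once an accession number is known, it can be used to locate the filing's documents on EDGAR. The sketch below is a hedged illustration that assumes the usual Archives folder layout. The helper name `filing_url` and all identifiers passed to it are hypothetical.

```python
def filing_url(cik: str, accession_number: str, primary_document: str) -> str:
    """Build the EDGAR Archives URL for one document in a filing.

    Assumption: the layout is
    /Archives/edgar/data/<CIK without leading zeros>/<accession without dashes>/<document>.
    """
    cik_int = int(cik)  # drops the zero padding
    folder = accession_number.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{cik_int}/{folder}/{primary_document}"


# Hypothetical identifiers, for illustration only
url = filing_url("0001234567", "0001234567-24-000001", "example-10q.htm")
print(url)
```

In the filings DataFrame, the document name would come from the `primaryDocument` column of the same row as the accession number.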

In [None]:
filings = get_filtered_filings(ticker, ten_k=False, just_accession_numbers=True, headers=headers)

filings

In [None]:
# get the data for the company based on the CIK

def get_facts(ticker, headers=headers):
    cik = cik_ticker(ticker, headers=headers)
    url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json"
    company_facts = requests.get(url, headers=headers).json()
    return company_facts

In [None]:
# Get the facts for the company
facts = get_facts(ticker)
facts

In [None]:
# get the account facts for the company for us-gaap

facts["facts"]["us-gaap"]

In [None]:
us_gaap_levels = facts["facts"]["us-gaap"].keys()
us_gaap_levels
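
The company facts response is deeply nested: taxonomy, then concept, then unit, then a list of reported observations. The sketch below walks a hand-written sample in that assumed shape. The company, the description text, and the numbers are made up for illustration.

```python
# Hand-written sample in the assumed shape of a companyfacts response;
# the company and the numbers are made up for illustration.
sample_facts = {
    "cik": 1234567,
    "entityName": "Example Corp",
    "facts": {
        "us-gaap": {
            "Assets": {
                "label": "Assets",
                "description": "Sum of the carrying amounts of all assets.",
                "units": {
                    "USD": [
                        {"end": "2023-12-31", "val": 1000, "fy": 2023, "fp": "FY", "form": "10-K"},
                        {"end": "2024-03-31", "val": 1100, "fy": 2024, "fp": "Q1", "form": "10-Q"},
                    ]
                },
            }
        }
    },
}

us_gaap = sample_facts["facts"]["us-gaap"]
concepts = sorted(us_gaap)                 # concept names under the taxonomy
units = list(us_gaap["Assets"]["units"])   # units reported for one concept
# ISO dates sort correctly as strings, so max() picks the latest period end
latest = max(us_gaap["Assets"]["units"]["USD"], key=lambda obs: obs["end"])

print(concepts, units, latest["val"])
```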


In [None]:
# export the account facts to a csv file

import csv

with open('acct_facts.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Write headers
    writer.writerow(["us_gaap_list", "acct_label", "acct_description"])
    
    for us_gaap_list in facts["facts"]["us-gaap"]:
        acct_label = facts["facts"]["us-gaap"][us_gaap_list]["label"]
        acct_description = facts["facts"]["us-gaap"][us_gaap_list]["description"]
        print(f"{us_gaap_list}, {acct_label}, {acct_description}")
        writer.writerow([us_gaap_list, acct_label, acct_description])

In [None]:
import csv
import zipfile
import json
import logging

# Set up logging
logging.basicConfig(filename='error_log.txt', level=logging.ERROR)

def facts_DF():
    with zipfile.ZipFile('companyfacts.zip', 'r') as z:
        for filename in z.namelist():
            try:
                with z.open(filename) as f:
                    facts = json.load(f)
                    if 'us-gaap' in facts["facts"]:
                        us_gaap_data = facts["facts"]["us-gaap"]
                        for fact, details in us_gaap_data.items():
                            acct_label = details["label"]
                            acct_description = details["description"]
                            yield fact, acct_label, acct_description
            except Exception as e:
                print(f"Error processing file {filename}: {e}")
                logging.error(f"Error processing file {filename}: {e}")

seen = set()

with open('acct_facts.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Write headers
    writer.writerow(["us_gaap_list", "acct_label", "acct_description"])
    
    json_output = []
    
    for us_gaap_list, acct_label, acct_description in facts_DF():
        # Create a tuple of the row
        row = (us_gaap_list, acct_label, acct_description)
        # If we've already seen this row, skip it
        if row in seen:
            continue
        # Add the row to the set of seen rows
        seen.add(row)
        print(f"{us_gaap_list}, {acct_label}, {acct_description}")
        writer.writerow([us_gaap_list, acct_label, acct_description])
        json_output.append({
            "us_gaap_list": us_gaap_list,
            "acct_label": acct_label,
            "acct_description": acct_description
        })
    
    # Write to JSON file
    with open('acct_facts.json', 'w') as json_file:
        json.dump(json_output, json_file, indent=3)
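
The `seen` set above is an order-preserving de-duplication: a row is written the first time it appears and skipped afterwards. As a minimal sketch, the same idea can be packaged as a small generator. The name `unique_rows` and the sample rows are hypothetical.

```python
def unique_rows(rows):
    """Yield each row the first time it appears, preserving input order."""
    seen = set()
    for row in rows:
        if row in seen:
            continue
        seen.add(row)
        yield row


# Hypothetical rows; the same concept typically repeats across company files
sample_rows = [
    ("Assets", "Assets", "Total assets."),
    ("Liabilities", "Liabilities", "Total liabilities."),
    ("Assets", "Assets", "Total assets."),
]

deduped = list(unique_rows(sample_rows))
print(deduped)
```

Rows must be hashable for this to work, which is why tuples are used rather than lists.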

In [None]:
data_dict = facts  # replace with your actual data
filename = "facts"
export_to_json(data_dict, cik_id, filename)

In [None]:
def facts_DF(ticker, headers=headers):
    facts = get_facts(ticker, headers)
    us_gaap_data = facts["facts"]["us-gaap"]
    df_data = []
    for fact, details in us_gaap_data.items():
        for unit in details["units"]:
            for item in details["units"][unit]:
                row = item.copy()
                row["fact"] = fact
                df_data.append(row)

    df = pd.DataFrame(df_data)
    df["end"] = pd.to_datetime(df["end"])
    df["start"] = pd.to_datetime(df["start"])
    df = df.drop_duplicates(subset=["fact", "end", "val"])
    df.set_index("end", inplace=True)
    labels_dict = {fact: details["label"] for fact, details in us_gaap_data.items()}
    return df, labels_dict
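
The nested loops and the `drop_duplicates` call can be hard to picture without data. The sketch below reproduces the flattening and the `(fact, end, val)` de-duplication in plain Python on a hand-written sample. The name `flatten_us_gaap` is hypothetical, the values are made up, and the extra `unit` key is an addition that the DataFrame version does not keep.

```python
def flatten_us_gaap(us_gaap_data: dict) -> list:
    """Flatten concept -> units -> observations into one list of row dicts."""
    rows = []
    seen = set()
    for fact, details in us_gaap_data.items():
        for unit, observations in details["units"].items():
            for item in observations:
                key = (fact, item["end"], item["val"])
                if key in seen:  # same value reported again in another filing
                    continue
                seen.add(key)
                rows.append({**item, "fact": fact, "unit": unit})
    return rows


# Hand-written sample; values are made up for illustration
sample_us_gaap = {
    "Assets": {
        "label": "Assets",
        "units": {
            "USD": [
                {"end": "2023-12-31", "val": 1000, "form": "10-K"},
                {"end": "2023-12-31", "val": 1000, "form": "10-Q"},
                {"end": "2024-03-31", "val": 1100, "form": "10-Q"},
            ]
        },
    }
}

flat = flatten_us_gaap(sample_us_gaap)
print(len(flat))
```

Like `drop_duplicates` with its default settings, this keeps the first occurrence of each duplicated value and discards later ones.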