# Sector Analysis: Finding Mosaic Co's Industry Sector

This notebook demonstrates how to:
1. Find a company in the SEC database
2. Extract industry sector information
3. Identify companies in the same sector for comparative analysis

In [45]:
import pandas as pd
import numpy as np
from pathlib import Path
import polars as pl
# Configure pandas display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', 1000)

In [3]:
from secfsdstools.c_index.searching import IndexSearch

from secfsdstools.c_index.companyindexreading import CompanyIndexReader

from secfsdstools.e_collector.reportcollecting import SingleReportCollector

from secfsdstools.e_filter.rawfiltering import ReportPeriodRawFilter, MainCoregRawFilter, USDOnlyRawFilter

2025-10-04 16:04:16,282 [INFO] configmgt  reading configuration from C:\Users\Jesse\.secfsdstools.cfg


## Step 1: Find Mosaic Co in the SEC Database

In [19]:
# Initialize the search functionality
search = IndexSearch.get_index_search()

# Search for Mosaic companies
mosaic_companies = search.find_company_by_name("mosaic")
print("Companies with 'mosaic' in the name:")
print(mosaic_companies)
print(f"\nFound {len(mosaic_companies)} companies")

2025-10-04 16:10:26,673 [INFO] configmgt  reading configuration from C:\Users\Jesse\.secfsdstools.cfg


Companies with 'mosaic' in the name:
                            name      cik
0       MOSAIC ACQUISITION CORP.  1713952
1                      MOSAIC CO  1285785
2  MOSAIC IMMUNOENGINEERING INC.   836564

Found 3 companies


In [20]:
# Select The Mosaic Company by its CIK rather than by row position
mosaic_cik = 1285785
mosaic_name = mosaic_companies.loc[mosaic_companies['cik'] == mosaic_cik, 'name'].iloc[0]
print(f"Using: CIK = {mosaic_cik}, Name = {mosaic_name}")

Using: CIK = 1285785, Name = MOSAIC CO
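Hard-coding `iloc[1]` depends on the order in which the search returns its matches. A minimal sketch of a more robust selection, assuming an exact (case-insensitive) name match is what you want; `pick_company` is a hypothetical helper, and `sample_matches` simply re-creates the three rows printed above so the snippet runs on its own.

```python
import pandas as pd

def pick_company(matches: pd.DataFrame, exact_name: str) -> pd.Series:
    """Return the one row whose name equals exact_name (case-insensitive)."""
    hits = matches[matches["name"].str.upper() == exact_name.upper()]
    if len(hits) != 1:
        # Zero or several hits means the name is not specific enough
        raise ValueError(f"Expected exactly one match for {exact_name!r}, got {len(hits)}")
    return hits.iloc[0]

# Re-creates the three search results printed above
sample_matches = pd.DataFrame({
    "name": ["MOSAIC ACQUISITION CORP.", "MOSAIC CO", "MOSAIC IMMUNOENGINEERING INC."],
    "cik": [1713952, 1285785, 836564],
})

chosen = pick_company(sample_matches, "Mosaic Co")
print(int(chosen["cik"]), chosen["name"])  # 1285785 MOSAIC CO
```

Raising on zero or multiple matches makes a changed search result fail loudly instead of silently picking the wrong company.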


## Step 2: Get Company Reports and Find the Latest 10-K

In [21]:
# Get company index reader
reader = CompanyIndexReader.get_company_index_reader(cik=mosaic_cik)

# Get all 10-K reports (annual reports), newest first
reports_df = reader.get_all_company_reports_df(forms=["10-K"]).sort_values('filed', ascending=False)
print("Recent 10-K reports:")
print(reports_df[['adsh', 'name', 'form', 'filed', 'period']].head())

# Get the most recent 10-K
latest_10k = reports_df.iloc[0]
latest_adsh = latest_10k['adsh']
print(f"\nMost recent 10-K: {latest_adsh} filed on {latest_10k['filed']}")

2025-10-04 16:10:27,115 [INFO] configmgt  reading configuration from C:\Users\Jesse\.secfsdstools.cfg


Recent 10-K reports:
                   adsh       name  form     filed    period
0  0001618034-25-000003  MOSAIC CO  10-K  20250303  20241231
1  0001618034-24-000004  MOSAIC CO  10-K  20240222  20231231
2  0001618034-23-000003  MOSAIC CO  10-K  20230223  20221231
3  0001618034-22-000004  MOSAIC CO  10-K  20220223  20211231
4  0001618034-21-000003  MOSAIC CO  10-K  20210222  20201231

Most recent 10-K: 0001618034-25-000003 filed on 20250303
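Taking `iloc[0]` is only safe if the reports are ordered newest first, which the output above happens to show. A minimal sketch that does not rely on the incoming row order, assuming `filed` holds YYYYMMDD dates as integers (zero-padded strings in the same format would sort identically); `sample_reports` re-creates the five rows printed above.

```python
import pandas as pd

# Re-creates the five 10-K rows printed above; filed/period are YYYYMMDD values
sample_reports = pd.DataFrame({
    "adsh": ["0001618034-25-000003", "0001618034-24-000004", "0001618034-23-000003",
             "0001618034-22-000004", "0001618034-21-000003"],
    "form": ["10-K"] * 5,
    "filed": [20250303, 20240222, 20230223, 20220223, 20210222],
    "period": [20241231, 20231231, 20221231, 20211231, 20201231],
})

# Shuffle the rows to show the result does not depend on the incoming order
shuffled = sample_reports.sample(frac=1, random_state=0)

# Newest filing first, then take the top row
latest = shuffled.sort_values("filed", ascending=False).iloc[0]
print(latest["adsh"], latest["filed"])
```

`shuffled.loc[shuffled["filed"].idxmax()]` would select the same row without sorting the whole frame.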


In [22]:
latest_adsh

'0001618034-25-000003'

## Step 3: Extract Industry Information from SEC Data

The submission data (`sub_df`) of a filing records the filer's Standard Industrial Classification (SIC) code, along with details such as the business address and fiscal year end.

In [39]:
# Get the submission data which contains SIC codes and business info
collector = SingleReportCollector.get_report_by_adsh(adsh=latest_adsh)
report_data = collector.collect()

# Extract submission information
sub_info = report_data.sub_df.iloc[0]
print("Company Submission Information:")
print(f"Company Name: {sub_info['name']}")
print(f"Business Address: {sub_info['bas1']}, {sub_info['cityba']}, {sub_info['stprba']}")
print(f"SIC Code: {sub_info.get('sic', 'Not available')}")
print(f"Fiscal Year End: {sub_info.get('fye', 'Not available')}")
print(f"Form Type: {sub_info['form']}")
print(f"Filing Date: {sub_info['filed']}")

2025-10-04 16:26:29,810 [INFO] configmgt  reading configuration from C:\Users\Jesse\.secfsdstools.cfg


Company Submission Information:
Company Name: MOSAIC CO
Business Address: 101 EAST KENNEDY BLVD., TAMPA, FL
SIC Code: 2870.0
Fiscal Year End: 1231
Form Type: 10-K
Filing Date: 20250303
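The SIC code prints as `2870.0` because the `sic` column holds floats, most likely because some submissions carry no SIC code: pandas stores missing values in a numeric column as `NaN`, which forces `float64`. A short sketch with made-up values showing how pandas' nullable `Int64` dtype keeps the codes as integers while gaps remain `<NA>`:

```python
import pandas as pd

# Made-up SIC values: a single missing entry forces the column to float64
sic = pd.Series([2870.0, None, 2834.0], name="sic")
print(sic.dtype)  # float64

# Nullable integer dtype: whole-number codes, missing values shown as <NA>
sic_int = sic.astype("Int64")
print(sic_int)
```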


## Step 4: Look Up SIC Code Industry Classification

In [24]:
import os
root = Path(os.getcwd()).resolve()
root

WindowsPath('C:/Users/Jesse/Google Drive/management/finanaces/invest/modelling for investing/edgar_api/notebooks')

In [25]:
# SEC list of SIC codes (columns: SIC Code, Office, Industry Title)
SIC_mapping = pd.read_csv(root.parent / 'knowledge/sic_industry_code.csv')


In [26]:
SIC_mapping['SIC Code'].dtype

dtype('int64')

In [29]:
SIC_mapping

| | SIC Code | Office | Industry Title |
|---|---|---|---|
| 0 | 100 | Industrial Applications and Services | AGRICULTURAL PRODUCTION-CROPS |
| 1 | 200 | Industrial Applications and Services | AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES |
| 2 | 700 | Industrial Applications and Services | AGRICULTURAL SERVICES |
| 3 | 800 | Industrial Applications and Services | FORESTRY |
| 4 | 900 | Industrial Applications and Services | FISHING, HUNTING AND TRAPPING |
| ... | ... | ... | ... |
| 439 | 8880 | Office of International Corp Fin | AMERICAN DEPOSITARY RECEIPTS |
| 440 | 8888 | Office of International Corp Fin | FOREIGN GOVERNMENTS |
| 441 | 8900 | Office of Trade & Services | SERVICES-SERVICES, NEC |
| 442 | 9721 | Office of International Corp Fin | INTERNATIONAL AFFAIRS |


In [30]:

# Build a lookup from SIC code to industry title
sic_to_title = SIC_mapping.set_index('SIC Code')['Industry Title'].to_dict()

def get_industry_from_sic(sic_code):
    if pd.isna(sic_code):
        return "SIC code not available"

    sic_code = int(sic_code)

    # Try the exact four-digit code first, then fall back to the
    # three-digit industry group (e.g. 2875 -> 2870) and the
    # two-digit major group (e.g. 2875 -> 2800), in case the SEC
    # list only contains the broader entry
    for candidate in (sic_code, sic_code // 10 * 10, sic_code // 100 * 100):
        if candidate in sic_to_title:
            return sic_to_title[candidate]

    return f"Industry not found for SIC {sic_code}"

# Get industry classification
sic_code = sub_info.get('sic')
industry = get_industry_from_sic(sic_code)

print("\nMosaic Company Industry Classification:")
print(f"SIC Code: {sic_code}")
print(f"Industry: {industry}")


Mosaic Company Industry Classification:
SIC Code: 2870.0
Industry: AGRICULTURAL CHEMICALS
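SIC codes are hierarchical: the first two digits give the major group, the first three the industry group, and all four the specific industry. Mosaic's 2870 therefore sits in major group 28 (Chemicals and Allied Products). A small sketch that splits a code into those levels; `sic_levels` is a hypothetical helper, not part of secfsdstools.

```python
def sic_levels(sic_code) -> dict:
    """Split a four-digit SIC code into its hierarchy levels."""
    code = int(sic_code)  # accepts 2870 or 2870.0
    return {
        "major_group": code // 100,    # first two digits, e.g. 28
        "industry_group": code // 10,  # first three digits, e.g. 287
        "industry": code,              # full four-digit code, e.g. 2870
    }

print(sic_levels(2870.0))  # {'major_group': 28, 'industry_group': 287, 'industry': 2870}
```

These levels are useful when a four-digit code yields too few peers and a broader sector definition is needed.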


## Step 5: Find Other Companies in the Same Sector

The filing index does not carry SIC codes, so we first build a table that maps each company (CIK) to the SIC code of its most recent filing, and then filter that table by SIC code.

### Build a CIK-to-SIC table

In [31]:
# Read the full filing index to find sector peers
# Note: the index has one row per filing (adsh), so reading it may take some time
print("Searching for companies in the same sector...")
print("This may take a moment as we search through all companies in the database.")

# Get database accessor to read the full index
dbaccessor = search.dbaccessor
all_companies_df = dbaccessor.read_all_indexreports_df()

print(f"Total reports in index: {len(all_companies_df)}")
print(f"Looking for companies with SIC code {sic_code}...")

Searching for companies in the same sector...
This may take a moment as we search through all companies in the database.
Total reports in index: 406989
Looking for companies with SIC code 2870.0...


In [33]:
all_companies_df

| | adsh | cik | name | form | filed | period | fullPath | originFile | originFileType | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000065984-10-000168 | 65984 | ENTERGY CORP /DE/ | 10-Q/A | 20100903 | 20100630 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2010q3.zip | 2010q3.zip | quarter | https://www.sec.gov/Archives/edgar/data/65984/000006598410000168/0000065984-10-000168-index.htm |
| 1 | 0001107694-10-000038 | 1107694 | RACKSPACE HOSTING, INC. | 10-Q/A | 20100824 | 20100630 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2010q3.zip | 2010q3.zip | quarter | https://www.sec.gov/Archives/edgar/data/1107694/000110769410000038/0001107694-10-000038-index.htm |
| 2 | 0000031235-10-000089 | 31235 | EASTMAN KODAK CO | 10-Q | 20100728 | 20100630 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2010q3.zip | 2010q3.zip | quarter | https://www.sec.gov/Archives/edgar/data/31235/000003123510000089/0000031235-10-000089-index.htm |
| 3 | 0001140361-10-032573 | 36047 | CORELOGIC, INC. | 10-Q/A | 20100810 | 20100630 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2010q3.zip | 2010q3.zip | quarter | https://www.sec.gov/Archives/edgar/data/36047/000114036110032573/0001140361-10-032573-index.htm |
| 4 | 0001047469-10-007096 | 1063761 | SIMON PROPERTY GROUP INC /DE/ | 10-Q | 20100806 | 20100630 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2010q3.zip | 2010q3.zip | quarter | https://www.sec.gov/Archives/edgar/data/1063761/000104746910007096/0001047469-10-007096-index.htm |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 406984 | 0001907982-22-000014 | 1907982 | D-WAVE QUANTUM INC. | 10-Q | 20221110 | 20220930 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/1907982/000190798222000014/0001907982-22-000014-index.htm |
| 406985 | 0001915657-22-000091 | 1915657 | HF SINCLAIR CORP | 10-Q | 20221107 | 20220930 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/1915657/000191565722000091/0001915657-22-000091-index.htm |
| 406986 | 0001929561-22-000022 | 1929561 | RXO, INC. | 10-Q | 20221130 | 20220930 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/1929561/000192956122000022/0001929561-22-000022-index.htm |
| 406987 | 0001935979-22-000011 | 1935979 | BIOHAVEN LTD. | 10-Q | 20221109 | 20220930 | C:\Users\Jesse\secfsdstools\data\parquet\quarter\2022q4.zip | 2022q4.zip | quarter | https://www.sec.gov/Archives/edgar/data/1935979/000193597922000011/0001935979-22-000011-index.htm |
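Each row of this index is one filing (`adsh`), so a company with many 10-Qs and 10-Ks appears many times and the row count is a count of reports, not of companies. A tiny sketch of the distinction; the CIKs are taken from the rows above, while the ADSH strings are placeholders, not real accession numbers.

```python
import pandas as pd

# Tiny stand-in for all_companies_df: one row per filing, so a CIK can repeat
index_df = pd.DataFrame({
    "adsh": ["placeholder-1", "placeholder-2", "placeholder-3",
             "placeholder-4", "placeholder-5", "placeholder-6"],
    "cik": [65984, 65984, 1107694, 31235, 31235, 31235],
    "form": ["10-Q", "10-K", "10-Q", "10-Q", "10-Q", "10-K"],
})

n_filings = len(index_df)              # rows = filings
n_companies = index_df["cik"].nunique()  # distinct filers
print(f"{n_filings} filings from {n_companies} companies")
```

On the real index, `all_companies_df['cik'].nunique()` gives the number of distinct filers.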


In [90]:
import polars as pl
from pathlib import Path

def latest_sic_per_company_from_adsh_polars(all_companies_df, parquet_root):
    """Return adsh, cik and sic of each company's most recent filing that has a SIC code."""
    adshs = pl.Series("adsh", all_companies_df["adsh"].astype(str).tolist())
    glob = (Path(parquet_root).expanduser().resolve() / "**" / "sub.txt.parquet").as_posix()

    # Target schema for the columns we need; integer columns may be upcast to these
    # types and any other columns in the files are ignored
    schema = {"adsh": pl.Utf8, "cik": pl.Int32, "sic": pl.Float64, "filed": pl.Int32}
    sub = pl.scan_parquet(
        glob,
        schema=schema,
        cast_options=pl.ScanCastOptions(integer_cast="upcast"),
        extra_columns="ignore",
    )
    df = (
        sub
        # implode() makes the membership test explicit and avoids Polars' is_in deprecation warning
        .filter(pl.col("adsh").is_in(adshs.implode()) & pl.col("sic").is_not_null())
        .sort(["cik", "filed"], descending=[False, True])
        .unique(subset=["cik"], keep="first", maintain_order=True)  # latest filing per company
        .select(["adsh", "cik", "sic"])
        .collect()
    )
    return df
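For smaller inputs, or to cross-check the Polars result, the same "sort by filing date, keep the first row per CIK" logic can be written in pandas. A minimal sketch, assuming `sub_df` already holds the `adsh`, `cik`, `sic` and `filed` columns of the submission files, restricted to the ADSHs of interest; `latest_sic_per_company_pandas` and the sample rows are hypothetical.

```python
import pandas as pd

def latest_sic_per_company_pandas(sub_df: pd.DataFrame) -> pd.DataFrame:
    """Keep the most recent filing with a non-null SIC code for each CIK."""
    return (
        sub_df.dropna(subset=["sic"])
        .sort_values(["cik", "filed"], ascending=[True, False])
        .drop_duplicates(subset="cik", keep="first")  # newest filing per CIK
        .loc[:, ["adsh", "cik", "sic"]]
        .reset_index(drop=True)
    )

# Hypothetical submission rows: CIK 1 changed its SIC code, CIK 3 has none
sub_df = pd.DataFrame({
    "adsh": ["x-1", "x-2", "y-1", "z-1"],
    "cik": [1, 1, 2, 3],
    "sic": [2800.0, 2870.0, 7372.0, None],
    "filed": [20230101, 20240101, 20240315, 20240401],
})

print(latest_sic_per_company_pandas(sub_df))
```

Pandas loads everything into memory, so the lazy Polars scan above remains the better fit for the full set of quarterly parquet files.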

In [98]:
sic_table = latest_sic_per_company_from_adsh_polars(all_companies_df, parquet_root=r'C:\Users\Jesse\secfsdstools\data\parquet\quarter')
# convert to pandas so company names can be added from all_companies_df
sic_table = sic_table.to_pandas()
# map each CIK to a company name using the unique (cik, name) pairs in all_companies_df
cik_name_mapping = all_companies_df[['cik', 'name']].drop_duplicates().set_index('cik')['name'].to_dict()
sic_table['company_name'] = sic_table['cik'].map(cik_name_mapping)
# build a SIC code -> industry title lookup (applied in the next cell)
sic_to_ind_map = SIC_mapping.set_index('SIC Code')['Industry Title'].to_dict()




In [100]:
sic_table['industry'] = sic_table['sic'].map(sic_to_ind_map)
sic_table = sic_table.drop(columns=['adsh'])
sic_table

| | cik | sic | company_name | industry |
|---|---|---|---|---|
| 0 | 1750 | 3720.0 | AAR CORP | AIRCRAFT & PARTS |
| 1 | 1800 | 2834.0 | ABBOTT LABORATORIES | PHARMACEUTICAL PREPARATIONS |
| 2 | 1961 | 7372.0 | WORLDS INC | SERVICES-PREPACKAGED SOFTWARE |
| 3 | 2034 | 5122.0 | ACETO CORP | WHOLESALE-DRUGS, PROPRIETARIES & DRUGGISTS' SUNDRIES |
| 4 | 2098 | 3420.0 | ACME UNITED CORP | CUTLERY, HANDTOOLS & GENERAL HARDWARE |
| ... | ... | ... | ... | ... |
| 16129 | 2055459 | 6770.0 | REPUBLIC DIGITAL ACQUISITION CO | BLANK CHECKS |
| 16130 | 2057043 | 6770.0 | WEN ACQUISITION CORP | BLANK CHECKS |
| 16131 | 2058758 | 6036.0 | AVIDIA BANCORP, INC. | SAVINGS INSTITUTIONS, NOT FEDERALLY CHARTERED |
| 16132 | 2061473 | 6770.0 | PERIMETER ACQUISITION CORP. I | BLANK CHECKS |
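`drop_duplicates()` on `['cik', 'name']` keeps one row per (CIK, name) pair, so a company that changed its name appears more than once, and `set_index('cik')...to_dict()` silently keeps whichever name comes last in the frame. A minimal sketch that instead takes the name from each CIK's latest filing, assuming the index has a `filed` column as shown earlier; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical index rows: CIK 100 filed under two different names
index_df = pd.DataFrame({
    "cik": [100, 100, 200],
    "name": ["OLD NAME INC", "NEW NAME INC", "OTHER CO"],
    "filed": [20200301, 20240301, 20240115],
})

# Sort oldest to newest, then keep each CIK's last (most recent) row
cik_name_mapping = (
    index_df.sort_values("filed")
    .drop_duplicates(subset="cik", keep="last")
    .set_index("cik")["name"]
    .to_dict()
)
print(cik_name_mapping)
```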


In [101]:
# Save the table so it can be reloaded without rebuilding it
sic_table.to_csv(root.parent / 'knowledge/sic_latest_per_company.csv', index=False)

### Get companies in the sector

In [106]:
sic_to_cik_table = pd.read_csv(root.parent / 'knowledge/sic_latest_per_company.csv')
chosen_sic = 2870  # SIC code found for Mosaic in Step 4 (Agricultural Chemicals)
# Get companies in the chosen sector
sector_companies = sic_to_cik_table[sic_to_cik_table['sic'] == chosen_sic]
sector_companies

| | cik | sic | company_name | industry |
|---|---|---|---|---|
| 40 | 5981 | 2870.0 | AMERICAN VANGUARD CORP | AGRICULTURAL CHEMICALS |
| 1468 | 750150 | 2870.0 | WESTBRIDGE RESEARCH GROUP | AGRICULTURAL CHEMICALS |
| 1983 | 825542 | 2870.0 | SCOTTS MIRACLE-GRO CO | AGRICULTURAL CHEMICALS |
| 2210 | 855931 | 2870.0 | POTASH CORP OF SASKATCHEWAN INC | AGRICULTURAL CHEMICALS |
| 2220 | 857949 | 2870.0 | ENLIGHTIFY INC. | AGRICULTURAL CHEMICALS |
| 2338 | 868725 | 2870.0 | RENTECH, INC. | AGRICULTURAL CHEMICALS |
| 2403 | 875729 | 2870.0 | BION ENVIRONMENTAL TECHNOLOGIES INC | AGRICULTURAL CHEMICALS |
| 2439 | 879575 | 2870.0 | TERRA NITROGEN CO L P /DE | AGRICULTURAL CHEMICALS |
| 3217 | 941221 | 2870.0 | ISRAEL CHEMICALS LTD | AGRICULTURAL CHEMICALS |
| 3224 | 943003 | 2870.0 | AGRIUM INC | AGRICULTURAL CHEMICALS |
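An exact four-digit match can be narrow. A sketch of widening the peer set to the whole two-digit major group (28xx for SIC 2870), using a few rows copied from the `sic_table` output above as `sample_table`. The result also shows the trade-off: Abbott Laboratories (2834, pharmaceutical preparations) falls in the same major group, so a broader definition needs judgment about which peers are really comparable.

```python
import pandas as pd

# A few rows copied from the sic_table output above
sample_table = pd.DataFrame({
    "cik": [1750, 1800, 1961, 5981, 825542],
    "sic": [3720.0, 2834.0, 7372.0, 2870.0, 2870.0],
    "company_name": ["AAR CORP", "ABBOTT LABORATORIES", "WORLDS INC",
                     "AMERICAN VANGUARD CORP", "SCOTTS MIRACLE-GRO CO"],
})

chosen_sic = 2870
# Exact four-digit match, as in the cell above
exact_peers = sample_table[sample_table["sic"] == chosen_sic]

# Broader peer set: every company in the same two-digit major group (28xx)
major_group = chosen_sic // 100
group_peers = sample_table[sample_table["sic"] // 100 == major_group]
print(group_peers["company_name"].tolist())
```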


### Load statements of the sector companies

## Summary and Conclusions

This notebook demonstrates how to:

1. **Find a company**: Use `IndexSearch.find_company_by_name()` to locate companies
2. **Extract industry information**: Get the SIC code from the submission data of the latest 10-K
3. **Classify industry sector**: Map SIC codes to industry titles
4. **Find sector peers**: Build a CIK-to-SIC table from each company's most recent filing and filter it by SIC code
5. **Prepare for comparative analysis**: Save the CIK-to-SIC table so sector peer lists can be reused

### Key Findings:
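- Mosaic Co (CIK 1285785) is classified under SIC code 2870, Agricultural Chemicals
- Peers with the same SIC code include American Vanguard, Scotts Miracle-Gro, Potash Corp of Saskatchewan, Agrium and Israel Chemicals
- The saved CIK-to-SIC table (`knowledge/sic_latest_per_company.csv`) provides a starting point for sector-wide financial analysis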

### Next Steps for Sector Analysis:
1. Use `MultiReportCollector` to gather financial data from all sector companies
2. Apply standardizers to make data comparable across companies
3. Analyze key metrics like revenue, profitability, and asset efficiency
4. Identify industry trends and company relative performance
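Once comparable figures (for example revenue and net income) have been collected for each peer, relative performance can be expressed as a position within the peer group. A minimal sketch with made-up numbers; the column names are hypothetical and not the standardized output of secfsdstools.

```python
import pandas as pd

# Hypothetical figures (millions of USD), for illustration only
metrics = pd.DataFrame({
    "company": ["PEER A", "PEER B", "PEER C", "PEER D"],
    "revenue": [1000.0, 2500.0, 400.0, 1200.0],
    "net_income": [50.0, 300.0, -20.0, 96.0],
})

# Profitability metric, then each company's position within the peer group
metrics["net_margin"] = metrics["net_income"] / metrics["revenue"]
metrics["margin_pct_rank"] = metrics["net_margin"].rank(pct=True)  # 1.0 = best in group
metrics["margin_vs_median"] = metrics["net_margin"] - metrics["net_margin"].median()
print(metrics.sort_values("net_margin", ascending=False))
```

Comparing against the peer median rather than the mean keeps one outlier from shifting the benchmark.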