**Table of contents**<a id='toc0_'></a>    
- [Libraries](#toc1_)    
- [Loading text filings](#toc2_)    
- [Extracting the Management's Discussion and Analysis Section](#toc3_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Libraries](#toc0_)

In [28]:
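# Reload edited project modules automatically before each cell runs
%load_ext autoreload
%autoreload 2

import os

import pandas as pd

# Work from the project root; raises KeyError if PROJECT_PATH is unset
os.chdir(os.environ['PROJECT_PATH'])

from secnlp import utils as u
from secnlp.ml_logic import data as d
from secnlp.ml_logic import parsing as p
# Star import supplies the constants used below (SERVICE_ACCOUNT, PROJECT, DATASET_ID, ...)
from secnlp.params import *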



# <a id='toc2_'></a>[Loading text filings](#toc0_)

In [3]:
df = u.read_data_from_bq(
    credentials=SERVICE_ACCOUNT,
    gcp_project=PROJECT,
    bq_dataset=DATASET_ID,
    table=FILINGS_10KQ_TABLE_ID,
)

In [4]:
df['date_filed'] = pd.to_datetime(df['date_filed'])

In [6]:
# Draw 10 random 10-K filings filed in 2023 and download the raw text of each
is_10k_2023 = (df['date_filed'].dt.year == 2023) & (df['form_type'] == '10-K')
filing_sample_10k = df[is_10k_2023].sample(10)
filing_sample_10k['raw_filing'] = filing_sample_10k['file_name'].apply(
    lambda url: d.fetch_text_from_url(url, agent=AGENT)
)


In [7]:
# Draw 10 random 10-Q filings filed in 2023 and download the raw text of each
is_10q_2023 = (df['date_filed'].dt.year == 2023) & (df['form_type'] == '10-Q')
filing_sample_10q = df[is_10q_2023].sample(10)
filing_sample_10q['raw_filing'] = filing_sample_10q['file_name'].apply(
    lambda url: d.fetch_text_from_url(url, agent=AGENT)
)
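
Both samples above are drawn with `.sample(10)` and no seed, so a different set of filings comes back on every run and the outputs below cannot be reproduced exactly. A minimal sketch of a seeded alternative, assuming the same `date_filed` and `form_type` columns; `sample_filings` and its parameters are hypothetical names, not part of `secnlp`:

```python
import pandas as pd

def sample_filings(df, form_type, year, n=10, seed=42):
    """Return a reproducible random sample of filings of one form type filed in one year."""
    mask = (df["date_filed"].dt.year == year) & (df["form_type"] == form_type)
    subset = df[mask]
    # Cap n at the subset size so small subsets do not raise ValueError
    return subset.sample(n=min(n, len(subset)), random_state=seed)
```

With a fixed `seed`, calling `sample_filings(df, '10-K', 2023)` twice on the same frame returns the same rows.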


# 10-K filings

In [8]:
# Map each output column to the 10-K item it is extracted from
items_10k = {'mda': '7', 'business': '1', 'risk_factors': '1a', 'market_risk': '7a'}
for column, item in items_10k.items():
    filing_sample_10k[column] = filing_sample_10k['raw_filing'].apply(
        lambda text: p.parse_10k_filing_items(text, item=item)
    )
filing_sample_10k

| | cik | company | form_type | date_filed | file_name | raw_filing | mda | business | risk_factors | market_risk |
|---|---|---|---|---|---|---|---|---|---|---|
| 70336 | 1043951 | CAMPBELL FUND TRUST | 10-K | 2023-03-24 | edgar/data/1043951/0001140361-23-013738.txt | `<SEC-DOCUMENT>0001140361-23-013738.txt : 20230...` | item 7 managements discussion and analysis of ... | item 1 business general development of busines... | item 1a risk factors general investment relate... | item 7a quantitative and qualitative disclosur... |
| 5208 | 56873 | KROGER CO | 10-K | 2023-03-28 | edgar/data/56873/0001558370-23-004767.txt | `<SEC-DOCUMENT>0001558370-23-004767.txt : 20230...` | item 7 management39s discussion and analysis o... | item 1 business 8203 the kroger co the company... | item 1a risk factors 8203 there are risks and ... | item 7a quantitative and qualitative disclosur... |
| 203561 | 746515 | EXPEDITORS INTERNATIONAL OF WASHINGTON INC | 10-K | 2023-03-01 | edgar/data/746515/0000950170-23-005412.txt | `<SEC-DOCUMENT>0000950170-23-005412.txt : 20230...` | item 7 x201cmanagementx2019s discussion and an... | item 1 business 3 | item 1a risk factors 16 | item 7a which include additional factors that ... |
| 209119 | 1803858 | Wells Fargo Commercial Mortgage Trust 2020-C56 | 10-K | 2023-03-16 | edgar/data/1803858/0001888524-23-003416.txt | `<SEC-DOCUMENT>0001888524-23-003416.txt : 20230...` | item 7 managements discussion and analysis of ... | item 1 business omitted | item 1a risk factors omitted | item 7a quantitative and qualitative disclosur... |
| 200733 | 1763502 | CSAIL 2019-C15 Commercial Mortgage Trust | 10-K | 2023-03-16 | edgar/data/1763502/0001888524-23-003379.txt | `<SEC-DOCUMENT>0001888524-23-003379.txt : 20230...` | item 7 managements discussion and analysis of ... | item 1 business omitted | item 1a risk factors omitted | item 7a quantitative and qualitative disclosur... |
| 183836 | 1831964 | NORTHERN REVIVAL ACQUISITION Corp | 10-K | 2023-05-01 | edgar/data/1831964/0001213900-23-034032.txt | `<SEC-DOCUMENT>0001213900-23-034032.txt : 20230...` | item 7 managements discussion and analysis of ... | item 1 business overview we are a newly organi... | item 1a risk factors elsewhere in this annual ... | item 7a quantitative and qualitative disclosur... |
| 137875 | 1852749 | ExcelFin Acquisition Corp. | 10-K | 2023-03-30 | edgar/data/1852749/0001104659-23-038506.txt | `<SEC-DOCUMENT>0001104659-23-038506.txt : 20230...` | item 7 managements discussion and analysis of ... | item 1 business introduction we are a blank ch... | item 1a risk factors summary of risk factors a... | item 7a quantitative and qualitative disclosur... |
| 43197 | 1424864 | Rise Gold Corp. | 10-K | 2023-10-30 | edgar/data/1424864/0001062993-23-019828.txt | `<SEC-DOCUMENT>0001062993-23-019828.txt : 20231...` | item 7 managements discussion and analysis of ... | item 1 business description of business we are... | item 1a risk factors risks related to our busi... | item 7a quantitative and qualitative disclosur... |
| 68994 | 1617640 | ZILLOW GROUP, INC. | 10-K | 2023-02-15 | edgar/data/1617640/0001617640-23-000010.txt | `<SEC-DOCUMENT>0001617640-23-000010.txt : 20230...` | item 7 of this annual report on form 10 k 13 s... | item 1 business overview we are reimagining re... | item 1a risk factors of this report including ... | item 7a quantitative and qualitative disclosur... |
| 76083 | 849869 | SILGAN HOLDINGS INC | 10-K | 2023-02-23 | edgar/data/849869/0000849869-23-000019.txt | `<SEC-DOCUMENT>0000849869-23-000019.txt : 20230...` | item 7 managements discussion and analysis of ... | item 1 business 1 | item 1a risk factors 15 | item 7a quantitative and qualitative disclosur... |
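
Several extracts above contain stray digit runs such as `management39s`, `8203`, and `x201cmanagementx2019s`. These look like HTML character references (`&#39;`, `&#8203;`, `&#x201c;`, `&#x2019;`) whose `&`, `#`, and `;` were stripped during cleaning, leaving only the code point behind. A minimal sketch of decoding the references before punctuation is removed, using only the standard library. `normalize_entities` is a hypothetical helper; the cleaning steps inside `p.parse_10k_filing_items` are not shown in this notebook, so where this call would fit is an assumption:

```python
import html

# Curly quotes become ASCII quotes, the zero-width space is dropped,
# and no-break spaces become ordinary spaces.
_CHAR_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u200b": None,
    "\u202f": " ", "\u00a0": " ",
})

def normalize_entities(text: str) -> str:
    """Decode HTML character references, then normalize quotes and invisible spaces."""
    return html.unescape(text).translate(_CHAR_MAP)
```

Applied to the raw filing before any punctuation stripping, this would turn `Management&#39;s` into `Management's` rather than `management39s`.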


# 10-Q filings

In [29]:
# In a 10-Q, MD&A is Part I, Item 2 and Risk Factors is Part II, Item 1A
filing_sample_10q['mda'] = filing_sample_10q['raw_filing'].apply(
    lambda text: p.parse_10q_filing_items(text, item='2')
)
filing_sample_10q['risk_factors'] = filing_sample_10q['raw_filing'].apply(
    lambda text: p.parse_10q_filing_items(text, item='1a')
)

filing_sample_10q

| | cik | company | form_type | date_filed | file_name | raw_filing | mda | risk_factors |
|---|---|---|---|---|---|---|---|---|
| 409537 | 1932244 | Lever Global Corp | 10-Q | 2023-08-14 | edgar/data/1932244/0001493152-23-028176.txt | `<SEC-DOCUMENT>0001493152-23-028176.txt : 20230...` | item 2 managements discussion and analysis of ... | item 1a risk factors8239 in accordance with th... |
| 405386 | 1528356 | Genie Energy Ltd. | 10-Q | 2023-05-09 | edgar/data/1528356/0001213900-23-037589.txt | `<SEC-DOCUMENT>0001213900-23-037589.txt : 20230...` | item 2 managemen ts d iscussion and analysis ... | item 1a to part i risk factors in our annual r... |
| 279696 | 1792044 | Viatris Inc | 10-Q | 2023-08-07 | edgar/data/1792044/0001792044-23-000024.txt | `<SEC-DOCUMENT>0001792044-23-000024.txt : 20230...` | item 2 unregistered sales of equity securities... | item 1a in the 2022 form 10 k and our other fi... |
| 256712 | 1447669 | TWILIO INC | 10-Q | 2023-11-09 | edgar/data/1447669/0001447669-23-000226.txt | `<SEC-DOCUMENT>0001447669-23-000226.txt : 20231...` | item 2 unregistered sales of equity securities... | item 1a risk factors in this quarterly report ... |
| 683165 | 1459188 | FUEL DOCTOR HOLDINGS, INC. | 10-Q | 2023-11-13 | edgar/data/1459188/0001213900-23-086069.txt | `<SEC-DOCUMENT>0001213900-23-086069.txt : 20231...` | item 2 managements discussion and analysis of ... | item 1a risk factors as a smaller reporting co... |
| 711949 | 1718405 | HYCROFT MINING HOLDING CORP | 10-Q | 2023-05-01 | edgar/data/1718405/0001718405-23-000069.txt | `<SEC-DOCUMENT>0001718405-23-000069.txt : 20230...` | item 2 managements discussion and analysis of ... | item 1a risk factors as the company qualifies ... |
| 314746 | 1729750 | Kubient, Inc. | 10-Q | 2023-05-22 | edgar/data/1729750/0001410578-23-001352.txt | `<SEC-DOCUMENT>0001410578-23-001352.txt : 20230...` | item 2 managements discussion and analysis of ... | item 1a risk factors 20 item 2 unregistered sa... |
| 261035 | 701288 | ATRION CORP | 10-Q | 2023-11-07 | edgar/data/701288/0001654954-23-013879.txt | `<SEC-DOCUMENT>0001654954-23-013879.txt : 20231...` | item 2 unregistered sales of equity securities... | item 1a risk factors 21 item 2 unregistered sa... |
| 827990 | 1826681 | Sarcos Technology & Robotics Corp | 10-Q | 2023-08-09 | edgar/data/1826681/0000950170-23-040636.txt | `<SEC-DOCUMENT>0000950170-23-040636.txt : 20230...` | item 2 management x2019s discussion and analys... | item 1a risk factors and elsewhere in this rep... |
| 326491 | 1854526 | brooqLy, Inc. | 10-Q | 2023-08-14 | edgar/data/1854526/0001477932-23-006022.txt | `<SEC-DOCUMENT>0001477932-23-006022.txt : 20230...` | item 2 managements discussion and analysis of ... | item 1a risk factors as a smaller reporting co... |
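
Three of the `mda` extracts above (Viatris, Twilio, Atrion) start with "item 2 unregistered sales of equity securities". A 10-Q has two sections numbered Item 2: Part I, Item 2 is the MD&A, while Part II, Item 2 covers unregistered sales of equity securities, so matching on the item number alone is ambiguous. One possible way to disambiguate is to require the MD&A title after the item number and keep the longest matching span, which also skips table-of-contents entries. This is a sketch under assumptions, not how `p.parse_10q_filing_items` works: `extract_10q_mda` is a hypothetical helper, and it assumes lower-cased text with punctuation already removed, as in the extracts shown.

```python
import re

# "item 2" followed by the MD&A title, e.g. "item 2 managements discussion"
MDA_HEADING = re.compile(r"item\s+2\W+management\W*s?\W+discussion", re.IGNORECASE)
# Any item heading such as "item 3" or "item 1a"
NEXT_ITEM = re.compile(r"item\s+\d+[a-z]?\b", re.IGNORECASE)

def extract_10q_mda(text):
    """Return the longest span that starts at an 'item 2 management...' heading
    and ends at the next item heading, or None if no such heading exists."""
    best = None
    for match in MDA_HEADING.finditer(text):
        following = NEXT_ITEM.search(text, match.end())
        end = following.start() if following else len(text)
        candidate = text[match.start():end]
        # Table-of-contents hits are short, so the longest candidate wins
        if best is None or len(candidate) > len(best):
            best = candidate
    return best
```

A cross-reference such as "see item 1a" inside the MD&A body would cut the span short, and headings broken by stray spaces or entity residue (as in the Genie Energy and Sarcos rows) would not match, so a production version would need stricter heading detection.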
