**Table of contents**<a id='toc0_'></a>    
- [Libraries](#toc1_)    
- [Loading text filings](#toc2_)    
- [Extracting the Management's Discussion and Analysis Section](#toc3_)    


# <a id='toc1_'></a>[Libraries](#toc0_)

In [1]:
import os
import time
# Run from the project root; os.environ[...] raises a clear KeyError if PROJECT_PATH is unset
os.chdir(os.environ['PROJECT_PATH'])
from secnlp.ml_logic import data as d
from secnlp.ml_logic import parsing as p
import secnlp.ml_logic.parsing
from secnlp import utils as u
from secnlp.params import *
import pandas as pd
import importlib

# <a id='toc2_'></a>[Loading text filings](#toc0_)

In [2]:
# Load the 10-K/10-Q filings index from BigQuery
df = u.read_data_from_bq(
    credentials=SERVICE_ACCOUNT,
    gcp_project=PROJECT,
    bq_dataset=DATASET_ID,
    table=FILINGS_10KQ_TABLE_ID,
)

In [3]:
df['date_filed'] = pd.to_datetime(df['date_filed'])

In [9]:
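# Draw 10 random 10-K filings from 2023 (no random_state is set, so the sample changes on each run)
filing_sample_10k = df[(df['date_filed'].dt.year == 2023) & (df['form_type'] == '10-K')].sample(10)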
filing_sample_10k['raw_filing'] = filing_sample_10k['file_name'].apply(lambda url: d.fetch_text_from_url(url, agent = AGENT))
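
The `file_name` values are relative EDGAR paths (for example `edgar/data/1174850/0001174850-23-000008.txt`), and the notebook does not show how `d.fetch_text_from_url` resolves them. The sketch below is one way such a helper could work, not the project's implementation: it assumes the paths are joined to the public `https://www.sec.gov/Archives/` base and that `AGENT` is sent as the `User-Agent` header. The function names and the `pause` parameter are hypothetical.

```python
import time
import urllib.request

# Assumption: relative EDGAR paths are resolved against the public archive base.
SEC_ARCHIVES = "https://www.sec.gov/Archives/"


def build_filing_url(file_name):
    """Turn a relative path such as 'edgar/data/.../x.txt' into a full URL."""
    return SEC_ARCHIVES + file_name.lstrip("/")


def build_request(file_name, agent):
    """Create a request that identifies the caller through the User-Agent header."""
    return urllib.request.Request(
        build_filing_url(file_name),
        headers={"User-Agent": agent},
    )


def fetch_filing_text(file_name, agent, pause=0.2):
    """Download one filing as text, sleeping briefly first to throttle requests."""
    time.sleep(pause)
    with urllib.request.urlopen(build_request(file_name, agent), timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Keeping URL construction separate from the network call makes the path handling easy to check without contacting EDGAR.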


In [14]:
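# Draw 10 random 10-Q filings from 2023 (no random_state is set, so the sample changes on each run)
filing_sample_10q = df[(df['date_filed'].dt.year == 2023) & (df['form_type'] == '10-Q')].sample(10)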
filing_sample_10q['raw_filing'] = filing_sample_10q['file_name'].apply(lambda url: d.fetch_text_from_url(url, agent = AGENT))


# <a id='toc3_'></a>[Extracting the Management's Discussion and Analysis Section](#toc0_)

# 10-Ks

In [13]:
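# NOTE: item '1a' selects Item 1A (Risk Factors); in a 10-K, MD&A is Item 7, so 'mda' holds risk-factor text here
filing_sample_10k['mda'] = filing_sample_10k['raw_filing'].apply(lambda text: p.parse_10k_filing_items(text, item = '1a'))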
filing_sample_10k

Unnamed: 0,cik,company,form_type,date_filed,file_name,raw_filing,mda
103579,1174850,NICOLET BANKSHARES INC,10-K,2023-02-24,edgar/data/1174850/0001174850-23-000008.txt,<SEC-DOCUMENT>0001174850-23-000008.txt : 20230...,item 1a risk factors this item outlines specif...
36315,1393311,Public Storage,10-K,2023-02-21,edgar/data/1393311/0001393311-23-000012.txt,<SEC-DOCUMENT>0001393311-23-000012.txt : 20230...,item 1a risk factors of this report and in our...
208332,1100682,"CHARLES RIVER LABORATORIES INTERNATIONAL, INC.",10-K,2023-02-22,edgar/data/1100682/0001100682-23-000006.txt,<SEC-DOCUMENT>0001100682-23-000006.txt : 20230...,item 1a risk factors and item 7 managements di...
37950,1104038,"VerifyMe, Inc.",10-K,2023-03-28,edgar/data/1104038/0001214659-23-004301.txt,<SEC-DOCUMENT>0001214659-23-004301.txt : 20230...,item 1a risk factors and those other risks and...
181580,1043121,BOSTON PROPERTIES LTD PARTNERSHIP,10-K,2023-02-27,edgar/data/1043121/0001656423-23-000013.txt,<SEC-DOCUMENT>0001656423-23-000013.txt : 20230...,item 1a risk factors for a discussion of these...
199712,1921031,Harley-Davidson Motorcycle Trust 2022-A,10-K,2023-03-23,edgar/data/1921031/0001628280-23-009002.txt,<SEC-DOCUMENT>0001628280-23-009002.txt : 20230...,item 1a risk factors
60602,884269,ALPHA PRO TECH LTD,10-K,2023-03-16,edgar/data/884269/0001437749-23-006845.txt,<SEC-DOCUMENT>0001437749-23-006845.txt : 20230...,item 1a risk factors making or continuing an i...
157800,1293234,Select Notes Trust LT 2004-1,10-K,2023-03-13,edgar/data/1293234/0001068238-23-000059.txt,<SEC-DOCUMENT>0001068238-23-000059.txt : 20230...,item 1a risk factors not applicable
165988,4904,AMERICAN ELECTRIC POWER CO INC,10-K,2023-02-23,edgar/data/4904/0000004904-23-000011.txt,<SEC-DOCUMENT>0000004904-23-000011.txt : 20230...,item 1a risk factors general risks of regulate...
114679,1060219,"SALISBURY BANCORP, INC.",10-K,2023-03-10,edgar/data/1060219/0001554795-23-000063.txt,<SEC-DOCUMENT>0001554795-23-000063.txt : 20230...,item 1a risk factors salisbury is the register...
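
Several extracted values in this output are very short (`item 1a risk factors`, `item 1a risk factors not applicable`) or appear to begin mid-sentence, which may mean the parser matched a cross-reference or a table-of-contents entry instead of the section heading. A minimal check for flagging such rows is sketched below. The helper name and the 50-word threshold are arbitrary assumptions and are not part of `secnlp`.

```python
def is_suspiciously_short(section, min_words=50):
    """Return True when an extracted section is missing or has fewer than min_words words."""
    if not isinstance(section, str):
        # None, NaN, and other non-string values count as failed extractions.
        return True
    return len(section.split()) < min_words
```

Applied to the sample with `filing_sample_10k['mda'].apply(is_suspiciously_short)`, this would yield a boolean column for manual review. Some short results can still be legitimate, for example a filer that states the item is not applicable.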


# 10-Qs

In [19]:
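# NOTE: item '1a' selects Item 1A (Risk Factors), found in Part II of a 10-Q; MD&A is Part I, Item 2
filing_sample_10q['mda'] = filing_sample_10q['raw_filing'].apply(lambda text: p.parse_10q_filing_items(text, item = '1a'))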
filing_sample_10q

Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a
Unable to locate Item 1a


Unnamed: 0,cik,company,form_type,date_filed,file_name,raw_filing,mda
380442,1848672,Glidelogic Corp.,10-Q,2023-06-16,edgar/data/1848672/0001829126-23-004208.txt,<SEC-DOCUMENT>0001829126-23-004208.txt : 20230...,
855224,1290476,Village Bank & Trust Financial Corp.,10-Q,2023-11-13,edgar/data/1290476/0001558370-23-018847.txt,<SEC-DOCUMENT>0001558370-23-018847.txt : 20231...,
331344,1376139,CVR ENERGY INC,10-Q,2023-08-01,edgar/data/1376139/0001376139-23-000046.txt,<SEC-DOCUMENT>0001376139-23-000046.txt : 20230...,
761526,1658521,"MOUNTAIN TOP PROPERTIES, INC.",10-Q,2023-08-09,edgar/data/1658521/0001017386-23-000282.txt,<SEC-DOCUMENT>0001017386-23-000282.txt : 20230...,
758781,1645590,Hewlett Packard Enterprise Co,10-Q,2023-06-02,edgar/data/1645590/0001645590-23-000066.txt,<SEC-DOCUMENT>0001645590-23-000066.txt : 20230...,
232668,9389,BALL Corp,10-Q,2023-05-04,edgar/data/9389/0001558370-23-008025.txt,<SEC-DOCUMENT>0001558370-23-008025.txt : 20230...,
533139,928576,MIDAMERICAN ENERGY CO,10-Q,2023-08-07,edgar/data/928576/0001081316-23-000028.txt,<SEC-DOCUMENT>0001081316-23-000028.txt : 20230...,
518895,1726173,Biglari Holdings Inc.,10-Q,2023-08-04,edgar/data/1726173/0001726173-23-000036.txt,<SEC-DOCUMENT>0001726173-23-000036.txt : 20230...,
424592,1393584,American Well Corp,10-Q,2023-05-03,edgar/data/1393584/0000950170-23-017229.txt,<SEC-DOCUMENT>0000950170-23-017229.txt : 20230...,
330482,24090,"CITIZENS, INC.",10-Q,2023-08-04,edgar/data/24090/0000024090-23-000094.txt,<SEC-DOCUMENT>0000024090-23-000094.txt : 20230...,
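
All ten 10-Q filings return an empty result, so `p.parse_10q_filing_items` does not find Item 1A in any of them. The cause is not visible in this notebook. As a starting point for debugging, the sketch below shows one heuristic for locating Item 1A in a 10-Q. It is not the `secnlp` parser: the function name is hypothetical, and it assumes the input is plain text with any HTML markup already removed. It collects every `item 1a` match, cuts each candidate at the next `item 2`, and keeps the longest candidate so that table-of-contents entries and short cross-references are skipped.

```python
import re


def extract_10q_item_1a(text):
    """Return the longest span from an 'item 1a' match up to the next 'item 2', or None."""
    # Collapse whitespace and lowercase so headings match regardless of layout.
    normalized = re.sub(r"\s+", " ", text).lower()
    best = None
    for match in re.finditer(r"item\s*1a\b", normalized):
        start = match.start()
        # In Part II of a 10-Q, Item 2 follows Item 1A.
        next_item = re.search(r"item\s*2\b", normalized[start + 1:])
        end = start + 1 + next_item.start() if next_item else len(normalized)
        candidate = normalized[start:end].strip()
        if best is None or len(candidate) > len(best):
            best = candidate
    return best
```

Running this on a single `raw_filing` value next to the project parser could show whether the failure comes from the heading pattern or from an earlier cleaning step.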
