# The `sec-edgar-downloader` library

- Use this library to fetch AAPL's latest 10-K report

In [6]:
import glob

files = glob.glob("**/*.txt", recursive=True)
print(files)

['requirements.txt', 'sec-edgar-filings/AAPL/10-K/0000320193-24-000123/full-submission.txt', 'rag-finance/lib/python3.13/site-packages/shellingham-1.5.4.dist-info/top_level.txt', 'rag-finance/lib/python3.13/site-packages/langchain_openai-0.3.14.dist-info/entry_points.txt', 'rag-finance/lib/python3.13/site-packages/coloredlogs-15.0.1.dist-info/entry_points.txt', 'rag-finance/lib/python3.13/site-packages/coloredlogs-15.0.1.dist-info/top_level.txt', 'rag-finance/lib/python3.13/site-packages/coloredlogs-15.0.1.dist-info/LICENSE.txt', 'rag-finance/lib/python3.13/site-packages/opentelemetry_instrumentation_fastapi-0.53b1.dist-info/entry_points.txt', 'rag-finance/lib/python3.13/site-packages/google_auth-2.39.0.dist-info/top_level.txt', 'rag-finance/lib/python3.13/site-packages/uri_template-1.3.0.dist-info/top_level.txt', 'rag-finance/lib/python3.13/site-packages/humanfriendly-10.0.dist-info/entry_points.txt', 'rag-finance/lib/python3.13/site-packages/humanfriendly-10.0.dist-info/top_level.txt
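The recursive `**/*.txt` pattern also walks into the `rag-finance` virtual environment, which is why the output above is dominated by `site-packages` metadata files. A minimal sketch of a narrower search follows, assuming the environment directory is named `rag-finance` as in that output. `find_txt_files` and its `skip_dirs` parameter are hypothetical names, not part of any library.

```python
import os


def find_txt_files(root=".", skip_dirs=("rag-finance", ".git")):
    """Return .txt paths under root, skipping the named directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(".txt"):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

Calling `find_txt_files()` from the project root should then list only project files such as `requirements.txt` and the downloaded `full-submission.txt`.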

- The downloaded report is saved at the path below
- `sec-edgar-filings/AAPL/10-K/{report_id}/full-submission.txt`
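The path template can be taken apart again to recover the ticker, form type, and report ID. The sketch below assumes the `<ticker>/<form>/<report_id>/full-submission.txt` layout shown in the template; `FilingPath` and `parse_filing_path` are hypothetical helper names introduced only for illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FilingPath(NamedTuple):
    ticker: str
    form: str
    accession: str  # the {report_id} segment of the path


def parse_filing_path(path: str) -> FilingPath:
    """Split '<root>/<ticker>/<form>/<report_id>/full-submission.txt'."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[-1] != "full-submission.txt":
        raise ValueError(f"unexpected filing path: {path!r}")
    ticker, form, accession = parts[-4:-1]
    return FilingPath(ticker, form, accession)


info = parse_filing_path(
    "sec-edgar-filings/AAPL/10-K/0000320193-24-000123/full-submission.txt"
)
print(info.ticker, info.form, info.accession)
```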

In [10]:
file_path = "sec-edgar-filings/AAPL/10-K/0000320193-24-000123/full-submission.txt"

with open(file_path, encoding="utf-8") as f:
    text = f.read()

print(text[:1000])  # preview the first 1,000 characters

<SEC-DOCUMENT>0000320193-24-000123.txt : 20241101
<SEC-HEADER>0000320193-24-000123.hdr.sgml : 20241101
<ACCEPTANCE-DATETIME>20241101060136
ACCESSION NUMBER:		0000320193-24-000123
CONFORMED SUBMISSION TYPE:	10-K
PUBLIC DOCUMENT COUNT:		103
CONFORMED PERIOD OF REPORT:	20240928
FILED AS OF DATE:		20241101
DATE AS OF CHANGE:		20241101

FILER:

	COMPANY DATA:	
		COMPANY CONFORMED NAME:			Apple Inc.
		CENTRAL INDEX KEY:			0000320193
		STANDARD INDUSTRIAL CLASSIFICATION:	ELECTRONIC COMPUTERS [3571]
		ORGANIZATION NAME:           	06 Technology
		IRS NUMBER:				942404110
		STATE OF INCORPORATION:			CA
		FISCAL YEAR END:			0928

	FILING VALUES:
		FORM TYPE:		10-K
		SEC ACT:		1934 Act
		SEC FILE NUMBER:	001-36743
		FILM NUMBER:		241416806

	BUSINESS ADDRESS:	
		STREET 1:		ONE APPLE PARK WAY
		CITY:			CUPERTINO
		STATE:			CA
		ZIP:			95014
		BUSINESS PHONE:		(408) 996-1010

	MAIL ADDRESS:	
		STREET 1:		ONE APPLE PARK WAY
		CITY:			CUPERTINO
		STATE:			CA
		ZIP:			95014

	FORMER COMPANY:	
		FORMER
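The header preview above is a series of `KEY:<tabs>value` lines, so it can be turned into a dictionary with a small regular expression. This is a rough sketch rather than a full SGML parser: `parse_header_fields` is a hypothetical helper, the closing `</SEC-HEADER>` tag is assumed to exist further down the file (it is not visible in the preview), and repeated keys such as `STREET 1` keep only their first value.

```python
import re

# 'KEY:' at the start of a line, then spaces/tabs, then a non-empty value.
HEADER_FIELD = re.compile(
    r"^[ \t]*([A-Z][A-Z0-9 ]+):[ \t]+(\S.*?)[ \t]*$", re.MULTILINE
)


def parse_header_fields(text: str) -> dict:
    """Collect 'KEY: value' pairs from the header; first occurrence wins."""
    # Assumption: the header ends at </SEC-HEADER>; if the tag is absent,
    # the whole text is scanned instead.
    header = text.split("</SEC-HEADER>", 1)[0]
    fields = {}
    for key, value in HEADER_FIELD.findall(header):
        fields.setdefault(key.strip(), value)
    return fields


# A few lines copied from the preview above.
sample = (
    "<SEC-HEADER>0000320193-24-000123.hdr.sgml : 20241101\n"
    "ACCESSION NUMBER:\t\t0000320193-24-000123\n"
    "CONFORMED SUBMISSION TYPE:\t10-K\n"
    "FILED AS OF DATE:\t\t20241101\n"
    "FILER:\n"
    "\tCOMPANY DATA:\t\n"
    "\t\tCOMPANY CONFORMED NAME:\t\t\tApple Inc.\n"
)
fields = parse_header_fields(sample)
print(fields["COMPANY CONFORMED NAME"], fields["FILED AS OF DATE"])
```

Section labels such as `FILER:` and `COMPANY DATA:` carry no value on the same line, so the pattern skips them.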

- Open `full-submission.txt` at the path above and load it as text
- The document is stored in the `text` variable

In [9]:
import glob

# Automatically locate the most recently downloaded AAPL 10-K report
paths = glob.glob("sec-edgar-filings/AAPL/10-K/*/full-submission.txt")
paths.sort()  # sort by report ID (accession number)
latest_file = paths[-1]

with open(latest_file, encoding="utf-8") as f:
    text = f.read()

print(f"Loaded file: {latest_file}")
print(text[:500])

Loaded file: sec-edgar-filings/AAPL/10-K/0000320193-24-000123/full-submission.txt
<SEC-DOCUMENT>0000320193-24-000123.txt : 20241101
<SEC-HEADER>0000320193-24-000123.hdr.sgml : 20241101
<ACCEPTANCE-DATETIME>20241101060136
ACCESSION NUMBER:		0000320193-24-000123
CONFORMED SUBMISSION TYPE:	10-K
PUBLIC DOCUMENT COUNT:		103
CONFORMED PERIOD OF REPORT:	20240928
FILED AS OF DATE:		20241101
DATE AS OF CHANGE:		20241101

FILER:

	COMPANY DATA:	
		COMPANY CONFORMED NAME:			Apple Inc.
		CENTRAL INDEX KEY:			0000320193
		STANDARD INDUSTRIAL CLASSIFICATION:	ELECTRONIC COMPUTERS [3571]
		O
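Sorting the paths as strings orders them by report ID. That ID begins with the identifier of whoever submitted the filing, not with a date, so the last entry is not guaranteed to be the newest once IDs with different prefixes sit in the same folder. A hedged alternative is to read the `FILED AS OF DATE` field shown in the header and pick the maximum. `filed_date` and `latest_filing` are hypothetical helpers, and the sketch assumes the field appears within the first 4 KB of each file.

```python
import glob
import re

FILED_DATE = re.compile(r"^FILED AS OF DATE:\s*(\d{8})", re.MULTILINE)


def filed_date(path: str) -> str:
    """Return the YYYYMMDD filing date from the header, or '' if missing."""
    with open(path, encoding="utf-8", errors="replace") as f:
        head = f.read(4096)  # assumption: header fields sit at the top
    match = FILED_DATE.search(head)
    return match.group(1) if match else ""


def latest_filing(pattern: str) -> str:
    """Pick the path whose header reports the most recent filing date."""
    paths = glob.glob(pattern)
    if not paths:
        raise FileNotFoundError(f"no filings match {pattern!r}")
    # YYYYMMDD strings sort chronologically, so max() on them is enough.
    return max(paths, key=filed_date)
```

With this in place, `latest_file = latest_filing("sec-edgar-filings/AAPL/10-K/*/full-submission.txt")` could stand in for the `paths.sort()` and `paths[-1]` pair.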


In [11]:
import glob

# Check where the file was actually saved
matches = glob.glob("**/AAPL/10-K/*/full-submission.txt", recursive=True)

if matches:
    print("Files found:")
    for m in matches:
        print(" -", m)
else:
    print("❌ No files were found.")

Files found:
 - sec-edgar-filings/AAPL/10-K/0000320193-24-000123/full-submission.txt
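The header reports `PUBLIC DOCUMENT COUNT: 103`, so `full-submission.txt` bundles the main report together with its exhibits. The sketch below separates them, assuming each one is wrapped in `<DOCUMENT>...</DOCUMENT>` with a `<TYPE>` line and a `<TEXT>...</TEXT>` body. That tag layout is not visible in the previews in this notebook, so treat it as an assumption to verify against the file. `split_documents` is a hypothetical helper and the sample string is toy data.

```python
import re

DOCUMENT_BLOCK = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
TYPE_TAG = re.compile(r"<TYPE>([^\n<]+)")
TEXT_BLOCK = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def split_documents(submission: str) -> list:
    """Return (type, body) pairs for each <DOCUMENT> block found."""
    docs = []
    for block in DOCUMENT_BLOCK.findall(submission):
        type_match = TYPE_TAG.search(block)
        text_match = TEXT_BLOCK.search(block)
        doc_type = type_match.group(1).strip() if type_match else ""
        body = text_match.group(1).strip() if text_match else ""
        docs.append((doc_type, body))
    return docs


# Toy input that mimics the assumed layout.
sample = (
    "<DOCUMENT>\n<TYPE>10-K\n<SEQUENCE>1\n<TEXT>\nmain report body\n</TEXT>\n</DOCUMENT>\n"
    "<DOCUMENT>\n<TYPE>EX-21.1\n<SEQUENCE>2\n<TEXT>\nexhibit body\n</TEXT>\n</DOCUMENT>\n"
)
docs = split_documents(sample)
main_report = next(body for doc_type, body in docs if doc_type == "10-K")
print(len(docs), main_report)
```

Filtering on the `10-K` type keeps only the main report, which is usually the part worth chunking and indexing.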


In [1]:
!pip install --upgrade sec-edgar-downloader



In [6]:
from sec_edgar_downloader import Downloader

dl = Downloader("SoosungEng", "dronesquare@soosungeng.com", download_folder="data")
dl.get("10-K", "MSFT", limit=1)

1

In [7]:
!ls data/sec-edgar-filings/MSFT/10-K/
!ls data/sec-edgar-filings/MSFT/10-K/*/full-submission.txt

0000950170-24-087843


data/sec-edgar-filings/MSFT/10-K/0000950170-24-087843/full-submission.txt


In [1]:
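from langchain_community.vectorstores import FAISS
# Newer LangChain releases move this class to the `langchain_huggingface` package.
from langchain_community.embeddings import HuggingFaceEmbeddings

index_path = "data/index/MSFT"
embedding_model = HuggingFaceEmbeddings(model_name="intfloat/e5-base-v2")
# The saved index is unpickled on load, so enable this flag only for index files you created yourself.
db = FAISS.load_local(index_path, embedding_model, allow_dangerous_deserialization=True)

# `_dict` is a private attribute of the in-memory docstore; it may change between versions.
all_docs = list(db.docstore._dict.values())

# Plain substring search over every stored chunk
keyword = "16%"
matched = [doc for doc in all_docs if keyword in doc.page_content]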

for i, doc in enumerate(matched[:3]):
    print(f"\n--- Document {i+1} ---")
    print("Item:", doc.metadata.get("item"))
    print("Title:", doc.metadata.get("item_title"))
    print(doc.page_content[:800])




--- Document 1 ---
Item: Item 8
Title: of this Form 10-K). This section generally discusses the results of our operations for the year ended June 30, 2024 compared to the year ended June 30, 2023. For a discussion of the year ended June 30, 2023 compared to the year ended June 30, 2022, please refer to Part II, Item 7, “Management’s Discussion and Analysis of Financial Condition and Results of Operations” in our Annual Report on Form 10-K for the year ended June 30, 2023.
We generate revenue by offering a wide range of cloud-based solutions, content, and other services to people and businesses; licensing and supporting an array of software products; delivering relevant online advertising to a global audience; and designing and selling devices. Our most significant expenses are related to compensating employees; supporting and investing in our cloud-based services, including datacenter operations; designing, manufacturing, marketing, and selling our other products and services; and inc
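The substring test `keyword in doc.page_content` also matches chunks that contain `116%` or `0.16%`, because `16%` occurs inside both. A stricter match can be sketched with a regular expression. `contains_percentage` is a hypothetical helper that works on plain strings, so it can be applied to `doc.page_content` without touching the index.

```python
import re


def contains_percentage(text: str, value: str) -> bool:
    """True if `value` followed by '%' appears as a standalone number."""
    # The lookbehind rejects a preceding digit or decimal point, so a search
    # for "16" does not match "116%" or "0.16%".
    pattern = r"(?<![\d.])" + re.escape(value) + r"\s?%"
    return re.search(pattern, text) is not None


print(contains_percentage("Revenue increased 16% year over year", "16"))
print(contains_percentage("Revenue increased 116% year over year", "16"))
```

In the cell above, the filter would then read `matched = [doc for doc in all_docs if contains_percentage(doc.page_content, "16")]`.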