# Download 10-Ks

This code reads 10-K URLs from a list file and downloads the corresponding filings from the SEC's EDGAR system. The breakdown below explains how the code works and how to adapt it to your needs:

# 1. Import Libraries:

In [1]:
import requests
import os
import time


- requests: Used to download the 10-K filings from the SEC website.
- os: Used to create directories and manage file paths.
- time: Used for rate-limiting to avoid sending too many requests in a short time.

# 2. Function to Extract 10-K URLs:



In [2]:
def extract_10k_urls(file_path):
    urls = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            if line.startswith('10-K'):
                parts = line.split(",")
                # Assuming the URL is the last part of the line
                url = parts[-1].strip()
                urls.append(f"https://www.sec.gov/Archives/{url}")
                # if len(urls) == 10:  # Limit to first 10 URLs
                #    break
    return urls


- extract_10k_urls(file_path): This function opens the list file and keeps the lines that start with 10-K, the form type. Because this is a prefix check, it also matches variants such as 10-K/A. It splits each matching line on commas, assumes the last field is the filing's path, and prepends https://www.sec.gov/Archives/ to build the full URL. To limit the extraction to the first 10 URLs, uncomment the two commented lines inside the loop.
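If only original 10-K filings are wanted, an exact comparison on the first field is one option. The sketch below is not part of the notebook: the function name extract_exact_10k_urls and the five-column sample rows are hypothetical, chosen only to mirror the "form type first, filing path last" layout that extract_10k_urls assumes.

```python
import csv
import io

ARCHIVES_BASE = "https://www.sec.gov/Archives/"

def extract_exact_10k_urls(lines):
    """Return URLs only for rows whose first field is exactly '10-K'."""
    urls = []
    for row in csv.reader(lines):
        # Skip blank rows; compare the whole form-type field, not a prefix
        if row and row[0].strip() == "10-K":
            urls.append(ARCHIVES_BASE + row[-1].strip())
    return urls

# Hypothetical sample rows (assumed layout: form type, company, CIK, date, path)
sample = io.StringIO(
    "10-K,Example Corp,1234567,2024-03-01,edgar/data/1234567/0001234567-24-000001.txt\n"
    "10-K/A,Example Corp,1234567,2024-04-01,edgar/data/1234567/0001234567-24-000002.txt\n"
    "10-Q,Other Inc,7654321,2024-05-01,edgar/data/7654321/0007654321-24-000003.txt\n"
)
urls = extract_exact_10k_urls(sample)
print(urls)
```

With these sample rows, only the first line is kept; the 10-K/A and 10-Q rows are skipped. Using csv.reader instead of split(",") also handles quoted fields, which matters if company names in your file contain commas inside quotes.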

# 3. Function to Download 10-K Filings:



In [3]:
def download_10k_files(urls, folder_name):
    os.makedirs(folder_name, exist_ok=True)  # Ensure the folder exists
    
    headers = {
        'User-Agent': 'TestBot +spoo@test.ai' # Update with your actual name and contact email
    }    

    request_interval = 1.0 / 10  # Interval between requests, in seconds (10 requests per second)
    
    for i, url in enumerate(urls, start=1):
        start_time = time.time()  # Start time of the request
        
        # A timeout keeps one stalled connection from hanging the whole run
        response = requests.get(url, headers=headers, timeout=30)
        if response.status_code == 200:
            # Extract the file name from the URL
            url_path = url.split('/')
            file_name = url_path[-1]  # The last part of the path is the file name
            
            # Save the file in the specified folder
            file_path = os.path.join(folder_name, file_name)
            with open(file_path, 'wb') as file:
                file.write(response.content)
            print(f"Downloaded {file_path}")
        else:
            print(f"Failed to download file from {url}. Status Code: {response.status_code}")
        
        # Rate-limiting to avoid overwhelming the SEC's servers
        elapsed_time = time.time() - start_time
        if elapsed_time < request_interval:
            time.sleep(request_interval - elapsed_time)


- download_10k_files(urls, folder_name): This function downloads each URL in the list of URLs and saves the 10-K filings to the specified folder. It uses rate-limiting to ensure no more than 10 requests per second are sent to the SEC’s servers.
- If the download is successful (status_code == 200), the file is saved with the name extracted from the URL.
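The two small calculations inside the download loop, deriving the local file name and working out how long to sleep, can be isolated and checked without touching the network. This is a minimal sketch; the helper names local_path_for and remaining_delay are hypothetical and do not appear in the notebook.

```python
import os
from urllib.parse import urlparse

def local_path_for(url, folder_name):
    """Build the destination path from the last segment of the URL path."""
    file_name = os.path.basename(urlparse(url).path)
    return os.path.join(folder_name, file_name)

def remaining_delay(elapsed_time, request_interval=1.0 / 10):
    """Seconds still to wait so that requests start at least request_interval apart."""
    return max(0.0, request_interval - elapsed_time)

url = "https://www.sec.gov/Archives/edgar/data/1770787/0001770787-24-000017.txt"
print(local_path_for(url, "10K"))
print(remaining_delay(0.03))  # request took 30 ms, so some waiting remains
print(remaining_delay(0.5))   # request took longer than the interval, no wait
```

Because the sleep is computed from the time the request actually took, slow responses do not add extra delay on top of the interval, which is the same behavior as the elapsed_time check in download_10k_files.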

# 4. Set File Path and Download 10-K Filings:



In [4]:
file_path = '/Users/spoorthy/Projects/Accounting/combined_filtered.csv'  # Update this with the actual path to your index list file (CSV)
urls_10k = extract_10k_urls(file_path)  # Extract URLs for 10-K filings

folder_name = "/Users/spoorthy/Projects/Accounting/10K"  # Update this with the desired path to store downloaded files
download_10k_files(urls_10k, folder_name)  # Download the 10-K filings


Downloaded /Users/spoorthy/Projects/Accounting/10K/0001770787-24-000017.txt
Downloaded /Users/spoorthy/Projects/Accounting/10K/0001213900-24-025262.txt
Downloaded /Users/spoorthy/Projects/Accounting/10K/0000950170-24-038577.txt


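- Set File Path: Replace '/Users/spoorthy/Projects/Accounting/combined_filtered.csv' with the actual path to your index list file (a comma-separated file whose lines begin with the form type and end with the filing path).
- Set Folder Path: Replace '/Users/spoorthy/Projects/Accounting/10K' with the path where the 10-K filings should be saved. The folder is created if it does not exist.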

# 5. Final Notes:
- Make sure the paths to the index list file and to the output folder are correct before running. The rate limiter keeps the script at or below 10 requests per second, the limit the SEC publishes for automated access to EDGAR; exceeding it can get your requests blocked. The SEC also expects a descriptive User-Agent header, so replace the placeholder value with your own details.
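A quick pre-flight check of both paths can catch a typo before any requests are sent. The sketch below is an optional addition, not part of the notebook; check_paths is a hypothetical helper, and the demonstration uses temporary paths so it runs anywhere.

```python
import os
import tempfile
from pathlib import Path

def check_paths(index_file, output_folder):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    if not Path(index_file).is_file():
        problems.append(f"Index file not found: {index_file}")
    # os.makedirs creates a missing folder, so only an existing non-folder is a problem
    if Path(output_folder).exists() and not Path(output_folder).is_dir():
        problems.append(f"Output path exists but is not a folder: {output_folder}")
    return problems

# Demonstration with temporary paths (hypothetical file contents)
with tempfile.TemporaryDirectory() as tmp:
    index_file = os.path.join(tmp, "combined_filtered.csv")
    Path(index_file).write_text(
        "10-K,Example Corp,1234567,2024-03-01,edgar/data/1234567/example.txt\n",
        encoding="utf-8",
    )
    ok = check_paths(index_file, os.path.join(tmp, "10K"))
    missing = check_paths(os.path.join(tmp, "no_such_file.csv"), os.path.join(tmp, "10K"))

print(ok)
print(missing)
```

In the notebook, the same call could be made with file_path and folder_name just before download_10k_files, stopping early if the returned list is not empty.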