In [1]:
import requests
from bs4 import BeautifulSoup

# Section One: Define the Parameters of the Search
To create a search, we need to build a URL that leads to a valid results query. This means taking our base endpoint and attaching parameters that narrow the search. I'll do my best to explain how each of these parameters works, but unfortunately there is no formal documentation on them.

**Endpoint:** The endpoint for our EDGAR query is https://www.sec.gov/cgi-bin/browse-edgar. If you go to this link without any additional parameters, the request is invalid.

--------------------------------------------------------------------
### Parameters:

- **action:** (required) Should be set to getcompany.

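- **CIK:** (required) The CIK (Central Index Key) number of the company you are searching for. A later request in this notebook searches by name with a company parameter instead.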

- **type:** (optional) Filters results by form type. For example, if set to 10-k, only the 10-K filings are returned.

- **dateb:** (optional) Returns only the filings before a given date, written in the format YYYYMMDD.

- **owner:** (required) Controls how ownership filings are handled. It is set to exclude by default; you may also set it to include or only.

- **start:** (optional) The starting index of the results. For example, if there are 100 results and you want to start at number 45, pass 45.

- **state:** (optional) The company's state.

- **filenum:** (optional) The filing number.

- **sic:** (optional) The company's SIC (Standard Industrial Classification) identifier.

- **output:** (optional) Defines the returned data structure as either XML (atom) or normal HTML.

- **count:** (optional) The number of results to return per request. The maximum is 100; if not set, it defaults to 40.

------------------------------------------------------------------------------
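To see how the parameters above combine into a single query string, here is a minimal sketch that builds the URL offline, without sending anything to the SEC. The helper `build_edgar_url` is a hypothetical name introduced for illustration, and it uses the standard library's `urllib.parse.urlencode` rather than `requests`. The validation rules simply mirror the descriptions in the list above (the allowed `owner` values, the `count` limit of 100, and the YYYYMMDD date format).

```python
from datetime import date
from urllib.parse import urlencode

ENDPOINT = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_edgar_url(cik, form_type=None, before=None,
                    owner="exclude", count=40, output=None):
    """Assemble a browse-edgar URL (hypothetical helper, for illustration)."""
    # owner values as described in the parameter list
    if owner not in {"exclude", "include", "only"}:
        raise ValueError(f"unexpected owner value: {owner!r}")
    # the list above states a maximum of 100 results per request
    if not 1 <= count <= 100:
        raise ValueError("count must be between 1 and 100")

    params = {
        "action": "getcompany",
        "CIK": cik,
        "type": form_type,
        # dateb expects YYYYMMDD
        "dateb": before.strftime("%Y%m%d") if before else None,
        "owner": owner,
        "output": output,
        "count": count,
    }
    # drop parameters that were not supplied so they stay out of the URL
    params = {key: value for key, value in params.items() if value is not None}
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_edgar_url("789019", form_type="10-K",
                      before=date(2019, 1, 1), output="atom", count=100)
print(url)
```

Building the URL this way makes it easy to inspect a query before requesting it. Passing the same dictionary to `requests.get` through `params` produces an equivalent query string.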
Now that we understand the parameters, let's make a request by defining our endpoint and a dictionary of parameters, where each key is a parameter name and each value is the value we want to set for that parameter. Once we've defined these two components, we can make our request and parse the response using BeautifulSoup.

In [2]:
# base URL for the SEC EDGAR browser
endpoint = r"https://www.sec.gov/cgi-bin/browse-edgar"

# define our parameters dictionary
param_dict = {'action':'getcompany',
              'CIK':'789019',
              'type':'10-k',
              'dateb':'20190101',
              'owner':'exclude',
              'start':'',
              'output':'atom',
              'count':'100'}

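# request the URL, then parse the response
# (sec.gov may reject requests that lack a descriptive User-Agent header;
#  if that happens, pass your own via the headers argument)
response = requests.get(url=endpoint, params=param_dict)
soup = BeautifulSoup(response.content, 'lxml')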

# print the status code and the full request URL
print(response.status_code)
print(response.url)

200
https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=789019&type=10-k&dateb=20190101&owner=exclude&start=&output=atom&count=100
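Since count is capped at 100, a company with more filings than that needs several requests, each with a different start value. Below is a minimal sketch of generating those parameter dictionaries. It assumes that start is a zero-based offset and that the total number of results is known in advance; the figure of 250 is made up for illustration. `page_params` is a hypothetical helper, not part of requests or EDGAR.

```python
def page_params(base_params, total, count=100):
    """Yield one parameter dictionary per page of results."""
    for start in range(0, total, count):
        # copy the base parameters so the original dictionary is not modified
        yield {**base_params, "start": str(start), "count": str(count)}


base = {"action": "getcompany",
        "CIK": "789019",
        "type": "10-k",
        "owner": "exclude",
        "output": "atom"}

# 250 is an illustrative total, not a real result count
pages = list(page_params(base, total=250))
print([page["start"] for page in pages])  # ['0', '100', '200']
```

Each dictionary can then be passed as the params argument of a separate request. In practice the total is usually not known up front, so a loop that stops when a page comes back with no entries may be a better fit.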


In [3]:
# base URL for the SEC EDGAR browser
endpoint = r"https://www.sec.gov/cgi-bin/browse-edgar"

# define our parameters dictionary
param_dict = {'action':'getcompany',
              'owner':'exclude',
              'output':'atom',
              'company':'Microsoft'}

# request the URL, then parse the response
response = requests.get(url=endpoint, params=param_dict)
soup = BeautifulSoup(response.content, 'lxml')

# print the status code and the full request URL
print(response.status_code)
print(response.url)

200
https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&owner=exclude&output=atom&company=Microsoft


In [None]:
# find all entry tags

entries = soup.find_all("entry")

#intilia