# Analytics Mindset EDGAR Explorer 

This notebook contains the solution to Part 2 of the EDGAR Explorer cases using Python. The solution uses Tesla, whose CIK is 1318605, as its example.

Note that CIK numbers are 10 digits long. If your company's CIK has fewer than 10 digits, add leading zeros until it has 10. In the example below, Tesla's CIK is entered as 0001318605, which is 10 digits long.
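
The padding rule above can be sketched in a few lines. This is only an illustration: the helper name pad_cik is hypothetical and is not used elsewhere in the notebook.

```python
def pad_cik(cik):
    """Return a CIK as a 10-character string, left-padded with zeros."""
    # str() lets the helper accept either an integer or a string
    return str(cik).strip().zfill(10)


print(pad_cik(1318605))       # Tesla's CIK given as an integer
print(pad_cik("0001318605"))  # already 10 digits, so it is returned unchanged
```

Storing the CIK as a string rather than an integer matters here, because an integer cannot hold leading zeros.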

The code below defines the variable headers as follows:
headers = {'User-Agent' : 'your university  youremail@university.edu'}

Before running the code, change 'your university  youremail@university.edu' to the name of your university and your university email address.

In [1]:
import json
import requests
import pandas as pd


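# Tesla's CIK; the format call left-pads it with zeros to 10 digits
cik = "0001318605"
cik10 = "{:>010s}".format(cik)
url = "https://data.sec.gov/submissions/CIK" + cik10 + ".json"
# Replace the placeholder with your university name and email address
headers = {'User-Agent' : 'your university  youremail@university.edu'}
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop here if the request was rejected
data = response.json()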



#### Confirm that the JSON object returned is a dictionary:
The Python syntax to find the type of an object is type(name of object). We expect the output to be dict, which means the object is a dictionary. A dictionary is an object that stores key-value pairs. The values can themselves be other objects; in this case they include strings, dictionaries, and lists.

In [2]:
type(data)

dict
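
To make the idea of key-value pairs concrete, here is a minimal sketch using a small made-up dictionary. The name sample and all of its contents are illustrative assumptions, not values from the SEC response.

```python
# A made-up dictionary whose values are a string, a list, and a dictionary
sample = {
    "name": "Example Co.",
    "tickers": ["EXMP"],
    "filings": {"recent": {}, "files": []},
}

print(type(sample))             # the outer object is a dictionary
print(type(sample["name"]))     # look up a value by its key, then check its type
print(type(sample["tickers"]))
print(type(sample["filings"]))
```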

#### Identify the keys associated with the top-level dictionary:

In [3]:
data.keys()

dict_keys(['cik', 'entityType', 'sic', 'sicDescription', 'ownerOrg', 'insiderTransactionForOwnerExists', 'insiderTransactionForIssuerExists', 'name', 'tickers', 'exchanges', 'ein', 'description', 'website', 'investorWebsite', 'category', 'fiscalYearEnd', 'stateOfIncorporation', 'stateOfIncorporationDescription', 'addresses', 'phone', 'flags', 'formerNames', 'filings'])

#### Confirm the company's CIK and Name (note that the CIK will drop the leading zeros):

In [4]:
data['cik']

'1318605'

In [5]:
data['name']

'Tesla, Inc.'

#### Identify the data type for each of the values associated with these keys:

In [6]:
for k,v in data.items():
    dt=type(v)
    print (k,dt)

cik <class 'str'>
entityType <class 'str'>
sic <class 'str'>
sicDescription <class 'str'>
ownerOrg <class 'str'>
insiderTransactionForOwnerExists <class 'int'>
insiderTransactionForIssuerExists <class 'int'>
name <class 'str'>
tickers <class 'list'>
exchanges <class 'list'>
ein <class 'str'>
description <class 'str'>
website <class 'str'>
investorWebsite <class 'str'>
category <class 'str'>
fiscalYearEnd <class 'str'>
stateOfIncorporation <class 'str'>
stateOfIncorporationDescription <class 'str'>
addresses <class 'dict'>
phone <class 'str'>
flags <class 'str'>
formerNames <class 'list'>
filings <class 'dict'>
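
As a small illustration of which object the loop should inspect, the sketch below uses a made-up dictionary (the names sample, value_types, and view_type are hypothetical). Calling type() on the loop variable v reports the type of each value, whereas calling it on the result of .items() reports the same view type on every pass.

```python
# Made-up dictionary standing in for the SEC response
sample = {"name": "Example Co.", "tickers": ["EXMP"], "filings": {"recent": {}}}

value_types = {}
for k, v in sample.items():
    # v is the value stored under key k, so type(v) differs from key to key
    value_types[k] = type(v).__name__
    print(k, type(v))

# The items view itself has a single type, regardless of what it contains
view_type = type(sample.items()).__name__
print(view_type)
```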


#### Identify where the data relating to the type of forms filed by the entity are in the data object

As a hint: it is a list nested within several dictionaries. Note that in the prior answer, the final key is filings, and its value is a dictionary. This is likely where the forms data is located, because it holds information about the filings, whereas the earlier keys tend to relate to entity-level facts such as the company's CIK, name, and ticker symbol.

Start by saving the filings dictionary into a new variable called data2 and rerunning the loop to identify its keys:

In [7]:
data2 = data['filings']
for k,v in data2.items():
    dt=type(v)
    print (k,dt)

recent <class 'dict'>
files <class 'list'>


This is an example of a nested JSON data structure as referenced in the case. Python lets you access a dictionary nested within another dictionary by adding its key, in square brackets, after the key of the outer dictionary. For example, to access the data in dictionary2, which is nested within dictionary1, we can use the following syntax: data['dictionary1']['dictionary2']. Next, take a look at the data stored under the key files:

In [8]:
type(data['filings']['files'])

list
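
The chained-key syntax described above can be sketched on a small made-up structure. The dictionary sample and the key 'archive' are illustrative assumptions, not part of the SEC response.

```python
# Made-up structure that mirrors the nesting of the SEC response
sample = {
    "filings": {
        "recent": {"form": ["4", "8-K"]},
        "files": [],
    }
}

# One step at a time ...
filings = sample["filings"]
recent = filings["recent"]
forms_stepwise = recent["form"]

# ... or in a single chained expression
forms_chained = sample["filings"]["recent"]["form"]
print(forms_chained)

# .get() returns None instead of raising KeyError when a key is absent
missing = sample["filings"].get("archive")
print(missing)
```

Both routes reach the same list; the chained form simply avoids the intermediate variables.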

The data is stored as a list rather than a dictionary, so it has no keys of its own to loop over. Printing the list will allow us to see what it contains.

In [9]:
data['filings']['files']

[{'name': 'CIK0001318605-submissions-001.json',
  'filingCount': 541,
  'filingFrom': '2005-02-17',
  'filingTo': '2016-09-06'}]

The printout shows that this list holds technical information about the other JSON files associated with the entity's filings, including the number of filings each one summarizes (an entity with more than 2000 filings will have multiple entries in the list). The number of filings can be confirmed with the field 'filingCount'. This suggests that the forms data is within the field recent.
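
As a sketch of how the entries in this list might be used, the code below builds an address for each additional file and totals the filing counts. The base address is an assumption inferred from the submissions URL used at the top of the notebook, and the names BASE_URL, older_urls, and older_count are hypothetical; nothing is downloaded here.

```python
# Assumed base address, inferred from the submissions URL used earlier
BASE_URL = "https://data.sec.gov/submissions/"

# The single entry printed above for Tesla
files = [
    {
        "name": "CIK0001318605-submissions-001.json",
        "filingCount": 541,
        "filingFrom": "2005-02-17",
        "filingTo": "2016-09-06",
    }
]

# One address per additional file, plus the total filings they summarize
older_urls = [BASE_URL + entry["name"] for entry in files]
older_count = sum(entry["filingCount"] for entry in files)

for entry, url in zip(files, older_urls):
    print(entry["filingFrom"], "to", entry["filingTo"], "->", url)
print("Filings in additional files:", older_count)
```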

Repeating the type check above for recent:

In [10]:
type(data['filings']['recent'])

dict

The above code shows that the field recent (nested within filings) is a dictionary. We can save the dictionary recent into the variable data2 and then rerun the loop above to see which fields are nested within it:

In [11]:
data2 = data['filings']['recent']
for k,v in data2.items():
    dt=type(v)
    print (k,dt)

accessionNumber <class 'list'>
filingDate <class 'list'>
reportDate <class 'list'>
acceptanceDateTime <class 'list'>
act <class 'list'>
form <class 'list'>
fileNumber <class 'list'>
filmNumber <class 'list'>
items <class 'list'>
core_type <class 'list'>
size <class 'list'>
isXBRL <class 'list'>
isInlineXBRL <class 'list'>
primaryDocument <class 'list'>
primaryDocDescription <class 'list'>


We have now found the field 'form', which contains all of the forms filed by the company. The dictionary 'recent' is likely a dictionary of lists, as each of the fields above, such as accessionNumber, filingDate, and form, is expected to contain information about multiple filings made by the entity. As we are interested in the field form, we will examine its data type, remembering that it is nested within recent, which is nested within filings. As before, we chain the keys:

In [12]:
type(data['filings']['recent']['form'])

list
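
A dictionary of lists like this is a columnar layout: position i in every list describes the same filing. The sketch below checks that property and rebuilds per-filing records. The values are made up; only the shape mirrors data['filings']['recent'], and the names recent_sample, lengths, same_length, and rows are hypothetical.

```python
# Made-up values; only the shape mirrors data['filings']['recent']
recent_sample = {
    "accessionNumber": ["A-001", "A-002", "A-003"],
    "filingDate": ["2024-03-01", "2024-02-01", "2024-01-01"],
    "form": ["4", "8-K", "10-K"],
}

# Every list should have the same length if each position is one filing
lengths = {field: len(values) for field, values in recent_sample.items()}
same_length = len(set(lengths.values())) == 1
print(lengths, same_length)

# Zip the lists together to get one dictionary per filing
rows = [dict(zip(recent_sample.keys(), values))
        for values in zip(*recent_sample.values())]
print(rows[0])
```

This equal-length property is also what allows the whole dictionary to be passed directly to pd.DataFrame, with one column per field.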

#### Save the list of forms filed as a pandas dataframe 
We will call it data3:

In [30]:
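# Full table of recent filings, one column per field
alldata = pd.DataFrame(data['filings']['recent'])
# Single-column dataframe of form types; the column is named 0 by default
data3=pd.DataFrame(data['filings']['recent']['form'])
# Rows of the full table where the form type is exactly '10-K'
all10ks = alldata[alldata['form'] == '10-K']

print(data3)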

In [31]:
print(all10ks)

          accessionNumber  filingDate  reportDate        acceptanceDateTime  \
82   0001628280-24-002390  2024-01-29  2023-12-31  2024-01-26T21:00:20.000Z   
183  0000950170-23-001409  2023-01-31  2022-12-31  2023-01-30T21:29:15.000Z   
283  0000950170-22-000796  2022-02-07  2021-12-31  2022-02-04T20:11:27.000Z   
415  0001564590-21-004599  2021-02-08  2020-12-31  2021-02-08T07:27:23.000Z   
553  0001564590-20-004475  2020-02-13  2019-12-31  2020-02-13T07:12:18.000Z   
706  0001564590-19-003165  2019-02-19  2018-12-31  2019-02-19T06:10:16.000Z   
813  0001564590-18-002956  2018-02-23  2017-12-31  2018-02-23T06:07:43.000Z   
919  0001564590-17-003118  2017-03-01  2016-12-31  2017-03-01T16:54:21.000Z   

    act  form fileNumber filmNumber items core_type      size  isXBRL  \
82   34  10-K  001-34756   24569853            XBRL  15527801       1   
183  34  10-K  001-34756   23570030            XBRL  31445171       1   
283  34  10-K  001-34756   22595227            XBRL  29316024       1

#### Identify the most recently filed form and the dates and timestamps associated with that form.



Note that the API returns filings in reverse chronological order, with the most recent filing first. We can check this by looking at the field 'filingDate', which contains the date of each filing in the 'recent' dictionary; note that 'filingDate' is a list. To get the first item from a list, you can use the Python syntax [0]. Each subsequent filing date follows in order, so the second filing is [1], and so on. To get a range of filing dates from the list, we can use slice syntax. Because the end index is excluded, [0:4] returns the first four filing dates (the syntax can be modified to any range). Using that logic, to recover the first nine filing dates:



In [14]:
data['filings']['recent']['filingDate'][0:9]

['2024-09-25',
 '2024-09-23',
 '2024-09-09',
 '2024-07-29',
 '2024-07-25',
 '2024-07-24',
 '2024-07-23',
 '2024-07-02',
 '2024-06-14']
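
The slice and ordering logic can be checked directly on the nine dates shown above. This is a small sketch; the variable names are illustrative.

```python
# The nine filing dates shown in the output above
dates = ['2024-09-25', '2024-09-23', '2024-09-09', '2024-07-29', '2024-07-25',
         '2024-07-24', '2024-07-23', '2024-07-02', '2024-06-14']

first_four = dates[0:4]   # indexes 0, 1, 2, 3 (index 4 is excluded)
first_nine = dates[0:9]   # indexes 0 through 8
print(len(first_four), len(first_nine))

# Dates written as YYYY-MM-DD sort in calendar order as plain strings,
# so comparing against a descending sort tests for newest-first ordering
newest_first = dates == sorted(dates, reverse=True)
print(newest_first)
```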

In most cases, the first few filings are enough to show that the filings are in reverse chronological order, with the most recent filing first. To answer our question, we can identify the most recently filed form and the dates and timestamps associated with that form as the values at index [0] of each of these fields. All of these fields are stored in the dictionary data2, so we can use the syntax data2['fieldName'][0] to recover the requested information:

In [15]:
data2['form'][0]

'4'

In [16]:
data2['filingDate'][0]

'2024-09-25'

In [17]:
data2['acceptanceDateTime'][0]

'2024-09-25T19:48:42.000Z'
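
Rather than looking up each field separately, the first entry of every list can be collected in one pass, and the timestamp string can be parsed into a datetime object. This is a sketch under stated assumptions: the second entry in each list is made up, the names recent_sample, latest, and accepted are hypothetical, and the trailing Z is read as UTC, following the ISO 8601 convention.

```python
from datetime import datetime, timezone

# First entries match the outputs above; second entries are made up
recent_sample = {
    "form": ["4", "8-K"],
    "filingDate": ["2024-09-25", "2024-01-02"],
    "acceptanceDateTime": ["2024-09-25T19:48:42.000Z", "2024-01-02T16:05:00.000Z"],
}

# Take index 0 from every field to describe the most recent filing
latest = {field: values[0] for field, values in recent_sample.items()}
print(latest)

# Parse the timestamp; the literal Z in the pattern matches the trailing Z
accepted = datetime.strptime(
    latest["acceptanceDateTime"], "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)
print(accepted.isoformat())
```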

### Using the list of forms filed by the entity (which you have saved as a pandas dataframe), use Python syntax to:
#### Identify the number of unique forms filed by the entity.
Recall that we saved the field 'form' as a dataframe named data3 above. We will reference data3 to answer the two questions in this section. 

In [25]:
# data3 is a dataframe, so pass its single column (named 0) to pd.unique
uniqueFilings=pd.unique(data3[0])
print(uniqueFilings)

We could count the values printed above by hand to see how many unique form types are in our file. It is also possible to use the Python function len() to count them for us. This removes the risk of human error in counting and is especially helpful if there are hundreds of unique form types.

In [None]:
nUnique = len(uniqueFilings)
print(nUnique)
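
As a cross-check, the same count can be obtained from a one-dimensional pandas Series. The sketch below uses a short made-up list of form types; note that pd.unique works on one-dimensional input such as a Series or a single dataframe column.

```python
import pandas as pd

# Made-up list of form types for illustration
forms = pd.Series(["4", "8-K", "4", "10-Q", "4"])

unique_forms = pd.unique(forms)   # distinct values, in order of first appearance
n_unique = forms.nunique()        # the number of distinct values
print(list(unique_forms), n_unique)
```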

#### Calculate the number of times each form was filed based on this dataset and identify the most popular filing for your company.
We can now go ahead and count the filings using the groupby and count commands. In most cases the most common filing is the Form 4. What is it for your company?

In [18]:
data3.groupby(data3[0])[0].count()

0
10-K                  8
10-K/A                3
10-Q                 24
144                  29
3                    15
4                   471
4/A                   3
424B3                 2
424B5                16
425                  18
5                     2
5/A                   1
8-K                 145
8-K/A                 2
ARS                   2
CORRESP              22
CT ORDER             20
D                     1
DEF 14A               9
DEFA14A              62
EFFECT                2
FWP                   2
NO ACT                1
POS AM                2
PRE 14A               5
PX14A6G              28
PX14A6N               1
S-3ASR                3
S-4                   1
S-4/A                10
S-8                   3
S-8 POS               7
SC 13G                7
SC 13G/A             36
SC TO-T               1
SC TO-T/A             8
SD                    8
SEC STAFF LETTER      2
UPLOAD               19
Name: 0, dtype: int64
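
An alternative to groupby and count is the value_counts method, which also sorts the result so that the most frequent form comes first. The sketch below uses a made-up single-column dataframe shaped like data3; the names data3_sample, counts, and most_common_form are hypothetical.

```python
import pandas as pd

# Made-up single-column dataframe shaped like data3 (column named 0)
data3_sample = pd.DataFrame(["4", "8-K", "4", "10-Q", "4", "8-K"])

# Count each form type; the result is sorted from most to least frequent
counts = data3_sample[0].value_counts()
print(counts)

# idxmax returns the label with the highest count
most_common_form = counts.idxmax()
print(most_common_form)
```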

#### Filter the list of forms to keep only the 10-Ks

In [24]:
# Step 1: Convert the list into a DataFrame
data3 = pd.DataFrame(data['filings']['recent']['form'], columns=['form'])

# Step 2: Keep only the rows where form is '10-K'
filtered_data3 = data3[data3['form'] == '10-K']

# Display the filtered dataframe (will be empty if no '10-K' form exists)
print(filtered_data3)

     form
82   10-K
183  10-K
283  10-K
415  10-K
553  10-K
706  10-K
813  10-K
919  10-K
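
The equality filter keeps only exact matches, so amended filings such as 10-K/A are excluded. If those are wanted as well, one option is the isin method, sketched below on made-up data. The dataframe filings and the name annual are hypothetical.

```python
import pandas as pd

# Made-up filings table for illustration
filings = pd.DataFrame({
    "form": ["4", "10-K", "10-K/A", "8-K", "10-K"],
    "filingDate": ["2024-03-01", "2024-01-29", "2023-05-01",
                   "2023-04-01", "2023-01-31"],
})

# Keep rows whose form type is any of the listed values
annual = filings[filings["form"].isin(["10-K", "10-K/A"])]
print(annual[["form", "filingDate"]])
```

As in the output above, the filtered dataframe keeps the original row labels, which makes it easy to trace each row back to the full table.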
