# Data Source
## Edgar
### Master index
* [Detailed instruction](sec.gov/edgar/searchedgar/accessing-edgar-data.htm)
* [Current Year for crawler-difference between master and crawler: *crawler shows full link*](https://www.sec.gov/Archives/edgar/full-index/crawler.idx)
* [Historical](https://www.sec.gov/Archives/edgar/full-index/)

[log file](https://www.sec.gov/dera/data/edgar-log-file-data-set.html)

[Accessing Edgar Data](https://www.sec.gov/edgar/searchedgar/accessing-edgar-data.htm)

[WRDS SEC Analytics Suite](https://wrds-web.wharton.upenn.edu/wrds/tools/variable.cfm?library_id=124)

[Norte Dame-Linux Setting](https://sraf.nd.edu/textual-analysis/) 

[MIT OpenEDGAR--Cloud Setting--A great overview of Edgar system](https://law.mit.edu/pub/openedgar/release/1)

* * Edgar filing plain textual
* * textual analysis code

### Packages
* [edgar package](https://github.com/joeyism/py-edgar)
* * [Need to download C++ development tools--Make sure to check C++ during the installation](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=16)

* [Edgar official site Developer Resources](https://www.sec.gov/developer)

* [py-sec-edgar- ** A Better documented package **]
* * [Website](https://py-sec-edgar.readthedocs.io/en/latest/)
* * [Github link](https://github.com/ryansmccoy/py-sec-edgar)

* [sec-api.io- **Cross platform package: Python, Google Sheet and HTTP**](https://sec-api.io/docs#stream-python)

* University hosted
* * [MIT OpenEDGAR--A great overview of Edgar system](https://law.mit.edu/pub/openedgar/release/1) 
* * [Norte Dame-Code and Data](https://sraf.nd.edu/textual-analysis/)

### Examples

* [Mutual fund NSAR-Note that CIK maps to multiple tickers](https://www.sec.gov/Archives/edgar/data/1040612/000120928620000203/0001209286-20-000203-index.htm)




## IP Address

[IP mapping-Paid Subscription](https://github.com/ipinfo/python)

[Link Table-Free](https://iptoasn.com/)

[Super informative article and information](https://securitytrails.com/blog/asn-lookup)


In [2]:
# old fashion example
## import Python packages
from io import BytesIO
import os
from os import path
from zipfile import ZipFile
import pandas as pd
import numpy as np
import requests
import sqlite3
from sqlite3 import Error
import urllib3


In [3]:
#Set up working folder
WorkingDir="c:\\Edgar\\"

In [4]:
# download the crawler.Inx from Edgar
## 1. feed crawler.Idx URL
master_url="https://www.sec.gov/Archives/edgar/full-index/master.idx"
master=pd.read_csv(master_url, skiprows=10, names=['CIK', 'Company Name', 'Form Type', 'Date Filed', 'Filename'], sep='|', engine='python', parse_dates=True)
print(master.head())


                                                 CIK  \
0  ----------------------------------------------...   
1                                            1000097   
2                                            1000184   
3                                            1000184   
4                                            1000184   

                         Company Name Form Type  Date Filed  \
0                                None      None        None   
1  KINGDON CAPITAL MANAGEMENT, L.L.C.    SC 13G  2021-01-11   
2                              SAP SE       425  2021-01-04   
3                              SAP SE       425  2021-01-05   
4                              SAP SE       425  2021-01-06   

                                      Filename  
0                                         None  
1  edgar/data/1000097/0000919574-21-000165.txt  
2  edgar/data/1000184/0000947871-21-000001.txt  
3  edgar/data/1000184/0000947871-21-000010.txt  
4  edgar/data/1000184/0000947871-21-000

In [5]:
## 2. take care of the error
master = master[-master['CIK'].str.contains("---")]
print(master.head())

       CIK                        Company Name Form Type  Date Filed  \
1  1000097  KINGDON CAPITAL MANAGEMENT, L.L.C.    SC 13G  2021-01-11   
2  1000184                              SAP SE       425  2021-01-04   
3  1000184                              SAP SE       425  2021-01-05   
4  1000184                              SAP SE       425  2021-01-06   
5  1000184                              SAP SE       425  2021-01-07   

                                      Filename  
1  edgar/data/1000097/0000919574-21-000165.txt  
2  edgar/data/1000184/0000947871-21-000001.txt  
3  edgar/data/1000184/0000947871-21-000010.txt  
4  edgar/data/1000184/0000947871-21-000019.txt  
5  edgar/data/1000184/0000947871-21-000037.txt  


In [6]:
## 3. drop rows with missing value
master = master.dropna(axis=0,subset=['CIK','Form Type','Filename'])

In [7]:
## 3. Filter out the N-CSR... forms
NCSR=master[master['Form Type'].str.contains("N-CSR")]
NCSR.reset_index(inplace=True,drop=True)
print(NCSR.head())


       CIK                         Company Name Form Type  Date Filed  \
0   100334  AMERICAN CENTURY MUTUAL FUNDS, INC.     N-CSR  2021-01-05   
1  1006415         HARTFORD MUTUAL FUNDS INC/CT     N-CSR  2021-01-08   
2  1018170            HARDING LOEVNER FUNDS INC     N-CSR  2021-01-04   
3  1018592           AB INSTITUTIONAL FUNDS INC     N-CSR  2021-01-04   
4  1020861              SUNAMERICA SERIES, INC.     N-CSR  2021-01-07   

                                      Filename  
0   edgar/data/100334/0000100334-21-000002.txt  
1  edgar/data/1006415/0001193125-21-005004.txt  
2  edgar/data/1018170/0001193125-21-000604.txt  
3  edgar/data/1018592/0001193125-21-000905.txt  
4  edgar/data/1020861/0001104659-21-001905.txt  


In [8]:
## 4. Save the N-CSR list file as excel
outfile=WorkingDir+"data\\NCSR.xlsx"
NCSR.to_excel(outfile,sheet_name='N-CSR',index=False)

In [11]:
## 5. Download a N-CSR file
filing = NCSR['Filename'][0]
print(filing)

edgar/data/100334/0000100334-21-000002.txt


In [12]:
## 6. Full url
filingURL="https://www.sec.gov/Archives/"+filing
print(filingURL)

https://www.sec.gov/Archives/edgar/data/100334/0000100334-21-000002.txt


In [13]:
#7. Download the file
http=urllib3.PoolManager()
filingText=http.request('GET',filingURL)
filingText.data



In [15]:
# 8. Save the filing
filename=filingURL.rsplit('/', 1)[-1]
outfiling = WorkingDir+"filing\\"+filename
print(outfiling)
open(outfiling,'wb').write(filingText.data)

c:\Edgar\filing\0000100334-21-000002.txt


8561019

In [14]:
filingURL.rsplit('/', 1)[-1]

'0000100334-21-000002.txt'

In [1]:
### Cannot install edgar package without C++...

from edgar import Company
company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type = "10-K")
docs = Company.get_documents(tree, no_of_documents=5)
tree

<Element html at 0x2b234b664f0>

In [2]:
docs

[<Element sec-document at 0x2b235f36270>,
 <Element sec-document at 0x2b234f30db0>,
 <Element sec-document at 0x2b235f36040>,
 <Element sec-document at 0x2b235f36ea0>,
 <Element sec-document at 0x2b235f36d10>]

In [3]:
## SEC API . IO
##########################
# Python 3.x Example
##########################

# package used to execute HTTP POST request to the API
import json
import urllib.request

# API Key
TOKEN = "4940b22a39296c21b420ebc6fadfd036c64971142eb0e2340210a1fc61ef5650" # replace YOUR_API_KEY with the API key you got from sec-api.io after sign up
# API endpoint
API = "https://api.sec-api.io?token=" + TOKEN

# define the filter parameters you want to send to the API 
payload = {
  "query": { "query_string": { "query": "cik:320193 AND filedAt:{2016-01-01 TO 2016-12-31} AND formType:\"10-Q\"" } },
  "from": "0",
  "size": "10",
  "sort": [{ "filedAt": { "order": "desc" } }]
}

# format your payload to JSON bytes
jsondata = json.dumps(payload)
jsondataasbytes = jsondata.encode('utf-8')   # needs to be bytes

# instantiate the request 
req = urllib.request.Request(API)

# set the correct HTTP header: Content-Type = application/json
req.add_header('Content-Type', 'application/json; charset=utf-8')
# set the correct length of your request
req.add_header('Content-Length', len(jsondataasbytes))

# send the request to the API
response = urllib.request.urlopen(req, jsondataasbytes)

# read the response 
res_body = response.read()
# transform the response into JSON
filings = json.loads(res_body.decode("utf-8"))

# print JSON 
print(filings)

{'total': {'value': 3, 'relation': 'eq'}, 'query': {'from': 0, 'size': 10}, 'filings': [{'id': '27314e16c5f49a0343de718dd7e55cac', 'accessionNo': '0001628280-16-017809', 'cik': '320193', 'ticker': 'AAPL', 'companyName': 'APPLE INC', 'companyNameLong': 'APPLE INC (Filer)', 'formType': '10-Q', 'description': 'Form 10-Q - Quarterly report [Sections 13 or 15(d)]', 'filedAt': '2016-07-27T16:32:36-04:00', 'linkToTxt': 'https://www.sec.gov/Archives/edgar/data/320193/000162828016017809/0001628280-16-017809.txt', 'linkToHtml': 'https://www.sec.gov/Archives/edgar/data/320193/0001628280-16-017809-index.htm', 'linkToXbrl': '', 'linkToFilingDetails': 'https://www.sec.gov/Archives/edgar/data/320193/000162828016017809/a10-qq320166252016.htm', 'entities': [{'companyName': 'APPLE INC (Filer)', 'cik': '0000320193', 'irsNo': '942404110', 'stateOfIncorporation': 'CA', 'fiscalYearEnd': '0924', 'type': '10-Q', 'act': '34', 'fileNo': '001-36743', 'filmNo': '161787078', 'sic': '3571 Electronic Computers'}], '