Cleaned up files for publication on github
Alexander Falk committed Sep 20, 2014
1 parent 39d095a commit e6d9dd8
Showing 4 changed files with 670 additions and 13 deletions.
106 changes: 93 additions & 13 deletions README.md
@@ -1,13 +1,34 @@
sec-xbrl
========

Copyright 2014 Altova GmbH

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-------------------------------------------------------------------------

XBRL.US Webinar: How to download and process SEC XBRL Data Directly from EDGAR

These are the supporting Python files for the XBRL.US Webinar that is available
on YouTube: https://www.youtube.com/watch?v=2Oe9ZqXVGME as well as the slides
available here on SlideShare: http://www.slideshare.net/afalk42/xbrl-us-altova-webinar

Please watch the YouTube video and review the slides to see how these Python
scripts are intended to be used. Also note that these scripts were written with
Python 3.3.3 so they may require modifications if you use them with a different
version of Python.

To use this approach you will need to download and install RaptorXML+XBRL Server from
the Altova website: http://www.altova.com/download-trial-server.html and then
request a 30-day free evaluation license key.

@@ -18,18 +39,77 @@ executable in the Python script, though.

For more information on RaptorXML, please see here: http://www.altova.com/raptorxml.html

USAGE INFORMATION:

(1) LOADSECFILINGS

loadSECfilings.py -y <year> -m <month> | -f <from_year> -t <to_year>

This creates a subdirectory sec/ with year-based directories and month-based
directories underneath, and downloads all SEC XBRL filings from the EDGAR system
to your local hard disk for further processing. Please use it only during off-peak
hours in order not to overload the SEC servers. The script downloads the ZIPped XBRL
filings, so you'll have one ZIP file on your drive per filing submitted to the SEC.
If you call this script again for the current or any previous month at a later date,
it will only download files that are new and have not yet been downloaded.
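
The "only download what is new" behavior described above can be sketched roughly as follows. This is a minimal illustration and not the code from loadSECfilings.py: the helper names filing_target_path and needs_download are hypothetical, and the zero-padded month directory is an assumption about the layout under sec/.

```python
import os

def filing_target_path(base_dir, year, month, zip_name):
    """Build <base_dir>/<year>/<month>/<zip_name>.

    The zero-padded month directory is an assumption for this sketch.
    """
    return os.path.join(base_dir, str(year), "{:02d}".format(month), zip_name)

def needs_download(path):
    """A filing is fetched only if no non-empty local copy exists yet."""
    return not (os.path.isfile(path) and os.path.getsize(path) > 0)
```

A download loop would call needs_download() for every filing listed for the requested month and skip those that return False, which is what makes repeated runs cheap.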

EXAMPLES:

python3 loadSECfilings.py -y 2014 -m 9

This will load all SEC filings for September 2014.

python3 loadSECfilings.py -f 2005 -t 2014

This will load all SEC filings from the start of the XBRL pilot program in 2005 through 2014.
WARNING: If you download all years available (2005-2014) this will be about 127,000 files
and take up about 18GB of space on your hard disk, so please use with caution, especially
when you are on a slow Internet connection.


(2) VALSECFILINGS

valSECfilings.py ( -y <year> | -f <from_year> -t <to_year> ) -m <month>
-c <cik> -k <ticker> -s <script>

This will call RaptorXML+XBRL Server to validate the SEC filings for a specified year
and month or for a range of years. It assumes that the files have been downloaded by
the script above into a local sub-directory sec/. You can restrict the filings to just
those for a particular company or for a list of companies by providing their respective
CIKs or ticker symbols. Optionally you can pass a Python script to RaptorXML+XBRL Server
with the -s parameter, which will then be executed by the built-in Python interpreter
inside of RaptorXML+XBRL Server to perform additional post-validation processing of
the XBRL files. As an example, there is a Python script extractRatios.py in this project
that demonstrates how to extract common financial ratios (current ratio, quick ratio,
cash ratio) from the XBRL filings.
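
As a rough illustration of the balance sheet ratios involved, here is a minimal, self-contained sketch. The function name balance_sheet_ratios and the sample figures are hypothetical and not part of this project; the formulas follow the definitions that extractRatios.py prints, and the ratios are left at zero when current liabilities are zero, as in that script.

```python
from decimal import Decimal

def balance_sheet_ratios(current_assets, cash, marketable_securities,
                         accounts_receivable, current_liabilities):
    """Return (current, quick, cash) ratios; all zero if liabilities are zero."""
    if current_liabilities == 0:
        zero = Decimal(0)
        return zero, zero, zero
    current = current_assets / current_liabilities
    quick = (cash + marketable_securities + accounts_receivable) / current_liabilities
    cash_ratio = (cash + marketable_securities) / current_liabilities
    return current, quick, cash_ratio

# Made-up figures for illustration only
current, quick, cash_ratio = balance_sheet_ratios(
    current_assets=Decimal(300), cash=Decimal(100),
    marketable_securities=Decimal(50), accounts_receivable=Decimal(60),
    current_liabilities=Decimal(200))
print("{:.2f} {:.2f} {:.2f}".format(current, quick, cash_ratio))  # 1.50 1.05 0.75
```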

EXAMPLES:

python3 valSECfilings.py -y 2014 -m 9

This will validate all downloaded SEC filings for the month of September 2014. If a large
number of files is passed to the Python script, it will create batches of about 20 jobs
each and pass those to RaptorXML+XBRL Server in sequential batches.
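
The batching idea can be sketched as below. This is an illustration only and not the logic from valSECfilings.py: make_batches is a hypothetical helper, and a fixed batch size of 20 is an assumption based on the "about 20 jobs" figure above.

```python
def make_batches(items, batch_size=20):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 45 placeholder file names yield batches of 20, 20 and 5 jobs
filings = ["filing_{}.zip".format(n) for n in range(45)]
for number, batch in enumerate(make_batches(filings), start=1):
    print("batch", number, "has", len(batch), "jobs")
```

Each batch would then be handed to RaptorXML+XBRL Server in turn, so no single invocation receives an unbounded number of files.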

python3 valSECfilings.py -f 2013 -t 2014 -k AAPL,MSFT,ORCL

This will validate all SEC filings submitted by Apple, Microsoft, and Oracle for the
years 2013 and 2014. Positive validation messages as well as any errors or warnings
are output to the console window.

python3 valSECfilings.py -f 2013 -t 2014 -k ORCL -s extractRatios.py

This will validate all Oracle XBRL filings for the years 2013-2014 and then perform
post-validation analysis of the filings using the supplied Python script extractRatios.py
that gets passed to RaptorXML+XBRL Server and executed by its built-in Python interpreter.
This particular example script prints document and entity information and then extracts
various balance sheet facts to calculate current ratio, quick ratio, and cash ratio as
an example of how to do post-validation XBRL processing. Furthermore, it appends those
ratios to an output file ratios.csv in the same directory.
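
Once ratios.csv exists, it can be read back with Python's standard csv module. The sketch below is one possible way to do that: load_ratios is a hypothetical helper, and the sample row is made up. Only the column names are taken from the header line that extractRatios.py writes.

```python
import csv
import io

# Header as written by extractRatios.py, followed by one made-up sample row
SAMPLE = (
    "DocumentType,EntityName,CIK,PeriodEndDate,CurrentRatio,QuickRatio,CashRatio\n"
    '10-K,"Example Corp",0000000000,2013-12-31,1.50,1.05,0.75\n'
)

def load_ratios(stream):
    """Read ratio rows from an open text stream, converting ratios to float."""
    rows = []
    for row in csv.DictReader(stream):
        for key in ("CurrentRatio", "QuickRatio", "CashRatio"):
            row[key] = float(row[key])
        rows.append(row)
    return rows

rows = load_ratios(io.StringIO(SAMPLE))
print(rows[0]["EntityName"], rows[0]["QuickRatio"])
```

For the real file, pass an open file object instead of the in-memory sample, for example `with open("ratios.csv", newline="") as f: rows = load_ratios(f)`.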


REMINDER:

To see these scripts in action, along with a more in-depth explanation, please watch the
YouTube video of the webinar here: https://www.youtube.com/watch?v=2Oe9ZqXVGME
168 changes: 168 additions & 0 deletions extractRatios.py
@@ -0,0 +1,168 @@
# Copyright 2014 Altova GmbH
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import re
import fcntl  # POSIX-only; used for the file lock below

from altova import xml, xsd, xbrl

sec_ns = '/dei/' # was: 'http://xbrl.sec.gov/dei/'
fasb_ns = '/us-gaap/' # was: 'http://fasb.org/us-gaap/'

class Lock:

    def __init__(self, filename):
        self.filename = filename
        # This will create it if it does not exist already
        self.handle = open(filename, 'w')

    # Bitwise OR fcntl.LOCK_NB if you need a non-blocking lock
    def acquire(self):
        fcntl.flock(self.handle, fcntl.LOCK_EX)

    def release(self):
        fcntl.flock(self.handle, fcntl.LOCK_UN)

    def __del__(self):
        self.handle.close()

def camelToSpaces( label ):
    # Utility for pretty-printing the labels
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1 \2', label)
    return re.sub('([a-z0-9])([A-Z])', r'\1 \2', s1)

def factFinder( instance, namespace, label ):
    # Locate facts in the instance document by namespace and label, ignoring facts that have a context with a segment element
    l = []
    for f in instance.items:
        # Substring test: str.find() returns -1 (truthy) on a miss, so it cannot be used as a condition here
        if namespace in f.qname.namespace_name and f.qname.local_name == label:
            segment = None
            try:
                entElement = f.context.entity.element
                for childElement in entElement.children:
                    if childElement.local_name=="segment":
                        segment = childElement
            except Exception:
                # Treat a fact whose context cannot be inspected as having no segment
                pass
            if segment is None:
                l.append( f )
    return l

def printFacts( facts, indent=1, targetDate=None ):
    # Find the fact for the relevant target date and print it
    factValue = 0
    for fact in facts:
        if targetDate is None or fact.context.period.instant == targetDate:
            if fact.concept.item_type==fact.concept.MONETARY_ITEM_TYPE:
                factValue = fact.effective_numeric_value
                print( indent * "\t", camelToSpaces( fact.qname.local_name ).ljust(100-indent*8), "$", '{0:>16,}'.format( factValue ) )
            else:
                factValue = fact.normalized_value
                print( indent * "\t", camelToSpaces( fact.qname.local_name ).ljust(100-indent*8), factValue )
    return factValue

def on_xbrl_valid( job, instance ):

    # Serialize access to ratios.csv in case several validation jobs run at the same time.
    # The lock is created before the try block so that release() is never called on an undefined name.
    lock = Lock("/tmp/extract_ratios_lock.tmp")
    lock.acquire()
    try:
        # Create output CSV file if it doesn't exist yet
        if not os.path.isfile( "ratios.csv" ):
            with open("ratios.csv", "a") as ratiofile:
                ratiofile.write( "DocumentType,EntityName,CIK,PeriodEndDate,CurrentRatio,QuickRatio,CashRatio\n" )

        # Extract some basic facts from the filing, such as the effective end-date for balance sheet etc.
        docEndDate = "2013-12-31"
        documentType = factFinder( instance, sec_ns, "DocumentType" )
        documentFiscalYearFocus = factFinder( instance, sec_ns, "DocumentFiscalYearFocus" )
        documentFiscalPeriodFocus = factFinder( instance, sec_ns, "DocumentFiscalPeriodFocus" )
        documentPeriodEndDate = factFinder( instance, sec_ns, "DocumentPeriodEndDate" )
        if len(documentPeriodEndDate) > 0:
            docEndDate = documentPeriodEndDate[0].normalized_value

        # Extract Filer Name and other key data
        entityRegistrantName = factFinder( instance, sec_ns, "EntityRegistrantName" )
        entityCentralIndexKey = factFinder( instance, sec_ns, "EntityCentralIndexKey" )
        entityCommonStockSharesOutstanding = factFinder( instance, sec_ns, "EntityCommonStockSharesOutstanding" )

        # Print information about filing and entity
        print( "Document and Entity Information:" )
        docType = printFacts( documentType )
        entityName = printFacts( entityRegistrantName )
        entityCIK = printFacts( entityCentralIndexKey )
        printFacts( documentPeriodEndDate )
        printFacts( documentFiscalPeriodFocus )
        printFacts( documentFiscalYearFocus )

        if docType=="10-K" or docType=="10-Q":
            # Now let's calculate some useful ratios from the balance sheet
            print( "Analytical Ratios:" )
            print( "\tBalance Sheet:" )

            # Current Ratio
            currentRatio = 0
            print( "\t\tCurrent Ratio = Current Assets / Current Liabilities:" )
            currentAssetsFacts = factFinder( instance, fasb_ns, "AssetsCurrent" )
            currentLiabilitiesFacts = factFinder( instance, fasb_ns, "LiabilitiesCurrent" )
            currentAssets = printFacts( currentAssetsFacts, 3, docEndDate )
            currentLiabilities = printFacts( currentLiabilitiesFacts, 3, docEndDate )
            if not currentLiabilities==0:
                currentRatio = currentAssets / currentLiabilities
            print( 3 * "\t", "Current Ratio = ".ljust(100-3*8), '{0:.2f}'.format( currentRatio ) )

            # Quick Ratio
            quickRatio = 0
            print( "\t\tQuick Ratio = ( Cash + Short-Term Marketable Securities + Accounts Receivable ) / Current Liabilities:" )
            # Fall back to alternative concepts when a filing does not report the preferred one
            cashFacts = factFinder( instance, fasb_ns, "Cash" )
            if len(cashFacts)==0:
                cashFacts = factFinder( instance, fasb_ns, "CashAndCashEquivalentsAtCarryingValue" )
            if len(cashFacts)==0:
                cashFacts = factFinder( instance, fasb_ns, "CashCashEquivalentsAndShortTermInvestments" )
            marketableSecuritiesFacts = factFinder( instance, fasb_ns, "MarketableSecuritiesCurrent" )
            if len(marketableSecuritiesFacts)==0:
                marketableSecuritiesFacts = factFinder( instance, fasb_ns, "AvailableForSaleSecuritiesCurrent" )
            if len(marketableSecuritiesFacts)==0:
                marketableSecuritiesFacts = factFinder( instance, fasb_ns, "ShortTermInvestments" )
            if len(marketableSecuritiesFacts)==0:
                marketableSecuritiesFacts = factFinder( instance, fasb_ns, "OtherShortTermInvestments" )
            accountsReceivableFacts = factFinder( instance, fasb_ns, "AccountsReceivableNetCurrent" )
            currentLiabilitiesFacts = factFinder( instance, fasb_ns, "LiabilitiesCurrent" )
            cash = printFacts( cashFacts, 3, docEndDate )
            marketableSecurities = printFacts( marketableSecuritiesFacts, 3, docEndDate )
            accountsReceivable = printFacts( accountsReceivableFacts, 3, docEndDate )
            currentLiabilities = printFacts( currentLiabilitiesFacts, 3, docEndDate )
            if not currentLiabilities==0:
                quickRatio = ( cash + marketableSecurities + accountsReceivable ) / currentLiabilities
            print( 3 * "\t", "Quick Ratio = ".ljust(100-3*8), '{0:.2f}'.format( quickRatio ) )

            # Cash Ratio
            cashRatio = 0
            print( "\t\tCash Ratio = ( Cash + Short-Term Marketable Securities ) / Current Liabilities:" )
            cash = printFacts( cashFacts, 3, docEndDate )
            marketableSecurities = printFacts( marketableSecuritiesFacts, 3, docEndDate )
            currentLiabilities = printFacts( currentLiabilitiesFacts, 3, docEndDate )
            if not currentLiabilities==0:
                cashRatio = ( cash + marketableSecurities ) / currentLiabilities
            print( 3 * "\t", "Cash Ratio = ".ljust(100-3*8), '{0:.2f}'.format( cashRatio ) )


            # Append ratios to a CSV file for further analysis.
            # format() is used so that a missing name or CIK (returned as 0) does not raise a TypeError.
            with open("ratios.csv", "a") as ratiofile:
                ratiofile.write( '{0},"{1}",{2},{3},{4:.2f},{5:.2f},{6:.2f}\n'.format( docType, entityName, entityCIK, docEndDate, currentRatio, quickRatio, cashRatio ) )

    finally:
        lock.release()
