Cleaned up files for publication on github

altova · Sep 20, 2014 · e6d9dd8
1 parent 39d095a
commit e6d9dd8
Showing 4 changed files with 670 additions and 13 deletions.
diff --git a/README.md b/README.md
@@ -1,13 +1,34 @@
 sec-xbrl
 ========
 
+Copyright 2014 Altova GmbH
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+-------------------------------------------------------------------------
+
 XBRL.US Webinar: How to download and process SEC XBRL Data Directly from EDGAR
 
-These are the supporting files for the XBRL.US Webinar that is availble
+These are the supporting Python files for the XBRL.US Webinar that is available
 on YouTube: https://www.youtube.com/watch?v=2Oe9ZqXVGME as well as the slides
 available here on SlideShare: http://www.slideshare.net/afalk42/xbrl-us-altova-webinar
 
-To use these files you will need to download and install RaptorXML+XBRL Server from
+Please watch the YouTube video and review the slides to see how these Python
+scripts are intended to be used. Also note that these scripts were written with
+Python 3.3.3 so they may require modifications if you use them with a different
+version of Python.
+
+To use this approach you will need to download and install RaptorXML+XBRL Server from
 the Altova website: http://www.altova.com/download-trial-server.html and then 
 request a 30-day free evaluation license key.
 
@@ -18,18 +39,77 @@ executable in the Python script, though.
 
 For more information on RaptorXML, please see here: http://www.altova.com/raptorxml.html
 
-Copyright notice and license information for all files in this directory:
+USAGE INFORMATION:
 
-Copyright 2014 Altova GmbH
 
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
+(1) LOADSECFILINGS
 
-    http://www.apache.org/licenses/LICENSE-2.0
+loadSECfilings.py -y <year> -m <month> | -f <from_year> -t <to_year>
 
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
+This creates a subdirectory sec/ with year-based directories and month directories
+underneath and downloads all SEC XBRL filings from the EDGAR system to your local hard
+disk for further processing. Please use only during off-peak hours in order to not
+overload the SEC servers. This downloads the ZIPped XBRL filings, so you'll have one
+ZIP file per filing submitted to the SEC on your drive. If you call this script
+again for the current or any previous month at a later date, it will only download
+files that are new and have not been downloaded before.
+
+EXAMPLES:
+
+python3 loadSECfilings.py -y 2014 -m 9
+
+This will load all SEC filings for September 2014.
+
+python3 loadSECfilings.py -f 2005 -t 2014
+
+This will load all SEC filings from the start of the XBRL pilot program in 2005 until 2014.
+WARNING: If you download all years available (2005-2014) this will be about 127,000 files
+and take up about 18 GB of space on your hard disk, so please use with caution, especially
+when you are on a slow Internet connection.
+
+
+(2) VALSECFILINGS
+
+valSECfilings.py ( -y <year> | -f <from_year> -t <to_year> ) -m <month>
+                 -c <cik> -k <ticker> -s <script>
+
+This will call RaptorXML+XBRL Server to validate the SEC filings for a specified year
+and month or for a range of years. It assumes that the files have been downloaded by
+the script above into a local sub-directory sec/. You can restrict the filings to just
+those for a particular company or for a list of companies by providing their respective 
+CIKs or ticker symbols. Optionally you can pass a Python script to RaptorXML+XBRL Server
+with the -s parameter, which will then be executed by the built-in Python interpreter
+inside of RaptorXML+XBRL Server to perform additional post-validation processing of
+the XBRL files. As an example, there is a Python script extractRatios.py in this project
+that demonstrates how to extract common financial ratios (current ratio, quick ratio,
+cash ratio) from the XBRL filings.
+
+EXAMPLES:
+
+python3 valSECfilings.py -y 2014 -m 9
+
+This will validate all downloaded SEC filings for the month of September 2014. If a large
+number of files is passed to the Python script, it will create batches of about 20 jobs
+each and pass those to RaptorXML+XBRL Server in sequential batches.
+
+python3 valSECfilings.py -f 2013 -t 2014 -k AAPL,MSFT,ORCL
+
+This will validate all SEC filings submitted by Apple, Microsoft, and Oracle for the
+years 2013 and 2014. Positive validation messages as well as any errors or warnings
+are output to the console window.
+
+python3 valSECfilings.py -f 2013 -t 2014 -k ORCL -s extractRatios.py
+
+This will validate all Oracle XBRL filings for the years 2013-2014 and then perform
+post-validation analysis of the filings using the supplied Python script extractRatios.py
+that gets passed to RaptorXML+XBRL Server and executed by its built-in Python interpreter.
+This particular example script prints document and entity information and then extracts
+various balance sheet facts to calculate current ratio, quick ratio, and cash ratio as
+an example of how to do post-validation XBRL processing. Furthermore, it appends those
+ratios to an output file ratios.csv in the same directory.
+
+
+REMINDER: 
+
+To see these scripts and a lot more in-depth explanation, please watch the
+YouTube video of the webinar here: https://www.youtube.com/watch?v=2Oe9ZqXVGME
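
Note on the batching described in the README above: splitting a large list of downloaded filings into batches of about 20 jobs, processed one batch after another, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from valSECfilings.py: the helper name `make_batches`, the default batch size of 20, and the file names used in the demo are all hypothetical.

```python
def make_batches(paths, batch_size=20):
    """Split a list of filing paths into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Slicing past the end of the list is safe, so the last batch may be shorter.
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


if __name__ == "__main__":
    # Hypothetical file names; the real layout under sec/ may differ.
    filings = ["sec/2014/09/filing-{0:03d}.zip".format(n) for n in range(45)]
    for number, batch in enumerate(make_batches(filings), start=1):
        # A real script would hand each batch to RaptorXML+XBRL Server here.
        print("batch", number, "contains", len(batch), "filings")
```

Keeping the batches sequential, as the README describes, bounds the number of jobs handed to the server at any one time.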
diff --git a/extractRatios.py b/extractRatios.py
@@ -0,0 +1,168 @@
+# Copyright 2014 Altova GmbH
+# 
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+# 
+#     http://www.apache.org/licenses/LICENSE-2.0
+# 
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import fcntl
+import os
+import re
+from altova import xml, xsd, xbrl
+
+sec_ns =  '/dei/'		# was: 'http://xbrl.sec.gov/dei/'
+fasb_ns = '/us-gaap/'	# was: 'http://fasb.org/us-gaap/'
+
+class Lock:
+
+    def __init__(self, filename):
+        self.filename = filename
+        # This will create it if it does not exist already
+        self.handle = open(filename, 'w')
+
+    # Bitwise OR fcntl.LOCK_NB if you need a non-blocking lock
+    def acquire(self):
+        fcntl.flock(self.handle, fcntl.LOCK_EX)
+
+    def release(self):
+        fcntl.flock(self.handle, fcntl.LOCK_UN)
+
+    def __del__(self):
+        self.handle.close()
+
+def camelToSpaces( label ):
+	# Utility for pretty-printing the labels
+	s1 = re.sub('(.)([A-Z][a-z]+)', r'\1 \2', label)
+	return re.sub('([a-z0-9])([A-Z])', r'\1 \2', s1)
+
+def factFinder( instance, namespace, label ):
+	# Locate facts in the instance document by namespace and label, ignoring facts that have a context with a segment_element
+	l = []
+	for f in instance.items:
+		if namespace in f.qname.namespace_name and f.qname.local_name == label:
+			segment = None
+			try:
+				entElement = f.context.entity.element
+				for childElement in entElement.children:
+					if childElement.local_name=="segment":
+						segment = childElement
+			except Exception:
+				pass
+			if segment is None:
+				l.append( f )
+	return l
+
+def printFacts( facts, indent=1, targetDate=None ):
+	# Find the fact for the relevant target date and print it
+	factValue = 0
+	for fact in facts:
+		if targetDate is None or fact.context.period.instant == targetDate:
+			if fact.concept.item_type==fact.concept.MONETARY_ITEM_TYPE:
+				factValue = fact.effective_numeric_value
+				print( indent * "\t", camelToSpaces( fact.qname.local_name ).ljust(100-indent*8), "$", '{0:>16,}'.format( factValue ) )
+			else:
+				factValue = fact.normalized_value
+				print( indent * "\t", camelToSpaces( fact.qname.local_name ).ljust(100-indent*8), factValue )
+	return factValue
+
+def on_xbrl_valid( job, instance ):
+
+	lock = Lock("/tmp/extract_ratios_lock.tmp")
+	lock.acquire()
+	try:
+
+		# Create the output CSV file with a header row if it doesn't exist yet;
+		# the with-block closes the file, so no explicit close() is needed
+		if not os.path.isfile( "ratios.csv" ):
+			with open("ratios.csv", "a") as ratiofile:
+				ratiofile.write( "DocumentType,EntityName,CIK,PeriodEndDate,CurrentRatio,QuickRatio,CashRatio\n" )
+
+		# Extract some basic facts from the filing, such as the effective end-date for balance sheet etc.
+		docEndDate = "2013-12-31"
+		documentType = factFinder( instance, sec_ns, "DocumentType" )
+		documentFiscalYearFocus = factFinder( instance, sec_ns, "DocumentFiscalYearFocus" )
+		documentFiscalPeriodFocus = factFinder( instance, sec_ns, "DocumentFiscalPeriodFocus" )
+		documentPeriodEndDate = factFinder( instance, sec_ns, "DocumentPeriodEndDate" )
+		if len(documentPeriodEndDate) > 0:
+			docEndDate = documentPeriodEndDate[0].normalized_value
+
+		# Extract Filer Name and other key data
+		entityRegistrantName = factFinder( instance, sec_ns, "EntityRegistrantName" )
+		entityCentralIndexKey = factFinder( instance, sec_ns, "EntityCentralIndexKey" )
+		entityCommonStockSharesOutstanding = factFinder( instance, sec_ns, "EntityCommonStockSharesOutstanding" )
+
+		# Print information about filing and entity
+		print( "Document and Entity Information:" )
+		docType = printFacts( documentType )
+		entityName = printFacts( entityRegistrantName )
+		entityCIK = printFacts( entityCentralIndexKey )
+		printFacts( documentPeriodEndDate )
+		printFacts( documentFiscalPeriodFocus )
+		printFacts( documentFiscalYearFocus )
+
+		if docType=="10-K" or docType=="10-Q":
+			# Now let's calculate some useful ratios from the balance sheet
+			print( "Analytical Ratios:" )
+			print( "\tBalance Sheet:" )
+
+			# Current Ratio
+			currentRatio = 0
+			print( "\t\tCurrent Ratio = Current Assets / Current Liabilities:" )
+			currentAssetsFacts = factFinder( instance, fasb_ns, "AssetsCurrent" )
+			currentLiabilitiesFacts = factFinder( instance, fasb_ns, "LiabilitiesCurrent" )
+			currentAssets = printFacts( currentAssetsFacts, 3, docEndDate )
+			currentLiabilities = printFacts( currentLiabilitiesFacts, 3, docEndDate )
+			if currentLiabilities != 0:
+				currentRatio = currentAssets / currentLiabilities
+			print( 3 * "\t", "Current Ratio = ".ljust(100-3*8), '{0:.2f}'.format( currentRatio ) )
+
+			# Quick Ratio
+			quickRatio = 0
+			print( "\t\tQuick Ratio = ( Cash + Short-Term Marketable Securities + Accounts Receivable ) / Current Liabilities:" )
+			cashFacts = factFinder( instance, fasb_ns, "Cash" )
+			if len(cashFacts)==0:
+				cashFacts = factFinder( instance, fasb_ns, "CashAndCashEquivalentsAtCarryingValue" )
+			if len(cashFacts)==0:
+				cashFacts = factFinder( instance, fasb_ns, "CashCashEquivalentsAndShortTermInvestments" )
+			marketableSecuritiesFacts = factFinder( instance, fasb_ns, "MarketableSecuritiesCurrent" )
+			if len(marketableSecuritiesFacts)==0:
+				marketableSecuritiesFacts = factFinder( instance, fasb_ns, "AvailableForSaleSecuritiesCurrent" )
+			if len(marketableSecuritiesFacts)==0:
+				marketableSecuritiesFacts = factFinder( instance, fasb_ns, "ShortTermInvestments" )
+			if len(marketableSecuritiesFacts)==0:
+				marketableSecuritiesFacts = factFinder( instance, fasb_ns, "OtherShortTermInvestments" )
+			accountsReceivableFacts = factFinder( instance, fasb_ns, "AccountsReceivableNetCurrent" )
+			currentLiabilitiesFacts = factFinder( instance, fasb_ns, "LiabilitiesCurrent" )
+			cash = printFacts( cashFacts, 3, docEndDate )
+			marketableSecurities = printFacts( marketableSecuritiesFacts, 3, docEndDate )
+			accountsReceivable = printFacts( accountsReceivableFacts, 3, docEndDate )
+			currentLiabilities = printFacts( currentLiabilitiesFacts, 3, docEndDate )
+			if currentLiabilities != 0:
+				quickRatio = ( cash + marketableSecurities + accountsReceivable ) / currentLiabilities
+			print( 3 * "\t", "Quick Ratio = ".ljust(100-3*8), '{0:.2f}'.format( quickRatio ) )
+
+			# Cash Ratio
+			cashRatio = 0
+			print( "\t\tCash Ratio = ( Cash + Short-Term Marketable Securities ) / Current Liabilities:" )
+			cash = printFacts( cashFacts, 3, docEndDate )
+			marketableSecurities = printFacts( marketableSecuritiesFacts, 3, docEndDate )
+			currentLiabilities = printFacts( currentLiabilitiesFacts, 3, docEndDate )
+			if currentLiabilities != 0:
+				cashRatio = ( cash + marketableSecurities ) / currentLiabilities
+			print( 3 * "\t", "Cash Ratio = ".ljust(100-3*8), '{0:.2f}'.format( cashRatio ) )
+
+
+			# Append ratios to a CSV file for further analysis; str.format() also copes
+			# with a missing fact, for which printFacts() returns the integer 0
+			with open("ratios.csv", "a") as ratiofile:
+				ratiofile.write( '{0},"{1}",{2},{3},{4:.2f},{5:.2f},{6:.2f}\n'.format( docType, entityName, entityCIK, docEndDate, currentRatio, quickRatio, cashRatio ) )
+
+	finally:
+		lock.release()
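
Note on the ratio arithmetic in extractRatios.py: the three formulas printed by the script (current, quick, and cash ratio) can be checked in isolation from RaptorXML with a small stand-alone function. This is a minimal sketch, not part of the commit: the function name `balance_sheet_ratios` and its parameters are hypothetical, and it mirrors the script's convention of reporting 0 when current liabilities are 0.

```python
def balance_sheet_ratios(current_assets, cash, marketable_securities,
                         accounts_receivable, current_liabilities):
    """Return (current_ratio, quick_ratio, cash_ratio) from balance sheet values."""
    if current_liabilities == 0:
        # Same fallback as the script: avoid division by zero and report 0.
        return 0, 0, 0
    current_ratio = current_assets / current_liabilities
    quick_ratio = (cash + marketable_securities + accounts_receivable) / current_liabilities
    cash_ratio = (cash + marketable_securities) / current_liabilities
    return current_ratio, quick_ratio, cash_ratio


if __name__ == "__main__":
    # Made-up figures, chosen only to make the arithmetic easy to follow.
    current, quick, cash = balance_sheet_ratios(200, 50, 30, 20, 100)
    print("{0:.2f} {1:.2f} {2:.2f}".format(current, quick, cash))
```

Separating the arithmetic from the fact lookup makes it possible to test the formulas without a validated XBRL instance.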