This repository has been archived by the owner on Jun 18, 2019. It is now read-only.

Initial abstraction around the spreadsheet itself
Here we provide a wrapper for a Cell inside an ODF document.

A single slang metadata description should be able to extract data from
a range of Spreadsheet formats. For the proof-of-concept we will
implement extraction from ODF files using the ODFPY library.

This patch specifies a class hierarchy that (hopefully) allows Cells
from other Spreadsheet formats to be added in the future. The interface
provides a way to get the raw data from a cell as well as type
information about the cell. The factorisation of this may have to be
adjusted a number of times as more spreadsheet formats are supported.
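
As a rough illustration of how the hierarchy might accommodate another
format, the sketch below gives the base class abstract accessors and adds
a second subclass. The abstract methods on Cell and the CsvCell class are
assumptions made for illustration only; the patch itself leaves Cell empty
and implements only OdfCell. The sketch is written for Python 3.

```python
class Cell(object):
    """Format-independent interface: raw data plus type predicates.

    Assumption: the patch's Cell has no methods; these are illustrative.
    """

    def value(self):
        raise NotImplementedError

    def isstring(self):
        raise NotImplementedError

    def isformula(self):
        raise NotImplementedError


class CsvCell(Cell):
    """Hypothetical second format: CSV carries only text, never formulae."""

    def __init__(self, text, row, column):
        self.text = text
        self.row = row
        self.column = column

    def value(self):
        return self.text

    def isstring(self):
        return True

    def isformula(self):
        return False


def dump(cells):
    # Relies only on the shared interface, so any Cell subclass works here.
    return [(c.row, c.column, c.value()) for c in cells]
```

Code such as dump() would then not need to know which spreadsheet format
a cell came from.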

We expect the data returned from the accessors to be represented in an
intermediate format that can be turned into the higher level domain
types. For now we don't go much further than ensuring that the data is
a valid unicode string. The metadata around the cell type is represented
as a series of predicates. OdfCell.test_attribute() is intended to be an
internal procedure, not part of the API.
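
The predicate approach can be sketched without ODFPY. In the patch, cell
attributes are looked up in a dictionary keyed by (namespace, local name)
tuples. FakeElement and attribute_equals() below are hypothetical
stand-ins for the ODFPY element and for OdfCell.test_attribute(); the
sketch is written for Python 3.

```python
OFFICE_NS = u'urn:oasis:names:tc:opendocument:xmlns:office:1.0'


class FakeElement(object):
    """Hypothetical stand-in for an ODFPY element: just an attributes dict."""

    def __init__(self, attributes):
        self.attributes = attributes


def attribute_equals(element, key, value):
    # A missing attribute makes the predicate False instead of raising.
    try:
        return element.attributes[key] == value
    except KeyError:
        return False


def isstring(element):
    return attribute_equals(element, (OFFICE_NS, u'value-type'), u'string')


text_cell = FakeElement({(OFFICE_NS, u'value-type'): u'string'})
float_cell = FakeElement({(OFFICE_NS, u'value-type'): u'float',
                          (OFFICE_NS, u'value'): u'1.5'})
empty_cell = FakeElement({})
```

Each predicate is then a one-line comparison, and a cell that lacks the
attribute altogether simply answers False.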

We also provide a way to get extended metadata about formulae and
currencies but we don't make any attempt to define an intermediate
format for this data.
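
Because formula() and currency() raise when the cell has no such
attribute, a caller would probably check the matching predicate first.
The sketch below shows that pattern. StubCell is a hypothetical stand-in
that borrows OdfCell's accessor names, describe() is not part of the
patch, and the formula string is only an illustrative value. The sketch
is written for Python 3.

```python
class StubCell(object):
    """Hypothetical stand-in with the same accessor names as OdfCell."""

    def __init__(self, value, formula=None, currency=None):
        self._value = value
        self._formula = formula
        self._currency = currency

    def value(self):
        return self._value

    def isformula(self):
        return self._formula is not None

    def iscurrency(self):
        return self._currency is not None

    def formula(self):
        # Like OdfCell.formula(), raise if the cell is not a formula.
        if self._formula is None:
            raise KeyError('formula')
        return self._formula

    def currency(self):
        # Like OdfCell.currency(), raise if the cell is not a currency.
        if self._currency is None:
            raise KeyError('currency')
        return self._currency


def describe(cell):
    """Gather extended metadata, checking each predicate before its accessor."""
    info = {'value': cell.value()}
    if cell.isformula():
        info['formula'] = cell.formula()
    if cell.iscurrency():
        info['currency'] = cell.currency()
    return info
```

The extended values are passed through untouched, in line with the
decision not to define an intermediate format for them yet.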

Signed-off-by: Andy Bennett <andyjpb@digital.cabinet-office.gov.uk>
andyjpb committed May 15, 2019
1 parent 8d5d805 commit 1a0ce4a
Showing 2 changed files with 108 additions and 0 deletions.
8 changes: 8 additions & 0 deletions NOTES
@@ -47,3 +47,11 @@ export PYTHONPATH=/Users/andybennett/git/odf-prototype/lib/lib/python2.7/site-pa

python excelImport.py

--------------------------------------------------------------------------------
2019/03/27

USING ODFPY
-----------

https://github.com/eea/odfpy/blob/master/contrib/odscell/odscell

100 changes: 100 additions & 0 deletions slang.py
@@ -39,6 +39,8 @@

import sys
import re
from odf import opendocument
from odf.table import Table, TableRow, TableCell



@@ -92,6 +94,104 @@ def __str__(self):



# A base class that can be inherited from to encapsulate all the accessor logic
# for different spreadsheet formats and spreadsheet reading libraries.
# This differs from a CellValue because it's just the spreadsheet data. It can
# be any cell in a spreadsheet and does not have to be associated with any
# slang metadata.
class Cell:
    pass



# A wrapper around a Cell from an ODF Spreadsheet to encapsulate all the
# accessor logic.
class OdfCell(Cell):

    def __init__(self, value, row, column):

        assert isinstance(value, opendocument.element.Element), ("OdfCell.__init__: Expected value argument to be of type 'opendocument.element.Element' but we got %s." % value)
        assert isinstance(row, int), ("OdfCell.__init__: Expected row argument to be of type 'int' but we got %s." % row)
        assert isinstance(column, int), ("OdfCell.__init__: Expected column argument to be of type 'int' but we got %s." % column)

        self.cell = value
        self.row = row
        self.column = column


    def __str__(self):
        return ("<Row: %d, Column: %d, Value: %s>" % (self.row, self.column, self.value()))


    # Internal helper, not part of the API.
    # Returns True if the cell has the attribute named by key and it is
    # equal to value; False otherwise, including when it is absent.
    def test_attribute(self, key, value):
        try:
            return self.cell.attributes[key] == value

        except KeyError:
            return False


    # Returns True if the user specified this cell as a formula; False
    # otherwise.
    # The value calculated and cached by the spreadsheet program may be
    # available via value().
    def isformula(self):
        return ((u'urn:oasis:names:tc:opendocument:xmlns:table:1.0', u'formula') in self.cell.attributes)


    # Returns True if the value of this cell was formatted as a string by
    # the spreadsheet program; False otherwise.
    def isstring(self):
        return self.test_attribute((u'urn:oasis:names:tc:opendocument:xmlns:office:1.0', u'value-type'), u'string')


    # Returns True if the value of this cell was formatted as currency by
    # the spreadsheet program; False otherwise.
    def iscurrency(self):
        return self.test_attribute((u'urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0', u'value-type'), u'currency')


    # Returns the spreadsheet program's internal type designation for the
    # cell.
    # TODO: Hide the internal type designations behind an interface.
    def type(self):
        return unicode(self.cell.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:office:1.0', u'value-type')])


    # Returns the raw value that the user entered into the cell in the
    # spreadsheet program.
    # Returns a unicode string.
    def value(self):

        if self.isstring():
            return unicode(self.cell)

        else:
            return unicode(self.cell.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:office:1.0', u'value')])


    # Raises KeyError if isformula() would have returned False.
    # Returns the spreadsheet program's internal representation of the
    # formula that the user specified in this cell.
    # TODO: Provide a way to parse the formula as this procedure is not
    #       much more useful than isformula() at the moment.
    def formula(self):
        return unicode(self.cell.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:table:1.0', u'formula')])


    # Raises KeyError if iscurrency() would have returned False.
    # Returns the spreadsheet program's internal representation of which
    # currency it thinks the cell's value is denominated in. For example,
    # u'GBP'.
    # TODO: Hide the internal currency designations behind an interface.
    def currency(self):
        return unicode(self.cell.attributes[(u'urn:oasis:names:tc:opendocument:xmlns:office:1.0', u'currency')])



###############################################################################
# Handlers for the Datatypes that can be declared in Spreadsheet Metadata.
