query functions should convert data into values in domains supported by incanter #50

lisp · 2015-10-15T10:30:58Z

as an alternative to the suggestion embodied in #40, it should be possible to move data between datasets and both external files and repositories (through queries or as imports) under the principle that dataset cell content is limited to values in domains which incanter supports, and that a conversion to and from the external and repository domains is available which "does the right thing" by default.

this implies, for example, that dataset cell content should be restricted to numeric and string values, and that literal conversion to and from a repository should account for standard encodings for blank nodes and iris as domains distinct from the various literals.

while a complete implementation which includes excel sources should conform to csv2rdf and, as such, should permit datatype metadata to be specified for csv sources, at a baseline and in the absence of metadata the process should at least correspond to the standard for sparql csv results for csv-rdf conversion.

this would permit "least surprise" conversions with no intervention.
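A minimal sketch of such a default conversion, written in Python purely for illustration and not taken from this project: it follows the serialization conventions of the SPARQL 1.1 CSV results format (IRIs as the bare IRI string, blank nodes as `_:label`, literals as their lexical form) and additionally maps a few XSD numeric datatypes to numbers. The function name `term_to_cell` and the `(kind, value, datatype)` representation of an RDF term are assumptions made for this example.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed subset of XSD datatypes treated as numeric; a real mapping may differ.
INTEGER_TYPES = {XSD + "integer", XSD + "int", XSD + "long"}
FLOAT_TYPES = {XSD + "double", XSD + "float"}
DECIMAL_TYPE = XSD + "decimal"


def term_to_cell(kind, value, datatype=None):
    """Convert an RDF term to a string or numeric cell value.

    kind is one of "iri", "bnode" or "literal" (hypothetical representation).
    """
    if kind == "iri":
        # CSV results carry the IRI as a plain string, without angle brackets.
        return value
    if kind == "bnode":
        # Blank nodes are written with the "_:" prefix.
        return "_:" + value
    if kind == "literal":
        if datatype in INTEGER_TYPES:
            return int(value)
        if datatype in FLOAT_TYPES:
            return float(value)
        if datatype == DECIMAL_TYPE:
            return Decimal(value)
        # Every other literal keeps its lexical form as a string.
        return value
    raise ValueError("unknown term kind: %r" % (kind,))
```

Under these conventions an IRI and a string literal with the same characters produce the same cell, so the reverse conversion needs extra information, such as the blank node prefix or column metadata, to restore the original domain.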

RickMoynihan · 2015-10-15T10:50:18Z

Hi James,

Could you supply some simple illustrative code examples of what you'd like this to look like as a user?

Regarding Incanter types: to my knowledge you can put any object in a cell. We currently put URIs in cells, and on 0.6.0-SNAPSHOT there are changes that will capture errors (Exception objects) and put them in the cell where they occurred.

lisp · 2015-10-15T11:21:46Z

yes, it is evident that one can put any value in a cell.

if there is some reason to retain that practice without restriction, for our use case, we would have to enforce conversions at the interface to weka datasets.
we are not heavy incanter users. our use case is to provide an interface which permits one to apply those tools to data retrieved from a repository and produces no surprises in the process. in that sense there is great appeal in following the conventions which are said to apply in that environment : https://incanter.files.wordpress.com/2009/06/9781782162643_chapter-6.pdf

The data in each cell of an Incanter dataset can be a string or numeric

This may be a narrow interpretation, but it is reinforced by the nature of its associated analytical libraries. Perhaps this argues for alternative dataset classes, each with a conversion protocol at the interfaces.
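One way to picture a conversion enforced at an interface is the sketch below. It is written in Python for illustration only, models a dataset as a list of row dictionaries, and uses hypothetical names (`coerce_cell`, `coerce_rows`); it is not the API of Incanter, weka, or this project. Treating booleans as non-numeric is an assumption of the example.

```python
from numbers import Number


def coerce_cell(value):
    """Return value unchanged if it is a string or a number, else its string form."""
    if isinstance(value, str):
        return value
    # bool is a Number subclass in Python; this sketch assumes it should not
    # count as numeric and converts it to a string instead.
    if isinstance(value, Number) and not isinstance(value, bool):
        return value
    return str(value)


def coerce_rows(rows):
    """Apply coerce_cell to every cell of a dataset given as a list of dicts."""
    return [{col: coerce_cell(v) for col, v in row.items()} for row in rows]
```

With this kind of boundary, a cell holding a URI object or a captured exception would reach the analytical library as a string, while numeric cells pass through untouched.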

RickMoynihan · 2016-03-16T12:01:14Z

One thing we're planning to do at some point is migrate to an implementation of the core.matrix PDataset protocol; that will allow us greater flexibility in providing different types of Dataset with different performance properties, etc.

RickMoynihan added this to the 0.7.0 milestone Nov 30, 2015

RickMoynihan closed this as completed Oct 27, 2017
