query functions should convert data into values in domains supported by incanter #50

Closed
lisp opened this issue Oct 15, 2015 · 3 comments

@lisp

lisp commented Oct 15, 2015

as an alternative to the suggestion embodied in #40, it should be possible to move data between datasets and both external files and repositories (through queries or as imports) under the principle that dataset cell content is limited to values in domains which incanter supports, and that a conversion to and from the external and repository domains is available which "does the right thing" by default.

this implies, for example, that dataset cell content should be restricted to numeric and string values, and that literal conversion to and from a repository should account for standard encodings of blank nodes and iris as domains distinct from the various literals.

while a complete implementation which includes excel sources should conform to csv2rdf and, as such, should permit specifying datatype metadata for csv sources, at a baseline and in the absence of metadata, csv-rdf conversion should at least correspond to the standard for sparql csv results.

this would permit "least surprise" conversions with no intervention.
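
As a rough illustration of the baseline described above, the following Python sketch maps an RDF term to a string or numeric cell value in the manner of the SPARQL CSV results format: IRIs as plain strings, blank nodes with a `_:` prefix, literals by lexical form. The `Term` class, the `to_cell` function, and the choice of which XSD types become numbers are assumptions made for this example; none of them come from incanter or from the project discussed here.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical minimal RDF term: kind is "iri", "bnode", or "literal".
@dataclass(frozen=True)
class Term:
    kind: str
    value: str
    datatype: Optional[str] = None  # datatype IRI, literals only

# Assumed subset of XSD types treated as numeric in this sketch.
INTEGER_TYPES = {XSD + n for n in ("integer", "int", "long", "short", "byte")}
FLOAT_TYPES = {XSD + n for n in ("decimal", "double", "float")}

def to_cell(term: Term) -> Union[str, int, float]:
    """Convert a term to a string or numeric cell value."""
    if term.kind == "iri":
        return term.value              # IRI written as-is, no angle brackets
    if term.kind == "bnode":
        return "_:" + term.value       # blank node label with "_:" prefix
    if term.datatype in INTEGER_TYPES:
        return int(term.value)
    if term.datatype in FLOAT_TYPES:
        # Simplification: xsd:decimal is folded into float here.
        return float(term.value)
    return term.value                  # any other literal: lexical form only

row = [
    Term("iri", "http://example.org/a"),
    Term("bnode", "b0"),
    Term("literal", "42", XSD + "integer"),
    Term("literal", "hello"),
]
cells = [to_cell(t) for t in row]
```

Note that this default loses the distinction between an IRI and a string literal with the same characters. Keeping those domains distinct on the way back to a repository would need extra metadata, such as the per-column datatype annotations that csv2rdf allows.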

@RickMoynihan
Member

Hi James,

Could you supply some simple illustrative code examples of what you'd like this to look like as a user?

Regarding Incanter types: to my knowledge you can put any object in a cell... We currently put URIs in cells, and on 0.6.0-SNAPSHOT there are changes that will capture errors (Exception objects) and put them in the cell where they occurred.

@lisp
Author

lisp commented Oct 15, 2015

yes, it is evident that one can put any value in a cell.

if there is some reason to retain that practice without restriction, for our use case, we would have to enforce conversions at the interface to weka datasets.
we are not heavy incanter users. our use case is to provide an interface which permits one to apply those tools to data retrieved from a repository and which produces no surprises in the process. in that sense there is great appeal in following the conventions which are said to apply in that environment: https://incanter.files.wordpress.com/2009/06/9781782162643_chapter-6.pdf

The data in each cell of an Incanter dataset can be a string or numeric

This may be a narrow interpretation, but it is reinforced by the nature of its associated analytical libraries. Perhaps this argues for alternative dataset classes, each with a conversion protocol at the interfaces.
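
One way to picture a conversion enforced at the interface is the sketch below: strings and numbers pass through unchanged, missing values stay missing, and any other object (a URI object, a captured exception) is replaced by its string form before the rows are handed to an analytical library. The names `coerce_cell` and `coerce_rows` are hypothetical, and Python stands in here only as a neutral illustration language; this is not incanter's or weka's API.

```python
from numbers import Real

def coerce_cell(value):
    """Restrict a cell to a string, a real number, or None (missing)."""
    if value is None:
        return None
    if isinstance(value, (str, Real)):
        # Note: bool is a Real subclass in Python, so it passes as numeric.
        return value
    # Anything else (URI objects, exceptions, ...) becomes its string form.
    return str(value)

def coerce_rows(rows):
    """Apply coerce_cell to every cell of a row-oriented table."""
    return [[coerce_cell(v) for v in row] for row in rows]

raw = [["a", 1, 2.5, ValueError("bad"), None]]
clean = coerce_rows(raw)
```

An alternative dataset class could apply the same rule on insertion instead of on export, which would move the restriction from the interface into the dataset itself.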

@RickMoynihan RickMoynihan added this to the 0.7.0 milestone Nov 30, 2015
@RickMoynihan
Member

One thing we're planning to do at some point is migrate to an implementation of the core.matrix PDataset protocol; that will give us greater flexibility to provide different types of Dataset with different performance properties, etc.
