Add datatype to collection #755

jesper-friis · 2023-12-26T19:11:28Z

Description

Add datatype to collections. This is needed for representing RDF literals.

Also added more tests and fixed bugs.

Closes #741

Type of change

Bug fix & code cleanup
New feature
Documentation update
Test update

Checklist for the reviewer

This checklist should be used as a help for the reviewer.

Is the change limited to one issue?
Does this PR close the issue?
Is the code easy to read and understand?
Do all new features have an accompanying new test?
Has the documentation been updated as necessary?


.github/workflows/ci_tests.yml

francescalb · 2024-01-04T08:36:28Z

bindings/python/dlite-collection-python.i

+    `nrelations` property.
+
+    Relations are (s, p, o, d=None)-triples with an optional fourth field
+    `d`, specifying the datatype of the object.  It may have the following


What does "It" refer to? From the description below it seems you are referring to the object, but that is a bit unclear.

Updated documentation
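To illustrate the (s, p, o, d=None) layout discussed above, here is a minimal pure-Python sketch. The `Relation` class is a hypothetical stand-in written for this illustration, not DLite's `dlite.Relation`, and the xsd:integer IRI is only an example datatype. It shows that `d` describes the object `o`, which is what allows a relation to carry a typed RDF literal.

```python
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Relation(NamedTuple):
    """Hypothetical stand-in for a collection relation (not dlite.Relation)."""
    s: str                   # subject
    p: str                   # predicate
    o: str                   # object: an IRI or the lexical value of a literal
    d: Optional[str] = None  # datatype of the object; None if not specified


# A triple without a datatype: the optional fourth field defaults to None
plain = Relation("terrier", "is_a", "dog")

# An RDF literal: the object "20" is typed as xsd:integer through `d`
literal = Relation("dog", "max_age", "20", XSD_INTEGER)


def is_typed_literal(rel: Relation) -> bool:
    """Return True if the relation carries an explicit object datatype."""
    return rel.d is not None
```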


francescalb · 2024-01-04T10:11:48Z

bindings/python/dlite-python.i

-            FAIL("relation subject, predicate and object must be strings");
+            FAIL("relation (s,p,o[,d[,id]]) items must be strings");


I liked that they were written out in the previous error message. How about using subject(s), predicate(p)...

Ok. Updated the error message with full names.
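The kind of check behind this error message can be sketched in pure Python as below. `check_relation_items()` is a hypothetical helper written for this illustration; the actual check lives in `bindings/python/dlite-python.i` and its exact message is not reproduced here. Spelling out the full field names, as suggested, lets the error point at the offending item.

```python
# Full names of the (s, p, o[, d[, id]]) items, in order
FIELD_NAMES = ("subject", "predicate", "object", "datatype", "id")


def check_relation_items(items):
    """Raise if `items` is not a valid (s, p, o[, d[, id]]) sequence.

    Hypothetical helper: every item that is present must be a string.
    """
    if not 3 <= len(items) <= len(FIELD_NAMES):
        raise ValueError(
            "a relation needs 3 to 5 items: "
            "subject, predicate, object[, datatype[, id]]"
        )
    for name, item in zip(FIELD_NAMES, items):
        if not isinstance(item, str):
            raise TypeError(
                f"relation {name} must be a string, got {type(item).__name__}"
            )
```

With a check like this, a test for wrong input only needs to assert that the exception is raised, for example with `pytest.raises(TypeError)` or a plain `try`/`except`.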


bindings/python/tests/test_collection.py

francescalb · 2024-01-04T10:25:14Z

bindings/python/tests/test_relation.py

+rel1 = dlite.Relation("s1", "p1", "o1")
+rel2 = dlite.Relation("s2", "p2", "o2", "d2")


What are we testing here? It looks like we test that setting a relation does not throw an error, but do we really test that it is there? I would also like tests that do it wrongly, to check that the expected errors are actually raised.

Well spotted. Seems that I started on this test script, but forgot to finalise it. Added some more tests now.

bindings/python/tests/test_storage.py

francescalb · 2024-01-04T10:29:40Z

src/dlite-collection.c

+  //DLiteCollectionState state;
+  //const DLiteRelation *r;
+  //dlite_collection_init_state(coll, &state);
+  //r = dlite_collection_find(coll, &state, s, p, o, d);
+  //dlite_collection_deinit_state(&state);
+  //return r;


Why keep these comments?

No need for them anymore. Removed.

francescalb · 2024-01-04T10:34:03Z

src/dlite-collection.c

+  Return pointer to the value for a pair of two criteria.
+
+  Useful if one knows that there may only be one value.  The returned
+  value is held by the collection and should be copied by the user
+  since it may be overwritten by later calls to the collection.
+
+  Parameters:
+      s, p, o: Criteria to match. Two of these must be non-NULL.
+      d: If not NULL, the required datatype of literal objects.
+      fallback: Value to return if no matches are found.
+      any: If non-zero, return first matching value.
+
+  Returns a pointer to the value of the `s`, `p` or `o` that is NULL.
+  On error NULL is returned.
+ */


I think this is a nice description. However, I have read several variations on the same theme further up. Does this mean that we are documenting the same thing in several places? This is a bit confusing, and it increases the risk that a later documentation update is not applied in all places.

Yes, we are duplicating documentation. In C we have the same docstring in both the source (.c) and header (.h) files. Functions exposed in Python also have their own Python documentation.

It is possible to reduce this duplication. Currently we generate the C reference documentation from the header files, so we need documentation there. At the same time, it is very useful to have the function documentation in the source files, so we also need it there.

One possibility would be to remove all function documentation from the header files and update the documentation in the source files so that it can be parsed by doxygen. That would improve maintainability and could be added as an issue.

It would in principle also be possible to extract function documentation from the doxygen-generated XML file and auto-document Python functions from it. But firstly, it would increase the complexity and make dlite even more difficult to understand, and secondly, we want to rephrase the documentation for Python, as you commented on above. So this is not something I would suggest.
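For readers of this thread, the lookup semantics documented in the docstring above can be sketched in pure Python. This is not the C implementation: `collection_value()` and the `any_match` name are hypothetical (chosen to avoid shadowing Python's built-in `any`), relations are plain (s, p, o, d) tuples, errors are raised instead of returning NULL, and treating more than one match as an error when `any_match` is false is an assumption based on "Useful if one knows that there may only be one value".

```python
from typing import Optional, Sequence, Tuple

# (subject, predicate, object, datatype); datatype may be None
Rel = Tuple[str, str, str, Optional[str]]


def collection_value(relations: Sequence[Rel],
                     s: Optional[str] = None,
                     p: Optional[str] = None,
                     o: Optional[str] = None,
                     d: Optional[str] = None,
                     fallback: Optional[str] = None,
                     any_match: bool = False) -> Optional[str]:
    """Return the one of s, p or o that is left unspecified.

    Exactly two of `s`, `p` and `o` must be given.  If `d` is given, only
    relations whose object has that datatype match.  `fallback` is
    returned if nothing matches.  Unless `any_match` is true, more than
    one match is treated as an error (an assumption, see above).
    """
    criteria = (s, p, o)
    if sum(c is not None for c in criteria) != 2:
        raise ValueError("exactly two of s, p and o must be given")
    wanted = criteria.index(None)  # position of the value to return

    matches = [
        rel[wanted]
        for rel in relations
        if all(c is None or c == v for c, v in zip(criteria, rel[:3]))
        and (d is None or rel[3] == d)
    ]
    if not matches:
        return fallback
    if len(matches) > 1 and not any_match:
        raise ValueError(f"{len(matches)} values match, expected one")
    return matches[0]
```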

francescalb · 2024-01-04T10:34:46Z

src/dlite-collection.h

+  Return pointer to the value for a pair of two criteria.
+
+  Useful if one knows that there may only be one value.  The returned
+  value is held by the collection and should be copied by the user
+  since it may be overwritten by later calls to the collection.
+
+  Parameters:
+      s, p, o: Criteria to match. Two of these must be non-NULL.
+      d: If not NULL, the required datatype of literal objects.
+      fallback: Value to return if no matches are found.
+      any: If non-zero, return first matching value.
+
+  Returns a pointer to the value of the `s`, `p` or `o` that is NULL.
+  On error NULL is returned.
+ */


Same comment as above.

Yes, this is the header file duplication mentioned in my response above.

francescalb · 2024-01-04T10:36:44Z

src/tests/test_collection.c

+  mu_check(!dlite_collection_add_relation(coll, "terrier", "is_a", "dog",
+                                          NULL));


Please also test setting the datatype.

Datatype is already tested in the test_collection_find() function.

francescalb · 2024-01-04T10:38:54Z

src/tests/test_collection.c

-  printf("----------------------\n");
+  //printf("\n--- inst: %p ---\n", (void *)inst);
+  //dlite_json_print((DLiteInstance *)inst);
+  //printf("----------------------\n");



Can these commented-out lines be removed altogether?

Yes, they could. But loading is rather tricky, involving loading a shared library via the plugin system and JSON parsing, so it is not unlikely that we will have to do some debugging in the future. Then these commented-out print statements come in handy. I would therefore prefer to keep them.

src/triplestore-builtin.c

src/triplestore.c

francescalb

I have quite a few comments and questions. Especially I am worried about the documentation being written in so many places. Does this create a risk of wrong updates later?

Also, the tests fail. If this is because an update in tripper is required for them to pass, I think they should not be included in the test suite for a new release just yet. Maybe it is possible to make the tests run only if tripper is above a certain version (i.e. higher than the current latest).
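One way to follow this suggestion is to gate the tripper-dependent tests on the installed tripper version. Below is a minimal stdlib-only sketch; `MIN_TRIPPER` is a placeholder, not the release that actually contains the needed change, and `version_tuple()` and `tripper_is_recent_enough()` are hypothetical helpers. Pre-release suffixes such as "rc1" are simply ignored, which is a simplification.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Placeholder: set to the first tripper release with the required change
MIN_TRIPPER = (0, 3, 0)


def version_tuple(text):
    """Turn e.g. "0.2.14" or "0.3.0rc1" into a 3-tuple of ints."""
    parts = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)  # leading digits only
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad "1.2" to (1, 2, 0)
    return tuple(parts)


def tripper_is_recent_enough(minimum=MIN_TRIPPER):
    """Return True if tripper is installed and at least `minimum`."""
    try:
        installed = version("tripper")
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= minimum
```

In a test script, the tripper-dependent checks could then be wrapped in `if tripper_is_recent_enough(): ...`, or the condition could be passed to `pytest.mark.skipif`.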

Co-authored-by: Francesca L. Bleken <48128015+francescalb@users.noreply.github.com>

jesper-friis · 2024-01-08T22:35:10Z


It is possible to reduce duplication of documentation, but as mentioned above, it requires us to:

convert documentation in C source files to doxygen comments
remove documentation from C headers
update the documentation build system to parse the C sources instead of headers

That should be a separate issue. I am not sure whether it really is something that should be prioritised.

Description

- [x] Fix that an invalid db.xml file can crash the rdf plugin.
- [x] Install development libraries for libxml2 and libxslt in the ci_tests and cd_docs GitHub workflows, which seem to be needed with later versions of librdf.

Builds on top of PR #755.

Closes #756

Type of change

- [x] Bug fix & code cleanup
- [ ] New feature
- [ ] Documentation update
- [ ] Test update

Checklist for the reviewer

This checklist should be used as a help for the reviewer.

- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new features have an accompanying new test?
- [ ] Has the documentation been updated as necessary?

…ite into 741-add-datatype-to-collection

francescalb

Nice. Just add a test that includes a datatype when adding a relation before merging.

Description:

Closes #160

Also added a `fallback_backend` option to `Triplestore.parse()` and `Triplestore.serialize()` to allow calling parse() and serialize() with the collection backend (using rdflib under the hood).

Note that this PR builds on top of PR #161.

Note also that this PR utilises DLite PR SINTEF/dlite#755. It is probably a good idea to merge that PR and create a new release of DLite before merging this PR to master.

Type of change:

- [x] Bug fix and code cleanup.
- [x] New feature.
- [ ] Documentation update.
- [ ] Testing.

Checklist for the reviewer:

This checklist should be used as a help for the reviewer.

- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new features have an accompanying new test?
- [ ] Has the documentation been updated as necessary?

jesper-friis added 3 commits December 26, 2023 19:26

Added datatype to relations

41b2cf7

Fixing error return codes

6b97bf8

Corrected initialisation of relation from dict

0f4e36f

jesper-friis linked an issue Dec 26, 2023 that may be closed by this pull request

Add datatype to collection #741

Closed

jesper-friis marked this pull request as draft December 26, 2023 19:11

jesper-friis added 13 commits December 26, 2023 20:19

Added datatype to triplestore_add()

4d56da8

Initialised datatype

b6b961f

Correctly assign triple from rdflib

109a297

Added datatype to triplestore and collection methods

75a3c25

Do not install rdflib in ci-tests workflow

9138d41

Added smart redland triple filter function to allow searching for both objects and datatypes as expected.

04ef303

Cleaned up debugging statements and comments

6699390

Updated triplestore-builtin

238f588

Merge branch 'master' into 741-add-datatype-to-collection

784df5b

Merge branch 'master' into 741-add-datatype-to-collection

47f8ea7

Correctly print/scan relations with datatype

21f78d3

Updated collection, added tests and fixed bugs

38da5a8

Removed access to set_default_namespace() from Python.

ed64640

This method makes little sense and is just confusing.

jesper-friis marked this pull request as ready for review December 29, 2023 00:16

jesper-friis changed the title ~~WIP: add datatype to collection~~ Add datatype to collection Dec 29, 2023

jesper-friis added 3 commits December 29, 2023 15:51

Added convenient Collection.value() method

1e45ece

Merge branch '741-add-datatype-to-collection' into 756-rdf-plugin-crash

f4b1619

Corrected rdf plugin and avoid double-free when running test_rdf with DLITE_ATEXIT_FREE

ad49370

This was referenced Jan 1, 2024

Fix invalid db.xml file can crash rdf plugin #759

Merged

Retain literal types in collection backend EMMC-ASBL/tripper#160

Closed

Merge branch 'master' into 741-add-datatype-to-collection

c3acbc6

jesper-friis mentioned this pull request Jan 3, 2024

Retain literal types in collection backend EMMC-ASBL/tripper#165

Merged
