Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add datatype to collection #755

Merged
merged 31 commits into from
Jan 24, 2024
Merged

Conversation

jesper-friis
Copy link
Collaborator

@jesper-friis jesper-friis commented Dec 26, 2023

Description

Add datatype to collections. This is needed for representing RDF literals.

Also added more tests and fixed bugs.

Closes #741

Type of change

  • Bug fix & code cleanup
  • New feature
  • Documentation update
  • Test update

Checklist for the reviewer

This checklist should be used as a help for the reviewer.

  • Is the change limited to one issue?
  • Does this PR close the issue?
  • Is the code easy to read and understand?
  • Do all new feature have an accompanying new test?
  • Has the documentation been updated as necessary?

@jesper-friis jesper-friis linked an issue Dec 26, 2023 that may be closed by this pull request
@jesper-friis jesper-friis marked this pull request as draft December 26, 2023 19:11
@jesper-friis jesper-friis marked this pull request as ready for review December 29, 2023 00:16
@jesper-friis jesper-friis changed the title WIP: add datatype to collection Add datatype to collection Dec 29, 2023
`nrelations` property.

Relations are (s, p, o, d=None)-triples with an optional fourth field
`d`, specifying the datatype of the object. It may have the following
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is what? It is a bit unclear what you are referring to. From the description nelow it seems you are referring to 'object', but it is a bit unclear.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated documentation

FAIL("relation subject, predicate and object must be strings");
FAIL("relation (s,p,o[,d[,id]]) items must be strings");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked that the were written out in the previous error message. How about using subject(s), predicate(p)...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Updated the error message with full names.

Comment on lines 3 to 4
rel1 = dlite.Relation("s1", "p1", "o1")
rel2 = dlite.Relation("s2", "p2", "o2", "d2")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we testing here? Looks like we tests that setting a relation does not throw an error, but do we really test that it is there? Also, I would like tests on doing it wrongly as well, to check that the desired Errors are actually raised.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted. Seems that I started on this test script, but forgot to finalise it. Added some more tests now.

Comment on lines 466 to 471
//DLiteCollectionState state;
//const DLiteRelation *r;
//dlite_collection_init_state(coll, &state);
//r = dlite_collection_find(coll, &state, s, p, o, d);
//dlite_collection_deinit_state(&state);
//return r;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why keep these comments?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for them anymore. Removed

Comment on lines +679 to +693
Return pointer to the value for a pair of two criteria.

Useful if one knows that there may only be one value. The returned
value is held by the collection and should be copied by the user
since it may be overwritten by later calls to the collection.

Parameters:
s, p, o: Criteria to match. Two of these must be non-NULL.
d: If not NULL, the required datatype of literal objects.
fallback: Value to return if no matches are found.
any: If non-zero, return first matching value.

Returns a pointer to the value of the `s`, `p` or `o` that is NULL.
On error NULL is returned.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nice description. However, I have read several variations on the same theme further up. Does this mean that we are documenting the same thing several places? This is a bit confusing as well as increases the risk of updating documentation of the same thing wrongly (not in all pleaces) if needed later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are duplicating documentation. In C we have the same docstring both in the source (.c) and header (.h) file. Function expose in Python will also have a Python documentation.

It is possible to reduce this duplication. Currently we generate the C reference documentation from the header files, so we need documentation there. At the same time, it is very useful to have the function documentation in the source files, so we also need it there.

A possibility could be to We could remove all function documentation from the header files and update the documentation in the source files such that it can be parsed by doxygen. That would increase maintainability and could be added as an issue.

It would in principle also be possible to extract function documentation from the doxygen-generated xml file and auto-document Python functions from it. But firstly, it would increase the complexity and make dlite even more difficult understand and secondly we want to rephrase the documentation for Python as you commented about above. So this is not something I would suggest.

Comment on lines +320 to +334
Return pointer to the value for a pair of two criteria.

Useful if one knows that there may only be one value. The returned
value is held by the collection and should be copied by the user
since it may be overwritten by later calls to the collection.

Parameters:
s, p, o: Criteria to match. Two of these must be non-NULL.
d: If not NULL, the required datatype of literal objects.
fallback: Value to return if no matches are found.
any: If non-zero, return first matching value.

Returns a pointer to the value of the `s`, `p` or `o` that is NULL.
On error NULL is returned.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the header file duplication mentioned in my response above.

Comment on lines +41 to +42
mu_check(!dlite_collection_add_relation(coll, "terrier", "is_a", "dog",
NULL));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plase also test setting the datatype.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Datatype is already tested in the test_collection_find() function.

printf("----------------------\n");
//printf("\n--- inst: %p ---\n", (void *)inst);
//dlite_json_print((DLiteInstance *)inst);
//printf("----------------------\n");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// can be removed altogether?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they could. But loading is rather tricky involving loading a shared library via the plugin system and json-parsing, so it is not unlikely that we have to do some debugging in the future. Then these commented-out print statements comes in handy. I would therefore prefer to keep them.

Copy link
Collaborator

@francescalb francescalb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have quite a few comments and questions. Especially I am worried about the documentation being written in so many places. Does this create a risk of wrong updates later?

Also, the tests fail. If this is because an update in tripper is required for it to pass it should not be included in the test.suite for a new release just yet I think. Maybe it is possible to write that the tests should run if tripper is above a certain version (i.e. higher than the current latest).

jesper-friis and others added 3 commits January 4, 2024 15:29
@jesper-friis
Copy link
Collaborator Author

I have quite a few comments and questions. Especially I am worried about the documentation being written in so many places. Does this create a risk of wrong updates later?

Also, the tests fail. If this is because an update in tripper is required for it to pass it should not be included in the test.suite for a new release just yet I think. Maybe it is possible to write that the tests should run if tripper is above a certain version (i.e. higher than the current latest).

It is possible to reduce duplication of documentation, but as mentioned above, it requires to:

  • convert documentation in C source files to doxygen comments
  • remove documentation from C headers
  • update the documentation build system to parse the C sources instead of headers

That should be a separate issue. I am not sure whether it is really is something that should be prioritised.

jesper-friis and others added 4 commits January 8, 2024 23:37
# Description
- [x] Fix that an invalid db.xml file can crash the rdf plugin.
- [x] Install development libraries for libxml2 and libxslt to the
ci_tests and cd_docs GitHub workflows, which seems to be needed with
later versions of librdf.

Builds on top of PR #755.

Closes #756

## Type of change
- [x] Bug fix & code cleanup
- [ ] New feature
- [ ] Documentation update
- [ ] Test update

## Checklist for the reviewer
This checklist should be used as a help for the reviewer.

- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new feature have an accompanying new test?
- [ ] Has the documentation been updated as necessary?
Copy link
Collaborator

@francescalb francescalb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Just add testing of including datatype when adding a relation before merging.

@jesper-friis jesper-friis merged commit 387f73a into master Jan 24, 2024
12 checks passed
@jesper-friis jesper-friis deleted the 741-add-datatype-to-collection branch January 24, 2024 21:53
jesper-friis added a commit to EMMC-ASBL/tripper that referenced this pull request Feb 13, 2024
# Description:
Closes #160 

Also added `fallback_backend` option to `Triplestore.parse()` and
`Triplestore.serialize()` to allow calling parse() and serialize() with
the collection backend (using rdflib under the hood).

**Note** that this PR builds on top of PR #161 

**Note** also that this PR utilises DLite PR
SINTEF/dlite#755. It is probably a good idea to
merge that PR and create a new release of DLite before merging this PR
to master.

## Type of change:
<!-- Put an `x` in the box that applies. -->
- [x] Bug fix and code cleanup.
- [x] New feature.
- [ ] Documentation update.
- [ ] Testing.

## Checklist for the reviewer:
<!-- Put an `x` in the boxes that apply. These can be filled by reviewer
after the PR is created. -->

This checklist should be used as a help for the reviewer.

- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new feature have an accompanying new test?
- [ ] Has the documentation been updated as necessary?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add datatype to collection
2 participants