We currently use rdflib to process our thesaurus STW. We use the Graph.parse method to parse the corresponding RDF XML and the resulting graph is pickled (either by pickle from stdlib or with joblib) when saving different types of machine learning models/objects.
I noticed that the size of the pickled models increased a lot after updating rdflib from 5.0.0 to 6.2.0. I could reproduce this behavior with a simple script:
```python
from pathlib import Path
from pickle import dump

from rdflib import Graph

STW_PATH = Path("/path/to/stw_9.12.rdf")
OUTPUT = Path("/tmp/output_x.y.z.pickle")

if __name__ == "__main__":
    g = Graph()
    g.parse(str(STW_PATH))
    with OUTPUT.open("wb") as f:
        dump(g, f)
```
I wanted to report this behavior as it is clearly a step backwards in terms of disk space used and may be relevant to others as well, although I'm not sure whether pickling is a serialization method you officially support; I could only find a chapter in your docs about saving RDF in human-readable formats. However, loading a pickled graph is much faster than parsing one, which is relevant when using a graph in a production system where launch times matter.
We also used to pickle Graph objects indirectly through the Prefect workflow framework, and it failed to serialize some graphs. We never found out why, though; we now use the standard RDF serialization formats instead, which is indeed slow. I would be interested in contributing an optimized pickle implementation, but I'm not sure what would be needed.
For reference, the STW RDF file itself is 15 MB.