NamespaceManager #21

a1012 · 2023-08-08T10:47:49Z

@cadmiumkitty
Hi, I am trying to convert a pandas DataFrame (loaded from a .csv file) to an RDF graph in .ttl format.
While converting, I am facing an issue:
NameError: name 'NamespaceManager' is not defined

The code I am using is:

```python
from rdfpandas.graph import to_graph
import pandas as pd
import rdflib

df = pd.read_csv('/content/sample_data/NER_test.csv', keep_default_na = False)
namespace_manager = NamespaceManager(Graph())
namespace_manager.bind('skos', SKOS)
namespace_manager.bind('rdfpandas', Namespace('http://github.com/cadmiumkitty/rdfpandas/'))
g = to_graph(df, namespace_manager)
s = g.serialize(format = 'turtle')
```

The CSV file is attached:
NER_test.csv

Please help me with this. I will be using the generated file with BioCypher.

cadmiumkitty · 2023-08-08T23:55:36Z

Hi @a1012,

There seem to be two issues.

First, you need to import NamespaceManager before you can use it: `from rdflib.namespace import NamespaceManager`. I may need to fix the example in the README (I don't recall why it worked; maybe NamespaceManager got moved to another package in RDFLib). I will do it in the next couple of days.

Second, the CSV that you attached won't convert to RDF with the code you shared. You need an `@id` column header that maps to the subject resource identifier, and appropriate headers for the other columns that map to predicate resource identifiers. This is probably a good example: https://github.com/cadmiumkitty/anzsic-taxonomy/blob/main/anzsic.csv

Hope it helps.

a1012 · 2023-08-09T05:11:40Z

Hi @cadmiumkitty
Thank you so much for helping me!
I have triplets (entity, category, relationship) as DataFrame columns, and I am struggling to convert them into a .ttl file that I can then use in BioCypher to create a knowledge graph.
I am really new to the RDF format, so could you please explain your second point: "the CSV that you attached won't convert to RDF with the code you shared. You need to use @id column header to map to subject resource identifier and appropriate headers for other columns to map to predicate resource identifiers"?

I didn't understand the format in the example you shared (https://github.com/cadmiumkitty/anzsic-taxonomy/blob/main/anzsic.csv). Could you please point me to any material that explains the RDF format (what each identifier or term means) so that I can apply it to my use case?

I mean, I am not able to understand the structure or how to apply it to my use case,
because when I pass index_col = '@id' while reading the CSV, as shown in the code below,

df = pd.read_csv('/content/sample_data/NER_test.csv',index_col = '@id', keep_default_na = False)

I get the error:
ValueError: Index @id invalid

cadmiumkitty · 2023-08-09T10:58:10Z

Hi @a1012,

I'd start with the W3C RDF 1.1 Concepts and Abstract Syntax document here: https://www.w3.org/TR/rdf11-concepts/

My second point is that the CSV that you read into Pandas DataFrame to convert to Rdflib Graph and serialize as Turtle (.ttl) should follow a particular convention. The convention is described in the documentation for the to_graph method: https://rdfpandas.readthedocs.io/en/latest/rdfpandas.html#rdfpandas.graph.to_graph

Row indices are used as subjects, and column indices as predicates (I use @id column for indices and specify it when reading CSV into Pandas Data Frame with read_csv). Object types are inferred from the column index pattern of predicate{rdfLib Identifier instance class name}(type)[index]@language. Index numbers simply create additional statements as opposed to attempting to construct a new rdfs:List or rdfs:Container.

The example I shared (https://github.com/cadmiumkitty/anzsic-taxonomy/blob/main/anzsic.csv) follows that convention: it has an `@id` column whose values become the row indices used as subjects, the other columns are used as predicates, and the cell values are used as objects (literals or URIs). Rdfpandas simply builds a lot of subject-predicate-object triples from the DataFrame.
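
To make the convention concrete, here is a small pandas-only sketch. The CSV content, the resource names (`rdfpandas:aspirin`, `rdfpandas:ibuprofen`), and the choice of column headers are hypothetical and only illustrate the `predicate{class name}@language` pattern described above; they are not taken from NER_test.csv. The second half shows why `index_col = '@id'` fails when the CSV has no `@id` header:

```python
import io

import pandas as pd

# Hypothetical CSV following the convention: an @id column for subjects,
# and predicate-style headers for the remaining columns.
csv_text = """@id,rdf:type{URIRef},skos:prefLabel{Literal}@en
rdfpandas:aspirin,skos:Concept,Aspirin
rdfpandas:ibuprofen,skos:Concept,Ibuprofen
"""

# The @id column becomes the row index, which is used for subjects.
df = pd.read_csv(io.StringIO(csv_text), index_col='@id', keep_default_na=False)
print(df.index.tolist())
print(df.columns.tolist())

# A CSV without an @id header (hypothetical triple-style columns):
# asking pandas for index_col='@id' raises ValueError.
bad_csv = "entity,category,relationship\naspirin,drug,treats\n"
try:
    pd.read_csv(io.StringIO(bad_csv), index_col='@id', keep_default_na=False)
    missing_id_error = None
except ValueError as err:
    missing_id_error = str(err)
print(missing_id_error)
```

A frame shaped like `df` is what the `to_graph(df, namespace_manager)` call from the question expects under this convention, provided the prefixes used in the headers and cells are bound in the namespace manager.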
