NamespaceManager #21

Open
a1012 opened this issue Aug 8, 2023 · 3 comments

Comments

@a1012

a1012 commented Aug 8, 2023

@cadmiumkitty
Hi, I am trying to convert a pandas DataFrame (loaded from a .csv file) to an RDF graph in .ttl format.
While converting, I am facing an issue:
NameError: name 'NamespaceManager' is not defined

The code I am using is :
`from rdfpandas.graph import to_graph
import pandas as pd
import rdflib

df = pd.read_csv('/content/sample_data/NER_test.csv', keep_default_na = False)
namespace_manager = NamespaceManager(Graph())
namespace_manager.bind('skos', SKOS)
namespace_manager.bind('rdfpandas', Namespace('http://github.com/cadmiumkitty/rdfpandas/'))
g = to_graph(df, namespace_manager)
s = g.serialize(format = 'turtle')`

The CSV file is attached:
NER_test.csv

Please help me with this. Also, I will be using the generated file with BioCypher.

@cadmiumkitty
Owner

Hi @a1012,

There seem to be two issues.

First, you need to import NamespaceManager before you can use it: from rdflib.namespace import NamespaceManager. I may need to fix the example in the README (I don't recall why it worked; maybe NamespaceManager got moved to another package in rdflib); I will do it in the next couple of days.

Second, the CSV that you attached won't convert to RDF with the code you shared. You need to use an @id column header to map to the subject resource identifier, and appropriate headers for the other columns to map to predicate resource identifiers. This is probably a good example: https://github.com/cadmiumkitty/anzsic-taxonomy/blob/main/anzsic.csv

Hope it helps.

@a1012
Author

a1012 commented Aug 9, 2023

Hi @cadmiumkitty
Thank you so much for helping me!
I have triplets (entity, category, relationship) as DataFrame columns and am struggling to convert them into a .ttl file so that I can use it in BioCypher to create a knowledge graph.
I am really new to the RDF format, so could you please explain your second point: "the CSV that you attached won't convert to RDF with the code you shared. You need to use @id column header to map to subject resource identifier and appropriate headers for other columns to map to predicate resource identifiers"?

I didn't understand the format of the example you shared (https://github.com/cadmiumkitty/anzsic-taxonomy/blob/main/anzsic.csv). Could you please point me to any material that explains the RDF format (what each identifier or term means) so that I can apply it to my use case?

I mean, I am not able to understand the structure or how to apply it to my use case, because when I pass index_col = '@id' while reading the CSV, as shown in the code below:

df = pd.read_csv('/content/sample_data/NER_test.csv',index_col = '@id', keep_default_na = False)

I get this error:
ValueError: Index @id invalid

@cadmiumkitty
Owner

cadmiumkitty commented Aug 9, 2023

Hi @a1012,

I'd start with the RDF 1.1 Concepts and Abstract Syntax document here: https://www.w3.org/TR/rdf11-concepts/

My second point is that the CSV that you read into a pandas DataFrame, convert to an rdflib Graph, and serialize as Turtle (.ttl) should follow a particular convention. The convention is described in the documentation for the to_graph function: https://rdfpandas.readthedocs.io/en/latest/rdfpandas.html#rdfpandas.graph.to_graph

Row indices are used as subjects, and column indices as predicates (I use an @id column for the indices and specify it when reading the CSV into a pandas DataFrame with read_csv). Object types are inferred from the column index pattern of predicate{rdfLib Identifier instance class name}(type)[index]@language. Index numbers simply create additional statements as opposed to attempting to construct a new rdfs:List or rdfs:Container.
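To make the @id convention concrete, here is a small pandas-only sketch. The CSV content, the example.org URIs, and the choice of predicates are made up for illustration; only the column header pattern follows the convention quoted above. Note that index_col='@id' works here only because the header row actually contains an @id column; when the file has no such column, pandas raises the "ValueError: Index @id invalid" error reported earlier in this thread.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first column is named '@id' and holds subject URIs,
# the remaining headers follow the predicate{class}@language pattern.
csv_text = """@id,skos:prefLabel{Literal}@en,skos:related{URIRef}
http://example.org/gene/BRCA1,BRCA1,http://example.org/disease/breast-cancer
http://example.org/disease/breast-cancer,Breast cancer,
"""

# Use the '@id' column as the row index; keep empty cells as empty strings
df = pd.read_csv(io.StringIO(csv_text), index_col='@id', keep_default_na=False)

print(df.index.tolist())
print(df.columns.tolist())
```

A DataFrame shaped like this (subjects in the index, predicates in the column headers) is the kind of input the convention describes.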

The example I shared https://github.com/cadmiumkitty/anzsic-taxonomy/blob/main/anzsic.csv follows that convention in that it has an @id column for the row indices to be used as subjects, other columns to be used as predicates, and values in the cells to be used as objects (literals or URIs) - rdfpandas simply builds many subject-predicate-object triples from the DataFrame.
