Step 1: load the documents. In this case, I will work only on the CSV file.

In [1]:
# The function read_csv allows me to import a CSV file as a pandas DataFrame

from pandas import read_csv

graph_publications = read_csv("graph_publications.csv", keep_default_na=False,
                              dtype={
                                  "id": "string",
                                  "title": "string",
                                  "type": "string",
                                  "publication_year": "int",
                                  "issue": "string",
                                  "volume": "string",
                                  "chapter": "string",
                                  "publication_venue": "string",
                                  "venue_type": "string",
                                  "publisher": "string",
                                  "event": "string"
                              })
graph_publications

Unnamed: 0,id,title,type,publication_year,issue,volume,chapter,publication_venue,venue_type,publisher,event
0,doi:10.1016/j.websem.2021.100655,Crossing The Chasm Between Ontology Engineerin...,journal-article,2021,,70,,Journal Of Web Semantics,journal,crossref:78,
1,doi:10.1007/s10115-017-1100-y,Core Techniques Of Question Answering Systems ...,journal-article,2017,3,55,,Knowledge And Information Systems,journal,crossref:297,
2,doi:10.1016/j.websem.2014.03.003,Api-Centric Linked Data Integration: The Open ...,journal-article,2014,,29,,Journal Of Web Semantics,journal,crossref:78,
3,doi:10.1093/nar/gkz997,The Monarch Initiative In 2019: An Integrative...,journal-article,2019,D1,48,,Nucleic Acids Research,journal,crossref:286,
4,doi:10.3390/publications7030050,Dras-Tic Linked Data: Evenly Distributing The ...,journal-article,2019,3,7,,Publications,journal,crossref:1968,
...,...,...,...,...,...,...,...,...,...,...,...
495,doi:10.1145/3407194,Early Detection Of Social Media Hoaxes At Scale,journal-article,2020,4,14,,Acm Transactions On The Web,journal,crossref:320,
496,doi:10.3390/app10144893,Cognitive Aspects-Based Short Text Representat...,journal-article,2020,14,10,,Applied Sciences,journal,crossref:1968,
497,doi:10.1145/3309547,Temporal Relational Ranking For Stock Prediction,journal-article,2019,2,37,,Acm Transactions On Information Systems,journal,crossref:320,
498,doi:10.1007/978-3-030-58285-2_27,Fast Pathfinding In Knowledge Graphs Using Wor...,book-chapter,2020,,,1,Lecture Notes In Computer Science - Ki 2020: A...,book,crossref:297,
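
The effect of keep_default_na=False can be checked on a tiny in-memory sample. This is only a sketch: the two rows and their DOIs are hypothetical, not taken from graph_publications.csv. It shows that an empty cell is kept as an empty string instead of being converted to NaN.

```python
from io import StringIO

from pandas import read_csv

# Hypothetical two-row sample mimicking some columns of graph_publications.csv
sample_csv = StringIO(
    "id,title,type,publication_year,issue,volume\n"
    "doi:10.0000/example-1,First Example,journal-article,2021,,70\n"
    "doi:10.0000/example-2,Second Example,journal-article,2017,3,55\n"
)

sample = read_csv(
    sample_csv,
    keep_default_na=False,
    dtype={"issue": "string", "volume": "string", "publication_year": "int"},
)

# The missing issue of the first row is kept as an empty string
print(repr(sample.loc[0, "issue"]))

# No value in the column is treated as missing
print(sample["issue"].isna().any())
```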


Step 2: Create an empty Graph using the rdflib library

In [2]:
from rdflib import Graph
from rdflib import URIRef
from rdflib import Literal
from rdflib import RDF

my_graph = Graph()
len(my_graph)

0

Step 3: Define URIs. Each statement of the Graph has the form Subject, Predicate, Object. Some of these elements are already defined in the rdflib library: for example, RDF.type is an already defined property, corresponding to the URIRef http://www.w3.org/1999/02/22-rdf-syntax-ns#type. The missing ones must be defined according to the UML model.

In [3]:
# some classes of resources (not finished yet)
JournalArticle = URIRef("https://schema.org/ScholarlyArticle")
BookChapter = URIRef("https://schema.org/Chapter")
Journal = URIRef("https://schema.org/Periodical")
Book = URIRef("https://schema.org/Book")

# some attributes related to classes (not finished yet)
doi = URIRef("https://schema.org/identifier")
publicationYear = URIRef("https://schema.org/datePublished")
title = URIRef("https://schema.org/name")
issue = URIRef("https://schema.org/issueNumber")
volume = URIRef("https://schema.org/volumeNumber")
identifier = URIRef("https://schema.org/identifier")
name = URIRef("https://schema.org/name")

# some relations among classes (not finished yet)
publicationVenue = URIRef("https://schema.org/isPartOf")

#base url for our subjects
base_url = "https://comp-data.github.io/res/"

Step 3: Create statements according to the dataframe (above) and load them into the graph.

In [4]:
# Load statements about publications.
# iterrows() is a method that iterates over the rows of the dataframe, one by one,
# returning each row together with its index (idx)

for idx, row in graph_publications.iterrows():
    local_id = "publication-" + str(idx)
    
    # The shape of the new resources that are publications is
    # 'https://comp-data.github.io/res/publication-<integer>'
    subj = URIRef(base_url + local_id)
    
    # The if condition checks whether the publication is of type "journal-article"
    # (look at the "type" column of the dataframe above)
    if row["type"] == "journal-article":
        my_graph.add((subj, RDF.type, JournalArticle))
    
        # These two statements apply only to journal articles:
        my_graph.add((subj, issue, Literal(row["issue"])))
        my_graph.add((subj, volume, Literal(row["volume"])))
    else:
        my_graph.add((subj, RDF.type, BookChapter))
        
    my_graph.add((subj, name, Literal(row["title"])))
    
    # Note: no statement about the publication year is added here. If one is
    # added, the value should be cast to a string first, since the Date type
    # in schema.org ('https://schema.org/Date') is actually a string-like value

So far, we have loaded a CSV file as a dataframe and distinguished publications of type "journal-article" from all the others, which are treated as book chapters. Every publication receives a title; journal articles additionally receive an issue and a volume.

Step 4: how many triples have we inserted in the graph?

In [5]:
len(my_graph)

1814

Step 5: create a graph database and populate it. In other words, we upload the triples created so far (1814, according to the count above) to a graph database.

In [6]:
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore

store = SPARQLUpdateStore()

# The URL of the SPARQL endpoint is the URL of the Blazegraph
# instance followed by '/sparql'
endpoint = 'http://127.0.0.1:9999/blazegraph/sparql'

# It opens the connection with the SPARQL endpoint instance
store.open((endpoint, endpoint))

for triple in my_graph.triples((None, None, None)):
    store.add(triple)

# Once finished, remember to close the connection
store.close()

Step 6: try to retrieve some data using SPARQL queries. In this case, we want to retrieve the <b>titles</b> (?title) of the publications of type <b>journal-article</b> (?journal_article). The query declares two prefixes: <code>rdf:</code>, needed for the property <code>rdf:type</code>, and <code>schema:</code>, which covers both the class <code>schema:ScholarlyArticle</code> (i.e., the equivalent of the type journal article) and the property <code>schema:name</code>.

In [7]:
from sparql_dataframe import get

endpoint = "http://127.0.0.1:9999/blazegraph/sparql"
query = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <https://schema.org/>

SELECT ?journal_article ?title
WHERE {
    ?journal_article rdf:type schema:ScholarlyArticle .
    ?journal_article schema:name ?title .
}
"""
df_sparql = get(endpoint, query, True)
df_sparql

Unnamed: 0,journal_article,title
0,https://comp-data.github.io/res/publication-486,Inferring User Interests In Microblogging Soci...
1,https://comp-data.github.io/res/publication-353,Enhanced Semantic Representation Of Coaxiality...
2,https://comp-data.github.io/res/publication-38,Ontology-Based Model To Support Ubiquitous Hea...
3,https://comp-data.github.io/res/publication-437,Relation Prediction In Knowledge Graph By Mult...
4,https://comp-data.github.io/res/publication-357,Webulous And The Webulous Google Add-On - A We...
...,...,...
402,https://comp-data.github.io/res/publication-47,The Use Of Computer-Interpretable Clinical Gui...
403,https://comp-data.github.io/res/publication-85,Fair Metadata Standards For Low Carbon Energy ...
404,https://comp-data.github.io/res/publication-89,Legal Information Retrieval Systems: State-Of-...
405,https://comp-data.github.io/res/publication-72,Metadata Schemas And Ontologies For Building E...


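The same query pattern retrieves the titles of the publications of type <b>book-chapter</b> (?book_chapter):

In [8]: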
endpoint = "http://127.0.0.1:9999/blazegraph/sparql"
query1 = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <https://schema.org/>

SELECT ?book_chapter ?title
WHERE {
    ?book_chapter rdf:type schema:Chapter .
    ?book_chapter schema:name ?title .
}
"""
df_sparql_2 = get(endpoint, query1, True)
df_sparql_2

Unnamed: 0,book_chapter,title
0,https://comp-data.github.io/res/publication-130,Sentiment Polarity Detection In Social Network...
1,https://comp-data.github.io/res/publication-88,Challenges In The Implementation Of Privacy En...
2,https://comp-data.github.io/res/publication-30,Santé: A Light-Weight End-To-End Semantic Sea...
3,https://comp-data.github.io/res/publication-56,Se-Diagenf: An Ontology-Based Expert System Fo...
4,https://comp-data.github.io/res/publication-159,Comparative Study Of Rdf And Owl Ontology Lang...
...,...,...
88,https://comp-data.github.io/res/publication-31,Improving Answer Type Classification Quality T...
89,https://comp-data.github.io/res/publication-133,A Knowledge-Based Computational Environment Fo...
90,https://comp-data.github.io/res/publication-446,Enriching Wikidata With Cultural Heritage Data...
91,https://comp-data.github.io/res/publication-184,Mapping The Web Ontology Language To The Opena...
