# **ArangoRDF**

<img src="https://github.com/ArangoDB-Community/ArangoRDF/blob/main/examples/assets/adb_logo.png?raw=1" alt="ArangoDB" width="250"/>
<img src="https://github.com/ArangoDB-Community/ArangoRDF/blob/main/examples/assets/rdf_logo.png?raw=1" alt="rdf" width="200"/>

# **Run the full version with Google Colab**
<a href="https://colab.research.google.com/github/ArangoDB-Community/ArangoRDF/blob/main/examples/ArangoRDF.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

In [None]:
%%capture
!pip install adb-cloud-connector
!pip install arango-rdf
!git clone -b "main" https://github.com/ArangoDB-Community/ArangoRDF.git 

In [None]:
from arango import ArangoClient
from adb_cloud_connector import get_temp_credentials
import json

from arango_rdf import ArangoRDF

# Create a Temporary ArangoDB Cloud Instance

In [None]:
# Request temporary instance from the managed ArangoDB Cloud Service.
con = get_temp_credentials()
print(json.dumps(con, indent=2))

# Connect to the db via the python-arango driver
db = ArangoClient(hosts=con["url"]).db(con["dbName"], con["username"], con["password"], verify=True)

# About RDF
RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.

RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.

This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.
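
To make the graph view concrete, here is a minimal sketch in plain Python (no RDF library) that treats each triple as a labeled edge and groups the edges by subject. The `Triple` type and the `http://example.org/` URIs are hypothetical and are not taken from the datasets used later in this notebook.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the source node
    predicate: str  # URI naming the relationship (the edge label)
    object: str     # URI of the target node


# Hypothetical example URIs, for illustration only.
EX = "http://example.org/"
triples = [
    Triple(EX + "SFO", EX + "locatedIn", EX + "SanFrancisco"),
    Triple(EX + "Flight42", EX + "departsFrom", EX + "SFO"),
    Triple(EX + "Flight42", EX + "operatedBy", EX + "ExampleAir"),
]

# Adjacency list: each subject maps to its outgoing (label, target) edges.
graph = defaultdict(list)
for t in triples:
    graph[t.subject].append((t.predicate, t.object))

for node, edges in graph.items():
    for label, target in edges:
        print(f"{node} --[{label}]--> {target}")
```

Note that `SFO` appears both as a subject and as an object, which is what lets separate statements join into one connected graph.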

Resources to get started:

* [RDF 1.1 Concepts and Abstract Syntax](https://www.w3.org/TR/rdf11-concepts/)
* [RDFLib (Python)](https://pypi.org/project/rdflib/)
* [One Example for Modeling RDF as ArangoDB Graphs](https://www.arangodb.com/docs/stable/data-modeling-graphs-from-rdf.html)

# Get Started with ArangoRDF


## Initialization
The first steps when importing RDF data into ArangoDB are:
* Pass a database connection to the `ArangoRDF` constructor.
* Optionally set a `default_graph` or `sub_graph`.
  * A `sub_graph` is equivalent to a named RDF graph and, for now, is only stored on the documents that are imported.
* Instantiating `ArangoRDF` also creates an ArangoDB named graph for `default_graph`.
* Set any configuration options and metadata. Currently, the only supported option is `normalize_literals`, which is `False` by default. You can also add any other metadata you want to save here.
* Finally, initialize the collections. You can use the default collection names or set your own; here we set the blank nodes collection to `Blank`.



In [None]:
# Clean up existing data and collections
if db.has_graph("default_graph"):
    db.delete_graph("default_graph", drop_collections=True, ignore_missing=True)

# Initializes default_graph and sets RDF graph identifier (ArangoDB sub_graph)
# Optional: sub_graph (stores graph name as the 'graph' attribute on all edges in Statement collection)
# Optional: default_graph (name of ArangoDB Named Graph, defaults to 'default_graph',
#           is root graph that contains all collections/relations)
adb_rdf = ArangoRDF(db, sub_graph="http://data.sfgov.org/ontology") 
print("initialized graph")
config = {"normalize_literals": False}  # default: False

# Initialize the RDF collections; blank nodes are stored in the "Blank" collection
adb_rdf.init_rdf_collections(bnode="Blank")
print("initialized collections")

# Import RDF Data
Now that we have set up the scaffolding for our RDF graphs, let's import some data.

## Import an Ontology
To import an RDF graph, call the `import_rdf` function and pass in:

* the file path
* the `format` of the file
* the `config` object
* `save_config`, a boolean that, when `True`, stores the config and metadata in the configuration collection for later use

In [None]:
print("importing ontology...")
# Start with importing the ontology
adb_rdf.import_rdf("./ArangoRDF/examples/data/airport-ontology.owl", format="xml", config=config, save_config=True)
print("Ontology imported")

Notice that we supplied `xml` as the format because this OWL file is serialized as RDF/XML. If we hadn't supplied it, we would receive an error and the application would halt.
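
Since a missing format stops the import, it can help to derive the format string from the file extension before calling `import_rdf`. The helper below is a hypothetical sketch and not part of ArangoRDF. Its mapping covers only the two formats used in this notebook, and it assumes that `.owl` files are RDF/XML, which holds for the example ontology but not for every OWL file.

```python
from pathlib import Path

# Hypothetical mapping; only "xml" and "ttl" appear in this notebook.
# Assumption: .owl files are serialized as RDF/XML, as the example ontology is.
FORMAT_BY_SUFFIX = {
    ".owl": "xml",
    ".ttl": "ttl",
}


def guess_rdf_format(path):
    """Return the format string for `path`, or raise if the suffix is unknown."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(
            f"no known RDF format for suffix {suffix!r}; pass format explicitly"
        ) from None


print(guess_rdf_format("./ArangoRDF/examples/data/airport-ontology.owl"))  # prints xml
```

Raising on an unknown suffix, rather than falling back to a default, keeps the failure close to its cause.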

## Import RDF Data Graphs

Now that you have imported your ontology, it is time to import some data.
The process is the same; the only difference is that we are now importing a [Turtle](https://www.w3.org/2007/02/turtle/primer/) file, whose format is `ttl`.

I have also added an item to our config dictionary so that we can see how to retrieve saved configuration information later.

In [None]:
config['Avocados_Are_Delicious'] = True

print("importing aircraft data...")

# Next, let's import the actual graph data
adb_rdf.import_rdf("./ArangoRDF/examples/data/sfo-aircraft-partial.ttl", format="ttl", config=config, save_config=True)
print("aircraft data imported")


## Configuration

Now that we have stored the configuration information, we have a couple of easy ways to retrieve it should we ever need to.

We can look up saved configurations using two functions:
* `get_config_by_latest()`
* `get_config_by_key_value('key', 'value')`

You can pass the returned config dictionary to `import_rdf()` as is or change options in it first. If no config is found, the application halts to avoid a lengthy import with the wrong configuration.

In [None]:
# Get the last saved config
config = adb_rdf.get_config_by_latest()
print(config)
print('')

# Get the most recent config that matches our search 
config = adb_rdf.get_config_by_key_value('Avocados_Are_Delicious', True)
print(config)
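
If you want to change an option before reusing a saved config, copying it first leaves the retrieved dictionary untouched. The helper below is a hypothetical sketch that assumes the retrieved config is a plain dictionary. The guard against an empty result is an assumption added for illustration and says nothing about how ArangoRDF itself reports a missing config.

```python
def with_overrides(saved_config, **overrides):
    """Return a copy of `saved_config` with the given options replaced."""
    if not saved_config:
        # Assumption: treat a missing or empty config as an error.
        raise ValueError("no saved configuration found")
    updated = dict(saved_config)  # shallow copy; the original stays unchanged
    updated.update(overrides)
    return updated


saved = {"normalize_literals": False, "Avocados_Are_Delicious": True}
new_config = with_overrides(saved, normalize_literals=True)
print(new_config)
```

The resulting `new_config` could then be passed as `config=new_config` to `import_rdf()`.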

## Exporting to RDF
If you ever need to export data from ArangoDB back to RDF, you can do so with the `export_rdf()` function.

Export takes only:
* the output filename
* the format

In [None]:
print("exporting data...")
adb_rdf.export_rdf("./ArangoRDF/examples/data/rdfExport.xml", format="xml")
print("export complete")
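
A quick sanity check after an `xml` export is to confirm that the file parses as XML and has an `rdf:RDF` root element. The sketch below uses only the standard library. The helper `looks_like_rdf_xml` is hypothetical, and the check is deliberately shallow: it assumes the exporter wraps its output in `rdf:RDF` and does not validate the RDF itself. The demo writes a tiny temporary file so the example runs without a prior export.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def looks_like_rdf_xml(path):
    """Return True if `path` parses as XML and its root element is rdf:RDF."""
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError):
        return False
    # ElementTree spells qualified names as "{namespace}localname".
    return root.tag == f"{{{RDF_NS}}}RDF"


# Self-contained demo with a minimal, empty RDF/XML document.
sample = f'<?xml version="1.0"?>\n<rdf:RDF xmlns:rdf="{RDF_NS}"/>\n'
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as fh:
    fh.write(sample)
print(looks_like_rdf_xml(fh.name))  # prints True
os.remove(fh.name)
```

For the export above, the call would be `looks_like_rdf_xml("./ArangoRDF/examples/data/rdfExport.xml")`.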

# What's Next?

### We need your help!
This is a fresh community project with a lot more to be done. We gladly welcome any feedback, issues, and PRs. You can find the repository in the ArangoDB-Community GitHub organization, [here](https://github.com/ArangoDB-Community/ArangoRDF).