# Getting started with the Azure Cosmos DB Gremlin API

Azure Cosmos DB offers a fully managed graph database service called the **Gremlin API**. It is based on the [Apache TinkerPop](http://tinkerpop.apache.org) open-source standard. This notebook introduces the Gremlin API on Azure Cosmos DB.

A graph database engine lets you model your data in terms of entities and the relationships between them. Unlike a relational database engine, a graph database persists all relationships, which simplifies queries that resolve adjacent entities. Similarly, the **Gremlin query language** provides built-in steps for operations that are common to graph algorithms and for traversals across multiple objects.

Learn more about the [Azure Cosmos DB Gremlin API](https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction).

**Note: Run these cells in order, because later cells depend on earlier ones.**

In [None]:
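import sys

# Install the Gremlin Python driver (pinned version) and the supporting libraries.
!{sys.executable} -m pip install gremlinpython==3.4.3 --user
# `futures` is the Python 2 backport of concurrent.futures; Python 3 already ships that module.
!{sys.executable} -m pip install futures --user
!{sys.executable} -m pip install networkx --user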

print(sys.version)

**Note: You may have to restart the kernel before the newly installed libraries can be imported. See the top bar for more information.**

## Client initialization
To initialize a Gremlin client, you need the connection details from your **Cosmos DB Account**:
- The Gremlin endpoint of your account, in the format `<your_account_name>.gremlin.cosmosdb.azure.com`.
- The username, which is the path to your collection in the format `/dbs/<your database name>/colls/<your collection name>`.
- The password, which is any read/write key for your Cosmos DB Account.
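
The connection details listed above follow fixed string formats, so they can be assembled from the individual names. The following is a minimal sketch; the helper names `build_gremlin_endpoint` and `build_gremlin_username` and the values `myaccount`, `mydb` and `mygraph` are illustrative placeholders, not part of any library.

```python
def build_gremlin_endpoint(account_name, port=443):
    """Return the WebSocket endpoint for a Cosmos DB Gremlin account."""
    return "wss://{0}.gremlin.cosmosdb.azure.com:{1}/".format(account_name, port)


def build_gremlin_username(database_name, collection_name):
    """Return the username, which is the resource path of the collection."""
    return "/dbs/{0}/colls/{1}".format(database_name, collection_name)


# Placeholder values for illustration only; substitute your own.
endpoint = build_gremlin_endpoint("myaccount")
username = build_gremlin_username("mydb", "mygraph")
print(endpoint)
print(username)
```

The two resulting strings correspond to the first positional argument and the `username` argument of the client constructor.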

In [None]:
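from gremlin_python.driver import client as gremlin_driver, serializer

# The module is imported under an alias so that re-running this cell still works
# after the name `client` has been bound to the Client instance below.
client = gremlin_driver.Client(
    'wss://<your_account_name>.gremlin.cosmosdb.azure.com:443/', 'g',
    username="/dbs/<your_db_name>/colls/<your_coll_name>",
    password="<your_password>",
    message_serializer=serializer.GraphSONSerializersV2d0()
)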

## Executing Gremlin queries

The client object exposes a `submitAsync` call that sends any given Gremlin query to the server. The call returns a `callback` object whose result provides an interface for iterating through the results.

The following is a sample wrapper function for the `submitAsync()` call that sends Gremlin queries to the server. This wrapper handles common exceptions, including `ConflictException` and `RequestRateTooLargeException`.

In [None]:
from gremlin_python.driver.protocol import GremlinServerError

cosmosdb_messages = {
    409: 'Conflict exception. You\'re probably inserting the same ID again.',
    429: 'Not enough RUs for this query. Try again.'
}

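def executeGremlinQuery(gremlinQuery, message=None, params=None):
    # Note: `message` and `params` are currently unused.
    try:
        # submitAsync() returns a future; result() blocks until the response arrives.
        callback = client.submitAsync(gremlinQuery)
        result_set = callback.result()
        if result_set is not None:
            # one() returns the next batch of results as a list.
            return result_set.one()
    except GremlinServerError as ex:
        status = ex.status_attributes['x-ms-status-code']
        print('There was an exception: {0}'.format(status))
        # Fall back to a generic message for status codes without a custom one.
        print(cosmosdb_messages.get(status, 'Unexpected error. See the status code above.'))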
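
The message for status 429 suggests trying the query again. One possible way to automate that is a retry loop with exponential backoff, sketched below under stated assumptions: `retry_on_throttle` and `FakeThrottled` are hypothetical names that are not part of gremlinpython, and the fake exception only imitates the `status_attributes` dictionary used in the wrapper so that the example runs without a server.

```python
import time


def retry_on_throttle(run_query, is_throttled, max_attempts=4,
                      base_delay=0.5, sleep=time.sleep):
    """Call run_query(), retrying with exponential backoff while is_throttled(ex) is true."""
    for attempt in range(max_attempts):
        try:
            return run_query()
        except Exception as ex:
            # Re-raise anything that is not throttling, and the last failed attempt.
            if not is_throttled(ex) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


class FakeThrottled(Exception):
    """Stand-in for a server error that carries status code 429 (hypothetical)."""
    status_attributes = {'x-ms-status-code': 429}


calls = {'count': 0}


def flaky_query():
    # Fails twice with a throttling error, then succeeds.
    calls['count'] += 1
    if calls['count'] < 3:
        raise FakeThrottled()
    return [0]


delays = []  # Records the requested delays instead of actually sleeping.
result = retry_on_throttle(
    flaky_query,
    is_throttled=lambda ex: getattr(ex, 'status_attributes', {}).get('x-ms-status-code') == 429,
    sleep=delays.append,
)
print(result, delays)
```

Passing `sleep` as a parameter keeps the demonstration instant; with a real client you would leave the default `time.sleep` in place and let `run_query` raise the driver's exception instead of printing it.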
        

The following query obtains the number of vertices in the collection: `g.V().count()`

In [None]:
result = executeGremlinQuery("g.V().count()")
print("Count of vertices: {0}".format(result))

If you're running this for the first time, you'll get `Count of vertices: [0]` because no data has been inserted yet. The next steps explore more queries.

## CRUD Gremlin queries

The following are examples for Create, Read, Update and Delete operations using Gremlin.

### Create vertices
- To create vertices, use the `addV(<Label>)` function. The parameter for this function is the Label field of a vertex. The Label field determines the _type_ of entity that the vertex denotes. 

- Right after the `addV(<Label>)` function, you can append the `.property(<key>, <value>)` function any number of times to add properties to this vertex. 

- Finally, all vertices in a partitioned collection need to specify a partition key value for the given partition key field. The name of the partition key property was set when the collection was created. To set this, you can use the `.property(<partitionKey>, <value>)` function.

The example below creates a vertex with the label `person`, an `id` property set to `Luis`, and a partition key property named `pk` set to the value `pk`:

In [None]:
result = executeGremlinQuery("g.addV('person').property('id', 'Luis').property('pk', 'pk')")
print("Result: {0}".format(result))

**Let's insert a few more vertices**

In [None]:
result = executeGremlinQuery("g.addV('person').property('id', 'Andrew').property('pk', 'pk')")
print("Result: {0}\n".format(result))

result = executeGremlinQuery("g.addV('person').property('id', 'Rimma').property('pk', 'pk')")
print("Result: {0}\n".format(result))
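
Because the `addV` queries above differ only in the `id` value, the query strings could also be generated by a small helper. The sketch below is one way to do that; `gremlin_string` and `build_add_vertex_query` are hypothetical helpers, and the backslash escaping of single quotes is an assumption about how string literals are parsed rather than documented Cosmos DB behavior. All property values are emitted as strings.

```python
def gremlin_string(value):
    """Quote a value as a single-quoted Gremlin string literal (assumed escaping)."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'{0}'".format(escaped)


def build_add_vertex_query(label, vertex_id, pk_field, pk_value, properties=None):
    """Build an addV() query with an id, a partition key and optional properties."""
    query = "g.addV({0}).property('id', {1}).property({2}, {3})".format(
        gremlin_string(label), gremlin_string(vertex_id),
        gremlin_string(pk_field), gremlin_string(pk_value))
    for key, value in (properties or {}).items():
        query += ".property({0}, {1})".format(gremlin_string(key), gremlin_string(value))
    return query


queries = [build_add_vertex_query('person', name, 'pk', 'pk')
           for name in ['Andrew', 'Rimma']]
for query in queries:
    print(query)
```

Each generated string can then be passed to `executeGremlinQuery` in place of a hand-written query.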

### Create edges

To add an edge between two vertices, you need to specify the source and the target. This can be done by using the `.addE(<Label>)` step on a vertex, and then using the `.to()` or `.from()` step modulator to select either a target or a source vertex. Whichever modulator you use defines the direction of the edge.

An edge in the outgoing direction is commonly created as follows: `g.V(<sourceId>).addE(<relationship>).to(g.V(<target>))`. The example below illustrates this behavior.

In [None]:
result = executeGremlinQuery("g.V('Luis').addE('knows').to(g.V('Andrew'))")
print("Result: {0}\n".format(result))

Let's add another edge, this time selecting the target vertex first and then adding the edge from the source.

In [None]:
result = executeGremlinQuery("g.V('Rimma').addE('knows').from(g.V('Andrew'))")
print("Raw result: {0}\n".format(result))

# Selecting the only result from the list
edge = result[0]

# Printing each property separately
print("Edge ID: {0}\n".format(edge['id']))
print("From: {0}\n".format(edge['outV']))
print("To: {0}".format(edge['inV']))
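
The `outV` and `inV` fields show how `.from()` set the direction: the vertex passed to `.from()` becomes the source even though the traversal started at the target. A small formatting helper can make that direction easier to read. This is only a sketch: `describe_edge` is a hypothetical helper, the sample dictionary is hand-written rather than real server output, and the presence of a `label` key on edges is an assumption.

```python
def describe_edge(edge):
    """Format an edge dictionary as 'source -[label]-> target'."""
    return "{0} -[{1}]-> {2}".format(edge['outV'], edge.get('label', '?'), edge['inV'])


# Hand-written sample for illustration; the 'id' value is a placeholder.
sample_edge = {'id': '<edge-id>', 'label': 'knows', 'outV': 'Andrew', 'inV': 'Rimma'}
print(describe_edge(sample_edge))
```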

### Read vertices

Similarly, to issue a read query you can submit the Gremlin statement and retrieve the result set that it returns. The function that handles the response can then iterate over the results.

In [None]:
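def executeReadQuery(gremlinQuery, params=None):
    # Note: `params` is currently unused.
    try:
        callback = client.submitAsync(gremlinQuery)
        result_set = callback.result()
        if result_set is not None:
            # one() returns the next batch of results as a list.
            return result_set.one()
    except GremlinServerError as ex:
        status = ex.status_attributes['x-ms-status-code']
        print('There was an exception: {0}'.format(status))
        # Fall back to a generic message for status codes without a custom one.
        print(cosmosdb_messages.get(status, 'Unexpected error. See the status code above.'))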

#### Select all vertices
The query below returns all vertices in the collection: `g.V()`

In [None]:
results = executeReadQuery("g.V()")

for result in results:
    print("ID: {0}\tLabel:{1}\n".format(result['id'],result['label']))

#### Select all connections to a given vertex
You can use the `out(<relationship label>)` step to retrieve all vertices connected through edges in the outgoing direction. You can select a single starting vertex by using the `g.V(<id>)` step.

The query below gets all people Luis knows: `g.V('Luis').out('knows')`

In [None]:
results = executeReadQuery("g.V('Luis').out('knows')")

for result in results:
    print("ID: {0}\tLabel:{1}\n".format(result['id'],result['label']))

If the `out()` step is chained more than once, each additional step returns the next level of connections.

In [None]:
results = executeReadQuery("g.V('Luis').out('knows').out('knows')")

for result in results:
    print("ID: {0}\tLabel:{1}\n".format(result['id'],result['label']))
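
To see why chaining `out()` reaches the next level, the traversal can be imitated in plain Python over the two edges created earlier. This is an in-memory illustration only, not how the server evaluates the query; the function name `out` is reused here purely for readability.

```python
# Edges created earlier in this walkthrough, as (source, label, target) tuples.
edges = [('Luis', 'knows', 'Andrew'), ('Andrew', 'knows', 'Rimma')]


def out(vertex_ids, label, edge_list):
    """Return the targets of outgoing edges with the given label, one hop away."""
    return [target
            for vertex in vertex_ids
            for source, edge_label, target in edge_list
            if source == vertex and edge_label == label]


one_hop = out(['Luis'], 'knows', edges)    # imitates g.V('Luis').out('knows')
two_hops = out(one_hop, 'knows', edges)    # imitates a second .out('knows')
print(one_hop)
print(two_hops)
```

Each call takes the vertices produced by the previous step as its starting set, which is the same idea as appending another `out()` step to the Gremlin query.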

Learn more about the Gremlin API:
- [Supported Gremlin steps and driver versions in Azure Cosmos DB Gremlin API](https://docs.microsoft.com/en-us/azure/cosmos-db/gremlin-support)
- [Gremlin Python quick start](https://docs.microsoft.com/en-us/azure/cosmos-db/create-graph-python)
- [Gremlin response headers](https://docs.microsoft.com/en-us/azure/cosmos-db/gremlin-headers)