# Configure the Gremlin server

In [None]:
%%capture output
%%graph_notebook_config
{
    "host": "127.0.0.1",
    "port": 8182,
    "ssl": false
}

# Create Schema
Create the graph schema by adding vertices and edges in Gremlin that mirror the schema of the graph that will be loaded.

## ~metadata Vertex
A special metadata vertex is required when creating a schema. It is added as a vertex with a `T.id` of `~metadata`.

This vertex has the following properties:

| Property            | Description                                                                     | Required | Default | Type    |
|---------------------|---------------------------------------------------------------------------------|----------|---------|---------|
| `replicationFactor` | The replication factor of Aerospike that will be used.                          | Yes      | N/A     | Integer |
| `maxEdgeCacheSize`  | The max number of edges that will be packed into a single Aerospike record.     | No       | 8000    | Integer |
| `edgePackSize`      | The max number of edges that will be packed into an edge record.                | No       | 10      | Integer |
| `vertexLabelSindex` | True if a secondary index on the vertex label will be created, otherwise false. | No       | False   | Boolean |

In general, `maxEdgeCacheSize` and `edgePackSize` should not be changed from their defaults.

An example that sets the `replicationFactor` to 2, the `edgePackSize` to 12, leaves `maxEdgeCacheSize` as 8000, and enables `vertexLabelSindex`:
```
g.addV().property(T.id, "~metadata").
  property("replicationFactor", 2).
  property("edgePackSize", 12).
  property("vertexLabelSindex", true).iterate()
```
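
The way the defaults in the table interact with an explicit `~metadata` statement can be sketched in Python. This is only an illustration: `metadata_query`, `gremlin_literal`, and `DEFAULTS` are hypothetical names, not part of any Aerospike or Gremlin library. The sketch requires `replicationFactor`, emits only the optional properties that differ from their documented defaults, and renders the Gremlin text shown above.

```python
# Hypothetical helper for illustration; not part of any Aerospike or Gremlin library.
# Optional ~metadata properties and their defaults, copied from the table above.
DEFAULTS = {"maxEdgeCacheSize": 8000, "edgePackSize": 10, "vertexLabelSindex": False}


def gremlin_literal(value):
    """Render a Python value as a Gremlin (Groovy) literal."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value + '"'
    return str(value)


def metadata_query(replication_factor, **overrides):
    """Build the Gremlin statement that adds the ~metadata vertex."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown metadata properties: {sorted(unknown)}")
    lines = [
        'g.addV().property(T.id, "~metadata")',
        f'  property("replicationFactor", {replication_factor})',
    ]
    # Emit only the optional properties that differ from their defaults.
    for key, default in DEFAULTS.items():
        if key in overrides and overrides[key] != default:
            lines.append(f'  property("{key}", {gremlin_literal(overrides[key])})')
    return ".\n".join(lines) + ".iterate()"


print(metadata_query(2, edgePackSize=12, vertexLabelSindex=True))
```

Called with the values from the example above, the sketch prints the same four-line statement; `maxEdgeCacheSize` is omitted because it stays at its default of 8000.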

## Special Properties
There are a few special properties that can be added to vertices and edges to signal how they are stored in Aerospike.
These properties are only applicable to vertices and edges that are part of the schema, not the `~metadata` vertex.

| Property                    | Description                                                                     | Required                          | Default | Type    |
|-----------------------------|---------------------------------------------------------------------------------|-----------------------------------|---------|---------|
| `<label>.count`             | The expected number of vertices or edges with this label in the graph.          | Yes                               | N/A     | Integer |
| `<property_key>`            | The value of the `<property_key>` should be the type of the property.           | Yes                               | N/A     | String  |
| `<property_key>.sindexed`   | True if a secondary index on the property will be created.                      | No                                | False   | Boolean |
| `<property_key>.likelihood` | The likelihood that a property will exist, where 1.0 is 100%, 0.0 is 0%.        | No                                | 1.0     | Double  |
| `<property_key>.valueSize`  | The number of bytes the specified type will take up on average.                 | Only for String and byte[] types. | N/A     | Integer |

An example vertex with label `Person` that is expected to occur 1000 times in the graph and has two properties:
- `name` is a String, has a secondary index, has a likelihood of 100% (the default), and is on average 15 characters long.
- `age` is an Integer, has a likelihood of 50%, and is not secondary indexed.

```
person = g.addV("Person").property(T.id, "Person").
  property("Person.count", 1000).
  property("name", "String").
  property("name.sindexed", true).
  property("name.valueSize", 15).
  property("age", "Integer").
  property("age.likelihood", 0.5).
  next()
```

An example edge with label `KNOWS` that is expected to occur 100 times per Person, totalling 100,000 times in the graph, has a property `since` that is a Long, has a likelihood of 100%, and is not secondary indexed:

```
g.addE("KNOWS").from(person).to(person).
  property("KNOWS.count", 100000).
  property("since", "Long").
  next()
```
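
The numbers in the two examples above follow from simple multiplication, sketched below in Python. The variable names are illustrative, and the last two figures only show what `likelihood` and `valueSize` express; the source does not describe how the sizing tool itself combines them.

```python
# Values taken from the Person and KNOWS examples above.
PERSON_COUNT = 1000      # Person.count
KNOWS_PER_PERSON = 100   # average number of KNOWS edges per Person
NAME_LIKELIHOOD = 1.0    # default likelihood, since name.likelihood is not set
NAME_VALUE_SIZE = 15     # name.valueSize
AGE_LIKELIHOOD = 0.5     # age.likelihood

# Total KNOWS edges in the graph, used as KNOWS.count.
knows_count = PERSON_COUNT * KNOWS_PER_PERSON

# Expected number of Person vertices that carry an age property.
persons_with_age = round(PERSON_COUNT * AGE_LIKELIHOOD)

# Raw size of all name values, before any storage overhead (an illustration only).
name_payload = round(PERSON_COUNT * NAME_LIKELIHOOD * NAME_VALUE_SIZE)

print(knows_count, persons_with_age, name_payload)
```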

## A Complete Schema Example
The example below models a dog grooming facility. This high-tech facility has the following vertices:
- Groomer
- Client
- Pet
- Appointment
- Service

With the following edges:
- CLIENT_OF (from Client to Groomer)
- OWNER (from Pet to Client)
- WITH_GROOMER (from Appointment to Groomer)
- WITH_PET (from Appointment to Pet)
- SERVICE (from Appointment to Service)
- PROVIDES_SERVICE (from Groomer to Service)
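
Most of the counts used in the schema cell follow from a handful of ratios stated in that cell's comments. A minimal Python sketch of the arithmetic, which also checks that every edge connects two declared vertex labels, might look like this. The constant names and the tuple layout are assumptions made for illustration, `Appt` is the abbreviated Appointment label used in the cell, and PROVIDES_SERVICE is left out because its count is stated directly in the cell.

```python
# Ratios taken from the comments in the schema cell; the names are illustrative only.
GROOMERS = 10
CLIENTS_PER_GROOMER = 100
PETS_PER_CLIENT = 1.5
SERVICES_PER_GROOMER = 5
SERVICES_PER_APPOINTMENT = 3

vertex_counts = {
    "Groomer": GROOMERS,
    "Client": GROOMERS * CLIENTS_PER_GROOMER,
}
vertex_counts["Pet"] = round(vertex_counts["Client"] * PETS_PER_CLIENT)
vertex_counts["Appt"] = vertex_counts["Pet"]  # one scheduled appointment per pet
vertex_counts["Service"] = GROOMERS * SERVICES_PER_GROOMER

# (edge label, source label, target label, average edges per source vertex)
edges = [
    ("CLIENT_OF", "Client", "Groomer", 1),
    ("OWNER", "Pet", "Client", 1),
    ("WITH_GROOMER", "Appt", "Groomer", 1),
    ("WITH_PET", "Appt", "Pet", 1),
    ("SERVICE", "Appt", "Service", SERVICES_PER_APPOINTMENT),
]

edge_counts = {}
for label, source, target, per_source in edges:
    # Every endpoint must be a vertex label declared in the schema.
    assert source in vertex_counts and target in vertex_counts
    edge_counts[label] = round(vertex_counts[source] * per_source)

print(vertex_counts)
print(edge_counts)
```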

In [None]:
%%gremlin
g.V().drop().iterate()

// Set replicationFactor of 2 and enable the vertexLabelSindex since this application will be querying by vertex label alone.
// maxEdgeCacheSize and edgePackSize are left as their defaults.
g.addV().property(T.id, "~metadata").
    property("replicationFactor", 2).
    property("vertexLabelSindex", true).iterate()

// Groomers have names and phone numbers. In this grooming salon we have 10 individual Groomers.
// The name of the Groomer is secondary indexed so that Groomers can be looked up quickly by name.
groomer = g.addV("Groomer").
    property("Groomer.count", 10).
    property("name", "String").
    property("name.sindexed", true).
    property("name.valueSize", 15).
    property("phone", "String").
    property("phone.valueSize", 12).
    next()

// Clients have names and phone numbers. Each Groomer has about 100 Clients, leading to 1000 Clients in total.
// No secondary indexes are created on client data.
client = g.addV("Client").
    property("Client.count", 1000).
    property("name", "String").
    property("name.valueSize", 15).
    property("phone", "String").
    property("phone.valueSize", 12).
    next()

// Pets have a name and a breed. Each client has on average 1.5 pets, leading to 1500 pets in total.
// No secondary indexes are created on pet data.
pet = g.addV("Pet").
    property("Pet.count", 1500).
    property("name", "String").
    property("name.valueSize", 15).
    property("breed", "String").
    property("breed.valueSize", 10).
    next()

// Every pet has an appointment with a groomer scheduled whenever their last appointment ends, leading to 1500 appointments.
// Each appointment has a date, a time, and an estimatedAppointmentLength, where the date is a String, and the time and estimatedAppointmentLength are Integers.
// No secondary indexes are created on appointment data.
// The label Appt is used to abbreviate Appointment.
appointment = g.addV("Appt").
    property("Appt.count", 1500).
    property("date", "String").
    property("date.valueSize", 10).
    property("time", "Integer").
    property("estimatedAppointmentLength", "Integer").
    next()

// Groomers provide services. Each Groomer provides 5 services, leading to 50 services in total.
// A service has a serviceType.
// No secondary indexes are created on service data.
service = g.addV("Service").
    property("Service.count", 50).
    property("serviceType", "String").
    property("serviceType.valueSize", 25).
    next()

// Note - since the return value of the edges is not used, iterate() is used to terminate each query instead of next().

// Clients are connected to Groomers with a CLIENT_OF edge. Each Client has a CLIENT_OF edge to 1 Groomer.
// Since there are 1000 Clients, there are 1000 CLIENT_OF edges.
// Client edges detail the date the client-groomer relationship was established with a date String.
g.addE("CLIENT_OF").from(client).to(groomer).
    property("CLIENT_OF.count", 1000).
    property("date", "String").
    property("date.valueSize", 10).
    iterate()

// Pets are connected to Clients with an OWNER edge. Each Pet has an OWNER edge to 1 Client.
// Since there are 1500 Pets, there are 1500 OWNER edges.
g.addE("OWNER").from(pet).to(client).
    property("OWNER.count", 1500).
    iterate()

// Appointments are connected to Groomers with a WITH_GROOMER edge. Each Appointment has a WITH_GROOMER edge to 1 Groomer.
// Since there are 1500 Appointments, there are 1500 WITH_GROOMER edges.
g.addE("WITH_GROOMER").from(appointment).to(groomer).
    property("WITH_GROOMER.count", 1500).
    iterate()

// Appointments are connected to Pets with a WITH_PET edge. Each Appointment has a WITH_PET edge to 1 Pet.
// Since there are 1500 Appointments, there are 1500 WITH_PET edges.
g.addE("WITH_PET").from(appointment).to(pet).
    property("WITH_PET.count", 1500).
    iterate()

// Appointments are connected to Services with a SERVICE edge. Each Appointment has SERVICE edges to many Services.
// On average, 3 services are done per Appointment, leading to 4500 SERVICE edges.
g.addE("SERVICE").from(appointment).to(service).
    property("SERVICE.count", 4500).
    iterate()

// Groomers are connected to Services with a PROVIDES_SERVICE edge. Each Groomer has a PROVIDES_SERVICE edge to many Services.
// Each Groomer provides 5 services, leading to 50 PROVIDES_SERVICE edges.
g.addE("PROVIDES_SERVICE").from(groomer).to(service).
    property("PROVIDES_SERVICE.count", 50).
    iterate()

return "Success"

# Visualize the Schema
The following query visualizes the schema. After executing the query, select the Graph tab in the result.

From here you can select the details button on the upper right side of the visualization and then select any graph element to get more details.

In [None]:
%%gremlin --edge-label-max-length 30 --label-max-length 30 -p v,oute,inv
g.V().emit(__.or(__.and(
    __.out().count().is(P.eq(0)),
    __.in().count().is(P.eq(0))),loops().is(P.eq(1)))).
repeat(outE().inV()).
path().by(elementMap())

# Size the Graph
If the schema is correct and everything is ready, run the cell below to get details for sizing your Aerospike cluster for the graph.

In [None]:
%%gremlin
g.call("sizing-tool").next()