# Importing Data

To follow an introduction to importing data from CSV files into a Neo4j database, run `:play http://127.0.0.1:8001/social_network.html` in the Neo4j Browser (the web client of the database) in your VM.

## Creating and Using Indexes

> A database index is a redundant copy of information in the database for the purpose of making retrieving said data more efficient. This comes at the cost of additional storage space and slower writes, so deciding what to index and what not to index is an important and often non-trivial task.
>
> Cypher allows the creation of indexes over a property for all nodes that have a given label. Once an index has been created, it will automatically be managed and kept up to date by the database whenever the graph is changed. Neo4j will automatically pick up and start using the index once it has been created and brought online. http://neo4j.com/docs/developer-manual/current/cypher/schema/index/

You can create an index on a property of all nodes with a given label as follows:

```cypher
CREATE INDEX ON :Person(name);
```

Usually you do not need to specify which indexes to use in a query: when suitable indexes exist, Neo4j uses them for predicates in `WHERE` clauses such as equality, range comparisons (`<`, `>`, etc.), `IN`, `STARTS WITH`, and `exists()`. If needed, you can force the use of a specific index with a `USING INDEX` hint.
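The trade-off described above (a redundant copy that speeds up reads but must be maintained on every write) can be illustrated with a small in-memory sketch. The `PersonNameIndex` class below is hypothetical and only an analogy, not Neo4j's actual index implementation: it keeps a sorted map from `name` values to node ids, so equality and `STARTS WITH` lookups avoid scanning every node, while every write must also update the map.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical analogy for an index on :Person(name), NOT Neo4j's implementation:
 * a sorted map from name to the ids of the nodes carrying that name.
 */
public class PersonNameIndex {
    private final TreeMap<String, Set<Long>> index = new TreeMap<>();

    /** Must run on every write: the extra cost an index adds to writes. */
    public void add(long nodeId, String name) {
        index.computeIfAbsent(name, k -> new TreeSet<>()).add(nodeId);
    }

    /** Also needed on writes, e.g. when a node is deleted or renamed. */
    public void remove(long nodeId, String name) {
        Set<Long> ids = index.get(name);
        if (ids != null) {
            ids.remove(nodeId);
            if (ids.isEmpty()) {
                index.remove(name);
            }
        }
    }

    /** Like WHERE p.name = value: one lookup instead of a scan. */
    public Set<Long> equalTo(String value) {
        return index.getOrDefault(value, Collections.emptySet());
    }

    /** Like WHERE p.name STARTS WITH prefix: matching keys are contiguous in sorted order. */
    public Set<Long> startsWith(String prefix) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<String, Set<Long>> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // first non-matching key: no later key can match
            }
            result.addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        PersonNameIndex idx = new PersonNameIndex();
        idx.add(1, "Sol Linkert");
        idx.add(2, "Sol Smith");
        idx.add(3, "Ann Lee");
        System.out.println(idx.equalTo("Ann Lee"));   // [3]
        System.out.println(idx.startsWith("Sol"));    // [1, 2]
    }
}
```

A sorted structure is chosen here because it supports equality, range, and prefix lookups alike; a plain hash map would only help with equality.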


In case you want to get rid of an index, run:

```cypher
DROP INDEX ON :Person(name);
```



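# Creating and Using Constraints

A uniqueness constraint guarantees that no two nodes with a given label share the same value for a property. Creating a uniqueness constraint also creates an index on that property, so no separate `CREATE INDEX` is needed. This matters for the import below: the relationship import looks up each `Person` by `id`, which is only fast if `:Person(id)` is indexed.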

```cypher
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;
```
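Conceptually, a uniqueness constraint makes the database reject any write that would give two `:Person` nodes the same `id`. The hypothetical `UniqueIdConstraint` class below sketches that check in plain Java; it is an illustration of the rule, not how Neo4j enforces constraints internally.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a uniqueness constraint on :Person(id):
 * every write is checked against the existing ids and rejected on a clash.
 */
public class UniqueIdConstraint {
    // id -> name of the :Person node that owns it (plays the role of the backing index)
    private final Map<Long, String> personsById = new HashMap<>();

    /** Adds a person, or throws if the id is already taken. */
    public void createPerson(long id, String name) {
        if (personsById.containsKey(id)) {
            throw new IllegalStateException(
                    "Person with id " + id + " already exists: " + personsById.get(id));
        }
        personsById.put(id, name);
    }

    public int size() {
        return personsById.size();
    }

    public static void main(String[] args) {
        UniqueIdConstraint persons = new UniqueIdConstraint();
        persons.createPerson(0, "Sol Linkert");
        try {
            persons.createPerson(0, "Someone Else"); // violates the constraint
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println(persons.size() + " person(s) stored"); // 1 person(s) stored
    }
}
```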


# Importing Data from a CSV File in Cypher


In this example, we are going to import our data from CSV files via Cypher.


## `LOAD CSV`


`LOAD CSV` is used to import data from CSV files. The URL of the CSV file is specified with `FROM`, followed by an arbitrary expression that evaluates to the URL in question.
You must also bind the CSV data to a variable with `AS`.

`LOAD CSV` supports gzip- and Deflate-compressed resources as well as ZIP archives.
CSV files can be stored on the database server and are then accessible via a `file:///` URL, which Neo4j resolves relative to its import directory (on the VM, `/var/lib/neo4j/import`). Alternatively, `LOAD CSV` can also access CSV files via HTTPS, HTTP, and FTP.

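```cypher
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///social_network_nodes.csv" AS row
// MERGE on the unique key only, then set the remaining properties on creation
MERGE (p:Person {id: toInt(row.node_id)})
ON CREATE SET p.name = row.name, p.job = row.job, p.birthday = row.birthday;
```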


```cypher
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:///social_network_edges.csv" AS row
// Both lookups by id use the index backing the :Person(id) uniqueness constraint
MATCH (f:Person {id: toInt(row.source_node_id)}), (t:Person {id: toInt(row.target_node_id)})
CREATE (f)-[:ENDORSES]->(t);
```
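With `WITH HEADERS`, the first line of the file provides the keys, and each following line is bound to `row` as a map from column name to value. The values arrive as strings, which is why both queries convert ids with `toInt`. The hypothetical `CsvWithHeaders` class below mimics that binding in plain Java; to keep it short it splits naively on commas and does NOT handle quoted fields, so it sketches the semantics rather than being a real CSV parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of LOAD CSV WITH HEADERS ... AS row.
 * Simplification: fields are split on commas, so quoted fields
 * containing commas or line breaks are NOT supported.
 */
public class CsvWithHeaders {

    public static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.split("\\R");            // \R matches any line break
        String[] headers = lines[0].split(",", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue;                              // skip blank lines
            }
            String[] fields = lines[i].split(",", -1); // -1 keeps trailing empty fields
            Map<String, String> row = new LinkedHashMap<>();
            for (int j = 0; j < headers.length; j++) {
                // every value is a String (or null if the column is missing)
                row.put(headers[j], j < fields.length ? fields[j] : null);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data using the column names from the queries above
        String csv = "node_id,name,job,birthday\n"
                   + "0,Sol Linkert,Engineer,1980-01-01\n";
        for (Map<String, String> row : parse(csv)) {
            // row.get("node_id") is the string "0", like row.node_id in Cypher;
            // Integer.parseInt plays the role of toInt(row.node_id)
            int id = Integer.parseInt(row.get("node_id"));
            System.out.println(id + " " + row.get("name")); // 0 Sol Linkert
        }
    }
}
```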

If the CSV file contains a large number of rows (e.g., hundreds of thousands or millions), `USING PERIODIC COMMIT` instructs Neo4j to commit after every N rows, which reduces the memory needed for the transaction state. By default, a commit happens every 1000 rows; `USING PERIODIC COMMIT 5000`, as in the relationship import above, commits every 5000 rows instead. For your exercises, rely on `USING PERIODIC COMMIT`, as the VM does not provide much memory.
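The effect of periodic commits on memory can be sketched as batching: pending rows are flushed every `batchSize` rows, so at most one batch of transaction state is held at a time. The `PeriodicCommit` class and its `commit` callback are hypothetical stand-ins for Neo4j's transaction handling, not a Neo4j API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of USING PERIODIC COMMIT: rows are buffered and
 * handed to a commit callback every batchSize rows.
 */
public class PeriodicCommit {

    /** Returns the number of commits performed. */
    public static <T> int importInBatches(Iterable<T> rows, int batchSize,
                                          Consumer<List<T>> commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> pending = new ArrayList<>(batchSize); // the "transaction state"
        int commits = 0;
        for (T row : rows) {
            pending.add(row);
            if (pending.size() == batchSize) {        // batch full: commit, start over
                commit.accept(new ArrayList<>(pending));
                pending.clear();
                commits++;
            }
        }
        if (!pending.isEmpty()) {                     // commit the final partial batch
            commit.accept(new ArrayList<>(pending));
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        // With 1000 rows per commit (the default above), 2500 rows need 3 commits
        int commits = importInBatches(rows, 1000,
                batch -> System.out.println("committing " + batch.size() + " rows"));
        System.out.println(commits + " commits");
    }
}
```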

### `MERGE`


The `MERGE` clause ensures that a pattern exists in the graph. Either the pattern already exists, or it needs to be created.

`MERGE` either matches existing nodes and binds them, or it creates new data and binds that. It is like a combination of `MATCH` and `CREATE` that additionally allows you to specify what happens if the data was matched or created.

For example, you can specify that the graph must contain a node for a user with a certain name. If no node with that name exists, a new node is created and its name property is set.
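The get-or-create behaviour of `MERGE`, together with its `ON CREATE` / `ON MATCH` sub-clauses (for instance `MERGE (p:Person {name: "Sol Linkert"}) ON CREATE SET ... ON MATCH SET ...`), can be mimicked with an in-memory map keyed by name. The `MergeSketch` class and its `merge` method are hypothetical; they illustrate the semantics only and are not how Neo4j implements `MERGE`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of MERGE (:Person {name: ...}) ON CREATE ... ON MATCH ...,
 * using a map keyed by name as a stand-in for the graph.
 */
public class MergeSketch {

    static class Person {
        final String name;
        final Map<String, Object> props = new HashMap<>();

        Person(String name) {
            this.name = name;
        }
    }

    private final Map<String, Person> personsByName = new HashMap<>();

    /** Returns the existing node with this name, or creates and returns a new one. */
    public Person merge(String name, Consumer<Person> onCreate, Consumer<Person> onMatch) {
        Person existing = personsByName.get(name);
        if (existing != null) {            // pattern exists: bind it (like MATCH)
            onMatch.accept(existing);
            return existing;
        }
        Person created = new Person(name); // pattern missing: create it (like CREATE)
        personsByName.put(name, created);
        onCreate.accept(created);
        return created;
    }

    public int size() {
        return personsByName.size();
    }

    public static void main(String[] args) {
        MergeSketch graph = new MergeSketch();
        Person a = graph.merge("Sol Linkert",
                p -> p.props.put("created", true), p -> p.props.put("matched", true));
        Person b = graph.merge("Sol Linkert",
                p -> p.props.put("created", true), p -> p.props.put("matched", true));
        System.out.println(a == b);        // true: the second merge matched
        System.out.println(graph.size());  // 1
    }
}
```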






## Importing Data from the Command-line

Alternatively, you can import data directly from the command line with the `neo4j-admin import` tool (called `neo4j-import` in older Neo4j versions).

This tool creates a new database from a collection of CSV files, which is what you would typically do for an initial data import.

However, this will not work on the VM, as it does not have sufficient memory; on a real database server it will.

```bash
cd /var/lib/neo4j

sudo neo4j-admin import --mode csv --database imported.db --nodes:Person=/var/lib/neo4j/import/social_network_nodes.csv --relationships:ENDORSES=/var/lib/neo4j/import/social_network_edges.csv --multiline-fields=true
```

Next, edit `/etc/neo4j/neo4j.conf` and change
```bash
# The name of the database to mount
#dbms.active_database=graph.db
```
into
```bash
# The name of the database to mount
dbms.active_database=imported.db
```


After restarting the database with

```bash
sudo neo4j restart
```

running the following query, which counts the nodes and relationships in the imported example data,

```cypher
MATCH ()
WITH count(*) AS count
RETURN "nodes" AS type, count
UNION
MATCH ()-[]->()
WITH count(*) AS count
RETURN "relationships" AS type, count
```

should show the following output:

```
╒═══════════════╤══════════╕
│"type"         │"count"   │
╞═══════════════╪══════════╡
│"nodes"        │"500000"  │
├───────────────┼──────────┤
│"relationships"│"11205208"│
└───────────────┴──────────┘
```

# Accessing Neo4j from Java

The following is a condensed tutorial on how to execute Cypher queries against a Neo4j database from Java code. It assumes that you use Maven to manage your dependencies.


  * Create a Maven project. In NetBeans `New Project -> Maven -> Java Application`
  * Add a dependency to the Neo4J driver to your project configuration (`pom.xml`) 
  
```xml
    <dependencies>
        <dependency>
            <groupId>org.neo4j.driver</groupId>
            <artifactId>neo4j-java-driver</artifactId>
            <version>1.1.0</version>
        </dependency>
        
        <!-- ...any other dependencies -->
    </dependencies>
```

  * Create a Java class `ConnectionTest.java` and enter the following code:
  
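```java
package dk.cphbusiness.db.neo4j.intro;

import org.neo4j.driver.v1.*;
// Explicit import: avoids a clash with java.lang.Record on Java 16+
import org.neo4j.driver.v1.Record;

/**
 * Connects to the local Neo4j server over Bolt and prints the name of every node.
 *
 * @author Helge
 */
public class ConnectionTest {

    public static void main(String[] args) {
        // Driver and Session are AutoCloseable, so try-with-resources
        // closes them even if the query fails
        try (Driver driver = GraphDatabase.driver(
                     "bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "class"));
             Session session = driver.session()) {

            // Run a query matching all nodes (note the space before RETURN)
            StatementResult result = session.run(
                    "MATCH (s) " +
                    "RETURN s.name AS name, s.job AS job");

            // Iterate over the result records one by one
            while (result.hasNext()) {
                Record record = result.next();
                System.out.println(record.get("name").asString());
            }
        }
    }
}
```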

  * This program should print the `name` property of each node in your Neo4j database.

This example is based on: https://neo4j.com/developer/java/#_the_example_project

# The Neo4j Java API

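Alternatively, if you do not want to rely on Cypher queries to communicate with your Neo4j database, you can use the Neo4j Java API (http://neo4j.com/docs/java-reference/current/javadocs/) directly. Note that this API works on an embedded database or inside server extensions running in the same JVM as Neo4j, not over a Bolt connection.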


# Spring Data Neo4j, an Object-Graph Mapper

The following is based on chapter nine of *Neo4j in Action*.

Until now we have been working directly with the core Neo4j graph primitives—nodes and relationships—to represent and interact with (that is, read and persist) various domain model concepts.

Though that approach is extremely powerful and flexible, operating with the low-level Neo4j APIs can sometimes be quite verbose and result in a lot of boilerplate code, especially when it comes to working with domain model entities.

In a nutshell, *Spring Data Neo4j* (SDN) is an *object-graph mapping* (OGM) framework that was created to make life easier for (currently only Java) developers who need, or would prefer, to work with a POJO-based domain model, where some or all of the data is stored in Neo4j.

For example, consider the following plain Java (POJO) domain model of users, movies, and movie viewings:

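```java
import java.util.Set;

// Plain domain classes; each public class would normally live in its own file
public class User {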
    String userId;
    String name;
    Set<User> friends;
    Set<Viewing> views;
    User referredBy;
}
public class Movie {
    String title;
    Set<Viewing> views;
}
public class Viewing {
    User user;
    Movie movie;
    Integer stars;
}
```


SDN is an annotation-based object-graph mapping library. This means it is a library that relies on being able to recognize certain SDN-specific annotations attached to parts of your code. These annotations provide instructions about how to transform the associated code to the underlying structures in the graph.

Sometimes you may even find that you do not need to annotate certain pieces of code. This is because SDN tries to infer some sensible defaults, applying the principle of convention over configuration. OGM is to graphs what ORM is to an RDBMS.


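```java
// Spring Data Neo4j 2.x/3.x annotations, as used in Neo4j in Action
import org.springframework.data.neo4j.annotation.*;
import org.neo4j.graphdb.Direction;
import java.util.Set;

@NodeEntity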
public class User {
    String name;
    
    @Indexed(unique=true)
    String userId;
    
    @GraphId
    Long nodeId;
    User referredBy;

    @RelatedTo(type = "IS_FRIEND_OF", direction = Direction.BOTH)
    Set<User> friends;

    @RelatedToVia
    Set<Viewing> views;
}

@NodeEntity
public class Movie {
    String title;
    
    @GraphId
    Long nodeId;
    
    @RelatedToVia(direction = Direction.INCOMING)
    Iterable<Viewing> views;
}

@RelationshipEntity(type = "HAS_SEEN")
public class Viewing {
    Integer stars;
    
    @GraphId
    Long relationshipId;
    
    @StartNode
    User user;
    
    @EndNode
    Movie movie;
}
```

See also: https://neo4j.com/developer/spring-data-neo4j/

# Getting Practical!

Below is the PageRank algorithm as described in *Artificial Intelligence: A Modern Approach*, third edition, by Stuart J. Russell and Peter Norvig.

![PageRank algorithm description](./images/pr_descr.png)
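As a starting point, the iterative computation can be sketched in plain Java on an in-memory edge list. Since the algorithm description above is only available as an image, this sketch *assumes* the widely used damping-factor formulation `PR(p) = (1 - d)/N + d * sum(PR(q)/C(q))` over all nodes `q` with an edge to `p`, where `C(q)` is the outdegree of `q`, `N` the number of nodes, and `d` the damping factor (0.85 in the example, also an assumption); compare it with the description before relying on it. The class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical iterative PageRank sketch, ASSUMING
 *   PR(p) = (1 - d) / N + d * sum over edges q -> p of PR(q) / C(q),
 * where C(q) is the outdegree of q. Nodes are numbered 0..n-1.
 */
public class PageRankSketch {

    public static double[] pageRank(int n, int[][] edges, double d, int iterations) {
        int[] outdeg = new int[n];
        for (int[] e : edges) {
            outdeg[e[0]]++;                      // C(q): count outgoing edges
        }

        double[] pr = new double[n];
        Arrays.fill(pr, 1.0);                    // same start value as SET n.pr = 1.0

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - d) / n);
            for (int[] e : edges) {              // e = {source, target}
                next[e[1]] += d * pr[e[0]] / outdeg[e[0]];
            }
            pr = next;                           // synchronous update
        }
        return pr;
    }

    public static void main(String[] args) {
        // Hypothetical 3-node cycle 0 -> 1 -> 2 -> 0: each value approaches 1/3
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}};
        System.out.println(Arrays.toString(pageRank(3, edges, 0.85, 100)));
    }
}
```

Every iteration computes all new ranks from the previous iteration's values. Nodes without outgoing edges pass on no rank in this sketch; it does not redistribute it.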


## Preparing the Graph

To compute PageRank inside the database, first add a PageRank property `pr` to each node, initialized to 1.0.

```cypher
MATCH (n)
SET n.pr = 1.0
// Return only a count: returning every node would flood the browser
RETURN count(n) AS updated
```

Furthermore, Cypher has no dedicated function for a node's outdegree, so we count its outgoing `ENDORSES` relationships with a query:

```cypher
MATCH (m)-[:ENDORSES]->(n)
WHERE m.name = "Sol Linkert" 
RETURN count(*) as outdeg
```
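The count above is a tally over outgoing edges. The sketch below mirrors it on a hypothetical in-memory list of `ENDORSES` edges given as `{source, target}` name pairs, and also computes all outdegrees in one pass, which is what a PageRank implementation needs (in Cypher, roughly `MATCH (m)-[:ENDORSES]->() RETURN m.id, count(*)`). The class, methods, and sample names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: outdegrees over an in-memory list of ENDORSES edges,
 * each edge given as {sourceName, targetName}.
 */
public class OutDegree {

    /** Outdegree of one person, like the Cypher query for "Sol Linkert". */
    public static long outDegree(List<String[]> endorses, String name) {
        // like count(*) over the matching edges: 0 if the person endorses nobody
        return endorses.stream().filter(e -> e[0].equals(name)).count();
    }

    /** Outdegrees of all people with at least one outgoing edge, in one pass. */
    public static Map<String, Integer> allOutDegrees(List<String[]> endorses) {
        Map<String, Integer> degrees = new HashMap<>();
        for (String[] edge : endorses) {
            degrees.merge(edge[0], 1, Integer::sum); // +1 per outgoing edge
        }
        return degrees;
    }

    public static void main(String[] args) {
        // Hypothetical sample edges
        List<String[]> edges = Arrays.asList(
                new String[]{"Sol Linkert", "Ann Lee"},
                new String[]{"Sol Linkert", "Bo Dahl"},
                new String[]{"Ann Lee", "Sol Linkert"});
        System.out.println(outDegree(edges, "Sol Linkert"));                 // 2
        System.out.println(allOutDegrees(edges).get("Ann Lee"));             // 1
        System.out.println(allOutDegrees(edges).getOrDefault("Bo Dahl", 0)); // 0
    }
}
```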