Task plan - Stakeholders' map

1. Creating the data types
2. Creating the CSV tables
3. Importing into Neo4j as nodes and relationships
4. Designing the queries
5. Querying the database from a Jupyter notebook


GitHub link: https://github.com/MaximeCapron/foodforneo4j

# 5. Querying the database

In [120]:
!pip install py2neo
# If you see warnings or errors when you run this command, re-run the command. It should run with no errors.



In [121]:
import numpy as np
import pandas as pd
from py2neo import Graph, Database
from py2neo.data import Node, Relationship

In [122]:
graph = Graph("bolt://3.90.65.37:34875", auth=("neo4j", "volumes-capes-retrievals"))

## Initialize the graph

In [None]:
def initialize():
    # A single graph.run() call executes one Cypher statement, so the import script
    # is kept as a list of statements run in order. (Splitting one big string on ";"
    # would not work either: it would cut the FIELDTERMINATOR ';' clauses in half.)
    statements = [
        # Delete everything that already exists to start from a clean slate
        """
        MATCH (n)
        DETACH DELETE n
        """,

        # Import the nodes
        """
        LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/MaximeCapron/foodforneo4j/master/fichiers%20csv/Humains.csv' AS row FIELDTERMINATOR ';'
        MERGE (p:Person {id: row.index})
        ON CREATE SET p.nom = row.Nom, p.position_A = row.Position_A, p.position_B = row.Position_B, p.position_C = row.Position_C, p.position_D = row.Position_D, p.position_E = row.Position_E
        """,
        """
        LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/MaximeCapron/foodforneo4j/master/fichiers%20csv/Entreprises.csv' AS row FIELDTERMINATOR ';'
        MERGE (c:Company {id: row.index})
        ON CREATE SET c.titre = row.Titre
        """,

        # Import the relationships
        """
        LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/MaximeCapron/foodforneo4j/master/fichiers%20csv/Relations1.csv' AS rel FIELDTERMINATOR ';'
        MATCH (p1 {id: rel.index})
        MATCH (p2 {id: rel.index_relation})
        MERGE (p1)-[r:CONNAISSANCE]->(p2)
        ON CREATE SET r.contexte = rel.contexte
        """,
        """
        LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/MaximeCapron/foodforneo4j/master/fichiers%20csv/Relations2.csv' AS rel FIELDTERMINATOR ';'
        MATCH (p1 {id: rel.index})
        MATCH (p2 {id: rel.index_relation})
        MERGE (p1)-[r:SUPERIEUR]->(p2)
        """,
        """
        LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/MaximeCapron/foodforneo4j/master/fichiers%20csv/Relations3.csv' AS rel FIELDTERMINATOR ';'
        MATCH (c1 {id: rel.index})
        MATCH (c2 {id: rel.index_relation})
        MERGE (c1)-[r:FILIALE]->(c2)
        """,
        """
        LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/MaximeCapron/foodforneo4j/master/fichiers%20csv/Relations4.csv' AS rel FIELDTERMINATOR ';'
        MATCH (p {id: rel.index})
        MATCH (c {id: rel.index_relation})
        MERGE (p)-[r:EMPLOYE_DANS]->(c)
        ON CREATE SET r.position = rel.position
        """,
    ]

    for statement in statements:
        graph.run(statement)
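If you would rather keep the whole import script in a single string, it has to be cut into individual statements before being sent to `graph.run`. A naive `script.split(";")` would also cut the `FIELDTERMINATOR ';'` literals in half. The sketch below is a hypothetical helper, `split_cypher`, that ignores semicolons inside quoted strings and `//` line comments. It is a minimal tokenizer under those assumptions, not a full Cypher parser (it does not handle `/* */` block comments, for instance).

```python
def split_cypher(script):
    """Split a Cypher script on ';' separators (hypothetical helper).

    Semicolons inside '...', "..." or `...` literals are kept, and // line
    comments are dropped. Empty statements are discarded.
    """
    statements = []
    current = []
    quote = None  # quote character of the literal we are currently in, if any
    i = 0
    n = len(script)
    while i < n:
        ch = script[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep an escaped character (e.g. \') without closing the literal
                current.append(script[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif script.startswith("//", i):
            # Skip a line comment up to (but not including) the newline
            end = script.find("\n", i)
            i = n if end == -1 else end
            continue
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    statements.append("".join(current).strip())
    return [s for s in statements if s]
```

Each resulting statement can then be run on its own, e.g. `for statement in split_cypher(script): graph.run(statement)`.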

## Add or delete a node

Two options:
- if you want to make substantial changes, the best option is probably to edit the CSV tables (adding, modifying or deleting rows) and then call `initialize()` again.
- if you simply want to add or delete a node based on its properties, the functions below do the job.
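For the first option, rows can be appended to a local copy of the CSV tables with Python's `csv` module. The sketch below assumes a file laid out like `Humains.csv` as read by `initialize()`: a header row, `;` as the separator, and columns such as `index`, `Nom` and `Position_A` to `Position_E` (the names used in the `LOAD CSV` statement). `append_person_row` is a hypothetical helper; since `initialize()` reads the files from GitHub, the modified file has to be pushed to the repository before re-running it.

```python
import csv


def append_person_row(path, values):
    """Append one row to a ';'-separated CSV file such as Humains.csv.

    `values` maps column names (e.g. "index", "Nom", "Position_A") to strings;
    columns that are not given are left empty. Unknown columns are rejected.
    """
    # utf-8-sig also strips a byte-order mark, which would corrupt the first column name
    with open(path, encoding="utf-8-sig", newline="") as f:
        content = f.read()
    header = next(csv.reader(content.splitlines(), delimiter=";"), None)
    if header is None:
        raise ValueError("the file has no header row")
    unknown = set(values) - set(header)
    if unknown:
        raise ValueError("unknown columns: " + ", ".join(sorted(unknown)))
    with open(path, "a", encoding="utf-8", newline="") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # make sure the new row starts on its own line
        writer = csv.DictWriter(f, fieldnames=header, delimiter=";", lineterminator="\n")
        writer.writerow(values)
```

Reading the header from the file itself keeps the new row in the same column order as the existing ones, whatever that order is.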

Suppose you want to add a new character. His name is Bob, we give him the identifier 5 (arbitrarily), and he is a baker.

In [135]:
# Same property names as the imported data (nom, id), so the queries below can find these nodes
Bob = Node("Person", nom="Bob", emploi="Boulanger", id=5)
Jean = Node("Person", nom="Jean", emploi="Pâtissier", id=6)

In [146]:
graph.create(Bob)

Let's delete Bob.

In [150]:
graph.delete(Bob)

## Add or delete a relationship

Suppose we have two nodes, Bob and Joe, and we want to create a relationship between them. Let's specify one property for each of them and the type of the relationship, and off we go!

In [152]:
Bob = Node("Person", nom="Bob", emploi="Boulanger", id=5)
Joe = Node("Person", nom="Joe", emploi="Pâtissier", id=7)

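In [153]:
# Keep a reference to the relationship so that the same object can be deleted later
ami = Relationship(Bob, "AMI", Joe)
graph.create(ami)

In [154]:
# separate() deletes only the relationship; graph.delete(ami) would also delete Bob and Joe
graph.separate(ami)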

Using py2neo's own tools can be handy for a few simple tasks, but its documentation is rather sparse and few features are available. It can be preferable to write your own functions, as in the following cells.
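When building queries by hand, property values can be passed as query parameters (`$nom`), which avoids quoting problems such as a name containing an apostrophe. Property keys, labels and relationship types cannot be parameters in a classic Cypher pattern, so they still have to be pasted into the query and are worth validating. A minimal sketch of such a helper, `property_map` (hypothetical name), which only builds the query fragment and the parameter dictionary:

```python
import re

# Property keys are pasted into the query text, so only plain identifiers are accepted
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def property_map(props, prefix=""):
    """Return (fragment, params) for a Cypher property map.

    property_map({"nom": "Bob"}, "p1_") gives ("{nom: $p1_nom}", {"p1_nom": "Bob"}).
    The prefix keeps parameter names distinct when two maps share a key.
    """
    pairs = []
    params = {}
    for key, value in props.items():
        if not _IDENTIFIER.fullmatch(key):
            raise ValueError("invalid property key: " + repr(key))
        if not isinstance(value, (str, int, float)):
            raise TypeError("unsupported type for property " + key)
        params[prefix + key] = value
        pairs.append(key + ": $" + prefix + key)
    return "{" + ", ".join(pairs) + "}", params
```

With py2neo, the fragment and the parameters would then be used together, e.g. `frag, params = property_map(dic)` followed by `graph.run("MERGE (p:Person " + frag + ")", parameters=params)`.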

### Alternative approach: writing Cypher directly

#### Create Bob

In [155]:
dic = {"id": 5, "nom": "Bob", "emploi": "Boulanger"}

In [156]:
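def add_person(dic):
    # dic maps property names to values, e.g. {"id": 5, "nom": "Bob"}.
    # Values are sent as query parameters instead of being pasted into the query,
    # so names containing an apostrophe do not break it.
    for key, value in dic.items():
        if not isinstance(value, (str, int, float)):
            raise TypeError("unknown type for property " + key)
    props = ", ".join(key + ": $" + key for key in dic)
    code = "MERGE (p:Person {" + props + "})"
    graph.run(code, parameters=dic)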

In [158]:
add_person(dic)

#### Delete Bob

In [159]:
char = {"nom":"Bob"}

In [160]:
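def suppr_person(char):
    # char maps property names to values identifying the person, e.g. {"nom": "Bob"};
    # values are sent as query parameters rather than pasted into the query
    for key, value in char.items():
        if not isinstance(value, (str, int, float)):
            raise TypeError("unknown type for property " + key)
    props = ", ".join(key + ": $" + key for key in char)
    # DETACH DELETE also removes the person's relationships
    code = "MATCH (p:Person {" + props + "}) DETACH DELETE p"
    graph.run(code, parameters=char)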

In [161]:
suppr_person(char)

#### Create a relationship

In [162]:
n1 = {"nom":"Bob"}
n2 = {"nom":"Joe"}
rel = "AMI"
dir_rel = 2

NB: dir_rel sets the arrowhead of the relationship. dir_rel can therefore be 0, 1 or 2, depending on whether the relationship has no particular direction, points towards n1, or points towards n2.
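The mapping from dir_rel to an arrow can be isolated in a small pure function, which makes it easy to check without a database. `relation_pattern` is a hypothetical helper sketching this. Note that Neo4j stores every relationship with a direction: with dir_rel = 0, MERGE matches an existing relationship in either direction, but if it has to create one, that relationship is still directed.

```python
import re

# Arrow pieces indexed by dir_rel: 0 = no particular direction, 1 = towards n1, 2 = towards n2
_ARROWS = {0: ("-", "-"), 1: ("<-", "-"), 2: ("-", "->")}


def relation_pattern(rel, dir_rel):
    """Return the Cypher pattern linking p1 and p2 with a relationship of type rel."""
    if dir_rel not in _ARROWS:
        raise ValueError("dir_rel must be 0, 1 or 2")
    # Relationship types cannot be query parameters, so restrict them to plain identifiers
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rel):
        raise ValueError("invalid relationship type: " + repr(rel))
    left, right = _ARROWS[dir_rel]
    return "(p1)" + left + "[r:" + rel + "]" + right + "(p2)"
```

For example, `"MERGE " + relation_pattern("AMI", 2)` gives the clause used to create Bob's friendship with Joe.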

In [163]:
def add_relation(n1,n2,rel,dir_rel):
    # n1, n2: property maps identifying the two persons, e.g. {"nom": "Bob"}
    # rel: relationship type; dir_rel: 0 = no direction, 1 = towards n1, 2 = towards n2
    patterns = {0: "-[r:{}]-", 1: "<-[r:{}]-", 2: "-[r:{}]->"}
    if dir_rel not in patterns:
        raise ValueError("wrong dir_rel: expected 0, 1 or 2")
    params = {}

    def props(node, prefix):
        # Turn {"nom": "Bob"} into "nom: $p1_nom" and store the value as a query parameter;
        # the prefix keeps the parameters of n1 and n2 apart when they share a key
        pairs = []
        for key, value in node.items():
            if not isinstance(value, (str, int, float)):
                raise TypeError("unknown type for property " + key)
            params[prefix + key] = value
            pairs.append(key + ": $" + prefix + key)
        return ", ".join(pairs)

    code = ("MATCH (p1:Person {" + props(n1, "p1_") + "}) "
            + "MATCH (p2:Person {" + props(n2, "p2_") + "}) "
            + "MERGE (p1)" + patterns[dir_rel].format(rel) + "(p2)")
    graph.run(code, parameters=params)

In [164]:
add_relation(n1,n2,rel,dir_rel)

#### Delete this relationship

In [165]:
def suppr_relation(n1,n2,rel,dir_rel):
    # Same arguments as add_relation; only the relationship is deleted, not the persons
    patterns = {0: "-[r:{}]-", 1: "<-[r:{}]-", 2: "-[r:{}]->"}
    if dir_rel not in patterns:
        raise ValueError("wrong dir_rel: expected 0, 1 or 2")
    params = {}

    def props(node, prefix):
        # Turn {"nom": "Bob"} into "nom: $p1_nom" and store the value as a query parameter
        pairs = []
        for key, value in node.items():
            if not isinstance(value, (str, int, float)):
                raise TypeError("unknown type for property " + key)
            params[prefix + key] = value
            pairs.append(key + ": $" + prefix + key)
        return ", ".join(pairs)

    code = ("MATCH (p1:Person {" + props(n1, "p1_") + "}) "
            + "MATCH (p2:Person {" + props(n2, "p2_") + "}) "
            + "MATCH (p1)" + patterns[dir_rel].format(rel) + "(p2) "
            + "DELETE r")
    graph.run(code, parameters=params)


In [166]:
suppr_relation(n1,n2,rel,dir_rel)

## Display useful information

### Who still needs to be convinced about a project?

In [16]:
def à_convaincre(nom_du_projet):
    query = """
    MATCH (p:Person)-[rel:EMPLOYE_DANS]-(c:Company)
    WHERE p.position_"""+nom_du_projet+""" = "A convaincre"
    RETURN p.nom AS nom, c.titre AS entreprise, rel.position AS position
    """

    return graph.run(query).to_data_frame()

In [17]:
à_convaincre("D")

| | nom | entreprise | position |
|---|---|---|---|
| 0 | Anne Gautier | DCorp | Technique |
| 1 | Joseph Thomas | DCorp | Technique |


Some detractors remain: get in touch with them!
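`à_convaincre` pastes `nom_du_projet` into the property name (`p.position_D`), because a property key written that way cannot be a query parameter. A misspelled project therefore returns an empty table rather than an error, since a missing property is simply `null`. A hypothetical helper, `position_property`, can check the argument first; it assumes the projects are exactly A to E, matching the `Position_A` to `Position_E` columns imported by `initialize()`.

```python
# Assumption: the projects are the five Position_A..Position_E columns of Humains.csv
PROJECTS = ("A", "B", "C", "D", "E")


def position_property(project):
    """Return the property holding a person's stance on a project, e.g. 'position_D'."""
    if project not in PROJECTS:
        raise ValueError("unknown project " + repr(project) + ", expected one of " + ", ".join(PROJECTS))
    return "position_" + project
```

The query in `à_convaincre` would then be built with `"WHERE p." + position_property(nom_du_projet) + " = 'A convaincre'"`.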

### Find the distance between two people

#### Based on their positions and companies

In [36]:
def distance_pos(A,B):
    # A and B are [position, company title] pairs, e.g. ["CEO", "ECorp"];
    # the values are sent as query parameters rather than pasted into the query
    query = """
    MATCH (a:Person)-[:EMPLOYE_DANS {position: $posA}]-(:Company {titre: $compA})
    MATCH (b:Person)-[:EMPLOYE_DANS {position: $posB}]-(:Company {titre: $compB})
    MATCH p = shortestPath((a)-[*]-(b))
    RETURN a.nom, b.nom, length(p) AS distance"""
    return graph.run(query, posA=A[0], compA=A[1], posB=B[0], compB=B[1]).to_data_frame()

In [37]:
A = ["CEO","ECorp"]
B = ["CFO","OBS"]

In [38]:
distance_pos(A,B)

| | a.nom | b.nom | distance |
|---|---|---|---|
| 0 | Joséphine Michel | Lucien Martin | 2 |


#### Based on their names

In [39]:
def distance_noms(nomA,nomB):
    # Names are sent as query parameters, so names containing an apostrophe also work
    query = ("MATCH p = shortestPath((a:Person {nom: $nomA})-[*]-(b:Person {nom: $nomB})) "
             "RETURN a.nom, b.nom, length(p) AS distance")
    return graph.run(query, nomA=nomA, nomB=nomB).to_data_frame()

In [40]:
nomA = "Joseph Leclerc"
nomB = "Auguste Petit"

In [41]:
distance_noms(nomA,nomB)

| | a.nom | b.nom | distance |
|---|---|---|---|
| 0 | Joseph Leclerc | Auguste Petit | 2 |
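`shortestPath((a)-[*]-(b))` ignores the direction of relationships, and `length(p)` counts relationships rather than nodes: a distance of 2 means the two people are linked through exactly one intermediate node (a common acquaintance, a shared company, and so on). For intuition, the sketch below computes the same measure with a breadth-first search over a hypothetical toy edge list; `hop_distance` is an illustrative name, and the real computation stays in Neo4j.

```python
from collections import deque


def hop_distance(edges, start, goal):
    """Number of relationships on a shortest path between start and goal,
    ignoring edge direction; None when the two nodes are not connected."""
    neighbours = {}
    for a, b in edges:
        # Store each edge both ways, like the undirected pattern -[*]-
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

When the two people are not connected, the Cypher query has no path to return, so the DataFrame is simply empty; the sketch reports that case as `None`.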


### Which partners have no direct relationship with OBS (are not "monitored")?

In [42]:
def a_connaitre():
    query = """
    MATCH (a:Company)-[:EMPLOYE_DANS]-(p:Person)
    WHERE NOT a.titre = "OBS"
    WITH DISTINCT p
    MATCH (c:Company {titre:"OBS"})
    MATCH short=shortestPath((c)-[*]-(p))
    WITH p,length(short) AS len
    WHERE len > 2
    RETURN p.nom AS nom
    """
    return graph.run(query).to_data_frame()

In [43]:
a_connaitre()

| | nom |
|---|---|
| 0 | Marthe Guerin |
| 1 | Maurice Duval |
| 2 | Pierre Leclerc |
| 3 | Henriette Bonnet |
| 4 | Renée Morin |
| 5 | Jules Joly |
| 6 | Anne Gautier |
| 7 | Alice Fournier |
| 8 | Maria Payet |
| 9 | Georgette Gerard |


We have a list of 10 potential people to contact!
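The criterion `len > 2` means that a partner is neither employed at OBS (1 hop) nor directly linked to a node that touches OBS (2 hops). The same rule can be expressed with plain Python sets on a hypothetical toy edge list; `unmonitored` is an illustrative name. One difference is worth knowing: in the Cypher query, a partner with no path at all to OBS makes the `shortestPath` MATCH return no row, so that partner is left out of the result, whereas this set-based sketch counts such people as unmonitored.

```python
def within_two_hops(edges, hub):
    """Nodes at most two relationships away from hub, ignoring direction."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    first = adj.get(hub, set())                       # e.g. OBS employees, linked companies
    second = set().union(*(adj[n] for n in first))    # their direct neighbours
    return {hub} | first | second


def unmonitored(edges, partners, hub="OBS"):
    """Partners more than two hops from hub, including those not connected at all."""
    near = within_two_hops(edges, hub)
    return sorted(p for p in partners if p not in near)
```

In the toy test data, `Quentin` is four hops from OBS and `Rita` has no path to it; only the first would appear in the Cypher result.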

## Display the graph

In [62]:
from IPython.display import IFrame
import json
import uuid

def vis_network(nodes, edges, physics=False):
    html = """
    <html>
    <head>
      <script type="text/javascript" src="../lib/vis/dist/vis.js"></script>
      <link href="../lib/vis/dist/vis.css" rel="stylesheet" type="text/css">
    </head>
    <body>
    <div id="{id}"></div>
    <script type="text/javascript">
      var nodes = {nodes};
      var edges = {edges};
      var container = document.getElementById("{id}");
      var data = {{
        nodes: nodes,
        edges: edges
      }};
      var options = {{
          nodes: {{
              shape: 'dot',
              size: 25,
              font: {{
                  size: 14
              }}
          }},
          edges: {{
              font: {{
                  size: 14,
                  align: 'middle'
              }},
              color: 'gray',
              arrows: {{
                  to: {{enabled: true, scaleFactor: 0.5}}
              }},
              smooth: {{enabled: false}}
          }},
          physics: {{
              enabled: {physics}
          }}
      }};
      var network = new vis.Network(container, data, options);
    </script>
    </body>
    </html>
    """

    unique_id = str(uuid.uuid4())
    html = html.format(id=unique_id, nodes=json.dumps(nodes), edges=json.dumps(edges), physics=json.dumps(physics))

    filename = "figure/graph-{}.html".format(unique_id)

    # The figure/ directory must already exist next to the notebook
    with open(filename, "w") as file:
        file.write(html)

    return IFrame(filename, width="100%", height="400")

def draw(graph, options, physics=False, limit=100):
    # The options argument should be a dictionary of node labels and property keys; it determines which property
    # is displayed for the node label. For example, in the movie graph, options = {"Movie": "title", "Person": "name"};
    # in this graph, options = {"Person": "nom", "Company": "titre"}.
    # Omitting a node label from the options dict will leave the node unlabeled in the visualization.
    # Setting physics = True makes the nodes bounce around when you touch them!
    # Labels, properties and relationship types are extracted in Cypher rather than through
    # py2neo's Node/Relationship methods, whose API changed between py2neo versions.
    query = """
    MATCH (n)
    WITH n, rand() AS random
    ORDER BY random
    LIMIT $limit
    OPTIONAL MATCH (n)-[r]->(m)
    RETURN id(n) AS source_id, labels(n) AS source_labels, properties(n) AS source_props,
           type(r) AS rel_type,
           id(m) AS target_id, labels(m) AS target_labels, properties(m) AS target_props
    """

    data = graph.run(query, limit=limit)

    nodes = []
    edges = []

    def get_vis_info(node_id, labels, props):
        node_label = labels[0] if labels else ""
        prop_key = options.get(node_label)
        vis_label = props.get(prop_key, "")

        return {"id": node_id, "label": vis_label, "group": node_label, "title": repr(props)}

    for row in data:
        source_info = get_vis_info(row["source_id"], row["source_labels"], row["source_props"])

        if source_info not in nodes:
            nodes.append(source_info)

        # rel_type is None when the node has no outgoing relationship (OPTIONAL MATCH)
        if row["rel_type"] is not None:
            target_info = get_vis_info(row["target_id"], row["target_labels"], row["target_props"])

            if target_info not in nodes:
                nodes.append(target_info)

            edges.append({"from": source_info["id"], "to": target_info["id"], "label": row["rel_type"]})

    return vis_network(nodes, edges, physics=physics)
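`draw()` deduplicates nodes with `if ... not in nodes` on a list, which scans the whole list for every row. With the default `limit=100` this is harmless, but for larger samples a dictionary keyed by node id does the same deduplication with one lookup per row. A sketch of that variant, `to_vis_data` (hypothetical name), working on plain tuples so it can be checked without a database; each row is assumed to be `(source_id, source_label, source_props, rel_type, target_id, target_label, target_props)`, with `rel_type` set to `None` when there is no relationship.

```python
def to_vis_data(rows, options):
    """Build vis.js node and edge lists from flat rows (hypothetical row layout)."""
    nodes = {}  # node id -> vis.js node dict; dicts keep insertion order
    edges = []

    def add_node(node_id, label, props):
        if node_id not in nodes:
            prop_key = options.get(label)
            nodes[node_id] = {"id": node_id, "label": props.get(prop_key, ""),
                              "group": label, "title": repr(props)}

    for src_id, src_label, src_props, rel_type, tgt_id, tgt_label, tgt_props in rows:
        add_node(src_id, src_label, src_props)
        if rel_type is not None:
            add_node(tgt_id, tgt_label, tgt_props)
            edges.append({"from": src_id, "to": tgt_id, "label": rel_type})
    return list(nodes.values()), edges
```

The two returned lists have the shape `vis_network(nodes, edges)` expects.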

In [63]:
from py2neo import Node

In [85]:
nicole = Node("Person",name="Nicole",age=24)
drew = Node("Person",name="Drew",age=20)
mtdew = Node("Drink",name="Mountain Dew",calories=9000)
cokezero = Node("Drink",name="Coke Zero",calories=0)
coke = Node("Manufacturer",name="Coca Cola")
pepsi = Node("Manufacturer",name="Pepsi")

In [93]:
graph.create(nicole | drew | mtdew | cokezero | coke | pepsi)

In [96]:
graph

<Graph database=<Database uri='bolt://3.90.65.37:34875' secure=False user_agent='py2neo/4.1.3 neo4j-python/1.6.3 Python/3.6.3-final-0 (darwin)'> name='data'>

!pip install ipython-cypher

In [103]:
import networkx as nx

In [104]:
%matplotlib inline

In [105]:
results = graph.run("MATCH (n) RETURN n LIMIT 1")

In [111]:
import jgraph

In [113]:
jgraph.draw([(1,2),(2,3),(3,4),(5,1),(4,5),(5,2)])

In [114]:
data = graph.run("MATCH (n)-->(m) RETURN ID(n),ID(m)")

In [115]:
data = [tuple(x) for x in data]

In [116]:
jgraph.draw(data)