# Neo4j sandbox tutorial

Tutorial for the Neo4j sandbox, following [this](https://towardsdatascience.com/create-a-graph-database-in-neo4j-using-python-4172d40f89c4) material.

Installing the library for working with Neo4j:

In [5]:
!pip install neo4j

Collecting neo4j
  Downloading neo4j-4.3.1.tar.gz (74 kB)
     |████████████████████████████████| 74 kB 1.2 MB/s
Building wheels for collected packages: neo4j
  Building wheel for neo4j (setup.py) ... done
  Created wheel for neo4j: filename=neo4j-4.3.1-py3-none-any.whl size=99332 sha256=deaefbe6d9fdf0bdc34b2251df9345fb3f6badc6be2dbf73d8b0fccd4f4a07f6
  Stored in directory: /home/anderson/.cache/pip/wheels/ca/bf/84/9c2593d3ceb4bae93a1beb960133c5edeedf3df55e67aca54a
Successfully built neo4j
Installing collected packages: neo4j
Successfully installed neo4j-4.3.1


## 1. Using the movies database (a ready-made example, already populated)

- Username: neo4j
- Password: dangers-suppressions-directive
- IP Address: 34.207.92.112
- HTTP Port: 7474
- Bolt Port: 7687
- Bolt URL: bolt://34.207.92.112:7687
- Websocket Bolt URL: bolt+s://6f18df2cea526b508d9308c20ca7633e.neo4jsandbox.com:7687

Required libraries:

In [1]:
from neo4j import GraphDatabase
import pandas as pd

Creating a class to handle the connection to the sandbox:

In [2]:
class Neo4jConnection:
    
    def __init__(self, uri, user, pwd):
        self.__uri = uri
        self.__user = user
        self.__pwd = pwd
        self.__driver = None
        try:
            self.__driver = GraphDatabase.driver(self.__uri, auth=(self.__user, self.__pwd))
        except Exception as e:
            print("Failed to create the driver:", e)
        
    def close(self):
        if self.__driver is not None:
            self.__driver.close()
        
    def query(self, query, parameters=None, db=None):
        assert self.__driver is not None, "Driver not initialized!"
        session = None
        response = None
        try: 
            session = self.__driver.session(database=db) if db is not None else self.__driver.session() 
            response = list(session.run(query, parameters))
        except Exception as e:
            print("Query failed:", e)
        finally: 
            if session is not None:
                session.close()
        return response

Initializing a connection:

In [9]:
conn = Neo4jConnection(uri="bolt://34.207.92.112:7687", 
                       user="neo4j",              
                       pwd="dangers-suppressions-directive")

Creating a query:

In [10]:
query = 'Match (m:Movie) where m.released > 2000 RETURN m limit 5'

Running it:

In [12]:
output = conn.query(query)
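
The query above hard-codes the year and the row limit in the Cypher text. As a minimal sketch, the same query can be written with Cypher parameters, which the `query` method already forwards to the driver. `build_movie_query` is a hypothetical helper introduced only for illustration, and the commented call assumes the `conn` object created above.

```python
def build_movie_query(year, limit):
    """Return the Cypher text and its parameter dict (hypothetical helper)."""
    # $year and $limit are Cypher parameters, filled in by the driver at run time
    query = 'MATCH (m:Movie) WHERE m.released > $year RETURN m LIMIT $limit'
    parameters = {'year': year, 'limit': limit}
    return query, parameters


query, parameters = build_movie_query(2000, 5)

# With an open connection (assumption: `conn` from the cell above):
# output = conn.query(query, parameters)
```

Keeping the values out of the query string avoids quoting mistakes and lets the same text be reused with different years.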

Inspecting the result:

In [18]:
output

[<Record m=<Node id=9 labels=frozenset({'Movie'}) properties={'tagline': 'Free your mind', 'title': 'The Matrix Reloaded', 'released': 2003}>>,
 <Record m=<Node id=10 labels=frozenset({'Movie'}) properties={'tagline': 'Everything that has a beginning has an end', 'title': 'The Matrix Revolutions', 'released': 2003}>>,
 <Record m=<Node id=154 labels=frozenset({'Movie'}) properties={'title': "Something's Gotta Give", 'released': 2003}>>,
 <Record m=<Node id=161 labels=frozenset({'Movie'}) properties={'tagline': 'This Holiday Season… Believe', 'title': 'The Polar Express', 'released': 2004}>>,
 <Record m=<Node id=92 labels=frozenset({'Movie'}) properties={'tagline': "Based on the extraordinary true story of one man's fight for freedom", 'title': 'RescueDawn', 'released': 2006}>>]

In [31]:
dict(output[0])['m']

<Node id=9 labels=frozenset({'Movie'}) properties={'tagline': 'Free your mind', 'title': 'The Matrix Reloaded', 'released': 2003}>

In [30]:
pd.DataFrame([dict(x)['m'] for x in output])

| | released | tagline | title |
|---|---|---|---|
| 0 | 2003 | Free your mind | The Matrix Reloaded |
| 1 | 2003 | Everything that has a beginning has an end | The Matrix Revolutions |
| 2 | 2003 | | Something's Gotta Give |
| 3 | 2004 | This Holiday Season… Believe | The Polar Express |
| 4 | 2006 | Based on the extraordinary true story of one m... | RescueDawn |
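
One movie in the result has no `tagline` property, which is why that cell is empty in the table. The sketch below reproduces the idea with plain dictionaries standing in for the node properties (an assumption, so that it runs without a database): the columns are the union of all property keys, and a missing key becomes `None`.

```python
# Plain dicts standing in for the properties of two returned nodes
records = [
    {'tagline': 'Free your mind', 'title': 'The Matrix Reloaded', 'released': 2003},
    {'title': "Something's Gotta Give", 'released': 2003},
]

# Columns are the union of every property key found in the records
columns = sorted({key for record in records for key in record})

# One row per record; properties a node does not have become None
rows = [[record.get(column) for column in columns] for record in records]

for row in rows:
    print(dict(zip(columns, row)))
```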


Closing the connection:

In [35]:
conn.close()

## 2. Example: creating a database from scratch

- Username: neo4j
- Password: technology-eves-hairs
- IP Address: 34.206.71.108
- HTTP Port: 7474
- Bolt Port: 7687
- Bolt URL: bolt://34.206.71.108:7687
- Websocket Bolt URL: bolt+s://60a1907fe0c3c12a9d50ad4eeaf76108.neo4jsandbox.com:7687

Required libraries:

In [10]:
from itertools import permutations

Synthetic data:

In [11]:
# fragment table
fragment_table = pd.DataFrame(
    {
        'id':range(1,5),
        'area':10
    }
)

In [12]:
# connection table: every ordered pair of distinct fragments
conn_table = pd.DataFrame(
    list(permutations(fragment_table.id.to_list(), 2)),
    columns=['source', 'target']
)
quality_list = list()

# adding the connection quality between the fragments
for i in conn_table.index:
    if 3 in conn_table.loc[i].to_list():
        quality_list.append(1)
    elif (1 in conn_table.loc[i].to_list()) and (4 in conn_table.loc[i].to_list()):
        quality_list.append(0.1)
    else:
        quality_list.append(0.5)
        
conn_table['quality'] = quality_list
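
The quality rule in the loop above can also be read as a small function of the pair. The sketch below restates it with the standard library only, assuming the same four fragment ids; `quality` is a hypothetical name and not part of the notebook.

```python
from itertools import permutations


def quality(source, target):
    """Connection quality for one ordered pair, following the rule above."""
    pair = (source, target)
    if 3 in pair:
        return 1      # any connection touching fragment 3
    if 1 in pair and 4 in pair:
        return 0.1    # the connection between fragments 1 and 4
    return 0.5        # every other connection


# Every ordered pair of distinct fragment ids, with its quality
table = [(s, t, quality(s, t)) for s, t in permutations(range(1, 5), 2)]

for row in table:
    print(row)
```

Four fragments give 4 × 3 = 12 ordered pairs, so each connection appears once in each direction.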

Initializing a connection:

In [7]:
# del conn

In [13]:
conn = Neo4jConnection(uri="bolt://localhost:7687",
                       user="admin",              
                       pwd="123123")

Creating constraints:

In [14]:
conn.query('CREATE CONSTRAINT frag IF NOT EXISTS ON (f:fragment) ASSERT f.id IS UNIQUE')

[]

Functions for inserting data:

In [15]:
def add_frag(df):
    '''
    df is the dataframe of the fragment table. It must have an `id` column with the unique id of each fragment.
    '''
    
    for i in df.id:
        query = f'CREATE (:fragment  {{ id: "{i}" }} )'
        conn.query(query)
    
    return 'Done.'

In [16]:
def add_connections(df):
    '''
    Here, df is the table of connections between the fragments (`conn_table`).
    '''
    
    for (_, x) in df.iterrows():

        i = x.source
        j = x.target
        q = x.quality

        query = f'''
        MATCH (a:fragment), (b:fragment)
        WHERE a.id = "{str(int(i))}" AND b.id = "{str(int(j))}"
        CREATE (a)-[rel:CONNECTED{{quality: {q}}}]->(b)
        RETURN a, b
        '''

        #print(query)

        conn.query(query)
        
    return 'Done.'
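
The two functions above send one query per row and build the Cypher text with f-strings. As a hedged alternative, the sketch below prepares a single parameterized query per table using `UNWIND`. The helper names and the sample rows are assumptions for illustration; the ids are kept as strings to match how `add_frag` stores them.

```python
def fragment_rows(ids):
    """One parameter row per fragment, ids stored as strings (hypothetical helper)."""
    return [{'id': str(i)} for i in ids]


def connection_rows(connections):
    """One parameter row per (source, target, quality) tuple (hypothetical helper)."""
    return [{'source': str(s), 'target': str(t), 'quality': q}
            for s, t, q in connections]


# UNWIND turns the $rows list into one row per element inside Cypher
CREATE_FRAGMENTS = '''
UNWIND $rows AS row
CREATE (:fragment {id: row.id})
'''

CREATE_CONNECTIONS = '''
UNWIND $rows AS row
MATCH (a:fragment {id: row.source}), (b:fragment {id: row.target})
CREATE (a)-[:CONNECTED {quality: row.quality}]->(b)
'''

frag_params = {'rows': fragment_rows(range(1, 5))}
conn_params = {'rows': connection_rows([(1, 2, 0.5), (1, 3, 1)])}

# With an open connection (assumption: `conn` from the cell above):
# conn.query(CREATE_FRAGMENTS, frag_params)
# conn.query(CREATE_CONNECTIONS, conn_params)
```

With the real tables, the tuples could come from `conn_table.itertuples(index=False)`.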

Adding the data:

In [17]:
add_frag(fragment_table)

'Done.'

In [18]:
add_connections(conn_table)

'Done.'

Exploring:

In [72]:
query_nodes = 'MATCH (m) RETURN m'

In [25]:
node_list = conn.query(query_nodes)

In [99]:
pd.DataFrame({'id': [dict(node_list[x])['m'].id for x in range(len(node_list))],
              'class': [list(dict(node_list[x])['m'].labels)[0] for x in range(len(node_list))]})

| | id | class |
|---|---|---|
| 0 | 0 | fragment |
| 1 | 1 | fragment |
| 2 | 2 | fragment |
| 3 | 3 | fragment |


In [101]:
query_conn = '''
Match (n)-[r]->(m)
Return n,r,m
'''

In [102]:
conn_list = conn.query(query_conn)

In [None]:
pd.DataFrame({'id': [dict(conn_list[x])['m'].id for x in range(len(conn_list))],
              'class': [list(dict(conn_list[x])['m'].labels)[0] for x in range(len(conn_list))]})

In [144]:
pd.DataFrame({
'A': [dict(conn_list[i])['m']['id'] for i in range(len(conn_list))],
'B': [dict(conn_list[i])['n']['id'] for i in range(len(conn_list))],
'quality': [dict(conn_list[i])['r']['quality'] for i in range(len(conn_list))]
    }).sort_values(['A','B'])

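| | A | B | quality |
|---|---|---|---|
| 2 | 1 | 2 | 0.5 |
| 1 | 1 | 3 | 1.0 |
| 0 | 1 | 4 | 0.1 |
| 5 | 2 | 1 | 0.5 |
| 4 | 2 | 3 | 1.0 |
| 3 | 2 | 4 | 0.5 |
| 8 | 3 | 1 | 1.0 |
| 7 | 3 | 2 | 1.0 |
| 6 | 3 | 4 | 1.0 |
| 11 | 4 | 1 | 0.1 |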


Closing the connection:

In [154]:
conn.close()

In [None]:
#### CONTINUE FROM HERE ####
# explore queries of interest:
# - shortest path (general)
# - shortest path, taking the "quality" of the connections into account
# - shortest path, taking the "quality" of the connections and a number "x" of steps into account
####
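
As a starting point for the quality-aware shortest path, here is a minimal pure-Python sketch using Dijkstra's algorithm on the synthetic connections. It assumes, purely for illustration, that the cost of a connection is `1 / quality`, so better connections are cheaper; the notebook does not define a cost, and `cheapest_path` is a hypothetical helper. It runs without a database.

```python
import heapq
from itertools import permutations


def quality(source, target):
    """Same quality rule as the synthetic connection table."""
    pair = (source, target)
    if 3 in pair:
        return 1
    if 1 in pair and 4 in pair:
        return 0.1
    return 0.5


def cheapest_path(edges, start, goal):
    """Dijkstra over {(source, target): cost}; returns (cost, path) or None."""
    graph = {}
    for (source, target), cost in edges.items():
        graph.setdefault(source, []).append((target, cost))

    queue = [(0.0, start, [start])]  # (accumulated cost, node, path so far)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, step in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    return None


# Assumption: cost = 1 / quality
edges = {(s, t): 1 / quality(s, t) for s, t in permutations(range(1, 5), 2)}
cost, path = cheapest_path(edges, 1, 4)
print(cost, path)
```

Under this assumed cost, going from fragment 1 to fragment 4 through fragment 3 is cheaper than the direct low-quality connection, even though the direct route has fewer steps.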

: )