## Setting up database connection

Import the required Python libraries.

In [1]:
import os
import sys
sys.path.append(os.path.abspath('../../'))
from query_indicators import generate_save_path

import py2neo
from nesta.core.luigihacks.misctools import get_config
from nesta.core.orms.orm_utils import graph_session
import igraph as ig
import pandas as pd
import boto3

In [2]:
S3 = boto3.resource('s3')
SAVE_PATH = generate_save_path()  # EURITO collaborators: this is generated assuming you have stuck to the convention 'theme_x/something/something_else.ipynb'
BUCKET = 'eurito-indicators'  # EURITO collaborators: please don't change this
SAVE_RESULTS = True  # Set this to "False" when you want to view figures inline. When "True", results will be saved to S3.

Read the Neo4j credentials from the config file and prepare the connection arguments.

In [3]:
conf = get_config('neo4j.config', 'neo4j')
gkwargs = dict(host=conf['host'], secure=True,
               auth=(conf['user'], conf['password']))
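
`get_config` comes from nesta, and the layout of `neo4j.config` is not shown in this notebook. Assuming it is an INI-style file with a `[neo4j]` section holding `host`, `user` and `password` (an assumption, as is the example file content below), the same keyword arguments could be built with the standard library's `configparser`. The helper name `neo4j_kwargs_from_ini` is hypothetical.

```python
import configparser

# Hypothetical contents of neo4j.config; the real file is not shown in this notebook
EXAMPLE_CONFIG = """
[neo4j]
host = localhost
user = neo4j
password = change-me
"""


def neo4j_kwargs_from_ini(text, section="neo4j"):
    """Build graph_session keyword arguments from an INI-style config string (assumed format)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    conf = parser[section]
    # Same shape as gkwargs above: host, secure flag and a (user, password) tuple
    return dict(host=conf["host"], secure=True,
                auth=(conf["user"], conf["password"]))


example_kwargs = neo4j_kwargs_from_ini(EXAMPLE_CONFIG)
print(example_kwargs["host"], example_kwargs["auth"][0])
```

This only helps to sanity-check what `gkwargs` should contain; in the notebook itself, keep using `get_config`.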

In [29]:
def _s3_savetable(df, object_path):
    """Upload a table to S3 as pipe-separated CSV under tables/{SAVE_PATH}/."""
    if not SAVE_RESULTS:
        return
    if len(df.columns) == 1:
        # Rename a single column to 'value' on a copy, leaving the caller's DataFrame untouched
        df = df.rename(columns={df.columns[0]: 'value'})
    table_data = df.to_csv(sep='|').encode()
    obj = S3.Object(BUCKET, os.path.join(f'tables/{SAVE_PATH}', object_path))
    obj.put(Body=table_data)
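
To check what `_s3_savetable` writes without touching S3, this sketch rebuilds the object key and the pipe-separated payload locally and reads the payload back. The `SAVE_PATH` value is hypothetical (it only follows the 'theme_x/...' convention mentioned above), and the helper names `s3_table_key` and `to_pipe_csv` are ours, not part of the project.

```python
import io
import posixpath

import pandas as pd


def s3_table_key(save_path, object_path):
    """Build the object key in the same layout as _s3_savetable: tables/<save_path>/<object_path>."""
    return posixpath.join(f"tables/{save_path}", object_path)


def to_pipe_csv(df):
    """Serialise a DataFrame the way _s3_savetable does before uploading it."""
    if len(df.columns) == 1:
        df = df.rename(columns={df.columns[0]: "value"})
    return df.to_csv(sep="|").encode()


# Hypothetical SAVE_PATH following the 'theme_x/...' notebook convention
key = s3_table_key("theme_x/something/something_else",
                   "Organisation/organisation_centrality_top15.csv")
payload = to_pipe_csv(pd.DataFrame({"betw": [2.0, 1.0]}, index=[10, 11]))

# Read the bytes back to confirm the round trip keeps the index and values
restored = pd.read_csv(io.BytesIO(payload), sep="|", index_col=0)
print(key)
print(restored)
```

`posixpath.join` is used so keys always use `/` separators; `os.path.join`, as in the notebook, gives the same key on Linux and macOS but would insert a backslash on Windows.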

## Retrieving nodes from Neo4j

Open a graph session and keep its graph object for the queries that follow.

In [5]:
with graph_session(**gkwargs) as tx:
    graph = tx.graph

Set the type of node to retrieve. Available types are: "Project", "Organisation", "Publication", "Topic", "Report", "Datasets", "Software", "Proposal_Call". To retrieve a different type, replace "Organisation" in the cell below with the required node type and re-run the cell.

**Nodes of type "Organisation" have the following fields:** <br>
*name* - organisation name <br>
*betw* - organisation centrality <br>
*country_code* - two-letter country code <br>
*country_name* - country name <br>

In [9]:
node_type = "Organisation"

Retrieve all nodes of the chosen type as a list.

In [None]:
node_list = list(graph.nodes.match(node_type))
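
Fetching every node and sorting in pandas can be slow for large node types. A minimal sketch of an alternative, assuming the database is queried with Cypher through py2neo's `graph.run`, pushes the sorting and the limit into Neo4j. The helper name `top_nodes_query` is hypothetical. Cypher does not accept a label as a query parameter, so the label is checked against the node types listed above before it is formatted into the query.

```python
# Node labels listed in this notebook (assumed to be the only valid ones)
NODE_TYPES = {"Project", "Organisation", "Publication", "Topic", "Report",
              "Datasets", "Software", "Proposal_Call"}


def top_nodes_query(node_type, k=15):
    """Build a Cypher query for the k nodes of one label with the highest `betw`."""
    if node_type not in NODE_TYPES:
        raise ValueError(f"Unknown node type: {node_type!r}")
    k = int(k)
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Nodes without `betw` are excluded: Neo4j puts nulls first in a DESC sort
    return (f"MATCH (n:`{node_type}`) WHERE n.betw IS NOT NULL "
            f"RETURN n ORDER BY n.betw DESC LIMIT {k}")


# Possible use against the live database (not run here; assumes each record is keyed by the RETURN alias):
# top_nodes = pd.DataFrame([record["n"] for record in graph.run(top_nodes_query(node_type))])
print(top_nodes_query("Organisation"))
```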

Convert the list to a DataFrame, drop the `centrality` column, sort by `betw` in descending order, and display the top 15 rows.

In [14]:
node_table = pd.DataFrame(node_list)
node_table = node_table.drop("centrality", axis=1)
top_betw = node_table.sort_values(by=['betw'], ascending=False)
top_nodes = top_betw.head(15)
top_nodes

|  | betw | country_code | country_name | id | name |
|---|---|---|---|---|---|
| 51249 | 114674300.0 | FR | France | 999997930 | CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE CNRS |
| 51147 | 48472320.0 | DE | Germany | 999984059 | FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANG... |
| 51208 | 27941040.0 | ES | Spain | 999991722 | AGENCIA ESTATAL CONSEJO SUPERIOR DEINVESTIGACI... |
| 51123 | 25759600.0 | IT | Italy | 999979500 | CONSIGLIO NAZIONALE DELLE RICERCHE |
| 51214 | 25258850.0 | FR | France | 999992401 | COMMISSARIAT A L ENERGIE ATOMIQUE ET AUX ENERG... |
| 51103 | 23797380.0 | GB | United Kingdom | 999977172 | THE CHANCELLOR MASTERS AND SCHOLARS OF THE UNI... |
| 51150 | 23357050.0 | GB | United Kingdom | 999984350 | THE CHANCELLOR, MASTERS AND SCHOLARS OF THE UN... |
| 51224 | 16548970.0 | GB | United Kingdom | 999993468 | IMPERIAL COLLEGE OF SCIENCE TECHNOLOGY AND MED... |
| 51121 | 15721580.0 | CH | Switzerland | 999979015 | EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH |
| 51205 | 15070290.0 | BE | Belgium | 999991334 | KATHOLIEKE UNIVERSITEIT LEUVEN |
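
The sort-then-head step can also be written with pandas' `DataFrame.nlargest`. This sketch wraps it in a helper (the name `top_by_betw` is hypothetical) that drops `centrality` only when the column exists, so the same code would also work for node types such as Project, whose cell below does not drop it. The toy nodes are made up and stand in for the py2neo nodes.

```python
import pandas as pd


def top_by_betw(nodes, k=15):
    """Tabulate dict-like nodes and return the k rows with the largest 'betw' values."""
    table = pd.DataFrame(nodes)
    # Drop 'centrality' only if it exists, so node types without it are handled too
    table = table.drop(columns="centrality", errors="ignore")
    # nlargest(k, col) selects the same rows as sort_values(col, ascending=False).head(k), ties aside
    return table.nlargest(k, "betw")


# Made-up toy nodes, standing in for the nodes returned by graph.nodes.match(...)
toy_nodes = [
    {"name": "A", "betw": 1.0, "centrality": 0.1},
    {"name": "B", "betw": 3.0, "centrality": 0.2},
    {"name": "C", "betw": 2.0, "centrality": 0.3},
]
print(top_by_betw(toy_nodes, k=2))
```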


Save the table to the S3 bucket.

In [15]:
_s3_savetable(top_nodes, object_path=f'{node_type}/organisation_centrality_top15.csv')

**Nodes of type "Project" have the following fields:** <br>
*acronym* - project acronym <br>
*betw* - project centrality <br>
*ec_contribution* - EC contribution, in EUR <br>
*start_date_code, end_date_code* - project start and end dates <br>
*framework, funded_under, funding_scheme* - funding framework and programme <br>
*grant_num* - six-digit grant number <br>
*objective* - project objective <br>
*project_description* - project description <br>
*rcn* - project identifier <br>
*status* - project status (e.g. closed, ongoing) <br>
*title* - project title <br>
*total_cost* - total cost, in EUR <br>
*website* - project website <br>
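
The cost figures in the project table below are plain euro amounts (for example 89000000 for GrapheneCore1), and the dates are ISO 8601 timestamps. A minimal sketch, assuming those formats hold for every Project node, that adds costs in millions of EUR, the EC share of the total cost, and the project duration in days; the helper `add_derived_columns` and the new column names are hypothetical. The two example rows are copied from that table.

```python
import pandas as pd


def add_derived_columns(projects):
    """Return a copy with costs in millions of EUR, the EC share and the duration in days."""
    out = projects.copy()
    out["ec_contribution_meur"] = out["ec_contribution"] / 1e6
    out["total_cost_meur"] = out["total_cost"] / 1e6
    # Fraction of the total cost covered by the EC contribution
    out["ec_share"] = out["ec_contribution"] / out["total_cost"]
    start = pd.to_datetime(out["start_date_code"])
    end = pd.to_datetime(out["end_date_code"])
    out["duration_days"] = (end - start).dt.days
    return out


# Two rows copied from the project table in this section
example = pd.DataFrame({
    "acronym": ["GRAPHENE", "HBP"],
    "ec_contribution": [54000000, 54000000],
    "total_cost": [74979522, 72522840],
    "start_date_code": ["2013-10-01T00:00:00", "2013-10-01T00:00:00"],
    "end_date_code": ["2016-03-31T00:00:00", "2017-02-28T00:00:00"],
})
print(add_derived_columns(example)[["acronym", "ec_contribution_meur", "ec_share", "duration_days"]])
```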

In [22]:
node_type = "Project"

In [23]:
node_list = list(graph.nodes.match(node_type))
node_table = pd.DataFrame(node_list)
top_betw = node_table.sort_values(by=['betw'], ascending=False)
# Keep the first 15 rows, i.e. the nodes with the highest betw
top_nodes = top_betw.head(15)
top_nodes

|  | acronym | betw | ec_contribution | end_date_code | framework | funded_under | funding_scheme | objective | project_description | rcn | start_date_code | status | title | total_cost | website |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32035 | GrapheneCore1 | 3074818.0 | 89000000 | 2018-03-31T00:00:00 | H2020 | [EXCELLENT SCIENCE - Future and Emerging Techn... | SGA-RIA - SGA-RIA | This project is the second in the series of EC... |  | 200853 | 2016-04-01T00:00:00 |  | Graphene-based disruptive technologies | 89000000 | https://graphene-flagship.eu/ |
| 21203 | GRAPHENE | 2729211.0 | 54000000 | 2016-03-31T00:00:00 | FP7 | [Specific Programme "Cooperation": Information... | CPCSA - Combined Collaborative Project and Coo... | This Flagship aims to take graphene and relate... | \nFET Flagships\nThe Graphene Flagship project... | 109691 | 2013-10-01T00:00:00 | CLOSED | Graphene-Based Revolutions in ICT And Beyond | 74979522 | http://www.graphene-flagship.eu/ |
| 8120 | EGI-InSPIRE | 2452014.0 | 25000000 | 2014-12-31T00:00:00 | FP7 | [Specific Programme "Capacities": Research inf... | CPCSA - Combined Collaborative Project and Coo... | Scientific research is no longer conducted wit... | \nDistributed computing infrastructure (DCI)\n\n | 95923 | 2010-05-01T00:00:00 | CLOSED | European Grid Initiative: Integrated Sustainab... | 70337893 |  |
| 43829 | GrapheneCore2 | 2069696.0 | 88000000 | 2020-03-31T00:00:00 | H2020 | [FET Flagships] | SGA-RIA - SGA-RIA | This proposal describes the third stage of the... |  | 216122 | 2018-04-01T00:00:00 | ONGOING | Graphene Flagship Core Project 2 | 88000000 |  |
| 21317 | HBP | 1782999.0 | 54000000 | 2017-02-28T00:00:00 | FP7 | [Specific Programme "Cooperation": Information... | CPCSA - Combined Collaborative Project and Coo... | Understanding the human brain is one of the gr... | \nFET Flagships\n\n | 109805 | 2013-10-01T00:00:00 | CLOSED | The Human Brain Project | 72522840 |  |
| 34576 | HBP SGA1 | 1755587.0 | 89000000 | 2018-03-31T00:00:00 | H2020 | [EXCELLENT SCIENCE - Future and Emerging Techn... | SGA-RIA - SGA-RIA | Understanding the human brain is one of the gr... |  | 205371 | 2016-04-01T00:00:00 | CLOSED | Human Brain Project Specific Grant Agreement 1 | 89000000 | https://www.humanbrainproject.eu/en/ |
| 645 | EGEE-III | 1695839.0 | 32000000 | 2010-04-30T00:00:00 | FP7 | [Specific Programme "Capacities": Research inf... | CPCSA - Combined Collaborative Project and Coo... | A globally distributed computing Grid now play... | \ne-Science Grid infrastructures\n\n | 87264 | 2008-05-01T00:00:00 | CLOSED | Enabling Grids for E-sciencE III | 49022472 |  |
| 46991 | HBP SGA2 | 1581863.0 | 88000000 | 2020-03-31T00:00:00 | H2020 | [FET Flagships] | SGA-RIA - SGA-RIA | The Human Brain Project (HBP) is a major Europ... |  | 220793 | 2018-04-01T00:00:00 | ONGOING | Human Brain Project Specific Grant Agreement 2 | 88000000 |  |
| 30058 | ELIXIR-EXCELERATE | 1228839.0 | 19051482 | 2019-08-31T00:00:00 | H2020 | [Developing new world-class research infrastru... | RIA - Research and Innovation action | The life sciences are undergoing a transformat... |  | 198519 | 2015-09-01T00:00:00 | CLOSED | ELIXIR-EXCELERATE: Fast-track ELIXIR implement... | 19051482 | https://www.elixir-europe.org/excelerate |
| 3465 | ECHORD | 1174091.0 | 18969760 | 2013-12-31T00:00:00 | FP7 | [Specific Programme "Cooperation": Information... | CP - Collaborative project (generic) | The European robotics industry plays a key rol... | \nCognitive Systems, Interaction, Robotics\nSt... | 90429 | 2009-01-01T00:00:00 | CLOSED | European Clearing House for Open Robotics Deve... | 25841074 | http://www.echord.info/wikis/home-wiki/home |


In [30]:
_s3_savetable(top_nodes, object_path=f'{node_type}/project_centrality_top15.csv')