### Required libraries

In [None]:
import commons.graph
import commons.parse
import commons.scores
import warnings
warnings.filterwarnings("ignore")

## Preliminary steps
#### Create a GraphMaker
Specify the stopword list and the desired stemmer:
- ```POR```: Porter Stemmer
- ```SNO```: Snowball Stemmer (English)
- ```LAN```: Lancaster Stemmer
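The sketch below shows where a stopword list and a stemmer usually sit in text preprocessing. Several parts are assumptions: the file format (one stopword per line), the letters-only tokenizer, and `toy_stem`. `toy_stem` is a hypothetical stand-in that only strips a few suffixes. It is not the Porter, Snowball or Lancaster algorithm. Real implementations of those three live in, for example, NLTK's `nltk.stem` module. Nothing here describes how `GraphMaker` works internally.

```python
import re
from typing import Callable, Iterable


def load_stopwords(lines: Iterable[str]) -> set[str]:
    """Read one stopword per line (assumed format), skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def preprocess(text: str, stopwords: set[str], stem: Callable[[str], str]) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords, then stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in stopwords]


def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer: NOT Porter, Snowball or Lancaster."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if a stem of at least 3 letters remains
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stop = load_stopwords(["the\n", "a\n", "\n", "of\n"])
print(preprocess("The ranking of graphs", stop, toy_stem))  # -> ['rank', 'graph']
```

With a real file, `load_stopwords` accepts an open file handle directly, e.g. `with open('resources/longStopwords.txt') as f: stop = load_stopwords(f)`.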

In [None]:
gm = commons.graph.GraphMaker('resources/longStopwords.txt', 'LAN')

#### Create (or update) the list of allowed articles
Set the minimum number of nodes an article's graph must have for the article to be considered, then run the function.
For the current dataset the file has already been generated, so **there is no need to run it again**.
Run this function only the first time, or whenever the dataset changes, e.g. when articles are added or removed.
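The filtering idea can be sketched as follows. The sketch makes two hypothetical assumptions. First, each article becomes a word co-occurrence graph whose nodes are its distinct terms. Second, an article is allowed when that graph has at least `min_nodes` nodes. `cooccurrence_graph` and `split_allowed` are illustrative names and are not part of `commons.parse`. The actual graph construction and threshold comparison used there may differ.

```python
def cooccurrence_graph(tokens: list[str], window: int = 2) -> dict[str, set[str]]:
    """Hypothetical graph: terms are nodes, linked if they co-occur
    in a sliding window of `window` consecutive tokens."""
    graph: dict[str, set[str]] = {t: set() for t in tokens}
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if u != v:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def split_allowed(articles: dict[str, list[str]], min_nodes: int) -> tuple[list[str], list[str]]:
    """Split article names into (allowed, forbidden) by graph size."""
    allowed, forbidden = [], []
    for name, tokens in articles.items():
        n_nodes = len(cooccurrence_graph(tokens))
        (allowed if n_nodes >= min_nodes else forbidden).append(name)
    return allowed, forbidden


articles = {
    "article_1": "graph based keyword extraction using word graphs".split(),
    "article_2": "too short".split(),
}
print(split_allowed(articles, min_nodes=5))  # -> (['article_1'], ['article_2'])
```

Under these assumptions, raising `min_nodes` can only shrink the allowed list. This is why the file must be regenerated whenever either the threshold or the dataset changes.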

In [None]:
min_nodes = 5
commons.parse.update_allowed_forbidden_files(gm, min_nodes)

### Sample the articles
There are 35403 allowed articles available for sampling.
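Drawing 5000 of these articles amounts to uniform sampling without replacement. The sketch below shows one way to do this reproducibly. `sample_articles` and its `seed` parameter are hypothetical. The notebook's `parse_and_sample` call passes no seed, so whether its sample is reproducible across runs is not shown here. The article names are placeholders.

```python
import random
from typing import Optional, Sequence


def sample_articles(allowed: Sequence[str], sample_size: int, seed: Optional[int] = None) -> list[str]:
    """Draw `sample_size` distinct articles uniformly at random (hypothetical helper)."""
    if sample_size > len(allowed):
        raise ValueError(f"sample_size={sample_size} exceeds the {len(allowed)} allowed articles")
    rng = random.Random(seed)  # private generator: the global random state is untouched
    return rng.sample(list(allowed), sample_size)


allowed = [f"article_{i}" for i in range(35403)]  # placeholder names, not the real file list
sample = sample_articles(allowed, 5000, seed=42)
print(len(sample), len(set(sample)))  # -> 5000 5000
```

Fixing the seed makes different runs compare the centralities on the same articles.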

In [None]:
sample_size = 5000
parsed_articles = commons.parse.parse_and_sample(sample_size, gm)

#### Set the run name
You will find the results in ```experiments/run_name/```.
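The snippet below sketches how such a per-run output folder can be created. `prepare_run_dir` is a hypothetical helper. The notebook does not show whether `commons.scores` creates the folder itself, or whether reusing a `run_name` overwrites earlier results.

```python
import tempfile
from pathlib import Path


def prepare_run_dir(run_name: str, root: str = "experiments") -> Path:
    """Create <root>/<run_name>/ if it does not exist and return its path."""
    run_dir = Path(root) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)  # no error if the run folder already exists
    return run_dir


# Demo inside a throwaway directory so nothing is written to the real project
with tempfile.TemporaryDirectory() as tmp:
    run_dir = prepare_run_dir("myRun", root=tmp)
    created = run_dir.is_dir()
    prepare_run_dir("myRun", root=tmp)  # calling it twice is harmless
print(created, run_dir.name)  # -> True myRun
```

Using a distinct `run_name` per experiment keeps results from different configurations apart.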

In [None]:
run_name = 'myRun'

## Compute the centralities!
The centralities are identified by these codes:
- ```PR```: PageRank
- ```CC```: Closeness Centrality
- ```BC```: Betweenness Centrality
- ```LCC```: Local Clustering Coefficient
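For reference, the four measures can be computed exactly on a small undirected, unweighted graph as sketched below. The specific choices are assumptions and may not match `commons.scores`:
- PageRank uses power iteration with damping 0.85.
- Closeness is (reachable nodes − 1) divided by the sum of distances, with no extra normalization.
- Betweenness uses Brandes' algorithm without normalization.
- LCC is the fraction of linked neighbour pairs.

`commons.scores` may use other normalizations or a library implementation.

```python
from collections import deque

Graph = dict[str, set[str]]  # undirected, unweighted adjacency sets


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def pagerank(g: Graph, d: float = 0.85, tol: float = 1e-12, max_iter: int = 200) -> dict[str, float]:
    """Power iteration; mass of nodes without neighbours is spread uniformly."""
    n = len(g)
    pr = {v: 1.0 / n for v in g}
    for _ in range(max_iter):
        dangling = sum(pr[v] for v in g if not g[v])
        new = {v: (1 - d) / n + d * (sum(pr[u] / len(g[u]) for u in g[v]) + dangling / n) for v in g}
        converged = sum(abs(new[v] - pr[v]) for v in g) < tol
        pr = new
        if converged:
            break
    return pr


def closeness(g: Graph) -> dict[str, float]:
    """(reachable nodes - 1) / sum of BFS distances; 0 for isolated nodes."""
    out = {}
    for s in g:
        dist, q = {s: 0}, deque([s])
        while q:
            v = q.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        total = sum(dist.values())
        out[s] = (len(dist) - 1) / total if total else 0.0
    return out


def betweenness(g: Graph) -> dict[str, float]:
    """Brandes' algorithm, unnormalized; halved because each pair is seen twice."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        stack, pred = [], {v: [] for v in g}
        sigma, dist = dict.fromkeys(g, 0), dict.fromkeys(g, -1)
        sigma[s], dist[s] = 1, 0
        q = deque([s])
        while q:  # BFS counting shortest paths from s
            v = q.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(g, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}


def local_clustering(g: Graph) -> dict[str, float]:
    """Fraction of pairs of neighbours that are themselves linked."""
    out = {}
    for v, nbrs in g.items():
        k = len(nbrs)
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in g[u])
        out[v] = 2 * links / (k * (k - 1)) if k > 1 else 0.0
    return out


# A triangle a-b-c with a tail c-d
g = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(betweenness(g))  # -> {'a': 0.0, 'b': 0.0, 'c': 2.0, 'd': 0.0}
print(local_clustering(g))
```

In this toy graph `c` ranks first under PageRank, closeness and betweenness. Its clustering coefficient, however, is the lowest among the triangle nodes. The measures can therefore rank the same node quite differently.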

The third argument is an integer flag that selects exact or approximated computation:
- ```0```: exact centrality
- ```1```: approximated centrality
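Pivot sampling (Brandes and Pich) is one common way to approximate betweenness. It runs the single-source pass from only `k` randomly chosen sources and scales the result by n/k, which gives an unbiased estimate of the exact value. The sketch below illustrates that idea on a 6-node path. The notebook does not show which approximation `commons.scores` uses for each centrality. `source_dependencies` and the `k`/`seed` parameters are illustrative.

```python
import random
from collections import deque
from typing import Optional

Graph = dict[str, set[str]]  # undirected, unweighted adjacency sets


def source_dependencies(g: Graph, s: str) -> dict[str, float]:
    """Brandes' single-source pass: dependency of source s on every node."""
    sigma, dist = dict.fromkeys(g, 0), dict.fromkeys(g, -1)
    pred = {v: [] for v in g}
    sigma[s], dist[s] = 1, 0
    order, q = [], deque([s])
    while q:
        v = q.popleft()
        order.append(v)
        for w in g[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                q.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                pred[w].append(v)
    delta = dict.fromkeys(g, 0.0)
    for w in reversed(order):
        for v in pred[w]:
            delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
    delta[s] = 0.0  # the source does not count towards its own score
    return delta


def betweenness(g: Graph, k: Optional[int] = None, seed: Optional[int] = None) -> dict[str, float]:
    """Exact if k is None (or k >= n); otherwise sum over k random pivots scaled by n / k."""
    nodes = list(g)
    pivots = nodes if k is None or k >= len(nodes) else random.Random(seed).sample(nodes, k)
    scale = len(nodes) / len(pivots)
    bc = dict.fromkeys(g, 0.0)
    for s in pivots:
        for v, dep in source_dependencies(g, s).items():
            bc[v] += dep
    return {v: b * scale / 2 for v, b in bc.items()}  # / 2: undirected graph


# Path p0 - p1 - ... - p5
path: Graph = {f"p{i}": set() for i in range(6)}
for i in range(5):
    path[f"p{i}"].add(f"p{i + 1}")
    path[f"p{i + 1}"].add(f"p{i}")

print(betweenness(path))  # -> {'p0': 0.0, 'p1': 4.0, 'p2': 6.0, 'p3': 6.0, 'p4': 4.0, 'p5': 0.0}
print(betweenness(path, k=3, seed=1))  # estimate from 3 of the 6 sources
```

The cost drops from n single-source passes to k. The price is variance in the scores, which shrinks as `k` grows.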

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'BC', 0, run_name)

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'BC', 1, run_name)

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'PR', 0, run_name)

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'PR', 1, run_name)

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'LCC', 0, run_name)

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'LCC', 1, run_name)

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'CC', 0, run_name)

In [None]:
commons.scores.centrality_print_scores(parsed_articles, 'CC', 1, run_name)

## Look at the results!
TODO - not sure whether this code is correct as it stands.

In [None]:
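from IPython.display import display  # available by default in Jupyter; imported explicitly for clarity

# Compare the centralities (significance of the R@20 differences), then show the average P@20
commons.scores.significant_differences(['PR','CC','BC','LCC'], 2, 'R@20', 'half_articles')
display(commons.scores.average_metric('P@20', 'half_articles'))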
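`P@20` and `R@20` presumably denote precision and recall at cutoff 20. Precision@k is the share of the top-k ranked items that are relevant. Recall@k is the share of the relevant items found in the top k. The sketch below treats each article as a ranked word list compared against a set of gold keywords. That framing, the data and the helper names are hypothetical. How `commons.scores` computes these metrics, and what `'half_articles'` selects, is not shown in the notebook.

```python
from statistics import mean, pvariance


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k items that are relevant (divides by k even if fewer items exist)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical per-article rankings and gold keywords
articles = [
    (["graph", "node", "edge", "score"], {"graph", "edge", "rank"}),
    (["word", "list", "stem", "token"], {"word", "list"}),
]
p = [precision_at_k(ranked, gold, 2) for ranked, gold in articles]
print(p, mean(p), pvariance(p))  # -> [0.5, 1.0] 0.75 0.0625
```

Averaging the per-article scores, as `mean` does here, weights every article equally regardless of how many keywords it has.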