All the functions for our project are contained in the file Modules.py.
To create the graph, we loaded the JSON file and built two dictionaries with the help of the following functions:
- pubblicationDictionary(json): This function returns a dictionary containing the name of each author and their ID. We used this information
to construct the nodes of the graph: the ID of the author is the ID of the node, and the name of the author is an attribute of the node.
- Jaccard(a,b): This function returns the Jaccard similarity of two lists.
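
As an illustration of what Jaccard(a,b) computes, here is a minimal sketch. The name jaccard_similarity and the handling of two empty lists are assumptions made for this example, not necessarily what Modules.py does.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two lists, treated as sets: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are given similarity 0.0
        return 0.0
    return len(set_a & set_b) / len(union)


# Two authors sharing 2 publications out of 4 distinct ones
print(jaccard_similarity(["p1", "p2", "p3"], ["p2", "p3", "p4"]))  # 0.5
```

Duplicates inside a list do not change the result, because both lists are converted to sets first.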
Then we created two more dictionaries:
- dew = {}: A dictionary that uses the result of pubblicationDictionary(json) to obtain the list of publications of each author. The entries of this dictionary
are the input of the Jaccard similarity function.
- dict_publ = {}: A dictionary that we used to create the edges. Its keys are the publications and, for each publication, the value holds the authors (as tuples (AuthorName, AuthorID)) who collaborated on that publication.
Finally, we passed all these dictionaries to functions of the networkx library to create the nodes and the weighted edges.
All the plots and functions for this part are explained in REPORT.pdf.
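
The construction can be sketched roughly as follows. Several things here are assumptions made for illustration: the helper names, that dew is keyed by author ID, and that the edge weight is 1 minus the Jaccard similarity of the two authors' publication lists. The actual weight used in Modules.py may differ.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity of two lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def nodes_and_edges(dict_publ, dew):
    """Build node and weighted-edge lists from the two dictionaries.

    dict_publ: publication -> list of (AuthorName, AuthorID) tuples
    dew:       AuthorID -> list of publications (assumed layout)
    """
    names = {}    # AuthorID -> AuthorName
    weights = {}  # (u, v) with u <= v -> edge weight
    for authors in dict_publ.values():
        for name, author_id in authors:
            names[author_id] = name
        # Every pair of co-authors of a publication gets one edge
        for (_, u), (_, v) in combinations(authors, 2):
            key = (u, v) if u <= v else (v, u)
            if key not in weights:
                # Assumed weight: Jaccard distance between publication lists
                weights[key] = 1 - jaccard_similarity(dew[u], dew[v])
    node_list = [(author_id, {"name": name}) for author_id, name in names.items()]
    edge_list = [(u, v, w) for (u, v), w in weights.items()]
    return node_list, edge_list


dict_publ = {"p1": [("Ann", 1), ("Bob", 2)], "p2": [("Bob", 2), ("Cy", 3)]}
dew = {1: ["p1"], 2: ["p1", "p2"], 3: ["p2"]}
node_list, edge_list = nodes_and_edges(dict_publ, dew)
print(edge_list)  # [(1, 2, 0.5), (2, 3, 0.5)]
```

The two lists have the shape accepted by networkx's G.add_nodes_from(node_list) and G.add_weighted_edges_from(edge_list), so the graph itself can be created in two calls.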
The second task was, given an author as input, to create the subgraph induced by the nodes whose hop distance from that author
is at most d.
To do this, we used two functions:
- hop_distance(G,start,end): This function returns the length of the shortest path from the start node to the end node given as input, computed with networkx's shortest path function.
- author_dist(author, d): Given an author name and a hop distance d, this function returns the subgraph of the nodes with hop distance at most d from the author node. We then plotted the subgraph.
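
One way to collect the nodes within d hops is a breadth-first search that stops expanding at depth d. The sketch below is not the code in Modules.py (which calls networkx's shortest path function); the name nodes_within_hops and the plain adjacency dictionary are assumptions for this example.

```python
from collections import deque


def nodes_within_hops(adj, source, d):
    """Return {node: hop distance} for every node at most d hops from source.

    adj maps each node to an iterable of its neighbours.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond d hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Path graph a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(nodes_within_hops(adj, "a", 2))  # {'a': 0, 'b': 1, 'c': 2}
```

With a networkx graph G, the induced subgraph can then be obtained with G.subgraph(list(dist)). networkx also provides nx.ego_graph(G, n, radius=d), which returns the same kind of neighbourhood subgraph directly.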
In this part we wrote our own version of Dijkstra's algorithm:
- Dijkstra(G, start,end): This function returns the minimum weighted distance from the start node to the end node given as input, using Dijkstra's algorithm. We implemented it with heapq to make it faster.
- distance_to_aris(authorid): This function returns the minimum weighted distance from Aris to the author ID given as input.
- Group_number(list_nodes): This function returns a dictionary whose keys are the IDs of all the authors and whose values are their minimum weighted distances from the set of nodes given as input.
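
The idea behind these functions can be sketched as follows. This is a minimal illustration, not the code in Modules.py: the function names, the adjacency format {node: {neighbour: weight}}, and the use of infinity for unreachable nodes are assumptions. The multi-source variant shows one possible way to compute distances from a set of nodes in a single run, by starting the heap with every source at distance 0.

```python
import heapq

INF = float("inf")


def dijkstra(adj, start, end):
    """Minimum weighted distance from start to end (INF if unreachable)."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == end:
            return d
        if d > dist.get(u, INF):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return INF


def multi_source_distances(adj, sources):
    """Distance from every node to its nearest node in sources."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    # Report every node, using INF for those that cannot be reached
    return {node: dist.get(node, INF) for node in adj}


adj = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2},
    "c": {"a": 4, "b": 2},
    "d": {},
}
print(dijkstra(adj, "a", "c"))  # 3.0
print(multi_source_distances(adj, ["a", "c"]))
```

Pushing a new heap entry instead of updating an existing one keeps the code short; outdated entries are skipped when they are popped. Node identifiers must be mutually comparable (for example all integers), since heapq compares them when two distances tie.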