ivan-ferrante/GraphDBLP

All the functions for our project are contained in the file Modules.py.

Load Data and create a graph

To create the graph, we loaded the JSON file and built two dictionaries with the help of the following functions:

  • pubblicationDictionary(json): This function returns a dictionary with the name of each author and their ID. We used this information to construct the nodes of the graph: the author's ID is the node ID, and the author's name is an attribute of the node.
  • Jaccard(a,b): This function returns the Jaccard similarity of two lists.
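
As an illustration of the similarity measure, here is a minimal sketch of a Jaccard function for two lists. It is not the code from Modules.py: the lowercase name jaccard and the choice to return 0.0 for two empty lists are assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two lists, treated as sets: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty lists have similarity 0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Two authors sharing 2 of 4 distinct publications.
print(jaccard([1, 2, 3], [2, 3, 4]))  # 0.5
```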

Then we created two more dictionaries:

  • dew={}: A dictionary that uses the result of pubblicationDictionary(json) to obtain the list of publications for each author. The elements of this dictionary are the inputs of the Jaccard similarity function.
  • dict_publ={}: A dictionary that we used to create the edges. Its keys are the publications and, for each publication, the value is the list of authors (as tuples (AuthorName, AuthorID)) who collaborated on that publication.

Finally, we passed these dictionaries to functions of the networkx library to create the nodes and the weighted edges.
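
The construction can be sketched roughly as follows. This is only an illustration on toy data: the author names, the IDs and the edge weight 1 - Jaccard are assumptions (this README does not state the weight formula), and dew is assumed to be keyed by author ID.

```python
from itertools import combinations

import networkx as nx

# Hypothetical toy data in the shapes described above.
dew = {1: [10, 11], 2: [10, 12], 3: [11]}  # author ID -> publication IDs
dict_publ = {
    10: [("Alice", 1), ("Bob", 2)],
    11: [("Alice", 1), ("Carol", 3)],
    12: [("Bob", 2)],
}


def jaccard(a, b):
    """Jaccard similarity of two lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


G = nx.Graph()
for authors in dict_publ.values():
    # One node per author: the ID is the node, the name is an attribute.
    for name, author_id in authors:
        G.add_node(author_id, name=name)
    # One edge per pair of co-authors of the same publication.
    for (_, u), (_, v) in combinations(authors, 2):
        # Assumed weight: 1 - Jaccard similarity of the two publication lists.
        G.add_edge(u, v, weight=1 - jaccard(dew[u], dew[v]))

print(G.number_of_nodes(), G.number_of_edges())  # 3 2
```

With this assumed weight, authors whose publication lists overlap more end up closer to each other in the graph.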

Statistics and subgraph

All the plots and functions for this part are explained in REPORT.pdf.
The second task was, given an author as input, to create the subgraph induced by the nodes whose hop distance from that author is at most d.
To do this, we used two functions:

  • hop_distance(G,start,end): This function returns the length of the shortest path from the start node to the end node given as input, computed with networkx's shortest path function.
  • author_dist(author, d): Given an author name and a hop distance d as input, this function returns the subgraph of the nodes with hop distance at most d from the author node. We then plotted the subgraph.
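
One possible way to build such a subgraph with networkx is sketched below. It is not the implementation from Modules.py: the helper name subgraph_within_hops is hypothetical, and it takes a node ID rather than an author name.

```python
import networkx as nx


def subgraph_within_hops(G, source, d):
    """Subgraph induced by the nodes at hop distance at most d from source."""
    # Unweighted shortest path lengths are hop counts; cutoff=d stops the
    # search once nodes are more than d hops away.
    reachable = nx.single_source_shortest_path_length(G, source, cutoff=d)
    return G.subgraph(list(reachable)).copy()


# Toy example: a path 0 - 1 - 2 - 3 - 4, centred on node 2.
G = nx.path_graph(5)
H = subgraph_within_hops(G, 2, 1)
print(sorted(H.nodes()))  # [1, 2, 3]
```

Running a single search from the source avoids calling a pairwise shortest path function once per node of the graph.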

Generalized version of Erdos Number

In this part we wrote our own version of Dijkstra's algorithm:

  • Dijkstra(G, start,end): This function returns the minimum weighted distance from the start node to the end node given as input, using Dijkstra's algorithm. We implemented it with heapq to make it faster.

  • distance_to_aris(authorid): This function returns the minimum weighted distance from Aris to the author ID given as input.

  • Group_number(list_nodes): This function returns a dictionary with the IDs of all the authors as keys and, as values, the minimum weighted distance from the set of nodes given as input.
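
The idea behind these functions can be illustrated with a heapq-based Dijkstra that accepts one or more source nodes. This is a sketch under stated assumptions, not the code from Modules.py: the graph is a plain dictionary of the form {node: {neighbour: weight}}, the names dijkstra and adj are hypothetical, and all weights are assumed to be non-negative.

```python
import heapq


def dijkstra(adj, sources):
    """Minimum weighted distance from the nearest source to each reachable node."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in dist]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a shorter path to u was already found
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy graph with symmetric weights.
adj = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"c": 1.0},
}

print(dijkstra(adj, ["a"])["d"])       # 4.0 (a -> b -> c -> d)
print(dijkstra(adj, ["a", "d"])["c"])  # 1.0 (the closest source is d)
```

Starting the heap with every node of the group at distance 0 gives the distance to the nearest group member in a single pass, instead of one run per node in the group.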

About

Graph representation of DBLP dataset in Python.
