Skip to content

Wikipedia Network Analysis Tool - this project creates a graph given a Wikipedia link and the article’s connections to other articles. The graph node can be clicked which then displays its connections.

SungHyunShin/Data-Structures-Final-Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

---- Data Structures Final Project ----

-- Wikipedia Network Analysis Tool --

========================================

Done with Python and Jupyter Notebook, this project creates a graph given a Wikipedia link and the article’s connections to other articles. The graph node can be clicked which then displays its connections.

===============================================================

Team Member:

Nick Marcopoli (nmarcopo) Andy Shin (ashin1) Austin Sura (asura) Tina Wu (ywu6)

===============================================================

Contribution:

Nick and Andy mainly worked on the visualization part of the project: creating the graph and adding user-interaction features Tina and Austin mainly worked on the python code that gets the weblinks from each wikipedia page

Everyone contributed equally to the completion of this project.

================================================================

Video Demonstration:

https://www.youtube.com/watch?v=-WMY3JJ0s_I

================================================================

Execution of the programs:

To run the backend: ./wikiCrawler.py or ./wikiCrawler.py -h To run the visualization: open the program in Jupyter Notebook (jupyter notebook --ip XXXXX --port XXXX --no-browser)

================================================================

Test Suite:

We made a test script (timememory.py) to benchmark the runtime and memory usage of varying #links and #layers. A sample of output is shown below:

Wikipedia Article URL: https://en.wikipedia.org/wiki/Fortnite

#LINKS #LAYERS Time Usage Memory Usage
1 1 1.008846 36.476562
2 1 1.318799 38.101562
3 1 1.509769 38.398438
4 1 1.751732 40.304688
5 1 2.036690 42.351562
1 2 1.308800 38.660156
2 2 2.553611 48.914062
3 2 3.889408 52.621094
4 2 5.636142 54.152344
5 2 7.903798 54.449219
1 3 1.542764 40.953125
2 3 4.384332 54.925781
3 3 11.133306 60.109375
4 3 24.119331 61.691406
5 3 45.060150 76.214844
1 4 1.743734 42.371094
2 4 9.182603 63.675781
3 4 39.776951 66.445312
4 4 112.903831 72.882812
5 4 259.600525 107.906250
------------- ------------- -------------- --------------

To run the test script: ./timememory.py

As mentioned in our video, it is not possible to test the output every time because wikipedia pages are constantly changing. Therefore, we manually tested the outputs by clicking into the web pages and seeing what links are the most relevant.

Since we used python and did not manually allocate memories, we did not have a memory issue (we tested by running valgrind, but making a sperate script for that did not seem necessary).

About

Wikipedia Network Analysis Tool - this project creates a graph given a Wikipedia link and the article’s connections to other articles. The graph node can be clicked which then displays its connections.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published