Web Information Retrieval Project
• Getting Started • Built With • Repository Content • Running Programs • Authors
Project for the 2020 Web Information Retrieval (WIR) course of the MSc in Engineering in Computer Science at Sapienza University of Rome. The goal of the project is to rank computer scientists by influence using Google's PageRank and the HITS algorithm, and to classify computer scientists into categories.
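To illustrate the two ranking algorithms named above, here is a minimal plain-Python sketch of PageRank (power iteration) and HITS on a toy link graph. It is not the project's own code: the function names, the damping factor of 0.85, the iteration counts, and the four placeholder nodes are all assumptions chosen for illustration. The project lists NetworkX, whose `pagerank` and `hits` functions cover the same ground on real graphs.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each node shares its rank equally among the nodes it links to.
        for u, targets in links.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        converged = sum(abs(new_rank[u] - rank[u]) for u in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


def hits(links, iterations=100):
    """Iterative HITS; returns (hub scores, authority scores), each summing to 1."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    hubs = {u: 1.0 for u in nodes}
    auths = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of nodes linking in.
        auths = {u: 0.0 for u in nodes}
        for u, targets in links.items():
            for v in targets:
                auths[v] += hubs[u]
        # Hub score: sum of the authority scores of nodes linked to.
        hubs = {u: sum(auths[v] for v in links.get(u, [])) for u in nodes}
        # Normalise so the scores stay bounded across iterations.
        a_total = sum(auths.values()) or 1.0
        h_total = sum(hubs.values()) or 1.0
        auths = {u: s / a_total for u, s in auths.items()}
        hubs = {u: s / h_total for u, s in hubs.items()}
    return hubs, auths


# Hypothetical toy graph: an edge A -> B means "A's page links to B's page".
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(links)
hubs, auths = hits(links)

for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: pagerank={ranks[name]:.4f} hub={hubs[name]:.4f} authority={auths[name]:.4f}")
```

In this toy graph, node C is linked to by three of the four nodes, so it comes out on top for both PageRank and authority score, while A, which links to the most nodes, is the strongest hub.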
- SPARQLWrapper
- bs4 Beautiful Soup
- urllib
- requests
- re
- NetworkX
- Matplotlib
- Sortedcontainers
- Collections
- Glob
- Tarfile
.
├── README.md
├── Report
│   ├── tex_files
│   ├── Computer_Scientist_Retrieval_Old.pptx
│   ├── Computer_Scientist_Retrieval.pptx
│   ├── WIR_Report.pdf
│   └── Computer_Scientist_Retrieval.pdf
│
├── files
│   ├── categorization.json
│   ├── dbpedia_data.json
│   ├── good_name_links.json
│   ├── hits_top_20_categories.txt
│   ├── my_graph.pdf
│   ├── my_pagerank_top20.txt
│   ├── name_links.json
│   └── pagerank_top_20_categories.txt
│
└── src
    ├── DBpedia
    ├── First_Phase
    ├── Second_Phase
    ├── Third_Phase
    ├── Fourth_Phase
    └── requirements.txt
lucasmac@author:~$ cd src
lucasmac@author:~/src$ ls
DBpedia First_Phase Fourth_Phase Second_Phase Third_Phase requirements.txt
lucasmac@author:~/src$ pip install -r requirements.txt
Once the requirements are installed, you can run any Python file found in one of the five folders.
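To see which scripts are available before running one, a small helper along these lines could list them. This is only a sketch: `find_scripts` and `PHASE_FOLDERS` are hypothetical names, not part of the repository, and the script is assumed to be run from inside `src`.

```python
from pathlib import Path

# Folder names taken from the repository tree above.
PHASE_FOLDERS = ["DBpedia", "First_Phase", "Second_Phase", "Third_Phase", "Fourth_Phase"]


def find_scripts(src_dir="."):
    """Return the .py files found in each phase folder under src_dir."""
    src = Path(src_dir)
    scripts = []
    for folder in PHASE_FOLDERS:
        folder_path = src / folder
        # Skip folders that are missing instead of raising an error.
        if folder_path.is_dir():
            scripts.extend(sorted(folder_path.rglob("*.py")))
    return scripts


if __name__ == "__main__":
    # Print one ready-to-copy command per script found.
    for script in find_scripts():
        print(f"python {script}")
```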
- Luca Tomei
- Andrea Aurizi
- Daniele Iacomini