ModernInformationRetrieval---Spring2020

The Modern Information Retrieval course @ Sharif University of Technology ---Spring2020

P1: A complete information retrieval system for Persian Wikipedia pages. First, the data was normalized, tokenized, and stemmed. Second, the data was indexed: both a positional index and a bigram index (for spell checking) were implemented. Third, a vector space approach was employed using tf-idf (both the ltn-lnn and ltc-lnc weighting schemes were used). Phrasal search was also implemented. Several evaluation metrics (F1, Precision, Recall, MAP, and NDCG) were implemented for testing.
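As an illustration of the ltc-lnc scheme mentioned in P1, here is a minimal sketch, not the project's actual code. It assumes the common SMART reading of the notation: `l` is logarithmic term frequency, `t` is idf, `c` is cosine normalization, with ltc applied to documents and lnc applied to the query. The function names (`log_tf`, `ltc_vectors`, `lnc_vector`, `rank`) and the toy English tokens are hypothetical.

```python
import math
from collections import Counter


def log_tf(tokens):
    """'l' component: 1 + log10(tf) for every term that occurs."""
    return {t: 1.0 + math.log10(c) for t, c in Counter(tokens).items()}


def cosine_normalize(vec):
    """'c' component: scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def ltc_vectors(docs):
    """Document side (ltc): log tf times idf, then cosine normalization."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log10(n / df_t) for t, df_t in df.items()}
    return [
        cosine_normalize({t: w * idf[t] for t, w in log_tf(d).items()})
        for d in docs
    ]


def lnc_vector(query_tokens):
    """Query side (lnc): log tf, no idf, cosine normalization."""
    return cosine_normalize(log_tf(query_tokens))


def rank(docs, query_tokens):
    """Return document indices sorted by descending dot-product score."""
    doc_vecs = ltc_vectors(docs)
    q = lnc_vector(query_tokens)
    scores = [sum(w * q.get(t, 0.0) for t, w in dv.items()) for dv in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy corpus (already tokenized); a real run would use the stemmed Persian tokens.
docs = [["cat", "sat"], ["dog", "sat"], ["cat", "cat", "mat"]]
ranking = rank(docs, ["cat"])
```

Because both sides are cosine-normalized, the dot product is the cosine similarity between the query and each document. Switching to ltn-lnn would amount to skipping `cosine_normalize` on both sides.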

P2: Several algorithms were implemented and tested on a subset of the AG News dataset. kNN and Naive Bayes were implemented from scratch, using the vector space created by tf-idf (ntn). kNN was implemented with both cosine similarity and Euclidean distance, and k values of 1, 3, and 5 were tested. Naive Bayes was used with smoothing, and the parameter search was done on a validation set. In the next part, nltk was used for stemming and stopword removal. Its effect was examined using several evaluation metrics: macro-averaged F1, Precision, Recall, and the confusion matrix. Using sklearn, Random Forests and SVM were also investigated. Hyperparameter search was done on a validation set. Then, t-SNE was used to visualize other vectorization techniques such as Word2Vec. Finally, k-Means was used to cluster the data, and its results with different parameters were inspected.
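The kNN variant described in P2 can be sketched roughly as follows. This is an illustrative pure-Python version under stated assumptions, not the project's implementation: the names `euclidean`, `cosine_distance`, and `knn_predict` are hypothetical, the vectors are tiny dense lists rather than tf-idf vectors, and ties in the vote are broken by whichever label was seen first among the nearest neighbours.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def knn_predict(train_x, train_y, x, k=3, distance=euclidean):
    """Majority vote over the k training points closest to x."""
    neighbours = sorted(zip(train_x, train_y), key=lambda p: distance(p[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data standing in for tf-idf vectors and news categories.
train_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = ["a", "a", "b", "b"]

pred_euclid = knn_predict(train_x, train_y, [0.8, 0.2], k=3)
pred_cosine = knn_predict(train_x, train_y, [0.2, 0.8], k=3, distance=cosine_distance)
```

To compare settings as in the text, one would call `knn_predict` with each k in [1, 3, 5] and each distance function on a held-out validation set and keep the best-scoring combination.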

P3: In the first part, a Scrapy crawler crawls SemanticScholar to find papers, which are saved in JSON format. In the second part, the papers are indexed on an Elasticsearch server. In the third part, PageRank is calculated for the pages. Then, different search scenarios are implemented. In the last part, the HITS algorithm is implemented to rank the authors of the papers.
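For the PageRank step in P3, a minimal power-iteration sketch is given below. It is only an illustration under assumptions: the graph is a hypothetical dictionary mapping each paper id to the ids it links to, the damping factor 0.85 and the fixed iteration count are conventional defaults rather than values taken from the project, and dangling nodes spread their rank uniformly over all nodes.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration over a dict {node: [nodes it links to]}."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling node: distribute its rank evenly over all nodes.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical link graph between four papers.
graph = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
ranks = pagerank(graph)
best = max(ranks, key=ranks.get)
```

In this toy graph, `A` receives links from both `C` and `D`, so it ends up with the highest score, while `D`, which nothing links to, keeps only the teleportation share.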

Latest commit: Soroosh-Bsl, "Update README.md", b8dffd7, Nov 13, 2020 (5 commits)

Name	Last commit message	Last commit date
P1	P1 added	Nov 13, 2020
P2	P2 added	Nov 13, 2020
P3	P3 added	Nov 13, 2020
README.md	Update README.md	Nov 13, 2020
