- Michael Pereira (500896409)
- Hitarth Chudgar (500888845)
This project requires Node.js.
# install dependencies
$ yarn install
# serve with hot reload at localhost:8080
$ yarn dev
# build for production and launch server
$ yarn build
$ yarn start
Alternatively, you can use npm instead of yarn.
Both files can be found under the generated
directory; the program writes them there when a keyword is searched.
All files required for the assignment are found under the static
directory.
- Nuxt.js - for building user interfaces and connecting JavaScript/TypeScript code
- Bulma - for UI components and styling
- express - for running a server locally to access local files
- stopword - for removing stopwords from strings
- natural - for stemming words in strings
- Buefy - for Vue.js UI components based on Bulma
- axios - for a promise-based HTTP client to handle requests
The posting lists are sorted in ascending order of document ID.
Within the postings file, each entry has the following form:
term [documentId, TF [positions]]
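As an illustration of the entry format above, here is a minimal sketch of how such posting lists could be built. It is not the project's actual code: the function name `buildPostings`, the input shape `{ id, tokens }`, and the use of 0-based positions are all assumptions made for the example.

```javascript
// Hypothetical helper (not from the project): builds a positional index
// from documents that are already tokenized.
// docs: array of { id: number, tokens: string[] }
function buildPostings(docs) {
  const index = new Map(); // term -> array of [documentId, TF, positions]
  // Visit documents in ascending ID order so every posting list stays sorted.
  const sorted = [...docs].sort((a, b) => a.id - b.id);
  for (const doc of sorted) {
    const positions = new Map(); // term -> positions inside this document
    doc.tokens.forEach((term, pos) => {
      if (!positions.has(term)) positions.set(term, []);
      positions.get(term).push(pos); // 0-based positions (an assumption)
    });
    for (const [term, posList] of positions) {
      if (!index.has(term)) index.set(term, []);
      // TF is the number of positions recorded for the term in this document.
      index.get(term).push([doc.id, posList.length, posList]);
    }
  }
  return index;
}

const postings = buildPostings([
  { id: 2, tokens: ["search", "engine", "search"] },
  { id: 1, tokens: ["engine", "index"] },
]);
console.log(JSON.stringify(postings.get("search"))); // [[2,2,[0,2]]]
console.log(JSON.stringify(postings.get("engine"))); // [[1,1,[0]],[2,1,[1]]]
```

Sorting the documents before indexing keeps each list in ascending document ID order without a separate sort per term.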
To choose our IDF threshold values, we first find a set A of contender documents, where K < |A| << N.
We used the index-elimination method, which only considers documents that contain terms whose IDF exceeds a threshold and that contain many (or all) of the query terms.
Our threshold values were:
idfValues[i] > 1.60 && idfValues[i] < 3.51, i.e. the open interval (1.60, 3.51), based on the lower and upper limits of document matching.
Hence, the K values lie between 1.60 and 3.51.
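The index-elimination idea described above could be sketched roughly as follows. The bounds 1.60 and 3.51 come from this README; everything else (the helper names `selectQueryTerms` and `contenders`, the `minMatches` parameter, and the sample data) is hypothetical and only meant to show the shape of the technique.

```javascript
const IDF_LOWER = 1.60; // lower bound stated in this README
const IDF_UPPER = 3.51; // upper bound stated in this README

// Hypothetical helper: keep only query terms whose IDF lies strictly
// between the two bounds.
function selectQueryTerms(queryTerms, idfByTerm) {
  return queryTerms.filter((term) => {
    const idf = idfByTerm.get(term);
    return idf !== undefined && idf > IDF_LOWER && idf < IDF_UPPER;
  });
}

// Hypothetical helper: build the contender set A, i.e. documents that
// contain at least minMatches of the kept query terms.
function contenders(keptTerms, postings, minMatches) {
  const counts = new Map(); // documentId -> number of kept terms it contains
  for (const term of keptTerms) {
    for (const [docId] of postings.get(term) || []) {
      counts.set(docId, (counts.get(docId) || 0) + 1);
    }
  }
  return [...counts]
    .filter(([, n]) => n >= minMatches)
    .map(([docId]) => docId)
    .sort((a, b) => a - b);
}

// Made-up sample data, for illustration only.
const idfByTerm = new Map([["the", 0.1], ["engine", 2.0], ["index", 3.0], ["zebra", 4.2]]);
const samplePostings = new Map([
  ["engine", [[1, 1, [0]], [2, 1, [1]]]],
  ["index", [[1, 1, [1]]]],
]);

const kept = selectQueryTerms(["the", "engine", "index", "zebra"], idfByTerm);
console.log(kept);                                // [ 'engine', 'index' ]
console.log(contenders(kept, samplePostings, 2)); // [ 1 ]
console.log(contenders(kept, samplePostings, 1)); // [ 1, 2 ]
```

Only the contender documents would then need a full score, which is what keeps |A| much smaller than N.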
We used the conventional tf-idf weighting scheme:
Step 1: Compute the term frequency (tf)
$f_{ij}$ measures the frequency of term $k_i$ in document $j$.
Step 2: Compute the inverse document frequency (idf)
$idf_i = \log(N / df_i)$, where $N$ is the number of documents in the collection and $df_i$ is the number of documents in which term $k_i$ occurs.
Step 3: Compute the weight
Combine the idf factor with tf:
$w_{ij} = tf_{ij} \cdot idf_i$
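The three steps above can be worked through in a few lines. This is a sketch under stated assumptions, not the project's implementation: it uses the raw term count as tf and the natural logarithm for idf (the README does not say which base or tf normalization is used), and the function names are hypothetical.

```javascript
// idf_i = log(N / df_i). The natural logarithm is an assumption here.
function idf(totalDocs, docFreq) {
  return Math.log(totalDocs / docFreq);
}

// w_ij = tf_ij * idf_i, with the raw term count used as tf (an assumption).
function tfIdfWeight(tf, totalDocs, docFreq) {
  return tf * idf(totalDocs, docFreq);
}

// A term occurring 3 times in a document and found in 10 of 100 documents.
const weight = tfIdfWeight(3, 100, 10);
console.log(weight.toFixed(3)); // 6.908, i.e. 3 * ln(10)

// A term present in every document has idf 0, so it contributes no weight.
console.log(tfIdfWeight(5, 100, 100)); // 0
```

With a different logarithm base the weights scale by a constant factor, so the ranking of documents is unchanged, but any fixed IDF thresholds would need to be adjusted to match.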