📌 Group Members

Michael Pereira (500896409)
Hitarth Chudgar (500888845)

🔍 Vector Space Model

This project requires Node.js.

Build Setup

# install dependencies
$ yarn install 

# serve with hot reload at localhost:8080
$ yarn dev

# build for production and launch server
$ yarn build
$ yarn start

Alternatively, you can use npm instead of yarn.

Dictionary & Postings

Both files are found under the generated directory. The program generates them when a keyword is searched.

CACM Resources

All files required for the assignment are found under the static directory.

📚 Frameworks

Nuxt.js - for building user interfaces and connecting JavaScript/TypeScript code
Bulma - for UI components and styling

🔐 Back-End

Dependencies

express - for running a server locally to access local files
stopword - for removing stopwords from strings
natural - for stemming words in strings

🎨 Front-End

Dependencies

Buefy - for Vue.js UI components based on Bulma
axios - for a promise-based HTTP client to handle requests

📝 Program Details

Posting list order

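Each posting list is sorted in ascending order of document ID.

Within the postings file, each entry has the form:

term [documentId, TF [positions]]

Here TF is the term's frequency in that document, and positions lists where the term occurs in it.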
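The entry layout above can be sketched in TypeScript as follows. This is a minimal illustration, not the project's actual code: the names `Posting`, `TokenizedDoc`, and `buildPostings` are hypothetical, and the input is assumed to be already tokenized (stopword removal and stemming done beforehand).

```typescript
// Hypothetical shape of one posting: [documentId, TF [positions]].
interface Posting {
  documentId: number;
  tf: number;          // how many times the term occurs in this document
  positions: number[]; // 0-based token offsets of each occurrence
}

// Hypothetical input: a document ID plus its already-processed tokens.
interface TokenizedDoc {
  id: number;
  tokens: string[];
}

function buildPostings(docs: TokenizedDoc[]): Record<string, Posting[]> {
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  const index: Record<string, Posting[]> = Object.create(null);
  // Visit documents in ascending ID order so every list is sorted by construction.
  const sorted = docs.slice().sort((a, b) => a.id - b.id);
  for (const doc of sorted) {
    doc.tokens.forEach((term, position) => {
      if (index[term] === undefined) {
        index[term] = [];
      }
      const list = index[term];
      const last = list[list.length - 1];
      if (last !== undefined && last.documentId === doc.id) {
        // Same document as the previous posting: update it in place.
        last.tf += 1;
        last.positions.push(position);
      } else {
        list.push({ documentId: doc.id, tf: 1, positions: [position] });
      }
    });
  }
  return index;
}
```

Because documents are visited in ascending ID order, no separate sort of each posting list is needed afterwards.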

top-K method and value

To choose our IDF threshold value, we looked for a set A of contender documents, where K < |A| << N.

We used the index-elimination method, which only considers documents that contain terms whose idf exceeds a threshold and that contain many (or all) of the query terms.

Our threshold values were:

idfValues[i] > 1.60 && idfValues[i] < 3.51, that is, the open interval (1.60, 3.51), based on the lower and upper limits of document matching.

Hence, terms are kept for top-K retrieval only when their idf lies between 1.60 and 3.51.
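The index-elimination step described above could look roughly like this in TypeScript. It is a sketch under stated assumptions: the function names, the `minMatches` parameter, and the use of a base-10 logarithm are illustrative choices, not taken from the project; only the two bounds come from this README.

```typescript
// Bounds from this README; everything else here is hypothetical.
const IDF_LOWER = 1.6;
const IDF_UPPER = 3.51;

// idf = log(N / df). Base 10 is an assumption; the base is not stated in this README.
function idf(totalDocs: number, docFrequency: number): number {
  return Math.log(totalDocs / docFrequency) / Math.LN10;
}

// Keep only the query terms whose idf lies strictly inside (IDF_LOWER, IDF_UPPER).
function selectQueryTerms(
  queryTerms: string[],
  docFrequencies: Record<string, number>,
  totalDocs: number
): string[] {
  return queryTerms.filter((term) => {
    const df = docFrequencies[term];
    if (typeof df !== "number" || df <= 0) return false; // term not in the dictionary
    const value = idf(totalDocs, df);
    return value > IDF_LOWER && value < IDF_UPPER;
  });
}

// Collect documents that contain at least minMatches of the surviving terms.
// postings maps each term to the IDs of the documents containing it (unique per term).
function candidateDocuments(
  terms: string[],
  postings: Record<string, number[]>,
  minMatches: number
): number[] {
  const counts: Record<number, number> = {};
  for (const term of terms) {
    for (const id of postings[term] || []) {
      counts[id] = (counts[id] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map(Number)
    .filter((id) => counts[id] >= minMatches)
    .sort((a, b) => a - b);
}
```

Setting `minMatches` to the number of surviving terms requires a document to contain all of them, while a smaller value admits documents that contain only many of them. Either way the result is the contender set A that is then ranked.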

tf-idf weighting scheme

We used the conventional tf-idf weighting scheme:

Step 1: Compute the term frequency (tf)

$f_{ij}$ measures the frequency of term $k_i$ in document $d_j$ and is the basis for $\mathrm{tf}_{ij}$.

Step 2: Compute the inverse document frequency (idf)

$\mathrm{idf}_i = \log(N / \mathrm{df}_i)$, where $N$ is the number of documents in the collection and $\mathrm{df}_i$ is the number of documents in which term $k_i$ occurs.

Step 3: Calculate the weight

Combining the idf factor with tf:

$w_{ij} = \mathrm{tf}_{ij} \cdot \mathrm{idf}_i$
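The three steps can be sketched in TypeScript as below. This is an illustration rather than the project's implementation: the function names are hypothetical, the raw count is assumed to serve as tf, and a base-10 logarithm is assumed since the base is not stated here.

```typescript
// w_ij = tf_ij * idf_i, with idf_i = log(N / df_i).
// Assumptions: tf is the raw occurrence count and the logarithm is base 10.
function tfIdfWeight(tf: number, df: number, totalDocs: number): number {
  if (tf <= 0 || df <= 0) return 0; // term absent from the document or the collection
  const idf = Math.log(totalDocs / df) / Math.LN10;
  return tf * idf;
}

// Weight every term of one document.
// termFrequencies: term -> count in this document (Step 1)
// docFrequencies:  term -> number of documents containing the term (input to Step 2)
function weightDocument(
  termFrequencies: Record<string, number>,
  docFrequencies: Record<string, number>,
  totalDocs: number
): Record<string, number> {
  const weights: Record<string, number> = {};
  for (const term of Object.keys(termFrequencies)) {
    weights[term] = tfIdfWeight(termFrequencies[term], docFrequencies[term] || 0, totalDocs);
  }
  return weights;
}
```

A term that occurs in every document gets idf 0 and therefore weight 0 regardless of its tf, which is why very common terms contribute nothing to the ranking.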
