added documentation #21

Open: wants to merge 3 commits into base: master

Aashu-Adhikari
Added inline comments and docstrings to explain what the code does.

@bhattbhuwan13 bhattbhuwan13 left a comment

Please make the suggested changes.

self.doc_freqs = [] # list of dictionaries of term_frequency of each document
self.idf = {} # idf score of each word in whole corpus
self.doc_len = [] # list of length of each document in corpus
self.tokenizer = tokenizer # user input tokenizer, defaults to none

if tokenizer:
    corpus = self._tokenize_corpus(corpus)

nd = self._initialize(corpus)

What is nd here? Please explain it.

Comment on lines +38 to +40
Example:
corpus = [['ram', 'is', 'a', 'good', 'boy'], ['ram', 'does', 'cycling', 'and', 'racing'], ['ram', 'is', 'healthy'], ['rita', 'likes', 'shyam'], ['good', 'luck']]
nd = {'ram': 3, 'is': 2, 'a': 1, 'good': 2, 'boy': 1, 'does': 1, 'cycling': 1, 'and': 1, 'racing': 1, 'healthy': 1, 'rita': 1, 'likes': 1, 'shyam': 1, 'luck': 1}

Shorten the examples so that I don't need to scroll. The functionality can be explained using only 2 items in the list.
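
A shorter example along these lines might be enough. This is only a sketch: it assumes nd maps each word to the number of documents that contain it, which is what the longer example in the diff suggests. The helper name document_frequencies is hypothetical and not part of the code under review.

```python
from collections import Counter


def document_frequencies(corpus):
    """Count, for each word, how many documents contain it (hypothetical helper)."""
    nd = Counter()
    for document in corpus:
        # set() ensures a word is counted at most once per document
        nd.update(set(document))
    return dict(nd)


# Two short documents are enough to show the behaviour.
corpus = [["ram", "is", "good"], ["ram", "likes", "cycling"]]
nd = document_frequencies(corpus)
# 'ram' appears in both documents; every other word appears in one.
```

With two documents, the whole example fits on a couple of lines in a docstring without scrolling.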

for document in corpus:
    self.doc_len.append(len(document))
    num_doc += len(document)
    num_words += len(document) # total number of words in whole corpus

The function of the variable num_words has already been explained.

frequencies = {}
term_frequencies = (
{}
) # term frequency of each word in a document........ changed frequencies to term_frequencies

You don't need to comment that you changed the name of the variable. Git keeps track of it.

Comment on lines +53 to +54
if word not in term_frequencies:
    term_frequencies[word] = 0

This block of code can be removed by using defaultdict instead of the normal dictionary.
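
Roughly, the change could look like the sketch below. The function name count_terms is hypothetical and used only for illustration; the point is that defaultdict(int) supplies 0 for a missing key, so the explicit membership check goes away.

```python
from collections import defaultdict


def count_terms(document):
    """Return the term frequency of each word in one tokenized document."""
    # defaultdict(int) returns 0 for unseen words, so no "if word not in ..." check
    term_frequencies = defaultdict(int)
    for word in document:
        term_frequencies[word] += 1
    return term_frequencies


tf = count_terms(["ram", "is", "ram"])
```

One thing to keep in mind: reading a missing key from a defaultdict inserts it, so convert with dict(tf) before storing it if later lookups should not add entries.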

2 participants