ICP_7
b. Set the TF-IDF vectorizer parameter to use bigrams and see how the accuracy changes: TfidfVectorizer(ngram_range=(1,2))
First import all the required packages, then load the twentytrain data for vectorizing:
Then set up the vectorizer with the given n-gram range, and declare another one with the given argument:
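As a rough illustration of what the vectorizer settings change, the sketch below fits three vectorizers on a tiny stand-in corpus. The corpus, the variable names, and the use of stop_words="english" for the second argument are assumptions made here for illustration; the exercise itself uses the 20 Newsgroups training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny stand-in corpus (assumption); the exercise uses the 20 Newsgroups training set.
docs = [
    "the graphics card renders images",
    "the hockey team won the game",
    "new graphics drivers improve images",
]

unigram_vectorizer = TfidfVectorizer()                         # default ngram_range=(1, 1)
bigram_vectorizer = TfidfVectorizer(ngram_range=(1, 2))        # unigrams and bigrams
stopword_vectorizer = TfidfVectorizer(stop_words="english")    # drops common English words

X_unigram = unigram_vectorizer.fit_transform(docs)
X_bigram = bigram_vectorizer.fit_transform(docs)
X_stopword = stopword_vectorizer.fit_transform(docs)

# Adding bigrams enlarges the vocabulary; removing stop words shrinks it.
print("unigram features:", X_unigram.shape[1])
print("bigram features:", X_bigram.shape[1])
print("stop-word features:", X_stopword.shape[1])
```

Each bigram such as "graphics card" becomes its own feature column, which is why the bigram matrix is wider than the unigram one.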
Fit the model on the training data to find which setting gives better accuracy:
Then find the MultinomialNB accuracy, the MultinomialNB accuracy with bigrams, and the MultinomialNB accuracy when adding stop words.
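The three comparisons can be sketched as one helper that swaps the vectorizer and keeps the classifier fixed. The small corpus, the labels, and the helper name nb_accuracy are hypothetical stand-ins; with the real 20 Newsgroups split the accuracies will differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data; the exercise uses the 20 Newsgroups train/test split.
train_docs = [
    "graphics card renders the image",
    "pixel shader improves image quality",
    "the hockey team scored a goal",
    "the goalie stopped the puck",
]
train_labels = ["graphics", "graphics", "hockey", "hockey"]
test_docs = ["render the image with a pixel shader", "the team scored with the puck"]
test_labels = ["graphics", "hockey"]


def nb_accuracy(vectorizer):
    """Train MultinomialNB on TF-IDF features and return test accuracy."""
    model = make_pipeline(vectorizer, MultinomialNB())
    model.fit(train_docs, train_labels)
    return accuracy_score(test_labels, model.predict(test_docs))


results = {
    "unigram": nb_accuracy(TfidfVectorizer()),
    "bigram": nb_accuracy(TfidfVectorizer(ngram_range=(1, 2))),
    "stop words": nb_accuracy(TfidfVectorizer(stop_words="english")),
}
for name, acc in results.items():
    print(f"MultinomialNB accuracy ({name}): {acc:.3f}")
```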
After doing these, we train an SVM and see how the accuracy changes:
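A minimal sketch of the SVM step follows. Which SVM class and kernel the exercise used is not stated, so SVC with a linear kernel is an assumption here, as are the stand-in corpus and variable names.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical stand-in data; the exercise uses the 20 Newsgroups train/test split.
train_docs = [
    "graphics card renders the image",
    "pixel shader improves image quality",
    "the hockey team scored a goal",
    "the goalie stopped the puck",
]
train_labels = ["graphics", "graphics", "hockey", "hockey"]
test_docs = ["render the image with a pixel shader", "the team scored with the puck"]
test_labels = ["graphics", "hockey"]

# Same TF-IDF features as before, with the classifier swapped for an SVM.
svm_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), SVC(kernel="linear"))
svm_model.fit(train_docs, train_labels)

predictions = svm_model.predict(test_docs)
svm_accuracy = accuracy_score(test_labels, predictions)
print(f"SVM accuracy (bigram): {svm_accuracy:.3f}")
```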
Import all the required libraries, then extract the content of the web URL in a function:
Then create a file and append all the data to it.
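The extraction and file-append steps might look roughly like the sketch below. The libraries and URL used in the exercise are not shown, so this version uses only the Python standard library; the class TextExtractor and the functions fetch_html, html_to_text, and append_to_file are hypothetical names. The demo runs on an inline HTML string and writes to a temporary directory so that it works offline.

```python
import os
import tempfile
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch_html(url):
    """Download a page and return its HTML as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def append_to_file(path, text):
    """Append text to the file, creating it if it does not exist."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")


# Offline demo; for a real page use html_to_text(fetch_html(url)).
sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
)
out_path = os.path.join(tempfile.mkdtemp(), "input.txt")
append_to_file(out_path, html_to_text(sample_html))
with open(out_path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```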
Apply the following to the “input.txt” file: • Tokenization • POS • Stemming • Lemmatization • Trigram • Named Entity Recognition.
Import the Natural Language Toolkit (NLTK), then read the extracted file:
Implemented word tokenization and sentence tokenization:
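A small sketch of both kinds of tokenization is shown below. NLTK's word_tokenize and sent_tokenize need the punkt model to be downloaded first, so this version substitutes wordpunct_tokenize and a simple regular-expression sentence split, which need no extra data. The sample text is an assumption; the exercise reads input.txt.

```python
import re

from nltk.tokenize import wordpunct_tokenize

# Sample text (assumption); the exercise reads the contents of input.txt.
text = "NLTK is a toolkit. It splits text into tokens!"

# Word tokenization: splits into runs of word characters and runs of punctuation.
words = wordpunct_tokenize(text)

# Sentence tokenization: naive split after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

print(words)
print(sentences)
```

The regular-expression split is cruder than a trained sentence tokenizer: it will break on abbreviations such as "Dr. Smith".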
Implemented stemming (reduces a word to a base form):
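Stemming can be sketched with NLTK's PorterStemmer, which needs no downloaded data. Which stemmer the exercise used is not stated, so the choice of Porter and the sample words are assumptions.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Sample words (assumption); the exercise stems the tokens from input.txt.
sample_words = ["running", "studies", "cats"]
stems = [stemmer.stem(word) for word in sample_words]

for word, stem in zip(sample_words, stems):
    print(f"{word} -> {stem}")
```

A stem is not always a dictionary word ("studies" becomes "studi"), which is the main difference from lemmatization.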
Implemented POS tagging and lemmatization (lemmatization converts a word to a meaningful base form):
Implemented trigrams (sequences of three consecutive words):
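Trigram generation can be sketched with nltk.util.ngrams, which slides a window of three over the token list. The sample sentence and the whitespace split are assumptions; the exercise would pass in the tokens from input.txt.

```python
from nltk.util import ngrams

# Sample tokens (assumption); the exercise uses the tokens from input.txt.
tokens = "the quick brown fox jumps".split()

# n=3 yields every run of three consecutive tokens as a tuple.
trigrams = list(ngrams(tokens, 3))

for trigram in trigrams:
    print(trigram)
```

A list of n tokens produces n - 2 trigrams, so five tokens give three.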
Implemented named entity recognition (classifies entities in the text into categories):
Here is the output:
Learned from this ICP: the Natural Language Toolkit (NLTK).