tf-idf-

This assignment is to write Python scripts for Apache Spark. The tasks are divided into three parts:

WordCount - Count the occurrences of each word on a per-book basis and compare the results with those of Assignment 1.
pyspark.ml.feature - Compute the tf-idf values for unigrams and bigrams using the pyspark.ml.feature package of Spark's MLlib library. Measure the execution time using 5, 10, and 15 reducers.
Word2Vec - Find the feature vectors of words using the Word2Vec class of the MLlib library.
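
To make the first two parts concrete, here is a minimal pure-Python sketch of what they compute. It is not the code in this repository and it does not use Spark: the toy `books` dictionary and the helper names `ngrams`, `term_counts`, and `tf_idf` are assumptions made for illustration. The IDF uses the smoothed form log((N + 1) / (df + 1)) documented for `pyspark.ml.feature.IDF`, with raw counts as the term frequency. In Spark the same steps would typically go through `Tokenizer`, `NGram`, `HashingTF` or `CountVectorizer`, and `IDF`.

```python
import math
from collections import Counter

# Hypothetical toy corpus: book id -> text (not the assignment data).
books = {
    "book_a": "the cat sat on the mat",
    "book_b": "the dog sat on the log",
    "book_c": "a cat and a dog",
}


def ngrams(tokens, n):
    """Return the space-joined n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_counts(books, n=1):
    """Per-book term counts: word count when n=1, bigram count when n=2."""
    return {
        book: Counter(ngrams(text.lower().split(), n))
        for book, text in books.items()
    }


def tf_idf(counts):
    """TF-IDF per book, using raw counts as TF and the smoothed IDF
    log((N + 1) / (df + 1)), where N is the number of books and df is
    the number of books containing the term."""
    n_docs = len(counts)
    doc_freq = Counter()
    for book_counts in counts.values():
        doc_freq.update(book_counts.keys())
    return {
        book: {
            term: tf * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, tf in book_counts.items()
        }
        for book, book_counts in counts.items()
    }


unigram_counts = term_counts(books, n=1)
bigram_counts = term_counts(books, n=2)
unigram_tfidf = tf_idf(unigram_counts)
bigram_tfidf = tf_idf(bigram_counts)
```

Terms that occur in fewer books get a larger IDF weight, so in this toy corpus "mat" (one book) scores higher than "cat" (two books) even though both appear once in `book_a`.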

Files:
Assignment2.pdf
README.md
asst2.py
asst2_part2_1.py
asst2_part2_2.py
asst2_part3.py
