The current assignment is to write Python scripts for Apache Spark. The tasks are divided into three parts, as listed below:
- WordCount: Count the occurrences of each word on a per-book basis and compare the results with those of Assignment 1.
- pyspark.ml.feature: Compute the tf-idf values for unigrams and bigrams using the pyspark.ml.feature package of Spark's MLlib library. Measure the execution time using 5, 10, and 15 reducers.
- Word2Vec: Find the feature vectors of words using the Word2Vec class of the MLlib library.
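
To make the first two tasks concrete, here is a minimal plain-Python sketch of a per-book word count and of tf-idf over unigrams and bigrams. It is not the Spark implementation the assignment asks for. The helper names (`tokenize`, `ngrams`, `word_count_per_book`, `tf_idf_per_book`) and the two-book corpus are hypothetical, chosen only for illustration. The idf uses the smoothed form ln((N + 1) / (df + 1)), which is assumed here to match the formula documented for Spark's `IDF` estimator. The actual scripts would use the pyspark.ml.feature classes instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_count_per_book(books):
    """Map each book title to a Counter of its word occurrences."""
    return {title: Counter(tokenize(text)) for title, text in books.items()}


def tf_idf_per_book(books, n=1):
    """Compute tf-idf scores for n-grams, one score table per book.

    tf is the raw count of a term in a book.
    idf = ln((N + 1) / (df + 1)), where N is the number of books and
    df is the number of books containing the term (assumed smoothing).
    """
    term_counts = {
        title: Counter(ngrams(tokenize(text), n))
        for title, text in books.items()
    }
    num_books = len(books)

    # Document frequency: in how many books does each term appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    return {
        title: {
            term: tf * math.log((num_books + 1) / (doc_freq[term] + 1))
            for term, tf in counts.items()
        }
        for title, counts in term_counts.items()
    }


# Hypothetical two-book corpus, for illustration only.
books = {
    "book_a": "the cat sat on the mat",
    "book_b": "the dog sat",
}
counts = word_count_per_book(books)
unigram_scores = tf_idf_per_book(books, n=1)
bigram_scores = tf_idf_per_book(books, n=2)
```

A term that appears in every book (such as "the" in this toy corpus) gets an idf of zero, so its tf-idf score is zero regardless of how often it occurs. Terms unique to one book keep a positive score.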