Exercises 7: Text Processing
Use the full movies dataset covered in the lecture.
Some useful exercises are provided below:
- Write a function that finds all the movie summaries that contain a given word.
  - You may want to do this before stemming, since stemming changes word forms and an exact-word search may then miss matches.
- Find the total number of words and the number of unique words in each movie summary. Which summary has the largest vocabulary?
- Pick a movie from the list (suggestions: "The Godfather" or "Lord of the Rings").
  - Find the top words (excluding stopwords!) in your chosen movie's summary.
  - Find the movie most similar to your chosen movie (using cosine similarity) among the full set of 90 movies.
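As a starting point for the first two exercises, here is a minimal sketch in plain Python. It assumes the lecture's data is available as two parallel lists, `titles` and `synopses` (hypothetical names), and uses a three-movie placeholder dataset so that it runs on its own. The helper names `tokenize`, `find_summaries_with_word`, and `vocabulary_stats` are likewise illustrative, not part of the lecture code.

```python
import re

# Hypothetical placeholder data: replace with the lecture's movies dataset.
# `titles` and `synopses` are assumed to be parallel lists.
titles = ["Movie A", "Movie B", "Movie C"]
synopses = [
    "The family gathers for a wedding. The family business is crime.",
    "A hobbit carries a magic ring across the land to destroy it.",
    "Two friends open a small business together.",
]


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_summaries_with_word(word, titles, synopses):
    """Return the titles whose summary contains `word` as a whole word (case-insensitive)."""
    target = word.lower()
    return [title for title, text in zip(titles, synopses) if target in set(tokenize(text))]


def vocabulary_stats(titles, synopses):
    """Map each title to a (total word count, unique word count) pair."""
    stats = {}
    for title, text in zip(titles, synopses):
        tokens = tokenize(text)
        stats[title] = (len(tokens), len(set(tokens)))
    return stats


stats = vocabulary_stats(titles, synopses)
# "Largest vocabulary" is taken here to mean the most unique words.
largest = max(stats, key=lambda title: stats[title][1])
print(find_summaries_with_word("business", titles, synopses))
print(stats)
print(largest)
```

Matching against the token set rather than using a raw substring test (`word in text`) keeps a search for "ring" from also matching "bring" or "during". Keep in mind that the counts depend on the tokenizer: this sketch keeps only runs of letters, so digits and apostrophes are dropped.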
I encourage you to write (or copy and paste) your code in small steps and make sure you understand what each step accomplishes.
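For the chosen-movie exercises, one possible sketch counts the non-stopword terms in each summary and compares summaries by the cosine of the angle between their word-count vectors, cos(u, v) = u·v / (‖u‖ ‖v‖). It uses the same kind of hypothetical placeholder data and a deliberately tiny stopword set (an assumption). With the real dataset you would swap in a fuller list, for example NLTK's `stopwords.words('english')`, and, if the lecture built TF-IDF vectors, compare those instead of raw counts; the cosine formula stays the same.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder data: replace with the full 90-movie dataset.
titles = ["Movie A", "Movie B", "Movie C"]
synopses = [
    "The family gathers for a wedding. The family business is crime.",
    "A hobbit carries a magic ring across the land to destroy it.",
    "Two friends open a small business together.",
]

# Assumption: a tiny illustrative stopword set, not a real stopword list.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "for", "in", "it", "on", "with"}


def content_words(text):
    """Lowercase, split into runs of letters, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_words(text, n=5):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    return Counter(content_words(text)).most_common(n)


def cosine_similarity(c1, c2):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(count * c2[word] for word, count in c1.items())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # An empty summary has no direction; treat it as dissimilar.
    return dot / (norm1 * norm2)


def most_similar(chosen_title, titles, synopses):
    """Return the title (other than the chosen one) with the highest cosine similarity."""
    vectors = {t: Counter(content_words(s)) for t, s in zip(titles, synopses)}
    target = vectors[chosen_title]
    candidates = [t for t in titles if t != chosen_title]
    return max(candidates, key=lambda t: cosine_similarity(target, vectors[t]))


print(top_words(synopses[0], 3))
print(most_similar("Movie A", titles, synopses))
```

Removing stopwords before building the vectors matters for the similarity step too: otherwise words like "the" and "a" dominate the dot product and make almost every pair of summaries look alike.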