# Sprint Challenge
## *Data Science Sprint 13*

After a sprint of Natural Language Processing, you've learned some cool new stuff: how to process text, how to turn text into vectors, and how to model topics from documents. Apply your newly acquired skills to one of the most famous NLP datasets out there: [Yelp](https://www.yelp.com/dataset). As part of the job selection process, some of my friends have been asked to create an analysis of this dataset, so I want to give you a head start.

The real dataset is massive (almost 8 gigs uncompressed). The data is sampled for you to something more manageable for the Sprint Challenge. You can analyze the full dataset as a stretch goal or after the sprint challenge.

## Challenge Objectives
Successfully complete all these objectives to earn full credit.

**Successful completion is defined as passing all the unit tests in each objective.**  

There are 8 total possible points in this sprint challenge.


There are more details on each objective further down in the notebook.
* <a href="#p1">Part 1</a>: Write a function to tokenize the Yelp reviews
* <a href="#p2">Part 2</a>: Create a vector representation of those tokens
* <a href="#p3">Part 3</a>: Use your tokens in a classification model on Yelp rating
* <a href="#p4">Part 4</a>: Estimate & Interpret a topic model of the Yelp reviews

____

# Before you submit your notebook you must first

1) Restart your notebook's Kernel

2) Run all cells sequentially, from top to bottom, so that cell numbers are sequential numbers (i.e. 1,2,3,4,5...)
- Easiest way to do this is to click on the **Cell** tab at the top of your notebook and select **Run All** from the drop down menu.

3) **Comment out the cell that generates a pyLDAvis visual in objective 4 (see instructions in that section).**
____

### Part 0: Import Necessary Packages
For this section, you will need to import:
- `spacy`
- `Pandas`
- `Seaborn`
- `Matplotlib`
- `NearestNeighbors`
- `Pipeline`
- `TfidfVectorizer`
- `KNeighborsClassifier`
- `GridSearchCV`
- `corpora`
- `LdaModel`
- `gensim`
- `re`

> **Note: This assignment is optimized to work with these specific packages. You can import different packages, but note that this may affect how CodeGrade works and may cause CodeGrade to fail.**

In [None]:
!pip install pyLDAvis


In [None]:
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


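In [None]:
# No separate install is needed here: Pipeline ships with scikit-learn (sklearn.pipeline).

In [None]:
# YOUR CODE HERE
import re

import gensim
import gensim.corpora as corpora
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import spacy
from gensim.models import LdaModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.pipeline import Pipeline

# Comment out the two pyLDAvis imports before submitting to CodeGrade
import pyLDAvis
import pyLDAvis.gensim_models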


In [None]:
import spacy
from spacy.tokenizer import Tokenizer

nlp = spacy.load("en_core_web_lg")
tokenizer = Tokenizer(nlp.vocab)

sample_text = "Natural Language Processing is really fun!"
[token.text for token in tokenizer(sample_text)]

['Natural', 'Language', 'Processing', 'is', 'really', 'fun!']

In [None]:
# Visible Testing
assert pd.__package__ == 'pandas'



### Part 0: Import Data

In [None]:


# Load reviews from URL
data_url = 'https://raw.githubusercontent.com/bloominstituteoftechnology/data-science-practice-datasets/main/unit_4/unit1_nlp/review_sample.json'

# Import data into a DataFrame named df
# YOUR CODE HERE
df = pd.read_json(data_url, lines=True)


In [None]:
# Visible Testing
assert isinstance(df, pd.DataFrame), 'df is not a DataFrame. Did you import the data into df?'
assert df.shape[0] == 10000, 'DataFrame df has the wrong number of rows.'


## Part 1: Tokenize Function
<a id="#p1"></a>

Complete the function `tokenize`. Your function should
- Accept one document at a time
- Return a list of tokens

You are free to use any method you have learned this week.

**TO PASS CODEGRADE RUNTIME:**
- Do not run your tokenize function more than one time in your notebook! It is not needed until Part 4!
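The two requirements above (one document in, a list of tokens out) can be met without spaCy. The following is only a minimal sketch using a regular expression; the function name `simple_tokenize` and the tiny `STOP_WORDS` set are hypothetical and exist purely for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only)
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of"}


def simple_tokenize(text):
    """Lowercase one document and return its alphabetic tokens, minus stop words."""
    # Keep runs of letters only; digits and punctuation are dropped
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words and single-character tokens
    return [tok for tok in tokens if tok not in STOP_WORDS and len(tok) > 1]


tokens = simple_tokenize("The service was quick, and the tacos were GREAT!")
print(tokens)
```

A regex tokenizer like this is fast but does no lemmatization, so "tacos" and "taco" stay separate tokens; a spaCy-based function trades speed for that normalization.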

In [None]:
# Optional: Consider using spaCy in your function. The spaCy library can be imported by running this cell.
# A pre-trained model (en_core_web_sm) has been made available to you in the CodeGrade container.
# If you DON'T need to use a spaCy model, you can comment out the line below.
# Note: this notebook loads en_core_web_lg (downloaded in Part 0) rather than en_core_web_sm.

nlp = spacy.load('en_core_web_lg')


In [None]:
df.head()


| | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | nDuEqIyRc8YKS1q1fX0CZg | 1 | 2015-03-31 16:50:30 | 0 | eZs2tpEJtXPwawvHnHZIgQ | 1 | BEWARE!!! FAKE, FAKE, FAKE....We also own a sm... | 10 | n1LM36qNg4rqGXIcvVXv8w |
| 1 | eMYeEapscbKNqUDCx705hg | 0 | 2015-12-16 05:31:03 | 0 | DoQDWJsNbU0KL1O29l_Xug | 4 | Came here for lunch Togo. Service was quick. S... | 0 | 5CgjjDAic2-FAvCtiHpytA |
| 2 | 6Q7-wkCPc1KF75jZLOTcMw | 1 | 2010-06-20 19:14:48 | 1 | DDOdGU7zh56yQHmUnL1idQ | 3 | I've been to Vegas dozens of times and had nev... | 2 | BdV-cf3LScmb8kZ7iiBcMA |
| 3 | k3zrItO4l9hwfLRwHBDc9w | 3 | 2010-07-13 00:33:45 | 4 | LfTMUWnfGFMOfOIyJcwLVA | 1 | We went here on a night where they closed off ... | 5 | cZZnBqh4gAEy4CdNvJailQ |
| 4 | 6hpfRwGlOzbNv7k5eP9rsQ | 1 | 2018-06-30 02:30:01 | 0 | zJSUdI7bJ8PNJAg4lnl_Gg | 4 | 3.5 to 4 stars\n\nNot bad for the price, $12.9... | 5 | n9QO4ClYAS7h9fpQwa5bhA |


In [None]:
def tokenize(text):
    """Return lowercase lemmas for one document, skipping stop words, punctuation, and single characters."""
    # YOUR CODE HERE
    doc = nlp(text)
    lemma_list = [
        token.lemma_.lower().strip()
        for token in doc
        if not token.is_stop
        and not token.is_punct
        and len(token.lemma_.strip()) > 1  # also drops empty or whitespace-only lemmas
    ]
    return lemma_list


In [None]:
'''Testing'''
assert isinstance(tokenize(df.sample(n=1)["text"].iloc[0]), list), "Make sure your tokenizer function accepts a single document and returns a list of tokens!"


## Part 2: Vector Representation
<a id="#p2"></a>
1. Create a vector representation of the reviews (i.e. create a doc-term matrix).
    * Name that doc-term matrix `dtm`

In [None]:
%%time
# YOUR CODE HERE
# Fit a TF-IDF vectorizer on the review text and build the doc-term matrix in one step
vect = TfidfVectorizer()
dtm = vect.fit_transform(df.text)


CPU times: user 1.79 s, sys: 12.5 ms, total: 1.8 s
Wall time: 1.8 s


In [None]:
print(dtm)

  (0, 27035)	0.04767143002712919
  (0, 26907)	0.033545649953732444
  (0, 26693)	0.10254289847165508
  (0, 26566)	0.04010062022080262
  (0, 26375)	0.06661017206197539
  (0, 26083)	0.042011964793667395
  (0, 25201)	0.0820245996446463
  (0, 24705)	0.07409347413713287
  (0, 24628)	0.04576428842315356
  (0, 24488)	0.032203405345831826
  (0, 24413)	0.048527822877930525
  (0, 24400)	0.08539729788309706
  (0, 24395)	0.0348625422088561
  (0, 22330)	0.06959959650979039
  (0, 21472)	0.1327358542058429
  (0, 21186)	0.14161658443901748
  (0, 20912)	0.06732679954093036
  (0, 19855)	0.12775415999823592
  (0, 19676)	0.09345036536800914
  (0, 18043)	0.17553564726989873
  (0, 17939)	0.08301191489703681
  (0, 17310)	0.08377334708744409
  (0, 16949)	0.07005643563457087
  (0, 16681)	0.2998386399935032
  (0, 16607)	0.037636638303316844
  :	:
  (9999, 3363)	0.05740688637273261
  (9999, 2881)	0.06384960619262742
  (9999, 2752)	0.03662990045733817
  (9999, 2713)	0.03395276739851202
  (9999, 2699)	0.02704164302


In [None]:
print(vect.get_feature_names_out())

['00' '000' '001695' ... '食べ物はうまい' '餐後點了甜點' '３時間後の便']



2. Write a fake review. Assign the text of the review to an object called `fake_review`.
3. Query the fake review for the 10 most similar reviews, print the text of the reviews.
    - Given the size of the dataset, use `NearestNeighbors` model for this. Name the model `nn`.
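The query step can be sketched on a toy corpus as follows. This is an illustrative outline only: `reviews`, `toy_vect`, `toy_nn`, and the query string are hypothetical, and the toy uses 2 neighbors instead of 10 because it has only four documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical mini-corpus standing in for the review text
reviews = [
    "the pizza was amazing",
    "amazing pizza and friendly staff",
    "my car repair took forever",
    "the mechanic fixed my car quickly",
]

toy_vect = TfidfVectorizer()
toy_dtm = toy_vect.fit_transform(reviews)

# NearestNeighbors accepts the sparse matrix directly
toy_nn = NearestNeighbors(n_neighbors=2)
toy_nn.fit(toy_dtm)

# transform() expects an iterable of documents, so the query is wrapped in a list
query = toy_vect.transform(["best pizza in town"])
distances, indices = toy_nn.kneighbors(query)

# indices has one row per query; use it to look up and print the matching texts
for i in indices[0]:
    print(reviews[i])
```

With a DataFrame, the same lookup would typically go through positional indexing on the text column (for example `df["text"].iloc[i]`).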

In [None]:
# Keep dtm sparse: NearestNeighbors accepts sparse input, and densifying a
# 10000 x 27588 matrix into a DataFrame would need roughly 2 GB of memory.
print(dtm.shape)

(10000, 27588)


In [None]:
help(dtm)

In [None]:
# Create and fit a NearestNeighbors model named "nn"
# YOUR CODE HERE
nn = NearestNeighbors(n_neighbors = 10)

nn.fit(dtm)


In [None]:
print(nn.n_neighbors)

10



In [None]:
'''Testing.'''
assert nn.__module__ == 'sklearn.neighbors._unsupervised', ' nn is not a NearestNeighbors instance.'
assert nn.n_neighbors == 10, 'nn has the wrong value for n_neighbors'


In [None]:
# Create a fake review and find the 10 most similar reviews

# YOUR CODE HERE
fake_review = "this is the worst fake review in the history of fake reviews"


In [None]:
fake_review = ''.join(fake_review)


In [None]:
# Visible Testing
assert isinstance(fake_review, str), "Did you write a review in the correct data type?"

In [None]:
# The fitted vectorizer maps all 10,000 reviews to a sparse TF-IDF matrix
vect.transform(df.text)

<10000x27588 sparse matrix of type '<class 'numpy.float64'>'
	with 713990 stored elements in Compressed Sparse Row format>

In [None]:
# transform() expects an iterable of documents, so wrap the single review in a list
vect_text = vect.transform([fake_review])

In [None]:
query_doc = vect.transform([fake_review])  # wrap in a list so transform() sees one document

# Query using the kneighbors method.
# NOTE: fake_review is not part of the fitted corpus, so all 10 neighbors are real reviews
# and there is no need to request an extra neighbor to skip the query itself.
neigh_dist, neigh_index = nn.kneighbors(query_doc, n_neighbors=10)


In [None]:
print(neigh_index)

[[   0 6019 7322   14  441 7709 7099 7878 4607 2379]]



## Part 3: Classification
<a id="#p3"></a>
Your goal in this section will be to predict `stars` from the review dataset.

1. Create a pipeline object with a sklearn `CountVectorizer` or `TfidfVectorizer` and any sklearn classifier.
    - Use that pipeline to train a model to predict the `stars` feature (i.e. the labels).
    - Use that pipeline to predict a star rating for your fake review from Part 2.



2. Create a parameter dict including `one parameter for the vectorizer` and `one parameter for the model`.
    - Include 2 possible values for each parameter
        - **Keep the values for each parameter low. Extreme values will compromise runtime**
    - **Use `n_jobs` = 1**
    - Due to limited computational resources on CodeGrade, `DO NOT INCLUDE ADDITIONAL PARAMETERS OR VALUES PLEASE.`
    
    
3. Train the entire pipeline with a GridSearch
    - Name your GridSearch object as `gs`
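The three steps above can be sketched end to end on a toy dataset. Everything here is an assumption made for illustration: the eight made-up reviews, the `toy_` names, the choice of `KNeighborsClassifier`, and `cv=2` (chosen only because the toy set is tiny). The parameter keys follow scikit-learn's `<step name>__<parameter>` convention.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Hypothetical labeled reviews standing in for df.text and df.stars
texts = [
    "awful food and rude staff", "terrible service, never again",
    "worst meal ever", "rude and slow service",
    "amazing food and friendly staff", "great service, will return",
    "best meal ever", "friendly and fast service",
]
stars = [1, 1, 1, 1, 5, 5, 5, 5]

# The vectorizer is the first step, so the pipeline accepts raw text
toy_pipe = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", KNeighborsClassifier()),
])

# One vectorizer parameter and one model parameter, two values each
toy_params = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__n_neighbors": [1, 3],
}

toy_gs = GridSearchCV(toy_pipe, toy_params, cv=2, n_jobs=1)
toy_gs.fit(texts, stars)

print(toy_gs.best_params_)
# Predict a star rating for a new piece of raw text
print(toy_gs.predict(["friendly staff and great food"])[0])
```

Because the fitted grid search wraps the whole pipeline, `predict` takes a list of raw strings, which is the same call pattern the visible test uses on `gs`.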

In [None]:
# No install needed: Pipeline is imported from sklearn.pipeline.

In [None]:
target = 'stars'
y = df[target]

In [None]:
y.unique()

array([1, 4, 3, 5, 2])

In [None]:
from sklearn.pipeline import Pipeline

vect = TfidfVectorizer(stop_words='english', max_features=500, ngram_range=(1,2))
#clf = XGBClassifier(learning_rate=0.1, max_depth=5, random_state=42)

# Vectorizer-only pipeline; a classifier step is added in the grid-search cell further down
pipe = Pipeline([('vect', vect)])


In [None]:
y = y - 1  # shift labels from 1-5 to 0-4 (0-based classes, as XGBoost expects)

In [None]:
# Feature frame: every column except the `stars` label
X = df.drop(columns=['stars'])
X.head()

| | business_id | cool | date | funny | review_id | text | useful | user_id |
|---|---|---|---|---|---|---|---|---|
| 0 | nDuEqIyRc8YKS1q1fX0CZg | 1 | 2015-03-31 16:50:30 | 0 | eZs2tpEJtXPwawvHnHZIgQ | BEWARE!!! FAKE, FAKE, FAKE....We also own a sm... | 10 | n1LM36qNg4rqGXIcvVXv8w |
| 1 | eMYeEapscbKNqUDCx705hg | 0 | 2015-12-16 05:31:03 | 0 | DoQDWJsNbU0KL1O29l_Xug | Came here for lunch Togo. Service was quick. S... | 0 | 5CgjjDAic2-FAvCtiHpytA |
| 2 | 6Q7-wkCPc1KF75jZLOTcMw | 1 | 2010-06-20 19:14:48 | 1 | DDOdGU7zh56yQHmUnL1idQ | I've been to Vegas dozens of times and had nev... | 2 | BdV-cf3LScmb8kZ7iiBcMA |
| 3 | k3zrItO4l9hwfLRwHBDc9w | 3 | 2010-07-13 00:33:45 | 4 | LfTMUWnfGFMOfOIyJcwLVA | We went here on a night where they closed off ... | 5 | cZZnBqh4gAEy4CdNvJailQ |
| 4 | 6hpfRwGlOzbNv7k5eP9rsQ | 1 | 2018-06-30 02:30:01 | 0 | zJSUdI7bJ8PNJAg4lnl_Gg | 3.5 to 4 stars\n\nNot bad for the price, $12.9... | 5 | n9QO4ClYAS7h9fpQwa5bhA |


In [None]:
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV


In [None]:
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier


In [None]:
X = df.text
y = df.stars

tfidf= TfidfVectorizer(stop_words='english', tokenizer=None)

rfc = RandomForestClassifier(random_state=42)

pipe= Pipeline([('vect', tfidf),
                 ("clf", rfc)])

parameters = {
    'vect__max_df' : (0.75,1.0),
    'clf__max_depth' : (10,20)
}

gs = GridSearchCV(pipe,parameters, n_jobs=1)
gs.fit(X,y)

print(X.shape)
print(y.shape)

(10000,)
(10000,)


In [None]:
y[1]

4

In [None]:
y.shape


(10000,)

In [None]:
# Shift labels to 0-4. Subtracting from the Series (instead of building a list)
# keeps y a pandas object, so .head() still works.
y = y - 1

In [None]:
y.head()

In [None]:
df['stars'].unique()


array([1, 4, 3, 5, 2])

In [None]:
dft = pd.DataFrame(y)

In [None]:
# Visible Testing
prediction = gs.predict(["This is your prediction statement."])[0]
assert prediction in df.stars.values, 'Your gs object should be able to accept raw text within a list. Did you include a vectorizer in your pipeline?'

## Part 4: Topic Modeling

Let's find out what those Yelp reviews are saying! :D

1. Estimate an LDA topic model of the review text
    - Set num_topics to `5`
    - Name your LDA model `lda`
2. Create 1-2 visualizations of the results
    - You can use the most important 3 words of a topic in relevant visualizations.
3. In markdown, write 1-2 paragraphs of analysis on the results of your topic model

When you instantiate your LDA model, it should look like this:

```python
lda = LdaModel(corpus=corpus,
               id2word=id2word,
               random_state=723812,
               num_topics = num_topics,
               passes=1
              )

```

__*Note*__: You can pass the DataFrame column of text reviews to gensim. You do not have to use a generator.

## Note about pyLDAvis

**pyLDAvis** is the Topic modeling package that we used in class to visualize the topics that LDA generates for us.

You are welcome to use pyLDAvis for your visualization. However, **you MUST comment out the code that imports the package and the cell that generates the visualization before you submit your notebook to CodeGrade.**

Leave the rendered output of the visualization in the notebook for graders to see (i.e. run the cell to create the viz first, then comment out its code).

### 1. Estimate an LDA topic model of the review text

* Use the `tokenize` function you created earlier to create tokens.
* Create an `id2word` object.
> Hint: Use `corpora.Dictionary`
* Create a `corpus` object.
> Hint: Use `id2word.doc2bow`
* Instantiate an `lda` model.

>> Remember to read the LDA docs for more information on the various class attributes and methods available to you in the LDA model: https://radimrehurek.com/gensim/models/ldamodel.html

In [None]:
# Do not change this value
num_topics = 5

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

#### Testing

In [None]:
# Visible Testing

assert lda.get_topics().shape[0] == 5, 'Did your model complete its training? Did you set num_topics to 5?'

#### 2. Create 2 visualizations of the results:
1. Create a visualization using pyLDAvis. Run the cell, then comment out your code before submission, leaving the visualization in the cell.

2. Create a visualization using the matplotlib library and utilizing the subplots function. Assign this visualization to a variable called `visual_plot`.
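One possible shape for the matplotlib figure is a row of horizontal bar charts, one per topic, showing each topic's top three words. The sketch below uses hypothetical word weights in `toy_topics` purely for illustration; with a trained gensim model, pairs of this form would come from `lda.show_topic(topic_id, topn=3)`. The figure is stored in `toy_fig` here, whereas the assignment expects the name `visual_plot`.

```python
import matplotlib.pyplot as plt

# Hypothetical top-3 (word, weight) pairs per topic, for illustration only
toy_topics = {
    0: [("food", 0.030), ("good", 0.025), ("place", 0.020)],
    1: [("service", 0.028), ("time", 0.022), ("car", 0.015)],
}

# One subplot per topic, sharing the x-axis so weights are comparable
toy_fig, axes = plt.subplots(1, len(toy_topics), figsize=(8, 3), sharex=True)

for ax, (topic_id, word_weights) in zip(axes, toy_topics.items()):
    words, weights = zip(*word_weights)
    ax.barh(words, weights)
    ax.invert_yaxis()  # put the highest-weight word at the top
    ax.set_title(f"Topic {topic_id}")
    ax.set_xlabel("Word weight")

toy_fig.tight_layout()
```

Sharing the x-axis makes it easier to see whether one topic is dominated by a few heavy words while another is spread thin.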


In [None]:
# Cell for pyLDAvis visualization
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Cell for matplotlib visualization
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Visible testing
assert visual_plot is not None, "Variable 'visual_plot' is not created."