## Importing libraries

In [None]:
import pandas as pd
import numpy as np
import os
import spacy 
from tqdm import tqdm

### Read reviews data

In [None]:
# a context manager closes the file even if reading fails
with open("../data/Samsung.txt", 'r', encoding="utf-8") as con:
    samsung_reviews = con.read()

### Can we reduce the time taken?
[Pipelines (spaCy)](https://spacy.io/usage/processing-pipelines)


<img src='./images/spacy_pipeline.png'>

In [None]:
# shorten the pipeline by disabling components we don't need for POS tags and lemmas
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

In [None]:
nouns = []
for review in tqdm(samsung_reviews.split("\n")[0:1000]):
    doc = nlp(review)
    for tok in doc:
        if tok.pos_=="NOUN":
            nouns.append(tok.lemma_.lower())

100%|██████████| 1000/1000 [00:06<00:00, 148.24it/s]


In [None]:
len(samsung_reviews.split("\n"))

In [None]:
(46355/1000)*6

278.13

In [None]:
278/60

4.633333333333334
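
The two cells above extrapolate linearly from the 1000-review sample (about 6 seconds) to all 46355 reviews. As a minimal sketch, the same arithmetic can be wrapped in a small helper; the function name `estimate_total_seconds` is hypothetical and not part of the notebook, and the estimate assumes the remaining reviews take about as long per review as the sample did.

```python
def estimate_total_seconds(sample_seconds, sample_size, total_size):
    """Linearly extrapolate the time measured on a sample to the full dataset."""
    return sample_seconds * (total_size / sample_size)


# Numbers taken from the cells above: 1000 reviews took about 6 seconds,
# and the file contains 46355 reviews in total.
est_seconds = estimate_total_seconds(sample_seconds=6, sample_size=1000, total_size=46355)
est_minutes = est_seconds / 60
print(f"{est_seconds:.2f} s, about {est_minutes:.1f} min")
```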

### Let's process all the reviews now and see whether the time taken is lower

In [None]:
nouns = []
for review in tqdm(samsung_reviews.split("\n")):
    doc = nlp(review)
    for tok in doc:
        if tok.pos_=="NOUN":
            nouns.append(tok.lemma_.lower())

100%|██████████| 46355/46355 [04:27<00:00, 173.42it/s]
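
A further option, not used in this notebook, is to pass the reviews to `nlp.pipe`, which processes texts in batches instead of calling `nlp()` once per review. The sketch below is hedged: the helper name `collect_noun_lemmas` and the `batch_size=256` value are assumptions for illustration, and any speed-up was not measured here.

```python
def collect_noun_lemmas(nlp, texts, batch_size=256):
    """Return the lowercased lemmas of all NOUN tokens in `texts`.

    `nlp` is expected to be a loaded spaCy pipeline like the one created
    earlier; `batch_size=256` is an arbitrary illustrative value.
    """
    nouns = []
    # nlp.pipe streams the texts through the pipeline in batches
    # rather than processing one review per call.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        for tok in doc:
            if tok.pos_ == "NOUN":
                nouns.append(tok.lemma_.lower())
    return nouns
```

With the objects defined in this notebook, it could be called as `nouns = collect_noun_lemmas(nlp, samsung_reviews.split("\n"))`.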


### Does the hypothesis of nouns capturing `product features` hold?

In [None]:
nouns=pd.Series(nouns)
nouns.value_counts().head(5)

phone      43237
battery     4350
product     3907
time        3825
screen      3746
dtype: int64

In [None]:
nouns.value_counts().head(10)

phone      43237
battery     4350
product     3907
time        3825
screen      3746
card        3399
price       3148
problem     3120
camera      2773
app         2606
dtype: int64
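
The counting step itself does not depend on pandas. As a small sketch, the standard library's `collections.Counter` gives the same kind of ranking; the helper name `top_nouns` and the toy list are made up for illustration and are not the review data.

```python
from collections import Counter


def top_nouns(noun_lemmas, n=5):
    """Return the n most frequent lemmas as (lemma, count) pairs."""
    return Counter(noun_lemmas).most_common(n)


# Toy input standing in for the list of noun lemmas built above.
toy_nouns = ["phone", "battery", "phone", "screen", "phone", "battery"]
print(top_nouns(toy_nouns, n=2))
```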

### We now know that people mention `battery`, `product`, `screen`, etc., but we still don't know in what context they mention these keywords

### Summary:
 - The most frequent lemmatised nouns tell us which product features people talk about in the reviews
 - To process the review data faster, spaCy lets us disable the pipeline components we don't need via the `disable` parameter of `spacy.load()`