# Part 01 - NLP Preprocessing of Amazon Reviews

<blockquote style='color:red;font-size:1.5em'> 02/07/24 note: this notebook was originally from the movie reviews project. Need to take the code from Part 01 (old version) and update this notebook with the files and any new workflows. </blockquote>

### Amazon Data Intro

In [101]:
from IPython.display import display, Markdown
with open("data/Amazon Product Reviews.md") as f:
    info = f.read()

display(Markdown(info))

# Amazon Product Reviews

- URL: https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews 

## Description

This is a large crawl of product reviews from Amazon. This dataset contains 82.83 million unique reviews from around 21 million users.

## Basic statistics

| Statistic | Value                |
| --------- | -------------------- |
| Ratings   | 82.83 million        |
| Users     | 20.98 million        |
| Items     | 9.35 million         |
| Timespan  | May 1996 - July 2014 |

## Metadata

- reviews and ratings
- item-to-item relationships (e.g. "people who bought X also bought Y")
- timestamps
- helpfulness votes
- product image (and CNN features)
- price
- category
- salesRank

## Example

```
{
  "reviewerID": "A2SUAM1J3GNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}
```
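To make the fields concrete, here is a minimal sketch that parses the example record above with the standard library. It assumes `helpful` is `[helpful votes, total votes]` and that `unixReviewTime` is seconds since the Unix epoch; both are assumptions, though the second agrees with the record's `reviewTime` of "09 13, 2009". The `reviewText` is shortened for brevity.

```python
import json
from datetime import datetime, timezone

# Example record from the dataset description (reviewText shortened)
raw_record = """{"reviewerID": "A2SUAM1J3GNN3B", "asin": "0000013714",
 "reviewerName": "J. McDonald", "helpful": [2, 3],
 "reviewText": "I bought this for my husband who plays the piano.",
 "overall": 5.0, "summary": "Heavenly Highway Hymns",
 "unixReviewTime": 1252800000, "reviewTime": "09 13, 2009"}"""

record = json.loads(raw_record)

# Assumption: "helpful" is [helpful votes, total votes]
helpful_votes, total_votes = record["helpful"]
helpful_ratio = helpful_votes / total_votes if total_votes else None

# Assumption: unixReviewTime is seconds since the Unix epoch (UTC)
review_date = datetime.fromtimestamp(record["unixReviewTime"], tz=timezone.utc).date()

print(review_date, round(helpful_ratio, 2))  # 2009-09-13 0.67
```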

## Download link

See the [Amazon Dataset Page](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/) for download information.

The 2014 version of this dataset is [also available](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html).

## Citation

Please cite the following if you use the data:

**Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering**

R. He, J. McAuley

*WWW*, 2016
[pdf](https://cseweb.ucsd.edu/~jmcauley/pdfs/www16a.pdf)

**Image-based recommendations on styles and substitutes**

J. McAuley, C. Targett, J. Shi, A. van den Hengel

*SIGIR*, 2015
[pdf](https://cseweb.ucsd.edu/~jmcauley/pdfs/sigir15.pdf)

## TO DO
- Add updated intro for this new dataset

In [102]:
import os, sys, json, joblib
# sys.path.append(os.path.abspath("../NLP/"))
# sys.path.append(os.path.abspath("../"))
# sys.path.append(os.path.abspath("../../"))
%load_ext autoreload
%autoreload 2
    
# import custom_functions as fn
# import project_functions as pf

# !pip install -U dojo_ds -q
import dojo_ds as ds
ds.__version__

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


'1.0.9'

In [103]:
import matplotlib.pyplot as plt
import missingno
import matplotlib as mpl
import seaborn as sns
import numpy as np
import pandas as pd

pd.set_option("display.max_columns",50)
# pd.set_option('display.max_colwidth', 250)

fav_style = ('ggplot','tableau-colorblind10')
fav_context = {'context': 'notebook', 'font_scale': 1.1}
plt.style.use(fav_style)
sns.set_context(**fav_context)
plt.rcParams['savefig.transparent'] = False
plt.rcParams['savefig.bbox'] = 'tight'

In [104]:
from pprint import pprint
FPATHS_FILE = "config/filepaths.json"
with open(FPATHS_FILE) as f:
    FPATHS = json.load(f)
pprint(FPATHS)

{'data': {'app': {},
          'ml-nlp': {'reviews-with-target_json': 'data/modeling/processed-nlp-reviews-for-ml.json',
                     'test_joblib': 'Data-NLP/modeling/testing-data.joblib',
                     'train_joblib': 'data/modeling/training-data.joblib'},
          'ml-tabular': {'reviews-with-ml-target_json': 'Data/modeling/processed-movie-data-for-ml.json',
                         'test_joblib': 'data/modeling/testing-data.joblib',
                         'train_joblib': 'data/modeling/training-data.joblib'},
          'nn': {'test_dir': 'data/modeling/testing-data-tf/',
                 'train_dir': 'data/modeling/training-data-tf/'},
          'processed': {'processed-reviews-spacy_joblib': 'data/processed/processed-reviews.joblib',
                        'processed-reviews-spacy_json': 'data/processed/processed-reviews.json'},
          'raw': {'metadata_csv': 'data/subset/amazon-metadata-subset-grocery-most-common-products.csv.gz',
                  'reviews-
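The printed config mixes `data/`, `Data/` and `Data-NLP/` prefixes. On a case-sensitive file system those are different folders, so checking which configured paths exist can catch typos before later cells read from them. A minimal standard-library sketch; `flatten_filepaths` and `report_missing` are hypothetical helpers, not part of `dojo_ds`:

```python
import os

def flatten_filepaths(fpaths, prefix=()):
    """Yield (key_path, filepath) pairs from a nested dict of filepaths."""
    for key, value in fpaths.items():
        if isinstance(value, dict):
            # Recurse into nested sections such as 'data' -> 'raw'
            yield from flatten_filepaths(value, prefix + (key,))
        else:
            yield prefix + (key,), value

def report_missing(fpaths):
    """Return {'section/key': filepath} for configured paths not found on disk."""
    return {
        "/".join(keys): path
        for keys, path in flatten_filepaths(fpaths)
        if not os.path.exists(path)
    }
```

Running `pprint(report_missing(FPATHS))` would list the missing entries. Output files that later cells create (for example the joblib files) will also appear until they have been written, so treat the result as a checklist rather than a list of errors.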

# Load the Data

We will load our **corpus** of Amazon Reviews for Miracle Noodle products.

In [70]:
df = pd.read_csv(FPATHS['data']['raw']['reviews-subset_selected-brand_csv'])  # 'data/subset/amazon-reviews-subset-brand-Miracle Noodle.csv.gz'
df.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki


In [71]:
df.isna().sum()

asin          0
reviewerID    0
reviewText    0
summary       0
overall       0
year          0
title         0
brand         0
category      0
dtype: int64

In [72]:
# Check for duplicate reviews (same reviewer and same review text)
df.duplicated(subset=['reviewerID','reviewText']).sum()

0
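The check returns 0, so there is nothing to drop here. For reference, a toy sketch with made-up rows showing how `duplicated(subset=...)` behaves: by default only the second and later copies are flagged, while `keep=False` flags every copy, which is handy for inspecting duplicates before dropping them.

```python
import pandas as pd

# Made-up rows: the first two share reviewerID and reviewText
toy = pd.DataFrame({
    "reviewerID": ["A1", "A1", "A2"],
    "reviewText": ["Love these", "Love these", "Love these"],
    "overall": [5.0, 4.0, 5.0],
})
subset = ["reviewerID", "reviewText"]

# keep="first" (default): only later copies are flagged
dup_later = toy.duplicated(subset=subset)
# keep=False: all members of a duplicated group are flagged
dup_all = toy.duplicated(subset=subset, keep=False)
# Drop later copies, keeping the first occurrence of each group
deduped = toy.drop_duplicates(subset=subset)

print(dup_later.tolist())  # [False, True, False]
print(dup_all.tolist())    # [True, True, False]
```

Only the `subset` columns are compared, so rows 0 and 1 count as duplicates even though their `overall` ratings differ.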

In [73]:
df.shape

(4363, 9)

### Combine All Review Text

- The reviews are split into two parts: the `reviewText`, which contains most of the review, and the `summary`, a one-line summary of the review that often includes the actual rating (e.g., "One Star" or "Five Stars").
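Because star-only summaries such as "One Star" and "Five Stars" restate the `overall` rating, it may be worth measuring how common they are before combining the two columns. A minimal sketch; the pattern is an assumption that only covers the English "One" through "Five Star(s)" wording seen in this data, and `is_star_summary` is a hypothetical helper:

```python
import re

# Assumed pattern: the whole summary is just "<number word> Star(s)"
REGEX_STAR_SUMMARY = re.compile(
    r"^\s*(one|two|three|four|five)\s+stars?\s*$", flags=re.IGNORECASE
)

def is_star_summary(summary: str) -> bool:
    """Return True if the summary consists only of a star rating."""
    return bool(REGEX_STAR_SUMMARY.match(summary))

for s in ["One Star", "Five Stars", "Fishy gross", "Five Stars: Love these"]:
    print(s, "->", is_star_summary(s))
```

`df['summary'].map(is_star_summary).mean()` would give the share of star-only summaries in the corpus.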

In [74]:
df['review-text-full'] = df['summary'] + ": " + df['reviewText']
df.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co..."
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,One Star: MOM DID NOT LIKE THESE
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t..."
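`isna()` showed no missing summaries or review texts, so `+` is safe here. If a later subset does have gaps, `+` turns the whole combined string into NaN, while `Series.str.cat` with `na_rep` keeps the review text. A toy sketch with made-up rows:

```python
import pandas as pd

# Made-up rows: the second review is missing its summary
toy = pd.DataFrame({
    "summary": ["Five Stars", None],
    "reviewText": ["Love these", "MOM DID NOT LIKE THESE"],
})

# '+' propagates missing values: row 1 becomes NaN
combined_plus = toy["summary"] + ": " + toy["reviewText"]

# str.cat substitutes na_rep for missing values, then joins with sep;
# lstrip removes the dangling ": " left when the summary was missing
combined_cat = (
    toy["summary"]
    .str.cat(toy["reviewText"], sep=": ", na_rep="")
    .str.lstrip(": ")
)

print(combined_plus.isna().tolist())  # [False, True]
print(combined_cat.tolist())  # ['Five Stars: Love these', 'MOM DID NOT LIKE THESE']
```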


### Removing HTML and URLs (Originally From Notebook 6B)

In [75]:
df['review-text-full_raw'] = df['review-text-full'].copy()
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co..."
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,One Star: MOM DID NOT LIKE THESE,One Star: MOM DID NOT LIKE THESE
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t..."
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A2GQJHRX6192Q8,"Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.",I like the product but one of the bags was not edible.,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.","I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather."
4359,B007JINB0W,A32VFZJ4BR737W,"The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.",Very disappointed.,1.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.","Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away."
4360,B007JINB0W,A9QJH6RPTSEUB,"As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and rinse, but not completely....",Gross rubbery nastiness!,1.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ...","Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ..."
4361,B007JINB0W,A3SZZEPNVX3P3I,Love these,Five Stars,5.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: Love these,Five Stars: Love these


In [76]:
# Checking for links
df.loc[df['review-text-full'].str.contains('http')]


Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
75,B007JINB0W,A1VDTM4ITCSHQ8,"We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles.) Yes it does have a ocean smell when y...",Great alternative to heavy strarches!,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles....","Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles...."
1061,B007JINB0W,A25ZES0OTED0S5,"This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-2 minutes\n2...",Disgusting,1.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-...","Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-..."
1172,B007JINB0W,A25Y0KLV7I19FA,"<div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url""><input type=""hidde...",Family love it !!!,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Family love it !!!: <div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url...","Family love it !!!: <div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url..."
1473,B007JINB0W,A3J6ABN4ZOG502,http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?,5.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37
2175,B007JINB0W,A162S75UMDTC,"I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, with no real caloric con...",surprisingly decent,4.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit...","surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit..."
3816,B007JINB0W,A2PIOAUQSBG074,"I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=1-3-catcorr&keywor...",Hard to chew....,2.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=...","Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=..."
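The rows above contain full links (YouTube, Amazon product pages, the Miracle Noodle site). A minimal sketch of removing them with a regex; the pattern is an assumption that treats anything starting with `http://`, `https://` or `www.` up to the next whitespace as a URL, and `remove_urls` is a hypothetical helper:

```python
import re

# Assumed pattern: "http://", "https://" or "www." followed by non-whitespace
REGEX_URL = r"(?:https?://|www\.)\S+"

def remove_urls(text: str) -> str:
    """Replace URLs with a space, then collapse repeated whitespace."""
    no_urls = re.sub(REGEX_URL, " ", text)
    return re.sub(r"\s+", " ", no_urls).strip()

sample = ("surprisingly decent: dealing with potatos: "
          "https://www.youtube.com/watch?v=FPwbbdo2p6c Seemed too good to be true")
print(remove_urls(sample))
# surprisingly decent: dealing with potatos: Seemed too good to be true
```

The same pattern could be applied to the column with `df['review-text-full'].str.replace(REGEX_URL, ' ', regex=True)`. The spaced-out link in row 1473's summary (`http: //www. amazon. com/...`) would not be caught, since neither `http://` nor `www.` is followed directly by non-whitespace there.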


In [77]:
# Checking for raw HTML (any '<' character)
df.loc[df['review-text-full_raw'].str.contains('<')]

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
164,B007JINB0W,A2M9IS41H1HJAI,Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close to the color of pasta!\n\n---------------\n\nIt's kind of sad t...,"Follow the directions, and these will be really really good.",5.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t...","Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t..."
266,B007JINB0W,A2AZR0HQEOAT8J,"Because they look and feel like exactly that: noodles made of transparent rubber. If you don't care or are planning in burying them in enough sauce you can't see them, go for it. Personally, I'm not that desperate to lose the calories and carbs f...","They're great, if you like eating rubber",1.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"They're great, if you like eating rubber: Because they look and feel like exactly that: noodles made of transparent rubber. If you don't care or are planning in burying them in enough sauce you can't see them, go for it. Personally, I'm not that ...","They're great, if you like eating rubber: Because they look and feel like exactly that: noodles made of transparent rubber. If you don't care or are planning in burying them in enough sauce you can't see them, go for it. Personally, I'm not that ..."
895,B007JINB0W,AN79B2EUCG5O,"bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel of the bite of food. The Fettuchini on the other hand, I coul...",Be prepaired to experement to find the best way to eat them.,3.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel...","Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel..."
1172,B007JINB0W,A25Y0KLV7I19FA,"<div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url""><input type=""hidde...",Family love it !!!,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Family love it !!!: <div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url...","Family love it !!!: <div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url..."
1350,B007JINB0W,A3UEE22RNGQ2L8,"This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is that my favorite food, since I was a child is spaghetti with meat sauce. Until I learned about th...","ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!",5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ...","ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ..."
1498,B007JINB0W,A237SW9SPH1DAD,"Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you like ramen or asian food.\n If you are lik...","These make your plate ""full"" and plenty.",5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ...","These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ..."
1514,B007JINB0W,A14A4YYKPLYY26,"When I decided to buy this&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Rice/dp/B00BP36S7U/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Miracle Noodle Rice</a>&nbsp;I did so, after having bought the&nbsp;<a data-hook=""p...",Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!,2.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Rice/dp/B00BP36S7U/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Miracle Noodle Ri...","Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Rice/dp/B00BP36S7U/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Miracle Noodle Ri..."
2288,B007JINB0W,A1NF7CRBZD2AF8,"Great product.<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Zero-Carb-Gluten-Free-Shirataki-Pasta-and-Rice-6-bag-Variety-Pack-44-ounces-Includes-2-Shirataki-Angel-Hair-2-Shirataki-Rice-and-2-Shirataki-Fettuccini/d...",Great product.,5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Great product.: Great product.<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Zero-Carb-Gluten-Free-Shirataki-Pasta-and-Rice-6-bag-Variety-Pack-44-ounces-Includes-2-Shirataki-Angel-Hair-2-Shirataki-Rice-and-2-Shirat...","Great product.: Great product.<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Zero-Carb-Gluten-Free-Shirataki-Pasta-and-Rice-6-bag-Variety-Pack-44-ounces-Includes-2-Shirataki-Angel-Hair-2-Shirataki-Rice-and-2-Shirat..."
2378,B007JINB0W,A14A4YYKPLYY26,"Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Wheat-Belly/dp/1609611543/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Wheat Belly</a>&nbsp;diet, and among the man...","I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!",5.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" cla...","I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" cla..."
2404,B007JINB0W,A1FFJRP833Y1MH,"We love all the Miracle noodles but the&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Shirataki-Zero-Carb-Gluten-Free-Pasta-Garlic-and-Herb-Fettuccini-7-Ounce/dp/B01N91YE5A/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Mi...",Delicious!!,5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Delicious!!: We love all the Miracle noodles but the&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Shirataki-Zero-Carb-Gluten-Free-Pasta-Garlic-and-Herb-Fettuccini-7-Ounce/dp/B01N91YE5A/ref=cm_cr_arp_d_rvw_tx...","Delicious!!: We love all the Miracle noodles but the&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Shirataki-Zero-Carb-Gluten-Free-Pasta-Garlic-and-Herb-Fettuccini-7-Ounce/dp/B01N91YE5A/ref=cm_cr_arp_d_rvw_tx..."


### Remove HTML Tags

In [78]:
import re

# Regular expression to match HTML tags
regex_html = r"<[^>]*>"

# Apply the regex to the DataFrame column using str.replace
df['review-text-full'] = df['review-text-full'].str.replace(regex_html, '', regex=True)
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co..."
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,One Star: MOM DID NOT LIKE THESE,One Star: MOM DID NOT LIKE THESE
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t..."
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A2GQJHRX6192Q8,"Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.",I like the product but one of the bags was not edible.,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.","I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather."
4359,B007JINB0W,A32VFZJ4BR737W,"The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.",Very disappointed.,1.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.","Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away."
4360,B007JINB0W,A9QJH6RPTSEUB,"As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and rinse, but not completely....",Gross rubbery nastiness!,1.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ...","Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ..."
4361,B007JINB0W,A3SZZEPNVX3P3I,Love these,Five Stars,5.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: Love these,Five Stars: Love these
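There are two caveats with the tag pattern `<[^>]*>`. First, it deletes everything from a `<` to the next `>`, even when the `<` is ordinary text. Second, it replaces each tag with nothing, so words on either side of a tag can run together. The standalone sketch below shows both behaviors with plain `re.sub`. It uses made-up sample strings rather than rows from this dataset, and `strip_tags` is a hypothetical helper name.

```python
import re

# Same tag pattern as the notebook: "<", any run of non-">" characters, then ">"
REGEX_HTML = r"<[^>]*>"


def strip_tags(text):
    """Remove anything that looks like an HTML tag (replaced with nothing)."""
    return re.sub(REGEX_HTML, "", text)


# Typical Amazon markup: the tags disappear, the link text stays,
# and it is glued to the preceding word because the replacement is ""
html_review = 'Great product.<a class="a-link-normal" href="/dp/B007JINB0W">Miracle Noodle</a>'
print(strip_tags(html_review))

# Caveat: a literal "<" in ordinary text is treated as the start of a tag
emoticon_review = "I <3 these noodles, better than rice > pasta"
print(strip_tags(emoticon_review))
```

Passing `" "` as the replacement instead of `""` keeps words apart, at the cost of occasional double spaces.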


In [79]:
# Compare original with cleaned
compare_cols = ['review-text-full_raw','review-text-full']

pd.set_option('display.max_colwidth',250)

In [80]:
df.loc[df['review-text-full_raw'].str.contains('<'), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
164,"Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t...","Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t..."
266,"They're great, if you like eating rubber: Because they look and feel like exactly that: noodles made of transparent rubber. If you don't care or are planning in burying them in enough sauce you can't see them, go for it. Personally, I'm not that ...","They're great, if you like eating rubber: Because they look and feel like exactly that: noodles made of transparent rubber. If you don't care or are planning in burying them in enough sauce you can't see them, go for it. Personally, I'm not that ..."
895,"Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel...","Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel..."
1172,"Family love it !!!: <div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url...",Family love it !!!: &nbsp;Love this stuff !!!! Guilt Free perfect if your in a weight loss journey like I am!!! Easy to cook !!!! Will order more
1350,"ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ...","ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ..."
1498,"These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ...","These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ..."
1514,"Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Rice/dp/B00BP36S7U/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Miracle Noodle Ri...","Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;Miracle Noodle Rice&nbsp;I did so, after having bought the&nbsp;Miracle Noodle Angel Hair Pasta&nbsp;and ABSOLUTELY LOVING it. Because I am on a&nbs..."
2288,"Great product.: Great product.<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Zero-Carb-Gluten-Free-Shirataki-Pasta-and-Rice-6-bag-Variety-Pack-44-ounces-Includes-2-Shirataki-Angel-Hair-2-Shirataki-Rice-and-2-Shirat...","Great product.: Great product.Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta and Rice, 6 bag Variety Pack, 44 ounces (Includes: 2 Shirataki Angel Hair, 2 Shirataki Rice and 2 Shirataki Fettuccini)"
2378,"I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" cla...","I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;Wheat Belly&nbsp;diet, and among the m..."
2404,"Delicious!!: We love all the Miracle noodles but the&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Shirataki-Zero-Carb-Gluten-Free-Pasta-Garlic-and-Herb-Fettuccini-7-Ounce/dp/B01N91YE5A/ref=cm_cr_arp_d_rvw_tx...","Delicious!!: We love all the Miracle noodles but the&nbsp;Miracle Noodle Shirataki Zero Carb Gluten Free Pasta, Garlic and Herb Fettuccini, 7 Ounce&nbsp;is especially great. Marry it to some canned clams and basil and herbs fettuccine sauce and ..."
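The comparison shows that removing tags still leaves HTML entities behind. Rows 1172, 1514, 2378 and 2404 still contain `&nbsp;`. Python's standard-library `html.unescape` decodes these entities. The sketch below is a minimal version that assumes we also want non-breaking spaces turned into ordinary spaces. The helper name `decode_entities` is hypothetical.

```python
import html
import re


def decode_entities(text):
    """Decode HTML entities and normalize the spacing they leave behind."""
    text = html.unescape(text)              # "&nbsp;" -> "\xa0", "&amp;" -> "&"
    text = text.replace("\xa0", " ")        # non-breaking space -> regular space
    return re.sub(r"[ \t]{2,}", " ", text)  # collapse runs of spaces/tabs, keep newlines


sample = "When I decided to buy this&nbsp;Miracle Noodle Rice&nbsp;I did so. Bagels &amp; Baguettes"
print(decode_entities(sample))
```

To apply it column-wise, you could run `df['review-text-full'] = df['review-text-full'].map(decode_entities)` after the tag-removal step.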


### Replace Links with `[LINK]`

In [81]:
regex_url = r"https?://(?:www\.)?[^\s]+"  # raw string avoids invalid-escape warnings
df.loc[df['review-text-full'].str.contains(regex_url), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
75,"Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles....","Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles...."
1061,"Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-...","Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-..."
1473,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37
2175,"surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit...","surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit..."
3816,"Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=...","Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=..."


In [82]:
df['review-text-full'] = df['review-text-full'].str.replace(regex_url, '[LINK]', regex=True)
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co..."
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,One Star: MOM DID NOT LIKE THESE,One Star: MOM DID NOT LIKE THESE
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t..."
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A2GQJHRX6192Q8,"Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.",I like the product but one of the bags was not edible.,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.","I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather."
4359,B007JINB0W,A32VFZJ4BR737W,"The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.",Very disappointed.,1.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.","Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away."
4360,B007JINB0W,A9QJH6RPTSEUB,"As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and rinse, but not completely....",Gross rubbery nastiness!,1.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ...","Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ..."
4361,B007JINB0W,A3SZZEPNVX3P3I,Love these,Five Stars,5.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: Love these,Five Stars: Love these


In [83]:
df.loc[df['review-text-full'].str.contains('http'), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
1473,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?: [LINK]
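The URL pattern ends in `[^\s]+`, so it consumes everything up to the next whitespace. A URL wrapped in parentheses or ending a sentence therefore takes the closing `)` or `.` with it. The sketch below compares the notebook's pattern with a hypothetical variant that stops before trailing punctuation. It runs both on an invented sample sentence, and the variant is our assumption rather than part of the original workflow.

```python
import re

# Pattern from the notebook, written as a raw string
REGEX_URL = r"https?://(?:www\.)?[^\s]+"
# Hypothetical variant: match lazily and stop before any trailing punctuation
# that is followed by whitespace or the end of the text
REGEX_URL_TRIMMED = r"https?://\S+?(?=[.,;:!?)\]]*(?:\s|$))"

sample = "Cooked as shown (http://www.example.com/how-to). Then https://example.org/x, done"

original = re.sub(REGEX_URL, "[LINK]", sample)
trimmed = re.sub(REGEX_URL_TRIMMED, "[LINK]", sample)
print(original)  # the ")." after the first URL and the "," after the second are swallowed
print(trimmed)   # punctuation after each URL is preserved
```

The trade-off is that a URL whose last character really is punctuation (for example, a trailing `?`) loses that character. This is harmless when the whole URL becomes `[LINK]` anyway.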


## Part 2) spaCy Preprocessing for EDA

In [84]:
# combined_reviews_fname = FPATHS['data']['raw']['movie-reviews']
# reviews.to_csv(combined_reviews_fname, index=False, compression='gzip')

In [85]:
# del reviews

**1) Data Preprocessing:**

- Load and inspect the dataset.
    - How many reviews?
    - What does the distribution of ratings look like?
    - Any null values?



- Use the rating column to create a new target column with two groups: high-rating and low-rating groups.
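    - We recommend defining "High-rating" reviews as any review with a rating >=9 and "Low-rating" reviews as any review with a rating <=4; ratings strictly between 4 and 9 are excluded from the analysis. These cutoffs come from the 1-10 movie-review scale, while the Amazon `overall` column uses 1-5 stars, so they need to be adapted here.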
    - You may use an alternative definition for High and Low reviews, but justify your choice in your notebook/README.



- Utilize NLTK and SpaCy for basic text processing, including:

    - removing stopwords
    - tokenization
    - lemmatization
    - Tips:
        - Be sure to create a custom nlp object and disable the named entity recognizer. Otherwise, processing will take a very long time!
        - **You will want to create several versions of the data: tokenized, lemmatized, tokenized and joined back into one string per review, and lemmatized and joined back into one string per review.** Each version is useful for different analysis and modeling techniques.

    

- Save your processed DataFrame as a **joblib** file in the "Data-NLP/" folder for future modeling.
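The sketch below builds the target column for the 1-5 star `overall` column. It assumes that 5 stars counts as "High" and 1-2 stars count as "Low", and it drops 3-4 star reviews. Both thresholds and the `target-rating` column name are our assumptions, not part of the original task.

```python
import pandas as pd


def add_rating_target(df, rating_col="overall", high_min=5.0, low_max=2.0):
    """Return a copy containing only clearly high or low reviews, labeled "High"/"Low".

    Thresholds are assumptions for a 1-5 star scale; ratings strictly between
    low_max and high_min are dropped.
    """
    keep = (df[rating_col] >= high_min) | (df[rating_col] <= low_max)
    out = df.loc[keep].copy()
    # True (rating >= high_min) -> "High", False -> "Low"
    out["target-rating"] = out[rating_col].ge(high_min).map({True: "High", False: "Low"})
    return out


demo = pd.DataFrame({"overall": [1.0, 2.0, 3.0, 4.0, 5.0]})
labeled = add_rating_target(demo)
print(labeled["target-rating"].value_counts())
```

On the notebook's data, this would be `df = add_rating_target(df)`. Checking `df['overall'].value_counts()` first helps you judge whether the resulting groups are reasonably balanced.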

    

In [86]:
# import spacy
# # Disable parser and ner
# nlp_light = spacy.load("en_core_web_sm", disable=['parser','ner'])
# # Print active components
# nlp_light.pipe_names

In [87]:
import spacy
# Custom NLP object: NER disabled to speed up processing, "★" added as a stopword
nlp_custom = ds.nlp.make_custom_nlp(
    disable=['ner'],  # optionally also 'parser'
    contractions=[],
    stopwords_to_add=["★"],
)
nlp_custom

<spacy.lang.en.English at 0x2b9cbc220>

> Changed the `review-text-full` column to strip HTML tags and replace URLs as of 01/22/24

In [88]:
%%time
print(f"- Running full spacy preprocessing code (this will take several minutes).")
df = df.copy()
df["tokens-dirty"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=False,
    remove_punct=True,
    use_lemmas=False,
    nlp=nlp_custom,
)
df["tokens"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=True,
    remove_punct=True,
    use_lemmas=False,
    nlp=nlp_custom,
)
df["lemmas"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=True,
    remove_punct=True,
    use_lemmas=True,
    nlp=nlp_custom,
)

## Make string versions of processed text
df["tokens-dirty-joined"] = df["tokens-dirty"].map(lambda x: " ".join(x))
df["tokens-joined"] = df["tokens"].map(lambda x: " ".join(x))
df["lemmas-joined"] = df["lemmas"].map(lambda x: " ".join(x))

df.head()

- Running full spacy preprocessing code (this will take several minutes).


4363it [00:46, 93.26it/s]  
4363it [00:46, 93.20it/s]  
4363it [00:47, 92.58it/s]  

CPU times: user 9.76 s, sys: 1.28 s, total: 11 s
Wall time: 2min 20s





Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","[odd, chewy, texture, this, has, a, odd, chewy, texture, and, not, much, flavor, but, used, as, a, substitute, for, pasta, it, helps, cut, calories, and, carbs, i, can, tolerate, it, but, it, is, n't, really, tasty, surprisingly, my, husband, enj...","[odd, chewy, texture, odd, chewy, texture, flavor, substitute, pasta, helps, cut, calories, carbs, tolerate, tasty, surprisingly, husband, enjoyed, past, texture, rubber, noodles]","[odd, chewy, texture, odd, chewy, texture, flavor, substitute, pasta, help, cut, calorie, carb, tolerate, tasty, surprisingly, husband, enjoy, past, texture, rubber, noodle]",odd chewy texture this has a odd chewy texture and not much flavor but used as a substitute for pasta it helps cut calories and carbs i can tolerate it but it is n't really tasty surprisingly my husband enjoyed it more than i did i just could n't...,odd chewy texture odd chewy texture flavor substitute pasta helps cut calories carbs tolerate tasty surprisingly husband enjoyed past texture rubber noodles,odd chewy texture odd chewy texture flavor substitute pasta help cut calorie carb tolerate tasty surprisingly husband enjoy past texture rubber noodle
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,"[fishy, gross, they, smell, of, fish, and, have, a, rubbery, hard, to, chew, texture, yuck]","[fishy, gross, smell, fish, rubbery, hard, chew, texture, yuck]","[fishy, gross, smell, fish, rubbery, hard, chew, texture, yuck]",fishy gross they smell of fish and have a rubbery hard to chew texture yuck,fishy gross smell fish rubbery hard chew texture yuck,fishy gross smell fish rubbery hard chew texture yuck
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,One Star: MOM DID NOT LIKE THESE,One Star: MOM DID NOT LIKE THESE,"[one, star, mom, did, not, like, these]","[star, mom, like]","[star, mom, like]",one star mom did not like these,star mom like,star mom like
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,"[the, noodles, themselves, are, fine, the, amazon, label, description, claims, they, are, certified, the, noodles, themselves, are, fine, the, amazon, label, description, claims, they, are, certified, non, gmo, and, vegan, but, the, actual, packa...","[noodles, fine, amazon, label, description, claims, certified, noodles, fine, amazon, label, description, claims, certified, non, gmo, vegan, actual, package, contains, symbol, claim, online, description, shows]","[noodle, fine, amazon, label, description, claim, certify, noodle, fine, amazon, label, description, claim, certify, non, gmo, vegan, actual, package, contain, symbol, claim, online, description, show]",the noodles themselves are fine the amazon label description claims they are certified the noodles themselves are fine the amazon label description claims they are certified non gmo and vegan but the actual package contains no such symbol or clai...,noodles fine amazon label description claims certified noodles fine amazon label description claims certified non gmo vegan actual package contains symbol claim online description shows,noodle fine amazon label description claim certify noodle fine amazon label description claim certify non gmo vegan actual package contain symbol claim online description show
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","[what, 's, important, to, you, so, how, bad, do, you, want, to, restrict, your, calories, i, gain, weight, easily, so, i, 'm, obsessive, about, counting, my, calories, but, many, times, i, am, left, not, feeling, very, full, i, read, about, these...","[important, bad, want, restrict, calories, gain, weight, easily, obsessive, counting, calories, times, left, feeling, read, noodles, decided, try, bag, smell, like, fish, goes, away, rinsing, yes, texture, like, chewing, rubber, bands, tried, tri...","[important, bad, want, restrict, calorie, gain, weight, easily, obsessive, count, calorie, time, leave, feel, read, noodle, decide, try, bag, smell, like, fish, go, away, rinse, yes, texture, like, chew, rubber, band, try, trick, boil, 30, minute...",what 's important to you so how bad do you want to restrict your calories i gain weight easily so i 'm obsessive about counting my calories but many times i am left not feeling very full i read about these noodles and decided to give them a try o...,important bad want restrict calories gain weight easily obsessive counting calories times left feeling read noodles decided try bag smell like fish goes away rinsing yes texture like chewing rubber bands tried tricks boiling 30 minutes frying com...,important bad want restrict calorie gain weight easily obsessive count calorie time leave feel read noodle decide try bag smell like fish go away rinse yes texture like chew rubber band try trick boil 30 minute fry completely dry change texture g...


## Save Preprocessed Reviews

### Saving a JSON file

In [98]:
# df = df.set_index("review_id")#, errors='ignore')
# df

In [107]:
# fpath_json = "Data-NLP/processed-nlp-data.json"
fpath_json = FPATHS['data']['processed']['processed-reviews-spacy_json']
fpath_json

'data/processed/processed-reviews.json'
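`df.to_json` (and `joblib.dump` further down) raises an error if the `data/processed/` folder does not exist yet, for example on a fresh clone of the project. A minimal sketch of guarding against that, assuming the values in `FPATHS` are plain relative file paths like the one printed above; the helper name `ensure_parent_dir` is hypothetical:

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(fpath):
    """Hypothetical helper: create the parent folder of `fpath` and return the path as a string."""
    parent = Path(fpath).parent
    # parents=True builds any missing intermediate folders;
    # exist_ok=True makes re-running the notebook cell safe
    parent.mkdir(parents=True, exist_ok=True)
    return str(fpath)


# Demo in a throwaway folder so nothing in the project is touched
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "processed" / "processed-reviews.json"
    ensure_parent_dir(target)
    print("folder exists:", target.parent.is_dir())
```

In the notebook this would wrap the save call, e.g. `df.to_json(ensure_parent_dir(fpath_json))`.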

In [108]:
df.head(2).to_json(orient='index')

'{"0":{"asin":"B007JINB0W","reviewerID":"A1P9BVW2JB1OVL","reviewText":"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn\'t really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn\'t get past the texture (rubber noodles).","summary":"Odd chewy texture","overall":3.0,"year":2014,"title":"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)","brand":"Miracle Noodle","category":"Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki","review-text-full":"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn\'t really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn\'t get past the texture (rubber noodles).","review-text-full_raw":"Odd chewy texture: This has a odd chewy texture and no
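The preview above uses `orient='index'` (one JSON object per row, keyed by the index label), while the save cell below calls `to_json` with no `orient`, which for a DataFrame defaults to `'columns'`. Whichever layout is written, `pd.read_json` should be given the same `orient` when reading the file back. A small sketch on a toy two-review frame (the data is made up for illustration):

```python
from io import StringIO

import pandas as pd

# Toy frame mimicking a few of the review columns (illustrative data only)
toy = pd.DataFrame({
    "asin": ["B007JINB0W", "B007JINB0W"],
    "overall": [3.0, 1.0],
    "tokens": [["odd", "chewy", "texture"], ["fishy", "gross"]],
})

round_trips = {}
for orient in ["columns", "index", "records"]:
    json_str = toy.to_json(orient=orient)
    # Read back with the SAME orient used to write; wrapping the string in
    # StringIO avoids the warning newer pandas versions give for literal JSON
    round_trips[orient] = pd.read_json(StringIO(json_str), orient=orient)
    print(f"{orient:>8}: {json_str[:60]}")
```

Note that `'records'` does not store the index at all, so it only round-trips cleanly when the index is a plain 0..n-1 range.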

In [109]:
# Save to json
df.to_json(fpath_json)

In [110]:
# Reload the JSON file to confirm it saved properly
temp_df = pd.read_json(fpath_json)
temp_df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","[odd, chewy, texture, this, has, a, odd, chewy, texture, and, not, much, flavor, but, used, as, a, substitute, for, pasta, it, helps, cut, calories, and, carbs, i, can, tolerate, it, but, it, is, n't, really, tasty, surprisingly, my, husband, enj...","[odd, chewy, texture, odd, chewy, texture, flavor, substitute, pasta, helps, cut, calories, carbs, tolerate, tasty, surprisingly, husband, enjoyed, past, texture, rubber, noodles]","[odd, chewy, texture, odd, chewy, texture, flavor, substitute, pasta, help, cut, calorie, carb, tolerate, tasty, surprisingly, husband, enjoy, past, texture, rubber, noodle]",odd chewy texture this has a odd chewy texture and not much flavor but used as a substitute for pasta it helps cut calories and carbs i can tolerate it but it is n't really tasty surprisingly my husband enjoyed it more than i did i just could n't...,odd chewy texture odd chewy texture flavor substitute pasta helps cut calories carbs tolerate tasty surprisingly husband enjoyed past texture rubber noodles,odd chewy texture odd chewy texture flavor substitute pasta help cut calorie carb tolerate tasty surprisingly husband enjoy past texture rubber noodle
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,"[fishy, gross, they, smell, of, fish, and, have, a, rubbery, hard, to, chew, texture, yuck]","[fishy, gross, smell, fish, rubbery, hard, chew, texture, yuck]","[fishy, gross, smell, fish, rubbery, hard, chew, texture, yuck]",fishy gross they smell of fish and have a rubbery hard to chew texture yuck,fishy gross smell fish rubbery hard chew texture yuck,fishy gross smell fish rubbery hard chew texture yuck
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,One Star: MOM DID NOT LIKE THESE,One Star: MOM DID NOT LIKE THESE,"[one, star, mom, did, not, like, these]","[star, mom, like]","[star, mom, like]",one star mom did not like these,star mom like,star mom like
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,"[the, noodles, themselves, are, fine, the, amazon, label, description, claims, they, are, certified, the, noodles, themselves, are, fine, the, amazon, label, description, claims, they, are, certified, non, gmo, and, vegan, but, the, actual, packa...","[noodles, fine, amazon, label, description, claims, certified, noodles, fine, amazon, label, description, claims, certified, non, gmo, vegan, actual, package, contains, symbol, claim, online, description, shows]","[noodle, fine, amazon, label, description, claim, certify, noodle, fine, amazon, label, description, claim, certify, non, gmo, vegan, actual, package, contain, symbol, claim, online, description, show]",the noodles themselves are fine the amazon label description claims they are certified the noodles themselves are fine the amazon label description claims they are certified non gmo and vegan but the actual package contains no such symbol or clai...,noodles fine amazon label description claims certified noodles fine amazon label description claims certified non gmo vegan actual package contains symbol claim online description shows,noodle fine amazon label description claim certify noodle fine amazon label description claim certify non gmo vegan actual package contain symbol claim online description show
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","[what, 's, important, to, you, so, how, bad, do, you, want, to, restrict, your, calories, i, gain, weight, easily, so, i, 'm, obsessive, about, counting, my, calories, but, many, times, i, am, left, not, feeling, very, full, i, read, about, these...","[important, bad, want, restrict, calories, gain, weight, easily, obsessive, counting, calories, times, left, feeling, read, noodles, decided, try, bag, smell, like, fish, goes, away, rinsing, yes, texture, like, chewing, rubber, bands, tried, tri...","[important, bad, want, restrict, calorie, gain, weight, easily, obsessive, count, calorie, time, leave, feel, read, noodle, decide, try, bag, smell, like, fish, go, away, rinse, yes, texture, like, chew, rubber, band, try, trick, boil, 30, minute...",what 's important to you so how bad do you want to restrict your calories i gain weight easily so i 'm obsessive about counting my calories but many times i am left not feeling very full i read about these noodles and decided to give them a try o...,important bad want restrict calories gain weight easily obsessive counting calories times left feeling read noodles decided try bag smell like fish goes away rinsing yes texture like chewing rubber bands tried tricks boiling 30 minutes frying com...,important bad want restrict calorie gain weight easily obsessive count calorie time leave feel read noodle decide try bag smell like fish go away rinse yes texture like chew rubber band try trick boil 30 minute fry completely dry change texture g...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,A2GQJHRX6192Q8,"Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.",I like the product but one of the bags was not edible.,4,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.","I like the product but one of the bags was not edible.: Good product overall, but one of the enclosed bags had noodles that were not edible, as they were dried out and hard as leather.","[i, like, the, product, but, one, of, the, bags, was, not, edible, good, product, overall, but, one, of, the, enclosed, bags, had, noodles, that, were, not, edible, as, they, were, dried, out, and, hard, as, leather]","[like, product, bags, edible, good, product, overall, enclosed, bags, noodles, edible, dried, hard, leather]","[like, product, bag, edible, good, product, overall, enclose, bag, noodle, edible, dry, hard, leather]",i like the product but one of the bags was not edible good product overall but one of the enclosed bags had noodles that were not edible as they were dried out and hard as leather,like product bags edible good product overall enclosed bags noodles edible dried hard leather,like product bag edible good product overall enclose bag noodle edible dry hard leather
4359,B007JINB0W,A32VFZJ4BR737W,"The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.",Very disappointed.,1,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.","Very disappointed.: The packages of noodles appeared to be intact, but the box was soggy. Foul odor. I am unfamiliar with product, but I did not want to risk it... I threw it all away.","[very, disappointed, the, packages, of, noodles, appeared, to, be, intact, but, the, box, was, soggy, foul, odor, i, am, unfamiliar, with, product, but, i, did, not, want, to, risk, it, i, threw, it, all, away]","[disappointed, packages, noodles, appeared, intact, box, soggy, foul, odor, unfamiliar, product, want, risk, threw, away]","[disappointed, package, noodle, appear, intact, box, soggy, foul, odor, unfamiliar, product, want, risk, throw, away]",very disappointed the packages of noodles appeared to be intact but the box was soggy foul odor i am unfamiliar with product but i did not want to risk it i threw it all away,disappointed packages noodles appeared intact box soggy foul odor unfamiliar product want risk threw away,disappointed package noodle appear intact box soggy foul odor unfamiliar product want risk throw away
4360,B007JINB0W,A9QJH6RPTSEUB,"As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and rinse, but not completely....",Gross rubbery nastiness!,1,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ...","Gross rubbery nastiness!: As many others have stated, these are not remotely similar to rice or a noodle. The bag feels like a squishy wet diaper, and smells worse than a diaper when you open it! The smell does mostly go away after you drain and ...","[gross, rubbery, nastiness, as, many, others, have, stated, these, are, not, remotely, similar, to, rice, or, a, noodle, the, bag, feels, like, a, squishy, wet, diaper, and, smells, worse, than, a, diaper, when, you, open, it, the, smell, does, m...","[gross, rubbery, nastiness, stated, remotely, similar, rice, noodle, bag, feels, like, squishy, wet, diaper, smells, worse, diaper, open, smell, away, drain, rinse, completely, chewing, making, teeth, squeak, weird, unpleasant, asked, refund, cha...","[gross, rubbery, nastiness, state, remotely, similar, rice, noodle, bag, feel, like, squishy, wet, diaper, smell, bad, diaper, open, smell, away, drain, rinse, completely, chew, make, tooth, squeak, weird, unpleasant, ask, refund, chalk, lesson, ...",gross rubbery nastiness as many others have stated these are not remotely similar to rice or a noodle the bag feels like a squishy wet diaper and smells worse than a diaper when you open it the smell does mostly go away after you drain and rinse ...,gross rubbery nastiness stated remotely similar rice noodle bag feels like squishy wet diaper smells worse diaper open smell away drain rinse completely chewing making teeth squeak weird unpleasant asked refund chalked lesson learned replacement ...,gross rubbery nastiness state remotely similar rice noodle bag feel like squishy wet diaper smell bad diaper open smell away drain rinse completely chew make tooth squeak weird unpleasant ask refund chalk lesson learn replacement food
4361,B007JINB0W,A3SZZEPNVX3P3I,Love these,Five Stars,5,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Five Stars: Love these,Five Stars: Love these,"[five, stars, love, these]","[stars, love]","[star, love]",five stars love these,stars love,star love


In [111]:
type(temp_df.loc[0, 'tokens'])

list
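The reload keeps the token columns as Python lists, but the display above shows `overall` as `3`, `1`, `2` rather than the original `3.0`, `1.0`, `2.0`: by default `pd.read_json` infers dtypes, and a float column whose values are all whole numbers can come back as integers. A sketch of checking this on a toy frame and pinning the rating column with the `dtype` argument (the toy data and column choice are illustrative):

```python
from io import StringIO

import pandas as pd

# Toy frame with a float rating column and a list-valued token column
toy = pd.DataFrame({
    "overall": [3.0, 1.0, 5.0],
    "tokens": [["odd", "chewy"], ["fishy", "gross"], ["love"]],
})
json_str = toy.to_json()  # default orient='columns', as in the save cell

# Default read: dtypes are inferred, so whole-number floats may be downcast
inferred = pd.read_json(StringIO(json_str))
print("inferred overall dtype:", inferred["overall"].dtype)

# Pin only the rating column; every other column is still inferred
pinned = pd.read_json(StringIO(json_str), dtype={"overall": "float64"})
print("pinned overall dtype:", pinned["overall"].dtype)

# List columns survive the JSON round trip either way
print("tokens cell type:", type(pinned["tokens"].iloc[0]).__name__)
```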

### Saving a Joblib file

In [112]:
import joblib
fpath_joblib = FPATHS['data']['processed']['processed-reviews-spacy_joblib']
fpath_joblib

'data/processed/processed-reviews.joblib'

In [113]:
# Dump to the selected filepath
joblib.dump(df, fpath_joblib)

['data/processed/processed-reviews.joblib']
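`joblib.dump` returns the list of file names it wrote, which is why the cell prints a one-item list. It also accepts a `compress` argument (an integer from 0 to 9, where higher values give smaller files at the cost of slower reads and writes), which may help with text-heavy frames like this one. A sketch comparing an uncompressed and a compressed dump of a toy frame in a temporary folder (the data and file names are illustrative):

```python
import os
import tempfile

import joblib
import pandas as pd

# Distinct, repetitive toy strings stand in for the joined-token columns
toy = pd.DataFrame(
    {"tokens-joined": [f"odd chewy texture rubber noodle {i}" for i in range(2000)]}
)

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "reviews.joblib")
    small_path = os.path.join(tmp, "reviews-compressed.joblib")

    written = joblib.dump(toy, plain_path)       # list of file names written
    joblib.dump(toy, small_path, compress=3)     # compression level 3

    plain_size = os.path.getsize(plain_path)
    small_size = os.path.getsize(small_path)
    reloaded = joblib.load(small_path)           # loading needs no extra arguments

print(f"uncompressed: {plain_size} bytes, compressed: {small_size} bytes")
```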

In [115]:
# Confirm the file saved properly by reloading it
loaded = joblib.load(fpath_joblib)
loaded.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A1P9BVW2JB1OVL,"This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just couldn't get past the...",Odd chewy texture,3.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","Odd chewy texture: This has a odd chewy texture and not much flavor, but used as a substitute for pasta, it helps cut calories and carbs. I can tolerate it, but it isn't really tasty. Surprisingly, my husband enjoyed it more than I did. I just co...","[odd, chewy, texture, this, has, a, odd, chewy, texture, and, not, much, flavor, but, used, as, a, substitute, for, pasta, it, helps, cut, calories, and, carbs, i, can, tolerate, it, but, it, is, n't, really, tasty, surprisingly, my, husband, enj...","[odd, chewy, texture, odd, chewy, texture, flavor, substitute, pasta, helps, cut, calories, carbs, tolerate, tasty, surprisingly, husband, enjoyed, past, texture, rubber, noodles]","[odd, chewy, texture, odd, chewy, texture, flavor, substitute, pasta, help, cut, calorie, carb, tolerate, tasty, surprisingly, husband, enjoy, past, texture, rubber, noodle]",odd chewy texture this has a odd chewy texture and not much flavor but used as a substitute for pasta it helps cut calories and carbs i can tolerate it but it is n't really tasty surprisingly my husband enjoyed it more than i did i just could n't...,odd chewy texture odd chewy texture flavor substitute pasta helps cut calories carbs tolerate tasty surprisingly husband enjoyed past texture rubber noodles,odd chewy texture odd chewy texture flavor substitute pasta help cut calorie carb tolerate tasty surprisingly husband enjoy past texture rubber noodle
1,B007JINB0W,A5JZ2DBS9H3F6,They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross,1.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,Fishy gross: They smell of fish and have a rubbery hard to chew texture. Yuck,"[fishy, gross, they, smell, of, fish, and, have, a, rubbery, hard, to, chew, texture, yuck]","[fishy, gross, smell, fish, rubbery, hard, chew, texture, yuck]","[fishy, gross, smell, fish, rubbery, hard, chew, texture, yuck]",fishy gross they smell of fish and have a rubbery hard to chew texture yuck,fishy gross smell fish rubbery hard chew texture yuck,fishy gross smell fish rubbery hard chew texture yuck
2,B007JINB0W,A3VYMBAX7IFV3B,MOM DID NOT LIKE THESE,One Star,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,One Star: MOM DID NOT LIKE THESE,One Star: MOM DID NOT LIKE THESE,"[one, star, mom, did, not, like, these]","[star, mom, like]","[star, mom, like]",one star mom did not like these,star mom like,star mom like
3,B007JINB0W,A25MLB8QXVM2LS,The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol or claim that the online description shows.,The noodles themselves are fine. The Amazon label description claims they are certified ...,2.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,The noodles themselves are fine. The Amazon label description claims they are certified ...: The noodles themselves are fine. The Amazon label description claims they are certified non GMO and Vegan but the actual package contains no such symbol ...,"[the, noodles, themselves, are, fine, the, amazon, label, description, claims, they, are, certified, the, noodles, themselves, are, fine, the, amazon, label, description, claims, they, are, certified, non, gmo, and, vegan, but, the, actual, packa...","[noodles, fine, amazon, label, description, claims, certified, noodles, fine, amazon, label, description, claims, certified, non, gmo, vegan, actual, package, contains, symbol, claim, online, description, shows]","[noodle, fine, amazon, label, description, claim, certify, noodle, fine, amazon, label, description, claim, certify, non, gmo, vegan, actual, package, contain, symbol, claim, online, description, show]",the noodles themselves are fine the amazon label description claims they are certified the noodles themselves are fine the amazon label description claims they are certified non gmo and vegan but the actual package contains no such symbol or clai...,noodles fine amazon label description claims certified noodles fine amazon label description claims certified non gmo vegan actual package contains symbol claim online description shows,noodle fine amazon label description claim certify noodle fine amazon label description claim certify non gmo vegan actual package contain symbol claim online description show
4,B007JINB0W,A2DZN9RBFVVY7L,"So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a try. Out of the bag, they d...",What's important to you?,4.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,Grocery & Gourmet Food; Pasta & Noodles; Noodles; Shirataki,"What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","What's important to you?: So how bad do you want to restrict your calories? I gain weight easily, so I'm obsessive about counting my calories, but many times I am left not feeling very full. I read about these noodles and decided to give them a t...","[what, 's, important, to, you, so, how, bad, do, you, want, to, restrict, your, calories, i, gain, weight, easily, so, i, 'm, obsessive, about, counting, my, calories, but, many, times, i, am, left, not, feeling, very, full, i, read, about, these...","[important, bad, want, restrict, calories, gain, weight, easily, obsessive, counting, calories, times, left, feeling, read, noodles, decided, try, bag, smell, like, fish, goes, away, rinsing, yes, texture, like, chewing, rubber, bands, tried, tri...","[important, bad, want, restrict, calorie, gain, weight, easily, obsessive, count, calorie, time, leave, feel, read, noodle, decide, try, bag, smell, like, fish, go, away, rinse, yes, texture, like, chew, rubber, band, try, trick, boil, 30, minute...",what 's important to you so how bad do you want to restrict your calories i gain weight easily so i 'm obsessive about counting my calories but many times i am left not feeling very full i read about these noodles and decided to give them a try o...,important bad want restrict calories gain weight easily obsessive counting calories times left feeling read noodles decided try bag smell like fish goes away rinsing yes texture like chewing rubber bands tried tricks boiling 30 minutes frying com...,important bad want restrict calorie gain weight easily obsessive count calorie time leave feel read noodle decide try bag smell like fish go away rinse yes texture like chew rubber band try trick boil 30 minute fry completely dry change texture g...
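Eyeballing `.head()` is a quick check; a stricter one is `pd.testing.assert_frame_equal`, which raises an `AssertionError` if any value, dtype, column, or index label differs. Because joblib pickles the DataFrame, `overall` keeps its float dtype (`3.0` above), unlike the JSON reload. A sketch on a toy frame; in the notebook the two arguments would be `df` and `loaded`:

```python
import os
import tempfile

import joblib
import pandas as pd

# Toy stand-in for the processed reviews frame (illustrative data only)
toy = pd.DataFrame({
    "overall": [3.0, 1.0],
    "tokens": [["odd", "chewy", "texture"], ["fishy", "gross"]],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "processed-reviews.joblib")
    joblib.dump(toy, path)
    loaded_toy = joblib.load(path)

# Raises AssertionError on any mismatch in values, dtypes, columns, or index
pd.testing.assert_frame_equal(toy, loaded_toy)
print("joblib round trip matches; overall dtype:", loaded_toy["overall"].dtype)
```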
