# Part 02A - NLP Preprocessing of Amazon Reviews (spaCy)

### Introduction

Preprocessing text data is a crucial step in any natural language processing (NLP) project: it ensures that the data is clean and ready for analysis. In this notebook, we prepare Amazon reviews for the "Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair" product using spaCy, an open-source NLP library. The preprocessing involves several key tasks that clean the text and transform it into a usable format.

#### Objectives

1. **Load and Inspect Data**:
   - Efficiently load the raw review files and metadata for the selected product.
   - Examine the structure and basic statistics of the dataset, identifying any missing values or duplicates.

2. **Clean the Text Data**:
   - Combine the review text and summary into a single comprehensive text column.
   - Remove unwanted elements such as HTML tags and URLs to ensure clean text.

3. **Preprocess Text with spaCy**:
   - Create a custom spaCy NLP pipeline to tokenize, remove stopwords, and lemmatize the text.
   - Generate multiple versions of the processed text for different analytical purposes.

4. **Save the Processed Data**:
   - Save the cleaned and preprocessed reviews in a JSON file for use in subsequent analysis and modeling steps.

By the end of this notebook, we will have a well-preprocessed set of Amazon reviews, ready for detailed analysis to extract valuable insights into customer preferences and sentiments.


### Amazon Data Intro

In [2]:
from IPython.display import display, Markdown
with open("data/Amazon Product Reviews.md") as f:
    info = f.read()

display(Markdown(info))

# Amazon Product Reviews

- URL: https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews 

## Description

This is a large crawl of product reviews from Amazon. This dataset contains 82.83 million unique reviews, from around 20 million users.

## Basic statistics

| Ratings:  | 82.83 million        |
| --------- | -------------------- |
| Users:    | 20.98 million        |
| Items:    | 9.35 million         |
| Timespan: | May 1996 - July 2014 |

## Metadata

- reviews and ratings
- item-to-item relationships (e.g. "people who bought X also bought Y")
- timestamps
- helpfulness votes
- product image (and CNN features)
- price
- category
- salesRank

## Example

```
{  "reviewerID": "A2SUAM1J3GNN3B",  "asin": "0000013714",  "reviewerName": "J. McDonald",  "helpful": [2, 3],  "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",  "overall": 5.0,  "summary": "Heavenly Highway Hymns",  "unixReviewTime": 1252800000,  "reviewTime": "09 13, 2009" }
```

## Download link

See the [Amazon Dataset Page](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/) for download information.

The 2014 version of this dataset is [also available](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html).

## Citation

Please cite the following if you use the data:

**Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering**

R. He, J. McAuley

*WWW*, 2016
[pdf](https://cseweb.ucsd.edu/~jmcauley/pdfs/www16a.pdf)

**Image-based recommendations on styles and substitutes**

J. McAuley, C. Targett, J. Shi, A. van den Hengel

*SIGIR*, 2015
[pdf](https://cseweb.ucsd.edu/~jmcauley/pdfs/sigir15.pdf)

In [3]:
import os, sys, joblib, json
# sys.path.append(os.path.abspath("../NLP/"))
# sys.path.append(os.path.abspath("../"))
# sys.path.append(os.path.abspath("../../"))
%load_ext autoreload
%autoreload 2
    
# import custom_functions as fn
# import project_functions as pf


In [4]:

%pip install -U dojo_ds
# %pip install dojo-ds==1.1.12
import dojo_ds as ds
ds.__version__

Note: you may need to restart the kernel to use updated packages.


'1.1.13'

In [5]:
import matplotlib.pyplot as plt
import missingno
import matplotlib as mpl
import seaborn as sns
import numpy as np
import pandas as pd

pd.set_option("display.max_columns",50)
# pd.set_option('display.max_colwidth', 250)

fav_style = ('ggplot','tableau-colorblind10')
fav_context  ={'context':'notebook', 'font_scale':1.1}
plt.style.use(fav_style)
sns.set_context(**fav_context)
plt.rcParams['savefig.transparent'] = False
plt.rcParams['savefig.bbox'] = 'tight'

In [6]:
from pprint import pprint
FPATHS_FILE = "config/filepaths.json"
with open(FPATHS_FILE) as f:
    FPATHS = json.load(f)
pprint(FPATHS)

{'data': {'app': {'product-metadata-llm_json': 'data/metadata/product-info.json',
                  'product-metadata_json': 'data/metadata/amazon-metadata_selected-asins-only.json',
                  'reviews-with-target-for-llm_csv': 'app-assets/reviews-for-llm.csv',
                  'vector-db_dir': './app-assets/reviews_db'},
          'cleaned': {'asin-id-title-dict_json': 'data/metadata/amazon-groceries-asin-titles-lookup.json',
                      'metadata_csv-gz': 'data/metadata/amazon-metadata-groceries-combined.csv.gz',
                      'reviews-by-years_dict': {'dir': 'data/reviews-by-year/',
                                                'glob': 'data/reviews-by-year/*.*'}},
          'ml-nlp': {'test_joblib': 'data/modeling/testing-data.joblib',
                     'train_joblib': 'data/modeling/training-data.joblib'},
          'nn-nlp': {'test_dir': 'data/modeling/testing-data-tf/',
                     'train_dir': 'data/modeling/training-data-tf/',
         

# Load the Data

We will load our **corpus** of Amazon Reviews for Miracle Noodle products.

In [7]:
import boto3
s3 = boto3.client('s3')
# List all files in the bucket
response = s3.list_objects_v2(Bucket=FPATHS['data']['s3']['bucket'])
# 'Contents' is missing from the response when the bucket is empty
for obj in response.get('Contents', []):
    print(obj['Key'])

Grocery_and_Gourmet_Food_5.json.gz
amazon-metadata-groceries-combined.csv.gz
amazon-reviews-subset-brand-Miracle Noodle.csv
meta_Grocery_and_Gourmet_Food.json.gz
processed-reviews.json
reviews-df-final.csv.gz
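
A single `list_objects_v2` call returns at most 1,000 keys, which is plenty for this bucket but would silently truncate a larger one. A minimal sketch of a paginated listing, assuming a boto3-style S3 client is passed in; the helper name `list_bucket_keys` is hypothetical and not part of the notebook:

```python
def list_bucket_keys(client, bucket):
    """Return every object key in the bucket, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent from a page that holds no objects
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys

# Usage with a real client (requires AWS credentials):
#   import boto3
#   keys = list_bucket_keys(boto3.client("s3"), FPATHS['data']['s3']['bucket'])
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object when no credentials are available.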


In [8]:
fpath_reviews = FPATHS['data']['subset']['reviews-subset_selected-brand_csv']
fpath_reviews

'data/subset/amazon-reviews-subset-brand-Miracle Noodle.csv'

In [9]:
# s3.download_file(FPATHS['data']['s3']['bucket'], 'reviews-df-final.csv.gz', fpath_reviews)

In [10]:
df = pd.read_csv(fpath_reviews)#'data/subset/amazon-reviews-subset-brand-Miracle Noodle.csv.gz')
df.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category
0,B007JINB0W,A2RQQKUDKUPUO9,I was reading reviews on this product and was ...,I was reading reviews on this product and was ...,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ..."
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would no...,Working on the low carb lifestyle and would no...,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ..."
2,B007JINB0W,A28C1309S1WFLR,I followed the directions other people posted ...,Not Bad,4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ..."
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The...,MUST have for a pasta fanatic on a low/lower c...,5.0,2012,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ..."
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They...",They have no bad flavor and provide the rice/n...,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ..."


In [11]:
df.isna().sum()

asin          0
reviewerID    0
reviewText    0
summary       0
overall       0
year          0
title         0
brand         0
category      0
dtype: int64

In [12]:
# Check for duplicate reviews (same reviewer and same review text)
df.duplicated(subset=['reviewerID','reviewText']).sum()

0

In [13]:
df.shape

(4363, 9)
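
The inspection steps above (missing values, duplicates, shape) can be bundled into one helper for reuse on other product subsets. A minimal sketch, assuming pandas; `summarize_quality` and the toy `demo` frame are hypothetical and only illustrate the idea:

```python
import pandas as pd

def summarize_quality(df, subset=None):
    """Return row/column counts, total missing cells, and duplicate rows."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "n_missing": int(df.isna().sum().sum()),
        # Rows that repeat an earlier row on the `subset` columns
        "n_duplicates": int(df.duplicated(subset=subset).sum()),
    }

# Toy data standing in for the reviews DataFrame
demo = pd.DataFrame({
    "reviewerID": ["A1", "A1", "B2"],
    "reviewText": ["good", "good", None],
})
report = summarize_quality(demo, subset=["reviewerID", "reviewText"])
```

On the notebook's data the equivalent call would be `summarize_quality(df, subset=['reviewerID', 'reviewText'])`.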

### Combine All Review Text

- Each review has two parts: `reviewText`, which holds the body of the review, and `summary`, a one-line headline that often restates the star rating (e.g., "Four stars - best vacuum").

In [14]:
df['review-text-full'] = df['summary'] + ": " + df['reviewText']
df.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full
0,B007JINB0W,A2RQQKUDKUPUO9,I was reading reviews on this product and was ...,I was reading reviews on this product and was ...,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",I was reading reviews on this product and was ...
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would no...,Working on the low carb lifestyle and would no...,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Working on the low carb lifestyle and would no...
2,B007JINB0W,A28C1309S1WFLR,I followed the directions other people posted ...,Not Bad,4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Not Bad: I followed the directions other peopl...
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The...,MUST have for a pasta fanatic on a low/lower c...,5.0,2012,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",MUST have for a pasta fanatic on a low/lower c...
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They...",They have no bad flavor and provide the rice/n...,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",They have no bad flavor and provide the rice/n...
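
The concatenation above is safe here because neither column has missing values. In pandas, adding a string to `NaN` yields `NaN`, so on another subset a review with an empty summary would lose its text entirely. A defensive sketch, assuming pandas; the helper name `combine_text` is hypothetical:

```python
import pandas as pd

def combine_text(summary, text, sep=": "):
    """Join summary and body; fall back to whichever part is present."""
    summary = summary.fillna("").str.strip()
    text = text.fillna("").str.strip()
    both = (summary != "") & (text != "")
    # Only insert the separator when both parts have content
    return (summary + sep + text).where(both, summary + text)

# Toy data with one missing summary and one missing body
demo = pd.DataFrame({
    "summary": ["Not Bad", None, "One Star"],
    "reviewText": ["rinse longer than stated", "tastes fine", None],
})
demo["review-text-full"] = combine_text(demo["summary"], demo["reviewText"])
```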


### Removing HTML and URLs (Originally From Notebook 6B)

In [15]:
df['review-text-full_raw'] = df['review-text-full'].copy()
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A2RQQKUDKUPUO9,I was reading reviews on this product and was ...,I was reading reviews on this product and was ...,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",I was reading reviews on this product and was ...,I was reading reviews on this product and was ...
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would no...,Working on the low carb lifestyle and would no...,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Working on the low carb lifestyle and would no...,Working on the low carb lifestyle and would no...
2,B007JINB0W,A28C1309S1WFLR,I followed the directions other people posted ...,Not Bad,4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Not Bad: I followed the directions other peopl...,Not Bad: I followed the directions other peopl...
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The...,MUST have for a pasta fanatic on a low/lower c...,5.0,2012,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",MUST have for a pasta fanatic on a low/lower c...,MUST have for a pasta fanatic on a low/lower c...
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They...",They have no bad flavor and provide the rice/n...,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",They have no bad flavor and provide the rice/n...,They have no bad flavor and provide the rice/n...
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,AL3Q8HIANLRKJ,"These are my favorite flavor, my friends agree...",Best Noods,5.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","Best Noods: These are my favorite flavor, my f...","Best Noods: These are my favorite flavor, my f..."
4359,B007JINB0W,A1VCFDBW9W5O99,Opened the box to find one of the packages was...,One Star,1.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",One Star: Opened the box to find one of the pa...,One Star: Opened the box to find one of the pa...
4360,B007JINB0W,A361G13N6TQPKS,Smelled so bad I threw it out please don't sen...,One Star,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",One Star: Smelled so bad I threw it out please...,One Star: Smelled so bad I threw it out please...
4361,B007JINB0W,A1U51MX13ZIBT0,this stuff is awful feels like rubber in your...,this stuff is awful feels like rubber in your,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",this stuff is awful feels like rubber in your:...,this stuff is awful feels like rubber in your:...


In [16]:
# Checking for links
df.loc[df['review-text-full'].str.contains('http')]


Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
1322,B007JINB0W,A3J6ABN4ZOG502,http://www.amazon.com/gp/product/B007JINB0W?re...,http: //www. amazon. com/gp/product/B007JINB0W?,5.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",http: //www. amazon. com/gp/product/B007JINB0W...,http: //www. amazon. com/gp/product/B007JINB0W...
1792,B007JINB0W,A25Y0KLV7I19FA,"<div id=""video-block-R2QVYQA389CT7S"" class=""a-...",Family love it !!!,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","Family love it !!!: <div id=""video-block-R2QVY...","Family love it !!!: <div id=""video-block-R2QVY..."
2309,B007JINB0W,A1VDTM4ITCSHQ8,We have eaten shirataki noodles for many years...,Great alternative to heavy strarches!,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Great alternative to heavy strarches!: We have...,Great alternative to heavy strarches!: We have...
3279,B007JINB0W,A25ZES0OTED0S5,"This stuff is repugnant. I cooked the ""Fettuc...",Disgusting,1.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Disgusting: This stuff is repugnant. I cooked...,Disgusting: This stuff is repugnant. I cooked...
3311,B007JINB0W,A2PIOAUQSBG074,I used to buy yam noodles in the local asian m...,Hard to chew....,2.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Hard to chew....: I used to buy yam noodles in...,Hard to chew....: I used to buy yam noodles in...
3808,B007JINB0W,A162S75UMDTC,I first heard about these Shirataki noodles on...,surprisingly decent,4.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",surprisingly decent: I first heard about these...,surprisingly decent: I first heard about these...


In [17]:
# Checking for raw html
df.loc[df['review-text-full_raw'].str.contains('<')]

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
25,B007JINB0W,A14A4YYKPLYY26,"Earlier this year, I started a wheat-free and ...","I Can Have Noodles Again! Now, If Only There C...",5.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","I Can Have Noodles Again! Now, If Only There C...","I Can Have Noodles Again! Now, If Only There C..."
490,B007JINB0W,A237SW9SPH1DAD,"Holly guacamole, I love these things! Follow i...","These make your plate ""full"" and plenty.",5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","These make your plate ""full"" and plenty.: Holl...","These make your plate ""full"" and plenty.: Holl..."
628,B007JINB0W,A2M9IS41H1HJAI,Quick update on 11/21/14\n\nJust started putti...,"Follow the directions, and these will be reall...",5.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","Follow the directions, and these will be reall...","Follow the directions, and these will be reall..."
1232,B007JINB0W,A14A4YYKPLYY26,"When I decided to buy this&nbsp;<a data-hook=""...",Meh! Disappointing..............Tastes NOTHING...,2.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Meh! Disappointing..............Tastes NOTHING...,Meh! Disappointing..............Tastes NOTHING...
1383,B007JINB0W,A3UEE22RNGQ2L8,This product has seriously changed my LIFE. I ...,"ZERO CALORIES, ZERO CARBS and EXACTLY like spa...",5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","ZERO CALORIES, ZERO CARBS and EXACTLY like spa...","ZERO CALORIES, ZERO CARBS and EXACTLY like spa..."
1792,B007JINB0W,A25Y0KLV7I19FA,"<div id=""video-block-R2QVYQA389CT7S"" class=""a-...",Family love it !!!,5.0,2018,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","Family love it !!!: <div id=""video-block-R2QVY...","Family love it !!!: <div id=""video-block-R2QVY..."
2035,B007JINB0W,A14A4YYKPLYY26,"Earlier this year, I started a wheat-free and ...","I Can Have Noodles Again! Now, If Only There C...",4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","I Can Have Noodles Again! Now, If Only There C...","I Can Have Noodles Again! Now, If Only There C..."
2163,B007JINB0W,AN79B2EUCG5O,bought the variety pack... the rice and the an...,Be prepaired to experement to find the best wa...,3.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Be prepaired to experement to find the best wa...,Be prepaired to experement to find the best wa...
2471,B007JINB0W,A1NF7CRBZD2AF8,"Great product.<a data-hook=""product-link-linke...",Great product.,5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","Great product.: Great product.<a data-hook=""pr...","Great product.: Great product.<a data-hook=""pr..."
2885,B007JINB0W,A1FFJRP833Y1MH,We love all the Miracle noodles but the&nbsp;<...,Delicious!!,5.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Delicious!!: We love all the Miracle noodles b...,Delicious!!: We love all the Miracle noodles b...


### Remove HTML Tags

In [18]:
import re

# Regular expression to match HTML tags
regex_html = r"<[^>]*>"

# Apply the regex to the DataFrame column using str.replace
df['review-text-full'] = df['review-text-full'].str.replace(regex_html, '', regex=True)
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A2RQQKUDKUPUO9,I was reading reviews on this product and was ...,I was reading reviews on this product and was ...,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",I was reading reviews on this product and was ...,I was reading reviews on this product and was ...
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would no...,Working on the low carb lifestyle and would no...,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Working on the low carb lifestyle and would no...,Working on the low carb lifestyle and would no...
2,B007JINB0W,A28C1309S1WFLR,I followed the directions other people posted ...,Not Bad,4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",Not Bad: I followed the directions other peopl...,Not Bad: I followed the directions other peopl...
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The...,MUST have for a pasta fanatic on a low/lower c...,5.0,2012,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",MUST have for a pasta fanatic on a low/lower c...,MUST have for a pasta fanatic on a low/lower c...
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They...",They have no bad flavor and provide the rice/n...,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",They have no bad flavor and provide the rice/n...,They have no bad flavor and provide the rice/n...
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,AL3Q8HIANLRKJ,"These are my favorite flavor, my friends agree...",Best Noods,5.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...","Best Noods: These are my favorite flavor, my f...","Best Noods: These are my favorite flavor, my f..."
4359,B007JINB0W,A1VCFDBW9W5O99,Opened the box to find one of the packages was...,One Star,1.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",One Star: Opened the box to find one of the pa...,One Star: Opened the box to find one of the pa...
4360,B007JINB0W,A361G13N6TQPKS,Smelled so bad I threw it out please don't sen...,One Star,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",One Star: Smelled so bad I threw it out please...,One Star: Smelled so bad I threw it out please...
4361,B007JINB0W,A1U51MX13ZIBT0,this stuff is awful feels like rubber in your...,this stuff is awful feels like rubber in your,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shiratak...",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', ...",this stuff is awful feels like rubber in your:...,this stuff is awful feels like rubber in your:...


In [19]:
# Compare original with cleaned
compare_cols = ['review-text-full_raw','review-text-full']

pd.set_option('display.max_colwidth',250)

In [20]:
df.loc[df['review-text-full_raw'].str.contains('<'), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
25,"I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" cla...","I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;Wheat Belly&nbsp;diet, and among the m..."
490,"These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ...","These make your plate ""full"" and plenty.: Holly guacamole, I love these things! Follow instructions and get creative with spices and sauces. The ""funky"" smell so many have mentioned is no big deal and goes away. Texture is good, specially if you ..."
628,"Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t...","Follow the directions, and these will be really really good.: Quick update on 11/21/14\n\nJust started putting Old Bay Seasoning in the water that I boil these in. Seems to add some flavor to them but it also changes the color. Looks very close t..."
1232,"Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Rice/dp/B00BP36S7U/ref=cm_cr_arp_d_rvw_txt?ie=UTF8"">Miracle Noodle Ri...","Meh! Disappointing..............Tastes NOTHING Like Real Rice!!!: When I decided to buy this&nbsp;Miracle Noodle Rice&nbsp;I did so, after having bought the&nbsp;Miracle Noodle Angel Hair Pasta&nbsp;and ABSOLUTELY LOVING it. Because I am on a&nbs..."
1383,"ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ...","ZERO CALORIES, ZERO CARBS and EXACTLY like spaghetti. Miracle noodles, indeed. Changed my life!: This product has seriously changed my LIFE. I fight every day to keep my weight at its current level, and I simply must avoid carbs. The conflict is ..."
1792,"Family love it !!!: <div id=""video-block-R2QVYQA389CT7S"" class=""a-section a-spacing-small a-spacing-top-mini video-block""></div><input type=""hidden"" name="""" value=""https://images-na.ssl-images-amazon.com/images/I/91E2G7ukhBS.mp4"" class=""video-url...",Family love it !!!: &nbsp;Love this stuff !!!! Guilt Free perfect if your in a weight loss journey like I am!!! Easy to cook !!!! Will order more
2035,"I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;<a data-hook=""product-link-linked"" cla...","I Can Have Noodles Again! Now, If Only There Could Be a Similar Zero-Carb or Low-Carb Equivalent for Bagels & Crusty Baguettes!: Earlier this year, I started a wheat-free and low-carb, mostly grain-free&nbsp;Wheat Belly&nbsp;diet, and among the m..."
2163,"Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel...","Be prepaired to experement to find the best way to eat them.: bought the variety pack... the rice and the angle hair are ok. I think for me they are thin/small enough to not be a substantial part of a bite, so less contribution to the mouth feel..."
2471,"Great product.: Great product.<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Zero-Carb-Gluten-Free-Shirataki-Pasta-and-Rice-6-bag-Variety-Pack-44-ounces-Includes-2-Shirataki-Angel-Hair-2-Shirataki-Rice-and-2-Shirat...","Great product.: Great product.Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta and Rice, 6 bag Variety Pack, 44 ounces (Includes: 2 Shirataki Angel Hair, 2 Shirataki Rice and 2 Shirataki Fettuccini)"
2885,"Delicious!!: We love all the Miracle noodles but the&nbsp;<a data-hook=""product-link-linked"" class=""a-link-normal"" href=""/Miracle-Noodle-Shirataki-Zero-Carb-Gluten-Free-Pasta-Garlic-and-Herb-Fettuccini-7-Ounce/dp/B01N91YE5A/ref=cm_cr_arp_d_rvw_tx...","Delicious!!: We love all the Miracle noodles but the&nbsp;Miracle Noodle Shirataki Zero Carb Gluten Free Pasta, Garlic and Herb Fettuccini, 7 Ounce&nbsp;is especially great. Marry it to some canned clams and basil and herbs fettuccine sauce and ..."
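
The cleaned column above still contains HTML entities such as `&nbsp;`, because the tag regex only matches `<...>` markup. One option is to decode entities with the standard library's `html.unescape` after stripping tags. A minimal sketch; the function name `strip_html` is hypothetical, and replacing tags with a space and collapsing whitespace are added assumptions rather than what the notebook does:

```python
import html
import re

TAG_RE = re.compile(r"<[^>]*>")

def strip_html(text):
    """Remove tags, decode entities, and collapse runs of whitespace."""
    # Replace tags with a space so adjacent words are not glued together
    no_tags = TAG_RE.sub(" ", text)
    # '&nbsp;' becomes a non-breaking space, '&amp;' becomes '&'
    decoded = html.unescape(no_tags)
    # \s also matches the non-breaking space produced above
    return re.sub(r"\s+", " ", decoded).strip()

sample = (
    'We love the&nbsp;<a class="a-link-normal" href="/dp/B01N91YE5A">'
    "Garlic and Herb Fettuccini</a>&nbsp;the most."
)
cleaned = strip_html(sample)
```

Applied to the notebook's data, this would look like `df['review-text-full'].apply(strip_html)`.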


### Replace Links with `[LINK]`

In [21]:
# Raw string avoids invalid-escape warnings; `/` needs no escaping in Python regex
regex_url = r"https?://(?:www\.)?\S+"
df.loc[df['review-text-full'].str.contains(regex_url), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
1322,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37
2309,"Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles....","Great alternative to heavy strarches!: We have eaten shirataki noodles for many years because of my husbands diabetes, but this was our first time trying the shirataki Miracle Rice, and it was fantastic! (I actually like it more than the noodles...."
3279,"Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-...","Disgusting: This stuff is repugnant. I cooked the ""Fettuccine"" noodles exactly as specified on the Miracle Noodle website, https://www.miraclenoodle.com/t-how-to-cook-shirataki-noodles.aspx - to summarize:\n\n1. Remove from package, rinse for 1-..."
3311,"Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=...","Hard to chew....: I used to buy yam noodles in the local asian market. I love them and wanted to find them on Amazon, and they are (http://www.amazon.com/JFC-Brown-Shirataki-Yam-Noodles/dp/B002FDW6H0/ref=sr_1_cc_3?s=aps&ie=UTF8&qid=1395427420&sr=..."
3808,"surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit...","surprisingly decent: I first heard about these Shirataki noodles on an episode of BEGIN Japanology dealing with potatos: https://www.youtube.com/watch?v=FPwbbdo2p6c\n\nSeemed too good to be true - a food product that's almost entirely fiber, wit..."
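
The URL pattern matches every non-whitespace character after the scheme, so a link written inside parentheses (as in row 3311) takes the closing `)` with it. A sketch of one way to keep trailing punctuation outside the placeholder, using only the standard library; `replace_links` and the `TRAILING` character set are hypothetical choices, not part of the notebook:

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Characters treated as sentence punctuation when they end a URL (an assumption)
TRAILING = ".,;:!?)]}'\""

def replace_links(text, token="[LINK]"):
    """Swap each URL for a placeholder, keeping trailing punctuation."""
    def _sub(match):
        url = match.group(0)
        core = url.rstrip(TRAILING)
        # Re-attach whatever punctuation was trimmed from the end
        return token + url[len(core):]
    return URL_RE.sub(_sub, text)

sample = (
    "they are (http://www.amazon.com/dp/B002FDW6H0). "
    "See https://www.youtube.com/watch?v=FPwbbdo2p6c too"
)
result = replace_links(sample)
```

With pandas, the same function can be applied row by row via `df['review-text-full'].apply(replace_links)`.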


In [22]:
df['review-text-full'] = df['review-text-full'].str.replace(regex_url, '[LINK]', regex=True)
df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw
0,B007JINB0W,A2RQQKUDKUPUO9,"I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get over.",I was reading reviews on this product and was so ...,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o..."
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.
2,B007JINB0W,A28C1309S1WFLR,"I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories",Not Bad,4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories"
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook these in the sauce/ingredients you are making your dish wit...,MUST have for a pasta fanatic on a low/lower carb diet,5.0,2012,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...,MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All products are the same, just cut into rice or noodle shape.",They have no bad flavor and provide the rice/noodle experience,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ...","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ..."
...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,AL3Q8HIANLRKJ,"These are my favorite flavor, my friends agree and love it.",Best Noods,5.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","Best Noods: These are my favorite flavor, my friends agree and love it.","Best Noods: These are my favorite flavor, my friends agree and love it."
4359,B007JINB0W,A1VCFDBW9W5O99,Opened the box to find one of the packages was open and had leaked. Will not be buying again.,One Star,1.0,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",One Star: Opened the box to find one of the packages was open and had leaked. Will not be buying again.,One Star: Opened the box to find one of the packages was open and had leaked. Will not be buying again.
4360,B007JINB0W,A361G13N6TQPKS,Smelled so bad I threw it out please don't send me anymore communications please,One Star,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",One Star: Smelled so bad I threw it out please don't send me anymore communications please,One Star: Smelled so bad I threw it out please don't send me anymore communications please
4361,B007JINB0W,A1U51MX13ZIBT0,this stuff is awful feels like rubber in your mouth,this stuff is awful feels like rubber in your,1.0,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",this stuff is awful feels like rubber in your: this stuff is awful feels like rubber in your mouth,this stuff is awful feels like rubber in your: this stuff is awful feels like rubber in your mouth


In [23]:
df.loc[df['review-text-full'].str.contains('http'), compare_cols]

Unnamed: 0,review-text-full_raw,review-text-full
1322,http: //www. amazon. com/gp/product/B007JINB0W?: http://www.amazon.com/gp/product/B007JINB0W?redirect=true&ref_=cm_cr_ryp_prd_ttl_sol_37,http: //www. amazon. com/gp/product/B007JINB0W?: [LINK]
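
The URL masking shown above can be reproduced with Python's standard `re` module. This is a minimal sketch: the pattern `URL_PATTERN` and the helper `mask_urls` are assumptions for illustration and are not the `regex_url` defined earlier in this notebook.

```python
import re

# Hypothetical pattern: "http://" or "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def mask_urls(text, placeholder="[LINK]"):
    """Replace every URL-like substring in `text` with a placeholder."""
    return URL_PATTERN.sub(placeholder, text)


sample = "See http://www.amazon.com/gp/product/B007JINB0W?redirect=true for details."
print(mask_urls(sample))
# See [LINK] for details.
```

A pattern of this kind leaves a space-separated fragment such as `http: //www. amazon. com` untouched, which is consistent with the row above still matching `str.contains('http')` after masking.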


## Part 2) Spacy Preprocessing for EDA

**1) Data Preprocessing:**

- Load and inspect the dataset.
    - How many reviews?
    - What does the distribution of ratings look like?
    - Any null values?



- Use the rating column to create a new target column with two groups: high-rating and low-rating groups.
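    - We recommend defining "High-rating" reviews as any review with a rating >=9 and "Low-rating" reviews as any review with a rating <=4; ratings strictly between 4 and 9 are excluded from the analysis. These cutoffs assume a 10-point scale, whereas the `overall` column in this dataset uses 1-5 stars, so they need to be rescaled before use.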
    - You may use an alternative definition for High and Low reviews, but justify your choice in your notebook/README.



- Utilize NLTK and SpaCy for basic text processing, including:

    - removing stopwords
    - tokenization
    - lemmatization
    - Tips:
        - Be sure to create a custom nlp object and disable the named entity recognizer. Otherwise, processing will take a very long time!
        - **You will want to create several versions of the data: tokenized; lemmatized; tokenized and joined back to one string per review; and lemmatized and joined back to one string per review.** This will be useful for different analysis and modeling techniques.

    

- Save your processed data frame in a **joblib** file saved in the "Data-NLP/" folder for future modeling.
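
The high/low grouping described in the list above can be sketched as a small helper. This is a minimal sketch, not the notebook's actual code: the function name `label_rating_group` and the default cutoffs (`low_max=2.0`, `high_min=4.0`, chosen here for a 1-5 star `overall` column) are assumptions for illustration.

```python
def label_rating_group(rating, low_max=2.0, high_min=4.0):
    """Map a numeric rating to 'low', 'high', or None (excluded middle).

    The default cutoffs are hypothetical values for a 1-5 star scale.
    """
    # Treat missing values (None or NaN) as unlabeled.
    if rating is None or rating != rating:
        return None
    if rating >= high_min:
        return "high"
    if rating <= low_max:
        return "low"
    return None


ratings = [1.0, 4.0, 3.0, 5.0, 2.0]
groups = [label_rating_group(r) for r in ratings]
print(groups)
# ['low', 'high', None, 'high', 'low']
```

With a DataFrame, the same helper could be applied as `df['overall'].map(label_rating_group)`, after which rows with a missing group can be dropped.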

    

In [24]:
# import spacy
# # Disable parser and ner
# nlp_light = spacy.load("en_core_web_sm", disable=['parser','ner'])
# # Print active components
# nlp_light.pipe_names

In [25]:
import spacy
# Custom NLP Object
nlp_custom = ds.nlp.make_custom_nlp(
    disable=['ner'],  # 'parser' is left enabled
    contractions=[],
    stopwords_to_add=["★"],
)
nlp_custom

<spacy.lang.en.English at 0x31fd08450>

> Changed review_text column to remove HTML and URLs as of 01/22/24

In [26]:
df['review-text-full']

0       I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...
1                                                          Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.
2            Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories
3       MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both.  If you follow t

In [27]:

ds.show_code(ds.nlp.batch_preprocess_texts)

```python
def batch_preprocess_texts(
	texts,
	nlp=None,
	remove_stopwords=True,
	remove_punct=True,
	use_lemmas=False,
	disable=["ner"],
	batch_size=50,
	n_process=-1,
):
	"""Efficiently preprocess a collection of texts using nlp.pipe()

	Args:
		texts (collection of strings): Collection of texts to process (e.g. df['text'])
		nlp (spacy pipe, optional): Spacy nlp pipe. Defaults to None; if None, it creates a default 'en_core_web_sm' pipe.
		remove_stopwords (bool, optional): Controls stopword removal. Defaults to True.
		remove_punct (bool, optional): Controls punctuation removal. Defaults to True.
		use_lemmas (bool, optional): Lemmatize tokens. Defaults to False.
		disable (list of strings, optional): Named pipeline elements to disable. Defaults to ["ner"]. Passed to nlp.pipe(disable=disable).
		batch_size (int, optional): Number of texts to process in a batch. Defaults to 50.
		n_process (int, optional): Number of CPU processors to use. Defaults to -1 (meaning all CPU cores).

	Returns:
		list of lists of str: One list of processed tokens (or lemmas) per input text.
	"""
	# from tqdm.notebook import tqdm
	from tqdm import tqdm
	if nlp is None:
		import spacy
		nlp = spacy.load("en_core_web_sm")
	processed_texts = []
	for doc in tqdm(nlp.pipe(texts, disable=disable, batch_size=batch_size, n_process=n_process)):
		tokens = []
		for token in doc:
			# Skip stopwords when stopword removal is enabled
			if remove_stopwords and token.is_stop:
				continue
			# Skip punctuation when punctuation removal is enabled
			if remove_punct and token.is_punct:
				continue
			# Skip whitespace tokens (also controlled by remove_punct)
			if remove_punct and token.is_space:
				continue
			
			## Determine final form of output list of tokens/lemmas
			if use_lemmas:
				tokens.append(token.lemma_.lower())
			else:
				tokens.append(token.text.lower())
		processed_texts.append(tokens)
	return processed_texts

```
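
The skip rules in the function above can be tried without loading a spaCy model. The sketch below is a stand-in under stated assumptions: `ToyToken` and `filter_tokens` are hypothetical names, and the `is_stop`/`is_punct`/`is_space` flags are set by hand rather than computed by spaCy.

```python
from dataclasses import dataclass


@dataclass
class ToyToken:
    """Hypothetical stand-in for a spaCy token with only the attributes used here."""
    text: str
    lemma_: str
    is_stop: bool = False
    is_punct: bool = False
    is_space: bool = False


def filter_tokens(doc, remove_stopwords=True, remove_punct=True, use_lemmas=False):
    """Apply the same skip rules as batch_preprocess_texts to one document."""
    kept = []
    for token in doc:
        if remove_stopwords and token.is_stop:
            continue
        if remove_punct and (token.is_punct or token.is_space):
            continue
        kept.append((token.lemma_ if use_lemmas else token.text).lower())
    return kept


doc = [
    ToyToken("These", "these", is_stop=True),
    ToyToken("noodles", "noodle"),
    ToyToken("smelled", "smell"),
    ToyToken("bad", "bad"),
    ToyToken("!", "!", is_punct=True),
]

dirty = filter_tokens(doc, remove_stopwords=False)  # analogous to "tokens-dirty"
tokens = filter_tokens(doc)                         # analogous to "tokens"
lemmas = filter_tokens(doc, use_lemmas=True)        # analogous to "lemmas"
print(dirty, tokens, lemmas, sep="\n")
```

The three calls mirror the three flag combinations that the notebook uses for its `tokens-dirty`, `tokens`, and `lemmas` columns.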

In [28]:
%%time
print(f"- Running full spacy preprocessing code (this will take several minutes).")
df = df.copy()
df["tokens-dirty"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=False,
    remove_punct=True,
    use_lemmas=False,
    nlp=nlp_custom,
)
df["tokens"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=True,
    remove_punct=True,
    use_lemmas=False,
    nlp=nlp_custom,
)
df["lemmas"] = ds.nlp.batch_preprocess_texts(
    df["review-text-full"],
    remove_stopwords=True,
    remove_punct=True,
    use_lemmas=True,
    nlp=nlp_custom,
)


- Running full spacy preprocessing code (this will take several minutes).


4363it [00:11, 367.78it/s] 
4363it [00:11, 390.77it/s] 
4363it [00:11, 387.08it/s] 

CPU times: user 9.56 s, sys: 1.39 s, total: 11 s
Wall time: 34.3 s







In [31]:

## Make string versions of processed text
df["tokens-dirty-joined"] = df["tokens-dirty"].map(lambda x: " ".join(x))
df["tokens-joined"] = df["tokens"].map(lambda x: " ".join(x))
df["lemmas-joined"] = df["lemmas"].map(lambda x: " ".join(x))

df.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A2RQQKUDKUPUO9,"I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get over.",I was reading reviews on this product and was so ...,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...","[i, was, reading, reviews, on, this, product, and, was, so, i, was, reading, reviews, on, this, product, and, was, so, excited, but, i, gagged, when, i, tried, them, ill, take, regular, pasta, any, day, over, this, weird, stuff, so, gross, the, t...","[reading, reviews, product, reading, reviews, product, excited, gagged, tried, ill, regular, pasta, day, weird, stuff, gross, texture, hard]","[read, review, product, read, review, product, excited, gag, try, ill, regular, pasta, day, weird, stuff, gross, texture, hard]",i was reading reviews on this product and was so i was reading reviews on this product and was so excited but i gagged when i tried them ill take regular pasta any day over this weird stuff so gross the texture is very very hard to get over,reading reviews product reading reviews product excited gagged tried ill regular pasta day weird stuff gross texture hard,read review product read review product excited gag try ill regular pasta day weird stuff gross texture hard
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,"[working, on, the, low, carb, lifestyle, and, would, not, make, working, on, the, low, carb, lifestyle, and, would, not, make, it, without, this, product, definitely, worth, the, price, and, will, definitely, purchase, again]","[working, low, carb, lifestyle, working, low, carb, lifestyle, product, definitely, worth, price, definitely, purchase]","[work, low, carb, lifestyle, work, low, carb, lifestyle, product, definitely, worth, price, definitely, purchase]",working on the low carb lifestyle and would not make working on the low carb lifestyle and would not make it without this product definitely worth the price and will definitely purchase again,working low carb lifestyle working low carb lifestyle product definitely worth price definitely purchase,work low carb lifestyle work low carb lifestyle product definitely worth price definitely purchase
2,B007JINB0W,A28C1309S1WFLR,"I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories",Not Bad,4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories","[not, bad, i, followed, the, directions, other, people, posted, rinse, longer, than, stated, and, cook, at, least, 5, 6, minutes, i, add, these, to, my, pasta, sauce, early, afternoon, so, that, they, have, a, chance, to, absorb, as, much, flavor...","[bad, followed, directions, people, posted, rinse, longer, stated, cook, 5, 6, minutes, add, pasta, sauce, early, afternoon, chance, absorb, flavor, bad, calories]","[bad, follow, direction, people, post, rinse, long, state, cook, 5, 6, minute, add, pasta, sauce, early, afternoon, chance, absorb, flavor, bad, calorie]",not bad i followed the directions other people posted rinse longer than stated and cook at least 5 6 minutes i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can not bad and no calories,bad followed directions people posted rinse longer stated cook 5 6 minutes add pasta sauce early afternoon chance absorb flavor bad calories,bad follow direction people post rinse long state cook 5 6 minute add pasta sauce early afternoon chance absorb flavor bad calorie
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook these in the sauce/ingredients you are making your dish wit...,MUST have for a pasta fanatic on a low/lower carb diet,5.0,2012,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...,MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...,"[must, have, for, a, pasta, fanatic, on, a, low, lower, carb, diet, i, have, tried, soooo, many, pasta, substitutes, the, promises, were, always, hollow, and, the, wanna, be, pastas, were, always, gross, in, taste, or, texture, or, both, if, you,...","[pasta, fanatic, low, lower, carb, diet, tried, soooo, pasta, substitutes, promises, hollow, wanna, pastas, gross, taste, texture, follow, instructions, cook, sauce, ingredients, making, dish, good, smell, open, rinse, goes, away, huge, fan, hone...","[pasta, fanatic, low, low, carb, diet, try, soooo, pasta, substitute, promise, hollow, wanna, pasta, gross, taste, texture, follow, instruction, cook, sauce, ingredient, make, dish, good, smell, open, rinse, go, away, huge, fan, honestly, feel, d...",must have for a pasta fanatic on a low lower carb diet i have tried soooo many pasta substitutes the promises were always hollow and the wanna be pastas were always gross in taste or texture or both if you follow the instructions and cook these i...,pasta fanatic low lower carb diet tried soooo pasta substitutes promises hollow wanna pastas gross taste texture follow instructions cook sauce ingredients making dish good smell open rinse goes away huge fan honestly feel deprived substitute sta...,pasta fanatic low low carb diet try soooo pasta substitute promise hollow wanna pasta gross taste texture follow instruction cook sauce ingredient make dish good smell open rinse go away huge fan honestly feel deprive substitute staple diet
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All products are the same, just cut into rice or noodle shape.",They have no bad flavor and provide the rice/noodle experience,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ...","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ...","[they, have, no, bad, flavor, and, provide, the, rice, noodle, experience, follow, the, instructions, rinse, and, boil, they, have, no, bad, flavor, and, provide, the, rice, noodle, experience, if, they, were, n't, so, pricey, i, 'd, start, buyin...","[bad, flavor, provide, rice, noodle, experience, follow, instructions, rinse, boil, bad, flavor, provide, rice, noodle, experience, pricey, start, buying, bulk, immediately, products, cut, rice, noodle, shape]","[bad, flavor, provide, rice, noodle, experience, follow, instruction, rinse, boil, bad, flavor, provide, rice, noodle, experience, pricey, start, buy, bulk, immediately, product, cut, rice, noodle, shape]",they have no bad flavor and provide the rice noodle experience follow the instructions rinse and boil they have no bad flavor and provide the rice noodle experience if they were n't so pricey i 'd start buying in bulk immediately all products are...,bad flavor provide rice noodle experience follow instructions rinse boil bad flavor provide rice noodle experience pricey start buying bulk immediately products cut rice noodle shape,bad flavor provide rice noodle experience follow instruction rinse boil bad flavor provide rice noodle experience pricey start buy bulk immediately product cut rice noodle shape


## Save Preprocessed Reviews

### Saving a JSON file

In [32]:
# df = df.set_index("review_id")#, errors='ignore')
# df

In [33]:
# fpath_json = "Data-NLP/processed-nlp-data.json"
fpath_json = FPATHS['data']['processed-nlp']['processed-reviews-spacy_json']
fpath_json

'data/processed/processed-reviews.json'

In [34]:
df.head(2).to_json(orient='index')

'{"0":{"asin":"B007JINB0W","reviewerID":"A2RQQKUDKUPUO9","reviewText":"I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get over.","summary":"I was reading reviews on this product and was so ...","overall":1.0,"year":2014,"title":"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)","brand":"Miracle Noodle","category":"[\'Grocery & Gourmet Food\', \'Pasta & Noodles\', \'Noodles\', \'Shirataki\']","review-text-full":"I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get over.","review-text-full_raw":"I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagg

In [35]:
# Save the full DataFrame to JSON (pandas' default orient='columns')
df.to_json(fpath_json)

In [36]:
# Upload the JSON file to S3, using its base file name as the object key
s3.upload_file(fpath_json, FPATHS['data']['s3']['bucket'], os.path.basename(fpath_json))
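
The upload call above returns nothing on success, so a follow-up check can be useful. The sketch below assumes `s3` is a boto3 S3 client (the notebook does not show how it was created); `upload_and_verify` is a hypothetical helper name, not part of the original notebook. It uploads the file and then compares the stored object's size with the local file size.

```python
import os


def upload_and_verify(client, fpath, bucket, key=None):
    """Upload fpath to bucket and confirm the stored object's size matches.

    `client` is assumed to behave like a boto3 S3 client
    (upload_file and head_object methods).
    """
    # Default to the bare file name as the object key, as in the cell above
    key = key or os.path.basename(fpath)
    client.upload_file(fpath, bucket, key)

    # head_object returns metadata for the stored object, including its size in bytes
    remote_size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    local_size = os.path.getsize(fpath)
    if remote_size != local_size:
        raise IOError(
            f"Size mismatch for s3://{bucket}/{key}: {remote_size} != {local_size}"
        )
    return key


# Usage (assumes valid AWS credentials and `s3 = boto3.client("s3")`):
# upload_and_verify(s3, fpath_json, FPATHS['data']['s3']['bucket'])
```

A size comparison is only a cheap sanity check; it does not prove the contents are identical.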

In [37]:
# Reload the saved JSON to confirm it round-trips
temp_df = pd.read_json(fpath_json)
temp_df

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A2RQQKUDKUPUO9,"I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get over.",I was reading reviews on this product and was so ...,1,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...","[i, was, reading, reviews, on, this, product, and, was, so, i, was, reading, reviews, on, this, product, and, was, so, excited, but, i, gagged, when, i, tried, them, ill, take, regular, pasta, any, day, over, this, weird, stuff, so, gross, the, t...","[reading, reviews, product, reading, reviews, product, excited, gagged, tried, ill, regular, pasta, day, weird, stuff, gross, texture, hard]","[read, review, product, read, review, product, excited, gag, try, ill, regular, pasta, day, weird, stuff, gross, texture, hard]",i was reading reviews on this product and was so i was reading reviews on this product and was so excited but i gagged when i tried them ill take regular pasta any day over this weird stuff so gross the texture is very very hard to get over,reading reviews product reading reviews product excited gagged tried ill regular pasta day weird stuff gross texture hard,read review product read review product excited gag try ill regular pasta day weird stuff gross texture hard
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...,4,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,"[working, on, the, low, carb, lifestyle, and, would, not, make, working, on, the, low, carb, lifestyle, and, would, not, make, it, without, this, product, definitely, worth, the, price, and, will, definitely, purchase, again]","[working, low, carb, lifestyle, working, low, carb, lifestyle, product, definitely, worth, price, definitely, purchase]","[work, low, carb, lifestyle, work, low, carb, lifestyle, product, definitely, worth, price, definitely, purchase]",working on the low carb lifestyle and would not make working on the low carb lifestyle and would not make it without this product definitely worth the price and will definitely purchase again,working low carb lifestyle working low carb lifestyle product definitely worth price definitely purchase,work low carb lifestyle work low carb lifestyle product definitely worth price definitely purchase
2,B007JINB0W,A28C1309S1WFLR,"I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories",Not Bad,4,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories","[not, bad, i, followed, the, directions, other, people, posted, rinse, longer, than, stated, and, cook, at, least, 5, 6, minutes, i, add, these, to, my, pasta, sauce, early, afternoon, so, that, they, have, a, chance, to, absorb, as, much, flavor...","[bad, followed, directions, people, posted, rinse, longer, stated, cook, 5, 6, minutes, add, pasta, sauce, early, afternoon, chance, absorb, flavor, bad, calories]","[bad, follow, direction, people, post, rinse, long, state, cook, 5, 6, minute, add, pasta, sauce, early, afternoon, chance, absorb, flavor, bad, calorie]",not bad i followed the directions other people posted rinse longer than stated and cook at least 5 6 minutes i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can not bad and no calories,bad followed directions people posted rinse longer stated cook 5 6 minutes add pasta sauce early afternoon chance absorb flavor bad calories,bad follow direction people post rinse long state cook 5 6 minute add pasta sauce early afternoon chance absorb flavor bad calorie
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook these in the sauce/ingredients you are making your dish wit...,MUST have for a pasta fanatic on a low/lower carb diet,5,2012,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...,MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...,"[must, have, for, a, pasta, fanatic, on, a, low, lower, carb, diet, i, have, tried, soooo, many, pasta, substitutes, the, promises, were, always, hollow, and, the, wanna, be, pastas, were, always, gross, in, taste, or, texture, or, both, if, you,...","[pasta, fanatic, low, lower, carb, diet, tried, soooo, pasta, substitutes, promises, hollow, wanna, pastas, gross, taste, texture, follow, instructions, cook, sauce, ingredients, making, dish, good, smell, open, rinse, goes, away, huge, fan, hone...","[pasta, fanatic, low, low, carb, diet, try, soooo, pasta, substitute, promise, hollow, wanna, pasta, gross, taste, texture, follow, instruction, cook, sauce, ingredient, make, dish, good, smell, open, rinse, go, away, huge, fan, honestly, feel, d...",must have for a pasta fanatic on a low lower carb diet i have tried soooo many pasta substitutes the promises were always hollow and the wanna be pastas were always gross in taste or texture or both if you follow the instructions and cook these i...,pasta fanatic low lower carb diet tried soooo pasta substitutes promises hollow wanna pastas gross taste texture follow instructions cook sauce ingredients making dish good smell open rinse goes away huge fan honestly feel deprived substitute sta...,pasta fanatic low low carb diet try soooo pasta substitute promise hollow wanna pasta gross taste texture follow instruction cook sauce ingredient make dish good smell open rinse go away huge fan honestly feel deprive substitute staple diet
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All products are the same, just cut into rice or noodle shape.",They have no bad flavor and provide the rice/noodle experience,5,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ...","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ...","[they, have, no, bad, flavor, and, provide, the, rice, noodle, experience, follow, the, instructions, rinse, and, boil, they, have, no, bad, flavor, and, provide, the, rice, noodle, experience, if, they, were, n't, so, pricey, i, 'd, start, buyin...","[bad, flavor, provide, rice, noodle, experience, follow, instructions, rinse, boil, bad, flavor, provide, rice, noodle, experience, pricey, start, buying, bulk, immediately, products, cut, rice, noodle, shape]","[bad, flavor, provide, rice, noodle, experience, follow, instruction, rinse, boil, bad, flavor, provide, rice, noodle, experience, pricey, start, buy, bulk, immediately, product, cut, rice, noodle, shape]",they have no bad flavor and provide the rice noodle experience follow the instructions rinse and boil they have no bad flavor and provide the rice noodle experience if they were n't so pricey i 'd start buying in bulk immediately all products are...,bad flavor provide rice noodle experience follow instructions rinse boil bad flavor provide rice noodle experience pricey start buying bulk immediately products cut rice noodle shape,bad flavor provide rice noodle experience follow instruction rinse boil bad flavor provide rice noodle experience pricey start buy bulk immediately product cut rice noodle shape
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4358,B007JINB0W,AL3Q8HIANLRKJ,"These are my favorite flavor, my friends agree and love it.",Best Noods,5,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","Best Noods: These are my favorite flavor, my friends agree and love it.","Best Noods: These are my favorite flavor, my friends agree and love it.","[best, noods, these, are, my, favorite, flavor, my, friends, agree, and, love, it]","[best, noods, favorite, flavor, friends, agree, love]","[good, nood, favorite, flavor, friend, agree, love]",best noods these are my favorite flavor my friends agree and love it,best noods favorite flavor friends agree love,good nood favorite flavor friend agree love
4359,B007JINB0W,A1VCFDBW9W5O99,Opened the box to find one of the packages was open and had leaked. Will not be buying again.,One Star,1,2015,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",One Star: Opened the box to find one of the packages was open and had leaked. Will not be buying again.,One Star: Opened the box to find one of the packages was open and had leaked. Will not be buying again.,"[one, star, opened, the, box, to, find, one, of, the, packages, was, open, and, had, leaked, will, not, be, buying, again]","[star, opened, box, find, packages, open, leaked, buying]","[star, open, box, find, package, open, leak, buy]",one star opened the box to find one of the packages was open and had leaked will not be buying again,star opened box find packages open leaked buying,star open box find package open leak buy
4360,B007JINB0W,A361G13N6TQPKS,Smelled so bad I threw it out please don't send me anymore communications please,One Star,1,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",One Star: Smelled so bad I threw it out please don't send me anymore communications please,One Star: Smelled so bad I threw it out please don't send me anymore communications please,"[one, star, smelled, so, bad, i, threw, it, out, please, do, n't, send, me, anymore, communications, please]","[star, smelled, bad, threw, send, anymore, communications]","[star, smell, bad, throw, send, anymore, communication]",one star smelled so bad i threw it out please do n't send me anymore communications please,star smelled bad threw send anymore communications,star smell bad throw send anymore communication
4361,B007JINB0W,A1U51MX13ZIBT0,this stuff is awful feels like rubber in your mouth,this stuff is awful feels like rubber in your,1,2017,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",this stuff is awful feels like rubber in your: this stuff is awful feels like rubber in your mouth,this stuff is awful feels like rubber in your: this stuff is awful feels like rubber in your mouth,"[this, stuff, is, awful, feels, like, rubber, in, your, this, stuff, is, awful, feels, like, rubber, in, your, mouth]","[stuff, awful, feels, like, rubber, stuff, awful, feels, like, rubber, mouth]","[stuff, awful, feel, like, rubber, stuff, awful, feel, like, rubber, mouth]",this stuff is awful feels like rubber in your this stuff is awful feels like rubber in your mouth,stuff awful feels like rubber stuff awful feels like rubber mouth,stuff awful feel like rubber stuff awful feel like rubber mouth


In [38]:
type(temp_df.loc[0, 'tokens'])

list
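
The check above confirms that the token columns come back from JSON as Python lists. A minimal sketch of why that matters, using a toy DataFrame with hypothetical values rather than the real reviews: the same data written to CSV comes back as strings that would need to be parsed again.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the processed reviews (hypothetical values)
toy = pd.DataFrame(
    {
        "overall": [1.0, 4.0],
        "tokens": [["gross", "texture"], ["low", "carb"]],
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    json_path = os.path.join(tmpdir, "toy.json")
    csv_path = os.path.join(tmpdir, "toy.csv")

    # Write the same frame in both formats, then read each one back
    toy.to_json(json_path)
    toy.to_csv(csv_path, index=False)
    from_json = pd.read_json(json_path)
    from_csv = pd.read_csv(csv_path)

# JSON stores the tokens as arrays, so they reload as lists;
# CSV stores the printed representation, so they reload as strings.
json_type = type(from_json.loc[0, "tokens"])
csv_type = type(from_csv.loc[0, "tokens"])
print(json_type, csv_type)
```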

### Saving a Joblib file

In [42]:
import joblib
fpath_joblib = FPATHS['data']['processed-nlp']['processed-reviews-spacy_joblib']
fpath_joblib

'data/processed/processed-reviews.joblib'

In [40]:
# Dump the DataFrame to the selected file path
joblib.dump(df, fpath_joblib)

['data/processed/processed-reviews.joblib']

In [41]:
# Confirm the file was saved properly by loading it back
loaded = joblib.load(fpath_joblib)
loaded.head()

Unnamed: 0,asin,reviewerID,reviewText,summary,overall,year,title,brand,category,review-text-full,review-text-full_raw,tokens-dirty,tokens,lemmas,tokens-dirty-joined,tokens-joined,lemmas-joined
0,B007JINB0W,A2RQQKUDKUPUO9,"I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get over.",I was reading reviews on this product and was so ...,1.0,2014,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...","I was reading reviews on this product and was so ...: I was reading reviews on this product and was so excited. But I gagged when I tried them. Ill take regular pasta any day over this weird stuff! So gross, the texture is very very hard to get o...","[i, was, reading, reviews, on, this, product, and, was, so, i, was, reading, reviews, on, this, product, and, was, so, excited, but, i, gagged, when, i, tried, them, ill, take, regular, pasta, any, day, over, this, weird, stuff, so, gross, the, t...","[reading, reviews, product, reading, reviews, product, excited, gagged, tried, ill, regular, pasta, day, weird, stuff, gross, texture, hard]","[read, review, product, read, review, product, excited, gag, try, ill, regular, pasta, day, weird, stuff, gross, texture, hard]",i was reading reviews on this product and was so i was reading reviews on this product and was so excited but i gagged when i tried them ill take regular pasta any day over this weird stuff so gross the texture is very very hard to get over,reading reviews product reading reviews product excited gagged tried ill regular pasta day weird stuff gross texture hard,read review product read review product excited gag try ill regular pasta day weird stuff gross texture hard
1,B007JINB0W,A1DW1LKZEWPKNC,Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...,4.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,Working on the low carb lifestyle and would not make ...: Working on the low carb lifestyle and would not make it without this product. Definitely worth the price and will definitely purchase again.,"[working, on, the, low, carb, lifestyle, and, would, not, make, working, on, the, low, carb, lifestyle, and, would, not, make, it, without, this, product, definitely, worth, the, price, and, will, definitely, purchase, again]","[working, low, carb, lifestyle, working, low, carb, lifestyle, product, definitely, worth, price, definitely, purchase]","[work, low, carb, lifestyle, work, low, carb, lifestyle, product, definitely, worth, price, definitely, purchase]",working on the low carb lifestyle and would not make working on the low carb lifestyle and would not make it without this product definitely worth the price and will definitely purchase again,working low carb lifestyle working low carb lifestyle product definitely worth price definitely purchase,work low carb lifestyle work low carb lifestyle product definitely worth price definitely purchase
2,B007JINB0W,A28C1309S1WFLR,"I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories",Not Bad,4.0,2013,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories","Not Bad: I followed the directions other people posted ,rinse longer than stated,and cook at least 5-6 minutes,i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can,not bad and no calories","[not, bad, i, followed, the, directions, other, people, posted, rinse, longer, than, stated, and, cook, at, least, 5, 6, minutes, i, add, these, to, my, pasta, sauce, early, afternoon, so, that, they, have, a, chance, to, absorb, as, much, flavor...","[bad, followed, directions, people, posted, rinse, longer, stated, cook, 5, 6, minutes, add, pasta, sauce, early, afternoon, chance, absorb, flavor, bad, calories]","[bad, follow, direction, people, post, rinse, long, state, cook, 5, 6, minute, add, pasta, sauce, early, afternoon, chance, absorb, flavor, bad, calorie]",not bad i followed the directions other people posted rinse longer than stated and cook at least 5 6 minutes i add these to my pasta sauce early afternoon so that they have a chance to absorb as much flavor as they can not bad and no calories,bad followed directions people posted rinse longer stated cook 5 6 minutes add pasta sauce early afternoon chance absorb flavor bad calories,bad follow direction people post rinse long state cook 5 6 minute add pasta sauce early afternoon chance absorb flavor bad calorie
3,B007JINB0W,A2QIJY7RFGZC23,I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook these in the sauce/ingredients you are making your dish wit...,MUST have for a pasta fanatic on a low/lower carb diet,5.0,2012,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']",MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...,MUST have for a pasta fanatic on a low/lower carb diet: I have tried soooo many pasta substitutes. The promises were always hollow and the wanna be pastas were always gross in taste or texture or both. If you follow the instructions and cook the...,"[must, have, for, a, pasta, fanatic, on, a, low, lower, carb, diet, i, have, tried, soooo, many, pasta, substitutes, the, promises, were, always, hollow, and, the, wanna, be, pastas, were, always, gross, in, taste, or, texture, or, both, if, you,...","[pasta, fanatic, low, lower, carb, diet, tried, soooo, pasta, substitutes, promises, hollow, wanna, pastas, gross, taste, texture, follow, instructions, cook, sauce, ingredients, making, dish, good, smell, open, rinse, goes, away, huge, fan, hone...","[pasta, fanatic, low, low, carb, diet, try, soooo, pasta, substitute, promise, hollow, wanna, pasta, gross, taste, texture, follow, instruction, cook, sauce, ingredient, make, dish, good, smell, open, rinse, go, away, huge, fan, honestly, feel, d...",must have for a pasta fanatic on a low lower carb diet i have tried soooo many pasta substitutes the promises were always hollow and the wanna be pastas were always gross in taste or texture or both if you follow the instructions and cook these i...,pasta fanatic low lower carb diet tried soooo pasta substitutes promises hollow wanna pastas gross taste texture follow instructions cook sauce ingredients making dish good smell open rinse goes away huge fan honestly feel deprived substitute sta...,pasta fanatic low low carb diet try soooo pasta substitute promise hollow wanna pasta gross taste texture follow instruction cook sauce ingredient make dish good smell open rinse go away huge fan honestly feel deprive substitute staple diet
4,B007JINB0W,A3CU5S5P90JUIX,"Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All products are the same, just cut into rice or noodle shape.",They have no bad flavor and provide the rice/noodle experience,5.0,2016,"Miracle Noodle Zero Carb, Gluten Free Shirataki Pasta, Spinach Angel Hair, 7-Ounce (Pack of 24)",Miracle Noodle,"['Grocery & Gourmet Food', 'Pasta & Noodles', 'Noodles', 'Shirataki']","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ...","They have no bad flavor and provide the rice/noodle experience: Follow the instructions, rinse and boil. They have no bad flavor and provide the rice/noodle experience. If they weren't so pricey, I'd start buying in bulk immediately. All produ...","[they, have, no, bad, flavor, and, provide, the, rice, noodle, experience, follow, the, instructions, rinse, and, boil, they, have, no, bad, flavor, and, provide, the, rice, noodle, experience, if, they, were, n't, so, pricey, i, 'd, start, buyin...","[bad, flavor, provide, rice, noodle, experience, follow, instructions, rinse, boil, bad, flavor, provide, rice, noodle, experience, pricey, start, buying, bulk, immediately, products, cut, rice, noodle, shape]","[bad, flavor, provide, rice, noodle, experience, follow, instruction, rinse, boil, bad, flavor, provide, rice, noodle, experience, pricey, start, buy, bulk, immediately, product, cut, rice, noodle, shape]",they have no bad flavor and provide the rice noodle experience follow the instructions rinse and boil they have no bad flavor and provide the rice noodle experience if they were n't so pricey i 'd start buying in bulk immediately all products are...,bad flavor provide rice noodle experience follow instructions rinse boil bad flavor provide rice noodle experience pricey start buying bulk immediately products cut rice noodle shape,bad flavor provide rice noodle experience follow instruction rinse boil bad flavor provide rice noodle experience pricey start buy bulk immediately product cut rice noodle shape
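
Eyeballing `loaded.head()` works for a quick look, but the round trip can also be checked programmatically. The sketch below uses a toy DataFrame with hypothetical values, and `same_frame` is an illustrative helper, not something defined in this notebook. It dumps the frame with joblib, reloads it, and compares columns, dtypes, and values. Note that the joblib output above shows `overall` as `5.0` while the JSON reload shows `5`, which suggests the pickle-based format keeps the original dtypes more faithfully.

```python
import os
import tempfile

import joblib
import pandas as pd

# Toy stand-in for the processed reviews (hypothetical values)
toy = pd.DataFrame(
    {
        "overall": [1.0, 4.0],
        "tokens": [["gross", "texture"], ["low", "carb"]],
        "tokens-joined": ["gross texture", "low carb"],
    }
)


def same_frame(left, right):
    """Return True if both frames share columns, dtypes, and values.

    Compares values as plain Python lists, so it assumes no NaN entries
    (NaN never compares equal to itself).
    """
    if list(left.columns) != list(right.columns):
        return False
    if list(left.dtypes) != list(right.dtypes):
        return False
    return all(left[col].tolist() == right[col].tolist() for col in left.columns)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.joblib")
    joblib.dump(toy, path)
    restored = joblib.load(path)

round_trip_ok = same_frame(toy, restored)
print(round_trip_ok)
```

Because joblib files are pickles, they should only be loaded from trusted sources, and they may not load cleanly across very different pandas versions.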
