# Wrangling Grocery and Gourmet Food Data

* [Questions](#Questions)
* [Summary](#Summary)
* [Tests](#Tests)

## Imports

In [3]:
import json
import gzip
import unittest
import pandas as pd
import numpy as np
import csv
import os
import os.path
from dotenv import load_dotenv
import logging

## Environment Variables

In [4]:
load_dotenv()
data_filepath = os.getenv('DATA_FILEPATH')
price_logs = os.getenv('PRICES_LOG')
# This path is still hardcoded in the markdown below because I haven't found a working way to use a variable there.
doc_filepath = os.getenv('DOC_FILEPATH')

## Introduction
Data Wrangling is the process of collecting, organizing, and determining how well-defined the data is.  
See [Grocery Recommender - Capstone Two](../Grocery_Recommender_-_Capstone_Two.pdf) for details about this project. 


## Load Data <sup>[1](#references)</sup>

In [5]:
# Load the data: the gzipped file holds one JSON object per line.
data = []
with gzip.open(data_filepath) as f:
    for line in f:
        data.append(json.loads(line.strip()))

In [6]:
# Confirm data loaded by checking total number of products.
print(len(data))

287051


In [7]:
# View raw data.
print(data[0])

{'category': ['Grocery & Gourmet Food', 'Dairy, Cheese & Eggs', 'Cheese', 'Gouda'], 'tech1': '', 'description': ['BEEMSTER GOUDA CHEESE AGED 18/24 MONTHS', 'Statements regarding dietary supplements have not been evaluated by the FDA and are not intended to diagnose, treat, cure, or prevent any disease or health condition.'], 'fit': '', 'title': 'Beemster Gouda - Aged 18/24 Months - App. 1.5 Lbs', 'also_buy': [], 'image': [], 'tech2': '', 'brand': 'Ariola Imports', 'feature': [], 'rank': '165,181 in Grocery & Gourmet Food (', 'also_view': ['B0000D9MYM', 'B0000D9MYL', 'B00ADHIGBA', 'B00H9OX598', 'B001LM42GY', 'B001LM5TDY'], 'main_cat': 'Grocery', 'similar_item': '', 'date': '', 'price': '$41.91', 'asin': '0681727810'}


In [8]:
# Will use a pandas dataframe to explore and analyse data.
grocery_data = pd.DataFrame.from_dict(data)
grocery_data.head()

Unnamed: 0,category,tech1,description,fit,title,also_buy,image,tech2,brand,feature,rank,also_view,main_cat,similar_item,date,price,asin,details
0,"[Grocery & Gourmet Food, Dairy, Cheese & Eggs,...",,"[BEEMSTER GOUDA CHEESE AGED 18/24 MONTHS, Stat...",,Beemster Gouda - Aged 18/24 Months - App. 1.5 Lbs,[],[],,Ariola Imports,[],"165,181 in Grocery & Gourmet Food (","[B0000D9MYM, B0000D9MYL, B00ADHIGBA, B00H9OX59...",Grocery,,,$41.91,681727810,
1,"[Grocery & Gourmet Food, Cooking & Baking, Sug...",,"[Shipped from UK, please allow 10 to 21 busine...",,Trim Healthy Mama Xylitol,"[B01898YHXK, B01BCM6LAC, B00Q4OL47O, B00Q4OL5Q...",[https://images-na.ssl-images-amazon.com/image...,,,[],"315,867 in Grocery & Gourmet Food (",[],Grocery,,,,853347867,
2,"[Grocery & Gourmet Food, Cooking & Baking, Fro...",,[Jazz up your cakes with a sparkling monogram ...,,Letter C - Swarovski Crystal Monogram Wedding ...,[],[],,Unik Occasions,[],"[>#669,941 in Kitchen & Dining (See Top 100 in...",[B07DXN65TF],Amazon Home,,"September 21, 2010",$29.95,1888861118,
3,"[Grocery & Gourmet Food, Cooking & Baking, Fro...",,"[Large Letter - Height 4.75""]",,Letter H - Swarovski Crystal Monogram Wedding ...,[],[],,Other,"[Large Letter - Height 4.75""]","[>#832,581 in Kitchen & Dining (See Top 100 in...",[],Amazon Home,,"September 11, 2011",$11.45,1888861517,
4,"[Grocery & Gourmet Food, Cooking & Baking, Fro...",,"[4.75""]",,Letter S - Swarovski Crystal Monogram Wedding ...,[],[],,Unik Occasions,"[4.75"" height]","[>#590,999 in Kitchen & Dining (See Top 100 in...",[],Amazon Home,,"September 11, 2011",$15.00,1888861614,


In [9]:
# Overview of the data shows it already is pretty well defined.
# Only the details column is missing (NaN) some values.
grocery_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287051 entries, 0 to 287050
Data columns (total 18 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   category      287051 non-null  object
 1   tech1         287051 non-null  object
 2   description   287051 non-null  object
 3   fit           287051 non-null  object
 4   title         287051 non-null  object
 5   also_buy      287051 non-null  object
 6   image         287051 non-null  object
 7   tech2         287051 non-null  object
 8   brand         287051 non-null  object
 9   feature       287051 non-null  object
 10  rank          287051 non-null  object
 11  also_view     287051 non-null  object
 12  main_cat      287051 non-null  object
 13  similar_item  287051 non-null  object
 14  date          287051 non-null  object
 15  price         287051 non-null  object
 16  asin          287051 non-null  object
 17  details       287027 non-null  object
dtypes: object(18)
memory usa

In [10]:
# Another way to see which features are missing values. 
# This is easier to read if there's a lot of columns and similar number counts.
grocery_data.isnull().any()

category        False
tech1           False
description     False
fit             False
title           False
also_buy        False
image           False
tech2           False
brand           False
feature         False
rank            False
also_view       False
main_cat        False
similar_item    False
date            False
price           False
asin            False
details          True
dtype: bool

In [11]:
# The number of features for each item/row varies (17-18).
grocery_data.count(axis=1)

0         17
1         17
2         17
3         17
4         17
          ..
287046    18
287047    18
287048    18
287049    18
287050    18
Length: 287051, dtype: int64

### Observations
- Think useful features will probably be title, also_buy, rank, also_view, similar_item, and date.
- Many special characters in multiple columns.
- 'date' is full month dd, yyyy.
- 'price' has $ prepended.
- Probably want to replace index with 'asin'.
- From the numbers above it looks like the counts of columns, rows, and non-null values are consistent. However, a small sample of the data shows more than one value is blank in several of the rows.
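
The 'date' observation above can be checked with an explicit format string. This is a minimal sketch on made-up values (the series is illustrative, not taken from the dataset), assuming the "full month dd, yyyy" layout holds for well-formed rows:

```python
import pandas as pd

# Illustrative values only: two well-formed dates, one blank, one malformed entry.
raw_dates = pd.Series(['September 21, 2010', 'September 11, 2011', '', 'not a date'])

# %B = full month name, %d = day of month, %Y = four digit year.
# errors='coerce' turns anything that doesn't match into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, format='%B %d, %Y', errors='coerce')

print(parsed)
print('Unparseable rows:', parsed.isna().sum())
```

Comparing the count of NaT values before and after parsing is one way to see how many rows don't follow the expected layout.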

## Explore Data
### Find Data of Interest

In [12]:
# Interested in gluten free data because of diet restrictions.
# Find all products whose title includes gluten free using 'standard' lists and loops.
gluten_free_list = []
for d in data:
    if 'title' in d and 'gluten free' in d['title'].lower():
        gluten_free_list.append(d)

print(len(gluten_free_list))

5537


In [13]:
# Example product titles.
for d in gluten_free_list[:10]:
    print(d['title'])

Sans Sucre Mousse Mix Gluten Free, Strawberry
Sans Sucre Chocolate Mousse Mix - Gluten Free
Sans Sucre Cheesecake Mousse Mix - Gluten Free
Sans Sucre Lemon Mousse Mix - Gluten Free
Ajika Royal Basmati Pilaf Instant Meal, Side Dish, Gluten Free, No Salt or Msg
Natures Plus Source of Life Energy Shake - Granola Flavor - 2.2 lbs Multivitamin, Mineral &amp; Protein Powder - Whole Food Meal Replacement - Non GMO, Vegetarian, Gluten Free - 26 Servings
Borsari All Natural Seasoned Salt, Citrus Blend, Gluten Free, No MSG, 4 Ounce Shaker Bottle
Natures Plus Spirutein Shake - Nutty Berry Burst Flavor - 2.4 lbs, Spirulina Protein Powder - Plant Based Meal Replacement, Vitamins &amp; Minerals for Energy - Vegetarian, Gluten Free - 30 Servings
Nando's Medium PERi-PERi Sauce - Gluten Free | Non GMO | 4.7 Oz (2 Pack)
Amish Country Popcorn - Medium Yellow Popcorn - Old Fashioned, Non GMO, and Gluten Free - with Recipe Guide and 1 Year Freshness Warranty (2 Pound Burlap)


In [14]:
# The Pandas way of doing the above takes fewer lines of code.
gluten_free_dataframe = grocery_data[grocery_data['title'].str.lower().str.contains('gluten free')]
gluten_free_dataframe.count()

category        5537
tech1           5537
description     5537
fit             5537
title           5537
also_buy        5537
image           5537
tech2           5537
brand           5537
feature         5537
rank            5537
also_view       5537
main_cat        5537
similar_item    5537
date            5537
price           5537
asin            5537
details         5537
dtype: int64

In [15]:
gluten_free_dataframe.title.iloc[:10]

1180        Sans Sucre Mousse Mix Gluten Free, Strawberry
1184        Sans Sucre Chocolate Mousse Mix - Gluten Free
1185       Sans Sucre Cheesecake Mousse Mix - Gluten Free
1187            Sans Sucre Lemon Mousse Mix - Gluten Free
1543    Ajika Royal Basmati Pilaf Instant Meal, Side D...
1702    Natures Plus Source of Life Energy Shake - Gra...
1704    Borsari All Natural Seasoned Salt, Citrus Blen...
1717    Natures Plus Spirutein Shake - Nutty Berry Bur...
1755    Nando's Medium PERi-PERi Sauce - Gluten Free |...
1954    Amish Country Popcorn - Medium Yellow Popcorn ...
Name: title, dtype: object

In [16]:
# Finding specific brand I enjoy.
# Even though this is gluten free it doesn't show up in the gluten free dataframe because 'gluten free' isn't in the title.
brazi_bites = grocery_data[grocery_data['brand'].str.lower().str.contains('brazi bites')].T
brazi_bites

Unnamed: 0,255470,266702
category,"[Grocery & Gourmet Food, Frozen, Bread & Dough]","[Grocery & Gourmet Food, Snack Foods, Crackers]"
tech1,,
description,"[These are 100 percent USDA Certified organic,...","[Brazi Bites Original Brazilian Cheese Bread, ..."
fit,,
title,"Brazi Bites, Brazilian Cheese Bread, Original,...","Brazi Bites Original Brazilian Cheese Bread, 1..."
also_buy,"[B00FE9LC0O, B0047IG6OU, B000RULSYK, B078HRLNL...",[]
image,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...
tech2,,
brand,Brazi Bites,Brazi Bites
feature,[],[]


### Observations
- Probably want to replace index with 'asin'.
- Will try to use only pandas' logic going forward in notebooks.
- So far haven't found any values in 'tech1', 'fit', 'tech2', or 'similar_item'.

Even though there are missing values in the data I'm interested in, there doesn't appear to be enough data to justify analyzing only gluten free items.
Therefore, shouldn't need to worry about dropping columns based just on the information I'm interested in. Instead, will worry about dropping columns that have no data.

In [17]:
missing = pd.concat([grocery_data.isnull().sum(), 100 * grocery_data.isnull().mean()], axis=1)
missing.columns=['count', '%']
missing.sort_values(by=['count'], ascending=False)

Unnamed: 0,count,%
details,24,0.008361
tech1,0,0.0
asin,0,0.0
price,0,0.0
date,0,0.0
similar_item,0,0.0
main_cat,0,0.0
also_view,0,0.0
rank,0,0.0
category,0,0.0


This check also shows that hardly any data is missing. This is desired; however, it doesn't seem to be correct.
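
One way to test this suspicion is to count empty strings separately, since `isnull()` only flags NaN/None. A minimal sketch on a toy frame (the values are illustrative and only string columns are used here):

```python
import pandas as pd

# Toy frame: blanks are stored as empty strings, the way they arrive from the JSON.
toy = pd.DataFrame({
    'asin': ['A1', 'A2', 'A3'],
    'tech1': ['', '', 'some text'],
    'price': ['$41.91', '', '$29.95'],
})

# isnull() sees nothing missing because '' is a valid string.
null_counts = toy.isnull().sum()

# Comparing against '' reveals the hidden blanks per column.
blank_counts = (toy == '').sum()

print(pd.concat([null_counts, blank_counts], axis=1, keys=['null', 'blank']))
```

If the blank counts are much larger than the null counts, the "hardly any data is missing" result is an artifact of how blanks are encoded.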

In [18]:
# Researching on a smaller data set to see what is happening.

tech1_has_blanks = gluten_free_dataframe[gluten_free_dataframe['tech1'] == ''].index
print('tech1 should have a length:', tech1_has_blanks)

asin_no_blanks = gluten_free_dataframe[gluten_free_dataframe['asin'] == ''].index
print('asin should not have anything:', asin_no_blanks)


tech1 should have a length: Int64Index([  1180,   1184,   1185,   1187,   1543,   1702,   1704,   1717,
              1755,   1954,
            ...
            286670, 286675, 286709, 286740, 286771, 286845, 286947, 286981,
            287001, 287009],
           dtype='int64', length=5534)
asin should not have anything: Int64Index([], dtype='int64')


In [19]:
# One way to solve the problem.

example_fix = gluten_free_dataframe.replace('', np.nan)
example_fix.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5537 entries, 1180 to 287009
Data columns (total 18 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   category      5537 non-null   object 
 1   tech1         3 non-null      object 
 2   description   5537 non-null   object 
 3   fit           0 non-null      float64
 4   title         5537 non-null   object 
 5   also_buy      5537 non-null   object 
 6   image         5537 non-null   object 
 7   tech2         0 non-null      float64
 8   brand         5379 non-null   object 
 9   feature       5537 non-null   object 
 10  rank          5537 non-null   object 
 11  also_view     5537 non-null   object 
 12  main_cat      5531 non-null   object 
 13  similar_item  4 non-null      object 
 14  date          13 non-null     object 
 15  price         2652 non-null   object 
 16  asin          5537 non-null   object 
 17  details       5537 non-null   object 
dtypes: float64(2), object(1

This only fixed the issues with single cells. Blank lists are still an issue.
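
A possible way to handle the blank lists as well is to map each column through a small helper. This is only a sketch: `blank_to_nan` is a hypothetical helper name and the toy frame is illustrative.

```python
import numpy as np
import pandas as pd

def blank_to_nan(value):
    """Hypothetical helper: return np.nan for '' or [], otherwise the value unchanged."""
    if isinstance(value, (str, list)) and len(value) == 0:
        return np.nan
    return value

# Toy frame with both kinds of blanks: empty strings and empty lists.
toy = pd.DataFrame({
    'asin': ['A1', 'A2', 'A3'],
    'tech1': ['', '', 'some text'],
    'also_buy': [[], ['B0001'], []],
})

# Series.map applies the helper cell by cell, so list-valued cells are handled too.
cleaned = toy.copy()
for column in cleaned.columns:
    cleaned[column] = cleaned[column].map(blank_to_nan)

missing_counts = cleaned.isnull().sum()
print(missing_counts)
```

Mapping cell by cell is slower than `replace('', np.nan)`, so on the full dataset it may be worth limiting the loop to the list-valued columns.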

In [20]:
# Thought 'asin' values were unique; however, they aren't.
grocery_data['asin'].value_counts().head(3700)

B0000CEUF9    2
B00016XJX8    2
B0002LRHAS    2
B000144GX2    2
B0001AVSTG    2
             ..
B0001ENZ9S    2
B0001KH69W    2
B00104FG54    1
B00AO9WVEK    1
B007ZEM3DG    1
Name: asin, Length: 3700, dtype: int64

In [21]:
# Spot checked some and it looks like these are just duplicates of the same product; only the index is different.
# Also, percentage wise the majority of the 'asin' values are unique so should be ok to use as id.
grocery_data[grocery_data['asin'] == 'B0001M0YTO']

Unnamed: 0,category,tech1,description,fit,title,also_buy,image,tech2,brand,feature,rank,also_view,main_cat,similar_item,date,price,asin,details
2480,"[Grocery & Gourmet Food, Herbs, Spices & Seaso...",,[Seasoning Pizza. It is a high quality healthy...,,Frontier Natural Products Pizza Seasoning -- 1...,"[B0001M10ZG, B001VNKT0Q, B001VNGKAO, B00GCDPLC...",[],,Frontier,[],"56,442 in Grocery & Gourmet Food (","[B0001M10ZG, B076THTQ1K, B00DUF3CHU, B000WR2GN...",Grocery,,,$14.99,B0001M0YTO,{'  Product Dimensions: ': '4 x 4 x 10...
6177,"[Grocery & Gourmet Food, Herbs, Spices & Seaso...",,[Seasoning Pizza. It is a high quality healthy...,,Frontier Natural Products Pizza Seasoning -- 1...,"[B0001M10ZG, B001VNKT0Q, B001VNGKAO, B00GCDPLC...",[],,Frontier,[],"56,442 in Grocery & Gourmet Food (","[B0001M10ZG, B076THTQ1K, B00DUF3CHU, B000WR2GN...",Grocery,,,$14.99,B0001M0YTO,{'  Product Dimensions: ': '4 x 4 x 10...


In [22]:
# Most common brands.
grocery_data['brand'].value_counts()

                        11419
Unknown                  1780
Black Tie Mercantile     1458
Trader Joe's             1234
McCormick                1041
                        ...  
Sugo                        1
nots                        1
Pure ViaReg;                1
Michaszki                   1
Wrigely "5"                 1
Name: brand, Length: 38904, dtype: int64

## Copying Original Data after Troubleshooting Research
Ran into problems analyzing the data because I didn't fully understand the info summary above.
See the [Questions](#Questions) section below for additional details.
Copying the dataframe to convert types and then analyze further. <sup>[2](#references)</sup>

In [23]:
grocery_data_copy = grocery_data.copy()

grocery_data_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287051 entries, 0 to 287050
Data columns (total 18 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   category      287051 non-null  object
 1   tech1         287051 non-null  object
 2   description   287051 non-null  object
 3   fit           287051 non-null  object
 4   title         287051 non-null  object
 5   also_buy      287051 non-null  object
 6   image         287051 non-null  object
 7   tech2         287051 non-null  object
 8   brand         287051 non-null  object
 9   feature       287051 non-null  object
 10  rank          287051 non-null  object
 11  also_view     287051 non-null  object
 12  main_cat      287051 non-null  object
 13  similar_item  287051 non-null  object
 14  date          287051 non-null  object
 15  price         287051 non-null  object
 16  asin          287051 non-null  object
 17  details       287027 non-null  object
dtypes: object(18)
memory usa

In [24]:
# Clean blank values
grocery_data_copy.replace('', np.nan, inplace=True)
grocery_data_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287051 entries, 0 to 287050
Data columns (total 18 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   category      287051 non-null  object 
 1   tech1         744 non-null     object 
 2   description   287051 non-null  object 
 3   fit           4 non-null       object 
 4   title         287048 non-null  object 
 5   also_buy      287051 non-null  object 
 6   image         287051 non-null  object 
 7   tech2         0 non-null       float64
 8   brand         275632 non-null  object 
 9   feature       287051 non-null  object 
 10  rank          287051 non-null  object 
 11  also_view     287051 non-null  object 
 12  main_cat      285688 non-null  object 
 13  similar_item  256 non-null     object 
 14  date          9662 non-null    object 
 15  price         133858 non-null  object 
 16  asin          287051 non-null  object 
 17  details       287027 non-null  object 
dtypes: f

In [25]:
# Clean prices
# Note: This has many additional checks and concepts added in different ways for learning purposes.

# Set up logging once for the cell instead of on every call.
# basicConfig does nothing if the root logger already has handlers.
logging.basicConfig(filename=price_logs, filemode='w+', level=logging.INFO)

def convert_currency(val: object) -> float:
    """
    Converts the input to a string and returns a float rounded to two decimal places.
     - Remove $ and commas.
     - Change number ranges into np.nan.
     - Change other problem characters to np.nan.
     - All values must already be in the same currency.
    """
    new_val = str(val)

    # Changing problem values such as HTML and blanks to np.nan as soon as possible.
    # TODO: best way to handle performance for these checks?
    if new_val == '':
        return np.nan
    if any(char in new_val for char in '/{}<>'):
        logging.info(f"Converting problem character(s) to np.nan. Original: {new_val}")
        return np.nan
    # TODO: Could change to the min, mean, or max price in the future.
    if ' - ' in new_val:
        logging.warning(f"Converting range of numbers to np.nan. Original: {new_val}")
        return np.nan

    # If other common problem characters are found in the data they should be removed here.
    # The exception is other currency symbols because removing them would strip key meaning from the data.
    new_val = new_val.replace(',', '').replace('$', '')

    # Return the desired output, or log a warning and re-raise the error.
    try:
        return round(float(new_val), 2)
    except ValueError:
        # This is a type of issue the user can fix on a case by case basis.
        logging.warning(f"""
            WARNING: The new value cannot be converted. Only numbers and '.' are allowed in the data.
            - If there are values in multiple currencies the data must be converted into just $ first.
            - Most likely this problem is because of unexpected character(s) at: {val}.
        """)
        raise

In [26]:
grocery_data_copy['price'] = grocery_data_copy['price'].apply(convert_currency)
grocery_data_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287051 entries, 0 to 287050
Data columns (total 18 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   category      287051 non-null  object 
 1   tech1         744 non-null     object 
 2   description   287051 non-null  object 
 3   fit           4 non-null       object 
 4   title         287048 non-null  object 
 5   also_buy      287051 non-null  object 
 6   image         287051 non-null  object 
 7   tech2         0 non-null       float64
 8   brand         275632 non-null  object 
 9   feature       287051 non-null  object 
 10  rank          287051 non-null  object 
 11  also_view     287051 non-null  object 
 12  main_cat      285688 non-null  object 
 13  similar_item  256 non-null     object 
 14  date          9662 non-null    object 
 15  price         132036 non-null  float64
 16  asin          287051 non-null  object 
 17  details       287027 non-null  object 
dtypes: f

In [27]:
# Reformat date
grocery_data_copy['date'] = pd.to_datetime(grocery_data_copy['date'], errors='coerce')
grocery_data_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287051 entries, 0 to 287050
Data columns (total 18 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   category      287051 non-null  object        
 1   tech1         744 non-null     object        
 2   description   287051 non-null  object        
 3   fit           4 non-null       object        
 4   title         287048 non-null  object        
 5   also_buy      287051 non-null  object        
 6   image         287051 non-null  object        
 7   tech2         0 non-null       float64       
 8   brand         275632 non-null  object        
 9   feature       287051 non-null  object        
 10  rank          287051 non-null  object        
 11  also_view     287051 non-null  object        
 12  main_cat      285688 non-null  object        
 13  similar_item  256 non-null     object        
 14  date          9555 non-null    datetime64[ns]
 15  price         132

### Questions<a id='Questions' />

1) Why do the counts appear off?  
- Is it because of whitespace? Don't think so because of the strip method when loading data. Also, selecting the cell area inside the dataframe doesn't highlight anything in these cells. However, it does select the text in non-empty cells.
- There are definitely more missing/blank values than what the initial information revealed.
#### Answers
- Blank values.
- Certain cells have the full HTML DOM (so long it doesn't display in Pandas and looks empty).
- Some prices contain a range of values.

2) What is the difference between a blank value and a blank list in Pandas?  
#### Answers
- An empty list is still a Python object stored in the cell, so Pandas doesn't count it as missing.
- Can fix the blank lists using Series functions. To fix the blank values can use DataFrame functions.

3) Is all currency in dollars? Also, is the $ in the actual data or is it a formatter in Pandas for this column?
#### Answers
- All prices that had values were in dollars in the data.

4) Some lambda functions can only run one time per session. 
- Running just that cell multiple times results in an error.
- Is this common? 
- Should I handle this scenario or is it expected everything will only run once in sequence? 
#### Answers
- Running it one time was also having an issue. It just wasn't as severe as the error on the second attempt.

5) What is the best way of handling the duplicate 'asin'?
- Will making the 'asin' the index automatically merge the duplicates?
- Should half of them be deleted first?
- Should all of them be deleted since they are less than 1.5% of the data?
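
A small experiment can help answer the first two bullets. This is a sketch on a toy frame (the rows are illustrative), not a decision about how the real data should be handled:

```python
import pandas as pd

# Toy frame with one duplicated 'asin', mirroring the duplicate rows seen above.
toy = pd.DataFrame({
    'asin': ['B0001', 'B0001', 'B0002'],
    'title': ['Pizza Seasoning', 'Pizza Seasoning', 'Gouda'],
    'also_buy': [['B0002'], ['B0002'], []],
})

# Setting the index does not merge anything; both duplicate rows are kept.
indexed = toy.set_index('asin')
print('Rows after set_index:', len(indexed), '| unique index:', indexed.index.is_unique)

# Share of rows that repeat an earlier 'asin'.
duplicate_share = toy['asin'].duplicated().mean()
print('Duplicate share:', round(duplicate_share, 3))

# Keep the first row per 'asin'. subset='asin' avoids hashing the list-valued columns.
deduplicated = toy.drop_duplicates(subset='asin', keep='first').set_index('asin')
print('Rows after drop_duplicates:', len(deduplicated), '| unique index:', deduplicated.index.is_unique)
```

Keeping the first copy only makes sense if the duplicates really are identical, which the spot check above suggests but doesn't prove for every 'asin'.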

6) See TODOs in convert_currency function.
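
For the price range TODO, one option is to reduce a range to a single number instead of discarding it. A minimal sketch, assuming ranges look like '$17.24 - $39.20'; `summarize_price_range` and its `how` parameter are hypothetical names, not part of the notebook:

```python
import re
import statistics

# Captures the numeric part of each price, allowing commas and an optional decimal part.
_PRICE_PATTERN = re.compile(r'\$?\s*(\d[\d,]*(?:\.\d+)?)')

def summarize_price_range(text: str, how: str = 'mean') -> float:
    """Hypothetical helper: reduce a price range string to its min, max, or mean."""
    prices = [float(match.replace(',', '')) for match in _PRICE_PATTERN.findall(text)]
    if not prices:
        raise ValueError(f'No prices found in {text!r}')
    reducers = {'min': min, 'max': max, 'mean': statistics.mean}
    if how not in reducers:
        raise ValueError(f"how must be one of {sorted(reducers)}")
    return round(reducers[how](prices), 2)

low = summarize_price_range('$17.24 - $39.20', how='min')
high = summarize_price_range('$17.24 - $39.20', how='max')
middle = summarize_price_range('$17.24 - $39.20', how='mean')
print(low, middle, high)
```

Which summary is appropriate depends on how price will be used by the recommender, so returning np.nan may still be the safer default until that is decided.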


## Summary<a id='Summary' />

## Tests <a id='Tests' />
These would usually be in a separate directory or file.

### Unit Tests

In [29]:
class UnitTestExamples(unittest.TestCase):
    
    def test_formatting_removed(self):
        self.assertEqual(convert_currency('$5,000.00'), 5000.00)
    
    def test_whole_number(self):
        self.assertEqual(convert_currency(5), 5)
    
    def test_rounding_floats(self):
        self.assertEqual(convert_currency(.16666666666666666666), 0.17)
    
    def test_warn_problem_character(self):
        with self.assertRaises(ValueError):
            convert_currency('€5')
        
    def test_blank_values(self):
        self.assertTrue(np.isnan(convert_currency('')))
            
    def test_problem_characters(self):
        self.assertTrue(np.isnan(convert_currency('.a-box-inner{background-color:#fff}#alohaBuyBoxWidget')))
            
    def test_range_of_values(self):
        self.assertTrue(np.isnan(convert_currency('$17.24 - $39.20')))
   

### Integration Tests
        

In [30]:
class IntegrationTestExamples(unittest.TestCase):

    def test_data_file_exists(self):
        self.assertTrue(os.path.isfile(data_filepath))

    def test_documentation_file_exists(self):
        self.assertTrue(os.path.isfile(doc_filepath))

    def test_file_exists_logic(self):
        self.assertFalse(os.path.isfile('Does_Not_Exist.csv'))

### Test Results

In [31]:
unittest.main(argv=[''], verbosity=2, exit=False)

test_data_file_exists (__main__.IntegrationTestExamples) ... ok
test_documentation_file_exists (__main__.IntegrationTestExamples) ... ok
test_file_exists_logic (__main__.IntegrationTestExamples) ... ok
test_blank_values (__main__.UnitTestExamples) ... ok
test_formatting_removed (__main__.UnitTestExamples) ... ok
test_problem_characters (__main__.UnitTestExamples) ... ok
test_range_of_values (__main__.UnitTestExamples) ... ok
test_rounding_floats (__main__.UnitTestExamples) ... ok
test_warn_problem_character (__main__.UnitTestExamples) ... ok
test_whole_number (__main__.UnitTestExamples) ... ok

----------------------------------------------------------------------
Ran 10 tests in 0.010s

OK


<unittest.main.TestProgram at 0x2626d93da00>

## References <a id='references' />
1) Jianmo Ni 2018 [Amazon Review Data (2018)](https://nijianmo.github.io/amazon/).  
2) Rahul Agarwal 2019 [Apply and Lambda usage in pandas. Learn these to master Pandas](https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7).  
3) Chris Moffitt 2018 [Overview of Pandas Data Types](https://pbpython.com/pandas_dtypes.html).