# Preprocess purchases and products for the ML inference tasks


Output: Saves a CSV summarizing the products data: 

`./preprocessed/products_table.csv`



Uses the already collected Google Books API data.

Builds a table mapping each product code (ASIN/ISBN) to a product title and category.

products_table:
```
ASIN/ISBN (Product Code), Category, Title, Purchasers
```

Some product codes appear with multiple (different) titles and categories. This table is the canonical one: for each product code, the title and category are the values that occur most often in the dataset.

Purchasers is the number of distinct Response IDs that purchased the product.

Also:
- fixes some ISBNs that lost their leading zeros


In [106]:
import datetime

from IPython.display import display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Define the Amazon data column names here so they are easy to reference
RESPONSE_ID = 'Survey ResponseID'
DATE = 'Order Date'
UNIT_PRICE = 'Purchase Price Per Unit'
QUANTITY = 'Quantity'
TITLE = 'Title'
PRODUCT_CODE = 'ASIN/ISBN (Product Code)'
CATEGORY = 'Category'

# Additional information we add to the products table
GENDERED_CATEGORY = 'GENDERED CATEGORY'
RENAMED_CATEGORY = 'RENAMED CATEGORY'
FIXED_PRODUCT_CODE = 'FIXED ASIN/ISBN'

PURCHASES = 'purchases'
PURCHASERS = 'purchasers'

# use a consistent random state to reproduce outputs in the models
RANDOM = 0

In [107]:
products_table_fpath = './preprocessed/products_table.csv'

## Amazon data

Read in Amazon data.

In [108]:
amzn_data_fpath = '../data/amazon-data/amazon-data-cleaned.csv'
amzn_data = pd.read_csv(amzn_data_fpath, parse_dates=['Order Date'])
# convert NaNs in string columns to strings to avoid losing data in groupby/merge operations
amzn_data[PRODUCT_CODE] = amzn_data[PRODUCT_CODE].astype(str)
amzn_data[CATEGORY] = amzn_data[CATEGORY].astype(str)
amzn_data[TITLE] = amzn_data[TITLE].astype(str)
amzn_data.head(3)

Unnamed: 0,Order Date,Purchase Price Per Unit,Quantity,Shipping Address State,Title,ASIN/ISBN (Product Code),Category,Survey ResponseID
0,2018-12-04,7.98,1.0,NJ,SanDisk Ultra 16GB Class 10 SDHC UHS-I Memory Card up to 80MB/s (SDSDUNC-016G-GN6IN),B0143RTB1E,FLASH_MEMORY,R_01vNIayewjIIKMF
1,2018-12-22,13.99,1.0,NJ,"Betron BS10 Earphones Wired Headphones in Ear Noise Isolating Earbuds with Microphone and Volume Control Powerful Bass Driven Sound, 12mm Large Drivers, Ergonomic Design",B01MA1MJ6H,HEADPHONES,R_01vNIayewjIIKMF
2,2018-12-24,8.99,1.0,NJ,,B078JZTFN3,,R_01vNIayewjIIKMF


In [109]:
def print_amazon_data_metrics(df):
    print('%s purchases' % len(df))
    print('%s response Ids' % df[RESPONSE_ID].nunique())
    
print_amazon_data_metrics(amzn_data)

1850717 purchases
5027 response Ids


Warning: The next cell is a slow operation.

I use groupby operations here because they are much faster, but the cell still takes a while to run.

In [110]:
products_table = amzn_data.groupby(
    [PRODUCT_CODE]
)[RESPONSE_ID].nunique().sort_values(ascending=False).rename(PURCHASERS).to_frame()
product_cats = (
    amzn_data.groupby([PRODUCT_CODE])[CATEGORY].apply(list)
    .apply(lambda cats: pd.Series(cats).apply(str).apply(str.upper).mode()[0])
)
product_titles = amzn_data.groupby(
    [PRODUCT_CODE]
)[TITLE].apply(list).apply(lambda ts: pd.Series(ts).apply(str).mode()[0])
products_table[CATEGORY] = products_table.index.map(product_cats)
products_table[TITLE] = products_table.index.map(product_titles)
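
To make the aggregation above concrete, here is a minimal sketch on a toy frame. The rows and the name `toy` are invented for illustration. It counts distinct responders per product and picks the most frequent upper-cased category per product. `mode()` returns its values sorted, so a tie goes to the alphabetically first category, which can be the string `'NAN'`.

```python
import pandas as pd

# Toy purchases; values are invented for illustration only.
toy = pd.DataFrame({
    'Survey ResponseID': ['R_1', 'R_1', 'R_2', 'R_3', 'R_3'],
    'ASIN/ISBN (Product Code)': ['A', 'A', 'A', 'B', 'B'],
    'Category': ['BOOK', 'book', 'TOY', 'nan', 'PLANNER'],
})

grouped = toy.groupby('ASIN/ISBN (Product Code)')

# Number of distinct responders who bought each product.
purchasers = grouped['Survey ResponseID'].nunique()

# Most frequent upper-cased category per product (ties: first in sort order).
top_category = grouped['Category'].agg(lambda s: s.str.upper().mode().iloc[0])

print(purchasers.to_dict())
print(top_category.to_dict())
```

Product `B` shows the tie case: `'NAN'` and `'PLANNER'` each occur once, and the sorted order decides.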

In [111]:
products_table.head()

ASIN/ISBN (Product Code),purchasers,Category,Title
B00IX1I3G6,1157,GIFT_CARD,Amazon.com Gift Card Balance Reload
B086KKT3RX,875,ABIS_GIFT_CARD,Amazon Reload
B07PCMWTSG,543,GIFT_CARD,Amazon.com eGift Card
B004LLIKVU,467,GIFT_CARD,Amazon.com eGift Card
B07FZ8S74R,377,DIGITAL_DEVICE_3,"Echo Dot (3rd Gen, 2018 release) - Smart speaker with Alexa - Charcoal"


In [112]:
products_table[CATEGORY].nunique()

1857

In [113]:
amzn_data[CATEGORY].nunique()

1872

In [114]:
def is_isbn(pcode):
    pcode = str(pcode)
    if (len(pcode) < 8) or (len(pcode) > 13):
        return False
    # Sometimes ISBNs end in X
    if pcode.upper().endswith('X'):
        pcode = pcode[:-1]
    try:
        int(pcode)
        return True
    except ValueError:
        return False
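
`is_isbn` is a permissive test on length and digits, which is one reason non-book products can pass it. A stricter check, not used in this notebook, is the ISBN-10 check digit: the weighted sum of the ten characters (weights 10 down to 1, with `X` worth 10 in the last position) must be divisible by 11. The sketch below shows how one might apply it; `isbn10_checksum_ok` is a hypothetical helper, not part of the original code.

```python
def isbn10_checksum_ok(code):
    """Return True if `code` is a 10-character string with a valid ISBN-10 check digit."""
    code = str(code).upper()
    if len(code) != 10:
        return False
    total = 0
    for position, char in enumerate(code):
        if char == 'X' and position == 9:
            value = 10  # X is only allowed as the check digit
        elif char in '0123456789':
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0

print(isbn10_checksum_ok('0062970704'))  # a zero-prefixed code that appears in this notebook
print(isbn10_checksum_ok('0062970705'))  # same code with the last digit altered
print(isbn10_checksum_ok('B00IX1I3G6'))  # an ASIN, not an ISBN
```

A check like this only applies to full 10-character codes, so codes that lost leading zeros would need to be padded first.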

In [115]:
isbns = products_table[products_table.index.map(is_isbn)==True].index.tolist()

### Fixing duplicate product codes caused by leading 0's

Question: Are there ISBNs in the data that should start with 0, i.e. ISBNs that lost a leading zero during data transfer and preprocessing?

Answer: Yes

This matters for ISBNs that appear in the data twice, once with and once without the leading 0. The two forms are included as separate products and can end up with different categories.

In [116]:
prefix0_codes = [isbn for isbn in isbns if isbn.startswith('0')]
prefix00_codes = [isbn for isbn in isbns if isbn.startswith('00')]
print('%s codes start with 0' % len(prefix0_codes))
print('%s codes start with 00' % len(prefix00_codes))

23890 codes start with 0
2678 codes start with 00


In [117]:
"""
Make a map of codes to potentially fix
prefix00_fixes = {code without 00: code with 00 if found}
"""
prefix00_fixes = dict()
for isbn in isbns:
    if (isbn.startswith('0') or (len(isbn) > 9)): continue
    if '00'+isbn in prefix00_codes:
        prefix00_fixes[isbn] = '00'+isbn
print('found %s ISBNs to add 00 prefix' % len(prefix00_fixes))

found 140 ISBNs to add 00 prefix


In [118]:
prefix00_fixes

{'62970704': '0062970704',
 '62457713': '0062457713',
 '62457802': '0062457802',
 '61438294': '0061438294',
 '60935464': '0060935464',
 '60899220': '0060899220',
 '60850523': '0060850523',
 '62413511': '0062413511',
 '62422995': '0062422995',
 '62315005': '0062315005',
 '62316117': '0062316117',
 '62319795': '0062319795',
 '62803832': '0062803832',
 '61172111': '0061172111',
 '60555661': '0060555661',
 '61992275': '0061992275',
 '62049534': '0062049534',
 '62073486': '0062073486',
 '62086286': '0062086286',
 '61537969': '0061537969',
 '63240807': '0063240807',
 '62963678': '0062963678',
 '60012781': '0060012781',
 '60093463': '0060093463',
 '60248025': '0060248025',
 '60256672': '0060256672',
 '60283246': '0060283246',
 '60753641': '0060753641',
 '60780665': '0060780665',
 '60837020': '0060837020',
 '60839783': '0060839783',
 '60883286': '0060883286',
 '60959479': '0060959479',
 '60520620': '0060520620',
 '60530898': '0060530898',
 '60559713': '0060559713',
 '60731338': '0060731338',
 

In [119]:
"""
Make a map of codes to potentially fix
prefix0_fixes = {code without 0: code with 0 if found}
"""
prefix0_fixes = dict()
for isbn in isbns:
    if (isbn.startswith('0') or (len(isbn) > 9)): continue
    if isbn in prefix00_fixes.keys(): continue
    if '0'+isbn in prefix0_codes:
        prefix0_fixes[isbn] = '0'+isbn
print('found %s ISBNs to add 0 prefix' % len(prefix0_fixes))

found 1019 ISBNs to add 0 prefix


In [120]:
product_code_fixes = {**prefix00_fixes, **prefix0_fixes}
print('%s total product codes to fix' % len(product_code_fixes))

1159 total product codes to fix
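
The two loops above amount to one rule: for each short numeric code, try the two-zero prefix first, then the one-zero prefix, and accept a fix only if the padded code already exists in the data. The following is a compact sketch of that rule. The `known_codes` set and the `find_prefix_fix` helper are hypothetical, and a set is assumed so that membership tests do not scan a list.

```python
def find_prefix_fix(code, known_codes):
    """Return the zero-prefixed form of `code` if it is already in `known_codes`, else None."""
    if code.startswith('0') or len(code) > 9:
        return None  # already prefixed, or too long to be missing a leading zero
    for prefix in ('00', '0'):  # prefer the two-zero fix, as in the cells above
        candidate = prefix + code
        if candidate in known_codes:
            return candidate
    return None

# Codes that appear in this notebook, plus one invented code ('12345678') with no match.
known_codes = {'0062970704', '0374300216', '62970704', '374300216', '12345678'}

fixes = {}
for code in sorted(known_codes):
    fixed = find_prefix_fix(code, known_codes)
    if fixed is not None:
        fixes[code] = fixed

print(fixes)
```

Codes without a zero-prefixed counterpart in the data are left unchanged, matching the behavior of `product_code_fixes`.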


In [121]:
products_table[FIXED_PRODUCT_CODE] = products_table.index.map(lambda p: product_code_fixes[p] if p in product_code_fixes else p)

In [122]:
products_table.head()

ASIN/ISBN (Product Code),purchasers,Category,Title,FIXED ASIN/ISBN
B00IX1I3G6,1157,GIFT_CARD,Amazon.com Gift Card Balance Reload,B00IX1I3G6
B086KKT3RX,875,ABIS_GIFT_CARD,Amazon Reload,B086KKT3RX
B07PCMWTSG,543,GIFT_CARD,Amazon.com eGift Card,B07PCMWTSG
B004LLIKVU,467,GIFT_CARD,Amazon.com eGift Card,B004LLIKVU
B07FZ8S74R,377,DIGITAL_DEVICE_3,"Echo Dot (3rd Gen, 2018 release) - Smart speaker with Alexa - Charcoal",B07FZ8S74R


#### Data inquiry

Before making any changes:
- How many distinct book products (ISBNs)?
- How many different categories for them?
- How many are categorized as...?
    - BOOK
    - ABIS_BOOK
    - BOOK OR ABIS_BOOK
    
These counts use the original (unfixed) ASIN/ISBN codes.

In [123]:
isbn_category_counts = products_table[
    products_table.index.map(is_isbn)==True
][CATEGORY].value_counts().sort_values(ascending=False)
print('%s total categories for ISBNs' % len(isbn_category_counts))
isbn_category_counts.head(20)

184 total categories for ISBNs


Category
ABIS_BOOK          56634
NAN                  552
CALENDAR             506
BLANK_BOOK           216
TOYS_AND_GAMES       145
PHYSICAL_MOVIE       138
TABLETOP_GAME        137
PLANNER              125
PUZZLES              107
BOARD_GAME            61
GREETING_CARD         59
FLASH_CARD            53
OFFICE_PRODUCTS       52
ABIS_MUSIC            42
STICKER_DECAL         38
MAPS                  33
ABIS_DVD              31
TOY_FIGURE            30
WALL_ART              23
ART_CRAFT_KIT         20
Name: count, dtype: int64

In [124]:
isbn_category_counts.sum()

59440

In [125]:
print('%s total ISBNs/book products' % len(products_table[
    products_table.index.map(is_isbn)==True
]))
print('%0.3f = %s/%s have category "ABIS_BOOK"' % (
    isbn_category_counts.loc['ABIS_BOOK']/isbn_category_counts.sum(), 
    isbn_category_counts.loc['ABIS_BOOK'], isbn_category_counts.sum()))
print('%0.3f = %s/%s have category "BOOK"' % (
    isbn_category_counts.loc['BOOK']/isbn_category_counts.sum(), 
    isbn_category_counts.loc['BOOK'], isbn_category_counts.sum()))
print('%0.3f = %s/%s have category "ABIS_BOOK" or "BOOK"' % (
    (isbn_category_counts.loc['ABIS_BOOK']+isbn_category_counts.loc['BOOK'])/isbn_category_counts.sum(), 
    (isbn_category_counts.loc['ABIS_BOOK']+isbn_category_counts.loc['BOOK']), isbn_category_counts.sum()))

print('%0.3f = %s/%s have no category ("NAN")' % (
    isbn_category_counts.loc['NAN']/isbn_category_counts.sum(), 
    isbn_category_counts.loc['NAN'], isbn_category_counts.sum()))

print('%0.3f = %s/%s have category "ABIS_BOOK" or "BOOK" or "NAN"' % (
    (isbn_category_counts.loc['ABIS_BOOK']+isbn_category_counts.loc['BOOK']+isbn_category_counts.loc['NAN'])/isbn_category_counts.sum(), 
    (isbn_category_counts.loc['ABIS_BOOK']+isbn_category_counts.loc['BOOK']+isbn_category_counts.loc['NAN']), isbn_category_counts.sum()))

59440 total ISBNs/book products
0.953 = 56634/59440 have category "ABIS_BOOK"
0.000 = 19/59440 have category "BOOK"
0.953 = 56653/59440 have category "ABIS_BOOK" or "BOOK"
0.009 = 552/59440 have no category ("NAN")
0.962 = 57205/59440 have category "ABIS_BOOK" or "BOOK" or "NAN"
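
The cell above indexes the counts with `.loc[...]`, which raises a `KeyError` if a category does not occur among the ISBN products. A minimal sketch of a more defensive version follows. The categories are invented, and `share` is a hypothetical helper that treats absent labels as zero.

```python
import pandas as pd

# Invented categories standing in for the per-ISBN categories above.
categories = pd.Series(['ABIS_BOOK', 'ABIS_BOOK', 'ABIS_BOOK', 'NAN', 'CALENDAR'])

counts = categories.value_counts()
total = counts.sum()

def share(labels):
    """Fraction of products whose category is one of `labels`; absent labels count as 0."""
    matched = sum(counts.get(label, 0) for label in labels)
    return matched / total

print('%0.3f have category "ABIS_BOOK"' % share(['ABIS_BOOK']))
print('%0.3f have category "ABIS_BOOK" or "BOOK"' % share(['ABIS_BOOK', 'BOOK']))
print('%0.3f have category "ABIS_BOOK" or "BOOK" or "NAN"' % share(['ABIS_BOOK', 'BOOK', 'NAN']))
```

Here `'BOOK'` is missing from the toy data, and the second line still prints instead of failing.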


### Category fixes

Recategorize 'ABIS_BOOK' as the generic 'BOOK'.

Manual fixups:

These were found by inspecting products that have ISBNs but are not categorized as books.

There are likely more miscategorized products.

In [126]:
BOOK = 'BOOK' # manually reassigned. Not ABIS_BOOK

# maps ISBN to appropriate category
manual_cat_fixups = {
    '0374300216': BOOK,
    '1338323210': BOOK,
    '0761189602': BOOK,
    '1338236598': BOOK,
    '0062498533': BOOK,
    '0803736800': BOOK,
    '1572245379': BOOK,
    # Some of these books were categorized as NaN; likely purchased on a non-US Amazon site
    '1728677572': BOOK,
    '1020310243': BOOK,
    
    '0840701055': BOOK,
    '0062270451': BOOK,
    '1973919095': BOOK,
    '0451474570': BOOK,
    '0786967226': BOOK,
    '0970267126': BOOK,
    '1608874478': BOOK,
    '1673455980': BOOK,
    '1593995520': BOOK,
    '0061128392': BOOK,
    '1250170990': BOOK, # not a leotard
    '374300216': BOOK,
    '1484757386': BOOK,

    '1091852979': BOOK, # nan

    '1933054395': 'GAMES'
}

In [127]:
# Before fixups
products_table.loc[manual_cat_fixups.keys()]

ASIN/ISBN (Product Code),purchasers,Category,Title,FIXED ASIN/ISBN
374300216,34,SCREEN_PROTECTOR,If Animals Kissed Good Night,374300216
1338323210,28,HEADPHONES,Dog Man: Fetch-22: From the Creator of Captain Underpants (Dog Man #8),1338323210
761189602,24,ELECTRONIC_ADAPTER,Paint by Sticker Kids: Zoo Animals: Create 10 Pictures One Sticker at a Time!,761189602
1338236598,21,BOOK_COVER,Dog Man: For Whom the Ball Rolls: From the Creator of Captain Underpants (Dog Man #7),1338236598
62498533,16,SCREEN_PROTECTOR,The Hate U Give: A Printz Honor Winner,62498533
803736800,13,HEADPHONES,Dragons Love Tacos,803736800
1572245379,12,BOOK_COVER,The Untethered Soul: The Journey Beyond Yourself,1572245379
1728677572,10,NAN,,1728677572
1020310243,9,NAN,,1020310243
840701055,1,CABINET,Vest Pocket New Testament With Psalms,840701055


In [128]:
# Reassign cat fixups
code_cat = products_table[CATEGORY].to_dict()
code_cat.update(manual_cat_fixups)
code_cat = {code:(BOOK if cat=='ABIS_BOOK' else cat) for (code, cat) in code_cat.items()}
products_table[CATEGORY] = products_table.index.map(code_cat)
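
The order of operations in the cell above matters: manual fixups are applied first, and then every remaining `ABIS_BOOK` is renamed to `BOOK`. Here is a minimal sketch with invented product codes (the names `code_cat_toy` and `manual_fixups_toy` are made up for this example):

```python
BOOK = 'BOOK'

# Invented product codes and categories for illustration.
code_cat_toy = {'111': 'ABIS_BOOK', '222': 'HEADPHONES', '333': 'NAN', '444': 'TOY_FIGURE'}
manual_fixups_toy = {'222': BOOK, '333': BOOK}

# 1. Manual overrides win over whatever category the data assigned.
code_cat_toy.update(manual_fixups_toy)

# 2. Collapse the ABIS_BOOK label into the generic BOOK label.
code_cat_toy = {
    code: (BOOK if cat == 'ABIS_BOOK' else cat)
    for code, cat in code_cat_toy.items()
}

print(code_cat_toy)
```

Keys in the fixup map that are not in the table index have no effect, because `index.map` only looks up codes that are present in the index.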

In [129]:
products_table.loc[manual_cat_fixups.keys()]

ASIN/ISBN (Product Code),purchasers,Category,Title,FIXED ASIN/ISBN
374300216,34,BOOK,If Animals Kissed Good Night,374300216
1338323210,28,BOOK,Dog Man: Fetch-22: From the Creator of Captain Underpants (Dog Man #8),1338323210
761189602,24,BOOK,Paint by Sticker Kids: Zoo Animals: Create 10 Pictures One Sticker at a Time!,761189602
1338236598,21,BOOK,Dog Man: For Whom the Ball Rolls: From the Creator of Captain Underpants (Dog Man #7),1338236598
62498533,16,BOOK,The Hate U Give: A Printz Honor Winner,62498533
803736800,13,BOOK,Dragons Love Tacos,803736800
1572245379,12,BOOK,The Untethered Soul: The Journey Beyond Yourself,1572245379
1728677572,10,BOOK,,1728677572
1020310243,9,BOOK,,1020310243
840701055,1,BOOK,Vest Pocket New Testament With Psalms,840701055


### Prefix categories with MEN/WOMEN

In [130]:
import string

def for_men_title(title):
    t = str(title).translate(str.maketrans('', '', string.punctuation)).lower().split()
    if any([(w in t) for w in ["womens", "women"]]): return False
    if any([w in t for w in ["mens", "men"]]): return True
    return False

def for_women_title(title):
    t = str(title).translate(str.maketrans('', '', string.punctuation)).lower().split()
    if any([w in t for w in ["mens", "men"]]): return False
    if any([w in t for w in ["womens", "women"]]): return True
    return False

In [131]:
products_table['MEN'] = products_table[TITLE].apply(for_men_title)
products_table['WOMEN'] = products_table[TITLE].apply(for_women_title)

In [132]:
def get_men_women_category(row):
    if row['WOMEN']: return 'WOMEN:%s'%row[CATEGORY]
    if row['MEN']: return 'MEN:%s'%row[CATEGORY]
    return row[CATEGORY]

products_table[GENDERED_CATEGORY] = products_table.apply(get_men_women_category, axis=1)
products_table[RENAMED_CATEGORY] = products_table[GENDERED_CATEGORY]
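
The title matching works on whole tokens after ASCII punctuation is stripped, so `Women's` becomes `womens` and a word like `Menu` does not count as `men`. A title that mentions both groups gets neither prefix. The sketch below restates that logic in a single hypothetical helper, `gender_label`, using made-up titles. Note that a typographic apostrophe (`’`) is not in `string.punctuation`, so `Women’s` would not match.

```python
import string

MEN_WORDS = {'mens', 'men'}
WOMEN_WORDS = {'womens', 'women'}
STRIP_PUNCTUATION = str.maketrans('', '', string.punctuation)

def gender_label(title):
    """Return 'MEN', 'WOMEN', or None based on whole-word matches in the title."""
    tokens = set(str(title).translate(STRIP_PUNCTUATION).lower().split())
    has_men = bool(tokens & MEN_WORDS)
    has_women = bool(tokens & WOMEN_WORDS)
    if has_men == has_women:
        return None  # neither word present, or both present
    return 'MEN' if has_men else 'WOMEN'

sample_titles = [
    "Women's Multivitamin Gummies",
    "Razor for Men",
    "Socks for Men and Women",
    "Dinner Menu Board",
]
for title in sample_titles:
    print(title, '->', gender_label(title))
```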

In [133]:
pd.set_option('display.max_colwidth', None)
products_table[products_table['WOMEN']==True].head()

ASIN/ISBN (Product Code),purchasers,Category,Title,FIXED ASIN/ISBN,MEN,WOMEN,GENDERED CATEGORY,RENAMED CATEGORY
B0787GLBMV,184,MANUAL_SHAVING_RAZOR,"Schick Hydro Silk Touch-Up Dermaplaning Tool, 3 Count | Eyebrow Razor, Face Razors for Women, Face Shaver, Dermaplane",B0787GLBMV,False,True,WOMEN:MANUAL_SHAVING_RAZOR,WOMEN:MANUAL_SHAVING_RAZOR
B00DFSHD5O,67,HAIR_TIE,"Goody Ouchless Womens Elastic Hair Tie - 27 Count, Black - 4MM for Medium Hair- Hair Accessories for Women Perfect for Long Lasting Braids, Ponytails and More - Pain-Free (Packaging May Vary)",B00DFSHD5O,False,True,WOMEN:HAIR_TIE,WOMEN:HAIR_TIE
B007L0DPE0,66,VITAMIN,"Vitafusion Women's Multivitamin Gummies, Berry Flavored Womens Daily Multivitamins, 150 Count",B007L0DPE0,False,True,WOMEN:VITAMIN,WOMEN:VITAMIN
B00314WHW6,55,HOME_BED_AND_BATH,"SlipX Solutions Bottomless Bath | Overflow Drain Cover for Tub | Best Gifts for Mom, Spa & Bath Accessories | Drain Block, Water Stopper Plug | Bath Essentials for Women | 4"" Diameter, Clear",B00314WHW6,False,True,WOMEN:HOME_BED_AND_BATH,WOMEN:HOME_BED_AND_BATH
B01ALAVIK4,48,SHAVING_AGENT,"Gillette Satin Care Ultra Sensitive Shave Gel for Women, Pack of 2, 7oz Each, Frangrance Free",B01ALAVIK4,False,True,WOMEN:SHAVING_AGENT,WOMEN:SHAVING_AGENT


### Add Books categories

Criterion: a product is treated as a book if its product code is an ISBN: https://en.wikipedia.org/wiki/ISBN

[Sometimes ISBNs end in X](https://www.isbn.org/faqs_general_questions#:~:text=Back%20to%20top-,Why%20do%20some%20ISBNs%20end%20in%20an%20%22X%22%3F,upper%20case%20X%20can%20appear)

There are some very badly categorized items.

In [134]:
books_table = products_table[products_table.index.map(is_isbn)==True][[PURCHASERS, CATEGORY, TITLE]]
print('%s products in the books table' % len(books_table))

59440 products in the books table


In [135]:
print('%s books categories with Amazon data alone' % books_table[CATEGORY].nunique())

177 books categories with Amazon data alone


### Read in the categorized books table

These categories were pulled from the Google Books API.

For many ISBNs, category data was not available from the Google Books API.

When a Google Books category is available, the product's category is renamed to

BOOK:[Google Books Category]

Otherwise the category is kept as-is.

In [136]:
google_books_api_isbn_category_fpath = './preprocessed/google_books_api_isbn_category.csv'
isbn_cat_df = pd.read_csv(google_books_api_isbn_category_fpath, 
                          dtype={'ISBN':'str'}, index_col=0)
isbn_cat_df = isbn_cat_df.dropna()
isbn_cat_df[CATEGORY] = isbn_cat_df[CATEGORY].astype(str).apply(str.upper)
isbn_cat_df.head()

ISBN,Category
143127748,MEDICAL
1524763136,BIOGRAPHY & AUTOBIOGRAPHY
735211299,BUSINESS & ECONOMICS
786965606,GAMES & ACTIVITIES
1501110365,FICTION


In [137]:
print('%s ISBNs in Google books API table after dropping uncategorized' % len(isbn_cat_df))

42676 ISBNs in Google books API table after dropping uncategorized


In [138]:
display(isbn_cat_df[CATEGORY].value_counts().describe())

count    2697.000000
mean       15.823508
std       183.407022
min         1.000000
25%         1.000000
50%         1.000000
75%         2.000000
max      5687.000000
Name: count, dtype: float64

In [139]:
print('Top categories:')
display(isbn_cat_df[CATEGORY].value_counts().head(10))

Top categories:


Category
FICTION                      5687
JUVENILE FICTION             5595
JUVENILE NONFICTION          2576
COMICS & GRAPHIC NOVELS      2281
BIOGRAPHY & AUTOBIOGRAPHY    1609
COOKING                      1303
RELIGION                     1288
HISTORY                      1271
BUSINESS & ECONOMICS         1239
EDUCATION                     964
Name: count, dtype: int64

In [140]:
def get_book_category(row):
    isbn = row.name
    if isbn in isbn_cat_df.index:
        return '%s:%s' % (BOOK, isbn_cat_df.loc[isbn][CATEGORY])
    return row[CATEGORY]

books_table[RENAMED_CATEGORY] = books_table.apply(get_book_category, axis=1)
books_table.head()

ASIN/ISBN (Product Code),purchasers,Category,Title,RENAMED CATEGORY
143127748,65,BOOK,"The Body Keeps the Score: Brain, Mind, and Body in the Healing of Trauma",BOOK:MEDICAL
1524763136,63,BOOK,Becoming,BOOK:BIOGRAPHY & AUTOBIOGRAPHY
735211299,59,BOOK,Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones,BOOK:BUSINESS & ECONOMICS
786965606,52,TABLETOP_GAME,D&D Player’s Handbook (Dungeons & Dragons Core Rulebook),BOOK:GAMES & ACTIVITIES
1501110365,51,BOOK,It Ends with Us: A Novel (1),BOOK:FICTION
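
The row-wise `apply` above can also be written as a vectorized lookup. This is a sketch on made-up rows, not the approach the notebook takes: it maps the ISBN index through the Google Books categories and falls back to the existing category where there is no match. The frame `books` and the series `google_cats` are invented stand-ins.

```python
import pandas as pd

# Invented rows; the index stands in for ISBNs.
books = pd.DataFrame(
    {'Category': ['BOOK', 'TABLETOP_GAME', 'CALENDAR']},
    index=pd.Index(['1111111111', '2222222222', '3333333333'], name='ISBN'),
)
google_cats = pd.Series({'1111111111': 'FICTION', '2222222222': 'GAMES & ACTIVITIES'})

# Look up each ISBN; ISBNs without a Google Books category become NaN.
looked_up = books.index.to_series().map(google_cats)

# Prefix the matches with 'BOOK:' and leave the missing values untouched.
prefixed = looked_up.map(lambda cat: 'BOOK:%s' % cat, na_action='ignore')

# Fall back to the original category where no Google Books category exists.
books['RENAMED CATEGORY'] = prefixed.fillna(books['Category'])

print(books)
```

This sketch assumes each ISBN occurs at most once in the lookup series; duplicated ISBNs would need to be resolved first.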


In [141]:
product_code_cat = products_table[RENAMED_CATEGORY].to_dict()
product_code_cat.update(books_table[RENAMED_CATEGORY].to_dict())
products_table[RENAMED_CATEGORY] = products_table.index.map(product_code_cat)
products_table.head()

ASIN/ISBN (Product Code),purchasers,Category,Title,FIXED ASIN/ISBN,MEN,WOMEN,GENDERED CATEGORY,RENAMED CATEGORY
B00IX1I3G6,1157,GIFT_CARD,Amazon.com Gift Card Balance Reload,B00IX1I3G6,False,False,GIFT_CARD,GIFT_CARD
B086KKT3RX,875,ABIS_GIFT_CARD,Amazon Reload,B086KKT3RX,False,False,ABIS_GIFT_CARD,ABIS_GIFT_CARD
B07PCMWTSG,543,GIFT_CARD,Amazon.com eGift Card,B07PCMWTSG,False,False,GIFT_CARD,GIFT_CARD
B004LLIKVU,467,GIFT_CARD,Amazon.com eGift Card,B004LLIKVU,False,False,GIFT_CARD,GIFT_CARD
B07FZ8S74R,377,DIGITAL_DEVICE_3,"Echo Dot (3rd Gen, 2018 release) - Smart speaker with Alexa - Charcoal",B07FZ8S74R,False,False,DIGITAL_DEVICE_3,DIGITAL_DEVICE_3


In [142]:
print('saving products_table to %s...' % products_table_fpath)
products_table.to_csv(products_table_fpath)
print('...saved')

saving products_table to ./preprocessed/products_table.csv...
...saved


In [143]:
pd.read_csv(products_table_fpath, index_col=0).head()

ASIN/ISBN (Product Code),purchasers,Category,Title,FIXED ASIN/ISBN,MEN,WOMEN,GENDERED CATEGORY,RENAMED CATEGORY
B00IX1I3G6,1157,GIFT_CARD,Amazon.com Gift Card Balance Reload,B00IX1I3G6,False,False,GIFT_CARD,GIFT_CARD
B086KKT3RX,875,ABIS_GIFT_CARD,Amazon Reload,B086KKT3RX,False,False,ABIS_GIFT_CARD,ABIS_GIFT_CARD
B07PCMWTSG,543,GIFT_CARD,Amazon.com eGift Card,B07PCMWTSG,False,False,GIFT_CARD,GIFT_CARD
B004LLIKVU,467,GIFT_CARD,Amazon.com eGift Card,B004LLIKVU,False,False,GIFT_CARD,GIFT_CARD
B07FZ8S74R,377,DIGITAL_DEVICE_3,"Echo Dot (3rd Gen, 2018 release) - Smart speaker with Alexa - Charcoal",B07FZ8S74R,False,False,DIGITAL_DEVICE_3,DIGITAL_DEVICE_3
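
One caveat when reading the saved table back: ISBNs are digit-only strings, and `pd.read_csv` can parse digit-only values as integers, which drops leading zeros. The sketch below uses an in-memory CSV with two codes from this notebook. Passing `dtype` for the code column is an assumption about how downstream code might load the file, not something this notebook does.

```python
import io
import pandas as pd

CODE_COL = 'ASIN/ISBN (Product Code)'

# Small in-memory CSV with digit-only product codes.
csv_text = CODE_COL + ",purchasers\n0062970704,3\n0374300216,34\n"

# Default parsing infers integers, so the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Reading the code column as str keeps the codes intact.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={CODE_COL: str}).set_index(CODE_COL)

print(inferred[CODE_COL].tolist())
print(as_text.index.tolist())
```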
