# Create product categories

The main objective of this project is to use data to define a pricing strategy concerning discounts. However, the analysis of a numerical variable (**product prices**) is highly enriched by the presence of a meaningful categorical variable by which data can be sliced, filtered and grouped. This categorical variable is going to be **product categories**.

We first need to import the Pandas module:

In [1]:
import pandas as pd

To create product categories, we use the cleaned products DataFrame. 

In [3]:
# products_cl.csv
url = "https://drive.google.com/file/d/1b9MQQa_NRUiIiGgM7-BVznXgTYePkeZ6/view?usp=sharing"
path = "https://drive.google.com/uc?export=download&id="+url.split("/")[-2]
products_cl = pd.read_csv(path)
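
The URL handling in the cell above can be wrapped in a small helper. This is only a minimal sketch: the function name `drive_share_to_download` is hypothetical, and it assumes the share link has the form `.../file/d/<id>/view?...`, as the link used here does.

```python
def drive_share_to_download(share_url: str) -> str:
    """Turn a Google Drive share link into a direct-download link.

    Assumes the file id is the second-to-last path segment,
    as in ".../file/d/<id>/view?usp=sharing".
    """
    file_id = share_url.split("/")[-2]
    return "https://drive.google.com/uc?export=download&id=" + file_id


url = "https://drive.google.com/file/d/1b9MQQa_NRUiIiGgM7-BVznXgTYePkeZ6/view?usp=sharing"
path = drive_share_to_download(url)
print(path)
```

The resulting `path` can be passed to `pd.read_csv()` exactly as in the cell above.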

In [4]:
product_category_df = products_cl.copy()

In [6]:
product_category_df.sample(3)

Unnamed: 0,sku,name,desc,price,in_stock,type
4261,STM0094,"Dux STM Case Macbook Air 13 ""Black / Transparent",Transparent protective cover for MacBook Air 13,59.95,1,13835403
8110,LOG0237,Logitech Harmony Remote Intelligent Companion,Intelligent controller with direct control button and compatible numeric keys with 270.000 devices,129.0,1,11905404
6486,SNN0022,Sonnet Echo Express SE Thunderbolt PCIe box,Thunderbolt Expansion Chassis PCIe Cards,482.79,0,12995397


Categories will emerge from the ``products`` table, specifically from the ``name`` and the ``desc`` columns. Therefore, we will explore them and come up with general rules about patterns (words, characters, sequences…) that can reliably tell that a certain product belongs to one of the categories we want to have.

We raise pandas' ``display.max_rows`` and ``display.max_colwidth`` options so that we can scroll through the DataFrame, read the full product name and description, and check whether the rules we created make sense.

In [7]:
pd.set_option('display.max_rows', 1000)
pd.set_option("display.max_colwidth", 100)

## 1. Category creation by search term
Let's start by creating a column `category`. For now we'll fill this column with a blank string `""`.

In [8]:
product_category_df["category"] = ""
product_category_df.sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
1120,PAC0593,Apple Mac mini Core i5 14GHz | 4GB RAM | 1TB SSD,PC Mac mini Core i5 14GHz 4GB 1TB SSD (MGEM2YP / A).,1275.59,0,1282,
8445,NKI0014,Nokia Body Cardio Balance Scale White,scale measuring cardiovascular health and body APP for iPad and iPhone,179.99,1,11905404,


We can find all the products with certain words in their `desc` column using `.loc[]` and `.str.contains()`. Here we'll look at all the items that have the word `keyboard` in their description.

In [10]:
product_category_df.loc[product_category_df["desc"].str.contains("keyboard", case=False)].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
6994,LOG0229,Logitech Desktop MK120 Keyboard and USB Mouse Black,Keyboard and mouse with USB connection for Mac and PC,25.99,1,13855401,
9625,MAT0004-A,Open - Matias Aluminum Keyboard Spanish,Reconditioned keyboard cable compatible with Mac features,69.99,0,13855401,


Next, we change the value in the category column to `keyboard` for all of these keyboard products.

In [11]:
product_category_df.loc[product_category_df["desc"].str.contains("keyboard", case=False), "category"] = "keyboard"

Let's take a look at the effect that had on the `category` column.

In [12]:
product_category_df["category"].value_counts()

category
            9903
keyboard      89
Name: count, dtype: int64
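
The select-then-assign pattern can be tried on a tiny, made-up DataFrame. The sketch below assumes a column that may contain missing descriptions, which the cleaned data here may not have; `na=False` is a real parameter of `.str.contains()` that treats missing values as non-matches.

```python
import pandas as pd

# Toy data; the rows are made up for illustration.
toy = pd.DataFrame({
    "desc": [
        "Keyboard and mouse with USB connection",
        "Transparent protective cover",
        None,  # a missing description
    ]
})
toy["category"] = ""

# na=False treats missing descriptions as "no match", so the mask stays boolean.
is_keyboard = toy["desc"].str.contains("keyboard", case=False, na=False)
toy.loc[is_keyboard, "category"] = "keyboard"

print(toy["category"].tolist())
```

Only the first row is labelled; the other two keep the empty string.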

## 2. Category creation using regex
We can also use a product's `name` to select products for our categories.

In [13]:
product_category_df.loc[product_category_df["name"].str.contains("apple iphone", case=False)].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
2058,APP1014,Lightning White Apple iPhone Dock,Stand with Lightning Dock Connector and iPhone in White.,45.0,0,13615399,
6880,APP2032,Open - Apple iPhone 6s 64GB Gold - like new,Apple Apple iPhone 6s 64GB Gold (MKQP2QL / A),769.0,0,1716,


Let's look for names that start with ``apple iphone``:

In [14]:
product_category_df.loc[product_category_df["name"].str.contains("^apple iphone", case=False)].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
8386,APP2529,Apple iPhone Leather Case Cover Skin 8/7 (PRODUCT) RED,ultrathin leather case and microfiber premium for iPhone 8/7,55.0,1,11865403,
7519,APP2099,Apple iPhone Leather Case Cover 7 Pink Geranium,ultrathin leather case and microfiber premium for iPhone 7,55.0,0,11865403,


Looks like this search still returns a lot of accessories, and anchoring the pattern to the start of the name misses products with a short prefix such as `Open - `. We can adjust the search with a little regex. Here, we add `.{0,7}` right after the `^` anchor: this matches every name in which at most 7 characters precede the term `apple iphone`. If 8 or more characters precede the search term, the name won't be found. This lets us use the naming conventions of the DataFrame to our advantage.

If you feel unsure about regex, you can use [regex101](https://regex101.com/). It's really useful for checking your code, and parts of other people's code that you're unsure about.
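
To see what the `{0,7}` quantifier does, here is a small sketch that uses Python's built-in `re` module instead of pandas (by default, `.str.contains()` also performs a regex search). The three names are taken from the sample outputs above; `re.IGNORECASE` plays the role of `case=False`.

```python
import re

# Same pattern as in the pandas search; re.IGNORECASE mirrors case=False.
pattern = re.compile(r"^.{0,7}apple iphone", re.IGNORECASE)

names = [
    "Apple iPhone 8 Plus 64GB Silver",              # 0 characters before the term
    "Open - Apple iPhone 6s 64GB Gold - like new",  # 7 characters before the term
    "Lightning White Apple iPhone Dock",            # 16 characters before the term
]
matches = [bool(pattern.search(name)) for name in names]
print(matches)
```

The first two names match, while the dock does not, because more than 7 characters come before `Apple iPhone`.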

In [16]:
product_category_df.loc[product_category_df["name"].str.contains("^.{0,7}apple iphone", case=False)].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
8389,APP2532,Apple iPhone Leather Case Cover Skin 8/7 Blue Cosmos,ultrathin leather case and microfiber premium for iPhone 8/7,55.0,1,11865403,
2547,APP1137,Case Apple iPhone 6 / 6S Gray Pink Leather Case,ultrathin leather case and microfiber premium for iPhone 6 / 6S.,55.0,0,11865403,


We can use the same trick as before to set the category: select the `category` column and set it to the string of our choice. However, ``^.{0,7}apple iphone`` is still not precise enough to define an iPhone category, since it also matches accessories such as cases.

## 3. One product with multiple categories
A product may fit into multiple categories. To assign several categories to one product, we will use Python's addition assignment operator `+=`. It is shorthand for adding something (a number, a string, etc.) to a variable and storing the result back in the same variable.

Now let's look at how this can help us in our category creation.

First, we'll reset all the values in the category column to an empty string `""`.

In [18]:
product_category_df["category"] = ""

In [23]:
product_category_df.sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
1410,KEN0208,Kensington UH4000C Hub 4 port power supply,Mac and PC Hub 4 USB 3.0 ports.,44.99,0,12995397,
715,NTE0115,NewerTech Maxpower 2-port USB / eSATA 6Gb / s PCIe Mac Pro OS X 10.6 - 10.8.2,NewerTech eSATA PCIe adapter card for Mac Pro and USB3 2006-2012.,181.99,0,1276,


Now, let's create some categories and utilise the addition assignment.

In [24]:
product_category_df.loc[product_category_df["desc"].str.contains("keyboard", case=False), "category"] += ", keyboard"
product_category_df.loc[product_category_df["name"].str.contains("^.{0,3}apple iphone", case=False), "category"] += ", smartphone"
product_category_df.loc[product_category_df["name"].str.contains("^.{0,3}apple ipod", case=False), "category"] += ", ipod"
product_category_df.loc[product_category_df["name"].str.contains("^.{0,3}apple ipad|tablet", case=False), "category"] += ", tablet"
product_category_df.loc[product_category_df["name"].str.contains("imac|mac mini|mac pro", case=False), "category"] += ", desktop"

In [36]:
product_category_df.sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
1082,PAC0520,Synology DS115j Pack | WD 4TB Network,Synology DS115j NAS Server Pack + 4TB WD Network for Mac and PC.,260.99,0,12175397,
8215,APP2484,Apple iPhone 8 Plus 64GB Silver,Apple iPhone 8 Plus 64GB Silver,919.0,1,113281716,", smartphone"


In [37]:
product_category_df["category"].value_counts()

category
                       8467
, desktop               923
, tablet                284
, smartphone            189
, keyboard               86
, ipod                   40
, keyboard, desktop       2
, keyboard, tablet        1
Name: count, dtype: int64

As we can see, some products now have 2 categories instead of just one. At the end, we can use some string methods to tidy up the opening comma and space in the `category` column:

In [38]:
product_category_df.loc[:, "category"]=product_category_df["category"].str[2:]
product_category_df["category"].value_counts()

category
                     8467
desktop               923
tablet                284
smartphone            189
keyboard               86
ipod                   40
keyboard, desktop       2
keyboard, tablet        1
Name: count, dtype: int64
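
The whole accumulate-then-tidy pattern can be tried on a made-up DataFrame. This is a sketch, not the notebook's exact method: instead of `.str[2:]` it removes the prefix with `.str.replace()` and an anchored regex, which only strips the `", "` where it is actually present.

```python
import pandas as pd

# Toy data; the names are made up for illustration.
toy = pd.DataFrame({
    "name": ["Apple iPad keyboard case", "Apple iMac 27", "USB cable"],
})
toy["category"] = ""

# Each rule appends its label, so a row matching two rules keeps both.
toy.loc[toy["name"].str.contains("keyboard", case=False), "category"] += ", keyboard"
toy.loc[toy["name"].str.contains("ipad", case=False), "category"] += ", tablet"
toy.loc[toy["name"].str.contains("imac", case=False), "category"] += ", desktop"

# Remove the leading ", " only where it is present.
toy["category"] = toy["category"].str.replace(r"^, ", "", regex=True)

print(toy["category"].tolist())
```

Because the regex is anchored with `^`, running the tidy-up step a second time leaves the labels unchanged, whereas `.str[2:]` would cut two more characters each time.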

We can also define a column to categorize products using their price:

In [39]:
product_category_df.loc[:, "price_category"] = ["low" if price < 100 else "Medium" if price <= 500 else "High" for price in product_category_df["price"]]

In [40]:
product_category_df.sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category,price_category
678,MOS0115,Moshi SenseCover iPhone Case SE / 5s / 5 Black,Case for iPhone touch sensor SE / 5s / 5,45.0,0,11865403,,low
3931,BOS0034,Bose SoundLink AE II Bluetooth Headset with Microphone Black,HD light with microphone for iPhone iPad and iPod wireless headset.,229.0,0,5384,,Medium
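
The same thresholds can also be written without a nested conditional expression. The sketch below uses `numpy.select`, which checks a list of conditions in order; the prices are made up, and the labels keep the spelling used in the cell above.

```python
import numpy as np
import pandas as pd

# Made-up prices that sit on and around the two thresholds.
prices = pd.Series([45.0, 100.0, 500.0, 500.01])

# Conditions are checked in order; the first one that is True wins.
price_category = np.select(
    [prices < 100, prices <= 500],
    ["low", "Medium"],
    default="High",
)
print(list(price_category))
```

A price of exactly 100 or exactly 500 falls into `Medium`, matching the list comprehension above.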


We can also define a mask to look for strings in both ``name`` and ``desc``.

In [41]:
hard_disk_mask = (
    (product_category_df["name"].str.contains("hard drive|Hard disk", case=False)) 
    | 
    (product_category_df["desc"].str.contains("hard drive|Hard disk", case=False)) 
    & 
    ~(product_category_df["name"].str.contains("adapter", case=False)) 
    | 
    ~(product_category_df["desc"].str.contains("adapter", case=False))
    )

product_category_df.loc[hard_disk_mask, "category"] += ", hard disk drive"

In [42]:
product_category_df["category"].value_counts()

category
, hard disk drive                     8150
desktop, hard disk drive               904
                                       317
tablet, hard disk drive                284
smartphone, hard disk drive            189
keyboard, hard disk drive               86
ipod, hard disk drive                   40
desktop                                 19
keyboard, desktop, hard disk drive       2
keyboard, tablet, hard disk drive        1
Name: count, dtype: int64
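
Note that almost every product received the `hard disk drive` label. In Python, `&` binds more tightly than `|`, so a mask written as `a | b & ~c | ~d` is evaluated as `a | (b & ~c) | (~d)`. The sketch below compares that reading with an explicitly grouped mask on made-up rows, assuming the intent was "a drive term in either column and `adapter` in neither"; the helper `has` is hypothetical.

```python
import pandas as pd

# Toy data; the rows are made up for illustration.
toy = pd.DataFrame({
    "name": ["WD hard drive 2TB", "Hard drive adapter", "Apple iPhone 8"],
    "desc": ["External hard disk", "Adapter for a hard drive", "Smartphone"],
})

def has(col, pattern):
    """Case-insensitive regex search in one column of the toy DataFrame."""
    return toy[col].str.contains(pattern, case=False)

drive = has("name", "hard drive|hard disk") | has("desc", "hard drive|hard disk")
adapter = has("name", "adapter") | has("desc", "adapter")

# Without parentheses, Python reads a | b & ~c | ~d as a | (b & ~c) | (~d),
# so any row whose desc lacks "adapter" passes.
ungrouped = (
    has("name", "hard drive|hard disk")
    | has("desc", "hard drive|hard disk")
    & ~has("name", "adapter")
    | ~has("desc", "adapter")
)

# Explicit grouping: a drive term in either column, "adapter" in neither.
grouped = drive & ~adapter

print(ungrouped.tolist())
print(grouped.tolist())
```

The ungrouped mask selects all three rows, including the iPhone, while the grouped mask selects only the hard drive.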

Let's create multiple categories for the most frequently repeated expressions in the ``name`` and ``desc`` columns:

In [43]:
hard_drive_mask = (product_category_df["name"].str.contains("hard drive", case=False)) | (product_category_df["desc"].str.contains("hard drive", case=False))
hard_disk_mask = (product_category_df["name"].str.contains("Hard disk", case=False)) | (product_category_df["desc"].str.contains("Hard disk", case=False))
product_category_df.loc[hard_drive_mask | hard_disk_mask, "category"] += ", hard disk drive"
product_category_df.loc[(product_category_df["name"].str.contains("keyboard", case=False)) | (product_category_df["desc"].str.contains("keyboard", case=False)), "category"] += ", keyboard"
product_category_df.loc[(product_category_df["name"].str.contains("^.{0,7}apple iphone", case=False)) | (product_category_df["desc"].str.contains("apple iphone", case=False)), "category"] += ", smartphone"
product_category_df.loc[(product_category_df["name"].str.contains("apple ipod", case=False)) | (product_category_df["desc"].str.contains("^.{0,7}apple ipod", case=False)), "category"] += ", ipod"
product_category_df.loc[(product_category_df["name"].str.contains("^.{0,7}apple ipad|tablet|ipad", case=False)) | (product_category_df["desc"].str.contains("^.{0,7}apple ipad|tablet|ipad", case=False)), "category"] += ", tablet"
product_category_df.loc[(product_category_df["name"].str.contains("imac|mac mini|mac pro", case=False)) | (product_category_df["desc"].str.contains("imac|mac mini|mac pro", case=False)), "category"] += ", desktop"
product_category_df.loc[(product_category_df["name"].str.contains("software", case=False)) | (product_category_df["desc"].str.contains("software", case=False)), "category"] += ", software"
product_category_df.loc[(product_category_df["name"].str.contains("Battery", case=False)) | (product_category_df["desc"].str.contains("Battery", case=False)), "category"] += ", battery"
product_category_df.loc[(product_category_df["name"].str.contains("Dell", case=False)) | (product_category_df["desc"].str.contains("Dell", case=False)), "category"] += ", Dell"
product_category_df.loc[(product_category_df["name"].str.contains("Display", case=False)) | (product_category_df["desc"].str.contains("Display", case=False)), "category"] += ", display"
product_category_df.loc[(product_category_df["name"].str.contains("Monitor", case=False)) | (product_category_df["desc"].str.contains("Monitor", case=False)), "category"] += ", monitor"
product_category_df.loc[(product_category_df["name"].str.contains("bulb", case=False)) | (product_category_df["desc"].str.contains("bulb", case=False)), "category"] += ", Light bulb"
product_category_df.loc[(product_category_df["name"].str.contains("Case|cover", case=False)) | (product_category_df["desc"].str.contains("Case|cover", case=False)), "category"] += ", Case"
product_category_df.loc[(product_category_df["name"].str.contains("Bose", case=False)) | (product_category_df["desc"].str.contains("Bose", case=False)), "category"] += ", Bose"
product_category_df.loc[(product_category_df["name"].str.contains("Headphones", case=False)) | (product_category_df["desc"].str.contains("Headphones", case=False)) | (product_category_df["name"].str.contains("headset", case=False)) | (product_category_df["desc"].str.contains("headset", case=False)), "category"] += ", headphone"
product_category_df.loc[(product_category_df["name"].str.contains("RAM | Memory Card", case=False)) | (product_category_df["desc"].str.contains("RAM | Memory Card", case=False)), "category"] += ", RAM"
product_category_df.loc[(product_category_df["name"].str.contains("strip", case=False)) | (product_category_df["desc"].str.contains("strip", case=False)), "category"] += ", plug strip"
product_category_df.loc[(product_category_df["name"].str.contains("Seagate", case=False)) | (product_category_df["desc"].str.contains("Seagate", case=False)), "category"] += ", Seagate hard disk drive"
product_category_df.loc[(product_category_df["name"].str.contains("USB", case=False)) | (product_category_df["desc"].str.contains("USB", case=False)), "category"] += ", USB"
product_category_df.loc[(product_category_df["name"].str.contains("adapter", case=False)) | (product_category_df["desc"].str.contains("adapter", case=False)), "category"] += ", adapter"
product_category_df.loc[(product_category_df["name"].str.contains("Nas", case=False)) | (product_category_df["desc"].str.contains("Nas", case=False)), "category"] += ", Nas (network-attached storage)"
product_category_df.loc[(product_category_df["name"].str.contains("Replacement|piece|pieces", case=False)) | (product_category_df["desc"].str.contains("Replacement|piece|pieces", case=False)), "category"] += ", pieces"
product_category_df.loc[(product_category_df["name"].str.contains("Glass|screen protector", case=False)) | (product_category_df["desc"].str.contains("Glass|screen protector", case=False)), "category"] += ", glass or screen protector"
product_category_df.loc[(product_category_df["name"].str.contains("Backpack", case=False)) | (product_category_df["desc"].str.contains("Backpack", case=False)), "category"] += ", Backpack"
product_category_df.loc[(product_category_df["name"].str.contains("Cable", case=False)) | (product_category_df["desc"].str.contains("Cable", case=False)), "category"] += ", Cable"
product_category_df.loc[(product_category_df["name"].str.contains("player", case=False)) | (product_category_df["desc"].str.contains("player", case=False)), "category"] += ", music player"
product_category_df.loc[(product_category_df["name"].str.contains("Repair", case=False)) | (product_category_df["desc"].str.contains("Repair", case=False)), "category"] += ", repair"
product_category_df.loc[(product_category_df["name"].str.contains("gloves", case=False)) | (product_category_df["desc"].str.contains("gloves", case=False)), "category"] += ", gloves"
product_category_df.loc[(product_category_df["name"].str.contains("expansion", case=False)) | (product_category_df["desc"].str.contains("expansion", case=False)), "category"] += ", expansion"
product_category_df.loc[(product_category_df["name"].str.contains("holder", case=False)) | (product_category_df["desc"].str.contains("holder", case=False)), "category"] += ", holder"
product_category_df.loc[(product_category_df["name"].str.contains("Synology", case=False)) | (product_category_df["desc"].str.contains("Synology", case=False)), "category"] += ", cloud storage"
product_category_df.loc[(product_category_df["name"].str.contains("Beats", case=False)) | (product_category_df["desc"].str.contains("Beats", case=False)), "category"] += ", Beats"
product_category_df.loc[(product_category_df["name"].str.contains("support|Bracelet|cuff", case=False)) | (product_category_df["desc"].str.contains("support|Bracelet|cuff", case=False)), "category"] += ", accessories"
product_category_df.loc[(product_category_df["name"].str.contains("Apple Watch", case=False)) | (product_category_df["desc"].str.contains("Apple Watch", case=False)), "category"] += ", Apple Watch"
product_category_df.loc[(product_category_df["name"].str.contains("charger", case=False)) | (product_category_df["desc"].str.contains("charger", case=False)), "category"] += ", charger"
product_category_df.loc[(product_category_df["name"].str.contains("storage|SSD", case=False)) | (product_category_df["desc"].str.contains("storage|SSD", case=False)), "category"] += ", storage"
product_category_df.loc[(product_category_df["name"].str.contains("Refurbished|Open|Like new", case=False)) | (product_category_df["desc"].str.contains("Refurbished|Open|Like new", case=False)), "category"] += ", second-hand/refurbished"
product_category_df.loc[(product_category_df["name"].str.contains("Moleskine", case=False)) | (product_category_df["desc"].str.contains("Moleskine", case=False)), "category"] += ", Moleskine produces"
product_category_df.loc[(product_category_df["name"].str.contains("Router", case=False)) | (product_category_df["desc"].str.contains("Router", case=False)), "category"] += ", router"
product_category_df.loc[(product_category_df["name"].str.contains("handsfree", case=False)) | (product_category_df["desc"].str.contains("handsfree", case=False)), "category"] += ", handsfree"
product_category_df.loc[(product_category_df["name"].str.contains("sensor", case=False)) | (product_category_df["desc"].str.contains("sensor", case=False)), "category"] += ", sensor"
product_category_df.loc[(product_category_df["name"].str.contains("Sleeve", case=False)) | (product_category_df["desc"].str.contains("Sleeve", case=False)), "category"] += ", Laptop Cases"
product_category_df.loc[(product_category_df["name"].str.contains("locator", case=False)) | (product_category_df["desc"].str.contains("locator", case=False)), "category"] += ", Bluetooth locator"
product_category_df.loc[(product_category_df["name"].str.contains("Trackpad", case=False)) | (product_category_df["desc"].str.contains("Trackpad", case=False)), "category"] += ", Trackpad"
product_category_df.loc[(product_category_df["name"].str.contains("Spray", case=False)) | (product_category_df["desc"].str.contains("Spray", case=False)), "category"] += ", cleaners"
product_category_df.loc[(product_category_df["name"].str.contains("Smartwatch", case=False)) | (product_category_df["desc"].str.contains("Smartwatch", case=False)), "category"] += ", Smartwatch"
product_category_df.loc[(product_category_df["name"].str.contains("Fitbit", case=False)) | (product_category_df["desc"].str.contains("Fitbit", case=False)), "category"] += ", Fitbit"
product_category_df.loc[(product_category_df["name"].str.contains("hardware|Mouse", case=False)) | (product_category_df["desc"].str.contains("hardware|Mouse", case=False)), "category"] += ", hardware"
product_category_df.loc[(product_category_df["name"].str.contains("Care|warranty", case=False)) | (product_category_df["desc"].str.contains("Care|warranty", case=False)), "category"] += ", Care|warranty"
product_category_df.loc[(product_category_df["name"].str.contains("service", case=False)) | (product_category_df["desc"].str.contains("service", case=False)), "category"] += ", service"
product_category_df.loc[(product_category_df["name"].str.contains("tools|Screwdriver", case=False)) | (product_category_df["desc"].str.contains("tools|Screwdriver", case=False)), "category"] += ", tools"
product_category_df.loc[(product_category_df["name"].str.contains("Camera", case=False)) | (product_category_df["desc"].str.contains("Camera", case=False)), "category"] += ", Camera"
product_category_df.loc[(product_category_df["name"].str.contains("games|game", case=False)) | (product_category_df["desc"].str.contains("games|game", case=False)), "category"] += ", games"
product_category_df.loc[(product_category_df["name"].str.contains("robot|Robotic", case=False)) | (product_category_df["desc"].str.contains("robot|Robotic", case=False)), "category"] += ", robot"
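
Since every rule above has the same shape (a pattern searched in `name` or `desc`, and a label to append), the rules could also be kept in a dictionary and applied in a loop. This is a minimal sketch on made-up rows; the `rules` dictionary is hypothetical and reproduces only four of the patterns above.

```python
import pandas as pd

# Toy data; the rows are made up for illustration.
toy = pd.DataFrame({
    "name": ["Logitech keyboard", "Bose SoundLink headset", "USB cable"],
    "desc": ["USB keyboard for Mac", "Wireless headphones", "1m cable"],
})
toy["category"] = ""

# pattern -> label; only a few of the rules above are reproduced here.
rules = {
    "keyboard": "keyboard",
    "Headphones|headset": "headphone",
    "USB": "USB",
    "Cable": "Cable",
}

for pattern, label in rules.items():
    in_name = toy["name"].str.contains(pattern, case=False)
    in_desc = toy["desc"].str.contains(pattern, case=False)
    toy.loc[in_name | in_desc, "category"] += ", " + label

print(toy["category"].tolist())
```

Keeping the rules in one place makes it easier to review patterns and labels side by side, and adding a rule becomes a one-line change.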


In [48]:
product_category_df["category"].value_counts()

category
, hard disk drive, Case                                                                                                 893
, hard disk drive                                                                                                       311
desktop, hard disk drive, desktop, RAM                                                                                  290
, hard disk drive, tablet                                                                                               265
, hard disk drive, RAM, storage                                                                                         236
, hard disk drive, USB                                                                                                  203
desktop, hard disk drive, desktop, RAM, storage                                                                         202
, hard disk drive, hard disk drive, USB                                                                                 187

In [49]:
product_category_df["category"].value_counts().count()

np.int64(776)

In [45]:
product_category_df.sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category,price_category
6455,SAN0133-A,"Open - SanDisk SSD 480GB Plus 25 ""SATA 6Gb / s",Hard SSD 480GB 25 inches,143.99,0,1298,", hard disk drive, storage, second-hand/refurbished",Medium
653,PAR0014,Parrot Flower Power Plants Wireless Sensor Blue,Plants Wireless Sensor Monitor for iPhone.,59.95,0,11905404,", hard disk drive, monitor, sensor",low


Here, we can see that **776 distinct category strings** have been created. Each string is a combination of labels, and some labels even repeat within one string (for example `desktop, hard disk drive, desktop, RAM`). This shows that the multiple-categories approach is not effective here: the number of combinations is excessively large, and products do not fall into a single overarching category. It is therefore better to explore an alternative method.
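
To separate the number of label combinations from the number of individual labels, the category strings can be split and counted label by label. The sketch below uses three strings copied from the `value_counts()` output above rather than the full column.

```python
import pandas as pd

# Three strings copied from the value_counts() output above.
categories = pd.Series([
    ", hard disk drive, Case",
    ", hard disk drive",
    "desktop, hard disk drive, desktop, RAM",
])

# One row per label instead of one row per product.
labels = categories.str.split(", ").explode()
labels = labels[labels != ""]  # drop the empty piece left by a leading ", "

n_combinations = categories.nunique()
label_counts = labels.value_counts()
print(n_combinations, label_counts.to_dict())
```

On these three strings there are three combinations but four distinct labels, and `desktop` is counted twice because it appears twice in one string.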

## 4. Using `type` to create categories
There is another way to create categories. We have the mysterious column `type` in the `products` table. This could potentially be ready-made categories labelled with numbers instead of words. Let's investigate.

In [50]:
category_type_df = products_cl.copy()

First, let's see how many `type`s account for most of our products.

In [65]:
n = 30
top_n_products = category_type_df.groupby("type").count().nlargest(n, "sku")["sku"].sum()
coverage = (top_n_products / category_type_df.shape[0] * 100).round(2)
print(f"With the {n} largest types, we account for {coverage}% of all products.")

With the 30 largest types, we account for 78.4% of all products.
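
A shorter way to compute a similar figure is `value_counts(normalize=True)`, which returns each type's share sorted from largest to smallest. This is a sketch on made-up type codes; note one assumption: `value_counts` divides by the number of non-missing values, while the cell above divides by the total number of rows, so the two differ if `type` has missing values.

```python
import pandas as pd

# Made-up type codes, stored as strings like the `type` column.
types = pd.Series(["1298"] * 5 + ["1282"] * 3 + ["1364"] * 2)

n = 2
# value_counts(normalize=True) returns shares sorted from largest to smallest.
coverage = types.value_counts(normalize=True).head(n).sum() * 100
print(f"With the {n} largest types, we account for {round(coverage, 2)}% of all products.")
```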


> Looks like we can investigate just these 30 types and name a category for each; the remaining 21.6% of products can then be given the category **`other`**.

Here are the `type`s that have the most products.

In [51]:
category_type_df.groupby("type").count().nlargest(30, "sku")

type,sku,name,desc,price,in_stock
11865403,1057,1057,1057,1057,1057
12175397,939,939,939,939,939
1298,783,783,783,783,783
11935397,562,562,562,562,562
11905404,454,454,454,454,454
1282,373,373,373,373,373
12635403,362,362,362,362,362
13835403,269,269,269,269,269
"5,74E+15",247,247,247,247,247
1364,216,216,216,216,216


Let's have a look at the first `type` to see if we can make categories from this column.

In [57]:
#1. 11865403
category_type_df.loc[category_type_df["type"] == "11865403"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type
1003,LIF0042,LifeProof Fre iPhone 6 Black Waterproof Case,waterproof and extreme conditions for iPhone 6 case.,79.99,0,11865403
4524,OTT0132,OtterBox Symmetry Alpha Glass Case + Screen Protector + Program 1 M ± or guarantee for iPhone 6 ...,Pack OtterBox Symmetry Case + Screen Protector + 1 M ± or warranty Black iPhone 6S,49.99,0,11865403


In [58]:
category_type_df.loc[category_type_df["type"] == "11865403"]['desc'].nunique()

551

Let's look at the 551 unique descriptions that share this type:

In [60]:
#category_type_df.loc[category_type_df["type"] == "11865403"]["desc"].unique()

Looks like this type contains phone cases, so we assign these rows to a phone case category:

In [61]:
#1. 11865403
category_type_df.loc[category_type_df["type"] == "11865403","category"] = ", iphone case"

Let's have a look at the 2nd largest type to see if that's also a clear category.

In [63]:
#2. 12175397
category_type_df.loc[category_type_df["type"] == "12175397"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
9352,PAC2481,Synology DS218 NAS Server | 6TB (2x3TB) Seagate Iron Wolf,2-bay NAS server can accommodate 4K Ultra HD files,553.44,0,12175397,
5919,PAC1750,QNAP TS-131p | 8TB (1x8TB) WD Red,NAS 8TB capacity WD Red Hard Drive for Mac and PC,525.99,0,12175397,


Looks like this type is full of NAS servers.

In [64]:
#2. 12175397
category_type_df.loc[category_type_df["type"] == "12175397",'category'] = ", NAS Server / Network"
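
Because each step assigns one label to one `type` code, the assignments can also be collected in a dictionary and applied with `Series.map()`. This is a sketch under assumptions: the `type_to_category` name is hypothetical, the labels are written without the leading `", "`, the third type code is made up, and unmapped types fall back to `other` as suggested earlier.

```python
import pandas as pd

# Toy data; the last type code is made up and has no label.
toy = pd.DataFrame({"type": ["11865403", "12175397", "99999999"]})

# Labels taken from the two assignments above, without the leading ", ".
type_to_category = {
    "11865403": "iphone case",
    "12175397": "NAS Server / Network",
}

# Types missing from the dictionary become NaN, then "other".
toy["category"] = toy["type"].map(type_to_category).fillna("other")
print(toy["category"].tolist())
```

The keys are strings because the `type` column is compared against strings throughout this section.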

In [66]:
#3. 1298
category_type_df.loc[category_type_df["type"] == "1298"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
6199,SAT0018-A,Open - Satechi Aluminum Rose Gold iPhone Support,Stand with aluminum finish Lightning access to cable and non-slip surface for iPhone and iPod,34.99,0,1298,
3300,IFX0013-A,(Open) iFixit repair kit complete repair iPhone,Repair Kit tools for iPhone.,19.99,0,1298,


In [67]:
#3. 1298
category_type_df.loc[category_type_df["type"] == "1298" , 'category'] = ", open"

In [68]:
#4. 11935397
category_type_df.loc[category_type_df["type"] == "11935397"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
9658,GTE0131,G-Technology G-SPEED XL 96TB RAID Thunderbolt Shuttle 3,External Storage 96TB and Thunderbolt 3 Connection for Mac and PC,10212.99,0,11935397,
5900,WDT0377,WD My Passport 2TB 25 PRO Wireless Wifi USB 3.0 Hard Drive,2TB external hard drive with built-in WiFi SD slot and USB 3.0 for Mac and PC,229.99,1,11935397,


In [69]:
#4. 11935397
category_type_df.loc[category_type_df["type"] == "11935397",'category'] = ', Hard drive'

In [70]:
#5. 11905404
category_type_df.loc[category_type_df["type"] == "11905404"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
3436,IKM0049,IK Multimedia iRig Keys 37 PRO USB MIDI controller for Mac iPhone iPad iPod,USB MIDI keyboard 37 keys for Mac iPhone iPad iPod.,121.99,1,11905404,
5534,PAR0068,Mando Parrot FLYPAD,Mando compatible with minidrones includes support for iPhone,39.0,0,11905404,


In [71]:
#5. 11905404
category_type_df.loc[category_type_df["type"] == "11905404",'category'] = ', bluetooth device'

In [72]:
#6. 1282
category_type_df.loc[category_type_df["type"] == "1282"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
2737,APP1198,"Apple iMac 27 ""Core i5 3.2GHz Retina 5K | 8GB | 1TB HDD",IMac desktop computer 27 inch 5K Retina 8GB RAM 1TB HDD (MK462Y / A).,2129.0,0,1282,
1686,APP0958,"Apple MacBook Pro Retina 13 ""i5 27 Ghz | 8GB RAM | 128GB Flash",New MacBook Pro 13-inch Retina screen i5 128GB RAM 8GB Flash 27GHz (MF839Y / A).,1449.0,0,1282,


In [73]:
##6. 1282
category_type_df.loc[category_type_df["type"] == "1282",'category'] = ', Apple imac'

In [74]:
#7. 12635403
category_type_df.loc[category_type_df["type"] == "12635403"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
4066,APP1537,"Apple iPad Silicone Case Pro 97 ""Menta",Silicone light cover and soft touch for iPad Pro 97-inch.,79.0,0,12635403,
6764,APP2015,Apple iPad Smart Cover Case Blue Night,smart cover with different positions for iPad (2017 Edition),45.0,0,12635403,


In [75]:
#7. 12635403
category_type_df.loc[category_type_df["type"] == "12635403",'category'] = ', iPad case'

In [76]:
#8. 13835403
category_type_df.loc[category_type_df["type"] == "13835403"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
4039,MOS0185,ClearGuard Moshi Magic Keyboard Keyboard Protector Transparent,Transparent keyboard protector Keyboard Magic.,29.99,0,13835403,
4562,SPE0175,"Speck SeeThru Case Macbook Pro 13 ""Blue Calypso",Protective polycarbonate shell for MacBook Pro 13-inch,49.9,0,13835403,


In [77]:
#8. 13835403
category_type_df.loc[category_type_df["type"] == "13835403",'category'] = ', macbook case'

In [82]:
#9. 5,74E+15
category_type_df.loc[category_type_df["type"] == "5,74E+15"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
2840,PAC0978,"Apple iMac 27 ""Core i5 3.2GHz Retina 5K | 32GB | 512GB Flash | AMD Radeon R9 M390",IMac desktop computer 27 inch 5K Retina i5 3.2GHz 512GB Flash RAM 32GB and AMD Radeon R9 M390 (M...,3409.0,0,"5,74E+15",
7319,PAC2126,"Apple iMac 27 ""Core i5 3.8GHz Retina 5K | 32GB | 1TB SSD",IMac desktop computer 27 inch Retina 5K RAM 32GB SSD 1TB PCle,4039.0,0,"5,74E+15",


In [83]:
#9. 5,74E+15
category_type_df.loc[category_type_df["type"] == "5,74E+15",'category'] = ', Apple iMac 27'

In [85]:
#10. 1364
category_type_df.loc[category_type_df["type"] == "1364"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
1893,FCM0007-4,Mac memory FCM 32GB (4x8GB) SO-DIMM DDR3 1333MHz,RAM 32GB (4x8GB) iMac (2010/11).,275.96,0,1364,
7936,OWC0191-4,Mac OWC 128GB memory (4x32GB) 1333MHz DIMM,128GB RAM (4x32GB) Mac Pro 2010-2012.,1383.96,0,1364,


In [86]:
#10. 1364
category_type_df.loc[category_type_df["type"] == "1364",'category'] = ', mac RAM'

In [87]:
#11. 12585395
category_type_df.loc[category_type_df["type"] == "12585395"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
9109,SAT0070,Satechi USB adapter to HDMI 4K-C 60Hz Plata,USB-C adapter with HDMI connection for Mac and PC,34.99,1,12585395,
2123,SNN0048,Sonnet Thunderbolt Adapter eSATA + USB 3.0,Thunderbolt adapter with USB 3.0 and eSATA for Mac and PC port.,120.0,0,12585395,


In [88]:
#11. 12585395
category_type_df.loc[category_type_df["type"] == "12585395",'category'] = ', adapter'

In [91]:
#12. 1296
category_type_df.loc[category_type_df["type"] == "1296"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
5911,LGE0053,"34UC79G-B LG Monitor 34 ""UHD USB 3.0 HDMI 1ms 144Hz",Monitor 34 for the game 1ms response time 144Hz frequency for Mac and PC,599.0,0,1296,
3837,LGE0037,"24MP58VQ-W Monitor LG 24 ""5ms HDMI White",24-inch FHD IPS Monitor HDMI connection 5ms response time thin frame.,159.0,0,1296,


In [92]:
#12. 1296
category_type_df.loc[category_type_df["type"] == "1296",'category'] = ', monitor'

In [93]:
#13 1325
category_type_df.loc[category_type_df["type"] == "1325"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
124,KAN0009,Kanex ATV Pro HDMI to VGA adapter,HDMI to VGA adapter that supports audio.,59.95,0,1325,
2110,OWC0150,OWC Thunderbolt Cable 2 10m Black,Thunderbolt Cable 2 10m,294.99,0,1325,


In [94]:
#13 1325
category_type_df.loc[category_type_df["type"] == "1325",'category'] = ', Cable'

In [95]:
#14 5384
category_type_df.loc[category_type_df["type"] == "5384"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
855,JAB0037,Classic Jabra Bluetooth Headset White,Bluetooth hands-free headset for iPhone.,29.95,0,5384,
2363,JAB0044,Jabra Sport Coach Wireless Headset for iPhone and iPod Amarillo,Headphones and Dolby intelligent voice system for iPhone and iPod.,119.95,0,5384,


In [96]:
#14 5384
category_type_df.loc[category_type_df["type"] == "5384",'category'] = ', Headphone'

In [98]:
#15 1433
category_type_df.loc[category_type_df["type"] == "1433"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
4901,PAC1503,"Crucial MX300 Pack + 1050GB Installation Kit iMac 27 ""2012-2015",1050GB SSD MX300 + toolkit for 27-inch iMac 2012-2015,373.32,1,1433,
4897,PAC1499,"Samsung SSD 850 expansion kit EVO 1TB iMac 27 ""2012-2015",SSD upgrade kit 1TB iMac 27-inch Late 2012 Late 2015 tools,469.97,1,1433,


In [99]:
#15 1433
category_type_df.loc[category_type_df["type"] == "1433",'category'] = ', SSD expansion kit'

In [101]:
#16 12215397
category_type_df.loc[category_type_df["type"] == "12215397"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
2186,OWC0156,Aura OWC 960GB 6G SSD iMac 2012,960GB SSD hard drive for iMac131 and iMac132.,469.99,0,12215397,
1482,OWC0142,OWC Aura Pro Express 6G - 240GB SSD MacBook Air 2012,240GB SSD hard drive for MacBook Air 2012.,205.99,0,12215397,


In [102]:
#16 12215397
category_type_df.loc[category_type_df["type"] == "12215397",'category'] = ', SSD'

In [104]:
#17 5398
category_type_df.loc[category_type_df["type"] == "5398"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
2079,BOS0021,Bose SoundLink Color Speaker Mint,Bluetooth wireless speaker for iPhone iPad and iPod.,139.0,0,5398,
4747,JBL0112,Charge 3 JBL Bluetooth Portable Speaker Black,Bluetooth portable speaker waterproof for iPhone iPad and iPod.,169.99,1,5398,


In [105]:
#17 5398
category_type_df.loc[category_type_df["type"] == "5398",'category'] = ', Speaker'

In [122]:
#18 1,02E+12
category_type_df.loc[category_type_df["type"] == "1,02E+12"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
7372,APP2292,"Apple MacBook Pro 13 ""Core i5 Touch Bar 33GHz | 8GB | 256GB SSD Silver",New MacBook Pro 13 inch Touch Bar 33 GHz Core i5 with 8GB of RAM and 256GB PCIe SSD,2119.0,0,"1,02E+12",
7367,APP2288,"Apple MacBook Pro 13 ""Core i7 Touch Bar 35GHz | 8GB | 1TB SSD Silver",New MacBook Pro 13 inch Touch Bar 35 GHz Core i7 with 8GB of RAM and 1TB PCIe SSD,3109.0,0,"1,02E+12",


In [123]:
#18 1,02E+12
category_type_df.loc[category_type_df["type"] == "1,02E+12",'category'] = ', Apple MacBook Pro 13/15'

In [124]:
#19 1,44E+11
category_type_df.loc[category_type_df["type"] == "1,44E+11"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
6561,REP0360,LCD screen repair iPad Air 2,Repair service including parts and labor for iPad Air 2,299.99,0,"1,44E+11",
240,REP0099,iPad 3 front camera repair,Repair service including parts and labor for iPad 3,69.99,0,"1,44E+11",


In [125]:
#19 1,44E+11
category_type_df.loc[category_type_df["type"] == "1,44E+11",'category'] = ', service'

In [129]:
#20 57445397
category_type_df.loc[category_type_df["type"] == "57445397"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
3733,LEX0029,Lexar jumpdrive M20C PenDrive USB-C / USB 3.0 16GB,USB flash drive-C reversible USB 3.0 16GB for Mac and PC.,20.99,0,57445397,
3431,KIN0148,Kingston SDXC Memory Card UHS Class 3 | 128 GB,SDXC Memory Card UHS Class 3 128GB with speeds of 90MB / 80MB,59.99,0,57445397,


In [130]:
#20 57445397
category_type_df.loc[category_type_df["type"] == "57445397",'category'] = ', Memory Card'

In [134]:
#21 1334
category_type_df.loc[category_type_df["type"] == "1334"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
478,DLK0044,D-Link DIR-860L Wireless AC1200 Dual-Band Cloud Router,Wireless Router DLink Cloud with AC SmartBeam and 867 Mbps transfer rate.,129.99,0,1334,
3522,DLK0118,D-Link DWA-182 USB Adapter Wi-Fi AC1200 Dual-Band,Wi-Fi USB adapter AC 5G dual-band for Mac and PC.,59.99,0,1334,


In [135]:
#21 1334
category_type_df.loc[category_type_df["type"] == "1334",'category'] = ', Router'

In [142]:
#22 2158
category_type_df.loc[category_type_df["type"] == "2158"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
5997,APP1958,"Apple MacBook Pro 13 ""Core i5 2GHz | RAM 16GB | 512GB PCIe SSD Gray Space",MacBook Pro 13 inch i5 2GHz RAM 16GB SSD and 512GB PCIe (MLL42Y / A),2179.0,0,2158,
5645,APP1881,"Apple MacBook Pro 15 ""Core i7 Touch Bar 26GHz | 16GB RAM | 1TB PCIe SSD | 460 4GB Radeon Pro Spa...",New MacBook Pro 15-inch Touch Bar to 26GHz Core i7 with 16GB of RAM and 1TB PCIe SSD (MLH32Y / A),3659.0,0,2158,


In [143]:
#22 2158
category_type_df.loc[category_type_df["type"] == "2158",'category'] = ', Apple MacBook Pro'

In [144]:
#23. 2449
category_type_df.loc[category_type_df["type"] == "2449"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
3323,CYG0073,Cygnett Luxband Apple Watch Strap Black 42mm,Leather strap for easy installation Apple Watch 42mm.,49.99,0,2449,
3538,BAN0008,Panama Band & Strap Watch Strap 42mm Blue Apple,Leather strap for easy installation Apple Watch 42mm.,59.0,0,2449,


In [145]:
#23. 2449
category_type_df.loc[category_type_df["type"] == "2449",'category'] = ', Apple Watch Strap'

In [148]:
#24 12655397
category_type_df.loc[category_type_df["type"] == "12655397"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
3944,WDT0316,"Blue WD Hard Drive 1TB 35 ""Mac and PC",Western Digital Internal Hard Drive 1TB Sata 6GBs 35 inches for Mac and PC.,65.0,1,12655397,
9951,WDT0417,"WD Hard Drive 6TB Gold 35 ""Servers",Hard Western Digital 6TB 35 inches SATA 6 Gb / s for servers and enterprise storage systems,329.0,0,12655397,


In [149]:
#24 12655397
category_type_df.loc[category_type_df["type"] == "12655397",'category'] = ', Hard Drive'

In [150]:
#25 1229
category_type_df.loc[category_type_df["type"] == "1229"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
9895,ADN0060,Adonith Stylus Pro Pointer Gray Pixel Space,Bluetooth digital pen tip 19mm Pro for iPad,84.99,1,1229,
76,WAC0046,Intuos Wacom stylus ArtPen 5/4,special stylus Mac and PC for Intuos4 graphics tablet / 5.,109.9,0,1229,


In [151]:
#25 1229
category_type_df.loc[category_type_df["type"] == "1229",'category'] = ', pointer'

In [152]:
#26 12995397
category_type_df.loc[category_type_df["type"] == "12995397"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
8322,OWC0234,OWC USB Travel Dock-C / USB 3.1 / HDMI / SD Card Gold,Dock USB-C multiple connections and SD reader Macbook.,66.99,0,12995397,
9203,SNN0069,Sonnet eGFX Breakaway Puck Radeon RX560 Thunderbolt 3,Box Portable expansion graphics cards with Radeon RX560 included,699.99,0,12995397,


In [153]:
#26 12995397
category_type_df.loc[category_type_df["type"] == "12995397",'category'] = ', Thunderbolt/ports'

In [154]:
#27 1515
category_type_df.loc[category_type_df["type"] == "1515"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
6015,XOO0011,Xoopar SQUID MAX External Battery 2500mAh Gray,Lightning external battery cable and grip suction cups for iPhone,24.99,0,1515,
9207,HTE0021,Hyper Pearl 1600mAh battery Mini USB Mirror and Ice Cream,Mini Portable USB Mirror + 1600mAh (2.1A) battery cable for iPhone,24.99,1,1515,


In [155]:
#27 1515
category_type_df.loc[category_type_df["type"] == "1515",'category'] = ', battery'

In [156]:
#28 13615399
category_type_df.loc[category_type_df["type"] == "13615399"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
4119,PAC1395,MIXIT Pack Belkin Car Charger and house 24A Gray,Metallic car charger + wall charger 24A for iPad iPhone and iPad.,34.98,1,13615399,
9301,TWS0124,Twelve South HiRise based load Duet,Stand with Lightning connector and charger for Apple iPhone Watch,129.99,1,13615399,


In [157]:
#28 13615399
category_type_df.loc[category_type_df["type"] == "13615399",'category'] = ', Charger'

In [158]:
#29 13555403
category_type_df.loc[category_type_df["type"] == "13555403"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
300,MOS0086,Moshi iVisor XT protector iPhone / 5s / 5c / 5 black,IPhone Protector SE / 5s / 5 anti glare.,24.95,0,13555403,
5447,ZAG0029,Zagg Invisible Shield Glass Screen Protector iPhone 8/7 Plus,Protector Ultra soft edges beveled tempered glass for iPhone 8 Plus or 7-Plus,19.99,0,13555403,


In [159]:
#29 13555403
category_type_df.loc[category_type_df["type"] == "13555403",'category'] = ', Screen protector'

In [160]:
#30 1405
category_type_df.loc[category_type_df["type"] == "1405"].sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
6260,WAC0225,Education - Wacom Intuos Pro L South,Large graphics tablet includes Bluetooth Intuos Pro Pen Pointer 2 (8192 levels of pressure) for ...,529.9,0,1405,
471,WAC0133,Education - Wacom Intuos Pro M Graphics Tablet,Exclusive discount for students and teachers.,349.99,0,1405,


In [161]:
#30 1405
category_type_df.loc[category_type_df["type"] == "1405",'category'] = ', Graphic tablet/pen'

In [162]:
category_type_df.loc[category_type_df["category"] == "", 'category'] = 'other'
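The inspection cells above all follow the same pattern: filter on one `type` code, then call `.sample(2)`. That call raises a `ValueError` when a type has fewer than two products. The helper below is a minimal sketch of a safer version. The name `peek_type` and the demo DataFrame are hypothetical and are not part of the original notebook.

```python
import pandas as pd

def peek_type(df, type_code, n=2, seed=0):
    """Return up to n random rows whose 'type' equals type_code (hypothetical helper)."""
    subset = df.loc[df["type"] == type_code]
    if subset.empty:
        # Nothing to sample: return the empty selection unchanged.
        return subset
    # Never ask for more rows than exist, so sample() cannot raise.
    return subset.sample(min(n, len(subset)), random_state=seed)

# Made-up rows, only to illustrate the behaviour.
demo_products = pd.DataFrame({
    "sku": ["AAA0001", "BBB0002", "CCC0003"],
    "type": ["1364", "2449", "2449"],
})

one_row = peek_type(demo_products, "1364")      # only one product of this type
two_rows = peek_type(demo_products, "2449")     # two products available
no_rows = peek_type(demo_products, "no-such-type")
print(len(one_row), len(two_rows), len(no_rows))
```

Fixing `random_state` makes the preview reproducible between runs. Pass a different `seed` to see other rows.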

In [163]:
category_type_df.groupby("type").count().nlargest(30, "sku")

Unnamed: 0_level_0,sku,name,desc,price,in_stock,category
type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
11865403,1057,1057,1057,1057,1057,1057
12175397,939,939,939,939,939,939
1298,783,783,783,783,783,783
11935397,562,562,562,562,562,562
11905404,454,454,454,454,454,454
1282,373,373,373,373,373,373
12635403,362,362,362,362,362,362
13835403,269,269,269,269,269,269
"5,74E+15",247,247,247,247,247,247
1364,216,216,216,216,216,216


In [165]:
category_type_df.sample(5)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
2598,NTE0100,NewerTech Guardian Maximus RAID mini FW800 / eSATA / USB,Mini Raid system with FW800 / eSATA / USB connection for Mac and PC.,120.99,0,11935397,", Hard drive"
6812,HOC0008,Nike hoco Series Apple Watch Strap 42mm Black / Gray,Silicone Strap Sports Watch 42mm Apple,45.0,0,2449,", Apple Watch Strap"
7636,PAC1941,Apple Mac Pro 35GHz 6 cores | 32GB RAM | 256GB PCIe SSD,New Mac Pro with 32GB RAM and two 6-core GPU 35GHz AMD FirePro D500 (MD878Y / A),3935.59,0,21632158,
5556,IKM0062,IK Multimedia iRig Guitar Interface 96kHz HD 2,Guitar interface for recording high quality audio 24bit / 96kHz compatible headphone amplifier p...,121.99,1,11905404,", bluetooth device"
4997,BEL0265,InvisiGlass Belkin Screen Protector Apple Watch 42mm,Tempered glass protector with ultrafine thickness 02 mm and FluidFlex technology for Apple Watch,19.99,1,2425,


Now that the categories are defined, we can consolidate all of the category-creation code in one place, built around a single lookup dictionary. This keeps the rules organized and easy to update.

## 5.&nbsp; Writing a function to create categories

In [166]:
product_category_type_df = products_cl.copy()

We use a dictionary to map each type to an already defined category:

In [167]:
type_dict = {
    '5,74E+15':'Apple iMac 27',
    '1,02E+12':'Apple Macbook Pro 13/15',
    '1282':'Apple iMac',
    '11935397':'Hard drive',
    '1296':'Monitor',
    '12175397':'NAS Server / Network',
    '2,17E+11':'(Open) Apple Macbook Pro 13/ 15',
    '12215397':'SSD',
    '1405':'graphic tablet',
    '2,16E+11':'Apple iMac 21.5',
    '1364':'mac RAM',
    '12655397':'Mac PC Hard Drive',
    '2158':'Apple MacBook Pro',
    '51601716':'(Open) Apple iPhone',
    '11905404':'bluetooth device',
    '113281716':'Apple iPhone 8 Plus',
    '5384':'Headphones',
    '113291716':'Apple iPhone 8',
    '85641716':'Apple iPhone 7',
    '85651716':'Apple iPhone 7 Plus',
    '106431714':'Apple iPad 64gb',
    '1433':'SSD',
    '24895185':'Apple Watch',
    '1298':'Open',
    '5398':'speaker',
    '1714':'iPad',
    '113271716':'Apple iPhone X',
    '21561716':'Apple iPhone 6',
    '12585395':'Adapter',
    '11865403':'iPhone case',
    '1231':'Service',
    '21632158':'mac',
    '5,39E+11':'macbook',
    '12995397':'thunderbolt',
    '10142':'battery',
    '13855401':'keyboard',
    '51861714':'iPad pro 256gb',
    '24821716':'iphone 6s',
    '118692158':'imac',
    '13005399':'macbook charger',
    '24885185':'apple watch',
    '1229':'pointer',
    '42945397':'lightning usb',
    '12635403':'iPad case',
    '51871714':'ipad',
    '1334':'router',
    '57445397':'memory card',
    '1325':'cable',
    '1387':'mouse',
    '13835403':'macbook case',
    '2449':'Apple Watch Strap'
}
product_category_type_df['category'] = product_category_type_df['type'].map(type_dict)
product_category_type_df.loc[product_category_type_df['category'].isna(),'category'] = 'other'
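Since this section is about writing a function, the mapping step can be wrapped in one. The sketch below is one possible shape, not the notebook's own code: the function name `add_category`, its parameters, and the demo data are assumptions made for illustration.

```python
import pandas as pd

def add_category(df, mapping, source_col="type", target_col="category", default="other"):
    """Return a copy of df with target_col looked up from mapping (hypothetical helper).

    Rows whose source_col value is not a key of mapping receive the default label.
    """
    out = df.copy()
    out[target_col] = out[source_col].map(mapping).fillna(default)
    return out

# Made-up rows and a shortened mapping, only to illustrate the call.
demo = pd.DataFrame({"sku": ["A1", "B2", "C3"], "type": ["1364", "2449", "9999"]})
demo_mapping = {"1364": "mac RAM", "2449": "Apple Watch Strap"}

demo_categorized = add_category(demo, demo_mapping)
print(demo_categorized)
```

With the real data, the call would presumably be `add_category(products_cl, type_dict)`. Returning a copy leaves the input DataFrame untouched, which mirrors the `.copy()` used throughout this notebook.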

In [168]:
product_category_type_df.sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category
1624,RAI0018,"Rain Design Mbase Support for iMac 27 """,Minimalist support lifting drawer iMac 27,84.99,1,8696,other
6787,WAC0240,Education - Wacom Bamboo A5 Slate Gray,Bloc notes A5 size smart app includes pen button to save your notes,129.99,0,1405,graphic tablet


In [169]:
product_category_type_df['category'].value_counts(dropna=False)

category
other                              1711
iPhone case                        1057
NAS Server / Network                939
Open                                783
Hard drive                          562
bluetooth device                    454
Apple iMac                          373
iPad case                           362
SSD                                 341
macbook case                        269
Apple iMac 27                       247
mac RAM                             216
Adapter                             190
Monitor                             187
cable                               183
Headphones                          178
speaker                             159
Apple Macbook Pro 13/15             130
memory card                         129
router                              115
Apple Watch Strap                   107
Apple MacBook Pro                   107
Mac PC Hard Drive                   105
pointer                             104
thunderbolt                    

We can also go in the other direction and group the product types into broader, higher-level categories:

In [170]:
# Create a dictionary to map product types to a higher-level category
category_mapping = {
    '5,74E+15': 'Apple iMac',
    '1,02E+12': 'Apple Macbook',
    '1282': 'Apple iMac',
    '11935397': 'Memory Storage',
    '1296': 'Monitor',
    '12175397': 'Network',
    '2,17E+11': 'Apple Macbook',
    '12215397': 'Memory Storage',
    '1405': 'Graphics Tablet',
    '2,16E+11': 'Apple iMac',
    '1364': 'Memory Storage',
    '12655397': 'Memory Storage',
    '2158': 'Apple Macbook',
    '51601716': 'Second hand',
    '11905404': 'Accessories',
    '113281716': 'Apple iPhone',
    '5384': 'Headphones',
    '113291716': 'Apple iPhone',
    '85641716': 'Apple iPhone',
    '85651716': 'Apple iPhone',
    '106431714': 'Apple iPad',
    '1433': 'Memory Storage',
    '24895185': 'Apple Watch',
    '1298': 'Second hand',
    '5398': 'Speakers',
    '1714': 'Apple iPad',
    '113271716': 'Apple iPhone',
    '21561716': 'Apple iPhone',
    '12585395': 'Accessories',
    '11865403': 'Accessories',
    '1231': 'Service',
    '21632158': 'Apple Macbook',
    '5,39E+11': 'Apple Macbook',
    '12995397': 'Accessories',
    '10142': 'Accessories',
    '13855401': 'Accessories',
    '51861714': 'Apple iPad',
    '24821716': 'Apple iPhone',
    '118692158': 'Apple iMac',
    '13005399': 'Accessories',
    '24885185': 'Apple Watch',
    '1229': 'Accessories',
    '42945397': 'Accessories',
    '12635403': 'Accessories',
    '51871714': 'Apple iPad',
    '1334': 'Network',
    '57445397': 'Memory Storage',
    '1325': 'Accessories',
    '1387': 'Accessories',
    '13835403': 'Accessories',
    '2449': 'Accessories'
}

# Apply the mapping to create the new 'category_level' column
product_category_type_df['category_level'] = product_category_type_df['type'].map(category_mapping)

# Fill missing values with 'Other'
product_category_type_df['category_level'] = product_category_type_df['category_level'].fillna('Other')
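Because the two dictionaries are maintained by hand, it is easy for one fine-grained category to end up under two different higher-level labels. The sketch below shows one way to check for that, assuming a DataFrame with the `category` and `category_level` columns created above. The demo rows, including the deliberately inconsistent `demo item`, are made up.

```python
import pandas as pd

# Made-up rows; "demo item" is intentionally assigned to two different levels.
demo_levels = pd.DataFrame({
    "category": ["SSD", "SSD", "Hard drive", "demo item", "demo item"],
    "category_level": ["Memory Storage", "Memory Storage", "Memory Storage",
                       "Accessories", "Network"],
})

# Count how many distinct higher-level labels each category received.
levels_per_category = demo_levels.groupby("category")["category_level"].nunique()

# Categories mapped to more than one level point to an inconsistency in the dictionaries.
ambiguous = levels_per_category[levels_per_category > 1]
print(ambiguous)
```

Running the same two lines on `product_category_type_df` should return an empty Series if the mappings agree with each other.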

In [171]:
product_category_type_df.sample(2)

Unnamed: 0,sku,name,desc,price,in_stock,type,category,category_level
6793,OTT0137-A,Open - Otterbox Clearly Protected gel Case iPhone 6 / 6S Transparent,transparent cover and a flexible piece TPU for iPhone 6 and iPhone 6s,19.99,0,1298,Open,Second hand
864,GTE0039,G-Tech G-Speed ​​Studio 16TB Disk RAID Thunderbolt 2,Disk RAID Technology G-4 bay thunderbolt2 transfer rate up 700åÊMB / s.,2600.88,0,11935397,Hard drive,Memory Storage


## 6.&nbsp; Save the categorized products DataFrame
Do not forget to save and download your categorized products DataFrame.

In [172]:
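product_category_type_df.to_csv("product_category_qu.csv", index=False)
# In Google Colab, uncomment to download the saved file
# (requires `from google.colab import files`):
# files.download("product_category_qu.csv")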
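One thing to watch when loading the saved file later: the `type` codes are compared as strings throughout this notebook (for example `== "1364"`), but `pd.read_csv` may infer a numeric dtype for a column that contains only digits. The sketch below shows how passing `dtype` keeps the codes as text. It uses an in-memory buffer and made-up rows instead of the real file.

```python
import io
import pandas as pd

# Made-up rows standing in for the categorized products.
demo_saved = pd.DataFrame({
    "sku": ["A1", "B2"],
    "type": ["1364", "2449"],
    "category": ["mac RAM", "Apple Watch Strap"],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
demo_saved.to_csv(buffer, index=False)

# Read back twice: once with inferred dtypes, once forcing 'type' to stay text.
buffer.seek(0)
inferred = pd.read_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, dtype={"type": str})

print(inferred["type"].tolist())   # digits-only codes come back as integers
print(reloaded["type"].tolist())   # codes preserved as strings
```

With the real file, the equivalent call would be `pd.read_csv("product_category_qu.csv", dtype={"type": str})`.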