## Product Mapping v2
### Anthony Ung

#### Some Jupyter things you need to be aware of ...
####
#### As long as you run the cells in the correct order, the mapping of the products table is idempotent.
####
#### If you want to run an individual cell, you need to restart the kernel.
#### Go to "Kernel" > "Restart Kernel and Run up to Selected Cell..."
#### Then you can use one of the `DEBUG` methods to dump the state of the product arrays at the time that cell was executed.

In [1]:
import csv
import re

In [2]:
products_old = []
PRODUCTS_MAPPED = []
PRODUCT_CLASSES_NEW = []

# Read the product and product classes files.
with open('Products1.txt', 'r', encoding='utf-8', errors='replace') as csvfile:    
    csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
    for row in csv.DictReader(csvfile, dialect='piper'):
        row['Size'] = re.sub(r'[^\x00-\x7F]',' ', row['Size'])
        products_old.append(row)

        
with open('product_class.txt', 'r', encoding='utf-8', errors='replace') as csvfile:
    csv.register_dialect('tab', delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in csv.DictReader(csvfile, dialect='tab'):
        PRODUCT_CLASSES_NEW.append(row)


In [3]:
class DEBUG:
    def print_product_classes():
        print("product_class_id|product_subcategory|product_category|product_department|product_family")
        for product in PRODUCT_CLASSES_NEW:
            print(f"{product['product_class_id']}|{product['product_subcategory']}|{product['product_category']}|{product['product_department']}|{product['product_family']}")

    def print_array(product_arr):
        for product in product_arr:
            print(product)

    def print_product(product):
        print(f"{product['Manufacturer']}|{product['Product Name']}")
    
    def product_dump(product_arr, file_name='products_to_be_mapped.csv'):
        with open(file_name, 'w', newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=product_arr[0].keys())

            writer.writeheader()
            for product in product_arr:
                writer.writerow(product)
    

### A utility function that invokes some ETL code on our behalf

The convention:  
- `func` - Contains ETL code to be invoked on our behalf.
- `src` - The source array
- `dst1` - The destination array for products successfully mapped
- `dst2` - The destination array for products not successfully mapped.

When creating a definition for `func`, the names `src`, `dst1`, and `dst2` have no meaning to the caller.

Each updated product needs to have the following fields:
- `product_class_id` - The code of the new product class
- `meta_code` - A unique ID.
- `meta_mapped_by` - The initials of the person who mapped the product (eg. AU, SJ, GK, AB, NB, etc.)
- `meta_reason` - The reason why this product was mapped (e.g. from a character match, from a specific manufacturer, etc.)

In [4]:
def pipeline(func, src, dst1, dst2):
    func(src, dst1, dst2)

In [5]:
product_classes_dict = {}

with open('product_class.txt', 'r', encoding='utf-8', errors='replace') as csvfile:
    csv.register_dialect('tab', delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in csv.DictReader(csvfile, dialect='tab'):
        product_classes_dict[row['product_class_id']] = row

In [6]:
def update_product(product, product_class_id, code, mapped_by, reason):
    product['product_class_id'] = product_class_id
    product['product_subcategory'] = product_classes_dict[str(product_class_id)]['product_subcategory']
    product['product_category'] = product_classes_dict[str(product_class_id)]['product_subcategory']
    product['product_department'] = product_classes_dict[str(product_class_id)]['product_department']
    product['product_family'] = product_classes_dict[str(product_class_id)]['product_family']
    product['meta_code'] = code
    product['meta_mapped_by'] = mapped_by
    product['meta_reason'] = reason

#### Slide 9 stipulates that every product must have a key that will be mapped to our dimension table.

In [7]:
def generate_surrogate_key(src, dst1=None, dst2=None):
    product_id = 1

    for product in src:
        product['product_id'] = product_id
        product_id += 1

generate_surrogate_key(products_old)


### Slide 17 stipulates that we have specific suppliers.

In [8]:
def generate_suppliers(src, dst1=None, dst2=None):
    for product in src:
        if product['itemType'] == 'Milk':
            product['Supplier'] = 'Rowan Dairy'
        else:
            product['Supplier'] = 'Rowan Warehouse'
            
generate_suppliers(products_old)


### Some useful conventions in this cell:

Array names in all caps indicate that either (1) this array shall only be appended to, or (2) this array should not be modified at all.
`PRODUCTS_MAPPED` is Type 1. `PRODUCT_CLASSES_NEW` is Type 2.

In [9]:
def natural_mapping(src, dst1, dst2):
    '''
        Disallow duplicate product classes
        Used the following linux command to identify duplicates
            cat product_class.txt | cut -f 2 | sort | uniq -c | sort -r | head
    ''';
    product_subcategories = {}
    for subcategory in PRODUCT_CLASSES_NEW:
        if((subcategory['product_subcategory'] != 'Coffee') \
           and (subcategory['product_subcategory'] != 'Cleaners')):

            product_subcategories[subcategory['product_subcategory']] = subcategory ['product_class_id']

    '''
        Resolve a duplicate and verified by hand to use the smaller of the two
    '''
    product_subcategories['Fresh Vegetables'] = 13

    for product in src:
        if product['itemType'] in product_subcategories.keys():
            update_product( \
                product=product, \
                product_class_id = product_subcategories[product['itemType']], \
                code = 1, \
                mapped_by = 'AU', \
                reason = 'Mapped from old item type into new subcategory')
            dst1.append(product)
        else:
            dst2.append(product)
        
Products_To_Be_Mapped = []
pipeline(natural_mapping, products_old, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Coffee

Style update to Sean's code:  
Each function and each cell should be responsible for one and only one thing.  
Meta codes should increase to the next multiple of 10 + 1 so that we can identify which cell the mapping occurred in.

In [10]:
def map_coffee(src, dst1, dst2):
    seans_set = {'O Organics'}
    seans_set.add('Safeway Kitchens')
    seans_set.add('Folgers')

    anthonys_set = {'Maxwell House'}
    anthonys_set.add('Peets')
    anthonys_set.add('Seattles Best')
    anthonys_set.add('Safeway Kitchens')
    anthonys_set.add('Gevalia Kaffe')

    for product in src:
        if product['Manufacturer'] in seans_set:
            update_product( \
                product=product, \
                product_class_id = 7, \
                code = 11, \
                mapped_by = 'SJ', \
                reason = 'These manufacturers only produce coffee.')
            dst1.append(product)
            continue
            
        if product['Manufacturer'] in anthonys_set:
            
            update_product( \
                product=product, \
                product_class_id = 7, \
                code = 12, \
                mapped_by = 'AU', \
                reason = 'This manufacturer only makes coffee.')
            dst1.append(product)
            continue

        if ((product['Manufacturer'] == 'Starbucks')
            and product['Product Name'][0:6] == 'Coffee'):
            
            update_product( \
                product=product, \
                product_class_id = 7, \
                code = 13, \
                mapped_by = 'SJ', \
                
                reason = 'Starbucks produces coffee but some items require special treatment')
            dst1.append(product)
            continue

        if (product['Manufacturer'] == 'Dunkin Donuts'):
            if ((re.search('Jelly', product['Product Name'])) is not None):
                update_product( \
                    product=product, \
                    product_class_id = 7, \
                    code = 14, \
                    mapped_by = 'AU', \
                    reason = 'Dunkin produces coffee but some items require special treatment')
                dst1.append(product)
            else:
                update_product( \
                    product=product, \
                    product_class_id = 31, \
                    code = 15, \
                    mapped_by = 'AU', \
                    reason = 'Coffee Jelly is a special case for Dunkin Donuts. Cannot character match this.')
                dst1.append(product)
            continue
        
        dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_coffee, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Frito-Lay is a huge manufacturer but there are edge cases

In [11]:
def map_frito_lay(src, dst1, dst2):
    frito_lay_products = []
    
    for product in src:
        if (product['Manufacturer'] == 'Frito Lay'):
            frito_lay_products.append(product)
        else:
            dst2.append(product)

    for product in frito_lay_products:
        if (('Doritos' in product['Product Name'])
            or ('Ruffles' in product['Product Name'])):
            
            update_product( \
                product=product, \
                product_class_id = 12, \
                code = 21, \
                mapped_by = 'SJ', \
                reason = 'All Frito Lay items are chips, and all crisps are chips')
            dst1.append(product)
            continue

        if((re.search('Dip ', product['Product Name'])
            or (re.search('Salsa', product['Product Name'])))):
            
            update_product( \
                product=product, \
                product_class_id = 83, \
                code = 21, \
                mapped_by = 'AU', \
                reason = 'Special edge case with Frito Lay products.')
            dst1.append(product)
            continue

        else:
            update_product( \
                product=product, \
                product_class_id = 12, \
                code = 21, \
                mapped_by = 'SJ', \
                reason = 'All Frito Lay items are chips, and all crisps are chips')
            dst1.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_frito_lay, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Pringles is a manufacturer with no edge cases

In [12]:
def map_pringles(src, dst1, dst2):    
    for product in src:
        if (product['Manufacturer'] == 'Pringles'):
            update_product( \
                product=product, \
                product_class_id = 12, \
                code = 31, \
                mapped_by = 'SJ', \
                reason = 'All Pringles items are chips')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_pringles, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Nabisco is a huge manufacturer too but their products may be mapped to 2 categories

In [13]:
def map_nabisco(src, dst1, dst2):    
    nabisco_products = []
    
    for product in src:
        if (product['Manufacturer'] == 'Nabisco'):
            nabisco_products.append(product)
        else:
            dst2.append(product)

    for product in nabisco_products:
        if (product['Product Name'][0:10] == 'Chips Ahoy'):      
            update_product( \
                product=product, \
                product_class_id = 45, \
                code = 41, \
                mapped_by = 'AU', \
                reason = 'Chips Ahoy are Cookies')
            dst1.append(product)
        else:
            update_product( \
                product=product, \
                product_class_id = 82, \
                code = 42, \
                mapped_by = 'AU', \
                reason = 'Wheat Thins are Crackers')
            dst1.append(product)
        

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_nabisco, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### We are assuming that English Muffins are similar to Bagels

In [14]:
def map_thomas(src, dst1, dst2):    

    for product in src:
        if (product['Manufacturer'] == 'Thomas'):
            update_product( \
                product=product, \
                product_class_id = 25, \
                code = 51, \
                mapped_by = 'AU', \
                reason = 'All Thomas Products are Bagels')
            dst1.append(product)
        else:
            dst2.append(product)


temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_thomas, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Kelloggs and Pepperige Farm are problematic from a manufacturer-first approach

In [15]:
def map_pepperidge_farm(src, dst1, dst2):
    pepperidge_products = []

    for product in src:
        if (product['Manufacturer'] == 'Pepperidge Farm'):
            pepperidge_products.append(product)
        else:
            dst2.append(product)

    for product in pepperidge_products:
        if product['Product Name'][0:8] == 'Goldfish':
            update_product( \
                product=product, \
                product_class_id = 82, \
                code = 61, \
                mapped_by = 'SJ', \
                reason = 'Goldfish cracker character match')
            dst1.append(product)
            continue

        elif 'Bagel' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 25, \
                code = 62, \
                mapped_by = 'AU', \
                reason = 'Character match with Bagel')
            dst1.append(product)
            continue

        elif 'Rye' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 27, \
                code = 63, \
                mapped_by = 'AU', \
                reason = 'Character match with Bagel')
            dst1.append(product)
            continue

        elif (('Bread' in product['Product Name'])
            and not 'Stuffing' in product['Product Name']):
            update_product( \
                product=product, \
                product_class_id = 27, \
                code = 64, \
                mapped_by = 'AU', \
                reason = 'From a character match with Bread')
            dst1.append(product)
            continue
        
        else:
            dst2.append(product)
            
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_pepperidge_farm, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [16]:
def map_kelloggs_v1(src, dst1, dst2):
    kelloggs_products = []

    for product in src:
        if (product['Manufacturer'] == 'Kelloggs'):
            kelloggs_products.append(product)
        else:
            dst2.append(product)

    for product in kelloggs_products:
        if 'Waffles' in product['Product Name'] or 'Wafflers' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 48, \
                code = 71, \
                mapped_by = 'SJ', \
                reason = 'character match for waffles')
            
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_kelloggs_v1, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Barilla is easy to map

In [17]:
def map_barilla(src, dst1, dst2):
    barilla_products = []

    for product in src:
        if (product['Manufacturer'] == 'Barilla'):
            barilla_products.append(product)
        else:
            dst2.append(product)

    for product in barilla_products:
        if 'Sauce' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 48, \
                code = 81, \
                mapped_by = 'SJ', \
                reason = 'character match for Sauce')
            
            dst1.append(product)
        else:
            update_product( \
                product=product, \
                product_class_id = 5, \
                code = 82, \
                mapped_by = 'SJ', \
                reason = 'Barilla produces Pasta and Sauce. Determined by POE')
            
            dst1.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_barilla, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### The remainder of Sean's mappings
#### Mostly miscellaneous character matches

In [18]:
def map_crisps(src, dst1, dst2):
    for product in src:
        if 'Crisps' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 12, \
                code = 91, \
                mapped_by = 'SJ', \
                reason = 'From a characater match with Crisps')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_crisps, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [19]:
def map_dressing(src, dst1, dst2):
    for product in src:
        if 'Dressing' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 48, \
                code = 101, \
                mapped_by = 'SJ', \
                reason = 'dressing character match as a sauce')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_dressing, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [20]:
def map_donuts(src, dst1, dst2):
    for product in src:
        if 'Donut' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 84, \
                code = 111, \
                mapped_by = 'SJ', \
                reason = 'character match Donut')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_donuts, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [21]:
def map_bushs(src, dst1, dst2):
    for product in src:
        if 'Bushs' in product['Manufacturer']:
            update_product( \
                product=product, \
                product_class_id = 62, \
                code = 121, \
                mapped_by = 'SJ', \
                reason = 'bushs only sells baked beans, which is a canned vegetable')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_bushs, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [22]:
def map_tastykake(src, dst1, dst2):
    for product in src:
        if 'Tastykake' in product['Manufacturer']:
            update_product( \
                product=product, \
                product_class_id = 84, \
                code = 131, \
                mapped_by = 'SJ', \
                reason = 'manufacturer of donut-like products')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_tastykake, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [23]:
def map_bagels(src, dst1, dst2):
    for product in src:
        if 'Bagel' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 25, \
                code = 141, \
                mapped_by = 'SJ', \
                reason = 'character match for bagels')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_bagels, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [24]:
def map_syrup(src, dst1, dst2):
    for product in src:
        if 'Syrup' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 48, \
                code = 151, \
                mapped_by = 'SJ', \
                reason = 'Sauce is best fit for syrup')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_syrup, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [25]:
def map_coffee(src, dst1, dst2):
    for product in src:
        if 'Coffee' in product['Product Name']:
            if 'Cake' in product['Product Name']:
                update_product( \
                    product=product, \
                    product_class_id = 84, \
                    code = 161, \
                    mapped_by = 'AU', \
                    reason = 'Assuming coffee cakes are most similar to donuts')
                dst1.append(product)
            else:
                update_product( \
                    product=product, \
                    product_class_id = 7, \
                    code = 162, \
                    mapped_by = 'SJ', \
                    reason = 'Remaining Match for Coffee')
                dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_coffee, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [26]:
def map_juice(src, dst1, dst2):
    for product in src:
        if 'Juice' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 30, \
                code = 171, \
                mapped_by = 'SJ', \
                reason = 'character match for juice')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_juice, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [27]:
def map_hot_cocoa(src, dst1, dst2):
    for product in src:
        if 'Hot Cocoa' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 51, \
                code = 181, \
                mapped_by = 'AU', \
                reason = 'character match for Hot Cocoa')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_hot_cocoa, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [28]:
def map_sauces(src, dst1, dst2):
    for product in src:
        if 'Sauce' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 48, \
                code = 191, \
                mapped_by = 'SJ', \
                reason = 'character match Sauce')
            dst1.append(product)
        else:
            dst2.append(product)

temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_sauces, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [29]:
def map_hamburger_helper(src, dst1, dst2):
    for product in src:
        if 'Hamburger Helper' in product['Product Name']:
            update_product( \
                product=product, \
                product_class_id = 4, \
                code = 201, \
                mapped_by = 'SJ', \
                reason = 'character match hamburger helper')
            dst1.append(product)
        else:
            dst2.append(product)
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_hamburger_helper, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [30]:
def map_cakes(src, dst1, dst2):
    for product in src:
        if (('Cake' in product['Product Name'])
            and (not ('Mix' in product['Product Name']))):
            update_product( \
                product=product, \
                product_class_id = 26, \
                code = 211, \
                mapped_by = 'AU', \
                reason = 'Closest thing to cake is muffin')
            dst1.append(product)
        else:
            dst2.append(product)
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_cakes, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Compare manufacturers against a set of known drink manufacturers

In [31]:
def map_drinks(src, dst1, dst2):
    drink_manufacturers = {'Powerade'}
    drink_manufacturers.add('Welchs')
    drink_manufacturers.add('Turkey Hill')
    drink_manufacturers.add('Sunny Delight Drinks')
    drink_manufacturers.add('Sunny D')
    drink_manufacturers.add('Sparkling ICE')
    drink_manufacturers.add('Snapple')
    drink_manufacturers.add('Nestea')
    drink_manufacturers.add('Minute Maid')
    drink_manufacturers.add('Kool Aid')
    drink_manufacturers.add('Jumex')
    drink_manufacturers.add('Hawaiian Punch')
    drink_manufacturers.add('Got Milk')
    drink_manufacturers.add('Glaceau')
    drink_manufacturers.add('Gatorade')
    drink_manufacturers.add('Country Time')
    drink_manufacturers.add('Carnation')
    drink_manufacturers.add('Capri Sun')
    drink_manufacturers.add('Alpine')
    drink_manufacturers.add('Almond Breeze')
    drink_manufacturers.add('A Taste Of Thai')
    drink_manufacturers.add('4C')

    
    for product in src:
        if product['Manufacturer'] in drink_manufacturers:
            update_product( \
                product=product, \
                product_class_id = 52, \
                code = 221, \
                mapped_by = 'AU', \
                reason = 'Comparison against a set of known drink manufacturers')
            dst1.append(product)
        else:
            dst2.append(product)
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_drinks, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Deal with Little Debbie and Entenmanns and Kelloggs

In [32]:
def map_little_debbie_entenmanns(src, dst1, dst2):
    target_manufacturers = {'Little Debbie'}
    target_manufacturers.add('Entenmanns')
    target_manufacturers.add('Kelloggs')
    target_manufacturers.add('Quaker')
    target_manufacturers.add('Glutino')

    target = []
    
    for product in src:
        if product['Manufacturer'] in target_manufacturers:
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        if re.search("Cookie", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 45, \
                code = 231, \
                mapped_by = 'AU', \
                reason = 'Comparison against a set of known snack manufacturers and regular expression match with cookies')
            dst1.append(product)
            continue

        if re.search("Muffin", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 26, \
                code = 232, \
                mapped_by = 'AU', \
                reason = 'Comparison against a set of known snack manufacturers and regular expression match with muffins')
            dst1.append(product)
            continue

        if re.search("Bar", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 45, \
                code = 233, \
                mapped_by = 'AU', 
                reason = 'Comparison against a set of known snack manufacturers and equating bars to cookies')
            dst1.append(product)
            continue

        if re.search("Little Bites", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 26, \
                code = 234, \
                mapped_by = 'AU', 
                reason = 'Comparison against a set of known snack manufacturers and equating Little Bites with Muffins')
            dst1.append(product)
            continue

        if re.search("Danish", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 26, \
                code = 235, \
                mapped_by = 'AU', 
                reason = 'Comparison against a set of known snack manufacturers and equating Danishes with Muffins')
            dst1.append(product)
            continue

        if re.search("Buns", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 26, \
                code = 236, \
                mapped_by = 'AU', 
                reason = 'Comparison against a set of known snack manufacturers and equating Buns with Muffins')
            dst1.append(product)
            continue

        if re.search("Brownies", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 26, \
                code = 237, \
                mapped_by = 'AU', 
                reason = 'Comparison against a set of known snack manufacturers and equating Brownies with Muffins')
            dst1.append(product)
            continue

        if re.search("Pancakes", product["Product Name"]):
            update_product( \
                product=product, \
                product_class_id = 34, \
                code = 238, \
                mapped_by = 'AU', 
                reason = 'Comparison against a set of known snack manufacturers and equating Pancakes with Waffles')
            dst1.append(product)
            continue

        dst2.append(product)        
            
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_little_debbie_entenmanns, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Map Bread and deal with edge cases

In [33]:
def map_bread(src, dst1, dst2):
    target = []
    
    for product in src:
        if re.search("Bread", product["Product Name"]):
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        if (('Stuffing' not in product["Product Name"])
            and ('Mix' not in product["Product Name"])):

            update_product( \
                product=product, \
                product_class_id = 27, \
                code = 241, \
                mapped_by = 'AU', 
                reason = 'Character match with Bread and edge cases handled')

            dst1.append(product)
            continue

        dst2.append(product) 

        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_bread, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Map cleaners

In [34]:
def map_cleaners(src, dst1, dst2):
    cleaners_mfrs = {'Sunset'}
    cleaners_mfrs.add('Red Wing')
    cleaners_mfrs.add('High Quality')
    cleaners_mfrs.add('Denny')
    cleaners_mfrs.add('Cormorant')
    
    target = []
    
    for product in src:
        if product["Manufacturer"] in cleaners_mfrs:
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        update_product( \
            product=product, \
            product_class_id = 21, \
            code = 251, \
            mapped_by = 'AU', 
            reason = 'Comparison against set of known cleaner manufacturers and verified by hand')

        dst1.append(product)
        continue
        
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_cleaners, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now map the jellies and jams

In [35]:
def map_jellies_jams(src, dst1, dst2):
    
    for product in src:
        if product["itemType"] == 'Jelly/Jam':
            if 'Jelly' in product["Product Name"]:
                update_product( \
                    product=product, \
                    product_class_id = 31, \
                    code = 261, \
                    mapped_by = 'AU', 
                    reason = 'Used the old item type to map to new item type Jelly')
            else:
                update_product( \
                    product=product, \
                    product_class_id = 32, \
                    code = 262, \
                    mapped_by = 'AU', 
                    reason = 'Used the old item type to map to new item type Jelly')
            dst1.append(product)
        else:
            dst2.append(product)

        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_jellies_jams, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now map the cleaner

In [36]:
def map_cleaner(src, dst1, dst2):
    
    for product in src:
        if 'Cleaner' in product["Product Name"]:
            update_product( \
                product=product, \
                product_class_id = 21, \
                code = 271, \
                mapped_by = 'AU', 
                reason = 'From a character match with Cleaner')
            dst1.append(product)
        else:
            dst2.append(product)

        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_cleaner, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now map the salt

In [37]:
def map_salt(src, dst1, dst2):
    
    for product in src:
        if product["Manufacturer"] == 'Morton':
            update_product( \
                product=product, \
                product_class_id = 4, \
                code = 281, \
                mapped_by = 'AU', 
                reason = 'Morton makes salt')
            dst1.append(product)
        else:
            dst2.append(product)

        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_salt, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now deal with frozen entrees

In [38]:
def map_frozen_entrees(src, dst1, dst2):
    target_mfrs = {'White Castle'}
    target_mfrs.add('Sandwich Bros')
    target_mfrs.add('Lean Cusine')
    target_mfrs.add('Koch Foods')
    target_mfrs.add('Creamette')
    
    target = []
    
    for product in src:
        if product["Manufacturer"] in target_mfrs:
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        update_product( \
            product=product, \
            product_class_id = 10, \
            code = 291, \
            mapped_by = 'AU', 
            reason = 'Comparison against set of known frozen entree manufacturers and verified by hand')

        dst1.append(product)
        continue
        
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_frozen_entrees, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now deal with flavored drinks

In [39]:
def map_flavored_drinks(src, dst1, dst2):
    target_mfrs = {'Soy Dream'}
    target_mfrs.add('Silk')
    target_mfrs.add('Ovaltine')
    # Nestle is an edge case
    target_mfrs.add('Nesquik')
    target_mfrs.add('Lipton')
    target_mfrs.add('Goya')
    target_mfrs.add('Big K')
    target_mfrs.add('Asian Gourmet')
    
    target = []
    
    for product in src:
        if product["Manufacturer"] in target_mfrs:
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        update_product( \
            product=product, \
            product_class_id = 52, \
            code = 301, \
            mapped_by = 'AU', 
            reason = 'Comparison against set of known flavored drink manufacturers and verified by hand')

        dst1.append(product)
        continue
        
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_flavored_drinks, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now deal with soups

In [40]:
def map_soups(src, dst1, dst2):
    target_mfrs = {'Chef Boyardee'}
    target_mfrs.add('Campbells')
    target_mfrs.add('Dinty Moore')
    target_mfrs.add('College Inn')

    target = []
    
    for product in src:
        if product["Manufacturer"] in target_mfrs:
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        update_product( \
            product=product, \
            product_class_id = 58, \
            code = 311, \
            mapped_by = 'AU', 
            reason = 'Consider canned pastas as soups')

        dst1.append(product)
        continue
        
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_soups, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now deal with mixes

In [41]:
def map_mixes(src, dst1, dst2):
    target = []
    
    for product in src:
        if (('Mix' in product["Product Name"])
            and ('Mixed' not in product["Product Name"])):
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        if 'Rice' in product["Product Name"]:
            update_product( \
                product=product, \
                product_class_id = 57, \
                code = 321, \
                mapped_by = 'AU', 
                reason = 'Character match with Mix and Rice')
            dst1.append(product)
        elif 'Jiffy' in product["Manufacturer"]:
            update_product( \
                product=product, \
                product_class_id = 50, \
                code = 322, \
                mapped_by = 'AU', 
                reason = 'Character match with Mix and Manufacturer Match with Jiffy')
            dst1.append(product)
        elif 'Fleischmanns' in product["Manufacturer"]:
            update_product( \
                product=product, \
                product_class_id = 50, \
                code = 323, \
                mapped_by = 'AU', 
                reason = 'Character match with Mix and Manufacturer Match with Jiffy. These mixes resemble sugar')
            dst1.append(product)
        else:
            dst2.append(product)
        continue
        
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_mixes, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [42]:
def map_meats(src, dst1, dst2):
    target_mfrs = {'Steakumm'}
    target_mfrs.add('Philly Gourmet')
    target_mfrs.add('Open Nature')
    target_mfrs.add('On Cor')
    target_mfrs.add('Bubba')
    target_mfrs.add('Banquet')
    target_mfrs.add('Al Fresco')
    target_mfrs.add('Aidells')

    target = []
    
    for product in src:
        if product["Manufacturer"] in target_mfrs:
            target.append(product)
        else:
            dst2.append(product)

    for product in target:
        if 'Chicken' in product["Product Name"]:
            update_product( \
                product=product, \
                product_class_id = 100, \
                code = 331, \
                mapped_by = 'AU', 
                reason = 'Character match with Chicken against a set of frozen meat entrees')
            dst1.append(product)
        elif 'Hamburgers' in product["Product Name"]:
            update_product( \
                product=product, \
                product_class_id = 65, \
                code = 332, \
                mapped_by = 'AU', 
                reason = 'Character match with Hamburgers against a set of frozen meat entrees')
            dst1.append(product)
        elif 'Burger' in product["Product Name"]:
            update_product( \
                product=product, \
                product_class_id = 65, \
                code = 333, \
                mapped_by = 'AU', 
                reason = 'Character match with Hamburgers against a set of frozen meat entrees')
            dst1.append(product)
        else:
            update_product( \
                product=product, \
                product_class_id = 81, \
                code = 333, \
                mapped_by = 'AU', 
                reason = 'Classifying gourmet grillers, meatballs, and sausages as Hot Dogs')
            dst1.append(product)
        continue
        
        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_meats, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [43]:
def map_sour_cream(src, dst1, dst2):
    
    for product in src:
        if 'Sour Cream' in product["Product Name"]:
            update_product( \
                product=product, \
                product_class_id = 14, \
                code = 341, \
                mapped_by = 'AU', 
                reason = 'From a character match with Sour Cream')
            dst1.append(product)
        else:
            dst2.append(product)

        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_sour_cream, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

In [44]:
def map_cheese_soup(src, dst1, dst2):
    
    for product in src:
        if (('Mac & Cheese' in product["Product Name"])
            or ('Shells & Cheese' in product["Product Name"])):
            update_product( \
                product=product, \
                product_class_id = 5, \
                code = 351, \
                mapped_by = 'AU', 
                reason = 'Mac and Cheese mapped to Pasta')
            dst1.append(product)
        else:
            dst2.append(product)

        
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(map_cheese_soup, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)

#### Now try Rohith's rule-based mapping

This is straight-line code against the style of the other cells for the sake of simplicity for the developers

In [45]:
def map_misc_products(src, dst1, dst2):
    for product in src:
        name = product['Product Name'].lower()
        manu = product['Manufacturer'].lower()

        # 1. Scrubbing Bubbles – All Purpose Cleaner
        if 'scrubbing bubbles' in manu and 'cleaner' in name:
            update_product(
                product=product,
                product_class_id=21,  # Cleaners
                code=361,
                mapped_by='RK',
                reason='Scrubbing Bubbles mapped to Cleaners'
            )
            dst1.append(product)

        # 2. Motrin Infant – Pain Reliever
        elif 'motrin' in manu and ('pain' in name or 'fever' in name):
            update_product(
                product=product,
                product_class_id=71,  # Ibuprofen
                code=362,
                mapped_by='RK',
                reason='Motrin mapped to Ibuprofen'
            )
            dst1.append(product)

        # 3. Campbell’s Spaghetti O’s
        elif 'campbell' in manu and 'spaghetti' in name:
            update_product(
                product=product,
                product_class_id=58,  # Soup (used in place of Canned Pasta)
                code=363,
                mapped_by='RK',
                reason="Spaghetti O's mapped to Canned Soup"
            )
            dst1.append(product)

        # 4. Atkins Scramble Farmhouse-Style Sausage
        elif 'atkins' in manu and 'scramble' in name:
            update_product(
                product=product,
                product_class_id=10,  # TV Dinner
                code=364,
                mapped_by='RK',
                reason='Frozen breakfast sausage mapped to Frozen Entrees'
            )
            dst1.append(product)

        else:
            dst2.append(product)

# Usage
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []
pipeline(map_misc_products, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)
print(f'{len(PRODUCTS_MAPPED)} - Products mapped')
print(f'{len(Products_To_Be_Mapped)} - Products to be mapped')
print('Should have a total of 2075')



1974 - Products mapped
101 - Products to be mapped
Should have a total of 2075


In [46]:
import json
def rule_based_mapping(src, dst1, dst2):
    try:
        with open("full_mapping_rules_all.json", "r") as f:
            mapping_rules = json.load(f)

        for product in src:
            pname = product.get('Product Name', '').lower()
            manuf = product.get('Manufacturer', '').lower()
            matched = False

            for rule in mapping_rules:
                if all(kw in pname for kw in rule['keywords']) and rule['manufacturer'] in manuf:
                    update_product(
                        product=product,
                        product_class_id=rule['class_id'],
                        code=(1000 + rule['meta_code']), 
                        mapped_by='RK',
                        reason=rule['reason']
                    )
                    dst1.append(product)
                    matched = True
                    break  # Stop checking further rules if one matches
            if not matched:
                dst2.append(product)
    except FileNotFoundError:
        print("Warning: mapping rules JSON not found. JSON-based mapping skipped.")
temp = Products_To_Be_Mapped.copy()
Products_To_Be_Mapped = []

pipeline(rule_based_mapping, temp, PRODUCTS_MAPPED, Products_To_Be_Mapped)
PRODUCTS_MAPPED.extend(Products_To_Be_Mapped)
Products_To_Be_Mapped = []

In [47]:
print(f'{len(PRODUCTS_MAPPED)} - Products mapped')
print(f'{len(Products_To_Be_Mapped)} - Products to be mapped')
print('Should have a total of 2075')

if ((len(PRODUCTS_MAPPED) == 2075) and (len(Products_To_Be_Mapped) == 0)):
    DEBUG.product_dump(PRODUCTS_MAPPED, file_name='PRODUCTS_MAPPED.csv')
else:
    for product in Products_To_Be_Mapped:
        print(f'{product["Manufacturer"]}|{product["Product Name"]}')

2075 - Products mapped
0 - Products to be mapped
Should have a total of 2075


### Miscellaneous Profiling Done By Rohith

In [48]:
from collections import Counter

# Count how many products belong to each class_id
class_id_counts = Counter(p["product_class_id"] for p in PRODUCTS_MAPPED)

# Total number of unique class IDs
print(f'Total unique product_class_ids: {len(class_id_counts)}')


Total unique product_class_ids: 95


In [49]:
from collections import Counter

# Convert class_id to int for sorting and consistency
class_id_counts = Counter(int(p["product_class_id"]) for p in PRODUCTS_MAPPED)

# Print sorted counts by class_id
for class_id in sorted(class_id_counts):
    print(f'Class ID {class_id}: {class_id_counts[class_id]} products')


Class ID 1: 25 products
Class ID 3: 10 products
Class ID 4: 39 products
Class ID 5: 47 products
Class ID 6: 11 products
Class ID 7: 120 products
Class ID 8: 21 products
Class ID 9: 11 products
Class ID 10: 23 products
Class ID 11: 50 products
Class ID 12: 139 products
Class ID 13: 123 products
Class ID 14: 12 products
Class ID 15: 10 products
Class ID 16: 16 products
Class ID 19: 6 products
Class ID 20: 6 products
Class ID 21: 22 products
Class ID 24: 5 products
Class ID 25: 40 products
Class ID 26: 44 products
Class ID 27: 52 products
Class ID 30: 32 products
Class ID 31: 12 products
Class ID 32: 2 products
Class ID 34: 3 products
Class ID 35: 93 products
Class ID 36: 26 products
Class ID 38: 10 products
Class ID 39: 5 products
Class ID 42: 1 products
Class ID 45: 99 products
Class ID 48: 54 products
Class ID 49: 20 products
Class ID 50: 21 products
Class ID 51: 10 products
Class ID 52: 54 products
Class ID 53: 20 products
Class ID 54: 15 products
Class ID 57: 18 products
Class ID 58:

In [50]:
for cls in sorted(PRODUCT_CLASSES_NEW, key=lambda x: int(x['product_class_id'])):
    class_id = int(cls['product_class_id'])
    category = cls.get('product_category', 'Unknown')
    print(f'Class ID {class_id}: {category}')


Class ID 1: Specialty
Class ID 2: Seafood
Class ID 3: Fruit
Class ID 4: Baking Goods
Class ID 5: Starchy Foods
Class ID 6: Dairy
Class ID 7: Dry Goods
Class ID 8: Meat
Class ID 9: Frozen Desserts
Class ID 10: Frozen Entrees
Class ID 11: Dairy
Class ID 12: Snack Foods
Class ID 13: Vegetables
Class ID 14: Dairy
Class ID 15: Dairy
Class ID 16: Side Dishes
Class ID 17: Snack Foods
Class ID 18: Paper Products
Class ID 19: Carbonated Beverages
Class ID 20: Cleaning Supplies
Class ID 21: Cleaning Supplies
Class ID 22: Cleaning Supplies
Class ID 23: Cleaning Supplies
Class ID 24: Seafood
Class ID 25: Bread
Class ID 26: Bread
Class ID 27: Bread
Class ID 28: Breakfast Foods
Class ID 29: Breakfast Foods
Class ID 30: Pure Juice Beverages
Class ID 31: Jams and Jellies
Class ID 32: Jams and Jellies
Class ID 33: Jams and Jellies
Class ID 34: Breakfast Foods
Class ID 35: Breakfast Foods
Class ID 36: Candy
Class ID 37: Candy
Class ID 38: Candy
Class ID 39: Hygiene
Class ID 40: Kitchen Products
Class ID

Dairy Products class

In [51]:
for p in PRODUCTS_MAPPED:
    if int(p["product_class_id"]) == 6:
        print(f'{p["Manufacturer"]}|{p["Product Name"]}')

Yoplait|GoGurt Variety Pack
Gorilla|Gorilla Blueberry Yogurt 
Gorilla|Gorilla Strawberry Yogurt 
Even Better|Even Better Blueberry Yogurt 
Even Better|Even Better Strawberry Yogurt 
Club|Club Blueberry Yogurt 
Club|Club Strawberry Yogurt 
Carlson|Carlson Blueberry Yogurt 
Carlson|Carlson Strawberry Yogurt 
Booker|Booker Blueberry Yogurt 
Booker|Booker Strawberry Yogurt 


In [52]:
for p in PRODUCTS_MAPPED:
    if int(p["product_class_id"]) == 11:
        print(f'{p["Manufacturer"]}|{p["Product Name"]}')

Gorilla|Gorilla Cheese Spread 
Gorilla|Gorilla Havarti Cheese 
Gorilla|Gorilla Head Cheese 
Gorilla|Gorilla Jack Cheese 
Gorilla|Gorilla Low Fat String Cheese 
Gorilla|Gorilla Mild Cheddar Cheese 
Gorilla|Gorilla Muenster Cheese 
Gorilla|Gorilla Sharp Cheddar Cheese 
Gorilla|Gorilla String Cheese 
Even Better|Even Better Cheese Spread 
Even Better|Even Better Havarti Cheese 
Even Better|Even Better Head Cheese 
Even Better|Even Better Jack Cheese 
Even Better|Even Better Low Fat String Cheese 
Even Better|Even Better Mild Cheddar Cheese 
Even Better|Even Better Muenster Cheese 
Even Better|Even Better Sharp Cheddar Cheese 
Even Better|Even Better String Cheese 
Club|Club Cheese Spread 
Club|Club Havarti Cheese 
Club|Club Head Cheese 
Club|Club Jack Cheese 
Club|Club Low Fat String Cheese 
Club|Club Mild Cheddar Cheese 
Club|Club Muenster Cheese 
Club|Club Sharp Cheddar Cheese 
Club|Club String Cheese 
Carlson|Carlson Cheese Spread 
Carlson|Carlson Havarti Cheese 
Carlson|Carlson Head C

In [53]:
for p in PRODUCTS_MAPPED:
    if int(p["product_class_id"]) == 14:
        print(f'{p["Manufacturer"]}|{p["Product Name"]}')

Gorilla|Gorilla Low Fat Sour Cream 
Gorilla|Gorilla Sour Cream 
Even Better|Even Better Low Fat Sour Cream 
Even Better|Even Better Sour Cream 
Club|Club Low Fat Sour Cream 
Club|Club Sour Cream 
Carlson|Carlson Low Fat Sour Cream 
Carlson|Carlson Sour Cream 
Booker|Booker Low Fat Sour Cream 
Booker|Booker Sour Cream 
Hunts|Sour Cream Original
Heluva Good|Sour Cream Dip Bacon Ranch


In [54]:
for p in PRODUCTS_MAPPED:
    if int(p["product_class_id"]) == 15:
        print(f'{p["Manufacturer"]}|{p["Product Name"]}')

Gorilla|Gorilla Large Curd Cottage Cheese 
Gorilla|Gorilla Low Fat Cottage Cheese 
Even Better|Even Better Large Curd Cottage Cheese 
Even Better|Even Better Low Fat Cottage Cheese 
Club|Club Large Curd Cottage Cheese 
Club|Club Low Fat Cottage Cheese 
Carlson|Carlson Large Curd Cottage Cheese 
Carlson|Carlson Low Fat Cottage Cheese 
Booker|Booker Large Curd Cottage Cheese 
Booker|Booker Low Fat Cottage Cheese 


In [55]:
for p in PRODUCTS_MAPPED:
    if int(p["product_class_id"]) == 76:
        print(f'{p["Manufacturer"]}|{p["Product Name"]}')

Rowan Dairy|1.00% Milk
Rowan Dairy|1.00% Milk
Rowan Dairy|2.00% Milk
Rowan Dairy|2.00% Milk
Rowan Dairy|Whole Milk Milk
Rowan Dairy|Whole Milk Milk
Pet|Evaporated Milk Original
Kraft|American Singles 2% Milk
Borden|Condensed Milk Sweetened


In [56]:
from collections import defaultdict

# Group class_ids by product_category
category_to_class_ids = defaultdict(set)

for cls in PRODUCT_CLASSES_NEW:
    category = cls.get('product_category', 'Unknown')
    class_id = int(cls['product_class_id'])
    category_to_class_ids[category].add(class_id)

# Print the result
for category, class_ids in category_to_class_ids.items():
    sorted_ids = sorted(class_ids)
    print(f"Category: {category} has class IDs: {sorted_ids}")


Category: Specialty has class IDs: [1, 89]
Category: Seafood has class IDs: [2, 24]
Category: Fruit has class IDs: [3, 99, 211]
Category: Baking Goods has class IDs: [4, 48, 49, 50]
Category: Starchy Foods has class IDs: [5, 57]
Category: Dairy has class IDs: [6, 11, 14, 15, 76]
Category: Dry Goods has class IDs: [7]
Category: Meat has class IDs: [8, 65, 77, 81, 91, 100]
Category: Frozen Desserts has class IDs: [9, 110]
Category: Frozen Entrees has class IDs: [10]
Category: Snack Foods has class IDs: [12, 17, 45, 46, 54, 82, 83, 84, 109]
Category: Vegetables has class IDs: [13, 60, 61, 62, 63]
Category: Side Dishes has class IDs: [16]
Category: Paper Products has class IDs: [18, 55, 213]
Category: Carbonated Beverages has class IDs: [19]
Category: Cleaning Supplies has class IDs: [20, 21, 22, 23]
Category: Bread has class IDs: [25, 26, 27]
Category: Breakfast Foods has class IDs: [28, 29, 34, 35]
Category: Pure Juice Beverages has class IDs: [30]
Category: Jams and Jellies has class ID