A tiny cannabis retail analytics tool for finding probable duplicate product names in messy menu exports.
Dispensary menus are full of almost-the-same products:
- GLP Miss X Eighth
- GLP Miss X 3.5g
- G.L.P. Miss X 1/8
- Green Life Productions Miss X Eighth
To a customer, those may be the same product.
To a POS, menu provider, dashboard, scraper, or inventory report, those may become four different records.
That is how the clone goblin enters the building.
menu-clone-hunter reads a CSV of product names and flags probable duplicates using fuzzy string matching.
It helps identify:
- Duplicate menu listings
- Inconsistent product naming
- Brand alias issues
- Weight format inconsistencies
- Reporting noise caused by messy product names
- Menu cleanup opportunities before analytics work
Cannabis retail data is rarely clean.
The same product can appear multiple ways across:
- POS exports
- eCommerce menus
- Weedmaps / Jane / Dutchie menus
- Internal inventory reports
- Scraped competitor menus
- Vendor menus
Bad naming creates bad analytics.
This tool is a small, practical step toward cleaner product intelligence.
- CSV input support
- Configurable product name column
- Fuzzy duplicate detection
- Cannabis-specific normalization rules
- Brand alias cleanup
- Weight alias cleanup
- Confidence labels
- Category conflict warnings
- Optional CSV export
pip install -r requirements.txt

python menu_clone_hunter.py sample_menu.csv

By default, the tool looks for a column named:
product_name
You can specify another column:
python menu_clone_hunter.py menu.csv --column "Product Name"

Adjust match sensitivity:

python menu_clone_hunter.py menu.csv --threshold 85

Save results:

python menu_clone_hunter.py menu.csv --out output/duplicates.csv

product_name
GLP Miss X Eighth
GLP Miss X 3.5g
G.L.P. Miss X 1/8
Matrix Ripper 3.5g
Matrix Ripper Eighth
Matrix Ripper Pre Roll

Probable duplicate products found:
========================================
score confidence category_warning product_a product_b
100 High False GLP Miss X Eighth GLP Miss X 3.5g
100 High False GLP Miss X Eighth G.L.P. Miss X 1/8
100 High False Matrix Ripper 3.5g Matrix Ripper Eighth
92 Medium True Matrix Ripper 3.5g Matrix Ripper Pre Roll
Results include a confidence label based on fuzzy match score:
| Score | Confidence |
|---|---|
| 96-100 | High |
| 90-95 | Medium |
| Below 90 | Low |
The default threshold is 88, so at that setting the Low label only applies to scores just under 90. Lower the threshold for a more aggressive clone hunt, or raise it for a stricter one.
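The score-to-label mapping in the table can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the function names `confidence_label` and `filter_pairs` are hypothetical, and it assumes that the threshold is inclusive and that fractional scores between 95 and 96 count as Medium.

```python
def confidence_label(score: float) -> str:
    """Map a 0-100 fuzzy match score to the labels in the table above."""
    if score >= 96:
        return "High"
    if score >= 90:
        return "Medium"
    return "Low"


def filter_pairs(scored_pairs, threshold=88):
    """Keep (score, a, b) tuples at or above the threshold and label them.

    Assumption: the threshold is inclusive. The real tool may differ.
    """
    return [
        (score, confidence_label(score), a, b)
        for score, a, b in scored_pairs
        if score >= threshold
    ]
```

With the default threshold, a pair scoring 92 is kept and labeled Medium, while a pair scoring 80 is dropped before labeling.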
Some products share a strain name but are not true duplicates.
For example:
- Matrix Ripper 3.5g
- Matrix Ripper Pre Roll
Those may be related products, but not the same menu item.
menu-clone-hunter includes a category warning field to flag possible conflicts between categories like:
- Flower
- Preroll
- Vape
- Edible
- Concentrate
This helps separate true duplicate cleanup from products that simply share a strain, flavor, or brand name.
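One way such a category warning could work is a keyword lookup on each product name. The sketch below is an assumption about the approach, not the tool's real rule set: the keyword lists and the names `CATEGORY_KEYWORDS`, `guess_category`, and `category_warning` are all hypothetical.

```python
# Hypothetical keyword lists; a real menu would need a longer, curated set.
# Preroll is listed before Flower so "3.5g Pre Roll Pack" resolves to Preroll.
CATEGORY_KEYWORDS = {
    "Preroll": ["pre roll", "pre-roll", "preroll", "joint"],
    "Vape": ["vape", "cart", "cartridge"],
    "Edible": ["gummy", "gummies", "chocolate", "edible"],
    "Concentrate": ["wax", "shatter", "rosin", "badder"],
    "Flower": ["eighth", "1/8", "3.5g", "flower"],
}


def guess_category(name):
    """Return the first category whose keyword appears in the name, else None."""
    lowered = name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def category_warning(name_a, name_b):
    """True only when both names resolve to different known categories."""
    cat_a, cat_b = guess_category(name_a), guess_category(name_b)
    return cat_a is not None and cat_b is not None and cat_a != cat_b
```

Under these assumed keywords, `category_warning("Matrix Ripper 3.5g", "Matrix Ripper Pre Roll")` is `True`, while two flower listings of the same strain produce `False`. Plain substring matching is crude (for example, "cart" would also match unrelated words), so treat the warning as a review hint rather than a verdict.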
The tool performs four basic steps:
- Normalize messy product names
- Apply cannabis-specific alias rules
- Compare product names with fuzzy string matching
- Return pairs above the selected match threshold
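The compare-and-filter steps can be sketched with the standard library alone. This example swaps in `difflib.SequenceMatcher` as a stand-in scorer; the tool itself may use a different fuzzy matching library, and the function names, the inclusive threshold, and the rounding are assumptions made for illustration. It expects names that have already been normalized.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two strings (difflib stand-in)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_probable_duplicates(names, threshold=88):
    """Compare every pair of names and keep those at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((round(score), a, b))
    # Highest scores first, so the most likely clones are reviewed first.
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)
```

Comparing every pair is quadratic in the number of products, which is fine for a single menu export but would need blocking or indexing for very large catalogs.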
Normalization handles common cannabis menu chaos like:
- Eighth vs 3.5g
- 1/8 vs 3.5g
- Pre Roll vs Preroll
- G.L.P. vs GLP
- Natures Chemistry vs Nature's Chemistry
- NC vs Nature's Chemistry
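A minimal sketch of alias-based normalization, built only from the example pairs listed above. The `ALIASES` table and the `normalize_name` function are hypothetical names, and the choice of lowercasing plus whole-token replacement is an assumption, not a description of the tool's internals.

```python
import re

# Illustrative alias table taken from the examples above; not the tool's real rules.
ALIASES = {
    "eighth": "3.5g",
    "1/8": "3.5g",
    "pre roll": "preroll",
    "g.l.p.": "glp",
    "natures chemistry": "nature's chemistry",
    "nc": "nature's chemistry",
}


def normalize_name(name: str) -> str:
    """Lowercase, collapse whitespace, and replace known aliases."""
    cleaned = re.sub(r"\s+", " ", name.strip().lower())
    for alias, canonical in ALIASES.items():
        # Lookarounds keep short aliases like "nc" from matching inside words.
        pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
        cleaned = re.sub(pattern, canonical, cleaned)
    return cleaned
```

With this table, `normalize_name("G.L.P. Miss X 1/8")` and `normalize_name("GLP Miss X Eighth")` both come out as `glp miss x 3.5g`, which is why those listings can score as exact matches after normalization.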
This is intentionally lightweight.
It does not yet fully understand:
- Batch IDs
- Package IDs
- Vendor SKUs
- THC percentage differences
- Live inventory state
- Product lineage
- Store-specific naming rules
- Menu provider-specific quirks
A flower eighth and a preroll with the same strain name may look similar, but they should not automatically be treated as duplicates.
That is why the tool says “probable duplicates,” not “delete this immediately, captain chaos.”
Possible upgrades:
- Group duplicate products into clusters instead of pairs
- Add SKU/package ID matching
- Add brand and category columns
- Export cleanup recommendations
- Add product canonicalization suggestions
- Add stricter cannabis category rules
- Add menu provider templates
- Add competitor menu scraper compatibility
- Add dashboard-ready summary stats
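As an illustration of the first upgrade, duplicate pairs could be grouped into clusters with a small union-find. This is only a sketch of one possible design; `cluster_pairs` is a hypothetical name and nothing like it exists in the tool today.

```python
def cluster_pairs(pairs):
    """Group (name_a, name_b) duplicate pairs into sets of connected names."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    # Union step: link the roots of every matched pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect names under their final root.
    clusters = {}
    for name in list(parent):
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Because clustering is transitive, a single weak match can pull unrelated products into one group. Pairs flagged with a category warning would probably need to be excluded before clustering.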
Cannabis operators often make decisions from messy menu and inventory data.
Before analytics can be trusted, product names need to be cleaned.
This tool demonstrates practical cannabis retail data hygiene using Python, CSV processing, normalization rules, fuzzy matching, confidence scoring, and export-ready duplicate review workflows.
It is a small utility, but the problem is very real: messy product names create noisy reporting, bad menu analytics, and operational confusion.
The clone goblin must be contained.