menu-clone-hunter

A tiny cannabis retail analytics tool for finding probable duplicate product names in messy menu exports.

Dispensary menus are full of almost-the-same products:

GLP Miss X Eighth
GLP Miss X 3.5g
G.L.P. Miss X 1/8
Green Life Productions Miss X Eighth

To a customer, those may be the same product.

To a POS, menu provider, dashboard, scraper, or inventory report, those may become four different records.

That is how the clone goblin enters the building.

What it does

menu-clone-hunter reads a CSV of product names and flags probable duplicates using fuzzy string matching.

It helps identify:

Duplicate menu listings
Inconsistent product naming
Brand alias issues
Weight format inconsistencies
Reporting noise caused by messy product names
Menu cleanup opportunities before analytics work

Why this matters

Cannabis retail data is rarely clean.

The same product can appear multiple ways across:

POS exports
eCommerce menus
Weedmaps / Jane / Dutchie menus
Internal inventory reports
Scraped competitor menus
Vendor menus

Bad naming creates bad analytics.

This tool is a small, practical step toward cleaner product intelligence.

Features

CSV input support
Configurable product name column
Fuzzy duplicate detection
Cannabis-specific normalization rules
Brand alias cleanup
Weight alias cleanup
Confidence labels
Category conflict warnings
Optional CSV export

Install

pip install -r requirements.txt

Usage

python menu_clone_hunter.py sample_menu.csv

By default, the tool looks for a column named:

product_name

You can specify another column:

python menu_clone_hunter.py menu.csv --column "Product Name"
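
As a rough illustration of what a configurable name column involves, here is a minimal sketch using only the standard library. The function name read_product_names is hypothetical and is not taken from menu_clone_hunter.py; the real script may load the CSV differently.

```python
import csv
import io

def read_product_names(handle, column="product_name"):
    """Return the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    # Skip rows where the chosen column is missing or blank.
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]

# An in-memory file stands in for menu.csv here.
sample = io.StringIO("Product Name,price\nGLP Miss X Eighth,30\n,10\nMatrix Ripper 3.5g,35\n")
names = read_product_names(sample, column="Product Name")
print(names)
```

Failing loudly when the column is missing is a deliberate choice in this sketch: a silent empty result would look like a clean menu.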

Adjust match sensitivity:

python menu_clone_hunter.py menu.csv --threshold 85

Save results:

python menu_clone_hunter.py menu.csv --out output/duplicates.csv

Example input

product_name
GLP Miss X Eighth
GLP Miss X 3.5g
G.L.P. Miss X 1/8
Matrix Ripper 3.5g
Matrix Ripper Eighth
Matrix Ripper Pre Roll

Example output

Probable duplicate products found:
========================================
 score confidence category_warning              product_a              product_b
   100       High            False       GLP Miss X Eighth       GLP Miss X 3.5g
   100       High            False       GLP Miss X Eighth       G.L.P. Miss X 1/8
   100       High            False      Matrix Ripper 3.5g       Matrix Ripper Eighth
    92     Medium             True      Matrix Ripper 3.5g     Matrix Ripper Pre Roll

Confidence labels

Results include a confidence label based on fuzzy match score:

Score	Confidence
96-100	High
90-95	Medium
Below 90	Low

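The default threshold is 88, so pairs scoring just below 90 are still reported and labeled Low. Raise the threshold for a stricter hunt with fewer false positives, or lower it to make the clone hunt more aggressive.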
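
The score bands above map to a label with a couple of comparisons. This is a minimal sketch: confidence_label is a hypothetical name, and treating a fractional score between 95 and 96 as Medium is an assumption, since the table only lists whole numbers.

```python
def confidence_label(score):
    """Map a fuzzy match score (0-100) to the label bands in the table above."""
    if score >= 96:
        return "High"
    if score >= 90:
        return "Medium"
    # Assumption: anything under 90, including fractional scores, is Low.
    return "Low"

for score in (100, 92, 88):
    print(score, confidence_label(score))
```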

Category warnings

Some products share a strain name but are not true duplicates.

For example:

Matrix Ripper 3.5g
Matrix Ripper Pre Roll

Those may be related products, but not the same menu item.

menu-clone-hunter includes a category warning field (the category_warning column in the output) that flags possible conflicts between categories like:

Flower
Preroll
Vape
Edible
Concentrate

This helps separate true duplicate cleanup from products that simply share a strain, flavor, or brand name.
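
One way to approximate this check is keyword matching on the product name. The sketch below is an assumption about how such a rule could look, not the logic in menu_clone_hunter.py: the keyword lists, the guess_category and category_warning function names, and the idea that a bare weight such as 3.5g implies Flower are all illustrative.

```python
import re

# Hypothetical keyword patterns, checked in order; the first match wins.
CATEGORY_KEYWORDS = [
    ("Preroll", r"\bpre[\s-]?rolls?\b"),
    ("Vape", r"\b(vape|cart|cartridge)\b"),
    ("Edible", r"\b(edible|gummy|gummies)\b"),
    ("Concentrate", r"\b(concentrate|wax|rosin)\b"),
    # Assumption: a flower-style weight with no other keyword means Flower.
    ("Flower", r"\b(flower|eighth|3\.5g)\b|1/8"),
]

def guess_category(name):
    """Return the first category whose pattern matches, or None if unknown."""
    lowered = name.lower()
    for category, pattern in CATEGORY_KEYWORDS:
        if re.search(pattern, lowered):
            return category
    return None

def category_warning(name_a, name_b):
    """True only when both names map to a known category and the two differ."""
    cat_a, cat_b = guess_category(name_a), guess_category(name_b)
    return cat_a is not None and cat_b is not None and cat_a != cat_b

print(category_warning("Matrix Ripper 3.5g", "Matrix Ripper Pre Roll"))
print(category_warning("Matrix Ripper 3.5g", "Matrix Ripper Eighth"))
```

Returning no warning when either category is unknown keeps the flag conservative: it marks a conflict only when there is evidence on both sides.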

How it works

The tool performs four basic steps:

Normalize messy product names
Apply cannabis-specific alias rules
Compare product names with fuzzy string matching
Return pairs above the selected match threshold
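
The compare-and-filter steps can be sketched with the standard library. Note the swap: this example uses difflib.SequenceMatcher as a stand-in scorer, so its scores will not necessarily match what menu-clone-hunter reports. The names similarity and find_pairs are hypothetical, and whether the real threshold comparison is inclusive is not stated here; this sketch treats it as inclusive.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (stand-in scorer)."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

def find_pairs(names, threshold=88):
    """Score every unordered pair and keep those at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((score, a, b))
    # Highest scores first so the strongest candidates are reviewed first.
    return sorted(pairs, reverse=True)

menu = ["Matrix Ripper 3.5g", "matrix ripper 3.5g", "Blue Dream Vape"]
for score, a, b in find_pairs(menu):
    print(score, a, "<->", b)
```

Comparing every pair is quadratic in the number of products, which is fine for a single menu export but worth keeping in mind for large scraped datasets.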

Normalization handles common cannabis menu chaos like:

Eighth vs 3.5g
1/8 vs 3.5g
Pre Roll vs Preroll
G.L.P. vs GLP
Natures Chemistry vs Nature's Chemistry
NC vs Nature's Chemistry
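
The six cases above can be covered by a few rewrite rules. This is a minimal sketch under stated assumptions: normalize, WEIGHT_ALIASES, and BRAND_ALIASES are hypothetical names, and the choice of 3.5g, preroll, and "natures chemistry" as the canonical forms is illustrative rather than taken from the tool.

```python
import re

# Assumed canonical forms; the real tool may choose different targets.
WEIGHT_ALIASES = {"eighth": "3.5g", "1/8": "3.5g"}
BRAND_ALIASES = {"nc": "natures chemistry"}

def normalize(name):
    """Lowercase a product name and apply the alias rules listed above."""
    text = name.lower().replace("'", "").replace("’", "")
    # Collapse dotted acronyms: "g.l.p." -> "glp".
    text = re.sub(r"\b(?:[a-z]\.){2,}", lambda m: m.group(0).replace(".", ""), text)
    # Join "pre roll" / "pre-roll" into a single token.
    text = re.sub(r"\bpre[\s-]?roll\b", "preroll", text)
    # Replace whole tokens that are known weight or brand aliases.
    tokens = [WEIGHT_ALIASES.get(t, BRAND_ALIASES.get(t, t)) for t in text.split()]
    return " ".join(tokens)

for raw in ("G.L.P. Miss X 1/8", "GLP Miss X Eighth", "Matrix Ripper Pre Roll"):
    print(raw, "->", normalize(raw))
```

Matching aliases on whole tokens, rather than substrings, avoids rewriting the letters "nc" inside an unrelated word.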

Current limitations

This is intentionally lightweight.

It does not yet fully understand:

Batch IDs
Package IDs
Vendor SKUs
THC percentage differences
Live inventory state
Product lineage
Store-specific naming rules
Menu provider-specific quirks

A flower eighth and a preroll with the same strain name may look similar, but they should not automatically be treated as duplicates.

That is why the tool says “probable duplicates,” not “delete this immediately, captain chaos.”

Future improvements

Possible upgrades:

Group duplicate products into clusters instead of pairs
Add SKU/package ID matching
Add brand and category columns
Export cleanup recommendations
Add product canonicalization suggestions
Add stricter cannabis category rules
Add menu provider templates
Add competitor menu scraper compatibility
Add dashboard-ready summary stats
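
For the first item, grouping pairs into clusters, one possible approach is a small union-find over the matched pairs. This is only a sketch of the idea, not planned code: cluster_pairs is a hypothetical name, and the input is assumed to be (product_a, product_b) tuples that already passed the threshold.

```python
def cluster_pairs(pairs):
    """Merge matched pairs into groups of names that are transitively linked."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for name in list(parent):
        clusters.setdefault(find(name), set()).add(name)
    # Sort names within each cluster, then clusters by their first name.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

matched = [
    ("GLP Miss X Eighth", "GLP Miss X 3.5g"),
    ("GLP Miss X Eighth", "G.L.P. Miss X 1/8"),
    ("Matrix Ripper 3.5g", "Matrix Ripper Eighth"),
]
for group in cluster_pairs(matched):
    print(group)
```

Transitive grouping can chain loosely related names together, so pairs carrying a category warning would probably need to be excluded before clustering.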

Why I built this

Cannabis operators often make decisions from messy menu and inventory data.

Before analytics can be trusted, product names need to be cleaned.

This tool demonstrates practical cannabis retail data hygiene using Python, CSV processing, normalization rules, fuzzy matching, confidence scoring, and export-ready duplicate review workflows.

It is a small utility, but the problem is very real: messy product names create noisy reporting, bad menu analytics, and operational confusion.

The clone goblin must be contained.
