LJrobinson/menu-clone-hunter

menu-clone-hunter

A tiny cannabis retail analytics tool for finding probable duplicate product names in messy menu exports.

Dispensary menus are full of almost-the-same products:

  • GLP Miss X Eighth
  • GLP Miss X 3.5g
  • G.L.P. Miss X 1/8
  • Green Life Productions Miss X Eighth

To a customer, those may be the same product.

To a POS, menu provider, dashboard, scraper, or inventory report, those may become four different records.

That is how the clone goblin enters the building.

What it does

menu-clone-hunter reads a CSV of product names and flags probable duplicates using fuzzy string matching.

It helps identify:

  • Duplicate menu listings
  • Inconsistent product naming
  • Brand alias issues
  • Weight format inconsistencies
  • Reporting noise caused by messy product names
  • Menu cleanup opportunities before analytics work

Why this matters

Cannabis retail data is rarely clean.

The same product can appear multiple ways across:

  • POS exports
  • eCommerce menus
  • Weedmaps / Jane / Dutchie menus
  • Internal inventory reports
  • Scraped competitor menus
  • Vendor menus

Bad naming creates bad analytics.

This tool is a small, practical step toward cleaner product intelligence.

Features

  • CSV input support
  • Configurable product name column
  • Fuzzy duplicate detection
  • Cannabis-specific normalization rules
  • Brand alias cleanup
  • Weight alias cleanup
  • Confidence labels
  • Category conflict warnings
  • Optional CSV export

Install

pip install -r requirements.txt

Usage

python menu_clone_hunter.py sample_menu.csv

By default, the tool looks for a column named:

product_name

You can specify another column:

python menu_clone_hunter.py menu.csv --column "Product Name"

Adjust match sensitivity:

python menu_clone_hunter.py menu.csv --threshold 85

Save results:

python menu_clone_hunter.py menu.csv --out output/duplicates.csv
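
For readers who want to see how these flags fit together, here is a minimal sketch of a command-line parser that accepts the same arguments. It is not the tool's actual source: the function name `build_parser`, the positional name `csv_path`, the help strings, and the integer type for `--threshold` are all assumptions. Only the flag names and the defaults (`product_name`, 88) come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="menu_clone_hunter.py",
        description="Flag probable duplicate product names in a menu CSV.",
    )
    # Positional argument: the CSV export to scan (name is hypothetical).
    parser.add_argument("csv_path", help="path to the menu CSV export")
    # Column holding the product names; the README default is product_name.
    parser.add_argument("--column", default="product_name",
                        help="name of the product name column")
    # Minimum fuzzy score to report; the README default is 88.
    parser.add_argument("--threshold", type=int, default=88,
                        help="minimum match score, 0-100")
    # Optional path for the CSV report; nothing is written when omitted.
    parser.add_argument("--out", default=None,
                        help="optional path for a CSV report")
    return parser


# Parse an explicit argument list so the sketch runs without a real shell.
args = build_parser().parse_args(
    ["menu.csv", "--column", "Product Name", "--threshold", "85"]
)
print(args.csv_path, args.column, args.threshold, args.out)
```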

Example input

product_name
GLP Miss X Eighth
GLP Miss X 3.5g
G.L.P. Miss X 1/8
Matrix Ripper 3.5g
Matrix Ripper Eighth
Matrix Ripper Pre Roll
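
As a rough illustration of the input side, the sketch below reads a product name column with Python's standard `csv` module. The helper name `load_product_names` and its error handling are assumptions, not the tool's real code. It reads from an in-memory string so it runs without a file on disk.

```python
import csv
import io


def load_product_names(handle, column="product_name"):
    """Return the non-empty values of one column from an open CSV handle."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(
            f"column {column!r} not found; available: {reader.fieldnames}"
        )
    # Skip blank cells and trim stray whitespace from menu exports.
    return [
        row[column].strip()
        for row in reader
        if row[column] and row[column].strip()
    ]


# The example input from this README, as an in-memory CSV.
sample = io.StringIO(
    "product_name\n"
    "GLP Miss X Eighth\n"
    "GLP Miss X 3.5g\n"
    "G.L.P. Miss X 1/8\n"
    "Matrix Ripper 3.5g\n"
    "Matrix Ripper Eighth\n"
    "Matrix Ripper Pre Roll\n"
)
names = load_product_names(sample)
print(len(names), names[0])
```

With a real export, the same helper could be called on `open("menu.csv", newline="", encoding="utf-8")`.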

Example output

Probable duplicate products found:
========================================
 score confidence category_warning              product_a              product_b
   100       High            False       GLP Miss X Eighth       GLP Miss X 3.5g
   100       High            False       GLP Miss X Eighth       G.L.P. Miss X 1/8
   100       High            False      Matrix Ripper 3.5g       Matrix Ripper Eighth
    92     Medium             True      Matrix Ripper 3.5g     Matrix Ripper Pre Roll
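
If you want to post-process a report like the one above, the following sketch writes rows with the same five columns using `csv.DictWriter`. The column names are taken from the example output. Whether the file written by `--out` uses exactly this layout is an assumption, and `write_report` is a hypothetical helper.

```python
import csv
import io

# Column order copied from the example output above.
FIELDS = ["score", "confidence", "category_warning", "product_a", "product_b"]


def write_report(rows, handle):
    """Write duplicate-pair rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {
        "score": 100,
        "confidence": "High",
        "category_warning": False,
        "product_a": "GLP Miss X Eighth",
        "product_b": "GLP Miss X 3.5g",
    },
]

# Write to an in-memory buffer so the sketch leaves no files behind.
buffer = io.StringIO()
write_report(rows, buffer)
print(buffer.getvalue())
```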

Confidence labels

Results include a confidence label based on fuzzy match score:

Score      Confidence
96-100     High
90-95      Medium
Below 90   Low

The default threshold is 88, so by default the Low label only appears on pairs scoring just under 90. Use --threshold to tune this depending on how aggressive you want the clone hunt to be.
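
The score bands above map onto a very small function. This is a sketch rather than the tool's source: the name `confidence_label` is hypothetical, and it assumes whole-number scores, so anything from 90 up to (but not including) 96 is treated as Medium.

```python
def confidence_label(score: float) -> str:
    """Map a 0-100 fuzzy match score to the README's confidence bands."""
    if score >= 96:
        return "High"
    if score >= 90:
        return "Medium"
    return "Low"


for score in (100, 92, 88):
    print(score, confidence_label(score))
```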

Category warnings

Some products share a strain name but are not true duplicates.

For example:

  • Matrix Ripper 3.5g
  • Matrix Ripper Pre Roll

Those may be related products, but not the same menu item.

menu-clone-hunter includes a category warning field to flag possible conflicts between categories like:

  • Flower
  • Preroll
  • Vape
  • Edible
  • Concentrate

This helps separate true duplicate cleanup from products that simply share a strain, flavor, or brand name.
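
One way such a warning could work is to guess a category from keywords in each name and flag pairs whose guesses disagree. The sketch below takes that approach. The keyword lists and the function names `guess_category` and `category_warning` are illustrative assumptions, and the tool's real rules may differ.

```python
# Hypothetical keyword hints per category. Order matters: earlier
# categories win, so "Pre Roll" is checked before weight-based hints.
CATEGORY_KEYWORDS = {
    "Preroll": ("pre roll", "pre-roll", "preroll"),
    "Vape": ("vape", "cartridge", "cart"),
    "Edible": ("gummy", "gummies", "chocolate"),
    "Concentrate": ("rosin", "shatter", "badder", "wax"),
    "Flower": ("eighth", "1/8", "3.5g"),
}


def guess_category(name):
    """Return the first category whose keyword appears in the name, else None."""
    lowered = name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def category_warning(name_a, name_b):
    """True only when both names have a guessed category and they differ."""
    cat_a, cat_b = guess_category(name_a), guess_category(name_b)
    return cat_a is not None and cat_b is not None and cat_a != cat_b


print(category_warning("Matrix Ripper 3.5g", "Matrix Ripper Pre Roll"))
print(category_warning("GLP Miss X Eighth", "GLP Miss X 3.5g"))
```

Plain substring matching is deliberately crude here. A short keyword such as "cart" can match inside unrelated words, so a production rule set would likely use word boundaries or a real category column.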

How it works

The tool performs four basic steps:

  1. Normalize messy product names
  2. Apply cannabis-specific alias rules
  3. Compare product names with fuzzy string matching
  4. Return pairs above the selected match threshold
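
Steps 3 and 4 can be sketched with only the standard library. This README does not say which fuzzy matching library the tool uses, so the example swaps in `difflib.SequenceMatcher`. Its scores will not necessarily match the example output above. The function names, the inclusive `>=` comparison, and the default `normalize` callable (lowercasing plus whitespace cleanup) are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two strings."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def find_probable_duplicates(names, threshold=88,
                             normalize=lambda s: " ".join(s.lower().split())):
    """Compare every pair of names and keep those at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # Score the normalized forms but report the original names.
        score = similarity(normalize(a), normalize(b))
        if score >= threshold:
            pairs.append((score, a, b))
    # Highest scores first so the strongest candidates lead the report.
    return sorted(pairs, reverse=True)


menu = ["Matrix Ripper 3.5g", "matrix  ripper 3.5g", "Blue Dream Vape"]
for score, a, b in find_probable_duplicates(menu):
    print(score, a, "|", b)
```

Comparing every pair is quadratic in the number of products, which is fine for a single menu export but would need blocking or indexing for very large catalogs.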

Normalization handles common cannabis menu chaos like:

  • Eighth vs 3.5g
  • 1/8 vs 3.5g
  • Pre Roll vs Preroll
  • G.L.P. vs GLP
  • Natures Chemistry vs Nature's Chemistry
  • NC vs Nature's Chemistry
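
A minimal sketch of this kind of normalization, using regular expressions, might look like the following. The alias table and the choice of canonical forms (`3.5g`, `preroll`, `glp`, `natures chemistry`) are assumptions made for illustration. The README lists which variants should match, not which spelling wins.

```python
import re

# Hypothetical alias rules: regex pattern -> canonical replacement.
ALIASES = {
    r"\beighth\b": "3.5g",
    r"(?<!\S)1/8(?!\S)": "3.5g",
    r"\bpre[\s-]?roll\b": "preroll",
    r"\bgreen life productions\b": "glp",
    r"\bnc\b": "natures chemistry",
}


def normalize_name(name: str) -> str:
    """Lowercase a product name and collapse known aliases."""
    text = name.lower()
    # Drop straight and curly apostrophes: "nature's" -> "natures".
    text = text.replace("'", "").replace("’", "")
    # Remove dots after single letters: "g.l.p." -> "glp" (leaves "3.5g" alone).
    text = re.sub(r"\b([a-z])\.", r"\1", text)
    for pattern, replacement in ALIASES.items():
        text = re.sub(pattern, replacement, text)
    # Collapse repeated whitespace left over from messy exports.
    return re.sub(r"\s+", " ", text).strip()


variants = [
    "GLP Miss X Eighth",
    "GLP Miss X 3.5g",
    "G.L.P. Miss X 1/8",
    "Green Life Productions Miss X Eighth",
]
print({normalize_name(v) for v in variants})
```

Under these rules, all four variants from the top of this README collapse to the same string, which is what lets the fuzzy matcher score them as near-certain duplicates.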

Current limitations

This is intentionally lightweight.

It does not yet fully understand:

  • Batch IDs
  • Package IDs
  • Vendor SKUs
  • THC percentage differences
  • Live inventory state
  • Product lineage
  • Store-specific naming rules
  • Menu provider-specific quirks

A flower eighth and a preroll with the same strain name may look similar, but they should not automatically be treated as duplicates.

That is why the tool says “probable duplicates,” not “delete this immediately, captain chaos.”

Future improvements

Possible upgrades:

  • Group duplicate products into clusters instead of pairs
  • Add SKU/package ID matching
  • Add brand and category columns
  • Export cleanup recommendations
  • Add product canonicalization suggestions
  • Add stricter cannabis category rules
  • Add menu provider templates
  • Add competitor menu scraper compatibility
  • Add dashboard-ready summary stats
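
As a rough idea of what the first upgrade could look like, the sketch below merges duplicate pairs into clusters with a small union-find structure. Nothing here exists in the tool today. The function name `cluster_pairs` is hypothetical, and the input pairs are the three High-confidence rows from the example output.

```python
def cluster_pairs(pairs):
    """Group names into clusters by transitively merging matched pairs."""
    parent = {}

    def find(name):
        # Register unseen names as their own root, then walk up to the root.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for name in parent:
        clusters.setdefault(find(name), set()).add(name)
    return [sorted(members) for members in clusters.values() if len(members) > 1]


high_confidence_pairs = [
    ("GLP Miss X Eighth", "GLP Miss X 3.5g"),
    ("GLP Miss X Eighth", "G.L.P. Miss X 1/8"),
    ("Matrix Ripper 3.5g", "Matrix Ripper Eighth"),
]
for cluster in sorted(cluster_pairs(high_confidence_pairs)):
    print(cluster)
```

Because merging is transitive, a single weak pair can pull unrelated products into one cluster. Filtering out pairs with a category warning or a low confidence label before clustering would be a sensible precaution.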

Why I built this

Cannabis operators often make decisions from messy menu and inventory data.

Before analytics can be trusted, product names need to be cleaned.

This tool demonstrates practical cannabis retail data hygiene using Python, CSV processing, normalization rules, fuzzy matching, confidence scoring, and export-ready duplicate review workflows.

It is a small utility, but the problem is very real: messy product names create noisy reporting, bad menu analytics, and operational confusion.

The clone goblin must be contained.

About

Cannabis retail CSV tool that flags probable duplicate menu products using fuzzy matching, normalization rules, confidence scoring, and category warnings.
