Open datasets tracking how AI systems recommend products. We're measuring consistency, documenting patterns, and sharing everything we find.
We asked Google AI Mode and ChatGPT the same 132 product questions, 3 times each. The results surprised us.
Quick Stats:
- 792 AI responses across 2 models and 3 runs
- 3,806 product recommendations extracted and structured
- 132 query variations from 33 core product searches
- Complete source citations preserved
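All data files are JSONL (one JSON object per line). A minimal Python sketch for loading them — the directory path matches the repository layout, but the fields inside each record are whatever the file actually contains, so this only counts records rather than assuming a schema:

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read a newline-delimited JSON file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Example: gather every ChatGPT response across the three runs.
base = Path("experiments/consumer-products/data/responses/chatgpt")
responses = [record
             for run_file in sorted(base.glob("run_*.jsonl"))
             for record in load_jsonl(run_file)]
print(f"{len(responses)} responses loaded")
```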
Consistency Analysis:
- ChatGPT and Google AI Mode have a 47.3% agreement rate.
- ChatGPT's output drift varies depending on whether it uses search retrieval.
- Business relationships appear to influence which sources ChatGPT cites.
Analysis report → (https://amplifying.ai/blog/why-ai-product-recommendations-keep-changing-google-ai-mode-vs-chatgpt)
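The exact agreement metric behind the 47.3% figure is defined in the linked report; purely as an illustration, one common way to score cross-model agreement is a mean Jaccard overlap between the product sets each model returns for the same query. The helpers below are hypothetical, not the report's actual method:

```python
def jaccard(a, b):
    """Set overlap |A ∩ B| / |A ∪ B|, comparing product names case-insensitively."""
    sa = {x.strip().lower() for x in a}
    sb = {x.strip().lower() for x in b}
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mean_agreement(per_query_pairs):
    """Average Jaccard overlap across (model_a_products, model_b_products) pairs."""
    scores = [jaccard(a, b) for a, b in per_query_pairs]
    return sum(scores) / len(scores) if scores else 0.0

# Toy example with two queries
pairs = [
    (["Sony WH-1000XM5", "Bose QC45"], ["Sony WH-1000XM5", "AirPods Max"]),
    (["Anker 737"], ["Anker 737"]),
]
print(round(mean_agreement(pairs), 3))  # → 0.667
```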
```
├── experiments/
│   └── consumer-products/            # Consumer product recommendations dataset
│       ├── README.md                 # Detailed dataset documentation
│       ├── data/
│       │   ├── analysis/
│       │   │   └── analysis.json     # Consistency analysis results
│       │   ├── products/
│       │   │   └── products.jsonl    # 2,074 extracted products
│       │   ├── queries/
│       │   │   └── queries.jsonl     # 33 query sets, 132 variations
│       │   └── responses/
│       │       ├── chatgpt/          # 396 ChatGPT responses
│       │       │   ├── run_1.jsonl
│       │       │   ├── run_2.jsonl
│       │       │   └── run_3.jsonl
│       │       └── google_ai_mode/   # 396 Google AI responses
│       │           ├── run_1.jsonl
│       │           ├── run_2.jsonl
│       │           └── run_3.jsonl
│       └── tools/
│           └── index.html            # Interactive visualization
└── README.md                         # This file
```
To run the visualization dashboard:

1. Clone the repository:

   ```bash
   git clone https://github.com/amplifying-ai/ai-product-bench
   cd ai-product-bench
   ```

2. Start a web server:

   ```bash
   # Using http-server (install with: npm install -g http-server)
   http-server experiments/consumer-products/tools/

   # Or using Python's built-in server
   cd experiments/consumer-products/tools
   python -m http.server 8000

   # Or using any other web server of your choice
   ```

3. Open in browser: navigate to the local URL printed by the server (typically http://localhost:8000).

The dashboard provides interactive visualizations of the consistency analysis results from `analysis.json`.
- Research: Study AI behavior and consistency patterns
- Business Intelligence: Track your products' AI visibility
- Benchmarking: Compare AI model reliability
- Monitoring: Build tools to track changes over time
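For the monitoring use case, a minimal sketch of diffing two recommendation snapshots taken at different times. The `{query: [products]}` structure here is illustrative, not the dataset's actual schema:

```python
def diff_recommendations(old, new):
    """Compare two {query: [products]} snapshots and report per-query churn."""
    changes = {}
    for query in old.keys() | new.keys():
        before = set(old.get(query, []))
        after = set(new.get(query, []))
        if before != after:
            changes[query] = {
                "added": sorted(after - before),
                "removed": sorted(before - after),
            }
    return changes

old = {"best usb-c hub": ["Hub A", "Hub B"]}
new = {"best usb-c hub": ["Hub B", "Hub C"]}
print(diff_recommendations(old, new))
# → {'best usb-c hub': {'added': ['Hub C'], 'removed': ['Hub A']}}
```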
This is just the beginning. Help us grow:
- New categories: B2B software, services, travel
- More models: Claude, Perplexity, Bing
- Time series: Same queries over weeks/months
- International: Non-English queries
- Found interesting patterns? Share them!
- Built visualizations? Add them!
- Discovered anomalies? Document them!
See our contribution guide.
We're planning to add:
- B2B software recommendations dataset
- International product queries
- Historical snapshots
Want to help or have suggestions? Open an issue.
AI systems increasingly influence what products people buy. Understanding their consistency—or lack thereof—helps:
- Consumers make informed decisions
- Businesses optimize their AI presence
- Researchers study AI behavior
- Developers build better systems
```bibtex
@dataset{amplifying2025aiproductbench,
  title={AI Product Bench: Consumer Products Dataset v1.0},
  author={Amplifying},
  year={2025},
  url={https://github.com/amplifying-ai/ai-product-bench}
}
```
- Dataset questions: Open an issue
- Research collaboration: research@amplifying.ai
- Blog post: Our analysis
AI Product Bench is an open data initiative by Amplifying. We believe transparency in AI recommendations benefits everyone.