A Python data package providing access to grocery store shopping transaction data from 84.51°. This is a Python equivalent of the R package completejourney, storing the data in the portable Parquet format for cross-platform compatibility.
Important: This package contains simulated data based on real grocery shopping patterns. It is intended for educational and exploratory data analysis purposes only. This data should not be used for academic research, commercial decision-making, or any purpose requiring authentic consumer behavior data.
The Complete Journey dataset represents one year of grocery store shopping transactions from a group of 801 households. The data includes detailed purchase information, household demographics, marketing campaigns, and coupon usage, providing a comprehensive view of retail shopping behavior.
Key Statistics:
- 1,469,307 transaction records
- 801 households
- 8 comprehensive datasets
- 1 year of shopping data
```bash
pip install completejourney_py
```

```bash
# Clone the repository
git clone https://github.com/cunningjames/completejourney_py.git
cd completejourney_py

# Install in development mode
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"
```

```python
from completejourney_py import get_data

# Load all datasets
data = get_data()

# Access individual datasets
transactions = data["transactions"]
demographics = data["demographics"]
products = data["products"]

print(f"Loaded {len(transactions):,} transaction records")
print(f"Covering {len(demographics):,} households")
```

Comprehensive documentation, including analysis notebooks, is available at completejourney-py.readthedocs.io.
The documentation includes detailed analysis notebooks:
- Dataset Summary Analysis - Overview of all 8 datasets
- Top Selling Products - Product performance analysis
- Shopping Frequency Analysis - Customer behavior patterns
- Coupon Analysis - Promotional effectiveness
- Traffic Patterns - Store visit timing and trends
- Demographic Product Analysis - Purchase behavior by customer segments
- Market Basket Analysis - Product associations and cross-selling
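The Market Basket Analysis notebook studies product associations; its exact method is not reproduced here. As a minimal sketch of the underlying idea, the code below counts how often two products appear in the same basket. It assumes the transactions table has a `basket_id` column identifying one shopping trip (that column is not listed in the table further down, so treat it as an assumption), and it runs on a few made-up rows rather than the real data.

```python
from collections import Counter
from itertools import combinations


def pair_counts(rows):
    """Count unordered product pairs that co-occur in the same basket.

    `rows` is an iterable of (basket_id, product_id) tuples. `basket_id`
    is an assumed column name; adapt it to the real transactions schema.
    """
    baskets = {}
    for basket_id, product_id in rows:
        # A set means a product bought twice in one basket counts once.
        baskets.setdefault(basket_id, set()).add(product_id)

    counts = Counter()
    for products in baskets.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(products), 2):
            counts[pair] += 1
    return counts


# Toy rows for illustration only; not taken from the real dataset.
toy_rows = [
    (1, "milk"), (1, "bread"), (1, "eggs"),
    (2, "milk"), (2, "bread"),
    (3, "bread"), (3, "eggs"), (3, "bread"),
]
print(pair_counts(toy_rows).most_common(2))
```

With a real DataFrame, something like `pair_counts(transactions[["basket_id", "product_id"]].itertuples(index=False))` would feed it, again assuming a `basket_id` column exists. The number of pairs grows quadratically with basket size, so on the full 1.47M rows it may be worth restricting to a subset of products first.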
- `transactions` - Complete purchase records (1.47M records)
- `products` - Product metadata and categories
- `demographics` - Household demographic information
- `campaigns` - Marketing campaigns received by households
- `campaign_descriptions` - Campaign metadata and details
- `promotions` - Product placement in mailers and stores
- `coupons` - Coupon metadata (UPC codes, campaigns)
- `coupon_redemptions` - Detailed coupon usage records
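A quick way to see how large each of these datasets is, roughly in the spirit of the Dataset Summary notebook, is to tabulate rows and columns per DataFrame. This sketch assumes `get_data()` returns a dict mapping dataset names to pandas DataFrames, as the quick start implies; the helper name `summarize_datasets` and the toy frames are hypothetical.

```python
import pandas as pd


def summarize_datasets(data):
    """Return one row per dataset with its row and column counts.

    `data` is assumed to be a dict of name -> pandas DataFrame, such as the
    value returned by completejourney_py.get_data().
    """
    summary = pd.DataFrame(
        {
            "dataset": list(data),
            "rows": [len(df) for df in data.values()],
            "columns": [df.shape[1] for df in data.values()],
        }
    )
    # Largest datasets first; ignore_index renumbers the rows 0..n-1.
    return summary.sort_values("rows", ascending=False, ignore_index=True)


# Toy stand-ins; with the real package you would pass get_data() instead.
toy = {
    "demographics": pd.DataFrame({"household_id": [1, 2]}),
    "transactions": pd.DataFrame(
        {"household_id": [1, 1, 2], "product_id": [10, 11, 10]}
    ),
}
print(summarize_datasets(toy))
```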
```python
from completejourney_py import get_data

# Load a single dataset
transactions = get_data("transactions")["transactions"]

# Load multiple datasets
sales_data = get_data(["transactions", "products", "demographics"])
```

```python
import pandas as pd
from completejourney_py import get_data

# Load data
data = get_data(["transactions", "demographics", "products"])
transactions = data["transactions"]
demographics = data["demographics"]
products = data["products"]

# Basic transaction analysis
print("Transaction Summary:")
print(f"Total transactions: {len(transactions):,}")
print(f"Total households: {transactions['household_id'].nunique():,}")
print(f"Date range: {transactions['transaction_timestamp'].dt.date.min()} to {transactions['transaction_timestamp'].dt.date.max()}")

# Household spending analysis
household_spending = transactions.groupby('household_id')['sales_value'].sum()
print(f"\nAverage household spending: ${household_spending.mean():.2f}")
print(f"Median household spending: ${household_spending.median():.2f}")
```

```python
from completejourney_py import get_data

# Analyze campaign effectiveness
campaign_data = get_data(["campaigns", "campaign_descriptions", "transactions"])
campaigns = campaign_data["campaigns"]
descriptions = campaign_data["campaign_descriptions"]
transactions = campaign_data["transactions"]

# Join campaign data on the shared campaign_id key
campaign_analysis = campaigns.merge(descriptions, on='campaign_id')

# campaigns has one row per household per campaign, so these counts are
# household-campaign pairs per campaign type
print("Campaign Types:")
print(campaign_analysis['campaign_type'].value_counts())
```

| Dataset | Key Variables | Description |
|---|---|---|
| `transactions` | `household_id`, `product_id`, `sales_value`, `quantity` | Purchase records |
| `demographics` | `household_id`, `age`, `income`, `household_size` | Household characteristics |
| `products` | `product_id`, `department`, `product_category`, `brand` | Product information |
| `campaigns` | `household_id`, `campaign_id` | Marketing campaigns |
| `coupons` | `coupon_upc`, `product_id`, `campaign_id` | Coupon details |
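The key variables above are the join columns: `product_id` links transactions to products, and `household_id` links transactions to demographics. The sketch below totals `sales_value` by product `department` on a few invented rows; the column names come from the table, but the values and department labels are made up for illustration.

```python
import pandas as pd

# Invented rows using the column names listed in the table above.
transactions = pd.DataFrame(
    {
        "household_id": [1, 1, 2, 2],
        "product_id": [10, 20, 10, 30],
        "quantity": [1, 2, 1, 1],
        "sales_value": [2.50, 4.00, 2.50, 7.25],
    }
)
products = pd.DataFrame(
    {
        "product_id": [10, 20, 30],
        "department": ["GROCERY", "PRODUCE", "GROCERY"],
    }
)

# validate="many_to_one" raises if product_id is not unique in products,
# which would otherwise silently duplicate transaction rows.
merged = transactions.merge(
    products, on="product_id", how="left", validate="many_to_one"
)
sales_by_department = (
    merged.groupby("department")["sales_value"].sum().sort_values(ascending=False)
)
print(sales_by_department)
```

A left join keeps every transaction even if a product is missing from the products table; those rows would then carry a missing department and drop out of the `groupby` by default.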
```text
households (demographics)
    ↓
transactions ↔ products
    ↓
campaigns ↔ campaign_descriptions
    ↓
coupons ↔ coupon_redemptions
```
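To illustrate the coupons ↔ coupon_redemptions link, the sketch below compares, per campaign, how many distinct coupon UPCs were issued with how many were redeemed. The `coupons` columns follow the table above; the `coupon_redemptions` columns (`household_id`, `coupon_upc`, `campaign_id`) are an assumption about the real schema, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical rows. coupons uses the columns listed in the table above;
# the redemption columns are an assumption about coupon_redemptions.
coupons = pd.DataFrame(
    {
        "coupon_upc": ["A", "A", "B", "C", "D"],
        "product_id": [10, 11, 20, 30, 40],
        "campaign_id": [1, 1, 1, 2, 3],
    }
)
redemptions = pd.DataFrame(
    {
        "household_id": [100, 101, 100],
        "coupon_upc": ["A", "A", "C"],
        "campaign_id": [1, 1, 2],
    }
)

# One coupon UPC can cover several products, so count distinct UPCs.
issued = coupons.groupby("campaign_id")["coupon_upc"].nunique()
redeemed = redemptions.groupby("campaign_id")["coupon_upc"].nunique()

# Campaigns with no redemptions get NaN from the index alignment; fill with 0.
summary = pd.DataFrame({"issued": issued, "redeemed": redeemed}).fillna(0)
summary["redeemed"] = summary["redeemed"].astype(int)
summary["share_redeemed"] = summary["redeemed"] / summary["issued"]
print(summary)
```

This counts whether a coupon was redeemed at all, not how many times; counting rows of `redemptions` instead would measure redemption volume.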
Appropriate Uses:
- ✅ Learning data analysis techniques
- ✅ Teaching retail analytics concepts
- ✅ Prototyping data science workflows
- ✅ Educational coursework and tutorials
Not Appropriate For:
- ❌ Academic research requiring real consumer data
- ❌ Commercial business decisions
- ❌ Market research or consumer insights
- ❌ Publication in academic journals
The original concept and data structure are from 84.51°, with additional insights available at the Complete Journey project page.
Citation for Educational Use:
84.51°. (2015). The Complete Journey: A comprehensive view of household shopping behavior [Dataset concept]. 84.51°. http://www.8451.com/area51/
[Note: This implementation contains simulated data for educational purposes]
- Python 3.8-3.14
- pandas >= 1.0.0
- pyarrow >= 1.0.0
```bash
# Install test dependencies
pip install -e ".[test]"

# Run tests
pytest

# Run with coverage
pytest --cov=completejourney_py
```

```bash
# Install development dependencies
pip install -e ".[dev]"

# Format code
black completejourney_py/ tests/
isort completejourney_py/ tests/

# Lint code
flake8 completejourney_py/ tests/

# Type checking
mypy completejourney_py/
```

This package is released under the MIT License. The underlying data is provided by 84.51° for research and educational purposes.
- completejourney (R) - Original R package
- Complete Journey Analysis - Detailed data exploration
Contributions are welcome! Please feel free to submit a Pull Request.