# Product Data Migration to Strapi API

This notebook processes product data from CSV and uploads it to the Strapi database via API.

## Important Notes:
- **Parent Products Only**: Filters out product variants (rows with a hyphen in the SKU, or with Frame Style or Print Size filled in)
- **Foreign Key Relationships**: Links to Artists, Campaigns, Colours, and EAN Groups
- **Batch Processing**: Uploads products one request at a time in batches of 50, saving a checkpoint after each batch
- **Image URLs**: Stores URLs in product_artwork_url (images uploaded separately later)
- **Resume Capability**: Can resume from last checkpoint if interrupted

## Process:
1. Load CSV data and filter parent products
2. Build lookup dictionaries from API (Artists, Campaigns, Colours, EANs)
3. Prepare product records with foreign key lookups
4. Batch upload with checkpoints and error tracking
5. Print an upload summary and save failed records to `failed_products.json`

## Field Mapping:
- CSV: `Name (productName)` → API: `product_parent_name`
- CSV: `SKU (Unique Id)` → API: `product_sku`
- CSV: `Artist` → API: `artist` (foreign key)
- CSV: `EEP Campaign Name` → API: `campaigns` (foreign key array)
- CSV: `Primary Colour` → API: `primary_colours` (foreign key array)
- CSV: `Secondary Colour` → API: `secondary_colours` (foreign key array)
- CSV: `EAN Group` → API: `ean_group` (foreign key)
- CSV: `Unbxd Primary Image URL (productImage)` → API: `product_artwork_url`
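
The scalar part of this mapping can be kept in one dictionary so the column names live in a single place. The sketch below is illustrative only: `SCALAR_FIELD_MAP` and `map_scalar_fields` are hypothetical names, rows are assumed to be plain dicts keyed by the full CSV headers, and the relation fields (artist, campaigns, colours, EAN group) are left out because they need an ID lookup first.

```python
from typing import Dict, Optional

# Full CSV header -> Strapi field name (scalar fields only; hypothetical helper).
SCALAR_FIELD_MAP: Dict[str, str] = {
    "Name (productName)": "product_parent_name",
    "SKU (Unique Id)": "product_sku",
    "Unbxd Primary Image URL (productImage)": "product_artwork_url",
}


def map_scalar_fields(row: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Copy non-empty scalar columns from a CSV row into an API-style dict."""
    mapped: Dict[str, str] = {}
    for csv_column, api_field in SCALAR_FIELD_MAP.items():
        value = row.get(csv_column)
        if value is None:
            continue
        text = str(value).strip()
        if text:  # skip blank cells
            mapped[api_field] = text
    return mapped


example_row = {
    "Name (productName)": "  Born To Be Wild ",
    "SKU (Unique Id)": "13PRIN001",
    "Unbxd Primary Image URL (productImage)": "",
}
print(map_scalar_fields(example_row))
```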

In [31]:
# Import required libraries
import pandas as pd
import numpy as np
import requests
import json
import os
from dotenv import load_dotenv
import logging
from typing import Dict, List, Optional, Set
import time
import re
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Load environment variables
load_dotenv()

# API Configuration
API_BASE_URL = os.getenv('API_BASE_URL')
API_TOKEN = os.getenv('API_TOKEN')

if not API_BASE_URL or not API_TOKEN:
    raise ValueError("API_BASE_URL and API_TOKEN must be set in .env file")

logger.info(f"API Base URL: {API_BASE_URL}")
logger.info("API Token loaded successfully")

# Batch configuration
BATCH_SIZE = 50
DELAY_BETWEEN_BATCHES = 2  # seconds
DELAY_BETWEEN_UPLOADS = 0.5  # seconds between individual uploads
CHECKPOINT_FILE = 'product_migration_checkpoint.json'

2025-10-23 13:26:59,803 - INFO - API Base URL: http://localhost:1337/api
2025-10-23 13:26:59,804 - INFO - API Token loaded successfully


In [32]:
# Load CSV data
csv_file_path = 'data/All Product Data Back Up 070825.csv'

logger.info(f"Loading CSV file: {csv_file_path}")

# Columns to load (using exact column names from CSV)
columns_to_load = [
    'Name (productName)',
    'SKU (Unique Id)',
    'SKU (Parent ID) (parent_id)',
    'Artist',
    'Product Type',
    'Orientation',
    'Weight',
    'Height',
    'Width',
    'Depth',
    'Primary Colour',
    'Secondary Colour',
    'EAN Group',
    'EEP Campaign Name',
    'Keywords',
    'Set Size',
    'Frame Style (Child only)',
    'Print Size (Child only)',
    'Unbxd Primary Image URL (productImage)'
]

try:
    df = pd.read_csv(csv_file_path, usecols=columns_to_load)
    logger.info(f"Successfully loaded {len(df)} rows from CSV")
    logger.info(f"Columns: {df.columns.tolist()}")
    
    # Display first few rows
    print("\nFirst 5 rows of data:")
    print(df.head())
    
    print("\nData info:")
    print(df.info())
    
except Exception as e:
    logger.error(f"Error loading CSV file: {e}")
    raise


2025-10-23 13:27:00,834 - INFO - Loading CSV file: data/All Product Data Back Up 070825.csv
2025-10-23 13:27:11,786 - INFO - Successfully loaded 384349 rows from CSV
2025-10-23 13:27:11,788 - INFO - Columns: ['SKU (Unique Id)', 'Unbxd Primary Image URL (productImage)', 'Weight', 'Product Type', 'Orientation', 'Primary Colour', 'Secondary Colour', 'Depth', 'Name (productName)', 'Keywords', 'Height', 'Artist', 'Width', 'SKU (Parent ID) (parent_id)', 'Frame Style (Child only)', 'Print Size (Child only)', 'EAN Group', 'Set Size', 'EEP Campaign Name']



First 5 rows of data:
       SKU (Unique Id)             Unbxd Primary Image URL (productImage)  \
0            13PRIN001  https://pim-assets.unbxd.com/images/5f46a02207...   
1   13PRIN001-30x40-BB  https://pim-assets.unbxd.com/images/5f46a02207...   
2  13PRIN001-30x40-MBF  https://pim-assets.unbxd.com/images/5f46a02207...   
3  13PRIN001-30x40-MOF  https://pim-assets.unbxd.com/images/5f46a02207...   
4  13PRIN001-30x40-MWF  https://pim-assets.unbxd.com/images/5f46a02207...   

   Weight       Product Type Orientation Primary Colour Secondary Colour  \
0    0.03  Digital Art Print    Portrait          Green              NaN   
1    0.03  Digital Art Print    Portrait          Green              NaN   
2    0.93  Digital Art Print    Portrait          Green              NaN   
3    0.93  Digital Art Print    Portrait          Green              NaN   
4    0.93  Digital Art Print    Portrait          Green              NaN   

   Depth Name (productName)  \
0   0.05    Born To Be Wil

In [33]:
# Filter for parent products only (exclude variants)
def is_parent_product(row) -> bool:
    """
    Determine if a row represents a parent product (not a variant).
    
    Method 1: Check for hyphen in SKU (variants have hyphens like "SKU-A1" or "SKU-White-A2")
    Method 2: Check Frame Style and Print Size columns (variants have these filled)
    
    Args:
        row: DataFrame row
        
    Returns:
        True if parent product, False if variant
    """
    # Method 1: Check SKU for hyphen
    sku = str(row['SKU (Unique Id)']).strip() if pd.notna(row['SKU (Unique Id)']) else ''
    has_hyphen = '-' in sku
    
    # Method 2: Check Frame Style and Print Size
    frame_style = row['Frame Style (Child only)']
    print_size = row['Print Size (Child only)']
    
    has_frame = pd.notna(frame_style) and str(frame_style).strip() != ''
    has_print = pd.notna(print_size) and str(print_size).strip() != ''
    
    # Parent product must have:
    # - No hyphen in SKU AND
    # - No Frame Style or Print Size
    is_parent = not has_hyphen and not (has_frame or has_print)
    
    return is_parent

logger.info("Filtering for parent products only...")

# Apply filter
df['is_parent'] = df.apply(is_parent_product, axis=1)
parent_products_df = df[df['is_parent']].copy()

logger.info(f"Total rows in CSV: {len(df)}")
logger.info(f"Parent products: {len(parent_products_df)}")
logger.info(f"Variants (filtered out): {len(df) - len(parent_products_df)}")

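# Remove duplicates by Name (product_parent_name).
# Note: rows are compared on name only, so distinct SKUs that share a name
# collapse to the first occurrence.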
parent_products_df = parent_products_df.drop_duplicates(subset=['Name (productName)'], keep='first')
logger.info(f"Unique parent products after deduplication: {len(parent_products_df)}")

# Display sample parent products
print("\nSample parent products:")
print(parent_products_df[['Name (productName)', 'SKU (Unique Id)', 'Artist', 'Product Type']].head(10))

# Display sample variants (for verification)
variant_df = df[~df['is_parent']]
if len(variant_df) > 0:
    print("\nSample variants (excluded):")
    print(variant_df[['Name (productName)', 'SKU (Unique Id)', 'Frame Style (Child only)', 'Print Size (Child only)']].head(10))


2025-10-23 13:27:11,929 - INFO - Filtering for parent products only...
2025-10-23 13:27:14,969 - INFO - Total rows in CSV: 384349
2025-10-23 13:27:14,969 - INFO - Parent products: 12964
2025-10-23 13:27:14,970 - INFO - Variants (filtered out): 371385
2025-10-23 13:27:14,973 - INFO - Unique parent products after deduplication: 8717



Sample parent products:
                                    Name (productName) SKU (Unique Id)  \
0                                      Born To Be Wild       13PRIN001   
67            Born To Be Wild Greetings Card Pack of 6      13PRIN001c   
68                                       Cheeky Monkey       13PRIN002   
135                                         Ey Up Duck       13PRIN003   
202                Ey Up Duck Greetings Card Pack of 6      13PRIN003c   
203                                 Get Off Your Phone       13PRIN004   
270                   Home Sweet Home by The 13 Prints       13PRIN005   
337  Home Sweet Home by The 13 Prints Greetings Car...      13PRIN005c   
338                            Let The Good Times Roll       13PRIN006   
405                                    Live In The Now       13PRIN007   

            Artist       Product Type  
0    The 13 Prints  Digital Art Print  
67   The 13 Prints               Card  
68   The 13 Prints  Digital Art Print  
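
Deduplicating on name alone reduces 12,964 parent rows to 8,717, so it may be worth checking which names are shared by more than one SKU before dropping them. A minimal sketch, assuming rows are plain dicts keyed by the CSV headers with no missing names or SKUs; `find_name_collisions` is a hypothetical helper, not part of the notebook:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_name_collisions(rows: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Return {name: [sku, ...]} for names that appear with more than one SKU."""
    skus_by_name = defaultdict(list)
    for row in rows:
        name = row["Name (productName)"].strip()
        sku = row["SKU (Unique Id)"].strip()
        if sku not in skus_by_name[name]:
            skus_by_name[name].append(sku)
    return {name: skus for name, skus in skus_by_name.items() if len(skus) > 1}


# Illustrative rows only; these names and SKUs are made up for the example.
sample_rows = [
    {"Name (productName)": "Sunrise", "SKU (Unique Id)": "AAA001"},
    {"Name (productName)": "Sunrise", "SKU (Unique Id)": "BBB002"},
    {"Name (productName)": "Harbour", "SKU (Unique Id)": "AAA003"},
]
print(find_name_collisions(sample_rows))
```

Any name returned here is one where `drop_duplicates(subset=['Name (productName)'])` would keep only the first SKU.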


In [34]:
# Build lookup dictionaries from API
def fetch_all_entities(endpoint: str, entity_name: str) -> List[Dict]:
    """
    Fetch all entities from a Strapi endpoint with pagination.
    
    Args:
        endpoint: API endpoint (e.g., '/artists')
        entity_name: Name for logging
        
    Returns:
        List of all entities
    """
    url = f"{API_BASE_URL}{endpoint}"
    headers = {
        'Authorization': f'Bearer {API_TOKEN}',
        'Content-Type': 'application/json'
    }
    
    all_entities = []
    page = 1
    page_size = 100
    
    while True:
        params = {
            'pagination[page]': page,
            'pagination[pageSize]': page_size
        }
        
        try:
            response = requests.get(url, headers=headers, params=params, timeout=30)
            
            if response.status_code == 200:
                data = response.json()
                entities = data.get('data', [])
                all_entities.extend(entities)
                
                # Check if there are more pages
                pagination = data.get('meta', {}).get('pagination', {})
                if page >= pagination.get('pageCount', 1):
                    break
                    
                page += 1
            else:
                logger.error(f"Failed to fetch {entity_name}: {response.status_code}")
                logger.error(f"Response: {response.text}")
                break
                
        except Exception as e:
            logger.error(f"Error fetching {entity_name}: {e}")
            break
    
    logger.info(f"Fetched {len(all_entities)} {entity_name}")
    return all_entities

# Build lookup dictionaries
logger.info("Building lookup dictionaries from API...")
logger.info("Note: Strapi v5 uses FLAT structure - fields at root level, not nested")

# Artists: {name: id}
artists = fetch_all_entities('/artists', 'artists')
artist_lookup = {}
for artist in artists:
    # Strapi v5: Fields are directly on the object, NOT under 'attributes'
    artist_id = artist.get('id')
    name = (artist.get('artist_name') or '').strip().lower()  # `or ''` guards against null values
    if name and artist_id:
        artist_lookup[name] = artist_id
logger.info(f"✅ Artist lookup: {len(artist_lookup)} entries")

# Campaigns: {name: id}
campaigns = fetch_all_entities('/campaigns', 'campaigns')
campaign_lookup = {}
for campaign in campaigns:
    campaign_id = campaign.get('id')
    name = (campaign.get('campaign_name') or '').strip().lower()  # `or ''` guards against null values
    if name and campaign_id:
        campaign_lookup[name] = campaign_id
logger.info(f"✅ Campaign lookup: {len(campaign_lookup)} entries")

# Colours: {name: id}
colours = fetch_all_entities('/colours', 'colours')
colour_lookup = {}
for colour in colours:
    colour_id = colour.get('id')
    name = (colour.get('colour_name') or '').strip().lower()  # `or ''` guards against null values
    if name and colour_id:
        colour_lookup[name] = colour_id
logger.info(f"✅ Colour lookup: {len(colour_lookup)} entries")

# EAN Groups: {group: id}
eans = fetch_all_entities('/eans', 'EAN groups')
ean_lookup = {}
for ean in eans:
    ean_id = ean.get('id')
    group = (ean.get('ean_group') or '').strip()  # `or ''` guards against null values
    if group and ean_id:
        ean_lookup[group] = ean_id
logger.info(f"✅ EAN lookup: {len(ean_lookup)} entries")

print("\n" + "="*60)
print("LOOKUP DICTIONARIES BUILT")
print("="*60)
print(f"Artists: {len(artist_lookup)}")
print(f"Campaigns: {len(campaign_lookup)}")
print(f"Colours: {len(colour_lookup)}")
print(f"EAN Groups: {len(ean_lookup)}")
print(f"\nTotal: {len(artist_lookup) + len(campaign_lookup) + len(colour_lookup) + len(ean_lookup)} entries")

# Show samples
if artist_lookup:
    print("\n📋 Sample artists (first 5):")
    for name, entity_id in list(artist_lookup.items())[:5]:
        print(f"  '{name}' → ID {entity_id}")
if colour_lookup:
    print("\n📋 All colours:")
    for name, entity_id in sorted(colour_lookup.items()):
        print(f"  '{name}' → ID {entity_id}")

2025-10-23 13:27:16,646 - INFO - Building lookup dictionaries from API...
2025-10-23 13:27:16,648 - INFO - Note: Strapi v5 uses FLAT structure - fields at root level, not nested
2025-10-23 13:27:17,483 - INFO - Fetched 279 artists
2025-10-23 13:27:17,484 - INFO - ✅ Artist lookup: 277 entries
2025-10-23 13:27:17,712 - INFO - Fetched 54 campaigns
2025-10-23 13:27:17,713 - INFO - ✅ Campaign lookup: 53 entries
2025-10-23 13:27:17,919 - INFO - Fetched 12 colours
2025-10-23 13:27:17,920 - INFO - ✅ Colour lookup: 12 entries
2025-10-23 13:27:18,124 - INFO - Fetched 8 EAN groups
2025-10-23 13:27:18,125 - INFO - ✅ EAN lookup: 8 entries



LOOKUP DICTIONARIES BUILT
Artists: 277
Campaigns: 53
Colours: 12
EAN Groups: 8

Total: 350 entries

📋 Sample artists (first 5):
  '67 inc' → ID 574
  'aley wild' → ID 580
  'alice straker' → ID 582
  'anek' → ID 594
  'anna schmidt' → ID 600

📋 All colours:
  'black' → ID 2
  'blue' → ID 4
  'brown' → ID 6
  'green' → ID 8
  'grey' → ID 10
  'orange' → ID 12
  'pink' → ID 14
  'purple' → ID 16
  'red' → ID 18
  'teal' → ID 20
  'white' → ID 22
  'yellow' → ID 24
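
The log shows 279 artists fetched but 277 lookup entries, which suggests blank or duplicate names. A sketch of a generic builder that makes that difference visible; `build_lookup` is a hypothetical helper and the entity dicts below are invented, shaped like the flat records the cell above reads:

```python
from typing import Dict, List, Tuple


def build_lookup(entities: List[dict], field: str,
                 lowercase: bool = True) -> Tuple[Dict[str, int], List[str]]:
    """Build {name: id} and also return names that occurred more than once."""
    lookup: Dict[str, int] = {}
    duplicates: List[str] = []
    for entity in entities:
        entity_id = entity.get("id")
        # `or ""` guards against a field that is present but null.
        name = (entity.get(field) or "").strip()
        if lowercase:
            name = name.lower()
        if not name or entity_id is None:
            continue
        if name in lookup:
            duplicates.append(name)
        lookup[name] = entity_id  # last occurrence wins, as in the loops above
    return lookup, duplicates


sample = [
    {"id": 2, "colour_name": "Black"},
    {"id": 4, "colour_name": "black "},
    {"id": 6, "colour_name": None},
    {"id": 8, "colour_name": "Green"},
]
lookup, duplicates = build_lookup(sample, "colour_name")
print(lookup, duplicates)
```

Logging `duplicates` next to the entry count would show whether the two missing artists are blanks or name clashes.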


In [35]:
# Helper functions for data transformation

def extract_colours(colour_value) -> List[str]:
    """
    Extract individual colour names from a field that may contain multiple colours.
    Reuses logic from colour migration.
    """
    if pd.isna(colour_value):
        return []
    
    colour_str = str(colour_value).strip()
    
    excluded_terms = [
        'nan', 'none', 'n/a', 'unknown', '', 
        'multicoloured', 'multicolored', 'multi-coloured', 'multi-colored',
        'various', 'mixed', 'assorted', 'multi', 'multiple'
    ]
    
    if colour_str == '' or colour_str.lower() in excluded_terms:
        return []
    
    # Split by common separators
    colours = re.split(r'[,/&;]|\band\b', colour_str, flags=re.IGNORECASE)
    
    # Clean each colour name
    cleaned_colours = []
    for colour in colours:
        colour = colour.strip().lower()
        if colour and colour not in excluded_terms:
            cleaned_colours.append(colour)
    
    return cleaned_colours

def lookup_colour_ids(colour_names: List[str]) -> List[int]:
    """
    Convert colour names to IDs using lookup dictionary.
    """
    colour_ids = []
    for colour_name in colour_names:
        colour_id = colour_lookup.get(colour_name.lower())
        if colour_id:
            colour_ids.append(colour_id)
        else:
            logger.warning(f"Colour not found in lookup: '{colour_name}'")
    return colour_ids

def convert_to_metres(value, unit='cm') -> Optional[float]:
    """
    Convert dimension values to metres.
    Assumes CSV values are in cm unless specified.
    """
    if pd.isna(value):
        return None
    
    try:
        val = float(value)
        if unit == 'cm':
            return val / 100  # Convert cm to metres
        return val
    except (ValueError, TypeError):
        logger.warning(f"Could not convert dimension value: {value}")
        return None

def prepare_product_data(row) -> Optional[Dict]:
    """
    Transform DataFrame row into API-compatible product data format.
    
    Args:
        row: DataFrame row containing product information
        
    Returns:
        Dictionary with API field names and values, or None if invalid
    """
    # Skip if product name is missing
    product_name = row['Name (productName)']
    if pd.isna(product_name) or str(product_name).strip() == '':
        logger.warning("Skipping product - missing name")
        return None
    
    product_data = {
        "product_parent_name": str(product_name).strip(),
    }
    
    # SKU (use Unique Id, fallback to Parent ID)
    sku = row['SKU (Unique Id)'] if pd.notna(row['SKU (Unique Id)']) else row['SKU (Parent ID) (parent_id)']
    if pd.notna(sku) and str(sku).strip() != '':
        product_data['product_sku'] = str(sku).strip()
    
    # Product Type
    if pd.notna(row['Product Type']):
        product_data['product_type'] = str(row['Product Type']).strip()
    
    # Orientation
    if pd.notna(row['Orientation']):
        product_data['product_orientation'] = str(row['Orientation']).strip()
    
    # Weight is stored as-is (kg); height, width and depth are converted from cm to metres
    if pd.notna(row['Weight']):
        product_data['product_weight_kg'] = float(row['Weight'])
    
    height = convert_to_metres(row['Height'])
    if height:
        product_data['product_height_metres'] = height
    
    width = convert_to_metres(row['Width'])
    if width:
        product_data['product_width_metres'] = width
    
    depth = convert_to_metres(row['Depth'])
    if depth:
        product_data['product_depth_metres'] = depth
    
    # Set Size
    if pd.notna(row['Set Size']):
        try:
            product_data['product_set_size'] = int(float(row['Set Size']))
        except (ValueError, TypeError):
            logger.warning(f"Could not parse Set Size: {row['Set Size']}")
    
    # Keywords
    if pd.notna(row['Keywords']):
        product_data['product_keywords'] = str(row['Keywords']).strip()
    
    # Image URL (store for later upload)
    if pd.notna(row['Unbxd Primary Image URL (productImage)']):
        product_data['product_artwork_url'] = str(row['Unbxd Primary Image URL (productImage)']).strip()
    
    # Foreign Key: Artist
    if pd.notna(row['Artist']):
        artist_name = str(row['Artist']).strip().lower()
        artist_id = artist_lookup.get(artist_name)
        if artist_id:
            product_data['artist'] = artist_id
        else:
            logger.warning(f"Artist not found: '{row['Artist']}'")
    
    # Foreign Key: Campaign
    if pd.notna(row['EEP Campaign Name']):
        campaign_name = str(row['EEP Campaign Name']).strip().lower()
        campaign_id = campaign_lookup.get(campaign_name)
        if campaign_id:
            product_data['campaigns'] = [campaign_id]
        else:
            logger.warning(f"Campaign not found: '{row['EEP Campaign Name']}'")
    
    # Foreign Key: Primary Colours
    primary_colour_names = extract_colours(row['Primary Colour'])
    if primary_colour_names:
        primary_colour_ids = lookup_colour_ids(primary_colour_names)
        if primary_colour_ids:
            product_data['primary_colours'] = primary_colour_ids
    
    # Foreign Key: Secondary Colours
    secondary_colour_names = extract_colours(row['Secondary Colour'])
    if secondary_colour_names:
        secondary_colour_ids = lookup_colour_ids(secondary_colour_names)
        if secondary_colour_ids:
            product_data['secondary_colours'] = secondary_colour_ids
    
    # Foreign Key: EAN Group
    if pd.notna(row['EAN Group']):
        ean_group = str(row['EAN Group']).strip()
        ean_id = ean_lookup.get(ean_group)
        if ean_id:
            product_data['ean_group'] = ean_id
        else:
            logger.warning(f"EAN Group not found: '{ean_group}'")
    
    return product_data

logger.info("Helper functions defined successfully")


2025-10-23 13:27:30,168 - INFO - Helper functions defined successfully


In [36]:
# Prepare all product records
logger.info("Preparing product records...")

product_records = []
skipped_count = 0

for idx, row in parent_products_df.iterrows():
    product_data = prepare_product_data(row)
    if product_data:
        product_records.append(product_data)
    else:
        skipped_count += 1

logger.info(f"Prepared {len(product_records)} product records for upload")
if skipped_count > 0:
    logger.warning(f"Skipped {skipped_count} invalid product records")

# Display sample prepared data
print("\nSample prepared product data (first 3):")
for i, record in enumerate(product_records[:3]):
    print(f"\n{i+1}. {json.dumps(record, indent=2)}")

print(f"\nTotal products to upload: {len(product_records)}")
print(f"Total batches: {(len(product_records) + BATCH_SIZE - 1) // BATCH_SIZE}")

2025-10-23 13:27:32,933 - INFO - Preparing product records...
2025-10-23 13:27:33,519 - INFO - Prepared 8717 product records for upload



Sample prepared product data (first 3):

1. {
  "product_parent_name": "Born To Be Wild",
  "product_sku": "13PRIN001",
  "product_type": "Digital Art Print",
  "product_orientation": "Portrait",
  "product_weight_kg": 0.03,
  "product_height_metres": 0.42,
  "product_width_metres": 0.297,
  "product_depth_metres": 0.0005,
  "product_keywords": "13PRIN001,The13Prints,BornToBeWild,Floral,Typography,Green,",
  "product_artwork_url": "https://pim-assets.unbxd.com/images/5f46a02207c906b25dc972bb16266bfa/1699663218948_683773_source_1682601330.jpg",
  "artist": 1046,
  "campaigns": [
    16
  ],
  "primary_colours": [
    8
  ]
}

2. {
  "product_parent_name": "Born To Be Wild Greetings Card Pack of 6",
  "product_sku": "13PRIN001c",
  "product_type": "Card",
  "product_orientation": "Portrait",
  "product_weight_kg": 0.12,
  "product_height_metres": 0.17,
  "product_width_metres": 0.12,
  "product_depth_metres": 0.03,
  "product_keywords": "13PRIN001c,The 13 Prints,Born To Be Wild,Botanica

In [37]:
# Checkpoint management functions

def save_checkpoint(batch_num: int, successful: List[str], failed: List[Dict]):
    """
    Save current progress to a checkpoint file.
    
    This allows resuming if the process is interrupted.
    
    Args:
        batch_num: Current batch number (last completed)
        successful: List of successfully uploaded product names
        failed: List of failed uploads with error info
    """
    checkpoint_data = {
        'last_batch': batch_num,
        'timestamp': datetime.now().isoformat(),
        'successful_uploads': successful,
        'failed_uploads': failed,
        'total_processed': len(successful) + len(failed)
    }
    
    try:
        with open(CHECKPOINT_FILE, 'w') as f:
            json.dump(checkpoint_data, f, indent=2)
        logger.info(f"Checkpoint saved: batch {batch_num}, {len(successful)} successful, {len(failed)} failed")
    except Exception as e:
        logger.error(f"Failed to save checkpoint: {e}")

def load_checkpoint() -> Optional[Dict]:
    """
    Load checkpoint from file if it exists.
    
    Returns:
        Checkpoint data dict or None if no checkpoint exists
    """
    if not os.path.exists(CHECKPOINT_FILE):
        return None
    
    try:
        with open(CHECKPOINT_FILE, 'r') as f:
            checkpoint = json.load(f)
        logger.info(f"Loaded checkpoint: batch {checkpoint['last_batch']}, "
                   f"{len(checkpoint['successful_uploads'])} successful, "
                   f"{len(checkpoint['failed_uploads'])} failed")
        return checkpoint
    except Exception as e:
        logger.error(f"Failed to load checkpoint: {e}")
        return None

def clear_checkpoint():
    """
    Remove checkpoint file (call when migration completes successfully).
    """
    if os.path.exists(CHECKPOINT_FILE):
        try:
            os.remove(CHECKPOINT_FILE)
            logger.info("Checkpoint file cleared")
        except Exception as e:
            logger.error(f"Failed to clear checkpoint: {e}")

logger.info("Checkpoint functions defined")

# Check for existing checkpoint
existing_checkpoint = load_checkpoint()
if existing_checkpoint:
    print("\n⚠️  EXISTING CHECKPOINT FOUND!")
    print(f"Last batch completed: {existing_checkpoint['last_batch']}")
    print(f"Successful uploads: {len(existing_checkpoint['successful_uploads'])}")
    print(f"Failed uploads: {len(existing_checkpoint['failed_uploads'])}")
    print(f"Timestamp: {existing_checkpoint['timestamp']}")
    print("\nYou can resume from this checkpoint in the upload cell.")
else:
    print("\nNo existing checkpoint found. Will start fresh.")

2025-10-23 13:27:42,635 - INFO - Checkpoint functions defined



No existing checkpoint found. Will start fresh.


In [38]:
# Function to create product via API
def create_product(product_data: Dict) -> Optional[Dict]:
    """
    Create a product record in the Strapi database via API.
    
    Args:
        product_data: Dictionary containing product information
        
    Returns:
        Response data if successful, None if failed
    """
    endpoint = f"{API_BASE_URL}/products"
    
    headers = {
        'Authorization': f'Bearer {API_TOKEN}',
        'Content-Type': 'application/json'
    }
    
    # Prepare payload according to Strapi format
    payload = {
        "data": product_data
    }
    
    try:
        response = requests.post(endpoint, headers=headers, json=payload, timeout=30)
        
        if response.status_code in (200, 201):
            logger.info(f"Successfully created product: {product_data.get('product_parent_name')}")
            return response.json()
        else:
            logger.error(f"Failed to create product {product_data.get('product_parent_name')}: {response.status_code}")
            logger.error(f"Response: {response.text}")
            return None
            
    except requests.exceptions.RequestException as e:
        logger.error(f"Network error creating product {product_data.get('product_parent_name')}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error creating product {product_data.get('product_parent_name')}: {e}")
        return None

logger.info("Product creation function defined successfully")

2025-10-23 13:27:46,909 - INFO - Product creation function defined successfully


In [None]:
# Batch upload with checkpoint support
logger.info("Starting product upload process...")

# Check if we should resume from checkpoint
checkpoint = load_checkpoint()
if checkpoint:
    resume_choice = input("\nResume from checkpoint? (yes/no): ").strip().lower()
    if resume_choice == 'yes':
        start_batch = checkpoint['last_batch'] + 1
        successful_uploads = checkpoint['successful_uploads']
        failed_uploads = checkpoint['failed_uploads']
        logger.info(f"Resuming from batch {start_batch}")
    else:
        start_batch = 0
        successful_uploads = []
        failed_uploads = []
        clear_checkpoint()
        logger.info("Starting fresh upload")
else:
    start_batch = 0
    successful_uploads = []
    failed_uploads = []

# Calculate batches
total_products = len(product_records)
total_batches = (total_products + BATCH_SIZE - 1) // BATCH_SIZE

logger.info(f"Total products: {total_products}")
logger.info(f"Batch size: {BATCH_SIZE}")
logger.info(f"Total batches: {total_batches}")
logger.info(f"Starting from batch: {start_batch + 1}")

# Process batches
for batch_idx in range(start_batch, total_batches):
    batch_start = batch_idx * BATCH_SIZE
    batch_end = min(batch_start + BATCH_SIZE, total_products)
    batch = product_records[batch_start:batch_end]
    
    logger.info(f"\n{'='*60}")
    logger.info(f"Processing batch {batch_idx + 1}/{total_batches} (products {batch_start + 1}-{batch_end})")
    logger.info(f"{'='*60}")
    
    batch_successful = 0
    batch_failed = 0
    
    for i, product_data in enumerate(batch, 1):
        product_name = product_data.get('product_parent_name')
        
        logger.info(f"Uploading product {batch_start + i}/{total_products}: {product_name}")
        
        result = create_product(product_data)
        
        if result:
            successful_uploads.append(product_name)
            batch_successful += 1
        else:
            failed_uploads.append({
                'product_name': product_name,
                'batch': batch_idx + 1,
                'data': product_data
            })
            batch_failed += 1
        
        # Small delay between individual uploads
        time.sleep(DELAY_BETWEEN_UPLOADS)
    
    # Save checkpoint after each batch
    save_checkpoint(batch_idx, successful_uploads, failed_uploads)
    
    logger.info(f"Batch {batch_idx + 1} complete: {batch_successful} successful, {batch_failed} failed")
    
    # Delay between batches (except for last batch)
    if batch_idx < total_batches - 1:
        logger.info(f"Waiting {DELAY_BETWEEN_BATCHES} seconds before next batch...")
        time.sleep(DELAY_BETWEEN_BATCHES)

# Clear checkpoint on successful completion
clear_checkpoint()

# Display final summary
print("\n" + "="*60)
print("UPLOAD SUMMARY")
print("="*60)
print(f"Total products processed: {total_products}")
print(f"Successful uploads: {len(successful_uploads)}")
print(f"Failed uploads: {len(failed_uploads)}")
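# Guard against division by zero when there is nothing to upload
success_rate = (len(successful_uploads) / total_products * 100) if total_products else 0.0
print(f"Success rate: {success_rate:.2f}%")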

if failed_uploads:
    print(f"\nFailed products ({len(failed_uploads)}):")
    for failure in failed_uploads[:20]:  # Show first 20
        print(f"  - {failure['product_name']} (batch {failure['batch']})")
    if len(failed_uploads) > 20:
        print(f"  ... and {len(failed_uploads) - 20} more")
    
    # Save failed uploads to a separate file for review
    failed_file = 'failed_products.json'
    with open(failed_file, 'w') as f:
        json.dump(failed_uploads, f, indent=2)
    print(f"\nFailed uploads saved to: {failed_file}")

logger.info("Product upload process completed")

2025-10-23 13:27:50,062 - INFO - Starting product upload process...
2025-10-23 13:27:50,063 - INFO - Total products: 8717
2025-10-23 13:27:50,064 - INFO - Batch size: 50
2025-10-23 13:27:50,064 - INFO - Total batches: 175
2025-10-23 13:27:50,065 - INFO - Starting from batch: 1
2025-10-23 13:27:50,065 - INFO - 
2025-10-23 13:27:50,066 - INFO - Processing batch 1/175 (products 1-50)
2025-10-23 13:27:50,066 - INFO - Uploading product 1/8717: Born To Be Wild
2025-10-23 13:27:51,719 - INFO - Successfully created product: Born To Be Wild
2025-10-23 13:27:52,225 - INFO - Uploading product 2/8717: Born To Be Wild Greetings Card Pack of 6
2025-10-23 13:27:52,368 - ERROR - Failed to create product Born To Be Wild Greetings Card Pack of 6: 400
2025-10-23 13:27:52,369 - ERROR - Response: {"data":null,"error":{"status":400,"name":"ValidationError","message":"Invalid key ean_group","details":{"key":"ean_group","source":"body"}}}
2025-10-23 13:27:52,874 - INFO - Uploading product 3/8717: Cheeky Monke
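
The 400 response above names the rejected key in `error.details.key`. A small sketch for pulling that key out of a response body so failures can be grouped by cause; `rejected_key` is a hypothetical helper, and the body shape is taken from the single log line above rather than from Strapi's documentation:

```python
import json
from typing import Optional


def rejected_key(response_text: str) -> Optional[str]:
    """Return error.details.key from an error body, or None if it is absent."""
    try:
        body = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(body, dict):
        return None
    error = body.get("error")
    if not isinstance(error, dict):
        return None
    details = error.get("details")
    if not isinstance(details, dict):
        return None
    return details.get("key")


# Body copied from the logged 400 response.
logged_body = (
    '{"data":null,"error":{"status":400,"name":"ValidationError",'
    '"message":"Invalid key ean_group",'
    '"details":{"key":"ean_group","source":"body"}}}'
)
print(rejected_key(logged_body))
```

Storing this value alongside each entry in `failed_uploads` would make it easy to see whether every failure shares one cause.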

## Summary and Recommendations

### What was done:
1. ✅ Loaded CSV data containing product information
2. ✅ Filtered for parent products only (excluded variants with hyphens or Frame/Print Size)
3. ✅ Built lookup dictionaries from API for foreign key relationships
4. ✅ Transformed CSV data to match API schema with foreign key lookups
5. ✅ Mapped CSV columns to API field names:
   - `Name (productName)` → `product_parent_name`
   - `SKU (Unique Id)` → `product_sku`
   - `Artist` → `artist` (foreign key)
   - `EEP Campaign Name` → `campaigns` (foreign key array)
   - `Primary Colour` → `primary_colours` (foreign key array)
   - `Secondary Colour` → `secondary_colours` (foreign key array)
   - `EAN Group` → `ean_group` (foreign key)
   - `Unbxd Primary Image URL (productImage)` → `product_artwork_url`
6. ✅ Uploaded products in batches with checkpoint support

### Key Features:
- **Parent/Variant Filtering**: Dual method (hyphen check + Frame/Print Size check)
- **Foreign Key Lookups**: Pre-loaded dictionaries for fast relationship mapping
- **Multi-colour Handling**: Splits and processes multiple colours per field
- **Batch Processing**: 50 products per batch, with 0.5 seconds between uploads and 2 seconds between batches
- **Checkpoint System**: Saves progress after each batch, can resume if interrupted
- **Error Tracking**: Saves failed uploads to JSON file for review
- **Dimension Conversion**: Converts height, width, and depth from cm to metres (CSV values are assumed to be in cm)
- **Comprehensive Logging**: Tracks all operations with timestamps

### Checkpoint System Explained:
- **Auto-save**: Progress saved after every batch (50 products)
- **Resume**: Restarts from the batch after the last completed one if interrupted
- **File**: `product_migration_checkpoint.json` stores progress
- **Contents**: Batch number, successful uploads, failed uploads, timestamp
- **Cleanup**: Deleted automatically once the final batch has been processed, even if some uploads failed (failures are written to `failed_products.json`)
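
If the process is killed while the checkpoint is being written, the JSON file can be left truncated. One common way to reduce that risk is to write to a temporary file and swap it into place with `os.replace`. This is a sketch of that alternative, not what the notebook currently does; `save_checkpoint_atomic` is a hypothetical name:

```python
import json
import os
import tempfile
from datetime import datetime
from typing import Dict, List


def save_checkpoint_atomic(path: str, batch_num: int,
                           successful: List[str], failed: List[Dict]) -> None:
    """Write the checkpoint to a temp file, then rename it over `path`."""
    data = {
        "last_batch": batch_num,
        "timestamp": datetime.now().isoformat(),
        "successful_uploads": successful,
        "failed_uploads": failed,
        "total_processed": len(successful) + len(failed),
    }
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file sits in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the partial temp file and leave any previous checkpoint untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    checkpoint_path = os.path.join(workdir, "product_migration_checkpoint.json")
    save_checkpoint_atomic(checkpoint_path, 3, ["Born To Be Wild"], [])
    with open(checkpoint_path) as f:
        restored = json.load(f)
print(restored["last_batch"], restored["total_processed"])
```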

### Data Quality Notes:
- Parent products identified by lack of hyphens AND no Frame/Print Size
- Parent rows deduplicated by name, keeping the first occurrence (12,964 → 8,717 in this run)
- Multi-colour fields split on: comma, slash, ampersand, semicolon, "and"
- Generic colour terms excluded (multicoloured, various, etc.)
- Missing foreign keys logged as warnings but don't block upload
- Dimensions converted from cm to metres (CSV values are assumed to be in cm)
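
The splitting rule can be checked in isolation. The sketch below is a standard-library copy of the notebook's pattern with a shortened exclusion list; `split_colours` is a hypothetical name and the input strings are invented examples:

```python
import re
from typing import List

# Shortened version of the notebook's excluded terms, for illustration only.
EXCLUDED = {"", "nan", "none", "n/a", "unknown", "multicoloured", "various", "mixed"}
# Same separators as the notebook: comma, slash, ampersand, semicolon, or the word "and".
SEPARATORS = re.compile(r"[,/&;]|\band\b", flags=re.IGNORECASE)


def split_colours(value: str) -> List[str]:
    """Split a multi-colour field into lower-cased colour names."""
    if value.strip().lower() in EXCLUDED:
        return []
    parts = (part.strip().lower() for part in SEPARATORS.split(value))
    return [part for part in parts if part not in EXCLUDED]


print(split_colours("Green/Blue & Red"))
print(split_colours("Black and White; Sand"))
print(split_colours("Multicoloured"))
```

The `\b` word boundaries are what keep a name such as "Sand" intact even though it contains the letters "and".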

### Next Steps:
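1. **Review Failed Uploads**: Check `failed_products.json` for issues. The upload log above shows a 400 `ValidationError` (`Invalid key ean_group`), so confirm the name of the EAN relation field on the product schema first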
2. **Image Migration**: Create script to download and upload images from `product_artwork_url`
3. **Variant Migration**: Create separate notebook for product variants
4. **Relationship Verification**: Query API to verify all relationships created correctly
5. **Data Validation**: Check random sample of products in Strapi admin

### Future Improvements:
1. **Duplicate Checking**: Query existing products before upload to avoid duplicates
2. **Update Capability**: Add logic to update existing products instead of only creating
3. **Retry Logic**: Implement automatic retry for failed uploads with exponential backoff
4. **Parallel Processing**: Upload multiple batches concurrently (if API supports it)
5. **Image Integration**: Download and upload images during product creation
6. **Data Validation**: Pre-upload validation for required fields and data types
7. **Export Results**: Generate detailed CSV report of upload results
8. **Relationship Verification**: Auto-verify foreign keys before upload
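
The retry item could look roughly like the sketch below. It is an illustration under assumptions, not part of the notebook: `with_retries` is a hypothetical helper, it treats a `None` return as failure (which is how `create_product` signals one), and the `sleep` parameter exists only so the demo does not actually wait.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], Optional[T]], max_attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call `action` until it returns a non-None result or attempts run out.

    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(max_attempts):
        result = action()
        if result is not None:
            return result
        if attempt < max_attempts - 1:  # no wait after the final attempt
            sleep(base_delay * (2 ** attempt))
    return None


# Demo with a fake action that fails twice, then succeeds.
calls = {"count": 0}
waits = []


def flaky_upload():
    calls["count"] += 1
    return {"id": 1} if calls["count"] >= 3 else None


result = with_retries(flaky_upload, sleep=waits.append)
print(result, waits)
```

In the notebook this could wrap the upload call, for example `with_retries(lambda: create_product(product_data))`. Validation errors such as the 400 in the log would fail on every attempt, so a fuller version would probably retry only on network errors or 5xx responses.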

### Notes on Missing Data:
- Products without artists/campaigns/colours still upload (relationships optional)
- Image URLs stored for later batch image upload
- Product variants excluded from this migration (separate notebook needed)
- EAN Groups without matches logged but don't block upload
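
Because missing relations only produce warnings, a quick count over the prepared records can show how much data is affected before uploading. A sketch, assuming records shaped like the prepared product dicts; `missing_relation_counts` is a hypothetical helper and the second sample record is invented:

```python
from typing import Dict, Iterable, List

RELATION_FIELDS: List[str] = [
    "artist", "campaigns", "primary_colours", "secondary_colours", "ean_group",
]


def missing_relation_counts(records: Iterable[Dict]) -> Dict[str, int]:
    """Count, per relation field, how many records lack that field."""
    counts = {field: 0 for field in RELATION_FIELDS}
    for record in records:
        for field in RELATION_FIELDS:
            if field not in record:
                counts[field] += 1
    return counts


sample_records = [
    # Relation values taken from the first prepared record shown in the notebook output.
    {"product_parent_name": "Born To Be Wild", "artist": 1046,
     "campaigns": [16], "primary_colours": [8]},
    # Invented record with no relations at all.
    {"product_parent_name": "Example Without Relations"},
]
print(missing_relation_counts(sample_records))
```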