# EAN Data Migration to Strapi API

This notebook processes EAN (European Article Number) group data from CSV and uploads it to the Strapi database via API.

## Important Schema Change:
The EAN schema has been updated to use **EAN Groups** instead of individual EAN numbers. Each unique EAN Group represents a collection of related products.

## Process:
1. Load CSV data with EAN Group column
2. Extract unique EAN Groups
3. Assign the default EAN type to every group (`EAN-13`; the schema also allows `EAN-8`)
4. Clean and validate data
5. Upload to API endpoint
6. Log results and handle errors
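
Step 3 assigns a fixed default rather than inspecting each value. If full EAN codes were available instead of group identifiers, the type could in principle be inferred from the code length and the GS1 check digit. The following is a minimal sketch under that assumption; `gs1_check_digit` and `infer_ean_type` are hypothetical helper names and are not part of this notebook.

```python
from typing import Optional


def gs1_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def infer_ean_type(code: str) -> Optional[str]:
    """Return "EAN-13" or "EAN-8" for a valid full code, otherwise None."""
    # Hypothetical helper: only meaningful for complete 8- or 13-digit codes.
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 13):
        return None
    if gs1_check_digit(code[:-1]) != int(code[-1]):
        return None
    return "EAN-13" if len(code) == 13 else "EAN-8"
```

Group identifiers in this dataset are shorter prefixes rather than complete codes, so a check like this would likely reject them; that is one reason a fixed default is a reasonable choice here.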

## Field Mapping:
- CSV: `EAN Group` → API: `ean_group`
- Default: `ean_type` = "EAN-13" (can be set as needed)
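
The mapping above can be sketched as a small function that turns one CSV row into a request body. This is an illustration only: `FIELD_MAP` and `to_payload` are hypothetical names, and the `{"data": ...}` wrapper mirrors the payload that `create_ean` builds later in this notebook.

```python
from typing import Dict

# CSV column name -> API field name (taken from the field mapping list).
FIELD_MAP = {"EAN Group": "ean_group"}
DEFAULT_EAN_TYPE = "EAN-13"


def to_payload(csv_row: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build a Strapi-style request body from a single CSV row."""
    fields = {api_name: csv_row[csv_name] for csv_name, api_name in FIELD_MAP.items()}
    # Every record receives the same default type in this migration.
    fields["ean_type"] = DEFAULT_EAN_TYPE
    return {"data": fields}
```

Keeping the mapping in one dictionary means a renamed CSV column or API field only needs to be changed in one place.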

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import requests
import json
import os
from dotenv import load_dotenv
import logging
from typing import Dict, List, Optional
import time
import re

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Load environment variables
load_dotenv()

# API Configuration
API_BASE_URL = os.getenv('API_BASE_URL')
API_TOKEN = os.getenv('API_TOKEN')

if not API_BASE_URL or not API_TOKEN:
    raise ValueError("API_BASE_URL and API_TOKEN must be set in .env file")

logger.info(f"API Base URL: {API_BASE_URL}")
logger.info("API Token loaded successfully")

2025-10-23 10:43:11,976 - INFO - API Base URL: http://localhost:1337/api
2025-10-23 10:43:11,976 - INFO - API Token loaded successfully


In [2]:
# Load CSV data
csv_file_path = 'data/All Product Data Back Up 070825.csv'

logger.info(f"Loading CSV file: {csv_file_path}")

try:
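    # Read CSV file - only load EAN Group column
    # Note: no dtype is given, so pandas infers float64 for this column
    # (see df.info() output) and the values carry a trailing ".0".
    df = pd.read_csv(
        csv_file_path,
        usecols=['EAN Group']
    )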
    logger.info(f"Successfully loaded {len(df)} rows from CSV")
    logger.info(f"Columns: {df.columns.tolist()}")
    
    # Display first few rows
    print("\nFirst 10 rows of data:")
    print(df.head(10))
    
    # Check data info
    print("\nData info:")
    print(df.info())
    
    # Check for EAN Group entries
    print("\nSample EAN Group entries:")
    for idx, row in df.head(20).iterrows():
        group = row['EAN Group']
        if pd.notna(group):
            print(f"Row {idx}: EAN Group='{group}'")
    
except Exception as e:
    logger.error(f"Error loading CSV file: {e}")
    raise

2025-10-23 10:43:15,212 - INFO - Loading CSV file: data/All Product Data Back Up 070825.csv
2025-10-23 10:43:24,049 - INFO - Successfully loaded 384349 rows from CSV
2025-10-23 10:43:24,050 - INFO - Columns: ['EAN Group']



First 10 rows of data:
    EAN Group
0         NaN
1  50566943.0
2  50566943.0
3  50566943.0
4  50566943.0
5  50566943.0
6  50566943.0
7  50566943.0
8  50566943.0
9  50566943.0

Data info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 384349 entries, 0 to 384348
Data columns (total 1 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   EAN Group  366674 non-null  float64
dtypes: float64(1)
memory usage: 2.9 MB
None

Sample EAN Group entries:
Row 1: EAN Group='50566943.0'
Row 2: EAN Group='50566943.0'
Row 3: EAN Group='50566943.0'
Row 4: EAN Group='50566943.0'
Row 5: EAN Group='50566943.0'
Row 6: EAN Group='50566943.0'
Row 7: EAN Group='50566943.0'
Row 8: EAN Group='50566943.0'
Row 9: EAN Group='50566943.0'
Row 10: EAN Group='50566943.0'
Row 11: EAN Group='50566943.0'
Row 12: EAN Group='50566943.0'
Row 13: EAN Group='50566943.0'
Row 14: EAN Group='50566943.0'
Row 15: EAN Group='50566943.0'
Row 16: EAN Group='50566943.0'
Row 17: EAN Group=

In [3]:
# Process and clean EAN Group data
logger.info("Processing EAN Group data...")

# Remove rows where EAN Group is null or empty
df_clean = df.dropna(subset=['EAN Group'])
df_clean = df_clean[df_clean['EAN Group'].astype(str).str.strip() != '']
df_clean = df_clean[df_clean['EAN Group'].astype(str).str.lower() != 'nan']

logger.info(f"After removing nulls/empties: {len(df_clean)} rows")

# Clean EAN Group values
df_clean['EAN_Group_cleaned'] = df_clean['EAN Group'].apply(lambda x: str(x).strip() if pd.notna(x) else None)

# Get unique EAN Groups
unique_ean_groups_df = df_clean[['EAN_Group_cleaned']].drop_duplicates().reset_index(drop=True)

logger.info(f"Found {len(unique_ean_groups_df)} unique EAN Groups")

# Display the processed data
print("\nUnique EAN Groups:")
print(unique_ean_groups_df)
print(f"\nTotal unique EAN Groups: {len(unique_ean_groups_df)}")

2025-10-23 10:43:24,066 - INFO - Processing EAN Group data...
2025-10-23 10:43:24,345 - INFO - After removing nulls/empties: 366674 rows
2025-10-23 10:43:24,538 - INFO - Found 8 unique EAN Groups



Unique EAN Groups:
  EAN_Group_cleaned
0        50566943.0
1         5063317.0
2         5063172.0
3        50565309.0
4         5063160.0
5       506047657.0
6       506054374.0
7         5063159.0

Total unique EAN Groups: 8
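
The unique values above still carry the `.0` suffix because the column was parsed as `float64`. If the API is expected to store plain digit strings, the values could be normalised before upload. The sketch below is one possible approach, not what this notebook ran; `normalise_group` is a hypothetical helper.

```python
import math
from typing import Optional


def normalise_group(value: object) -> Optional[str]:
    """Return the identifier as a plain string, or None if it is missing."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        if value.is_integer():
            # 50566943.0 -> "50566943"
            return str(int(value))
    text = str(value).strip()
    if text.lower() == "nan":
        return None
    if text.endswith(".0"):
        text = text[:-2]
    return text or None
```

It could be applied with `df_clean['EAN Group'].map(normalise_group)`. An alternative is to pass `dtype={'EAN Group': str}` to `pd.read_csv`, which avoids the float conversion altogether and also preserves any leading zeros.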


In [4]:
# Set default EAN type for all groups
# According to the schema, ean_type is an enumeration with options: "EAN-13" or "EAN-8"
# Default is "EAN-13"

DEFAULT_EAN_TYPE = "EAN-13"

unique_ean_groups_df['ean_type'] = DEFAULT_EAN_TYPE

# Show results
print("EAN Type Configuration:")
print(f"All EAN Groups will be set to: {DEFAULT_EAN_TYPE}")

# Show some examples
print("\nSample EAN Group data with type:")
sample_df = unique_ean_groups_df.head(15)
print(sample_df)

print(f"\n✅ {len(unique_ean_groups_df)} EAN Groups ready for upload")

EAN Type Configuration:
All EAN Groups will be set to: EAN-13

Sample EAN Group data with type:
  EAN_Group_cleaned ean_type
0        50566943.0   EAN-13
1         5063317.0   EAN-13
2         5063172.0   EAN-13
3        50565309.0   EAN-13
4         5063160.0   EAN-13
5       506047657.0   EAN-13
6       506054374.0   EAN-13
7         5063159.0   EAN-13

✅ 8 EAN Groups ready for upload


In [5]:
# Function to create EAN via API
def create_ean(ean_data: Dict) -> Optional[Dict]:
    """
    Create an EAN record in the Strapi database via API.
    
    Args:
        ean_data: Dictionary containing EAN information
        
    Returns:
        Response data if successful, None if failed
    """
    endpoint = f"{API_BASE_URL}/eans"
    
    headers = {
        'Authorization': f'Bearer {API_TOKEN}',
        'Content-Type': 'application/json'
    }
    
    # Prepare payload according to Strapi format
    payload = {
        "data": ean_data
    }
    
    try:
        response = requests.post(endpoint, headers=headers, json=payload, timeout=30)
        
        if response.status_code in (200, 201):
            logger.info(f"Successfully created EAN Group: {ean_data.get('ean_group')}")
            return response.json()
        else:
            logger.error(f"Failed to create EAN Group {ean_data.get('ean_group')}: {response.status_code}")
            logger.error(f"Response: {response.text}")
            return None
            
    except requests.exceptions.RequestException as e:
        logger.error(f"Network error creating EAN Group {ean_data.get('ean_group')}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error creating EAN Group {ean_data.get('ean_group')}: {e}")
        return None

# Confirm that the function is defined
logger.info("EAN creation function defined successfully")

2025-10-23 10:43:52,688 - INFO - EAN creation function defined successfully
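
`create_ean` returns `None` on any failure and does not retry. For transient network errors, a thin retry wrapper with exponential backoff may help. The sketch below is an assumption about how that could look, not part of the original run; `create_with_retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Optional


def create_with_retry(
    create: Callable[[Dict], Optional[Dict]],
    ean_data: Dict,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict]:
    """Call `create` until it returns a result or the attempts run out."""
    for attempt in range(attempts):
        result = create(ean_data)
        if result is not None:
            return result
        # Wait 1x, 2x, 4x, ... base_delay between attempts, not after the last.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

Usage would be `create_with_retry(create_ean, ean_data)`. Two caveats: retrying a POST can create duplicates if an earlier request succeeded but its response was lost, and validation errors will fail again no matter how often they are retried.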


In [6]:
# Prepare EAN data for API upload
def prepare_ean_data(row) -> Optional[Dict]:
    """
    Transform DataFrame row into API-compatible EAN data format.
    
    Args:
        row: DataFrame row containing EAN Group information
        
    Returns:
        Dictionary with API field names and cleaned values, or None if invalid
    """
    # Skip if EAN Group is invalid
    if pd.isna(row['EAN_Group_cleaned']):
        logger.warning("Skipping EAN Group - invalid data")
        return None
    
    ean_data = {
        "ean_group": str(row['EAN_Group_cleaned']).strip(),
        "ean_type": row['ean_type']
    }
    
    return ean_data

# Prepare all EAN records
ean_records = []
skipped_count = 0

for idx, row in unique_ean_groups_df.iterrows():
    ean_data = prepare_ean_data(row)
    if ean_data:
        ean_records.append(ean_data)
    else:
        skipped_count += 1

logger.info(f"Prepared {len(ean_records)} EAN Group records for upload")
if skipped_count > 0:
    logger.warning(f"Skipped {skipped_count} invalid EAN Group records")

# Display sample prepared data
print("\nSample prepared EAN Group data (first 10):")
for i, record in enumerate(ean_records[:10]):
    print(f"\n{i+1}. {json.dumps(record, indent=2)}")

print(f"\nTotal EAN Groups to upload: {len(ean_records)}")
print(f"Skipped (invalid): {skipped_count}")

# Count by type
ean_8_count = sum(1 for r in ean_records if r['ean_type'] == 'EAN-8')
ean_13_count = sum(1 for r in ean_records if r['ean_type'] == 'EAN-13')
print(f"\nEAN-8: {ean_8_count}")
print(f"EAN-13: {ean_13_count}")

2025-10-23 10:44:27,815 - INFO - Prepared 8 EAN Group records for upload



Sample prepared EAN Group data (first 10):

1. {
  "ean_group": "50566943.0",
  "ean_type": "EAN-13"
}

2. {
  "ean_group": "5063317.0",
  "ean_type": "EAN-13"
}

3. {
  "ean_group": "5063172.0",
  "ean_type": "EAN-13"
}

4. {
  "ean_group": "50565309.0",
  "ean_type": "EAN-13"
}

5. {
  "ean_group": "5063160.0",
  "ean_type": "EAN-13"
}

6. {
  "ean_group": "506047657.0",
  "ean_type": "EAN-13"
}

7. {
  "ean_group": "506054374.0",
  "ean_type": "EAN-13"
}

8. {
  "ean_group": "5063159.0",
  "ean_type": "EAN-13"
}

Total EAN Groups to upload: 8
Skipped (invalid): 0

EAN-8: 0
EAN-13: 8


In [7]:
# Upload EAN Groups to API
logger.info("Starting EAN Group upload process...")

# Track results
successful_uploads = []
failed_uploads = []
upload_results = {
    'total': len(ean_records),
    'successful': 0,
    'failed': 0,
    'errors': []
}

# Upload each EAN Group with a small delay to avoid overwhelming the API
for i, ean_data in enumerate(ean_records, 1):
    ean_group = ean_data.get('ean_group')
    
    logger.info(f"Uploading EAN Group {i}/{len(ean_records)}: {ean_group}")
    
    result = create_ean(ean_data)
    
    if result:
        successful_uploads.append(ean_group)
        upload_results['successful'] += 1
    else:
        failed_uploads.append(ean_group)
        upload_results['failed'] += 1
        upload_results['errors'].append(f"Failed to upload: {ean_group}")
    
    # Small delay to avoid rate limiting (adjust as needed)
    time.sleep(0.5)

# Display results summary
print("\n" + "="*60)
print("UPLOAD SUMMARY")
print("="*60)
print(f"Total EAN Groups processed: {upload_results['total']}")
print(f"Successful uploads: {upload_results['successful']}")
print(f"Failed uploads: {upload_results['failed']}")
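# Guard against division by zero when there are no records to upload
success_rate = (upload_results['successful'] / upload_results['total'] * 100) if upload_results['total'] else 0.0
print(f"Success rate: {success_rate:.2f}%")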

if failed_uploads:
    print(f"\nFailed EAN Groups ({len(failed_uploads)}):")
    for ean in failed_uploads[:10]:  # Show first 10
        print(f"  - {ean}")
    if len(failed_uploads) > 10:
        print(f"  ... and {len(failed_uploads) - 10} more")

logger.info("EAN Group upload process completed")

2025-10-23 10:44:32,991 - INFO - Starting EAN Group upload process...
2025-10-23 10:44:32,992 - INFO - Uploading EAN Group 1/8: 50566943.0
2025-10-23 10:44:33,674 - INFO - Successfully created EAN Group: 50566943.0
2025-10-23 10:44:34,181 - INFO - Uploading EAN Group 2/8: 5063317.0
2025-10-23 10:44:34,589 - INFO - Successfully created EAN Group: 5063317.0
2025-10-23 10:44:35,095 - INFO - Uploading EAN Group 3/8: 5063172.0
2025-10-23 10:44:35,451 - INFO - Successfully created EAN Group: 5063172.0
2025-10-23 10:44:35,957 - INFO - Uploading EAN Group 4/8: 50565309.0
2025-10-23 10:44:36,353 - INFO - Successfully created EAN Group: 50565309.0
2025-10-23 10:44:36,856 - INFO - Uploading EAN Group 5/8: 5063160.0
2025-10-23 10:44:37,243 - INFO - Successfully created EAN Group: 5063160.0
2025-10-23 10:44:37,749 - INFO - Uploading EAN Group 6/8: 506047657.0
2025-10-23 10:44:38,107 - INFO - Successfully created EAN Group: 506047657.0
2025-10-23 10:44:38,616 - INFO - Uploading EAN Group 7/8: 506054


UPLOAD SUMMARY
Total EAN Groups processed: 8
Successful uploads: 8
Failed uploads: 0
Success rate: 100.00%
