# Campaign Data Migration to Strapi API

This notebook extracts unique campaigns from a CSV export and uploads them to the Strapi database via its REST API.

## Process:
1. Load CSV data containing campaign information
2. Extract unique campaigns with their launch dates
3. Clean and validate data
4. Convert dates to proper format
5. Upload to API endpoint
6. Log results and handle errors

In [15]:
# Import required libraries
import pandas as pd
import numpy as np
import requests
import json
import os
from dotenv import load_dotenv
import logging
from typing import Dict, List, Optional
import time
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Load environment variables
load_dotenv()

# API Configuration
API_BASE_URL = os.getenv('API_BASE_URL')
API_TOKEN = os.getenv('API_TOKEN')

if not API_BASE_URL or not API_TOKEN:
    raise ValueError("API_BASE_URL and API_TOKEN must be set in .env file")

logger.info(f"API Base URL: {API_BASE_URL}")
logger.info("API Token loaded successfully")

2025-10-20 14:11:38,246 - INFO - API Base URL: https://renowned-flowers-a42cef227c.strapiapp.com/api
2025-10-20 14:11:38,247 - INFO - API Token loaded successfully


In [16]:
# Load CSV data
csv_file_path = 'data/All Product Data Back Up 070825.csv'

logger.info(f"Loading CSV file: {csv_file_path}")

try:
    # Read CSV file - only load relevant columns for efficiency
    df = pd.read_csv(
        csv_file_path,
        usecols=['EEP Campaign Name', 'EEP Campaign Launch Date']
    )
    logger.info(f"Successfully loaded {len(df)} rows from CSV")
    logger.info(f"Columns: {df.columns.tolist()}")
    
    # Display first few rows
    print("\nFirst 5 rows of data:")
    print(df.head())
    
    # Check data info
    print("\nData info:")
    print(df.info())
    
except Exception as e:
    logger.error(f"Error loading CSV file: {e}")
    raise

2025-10-20 14:11:40,365 - INFO - Loading CSV file: data/All Product Data Back Up 070825.csv
2025-10-20 14:11:49,334 - INFO - Successfully loaded 384349 rows from CSV
2025-10-20 14:11:49,335 - INFO - Columns: ['EEP Campaign Launch Date', 'EEP Campaign Name']



First 5 rows of data:
  EEP Campaign Launch Date        EEP Campaign Name
0                   May-23  Fun - Colour Your Walls
1                   May-23  Fun - Colour Your Walls
2                   May-23  Fun - Colour Your Walls
3                   May-23  Fun - Colour Your Walls
4                   May-23  Fun - Colour Your Walls

Data info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 384349 entries, 0 to 384348
Data columns (total 2 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   EEP Campaign Launch Date  384238 non-null  object
 1   EEP Campaign Name         384328 non-null  object
dtypes: object(2)
memory usage: 5.9+ MB
None


In [17]:
# Process and clean data to get unique campaigns
logger.info("Processing campaign data...")

# Remove rows with a missing campaign name (the launch date may still be null)
df_clean = df.dropna(subset=['EEP Campaign Name'])

# Remove rows whose campaign name is empty or whitespace-only
df_clean = df_clean[df_clean['EEP Campaign Name'].astype(str).str.strip() != '']

logger.info(f"After removing nulls/empties: {len(df_clean)} rows")

# Get unique campaigns
# Group by campaign name and take the first non-null launch date in each group
# (this assumes the launch date is consistent per campaign; placeholder strings
# such as "Unknown" count as non-null and are kept)
unique_campaigns_df = df_clean.groupby('EEP Campaign Name', as_index=False).first()

logger.info(f"Found {len(unique_campaigns_df)} unique campaigns")

# Display the processed data
print("\nUnique Campaigns Data:")
print(unique_campaigns_df)
print(f"\nTotal unique campaigns: {len(unique_campaigns_df)}")

# Check for any missing launch dates
missing_dates = unique_campaigns_df['EEP Campaign Launch Date'].isna().sum()
print(f"\nMissing launch dates: {missing_dates}")

# Show sample of date formats
print("\nSample launch dates:")
print(unique_campaigns_df['EEP Campaign Launch Date'].head(10))

2025-10-20 14:11:49,367 - INFO - Processing campaign data...
2025-10-20 14:11:49,460 - INFO - After removing nulls/empties: 384328 rows
2025-10-20 14:11:49,483 - INFO - Found 54 unique campaigns



Unique Campaigns Data:
           EEP Campaign Name EEP Campaign Launch Date
0                  Astrology                   Oct-20
1                Autumn Hues             23rd October
2                      Bloom                  Unknown
3         Bloomin' Beautiful                   Mar-24
4          Christmas Gifting                   Nov-20
5              Cities & Maps                   May-21
6                  Feel Good                   May-22
7    Fun - Colour Your Walls                   May-23
8            Generic Uploads                   Jan-21
9              Great Indoors                   Jan-21
10            Great Outdoors                   Jan-21
11               Happy House                  Unknown
12                      Haus                  Unknown
13              Hello Spring                   Mar-22
14        Here Comes The Sun            29th May 2025
15                Industrial                   Nov-22
16      January 2025 - Cards                   Jan-25
17  

In [18]:
# Convert date format to ISO 8601 format required by Strapi
def parse_date(date_value):
    """
    Parse various date formats and convert to ISO 8601 format (YYYY-MM-DD).
    For partial dates (e.g., only month/year), use the first day of that month.
    
    Handles formats like:
    - Jun-20, Jan-21 (abbreviated month - 2-digit year)
    - January 2024, Jan 2024 (full/abbreviated month - 4-digit year)
    - 01/2024, 2024-01 (numeric formats)
    
    Args:
        date_value: Date value in various formats
        
    Returns:
        ISO 8601 formatted date string or None
    """
    if pd.isna(date_value):
        return None
    
    # Convert to string and strip whitespace
    date_str = str(date_value).strip()
    
    if date_str == '' or date_str.lower() == 'nan':
        return None
    
    try:
        # Try parsing the cleaned string first; pandas infers the format where it can
        parsed_date = pd.to_datetime(date_str, errors='coerce')
        
        if pd.notna(parsed_date):
            # Return in ISO 8601 format (YYYY-MM-DD)
            return parsed_date.strftime('%Y-%m-%d')
        
        # If standard parsing failed, try to handle partial dates manually
        # Try to parse as month/year in various formats
        try:
            # Format list ordered by likelihood:
            # %b-%y = "Jun-20", "Jan-21" (abbreviated month - 2-digit year)
            # %B-%y = "June-20", "January-21" (full month - 2-digit year)
            # %b %y = "Jun 20", "Jan 21" (abbreviated month space 2-digit year)
            # %B %Y = "January 2024" (full month - 4-digit year)
            # %b %Y = "Jan 2024" (abbreviated month - 4-digit year)
            # %m/%Y = "01/2024" (numeric month/year)
            # %m/%y = "01/20" (numeric month/2-digit year)
            # %Y-%m = "2024-01" (ISO-style year-month)
            # %m-%Y = "01-2024" (month-year with dash)
            
            for fmt in ['%b-%y', '%B-%y', '%b %y', '%B %Y', '%b %Y', '%m/%Y', '%m/%y', '%Y-%m', '%m-%Y']:
                try:
                    parsed_date = datetime.strptime(date_str, fmt)
                    # Use first day of the month
                    return parsed_date.replace(day=1).strftime('%Y-%m-%d')
                except ValueError:
                    continue
            
            # Try just year (e.g., "2024")
            if date_str.isdigit() and len(date_str) == 4:
                year = int(date_str)
                if 1900 <= year <= 2100:  # Reasonable year range
                    return f"{year}-01-01"
            
            logger.warning(f"Could not parse date: {date_value}")
            return None
            
        except Exception as e:
            logger.warning(f"Error parsing partial date '{date_value}': {e}")
            return None
            
    except Exception as e:
        logger.error(f"Error parsing date '{date_value}': {e}")
        return None

# Apply date parsing to the launch date column
unique_campaigns_df['campaign_launch_date_formatted'] = unique_campaigns_df['EEP Campaign Launch Date'].apply(parse_date)

# Show conversion results
print("Date conversion results:")
print("\nSample of original vs formatted dates:")
sample_df = unique_campaigns_df[['EEP Campaign Name', 'EEP Campaign Launch Date', 'campaign_launch_date_formatted']].head(10)
print(sample_df)

# Count successful conversions
successful_conversions = unique_campaigns_df['campaign_launch_date_formatted'].notna().sum()
print(f"\nSuccessfully converted: {successful_conversions}/{len(unique_campaigns_df)} dates")

# Show any failed conversions
failed_df = unique_campaigns_df[unique_campaigns_df['campaign_launch_date_formatted'].isna() & 
                                 unique_campaigns_df['EEP Campaign Launch Date'].notna()]
if len(failed_df) > 0:
    print(f"\nFailed to convert {len(failed_df)} dates:")
    print(failed_df[['EEP Campaign Name', 'EEP Campaign Launch Date']])
    
# Show examples of partial date conversions
print("\nExamples of date conversions:")
for idx, row in unique_campaigns_df.head(15).iterrows():
    original = row['EEP Campaign Launch Date']
    formatted = row['campaign_launch_date_formatted']
    if pd.notna(original):
        print(f"  {str(original):30} → {formatted}")



Date conversion results:

Sample of original vs formatted dates:
         EEP Campaign Name EEP Campaign Launch Date  \
0                Astrology                   Oct-20   
1              Autumn Hues             23rd October   
2                    Bloom                  Unknown   
3       Bloomin' Beautiful                   Mar-24   
4        Christmas Gifting                   Nov-20   
5            Cities & Maps                   May-21   
6                Feel Good                   May-22   
7  Fun - Colour Your Walls                   May-23   
8          Generic Uploads                   Jan-21   
9            Great Indoors                   Jan-21   

  campaign_launch_date_formatted  
0                     2020-10-01  
1                           None  
2                           None  
3                     2024-03-01  
4                     2020-11-01  
5                     2021-05-01  
6                     2022-05-01  
7                     2023-05-01  
8             

In [None]:
# Function to create campaign via API
def create_campaign(campaign_data: Dict) -> Optional[Dict]:
    """
    Create a campaign record in the Strapi database via API.
    
    Args:
        campaign_data: Dictionary containing campaign information
        
    Returns:
        Response data if successful, None if failed
    """
    endpoint = f"{API_BASE_URL}/campaigns"
    
    headers = {
        'Authorization': f'Bearer {API_TOKEN}',
        'Content-Type': 'application/json'
    }
    
    # Prepare payload according to Strapi format
    payload = {
        "data": campaign_data
    }
    
    try:
        response = requests.post(endpoint, headers=headers, json=payload, timeout=30)
        
        if response.status_code in (200, 201):
            logger.info(f"Successfully created campaign: {campaign_data.get('campaign_name')}")
            return response.json()
        else:
            logger.error(f"Failed to create campaign {campaign_data.get('campaign_name')}: {response.status_code}")
            logger.error(f"Response: {response.text}")
            return None
            
    except requests.exceptions.RequestException as e:
        logger.error(f"Network error creating campaign {campaign_data.get('campaign_name')}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error creating campaign {campaign_data.get('campaign_name')}: {e}")
        return None

# Confirm that the function is defined
logger.info("Campaign creation function defined successfully")

In [None]:
# Prepare campaign data for API upload
def prepare_campaign_data(row) -> Dict:
    """
    Transform DataFrame row into API-compatible campaign data format.
    
    Args:
        row: DataFrame row containing campaign information
        
    Returns:
        Dictionary with API field names and cleaned values
    """
    campaign_data = {
        "campaign_name": str(row['EEP Campaign Name']).strip(),
        "campaign_launch_date": row['campaign_launch_date_formatted']
    }
    
    return campaign_data

# Prepare all campaign records
campaign_records = []
for idx, row in unique_campaigns_df.iterrows():
    campaign_data = prepare_campaign_data(row)
    campaign_records.append(campaign_data)

logger.info(f"Prepared {len(campaign_records)} campaign records for upload")

# Display sample prepared data
print("\nSample prepared campaign data (first 5):")
for i, record in enumerate(campaign_records[:5]):
    print(f"\n{i+1}. {json.dumps(record, indent=2)}")

# Count campaigns with and without dates
with_dates = sum(1 for r in campaign_records if r['campaign_launch_date'] is not None)
without_dates = len(campaign_records) - with_dates
print(f"\nCampaigns with launch dates: {with_dates}")
print(f"Campaigns without launch dates: {without_dates}")

In [None]:
# Upload campaigns to API
logger.info("Starting campaign upload process...")

# Track results
successful_uploads = []
failed_uploads = []
upload_results = {
    'total': len(campaign_records),
    'successful': 0,
    'failed': 0,
    'errors': []
}

# Upload each campaign with a small delay to avoid overwhelming the API
for i, campaign_data in enumerate(campaign_records, 1):
    campaign_name = campaign_data.get('campaign_name')
    
    logger.info(f"Uploading campaign {i}/{len(campaign_records)}: {campaign_name}")
    
    result = create_campaign(campaign_data)
    
    if result:
        successful_uploads.append(campaign_name)
        upload_results['successful'] += 1
    else:
        failed_uploads.append(campaign_name)
        upload_results['failed'] += 1
        upload_results['errors'].append(f"Failed to upload: {campaign_name}")
    
    # Small delay to avoid rate limiting (adjust as needed)
    time.sleep(0.5)

# Display results summary
print("\n" + "="*60)
print("UPLOAD SUMMARY")
print("="*60)
print(f"Total campaigns processed: {upload_results['total']}")
print(f"Successful uploads: {upload_results['successful']}")
print(f"Failed uploads: {upload_results['failed']}")
if upload_results['total'] > 0:  # Avoid division by zero when there is nothing to upload
    print(f"Success rate: {(upload_results['successful']/upload_results['total']*100):.2f}%")

if failed_uploads:
    print(f"\nFailed campaigns ({len(failed_uploads)}):")
    for campaign in failed_uploads[:10]:  # Show first 10
        print(f"  - {campaign}")
    if len(failed_uploads) > 10:
        print(f"  ... and {len(failed_uploads) - 10} more")

logger.info("Campaign upload process completed")

## Summary and Recommendations

### What was done:
1. ✅ Loaded CSV data containing campaign information
2. ✅ Extracted unique campaign names with their launch dates
3. ✅ Cleaned data (removed rows with null or empty campaign names)
4. ✅ Converted dates to ISO 8601 format (YYYY-MM-DD) required by Strapi
5. ✅ Mapped CSV columns to API field names:
   - `EEP Campaign Name` → `campaign_name`
   - `EEP Campaign Launch Date` → `campaign_launch_date`
6. ✅ Uploaded campaigns to Strapi API with error handling and logging
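
The column mapping in step 5 can also be kept in a single dictionary, so the CSV-to-API field names live in one place. The sketch below is illustrative rather than the notebook's own code: `FIELD_MAP` and `to_api_records` are hypothetical names, and the sample rows are made up. It assumes the intermediate `campaign_launch_date_formatted` column produced by the date-parsing cell.

```python
import pandas as pd

# Hypothetical mapping from notebook columns to Strapi field names.
FIELD_MAP = {
    "EEP Campaign Name": "campaign_name",
    "campaign_launch_date_formatted": "campaign_launch_date",
}

def to_api_records(df: pd.DataFrame) -> list:
    """Return one dict per row, keyed by API field name, with missing values as None."""
    records = []
    for row in df[list(FIELD_MAP)].to_dict(orient="records"):
        record = {}
        for column, value in row.items():
            if pd.isna(value):
                value = None           # serialised as JSON null
            elif isinstance(value, str):
                value = value.strip()  # drop stray whitespace
            record[FIELD_MAP[column]] = value
        records.append(record)
    return records

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "EEP Campaign Name": [" Astrology ", "Bloom"],
    "EEP Campaign Launch Date": ["Oct-20", "Unknown"],
    "campaign_launch_date_formatted": ["2020-10-01", None],
})
records = to_api_records(sample)
print(records)
```

Adding a new field would then only require a new entry in the mapping, not a change to the transformation function.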

### Key Features:
- Error handling for network failures and for API responses other than 200/201
- Date parsing that tries pandas first, then a list of explicit month/year formats
- Logging of every upload attempt and its outcome
- Credentials loaded from environment variables (`.env` file)
- A fixed 0.5-second delay between requests to avoid overwhelming the API
- Summary report of successful and failed uploads

### Data Quality Notes:
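- Campaigns are deduplicated by grouping on campaign name (54 unique campaigns from 384,328 rows)
- Campaigns with missing or unparseable launch dates (e.g. `Unknown`, `23rd October`) are uploaded with a null date
- Parseable dates are converted to ISO 8601 (YYYY-MM-DD); month/year values map to the first day of the month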
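
Taking the first launch date per campaign is only safe if each campaign carries a single launch date in the CSV. One way to check that assumption is to count the distinct non-null dates per campaign name. This is a minimal sketch on made-up rows; `find_inconsistent_campaigns` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def find_inconsistent_campaigns(
    df: pd.DataFrame,
    name_col: str = "EEP Campaign Name",
    date_col: str = "EEP Campaign Launch Date",
) -> pd.Series:
    """Return campaigns that have more than one distinct non-null launch date."""
    # nunique() ignores missing values by default.
    distinct_dates = df.groupby(name_col)[date_col].nunique()
    return distinct_dates[distinct_dates > 1]

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "EEP Campaign Name": ["Astrology", "Astrology", "Bloom", "Bloom", "Haus"],
    "EEP Campaign Launch Date": ["Oct-20", "Oct-20", "Unknown", "Mar-24", None],
})
inconsistent = find_inconsistent_campaigns(sample)
print(inconsistent)
```

In this sample only `Bloom` is flagged, because it appears with both `Unknown` and `Mar-24`. Running the same check on `df_clean` before the `groupby(...).first()` step would show whether any real campaign needs manual review.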

### Future Improvements:
1. **Duplicate checking**: Query existing campaigns before upload to avoid duplicates
2. **Update capability**: Add logic to update existing campaigns instead of only creating new ones
3. **Batch operations**: If API supports it, batch uploads for better performance
4. **Retry logic**: Implement automatic retry for failed uploads with exponential backoff
5. **Data validation**: Add pre-upload validation for campaign names (length, special characters, etc.)
6. **Export results**: Save upload results to a CSV file for record-keeping
7. **Date validation**: Add validation for future dates or unrealistic date ranges
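
Item 4 (retry logic) could be layered on top of `create_campaign` without changing it, since that function already returns `None` on failure. The sketch below is one possible shape, not existing notebook code: `retry_with_backoff`, its parameters, and the `flaky_upload` stand-in are all hypothetical, and the `sleep` argument is injectable so the example runs without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    func: Callable[[], Optional[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func until it returns a non-None result or attempts run out.

    The wait doubles after each failed attempt: base_delay, 2 * base_delay, ...
    """
    for attempt in range(max_attempts):
        result = func()
        if result is not None:
            return result
        if attempt < max_attempts - 1:  # no wait after the final attempt
            sleep(base_delay * (2 ** attempt))
    return None

# Stand-in for create_campaign: fails twice, then succeeds.
calls = {"count": 0}

def flaky_upload() -> Optional[dict]:
    calls["count"] += 1
    return {"id": 1} if calls["count"] >= 3 else None

delays = []  # records the waits instead of actually sleeping
result = retry_with_backoff(flaky_upload, sleep=delays.append)
print(result, delays)
```

In the upload loop this might be called as `retry_with_backoff(lambda: create_campaign(campaign_data))`. Because `create_campaign` returns `None` for every kind of failure, this approach would also retry validation errors that can never succeed; separating transient from permanent failures would require `create_campaign` to expose the status code.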