# Artist Data Migration to Strapi API

This notebook reads artist data from a CSV export, reduces it to one record per artist, and uploads each record to the Strapi database through its API.

## Process:
1. Load CSV data with artist information
2. Extract unique artists with their royalty rates
3. Clean and prepare data
4. Upload to API endpoint
5. Log results and handle errors

Collecting dotenv
  Downloading dotenv-0.9.9-py2.py3-none-any.whl.metadata (279 bytes)
Collecting python-dotenv (from dotenv)
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading dotenv-0.9.9-py2.py3-none-any.whl (1.9 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, dotenv
Successfully installed dotenv-0.9.9 python-dotenv-1.1.1


In [4]:
# Import required libraries
import pandas as pd
import numpy as np
import requests
import json
import os
from dotenv import load_dotenv
import logging
from typing import Dict, List, Optional
import time

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Load environment variables
load_dotenv()

# API Configuration
API_BASE_URL = os.getenv('API_BASE_URL')
API_TOKEN = os.getenv('API_TOKEN')

if not API_BASE_URL or not API_TOKEN:
    raise ValueError("API_BASE_URL and API_TOKEN must be set in .env file")

logger.info(f"API Base URL: {API_BASE_URL}")
logger.info("API Token loaded successfully")

2025-10-20 11:14:12,129 - INFO - API Base URL: https://renowned-flowers-a42cef227c.strapiapp.com/api
2025-10-20 11:14:12,130 - INFO - API Token loaded successfully


In [5]:
# Load CSV data
csv_file_path = 'data/All Product Data Back Up 070825.csv'

logger.info(f"Loading CSV file: {csv_file_path}")

try:
    # Read CSV file - only load relevant columns for efficiency
    df = pd.read_csv(
        csv_file_path,
        usecols=['Artist', 'Artist Royalty - Retail', 'Artist Royalty - Wholesale', 'Artist Royalty - Interiors']
    )
    logger.info(f"Successfully loaded {len(df)} rows from CSV")
    logger.info(f"Columns: {df.columns.tolist()}")
    
    # Display first few rows
    print("\nFirst 5 rows of data:")
    print(df.head())
    
except Exception as e:
    logger.error(f"Error loading CSV file: {e}")
    raise

2025-10-20 11:14:28,314 - INFO - Loading CSV file: data/All Product Data Back Up 070825.csv
2025-10-20 11:14:37,595 - INFO - Successfully loaded 384349 rows from CSV
2025-10-20 11:14:37,597 - INFO - Columns: ['Artist', 'Artist Royalty - Retail', 'Artist Royalty - Wholesale', 'Artist Royalty - Interiors']



First 5 rows of data:
          Artist  Artist Royalty - Retail  Artist Royalty - Wholesale  \
0  The 13 Prints                     3.99                         1.1   
1  The 13 Prints                     3.99                         1.1   
2  The 13 Prints                     3.99                         1.1   
3  The 13 Prints                     3.99                         1.1   
4  The 13 Prints                     3.99                         1.1   

   Artist Royalty - Interiors  
0                         1.4  
1                         1.4  
2                         1.4  
3                         1.4  
4                         1.4  


In [6]:
# Process and clean data to get unique artists with their royalty rates
logger.info("Processing artist data...")

# Remove rows where Artist is null or empty
df_clean = df.dropna(subset=['Artist'])
df_clean = df_clean[df_clean['Artist'].str.strip() != '']

logger.info(f"After removing nulls/empties: {len(df_clean)} rows")

# Get unique artists with their royalty rates
# groupby(...).first() takes the first non-null value in each column per artist,
# so a rate missing on an artist's first row is filled from a later row
# (assuming royalty rates are consistent per artist)
unique_artists_df = df_clean.groupby('Artist', as_index=False).first()

logger.info(f"Found {len(unique_artists_df)} unique artists")

# Display the processed data
print("\nUnique Artists Data:")
print(unique_artists_df)
print(f"\nTotal unique artists: {len(unique_artists_df)}")

# Check for any missing royalty data
missing_retail = unique_artists_df['Artist Royalty - Retail'].isna().sum()
missing_wholesale = unique_artists_df['Artist Royalty - Wholesale'].isna().sum()
missing_interiors = unique_artists_df['Artist Royalty - Interiors'].isna().sum()

print("\nMissing data summary:")
print(f"  - Missing Retail Royalty: {missing_retail}")
print(f"  - Missing Wholesale Royalty: {missing_wholesale}")
print(f"  - Missing Interiors Royalty: {missing_interiors}")

2025-10-20 11:15:04,528 - INFO - Processing artist data...
2025-10-20 11:15:04,633 - INFO - After removing nulls/empties: 381721 rows
2025-10-20 11:15:04,653 - INFO - Found 279 unique artists



Unique Artists Data:
              Artist  Artist Royalty - Retail  Artist Royalty - Wholesale  \
0             67 Inc                      NaN                         NaN   
1         83 Oranges                     3.99                        1.10   
2         Adam Graff                      NaN                         NaN   
3          Aley Wild                     3.99                        1.10   
4      Alice Straker                     3.99                        1.10   
..               ...                      ...                         ...   
274  Your Local Ross                     3.99                        1.10   
275    apricot+birch                     3.99                        1.10   
276       blanchouse                     3.99                        0.66   
277         cartissi                     3.99                        1.10   
278             m00d                     3.99                        1.10   

     Artist Royalty - Interiors  
0                  
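
The grouping step above assumes that each artist has a single rate per channel. A minimal sketch of how that assumption could be checked is shown here; the helper name `find_inconsistent_artists` and the sample frame are hypothetical and not part of the original notebook, where `df_clean` would be passed instead.

```python
import pandas as pd

ROYALTY_COLS = [
    "Artist Royalty - Retail",
    "Artist Royalty - Wholesale",
    "Artist Royalty - Interiors",
]


def find_inconsistent_artists(df: pd.DataFrame) -> pd.DataFrame:
    """Count distinct non-null rates per artist and keep only artists
    with more than one distinct value in any royalty column."""
    distinct_counts = df.groupby("Artist")[ROYALTY_COLS].nunique()
    return distinct_counts[(distinct_counts > 1).any(axis=1)]


# Made-up data for illustration only; in the notebook, pass df_clean.
sample = pd.DataFrame({
    "Artist": ["A", "A", "B", "B"],
    "Artist Royalty - Retail": [3.99, 3.99, 3.99, 2.50],
    "Artist Royalty - Wholesale": [1.1, 1.1, 1.1, 1.1],
    "Artist Royalty - Interiors": [1.4, None, 1.4, 1.4],
})

inconsistent = find_inconsistent_artists(sample)
print(inconsistent)
```

An empty result would support using `first()`. Any artist listed here has conflicting rates in the CSV and probably needs a manual decision before upload.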

In [7]:
# Function to create artist via API
def create_artist(artist_data: Dict) -> Optional[Dict]:
    """
    Create an artist record in the Strapi database via API.
    
    Args:
        artist_data: Dictionary containing artist information
        
    Returns:
        Response data if successful, None if failed
    """
    endpoint = f"{API_BASE_URL}/artists"
    
    headers = {
        'Authorization': f'Bearer {API_TOKEN}',
        'Content-Type': 'application/json'
    }
    
    # Prepare payload according to Strapi format
    payload = {
        "data": artist_data
    }
    
    try:
        response = requests.post(endpoint, headers=headers, json=payload, timeout=30)
        
        if response.status_code in (200, 201):
            logger.info(f"Successfully created artist: {artist_data.get('artist_name')}")
            return response.json()
        else:
            logger.error(f"Failed to create artist {artist_data.get('artist_name')}: {response.status_code}")
            logger.error(f"Response: {response.text}")
            return None
            
    except requests.exceptions.RequestException as e:
        logger.error(f"Network error creating artist {artist_data.get('artist_name')}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error creating artist {artist_data.get('artist_name')}: {e}")
        return None

# Confirm the function is defined
logger.info("Artist creation function defined successfully")

2025-10-20 11:30:12,486 - INFO - Artist creation function defined successfully
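
`create_artist` returns `None` on any failure, so a transient network error costs the record for that run. One possible way to soften this is a small retry wrapper with exponential backoff, sketched below. The name `create_with_retry`, its parameters, and the `flaky_create` stub are hypothetical. The sketch retries every `None` result, including validation errors that will never succeed, and a retried POST whose first attempt actually reached the server could create a duplicate, so treat it as a starting point only.

```python
import time
from typing import Callable, Dict, Optional


def create_with_retry(
    create_fn: Callable[[Dict], Optional[Dict]],
    artist_data: Dict,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Optional[Dict]:
    """Call create_fn until it returns a non-None result or attempts run out.

    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(1, max_attempts + 1):
        result = create_fn(artist_data)
        if result is not None:
            return result
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))
    return None


# Stand-in for create_artist that fails twice, then succeeds (illustration only).
calls = {"n": 0}


def flaky_create(data: Dict) -> Optional[Dict]:
    calls["n"] += 1
    return {"data": data} if calls["n"] >= 3 else None


result = create_with_retry(flaky_create, {"artist_name": "Example Artist"}, base_delay=0.01)
print(result, calls["n"])
```

In the notebook, `create_artist` would be passed as `create_fn` in place of the stub.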


In [8]:
# Prepare artist data for API upload
def prepare_artist_data(row) -> Dict:
    """
    Transform DataFrame row into API-compatible artist data format.
    
    Args:
        row: DataFrame row containing artist information
        
    Returns:
        Dictionary with API field names and cleaned values
    """
    artist_data = {
        "artist_name": str(row['Artist']).strip(),
        "artist_royalties_retail": None,
        "artist_royalties_wholesale": None,
        "artist_royalties_interiors": None
    }
    
    # Handle royalty values - convert to float if not null
    if pd.notna(row['Artist Royalty - Retail']):
        artist_data["artist_royalties_retail"] = float(row['Artist Royalty - Retail'])
    
    if pd.notna(row['Artist Royalty - Wholesale']):
        artist_data["artist_royalties_wholesale"] = float(row['Artist Royalty - Wholesale'])
    
    if pd.notna(row['Artist Royalty - Interiors']):
        artist_data["artist_royalties_interiors"] = float(row['Artist Royalty - Interiors'])
    
    return artist_data

# Prepare all artist records
artist_records = [prepare_artist_data(row) for _, row in unique_artists_df.iterrows()]

logger.info(f"Prepared {len(artist_records)} artist records for upload")

# Display sample prepared data
print("\nSample prepared artist data (first 3):")
for i, record in enumerate(artist_records[:3]):
    print(f"\n{i+1}. {json.dumps(record, indent=2)}")

2025-10-20 11:31:19,891 - INFO - Prepared 279 artist records for upload



Sample prepared artist data (first 3):

1. {
  "artist_name": "67 Inc",
  "artist_royalties_retail": null,
  "artist_royalties_wholesale": null,
  "artist_royalties_interiors": null
}

2. {
  "artist_name": "83 Oranges",
  "artist_royalties_retail": 3.99,
  "artist_royalties_wholesale": 1.1,
  "artist_royalties_interiors": 1.4
}

3. {
  "artist_name": "Adam Graff",
  "artist_royalties_retail": null,
  "artist_royalties_wholesale": null,
  "artist_royalties_interiors": null
}
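
Running the upload a second time would presumably create every artist again, since nothing in the notebook checks for existing records. A minimal sketch of a pre-upload filter follows. The helper `filter_new_records` is hypothetical, and how the list of existing names is obtained (for example, by reading the artists collection back from the API) is left as an assumption and not shown.

```python
from typing import Dict, Iterable, List


def filter_new_records(records: List[Dict], existing_names: Iterable[str]) -> List[Dict]:
    """Drop records whose artist_name is already known.

    Names are compared after stripping whitespace; the comparison is
    case-sensitive. Duplicates within the batch are also dropped.
    """
    seen = {name.strip() for name in existing_names}
    new_records = []
    for record in records:
        name = record["artist_name"].strip()
        if name not in seen:
            new_records.append(record)
            seen.add(name)
    return new_records


# Made-up inputs for illustration; in the notebook, pass artist_records.
sample_records = [
    {"artist_name": "67 Inc"},
    {"artist_name": "83 Oranges"},
    {"artist_name": "83 Oranges"},
]
to_upload = filter_new_records(sample_records, existing_names=["67 Inc"])
print(to_upload)
```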


In [9]:
# Upload artists to API
logger.info("Starting artist upload process...")

# Track results
successful_uploads = []
failed_uploads = []
upload_results = {
    'total': len(artist_records),
    'successful': 0,
    'failed': 0,
    'errors': []
}

# Upload each artist with a small delay to avoid overwhelming the API
for i, artist_data in enumerate(artist_records, 1):
    artist_name = artist_data.get('artist_name')
    
    logger.info(f"Uploading artist {i}/{len(artist_records)}: {artist_name}")
    
    result = create_artist(artist_data)
    
    if result is not None:
        successful_uploads.append(artist_name)
        upload_results['successful'] += 1
    else:
        failed_uploads.append(artist_name)
        upload_results['failed'] += 1
        upload_results['errors'].append(f"Failed to upload: {artist_name}")
    
    # Small delay to avoid rate limiting (adjust as needed)
    time.sleep(0.5)

# Display results summary
print("\n" + "="*60)
print("UPLOAD SUMMARY")
print("="*60)
print(f"Total artists processed: {upload_results['total']}")
print(f"Successful uploads: {upload_results['successful']}")
print(f"Failed uploads: {upload_results['failed']}")
success_rate = (upload_results['successful'] / upload_results['total'] * 100) if upload_results['total'] else 0.0
print(f"Success rate: {success_rate:.2f}%")

if failed_uploads:
    print(f"\nFailed artists ({len(failed_uploads)}):")
    for artist in failed_uploads:
        print(f"  - {artist}")

logger.info("Artist upload process completed")

2025-10-20 11:31:27,681 - INFO - Starting artist upload process...
2025-10-20 11:31:27,682 - INFO - Uploading artist 1/279: 67 Inc
2025-10-20 11:31:28,570 - INFO - Successfully created artist: 67 Inc
2025-10-20 11:31:29,077 - INFO - Uploading artist 2/279: 83 Oranges
2025-10-20 11:31:29,436 - INFO - Successfully created artist: 83 Oranges
2025-10-20 11:31:29,947 - INFO - Uploading artist 3/279: Adam Graff
2025-10-20 11:31:30,291 - INFO - Successfully created artist: Adam Graff
2025-10-20 11:31:30,794 - INFO - Uploading artist 4/279: Aley Wild
2025-10-20 11:31:31,167 - INFO - Successfully created artist: Aley Wild
2025-10-20 11:31:31,672 - INFO - Uploading artist 5/279: Alice Straker
2025-10-20 11:31:32,045 - INFO - Successfully created artist: Alice Straker
2025-10-20 11:31:32,554 - INFO - Uploading artist 6/279: Alisa Galitsyna
2025-10-20 11:31:32,880 - INFO - Successfully created artist: Alisa Galitsyna
2025-10-20 11:31:33,387 - INFO - Uploading artist 7/279: Amanda Adam
2025-10-20 1


UPLOAD SUMMARY
Total artists processed: 279
Successful uploads: 279
Failed uploads: 0
Success rate: 100.00%
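
The summary above lives only in the notebook output. To keep a record of which artists succeeded or failed, the two name lists could be written to a CSV file, as in this minimal sketch. The function `write_upload_report`, the column names, and the file name are hypothetical; in the notebook, `successful_uploads` and `failed_uploads` would be passed in.

```python
import csv
import os
import tempfile
from typing import List


def write_upload_report(path: str, successful: List[str], failed: List[str]) -> int:
    """Write one row per artist with its upload status; return the row count."""
    rows = [(name, "success") for name in successful]
    rows += [(name, "failed") for name in failed]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["artist_name", "status"])
        writer.writerows(rows)
    return len(rows)


# Illustrative call with made-up names, written to a temporary directory.
report_path = os.path.join(tempfile.mkdtemp(), "artist_upload_report.csv")
row_count = write_upload_report(report_path, ["Example Artist A"], ["Example Artist B"])
print(f"Wrote {row_count} rows to {report_path}")
```

Rows marked `failed` can then serve as the input list for a later retry run.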
