# Artist Similarity & Color DNA Embeddings

## Introduction

Every artist has a distinctive visual signature, a **"Color DNA"** that characterizes their work. In this lesson, we'll build numerical fingerprints that capture each artist's color characteristics, then use these embeddings to:

- **Discover** unexpected similarities between artists
- **Visualize** the landscape of artistic styles
- **Build** a recommendation system ("If you like Monet, try...")
- **Trace** artistic influences through color

### What You'll Learn

1. **Feature Engineering**: Creating rich color fingerprints for artists
2. **Embedding Spaces**: Representing artists as vectors
3. **Dimensionality Reduction**: Visualizing with t-SNE and UMAP
4. **Similarity Search**: Finding related artists
5. **Influence Analysis**: Tracing artistic connections

### The Big Picture

```
Artist's Works → Color Features → Artist Embedding → Similarity Space
     ↓                ↓                  ↓                 ↓
  Images        Statistics          Vector            Visualization
                                   (Color DNA)        & Recommendations
```

Let's decode the color DNA of the masters!

## Setup

In [3]:
# Core imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')

# Renoir imports
from renoir import ArtistAnalyzer
from renoir.color import ColorExtractor, ColorAnalyzer, ColorVisualizer

# ML imports
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
from sklearn.cluster import KMeans, AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist, squareform
import seaborn as sns

# UMAP for better visualization (optional)
try:
    import umap
    UMAP_AVAILABLE = True
    print("UMAP available for visualization")
except ImportError:
    UMAP_AVAILABLE = False
    print("UMAP not installed. Install with: pip install umap-learn")
    print("Will use t-SNE instead.")

# Initialize renoir components
artist_analyzer = ArtistAnalyzer()
color_extractor = ColorExtractor()
color_analyzer = ColorAnalyzer()
visualizer = ColorVisualizer()

# Load dataset
print("\nLoading WikiArt dataset...")
dataset = artist_analyzer._load_dataset()
print(f"Loaded {len(dataset)} artworks")

# Get artist names
artist_names = dataset.features['artist'].names
print(f"Found {len(artist_names)} artists")

# Visualization settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (14, 10)
plt.rcParams['figure.dpi'] = 100

# Reproducibility
SEED = 42
np.random.seed(SEED)

UMAP available for visualization

Loading WikiArt dataset...
Loaded 81444 artworks
Found 129 artists


## Part 1: Understanding Color DNA

An artist's "Color DNA" is a numerical fingerprint that captures their characteristic use of color. We'll create this by:

1. Extracting color features from multiple works
2. Aggregating statistics across their oeuvre
3. Creating a fixed-size vector representation

### What Makes a Good Color DNA?

| Feature Category | What It Captures |
|------------------|------------------|
| **Palette Statistics** | Typical colors, saturation, brightness |
| **Color Temperature** | Warm vs cool preference |
| **Harmony Patterns** | Use of complementary, triadic, analogous |
| **Diversity** | Range vs consistency of palette |
| **Contrast** | Dynamic range in brightness/saturation |
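
To make the table concrete, here is a minimal sketch that computes a few of these features for a toy palette using only the standard library. The palette values, the `palette_features` helper, and the warm/cool hue boundaries are illustrative assumptions; renoir's own scales and temperature rules may differ.

```python
import colorsys
from statistics import mean, pstdev

def palette_features(palette):
    """Compute a few illustrative color features from (R, G, B) tuples in 0-255."""
    # colorsys expects floats in [0, 1] and returns (h, s, v), each in [0, 1]
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in palette]
    hues = [h * 360 for h, _, _ in hsv]  # hue in degrees
    sats = [s for _, s, _ in hsv]
    vals = [v for _, _, v in hsv]

    # ASSUMPTION: "warm" means hue below 90 or at/above 270 degrees
    warm = [h < 90 or h >= 270 for h in hues]

    return {
        "mean_saturation": mean(sats),             # palette statistics
        "mean_brightness": mean(vals),
        "std_brightness": pstdev(vals),            # diversity
        "brightness_range": max(vals) - min(vals), # contrast
        "warm_ratio": sum(warm) / len(warm),       # color temperature
    }

# Hypothetical palette: three earthy warm tones and one blue
toy_palette = [(200, 120, 40), (150, 90, 30), (60, 80, 160), (230, 200, 150)]
toy_features = palette_features(toy_palette)
for name, value in toy_features.items():
    print(f"{name}: {value:.3f}")
```

Each feature is a single number per artwork, which is what lets us stack them into a vector later.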

In [4]:
def extract_artwork_features(image, n_colors=8):
    """
    Extract comprehensive color features from a single artwork.
    
    Returns a dictionary of numerical features.
    """
    try:
        palette = color_extractor.extract_dominant_colors(image, n_colors=n_colors)
        if not palette or len(palette) < 3:
            return None
        
        stats = color_analyzer.analyze_palette_statistics(palette)
        temp = color_analyzer.analyze_color_temperature_distribution(palette)
        harmony = color_analyzer.analyze_color_harmony(palette)
        
        # HSV analysis
        hsv_colors = [color_analyzer.rgb_to_hsv(c) for c in palette]
        hues = [h[0] for h in hsv_colors]
        sats = [h[1] for h in hsv_colors]
        vals = [h[2] for h in hsv_colors]
        
        # RGB analysis
        reds = [c[0] for c in palette]
        greens = [c[1] for c in palette]
        blues = [c[2] for c in palette]
        
        features = {
            # Central tendencies
            'mean_hue': stats['mean_hue'],
            'mean_saturation': stats['mean_saturation'],
            'mean_brightness': stats['mean_value'],
            'mean_red': np.mean(reds),
            'mean_green': np.mean(greens),
            'mean_blue': np.mean(blues),
            
            # Variability
            'std_hue': np.std(hues),
            'std_saturation': np.std(sats),
            'std_brightness': np.std(vals),
            
            # Temperature
            'warm_ratio': temp['warm_percentage'] / 100,
            'cool_ratio': temp['cool_percentage'] / 100,
            'neutral_ratio': temp['neutral_percentage'] / 100,
            
            # Diversity and scores
            'color_diversity': color_analyzer.calculate_color_diversity(palette),
            'saturation_score': color_analyzer.calculate_saturation_score(palette),
            'brightness_score': color_analyzer.calculate_brightness_score(palette),
            
            # Harmony
            'harmony_score': harmony.get('harmony_score', 0),
            'n_complementary': len(harmony.get('complementary_pairs', [])),
            'n_triadic': len(harmony.get('triadic_sets', [])),
            'n_analogous': len(harmony.get('analogous_groups', [])),
            
            # Contrast
            'brightness_range': max(vals) - min(vals),
            'saturation_range': max(sats) - min(sats),
            
            # Color dominance
            'red_dominance': np.mean(reds) / (np.mean(reds) + np.mean(greens) + np.mean(blues) + 1e-6),
            'green_dominance': np.mean(greens) / (np.mean(reds) + np.mean(greens) + np.mean(blues) + 1e-6),
            'blue_dominance': np.mean(blues) / (np.mean(reds) + np.mean(greens) + np.mean(blues) + 1e-6),
        }
        
        return features
    except Exception:
        # Skip artworks whose palette cannot be extracted or analyzed
        return None


# Test on a single artwork
test_works = artist_analyzer.extract_artist_works('claude-monet', limit=1)
if test_works:
    features = extract_artwork_features(test_works[0]['image'])
    if features is None:
        print("Feature extraction failed for the test artwork")
    else:
        print(f"Extracted {len(features)} features from artwork")
        print("\nSample features:")
        for k, v in list(features.items())[:8]:
            print(f"  {k}: {v:.3f}")

✓ Found 1 works by claude-monet
Extracted 24 features from artwork

Sample features:
  mean_hue: 23.243
  mean_saturation: 55.447
  mean_brightness: 54.216
  mean_red: 133.375
  mean_green: 108.500
  mean_blue: 74.250
  std_hue: 90.036
  std_saturation: 26.651


## Part 2: Building Artist Color DNA

Now we'll aggregate features across multiple works by each artist to create their Color DNA fingerprint.
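
The aggregation step can be sketched on toy data first. The feature values below are made up, and `aggregate_features` is a hypothetical stand-in for what the next cell does with pandas: every per-artwork feature contributes a mean and a standard deviation, so 24 per-artwork features would give a 48-dimensional vector.

```python
from statistics import mean, stdev

def aggregate_features(per_work_features):
    """Collapse a list of per-artwork feature dicts into one fixed-size dict."""
    dna = {}
    for key in per_work_features[0]:
        values = [f[key] for f in per_work_features]
        dna[f"{key}_mean"] = mean(values)
        # Sample standard deviation (n - 1), the default of pandas' Series.std()
        dna[f"{key}_std"] = stdev(values)
    return dna

# Hypothetical features for three works by one artist
works = [
    {"mean_saturation": 50.0, "warm_ratio": 0.6},
    {"mean_saturation": 60.0, "warm_ratio": 0.8},
    {"mean_saturation": 70.0, "warm_ratio": 0.7},
]
toy_dna = aggregate_features(works)
print(toy_dna)
```

The mean describes the artist's typical palette; the standard deviation describes how consistent they are from work to work.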

In [5]:
def create_artist_color_dna(dataset, artist_idx, artist_name, n_works=30, n_colors=8):
    """
    Create a Color DNA embedding for an artist by aggregating features
    across multiple works.
    
    Returns:
        tuple: (color_dna, n_works), where color_dna is a dict of per-feature
        means and stds across works, or (None, 0) if fewer than 5 works
        yielded features
    """
    all_features = []
    
    # Collect features from artist's works
    count = 0
    for item in dataset:
        if item['artist'] == artist_idx:
            features = extract_artwork_features(item['image'], n_colors)
            if features:
                all_features.append(features)
                count += 1
            if count >= n_works:
                break
    
    if len(all_features) < 5:  # Need minimum works
        return None, 0
    
    # Aggregate: compute mean and std for each feature
    df = pd.DataFrame(all_features)
    
    color_dna = {}
    for col in df.columns:
        color_dna[f'{col}_mean'] = df[col].mean()
        color_dna[f'{col}_std'] = df[col].std()
    
    return color_dna, len(all_features)


# Select artists to analyze (mix of movements and styles)
ARTISTS_TO_ANALYZE = [
    # Impressionists
    'claude-monet', 'pierre-auguste-renoir', 'edgar-degas', 'camille-pissarro',
    # Post-Impressionists
    'vincent-van-gogh', 'paul-cezanne', 'paul-gauguin', 'georges-seurat',
    # Expressionists
    'edvard-munch', 'ernst-ludwig-kirchner', 'wassily-kandinsky', 'egon-schiele',
    # Old Masters
    'rembrandt', 'johannes-vermeer', 'caravaggio', 'peter-paul-rubens',
    # Romantics
    'j.m.w.-turner', 'caspar-david-friedrich', 'eugene-delacroix',
    # Modern
    'pablo-picasso', 'henri-matisse', 'marc-chagall', 'salvador-dali',
    # Realists
    'gustave-courbet', 'jean-francois-millet', 'winslow-homer',
    # Others
    'gustav-klimt', 'alphonse-mucha', 'edward-hopper', 'frida-kahlo',
    'georgia-o-keeffe', 'jackson-pollock', 'mark-rothko', 'andy-warhol'
]

# Build artist name to index mapping
artist_to_idx = {name.lower(): idx for idx, name in enumerate(artist_names)}

print(f"Analyzing {len(ARTISTS_TO_ANALYZE)} artists...")
print("This may take a few minutes.\n")

Analyzing 34 artists...
This may take a few minutes.



In [None]:
# Create Color DNA for each artist
artist_dna = {}
artist_work_counts = {}

for artist_name in ARTISTS_TO_ANALYZE:
    # Find artist index
    artist_key = artist_name.lower().replace(' ', '-')
    artist_idx = artist_to_idx.get(artist_key)
    
    if artist_idx is None:
        # Try partial match
        matches = [k for k in artist_to_idx.keys() if artist_key in k or k in artist_key]
        if matches:
            artist_key = matches[0]
            artist_idx = artist_to_idx[artist_key]
    
    if artist_idx is None:
        print(f"  [!] Artist not found: {artist_name}")
        continue
    
    # Create Color DNA
    dna, n_works = create_artist_color_dna(dataset, artist_idx, artist_name, n_works=30)
    
    if dna:
        # Store with readable name
        display_name = artist_name.replace('-', ' ').title()
        artist_dna[display_name] = dna
        artist_work_counts[display_name] = n_works
        print(f"  [OK] {display_name}: {n_works} works analyzed")
    else:
        print(f"  [!] Insufficient works for: {artist_name}")

print(f"\nSuccessfully created Color DNA for {len(artist_dna)} artists")

In [None]:
# Convert to DataFrame and matrix
dna_df = pd.DataFrame(artist_dna).T
print(f"Color DNA matrix shape: {dna_df.shape}")
print(f"Features per artist: {dna_df.shape[1]}")

# Show sample of the data
print("\nSample Color DNA features (first 6 columns):")
print(dna_df.iloc[:5, :6].round(2))

## Part 3: Visualizing the Artist Landscape

We'll use dimensionality reduction to visualize how artists relate to each other in color space.
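
Before running t-SNE or UMAP, it can help to see the simplest version of the idea. The sketch below standardizes a small synthetic matrix and projects it to 2D with PCA (computed via SVD). PCA is a linear stand-in chosen here for illustration, not the method used in this lesson, and the data is random rather than real Color DNA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6 "artists" x 4 features on very different scales
X_toy = np.column_stack([
    rng.normal(120, 30, 6),   # e.g. a 0-255 channel mean
    rng.normal(0.5, 0.1, 6),  # e.g. a 0-1 ratio
    rng.normal(50, 15, 6),    # e.g. a 0-100 score
    rng.normal(3, 1, 6),      # e.g. a small count
])

# Standardize each column to zero mean and unit variance,
# which is the transformation StandardScaler applies
X_std = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# PCA via SVD: rows of Vt are principal directions, ordered by variance
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
X_2d = X_std @ Vt[:2].T
explained = (S ** 2) / np.sum(S ** 2)

print(X_2d.shape)
print(f"Variance kept by 2 components: {explained[:2].sum():.1%}")
```

Without standardization, the 0-255 column would dominate every distance. t-SNE and UMAP replace the linear projection with nonlinear ones that try to preserve local neighborhoods.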

In [None]:
# Prepare data for visualization
X = dna_df.values
artist_list = list(dna_df.index)

# Handle any NaN values
X = np.nan_to_num(X, nan=0.0)

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(f"Prepared {len(artist_list)} artists with {X_scaled.shape[1]} features")

In [None]:
# t-SNE visualization
print("Computing t-SNE projection...")
tsne = TSNE(n_components=2, random_state=SEED, perplexity=min(10, len(artist_list)-1), 
            learning_rate='auto', init='pca')
X_tsne = tsne.fit_transform(X_scaled)

# UMAP visualization (if available)
if UMAP_AVAILABLE:
    print("Computing UMAP projection...")
    reducer = umap.UMAP(n_components=2, random_state=SEED, n_neighbors=min(10, len(artist_list)-1))
    X_umap = reducer.fit_transform(X_scaled)

print("Projections complete!")

In [None]:
# Define artist movements/styles for coloring
ARTIST_MOVEMENTS = {
    'Claude Monet': 'Impressionism',
    'Pierre Auguste Renoir': 'Impressionism',
    'Edgar Degas': 'Impressionism',
    'Camille Pissarro': 'Impressionism',
    'Vincent Van Gogh': 'Post-Impressionism',
    'Paul Cezanne': 'Post-Impressionism',
    'Paul Gauguin': 'Post-Impressionism',
    'Georges Seurat': 'Post-Impressionism',
    'Edvard Munch': 'Expressionism',
    'Ernst Ludwig Kirchner': 'Expressionism',
    'Wassily Kandinsky': 'Expressionism',
    'Egon Schiele': 'Expressionism',
    'Rembrandt': 'Old Masters',
    'Johannes Vermeer': 'Old Masters',
    'Caravaggio': 'Old Masters',
    'Peter Paul Rubens': 'Old Masters',
    'J.M.W. Turner': 'Romanticism',
    'Caspar David Friedrich': 'Romanticism',
    'Eugene Delacroix': 'Romanticism',
    'Pablo Picasso': 'Modern',
    'Henri Matisse': 'Modern',
    'Marc Chagall': 'Modern',
    'Salvador Dali': 'Modern',
    'Gustave Courbet': 'Realism',
    'Jean Francois Millet': 'Realism',
    'Winslow Homer': 'Realism',
    'Gustav Klimt': 'Art Nouveau',
    'Alphonse Mucha': 'Art Nouveau',
    'Edward Hopper': 'American',
    'Frida Kahlo': 'Modern',
    'Georgia O Keeffe': 'American',
    'Jackson Pollock': 'Abstract',
    'Mark Rothko': 'Abstract',
    'Andy Warhol': 'Pop Art'
}

MOVEMENT_COLORS = {
    'Impressionism': '#74b9ff',
    'Post-Impressionism': '#a29bfe',
    'Expressionism': '#e74c3c',
    'Old Masters': '#8e44ad',
    'Romanticism': '#e67e22',
    'Modern': '#27ae60',
    'Realism': '#95a5a6',
    'Art Nouveau': '#f39c12',
    'American': '#1abc9c',
    'Abstract': '#e84393',
    'Pop Art': '#fd79a8'
}

# Get movement for each artist
def get_movement(artist):
    for key, movement in ARTIST_MOVEMENTS.items():
        if key.lower() in artist.lower() or artist.lower() in key.lower():
            return movement
    return 'Other'

movements = [get_movement(a) for a in artist_list]

In [None]:
# Create visualization
fig, axes = plt.subplots(1, 2 if UMAP_AVAILABLE else 1, figsize=(18 if UMAP_AVAILABLE else 12, 10))
if not UMAP_AVAILABLE:
    axes = [axes]

# t-SNE plot
ax = axes[0]
for movement in sorted(set(movements)):  # sorted for a stable legend order
    indices = [i for i, m in enumerate(movements) if m == movement]
    color = MOVEMENT_COLORS.get(movement, '#95a5a6')
    ax.scatter(X_tsne[indices, 0], X_tsne[indices, 1], 
              c=color, label=movement, s=150, alpha=0.7, edgecolors='white', linewidth=2)

# Add artist labels
for i, artist in enumerate(artist_list):
    # Shorten long names
    short_name = artist.split()[-1] if len(artist) > 15 else artist
    ax.annotate(short_name, (X_tsne[i, 0], X_tsne[i, 1]), 
                fontsize=8, ha='center', va='bottom',
                xytext=(0, 5), textcoords='offset points')

ax.set_xlabel('t-SNE 1', fontsize=12)
ax.set_ylabel('t-SNE 2', fontsize=12)
ax.set_title('Artist Color DNA: t-SNE Visualization', fontsize=14, fontweight='bold')
ax.legend(loc='upper left', fontsize=9, ncol=2)
ax.grid(True, alpha=0.3)

# UMAP plot (if available)
if UMAP_AVAILABLE:
    ax = axes[1]
    for movement in sorted(set(movements)):  # sorted for a stable legend order
        indices = [i for i, m in enumerate(movements) if m == movement]
        color = MOVEMENT_COLORS.get(movement, '#95a5a6')
        ax.scatter(X_umap[indices, 0], X_umap[indices, 1], 
                  c=color, label=movement, s=150, alpha=0.7, edgecolors='white', linewidth=2)
    
    for i, artist in enumerate(artist_list):
        short_name = artist.split()[-1] if len(artist) > 15 else artist
        ax.annotate(short_name, (X_umap[i, 0], X_umap[i, 1]), 
                    fontsize=8, ha='center', va='bottom',
                    xytext=(0, 5), textcoords='offset points')
    
    ax.set_xlabel('UMAP 1', fontsize=12)
    ax.set_ylabel('UMAP 2', fontsize=12)
    ax.set_title('Artist Color DNA: UMAP Visualization', fontsize=14, fontweight='bold')
    ax.legend(loc='upper left', fontsize=9, ncol=2)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Part 4: Artist Similarity Analysis

Let's compute and visualize pairwise similarities between artists.
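
The two measures used below behave differently, which a small worked example makes visible. The vectors are arbitrary and the helper names are hypothetical; the formulas are the standard cosine similarity and the `1 / (1 + distance)` conversion.

```python
import math

def cosine_sim(u, v):
    """Cosine of the angle between u and v; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_sim(u, v):
    """Map Euclidean distance into (0, 1]; identical vectors score 1."""
    return 1 / (1 + math.dist(u, v))

a = [1.0, 2.0, -1.0]
b = [2.0, 4.0, -2.0]    # same direction as a, twice the magnitude
c = [-1.0, -2.0, 1.0]   # opposite direction to a

print(f"cosine(a, b)    = {cosine_sim(a, b):.3f}")
print(f"cosine(a, c)    = {cosine_sim(a, c):.3f}")
print(f"euclidean(a, b) = {euclidean_sim(a, b):.3f}")
print(f"euclidean(a, a) = {euclidean_sim(a, a):.3f}")
```

Cosine similarity ignores magnitude and compares only direction, so `a` and `b` count as identical. Because standardized features are centered at zero, cosine values between artists can be negative, which is worth remembering when scores are later printed as percentages.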

In [None]:
# Compute similarity matrices
cosine_sim = cosine_similarity(X_scaled)
euclidean_dist = euclidean_distances(X_scaled)

# Convert distance to similarity in (0, 1]: a distance of 0 maps to 1, large distances approach 0
euclidean_sim = 1 / (1 + euclidean_dist)

# Create DataFrames
cosine_df = pd.DataFrame(cosine_sim, index=artist_list, columns=artist_list)
euclidean_df = pd.DataFrame(euclidean_sim, index=artist_list, columns=artist_list)

print("Similarity matrices computed!")

In [None]:
# Visualize similarity heatmap
fig, ax = plt.subplots(figsize=(16, 14))

# Cluster artists for better visualization
from scipy.cluster.hierarchy import leaves_list
linkage_matrix = linkage(X_scaled, method='ward')
order = leaves_list(linkage_matrix)

# Reorder similarity matrix
ordered_artists = [artist_list[i] for i in order]
ordered_sim = cosine_df.loc[ordered_artists, ordered_artists]

# Plot heatmap
mask = np.triu(np.ones_like(ordered_sim, dtype=bool), k=1)
# Cosine similarity of standardized features spans [-1, 1], so center the colormap at 0
sns.heatmap(ordered_sim, mask=mask, cmap='RdYlBu_r', center=0,
            square=True, linewidths=0.5, ax=ax,
            cbar_kws={'shrink': 0.6, 'label': 'Cosine Similarity'},
            annot=False)

ax.set_title('Artist Color DNA Similarity Matrix\n(Hierarchically Clustered)', 
             fontsize=14, fontweight='bold')
plt.xticks(rotation=45, ha='right', fontsize=9)
plt.yticks(fontsize=9)
plt.tight_layout()
plt.show()

In [None]:
# Hierarchical clustering dendrogram
fig, ax = plt.subplots(figsize=(16, 8))

dendrogram(
    linkage_matrix,
    labels=artist_list,
    leaf_rotation=90,
    leaf_font_size=10,
    ax=ax,
    color_threshold=0.7 * max(linkage_matrix[:, 2])
)

ax.set_title('Artist Color DNA: Hierarchical Clustering', fontsize=14, fontweight='bold')
ax.set_xlabel('Artist', fontsize=12)
ax.set_ylabel('Distance', fontsize=12)

plt.tight_layout()
plt.show()

## Part 5: Artist Recommendation System

Now let's build a recommendation system: "If you like Artist X, you might also like..."
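
At its core, this kind of recommender is a nearest-neighbor lookup: take the query's row of the similarity matrix, drop the query itself, and keep the highest scores. Here is a minimal sketch with a hypothetical similarity table and a hypothetical `top_k_similar` helper (the scores are made up, not computed from real data).

```python
def top_k_similar(query, similarity, k=2):
    """Return the k most similar names to `query`, excluding the query itself."""
    scores = {name: s for name, s in similarity[query].items() if name != query}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical symmetric similarity scores between four artists
toy_similarity = {
    "A": {"A": 1.0, "B": 0.8, "C": 0.1, "D": 0.5},
    "B": {"A": 0.8, "B": 1.0, "C": 0.3, "D": 0.4},
    "C": {"A": 0.1, "B": 0.3, "C": 1.0, "D": 0.9},
    "D": {"A": 0.5, "B": 0.4, "C": 0.9, "D": 1.0},
}

print(top_k_similar("A", toy_similarity))
print(top_k_similar("C", toy_similarity, k=1))
```

Excluding the query matters because every artist is trivially most similar to themselves.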

In [None]:
def find_similar_artists(artist_name, similarity_df, top_n=5, exclude_self=True):
    """
    Find the most similar artists to a given artist.
    
    Args:
        artist_name: Name of the query artist
        similarity_df: DataFrame of pairwise similarities
        top_n: Number of recommendations
        exclude_self: Whether to exclude the artist themselves
        
    Returns:
        List of (artist_name, similarity_score) tuples
    """
    # Find matching artist name in index
    matches = [a for a in similarity_df.index if artist_name.lower() in a.lower()]
    
    if not matches:
        print(f"Artist '{artist_name}' not found in database.")
        return []
    
    artist = matches[0]
    
    # Get similarities
    similarities = similarity_df.loc[artist].copy()
    
    if exclude_self:
        similarities = similarities.drop(artist)
    
    # Sort and return top N
    top_similar = similarities.sort_values(ascending=False).head(top_n)
    
    return list(top_similar.items())


def recommend_artists(artist_name, similarity_df, top_n=5):
    """
    Generate artist recommendations with explanation.
    """
    similar = find_similar_artists(artist_name, similarity_df, top_n)
    
    if not similar:
        return
    
    print(f"\nIf you like {artist_name}, you might also enjoy:")
    print("=" * 50)
    
    for i, (rec_artist, score) in enumerate(similar, 1):
        movement = get_movement(rec_artist)
        print(f"  {i}. {rec_artist}")
        print(f"     Movement: {movement}")
        print(f"     Color similarity: {score:.1%}")
        print()


# Test recommendations
test_artists = ['Monet', 'Van Gogh', 'Rembrandt', 'Picasso']

for artist in test_artists:
    recommend_artists(artist, cosine_df, top_n=3)

In [None]:
# Recommendation explorer: plot a query artist and its nearest neighbors in the 2D projection
def visualize_recommendations(artist_name, similarity_df, X_2d, artist_list, top_n=5):
    """
    Visualize an artist and their recommendations in 2D space.
    """
    similar = find_similar_artists(artist_name, similarity_df, top_n)
    
    if not similar:
        return
    
    # Find indices
    query_matches = [i for i, a in enumerate(artist_list) if artist_name.lower() in a.lower()]
    if not query_matches:
        return
    query_idx = query_matches[0]
    
    rec_indices = []
    for rec_artist, _ in similar:
        for i, a in enumerate(artist_list):
            if rec_artist == a:
                rec_indices.append(i)
                break
    
    # Plot
    fig, ax = plt.subplots(figsize=(12, 10))
    
    # All artists (gray)
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c='lightgray', s=80, alpha=0.5)
    
    # Query artist (large, red)
    ax.scatter(X_2d[query_idx, 0], X_2d[query_idx, 1], 
              c='#e74c3c', s=300, edgecolors='black', linewidth=2, 
              label=f'Query: {artist_list[query_idx]}', zorder=5)
    
    # Recommendations (blue)
    for rec_idx in rec_indices:
        ax.scatter(X_2d[rec_idx, 0], X_2d[rec_idx, 1],
                  c='#3498db', s=200, edgecolors='black', linewidth=2, zorder=4)
        # Draw line from query to recommendation
        ax.plot([X_2d[query_idx, 0], X_2d[rec_idx, 0]], 
               [X_2d[query_idx, 1], X_2d[rec_idx, 1]],
               'b--', alpha=0.3, linewidth=2)
    
    # Labels for query and recommendations
    ax.annotate(artist_list[query_idx], (X_2d[query_idx, 0], X_2d[query_idx, 1]),
                fontsize=11, fontweight='bold', ha='center', va='bottom',
                xytext=(0, 10), textcoords='offset points',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='#e74c3c', alpha=0.8),
                color='white')
    
    for rec_idx, (rec_name, score) in zip(rec_indices, similar):
        ax.annotate(f"{rec_name}\n({score:.0%})", (X_2d[rec_idx, 0], X_2d[rec_idx, 1]),
                    fontsize=9, ha='center', va='bottom',
                    xytext=(0, 8), textcoords='offset points',
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='#3498db', alpha=0.8),
                    color='white')
    
    ax.set_title(f'Artist Recommendations for {artist_list[query_idx]}', 
                fontsize=14, fontweight='bold')
    ax.set_xlabel('Dimension 1', fontsize=11)
    ax.set_ylabel('Dimension 2', fontsize=11)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()


# Visualize recommendations for a few artists
visualize_recommendations('Monet', cosine_df, X_tsne, artist_list, top_n=4)

In [None]:
# More visualization examples
visualize_recommendations('Van Gogh', cosine_df, X_tsne, artist_list, top_n=4)

In [None]:
visualize_recommendations('Rembrandt', cosine_df, X_tsne, artist_list, top_n=4)

## Part 6: Analyzing Artistic Influences

Can we trace artistic influences through color similarity? Let's explore connections between movements.
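
One way to compare movements is to average the embeddings of each movement's artists into a centroid and then compare centroids. The sketch below does this on made-up 2D vectors with placeholder labels. Keep in mind that similarity is symmetric while influence is directional, so a high score suggests a shared color language rather than proving who influenced whom.

```python
import numpy as np

# Hypothetical standardized embeddings (one row per artist) and movement labels
vectors = np.array([
    [ 1.0,  0.5],   # artist 0
    [ 0.8,  0.7],   # artist 1
    [-1.0, -0.4],   # artist 2
    [-0.8, -0.8],   # artist 3
])
labels = ["X", "X", "Y", "Y"]

# Centroid = mean embedding of a movement's artists
names = sorted(set(labels))
centroids = np.vstack([
    vectors[[i for i, lab in enumerate(labels) if lab == name]].mean(axis=0)
    for name in names
])

# Cosine similarity between centroids: normalize rows, then take dot products
unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
centroid_sim = unit @ unit.T

print(names)
print(centroid_sim.round(3))
```

In this toy case the two centroids point in opposite directions, so their cosine similarity is -1.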

In [None]:
# Compute movement-level similarities
def compute_movement_similarity(dna_df, artist_movements):
    """
    Compute cosine similarity between movement-level Color DNA centroids.

    Features are standardized first (as in the artist-level analysis) so that
    large-magnitude columns such as the 0-255 RGB means do not dominate.
    Note: artist_movements is unused; get_movement() reads ARTIST_MOVEMENTS.
    """
    # Standardize features across artists
    values = np.nan_to_num(dna_df.values.astype(float), nan=0.0)
    scaled = StandardScaler().fit_transform(values)

    # Group standardized artist vectors by movement
    movement_dna = defaultdict(list)
    for artist, vector in zip(dna_df.index, scaled):
        movement = get_movement(artist)
        if movement != 'Other':
            movement_dna[movement].append(vector)

    # Mean (centroid) vector per movement
    movements = list(movement_dna.keys())
    centroids = np.vstack([np.mean(movement_dna[m], axis=0) for m in movements])

    # Pairwise cosine similarity between centroids
    sim_matrix = cosine_similarity(centroids)

    return pd.DataFrame(sim_matrix, index=movements, columns=movements)


# Compute movement similarities
movement_sim = compute_movement_similarity(dna_df, ARTIST_MOVEMENTS)

print("Movement Color DNA Similarity:")
print(movement_sim.round(2))

In [None]:
# Visualize movement relationships
fig, ax = plt.subplots(figsize=(12, 10))

# Mask the upper triangle: the matrix is symmetric, so it only repeats the lower half
mask = np.triu(np.ones_like(movement_sim, dtype=bool), k=1)
sns.heatmap(movement_sim, mask=mask, annot=True, fmt='.2f', cmap='RdYlBu_r',
            center=0.5, square=True, linewidths=2, ax=ax,
            cbar_kws={'shrink': 0.8, 'label': 'Color Similarity'},
            annot_kws={'size': 11})

ax.set_title('Art Movement Color DNA Similarity', fontsize=14, fontweight='bold')
plt.xticks(rotation=45, ha='right', fontsize=11)
plt.yticks(rotation=0, fontsize=11)

plt.tight_layout()
plt.show()

In [None]:
# Influence network visualization
def plot_influence_network(movement_sim, threshold=0.7):
    """
    Plot a network graph of movement influences based on color similarity.
    """
    movements = list(movement_sim.index)
    n = len(movements)
    
    # Position movements in a circle
    angles = np.linspace(0, 2*np.pi, n, endpoint=False)
    positions = {m: (np.cos(a), np.sin(a)) for m, a in zip(movements, angles)}
    
    fig, ax = plt.subplots(figsize=(12, 12))
    
    # Draw edges (connections above threshold)
    for i, m1 in enumerate(movements):
        for j, m2 in enumerate(movements):
            if i < j:  # Upper triangle only
                sim = movement_sim.loc[m1, m2]
                if sim > threshold:
                    x1, y1 = positions[m1]
                    x2, y2 = positions[m2]
                    # Line thickness based on similarity
                    linewidth = (sim - threshold) / (1 - threshold) * 8 + 1
                    alpha = 0.3 + 0.5 * (sim - threshold) / (1 - threshold)
                    ax.plot([x1, x2], [y1, y2], 'b-', 
                           linewidth=linewidth, alpha=alpha, zorder=1)
    
    # Draw nodes
    for m in movements:
        x, y = positions[m]
        color = MOVEMENT_COLORS.get(m, '#95a5a6')
        ax.scatter(x, y, s=2000, c=color, edgecolors='white', 
                  linewidth=3, zorder=2)
        ax.annotate(m, (x, y), ha='center', va='center', 
                   fontsize=9, fontweight='bold', color='white',
                   zorder=3)
    
    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal')
    ax.axis('off')
    ax.set_title(f'Art Movement Color Connections\n(Similarity > {threshold:.0%})', 
                fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()


plot_influence_network(movement_sim, threshold=0.7)

## Part 7: Deep Dive - Understanding Color DNA Features

Let's examine what specific color characteristics distinguish different artists and movements.
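
Features such as saturation and warm ratio live on different scales, so comparing them in a single chart requires rescaling each one to a common range. A minimal sketch of min-max normalization follows, using made-up values and a hypothetical `min_max_normalize` helper. The radar chart code in this part applies the same formula, with a small epsilon in the denominator instead of the explicit zero check used here.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # All values identical: no spread to rescale
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]

# Hypothetical values for three artists
saturation = [35.0, 55.0, 75.0]   # on a 0-100 scale
warm_ratio = [0.2, 0.5, 0.8]      # on a 0-1 scale

print(min_max_normalize(saturation))
print(min_max_normalize(warm_ratio))
```

After normalization, 0 means "lowest among the compared artists" and 1 means "highest", so the values are relative to the group rather than absolute.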

In [None]:
# Extract key features for comparison
key_features = [
    'mean_saturation_mean', 'mean_brightness_mean', 'warm_ratio_mean',
    'color_diversity_mean', 'harmony_score_mean', 'std_brightness_mean'
]

# Create feature comparison DataFrame
feature_df = dna_df[key_features].copy()
feature_df.columns = [c.replace('_mean', '').replace('mean_', '') for c in feature_df.columns]

# Add movement column
feature_df['movement'] = [get_movement(a) for a in feature_df.index]

print("Key Color Features by Artist:")
print(feature_df.round(2))

In [None]:
# Visualize feature profiles for selected artists
def plot_artist_radar(artists, feature_df, features):
    """
    Create radar chart comparing artist color profiles.
    """
    n_features = len(features)
    angles = np.linspace(0, 2 * np.pi, n_features, endpoint=False).tolist()
    angles += angles[:1]
    
    fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))
    
    colors = plt.cm.Set2(np.linspace(0, 1, len(artists)))
    
    for artist, color in zip(artists, colors):
        # Find matching artist
        matches = [a for a in feature_df.index if artist.lower() in a.lower()]
        if not matches:
            continue
        artist_name = matches[0]
        
        values = feature_df.loc[artist_name, features].values.tolist()
        
        # Normalize to 0-1
        min_vals = feature_df[features].min().values
        max_vals = feature_df[features].max().values
        values_norm = [(v - mi) / (ma - mi + 1e-6) for v, mi, ma in zip(values, min_vals, max_vals)]
        values_norm += values_norm[:1]
        
        ax.plot(angles, values_norm, 'o-', linewidth=2, label=artist_name, color=color)
        ax.fill(angles, values_norm, alpha=0.15, color=color)
    
    # Labels
    feature_labels = [f.replace('_', '\n').title() for f in features]
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(feature_labels, fontsize=10)
    ax.set_ylim(0, 1)
    
    ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0), fontsize=10)
    ax.set_title('Artist Color DNA Profiles', fontsize=14, fontweight='bold', y=1.08)
    
    plt.tight_layout()
    plt.show()


# Compare Impressionists
impressionists = ['Monet', 'Renoir', 'Degas', 'Pissarro']
features = ['saturation', 'brightness', 'warm_ratio', 'color_diversity', 'harmony_score']
plot_artist_radar(impressionists, feature_df, features)

In [None]:
# Compare across movements
cross_movement = ['Monet', 'Van Gogh', 'Rembrandt', 'Picasso', 'Rothko']
plot_artist_radar(cross_movement, feature_df, features)

In [None]:
# Feature distributions by movement
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

features_to_plot = ['saturation', 'brightness', 'warm_ratio', 
                    'color_diversity', 'harmony_score', 'std_brightness']

for ax, feature in zip(axes.flat, features_to_plot):
    for movement in ['Impressionism', 'Post-Impressionism', 'Expressionism', 'Old Masters', 'Modern']:
        subset = feature_df[feature_df['movement'] == movement]
        if len(subset) > 0:
            color = MOVEMENT_COLORS.get(movement, '#95a5a6')
            ax.scatter([movement] * len(subset), subset[feature], 
                      c=color, s=100, alpha=0.7, edgecolors='white', linewidth=1)
            ax.scatter([movement], [subset[feature].mean()], 
                      c=color, s=200, marker='D', edgecolors='black', linewidth=2)
    
    ax.set_ylabel(feature.replace('_', ' ').title(), fontsize=11)
    ax.tick_params(axis='x', rotation=45)
    ax.grid(axis='y', alpha=0.3)

plt.suptitle('Color DNA Features by Movement\n(Diamonds = Movement Mean)', 
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

## Part 8: Finding Unexpected Connections

One of the most interesting applications: discovering unexpected similarities between artists from different eras or movements.
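
The search itself is simple: enumerate every unordered pair of artists, keep the pairs whose movements differ, and rank them by similarity. A small sketch with placeholder names and made-up scores:

```python
from itertools import combinations

# Hypothetical movement labels and pairwise similarity scores
toy_movement = {"A": "M1", "B": "M1", "C": "M2", "D": "M3"}
toy_sim = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): 0.90,
    frozenset(("A", "D")): 0.20,
    frozenset(("B", "C")): 0.40,
    frozenset(("B", "D")): 0.85,
    frozenset(("C", "D")): 0.10,
}

# Keep only pairs from different movements, then sort by similarity
cross_pairs = [
    (a, b, toy_sim[frozenset((a, b))])
    for a, b in combinations(sorted(toy_movement), 2)
    if toy_movement[a] != toy_movement[b]
]
cross_pairs.sort(key=lambda pair: pair[2], reverse=True)

print(cross_pairs[:2])
```

The most similar pair overall (A and B) is filtered out because both belong to the same movement, which is exactly the point: only cross-movement matches count as surprising.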

In [None]:
def find_surprising_similarities(similarity_df, artist_movements, top_n=10):
    """
    Find pairs of artists from DIFFERENT movements with high color similarity.
    These represent unexpected connections.
    """
    surprising = []
    
    artists = list(similarity_df.index)
    
    for i, a1 in enumerate(artists):
        for j, a2 in enumerate(artists):
            if i >= j:  # Skip diagonal and lower triangle
                continue
            
            m1 = get_movement(a1)
            m2 = get_movement(a2)
            
            # Only consider different movements
            if m1 != m2 and m1 != 'Other' and m2 != 'Other':
                sim = similarity_df.loc[a1, a2]
                surprising.append((a1, a2, m1, m2, sim))
    
    # Sort by similarity
    surprising.sort(key=lambda x: x[4], reverse=True)
    
    return surprising[:top_n]


# Find surprising connections
surprises = find_surprising_similarities(cosine_df, ARTIST_MOVEMENTS, top_n=10)

print("SURPRISING COLOR CONNECTIONS")
print("Artists from different movements with similar Color DNA")
print("=" * 70)

for i, (a1, a2, m1, m2, sim) in enumerate(surprises, 1):
    print(f"\n{i}. {a1} ({m1}) ↔ {a2} ({m2})")
    print(f"   Color similarity: {sim:.1%}")

In [None]:
# Visualize a surprising connection
def compare_artist_palettes(artist1, artist2, n_works=3):
    """
    Side-by-side comparison of two artists' typical palettes.
    """
    # squeeze=False keeps `axes` 2-D even when n_works == 1
    fig, axes = plt.subplots(2, n_works, figsize=(4*n_works, 4), squeeze=False)
    
    # Hide every axis up front so slots without a work stay blank
    for ax in axes.flat:
        ax.axis('off')
    
    for row, artist_name in enumerate([artist1, artist2]):
        # Get artist works
        artist_key = artist_name.lower().replace(' ', '-')
        works = artist_analyzer.extract_artist_works(artist_key, limit=n_works)
        
        if not works:
            # Try partial match
            for name in artist_names:
                if artist_name.lower().split()[-1] in name.lower():
                    works = artist_analyzer.extract_artist_works(name, limit=n_works)
                    if works:
                        break
        
        # `works` may still be None or empty if no match was found
        for col, work in enumerate((works or [])[:n_works]):
            ax = axes[row, col]
            palette = color_extractor.extract_dominant_colors(work['image'], n_colors=5)
            
            if palette:
                for k, color in enumerate(palette):
                    color_norm = tuple(c/255 for c in color)
                    ax.add_patch(plt.Rectangle((k, 0), 1, 1, facecolor=color_norm, 
                                               edgecolor='white', lw=1))
            
            ax.set_xlim(0, 5)
            ax.set_ylim(0, 1)
        
        # axis('off') hides axis labels, so draw the artist name as text instead
        axes[row, 0].text(-0.05, 0.5, artist_name, transform=axes[row, 0].transAxes,
                          fontsize=12, fontweight='bold', ha='right', va='center')
    
    plt.suptitle(f'Color Palette Comparison\n{artist1} vs {artist2}', 
                fontsize=14, fontweight='bold', y=1.05)
    plt.tight_layout()
    plt.show()


# Compare a surprising pair
if surprises:
    a1, a2, m1, m2, sim = surprises[0]
    print(f"\nComparing: {a1} ({m1}) and {a2} ({m2})")
    print(f"Color similarity: {sim:.1%}\n")
    compare_artist_palettes(a1, a2, n_works=4)

## Part 9: Building an Interactive Explorer

In [None]:
class ArtistColorDNAExplorer:
    """
    Interactive explorer for artist Color DNA.
    """
    
    def __init__(self, dna_df, similarity_df, embeddings_2d, artist_list):
        self.dna_df = dna_df
        self.similarity_df = similarity_df
        self.embeddings_2d = embeddings_2d
        self.artist_list = artist_list
    
    def search(self, query):
        """Search for artists by name."""
        matches = [a for a in self.artist_list if query.lower() in a.lower()]
        return matches
    
    def get_profile(self, artist_name):
        """Get Color DNA profile for an artist."""
        matches = self.search(artist_name)
        if not matches:
            return None
        
        artist = matches[0]
        profile = self.dna_df.loc[artist]
        
        # Extract key features
        key_info = {
            'Saturation': profile.get('mean_saturation_mean', 0),
            'Brightness': profile.get('mean_brightness_mean', 0),
            'Warm Colors': profile.get('warm_ratio_mean', 0) * 100,
            'Cool Colors': profile.get('cool_ratio_mean', 0) * 100,
            'Color Diversity': profile.get('color_diversity_mean', 0),
            'Harmony Score': profile.get('harmony_score_mean', 0),
        }
        
        return artist, key_info
    
    def recommend(self, artist_name, n=5):
        """Get similar artist recommendations."""
        return find_similar_artists(artist_name, self.similarity_df, n)
    
    def compare(self, artist1, artist2):
        """Compare two artists' Color DNA."""
        p1 = self.get_profile(artist1)
        p2 = self.get_profile(artist2)
        
        if not p1 or not p2:
            return None
        
        # Get similarity
        sim = self.similarity_df.loc[p1[0], p2[0]]
        
        return {
            'artist1': p1,
            'artist2': p2,
            'similarity': sim
        }
    
    def display_profile(self, artist_name):
        """Display a formatted artist profile."""
        result = self.get_profile(artist_name)
        if not result:
            print(f"Artist '{artist_name}' not found.")
            return
        
        artist, info = result
        movement = get_movement(artist)
        
        print(f"\n{'='*50}")
        print(f"{artist}")
        print(f"   Movement: {movement}")
        print(f"{'='*50}")
        print("\nColor DNA Profile:")
        for key, value in info.items():
            # Only the warm/cool ratios are percentages ('Color Diversity' is not)
            if key.endswith('Colors'):
                print(f"  {key}: {value:.1f}%")
            else:
                print(f"  {key}: {value:.2f}")
        
        # Get recommendations
        print("\nSimilar Artists:")
        recs = self.recommend(artist, n=3)  # use the resolved name, not the raw query
        for rec_artist, score in recs:
            rec_movement = get_movement(rec_artist)
            print(f"  - {rec_artist} ({rec_movement}) - {score:.0%} similar")


# Create explorer
explorer = ArtistColorDNAExplorer(dna_df, cosine_df, X_tsne, artist_list)

# Demo
explorer.display_profile('Monet')

In [None]:
# Explore more artists
explorer.display_profile('Van Gogh')

In [None]:
explorer.display_profile('Rembrandt')

## Summary and Key Insights

### What We Built

1. **Color DNA Fingerprints**: 50+ dimensional vectors capturing each artist's color signature
2. **Similarity Metrics**: Cosine similarity for comparing artistic styles
3. **Visualizations**: t-SNE/UMAP projections, dendrograms, heatmaps
4. **Recommendation System**: "If you like X, try Y" based on color similarity
5. **Influence Analysis**: Movement-level color relationships
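
To make the similarity metric concrete, here is a minimal sketch of cosine similarity on standardized features. The three feature rows are hypothetical values, and standardizing before comparing is an assumption made for this example rather than a restatement of the lesson's pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical features per artist: [mean saturation, mean brightness, warm ratio]
toy_features = np.array([
    [0.60, 0.70, 0.55],   # artist X
    [0.62, 0.68, 0.50],   # artist Y (close to X)
    [0.25, 0.30, 0.80],   # artist Z (darker, warmer)
])

# Standardize so each feature contributes on a comparable scale
scaled = StandardScaler().fit_transform(toy_features)

# Pairwise cosine similarity: dot product of unit-normalized rows
sim = cosine_similarity(scaled)

# The same result computed by hand, to make the formula explicit
unit = scaled / np.linalg.norm(scaled, axis=1, keepdims=True)
manual = unit @ unit.T

print(np.round(sim, 3))
```

Because the features are centered, cosine similarity can be negative here: artists on opposite sides of the average profile point in opposite directions.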

### Key Findings

- **Movements cluster by color**: Impressionists, Expressionists, Old Masters form distinct groups
- **Unexpected connections**: Some artists from different eras share color DNA
- **Distinguishing features**: Saturation, brightness, and temperature are key differentiators
- **Color harmony varies**: Some movements emphasize complementary colors, others analogous
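
One way to check how strongly movements cluster is the silhouette score, which compares within-group and between-group distances. The sketch below uses synthetic data and is not the analysis performed in this lesson. With real data you would pass the scaled Color DNA matrix and each artist's movement label.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two synthetic "movements" with well-separated feature means (toy data only)
group_a = rng.normal(loc=0.0, scale=0.5, size=(10, 4))
group_b = rng.normal(loc=3.0, scale=0.5, size=(10, 4))
X = np.vstack([group_a, group_b])
labels = np.array(["A"] * 10 + ["B"] * 10)

# Score near 1: tight, well-separated groups; near 0: overlapping groups
score = silhouette_score(X, labels)

# Baseline: the same points with labels shuffled at random
shuffled = silhouette_score(X, rng.permutation(labels))

print(f"true labels: {score:.2f}, shuffled labels: {shuffled:.2f}")
```

A score for the real movement labels that is clearly above the shuffled baseline would support the claim that movements form distinct groups in color space.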

### Applications

- **Art Education**: Understanding stylistic relationships
- **Museum Curation**: Grouping artworks by visual similarity
- **Artist Discovery**: Recommending new artists to art enthusiasts
- **Art History Research**: Quantifying artistic influences

## Exercises

In [None]:
# YOUR CODE HERE

# Exercise 1: Add more artists to the analysis
# Expand ARTISTS_TO_ANALYZE with your favorite artists

# Exercise 2: Create genre-based Color DNA
# Instead of artists, analyze genres (portraits, landscapes, etc.)

# Exercise 3: Time-based analysis
# How did an artist's Color DNA change over their career?

# Exercise 4: Weighted recommendations
# Allow users to specify which features matter most to them

# Exercise 5: Cross-cultural analysis
# Compare Eastern vs Western art color signatures


---

## Conclusion

In this lesson, you've created a complete **Artist Color DNA system**:

1. **Extracted** rich color fingerprints from each artist's artworks
2. **Visualized** the landscape of artistic styles in embedding space
3. **Built** a recommendation system for discovering similar artists
4. **Analyzed** artistic influences through color similarity
5. **Discovered** unexpected connections between artists across eras

**Key Insight**: Color alone encodes tremendous information about artistic style. The Color DNA we've created captures not just what colors artists use, but *how* they use them—their temperature preferences, contrast patterns, harmony choices, and diversity. This numerical fingerprint enables computational exploration of art history in ways that complement traditional scholarship.

This is the power of treating artistic style as data: we can quantify, compare, and discover patterns that might take human experts years to uncover!