# Problem 2 Solution: Dot Products - The Heartbeat of Machine Learning

This notebook contains complete solutions to all tasks in Problem 2. Use this to check your work or understand the intended approaches.

## Key Learning Points
- Dot products measure alignment between feature and weight vectors
- Different weights create different decision patterns
- Geometric intuition helps understand mathematical operations
- The sigmoid function transforms raw scores into probabilities

In [None]:
# Setup and imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Tuple

# Import our custom utilities
import sys
sys.path.append('./utils')
from data_generators import load_sports_dataset
from visualization import plot_feature_space_2d, plot_decision_boundary

# Load our data from Problem 1
features, labels, feature_names, texts = load_sports_dataset()

print("Continuing with our sports tweets:")
print(f"Features shape: {features.shape}")
print(f"Feature names: {feature_names}")
print("\nOur key example:")
print(f"Text: 'Go Dolphins!' → Features: {features[0]} → True label: {labels[0]}")

## Task 1 Solution: Computing Dot Products by Hand

In [None]:
# Our "Go Dolphins!" feature vector
go_dolphins_features = np.array([2, 1, 1])  # [word_count, has_team, has_exclamation]

# Let's try different weight vectors and see what predictions they give
weight_examples = {
    "Random weights": np.array([0.1, 0.2, 0.1]),
    "Team-focused": np.array([0.1, 0.8, 0.1]),
    "Excitement-focused": np.array([0.1, 0.1, 0.8]),
    "Word-count focused": np.array([0.8, 0.1, 0.1]),
    "Balanced weights": np.array([0.3, 0.3, 0.4])
}

print("DOT PRODUCT CALCULATIONS FOR 'GO DOLPHINS!'")
print("=" * 55)
print(f"Feature vector: {go_dolphins_features}")
print(f"Features: [word_count={go_dolphins_features[0]}, has_team={go_dolphins_features[1]}, has_exclamation={go_dolphins_features[2]}]")
print()

for name, weights in weight_examples.items():
    # Manual dot product calculation
    dot_product = (go_dolphins_features[0] * weights[0] + 
                  go_dolphins_features[1] * weights[1] + 
                  go_dolphins_features[2] * weights[2])
    
    # Verify with numpy
    numpy_result = np.dot(go_dolphins_features, weights)
    
    print(f"{name:<20} | Weights: {weights} | Dot Product: {dot_product:.3f}")
    
    # Show the calculation step by step
    calculation = f"({go_dolphins_features[0]}×{weights[0]:.1f}) + ({go_dolphins_features[1]}×{weights[1]:.1f}) + ({go_dolphins_features[2]}×{weights[2]:.1f})"
    print(f"{'':20} | Calculation: {calculation} = {dot_product:.3f}")
    print(f"{'':20} | NumPy verification: {numpy_result:.3f} {'✅' if abs(dot_product - numpy_result) < 1e-10 else '❌'}")
    print()

In [None]:
# Implement manual dot product function
def manual_dot_product(vector_a: np.ndarray, vector_b: np.ndarray) -> float:
    """
    Calculate dot product without using numpy's built-in function.
    """
    if len(vector_a) != len(vector_b):
        raise ValueError("Vectors must have the same length")
    
    result = 0.0
    for i in range(len(vector_a)):
        result += vector_a[i] * vector_b[i]
    
    return result

# Alternative implementation using list comprehension
def manual_dot_product_v2(vector_a: np.ndarray, vector_b: np.ndarray) -> float:
    """Alternative implementation using list comprehension"""
    return sum(a * b for a, b in zip(vector_a, vector_b))

# Test implementations
test_weights = np.array([0.3, 0.5, 0.4])
manual_result = manual_dot_product(go_dolphins_features, test_weights)
manual_result_v2 = manual_dot_product_v2(go_dolphins_features, test_weights)
numpy_result = np.dot(go_dolphins_features, test_weights)

print(f"Manual calculation (loop): {manual_result:.6f}")
print(f"Manual calculation (comprehension): {manual_result_v2:.6f}")
print(f"NumPy's calculation: {numpy_result:.6f}")
print(f"All methods match: {'✅' if abs(manual_result - numpy_result) < 1e-10 and abs(manual_result_v2 - numpy_result) < 1e-10 else '❌'}")
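
The check above compares the three implementations on a single vector. As a small sketch (the random matrix and the helper name `manual_dot` are assumptions made for illustration, not part of the sports dataset), the same idea extends to a whole feature matrix: `X @ w` computes one dot product per row, which is the operation Task 3 relies on.

```python
import numpy as np

def manual_dot(a, b):
    """Sum of element-wise products, written without np.dot."""
    return sum(float(x) * float(y) for x, y in zip(a, b))

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable

# Hypothetical 4x3 feature matrix and weight vector (not the tweet data)
X = rng.integers(0, 5, size=(4, 3)).astype(float)
w = rng.normal(size=3)

row_by_row = np.array([manual_dot(row, w) for row in X])  # one loop-based dot product per row
all_at_once = X @ w                                        # matrix-vector product

print("Row-by-row:  ", row_by_row)
print("All at once: ", all_at_once)
print("Match:", np.allclose(row_by_row, all_at_once))
```

The vectorized form is usually preferred in practice because it avoids a Python-level loop over rows.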

## Task 2 Solution: Geometric Interpretation - Vector Alignment

In [None]:
# Geometric analysis of dot products
def analyze_vector_alignment(features: np.ndarray, weights: np.ndarray, name: str = ""):
    """
    Analyze the geometric relationship between feature and weight vectors.
    """
    # Calculate dot product
    dot_product = np.dot(features, weights)
    
    # Calculate magnitudes
    features_magnitude = np.linalg.norm(features)
    weights_magnitude = np.linalg.norm(weights)
    
    # Calculate angle between vectors
    cos_angle = dot_product / (features_magnitude * weights_magnitude)
    # Clip to handle numerical errors
    cos_angle = np.clip(cos_angle, -1, 1)
    angle_radians = np.arccos(cos_angle)
    angle_degrees = np.degrees(angle_radians)
    
    print(f"Analysis for {name}:")
    print(f"  Features vector: {features}")
    print(f"  Weights vector:  {weights}")
    print(f"  Dot product:     {dot_product:.3f}")
    print(f"  Features magnitude: {features_magnitude:.3f}")
    print(f"  Weights magnitude:  {weights_magnitude:.3f}")
    print(f"  cos(θ): {cos_angle:.3f}")
    print(f"  Angle between vectors: {angle_degrees:.1f}°")
    
    if abs(angle_degrees - 90) < 1e-6:
        interpretation = "Perpendicular (no signal: dot product is zero)"
    elif angle_degrees < 45:
        interpretation = "Well aligned (strong positive signal)"
    elif angle_degrees < 90:
        interpretation = "Moderately aligned (moderate positive signal)"
    elif angle_degrees < 135:
        interpretation = "Moderately misaligned (moderate negative signal)"
    else:
        interpretation = "Poorly aligned (strong negative signal)"
    
    print(f"  Interpretation: {interpretation}")
    print()
    
    return dot_product, angle_degrees

# Test with different weight configurations
print("GEOMETRIC ANALYSIS OF VECTOR ALIGNMENT")
print("=" * 50)

# Case 1: Well-aligned weights (similar pattern to features)
aligned_weights = np.array([0.4, 0.3, 0.3])  # Roughly proportional to the features [2, 1, 1]
analyze_vector_alignment(go_dolphins_features, aligned_weights, "Well-aligned weights")

# Case 2: Perpendicular weights
# To create perpendicular vectors, we need weights such that dot product = 0
# For features [2, 1, 1], we need 2*w1 + 1*w2 + 1*w3 = 0
# Let's use w1 = 0, w2 = 1, w3 = -1
perpendicular_weights = np.array([0.0, 1.0, -1.0])
analyze_vector_alignment(go_dolphins_features, perpendicular_weights, "Perpendicular weights")

# Case 3: Opposite alignment
opposite_weights = np.array([-0.4, -0.3, -0.3])  # All negative
analyze_vector_alignment(go_dolphins_features, opposite_weights, "Opposite weights")

# Case 4: Team-focused weights
team_weights = np.array([0.1, 0.9, 0.1])  # Heavy emphasis on team feature
analyze_vector_alignment(go_dolphins_features, team_weights, "Team-focused weights")

In [None]:
# Explore the relationship between angle and prediction strength
def create_weight_at_angle(reference_vector: np.ndarray, target_angle_degrees: float) -> np.ndarray:
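    """
    Create a unit weight vector at the target angle to the reference vector.

    Note: the perpendicular direction [0, 1, -1] is hard-coded, so the result is
    only correct for reference vectors whose last two components are equal,
    such as [2, 1, 1].
    """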
    target_radians = np.radians(target_angle_degrees)
    
    if target_angle_degrees == 0:
        # Parallel - same direction
        return reference_vector / np.linalg.norm(reference_vector)
    elif target_angle_degrees == 180:
        # Anti-parallel - opposite direction
        return -reference_vector / np.linalg.norm(reference_vector)
    elif target_angle_degrees == 90:
        # Perpendicular - use the perpendicular we found
        return np.array([0.0, 1.0, -1.0]) / np.linalg.norm(np.array([0.0, 1.0, -1.0]))
    else:
        # Other angles: mix the parallel and perpendicular unit vectors
        # (exact, not approximate, because the two components are orthonormal)
        parallel_component = reference_vector / np.linalg.norm(reference_vector)
        perpendicular_component = np.array([0.0, 1.0, -1.0]) / np.linalg.norm(np.array([0.0, 1.0, -1.0]))
        
        # Mix based on desired angle
        parallel_weight = np.cos(target_radians)
        perp_weight = np.sin(target_radians)
        
        result = parallel_weight * parallel_component + perp_weight * perpendicular_component
        return result / np.linalg.norm(result)

# Test specific angles
angles_to_test = [0, 30, 60, 90, 120, 150, 180]
predictions = []
actual_angles = []

print("EXPLORING ANGLE vs PREDICTION RELATIONSHIP")
print("=" * 50)

for target_angle in angles_to_test:
    weights = create_weight_at_angle(go_dolphins_features, target_angle)
    dot_product, actual_angle = analyze_vector_alignment(go_dolphins_features, weights, f"Target {target_angle}°")
    
    predictions.append(dot_product)
    actual_angles.append(actual_angle)

# Plot the relationship
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(actual_angles, predictions, 'bo-', linewidth=2, markersize=8)
plt.xlabel('Angle between vectors (degrees)')
plt.ylabel('Dot Product (Prediction)')
plt.title('Angle vs Prediction Strength')
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='r', linestyle='--', alpha=0.5, label='Zero prediction')
plt.axvline(x=90, color='r', linestyle='--', alpha=0.5, label='Perpendicular')
plt.legend()

plt.subplot(1, 2, 2)
# Show cosine relationship
theoretical_angles = np.linspace(0, 180, 100)
theoretical_cos = np.cos(np.radians(theoretical_angles))
plt.plot(theoretical_angles, theoretical_cos, 'r--', label='cos(angle)', linewidth=2)

# Normalize our predictions to compare with cosine
features_mag = np.linalg.norm(go_dolphins_features)
normalized_predictions = [p / features_mag for p in predictions]  # Assuming unit weight vectors
plt.plot(actual_angles, normalized_predictions, 'bo-', label='Normalized predictions', markersize=8)

plt.xlabel('Angle (degrees)')
plt.ylabel('Normalized value')
plt.title('Cosine Relationship Verification')
plt.grid(True, alpha=0.3)
plt.legend()

plt.tight_layout()
plt.show()

print("\nKey Insight: Dot product = |a| × |b| × cos(angle)")
print("- When vectors are aligned (0°), prediction is strongest and positive")
print("- When vectors are perpendicular (90°), prediction is zero")
print("- When vectors are opposite (180°), prediction is strongest but negative")
print("- This cosine relationship is fundamental to understanding ML predictions")
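
The helper `create_weight_at_angle` above relies on the hand-picked perpendicular `[0, 1, -1]`, which is orthogonal to `[2, 1, 1]` but not to most other vectors. One possible generalization, shown here only as a sketch, is to build the perpendicular with a Gram-Schmidt step. The function name `weight_at_angle`, the `seed` parameter, and the test vector are assumptions for illustration and do not come from the original notebook.

```python
import numpy as np

def weight_at_angle(reference, angle_degrees, seed=0):
    """Return a unit vector at the given angle to `reference` (hypothetical helper)."""
    ref_unit = reference / np.linalg.norm(reference)
    rng = np.random.default_rng(seed)
    # Gram-Schmidt step: remove the part of a random vector that lies along ref_unit
    candidate = rng.normal(size=reference.shape)
    perp = candidate - np.dot(candidate, ref_unit) * ref_unit
    perp_unit = perp / np.linalg.norm(perp)
    theta = np.radians(angle_degrees)
    # ref_unit and perp_unit are orthonormal, so this combination has unit length
    return np.cos(theta) * ref_unit + np.sin(theta) * perp_unit

def angle_between(a, b):
    """Angle in degrees between two nonzero vectors."""
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

reference = np.array([3.0, 1.0, 2.0])  # last two components differ on purpose
for target in (0, 30, 90, 150, 180):
    w = weight_at_angle(reference, target)
    print(f"target {target:>3}°  measured {angle_between(reference, w):7.3f}°")
```

Because the two unit components are orthogonal, the mix `cos(θ)·parallel + sin(θ)·perpendicular` lands on the target angle up to floating-point error, for any nonzero reference vector.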

## Task 3 Solution: Testing Different Weight Strategies

In [None]:
# Define different weight strategies
weight_strategies = {
    "Equal weights": np.array([0.33, 0.33, 0.33]),
    "Team-focused": np.array([0.1, 0.8, 0.1]),
    "Excitement-focused": np.array([0.1, 0.1, 0.8]),
    "Length-focused": np.array([0.8, 0.1, 0.1]),
    "Optimized-guess": np.array([0.2, 0.5, 0.6]),  # Weighted toward team + excitement
    "Anti-pattern": np.array([-0.3, 0.5, -0.4]),  # Some negative weights
}

def evaluate_weight_strategy(features: np.ndarray, labels: np.ndarray, 
                           weights: np.ndarray, texts: List[str], strategy_name: str):
    """
    Evaluate how well a weight strategy performs on all tweets.
    """
    # Calculate predictions for all tweets
    predictions = features @ weights  # Matrix multiplication = dot products for all rows
    
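    # Convert to binary predictions (positive if raw score > 0.5)
    # Note: this thresholds the raw dot product, not a probability. Task 4 thresholds
    # sigmoid(score) at 0.5 instead, which corresponds to a raw score of 0.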
    binary_predictions = (predictions > 0.5).astype(int)
    
    # Calculate accuracy
    accuracy = np.mean(binary_predictions == labels)
    
    # Calculate other metrics
    true_positives = np.sum((binary_predictions == 1) & (labels == 1))
    false_positives = np.sum((binary_predictions == 1) & (labels == 0))
    true_negatives = np.sum((binary_predictions == 0) & (labels == 0))
    false_negatives = np.sum((binary_predictions == 0) & (labels == 1))
    
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    
    # Show detailed results
    print(f"\nSTRATEGY: {strategy_name}")
    print(f"Weights: {weights}")
    print(f"Accuracy: {accuracy:.1%} | Precision: {precision:.1%} | Recall: {recall:.1%} | F1: {f1_score:.1%}")
    print("\nDetailed predictions:")
    
    for i, (text, true_label, pred_score, pred_binary) in enumerate(
        zip(texts, labels, predictions, binary_predictions)):
        
        correct = "✓" if pred_binary == true_label else "✗"
        true_sentiment = "Pos" if true_label == 1 else "Neg"
        pred_sentiment = "Pos" if pred_binary == 1 else "Neg"
        confidence = "High" if abs(pred_score - 0.5) > 0.3 else "Low"
        
        print(f"  {correct} '{text:<25}' | True: {true_sentiment} | Pred: {pred_sentiment} (score: {pred_score:.3f}, {confidence})")
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1_score,
        'predictions': predictions,
        'weights': weights
    }

# Test each strategy and compare results
print("COMPARING WEIGHT STRATEGIES")
print("=" * 50)

strategy_results = {}

for name, weights in weight_strategies.items():
    results = evaluate_weight_strategy(features, labels, weights, texts, name)
    strategy_results[name] = results

# Rank strategies by performance
print("\n" + "="*70)
print("STRATEGY RANKINGS (by accuracy):")
print("="*70)

ranked_strategies = sorted(strategy_results.items(), key=lambda x: x[1]['accuracy'], reverse=True)
for i, (name, results) in enumerate(ranked_strategies, 1):
    print(f"{i}. {name:<20} | Acc: {results['accuracy']:.1%} | F1: {results['f1_score']:.1%} | Weights: {results['weights']}")

# Analyze why certain strategies work better
print("\n" + "="*70)
print("STRATEGY ANALYSIS:")
print("="*70)

best_strategy = ranked_strategies[0]
worst_strategy = ranked_strategies[-1]

# The explanations below depend on which strategies actually rank first and last
print(f"\nBest strategy: {best_strategy[0]}")
print(f"  Weights: {best_strategy[1]['weights']}")
print(f"  Weights on 'has_team' ({best_strategy[1]['weights'][1]:.1f}) and 'has_exclamation' ({best_strategy[1]['weights'][2]:.1f})")
print("  Large values here fit the intuition that positive sports tweets often mention teams and show excitement")

print(f"\nWorst strategy: {worst_strategy[0]}")
print(f"  Weights: {worst_strategy[1]['weights']}")
if np.any(worst_strategy[1]['weights'] < 0):
    print("  Why it fails: negative weights lower the score even when team or excitement indicators are present")
else:
    print("  Why it fails: its emphasis does not match the features that separate positive from negative tweets")

In [None]:
# Visualize how different strategies create different decision boundaries
# We'll create a 2D visualization using the first two features

# Extract first two features for visualization
features_2d = features[:, :2]  # [word_count, has_team]

# Create grid for decision boundary visualization
x_min, x_max = features_2d[:, 0].min() - 0.5, features_2d[:, 0].max() + 0.5
y_min, y_max = features_2d[:, 1].min() - 0.1, features_2d[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))

# Plot decision boundaries for top 3 strategies
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for idx, (strategy_name, results) in enumerate(ranked_strategies[:3]):
    ax = axes[idx]
    
    # Hold the third feature (exclamation) at its average so the 2D plot is a
    # consistent slice of the 3D model
    avg_exclamation = np.mean(features[:, 2])
    w1, w2, w3 = results['weights']
    
    # Decision boundary in this slice: w1*x1 + w2*x2 + w3*avg_exclamation = 0.5 (threshold)
    # Rearranged: x2 = (0.5 - w3*avg_exclamation - w1*x1) / w2
    if abs(w2) > 1e-10:  # Avoid division by zero
        boundary_x = np.linspace(x_min, x_max, 100)
        boundary_y = (0.5 - w3 * avg_exclamation - w1 * boundary_x) / w2
        
        # Only plot boundary within our range
        valid_mask = (boundary_y >= y_min) & (boundary_y <= y_max)
        if np.any(valid_mask):
            ax.plot(boundary_x[valid_mask], boundary_y[valid_mask], 'b-', linewidth=3, 
                   label=f'Decision Boundary\n(accuracy: {results["accuracy"]:.1%})')
    
    # Create background color map for decision regions
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    grid_points_3d = np.column_stack([grid_points, np.full(grid_points.shape[0], avg_exclamation)])
    
    Z = grid_points_3d @ results['weights']
    Z = Z.reshape(xx.shape)
    
    # Plot decision regions; pad the outer levels so they stay strictly increasing
    # even when every grid score falls on one side of the 0.5 threshold
    level_low = min(Z.min(), 0.5) - 1.0
    level_high = max(Z.max(), 0.5) + 1.0
    ax.contourf(xx, yy, Z, levels=[level_low, 0.5, level_high], colors=['lightcoral', 'lightblue'], alpha=0.3)
    ax.contour(xx, yy, Z, levels=[0.5], colors=['blue'], linewidths=3)
    
    # Plot data points
    pos_mask = labels == 1
    neg_mask = labels == 0
    
    ax.scatter(features_2d[pos_mask, 0], features_2d[pos_mask, 1], 
              c='green', label='Positive', alpha=0.8, s=120, edgecolors='darkgreen', linewidth=2)
    ax.scatter(features_2d[neg_mask, 0], features_2d[neg_mask, 1], 
              c='red', label='Negative', alpha=0.8, s=120, edgecolors='darkred', linewidth=2)
    
    ax.set_xlabel('Word Count', fontsize=12)
    ax.set_ylabel('Has Team', fontsize=12)
    ax.set_title(f'{strategy_name}\nAcc: {results["accuracy"]:.1%}, Weights: [{results["weights"][0]:.1f}, {results["weights"][1]:.1f}, {results["weights"][2]:.1f}]', 
                fontsize=11, fontweight='bold')
    ax.legend(fontsize=10)
    ax.grid(True, alpha=0.3)
    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)

plt.suptitle('Decision Boundaries for Different Weight Strategies', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("- Different weights create different decision boundaries")
print("- The boundary separates regions where the model predicts positive vs negative")
print("- Better weights create boundaries that better separate the actual classes")
print("- The blue regions predict positive sentiment, red regions predict negative")
print("- This visualization shows why feature engineering and weight learning matter!")
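
To connect the decision boundary back to the geometry of Task 2, here is a minimal sketch: the boundary is the plane `weights · x = threshold`, and the signed distance of a point from that plane is `(weights · x - threshold) / |weights|`. The helper name `signed_distance_to_boundary` and the three sample points are made up for illustration; only the Team-focused weights and the 0.5 threshold come from the cells above.

```python
import numpy as np

def signed_distance_to_boundary(points, weights, threshold=0.5):
    """Signed distance from each row of `points` to the plane weights . x = threshold."""
    return (points @ weights - threshold) / np.linalg.norm(weights)

# Team-focused weights from the strategies above
weights = np.array([0.1, 0.8, 0.1])

# Hypothetical feature rows: [word_count, has_team, has_exclamation]
points = np.array([
    [2.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
])

distances = signed_distance_to_boundary(points, weights)
predicted_positive = distances > 0  # same decision as (points @ weights) > 0.5

for point, distance, is_positive in zip(points, distances, predicted_positive):
    side = "Pos" if is_positive else "Neg"
    print(f"{point} | signed distance: {distance:+.3f} | predicted: {side}")
```

The sign of the distance gives the predicted class, and its size says how far the tweet sits from the boundary, which is one way to read the "High" and "Low" confidence labels printed earlier.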

## Task 4 Solution: From Dot Products to Predictions

In [None]:
# Examine the full pipeline: Features → Dot Product → Activation → Prediction

def sigmoid(x):
    """Sigmoid activation function: maps any real number to (0,1)"""
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))  # Clip to prevent overflow

def analyze_full_pipeline(features: np.ndarray, weights: np.ndarray, text: str):
    """
    Show the complete transformation from text to prediction.
    """
    # Step 1: Feature extraction (already done)
    print(f"Input text: '{text}'")
    print(f"Step 1 - Features: {features}")
    
    # Step 2: Dot product (linear combination)
    dot_product = np.dot(features, weights)
    print(f"Step 2 - Dot product: {features} · {weights} = {dot_product:.3f}")
    print(f"         Calculation: {features[0]}×{weights[0]:.1f} + {features[1]}×{weights[1]:.1f} + {features[2]}×{weights[2]:.1f} = {dot_product:.3f}")
    
    # Step 3: Activation function (sigmoid)
    probability = sigmoid(dot_product)
    print(f"Step 3 - Sigmoid: σ({dot_product:.3f}) = 1/(1+e^({-dot_product:.3f})) = {probability:.3f}")
    
    # Step 4: Final prediction
    prediction = 1 if probability > 0.5 else 0
    confidence = probability if prediction == 1 else (1 - probability)
    sentiment = "POSITIVE" if prediction == 1 else "NEGATIVE"
    
    print(f"Step 4 - Final prediction: {sentiment} (confidence: {confidence:.1%})")
    
    # Additional analysis
    if dot_product > 2:
        print(f"         Note: High dot product ({dot_product:.3f}) → Very confident positive")
    elif dot_product > 0.5:
        print(f"         Note: Moderate dot product ({dot_product:.3f}) → Confident positive")
    elif dot_product > -0.5:
        print(f"         Note: Near-zero dot product ({dot_product:.3f}) → Uncertain prediction")
    else:
        print(f"         Note: Negative dot product ({dot_product:.3f}) → Confident negative")
    
    print()
    
    return dot_product, probability, prediction

# Test with our best-performing weights
best_strategy = ranked_strategies[0]
best_weights = best_strategy[1]['weights']

print(f"FULL PIPELINE ANALYSIS - Using '{best_strategy[0]}' strategy")
print(f"Weights: {best_weights}")
print("=" * 70)

# Analyze several key examples
key_examples = [
    (features[0], texts[0]),  # "Go Dolphins!"
    (features[1], texts[1]),  # "Terrible game"
    (features[2], texts[2]),  # "Love the fins!"
    (features[4], texts[4]),  # "Great win!!"
    (features[7], texts[7]),  # "Worst season ever"
]

pipeline_results = []
for feature_vec, text in key_examples:
    dot_prod, prob, pred = analyze_full_pipeline(feature_vec, best_weights, text)
    pipeline_results.append((dot_prod, prob, pred))
    print("-" * 50)

In [None]:
# Explore how the sigmoid function transforms dot products
# Create comprehensive visualization showing the transformation

# Generate range of dot product values
dot_product_range = np.linspace(-4, 4, 200)
sigmoid_values = sigmoid(dot_product_range)

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Plot 1: Sigmoid function
axes[0, 0].plot(dot_product_range, sigmoid_values, 'b-', linewidth=3, label='Sigmoid function')
axes[0, 0].axhline(y=0.5, color='r', linestyle='--', alpha=0.7, label='Decision threshold')
axes[0, 0].axvline(x=0, color='r', linestyle='--', alpha=0.7)
axes[0, 0].set_xlabel('Dot Product Value')
axes[0, 0].set_ylabel('Probability')
axes[0, 0].set_title('Sigmoid Activation Function')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].legend()

# Add annotations
axes[0, 0].annotate('Positive predictions\n(probability > 0.5)', xy=(2, 0.88), 
                xytext=(2.5, 0.9), fontsize=10,
                arrowprops=dict(arrowstyle='->', color='green', alpha=0.7))
axes[0, 0].annotate('Negative predictions\n(probability < 0.5)', xy=(-2, 0.12), 
                xytext=(-3, 0.1), fontsize=10,
                arrowprops=dict(arrowstyle='->', color='red', alpha=0.7))
axes[0, 0].annotate('Uncertain\n(≈ 0.5)', xy=(0, 0.5), 
                xytext=(0.5, 0.3), fontsize=10,
                arrowprops=dict(arrowstyle='->', color='orange', alpha=0.7))

# Plot 2: Our actual data points on the sigmoid curve
all_dot_products = []
all_probabilities = []
all_labels = []

for i in range(len(features)):
    dot_prod = np.dot(features[i], best_weights)
    prob = sigmoid(dot_prod)
    all_dot_products.append(dot_prod)
    all_probabilities.append(prob)
    all_labels.append(labels[i])

# Plot sigmoid curve
axes[0, 1].plot(dot_product_range, sigmoid_values, 'b-', linewidth=2, alpha=0.5, label='Sigmoid function')
axes[0, 1].axhline(y=0.5, color='r', linestyle='--', alpha=0.7, label='Decision threshold')

# Plot our data points
pos_mask = np.array(all_labels) == 1
neg_mask = np.array(all_labels) == 0

axes[0, 1].scatter(np.array(all_dot_products)[pos_mask], np.array(all_probabilities)[pos_mask], 
               c='green', s=120, label='Positive tweets', alpha=0.8, edgecolors='darkgreen', linewidth=2)
axes[0, 1].scatter(np.array(all_dot_products)[neg_mask], np.array(all_probabilities)[neg_mask], 
               c='red', s=120, label='Negative tweets', alpha=0.8, edgecolors='darkred', linewidth=2)

# Add text labels for some points
for i, text in enumerate(texts[:5]):
    short_text = text[:10] + '..' if len(text) > 12 else text
    axes[0, 1].annotate(short_text, (all_dot_products[i], all_probabilities[i]), 
                       xytext=(5, 5), textcoords='offset points', fontsize=8, alpha=0.7)

axes[0, 1].set_xlabel('Dot Product Value')
axes[0, 1].set_ylabel('Probability')
axes[0, 1].set_title('Our Tweets on the Sigmoid Curve')
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].legend()

# Plot 3: Distribution of dot products
axes[1, 0].hist(np.array(all_dot_products)[pos_mask], bins=6, alpha=0.7, color='green', 
            label=f'Positive tweets (n={np.sum(pos_mask)})', edgecolor='darkgreen', linewidth=2)
axes[1, 0].hist(np.array(all_dot_products)[neg_mask], bins=6, alpha=0.7, color='red', 
            label=f'Negative tweets (n={np.sum(neg_mask)})', edgecolor='darkred', linewidth=2)
axes[1, 0].axvline(x=0, color='black', linestyle='--', alpha=0.7, label='Zero line')
axes[1, 0].set_xlabel('Dot Product Value')
axes[1, 0].set_ylabel('Count')
axes[1, 0].set_title('Distribution of Dot Products')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Sigmoid derivative (how sensitive the probability is to the dot product)
sigmoid_derivative = sigmoid_values * (1 - sigmoid_values)
axes[1, 1].plot(dot_product_range, sigmoid_derivative, 'purple', linewidth=3, label='Sigmoid derivative')
axes[1, 1].set_xlabel('Dot Product Value')
axes[1, 1].set_ylabel('Derivative Value')
axes[1, 1].set_title('Sigmoid Derivative (Sensitivity)')
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].legend()

# Mark where our data points are on the derivative
for i, dot_prod in enumerate(all_dot_products[:5]):
    deriv_val = sigmoid(dot_prod) * (1 - sigmoid(dot_prod))
    color = 'green' if all_labels[i] == 1 else 'red'
    axes[1, 1].scatter(dot_prod, deriv_val, c=color, s=60, alpha=0.7, edgecolors='black')

axes[1, 1].annotate('Steepest slope\n(most sensitive output)', xy=(0, 0.25), 
                   xytext=(1, 0.2), fontsize=10,
                   arrowprops=dict(arrowstyle='->', color='purple', alpha=0.7))

plt.tight_layout()
plt.show()

print("\nKey Insights about the Sigmoid Function:")
print("1. Maps any dot product value to a probability between 0 and 1")
print("2. Values > 0 become probabilities > 0.5 (positive predictions)")
print("3. Values < 0 become probabilities < 0.5 (negative predictions)")
print("4. Extreme values get 'squashed' - very confident predictions")
print("5. Values near 0 become probabilities near 0.5 - uncertain predictions")
print("6. The derivative peaks at 0 (value 0.25): the probability is most sensitive to the dot product there and flattens out for large positive or negative values")
print("\nThis transformation is crucial for converting raw dot products into meaningful probabilities!")
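
Two of the insights above can be checked numerically. The sketch below (a standalone re-definition of `sigmoid` without clipping, on a grid chosen for illustration) confirms that thresholding the probability at 0.5 gives the same decision as thresholding the raw score at 0, and that the closed form `σ(z)·(1 - σ(z))` agrees with a finite-difference estimate of the slope.

```python
import numpy as np

def sigmoid(z):
    """Plain sigmoid; the inputs here are small enough that no clipping is needed."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.arange(-40, 41) / 10.0   # -4.0, -3.9, ..., 4.0, with z = 0 represented exactly
p = sigmoid(z)

# Insights 2 and 3: probability > 0.5 exactly when the raw score is > 0
same_decision = np.array_equal(p > 0.5, z > 0)

# Insight 6: compare the closed-form derivative with a central finite difference
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = p * (1 - p)
max_error = np.max(np.abs(numeric - analytic))

print(f"Same decision at both thresholds: {same_decision}")
print(f"Largest gap between numeric and analytic derivative: {max_error:.2e}")
print(f"Derivative is largest at z = {z[np.argmax(analytic)]:.1f}")
```

Note that this equivalence holds for a raw-score threshold of 0, not the 0.5 raw-score threshold used when ranking strategies in Task 3.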

## Summary and Key Insights

**What we accomplished in Problem 2:**

1. ✅ **Manual dot product calculation** - Built intuition through step-by-step arithmetic
2. ✅ **Geometric interpretation** - Understood how vector alignment affects predictions
3. ✅ **Weight strategy comparison** - Saw how different weights create different decision patterns
4. ✅ **Full prediction pipeline** - Connected dot products to final predictions via sigmoid

**Key insights discovered:**

1. **Dot products measure alignment** between feature vectors and learned weight vectors
2. **Geometric intuition matters** - the angle between vectors, together with their magnitudes, determines prediction strength
3. **Different weights create different decision boundaries** - this is what learning optimizes
4. **The sigmoid function** transforms raw scores into probabilities for decision-making
5. **Weight strategies reveal domain knowledge** - team mentions and excitement indicators work best for sports sentiment
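
As a small sketch of insights 1 and 2, the snippet below separates direction from magnitude: scaling the weight vector multiplies the dot product but leaves the cosine of the angle unchanged. The feature vector and Team-focused weights are the ones used earlier; the helper name `cosine` and the scale factors are assumptions for illustration.

```python
import numpy as np

features = np.array([2.0, 1.0, 1.0])   # "Go Dolphins!" features from Task 1
weights = np.array([0.1, 0.8, 0.1])    # Team-focused weights

def cosine(a, b):
    """Cosine of the angle between two nonzero vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

for scale in (1.0, 10.0):
    scaled = scale * weights
    print(f"scale {scale:>4}: dot product = {np.dot(features, scaled):.3f}, "
          f"cos(angle) = {cosine(features, scaled):.3f}")
```
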

**Connection to the bigger picture:**
- Most of the computation in a neural network (linear layers, convolutions, attention) is built from dot products
- The geometric interpretation helps debug and understand model behavior
- This mathematical foundation scales from simple classifiers to ChatGPT

**Ready for Problem 3: Loss Functions!**
Now that we can make predictions, we need to learn how to measure and optimize their quality.