# 2048 Strategy Analysis

This notebook analyzes the performance of different strategies for playing the 2048 game. We'll focus on the composition of basic heuristics as described in the paper by Kohler, Migler & Khosmood (2019).

## Key Heuristics

1. **Empty Cells (E)**: Prioritize moves that result in more empty cells
2. **Monotonicity (M)**: Prioritize moves that maintain a value gradient across the board
3. **Uniformity (U)**: Prioritize moves that group similar values together
4. **Greedy (G)**: Prioritize moves that maximize immediate score
5. **Random (R)**: Used as a terminal strategy when multiple moves have equal scores

The key finding is that the EMR strategy (Empty → Monotonicity → Random) tends to perform best.
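
To make the idea of composing heuristics concrete, here is a minimal sketch that reads "Empty → Monotonicity → Random" as a chain of tie-breakers: each heuristic keeps only the best-scoring moves, and a random pick settles whatever ties remain. This reading, the function names (`count_empty`, `monotonicity`, `choose_move`), the board representation, and the simplified monotonicity score are all assumptions made for illustration. They are not taken from the paper or from the 2048-heuristics codebase.

```python
import random

def count_empty(board):
    """Empty-cells heuristic (E): number of zero cells on the board."""
    return sum(cell == 0 for row in board for cell in row)

def monotonicity(board):
    """Simplified monotonicity score (M), assumed for illustration: count the
    adjacent pairs that are non-increasing left-to-right or top-to-bottom."""
    size = len(board)
    rows = sum(board[r][c] >= board[r][c + 1]
               for r in range(size) for c in range(size - 1))
    cols = sum(board[r][c] >= board[r + 1][c]
               for r in range(size - 1) for c in range(size))
    return rows + cols

def choose_move(candidates, heuristics, rng=random):
    """Apply the heuristics in order, keeping only the best-scoring moves at
    each stage, then pick at random (R) among the moves still tied."""
    moves = list(candidates)
    for heuristic in heuristics:
        scores = {move: heuristic(candidates[move]) for move in moves}
        best = max(scores.values())
        moves = [move for move in moves if scores[move] == best]
        if len(moves) == 1:
            break
    return rng.choice(moves)

# Hypothetical boards that would result from each legal move (0 = empty cell)
candidates = {
    "left":  [[4, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    "right": [[0, 0, 2, 4], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
}
chosen = choose_move(candidates, [count_empty, monotonicity])
print(chosen)
```

Both boards have the same number of empty cells, so the first heuristic cannot separate them and the decision falls through to the monotonicity score.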

## Setup

First, let's load the necessary libraries for our analysis.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import glob

# Set the style for our plots
plt.style.use('ggplot')
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)

## Load Data

The data files are generated by running the `bin/run-analysis` script, which creates CSV files with the results for each strategy. Let's load these files.

**Note**: If the CSV files don't exist, you need to run the analysis script first:

```bash
cd /path/to/2048-heuristics
bin/run-analysis
```

In [None]:
# Define the path to the project directory
project_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))

# Find all CSV files with strategy results
csv_files = glob.glob(os.path.join(project_dir, 'strategy-*-results.csv'))

if not csv_files:
    print("No CSV files found. Please run the analysis script first:")
    print("cd {} && bin/run-analysis".format(project_dir))
else:
    print("Found {} CSV files with strategy results".format(len(csv_files)))
    for file in csv_files:
        print(" - {}".format(os.path.basename(file)))

In [None]:
# Load the CSV files into a single DataFrame with a strategy column
dfs = []

for file in csv_files:
    # Extract strategy name from the filename
    strategy = os.path.basename(file).replace('strategy-', '').replace('-results.csv', '')
    
    # Load the CSV file
    df = pd.read_csv(file)
    
    # Add strategy column
    df['strategy'] = strategy
    
    # Append to list of DataFrames
    dfs.append(df)

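# Concatenate all DataFrames (pd.concat raises a ValueError on an empty list,
# so fail early with a clearer message if no result files were loaded)
if not dfs:
    raise FileNotFoundError("No strategy-*-results.csv files found; run bin/run-analysis first")
all_data = pd.concat(dfs, ignore_index=True)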

# Display the first few rows
all_data.head()
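
The rest of the notebook reads the `score`, `moves`, and `highest_tile` columns, so it can be worth checking that they are present before going further. The helper below is a small sketch of such a check. The name `missing_columns` is hypothetical, and the sample frame is made up for illustration.

```python
import pandas as pd

# Columns that the later cells of this notebook rely on
REQUIRED_COLUMNS = ("score", "moves", "highest_tile")

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df, sorted."""
    return sorted(set(required) - set(df.columns))

# Made-up frame that lacks the highest_tile column
incomplete = pd.DataFrame({"score": [1200], "moves": [150]})
missing = missing_columns(incomplete)
print(missing)
```

In the notebook itself, `missing_columns(all_data)` should return an empty list if the result files have the expected layout.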

## Basic Statistics

Let's calculate some basic statistics for each strategy.

In [None]:
# Group by strategy and calculate statistics
stats = all_data.groupby('strategy').agg({
    'score': ['mean', 'std', 'min', 'max', 'median'],
    'moves': ['mean', 'std', 'min', 'max', 'median'],
    'highest_tile': ['mean', 'median', lambda x: x.value_counts().index[0]]
})

# Rename the columns
stats.columns = [
    'avg_score', 'std_score', 'min_score', 'max_score', 'median_score',
    'avg_moves', 'std_moves', 'min_moves', 'max_moves', 'median_moves',
    'avg_highest_tile', 'median_highest_tile', 'mode_highest_tile'
]

# Format the columns to be more readable
stats['avg_score'] = stats['avg_score'].round(0).astype(int)
stats['std_score'] = stats['std_score'].round(0).astype(int)
stats['avg_moves'] = stats['avg_moves'].round(0).astype(int)
stats['std_moves'] = stats['std_moves'].round(0).astype(int)

# Display the statistics
stats
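
A mean on its own does not say how much it would move if the games were replayed. One rough way to gauge that is the standard error of the mean together with a normal-approximation 95% interval. The sketch below assumes the games are independent and that there are enough of them per strategy for the approximation to be reasonable. `mean_with_ci` is a hypothetical helper, and the sample numbers are invented.

```python
import numpy as np
import pandas as pd

def mean_with_ci(data, column="score", z=1.96):
    """Per-strategy mean with a normal-approximation confidence interval."""
    summary = data.groupby("strategy")[column].agg(["mean", "std", "count"])
    # Standard error of the mean: sample std divided by sqrt(number of games)
    summary["sem"] = summary["std"] / np.sqrt(summary["count"])
    summary["ci_low"] = summary["mean"] - z * summary["sem"]
    summary["ci_high"] = summary["mean"] + z * summary["sem"]
    return summary

# Made-up numbers for illustration only, not results from the analysis
sample = pd.DataFrame({
    "strategy": ["A", "A", "A", "B", "B", "B"],
    "score": [100, 200, 300, 400, 400, 700],
})
ci_table = mean_with_ci(sample)
print(ci_table)
```

Calling `mean_with_ci(all_data)` would produce the same table for the real results. Strategies whose intervals overlap heavily are hard to rank on mean score alone.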

## Score Distribution

Let's visualize the distribution of scores for each strategy.

In [None]:
plt.figure(figsize=(12, 8))

# Compute the means first and reuse their (alphabetical) index as the plotting
# order, so each mean line is drawn over the matching violin
means = all_data.groupby('strategy')['score'].mean()
order = means.index.tolist()

# Create violin plot of scores by strategy
ax = sns.violinplot(x='strategy', y='score', data=all_data, order=order, inner='box', palette='Set3')

# Add individual data points
sns.stripplot(x='strategy', y='score', data=all_data, order=order, color='black', alpha=0.3, jitter=True)

# Add mean lines
ax.hlines(means, np.arange(len(means)) - 0.4, np.arange(len(means)) + 0.4, color='red', linewidth=2, label='Mean')

plt.title('Score Distribution by Strategy', fontsize=14)
plt.xlabel('Strategy', fontsize=12)
plt.ylabel('Score', fontsize=12)
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()

## Moves Distribution

Now let's look at the distribution of moves for each strategy.

In [None]:
plt.figure(figsize=(12, 8))

# Compute the means first and reuse their (alphabetical) index as the plotting
# order, so each mean line is drawn over the matching violin
means = all_data.groupby('strategy')['moves'].mean()
order = means.index.tolist()

# Create violin plot of moves by strategy
ax = sns.violinplot(x='strategy', y='moves', data=all_data, order=order, inner='box', palette='Set2')

# Add individual data points
sns.stripplot(x='strategy', y='moves', data=all_data, order=order, color='black', alpha=0.3, jitter=True)

# Add mean lines
ax.hlines(means, np.arange(len(means)) - 0.4, np.arange(len(means)) + 0.4, color='red', linewidth=2, label='Mean')

plt.title('Moves Distribution by Strategy', fontsize=14)
plt.xlabel('Strategy', fontsize=12)
plt.ylabel('Number of Moves', fontsize=12)
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()

## Highest Tile Distribution

Let's analyze the distribution of highest tiles achieved by each strategy.

In [None]:
# Create a pivot table for highest tile frequencies
tile_pivot = pd.crosstab(all_data['strategy'], all_data['highest_tile'])

# Convert to percentages
tile_pivot_pct = tile_pivot.div(tile_pivot.sum(axis=1), axis=0) * 100

# Display the pivot table
tile_pivot_pct
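
The table above gives the share of games that ended on exactly each highest tile. A related question is how often a strategy reaches at least a given tile, which is a reverse cumulative sum over the same percentages. The sketch below shows one way to compute it. `reach_rates` is a hypothetical helper and the sample games are invented.

```python
import pandas as pd

def reach_rates(data):
    """Percentage of games per strategy whose highest tile is >= each column."""
    # normalize="index" turns each row of counts into fractions of that row
    pct = pd.crosstab(data["strategy"], data["highest_tile"], normalize="index") * 100
    pct = pct.sort_index(axis=1)
    # Reverse the columns, accumulate, and reverse back to get ">= tile" shares
    return pct.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]

# Made-up games for illustration only, not results from the analysis
sample = pd.DataFrame({
    "strategy": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "highest_tile": [256, 512, 512, 1024, 128, 256, 256, 256],
})
reach = reach_rates(sample)
print(reach)
```

With the real data, `reach_rates(all_data)` gives one row per strategy, and the column for a tile such as 2048 (if any game got that far) reads directly as a success rate.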

In [None]:
# Plot highest tile distribution
plt.figure(figsize=(14, 8))

# Draw the percentage table directly as an annotated heatmap
sns.heatmap(tile_pivot_pct, annot=True, fmt='.1f', cmap='YlGnBu', linewidths=.5)

plt.title('Highest Tile Distribution by Strategy (%)', fontsize=14)
plt.xlabel('Highest Tile', fontsize=12)
plt.ylabel('Strategy', fontsize=12)
plt.tight_layout()
plt.show()

## Score vs. Moves Relationship

Let's explore the relationship between score and number of moves for each strategy.

In [None]:
plt.figure(figsize=(12, 8))

# Scatter plot colored by strategy, plus a single regression line fitted to all games pooled together
sns.scatterplot(x='moves', y='score', hue='strategy', data=all_data, alpha=0.7)
sns.regplot(x='moves', y='score', data=all_data, scatter=False, color='black', line_kws={'linewidth': 1})

plt.title('Score vs. Moves by Strategy', fontsize=14)
plt.xlabel('Number of Moves', fontsize=12)
plt.ylabel('Score', fontsize=12)
plt.legend(title='Strategy')
plt.tight_layout()
plt.show()
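
The regression line in the plot above is fitted to all games at once. To compare how quickly each strategy accumulates score, a separate straight-line fit per strategy can be sketched with `np.polyfit`. `per_strategy_fit` is a hypothetical helper, the sample data is invented, and each strategy needs at least two distinct move counts for the fit to be meaningful.

```python
import numpy as np
import pandas as pd

def per_strategy_fit(data):
    """Least-squares line score = slope * moves + intercept for each strategy."""
    rows = []
    for strategy, group in data.groupby("strategy"):
        slope, intercept = np.polyfit(group["moves"], group["score"], deg=1)
        rows.append({"strategy": strategy, "slope": slope, "intercept": intercept})
    return pd.DataFrame(rows).set_index("strategy")

# Made-up games for illustration only, not results from the analysis
sample = pd.DataFrame({
    "strategy": ["A", "A", "A", "B", "B", "B"],
    "moves": [10, 20, 30, 10, 20, 30],
    "score": [100, 200, 300, 50, 70, 90],
})
fits = per_strategy_fit(sample)
print(fits)
```

The slope is the extra score associated with one more move, so it is closely related to the score-per-move efficiency computed later in the notebook.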

## Highest Tile vs. Score

Let's visualize the relationship between the highest tile achieved and the final score.

In [None]:
plt.figure(figsize=(12, 8))

# Create box plot of scores by highest tile
sns.boxplot(x='highest_tile', y='score', data=all_data, palette='viridis')

plt.title('Score Distribution by Highest Tile', fontsize=14)
plt.xlabel('Highest Tile', fontsize=12)
plt.ylabel('Score', fontsize=12)
plt.tight_layout()
plt.show()

## Efficiency Analysis

Let's calculate the score efficiency (score per move) for each strategy.

In [None]:
# Calculate score efficiency (score per move)
all_data['efficiency'] = all_data['score'] / all_data['moves']

# Group by strategy and calculate statistics
efficiency_stats = all_data.groupby('strategy')['efficiency'].agg(['mean', 'std', 'min', 'max', 'median'])

# Display the statistics
efficiency_stats

In [None]:
plt.figure(figsize=(12, 6))

# Create bar plot of average efficiency by strategy
sns.barplot(x=efficiency_stats.index, y=efficiency_stats['mean'], palette='viridis')

# Add error bars
plt.errorbar(x=range(len(efficiency_stats)), y=efficiency_stats['mean'], 
             yerr=efficiency_stats['std'], fmt='none', color='black', capsize=5)

plt.title('Average Score Efficiency by Strategy (Score per Move)', fontsize=14)
plt.xlabel('Strategy', fontsize=12)
plt.ylabel('Efficiency (Score/Move)', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## Student Exercise

Now it's your turn to analyze the data and draw conclusions! Complete the following tasks:

1. Based on the analysis above, which strategy performs the best overall? Why?
2. How does the EMR strategy compare to the others in terms of:
   - Final score
   - Number of moves
   - Highest tile achieved
   - Score efficiency
3. Create a new visualization that highlights a different aspect of the data
4. Develop a hypothesis about why certain strategies perform better than others
5. If you were to create a new strategy, what combination of heuristics would you use and why?

Write your answers in the cells below.

### 1. Best Strategy Overall

*Your answer here...*

### 2. EMR Strategy Comparison

*Your answer here...*

### 3. New Visualization

*Create a new visualization below...*

In [None]:
# Your code for a new visualization here

### 4. Hypothesis

*Your hypothesis here...*

### 5. New Strategy Proposal

*Your new strategy proposal here...*

## Advanced Exercise: Implement Your Own Strategy

If you're feeling ambitious, try implementing your own strategy in the 2048-heuristics codebase! Here's how:

1. Look at the `evaluate-move` function in the `2048-heuristics.scm` file
2. Add your own heuristic evaluator (e.g., corner preference or pattern matching)
3. Modify the `distribution-analysis.scm` file to include your strategy
4. Run the analysis again and compare your strategy to the others
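
Before writing the Scheme version, it may help to prototype the scoring rule in Python, where it is quick to test on a few hand-made boards. The sketch below is one possible take on the corner preference idea. The function name, the list-of-rows board representation, and the 0/1 scoring are assumptions for illustration, not part of the 2048-heuristics codebase.

```python
def corner_preference(board):
    """Return 1 if the largest tile sits in a corner of the board, else 0."""
    last = len(board) - 1
    largest = max(max(row) for row in board)
    corners = (board[0][0], board[0][last], board[last][0], board[last][last])
    return 1 if largest in corners else 0

# Hypothetical boards (0 = empty cell)
in_corner = [[8, 4, 0, 0], [2, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
in_middle = [[0, 4, 0, 0], [2, 8, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(corner_preference(in_corner), corner_preference(in_middle))
```

A rule like this returns the same value for many boards, so in practice it would likely be combined with other heuristics rather than used alone.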

Document your strategy implementation and results below.

### My Custom Strategy

*Document your strategy here...*

### Implementation Notes

*Your implementation notes here...*

### Results and Comparison

*Your results and comparison here...*