In [30]:
import numpy as np
import pandas as pd

I’m completing a paint-by-number painting, although this one is a little different from any that I’ve seen before. It’s an infinitely long strip of canvas that is 1 cm wide. It’s broken up into adjacent 1 cm-by-1 cm squares, each of which is numbered zero or one, each with a 50 percent chance. The squares are all numbered independently of each other. Every square with a zero I color red, while every square with a one I color blue.

Once I’m done painting, there will be many “clusters” of contiguous red and blue squares. For example, consider the finite strip of canvas below. It contains 10 total squares and seven clusters, which means the average size of a cluster here is approximately 1.43 squares.
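On a 1 cm-wide strip a cluster is simply a maximal run of one color, so the number of clusters is one more than the number of places where neighbouring squares differ. The sketch below checks that idea. `count_clusters_1d` is a hypothetical helper, and the 10-square strip is an assumed example with seven clusters, not necessarily the strip from the original example.

```python
import numpy as np

def count_clusters_1d(strip):
    """Count maximal runs of equal values in a 1-D sequence of 0/1 labels."""
    strip = np.asarray(strip)
    if strip.size == 0:
        return 0
    # Every position whose color differs from its left neighbour starts a new cluster
    return 1 + int(np.count_nonzero(np.diff(strip)))

# Assumed example strip: 0 = red, 1 = blue
example = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
n_clusters = count_clusters_1d(example)
print(n_clusters, len(example) / n_clusters)  # 7 clusters, average size about 1.43
```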

In [54]:
# My intuition is just to simulate this.
def color_square():
    return np.random.randint(0,2)

def count_consecutive_colors(df):
    # Create a group identifier that increments every time the color changes
    df['group'] = (df['colors'] != df['colors'].shift()).cumsum()
    
    # Group by color and run identifier, then count the squares in each run (cluster)
    result = df.groupby(['colors', 'group']).count()
    
    return result

def run_sim(n_sim=1000):
    # Number n_sim squares independently: 0 or 1, each with probability 1/2
    numbers = [color_square() for _ in range(n_sim)]
    # Paint 1 blue and 0 red
    colors = ["blue" if x == 1 else "red" for x in numbers]
    return pd.DataFrame({'numbers': numbers, 'colors': colors})


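The `(colors != colors.shift()).cumsum()` line in `count_consecutive_colors` is a common pandas idiom for labelling runs: the comparison is `True` at the first square of every run, and the cumulative sum turns those markers into run numbers. A small illustration on an assumed six-square strip:

```python
import pandas as pd

colors = pd.Series(["red", "red", "blue", "red", "blue", "blue"])

# True where a square differs from the previous one; the first square is compared
# with the missing value introduced by shift(), so it is always True
starts = colors != colors.shift()

# Running total of run starts gives each square the number of the run it belongs to
run_id = starts.cumsum()

# Squares per run, i.e. cluster sizes on a 1 cm strip
run_lengths = colors.groupby(run_id).size()

print(run_id.tolist())       # [1, 1, 2, 3, 4, 4]
print(run_lengths.tolist())  # [2, 1, 1, 2]
```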
In [56]:
results = run_sim(100000)
counts = count_consecutive_colors(results)
counts.mean()

numbers    1.993303
dtype: float64

_**Answer: Approaches 2**_
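The simulated value near 2 can also be checked exactly. Each of the n - 1 boundaries between neighbouring squares starts a new cluster with probability 1/2, so n squares contain 1 + (n - 1)/2 clusters on average, and n divided by that tends to 2 as n grows. A minimal sketch comparing this formula with a vectorized NumPy simulation; the function names and the seed are illustrative assumptions.

```python
import numpy as np

def expected_clusters_1d(n):
    # One cluster for the first square, plus each of the n - 1 boundaries
    # starts a new cluster with probability 1/2
    return 1 + (n - 1) / 2

def simulate_average_size_1d(n, seed=0):
    rng = np.random.default_rng(seed)
    strip = rng.integers(0, 2, size=n)          # 0 = red, 1 = blue, equally likely
    n_clusters = 1 + np.count_nonzero(np.diff(strip))
    return n / n_clusters

for n in (10, 1_000, 100_000):
    # Exact squares-per-expected-cluster ratio next to one simulated strip
    print(n, n / expected_clusters_1d(n), simulate_average_size_1d(n))
```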

Once again, I’m painting an infinitely long strip of canvas, broken up into adjacent 1 cm-by-1 cm squares. Squares are randomly and independently numbered 0 or 1 as before. But this time, the strip itself is 2 cm wide.

Squares are considered adjacent if they share a common edge. So squares can be horizontally or vertically adjacent, but not diagonally adjacent.

Once I’m done painting, there will again be many “clusters” of contiguous red and blue squares. The example below contains 20 total squares and nine clusters, which means the average size of a cluster here is approximately 2.22 squares.
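Because only edge-sharing squares count as adjacent, clusters on the 2 cm strip can be found with a flood fill that steps up, down, left, or right but never diagonally. The sketch below does a breadth-first search over a grid of 0/1 labels; `count_clusters` is a hypothetical helper. The first call shows that two squares touching only at a corner stay separate, and the second uses an assumed 2 x 10 strip that, like the example described above, has nine clusters.

```python
from collections import deque

def count_clusters(grid):
    """Count same-color clusters in a rectangular grid, joining only edge-sharing squares."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # (r, c) was not reached from any earlier square, so it starts a new cluster
            clusters += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Up, down, left, right only: diagonal neighbours are not adjacent
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == grid[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return clusters

# Two red (0) squares that touch only at a corner are separate clusters
print(count_clusters([[0, 1],
                      [1, 0]]))  # 4

# Assumed 2 x 10 strip (0 = red, 1 = blue)
strip = [[0, 1, 0, 1, 1, 0, 1, 0, 1, 1],
         [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]]
n = count_clusters(strip)
print(n, 20 / n)  # 9 clusters, average size about 2.22
```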

In [68]:
# simulate the two rows of the strip
strip_1 = run_sim(100000)
strip_2 = run_sim(100000)

# label runs of identical colors in row 1:
strip_1['groups'] = (strip_1['colors'] != strip_1['colors'].shift()).cumsum()

# add the colors from row 2, aligned column by column:
strip_1['colors_2'] = strip_2['colors']

# count 2 squares where row 2 matches row 1 in that column, otherwise 1:
strip_1['count'] = np.where(strip_1['colors'] == strip_1['colors_2'], 2, 1)

# group by color and run, sum the counts within each run, then take the mean:
strip_1.groupby(['colors', 'groups'])['count'].sum().mean()

3.00250100040016

_**Answer: Approaches 3**_
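A caution on the cell above: it makes one group per run in the first row and adds only the second-row squares that match the square directly above them. Second-row squares that differ from the square above are never counted, and first-row runs that are joined through the second row are not merged, so its result of about 3 is not a full cluster count. As a cross-check, the sketch below counts 4-connected clusters directly with a union-find (a technique not used in the original notebook) and compares the result with an exact per-column expectation. Starting from one cluster per square, every same-colored neighbouring pair merges two clusters unless it closes a loop, and in a strip two squares wide the only loops are single-colored 2 x 2 blocks. Per column that gives 2 squares, minus 1/2 matching vertical pair and 1 matching horizontal pair, plus 1/8 single-colored block, i.e. 5/8 clusters per column and an average cluster size of 16/5 = 3.2. The helper name, strip length and seed are illustrative assumptions.

```python
import numpy as np

def count_clusters_union_find(grid):
    """Count 4-connected same-color clusters with a union-find over all squares."""
    g = np.asarray(grid)
    rows, cols = g.shape
    flat = g.ravel().tolist()          # row-major: square (r, c) has index r * cols + c
    parent = list(range(rows * cols))

    def find(i):
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    clusters = rows * cols             # every square starts as its own cluster
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            neighbours = []
            if c + 1 < cols:
                neighbours.append(i + 1)       # square to the right
            if r + 1 < rows:
                neighbours.append(i + cols)    # square below
            for j in neighbours:
                if flat[i] == flat[j]:
                    root_i, root_j = find(i), find(j)
                    if root_i != root_j:       # a real merge, not a loop-closing pair
                        parent[root_i] = root_j
                        clusters -= 1
    return clusters

rng = np.random.default_rng(0)
n_cols = 100_000
strip = rng.integers(0, 2, size=(2, n_cols))   # 0 = red, 1 = blue, each with probability 1/2

simulated_avg = strip.size / count_clusters_union_find(strip)
exact_avg = 2 / (2 - (1 / 2 + 1) + 1 / 8)     # squares per column / clusters per column
print(simulated_avg, exact_avg)
```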