##### Why use mathematics to model?
<!-- http://philsci-archive.pitt.edu/16635/1/Greslehner_2018_41st_Wittgenstein_Symposium.pdf -->

The complexity of systems such as those in biology calls for mathematical methods. Indeed, 'systems biology' is now a discipline in its own right.


# Modelling philosophy

##### Simplicity is a priority!! (at least first)

Occam's razor: the principle of parsimony. "Entities must not be multiplied beyond necessity"

Don't get tempted to do so much that we can't unpack the whys

Don't use two parameters if one will do. Don't look at three dimensions until you have mastered one and two dimensions

We don't want an exact replica of the system - we already have that, which is why we need a model! (Shameless plug: [My TedX talk](https://www.youtube.com/watch?v=Pgm65hYglpg))

Schelling: segregation is obviously a very complex problem but the model used to explore how segregation could emerge was very simple. 

### Model assumptions

Simplicity requires assumptions, which are necessary with complex systems. Without them (good) modelling is not feasible

Assumptions abstract away unnecessary details that may not significantly affect the behaviour of the system being modelled. This helps reveal the underlying patterns and dynamics that are essential for understanding the system's behaviour and making predictions

**Assumptions help to:**
- manage inevitable uncertainties, allowing for sensitivity analyses and exploration of different scenarios
- generalise a model such that it is more broadly applicable and can provide insights across different contexts
- ensure model behaviour is interpretable

The key to making good assumptions is to be transparent about them and to understand their impact

Schelling: one and then two dimensions, a grid, two agent types, a single reason to move, a single focus parameter, etc. Very simple, and yet the behaviour is highly complex!

### Model components

- **Independent variable:** usually time as we tend to think about systems evolving in time (even if there are multiple independent variables as with wave propagation and heat diffusion). This is our way of thinking about causality — a system's state depends on its previous states in time. Time can be discrete, or aligned with events

- **Structure:** rules of the system. Format of the rules can be sentences, algorithms, equations ((O, S, P)DEs, difference equations)

- **Data:** serves as an input to the rules. Could be as simple as an initial condition but could be continuously interacting. Real or contrived

- **Inputs:** parameters, degrees of freedom

- **Outputs:** Various abstractions of complete behaviour. You'll have to work hard to explore the output in a dynamic way (interacting with the system to understand it)
<!--     - Most analysis is static
    - Transformation - change of perspective
    - Metric to summarise
    - Whatever we do here we aim for insight -->

##### Modelling is a dynamic and interactive process that lets us view the system from many angles

Insight comes from interacting with a model. If you take away only one thing from this unit, I hope it's this!

<!-- We do not simply build a model and step back to admire our work -->
<center>
<img src="LadderOfAbstraction.png" width="400"/>
</center>

[Bret Victor's Up and down the ladder of abstraction](http://worrydream.com/LadderOfAbstraction/):


> *"If you move to a new city, you might learn the territory by walking around. Or you might peruse a map. But far more effective than either is both together — a street-level experience with higher-level guidance. Likewise, the most powerful way to gain insight into a system is by moving between levels of abstraction."*

Building these different levels of perspective and moving between them is what (good) modelling is. Climb up to get a big picture and see high-level patterns, climb down to explain those patterns. 

# Parable models

## The OG Parable of the Polygons model
- **World:** square grid of size $20\times 20$ with fixed boundary
<!-- could be infinite or we might define special conditions for the boundary. Discretisation could be something other than squares -->

- **Individuals/'agents':** some individuals endowed with a state 

- **State:** 
    - of agent: yellow or blue and grid location. 
    - of grid site: occupied or empty
<!--     could be selected from a set of discrete options, or be one of a continuum of possible values -->

- **Initialisation:** random configuration but we aren't sure exactly how state is assigned. In two rounds of counting the number of each: $[N_e, N_y, N_b]=[61, 157, 182], [80, 153, 167]$. This is a good lesson on reproducibility! 
<!-- Could be anything provided it is within the confines of the world and state definitions above. We will frequently initialise things randomly. -->

- **Neighbourhood:** occupied sites in Moore neighbourhood, not including self. 
<!-- (typically) local interaction network for an individual. Symmetric/asymmetric, include one self/or not.  -->

- **Dynamics:** "I wanna move if less than 1/3 of my neighbors are like me." I get to move if randomly selected to do so. My move is to a random unoccupied location.
<!-- How the state changes. e.g. equation, rule or algorithm. The system rules are often described algorithmically rather than solely through mathematical formulations, which gives flexibility and an ability to implement more complex rules.

Flexibility comes at expense of analytical tractability and the options for analysis tend to be statistical.
-->

- **Analysis:** 'Segregation' evolving in time but exact definition wasn't provided (grrrr: this is bad even for sci-comms).
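
The reproducibility point can be made concrete. The sketch below is not how the Parable of the Polygons initialises its grid (that is unknown). It assumes each site is drawn independently with made-up probabilities, and it fixes a seed so that the counts $[N_e, N_y, N_b]$ can be reproduced exactly. The function names and the 0/1/2 encoding are illustrative choices.

```python
import numpy as np

def random_grid(grid_size, probabilities, seed):
    """Return a grid of 0 (empty), 1 (yellow), 2 (blue) drawn site by site.

    The probabilities and the encoding are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)  # seeded generator: same seed, same grid
    return rng.choice([0, 1, 2], size=(grid_size, grid_size), p=probabilities)

def count_states(grid):
    """Return [N_e, N_y, N_b] for a grid encoded as 0/1/2."""
    return [int(np.count_nonzero(grid == state)) for state in (0, 1, 2)]

grid_a = random_grid(20, [0.2, 0.4, 0.4], seed=1)
grid_b = random_grid(20, [0.2, 0.4, 0.4], seed=1)
print(count_states(grid_a))            # counts for this seed
print(np.array_equal(grid_a, grid_b))  # True: the run can be repeated exactly
```

Reporting the seed (or the initial grid itself) alongside the results is what lets someone else recount and get the same numbers.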

## Our model of the Parable of the Polygons

Let's keep it similar to the Parable of Polygons but be more explicit about our assumptions:
- **World:** square grid of size $20\times 20$ with fixed boundary
- **Individuals:** quantity depends on initialisation
- **State:** of individual is ({yellow, blue}, location) 
- **Initialisation:** each grid site is randomly assigned as
    - empty with probability <font color='red'>$p_e$</font>  
    - remaining sites are either blue or yellow, chosen by flipping a coin
- **Neighbourhood:** <font color='red'>Moore, not including self</font>. Entirely summarised by similarity ratio $\frac{n_s}{n}$, where $n_s$ is the number of similar neighbours in the neighbourhood and $n \in \{1,\ldots,8\}$ is the total number of neighbours
<!-- On average $n$ depends on the density. With $n_s$ similar neighbours in the neighbourhood the possible similarity ratios are: $\frac{n_s}{n}, \ldots$. -->
- **Dynamics:** Based on local neighbourhood similarity ratio and a threshold $\theta_r$. Agent is unhappy and will want to move if <font color='red'> $\frac{n_s}{n}<\theta_r$</font>. At each step, a randomly selected unhappy agent moves to a random empty location
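
The neighbourhood and dynamics rules can be sketched for a single agent. This is a minimal illustration, assuming the grid is a NumPy array with 0 for empty, 1 for yellow and 2 for blue, and that the fixed boundary simply truncates the neighbourhood. The function names are hypothetical, and treating an agent with no neighbours as content is an assumption (the function in the Code section of these notes treats such an agent as dissatisfied instead).

```python
import numpy as np

def similarity_ratio(grid, i, j):
    """Return n_s / n for the agent at (i, j), or None if it has no neighbours.

    Assumes site (i, j) is occupied.
    """
    rows, cols = grid.shape
    # Moore neighbourhood, truncated at the fixed boundary
    block = grid[max(0, i - 1):min(rows, i + 2), max(0, j - 1):min(cols, j + 2)]
    n = np.count_nonzero(block) - 1                  # occupied neighbours, excluding self
    n_s = np.count_nonzero(block == grid[i, j]) - 1  # similar neighbours, excluding self
    return None if n == 0 else n_s / n

def is_unhappy(grid, i, j, theta_r):
    """An agent wants to move if its similarity ratio is below theta_r."""
    ratio = similarity_ratio(grid, i, j)
    # Assumption: an agent with no neighbours is treated as content here
    return ratio is not None and ratio < theta_r

grid = np.array([[1, 2, 0],
                 [2, 1, 2],
                 [0, 2, 2]])
print(similarity_ratio(grid, 1, 1))   # 1 similar neighbour out of 6
print(is_unhappy(grid, 1, 1, 1 / 3))  # True, since 1/6 < 1/3
```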
<!-- - **Analysis:** We will track the <font color='red'> mean similarity ratio</font> -->

### Qualitative analysis
We don't need the bells and whistles, and of course we start small

Direct and interactive control of the independent variable (time) allows us to ensure things are functioning as we expect

<div align="center">
    <video width="380" 
           src="Schelling_grid5_1.mov"  
           controls>
    </video>
</div>

Looks like segregation... is it?

### Quantitative analysis

We want to measure the amount of segregation so we better actually define it

*Definition:* Segregation is the degree to which two or more groups live separately from one another

But groups can live apart and be 'segregated' in a variety of ways.

**What do you expect a good measure of segregation would give for the following scenarios?**
<!-- Figure for darkmode -->
<center>
<img src="SegregationPatterns.png" width="600"/>
</center>

**What are the properties a good measure of the distribution of individuals should capture?**

The example grids have varied combinations of these five distributional characteristics (or other 'dimensions' depending on how you define them):
- evenness: how a given population group is spread, maximised when the local distribution reflects the global distribution
- exposure: the amount of contact that occurs with other population groups
- clustering: the number of groups of a given population
- concentration: captures the physical space occupied
- centralisation: around an urban core. 

[Massey and Denton (1988) for a systematic evaluation of 20 measures](https://academic.oup.com/sf/article/67/2/281/2231999)

<!-- <center>
<img src="SegregationAxes.png" width="400"/>
</center>(Fig adapted from here: https://www.zef.de/fileadmin/user_upload/0bda_Feitosaetal.pdf) -->


The 'dimensions' capture different facets of segregation, each with different social and behavioural implications. And of course there are many overlaps e.g. a centralised group is likely to have a high clustering.

The 'best' measure or index for segregation has been hotly debated and the reality is that segregation should probably be measured with several indices simultaneously. Maybe this is why the Parable of the Polygons doesn't tell us exactly how they calculate segregation!

##### One option for quantifying segregation
The mean similarity ratio:
$$\mathcal{MSR}=\frac{1}{N}\sum_{i=1}^N \frac{n_s}{n},$$
where $n_s$, $n_e$ and $n$ denote the number of similar, empty and total neighbours in the Moore neighbourhood of individual $i$, and $N$ is the number of individuals with at least one neighbour. Hence, away from the boundary, $n=9-1-n_e$
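
As a hand check of the formula, consider a fully occupied $6\times 6$ checkerboard, assuming the fixed boundary simply truncates the neighbourhood. The 16 interior individuals have 4 similar neighbours (the diagonals) out of 8, the 16 non-corner edge individuals have 2 out of 5, and the 4 corner individuals have 1 out of 3:

$$\mathcal{MSR}=\frac{1}{36}\left(16\cdot\frac{4}{8}+16\cdot\frac{2}{5}+4\cdot\frac{1}{3}\right)=\frac{1}{36}\cdot\frac{236}{15}\approx 0.437.$$

So even a perfectly alternating pattern does not score 0 on this measure.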

<font color='red'> This is what we will *choose*</font> (and what Schelling used). But there are many other choices

<!-- As $N\rightarrow \infty$, the minimum $\mathcal{S}$ (perfect mixing) approaches 0.5. But smaller values can occur for finite grids. -->

We'd prefer a measure that varies between 0 and 1, where 0 indicates a completely integrated, diverse population and 1 indicates complete segregation. Such a measure is easier to interpret and more directly comparable to other measures (the MSR can be normalised to achieve this).
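
One possible normalisation is sketched below. It assumes that a well-mixed population of two equally sized groups has an MSR of about 0.5, so 0.5 is used as the 'fully integrated' baseline, and it rescales linearly and clips at 0. The baseline and the function name are illustrative choices, not part of the model.

```python
def normalise_msr(msr, baseline=0.5):
    """Rescale an MSR so that `baseline` maps to 0 and 1 stays at 1.

    Values below the baseline (possible on small grids) are clipped to 0.
    """
    return max(0.0, (msr - baseline) / (1.0 - baseline))

print(normalise_msr(0.5))   # 0.0
print(normalise_msr(0.75))  # 0.5
print(normalise_msr(1.0))   # 1.0
```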

Returning to our example grids:

<center>
<img src="SegregationPatterns_MSR.png" width="600"/>
</center>

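In [2]:
# Note: 0 is considered an empty cell and is not included in the computation of MSR.
# Requires get_mean_similarity_ratio (see "Function definitions" in the Code section) to be defined first.
import numpy as np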

grid1 = np.array([[2, 2, 2, 2, 2, 2],
                 [2, 1, 1, 1, 1, 2],
                 [2, 1, 1, 1, 1, 2],
                 [2, 1, 1, 1, 1, 2],
                 [2, 1, 1, 1, 1, 2],
                 [2, 2, 2, 2, 2, 2]])
                 
grid2 = np.array([[2, 1, 2, 1, 2, 1],
                 [1, 2, 1, 2, 1, 2],
                 [2, 1, 2, 1, 2, 1],
                 [1, 2, 1, 2, 1, 2],
                 [2, 1, 2, 1, 2, 1],
                 [1, 2, 1, 2, 1, 2]])

grid3 = np.array([[1, 1, 1, 1, 1, 1],
                 [2, 2, 2, 2, 2, 2],
                 [2, 2, 2, 2, 2, 2],
                 [2, 2, 2, 2, 2, 2],
                 [2, 2, 2, 2, 2, 2],
                 [1, 1, 1, 1, 1, 1]])
                 
grid4 = np.array([[1, 1, 1, 2, 2, 2],
                 [1, 1, 1, 2, 2, 2],
                 [1, 1, 1, 2, 2, 2],
                 [1, 1, 1, 2, 2, 2],
                 [1, 1, 1, 2, 2, 2],
                 [1, 1, 1, 2, 2, 2]]) 
                 
grid5 = np.array([[1, 1, 1, 2, 1, 1],
                 [1, 2, 2, 2, 2, 1],
                 [2, 2, 1, 1, 2, 2],
                 [1, 2, 1, 2, 1, 2],
                 [2, 2, 2, 2, 1, 1],
                 [1, 1, 1, 2, 2, 1]])  

MSR1 = get_mean_similarity_ratio(grid1)
MSR2 = get_mean_similarity_ratio(grid2)
MSR3 = get_mean_similarity_ratio(grid3)
MSR4 = get_mean_similarity_ratio(grid4)
MSR5 = get_mean_similarity_ratio(grid5)

print("Mean similarity ratios:", MSR1, MSR2, MSR3, MSR4, MSR5)

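
For reference, the calculation in the cell above can be reproduced in a self-contained way. The function below is a compact stand-in for `get_mean_similarity_ratio` (defined in the Code section), written under the same conventions: 0 is empty, the boundary truncates the neighbourhood, and individuals with no neighbours are skipped. The name `mean_similarity_ratio` is hypothetical, and only two of the example patterns are rebuilt here.

```python
import numpy as np

def mean_similarity_ratio(grid):
    """Average n_s / n over all individuals that have at least one neighbour."""
    rows, cols = grid.shape
    ratios = []
    for i in range(rows):
        for j in range(cols):
            if grid[i, j] == 0:
                continue  # empty site
            block = grid[max(0, i - 1):min(rows, i + 2), max(0, j - 1):min(cols, j + 2)]
            n = np.count_nonzero(block) - 1                  # occupied neighbours
            n_s = np.count_nonzero(block == grid[i, j]) - 1  # similar neighbours
            if n > 0:
                ratios.append(n_s / n)
    return float(np.mean(ratios)) if ratios else 0.0

checkerboard = np.indices((6, 6)).sum(axis=0) % 2 + 1  # alternating 1s and 2s
halves = np.repeat([[1, 1, 1, 2, 2, 2]], 6, axis=0)    # left half 1s, right half 2s

print(round(mean_similarity_ratio(checkerboard), 3))  # 0.437
print(round(mean_similarity_ratio(halves), 3))        # 0.872
```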

Multiple runs:

<div align="center">
    <video width="800"  
           src="Schelling_grid5_MSR_2.mov"  
           controls>
    </video>
</div>

A bigger grid:
<div align="center">
    <video width="800"  
           src="Schelling_grid20_MSR_3.mov"  
           controls>
    </video>
</div>


## The parable II

Now what...?

We have replicated the Parable of the Polygons
<!-- A few simulations are never going to tell us what happens but by playing with the simulations we can gain a holistic understanding of what's going to happen in different scenarios. -->

It's not even close to being enough.

### An ensemble of simulations
We've looked at a few simulations one-by-one 

Let's look at many, all at once. i.e. we can abstract over many initialisations (up the ladder)

This is called an *ensemble* of simulations.
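
One way to summarise an ensemble is sketched below. It assumes each run is stored as a list of MSR values (one per step) and that runs can stop at different times, so each series is padded with its final value before averaging across runs. The three short series are made-up numbers for illustration, not simulation output, and `ensemble_mean` is a hypothetical helper.

```python
import numpy as np

def ensemble_mean(series_list):
    """Mean MSR per step across runs of different lengths.

    Shorter runs are padded with their final value, on the assumption that a
    run that has stopped stays in its final configuration.
    """
    longest = max(len(series) for series in series_list)
    padded = np.array([
        list(series) + [series[-1]] * (longest - len(series))
        for series in series_list
    ])
    return padded.mean(axis=0)

# Hypothetical MSR time series from three runs
runs = [[0.4, 0.6, 0.8],
        [0.5, 0.7],
        [0.3, 0.5, 0.6, 0.9]]
print(ensemble_mean(runs))  # one mean value per step, 4 steps in total
```

The same padding makes it possible to compute a spread (for example a standard deviation per step) next to the mean.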

In [8]:
show(MSR_lines)


<center>
<img src="MSR_lines.png" width="600"/>
</center>

### Varying important parameters
Why is the neighbourhood threshold 33%?

It's a good start but this is clearly a crucial parameter so we must understand how the system behaviour changes when it does.

In [9]:
# Remove the x-axis label of MSR_lines (top plot); label the x-axis of MSR_lines2 (bottom plot)
MSR_lines.xaxis.axis_label = ""
MSR_lines2.xaxis.axis_label = "Simulation steps"
MSR_lines.yaxis.axis_label = "MSR"
MSR_lines2.yaxis.axis_label = "MSR"
# Change the size of plot1
MSR_lines.width = 800  # New width in pixels
MSR_lines.height = 250  # New height in pixels

# Change the size of plot2
MSR_lines2.width = 800  # New width in pixels
MSR_lines2.height = 250  # New height in pixels

# Now show the modified plots
show(column(MSR_lines, MSR_lines2))


A different neighbourhood threshold:
<center>
<img src="MSR_lines_x2.png" width="550"/>
</center>

But now we effectively have two slider bars and things are getting a little complicated.

What do you see that's worth diving into further?

<!-- - time to steady state
- MSR at steady state -->

Let's look at how the MSR changes as a function of the threshold percentage (up the ladder)

This is called a *parameter sweep*.
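
A parameter sweep can be sketched in a few lines. The code below is a compact, self-contained stand-in for the simulation in the Code section, not a copy of it: the threshold is a fraction rather than a percentage, individuals with no neighbours are treated as content, and the grid size, empty probability, number of runs and move limit are small illustrative values. The names `similarity_ratios` and `final_msr` are hypothetical.

```python
import numpy as np

def similarity_ratios(grid):
    """Return {(i, j): n_s / n} for every individual with at least one neighbour."""
    rows, cols = grid.shape
    ratios = {}
    for i, j in zip(*np.nonzero(grid)):
        block = grid[max(0, i - 1):min(rows, i + 2), max(0, j - 1):min(cols, j + 2)]
        n = np.count_nonzero(block) - 1  # occupied neighbours, excluding self
        if n > 0:
            ratios[(i, j)] = (np.count_nonzero(block == grid[i, j]) - 1) / n
    return ratios

def final_msr(threshold, rng, grid_size=8, p_empty=0.2, max_moves=200):
    """Run one small simulation and return the MSR when it stops."""
    p_agent = (1 - p_empty) / 2
    grid = rng.choice([0, 1, 2], size=(grid_size, grid_size), p=[p_empty, p_agent, p_agent])
    for _ in range(max_moves):
        ratios = similarity_ratios(grid)
        unhappy = [site for site, ratio in ratios.items() if ratio < threshold]
        empty = np.argwhere(grid == 0)
        if not unhappy or len(empty) == 0:
            break  # nobody wants to move, or there is nowhere to go
        i, j = unhappy[rng.integers(len(unhappy))]  # random unhappy individual
        k, l = empty[rng.integers(len(empty))]      # random empty site
        grid[k, l], grid[i, j] = grid[i, j], 0      # move the individual
    ratios = similarity_ratios(grid)
    return float(np.mean(list(ratios.values()))) if ratios else 0.0

rng = np.random.default_rng(0)
thresholds = [0.0, 0.25, 0.5, 0.75]
n_runs = 5  # small ensemble per threshold
sweep = {theta: np.mean([final_msr(theta, rng) for _ in range(n_runs)]) for theta in thresholds}
for theta, msr in sweep.items():
    print(f"threshold {theta:.2f}: mean final MSR {msr:.2f}")
```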

Is this what you expected? Why does the amount of segregation decrease as the individuals become very homophilic?

What's happening when the threshold is high? Back down the ladder... 
<div align="center">
    <video width="800"  
           src="Schelling_grid5_Threshold80_4.mov"  
           controls>
    </video>
</div>

Back up the ladder to get a big picture of the system's difficulty reaching a steady configuration:

Note that I've set the maximum number of iterations here to 500. We'll now need to let this be larger to see if the plot trend continues/what its shape is, or come up with a smarter way to understand the stability of the system (e.g. we could look at something like the mean individual unhappiness).
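
The 'mean individual unhappiness' idea could be sketched as the fraction of individuals whose similarity ratio is below the threshold. This is one possible definition, not necessarily the one intended above. The function name, the fractional threshold and the choice to count individuals with no neighbours as content are assumptions.

```python
import numpy as np

def fraction_unhappy(grid, threshold):
    """Fraction of individuals with similarity ratio n_s / n below `threshold`."""
    rows, cols = grid.shape
    n_agents = np.count_nonzero(grid)
    n_unhappy = 0
    for i, j in zip(*np.nonzero(grid)):
        block = grid[max(0, i - 1):min(rows, i + 2), max(0, j - 1):min(cols, j + 2)]
        n = np.count_nonzero(block) - 1                  # occupied neighbours
        n_s = np.count_nonzero(block == grid[i, j]) - 1  # similar neighbours
        if n > 0 and n_s / n < threshold:
            n_unhappy += 1
    return n_unhappy / n_agents if n_agents else 0.0

checkerboard = np.indices((6, 6)).sum(axis=0) % 2 + 1
print(fraction_unhappy(checkerboard, 0.5))  # 20 of 36 individuals: the edges and corners
print(fraction_unhappy(checkerboard, 0.3))  # 0.0: every ratio is at least 1/3
```

Tracked over time, a value that stays above zero indicates the system has not settled, regardless of the iteration limit.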

### How could we continue to improve our understanding?
A larger grid...? 
- More moves will be required to reach a stable configuration. Can we get a feel for how many moves before we do a whole lot of computation that is insufficient/excessive?
    - Plot of threshold vs mean time to steady state for various grid sizes
    - The maximum number of iterations could be a function of the threshold

More thresholds...?
- They're not all equally interesting.
    - We can do a finer graining of the threshold around the transitions 
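
One reason the thresholds are not all equally interesting: with at most 8 neighbours, the similarity ratio $\frac{n_s}{n}$ can only take finitely many values, so any two thresholds that fall between the same pair of consecutive values classify every individual identically. A short sketch that enumerates those values, assuming the Moore neighbourhood used above:

```python
from fractions import Fraction

# Every ratio n_s / n that an individual with 1 to 8 neighbours can have
possible_ratios = sorted({Fraction(n_s, n) for n in range(1, 9) for n_s in range(n + 1)})

print(len(possible_ratios))                   # 23 distinct values
print([str(r) for r in possible_ratios[:6]])  # ['0', '1/8', '1/7', '1/6', '1/5', '1/4']
```

Under this assumption a sweep only needs one threshold per gap between consecutive values.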

Time for us to have a break, and our machines to go to work

<center>
<img src="MidJ_Coffee_Heimo_Zobernig.png" width="600"/>
</center> 
<div style="text-align: right"> - style of Heimo Zobernig </div>

This is why you must get started on your Projects early!

## Now we play...
- More agent types
- Noisy options for movement
- Heterogeneity in threshold
- Underlying topology of map
- ...

# Next session: 
- Your turn! You'll implement your own version of the Schelling model during the Workshop 
- Guest lecture from Professor Michael Small

# Code

## Setup

### Import stuff

In [3]:
import random
import numpy as np


# from bokeh.io import output_notebook, show
from bokeh.plotting import figure, show, output_notebook, save, output_file
from bokeh.models import ColumnDataSource, ColorBar, LinearColorMapper, Button, Div, GlyphRenderer, HoverTool
from bokeh.layouts import column, row
from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application

### Function definitions

In [4]:
# Function to find dissatisfied indices
def find_dissatisfied_indices(grid):
    # Indices of dissatisfied agents
    dissatisfied_indices = []

    for idx in range(grid_size ** 2):
        i, j = divmod(idx, grid_size)

        if grid[i, j] != 0:  # Skip empty cells
            neighborhood = grid[max(0, i - 1):min(grid_size, i + 2), max(0, j - 1):min(grid_size, j + 2)]
            neighborhood_size = neighborhood.size
            similar_neighbors = np.count_nonzero(neighborhood == grid[i, j]) - 1  # Exclude self
            empty_neighbors = np.count_nonzero(neighborhood == 0)
            total_neighbors = neighborhood_size - empty_neighbors - 1  # Exclude self
            
            # Handle the case where an agent has no neighbors
            if total_neighbors == 0:
                local_mix = 0
            else:
                local_mix = 100 * similar_neighbors / total_neighbors
            
            if local_mix < threshold_percentage:
                dissatisfied_indices.append(idx)

    return dissatisfied_indices





def is_steady_state(previous_grid, current_grid):
    return np.array_equal(previous_grid, current_grid)

In [5]:
def run_simulations(N_sims, grid_size, all_states_percentages, max_iterations):
    
    # Initialize a list to store all mean similarity ratio time series for each simulation
    all_mean_similarity_ratios = []

    for k in range(N_sims):
#         print('k:', k)
        # Reset the grid to a random initial state
        grid = np.random.choice([0, 1, 2], size=(grid_size, grid_size), p=all_states_percentages)

        # ensure grid wasn't initialised with no free space for movement
        non_zero_count = np.count_nonzero(grid)
        while non_zero_count==grid_size**2:
            grid = np.random.choice([0, 1, 2], size=(grid_size, grid_size), p=all_states_percentages)
            non_zero_count = np.count_nonzero(grid)
            
        # Initialize previous_grid for steady state detection
        previous_grid = np.copy(grid)

        # Initialize a list to track the mean similarity ratio for each update step
        mean_similarity_ratios = [get_mean_similarity_ratio(grid)]

        # Run the simulation until a steady state is reached or a maximum number of iterations is reached
        iterations = 0

        while iterations < max_iterations:
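            # NB: find_dissatisfied_indices reads the global grid_size and threshold_percentage,
            # so the grid_size argument of this function must match the global value
            dissatisfied_indices = find_dissatisfied_indices(grid)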

            if not dissatisfied_indices:  # Check if the list is empty
                break

            selected_idx = random.choice(dissatisfied_indices)  # Randomly select a dissatisfied index

            # Update the grid by using the selected_idx
            i, j = divmod(selected_idx, grid_size)
            empty_cells = np.argwhere(grid == 0)
            if empty_cells.shape[0] > 0:
                new_grid = np.copy(grid)  # Define new_grid and initialize it with a copy of the current grid
                # Randomly select a new location among the available free locations
                random_index = random.randint(0, empty_cells.shape[0] - 1)
                new_location = empty_cells[random_index]

                new_grid[new_location[0], new_location[1]] = grid[i, j]
                new_grid[i, j] = 0  # Empty the previous location

                grid = new_grid

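                # Check for steady state (a move always changes the grid, so in practice
                # the loop ends when the list of dissatisfied agents is empty)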
                if is_steady_state(previous_grid, grid):
                    break

                previous_grid = np.copy(grid)  # Update previous_grid for steady state detection

                iterations += 1

                # Calculate mean similarity ratio for the current state and add to the list
                mean_similarity_ratios.append(get_mean_similarity_ratio(grid))


        # Add the mean similarity ratio time series to the list for this simulation
        all_mean_similarity_ratios.append(mean_similarity_ratios)
    
    return all_mean_similarity_ratios

In [6]:
#FOR DARKMODE>>

def darkmode(p):
    # Set the background fill alpha to make it transparent (0 means completely transparent, 1 means opaque)
    p.background_fill_alpha = 0.0

    # Set the figure border fill alpha to make it transparent
    p.border_fill_alpha = 0.0

    # Set the text color for the x-axis label to white
    p.xaxis.axis_label_text_color = "white"

    # Set the text color for the y-axis label to white
    p.yaxis.axis_label_text_color = "white"

    # Set the text color for the title to white
    p.title.text_color = "white"

    
    try:
        # Get the legend object from the figure
        legend = p.legend[0]
        # If the legend exists, apply the formatting options

        # Set the background fill alpha of the legend to make it transparent (0 means completely transparent, 1 means opaque)
        legend.background_fill_alpha = 0.0

        # Set the text color for the legend labels to white
        legend.label_text_color = "white"

        # Set the font size of the legend labels
        legend.label_text_font_size = "14pt"

        # Set the border line alpha of the legend to make the border transparent when muted (0 means completely transparent, 1 means opaque)
        legend.border_line_alpha = 0.0

    except IndexError:
        pass


    # Set the text color for the x-axis major ticks to white
    p.xaxis.major_label_text_color = "white"

    # Set the text color for the y-axis major ticks to white
    p.yaxis.major_label_text_color = "white"

    # Set the color for the x-axis major tick marks to white
    p.xaxis.major_tick_line_color = "white"

    # Set the color for the y-axis major tick marks to white
    p.yaxis.major_tick_line_color = "white"

    # Set the color for the x-axis minor tick marks to white (optional)
    p.xaxis.minor_tick_line_color = "white"

    # Set the color for the y-axis minor tick marks to white (optional)
    p.yaxis.minor_tick_line_color = "white"

    # Set the color for the x-axis line to white
    p.xaxis.axis_line_color = "white"

    # Set the color for the y-axis line to white
    p.yaxis.axis_line_color = "white"


    #MAKE PRETTY FOR SLIDES...
    # Increase the size of the x-axis label
    p.xaxis.axis_label_text_font_size = "16pt"

    # Increase the size of the y-axis label
    p.yaxis.axis_label_text_font_size = "16pt"

    # Increase the size of the plot title
    p.title.text_font_size = "20pt"

    # Increase the size of the x-axis tick labels
    p.xaxis.major_label_text_font_size = "14pt"

    # Increase the size of the y-axis tick labels
    p.yaxis.major_label_text_font_size = "14pt"
    
    return p

In [7]:
def get_mean_similarity_ratio(grid):
    count = 0
    similarity_ratio = 0
    grid_size = int(np.sqrt(np.size(grid)))

    for idx in range(grid_size ** 2):
        i, j = divmod(idx, grid_size)
        race = grid[i, j]
        if race != 0:
            neighborhood = grid[max(0, i - 1):min(grid_size, i + 2), max(0, j - 1):min(grid_size, j + 2)]
            neighborhood_size = np.size(neighborhood)
            n_empty_houses = len(np.where(neighborhood == 0)[0])
            if neighborhood_size != n_empty_houses + 1:
                n_similar = len(np.where(neighborhood == race)[0]) - 1
                similarity_ratio += n_similar / (neighborhood_size - n_empty_houses - 1.)
                count += 1
    return similarity_ratio / max(1,count)

### Model parameters


In [7]:
# NB: this doesn't seem to be working with Bokeh...?

# Set the seed for Python random number generator
seed_value = 1
random.seed(seed_value)

# Set the seed for numpy random number generator
np.random.seed(seed_value)


In [8]:
grid_size = 5
threshold_percentage = 50  # The minimum percentage of similar neighbors an agent desires
empty_percentage = 0.15  # Probability that a site is empty (a fraction, despite the name)
all_states_percentages = [empty_percentage, (1-empty_percentage)/2, (1-empty_percentage)/2] #[empty, yellow, blue]

### Save images (NOT WORKING)

In [14]:
from bokeh.plotting import figure
from bokeh.io import export_svgs
import svglib.svglib as svglib
from reportlab.graphics import renderPDF

test_name = 'bokeh_to_pdf_test'

# Example plot p
p = figure(width=400, height=400, tools="")
p.circle(list(range(1,6)),[2, 5, 8, 2, 7], size=10)
# See comment 1
p.xaxis.axis_label_standoff = 12
p.xaxis.major_label_standoff = 12

# step 1: bokeh save as svg
p.output_backend = "svg"
export_svgs(p, filename = test_name + '.svg')

# see comment 2
svglib.register_font('helvetica', '/home/fonts/Helvetica.ttf')
# step 2: read in svg
svg = svglib.svg2rlg(test_name+".svg")

# step 3: save as pdf
renderPDF.drawToFile(svg, test_name+".pdf")

The version of chrome cannot be detected. Trying with latest driver version


RuntimeError: Neither firefox and geckodriver nor a variant of chromium browser and chromedriver are available on system PATH. You can install the former with 'conda install -c conda-forge firefox geckodriver'.

<center>
<img src="MSR_lines.pdf" width="600"/>
</center>

## Simulation details

### Qualitative analysis

In [9]:
#Schelling simulation with Update and Back functionality
#NB: Update->Back->Update doesn't work as expected. 
grid_size = 5
threshold_percentage = 80

# Output the plot to the notebook
output_notebook()

# Create the grid
grid = np.random.choice([0, 1, 2], size=(grid_size, grid_size), p=all_states_percentages)
color_mapper = LinearColorMapper(palette=['white', 'yellow', 'blue'], low=0, high=2)

# Data sources for Bokeh
x, y = np.meshgrid(np.arange(grid_size), np.arange(grid_size))
source = ColumnDataSource(data=dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten()))


# Create a separate ColumnDataSource for dissatisfied individuals' crosses
dissatisfied_source = ColumnDataSource(data=dict(x=[], y=[]))
red_crosses_source = ColumnDataSource(data=dict(x=[], y=[]))

# Create a list to keep track of grid states and an index variable
grid_states = [grid]
current_state_index = 0

# Initialize a list to track the mean similarity ratio for each update step
mean_similarity_ratios = [get_mean_similarity_ratio(grid)]

# Variable to store the selected dissatisfied index
selected_idx = None

# Function to update the grid and redraw the plot
def update_grid():
    global grid, current_state_index, grid_states

    dissatisfied_indices = find_dissatisfied_indices(grid)
    red_crosses_source.data = dict(x=[], y=[])  # Clear previous dissatisfied agent data

    if not dissatisfied_indices:  # Check if the list is empty
        return

#     selected_idx = dissatisfied_indices[0]  # Use the first dissatisfied index
    selected_idx = random.choice(dissatisfied_indices)  # Randomly select a dissatisfied index

    # Update the grid by using the selected_idx
    i, j = divmod(selected_idx, grid_size)
    empty_cells = np.argwhere(grid == 0)
    if empty_cells.shape[0] > 0:
        new_grid = np.copy(grid)  # Define new_grid and initialize it with a copy of the current grid
#         new_location = empty_cells[0]
        # Randomly select a new location among the available free locations
        random_index = random.randint(0, empty_cells.shape[0] - 1)
        new_location = empty_cells[random_index]
        
        new_grid[new_location[0], new_location[1]] = grid[i, j]
        new_grid[i, j] = 0  # Empty the previous location

        grid = new_grid

        # Find the indices of the next possible dissatisfied agents
        next_dissatisfied_indices = find_dissatisfied_indices(grid)

        # Update the red crosses data
        next_dissatisfied_x = [idx % grid_size for idx in next_dissatisfied_indices]
        next_dissatisfied_y = [idx // grid_size for idx in next_dissatisfied_indices]
        red_crosses_source.data = dict(x=next_dissatisfied_x, y=next_dissatisfied_y)

        # Update the mean similarity ratio data
        mean_similarity_ratios.append(get_mean_similarity_ratio(grid))
        mean_similarity_source.data = dict(x=list(range(len(mean_similarity_ratios))), y=mean_similarity_ratios)

    # Save the new grid state and update the current state index
    current_state_index += 1
    if current_state_index < len(grid_states):
        grid_states[current_state_index] = grid
    else:
        grid_states.append(grid)

    # Trim the grid_states list to remove any forward history when 'back' button is used
    grid_states = grid_states[:current_state_index + 1]

    # Update the data source with the new grid state
    source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())

# Function to move one step back in the grid state history
def move_back():
    global current_state_index, grid, source, grid_states, red_crosses_source, mean_similarity_ratios
    if current_state_index > 0:
        current_state_index -= 1
        grid = np.copy(grid_states[current_state_index])
        source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())
        
        # Update the red crosses data based on the current grid state
        dissatisfied_indices = find_dissatisfied_indices(grid)
        next_dissatisfied_x = [idx % grid_size for idx in dissatisfied_indices]
        next_dissatisfied_y = [idx // grid_size for idx in dissatisfied_indices]
        red_crosses_source.data = dict(x=next_dissatisfied_x, y=next_dissatisfied_y)

        # Update the mean_similarity_ratios and mean_similarity_source
        mean_similarity_ratios.pop()  # Remove the mean similarity ratio of the current state
        mean_similarity_source.data = dict(x=list(range(len(mean_similarity_ratios))), y=mean_similarity_ratios)


# Create the Bokeh figure for the grid plot
plot_title = f"Threshold percentage: {threshold_percentage}%"
p = figure(width=400, height=400, toolbar_location=None, x_range=(-0.5, grid_size - 0.5), y_range=(-0.5, grid_size - 0.5))
p.rect(x='x', y='y', width=1, height=1, source=source, line_color='black', fill_color={'field': 'color', 'transform': color_mapper})

# Mark dissatisfied individuals with a red square at the centre of their grid cell
p.square(x='x', y='y', size=15, line_color='red', fill_color='red', source=red_crosses_source)

# Add text annotations to label each grid cell with its index
# p.text(x='x', y='y', text='index', text_baseline='middle', text_align='center', source=source, text_color='black', text_font_size='10pt')

# Create the Bokeh figure for the mean similarity ratio plot
mean_similarity_source = ColumnDataSource(data=dict(x=[0], y=[mean_similarity_ratios[0]]))
mean_similarity_plot = figure(width=400, height=400, x_axis_label="Simulation steps", y_axis_label="Mean similarity ratio (MSR)")
mean_similarity_plot.line('x', 'y', source=mean_similarity_source, line_color='lightblue', line_width=3)

# Create the Bokeh button for updating the grid
button_update = Button(label="Update")
button_update.on_click(update_grid)

# Create the Bokeh button for moving one step back in the grid state history
button_back = Button(label="Back")
button_back.on_click(move_back)

p = darkmode(p)
mean_similarity_plot = darkmode(mean_similarity_plot)

# Create a layout for the plot, buttons, and title
title_div = Div(text=f"<h1>{plot_title}</h1>", width=400, height=50)
grid_plot_layout = row(p, mean_similarity_plot)
buttons_layout = row(button_back, button_update)
layout = column(title_div, grid_plot_layout, buttons_layout)

# Function to create the Bokeh app
def modify_doc(doc):
    doc.add_root(layout)

# Create the Bokeh app
app = Application(FunctionHandler(modify_doc))

show(app, notebook_url='localhost:8888')

In [40]:
#Reset functionality for multiple grids...
#Would prefer formatting of mean_similarity_plot to remain when 'Reset'

grid_size = 20 #5
threshold_percentage = 80
max_iterations=10000 #100 Increase the number of iterations for larger grid

# Create the grid
grid = np.random.choice([0, 1, 2], size=(grid_size, grid_size), p=all_states_percentages)
color_mapper = LinearColorMapper(palette=['white', 'yellow', 'blue'], low=0, high=2)

# Data sources for Bokeh
x, y = np.meshgrid(np.arange(grid_size), np.arange(grid_size))
source = ColumnDataSource(data=dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten()))


# Create a separate ColumnDataSource for dissatisfied individuals' crosses
dissatisfied_source = ColumnDataSource(data=dict(x=[], y=[]))
red_crosses_source = ColumnDataSource(data=dict(x=[], y=[]))

# Create a list to keep track of grid states and an index variable
grid_states = [grid]
current_state_index = 0


# Initialize a list to track the mean similarity ratio for each update step
mean_similarity_ratios = [get_mean_similarity_ratio(grid)]

# Initialize a list to store all mean similarity ratio time series
all_mean_similarity_ratios = [mean_similarity_ratios[:]]

# Variable to store the selected dissatisfied index
selected_idx = None

# Create the Bokeh figure for the mean similarity ratio plot
mean_similarity_source = ColumnDataSource(data=dict(x=list(range(len(mean_similarity_ratios))), y=mean_similarity_ratios))
mean_similarity_plot = figure(width=400, height=400, x_axis_label="Simulation steps", y_axis_label="Mean similarity ratio (MSR)")
mean_similarity_plot.line('x', 'y', source=mean_similarity_source, line_width=2, color='blue')  # Set the line color to blue

# Create the data source for all mean similarity ratios (including history)
all_mean_similarity_source = ColumnDataSource(data=dict(x=[0], y=[mean_similarity_ratios[0]]))

# Add the mean similarity line for the current simulation (latest run) with full opacity and blue color
mean_similarity_line = mean_similarity_plot.line('x', 'y', source=mean_similarity_source, line_width=2, color='blue')

# Function to update the grid and redraw the plot
def update_grid():
    global grid, current_state_index, grid_states, selected_idx, all_mean_similarity_source

    # Run the simulation until a steady state is reached or a maximum number of iterations is reached
    iterations = 0

    while iterations < max_iterations:
        dissatisfied_indices = find_dissatisfied_indices(grid)
        red_crosses_source.data = dict(x=[], y=[])  # Clear previous dissatisfied agent data

        if not dissatisfied_indices:  # Check if the list is empty
            break

        selected_idx = random.choice(dissatisfied_indices)  # Randomly select a dissatisfied index

        # Update the grid by using the selected_idx
        i, j = divmod(selected_idx, grid_size)
        empty_cells = np.argwhere(grid == 0)
        if empty_cells.shape[0] > 0:
            new_grid = np.copy(grid)  # Define new_grid and initialize it with a copy of the current grid
            # Randomly select a new location among the available free locations
            random_index = random.randint(0, empty_cells.shape[0] - 1)
            new_location = empty_cells[random_index]

            new_grid[new_location[0], new_location[1]] = grid[i, j]
            new_grid[i, j] = 0  # Empty the previous location



            # Find the agents that are dissatisfied after the move
            next_dissatisfied_indices = find_dissatisfied_indices(new_grid)

            # Update the red crosses data (x is the column, y is the row)
            next_dissatisfied_x = [int(idx % grid_size) for idx in next_dissatisfied_indices]
            next_dissatisfied_y = [int(idx // grid_size) for idx in next_dissatisfied_indices]
            red_crosses_source.data = dict(x=next_dissatisfied_x, y=next_dissatisfied_y)

            # Record the mean similarity ratio of the grid after the move
            mean_similarity_ratios.append(get_mean_similarity_ratio(new_grid))
            mean_similarity_source.data = dict(x=list(range(len(mean_similarity_ratios))), y=mean_similarity_ratios)

            # Check for steady state
            if is_steady_state(grid, new_grid):
                # Update the data source with the current grid state
                source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())
                break

            grid = new_grid

            iterations += 1
        else:
            # No empty cell to move into: stop instead of looping forever
            break

    # Add a message if the steady state is not reached within the threshold
#     if iterations >= max_iterations:
#         print("Steady state not reached within the maximum iterations.")

    # Save the new grid state and update the current state index
    current_state_index += 1
    if current_state_index < len(grid_states):
        grid_states[current_state_index] = grid
    else:
        grid_states.append(grid)

    # Trim the grid_states list to remove any forward history when 'back' button is used
    grid_states = grid_states[:current_state_index + 1]
    
    # Update the data source with the new grid state
    source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())

    # Append the mean similarity ratio to the list of all ratios
    all_mean_similarity_source.stream(dict(x=[current_state_index], y=[mean_similarity_ratios[-1]]))
    

# Function to reset the grid to a random initial state
def reset_grid():
    global grid, current_state_index, grid_states, mean_similarity_ratios

    # Reset the grid to a random initial state
    grid = np.random.choice([0, 1, 2], size=(grid_size, grid_size), p=all_states_percentages)

    # Reset the current state index and grid states list
    current_state_index = 0
    grid_states = [grid]

    # Restart the mean similarity ratio series from the new grid's initial value
    mean_similarity_ratios.clear()
    mean_similarity_ratios.append(get_mean_similarity_ratio(grid))

    # Update the data source with the new grid state
    source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())

    # Restart the mean similarity plot from the new initial point
    mean_similarity_source.data = dict(x=[0], y=[mean_similarity_ratios[0]])


# Create the Bokeh button for resetting the grid
button_reset = Button(label="Reset")
button_reset.on_click(reset_grid)


# Create the Bokeh figure for the grid plot
plot_title = f"Threshold percentage: {round(threshold_percentage,2)}%"
p = figure(width=400, height=400, toolbar_location=None, x_range=(-0.5, grid_size - 0.5), y_range=(-0.5, grid_size - 0.5))
p.rect(x='x', y='y', width=1, height=1, source=source, line_color='black', fill_color={'field': 'color', 'transform': color_mapper})

# Mark dissatisfied individuals with a red square at the center of their grid cell
# p.square(x='x', y='y', size=15, line_color='red', fill_color='red', source=red_crosses_source)

# Add text annotations to label each grid cell with its index
# p.text(x='x', y='y', text='index', text_baseline='middle', text_align='center', source=source, text_color='black', text_font_size='10pt')

# Create the Bokeh figure for the mean similarity ratio plot
mean_similarity_plot = figure(width=400, height=400, x_axis_label="Simulation steps", y_axis_label="Mean similarity ratio (MSR)")
mean_similarity_source = ColumnDataSource(data=dict(x=[0], y=[mean_similarity_ratios[0]]))
mean_similarity_plot.line('x', 'y', source=mean_similarity_source, line_width=2, color='blue')

# Create the Bokeh button for updating the grid
button_run = Button(label="Run")
button_run.on_click(update_grid)

p=darkmode(p)
mean_similarity_plot=darkmode(mean_similarity_plot)

# Create a layout for the plot, buttons, and title
title_div = Div(text=f"<h1>{plot_title}</h1>", width=400, height=50)
grid_plot_layout = row(p, mean_similarity_plot)
buttons_layout = row(button_run, button_reset)
layout = column(title_div, grid_plot_layout, buttons_layout)

# Function to create the Bokeh app
def modify_doc(doc):
    doc.add_root(layout)


# Create the Bokeh app
app = Application(FunctionHandler(modify_doc))

# Output the plot to the notebook
output_notebook()

show(app, notebook_url='localhost:8888')

### Quantitative analysis

In [9]:
#Ensemble runs...

threshold_percentage = 33
N_sims = 20

grid_size=20
max_iterations=300 # Increase the number of iterations

output_notebook()  # To display Bokeh plots inline in the Jupyter Notebook


# Run N_sims simulations
all_mean_similarity_ratios = run_simulations(N_sims, grid_size, all_states_percentages, max_iterations)

# Determine the maximum length among all simulations
sim_lengths = [len(ratios) for ratios in all_mean_similarity_ratios]
max_length = max(sim_lengths)

# Determine the mean simulation length
mean_sim_length = np.mean(sim_lengths)

#Determine mean final similarity ratio (MSR)
final_mean_similarity_ratio = [ratios[-1] for ratios in all_mean_similarity_ratios]
mean_final_MSR = np.mean(final_mean_similarity_ratio)

# Pad the shorter arrays with NaN values to match the maximum length
for i in range(len(all_mean_similarity_ratios)):
    curr_length = len(all_mean_similarity_ratios[i])
    if curr_length < max_length:
        padding_length = max_length - curr_length
        all_mean_similarity_ratios[i].extend([np.nan] * padding_length)

# Convert the list of lists into a 2D array
mean_similarity_array = np.array(all_mean_similarity_ratios)

# Compute the mean similarity ratio across all simulations for each step, ignoring NaN values
mean_similarity_across_simulations = np.nanmean(mean_similarity_array, axis=0)

# Create a Bokeh figure for the mean similarity ratio evolution
MSR_lines = figure(width=900, height=500, title=f"Threshold percentage: {round(threshold_percentage,2)}%", x_axis_label="Simulation steps", y_axis_label="Mean similarity ratio (MSR)")

# Add all mean similarity ratio lines to the same figure
for i, ratios in enumerate(all_mean_similarity_ratios):
    source = ColumnDataSource(data={'x': list(range(len(ratios))), 'y': ratios, 'sim_num': [i] * len(ratios)})
    line_color = 'white' 
    line_alpha = 0.8
    line = MSR_lines.line('x', 'y', source=source, line_width=2, line_alpha=line_alpha, color=line_color, legend_label="Simulations", muted_alpha=0.2)  # Add muted_alpha to control the alpha of the hidden lines
    
    # Add hover interactivity to highlight individual simulation lines
    hover = HoverTool(renderers=[line], tooltips=[("Simulation", f"Simulation {i+1}"), ("Step", "$index"), ("Mean similarity ratio", "@y")])
    MSR_lines.add_tools(hover)

# Add the mean similarity ratio line for all simulations
mean_source = ColumnDataSource(data={'x': list(range(len(mean_similarity_across_simulations))), 'y': mean_similarity_across_simulations})
mean_line = MSR_lines.line('x', 'y', source=mean_source, line_width=4, line_dash='dashed', color='plum', legend_label="Mean", muted_alpha=1, muted=True, visible=False)  # Add muted_alpha to control the alpha of the hidden line

# Add a blue vertical line at the mean simulation length
sim_length_line = MSR_lines.segment(x0=[mean_sim_length], y0=0, x1=[mean_sim_length], y1=1, line_width=4, line_color='lightblue', legend_label="Mean simulation length", muted_alpha=1, muted=True, visible=False)  # Add muted_alpha to control the alpha of the hidden line

# Add a light-salmon horizontal line at the mean final MSR
final_MSR_line = MSR_lines.segment(x0=0, y0=[mean_final_MSR], x1=max_length, y1=[mean_final_MSR], line_width=4, line_color='lightsalmon', legend_label="Mean final MSR", muted_alpha=1, muted=True, visible=False)  # Add muted_alpha to control the alpha of the hidden line


# Customize the legend
MSR_lines.legend.click_policy = "hide"  # Clicking the legend hides the corresponding line
MSR_lines.legend.label_text_font_size = "12pt"
MSR_lines.legend.location = "bottom_right"

MSR_lines = darkmode(MSR_lines)

#ADDITIONAL FORMATTING FOR THIS FIG
# Turn off both x-axis and y-axis grids
MSR_lines.xgrid.grid_line_color = None
MSR_lines.ygrid.grid_line_color = None

# Set the desired y-axis limits
y_min = 0.35  # Minimum value for the y-axis
y_max = 1  # Maximum value for the y-axis

# Using Range1d object to set y-axis limits
from bokeh.models import Range1d
MSR_lines.y_range = Range1d(start=y_min, end=y_max)

# Show the Bokeh plot inline within the Jupyter Notebook
show(MSR_lines)

In [10]:
#Ensemble runs...

threshold_percentage = 50
N_sims = 20

grid_size=20
max_iterations=300 # Increase the number of iterations

output_notebook()  # To display Bokeh plots inline in the Jupyter Notebook


# Run N_sims simulations
all_mean_similarity_ratios = run_simulations(N_sims, grid_size, all_states_percentages, max_iterations)

# Determine the maximum length among all simulations
sim_lengths = [len(ratios) for ratios in all_mean_similarity_ratios]
max_length = max(sim_lengths)

# Determine the mean simulation length
mean_sim_length = np.mean(sim_lengths)

#Determine mean final similarity ratio (MSR)
final_mean_similarity_ratio = [ratios[-1] for ratios in all_mean_similarity_ratios]
mean_final_MSR = np.mean(final_mean_similarity_ratio)

# Pad the shorter arrays with NaN values to match the maximum length
for i in range(len(all_mean_similarity_ratios)):
    curr_length = len(all_mean_similarity_ratios[i])
    if curr_length < max_length:
        padding_length = max_length - curr_length
        all_mean_similarity_ratios[i].extend([np.nan] * padding_length)

# Convert the list of lists into a 2D array
mean_similarity_array = np.array(all_mean_similarity_ratios)

# Compute the mean similarity ratio across all simulations for each step, ignoring NaN values
mean_similarity_across_simulations = np.nanmean(mean_similarity_array, axis=0)

# Create a Bokeh figure for the mean similarity ratio evolution
MSR_lines2 = figure(width=900, height=500, title=f"Threshold percentage: {round(threshold_percentage,2)}%", x_axis_label="Simulation steps", y_axis_label="Mean similarity ratio (MSR)")

# Add all mean similarity ratio lines to the same figure
for i, ratios in enumerate(all_mean_similarity_ratios):
    source = ColumnDataSource(data={'x': list(range(len(ratios))), 'y': ratios, 'sim_num': [i] * len(ratios)})
    line_color = 'white' 
    line_alpha = 0.8
    line = MSR_lines2.line('x', 'y', source=source, line_width=2, line_alpha=line_alpha, color=line_color, legend_label="Simulations", muted_alpha=0.2)  # Add muted_alpha to control the alpha of the hidden lines
    
    # Add hover interactivity to highlight individual simulation lines
    hover = HoverTool(renderers=[line], tooltips=[("Simulation", f"Simulation {i+1}"), ("Step", "$index"), ("Mean similarity ratio", "@y")])
    MSR_lines2.add_tools(hover)

# Add the mean similarity ratio line for all simulations
mean_source = ColumnDataSource(data={'x': list(range(len(mean_similarity_across_simulations))), 'y': mean_similarity_across_simulations})
mean_line = MSR_lines2.line('x', 'y', source=mean_source, line_width=4, line_dash='dashed', color='plum', legend_label="Mean", muted_alpha=1, muted=True, visible=False)  # Add muted_alpha to control the alpha of the hidden line

# Add a blue vertical line at the mean simulation length
sim_length_line = MSR_lines2.segment(x0=[mean_sim_length], y0=0, x1=[mean_sim_length], y1=1, line_width=4, line_color='lightblue', legend_label="Mean simulation length", muted_alpha=1, muted=True, visible=False)  # Add muted_alpha to control the alpha of the hidden line

# Add a light-salmon horizontal line at the mean final MSR
final_MSR_line = MSR_lines2.segment(x0=0, y0=[mean_final_MSR], x1=max_length, y1=[mean_final_MSR], line_width=4, line_color='lightsalmon', legend_label="Mean final MSR", muted_alpha=1, muted=True, visible=False)  # Add muted_alpha to control the alpha of the hidden line


# Customize the legend
MSR_lines2.legend.click_policy = "hide"  # Clicking the legend hides the corresponding line
MSR_lines2.legend.label_text_font_size = "12pt"
MSR_lines2.legend.location = "bottom_right"

MSR_lines2 = darkmode(MSR_lines2)

#ADDITIONAL FORMATTING FOR THIS FIG
# Turn off both x-axis and y-axis grids
MSR_lines2.xgrid.grid_line_color = None
MSR_lines2.ygrid.grid_line_color = None

# Set the desired y-axis limits
y_min = 0.35  # Minimum value for the y-axis
y_max = 1  # Maximum value for the y-axis

# Using Range1d object to set y-axis limits
from bokeh.models import Range1d
MSR_lines2.y_range = Range1d(start=y_min, end=y_max)

# Show the Bokeh plot inline within the Jupyter Notebook
show(MSR_lines2)
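
The padding and averaging steps used in the two ensemble cells above could be wrapped in a small helper. This is only a sketch: `pad_and_average` is a hypothetical name that does not appear elsewhere in the notebook, and the example series are made-up values rather than simulation output.

```python
import numpy as np

def pad_and_average(series_list):
    """Pad unequal-length series with NaN and average them step by step.

    Hypothetical helper: returns the padded 2D array and the per-step mean,
    ignoring NaN padding, without modifying the input lists.
    """
    max_length = max(len(series) for series in series_list)
    padded = np.full((len(series_list), max_length), np.nan)
    for row, series in enumerate(series_list):
        padded[row, :len(series)] = series
    return padded, np.nanmean(padded, axis=0)

# Made-up MSR series of different lengths
runs = [[0.5, 0.6, 0.7], [0.5, 0.8]]
padded, mean_per_step = pad_and_average(runs)
print(padded.shape)
print(mean_per_step)
```

Unlike extending each list in place, this version leaves the original series untouched, so the final value of each run can still be read with `ratios[-1]` after the averaging.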

In [11]:
# Change the size of plot1
MSR_lines.width = 900  # New width in pixels
MSR_lines.height = 300  # New height in pixels

# Change the size of plot2
MSR_lines2.width = 900  # New width in pixels
MSR_lines2.height = 300  # New height in pixels

# Now show the modified plots
show(column(MSR_lines, MSR_lines2))

In [12]:
#Parameter sweeps... 

N_sims = 20

grid_size=20
max_iterations=500 

output_notebook()  # To display Bokeh plots inline in the Jupyter Notebook

threshold_percentages = range(0, 101, 10)  # Range of threshold_percentage values to sweep over

# Initialize a dictionary to store the final_MSR values for each threshold_percentage
threshold_final_MSR = {}

# Initialize lists to store the mean final MSR and standard deviation for each threshold_percentage
mean_final_MSR_list = []
std_final_MSR_list = []

# Define the main loop that sweeps over the threshold_percentage values
for threshold_percentage in threshold_percentages:
#     print(threshold_percentage)
    
    all_mean_similarity_ratios = run_simulations(N_sims, grid_size, all_states_percentages, max_iterations)


    # After running all simulations for a particular threshold_percentage, calculate the mean final MSR
    final_mean_similarity_ratio = [ratios[-1] for ratios in all_mean_similarity_ratios]
    mean_final_MSR = np.mean(final_mean_similarity_ratio)
    std_final_MSR = np.std(final_mean_similarity_ratio)

    # Append the mean final MSR and standard deviation to the respective lists
    mean_final_MSR_list.append(mean_final_MSR)
    std_final_MSR_list.append(std_final_MSR)

# # Print the results...
# for threshold_percentage, mean_final_MSR, std_final_MSR in zip(threshold_percentages, mean_final_MSR_list, std_final_MSR_list):
#     print(f"Threshold percentage: {threshold_percentage}, Mean final MSR: {mean_final_MSR}, std final MSR: {std_final_MSR}")

# Create a Bokeh figure for the threshold_percentage vs. mean_final_MSR plot
plot_ensemble_ThresholdVsMSR = figure(width=800, height=400, title=f"Simulations: {N_sims}", x_axis_label="Threshold percentage", y_axis_label="Mean final MSR")

# Convert the range object to a list for plotting
threshold_percentage_list = list(threshold_percentages)

# Plot the mean final MSR with shaded area for standard deviation
plot_ensemble_ThresholdVsMSR.varea(threshold_percentage_list, [mean_final_MSR_list[i] - std_final_MSR_list[i] for i in range(len(mean_final_MSR_list))],
        [mean_final_MSR_list[i] + std_final_MSR_list[i] for i in range(len(mean_final_MSR_list))],
        fill_color="white", fill_alpha=0.5, legend_label="Standard deviation")
plot_ensemble_ThresholdVsMSR.line(threshold_percentage_list, mean_final_MSR_list, line_width=4, legend_label="Mean", line_color="lightblue")

plot_ensemble_ThresholdVsMSR = darkmode(plot_ensemble_ThresholdVsMSR)

# Show the Bokeh plot inline within the Jupyter Notebook
show(plot_ensemble_ThresholdVsMSR)

In [15]:
#Parameter sweeps... 

N_sims = 20

grid_size=20
max_iterations=500 

output_notebook()  # To display Bokeh plots inline in the Jupyter Notebook

threshold_percentages = range(0, 101, 10)  # Range of threshold_percentage values to sweep over

# Initialize a dictionary to store the simulation lengths for each threshold_percentage
threshold_sim_length = {}

# Initialize lists to store the mean and standard deviation simulation length for each threshold_percentage
mean_sim_length_list = []
std_sim_length_list = []

# Define the main loop that sweeps over the threshold_percentage values
for threshold_percentage in threshold_percentages:
#     print(threshold_percentage)
    
    all_mean_similarity_ratios = run_simulations(N_sims, grid_size, all_states_percentages, max_iterations)


    # After running all simulations for a particular threshold_percentage, calculate the mean simulation time
    
    sim_lengths = [len(ratios) for ratios in all_mean_similarity_ratios]
    max_length = max(sim_lengths) # Determine the maximum length among all simulations
    mean_sim_length = np.mean(sim_lengths) # Determine the mean simulation length
    std_sim_length = np.std(sim_lengths)

    # Append the simulation time and standard deviation to the respective lists
    mean_sim_length_list.append(mean_sim_length)
    std_sim_length_list.append(std_sim_length)


# Create a Bokeh figure for the threshold_percentage vs. mean simulation length plot
plot_ensemble_ThresholdVsSimulationTime = figure(width=800, height=400, title=f"Simulations: {N_sims}", x_axis_label="Threshold percentage", y_axis_label="Mean simulation time")



# Convert the range object to a list for plotting
threshold_percentage_list = list(threshold_percentages)

# Plot the mean simulation length with shaded area for standard deviation
plot_ensemble_ThresholdVsSimulationTime.varea(threshold_percentage_list, [mean_sim_length_list[i] - std_sim_length_list[i] for i in range(len(mean_sim_length_list))],
        [mean_sim_length_list[i] + std_sim_length_list[i] for i in range(len(mean_sim_length_list))],
        fill_color="white", fill_alpha=0.5, legend_label="Standard deviation")
plot_ensemble_ThresholdVsSimulationTime.line(threshold_percentage_list, mean_sim_length_list, line_width=4, legend_label="Mean", line_color="lightblue")

plot_ensemble_ThresholdVsSimulationTime = darkmode(plot_ensemble_ThresholdVsSimulationTime)

# Show the Bokeh plot inline within the Jupyter Notebook
show(plot_ensemble_ThresholdVsSimulationTime)

In [29]:
#Checking how long simulation will take for a larger grid before we run lots of them...
grid_size = 100 #5
threshold_percentage = 33
max_iterations=5000 #100 Increase the number of iterations for larger grid

# Create the grid
grid = np.random.choice([0, 1, 2], size=(grid_size, grid_size), p=all_states_percentages)
color_mapper = LinearColorMapper(palette=['white', 'yellow', 'blue'], low=0, high=2)

# Data sources for Bokeh
x, y = np.meshgrid(np.arange(grid_size), np.arange(grid_size))
source = ColumnDataSource(data=dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten()))


# Create a separate ColumnDataSource for dissatisfied individuals' crosses
dissatisfied_source = ColumnDataSource(data=dict(x=[], y=[]))
red_crosses_source = ColumnDataSource(data=dict(x=[], y=[]))

# Create a list to keep track of grid states and an index variable
grid_states = [grid]
current_state_index = 0


# Initialize a list to track the mean similarity ratio for each update step
mean_similarity_ratios = [get_mean_similarity_ratio(grid)]

# Initialize a list to store all mean similarity ratio time series
all_mean_similarity_ratios = [mean_similarity_ratios[:]]

# Variable to store the selected dissatisfied index
selected_idx = None

# Create the Bokeh figure for the mean similarity ratio plot
mean_similarity_source = ColumnDataSource(data=dict(x=list(range(len(mean_similarity_ratios))), y=mean_similarity_ratios))
mean_similarity_plot = figure(width=400, height=400, x_axis_label="Simulation steps", y_axis_label="Mean similarity ratio (MSR)")
mean_similarity_plot.line('x', 'y', source=mean_similarity_source, line_width=2, color='blue')  # Set the line color to blue

# Create the data source for all mean similarity ratios (including history)
all_mean_similarity_source = ColumnDataSource(data=dict(x=[0], y=[mean_similarity_ratios[0]]))

# Add the mean similarity line for the current simulation (latest run) with full opacity and blue color
mean_similarity_line = mean_similarity_plot.line('x', 'y', source=mean_similarity_source, line_width=2, color='blue')

# Function to update the grid and redraw the plot
def update_grid():
    global grid, current_state_index, grid_states, selected_idx, all_mean_similarity_source

    # Run the simulation until a steady state is reached or a maximum number of iterations is reached
    iterations = 0

    while iterations < max_iterations:
        dissatisfied_indices = find_dissatisfied_indices(grid)
        red_crosses_source.data = dict(x=[], y=[])  # Clear previous dissatisfied agent data

        if not dissatisfied_indices:  # Check if the list is empty
            break

        selected_idx = random.choice(dissatisfied_indices)  # Randomly select a dissatisfied index

        # Update the grid by using the selected_idx
        i, j = divmod(selected_idx, grid_size)
        empty_cells = np.argwhere(grid == 0)
        if empty_cells.shape[0] > 0:
            new_grid = np.copy(grid)  # Define new_grid and initialize it with a copy of the current grid
            # Randomly select a new location among the available free locations
            random_index = random.randint(0, empty_cells.shape[0] - 1)
            new_location = empty_cells[random_index]

            new_grid[new_location[0], new_location[1]] = grid[i, j]
            new_grid[i, j] = 0  # Empty the previous location



            # Find the agents that are dissatisfied after the move
            next_dissatisfied_indices = find_dissatisfied_indices(new_grid)

            # Update the red crosses data (x is the column, y is the row)
            next_dissatisfied_x = [int(idx % grid_size) for idx in next_dissatisfied_indices]
            next_dissatisfied_y = [int(idx // grid_size) for idx in next_dissatisfied_indices]
            red_crosses_source.data = dict(x=next_dissatisfied_x, y=next_dissatisfied_y)

            # Record the mean similarity ratio of the grid after the move
            mean_similarity_ratios.append(get_mean_similarity_ratio(new_grid))
            mean_similarity_source.data = dict(x=list(range(len(mean_similarity_ratios))), y=mean_similarity_ratios)

            # Check for steady state
            if is_steady_state(grid, new_grid):
                # Update the data source with the current grid state
                source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())
                break

            grid = new_grid

            iterations += 1
        else:
            # No empty cell to move into: stop instead of looping forever
            break

    # Add a message if the steady state is not reached within the threshold
#     if iterations >= max_iterations:
#         print("Steady state not reached within the maximum iterations.")

    # Save the new grid state and update the current state index
    current_state_index += 1
    if current_state_index < len(grid_states):
        grid_states[current_state_index] = grid
    else:
        grid_states.append(grid)

    # Trim the grid_states list to remove any forward history when 'back' button is used
    grid_states = grid_states[:current_state_index + 1]
    
    # Update the data source with the new grid state
    source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())

    # Append the mean similarity ratio to the list of all ratios
    all_mean_similarity_source.stream(dict(x=[current_state_index], y=[mean_similarity_ratios[-1]]))
    

# Function to reset the grid to a random initial state
def reset_grid():
    global grid, current_state_index, grid_states, mean_similarity_ratios

    # Reset the grid to a random initial state
    grid = np.random.choice([0, 1, 2], size=(grid_size, grid_size), p=all_states_percentages)

    # Reset the current state index and grid states list
    current_state_index = 0
    grid_states = [grid]

    # Restart the mean similarity ratio series from the new grid's initial value
    mean_similarity_ratios.clear()
    mean_similarity_ratios.append(get_mean_similarity_ratio(grid))

    # Update the data source with the new grid state
    source.data = dict(x=x.flatten(), y=y.flatten(), color=grid.flatten(), index=(y * grid_size + x).flatten())

    # Restart the mean similarity plot from the new initial point
    mean_similarity_source.data = dict(x=[0], y=[mean_similarity_ratios[0]])


# Create the Bokeh button for resetting the grid
button_reset = Button(label="Reset")
button_reset.on_click(reset_grid)


# Create the Bokeh figure for the grid plot
plot_title = f"Threshold percentage: {round(threshold_percentage,2)}%"
p = figure(width=400, height=400, toolbar_location=None, x_range=(-0.5, grid_size - 0.5), y_range=(-0.5, grid_size - 0.5))
p.rect(x='x', y='y', width=1, height=1, source=source, line_color='black', fill_color={'field': 'color', 'transform': color_mapper})

# Mark dissatisfied individuals with a red square at the center of their grid cell
# p.square(x='x', y='y', size=15, line_color='red', fill_color='red', source=red_crosses_source)

# Add text annotations to label each grid cell with its index
# p.text(x='x', y='y', text='index', text_baseline='middle', text_align='center', source=source, text_color='black', text_font_size='10pt')

# Create the Bokeh figure for the mean similarity ratio plot
mean_similarity_plot = figure(width=400, height=400, x_axis_label="Simulation steps", y_axis_label="Mean similarity ratio (MSR)")
mean_similarity_source = ColumnDataSource(data=dict(x=[0], y=[mean_similarity_ratios[0]]))
mean_similarity_plot.line('x', 'y', source=mean_similarity_source, line_width=2, color='blue')

# Create the Bokeh button for updating the grid
button_run = Button(label="Run")
button_run.on_click(update_grid)

p=darkmode(p)
mean_similarity_plot=darkmode(mean_similarity_plot)

# Create a layout for the plot, buttons, and title
title_div = Div(text=f"<h1>{plot_title}</h1>", width=400, height=50)
grid_plot_layout = row(p, mean_similarity_plot)
buttons_layout = row(button_run, button_reset)
layout = column(title_div, grid_plot_layout, buttons_layout)

# Function to create the Bokeh app
def modify_doc(doc):
    doc.add_root(layout)


# Create the Bokeh app
app = Application(FunctionHandler(modify_doc))

# Output the plot to the notebook
output_notebook()

show(app, notebook_url='localhost:8888')

In [31]:
#Parameter sweeps... 

N_sims = 20

grid_size=100
max_iterations=5000 

output_notebook()  # To display Bokeh plots inline in the Jupyter Notebook

threshold_percentages = range(0, 101, 2)  # Range of threshold_percentage values to sweep over

# Initialize a dictionary to store the final_MSR values for each threshold_percentage
threshold_final_MSR = {}

# Initialize lists to store the mean final MSR and standard deviation for each threshold_percentage
mean_final_MSR_list = []
std_final_MSR_list = []

# Define the main loop that sweeps over the threshold_percentage values
for threshold_percentage in threshold_percentages:
#     print(threshold_percentage)
    
    all_mean_similarity_ratios = run_simulations(N_sims, grid_size, all_states_percentages, max_iterations)


    # After running all simulations for a particular threshold_percentage, calculate the mean final MSR
    final_mean_similarity_ratio = [ratios[-1] for ratios in all_mean_similarity_ratios]
    mean_final_MSR = np.mean(final_mean_similarity_ratio)
    std_final_MSR = np.std(final_mean_similarity_ratio)

    # Append the mean final MSR and standard deviation to the respective lists
    mean_final_MSR_list.append(mean_final_MSR)
    std_final_MSR_list.append(std_final_MSR)

# # Print the results...
# for threshold_percentage, mean_final_MSR, std_final_MSR in zip(threshold_percentages, mean_final_MSR_list, std_final_MSR_list):
#     print(f"Threshold percentage: {threshold_percentage}, Mean final MSR: {mean_final_MSR}, std final MSR: {std_final_MSR}")

# Create a Bokeh figure for the threshold_percentage vs. mean_final_MSR plot
plot_ensemble_ThresholdVsMSR2 = figure(width=800, height=400, title=f"Simulations: {N_sims}", x_axis_label="Threshold percentage", y_axis_label="Mean final MSR")

# Convert the range object to a list for plotting
threshold_percentage_list = list(threshold_percentages)

# Plot the mean final MSR with a shaded band of one standard deviation either side
mean_arr = np.array(mean_final_MSR_list)
std_arr = np.array(std_final_MSR_list)
plot_ensemble_ThresholdVsMSR2.varea(threshold_percentage_list, mean_arr - std_arr, mean_arr + std_arr,
        fill_color="white", fill_alpha=0.5, legend_label="Standard deviation")
plot_ensemble_ThresholdVsMSR2.line(threshold_percentage_list, mean_final_MSR_list, line_width=4, legend_label="Mean", line_color="lightblue")

plot_ensemble_ThresholdVsMSR2 = darkmode(plot_ensemble_ThresholdVsMSR2)

# Show the Bokeh plot inline within the Jupyter Notebook
show(plot_ensemble_ThresholdVsMSR2)


# Ignored content

## Other metrics


**Dissimilarity index:** a widely used measure of evenness. It measures the proportion of minority members that would have to change their area of residence to achieve an even distribution.

$$\mathcal{D}={\frac {1}{2}}\sum _{i=1}^{N}\left|{\frac {a_{i}}{A}}-{\frac {b_{i}}{B}}\right|$$

where:
- $a_i$ = the population of group $A$ in the $i$th area (e.g. census tract)
- $A$ = the total population in group $A$ in the large geographic entity for which the index of dissimilarity is being calculated.
- $b_i$ = the population of group $B$ in the $i$th area
- $B$ = the total population in group $B$ in the large geographic entity for which the index of dissimilarity is being calculated.
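
As a minimal sketch of the formula above, the index can be computed directly from per-area counts. The function name `dissimilarity_index` and the example counts are hypothetical and only for illustration. For the Schelling grid, one would first have to choose how to split the grid into areas (e.g. square blocks), which is an extra assumption not made here.

```python
import numpy as np

def dissimilarity_index(a, b):
    """Index of dissimilarity D = 0.5 * sum_i |a_i / A - b_i / B|.

    a, b: per-area counts of group A and group B (same length).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must list the same areas")
    A, B = a.sum(), b.sum()
    if A == 0 or B == 0:
        raise ValueError("each group needs at least one member")
    return 0.5 * np.abs(a / A - b / B).sum()

# Hypothetical counts (not data from the simulations)
d_even = dissimilarity_index([10, 10, 10], [20, 20, 20])  # same shares in every area
d_split = dissimilarity_index([30, 0], [0, 60])           # groups never share an area
d_mixed = dissimilarity_index([30, 10], [10, 30])         # partial overlap
print(d_even, d_split, d_mixed)
```

An even distribution gives a value at the bottom of the range and complete separation gives a value at the top, with partially mixed areas falling in between.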


**Theil entropy index:** it was the only index that obeyed the principle of transfers. This principle states that segregation should decline when a person from group r moves from tract a to tract b, where the proportion of group r is greater in tract a than in tract b.

The primary weakness (a minor shortcoming) of the entropy index is that it is not composition invariant, which means that the index will change if the number of minorities in each tract is multiplied by a constant.

$$\mathcal{T}_{ent} = - \frac{1}{N} \sum \frac{n_i}{N} \log\Big(\frac{n_i}{N}\Big)$$

where:
- $N$ = the total number of grid cells (occupied or empty)
- $n_i$ = the number of cells occupied by the $i$-th category (black or white)

The raw entropy value is converted to an index by scaling by a maximum entropy value so that it varies between 0 and 1, where 0 indicates complete segregation and 1 indicates a completely integrated and diverse population. This is good: it makes the index more directly comparable to other measures.

Entropy measures allow for comparison between more than two groups at a time. Another major advantage is their ability to be decomposed into independent and dependent contributions of different constituent variables, such as race and class.
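
The scaling step can be sketched as follows, under some loudly stated assumptions: the counts are taken for a single region (e.g. one neighbourhood or block of the grid), proportions are computed relative to the sum of the counts rather than all grid cells, and the maximum entropy is taken to be $\log K$ for $K$ categories. With a fixed maximum like this, any constant prefactor in the raw entropy cancels in the ratio. The function name `normalised_entropy` is hypothetical.

```python
import numpy as np

def normalised_entropy(counts):
    """Shannon entropy of category proportions, scaled to lie in [0, 1].

    counts: number of cells in each category within one region.
    Assumption: the maximum entropy is log(K) for K categories.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if counts.size < 2 or total == 0:
        raise ValueError("need at least two categories and one occupied cell")
    p = counts / total
    p = p[p > 0]                      # treat 0 * log(0) as 0
    raw = -(p * np.log(p)).sum()      # raw entropy of the region
    return abs(raw / np.log(counts.size))  # abs() only avoids returning -0.0

# Hypothetical regions (not data from the simulations)
h_mixed = normalised_entropy([50, 50])      # equal shares of two groups
h_single = normalised_entropy([100, 0])     # only one group present
h_skewed = normalised_entropy([75, 25])     # somewhere in between
h_three = normalised_entropy([20, 20, 20])  # works for more than two groups
print(h_mixed, h_single, h_skewed, h_three)
```

A region containing a single group scores at the segregated end of the scale and a region with equal shares scores at the integrated end, matching the interpretation of 0 and 1 given above.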





A lot of these ideas aim towards measuring the "complexity" of the arrays at varying local scales. (We'll see more of this when we cover some Information Theory).

## How it relates

### To complex systems generally
Schelling's model is a precursor to agent-based models.

The 'art' of complex systems modelling requires experience. You have to play with these systems to develop intuition for them. It is the only way, and it is the reason for the Workshops and the provided Jupyter Notebooks.

### To other units
Note that we will not delve into much detail regarding chaos and nonlinear dynamics or networks as these are covered at length in MATH3021 and MATH3002 respectively. There is, however, a summary Notebook for each if you are interested or find that you require some revision.

Schelling's model can be related to a physical model by replacing the economic concept of “utility” with the physics concept of a particle's internal energy: https://www.pnas.org/doi/10.1073/pnas.0609371103

# Not working

## Save figure

In [30]:
#DOES NOT WORK
from bokeh.io import export_png

# Save the figure as a PNG image file
output_filename = "mean_similarity_ratio_plot.png"
export_png(MSR_lines, filename=output_filename)



RuntimeError: Neither firefox and geckodriver nor a variant of chromium browser and chromedriver are available on system PATH. You can install the former with 'conda install -c conda-forge firefox geckodriver'.

In [None]:
#DOES NOT WORK
from bokeh.io import export_svgs
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPDF

# Save the figure as an SVG file (Bokeh only emits SVG when the figure uses the SVG backend)
MSR_lines.output_backend = "svg"
output_filename = "mean_similarity_ratio_plot.svg"
export_svgs(MSR_lines, filename=output_filename)

# Optionally, convert the SVG to a PDF (requires svglib and reportlab libraries)
output_pdf_filename = "mean_similarity_ratio_plot.pdf"
drawing = svg2rlg(output_filename)
renderPDF.drawToFile(drawing, output_pdf_filename)