# Data Analysis of Toothpick Takeaway

Analyzing the data from a given `.csv` of Toothpick Takeaway.

## The Big Question

What is the chance of Player 1 winning by taking `x` toothpicks when there are `y` left on the table?

## Broad Overview

We have a `.csv` file with exhaustive data about games of Toothpick Takeaway. Great! Now what? We start analyzing!

We want to look at individual moves in the game. Our aim is to isolate three specific pieces of information:
1. Who moved
2. How many toothpicks were taken
3. Who won that game

From this, we can derive an answer to our question by looking at this data across a large sample size of games.

---
## Let's begin!

### Viewing the Raw Data

We'll start by getting a look at our raw data using the `read_csv()` function from the Python `pandas` library. 

In [6]:
import pandas as pd

Now read in the data.

In [165]:
tt_df = pd.read_csv("../data_files/random_random_100.csv")

Let's see what our data looks like.

In [227]:
tt_df.head()

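|   | 10 | turn_10 | 9 | turn_9 | 8 | turn_8 | 7 | turn_7 | 6 | turn_6 | ... | turn_5 | 4 | turn_4 | 3 | turn_3 | 2 | turn_2 | 1 | turn_1 | winner |
|---|----|---------|---|--------|---|--------|---|--------|---|--------|-----|--------|---|--------|---|--------|---|--------|---|--------|--------|
| 0 | 1 | player_1 | 2 | player_2 | 0 | NaN | 2 | player_1 | 0 | NaN | ... | player_2 | 2 | player_1 | 0 | NaN | 1 | player_2 | 1.0 | player_1 | player_1 |
| 1 | 2 | player_1 | 0 | NaN | 1 | player_2 | 2 | player_1 | 0 | NaN | ... | player_2 | 1 | player_1 | 2 | player_2 | 0 | NaN | 1.0 | player_1 | player_1 |
| 2 | 1 | player_1 | 1 | player_2 | 2 | player_1 | 0 | NaN | 2 | player_2 | ... | NaN | 2 | player_1 | 0 | NaN | 1 | player_2 | 1.0 | player_1 | player_1 |
| 3 | 2 | player_1 | 0 | NaN | 2 | player_2 | 0 | NaN | 1 | player_1 | ... | player_2 | 0 | NaN | 2 | player_1 | 0 | NaN | 1.0 | player_2 | player_2 |
| 4 | 2 | player_1 | 0 | NaN | 1 | player_2 | 2 | player_1 | 0 | NaN | ... | player_2 | 1 | player_1 | 1 | player_2 | 1 | player_1 | 1.0 | player_2 | player_2 |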


Our data is very verbose and not entirely usable in its current state. We'll need to parse and edit it as we go along.

The important takeaways are the following:
1. On a game size of `n` toothpicks, there are `2n + 1` columns
2. Columns are "grouped" into pairs: column `k` records how many toothpicks were taken when `k` were left, and column `turn_k` records who made that move
3. The index for the column representing `k` toothpicks left is located at `2n - 2k` (counting from 0)
4. The index for whose turn it is when there are `k` toothpicks left is the index found in #3, + 1
5. The winner is always in the last index, `2n`
6. Some pairs are empty (0 toothpicks taken and no player recorded), because the previous move took 2 toothpicks and skipped over that count
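
The index arithmetic in the list above can be checked with a short sketch. It assumes the `2n + 1` column layout shown in the raw data, with positions counted from 0; the helper name `column_positions` is hypothetical and is not part of the notebook.

```python
def column_positions(n, k):
    """Positions of the columns for `k` toothpicks left in a game of size `n`."""
    taken = 2 * (n - k)      # column "k": toothpicks taken with k left
    who_moved = taken + 1    # column "turn_k": who made that move
    winner = 2 * n           # last of the 2n + 1 columns
    return taken, who_moved, winner


# Rebuild the header of a 10-toothpick game to check the arithmetic.
n = 10
columns = []
for k in range(n, 0, -1):
    columns += [str(k), "turn_{}".format(k)]
columns.append("winner")

taken, who_moved, winner = column_positions(n, 7)
print(columns[taken], columns[who_moved], columns[winner])
```

For `k = 7` this prints `7 turn_7 winner`, which matches the column names in the raw data.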

### Writing a Fetching Function

Now that we know what our data looks like, we can begin writing a function that isolates what we want.

But what do we want it to do? Based on our previous takeaways from our raw data, we can set out to design a function that accomplishes the following:

Given the inputs of a **dataframe** (our game data), and the **number of toothpicks left** (the turn we want to fetch data from), the function should fetch the **number of toothpicks taken at that turn**, **who took those toothpicks**, and **who won that game**.

As a quality-of-life improvement, it would be nice if our function returned a dataframe with no missing values.

In [402]:
def fetch_turn_data(df, toothpicks_left):
    """
    Gets the data of a specific turn in the game.
    
    Parameters:
        df : DataFrame
            The dataframe to index.
        toothpicks_left : int
            How many toothpicks were left at this turn.
    
    Returns:
        A dataframe consisting of how many toothpicks were taken, who made the move, and who won that game.
    """
    # Index (in the dataframe) of the move being analyzed
    move_index = (len(df.columns) - 1) - (2 * toothpicks_left)
    
    # Column that represents how many toothpicks were taken at this turn
    taken = move_index

    # Whose turn was it
    who_moved = move_index + 1

    # Winner of the game
    winner = len(df.columns) - 1

    # Fetch the columns
    turn = df[df.columns[[taken, who_moved, winner]]]

    return turn.dropna()

### Testing Our Fetching Function

Now that our function is written, we should compare its output to our original data and see if it returns what we expect.

In [403]:
turn = fetch_turn_data(tt_df, 9)
turn.head()

|   | 9 | turn_9 | winner |
|---|---|--------|--------|
| 0 | 2 | player_2 | player_1 |
| 2 | 1 | player_2 | player_1 |
| 5 | 2 | player_2 | player_2 |
| 6 | 2 | player_2 | player_2 |
| 7 | 1 | player_2 | player_2 |


In [404]:
turn = fetch_turn_data(tt_df, 4)
turn.head()

|   | 4 | turn_4 | winner |
|---|---|--------|--------|
| 0 | 2 | player_1 | player_1 |
| 1 | 1 | player_1 | player_1 |
| 2 | 2 | player_1 | player_1 |
| 4 | 1 | player_1 | player_2 |
| 6 | 2 | player_1 | player_2 |


So far so good! Our function returns exactly what we want: The number of toothpicks taken, who took them, and who won that game.
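
Beyond eyeballing the output, the positional lookup can be cross-checked against a lookup by column name. The sketch below is only illustrative: the three 3-toothpick games are hypothetical data laid out like the real file, and `fetch_turn_data` is repeated in condensed form so the block runs on its own.

```python
import pandas as pd


def fetch_turn_data(df, toothpicks_left):
    # Condensed copy of the positional lookup defined earlier
    move_index = (len(df.columns) - 1) - (2 * toothpicks_left)
    winner = len(df.columns) - 1
    return df[df.columns[[move_index, move_index + 1, winner]]].dropna()


# Hypothetical 3-toothpick games in the same layout as the real file
games = pd.DataFrame({
    "3": [1, 2, 1],
    "turn_3": ["player_1", "player_1", "player_1"],
    "2": [2, 0, 1],
    "turn_2": ["player_2", None, "player_2"],
    "1": [0, 1, 1],
    "turn_1": [None, "player_2", "player_1"],
    "winner": ["player_2", "player_2", "player_1"],
})

for k in (3, 2, 1):
    by_position = fetch_turn_data(games, k)
    by_name = games[[str(k), "turn_{}".format(k), "winner"]].dropna()
    # Both lookups should select the same rows and columns
    assert by_position.equals(by_name)
    # The helper should never return missing values
    assert not by_position.isna().any().any()

print(fetch_turn_data(games, 2))
```

If the index arithmetic were off by one column, the comparison against the name-based lookup would fail.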

### Determining Probabilities

We have a fetching function, but how exactly does this curated data help us? To truly be useful, we need to construct some probabilities out of it.

Consider the following data about when there were 7 toothpicks left:

In [405]:
turn = fetch_turn_data(tt_df, 7)
turn.head()

|   | 7 | turn_7 | winner |
|---|---|--------|--------|
| 0 | 2 | player_1 | player_1 |
| 1 | 2 | player_1 | player_1 |
| 4 | 2 | player_1 | player_2 |
| 5 | 2 | player_1 | player_2 |
| 6 | 1 | player_1 | player_2 |


This is a good set of data to test with, as it contains a mix of moves, whose turn it was, and winners. For each game, we want to obtain two pieces of isolated data, `move made by Player 1` and `whether Player 1 won`. Once we obtain that data, we can assemble it into a final output of a single column representing:

`<move made by Player 1> <win rate for Player 1>`

Let's begin by removing all the rows in our dataframe that consist of a move by Player 2.

In [406]:
rows_to_drop = [index for index in turn.index if turn["turn_7"][index] == "player_2"]
turn.drop(index = rows_to_drop, inplace = True)
turn.head()

|   | 7 | turn_7 | winner |
|---|---|--------|--------|
| 0 | 2 | player_1 | player_1 |
| 1 | 2 | player_1 | player_1 |
| 4 | 2 | player_1 | player_2 |
| 5 | 2 | player_1 | player_2 |
| 6 | 1 | player_1 | player_2 |


Perfect! Now we have only the moves made by Player 1 at this turn. Next, we rebuild this data as a simpler dataframe with one column for the move and one for whether Player 1 won.

In [407]:
# Create a list of all moves
taken = []
for move in turn["7"]:
    taken.append(move)

# Create a list of whether or not Player 1 won each game
wins = []
for win in turn["winner"]:
    wins.append(win == "player_1")

# Make a new DataFrame with the data
data = pd.DataFrame(index = range(len(turn)), columns = ["Taken", "Win"])
data["Taken"] = taken
data["Win"] = wins

In [408]:
data.head()

|   | Taken | Win |
|---|-------|-----|
| 0 | 2 | True |
| 1 | 2 | True |
| 2 | 2 | False |
| 3 | 2 | False |
| 4 | 1 | False |


Much easier to read. Now our dataframe clearly shows how many toothpicks were taken by Player 1 and whether or not Player 1 won that game. Now we can move on to calculating win statistics.
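
The filtering and rebuilding steps can also be written without explicit loops. This is a sketch of one alternative, not a replacement for the cells above; it recreates only the five rows displayed for 7 toothpicks left so that it runs on its own.

```python
import pandas as pd

# The five rows shown above for 7 toothpicks left (index values preserved)
turn = pd.DataFrame(
    {
        "7": [2, 2, 2, 2, 1],
        "turn_7": ["player_1"] * 5,
        "winner": ["player_1", "player_1", "player_2", "player_2", "player_2"],
    },
    index=[0, 1, 4, 5, 6],
)

# Keep only Player 1's moves with a boolean mask instead of dropping rows
p1_moves = turn[turn["turn_7"] != "player_2"]

# Build the two-column summary directly from the filtered columns;
# to_numpy() discards the old index so the new frame is numbered from 0
data = pd.DataFrame({
    "Taken": p1_moves["7"].to_numpy(),
    "Win": (p1_moves["winner"] == "player_1").to_numpy(),
})
print(data)
```

The mask returns a new dataframe rather than modifying `turn` in place, so the original rows stay available for later checks.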

In [416]:
# Dictionaries to represent the wins and losses of each move
wins = {"1": 0, "2": 0}
losses = {"1": 0, "2": 0}

for item in data.index:
    # Get the move
    move = str(data["Taken"][item])
    
    # If Player 1 won the game, increment the appropriate win counter
    if data["Win"][item]:
        wins[move] += 1
    else:
        # Otherwise, increment the loss counter
        losses[move] += 1

In [417]:
print(wins)
print(losses)

{'1': 15, '2': 5}
{'1': 9, '2': 19}


Now we have data representing the number of wins and losses by Player 1 at a given turn when taking 1 or 2 toothpicks. We can finally compute probabilities.

In [414]:
win_chances = {"1": wins["1"] / (wins["1"] + losses["1"]),
               "2": wins["2"] / (wins["2"] + losses["2"])}
win_chances

{'1': 0.625, '2': 0.20833333333333334}

At last, we have determined the win chance for Player 1 at a given move when taking 1 or 2 toothpicks.
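
The same probabilities can be obtained in one step with `groupby`, since the mean of a boolean column is the fraction of `True` values. The sketch below rebuilds a `Taken`/`Win` table from the win and loss counts printed above instead of reading the original file, so the row order is an assumption that does not affect the result.

```python
import pandas as pd

# Rebuild a "Taken"/"Win" table from the win and loss counts printed above
wins = {"1": 15, "2": 5}
losses = {"1": 9, "2": 19}
rows = []
for move in ("1", "2"):
    rows += [(int(move), True)] * wins[move]
    rows += [(int(move), False)] * losses[move]
data = pd.DataFrame(rows, columns=["Taken", "Win"])

# Fraction of wins per move: mean of the boolean "Win" column in each group
win_chances = data.groupby("Taken")["Win"].mean()
print(win_chances.to_dict())
```

The keys here are the integers 1 and 2 rather than the strings used in the dictionaries above; the values agree with the hand-computed 0.625 and roughly 0.208.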

### Streamlined Statistics

We have a working fetching function, and we can parse our turn-by-turn data into probabilities, but all of this code is scattered across several cells. We should streamline it for better readability and reuse. We will put all of the same steps as before into a single function, but condensed and more generic.

In [415]:
def get_win_chances(df, toothpicks_left):
    """
    Calculates the win chance for Player 1's two available moves at a specified turn in the game.
    
    Parameters:
        df : DataFrame
            The dataframe to index.
        toothpicks_left : int
            How many toothpicks were left at this turn.
    
    Returns:
        A dictionary representing the win chances based on Player 1's possible move.
    """
    # Index (in the dataframe) of the move being analyzed
    move_index = (len(df.columns) - 1) - (2 * toothpicks_left)
    
    # Column that represents how many toothpicks were taken at this turn
    taken = move_index

    # Whose turn was it
    who_moved = move_index + 1

    # Winner of the game
    winner = len(df.columns) - 1

    # Fetch the columns
    turn = df[df.columns[[taken, who_moved, winner]]].dropna()
    
    # Stringified index of the number of toothpicks left, used to index into the dataframe
    turn_index = str(toothpicks_left)
    
    # More stringified data used for indexing
    who_moved = "turn_{}".format(turn_index)
    
    # Drop rows where Player 2 moved
    rows_to_drop = [index for index in turn.index if turn[who_moved][index] == "player_2"]
    turn.drop(index = rows_to_drop, inplace = True)

    # Dictionaries to represent the wins and losses of each move
    wins = {"1": 0, "2": 0}
    losses = {"1": 0, "2": 0}

    # Collect wins/losses
    for item in turn.index:
        # Get the move
        move = str(int(turn[turn_index][item]))
    
        # If Player 1 won the game, increment the appropriate win counter
        if turn["winner"][item] == "player_1":
            wins[move] += 1
        else:
            # Otherwise, increment the loss counter
            losses[move] += 1
    
    # Get the total number of games where each move occurred
    total_take_1_games = wins["1"] + losses["1"]
    total_take_2_games = wins["2"] + losses["2"]
    
    # Generate the probabilities for each move, accounting for division by 0
    take_1_win_chance = wins["1"] / total_take_1_games if total_take_1_games != 0 else 0
    take_2_win_chance = wins["2"] / total_take_2_games if total_take_2_games != 0 else 0

    return {"1": take_1_win_chance, "2": take_2_win_chance}

### Scaling It

Now that `get_win_chances` can turn the data for any single turn into probabilities, we can compile the same statistics for every turn in a given game.

In [418]:
for turn in range(10, 0, -1):
    win_chances = get_win_chances(tt_df, turn)
    take_1_percent = win_chances["1"]
    take_2_percent = win_chances["2"]
    print("Toothpicks Left: {:2d}, Take 1 win chance: {:.2f}, Take 2 win chance: {:.2f}".format(turn, take_1_percent, take_2_percent))

Toothpicks Left: 10, Take 1 win chance: 0.62, Take 2 win chance: 0.34
Toothpicks Left:  9, Take 1 win chance: 0.00, Take 2 win chance: 0.00
Toothpicks Left:  8, Take 1 win chance: 0.50, Take 2 win chance: 0.80
Toothpicks Left:  7, Take 1 win chance: 0.62, Take 2 win chance: 0.21
Toothpicks Left:  6, Take 1 win chance: 0.33, Take 2 win chance: 0.45
Toothpicks Left:  5, Take 1 win chance: 0.62, Take 2 win chance: 0.88
Toothpicks Left:  4, Take 1 win chance: 0.71, Take 2 win chance: 0.42
Toothpicks Left:  3, Take 1 win chance: 0.13, Take 2 win chance: 0.25
Toothpicks Left:  2, Take 1 win chance: 0.33, Take 2 win chance: 1.00
Toothpicks Left:  1, Take 1 win chance: 1.00, Take 2 win chance: 0.00
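
One caveat when reading this output: `get_win_chances` returns 0 when a move never occurs, so a value of 0.00 can mean "no data" rather than "never won". With 9 toothpicks left, for instance, both values are 0.00; assuming Player 1 always moves first from 10, as in every row shown earlier, Player 1 is never the one to move at 9. The sketch below is a hypothetical variant, `win_chances_or_nan`, that reports `NaN` for such cases. It looks columns up by name, uses integer keys, and is demonstrated on made-up 3-toothpick games rather than the real file.

```python
import pandas as pd


def win_chances_or_nan(df, toothpicks_left):
    """Hypothetical variant of get_win_chances: NaN when a move never occurred."""
    taken_col = str(toothpicks_left)
    turn_col = "turn_{}".format(toothpicks_left)

    # Same three columns as before, looked up by name
    turn = df[[taken_col, turn_col, "winner"]].dropna()
    p1 = turn[turn[turn_col] == "player_1"]

    # 1.0 for a Player 1 win, 0.0 otherwise, averaged per move
    won = (p1["winner"] == "player_1").astype(float)
    rates = won.groupby(p1[taken_col].astype(int)).mean()

    # reindex leaves moves that never occurred as NaN
    return rates.reindex([1, 2]).to_dict()


# Hypothetical 3-toothpick games in the same layout as the real file
games = pd.DataFrame({
    "3": [1, 2, 1],
    "turn_3": ["player_1", "player_1", "player_1"],
    "2": [2, 0, 1],
    "turn_2": ["player_2", None, "player_2"],
    "1": [0, 1, 1],
    "turn_1": [None, "player_2", "player_1"],
    "winner": ["player_2", "player_2", "player_1"],
})

for k in (3, 2, 1):
    print(k, win_chances_or_nan(games, k))
```

In these made-up games Player 1 never moves with 2 toothpicks left, so both entries for that turn come back as `NaN` instead of 0.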


Excellent! Now that we have Player 1's win chances for each possible move at every turn, we can begin writing an optimal win strategy for Player 1.