<a href="https://colab.research.google.com/github/BrokenShell/Unit_2_Build/blob/master/algorithm_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Algorithm Prediction Research
### Random Distribution Detection Project
##### by Robert Sharp
<br/>

## Custom Libraries:
- [Fortuna](https://pypi.org/project/fortuna/): Random Value Toolkit
- [MonkeyScope](https://pypi.org/project/monkeyscope/): Distribution & Performance Test Suite for Non-deterministic Functions

## Target Algorithms (Fortuna):
- front_linear
- back_linear
- front_gauss
- back_gauss
- front_poisson
- back_poisson

## Distribution Ranges:
- d4 `[1..4]`
- d6 `[1..6]`
- d8 `[1..8]`
- d10 `[1..10]`
- d12 `[1..12]`
- d20 `[1..20]`

## Data Sets:
Each set contains 10,000 rows. Each row holds 10 random rolls drawn from one of the target algorithms over a given range, and a flat uniform distribution selects the algorithm for each row.
- method_4.csv
- method_6.csv
- method_8.csv
- method_10.csv
- method_12.csv
- method_20.csv

## Features:
A series of 10 random rolls over a given range. The rolls in each series are produced by a randomly selected algorithm.

## Baseline Guess:
For a given range, a uniform random guess has a 1 in 6 chance (16.7%) of naming the correct algorithm.
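
The baseline follows from choosing uniformly among the six target algorithms. A quick check of the arithmetic:

```python
# Baseline: a uniform random guess among the six target algorithms.
algorithms = (
    "front_linear", "back_linear",
    "front_gauss", "back_gauss",
    "front_poisson", "back_poisson",
)
baseline = 1 / len(algorithms)
print(f"Baseline accuracy: {baseline:.1%}")  # Baseline accuracy: 16.7%
```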

## Model & Training:
RandomForestClassifier. Six models are trained, one per data set, and each learns to recognize the six algorithms.

## Research Question: 
_Are smaller dice more difficult to predict?_
- TL;DR: Mostly Yes.


In [0]:
!pip install Fortuna
!pip install MonkeyScope


In [0]:
import csv
import pandas as pd
import itertools as it
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from Fortuna import RandomValue
from Fortuna import front_linear, back_linear
from Fortuna import front_gauss, back_gauss
from Fortuna import front_poisson, back_poisson
from MonkeyScope import distribution_timer

## Algorithm Distributions - Example Range d10 (1-10)

In [0]:
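def higher_order_dice(func_zc, dice_size):
    # func_zc is zero-based: it returns a value in [0, dice_size - 1].
    # Adding 1 shifts the result into the dice range [1, dice_size].
    return func_zc(dice_size) + 1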

hod = higher_order_dice
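
As used here, `higher_order_dice` takes a zero-based generator (output in `[0, dice_size - 1]`) and shifts it into the dice range `[1, dice_size]`. A minimal sketch of that shift, using the standard library's `random.randrange` as a stand-in for a Fortuna function. The stand-in is an assumption for illustration only: it is uniform, not one of the target distributions.

```python
import random

def higher_order_dice(func_zc, dice_size):
    # Shift a zero-based result in [0, dice_size - 1] up to [1, dice_size].
    return func_zc(dice_size) + 1

# Stand-in for a Fortuna generator: random.randrange(n) returns 0..n-1.
rolls = [higher_order_dice(random.randrange, 10) for _ in range(1000)]
print(min(rolls), max(rolls))
```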

In [0]:
distribution_timer(hod, front_linear, 10)

Output Analysis: higher_order_dice(<built-in function front_linear>, 10)
Typical Timing: 191 ± 51 ns
Statistics of 1000 samples:
 Minimum: 1
 Median: 3
 Maximum: 10
 Mean: 3.854
 Std Deviation: 2.3419402212695353
Distribution of 100000 samples:
 1: 18.979%
 2: 17.289%
 3: 15.067%
 4: 12.949%
 5: 10.937%
 6: 8.848%
 7: 6.968%
 8: 4.966%
 9: 3.012%
 10: 0.985%
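
The percentages above fall in nearly equal steps of about two points, which is what a discrete front-weighted triangular distribution would give. As a sketch under that assumption (the formula is inferred from the sampled output, not taken from Fortuna's documentation), the probability of rolling `k` on a die of size `n` would be `(2(n - k) + 1) / n²`:

```python
from fractions import Fraction

def front_linear_pmf(n):
    # Hypothetical closed form inferred from the sampled output above.
    return {k: Fraction(2 * (n - k) + 1, n * n) for k in range(1, n + 1)}

pmf = front_linear_pmf(10)
for k, p in pmf.items():
    print(f"{k}: {float(p):.1%}")
```

Under this assumption the d10 probabilities run from 19% for a 1 down to 1% for a 10, in steps of two points, which sits within sampling noise of the measured values.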



In [0]:
distribution_timer(hod, back_linear, 10)

Output Analysis: higher_order_dice(<built-in function back_linear>, 10)
Typical Timing: 211 ± 75 ns
Statistics of 1000 samples:
 Minimum: 1
 Median: 7
 Maximum: 10
 Mean: 7.083
 Std Deviation: 2.379939284939849
Distribution of 100000 samples:
 1: 1.005%
 2: 3.02%
 3: 4.934%
 4: 6.974%
 5: 9.072%
 6: 11.063%
 7: 12.979%
 8: 14.91%
 9: 16.865%
 10: 19.178%



In [0]:
distribution_timer(hod, front_gauss, 10)

Output Analysis: higher_order_dice(<built-in function front_gauss>, 10)
Typical Timing: 327 ± 52 ns
Statistics of 1000 samples:
 Minimum: 1
 Median: 1
 Maximum: 8
 Mean: 1.573
 Std Deviation: 0.9750235894582243
Distribution of 100000 samples:
 1: 63.001%
 2: 23.437%
 3: 8.507%
 4: 3.172%
 5: 1.185%
 6: 0.441%
 7: 0.17%
 8: 0.061%
 9: 0.021%
 10: 0.005%



In [0]:
distribution_timer(hod, back_gauss, 10)

Output Analysis: higher_order_dice(<built-in function back_gauss>, 10)
Typical Timing: 351 ± 79 ns
Statistics of 1000 samples:
 Minimum: 3
 Median: 10
 Maximum: 10
 Mean: 9.445
 Std Deviation: 0.9554972527433032
Distribution of 100000 samples:
 1: 0.004%
 2: 0.014%
 3: 0.062%
 4: 0.143%
 5: 0.455%
 6: 1.17%
 7: 3.155%
 8: 8.542%
 9: 23.44%
 10: 63.015%



In [0]:
distribution_timer(hod, front_poisson, 10)

Output Analysis: higher_order_dice(<built-in function front_poisson>, 10)
Typical Timing: 260 ± 39 ns
Statistics of 1000 samples:
 Minimum: 1
 Median: 3
 Maximum: 10
 Mean: 3.454
 Std Deviation: 1.5400922050318935
Distribution of 100000 samples:
 1: 8.35%
 2: 20.487%
 3: 25.63%
 4: 21.4%
 5: 13.385%
 6: 6.637%
 7: 2.672%
 8: 1.026%
 9: 0.317%
 10: 0.096%



In [0]:
distribution_timer(hod, back_poisson, 10)

Output Analysis: higher_order_dice(<built-in function back_poisson>, 10)
Typical Timing: 285 ± 66 ns
Statistics of 1000 samples:
 Minimum: 2
 Median: 8
 Maximum: 10
 Mean: 7.427
 Std Deviation: 1.5889213322251043
Distribution of 100000 samples:
 1: 0.098%
 2: 0.287%
 3: 1.023%
 4: 2.839%
 5: 6.712%
 6: 13.364%
 7: 21.289%
 8: 25.428%
 9: 20.67%
 10: 8.29%



## Data Wrangling

### Random Algorithm Generator

- Callable: Flat Uniform Distribution of Target Random Algorithms
- Signature: `random_method() -> (String, Callable)`

In [0]:
# Six Random Distribution Algorithms
methods = (
    ('Front Linear', front_linear),
    ('Back Linear', back_linear),
    ('Front Gauss', front_gauss),
    ('Back Gauss', back_gauss),
    ('Front Poisson', front_poisson),
    ('Back Poisson', back_poisson),
)
random_method = RandomValue(methods)
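
`random_method()` is used below as a zero-argument callable that returns one `(name, function)` pair chosen uniformly. For readers without Fortuna installed, roughly the same behavior can be sketched with the standard library. `random.choice` and the two placeholder generators here are stand-ins, not Fortuna's implementation:

```python
import random
from functools import partial

# Placeholder zero-based generators standing in for the Fortuna algorithms.
def low_roll(n):
    return min(random.randrange(n), random.randrange(n))

def high_roll(n):
    return max(random.randrange(n), random.randrange(n))

methods = (
    ('Low', low_roll),
    ('High', high_roll),
)
# Zero-argument callable returning a uniformly chosen (name, callable) pair.
random_method = partial(random.choice, methods)

name, method = random_method()
print(name, method(10) + 1)
```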

### Producing Raw Data

Models the polyhedral dice d4 through d20 using the target random distributions.

In [0]:
def make_csv(file_name, dice_size, n_rows, n_cols):
    with open(file_name, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        # Header: the label column, then one column per roll
        writer.writerow(it.chain(
            ('Method', ),
            (f'Value {i+1}' for i in range(n_cols)),
        ))
        # Data Rows: pick an algorithm at random, then roll it n_cols times
        for _ in range(n_rows):
            method_name, method = random_method()
            writer.writerow(it.chain(
                (method_name, ),
                (method(dice_size) + 1 for _ in range(n_cols))
            ))
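
Each file written by `make_csv` starts with a header row, followed by one labeled row per sample. The header layout can be previewed in memory with `io.StringIO`, without writing to disk:

```python
import csv
import io
import itertools as it

n_cols = 10
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',')
# Same header construction as make_csv: the label column, then one column per roll.
writer.writerow(it.chain(
    ('Method', ),
    (f'Value {i + 1}' for i in range(n_cols)),
))
header = buffer.getvalue().strip()
print(header)
```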

In [0]:
dice = (4, 6, 8, 10, 12, 20)
n_rows = 10000
n_cols = 10
for d in dice:
    make_csv(f'method_{d}.csv', d, n_rows, n_cols)

### Collecting Raw Data

In [0]:
data = {
    n: pd.read_csv(f'method_{n}.csv') for n in dice
}

## Modeling Validation

In [0]:
model_data = {}
print("Validation Accuracy:")
for dice_size in dice:
    X_train, X_val = train_test_split(data[dice_size], random_state=42)
    y_train = X_train['Method']
    X_train = X_train.drop(columns=['Method'])
    y_val = X_val['Method']
    X_val = X_val.drop(columns=['Method'])
    model = RandomForestClassifier(
        bootstrap=False,
        criterion='gini',
        max_depth=12,
        max_features=1,
        n_estimators=128,
        n_jobs=-1,
        random_state=42,
        warm_start=True,
    )
    model.fit(X_train, y_train)
    model_data[f"d{dice_size}"] = model
    print(f"d{dice_size}: \t{100 * model.score(X_val, y_val):.2f}%")

Validation Accuracy:
d4: 	67.44%
d6: 	74.24%
d8: 	80.80%
d10: 	85.60%
d12: 	89.76%
d20: 	95.32%
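
To set these scores against the 1 in 6 baseline, the reported validation accuracies can be restated as a multiple of chance. The numbers below are copied from the output above:

```python
# Validation accuracy (%) as reported above, keyed by dice size.
validation = {4: 67.44, 6: 74.24, 8: 80.80, 10: 85.60, 12: 89.76, 20: 95.32}
baseline = 100 / 6  # uniform guess among six algorithms

for size, acc in validation.items():
    print(f"d{size}: {acc:.2f}% ({acc / baseline:.1f}x baseline)")

# Check whether accuracy rises with every increase in dice size.
scores = list(validation.values())
is_increasing = all(a < b for a, b in zip(scores, scores[1:]))
print(is_increasing)
```

Every model beats chance by at least a factor of four, and validation accuracy rises with each step up in dice size, which is consistent with smaller dice being harder to predict.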


## Predictions

In [0]:
def prediction(func, dice_name, dice_size):
    test_group = [[hod(func, dice_size) for _ in range(n_cols)]]
    result = model_data[dice_name].predict(test_group)
    prob = model_data[dice_name].predict_proba(test_group)
    return result[0], max(prob[0])
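
`prediction` reports the largest entry of the `predict_proba` row as its confidence. For `RandomForestClassifier`, the columns of that row follow the order of `model.classes_`, and `predict` returns the class with the highest probability. A plain-Python sketch of that relationship, using made-up probabilities purely for illustration:

```python
# Class labels in sorted order, as scikit-learn stores them in model.classes_.
classes = sorted([
    'Front Linear', 'Back Linear', 'Front Gauss',
    'Back Gauss', 'Front Poisson', 'Back Poisson',
])
# Hypothetical predict_proba row (illustrative values, not model output).
prob_row = [0.02, 0.10, 0.40, 0.01, 0.15, 0.32]

confidence = max(prob_row)
predicted = classes[prob_row.index(confidence)]
print(predicted, f"{confidence:.0%}")
```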

In [0]:
print(f"Final Data Shape: {n_rows:,} x {n_cols} x {len(dice)}")

Final Data Shape: 10,000 x 10 x 6


## One Prediction per Algorithm for Each Distribution Range

In [0]:
print("Algorithm Dice  \tPrediction Confidence \tCorrect")
for dice_size in dice:
    dice_name = f"d{dice_size}"
    for name, method in methods:
        pred, prob = prediction(method, dice_name, dice_size)
        correct = name == pred
        print(f"{name} {dice_name}:  \t{pred} {100*prob:.0f}%: \t{correct}")

Algorithm Dice  	Prediction Confidence 	Correct
Front Linear d4:  	Front Linear 54%: 	True
Back Linear d4:  	Back Poisson 40%: 	False
Front Gauss d4:  	Front Gauss 86%: 	True
Back Gauss d4:  	Back Gauss 92%: 	True
Front Poisson d4:  	Front Poisson 52%: 	True
Back Poisson d4:  	Back Poisson 50%: 	True
Front Linear d6:  	Front Poisson 59%: 	False
Back Linear d6:  	Back Poisson 49%: 	False
Front Gauss d6:  	Front Gauss 99%: 	True
Back Gauss d6:  	Back Gauss 92%: 	True
Front Poisson d6:  	Front Poisson 49%: 	True
Back Poisson d6:  	Back Poisson 61%: 	True
Front Linear d8:  	Front Linear 47%: 	True
Back Linear d8:  	Back Linear 60%: 	True
Front Gauss d8:  	Front Gauss 99%: 	True
Back Gauss d8:  	Back Gauss 98%: 	True
Front Poisson d8:  	Front Poisson 79%: 	True
Back Poisson d8:  	Back Poisson 54%: 	True
Front Linear d10:  	Front Poisson 52%: 	False
Back Linear d10:  	Back Poisson 57%: 	False
Front Gauss d10:  	Front Gauss 96%: 	True
Back Gauss d10:  	Back Gauss 99%: 	True
Front Poisson d10: