In [4]:
import numpy as np
print("numpy:\t", np.__version__)

numpy:	 1.21.3


## Setup

The input to the model will have to capture:
1. Per rank: 
    1. Class
    1. HP, stress 
    1. Available skills (unavailable if not equipped, on cooldown, or out of uses; strictly, we'd have to represent cooldown duration and usage count)
    1. Combat item
    1. Active or not
    1. Tokens
    1. Has an action left this turn
    1. (Ignore relationships for now)
1. Per enemy rank:
    1. Enemy type
    1. HP
    1. Tokens
    1. Has an action left this turn
2. Overall:
    1. Rounds remaining

I originally imagined capturing the state of the party in a fight as a 4x9x11x8 tensor (rank x class x skill x target), with the environment providing which actions were legal.  It's probably not necessary to have an additional tensor dimension for the class; instead, one-hot encode (OHE) the class and prepend it to the legal skills array.  This reduces to a (rank x class&skills x target) tensor.  And actually, you don't need the targets represented in the input, just the output, so the input could be the OHE concatenation of the per-rank list, stacked to make an m x 4 matrix, potentially concatenated with an m x 4 matrix for the opposing party, plus a few extra numbers (such as rounds remaining) appended somewhere.
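As a rough sketch of that stacking idea, the cell below builds one flat input from per-rank vectors plus the rounds-remaining counter. The widths `HERO_WIDTH` and `ENEMY_WIDTH` and the helper `build_state` are hypothetical placeholders, not the real feature sizes, and the ranks are stacked as rows (4 x m) rather than columns.

```python
import numpy as np

# Hypothetical feature widths, chosen only for illustration.
N_RANKS = 4
HERO_WIDTH = 6
ENEMY_WIDTH = 5

def build_state(hero_vecs, enemy_vecs, rounds_remaining):
    """Stack per-rank vectors and flatten them into one model input."""
    party = np.stack(hero_vecs)      # shape (4, HERO_WIDTH)
    enemies = np.stack(enemy_vecs)   # shape (4, ENEMY_WIDTH)
    # Flatten both blocks and append the single global feature.
    return np.concatenate([party.ravel(), enemies.ravel(), [rounds_remaining]])

hero_vecs = [np.zeros(HERO_WIDTH) for _ in range(N_RANKS)]
enemy_vecs = [np.zeros(ENEMY_WIDTH) for _ in range(N_RANKS)]
state = build_state(hero_vecs, enemy_vecs, rounds_remaining=3)
print(state.shape)  # (45,)
```

Flattening is only one option; keeping the party and enemy blocks as separate matrices would work just as well if the model takes several inputs.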

I think this leaves us with a $4 \times (|\{\text{HP}, \text{stress}\}| + |\text{class}| + |\text{available skills}| + |\text{combat items}| + 1\ (\text{active}) + 1\ (\text{has action left}) + |\text{tokens}|)$ matrix for the party. Let's say 9 classes, 10 skills each, 20 combat items, 10 distinct tokens: that gives us a 4 x (2 + 9 + 10 + 20 + 1 + 1 + 10) = 4 x 53 matrix for the party (ignoring cooldowns, usages, and relationships).  Not bad.

For the enemy party, I imagine we'd only have to track enemy type, HP, and tokens.  Assume 100 enemy types for now; that's a 4 x (100 + 1 + 25) = 4 x 126 matrix for the enemy (assuming 25 tokens once enemy-specific ones like engorged or whatever are included).  Not bad, either.
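The column counts in the two paragraphs above can be tallied directly. The numbers are the rough guesses quoted in the text, and the dictionary keys are just labels made up for this sketch. Counting every term of the party formula, including the two single-bit flags, gives 53 columns per hero.

```python
# Feature counts quoted in the text; all are rough guesses, not final numbers.
party_widths = {
    "hp_stress": 2,
    "class": 9,
    "skills": 10,
    "combat_items": 20,
    "active": 1,
    "has_action_left": 1,
    "tokens": 10,
}
enemy_widths = {
    "enemy_type": 100,
    "hp": 1,
    "tokens": 25,
}

party_width = sum(party_widths.values())
enemy_width = sum(enemy_widths.values())
print(party_width, enemy_width)           # 53 126
print(4 * party_width + 4 * enemy_width)  # 716 inputs before rounds remaining
```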

A model outputting state or action values could account for legality by outputting an action value for each (skill, target) combination and then multiplying elementwise by a legality matrix of 0s and 1s marking the valid (skill, target) combinations.  Actually, that wouldn't quite work: legal actions might have negative values, so zeroing out an illegal action's value wouldn't necessarily rule out the chance of it being the max.  Multiplying the complement of the legality matrix by -np.inf doesn't work either, since 0 * -np.inf is nan.  The safer option is to overwrite the illegal entries with -np.inf (e.g. with np.where on the legality matrix) and take the max over what remains.
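A minimal sketch of that masking step, assuming the model emits a 12 x 8 (skill x target) array of action values. The random values stand in for model output, and the legality pattern is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_skills, n_targets = 12, 8

# Stand-in for the model's output: one value per (skill, target) pair.
action_values = rng.normal(size=(n_skills, n_targets))

# Illustrative legality mask: True where the (skill, target) pair is allowed.
legal = np.zeros((n_skills, n_targets), dtype=bool)
legal[0, 4:6] = True   # skill 0 may hit the front two enemy ranks
legal[3, 0:4] = True   # skill 3 may target any party rank

# Overwrite illegal entries with -inf instead of multiplying,
# because 0 * -inf evaluates to nan rather than -inf.
masked = np.where(legal, action_values, -np.inf)

# The best legal action is the argmax over the masked array.
skill, target = np.unravel_index(np.argmax(masked), masked.shape)
print(skill, target)
```

Because every illegal entry is -inf, the argmax can only land on a legal pair as long as at least one pair is legal.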

In [5]:
example_hero = {}
example_hero['hp'] = 6
example_hero['max hp'] = 16
example_hero['stress'] = 4
example_hero['class'] = "helion"
example_hero['skills'] = ["whack", "chop", "iron swan", "barbaric yawp"]
example_hero['combat item'] = "lye"
example_hero['active'] = 1
example_hero['has action left'] = 0
example_hero['tokens'] = ["improved dodge", "burn 3 6"]


In [6]:
_classes = ["man at arms", "helion", "grave robber", "highwayman", "leper", "runaway", "plague doctor"]
[1 if elem == "helion" else 0 for elem in _classes]


[0, 1, 0, 0, 0, 0, 0]

In [21]:
_skills = {}
_skills['helion'] = ["whack", "chop", "iron swan", "barbaric yawp", "if it bleeds", "breakthrough", "revel"]
_skills['plague doctor'] = ["noxious blast", "blinding gas", "incision", "battlefield medicine", "ounce of prevention", "plague grenade", "emboldening vapors"]
_skills['man at arms'] = ["crush", "rampart", "defender", "bolster", "hold the line", "bellow", "retribution", "command"]
_skills['grave robber'] = ["pick to the face", "thrown dagger", "flashing daggers", "poison dart", "absinthe", "dead of night", "glint in the dark", "lunge"]

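# Multi-hot vector over the hero's class skill list (11 slots, matching the skill dimension used in this notebook).
skills_vec = np.zeros(11)
for skill in example_hero['skills']:
    _ix = _skills[example_hero['class']].index(skill)  # position of the skill within its class list
    skills_vec[_ix] = 1
skills_vec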

whack
['whack', 'chop', 'iron swan', 'barbaric yawp', 'if it bleeds', 'breakthrough', 'revel']
0
chop
['whack', 'chop', 'iron swan', 'barbaric yawp', 'if it bleeds', 'breakthrough', 'revel']
1
iron swan
['whack', 'chop', 'iron swan', 'barbaric yawp', 'if it bleeds', 'breakthrough', 'revel']
2
barbaric yawp
['whack', 'chop', 'iron swan', 'barbaric yawp', 'if it bleeds', 'breakthrough', 'revel']
3


array([1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.])

In [18]:
np.argwhere([1 if x == "iron swan" else 0 for x in _skills[example_hero['class']]])[0][0]

2

In [None]:
def hero2vec(hero_dct):
    """Encode one hero dict as a flat feature vector."""
    _classes = ["man at arms", "helion", "grave robber", "highwayman", "leper", "runaway", "plague doctor"]
    _skills = {}
    _skills['helion'] = ["whack", "chop", "iron swan", "barbaric yawp", "if it bleeds", "breakthrough", "revel"]
    _skills['plague doctor'] = ["noxious blast", "blinding gas", "incision", "battlefield medicine", "ounce of prevention", "plague grenade", "emboldening vapors"]
    _skills['man at arms'] = ["crush", "rampart", "defender", "bolster", "hold the line", "bellow", "retribution", "command"]
    _skills['grave robber'] = ["pick to the face", "thrown dagger", "flashing daggers", "poison dart", "absinthe", "dead of night", "glint in the dark", "lunge"]
    
    _combat_items = ['lye', 'heal potion', 'reed', 'fire bomb']
    
    _tokens = ["critical", "riposte", "dodge", "improved dodge", "block", "improved block", "guard", "strength", "blind", "weak", "vulnerable", "taunt", "immobilize", "winded", "combo", "stealth", "burn", "bleed", "blight", "horror", "deaths door"]
    
    hero_vec = []
    hero_vec.append(hero_dct['hp'])
    hero_vec.append(hero_dct['max hp'])
    hero_vec.append(hero_dct['stress'])
    # one-hot class
    hero_vec += [1 if elem == hero_dct['class'] else 0 for elem in _classes]
    # multi-hot equipped skills, indexed within the hero's own class list
    skills_vec = np.zeros(11)
    for skill in hero_dct['skills']:
        _ix = _skills[hero_dct['class']].index(skill)
        skills_vec[_ix] = 1
    hero_vec += list(skills_vec)
    # one-hot combat item
    combat_item_vec = np.zeros(len(_combat_items))
    combat_item_vec[_combat_items.index(hero_dct['combat item'])] = 1
    hero_vec += list(combat_item_vec)
    hero_vec.append(hero_dct['active'])
    hero_vec.append(hero_dct['has action left'])
    # multi-hot tokens. Assumption: match on the token name only and drop any
    # trailing numbers (e.g. "burn 3 6" just sets the "burn" slot).
    hero_vec += [1 if any(tok == name or tok.startswith(name + " ") for tok in hero_dct['tokens']) else 0 for name in _tokens]
    return np.asarray(hero_vec, dtype=float)

In [10]:
equipped_skills = np.asarray([1,1,0,1,1,0,0,0,0,0,0,1])
legal_actions = np.asarray([1,0,1,1,1,1,1,1,1,1,1,1])
skill_targets = np.zeros((12,8))  # fixed 12x8.  Will need to expand first dimension to account for different combat items.
# set skill targets
skill_targets[0,4:6] = 1 # front two ranks
skill_targets[1,5:7] = 1 # middle two ranks
skill_targets[2,7] = 1 # back rank
skill_targets[3,0:4] = 1 # all four party ranks, e.g. healing
legal_targets = np.full((12,8), 1)  # depends on e.g. stealth; changes from turn to turn

In [11]:
equipped_skills * legal_actions

array([1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1])

In [12]:
skill_targets

array([[0., 0., 0., 0., 1., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 1., 1., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])
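One way the pieces above might be combined into a single (skill x target) legality matrix is to broadcast the per-skill vector across the target columns. This is only a sketch: the name `legality` is made up here, and the party ranks are taken to be columns 0 to 3, with enemy ranks in columns 4 to 7.

```python
import numpy as np

equipped_skills = np.asarray([1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1])
legal_actions = np.asarray([1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

skill_targets = np.zeros((12, 8))
skill_targets[0, 4:6] = 1   # front two enemy ranks
skill_targets[1, 5:7] = 1   # middle two enemy ranks
skill_targets[2, 7] = 1     # back enemy rank
skill_targets[3, 0:4] = 1   # all four party ranks (assumed to be columns 0-3)
legal_targets = np.full((12, 8), 1)

# A skill is usable only if it is both equipped and currently legal.
usable = equipped_skills * legal_actions                    # shape (12,)

# Broadcast the per-skill flag over targets, then apply both target masks.
legality = usable[:, None] * skill_targets * legal_targets  # shape (12, 8)
print(np.argwhere(legality))
```

Skill 1 drops out because it is not currently legal, and skill 2 because it is not equipped, so only rows 0 and 3 keep any targets.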

In [13]:
# rank x class x skill x target (no combat items)
party_tsr = np.zeros((4,9,11,8))

In [14]:
4*9*12*8

3456