In [1]:
%run SMB_iterate.py

# TAS input

A tool-assisted speedrun (TAS) lets a human operator choose specific inputs on a per-frame basis. This is used to explore games and to approximate theoretically perfect play. There is also an interplay between TAS and real-time speedrunning.

We can imagine using existing TAS data to inform our action-space sampling, for example by weighting button presses according to how often they occur, which might improve on the pre-provided action spaces.

* *Caveat: TAS speedruns may include various glitches, as in unintended play patterns. Depending on the context, this may be undesirable for an agent to learn.*

We might also imagine using it for behavioral cloning, which we expect to learn more about in a future class session.

Below, we explore a particular [speedrun from user HappyLee](https://tasvideos.org/6622M).

In [2]:
randomSeed(5004)
# randomSeed and shuffle are assumed to be provided by SMB_iterate.py (run above)
with open('./TAS/happylee-supermariobros-europe-warps.fm2', 'r') as tasExample:
    # first 20 lines: header metadata followed by the first input frames
    for _ in range(20):
        print(tasExample.readline(), end = '')
    print("\n... (following random sample of lines) \n")
    remainder = tasExample.readlines()

shuffle(remainder)
for x in remainder[0:10]:
    print(x, end = '')

version 3
emuVersion 22000
rerecordCount 11754
palFlag 1
romFilename Super Mario Bros. (E)
romChecksum base64:ujnd5jqyCbG8dR4FNecrGA==
guid ACBA9520-9FDF-B909-4E0F-7F121642CAFC
fourscore 0
microphone 0
port0 1
port1 1
port2 0
FDS 0
NewPPU 0
comment author HappyLee
|0|........|........||
|0|........|........||
|0|........|........||
|0|........|........||
|0|........|........||

... (following random sample of lines) 

|0|........|........||
|0|........|........||
|0|......B.|........||
|0|.L....B.|........||
|0|........|........||
|0|R.....B.|........||
|0|......B.|........||
|0|........|........||
|0|......B.|........||
|0|........|........||


The file format for this particular TAS 'movie' is described here: [https://fceux.com/web/help/fm2.html](https://fceux.com/web/help/fm2.html). I believe that this format is defined by the particular emulator in use (fceux). 

We will effectively be ignoring the opening lines of metadata that are not relevant for our purposes.

After the metadata, there is user input.

> SI_GAMEPAD:
> 
>     the field consists of eight characters which constitute a bit field
>     any character other than ' ' or '.' means that the button was pressed
>     by convention, the following mnemonics are used in a column to remind us of which button corresponds to which column: RLDUTSBA (Right, Left, Down, Up, sTart, Select, B, A)

The format documentation also describes the structure of each input line:

> The input log section can be identified by it starting with a | (pipe).
> 
>     Text format (default format)
>     Every frame of the movie is represented by line of text beginning and ending with a | (pipe).
>     The fields in the line are as follows, except when fourscore is used.
>     |commands|port0|port1|port2|

We want to extract these inputs and do some EDA/parsing to get the relative frequency of the different actions, which might serve as an informed prior.
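
The field layout quoted above can be sketched as a small parser. This is a minimal illustration rather than the approach used later in this notebook: `FM2_BUTTONS`, `parse_gamepad`, and `parse_input_line` are hypothetical names, and the lowercase button labels (including `'select'` and `'NOOP'`) are assumed to match what the gymnasium environment expects.

```python
# Column order documented for SI_GAMEPAD: RLDUTSBA
# (Right, Left, Down, Up, sTart, Select, B, A).
FM2_BUTTONS = ['right', 'left', 'down', 'up', 'start', 'select', 'B', 'A']

def parse_gamepad(field):
    """Return the pressed buttons for one 8-character gamepad field."""
    if len(field) != len(FM2_BUTTONS):
        raise ValueError(f'expected 8 characters, got {field!r}')
    # Any character other than ' ' or '.' counts as a press.
    pressed = [name for name, ch in zip(FM2_BUTTONS, field) if ch not in ' .']
    return pressed or ['NOOP']

def parse_input_line(line):
    """Split '|commands|port0|port1|port2|' and parse port0 only."""
    fields = line.rstrip('\n').split('|')
    # fields[0] is the empty string before the leading pipe,
    # fields[1] is the command, fields[2] is port0.
    return parse_gamepad(fields[2])

print(parse_input_line('|0|R.....BA|........||'))
print(parse_gamepad('RLDU.SBA'))
```

Because the parser works by column position, it distinguishes the `T` (start) and `S` (select) columns without needing a hand-written entry for every combination.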

## Parsing

In [3]:
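with open('./TAS/happylee-supermariobros-europe-warps.fm2', 'r') as f:
    # skip the metadata header; input lines start with a pipe
    while True:
        line = f.readline()
        # stop at end of file (empty string) or at the first input line
        if not line or line.startswith('|'):
            break
    # note: the first input line was consumed by the loop above,
    # so `lines` starts at the second input line of the movie
    lines = f.readlines()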

In [4]:
# Python approach
# remove newline, split on delimiter
[x.rstrip().split('|') for x in lines[:20]]

[['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', ''],
 ['', '0', '........', '........', '', '']]

In [5]:
# pass more directly into DataFrame
df_raw = pd.DataFrame(lines)[0].str.rstrip().str.split('|', expand = True)
df_raw = df_raw.rename(
   {0 : 'drop',
    1 : 'command',
    2 : 'controller1_inputs',
    3 : 'controller2_inputs',
    4 : 'controller3_inputs',
    5 : 'drop'},
    axis = 1)


df = df_raw.loc[:, ['controller1_inputs']]
df = df.reset_index()
df = df.rename({'controller1_inputs' : 'input',
                'index' : 'frameNumber'},
               axis = 1)

df_raw.describe()

| | drop | command | controller1_inputs | controller2_inputs | controller3_inputs | drop.1 |
|---|---|---|---|---|---|---|
| count | 14759.0 | 14759 | 14759 | 14759 | 14759.0 | 14759.0 |
| unique | 1.0 | 1 | 18 | 1 | 1.0 | 1.0 |
| top | | 0 | ........ | ........ | | |
| freq | 14759.0 | 14759 | 5901 | 14759 | 14759.0 | 14759.0 |


In [6]:
df_raw['controller1_inputs'].value_counts()

controller1_inputs
........    5901
......B.    4474
......BA    2349
R.....B.    1147
.L....B.     336
R.....BA     184
.......A     110
RL....B.      68
R.......      48
R......A      39
RL.....A      31
RL......      25
.L....BA      22
.L......       8
.L.....A       8
..D.....       7
....T...       1
RLDU.SBA       1
Name: count, dtype: int64
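
Counts like these could be turned into sampling weights. The following is a minimal sketch, not part of the original analysis: `to_weights` is a hypothetical helper, only the five most frequent inputs are copied from the output above, and excluding the no-input frames is a modelling assumption rather than something the data dictates.

```python
import random

# The five most frequent inputs from the counts above (rarer ones omitted).
counts = {
    '........': 5901,
    '......B.': 4474,
    '......BA': 2349,
    'R.....B.': 1147,
    '.L....B.': 336,
}

def to_weights(counts, exclude=()):
    """Normalise raw counts into probabilities, optionally dropping keys."""
    kept = {k: v for k, v in counts.items() if k not in exclude}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}

# Assumption: no-input frames are mostly loading time, so leave them out.
weights = to_weights(counts, exclude=('........',))

# Draw a few inputs in proportion to how often the TAS used them.
rng = random.Random(5004)
sample = rng.choices(list(weights), weights=list(weights.values()), k=5)
print(weights)
print(sample)
```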

### Manual inspection

The input that presses every button except start occurs at the very end of the file. Perhaps it is a 'celebratory' flourish?

In [7]:
print(df.shape)
print(df.frameNumber.min(), df.frameNumber.max())
df.query('input == "RLDU.SBA"')

(14759, 2)
0 14758


| | frameNumber | input |
|---|---|---|
| 14758 | 14758 | RLDU.SBA |


Start of the run, pressing the start button to exit the title screen.

In [8]:
df.query('input == "....T..."')

| | frameNumber | input |
|---|---|---|
| 34 | 34 | ....T... |


At least some, if not all, of these are inputs to enter pipes.

In [9]:
df.query('input == "..D....."')

| | frameNumber | input |
|---|---|---|
| 475 | 475 | ..D..... |
| 3104 | 3104 | ..D..... |
| 5903 | 5903 | ..D..... |
| 6361 | 6361 | ..D..... |
| 12911 | 12911 | ..D..... |
| 13291 | 13291 | ..D..... |
| 13608 | 13608 | ..D..... |


Comments:

* No action (`NOOP`) is the most prominent action. Taken naively, this would suggest a very passive set of inputs. However, I believe that it mostly reflects loading times between stages.
  * The gymnasium environment automatically skips these segments so that the agent can focus on actual play.
* We do see several inputs that include right and left simultaneously. This is not something that can be done with human input on [original NES hardware](https://www.suppermariobroth.com/post/732175362993340416/in-super-mario-bros-by-pressing-left-and-right), as the directions are opposed and constrained on the D-pad. Some theories:
  * This may reflect some advanced machine-only technique.
    * Indeed, pressing both directions allows for faster acceleration/deceleration, as [per comments](https://tasvideos.org/Forum/Topics/21202?CurrentPage=1&Highlight=486141#486141) on a 'No L+R' speedrun submission.
  * It may reflect the author pressing extra buttons on frames where doing so makes no difference.
  * It may reflect errors, though this seems exceptionally unlikely.
* The pressing of almost all inputs (`RLDU.SBA`) occurs only on the final frame.
* The 'start' button is pressed only once, after loading the game.
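
One way to probe the loading-time hypothesis is to look at runs of consecutive identical inputs: a few very long `NOOP` runs would be consistent with stage transitions, whereas many short ones would suggest deliberate pauses. The sketch below uses a hypothetical helper, `run_lengths`, and toy data standing in for `df['input'].tolist()`; it is not a result from the actual movie file.

```python
from itertools import groupby

def run_lengths(inputs):
    """Collapse a frame-by-frame input list into (value, start, length) runs."""
    runs = []
    start = 0
    for value, group in groupby(inputs):
        length = sum(1 for _ in group)
        runs.append((value, start, length))
        start += length
    return runs

# Toy data for illustration only.
frames = ['........'] * 4 + ['R.....B.'] * 3 + ['........'] * 2 + ['......BA']

for value, start, length in run_lengths(frames):
    print(value, start, length)
```

The `start` value of each run also gives a way to locate where play resumes after a long pause, instead of finding such frame numbers by eye.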

### Convert TAS inputs to gym actions

To leverage these TAS inputs, we need to convert from the compact fm2 notation to the more verbose lists of button names that the gymnasium environment expects.

In [10]:
# inputs that are actually used in the TAS run
[x for x in df.input.unique()]

['........',
 '....T...',
 'RL....B.',
 'R.....B.',
 '.L....B.',
 'R.....BA',
 '......BA',
 '......B.',
 '.L......',
 'R......A',
 '.......A',
 'RL......',
 '..D.....',
 '.L.....A',
 'R.......',
 '.L....BA',
 'RL.....A',
 'RLDU.SBA']

The goal is to rewrite these as a list of lists, with one list of button names per input.

As a first step, generate a new column in our df that holds these gym inputs as comma-delimited strings.

In [11]:
# replace cannot map a column value directly to a list,
# so instead convert to a delimited string,
# which is subsequently converted to a list
df['gymInput'] = df['input'].replace({
    '........' : 'NOOP',
    '....T...' : 'start',
    'RL....B.' : 'right,left,B', # invalid input for humans
    'R.....B.' : 'right,B',
    '.L....B.' : 'left,B',
    'R.....BA' : 'right,B,A',
    '......BA' : 'B,A',
    '......B.' : 'B',
    '.L......' : 'left',
    'R......A' : 'right,A',
    '.......A' : 'A',
    'RL......' : 'right,left', # invalid input for humans
    '..D.....' : 'down',
    '.L.....A' : 'left,A',
    'R.......' : 'right',
    '.L....BA' : 'left,B,A',
    'RL.....A' : 'right,left,A', # invalid input for humans
    'RLDU.SBA' : 'right,left,down,up,start,B,A' # caution: the 'S' column is Select (Start is 'T'), so 'select' would be the faithful label
})

In [12]:
df['gymInput'].sample(5)

6059           B,A
11922            B
10394         NOOP
7933           B,A
9053     right,B,A
Name: gymInput, dtype: object

In [13]:
# list(x) would treat each character as a separate item, so it is not used:
# df['gymInput'] = df['gymInput'].apply(lambda x : list(x))

# splitting on the delimiter gives one list item per button
df['gymInput'] = df['gymInput'].apply(lambda x : x.split(','))

In [14]:
df.gymInput.sample(20)

263            [right, B]
2368               [B, A]
7108                  [B]
5146               [NOOP]
3924               [B, A]
3382               [B, A]
1556               [NOOP]
4622               [NOOP]
10750              [NOOP]
449                [left]
2314                  [B]
6114                  [B]
1271               [NOOP]
8172           [right, B]
594      [right, left, B]
5145               [NOOP]
13377              [NOOP]
14232              [NOOP]
11093                 [B]
10891                 [B]
Name: gymInput, dtype: object

#### Apply TAS inputs to gymnasium environment

Once again, the TAS includes the frames spent on stage loading and transitions, which the gymnasium environment skips.

So, we will use a subset of the TAS inputs as a validation sequence to pass into the gymnasium environment.

In [15]:
# start of inputs after loading
start = 171
df.iloc[start - 5 : start + 5]

| | frameNumber | input | gymInput |
|---|---|---|---|
| 166 | 166 | ........ | [NOOP] |
| 167 | 167 | ........ | [NOOP] |
| 168 | 168 | ........ | [NOOP] |
| 169 | 169 | ........ | [NOOP] |
| 170 | 170 | ........ | [NOOP] |
| 171 | 171 | RL....B. | [right, left, B] |
| 172 | 172 | R.....B. | [right, B] |
| 173 | 173 | R.....B. | [right, B] |
| 174 | 174 | R.....B. | [right, B] |
| 175 | 175 | R.....B. | [right, B] |


There is a brief pause in input after frame 415. It is unclear exactly what this corresponds to, but it is a natural enough stopping point for this proof of concept.

In [16]:
end = 415
df.iloc[end - 5 : end + 5]

| | frameNumber | input | gymInput |
|---|---|---|---|
| 410 | 410 | ......BA | [B, A] |
| 411 | 411 | ......BA | [B, A] |
| 412 | 412 | ......BA | [B, A] |
| 413 | 413 | ......BA | [B, A] |
| 414 | 414 | ......BA | [B, A] |
| 415 | 415 | ......BA | [B, A] |
| 416 | 416 | ........ | [NOOP] |
| 417 | 417 | ........ | [NOOP] |
| 418 | 418 | ........ | [NOOP] |
| 419 | 419 | ........ | [NOOP] |


In [17]:
startingInput = 171 # start of level
endingInput = 415   # entering pipe

# iloc slicing is end-exclusive, so frame 415 itself is not included
sequenceTAS = df['gymInput'].iloc[startingInput : endingInput].tolist()

In [18]:
actionSpaceTAS = [
    ['NOOP'],
    ['start'],
    ['right','left','B'],
    ['right','B'],
    ['left','B'],
    ['right','B','A'],
    ['B','A'],
    ['B'],
    ['left'],
    ['right','A'],
    ['A'],
    ['right','left'],
    ['down'],
    ['left','A'],
    ['right'],
    ['left','B','A'],
    ['right','left','A'],
    ['right','left','down','up','start','B','A'] # mirrors the mapping for 'RLDU.SBA'; per the fm2 column order, 'S' is Select rather than Start
]
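
If the environment wrapper expects integer action indices rather than lists of button names (as is the case for nes-py's `JoypadSpace`, which exposes a discrete action per entry in the list it is given), the sequence can be translated with a lookup table. Whether `Agent.iterate` does this internally is not shown here, so treat the following as a sketch with a hypothetical helper, `to_action_indices`, and a shortened illustrative action space.

```python
def to_action_indices(sequence, action_space):
    """Map each button list to its position in action_space."""
    # Lists are not hashable, so use tuples as dictionary keys.
    lookup = {tuple(buttons): i for i, buttons in enumerate(action_space)}
    # A KeyError here means the sequence uses an input missing from action_space.
    return [lookup[tuple(buttons)] for buttons in sequence]

# Shortened, illustrative action space and sequence.
action_space = [['NOOP'], ['right', 'B'], ['right', 'B', 'A']]
sequence = [['NOOP'], ['right', 'B'], ['right', 'B'], ['right', 'B', 'A']]

print(to_action_indices(sequence, action_space))
```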

In [19]:
a = Agent(actionSpaceTAS, rom = 'v0')
print(a)
a.iterate(sequenceTAS, saveImage = True)


self.actionSpace=[['NOOP'], ['start'], ['right', 'left', 'B'], ['right', 'B'], ['left', 'B'], ['right', 'B', 'A'], ['B', 'A'], ['B'], ['left'], ['right', 'A'], ['A'], ['right', 'left'], ['down'], ['left', 'A'], ['right'], ['left', 'B', 'A'], ['right', 'left', 'A'], ['right', 'left', 'down', 'up', 'start', 'B', 'A']]
self.seed=5004
self.step=-1
self.cumulativeReward=0
Latest state:
None
self.step=0000000, self.cumulativeReward=0.0, info['coins']=0, info['time']=400
start of new life
current lives:  1


Running this on both the version 0 and version 3 variants of the environment shows that the same inputs lead to different outcomes. Empirically, the differences between these versions/ROMs appear to be *not just* graphical.

In [20]:
a = Agent(actionSpaceTAS, rom = 'v3')
print(a)
a.iterate(sequenceTAS, saveImage = True)

self.actionSpace=[['NOOP'], ['start'], ['right', 'left', 'B'], ['right', 'B'], ['left', 'B'], ['right', 'B', 'A'], ['B', 'A'], ['B'], ['left'], ['right', 'A'], ['A'], ['right', 'left'], ['down'], ['left', 'A'], ['right'], ['left', 'B', 'A'], ['right', 'left', 'A'], ['right', 'left', 'down', 'up', 'start', 'B', 'A']]
self.seed=5004
self.step=-1
self.cumulativeReward=0
Latest state:
None
self.step=0000000, self.cumulativeReward=0.0, info['coins']=0, info['time']=400
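
To pin down where two replays of the same inputs part ways, the per-step `info` dictionaries could be recorded for each run and compared. This is a sketch under assumptions: `first_divergence` is a hypothetical helper, the `'x_pos'` key is assumed to be present in `info` (only `'coins'` and `'time'` appear in the output above), and the values are made up for illustration rather than measured from either ROM.

```python
def first_divergence(run_a, run_b, keys=('x_pos', 'time')):
    """Return the first step at which two runs disagree on any key, else None."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if any(a.get(k) != b.get(k) for k in keys):
            return step
    return None

# Made-up values purely for illustration.
run_v0 = [{'x_pos': 40, 'time': 400}, {'x_pos': 42, 'time': 400}, {'x_pos': 44, 'time': 400}]
run_v3 = [{'x_pos': 40, 'time': 400}, {'x_pos': 42, 'time': 400}, {'x_pos': 45, 'time': 400}]

print(first_divergence(run_v0, run_v3))
```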


Sequential frames are saved as images into a folder called `states`. These can be converted into a movie with a command-line tool such as `ffmpeg`, or with a video editor such as kdenlive.

kdenlive instructions:

* Project > Add Image Sequence
* Select the folder containing the sequence of sequentially named images
* Set frame duration from default of 5 seconds (`00:00:05:00`) to 1 frame (`00:00:00:01`)
* Optionally extend the last frame for review as described in this [Reddit post](https://www.reddit.com/r/kdenlive/comments/pkdoc7/how_do_you_freeze_a_frame/)
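
For the `ffmpeg` route, a command along the following lines could be used. This sketch only builds and prints the command. The file-name pattern `states/%07d.png`, the output name, and the frame rate are hypothetical and would need to match how the images were actually saved.

```python
import shlex

def ffmpeg_command(pattern, output, fps=60):
    """Build (but do not run) an ffmpeg call that stitches numbered images."""
    return [
        'ffmpeg',
        '-framerate', str(fps),   # input frame rate: one image per frame
        '-i', pattern,            # printf-style pattern for the numbered images
        '-c:v', 'libx264',        # encode as H.264
        '-pix_fmt', 'yuv420p',    # pixel format that most players accept
        output,
    ]

cmd = ffmpeg_command('states/%07d.png', 'tas_replay.mp4')
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```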

Embedded video comparison of 'v0' (faithful graphics) and 'v3' (rectangular).

With gratitude to this [StackOverflow post](https://stackoverflow.com/questions/18019477/how-can-i-play-a-local-video-in-my-ipython-notebook).

In [21]:
a.state.shape

(240, 256, 3)

In [22]:
from IPython.display import Video

# a.state.shape is (height, width, channels), so a single frame is 256 wide by 240 tall
Video("embedded_media/v0_vs_v3_TAS_inputs.mp4", width = 256 * 3, height = 240 * 3)

# ROMs

The above realization necessitated learning about the different ROMs (read-only memory images), that is, versions of the game, as part of understanding the environment in which we are working. Each version is nominally Super Mario Bros., but with slight variations. Given the potential for confusion, the community appears to rely on file hashes to authoritatively identify different versions.

Some versions are official but were released for different markets, for instance the four ROMs listed by the [crowdcontrol](https://crowdcontrol.live/guides/SuperMarioBros/) tool:

* `811b027eaf99c2def7b933c5208636de` - Super Mario Bros (JU) Rev 0
* `94ede9347c1416105f1c08ec26b5b73a` - Super Mario Bros (JU) Rev 1
* `673913a23cd612daf5ad32d4085e0760` - Super Mario Bros (E)
* `f94bb9bb55f325d9af8a0fff80b9376d` - Super Mario Bros (World)

The portion in parentheses is a country code; codes can be combined (see the [gamicus wiki](https://gamicus.fandom.com/wiki/ROM_suffixes)):

* J -> Japan
* U -> USA
* E -> Europe

So, what ROMs are used by default within the `gymnasium-super-mario-bros` package?

In [23]:
# code in this cell was generated by generative AI, Microsoft Copilot
import hashlib

def calculate_rom_checksum(file_path):
    # Read the file in binary mode and compute its MD5 hash
    with open(file_path, 'rb') as f:
        file_data = f.read()
    # hexdigest() returns the hash as a 32-character hex string
    return hashlib.md5(file_data).hexdigest()

In [24]:
import os

rom_dir = "/home/dss2q/DS5004/smb/gymnasium-super-mario-bros/gym_super_mario_bros/_roms/"

# print the MD5 hash of every .nes file in the ROM directory
for file in os.listdir(rom_dir):
    if file.endswith('.nes'):
        print(file, calculate_rom_checksum(os.path.join(rom_dir, file)))

super-mario-bros-2-downsample.nes 3e359edd0097b7833214894f0941d942
super-mario-bros-pixel.nes 9e2e8cf204c40ad2030cd4cf50e28d45
super-mario-bros-rectangle.nes d88207b7a3fb143f3f94480e89c83dd7
super-mario-bros-downsample.nes 9e1a6e778a231091060ba5e9029b9a2f
super-mario-bros.nes 673913a23cd612daf5ad32d4085e0760
super-mario-bros-2.nes 007ffbdce3ca5b5ea5056438d7136051


We will ignore the two `super-mario-bros-2` ROMs, which belong to a different game.

That leaves:

| ROM | MD5 hash |
|---|---|
|super-mario-bros-pixel.nes      | 9e2e8cf204c40ad2030cd4cf50e28d45 |
|super-mario-bros-rectangle.nes  | d88207b7a3fb143f3f94480e89c83dd7 |
|super-mario-bros-downsample.nes | 9e1a6e778a231091060ba5e9029b9a2f |
|super-mario-bros.nes            | 673913a23cd612daf5ad32d4085e0760 |
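
As a quick cross-check, these hashes can be compared against the four crowdcontrol hashes listed earlier. The sketch below uses only values already shown in this notebook; the dictionary names `KNOWN_ROMS` and `PACKAGE_ROMS` are introduced here for illustration.

```python
# MD5 hashes and names as listed in the crowdcontrol guide above.
KNOWN_ROMS = {
    '811b027eaf99c2def7b933c5208636de': 'Super Mario Bros (JU) Rev 0',
    '94ede9347c1416105f1c08ec26b5b73a': 'Super Mario Bros (JU) Rev 1',
    '673913a23cd612daf5ad32d4085e0760': 'Super Mario Bros (E)',
    'f94bb9bb55f325d9af8a0fff80b9376d': 'Super Mario Bros (World)',
}

# MD5 hashes computed above for the ROMs bundled with the package.
PACKAGE_ROMS = {
    'super-mario-bros-pixel.nes': '9e2e8cf204c40ad2030cd4cf50e28d45',
    'super-mario-bros-rectangle.nes': 'd88207b7a3fb143f3f94480e89c83dd7',
    'super-mario-bros-downsample.nes': '9e1a6e778a231091060ba5e9029b9a2f',
    'super-mario-bros.nes': '673913a23cd612daf5ad32d4085e0760',
}

# Report which bundled ROM, if any, matches a known official release.
for name, md5 in PACKAGE_ROMS.items():
    print(f'{name:34s} {KNOWN_ROMS.get(md5, "no match")}')
```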

Looking at a table of known ROMs maintained by [TASVideos for Super Mario Bros.](https://tasvideos.org/Games/1/Versions/List), we can see that only the original 'v0' corresponds to a known entry. 

Here `REVA` should be interpreted as revision A, a revision made by Nintendo itself, so this is an official, unmodified ROM for the European market.

```
Super Mario Bros. (E) (REVA) [!].nes
SourceDB: GoodNES 3.23
```

So, if we wish to use TAS input, we will want to use this specific ROM, which is presumably a revision (patch) of the original release.

However, 'v0' already is that very version, the one used in this TAS per the submission page.

This ultimately suggests that the problem lies with the *emulator*, which must differ nontrivially from the one used to record the TAS. This does not seem surprising: the gymnasium environment's emulator may have been written from scratch by the original author, Kautenja.

For an interesting look at the effort necessary to try and achieve complete fidelity, see this [ArsTechnica post](https://arstechnica.com/gaming/2021/06/how-snes-emulators-got-a-few-pixels-from-complete-perfection/).

# Conclusion

The direct use of TAS inputs is problematic when using this emulator. Resolving that is presumably out of scope: one would need either to bring the emulator used by the gymnasium environment into closer alignment with the community-standard emulators, or to generate a TAS sequence manually within this environment by replicating the process.

Ultimately, the best use of TAS inputs here is likely as a prior: a weighting of the different inputs used.

# Note on generative AI usage

One prompt was submitted to Microsoft Copilot, referencing language in the [.fm2 file description](https://fceux.com/web/help/fm2.html) to request code that would generate hashes for files:

>
> - romChecksum (required) - the base64 of the hexified MD5 hash of the ROM which was used to record the movie
>
> Python code to calculate an MD5 hash of a file, 'hexify' it, and then take base64 as the above describes.

This resulted in the following output:

Here's a Python snippet that calculates the **MD5 hash** of a file, converts it to a **hex string**, and then encodes that hex string in **Base64**, as described for generating a `romChecksum`:

```python
import hashlib
import base64

def calculate_rom_checksum(file_path):
    # Read the file in binary mode and compute MD5 hash
    with open(file_path, 'rb') as f:
        file_data = f.read()
        md5_hash = hashlib.md5(file_data).hexdigest()  # Hexified MD5

    # Convert hex string to bytes, then encode in base64
    hex_bytes = bytes.fromhex(md5_hash)
    base64_encoded = base64.b64encode(hex_bytes).decode('utf-8')

    return base64_encoded

# Example usage
file_path = 'your_rom_file.rom'
checksum = calculate_rom_checksum(file_path)
print("romChecksum:", checksum)
```

Explanation:
1. **`hashlib.md5(file_data).hexdigest()`**: Computes the MD5 hash and returns it as a hex string.
2. **`bytes.fromhex(...)`**: Converts the hex string to raw bytes.
3. **`base64.b64encode(...)`**: Encodes those bytes in Base64.

Let me know if you'd like this wrapped into a command-line tool or script that accepts arguments, or if you want to process multiple files at once.

---

That was ultimately refashioned into the following:

```python
import hashlib
import base64

def calculate_rom_checksum(file_path):
    # Read the file in binary mode and compute MD5 hash
    with open(file_path, 'rb') as f:
        file_data = f.read()
        md5_hash = hashlib.md5(file_data).hexdigest()  # Hexified MD5

    # Convert hex string to bytes, then encode in base64
    hex_bytes = bytes.fromhex(md5_hash)
    base64_encoded = base64.b64encode(hex_bytes).decode('utf-8')

    # collect the intermediate values so that each can be inspected
    result = {
        'md5': md5_hash,
        'hex_bytes': hex_bytes,
        'md5_base64': base64_encoded,
    }

    return result

def printChecksum(source_description, file):
    checksum = calculate_rom_checksum(file)
    print(source_description)
    for key in checksum:
        print(key, checksum[key], "length", len(checksum[key]))
    print("---")

printChecksum('Gymnasium, super-mario-bros.nes, ie., v0', 'super-mario-bros.nes')  
```
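
Going the other way, the `romChecksum` stored in the movie header can be decoded back to a hex string so that it can be placed side by side with an MD5 hex digest. This is a minimal sketch: `decode_rom_checksum` is a hypothetical helper, and the input is the value copied from the header shown earlier. A mismatch against a whole-file MD5 would not by itself show that a different ROM was used, since the emulator may hash a different byte range than the whole file; that possibility is an assumption and is not verified here.

```python
import base64

def decode_rom_checksum(value):
    """Turn an fm2 romChecksum such as 'base64:...' into a hex string."""
    prefix = 'base64:'
    if value.startswith(prefix):
        value = value[len(prefix):]
    # b64decode returns the raw hash bytes; .hex() renders them as hex digits
    return base64.b64decode(value).hex()

# romChecksum copied from the movie header shown earlier.
print(decode_rom_checksum('base64:ujnd5jqyCbG8dR4FNecrGA=='))
```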