# Rewards

Per the documentation for the original environment, the reward function is composed of three parts. As the original author succinctly puts it:
$$ r = v + c + d $$

Less succinctly:
$$ \text{reward} = \text{velocity}_x + \text{clock} + \text{death} $$

* The movement component only looks at Mario's x velocity: positive when moving right, negative when moving left, and zero when standing still.
* Each in-game clock tick (*approximately 1 second real-time?*) reduces the reward by one.
  * Note that there are multiple frames between clock ticks.
* Each death incurs a reward of -15; the death component is 0 otherwise.
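
As a minimal sketch of how the three terms combine, assuming each one is computed as a difference between consecutive steps. The function and parameter names are hypothetical and are not taken from the environment's code:

```python
def sketch_reward(x_prev, x_now, clock_prev, clock_now, died, death_penalty=-15):
    """Combine the three documented reward terms: r = v + c + d."""
    v = x_now - x_prev          # positive moving right, negative moving left, 0 if still
    c = clock_now - clock_prev  # 0 between ticks, -1 when the in-game clock ticks down
    d = death_penalty if died else 0
    return v + c + d


print(sketch_reward(100, 103, 400, 400, died=False))  # moved right, no clock tick
print(sketch_reward(100, 100, 400, 399, died=False))  # stood still during a tick
print(sketch_reward(100, 101, 399, 399, died=True))   # died on this step
```

Because there are multiple frames between clock ticks, the clock term is zero on most steps under this reading.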

# Alternative reward scheme

One question is how adjusting this reward might affect how quickly, and how well, the agent learns. What additional components might be interesting to include in the reward function? Which of those are readily available to our system?

Interesting:
* **Powerups**
  * A positive reward for acquiring a power-up like a mushroom or fire flower.
  * This aligns with human play and, since powerups make it harder for Mario to die, with the existing incentive to survive.
* **Coins**
  * This aligns with human play, and since 100 coins grant an additional life, coins carry a small value of their own.
  * Moreover, it might be interesting to see the behavior of an agent that was rewarded very heavily for seeking coins. This might lead to more natural exploration of the environment.
* **Defeated enemies**
  * While ignoring enemies may lead to longer lives and more level completions, it is a poor model of human players, who are more inclined to interact with and defeat enemies.

Immediately accessible:
* **Coins**: available via the `info` dictionary.
* **Powerups**: available via the `info` dictionary.
  * `status` indicates Mario's state, which reflects the powerups he currently holds. A choice would need to be made as to whether to provide a single reward for acquiring a powerup or a repeated reward for retaining one.
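
The two powerup options can be sketched as follows. This is only an illustration: the ranking of the status strings, the function names, and the bonus values are all assumptions chosen for the example, not part of the environment.

```python
# Assumed ordering of the status strings that appear in the info dictionary.
STATUS_RANK = {"small": 0, "tall": 1, "fireball": 2}


def powerup_bonus_once(prev_status, status, bonus=5.0):
    """Single reward, paid only on the step where the status improves."""
    return bonus if STATUS_RANK[status] > STATUS_RANK[prev_status] else 0.0


def powerup_bonus_retained(status, bonus_per_step=0.1):
    """Repeated reward, paid on every step while a powerup is held."""
    return bonus_per_step * STATUS_RANK[status]


print(powerup_bonus_once("small", "tall"))   # status improved on this step
print(powerup_bonus_once("tall", "tall"))    # no change, no bonus
print(powerup_bonus_retained("fireball"))    # paid every step while powered up
```

The single reward needs the previous step's status to detect a change, while the repeated reward depends only on the current step.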

Info from `smb_env.py` ([GitHub](https://github.com/Kautenja/gym-super-mario-bros/blob/bcb8f10c3e3676118a7364a68f5c0eb287116d7a/gym_super_mario_bros/smb_env.py))

```python
    # create a dictionary mapping value of status register to string names
    ## dss2q: defaultdict ==> {0 : 'small', 1 : 'tall', all other keys: 'fireball'}
    _STATUS_MAP = defaultdict(lambda: 'fireball', {0:'small', 1: 'tall'})

    ...

    @property
    def _player_status(self):
        """Return the player status as a string."""
        return _STATUS_MAP[self.ram[0x0756]]

    ...

    @property
    def _coins(self):
        """Return the number of coins collected (0 to 99)."""
        # coins are represented as a figure with 2 10's places
        return self._read_mem_range(0x07ed, 2)

    ...

    def _get_info(self):
        """Return the info after a step occurs"""
        return dict(
            coins=self._coins,
            flag_get=self._flag_get,
            life=self._life,
            score=self._score,
            stage=self._stage,
            status=self._player_status,
            time=self._time,
            world=self._world,
            x_pos=self._x_position,
            y_pos=self._y_position,
        )
```

# Implementing alternative reward

With that in mind, probably the simplest way to implement this would be to adjust the reward *after* we have received it from `env.step()`.

```python
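# step the environment itself; `info` is the last element of the returned tuple
obs, reward, terminated, truncated, info = env.step(action)
# pass `info` along, since it carries the coin count and status
reward = modify_reward(reward, info)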
```
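
One possible shape for such a post-hoc adjustment is a small stateful helper that remembers the previous `info` dictionary. The sketch below is hypothetical: the class name, the bonus values, and the status ranking are assumptions; only the `coins` and `status` keys come from `_get_info` above.

```python
class RewardShaper:
    """Adds coin and powerup bonuses on top of the environment's reward."""

    STATUS_RANK = {"small": 0, "tall": 1, "fireball": 2}  # assumed ordering

    def __init__(self, coin_bonus=1.0, powerup_bonus=5.0):
        self.coin_bonus = coin_bonus
        self.powerup_bonus = powerup_bonus
        self.reset()

    def reset(self):
        """Forget the previous step; call this whenever the env is reset."""
        self._prev = None

    def __call__(self, reward, info):
        if self._prev is not None:
            # Ignore decreases, e.g. the counter rolling over after 99 coins.
            gained = max(info["coins"] - self._prev["coins"], 0)
            reward += self.coin_bonus * gained
            rank = self.STATUS_RANK
            if rank[info["status"]] > rank[self._prev["status"]]:
                reward += self.powerup_bonus
        self._prev = {"coins": info["coins"], "status": info["status"]}
        return reward
```

In a training loop this would be called as `reward = shaper(reward, info)` right after each step, with `shaper.reset()` at the start of every episode so that bonuses are not computed across episode boundaries.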

Alternatively, we could imagine editing the native function in the code.

The code for this is in `smb_env.py` ([GitHub](https://github.com/Kautenja/gym-super-mario-bros/blob/master/gym_super_mario_bros/smb_env.py)):

```python
    def _get_reward(self):
        """Return the reward after a step occurs."""
        return self._x_reward + self._time_penalty + self._death_penalty
```
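
Rather than editing the library in place, the same effect might be achieved by overriding `_get_reward` in a subclass. The sketch below uses a stand-in class instead of the real environment so that it runs on its own; `StubMarioEnv`, `CoinRewardMixin`, `coin_bonus`, and `_coins_last` are all hypothetical names.

```python
class StubMarioEnv:
    """Stand-in for the real environment, exposing only what the sketch needs."""

    def __init__(self):
        self._x_reward = 0
        self._time_penalty = 0
        self._death_penalty = 0
        self._coins = 0

    def _get_reward(self):
        return self._x_reward + self._time_penalty + self._death_penalty


class CoinRewardMixin:
    """Hypothetical mixin that adds a bonus per newly collected coin."""

    coin_bonus = 1

    def _get_reward(self):
        # On the first call there is no previous count, so no bonus is paid.
        prev = getattr(self, "_coins_last", self._coins)
        self._coins_last = self._coins
        gained = max(self._coins - prev, 0)
        return super()._get_reward() + self.coin_bonus * gained


class CoinStubEnv(CoinRewardMixin, StubMarioEnv):
    pass


env = CoinStubEnv()
env._x_reward = 2
env._coins = 3
print(env._get_reward())
```

Two caveats for a real subclass: `_coins_last` would need to be cleared on reset, and if the base class clips rewards to a fixed range, large bonuses added here may be truncated.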

# Scratch

## Points 

Super Mario Bros. (1985) includes an explicit scoring system. Interestingly, this scoring system does not inform the reward function at all, unlike in the Atari context, where the game score itself served as the reward.

For a fuller description of the point mechanism details see [Super Mario Wiki - Point](https://www.mariowiki.com/Point#Super_Mario_series).

> In Super Mario Bros., Mario can earn points by interacting with the environment in various ways. For instance, he earns 50 points for breaking Bricks, 200 for collecting a coin, and 1,000 for collecting a power-up. Points can also be earned upon defeating an enemy, with higher points earned for sequences of defeated enemies without landing back on the ground. ... At the end of each stage, pulling down the flag on the flagpole grants 100, 400, 800, 2,000, or 5,000 points depending on how high the flagpole is touched. At the end of a level, there is a bonus that grants 50 points for each remaining second on the timer, though no such bonus is present in castle levels in the original NES version. ... In many games, points are largely aesthetic and serve only as a secondary goal ...
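
Since `score` is exposed in the `info` dictionary, an Atari-style reward could be sketched as the change in score between steps. The function name and the scale factor are assumptions; the point values are the ones quoted above and are used only to illustrate magnitudes.

```python
def score_delta_reward(prev_score, score, scale=0.01):
    """Reward equal to the points gained since the previous step, scaled down."""
    return scale * max(score - prev_score, 0)


# Point values from the quoted passage.
POINTS = {"brick": 50, "coin": 200, "powerup": 1000}

prev_score = 1200
for event, points in POINTS.items():
    print(event, score_delta_reward(prev_score, prev_score + points))
```

Unscaled, a single power-up (1,000 points) would dwarf the death penalty of -15, which is why some scaling or clipping seems necessary if the score is mixed with the existing terms.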

---

One way to fully explore the point system would be to read through a [disassembly](https://gist.github.com/1wErt3r/4048722) of the game.

```
BrickShatter:
      ...
      lda #$05
      sta DigitModifier+5    ;set digit modifier to give player 50 points
      jsr AddToScore         ;do sub to update the score
      ...
```

---

Excerpts from a [discussion touching on this point from Reddit](https://www.reddit.com/r/truegaming/comments/8pp8cx/what_purpose_did_the_original_scoring_system_in/), original poster unknown.

> Title: What purpose did the original scoring system in Super Mario Bros. serve?
> 
> ... Immediately before the NES was Atari, which (mostly) tried to emulate the experience of arcade games at the time. And arcade games were almost all scoring based. ... This was also an early form of creating visual and tactile feedback to give players a greater sense of satisfaction. These big numbers are specifically important in SMB when you ride down the pole and the numbers are "rung up". More points+faster time means a longer timing counting your score. [/u/lubujackson](https://www.reddit.com/user/lubujackson/)
