Update the TicTacToe environment #1192

Merged
Merged 27 commits into Farama-Foundation:master from ttt_update on May 3, 2024

Conversation


@dm-ackerman dm-ackerman commented Mar 20, 2024

Description

Update the TicTacToe environment to v4

Major changes:

  • Fix termination handling to work with Stable-Baselines3 (SB3). The environment now trains as expected with the SB3 PPO code
  • Rewrite of underlying board code
  • Add test functions for the underlying board code
  • Bump version to 4

Minor changes:

  • Add some code comments
  • Remove unneeded code
  • Make the board class more encapsulated
  • Create the rendering screen only when rendering is enabled
  • Minor performance improvements (~30% faster)
  • Streamline the win-detection code
  • Remove the redundant recalculation of winning configurations

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have run pytest -v and no errors are present.
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I solved any possible warnings that pytest -v has generated that are related to my code to the best of my knowledge.
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Rewards are only set once, so they only need to be accumulated once.
There is no need to modify the accumulated rewards if they are not set.
For unknown reasons, this is required for Stable-Baselines3
training to work correctly. It does not appear to impact other
usage, as the agents are still looped over correctly after the
game ends, just in a different order.
This makes the env code less cluttered and better encapsulates
the board behavior. It also expands the checks for a valid move.
The winning configurations never change; there is no reason to recalculate them constantly.
Removes the duplicate calls to the winner check.
This keeps the env from needing to access the internals
of the board to get the moves available.
This causes problems with test cases that have invalid
board configurations to test win lines.
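The commit notes above mention precomputing the winning configurations and streamlining win detection. A minimal sketch of that idea, assuming a 3x3 board stored as a flat list (the names `WIN_LINES` and `check_winner` are illustrative, not the actual board API):

```python
from typing import List, Optional

# The eight winning lines of a 3x3 board never change, so build them
# once at module load instead of recalculating on every win check.
WIN_LINES: List[List[int]] = (
    [[r * 3, r * 3 + 1, r * 3 + 2] for r in range(3)]  # rows
    + [[c, c + 3, c + 6] for c in range(3)]            # columns
    + [[0, 4, 8], [2, 4, 6]]                           # diagonals
)

def check_winner(squares: List[int]) -> Optional[int]:
    """Return the winning mark (1 or 2), or None. 0 means an empty square."""
    for a, b, c in WIN_LINES:
        if squares[a] != 0 and squares[a] == squares[b] == squares[c]:
            return squares[a]
    return None
```

Because `WIN_LINES` is computed a single time, each win check is a flat scan over eight index triples, matching the commit's point that there is no reason to recalculate the configurations constantly.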
@dm-ackerman (Contributor, Author):

The AgileRL tutorial is broken again. I think I can fix that, and I'll pin the version so it won't be an issue in the future.

@dm-ackerman (Contributor, Author):

Tests are failing due to AgileRL version change. #1193 should resolve that.

board.play_turn(0, outside_space)

# 2) move by unknown agent should be rejected
for unknown_agent in [-1, 2]:
-    with pytest.raises(BadTicTacToeMoveException):
+    with pytest.raises(AssertionError):
Collaborator:

Why pytest.raises? I haven't seen that before.

@dm-ackerman (Contributor, Author):

As far as I know, that's the recommended pytest way to confirm that a block of code raises a specific assertion. Pytest itself doesn't raise the exception. It catches and ignores that exception (because it's expected) but raises an error if the code in that block doesn't raise the expected error.

Collaborator:

Ok, fair. We should probably also go and make the rest of the code adhere to that, because I'm pretty sure it isn't done elsewhere, but I can see it being good practice. Not a problem here, tbh; since it's so small, I think it's fine to have here, and then we can open an issue or future PRs to fix other code to adhere to this practice.

This now always switches agents in each step. The previous change was
itself a bug that appeared to "fix" the behaviour caused by a bug in the
SB3 tutorial.

The agent should be swapped every step, as is done now.
The change to v4 was due to the agent handling at the end of an episode.

The other changes to the env don't change its behaviour, so
it is left at v3.
@dm-ackerman changed the title from "Update the TicTacToe environment to v4" to "Update the TicTacToe environment" on May 3, 2024
@dm-ackerman (Contributor, Author):

Apparently, the failure to train with SB3 wasn't a bug in the env; it was an SB3 tutorial bug. I removed the change that "fixed" the env because it was wrong. I believe the rest of the changes are valid, but I don't think they require a version bump, so I moved it back to v3.

@elliottower elliottower merged commit 98e8c20 into Farama-Foundation:master May 3, 2024
46 checks passed
@dm-ackerman dm-ackerman deleted the ttt_update branch May 6, 2024 16:22