Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a test timeout #779

Merged
merged 29 commits into from Oct 5, 2023
Merged

Add a test timeout #779

merged 29 commits into from Oct 5, 2023

Conversation

ernestum
Copy link
Collaborator

@ernestum ernestum commented Sep 8, 2023

Description

Adds timeout of 590 seconds to the tests. This should abort tests before CircleCI interrupts the pipeline thereby allowing us to see which test took so long.

ALSO:

Fix notebook running error. #790 and #779.

  1. docs/tutorials/3_train_gail.ipynb: 800k can bring running time exceed error
  2. docs/tutorials/4_train_airl.ipynb: env.reset doesn't require seed in gymnasium setting.
  3. docs/tutorials/5a_train_preference_comparisons_with_cnn.ipynb: migrate to gymnasium
  4. docs/tutorials/8a_train_sqil_sac.ipynb: v0 is not supported.

Testing

Only test parameter changes.

@AdamGleave
Copy link
Member

Thanks for this!

Looks like one of the Ray tests is failing. That's not too surprising, I think Ray can be buggy on Windows, we may want to consider just skipping that test on Windows

@ernestum
Copy link
Collaborator Author

ernestum commented Sep 8, 2023

Good point. Added a skipif on the test with the timeout.

@AdamGleave
Copy link
Member

Looks like its timing out in test_train_preference_comparisons_main. Odd since stdout/stderr suggests that training completes so I think it must be hanging on something after that. Maybe in writing TensorBoard logs as some threads seem to be stuck there?

Skipping that test as well would be OK as a hotfix, but if can replicate that test failure via running as SSH can hopefully determine a clean resolution.

@ernestum
Copy link
Collaborator Author

ernestum commented Sep 13, 2023

Possibly related: https://discuss.pytorch.org/t/got-stuck-at-tensorboardx-event-file-writer-py/112209

I updated to the newest tensorboard/tensorboardX version in the SSH session and re-ran the tests. Now there is no timeout any more but the tests fail due to a different reason:

Some file is not found: FileNotFoundError: [Errno 2] No such file or directory: b'C:\\Users\\circleci.PACKER-64C83029\\AppData\\Local\\Temp\\pytest-of-circleci\\pytest-3\\test_train_preference_comparis0\\train_preference_comparisons\\CartPole-v1\\20230913_094320_90ea37\\log\\events.o ut.tfevents.1694598202.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.7076.0' (click to expand) ___________ test_train_preference_comparisons_main[plain] ___________

tmpdir = local('C:\Users\circleci.PACKER-64C83029\AppData\Local\Temp\pytest-of-circleci\pytest-3\test_train_preference_comparis0'), preference_comparison_config = {}

def test_train_preference_comparisons_main(tmpdir, preference_comparison_config): 
    config_updates = dict(logging=dict(log_root=tmpdir))
    sacred.utils.recursive_update(config_updates, preference_comparison_config)
  run = train_preference_comparisons.train_preference_comparisons_ex.run(
        # Note: we have to use the cartpole and not the seals_cartpole config because
        #  the seals_cartpole config needs rollouts with a fixed horizon,
        #  and the saved trajectory rollouts are variable horizon.
        named_configs=["cartpole"] + ALGO_FAST_CONFIGS["preference_comparison"],
        config_updates=config_updates,
    )

tests\scripts\test_scripts.py:160:


venv\lib\site-packages\sacred\experiment.py:276: in run
run()
venv\lib\site-packages\sacred\run.py:238: in call
self.result = self.main_function(*args)
venv\lib\site-packages\sacred\config\captured_function.py:42: in captured_function
result = wrapped(*args, **kwargs)
venv\lib\site-packages\imitation\scripts\train_preference_comparisons.py:167: in train_preference_comparisons
custom_logger, log_dir = logging_ingredient.setup_logging()
venv\lib\site-packages\sacred\config\captured_function.py:42: in captured_function
result = wrapped(*args, **kwargs)
venv\lib\site-packages\imitation\scripts\ingredients\logging.py:114: in setup_logging
custom_logger = imit_logger.configure(
venv\lib\site-packages\imitation\util\logger.py:413: in configure
output_formats = _build_output_formats(folder, format_strs)
venv\lib\site-packages\imitation\util\logger.py:67: in _build_output_formats
output_formats.append(make_output_format(f, str(folder)))
venv\lib\site-packages\imitation\util\logger.py:44: in make_output_format
return sb_logger.make_output_format(_format, log_dir, log_suffix)
venv\lib\site-packages\stable_baselines3\common\logger.py:462: in make_output_format
return TensorBoardOutputFormat(log_dir)
venv\lib\site-packages\stable_baselines3\common\logger.py:398: in init
self.writer = SummaryWriter(log_dir=folder)
venv\lib\site-packages\torch\utils\tensorboard\writer.py:247: in init
self._get_file_writer()
venv\lib\site-packages\torch\utils\tensorboard\writer.py:277: in _get_file_writer
self.file_writer = FileWriter(
venv\lib\site-packages\torch\utils\tensorboard\writer.py:76: in init
self.event_writer = EventFileWriter(
venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:100: in init
self.flush()
venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:125: in flush
self._async_writer.flush()
venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:194: in flush
self._check_worker_status()
venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:212: in _check_worker_status
raise exception
C:\Python38\lib\threading.py:932: in _bootstrap_inner
self.run()
venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:244: in run
self._run()
venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:275: in _run
self._record_writer.write(data)
venv\lib\site-packages\tensorboard\summary\writer\record_writer.py:40: in write
self._writer.write(header + header_crc + data + footer_crc)
venv\lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py:768: in write
self.fs.write(self.filename, file_content, self.binary_mode)
venv\lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py:157: in write
self._write(filename, file_content, "wb" if binary_mode else "w")


self = <tensorboard.compat.tensorflow_stub.io.gfile.LocalFileSystem object at 0x000002B784756550>
filename = b'C:\Users\circleci.PACKER-64C83029\AppData\Local\Temp\pytest-of-circleci\pytest-3\test_train_preference_compa...ole-v1\20230913_094320_90ea37\log\events.out.tfevents.1694598202.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.7076.0'
file_content = b'H\x00\x00\x00\x00\x00\x00\x00\xfe\x92H\x9d\te\xfd\x80\x0ea@\xd9A\x1a\rbrain.Event:2R.\n,tensorboard.summary.writer.event_file_writer\xc8p\xa0\xa0', mode = 'wb'

def _write(self, filename, file_content, mode):
    encoding = None if "b" in mode else "utf8"
  with io.open(filename, mode, encoding=encoding) as f:

E FileNotFoundError: [Errno 2] No such file or directory: b'C:\Users\circleci.PACKER-64C83029\AppData\Local\Temp\pytest-of-circleci\pytest-3\test_train_preference_comparis0\train_preference_comparisons\CartPole-v1\20230913_094320_90ea37\log\events.o
ut.tfevents.1694598202.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.7076.0'

venv\lib\site-packages\tensorboard\compat\tensorflow_stub\io\gfile.py:171: FileNotFoundError
---- Captured log call ----
WARNING rl:initialize.py:217 Added new config entry: "rl_kwargs.batch_size"
WARNING rl:initialize.py:217 Added new config entry: "rl_kwargs.n_epochs"
WARNING train_preference_comparisons:run.py:458 No observers have been added to this run
ERROR train_preference_comparisons:run.py:392 Failed after 0:00:00!

A manual check revealed, that the path DOES exist 🤷

@ernestum ernestum mentioned this pull request Sep 18, 2023
3 tasks
@ZiyueWang25
Copy link
Contributor

I double checked. The file indeed doesn't exists, the directory exists. So I experiment a bit and found the issue:

linux file path limit 4096
windows file path limit 256

It turns out the file length is right on the limit.

>>> len("C:\\Users\\circleci.PACKER-64C83029\\AppData\\Local\\Temp\\pytest-of-circleci\\pytest-4\\test_train_preference_comparis0\\train_preference_comparisons\\CartPole-v1\\20231001_144824_a538c8\\log\\events.out.tfevents.1696171705.packer-64c83029-cd75-bf3a-2e04-0f23beb7d9d9.6640.0")
262

So I am checking whether removing the path limit this way can work or not.

@ZiyueWang25
Copy link
Contributor

  • The ModuleNotFoundError: No module named 'gymnasium' error pops up weirdly. Not sure why it happened after this change.
  • If I do pip install gymnasium==0.28.1 and then run the test pytest tests/algorithms/test_dagger.py::test_simple_dagger_space_mismatch_error[None-False] , it shows error:
    def check_for_correct_spaces(env: GymEnv, observation_space: spaces.Space, action_space: spaces.Space) -> None:
        """
        Checks that the environment has same spaces as provided ones. Used by BaseAlgorithm to check if
        spaces match after loading the model with given env.
        Checked parameters:
        - observation_space
        - action_space

        :param env: Environment to check for valid spaces
        :param observation_space: Observation space to check against
        :param action_space: Action space to check against
        """
        if observation_space != env.observation_space:
>           raise ValueError(f"Observation spaces do not match: {observation_space} != {env.observation_space}")
E           ValueError: Observation spaces do not match: Box([-1. -1. -8.], [1. 1. 8.], (3,), float32) != Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)

@ZiyueWang25
Copy link
Contributor

so the original error is gone. I noticed two new issues here:

#791 & #790

Note: #790 can represent the issues for:

FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\5a_train_preference_comparisons_with_cnn.ipynb]
FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\4_train_airl.ipynb]
FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\8a_train_sqil_sac.ipynb]
FAILED tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\3_train_gail.ipynb

And #791 is for:

FAILED tests/algorithms/test_sqil.py::test_sqil_performance_continuous[DDPG]

@ZiyueWang25 ZiyueWang25 mentioned this pull request Oct 2, 2023
@ZiyueWang25
Copy link
Contributor

So now the issue is windows platform-wise issue:

  1. need to fix the numpy randint bound issue: high is out of bounds for int32
  2. need to pip install gymnasium[mujoco]

@ZiyueWang25
Copy link
Contributor

The numpy issue can be replicated
image

Hopefully it can be resolved by specifying the seed.

@ZiyueWang25
Copy link
Contributor

So it shows a timeout issue saying that tests/test_examples.py::test_run_tutorial_notebooks[C:\\Users\\circleci.PACKER-64C83029\\project\\tests\\..\\docs\\tutorials\\4_train_airl.ipynb] cannot pass within 5 minutes. It can pass on my computer within 3 minutes. I think this is due to win32 has less resources. The current solution is to make the timeout larger.

Created feature request for this: #793

@ZiyueWang25
Copy link
Contributor

The numpy issue can be replicated image

Hopefully it can be resolved by specifying the seed.

Actually this is solved in stable_baselines3 here, I think we need to set stable-baselines3~=2.0 to stable-baselines3>=2.0 to keep the package updated.

@codecov
Copy link

codecov bot commented Oct 2, 2023

Codecov Report

Merging #779 (d5026b6) into master (8db6bfb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head d5026b6 differs from pull request most recent head 99cd7fd. Consider uploading reports for the commit 99cd7fd to get more accurate results

@@           Coverage Diff           @@
##           master     #779   +/-   ##
=======================================
  Coverage   96.33%   96.33%           
=======================================
  Files          98       98           
  Lines        9172     9177    +5     
=======================================
+ Hits         8836     8841    +5     
  Misses        336      336           
Files Coverage Δ
tests/algorithms/test_sqil.py 100.00% <100.00%> (ø)
tests/scripts/test_scripts.py 100.00% <100.00%> (ø)
tests/test_examples.py 100.00% <ø> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes! Overall looks almost ready, a few clarifying questions & suggestions, please tag me for a re-review once addressed.

@@ -176,13 +176,28 @@ commands:
.\venv\Scripts\activate
pip install --upgrade --force-reinstall --no-deps .
shell: powershell.exe

- run:
name: install gymansium[mujoco]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to explicitly install Gymnasium? Shouldn't this dependency be picked up by pip install . above?

Also confused why we need the MuJoCo dependency -- do any of our tests use MuJoCo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing is about the mujoco part. It gets involved in the windows environment from import seals in the notebook and mainly here. If I don't do this, the notebook will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is directly mentioned in Seals as well. Not quite sure why it shows up only in the Windows environment and only in notebook.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some digging here and the TL;DR is that the notebooks were only getting tested on Windows. The SQIL notebook depends on HalfCheetah which is a MuJoCo environment, so it makes sense we'd need thiis dependency. We made a decision a while back to have MuJoCo dependency be optional for the codebase since it's large and can be somewhat tricky to install, so I'm loathe to reverse that especially just special-casing it for the Windows CI run. I'd suggest we instead switch out HalfCheetah for something like MountainCarContinuous that does not require extra dependencies -- tracking in #798

Note the gym.register line here should succeed even if MuJoCo is not present -- it will only fail when attempting to instantiate those environments. Indeed,

This doesn't error on unit-test-linux since we're skipping all notebooks: https://github.com/HumanCompatibleAI/imitation/blob/master/tests/test_examples.py#L43 This was because they were meant to be run in readthedocs after #565 This run is happening but the notebooks failing is just a warning:

/home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Executing notebook failed: CellExecutionError [mystnb.exec]
/home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Notebook exception traceback saved in: /home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/_readthedocs/html/reports/tutorials/8a_train_sqil_sac.err.log [mystnb.exec]

I've opened #799 to track this

Mac we are not running test_examples.py either due to a separate bug, I'll address in #797

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed explanation! That makes a lot sense!

.circleci/config.yml Show resolved Hide resolved
docs/tutorials/4_train_airl.ipynb Outdated Show resolved Hide resolved
tests/algorithms/test_sqil.py Show resolved Hide resolved
tests/test_examples.py Outdated Show resolved Hide resolved
Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebooks have a bunch of spurious newlines introduced -- can you please clean up that diff as it will otherwise make commit history/git blame messy in future.

MuJoCo dependency I think should not be necessary although I now understand (see comment) why test was failing without it -- this has unrooted a deeper problem in our CI that I'll get the team to work on addressing in other PRs/issues.

@@ -176,13 +176,28 @@ commands:
.\venv\Scripts\activate
pip install --upgrade --force-reinstall --no-deps .
shell: powershell.exe

- run:
name: install gymansium[mujoco]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some digging here and the TL;DR is that the notebooks were only getting tested on Windows. The SQIL notebook depends on HalfCheetah which is a MuJoCo environment, so it makes sense we'd need thiis dependency. We made a decision a while back to have MuJoCo dependency be optional for the codebase since it's large and can be somewhat tricky to install, so I'm loathe to reverse that especially just special-casing it for the Windows CI run. I'd suggest we instead switch out HalfCheetah for something like MountainCarContinuous that does not require extra dependencies -- tracking in #798

Note the gym.register line here should succeed even if MuJoCo is not present -- it will only fail when attempting to instantiate those environments. Indeed,

This doesn't error on unit-test-linux since we're skipping all notebooks: https://github.com/HumanCompatibleAI/imitation/blob/master/tests/test_examples.py#L43 This was because they were meant to be run in readthedocs after #565 This run is happening but the notebooks failing is just a warning:

/home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Executing notebook failed: CellExecutionError [mystnb.exec]
/home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/docs/tutorials/8a_train_sqil_sac.ipynb: WARNING: Notebook exception traceback saved in: /home/docs/checkouts/readthedocs.org/user_builds/imitation/checkouts/latest/_readthedocs/html/reports/tutorials/8a_train_sqil_sac.err.log [mystnb.exec]

I've opened #799 to track this

Mac we are not running test_examples.py either due to a separate bug, I'll address in #797

@@ -5,18 +5,17 @@
"metadata": {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newlines seem to have changed throughout this and many other notebooks making the diff messy. Can you clean this up? Most of the notebooks shouldn't be changing in this PR, and I can't tell right now which notebooks had actual semantic changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about this. Yes, I did some semantic changes and then the lint keeps failing on docs/tutorials/4_train_airl.ipynb. I did something wrong and formatted all notebook. I should have just formatted that only one.

I have reverted the commit and now the notebook change should be easy to read.

Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a closer look I see you made some non-trivial changes to the notebooks (not just newlines) and the newlines seem to have been introduced by black reformatting (perhaps a newer version is more opinionated?) so I won't insist on that change.

I'm removing the MuJoCo dependency as that shouldn't be in our CI. This will make it red again but I'll go ahead and merge if the only test that's failing is for 8a_train_sqil_sac; we should fix that separately in #800 by changing the notebook

Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AdamGleave AdamGleave merged commit 4e429ce into master Oct 5, 2023
7 of 9 checks passed
@AdamGleave AdamGleave deleted the add_test_timeout branch October 5, 2023 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants