Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turning some logger.info into logger.debug and remove some logging overhead when not using debug #5211

Merged
merged 4 commits into from
Apr 5, 2021

Conversation

vincentpierre
Copy link
Contributor

Proposed change(s)

Clean up some of the console outputs of training. The behavior under --debug remains unchanged.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@vincentpierre vincentpierre self-assigned this Apr 1, 2021
@vincentpierre vincentpierre requested review from ervteng and removed request for chriselion April 1, 2021 16:54
@@ -12,7 +12,8 @@
_loggers = set()
_log_level = NOTSET
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
LOG_FORMAT = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
DEBUG_LOG_FORMAT = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think this was added by Anupam for log parsing (so he could get the time). Just checking in with @hvpeteet to make sure we aren't using it anywhere in cloud.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't parse logs in any meaningful way at the moment so I think this should be fine. We essentially just record stdout and stderr and forward it on to the user. This may also get forwarded to stackdriver but each line would already have a timestamp provided upon upload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to provide --debug through the YAML? Just worried if users are depending on the current logs and want to restore their old functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES !
You can do this by adding debug: true in the yaml. For instance :

debug: true
behaviors:
  3DBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 500000
    time_horizon: 1000
    summary_freq: 12000
    threaded: true

The use case is not documented, I will add a line in the changelog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks


if log_level == DEBUG:
formatter = logging.Formatter(fmt=DEBUG_LOG_FORMAT, datefmt=DATE_FORMAT)
for logger in _loggers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this for loop be a good candidate for a separate method def set_all_formatters?

@@ -104,7 +104,7 @@ def save_replay_buffer(self) -> None:
Save the training buffer's update buffer to a pickle file.
"""
filename = os.path.join(self.artifact_path, "last_replay_buffer.hdf5")
logger.info(f"Saving Experience Replay Buffer to {filename}...")
logger.debug(f"Saving Experience Replay Buffer to {filename}...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make the loading and saving ones info - mainly because they can be very slow (tens of seconds) if the replay buffer is large. So the user won't think that the code got stuck.

f"{file_name} was left over from a previous run. Deleting."
)
full_fname = os.path.join(directory_name, file_name)
try:
os.remove(full_fname)
except OSError:
logger.warning(
logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe logger.error? This one shouldn't happen unless we try to delete a file that has some write permissions blocked

@@ -234,14 +234,14 @@ def _maybe_create_summary_writer(self, category: str) -> None:
def _delete_all_events_files(self, directory_name: str) -> None:
for file_name in os.listdir(directory_name):
if file_name.startswith("events.out"):
logger.warning(
logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep this one as a warning so that users know when their old TBs are being overwritten

Copy link
Contributor

@ervteng ervteng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 🍝

@vincentpierre vincentpierre merged commit c0889d5 into main Apr 5, 2021
@delete-merged-branch delete-merged-branch bot deleted the develop-make-some-logs-debug branch April 5, 2021 17:36
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants