
How to get evaluation metrics in output logs #392

Open
MelissaKR opened this issue Jun 24, 2020 · 5 comments

@MelissaKR

Hi,

This is my first time working with SageMaker. I successfully trained a model; however, I'm having difficulty getting it to output evaluation metrics to the log files.

Here is a snippet of my model:

def metric_fn(label_ids, predicted_labels):
    # Each tf.compat.v1.metrics op returns a (value, update_op) pair,
    # which is the form eval_metric_ops expects.
    accuracy = tf.compat.v1.metrics.accuracy(label_ids, predicted_labels)
    recall = tf.compat.v1.metrics.recall(label_ids, predicted_labels)
    precision = tf.compat.v1.metrics.precision(label_ids, predicted_labels)

    return {"eval_accuracy": accuracy,
            "precision": precision,
            "recall": recall}

# Inside model_fn:
if mode == tf.estimator.ModeKeys.EVAL:
    eval_metrics = metric_fn(label_ids, predicted_labels)
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss,
                                      eval_metric_ops=eval_metrics)

And this is how the model is fit:

estimator = TensorFlow(
    entry_point='script.py',
    source_dir=[#Source_dir],
    train_instance_type='ml.m5.2xlarge',
    train_instance_count=4,
    output_path=s3_output_location,
    hyperparameters=hyperparameters,
    role=role,
    py_version='py3',
    framework_version='1.15.2',
    sagemaker_session=sess,
    metric_definitions=[{'Name': 'eval-accuracy', 'Regex': 'eval-accuracy=(\d\.\d+)'},
                        {'Name': 'precision', 'Regex': 'precision=(\d\.\d+)'},
                        {'Name': 'recall', 'Regex': 'recall=(\d\.\d+)'}],
    enable_sagemaker_metrics=True,
    distributions={'parameter_server': {'enabled': True}})

When training finishes, I don't see any of these metrics in the logs or in the "Training jobs" section of the console. This is how the Metrics section looks:

Metrics

Name            Regex
eval-accuracy   eval-accuracy=(\d.\d+)
precision       precision=(\d.\d+)
recall          recall=(\d.\d+)

I don't know why this should be so opaque. I've run the script with SageMaker multiple times, and no luck so far. I'd appreciate any help!

@metrizable
Contributor

metrizable commented Jun 30, 2020

@MelissaKR thanks for filing the issue. I noticed that your metric definition regex seeks to match eval-accuracy, which differs slightly from the dict key eval_accuracy that your metric_fn returns for your EstimatorSpec. Is this difference intentional?
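
For illustration, the definitions below are a sketch assuming the default tf.estimator evaluation logging in TF 1.x, which prints lines like "Saving dict for global step 500: eval_accuracy = 0.92, precision = 0.88, recall = 0.9" (underscores in the keys and spaces around the equals sign):

metric_definitions=[
    # 'Name' is what shows up in the console/CloudWatch; the 'Regex'
    # must match the metric keys exactly as tf.estimator logs them.
    {'Name': 'eval-accuracy', 'Regex': r'eval_accuracy = (\d\.\d+)'},
    {'Name': 'precision', 'Regex': r'precision = (\d\.\d+)'},
    {'Name': 'recall', 'Regex': r'recall = (\d\.\d+)'}]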

On a side note, you mentioned that you "don't see any of these metrics in the logs". Could you clarify?

@MelissaKR
Author

@metrizable Thank you for your input. I will correct the difference in the accuracy metric name. I was generally wondering where I can track the model's output for these metrics; I thought they'd be written out to the logs or show up in the "Metrics" section for the training job.

@laurenyu
Contributor

Sorry for the delayed response here. The metrics should be viewable in CloudWatch: scroll down to the "Monitor" section in the AWS console when looking at a training job.

docs: https://docs.aws.amazon.com/sagemaker/latest/dg/training-metrics.html
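
If you'd rather pull the metrics programmatically, here's a minimal sketch using the SDK's TrainingJobAnalytics; the job name is a placeholder, and the metric names must match the 'Name' fields in your metric_definitions:

from sagemaker.analytics import TrainingJobAnalytics

# 'my-training-job' is a placeholder for your actual training job name.
metrics_df = TrainingJobAnalytics(
    training_job_name='my-training-job',
    metric_names=['eval-accuracy', 'precision', 'recall']).dataframe()
print(metrics_df)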

@Miles1996

Hi @MelissaKR, did you manage to resolve this? I'm having the same issue.

@lhideki

lhideki commented Jul 14, 2022

I had a similar situation, but the cause was IAM policy permissions; checking the execution role's CloudWatch and CloudWatch Logs permissions may help.
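
For anyone hitting the permissions case, a minimal sketch of attaching an inline policy with boto3 follows; the role and policy names are hypothetical, and the actions mirror the CloudWatch/Logs permissions listed in the SageMaker execution role documentation:

import json

import boto3

# Hypothetical role/policy names; scope 'Resource' down in real use.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "cloudwatch:PutMetricData",
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"],
        "Resource": "*"}]}

boto3.client("iam").put_role_policy(
    RoleName="MySageMakerExecutionRole",
    PolicyName="SageMakerMetricsAndLogs",
    PolicyDocument=json.dumps(policy))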
