Model Builder error on Azure Training #991
Comments
|
Seems that AzClient can't get the compute info. Can you go to the Azure portal and check whether the selected compute was created successfully? |
|
This is exactly my issue. And to answer you, @LittleLittleCloud - my compute is successfully created. |
|
Thanks for the feedback, @elbruno and @arafattehsin. We are taking a look at it. |
|
Thanks for reporting, @elbruno and @arafattehsin. This aggregate exception message isn't helpful (need to fix that!). I'd like to know if the run was started and ran into a problem. Can you check the ML portal (https://ml.azure.com) and see if the run has any information? If the run can't access the compute for some reason, the experiment error should tell us the problem. Thanks for your patience while we figure this out! If you're not familiar with the ML portal...
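To illustrate why the aggregate exception message hides the real problem, here is a minimal Python sketch of unwrapping nested wrapper errors to reach the underlying cause. `AggregateError`, `flatten`, and `root_cause_message` are hypothetical names that only mimic the shape of .NET's `AggregateException`; none of this is Model Builder code, and the inner error text is made up for the example.

```python
from typing import Iterator, List


class AggregateError(Exception):
    """Hypothetical stand-in for a wrapper exception that holds inner errors."""

    def __init__(self, message: str, inner: List[BaseException]):
        super().__init__(message)
        self.inner = list(inner)


def flatten(error: BaseException) -> Iterator[BaseException]:
    """Yield the leaf (non-wrapper) errors nested inside `error`."""
    if isinstance(error, AggregateError):
        for child in error.inner:
            yield from flatten(child)
    else:
        yield error


def root_cause_message(error: BaseException) -> str:
    """Summarize the leaf errors instead of the generic wrapper message."""
    return "; ".join(f"{type(e).__name__}: {e}" for e in flatten(error))


# Illustrative only: a doubly wrapped failure with an invented inner message.
wrapped = AggregateError(
    "One or more errors occurred.",
    [AggregateError("One or more errors occurred.",
                    [ConnectionError("compute target list request failed")])],
)
print(root_cause_message(wrapped))
```

Reporting the leaf errors rather than the outermost wrapper is the kind of change that would make a failure like this one easier to diagnose from the UI.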
|
|
Hi @beccamc - Thank you for this detailed explanation. Unfortunately, I can't even see any experiments being executed. My experiment is created but I don't see any run. |
|
@arafattehsin This is so strange... It looks like the training fails at the fetching-compute step, so the experiment won't even have a chance to be created. I'm trying to reproduce the error. Could you tell me how you created the compute: did you use the UI in Model Builder, or the Azure portal? Could you also share your compute's properties and the configuration you used to create it? |
|
@LittleLittleCloud I created it using the UI in Model Builder. If you can tell me a way that worked for you, please do so, as I can try the exact same way to make it work. |
|
@arafattehsin I didn't do anything special. Could you try creating the compute in the Azure portal? This is the configuration I use: |
|
Using a dedicated machine solves the problem. A better error message is needed when launching training on a low-priority machine fails. |
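As a rough sketch of what a more helpful error message could look like, the Python snippet below builds a message that names the compute and adds a hint when it is low-priority. `ComputeInfo` and `explain_compute_failure` are hypothetical, and the field names and values (`"Succeeded"`, `"lowpriority"`) are assumptions for illustration, not the actual Model Builder or Azure ML types.

```python
from dataclasses import dataclass


@dataclass
class ComputeInfo:
    # Hypothetical record; field names are illustrative, not an Azure ML type.
    name: str
    provisioning_state: str  # assumed values such as "Succeeded"
    vm_priority: str         # assumed values: "dedicated" or "lowpriority"


def explain_compute_failure(compute: ComputeInfo, error: Exception) -> str:
    """Turn a raw failure into a message that suggests a next step."""
    parts = [f"Could not start training on compute '{compute.name}': {error}."]
    if compute.provisioning_state.lower() != "succeeded":
        parts.append(
            f"The compute is in state '{compute.provisioning_state}'; "
            "wait until provisioning succeeds."
        )
    if compute.vm_priority.lower() == "lowpriority":
        parts.append(
            "This compute uses low-priority VMs; try a dedicated compute instead."
        )
    return " ".join(parts)


# Example with invented values.
info = ComputeInfo("gpu-cluster", "Succeeded", "lowpriority")
print(explain_compute_failure(info, RuntimeError("fetch failed")))
```

The hint about dedicated compute reflects only the workaround reported in this thread; it does not explain why the low-priority compute failed.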

System Information (please complete the following information):
Describe the bug
Completed every step to train an Image Recognition scenario using the Azure environment.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
After uploading all the training images, Model Builder raises this exception
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Azure.MachineLearning.Services.Compute.ComputeTargetPageFetcher.FetchNextPage() in /_/src/Microsoft.ML.AzureMLClient/Compute/ComputeTargetPageFetcher.cs:line 35
at Azure.MachineLearning.Services.LazyEnumerator`1.d__9.MoveNext() in /_/src/Microsoft.ML.AzureMLClient/LazyEnumerator.cs:line 26
at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
at AzureML.AutoMLRunnerImages.d__27.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/RemoteAutoML/AutoMLRunnerImages.cs:line 233
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AzureImageClassificationExperiment.d__13.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AzureImageClassificationExperiment.cs:line 69
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d__26.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 134
Additional context
Attached the full runtime log.
b6e7c212-c030-45b8-8c31-a8c125ae1115.txt