[Image Classification] Low Accuracy on EuroSAT Dataset #386
See the explanation below of how to achieve good performance on this dataset using the ML.NET Image Classification API. We still need to think about how to get similar performance from AutoML / Model Builder given the characteristics of the dataset.
**Issue Summary**

The EuroSAT dataset, a geo-referenced aerial/satellite image dataset of 27,000 images categorized into 10 classes, is reported in the EuroSAT paper to reach 98.57% classification accuracy using CNNs. More specifically, using ResNet-50 with a 90/10 train/test split, the paper reports 96.37% accuracy. The ML.NET Image Classification API, as well as Model Builder, reports 99%+ accuracy while training. However, when evaluating the model, both with and without cross-validation, accuracy drops to 61-69% using only the CPU and 59% using the GPU. See the performance comparisons in the table below.
**Dataset**

Below is the file I used to read the file paths and labels. To test, you can change the parent directory.

**Source code / logs**

The source code is at the following repo: https://github.com/luisquintanilla/EuroSATTrainSample/blob/master/EuroSATTrainSample/Program.cs

Output logs: ImageClassificationTrainResultsModelBuilder.txt

**Potential Solutions**

Disabling early stopping and DnnImageFeaturizer seem to yield the most impactful results (accuracy of 93%+).

1. You were using early stopping (and they were not); please use the default 200 epochs and turn off early stopping. Doing this alone will get your accuracy between 93-94%. Early stopping works great when you supply a validation set, but you were not doing that, so it falls back to using the train set as the validation set! While this is not ideal, it seems to work in practice for some datasets we have tested on, but definitely not all.
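For illustration, the first remedy (full 200 epochs, early stopping off) might look like the following sketch with the Image Classification API from `Microsoft.ML.Vision`. The column names, `ImageData` class, `images.tsv` file, and `images` folder are placeholders, and exact option names can differ slightly across ML.NET preview versions:

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Vision;

public class ImageData
{
    [LoadColumn(0)] public string ImagePath;
    [LoadColumn(1)] public string Label;
}

public static class TrainNoEarlyStopping
{
    public static void Run()
    {
        var mlContext = new MLContext(seed: 1);

        // Placeholder input: a TSV of image paths and labels.
        IDataView data = mlContext.Data.LoadFromTextFile<ImageData>("images.tsv");

        var options = new ImageClassificationTrainer.Options
        {
            FeatureColumnName = "Image",
            LabelColumnName = "LabelAsKey",
            Epoch = 200,                  // run the full default epoch budget
            EarlyStoppingCriteria = null  // null disables early stopping
        };

        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("LabelAsKey", "Label")
            // Load raw image bytes for the trainer from the placeholder image folder.
            .Append(mlContext.Transforms.LoadRawImageBytes("Image", "images", "ImagePath"))
            .Append(mlContext.MulticlassClassification.Trainers.ImageClassification(options))
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        var model = pipeline.Fit(data);
    }
}
```

Without `EarlyStoppingCriteria = null`, the trainer's default early-stopping behavior kicks in, which is what the thread identifies as the cause of the accuracy gap.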
@JakeRadMSFT See the summary of the issue above.
@codemzs I see you closed this issue on the ML.NET side. That's fine, but we need some help with next steps. This is currently blocking our documentation folks from being able to use this dataset in the documentation as they had planned. It doesn't seem like they should have to hand-pick a dataset for use in the documentation; customers will likely hit this with their datasets too. These are the options I can think of:
Thoughts?
@justinormont Thoughts?
@luisquintanilla Your "as a comparison point" statement seems a little misleading. The Image Classification algorithm actually gets you ~97.18% accuracy (almost a point higher than the EuroSAT paper with ResNet-50); please refer to my logs: EuroSAT_90_10_split_200_epochs_shuffle.txt. Also, the DNN Featurizer was using ResNet-18, not ResNet-50. I cannot stress enough the importance of keeping comparisons apples to apples: no matter how different DNN models (ResNet-18, 50, 101, etc.) fare against each other, when you compare them you need to make sure all parameters are the same. @JakeRadMSFT Let's set up a meeting and talk offline. We are also adding retraining of DNN layers in the next release, which significantly boosts accuracy, but early stopping needs to be enabled when a validation set is passed; we can certainly modify the train-test split code to also produce a validation set so that early stopping works well. Without a validation set, early stopping uses the train set as the validation set, which misleads it into stopping early.
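The suggestion of modifying the train-test split to also yield a validation set can be sketched by splitting twice and passing the held-out portion to the trainer. This is illustrative only: `data` is the loaded dataset, and `preprocessing` stands for whatever fitted transforms produce the image-bytes and key-label columns the trainer expects.

```csharp
// First split: 90% train+validation, 10% test (matching the paper's 90/10).
var trainTest = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);
// Second split: carve ~10% of the training portion out as a validation set.
var trainValid = mlContext.Data.TrainTestSplit(trainTest.TrainSet, testFraction: 0.1);

var options = new ImageClassificationTrainer.Options
{
    FeatureColumnName = "Image",
    LabelColumnName = "LabelAsKey",
    // With a genuine held-out validation set supplied, early stopping no
    // longer falls back to evaluating on the train set and stopping too early.
    ValidationSet = preprocessing.Transform(trainValid.TestSet),
    EarlyStoppingCriteria = new ImageClassificationTrainer.EarlyStopping()
};
```

The trainer would then be fit on `trainValid.TrainSet` and evaluated on `trainTest.TestSet`, keeping the comparison apples to apples with the paper's split.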
@codemzs I'd prefer to keep the conversation all in one place, and I want to keep Luis in the loop. What can we do? What do you recommend? I'd like to unblock documentation as soon as we can. I'm not sure our users are too concerned with ResNet-18 vs. ResNet-50; they just want a model that performs well. It seems odd that a dataset with 27,000 images would perform so poorly. Should we turn off early stopping if it doesn't work consistently across datasets?
I can recommend several remedies but would prefer an offline discussion for efficiency reasons. You may invite Luis to that meeting. Thanks!
The comment summarizing the issue and potential solutions is just a reference so the Model Builder team doesn't have to keep flipping back and forth between the original issue and this one, although they can do that if they need more information.

The table comparison is from when the original issue was posted, before taking early stopping into account, which seems to be where the performance improvements really come from. That original comparison was intended to determine whether the issue was isolated to AutoML / Model Builder. As mentioned in the potential solutions, though, disabling early stopping within the API greatly improves performance when using the API.

While it's good that the Image Classification API can achieve comparable results when following the methodology described in the academic paper, it might be good to think about how similar performance can be achieved with AutoML and dependent tooling. For documentation purposes, an "easy" solution would be to find another dataset or use case. However, that would not be beneficial, in the sense that we'd be working around the limitations rather than seeing how improvements can be made overall.
@codemzs Okay, we can discuss in standup or after standup; I'll send a note to the standup chat. I'm fine doing that, but it's actually less efficient. I may or may not be the developer working on the solution, and it's nice to have all the content here for the next developer who works on this.
We should also validate the performance with early stopping using a validation dataset. If this gives good enough performance, then perhaps the solution would be Jake's third option, with early stopping remaining enabled. |
I have already done that. Will explain at scrum today. |
Hi folks, I have added functionality to the Image Classification API that auto-creates the validation set by taking 10% (modifiable) of the images from the test set if no validation set is provided and early stopping is enabled, and that also shuffles the images properly. Below are logs from which you can see that, with early stopping, training stopped at 33 epochs, took ~9 minutes on the GPU, and achieved 97.08% accuracy. Out of 27,000 images, 24,300 were used as the train set and the remainder as the test set. This change should be in the master branch by the end of this week, after which Model Builder just needs to update the NuGet version of its ML.NET dependencies. CC: @harshithapv, @CESARDELATORRE, @JakeRadMSFT, @briacht, @natke, @luisquintanilla, @ashbhandare, @justinormont

Thanks,

```
Phase: Training, Dataset used: Validation, Batch Processed Count: 243, Epoch: 31, Accuracy: 0.9502052
```

Evaluation Metrics
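If the new auto-validation behavior is surfaced as a trainer option, usage presumably reduces to something like the sketch below. The `ValidationSetFraction` property name is an assumption on my part and may not match the shipped API; the key point is that no explicit `ValidationSet` needs to be supplied.

```csharp
var options = new ImageClassificationTrainer.Options
{
    FeatureColumnName = "Image",
    LabelColumnName = "LabelAsKey",
    // Assumed option name: with no explicit ValidationSet provided and early
    // stopping enabled, ~10% of the images are shuffled out into an
    // auto-created validation set before training begins.
    ValidationSetFraction = 0.1f,
    EarlyStoppingCriteria = new ImageClassificationTrainer.EarlyStopping()
};
```

This would give Model Builder and AutoML the early-stopping benefit (33 epochs instead of 200 in the logs above) without requiring callers to wire up their own three-way split.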
@codemzs: Quite nice. What was your final MicroAccuracy? |
@luisquintanilla Could you help test the same dataset on mlnet 0.15.0-preview, to see if accuracy improved? thanks |
@LittleLittleCloud accuracy improved after using a version of Model Builder with ML.NET 1.5.0-preview |
Thanks @luisquintanilla for finding this issue, and thanks @LittleLittleCloud for integrating the latest NuGet with the fix. It seems even the training time has improved, 12.43 minutes vs. 18 minutes (DNNFeaturizer approach with just the first sweep), along with higher accuracy. Let's ship it!
See issue in ML.NET Repo for more details. dotnet/machinelearning#4504.