
[Image Classification] Low Accuracy on EuroSAT Dataset #386

Closed
luisquintanilla opened this issue Nov 27, 2019 · 17 comments
Assignees
Labels
Priority:0 Work that we can't release without
Milestone

Comments

@luisquintanilla
Contributor

See issue in ML.NET Repo for more details. dotnet/machinelearning#4504.

@luisquintanilla
Contributor Author

See explanation of how to achieve good performance on this dataset using the ML.NET Image Classification API. Still need to think about how to get similar performance on AutoML / Model Builder given the characteristics of the dataset.

dotnet/machinelearning#4504 (comment)

@luisquintanilla
Contributor Author

luisquintanilla commented Dec 2, 2019

Issue Summary

The EuroSAT paper introduces a geo-referenced aerial/satellite image dataset of 27,000 images categorized into 10 classes and reports 98.57% classification accuracy using CNNs. More specifically, using ResNet50 with a 90/10 train/test split, it achieves 96.37% accuracy. Training with the ML.NET Image Classification API as well as with Model Builder reaches 99%+ accuracy during training. However, when the model is evaluated, both with and without cross-validation, accuracy drops to between 61% and 69% using only the CPU and to 59% using the GPU. See the performance comparison in the table below.

| Method | Number of Images | Cross-Validation | Training Accuracy | Evaluation Accuracy |
|---|---|---|---|---|
| API (CPU) | 20,000 (18,000 train / 2,000 test) | No | 0.9946118 | 0.698 |
| Model Builder (CPU) | 27,000 | Yes | 0.9954983 | 0.6168 |
| Model Builder (GPU) | 27,000 | Yes | N/A | 0.5949 |
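For reference, the two kinds of evaluation numbers above correspond roughly to a hold-out evaluation versus a cross-validated one. The following is a minimal sketch rather than the exact code in the linked Program.cs; the `mlContext`, `pipeline`, `trainSet`, `testSet`, and `fullData` variables and the `LabelAsKey` column name are assumptions:

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hold-out evaluation (the "API (CPU)" row): fit on the train split,
// then score and evaluate the held-out test split.
ITransformer model = pipeline.Fit(trainSet);
IDataView predictions = model.Transform(testSet);
var metrics = mlContext.MulticlassClassification.Evaluate(
    predictions, labelColumnName: "LabelAsKey");
Console.WriteLine($"MicroAccuracy: {metrics.MicroAccuracy}, MacroAccuracy: {metrics.MacroAccuracy}");

// Cross-validated evaluation (the Model Builder rows): k folds over the full dataset.
var cvResults = mlContext.MulticlassClassification.CrossValidate(
    fullData, pipeline, numberOfFolds: 5, labelColumnName: "LabelAsKey");
Console.WriteLine($"Average MicroAccuracy: {cvResults.Average(r => r.Metrics.MicroAccuracy)}");
```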

Dataset

Dataset download link

Below is the file I used to read the file paths and labels. To test, change the parent directory C:\Users\luquinta.REDMOND\Datasets\EuroSAT to wherever you've saved the labelled subdirectories. Also, the attached file has a .txt extension while the code expects .tsv, so rename it accordingly when setting the value of TRAIN_DATA_FILEPATH at the top of Program.cs.

traindata.txt
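As a side note, here is a minimal sketch of how such a paths-and-labels file can be generated from the labelled subdirectories (the root path is the one mentioned above; the output file name and the *.jpg pattern are assumptions):

```csharp
using System.IO;
using System.Linq;

// Walk each class subdirectory under the EuroSAT root and write one
// "imagePath<TAB>label" line per image, using the folder name as the label.
string root = @"C:\Users\luquinta.REDMOND\Datasets\EuroSAT"; // change to your location
var lines = Directory.GetDirectories(root)
    .SelectMany(dir => Directory.GetFiles(dir, "*.jpg")
        .Select(file => $"{file}\t{Path.GetFileName(dir)}"));
File.WriteAllLines("traindata.tsv", lines);
```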

Source code / logs

The source code is at the following repo: https://github.com/luisquintanilla/EuroSATTrainSample/blob/master/EuroSATTrainSample/Program.cs

Output logs:

ImageClassificationTrainResultsModelBuilder.txt
ImageClassificationTrainResultsAPI.txt

Potential Solutions

Disabling early stopping and using the DNNImageFeaturizer appear to yield the most impactful results (accuracy of 93%+); a code sketch follows the list below.

1. You were using early stopping (and they were not); please use the default 200 epochs and turn off early stopping. Doing this alone will get your accuracy to between 93% and 94%. Early stopping works great when you supply a validation set, but you were not doing that, so it falls back to using the train set as the validation set. While this is not ideal, it seems to work in practice for some datasets we have tested on, but definitely not all.
2. As a comparison point, the DNNImageFeaturizer approach gets a bit over 94% accuracy, taking 18 minutes on CPU for its first model (then it continues sweeping).
3. You were not correctly splitting your dataset into a 90:10 train:test split (please see the attached Program.txt, which contains your code amended to split the dataset correctly).
4. You were not shuffling the dataset prior to training as the paper does; the shuffle transform does not shuffle at the level of individual data points but in blocks.
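Putting items 1, 3, and 4 together, here is a minimal sketch of the API-side fix. The ImageClassificationTrainer.Options property names are quoted from memory of the 1.4/1.5-preview API and should be verified against the installed version; the `images` enumerable, the `imageFolder` variable, and the column names are assumptions:

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Vision;

var mlContext = new MLContext(seed: 1);

// Item 4: shuffle individual examples up front (the built-in shuffle transform
// shuffles in blocks, not at the level of single rows).
var rng = new Random(1);
IDataView fullData = mlContext.Data.LoadFromEnumerable(
    images.OrderBy(_ => rng.Next()).ToList());

// Item 3: 90/10 train/test split.
var split = mlContext.Data.TrainTestSplit(fullData, testFraction: 0.1);

// Item 1: run the default 200 epochs with early stopping turned off.
var options = new ImageClassificationTrainer.Options
{
    FeatureColumnName = "Image",
    LabelColumnName = "LabelAsKey",
    Arch = ImageClassificationTrainer.Architecture.ResnetV250,
    Epoch = 200,
    EarlyStoppingCriteria = null // null disables early stopping
};

var pipeline = mlContext.Transforms.Conversion.MapValueToKey("LabelAsKey", "Label")
    .Append(mlContext.Transforms.LoadRawImageBytes("Image", imageFolder, "ImagePath"))
    .Append(mlContext.MulticlassClassification.Trainers.ImageClassification(options))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

ITransformer model = pipeline.Fit(split.TrainSet);
```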

@luisquintanilla
Contributor Author

@JakeRadMSFT See summary of issue above

@JakeRadMSFT
Contributor

JakeRadMSFT commented Dec 2, 2019

@codemzs I see you closed this issue on the ML.NET side. That's fine but we need some help with next steps.

This is currently blocking our documentation folks from using this dataset in the documentation as they had planned. It doesn't seem like they should have to hand-pick a dataset for the documentation, and customers will likely hit this with their own datasets too.

These are the options I can think of:

  • Update AutoML to not use early stopping
    • Negatives: adds time to training that already takes a while.
  • Update AutoML to try with and without early stopping
    • Negatives: also adds more time.
  • Have Model Builder pre-split and randomize the dataset
    • Negatives: none, but it sounds like this doesn't fully solve the problem. Also, it isn't compatible with streaming the dataset (but we don't do that yet).
  • Add the DNN Featurizer approach and try it first
    • Negatives: not a true DNN model?

Thoughts?

@JakeRadMSFT
Contributor

@justinormont Thoughts?

@codemzs
Member

codemzs commented Dec 2, 2019

@luisquintanilla Your "as a comparison point" statement seems a little misleading; the Image Classification algorithm actually gets ~97.18% accuracy (almost a point higher than the EuroSAT paper with ResNet50). Please refer to my logs: EuroSAT_90_10_split_200_epochs_shuffle.txt.

Also, the DNN Featurizer was using ResNet18, not ResNet50. I cannot stress enough the importance of keeping comparisons apples to apples: no matter how different DNN models (i.e. ResNet 18, 50, 101, etc.) fare against each other, when you compare them you need to make sure all other parameters are the same.

@JakeRadMSFT Let's set up a meeting and talk offline. We are also adding retraining of the DNN layers in the next release, which significantly boosts accuracy, but early stopping should be enabled with a validation set passed in; we can certainly modify the train-test split code to also give us a validation set so that early stopping works well. Without a validation set, early stopping uses the train set as the validation set, which misleads it into stopping early.
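For the API path, a minimal sketch of "also give us a validation set", continuing the sketch earlier in this thread (the ValidationSet option name is as I recall it from the 1.4/1.5-preview API, and the `split` and `preprocessing` variables are assumptions; the validation data needs the same Image/LabelAsKey columns as the train data):

```csharp
// Carve a validation set out of the training split so early stopping
// monitors held-out data instead of the train set itself.
var trainValSplit = mlContext.Data.TrainTestSplit(split.TrainSet, testFraction: 0.1);

// Run the validation rows through the same preprocessing (key mapping +
// raw image bytes) that the train set goes through.
IDataView validationSet = preprocessing.Fit(trainValSplit.TrainSet)
                                        .Transform(trainValSplit.TestSet);

var options = new ImageClassificationTrainer.Options
{
    FeatureColumnName = "Image",
    LabelColumnName = "LabelAsKey",
    // Early stopping stays enabled (the default) because a real validation set is supplied.
    ValidationSet = validationSet
};
```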

@JakeRadMSFT
Contributor

@codemzs I'd prefer to keep the conversation all in one place and I want to keep Luis in the loop.

What can we do? What do you recommend? I'd like to unblock documentation as soon as we can.

I'm not sure our users are too concerned with ResNet18 vs. ResNet50; they just want a model that performs well. It seems odd that a dataset with 27,000 images would perform so poorly. Should we turn off early stopping if it doesn't work consistently across datasets?

@codemzs
Member

codemzs commented Dec 3, 2019

I can recommend several remedies but would prefer an offline discussion for efficiency reasons. You may invite Luis to that meeting. Thanks!

@luisquintanilla
Contributor Author

The comment summarizing the issue and potential solutions is just a reference so the Model Builder team doesn't have to keep flipping back and forth between the original issue and this one, although they can refer to the original issue if they need more information. The table comparison is from when the original issue was posted, without taking early stopping into account, which seems to be where the performance improvements really come from. That original comparison was intended to see whether the issue was isolated to AutoML / Model Builder. As mentioned in the potential solutions, though, disabling early stopping in the API greatly improves performance.

While it's good that the Image Classification API can achieve comparable results when following the methodology described in the academic paper, it might be good to think about how similar performance can be achieved with AutoML and dependent tooling.

For documentation purposes, an "easy" solution would be to find another dataset or use case. However, that would not be beneficial, in the sense that we'd be working around the limitations rather than seeing how improvements can be made overall.

@JakeRadMSFT
Contributor

JakeRadMSFT commented Dec 3, 2019

@codemzs Okay, we can discuss in standup or after standup. I'll send a note to the standup chat.

I'm fine doing that, but it's actually less efficient. I may or may not be the developer working on the solution, and it's nice to have all the context here for the next developer who works on this.

@natke

natke commented Dec 3, 2019

We should also validate the performance with early stopping using a validation dataset.

If this gives good enough performance, then perhaps the solution would be Jake's third option, with early stopping remaining enabled.

@codemzs
Member

codemzs commented Dec 3, 2019

I have already done that. Will explain at scrum today.

@codemzs
Member

codemzs commented Dec 13, 2019

Hi Folks,

I have added functionality to the Image Classification API that automatically creates a validation set by taking 10% (modifiable) of the images from the test set when no validation set is provided and early stopping is enabled, and that also shuffles the images properly.

Below are logs from which you can see that, with early stopping, training stopped at 33 epochs, took ~9 minutes on GPU, and achieved 97.08% accuracy. Out of 27,000 images, 24,300 were used as the train set and the remainder as the test set. This change should be in the master branch by the end of this week, after which Model Builder just needs to update the NuGet version of its ML.NET dependencies.

CC: @harshithapv , @CESARDELATORRE , @JakeRadMSFT , @briacht , @natke , @luisquintanilla, @ashbhandare , @justinormont

Thanks,
Zeeshan Siddiqui

Phase: Training, Dataset used: Validation, Batch Processed Count: 243, Epoch: 31, Accuracy: 0.9502052
Phase: Training, Dataset used: Train, Batch Processed Count: 2187, Learning Rate: 0.003715743 Epoch: 32, Accuracy: 0.9697338, Cross-Entropy: 0.09318368
Phase: Training, Dataset used: Validation, Batch Processed Count: 243, Epoch: 32, Accuracy: 0.9497937
Phase: Training, Dataset used: Train, Batch Processed Count: 2187, Learning Rate: 0.003715743 Epoch: 33, Accuracy: 0.9701452, Cross-Entropy: 0.09265892
Phase: Training, Dataset used: Validation, Batch Processed Count: 243, Epoch: 33, Accuracy: 0.9497937
Saver not created because there are no variables in the graph to restore
2019-12-13 00:31:32.405873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla M60 major: 5 minor: 2 memoryClockRate(GHz): 1.1775
pciBusID: 0001:00:00.0
2019-12-13 00:31:32.412895: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-13 00:31:32.420014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-12-13 00:31:32.423919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-13 00:31:32.429937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-12-13 00:31:32.432693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-12-13 00:31:32.437395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7466 MB memory) -> physical GPU (device: 0, name: Tesla M60, pci bus id: 0001:00:00.0, compute capability: 5.2)
Restoring parameters from C:\Users\mladmin\repo\codemzs\machinelearning\bin\AnyCPU.Debug\Microsoft.ML.Samples.GPU\workspace\custom_retrained_model_based_on_resnet_v2_50_299.meta
Froze 2 variables.
Converted 2 variables to const ops.
2019-12-13 00:31:38.397836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla M60 major: 5 minor: 2 memoryClockRate(GHz): 1.1775
pciBusID: 0001:00:00.0
2019-12-13 00:31:38.405122: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-13 00:31:38.413486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-12-13 00:31:38.417136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-13 00:31:38.428009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-12-13 00:31:38.432485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-12-13 00:31:38.441809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7466 MB memory) -> physical GPU (device: 0, name: Tesla M60, pci bus id: 0001:00:00.0, compute capability: 5.2)
Finished training in 519412
Evaluating Model
Finished evaluation in 584099

Evaluation Metrics
Log Loss: 0.0901965446885085 | MacroAccuracy: 0.9708

@justinormont

@codemzs: Quite nice. What was your final MicroAccuracy?

@JakeRadMSFT JakeRadMSFT self-assigned this Jan 6, 2020
@JakeRadMSFT JakeRadMSFT added P1 Priority:0 Work that we can't release without labels Jan 7, 2020
@JakeRadMSFT JakeRadMSFT added this to the January 2020 milestone Jan 12, 2020
@JakeRadMSFT JakeRadMSFT removed their assignment Jan 13, 2020
@LittleLittleCloud
Contributor

@luisquintanilla Could you help test the same dataset on mlnet 0.15.0-preview to see if the accuracy improved? Thanks.

@luisquintanilla
Contributor Author

@LittleLittleCloud accuracy improved after using a version of Model Builder with ML.NET 1.5.0-preview

[screenshot of Model Builder evaluation results]

@codemzs
Member

codemzs commented Jan 14, 2020

Thanks @luisquintanilla for finding this issue, and thanks @LittleLittleCloud for integrating the latest NuGet with the fix. It seems even the training time has improved (12.43 minutes vs. 18 minutes for the DNNFeaturizer approach with just its first sweep), along with higher accuracy. Let's ship it!

@codemzs codemzs self-assigned this Jan 14, 2020
@codemzs codemzs closed this as completed Jan 14, 2020