
Incorrect prediction while testing after training network #7

Closed
ChrisZeThird opened this issue Jun 8, 2023 · 9 comments
Assignees: ChrisZeThird
Labels: bug (Something isn't working) · enhancement (New feature or request) · help wanted (Extra attention is needed)

Comments

@ChrisZeThird
Member

ChrisZeThird commented Jun 8, 2023

Currently the network doesn't give a correct output when running test_network.py: the output is always 0.4202.

  • The diagrams are not all identical
  • The lines are correctly calculated
  • The angles are correctly calculated

During training (simple_network.py), y_pred is always the same (line 89).

@ChrisZeThird
Member Author

ChrisZeThird commented Jun 8, 2023

At line 77, y_pred is constant within each iteration but changes between iterations.

@ChrisZeThird ChrisZeThird pinned this issue Jun 8, 2023
@ChrisZeThird ChrisZeThird added this to the Testing network milestone Jun 8, 2023
@ChrisZeThird
Member Author

ChrisZeThird commented Jun 8, 2023

The predicted angle tends to converge to 0.42... during training. The expected angle values and X_batch are all correct.

@ChrisZeThird
Member Author

The network might be learning the training data by heart. Possible solutions:

  • Resample the data (a rough sketch of what this could look like follows this list)
  • Use a smaller network
  • Use more data
  • Train on synthetic data and test on experimental data
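
Roughly, the resampling could look like this (file names and variable names below are placeholders, not the actual code in simple_network.py):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Placeholder arrays: stability diagrams and their normalized line angles.
diagrams = np.load("diagrams.npy")  # shape (N, H, W), assumed
angles = np.load("angles.npy")      # shape (N,), assumed

# Bin the continuous targets, then weight each sample by the inverse
# frequency of its bin so rare angles are drawn as often as common ones.
n_bins = 20
edges = np.linspace(angles.min(), angles.max(), n_bins)
bin_idx = np.digitize(angles, edges)
bin_counts = np.bincount(bin_idx, minlength=n_bins + 1)
weights = 1.0 / bin_counts[bin_idx]

sampler = WeightedRandomSampler(weights=torch.as_tensor(weights, dtype=torch.double),
                                num_samples=len(angles), replacement=True)

dataset = TensorDataset(torch.as_tensor(diagrams, dtype=torch.float32),
                        torch.as_tensor(angles, dtype=torch.float32))
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```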

@ChrisZeThird
Member Author

The data are indeed statistically imbalanced: most angles fall between 0.41 and 0.49, with a peak at approximately 0.47.


@ChrisZeThird ChrisZeThird added the enhancement New feature or request label Jun 9, 2023
@ChrisZeThird
Member Author

The problem may come from the normalization. Instead of dividing the angle by 2*pi, it could be divided by pi alone, since the angles are all between 0 and 180°. This would increase the spacing between the normalized angle values.
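
For clarity, the change would look like this (assuming the angles are stored in radians; the variable names are placeholders):

```python
import numpy as np

# Angles assumed to be in radians, all within [0, pi] (0 to 180 degrees).
angles_rad = np.array([0.3, 1.2, 2.8])

# Current normalization (assumed): dividing by 2*pi squeezes every target
# into [0, 0.5], leaving little spacing between them.
y_old = angles_rad / (2 * np.pi)

# Proposed normalization: dividing by pi spreads the same angles over the
# full [0, 1] range, doubling the spacing between targets.
y_new = angles_rad / np.pi
```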

@ChrisZeThird
Member Author

Changing normalization

Still the exact same issue, and the MSE is now very high.

@ChrisZeThird ChrisZeThird self-assigned this Jun 9, 2023
@ChrisZeThird ChrisZeThird added bug Something isn't working help wanted Extra attention is needed labels Jun 9, 2023
@ChrisZeThird
Member Author

New direction

After talking with @victor-yon, the right direction is to consider a CNN instead of a simple feed-forward network. A CNN should recognize the angle better among the noise of the stability diagram. The current network can't find the line, and therefore tries to minimize the error by returning a value close to the expected values. Here is the workflow for the coming week (it should be done in 2 days max); a minimal CNN sketch follows the notes below:

  • Implement a CNN and train it on synthetic data to validate the theory
  • Apply the trained network to experimental data
  • Repeat step 1 with experimental data for training

Additional note

  • Check which CNN is used in Victor's code
  • Use the derivative of the diagram to be closer to the synthetic data
  • Create folders in .\saved\model to store the CNN and FF models separately
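
To make the plan concrete, here is a rough sketch of what such a CNN could look like (this is not Victor's actual architecture; the patch size, channel counts, and layer sizes are placeholders):

```python
import torch
import torch.nn as nn

class AngleCNN(nn.Module):
    """Minimal CNN regressing a single normalized line angle from a
    stability diagram patch (or its derivative)."""

    def __init__(self, patch_size: int = 18):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        reduced = patch_size // 4  # two 2x2 poolings halve each dimension twice
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * reduced * reduced, 32), nn.ReLU(),
            nn.Linear(32, 1),  # one output: the normalized angle
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, patch_size, patch_size)
        return self.head(self.features(x)).squeeze(1)
```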

@ChrisZeThird
Member Author

Accuracy paradox

There is a good chance the data imbalance is causing the issue. However, when testing with synthetic data, the standard deviation doesn't drop. The problem most likely comes from the network not being able to identify features in noisy/non-binary data, but exploring the data imbalance a bit more could also help.

This link explains the concept of the accuracy paradox, as does this post.
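
As a quick illustration of why a near-constant output can still score a low error on imbalanced targets, here is a toy baseline that always predicts the mean (the numbers below are made up to mimic the histogram above, not the real dataset):

```python
import numpy as np

# Toy targets mimicking the observed imbalance: most normalized angles
# cluster around 0.47, a minority are spread over the full range.
rng = np.random.default_rng(0)
y_true = np.concatenate([
    rng.normal(0.47, 0.02, size=900),
    rng.uniform(0.0, 1.0, size=100),
])

# A "model" that always predicts the mean already gets a small MSE,
# which is how a constant output near 0.42-0.47 can look acceptable.
constant_pred = np.full_like(y_true, y_true.mean())
print("baseline MSE:", np.mean((y_true - constant_pred) ** 2))
```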

@ChrisZeThird
Member Author

A new loss function

After computing the MSE (mean squared error) and the MAE (mean absolute error), another loss function was found: SmoothL1Loss. It uses a squared term when the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than torch.nn.MSELoss and in some cases prevents exploding gradients.
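
A minimal sketch of swapping the loss (the model, optimizer, and batch names below are placeholders, not the actual code in simple_network.py):

```python
import torch.nn as nn

# SmoothL1Loss: squared term when |error| < beta, L1 term otherwise.
# beta=1.0 is the PyTorch default; it is a hyper-parameter to tune.
criterion = nn.SmoothL1Loss(beta=1.0)

def training_step(model, optimizer, X_batch, y_batch):
    """One optimization step using SmoothL1Loss instead of MSELoss."""
    optimizer.zero_grad()
    y_pred = model(X_batch).squeeze()  # predicted normalized angles
    loss = criterion(y_pred, y_batch)  # compare to expected normalized angles
    loss.backward()
    optimizer.step()
    return loss.item()
```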

This loss function helped decrease the loss drastically and gave much more accurate results; by that I mean the network now predicts different values for different inputs. This issue can therefore be closed, as it's now only a matter of tweaking the hyper-parameters to decrease the standard deviation.

