Question: How do I toggle normalization? #11

Closed
cielonet opened this issue Apr 19, 2024 · 4 comments

Comments

@cielonet

cielonet commented Apr 19, 2024

I see that I can toggle normalization, but turning it on or off does not change the output. I'm not sure whether this is working correctly or I am doing something wrong.

# with reg-norm
root@ae6a62491516:/app# bin2ml generate nlp --path file_cfg.json --instruction-type esil --data-out-path .  --output-format funcstring --pairs --reg-norm
root@ae6a62491516:/app# md5sum file_cfg-efs.json 
e4159bbfe995ef55d873e6c9552acc20  file_cfg-efs.json

# without reg-norm get same file
root@ae6a62491516:/app# bin2ml generate nlp --path file_cfg.json --instruction-type esil --data-out-path .  --output-format funcstring --pairs 
root@ae6a62491516:/app# md5sum file_cfg-efs.json 
e4159bbfe995ef55d873e6c9552acc20  file_cfg-efs.json

What I am trying to do is create a non-normalized as well as a normalized output.

@br0kej
Owner

br0kej commented Apr 20, 2024

This definitely sounds like a bug. I will investigate over the next day or so and push a fix! It's likely down to how the reg norm parameter is passed into the generation code. I've also fixed the parameter to always normalise by default in some places!

@br0kej
Owner

br0kej commented Apr 20, 2024

I have looked into this. I think the --pairs flag is deprecated and doesn't actually do anything.

Without it set, I have been able to generate both normalised and un-normalised examples. Try re-running the commands with LOG_LEVEL=DEBUG set in front of them. This will print out a lot more info, including the normalised/un-normalised sequences.
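
A rough sketch of that re-run, assuming the same file_cfg.json as in the original report. The norm/ and raw/ output directories and the outputs_differ helper are hypothetical names introduced here, not part of bin2ml; only the flags already shown in this thread are used. If bin2ml names its output files differently depending on the flags, compare the individual files rather than the directories.

```shell
#!/bin/sh
# Sketch: generate output with and without --reg-norm, then compare the results.
# Assumptions: norm/ and raw/ are hypothetical output directories.

# Succeed (exit 0) only when diff reports that the two directories differ.
outputs_differ() {
    status=0
    diff -rq "$1" "$2" >/dev/null 2>&1 || status=$?
    [ "$status" -eq 1 ]
}

# Only run the generation step when bin2ml and the input file are present.
if command -v bin2ml >/dev/null 2>&1 && [ -f file_cfg.json ]; then
    mkdir -p norm raw

    # With --reg-norm; --pairs is left out, as suggested above.
    LOG_LEVEL=DEBUG bin2ml generate nlp --path file_cfg.json \
        --instruction-type esil --data-out-path norm \
        --output-format funcstring --reg-norm

    # Same command without --reg-norm.
    LOG_LEVEL=DEBUG bin2ml generate nlp --path file_cfg.json \
        --instruction-type esil --data-out-path raw \
        --output-format funcstring

    if outputs_differ norm raw; then
        echo "outputs differ: the normalisation toggle took effect"
    else
        echo "outputs identical: the normalisation toggle had no effect"
    fi
fi
```

Writing each run to its own directory avoids the second run overwriting the first, which is what made the two md5sum values in the original report hard to interpret.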

@br0kej
Owner

br0kej commented Apr 28, 2024

I had another look at this this evening. The --pairs flag is actually useful, but only when generating singles of ESIL or Disasm. It's for generating data similar to PalmTree's training data.

br0kej closed this as completed Apr 28, 2024