Re-generate topics and re-train fraud detection #1

Closed
1 task done
Philipp-Sc opened this issue Apr 13, 2023 · 4 comments

@Philipp-Sc
Owner

Philipp-Sc commented Apr 13, 2023

  • Re-generate topics and re-train fraud detection with a bigger dataset of governance proposals.
governance_proposal_spam_ham.csv 
---------------
count spam: 172
count ham: 2551

Note: This should help reduce false positives, since the model has not yet seen many ham (or spam) examples of governance proposals.

Note: Consider reducing the ham dataset by filtering out some of the rejected proposals with a high share of votes against, so that likely spam is not trained as ham.
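
A minimal sketch of such a filter, assuming each proposal record carries its outcome and vote counts. The `Proposal` struct, its field names, and the 0.8 threshold are hypothetical and are not taken from governance_proposal_spam_ham.csv:

```rust
/// Hypothetical record for one governance proposal. The field names are
/// assumptions for illustration, not the columns of the real dataset.
#[derive(Debug, Clone)]
struct Proposal {
    text: String,
    rejected: bool,
    votes_against: u64,
    votes_total: u64,
}

/// Share of votes cast against the proposal (0.0 when nobody voted).
fn against_share(p: &Proposal) -> f64 {
    if p.votes_total == 0 {
        0.0
    } else {
        p.votes_against as f64 / p.votes_total as f64
    }
}

/// Keep a proposal as ham unless it was rejected with a share of votes
/// against at or above `max_against_share`.
fn filter_ham(candidates: Vec<Proposal>, max_against_share: f64) -> Vec<Proposal> {
    candidates
        .into_iter()
        .filter(|p| !(p.rejected && against_share(p) >= max_against_share))
        .collect()
}

fn main() {
    // Made-up example rows; the threshold 0.8 is an arbitrary placeholder.
    let candidates = vec![
        Proposal { text: "Upgrade chain to v2".into(), rejected: false, votes_against: 5, votes_total: 100 },
        Proposal { text: "Claim your airdrop here".into(), rejected: true, votes_against: 95, votes_total: 100 },
        Proposal { text: "Lower the community tax".into(), rejected: true, votes_against: 40, votes_total: 100 },
    ];
    for p in filter_ham(candidates, 0.8) {
        println!("{}", p.text);
    }
}
```

Rejected proposals below the threshold stay in the ham set, since a proposal can be voted down without being spam.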

@Philipp-Sc
Owner Author

Philipp-Sc commented Apr 13, 2023

  • Add DAO governance proposals first.

@Philipp-Sc
Owner Author

Philipp-Sc commented Jun 7, 2023

  • Refactor dataset loading: instead of loading the label as a boolean, load it as f64. That way the float label from governance_proposal_spam_ham.csv can be used.
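
One way the label parsing could look, as a sketch using only the standard library. The helper names `parse_label` and `parse_row`, and the assumption that the label is the last comma-separated field of a row, are illustrative and not confirmed by the repository:

```rust
/// Parse a label field as f64. The strings "true" and "false" are mapped to
/// 1.0 and 0.0 so that boolean-labeled data can still be read (an assumption
/// about how older data might look, not the project's actual format).
fn parse_label(raw: &str) -> Option<f64> {
    let s = raw.trim();
    match s.to_ascii_lowercase().as_str() {
        "true" => Some(1.0),
        "false" => Some(0.0),
        // Reject NaN and infinities, which f64 parsing would otherwise accept.
        _ => s.parse::<f64>().ok().filter(|v| v.is_finite()),
    }
}

/// Split one row into (text, label), assuming the label is the last field.
/// Splitting at the last comma tolerates commas inside the text, but this is
/// not a full CSV parser (quoting and escaping are not handled).
fn parse_row(line: &str) -> Option<(String, f64)> {
    let (text, label) = line.rsplit_once(',')?;
    Some((text.to_string(), parse_label(label)?))
}

fn main() {
    // Made-up rows for illustration only.
    for line in ["Community pool spend,0.05", "Visit this site to claim,0.97", "Legacy row,true"] {
        println!("{:?}", parse_row(line));
    }
}
```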

@Philipp-Sc
Owner Author

Philipp-Sc commented Jul 2, 2023

Instead of predicting all topics at once (where the predictions sum to 1), predict binary topic pairs, e.g. ["hot","cold"].

  • Evaluate performance vs. the previous technique.

The new technique performs better. A potential drawback is that a larger number of topics might increase inference time and make it take too long on CPU-only systems.
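
To illustrate the difference between the two output schemes, here is a small sketch that contrasts a softmax over all topics with an independent sigmoid per topic pair. The raw scores and the second pair are made up, and the thread does not say which functions the actual model uses:

```rust
/// Joint prediction: softmax over all topic scores, so the outputs sum to 1.
fn softmax(scores: &[f64]) -> Vec<f64> {
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Pairwise prediction: one independent binary score per topic pair.
/// The result is read as the probability of the first topic in the pair.
fn sigmoid(score: f64) -> f64 {
    1.0 / (1.0 + (-score).exp())
}

fn main() {
    // Hypothetical raw scores; real values would come from the classifier.
    let joint = softmax(&[2.0, 1.0, 0.5]);
    println!("joint: {:?} (sum = {})", joint, joint.iter().sum::<f64>());

    // If each pair needs its own model evaluation (an assumption), the cost
    // grows with the number of pairs, matching the drawback noted above.
    let pairs = [(("hot", "cold"), 1.2), (("long", "short"), -0.7)];
    for ((a, b), score) in pairs {
        let p = sigmoid(score);
        println!("{}: {:.3}, {}: {:.3}", a, p, b, 1.0 - p);
    }
}
```

With pairs, each prediction is independent of the others, so a text can score high on several topics at once instead of competing for a shared probability mass.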

@Philipp-Sc
Owner Author

Philipp-Sc commented Jul 2, 2023

  • Consider feature selection to improve inference time (relevant for CPU-only systems).
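
The thread does not name a feature selection method. As one simple possibility, the sketch below drops near-constant feature columns with a variance threshold. The function names, the threshold, and the dense `Vec<f64>` row layout are all assumptions:

```rust
/// Return the indices of feature columns whose (population) variance exceeds
/// `min_variance`. All rows are assumed to have the same length.
fn select_by_variance(rows: &[Vec<f64>], min_variance: f64) -> Vec<usize> {
    if rows.is_empty() {
        return Vec::new();
    }
    let n = rows.len() as f64;
    let dims = rows[0].len();
    (0..dims)
        .filter(|&j| {
            let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n;
            let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n;
            var > min_variance
        })
        .collect()
}

/// Project a single feature vector onto the selected columns.
fn project(row: &[f64], keep: &[usize]) -> Vec<f64> {
    keep.iter().map(|&j| row[j]).collect()
}

fn main() {
    // Made-up feature matrix: column 0 is constant, column 2 barely varies.
    let rows = vec![
        vec![1.0, 0.0, 5.0],
        vec![1.0, 1.0, 5.0],
        vec![1.0, 2.0, 5.1],
    ];
    let keep = select_by_variance(&rows, 0.01);
    println!("kept columns: {:?}", keep);
    println!("projected row: {:?}", project(&rows[2], &keep));
}
```

Fewer input features mean less work per prediction, but whether accuracy holds up would need to be checked against the full feature set.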
