Questions about USPTO-full preparation #7

Open
jiachengxiong opened this issue Nov 16, 2023 · 1 comment


@jiachengxiong

In the article, the authors state: "The same procedure was used to build the edits vocabulary on USPTO-full dataset and the difference is that the edits Attach LG must appear at least 50 times in the training set of USPTO-full before it will be collected into the vocabulary. This edits vocabulary include 6 bond edits, 336 atom edits (8 Change Atom and 328 Attach LG), and a termination symbol."
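For readers unfamiliar with frequency-thresholded vocabularies, here is a minimal sketch of the rule quoted above, assuming each training reaction has already been converted into a list of `(edit_type, payload)` tuples. The tuple representation, the `"Terminate"` label, and the function name `build_edit_vocab` are hypothetical and not taken from the project's code. It also assumes that non-Attach-LG edits are kept whenever they occur; the paper only states the threshold for Attach LG.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical edit representation: (edit_type, payload),
# e.g. ("Attach LG", "Cl") or ("Change Atom", "C:1").
Edit = Tuple[str, str]

# Hypothetical label for the termination symbol.
TERMINATION: Edit = ("Terminate", "")


def build_edit_vocab(train_edits: Iterable[List[Edit]], lg_min_count: int = 50) -> List[Edit]:
    """Build an edit vocabulary from training reactions (sketch).

    Attach LG edits are kept only if they occur at least `lg_min_count`
    times in the training set. Every other edit type is kept whenever it
    occurs (an assumption of this sketch). A termination symbol is appended.
    """
    # Count every edit occurrence across all training reactions.
    counts = Counter(edit for reaction in train_edits for edit in reaction)
    # Apply the frequency threshold to Attach LG edits only.
    vocab = sorted(
        edit for edit, n in counts.items()
        if edit[0] != "Attach LG" or n >= lg_min_count
    )
    vocab.append(TERMINATION)
    return vocab
```

With 60 reactions attaching `Cl`, 10 attaching `Br`, and one `Change Atom` edit, the rare `Br` leaving group falls below the threshold of 50 and is left out of the vocabulary.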

Apart from filtering the training data, are test reactions whose leaving group is not in the vocabulary also removed?

@Jamson-Zhong
Owner

In the USPTO-full dataset, test reactions whose leaving group is not in the vocabulary were retained.
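Because these reactions stay in the test set, a model that can only emit vocabulary edits cannot reproduce their reference edit sequence, yet they still count toward the total. The following is a minimal sketch for measuring how many test reactions are fully covered by the vocabulary, using the same hypothetical `(edit_type, payload)` representation. The function name `vocab_coverage` is illustrative and not part of the project.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical edit representation: (edit_type, payload).
Edit = Tuple[str, str]


def vocab_coverage(test_edits: Iterable[List[Edit]], vocab: Set[Edit]) -> Tuple[int, int]:
    """Return (covered, total) for a test set (sketch).

    A reaction counts as covered only if every one of its reference
    edits is in the vocabulary. Uncovered reactions are kept in `total`,
    mirroring the choice to retain them in the USPTO-full test set.
    """
    covered = total = 0
    for reaction in test_edits:
        total += 1
        if all(edit in vocab for edit in reaction):
            covered += 1
    return covered, total
```

For example, with a vocabulary containing only `("Attach LG", "Cl")`, a test set of one `Cl` reaction and one `Br` reaction gives `(1, 2)`.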
