Suggestions to run it against other datasets #7
Hi, yes, you would need to annotate the data in the same format. In the folder "aste/data/triplet_data", you can create a folder called "new_data" and put train.txt, dev.txt and test.txt inside. Then you can select the new dataset for training by changing line 11 in aste/main.sh to "--names new_data, " and line 12 to "--seeds 0, ".
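Because train.txt, dev.txt and test.txt all need annotations in the same format, one way to produce them is to annotate a single pool of sentences and split it. The sketch below does that and writes the three files into the folder layout described above. `write_splits` is a hypothetical helper, not part of Span-ASTE, and the 80/10/10 ratios and shuffle seed are illustrative assumptions.

```python
import random
from pathlib import Path


def write_splits(lines, root="aste/data/triplet_data", name="new_data",
                 ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle annotated lines and write train.txt, dev.txt and test.txt.

    Hypothetical helper: the 80/10/10 ratios and the seed are illustrative
    choices, not values prescribed by Span-ASTE.
    """
    lines = [line.rstrip("\n") for line in lines]
    random.Random(seed).shuffle(lines)  # reproducible shuffle
    n_train = int(len(lines) * ratios[0])
    n_dev = int(len(lines) * ratios[1])
    splits = {
        "train.txt": lines[:n_train],
        "dev.txt": lines[n_train:n_train + n_dev],
        "test.txt": lines[n_train + n_dev:],  # remainder goes to test
    }
    out_dir = Path(root) / name  # e.g. aste/data/triplet_data/new_data
    out_dir.mkdir(parents=True, exist_ok=True)
    for filename, subset in splits.items():
        # One annotated sentence per line; every split uses the same format.
        text = "".join(line + "\n" for line in subset)
        (out_dir / filename).write_text(text, encoding="utf-8")
    return {filename: len(subset) for filename, subset in splits.items()}
```

After calling `write_splits(my_annotated_lines)`, lines 11 and 12 of aste/main.sh still need the edits described above. The `seed` argument here only controls the shuffle and is unrelated to `--seeds` in main.sh.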
Thank you!
1. Has the data format changed?
2. Looking at the data generated in the Colab, in Span-Aste/aste/data/triplet_data/14lap, I see that train, test and dev have a similar structure. Do I then need to label all three sets manually for the first tests on my dataset?
Thank you
Hi, the training script expects the same data format as Span-ASTE/aste/data/triplet_data/14lap/train.txt, as in the sample below. The train, dev and test files all use this format.
I charge it at night and skip taking the cord with me because of the good battery life .####I=O charge=O it=O at=O night=O and=O skip=O taking=O the=O cord=O with=O me=O because=O of=O the=O good=O battery=T-POS life=T-POS .=O####I=O charge=O it=O at=O night=O and=O skip=O taking=O the=O cord=O with=O me=O because=O of=O the=O good=S battery=O life=O .=O####[([16, 17], [15], 'POS')]
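Reading the sample, each line appears to hold four fields separated by `####`: the space-tokenized sentence, the aspect (target) tags, the opinion tags, and a Python-style list of triplets `(aspect token indices, opinion token indices, sentiment)`. The indices look 0-based over whitespace tokens: token 15 is "good" and tokens 16 and 17 are "battery life". The sketch below checks an annotation by turning indices back into words. `parse_line` is a hypothetical helper, and the field interpretation is inferred from the sample rather than taken from the Span-ASTE code.

```python
import ast


def parse_line(line):
    """Parse one 'sentence####target tags####opinion tags####triplets' line."""
    sentence, target_field, opinion_field, triplet_field = line.rstrip("\n").split("####")
    tokens = sentence.split()  # triplet indices refer to these tokens (0-based)
    # Each tag is "token=TAG"; rsplit keeps tokens that themselves contain "=".
    target_tags = [t.rsplit("=", 1)[1] for t in target_field.split()]
    opinion_tags = [t.rsplit("=", 1)[1] for t in opinion_field.split()]
    # Tag fields may be blank (reduced format); if present they must align.
    for tags in (target_tags, opinion_tags):
        if tags and len(tags) != len(tokens):
            raise ValueError("tag count does not match token count")
    triplets = ast.literal_eval(triplet_field)  # e.g. [([16, 17], [15], 'POS')]
    readable = [
        (" ".join(tokens[i] for i in aspect), " ".join(tokens[i] for i in opinion), sentiment)
        for aspect, opinion, sentiment in triplets
    ]
    return tokens, target_tags, opinion_tags, readable
```

For the sample above, `readable` comes out as `[("battery life", "good", "POS")]`, which is a quick way to spot off-by-one index errors in hand-made labels.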
Hi, to make it more convenient to apply to new datasets, you can omit the tag components of the annotation and include just the sentence and triplet information, as in the sample below. Each line in the train, dev and test sets can use this format.
I charge it at night and skip taking the cord with me because of the good battery life .#### #### ####[([16, 17], [15], 'POS')]
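A minimal sketch for writing lines in this reduced format, assuming the two tag fields are left as a single space exactly as in the sample. `format_line` is a hypothetical helper, and the label set `POS`/`NEG`/`NEU` is an assumption based on the ASTE triplet data (only `POS` appears in the sample here).

```python
SENTIMENTS = {"POS", "NEG", "NEU"}  # assumed label set; only POS appears in the sample


def format_line(sentence, triplets):
    """Build one line in the reduced format: sentence#### #### ####[triplets]."""
    tokens = sentence.split()
    clean = []
    for aspect, opinion, sentiment in triplets:
        aspect, opinion = list(aspect), list(opinion)  # lists, so repr matches the sample
        if not aspect or not opinion:
            raise ValueError("aspect and opinion spans must be non-empty")
        if not all(0 <= i < len(tokens) for i in aspect + opinion):
            raise ValueError(f"index out of range: {(aspect, opinion, sentiment)}")
        if sentiment not in SENTIMENTS:
            raise ValueError(f"unexpected sentiment label: {sentiment!r}")
        clean.append((aspect, opinion, sentiment))
    # Both tag fields are a single space, matching the sample line.
    return "####".join([sentence, " ", " ", repr(clean)])
```

Calling `format_line(sentence, [([16, 17], [15], "POS")])` on the sample sentence reproduces the sample line exactly, since Python's `repr` of the triplet list matches the bracket-and-quote style used in the data files.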
Hi! I'm pretty new to deep learning and ASTE.
Could you please suggest the steps needed to run this against another dataset?
Do I need to label my dataset following this data structure (https://github.com/xuuuluuu/SemEval-Triplet-data/blob/master/README.md#data-description)?
How can I modify the Colab code for new datasets?
Any other advice?
Thank you
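Whatever tool is used for labeling, the triplets in the samples above need 0-based word indices, so a small converter from labeled phrases to indices can save manual counting. This is a hypothetical sketch, not part of Span-ASTE. It assumes the sentence is already tokenized with spaces (the samples split the final "." into its own token) and takes the first match of each phrase at or after `start`, so repeated words may need an explicit `start` or a manual check.

```python
def phrase_indices(tokens, phrase, start=0):
    """Return 0-based token indices of the first occurrence of phrase at or after start."""
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    n = len(words)
    for i in range(start, len(tokens) - n + 1):
        if tokens[i:i + n] == words:
            return list(range(i, i + n))
    raise ValueError(f"{phrase!r} not found in sentence")


def make_triplet(sentence, aspect, opinion, sentiment):
    """Turn labeled aspect/opinion phrases into an (aspect ids, opinion ids, sentiment) triplet."""
    tokens = sentence.split()
    return (phrase_indices(tokens, aspect), phrase_indices(tokens, opinion), sentiment)
```

For the sample sentence, `make_triplet(sentence, "battery life", "good", "POS")` gives `([16, 17], [15], "POS")`, matching the triplet in the annotated samples.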