Suggestions to run it against other datasets #7

Jurys22 · 2021-12-15T16:51:47Z

Hi! I'm pretty new to deep learning and ASTE.

Could you please suggest the steps needed to run this on another dataset?
Do I need to label my dataset following this data structure (https://github.com/xuuuluuu/SemEval-Triplet-data/blob/master/README.md#data-description)?
How can I modify the code on Colab for new datasets?
Any other advice?

Thank you

chiayewken · 2021-12-29T07:49:20Z

Hi, yes, you would need to annotate the data in the same format. In the folder "aste/data/triplet_data", create a folder called "new_data" and put train.txt, dev.txt and test.txt inside it. Then specify the new dataset for training by changing line 11 in aste/main.sh to "--names new_data, " and line 12 to "--seeds 0, ".
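If all your annotated lines are in one list, a small script can create the folder described above and write the three files. This is a minimal sketch, not part of the repository: the helper names `split_lines` and `write_dataset`, the 10%/10% dev/test ratios and the seeded shuffle are all illustrative assumptions.

```python
import random
from pathlib import Path

# Assumption: run from the repository root, so this matches the folder named above.
DEFAULT_BASE = Path("aste/data/triplet_data")


def split_lines(lines, dev_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle annotated lines (hypothetical helper) and split them into train/dev/test."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_dev = int(len(shuffled) * dev_ratio)
    n_test = int(len(shuffled) * test_ratio)
    return {
        "dev": shuffled[:n_dev],
        "test": shuffled[n_dev:n_dev + n_test],
        "train": shuffled[n_dev + n_test:],
    }


def write_dataset(splits, name="new_data", base_dir=DEFAULT_BASE):
    """Write train.txt, dev.txt and test.txt into <base_dir>/<name>/ (hypothetical helper)."""
    folder = Path(base_dir) / name
    folder.mkdir(parents=True, exist_ok=True)
    for split in ("train", "dev", "test"):
        text = "\n".join(splits[split])
        # One annotated sentence per line, ending with a newline
        (folder / f"{split}.txt").write_text(text + "\n" if text else "", encoding="utf-8")
    return folder
```

Calling `write_dataset(split_lines(my_lines))` would produce `aste/data/triplet_data/new_data/{train,dev,test}.txt`; the `aste/main.sh` edits still have to be made by hand.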

Jurys22 · 2022-01-04T16:20:59Z

Thank you!
I was reading a closed issue about data format, and I am wondering:

1 - Has the data format changed?
From:
Exactly=O as=O posted=O plus=O a=O great=O value=T-POS .=O####Exactly=O as=O posted=O plus=O a=O great=S value=O .=O####[([6], [5], 'POS')]
To:
Exactly as posted plus a great value . [([6], [5], 'POS')]

2 - Looking at the data generated in the Colab (Span-ASTE/aste/data/triplet_data/14lap), I see that train, test and dev have a similar structure:
Train
Not even safe mode boots .####Not=O even=O safe=T-NEG mode=T-NEG boots=O .=O####Not=S even=O safe=O mode=O boots=O .=O####[([2, 3], [0], 'NEG')]

Test
A lot of features and shortcuts on the MBP that I was never exposed to on a normal PC .####A=O lot=O of=O features=T-NEU and=O shortcuts=TT-NEU on=O the=O MBP=O that=O I=O was=O never=O exposed=O to=O on=O a=O normal=O PC=O .=O####A=O lot=S of=S features=O and=O shortcuts=O on=O the=O MBP=O that=O I=O was=O never=O exposed=O to=O on=O a=O normal=O PC=O .=O####[([3], [1, 2], 'NEU'), ([5], [1, 2], 'NEU')]

Eval
It was slow , locked up , and also had hardware replaced after only 2 months !####It=O was=O slow=O ,=O locked=O up=O ,=O and=O also=O had=O hardware=T-NEG replaced=O after=O only=O 2=O months=O !=O####It=O was=O slow=O ,=O locked=O up=O ,=O and=O also=O had=O hardware=O replaced=S after=O only=O 2=O months=O !=O####[([10], [11], 'NEG')]
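The 14lap lines have four "####"-separated fields (raw sentence, target tags, opinion tags, triplets), while the line quoted in point 1 has only the last three. Assuming every token is written as `word=TAG`, the raw sentence can be recovered from the tags. The function names below are hypothetical; this is only a sketch of that conversion.

```python
def tags_to_sentence(tagged):
    """Recover the raw sentence from a 'word=TAG word=TAG ...' string."""
    # Split on the last '=' only, so a token that is itself '=' (written '==O') survives
    return " ".join(token.rsplit("=", 1)[0] for token in tagged.split(" "))


def three_to_four_fields(line):
    """Prepend the raw sentence to a 'target tags####opinion tags####triplets' line."""
    target_tags, opinion_tags, triplets = line.rstrip("\n").split("####")
    sentence = tags_to_sentence(target_tags)
    return "####".join([sentence, target_tags, opinion_tags, triplets])
```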

Do I then need to label all three sets manually for the first tests on my dataset?
If so, once I am sure that it works on my type of dataset, should the final data format look like this? I use the same sentence in both examples, but of course they will differ in the real scenario:

Train:
Exactly as posted plus a great value . [([6], [5], 'POS')]

Test and Dev:
Exactly as posted plus a great value .

Thank you

chiayewken · 2022-01-10T07:13:31Z

Hi, the training script needs the same data format as in Span-ASTE/aste/data/triplet_data/14lap/train.txt, shown in the sample below. The train, dev and test samples all use this format.

I charge it at night and skip taking the cord with me because of the good battery life .####I=O charge=O it=O at=O night=O and=O skip=O taking=O the=O cord=O with=O me=O because=O of=O the=O good=O battery=T-POS life=T-POS .=O####I=O charge=O it=O at=O night=O and=O skip=O taking=O the=O cord=O with=O me=O because=O of=O the=O good=S battery=O life=O .=O####[([16, 17], [15], 'POS')]
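To check your own annotations, it can help to parse a line and resolve the triplet indices back to words. In the sample, the first index list points at the tokens tagged T-POS ("battery life") and the second at the token tagged S ("good"), so this sketch reads triplets as (aspect indices, opinion indices, sentiment) with 0-based indices into the space-separated sentence. The function name `parse_line` and the set of allowed sentiment labels are assumptions taken from the samples in this thread, not from the repository's loader.

```python
import ast

SENTIMENTS = {"POS", "NEG", "NEU"}  # labels seen in the samples above


def parse_line(line):
    """Parse 'sentence####target tags####opinion tags####triplets' (hypothetical helper)."""
    sentence, _target_tags, _opinion_tags, triplet_text = line.rstrip("\n").split("####")
    tokens = sentence.split()
    # The triplet field is a Python literal such as [([16, 17], [15], 'POS')]
    triplets = ast.literal_eval(triplet_text)
    resolved = []
    for aspect_idx, opinion_idx, sentiment in triplets:
        if sentiment not in SENTIMENTS:
            raise ValueError(f"unexpected sentiment {sentiment!r}")
        for i in list(aspect_idx) + list(opinion_idx):
            if not 0 <= i < len(tokens):
                raise ValueError(f"index {i} out of range for {len(tokens)} tokens")
        aspect = " ".join(tokens[i] for i in aspect_idx)
        opinion = " ".join(tokens[i] for i in opinion_idx)
        resolved.append((aspect, opinion, sentiment))
    return tokens, resolved
```

For the sample line this returns the triplet `("battery life", "good", "POS")`. The function also accepts lines whose two tag fields are left blank, since it ignores those fields.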

chiayewken · 2022-01-10T08:43:55Z

Hi, to make it more convenient to apply to new datasets, you can omit the tag components of the annotation and include just the sentence and the triplet information, as in the sample below. Every line in the train, dev and test sets can use this format.

I charge it at night and skip taking the cord with me because of the good battery life .#### #### ####[([16, 17], [15], 'POS')]
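Lines in this tag-free format can be generated from tokens and triplets, or derived from a fully tagged line by blanking the two tag fields. A minimal sketch with hypothetical helper names, assuming triplets are (aspect indices, opinion indices, sentiment) as in the sample:

```python
def to_simple_line(tokens, triplets):
    """Build 'sentence#### #### ####triplets' with the two tag fields left blank."""
    sentence = " ".join(tokens)
    # repr() of a list of (list, list, str) tuples gives e.g. [([16, 17], [15], 'POS')]
    triplet_text = repr([(list(a), list(o), s) for a, o, s in triplets])
    return f"{sentence}#### #### ####{triplet_text}"


def strip_tags(line):
    """Convert a fully tagged four-field line into the tag-free format."""
    sentence, _target_tags, _opinion_tags, triplets = line.rstrip("\n").split("####")
    return f"{sentence}#### #### ####{triplets}"
```

Writing the triplet field with `repr()` keeps it a valid Python literal, the same style as the sample line.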

Jurys22 closed this as completed Jan 19, 2022
