dstc7_aug data does not work for cross encoder training #7

Closed
liuhl-source opened this issue Jan 17, 2021 · 9 comments

Comments

@liuhl-source

All input samples are positive, so the training is meaningless.

@chijames
Owner

Sorry I don't get it. What do you mean by all positive?

@liuhl-source
Author

?

Thanks for your reply. I just ran your code in cross mode and it does not work because the loss is always 0. I then printed the data sampled from your DataLoader, and it shows that all samples are positive. I suggest considering a contrastive loss between positive and negative samples. Thanks for your work.

@chijames
Owner

chijames commented Jan 17, 2021

Really? I ran the code again, and it seems correct. The loss is not 0, and I did see labels containing something like [1,0,0,0,0]. What did you see in the data, for example, dstc7/train.txt? It should be something like ln1:1\tTEXT ln2:0\tTEXT.... The dataset.py basically loads this file and parses it into several training instances, where each instance contains a positive sample and 15 negative samples.
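A quick way to check a data file for this problem is to group its lines into instances and look at the labels. The sketch below is not the repository's dataset.py. It assumes each line is `label<TAB>text`, that a line labeled 1 starts a new instance, and that the label-0 lines following it are its negatives. The function names `parse_instances` and `instances_without_negatives` are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_instances(lines: Iterable[str]) -> List[Dict[str, list]]:
    """Group `label<TAB>text` lines into instances (assumed layout, see above)."""
    instances: List[Dict[str, list]] = []
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw:
            continue  # skip empty lines
        label_str, text = raw.split("\t", 1)
        label = int(label_str)
        if label == 1:
            # a positive line opens a new instance
            instances.append({"texts": [text], "labels": [1]})
        elif instances:
            # a negative line belongs to the most recent positive
            instances[-1]["texts"].append(text)
            instances[-1]["labels"].append(0)
    return instances


def instances_without_negatives(instances: List[Dict[str, list]]) -> int:
    """Count instances whose labels contain no 0 at all."""
    return sum(1 for inst in instances if 0 not in inst["labels"])


if __name__ == "__main__":
    sample = [
        "1\tcontext A\tgood reply",
        "0\tcontext A\tbad reply 1",
        "0\tcontext A\tbad reply 2",
        "1\tcontext B\tgood reply",
    ]
    parsed = parse_instances(sample)
    print(len(parsed), "instances,", instances_without_negatives(parsed), "without negatives")
```

If every instance comes back without negatives, the file only contains positive samples, which is what was reported for dstc7_aug in this thread.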

@liuhl-source
Author

Thanks. I downloaded the dstc7_aug dataset and parsed it instead of the dstc7 dataset. I think there is some difference between the two datasets.

@chijames
Owner

chijames commented Jan 18, 2021

Did you use the parsing code I put in the dstc7_aug folder? Please also follow the instructions in the readme.

@liuhl-source
Author

Thanks! I followed the instructions and the loss is still 0.

Here is an excerpt from train.txt:
1 participant 1: I got in, thanks you. Can anyone paste me the default ubuntu 16.04.2 /etc/rsyncd.conf please ? participant 2: which package does this beloig to?. https://packages.ubuntu.com/xenial/amd64/rsync/filelist does not list rsyncd participant 1: hm : the fact is I tweaked some stuffs in rsync and it crashed my full server, lost all ssh access, samba access, etc. Now I'm on my server running a livecd and I access the hard drive through the livecd. So I'm checking my bash history to see what might be wrong. Thanks for the link, checking other files I have edited. I also edited /etc/default/rsync it seems so, your sevrer is now running the ubuntu 16.04 x86_54 live dvd, and you ssh into it from a different computer, is this correct?
1 participant 1: Does anyone know how I can disable xgl on startup?? participant 2: appearance in preferences participant 1: im in kde desktop. xgl working here but not in gnome. Just get a white screen in gnome. hm, dunno for sure- check if there's something in ~/.kde/Autostart/
1 participant 1: 15.04, gnome terminal not working, just hangs the console if launched participant 2: any errors when you launch gnome-terminal from another terminal participant 1: nope. it just does not display a window/neither it produces any output so it starts allright but just hangs?. can you install something like terminator?
1 participant 1: I have two problems: First, when I play movie files (i.e. youtube) in firefox, there is no sound. This is a problem that I had fixed previously, but has since resurfaced--I believe the resurrection was caused by firefox updating; however, the previous method I used to fix this problem--install latest flashplayer--is not working. The second problem is that I cannot burn cd or dvd's. I have gone through several forums and it seems that this is a participant 2: i am having the exact same problem as of the latest firefox update. are your flash videos hanging after 2 seconds, too? participant 1: <unconvertable> . any idea how to fix? no. i keep asking :<
1 participant 1: Anyone using ARM based architecture for Ubuntu desktop here?. SBCs such as this: http://www.hardkernel.com/main/products/prdt_info.php participant 2: http://www.ubuntu.com/download/server/arm participant 1: No background services (I already have one SBC for that). Full XFCE or other Unity experiment with what? state your problem
1 participant 1: So right click on the folder I want to put on my desktop and then click on what? participant 2: make a link participant 1: "Make a link" is greyed out participant 2: then you don't have write access to that directory.. Kinda sucks nautilus behaves that way... uh participant 1: The folder I'm trying to link is NTSF you can still make a launcher on the desktop with the command "nautilus /the/directory/you/want"
1 participant 1: does anyone even use raid ? participant 2: zezu, i am learning about it in a Cisco course, but i have never used it.. which raid are you using? participant 1: raid0. striping lol. what are you trying to do?
1 participant 1: can anyone help me troubleshoot a kernel panic? my machine lasts 2-12 hours before going into panic, and I've tried a variety of installed kernels... having trouble narrowing down the problem participant 2: which os/version ? participant 1: Ubuntu 9.04, Linux 2.6.28-15-generic (but I've tried some older kernels too) participant 2: Well, I've seen no other reports of kernel panics in 9.04 offhand. I'm wondering if you have a hardware problem? Is it always the same backtrace or different ones? participant 1: it doesn't always seem to appear in syslog, but when I've found it it's been similar (anywhere else I should be looking?). it just started happening a few weeks ago, been running Ubuntu find on this machine since Dapper dunno what to suggest. You can google parts of the backtrace to see if it's a known issue. It all else fails, drop back to 8.04, which is solid as a rock...?
1 participant 1: hi. is anyone noticing that jaunty+ext4 seems to be accumulating enough cache to shove applications into swap before the RAM even hits 50%? participant 2: how would i monitor this? participant 1: run "watch free -m" in a terminal. then have lots and lots of IO. then start RAM-hungry apps. then watch as your swap fills up even before your RAM hits 50% participant 2: hmmmmmm participant 1: currently my RAM is at 36%, swap is at 39%. yesterday it ended up trashing when swap hit 100% and RAM hit 60% how much ram are you talking here?
1 participant 1: propagandhi: I think it's possible to enable DMA mode using hdparm, but I'm not sure how to set it; only get it. participant 2: propagandhi hdparm -d1 /dev/hda tunrs in on participant 1: Does'nt work. participant 2: do you get an error? participant 1: I get the help menu... participant 2: are you root? participant 1: Using sudo, it should work, correct? participant 2: yes participant 1: What does the 'l' option do? participant 2: its a number one not a letter L participant 1: Oh. That explains it. =). *yaaaaaaay* participant 2: but the 1 means 'on'. did that work for you participant 1: Indeed it did. =) great!
1 participant 1: propagandhi: I think it's possible to enable DMA mode using hdparm, but I'm not sure how to set it; only get it. participant 2: propagandhi hdparm -d1 /dev/hda tunrs in on participant 1: Does'nt work. participant 2: do you get an error? participant 1: I get the help menu... participant 2: are you root? participant 1: Using sudo, it should work, correct? participant 2: yes participant 1: What does the 'l' option do? participant 2: its a number one not a letter L participant 1: Oh. That explains it. =). *yaaaaaaay* but the 1 means 'on'. did that work for you
1 participant 1: propagandhi: I think it's possible to enable DMA mode using hdparm, but I'm not sure how to set it; only get it. participant 2: propagandhi hdparm -d1 /dev/hda tunrs in on participant 1: Does'nt work. participant 2: do you get an error? participant 1: I get the help menu... participant 2: are you root? participant 1: Using sudo, it should work, correct? participant 2: yes participant 1: What does the 'l' option do? its a number one not a letter L
1 participant 1: propagandhi: I think it's possible to enable DMA mode using hdparm, but I'm not sure how to set it; only get it. participant 2: propagandhi hdparm -d1 /dev/hda tunrs in on participant 1: Does'nt work. participant 2: do you get an error? participant 1: I get the help menu... participant 2: are you root? participant 1: Using sudo, it should work, correct? yes
1 participant 1: propagandhi: I think it's possible to enable DMA mode using hdparm, but I'm not sure how to set it; only get it. participant 2: propagandhi hdparm -d1 /dev/hda tunrs in on participant 1: Does'nt work. participant 2: do you get an error? participant 1: I get the help menu... are you root?
1 participant 1: propagandhi: I think it's possible to enable DMA mode using hdparm, but I'm not sure how to set it; only get it. participant 2: propagandhi hdparm -d1 /dev/hda tunrs in on participant 1: Does'nt work. do you get an error?

@liuhl-source
Author

error

Maybe the official annotation format has changed? Could you try it again? Thank you very much.

@chijames
Owner

No, that is my fault. Thanks for pointing this out!

So here is the reason: The dstc7_aug data is downloaded from the ParlAI library, where it is used for bi/poly encoder training. This works fine there because the other training instances in a batch are "recycled" as negative samples. However, this does not work in the cross encoder setting, which needs separate negative samples. I failed to mention this in the readme.
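The difference can be illustrated with a small pure-Python sketch. It assumes the loss is a softmax cross-entropy over the candidates scored for each context, which is an assumption about the training code rather than something confirmed in this thread. The names `softmax_xent` and `in_batch_loss` are hypothetical.

```python
import math
from typing import List


def softmax_xent(scores: List[float], target: int) -> float:
    """Cross-entropy of a softmax over `scores` with the true item at `target`."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def in_batch_loss(score_matrix: List[List[float]]) -> float:
    """Bi/poly encoder style: score_matrix[i][j] scores context i against
    response j, so the other responses in the batch act as negatives and the
    correct response for context i sits on the diagonal."""
    n = len(score_matrix)
    return sum(softmax_xent(row, i) for i, row in enumerate(score_matrix)) / n


if __name__ == "__main__":
    # Two contexts, two responses: each row has one positive and one negative.
    print(in_batch_loss([[2.0, 0.0], [0.0, 2.0]]))

    # Cross encoder with only the positive candidate: the softmax runs over a
    # single score, so the loss is exactly 0 whatever the score is.
    print(softmax_xent([3.7], 0))
```

Under that assumption, a cross encoder that sees only the positive candidate for each context has nothing to contrast it with, which matches the constant zero loss reported above.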

I will add a line to the readme stating that dstc7_aug cannot be used for cross encoder training. However, if you really want to train on it, you can sample some negative candidates for each context and add them to the file. Specifically, read the data format in dstc7/ to get the idea. That being said, I strongly discourage you from doing so, because cross encoder training already takes a very long time on the original dstc7 dataset, let alone the augmented one.
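For anyone who still wants to try it, negative sampling could look roughly like the sketch below. It assumes the positives are available as (context, response) pairs and writes lines as `label<TAB>context<TAB>response`; that layout is an assumption, so check the files in dstc7/ for the real format. The function name `add_negatives` and its parameters are hypothetical, and the default of 15 negatives follows the count mentioned earlier in this thread.

```python
import random
from typing import List, Tuple


def add_negatives(
    pairs: List[Tuple[str, str]],
    num_negatives: int = 15,
    seed: int = 0,
) -> List[str]:
    """For each (context, response) pair, emit one positive line followed by
    negative lines whose responses are drawn from the other instances."""
    rng = random.Random(seed)
    # Deduplicated pool of all responses, sorted so runs are reproducible.
    pool = sorted({response for _, response in pairs})
    lines: List[str] = []
    for context, response in pairs:
        # Never use the true response as a negative for its own context.
        candidates = [r for r in pool if r != response]
        k = min(num_negatives, len(candidates))
        negatives = rng.sample(candidates, k)
        lines.append(f"1\t{context}\t{response}")
        lines.extend(f"0\t{context}\t{neg}" for neg in negatives)
    return lines


if __name__ == "__main__":
    toy = [("ctx 1", "reply 1"), ("ctx 2", "reply 2"), ("ctx 3", "reply 3")]
    for line in add_negatives(toy, num_negatives=2):
        print(line)
```

Rebuilding the candidate list for every context is quadratic in the number of pairs, so this is only practical for small files. It also multiplies the file size by the number of candidates per context, which adds to the training-time concern raised above.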

@chijames changed the title from "Cross mode error" to "dstc7_aug data does not work for cross encoder training" on Jan 18, 2021

@liuhl-source
Author

augmented

Fine. Thanks for your quick reply. :)
