only 0.71 acc on Weibo Dataset #9

Open
Tangnameless opened this issue Apr 21, 2022 · 2 comments

Comments

@Tangnameless

On the Weibo dataset, I only got an accuracy of 0.71.
I didn't change your model or the training parameters.
Because I don't have image_embed.pkl and xx_content_segmented.txt, I had to preprocess the data based on my own guesses.

@gymbeijing

Hi @Tangnameless, I was trying to reproduce the model on the Twitter dataset, but I found that some files are missing. How did you handle the missing files for the Weibo dataset?

@Tangnameless
Author


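I didn't run any experiments on the Twitter dataset. For the Weibo dataset:

  1. For the missing word segmentation, I used jieba to segment the Chinese text and then, as the paper describes, trained 32-dimensional word2vec embeddings on the training set myself. (Intuitively, translating the Weibo posts into English before embedding them seemed like an unnecessary extra step.)
  2. For the missing image embeddings, I used the pretrained VGG-19 provided by PyTorch directly and took the output of the penultimate layer, a 4096-dimensional vector.

Since I don't know the exact preprocessing steps, the reproduction results are not satisfactory.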
