
Model accuracy is very low, I don't know why? #74

Closed
kevinsay opened this issue Jul 27, 2018 · 11 comments


kevinsay commented Jul 27, 2018

I loaded zhihu-word2vec-title-desc.bin-100 as the word-vector file and train-zhihu4-only-title-all.txt as the training file, and set multi_label_flag=false, use_embedding=true. These models all run:
a01_FastText
a03_TextRNN
a04_TextRCNN
a05_HierarchicalAttentionNetwork
a06_Seq2seqWithAttention
but the accuracy is very low, and I don't know why.
At prediction time I also set multi_label_flag=false, use_embedding=true, yet I get more than one predicted label. I need your help, thanks.
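One plausible cause of getting several predicted labels despite multi_label_flag=false is applying a per-class sigmoid threshold (the multi-label decision rule) instead of taking a single argmax. A minimal numpy sketch of the two decision rules, with hypothetical scores:

```python
import numpy as np

logits = np.array([1.2, -0.3, 0.8, 2.1])  # hypothetical per-class scores

# Multi-label rule: independent sigmoids + 0.5 threshold -> several classes
# can fire at once (here classes 0, 2 and 3 all pass the threshold).
multi = (1.0 / (1.0 + np.exp(-logits))) > 0.5

# Single-label rule: pick exactly one class with argmax.
single = int(np.argmax(logits))

assert multi.sum() == 3
assert single == 3
```

If the prediction script uses the thresholding path regardless of the flag, that would explain multiple labels per sample.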


f20500909 commented Jul 27, 2018

May I ask where you found the zhihu-word2vec-title-desc.bin-100 file? I can't find it in the project. Did you generate it yourself?
@kevinsay

@kevinsay
Author

I got it from the author's Baidu cloud share; you can also generate it yourself with Google's word2vec.

@f20500909

I used the create_voabulary function to generate vocab_label.pik as a substitute for the zhihu-word2vec-title-desc.bin-100 file.
But I can't find the Baidu cloud share link in README.md, and I think it would be very helpful for studying this project.
I would be very grateful if you could share it.
@kevinsay
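Building a vocab_label.pik-style file boils down to mapping words and labels to indices and pickling the result. A minimal sketch of that idea (the helper name and data here are hypothetical, not the repo's actual create_voabulary code):

```python
import pickle
from collections import Counter

def build_vocab(texts, labels, min_count=1):
    """Build word->index and label->index maps from a tokenized corpus."""
    counts = Counter(w for t in texts for w in t.split())
    word2idx = {"PAD": 0, "UNK": 1}          # reserved ids for pad/unknown
    for w, c in counts.most_common():
        if c >= min_count:
            word2idx[w] = len(word2idx)
    label2idx = {l: i for i, l in enumerate(sorted(set(labels)))}
    return word2idx, label2idx

texts = ["how to learn python", "how to cook rice"]
labels = ["tech", "food"]
word2idx, label2idx = build_vocab(texts, labels)

# Pickle both maps under the same filename as in the issue.
with open("vocab_label.pik", "wb") as f:
    pickle.dump((word2idx, label2idx), f)

assert word2idx["how"] == 2
assert label2idx == {"food": 0, "tech": 1}
```

Note this only gives you the index maps; unlike the .bin file, it contains no pre-trained embedding vectors, which may itself explain lower accuracy.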


kevinsay commented Jul 27, 2018

link: https://pan.baidu.com/s/1orPKC0cahrIW0CUvPxts1g
pwd: bguc
@f20500909 I've shared the file with you, and I hope you can share your training and prediction results with me.


f20500909 commented Jul 27, 2018

Thank you very much.
Once I understand the code and can run it correctly, I will share my corpus and my training and prediction results with you.
It's very kind of you to share these files.
@kevinsay

Thank you for sharing. But after many days of trying, I found the code too hard for me to understand. I've given up on learning this project, so I couldn't share my results with you.
However, I found an equally good project that achieves similar functionality and is easy to learn, and its corpus is also very complete, so I'm sharing it with you. I hope it helps. Thanks again for sharing.
link:https://github.com/zhengwsh/text-classification



lreaderl commented Aug 5, 2018

Hello, my F1 score is very low on single-label classification, as shown below:
Epoch 19 Validation Loss: 2.709, F1 Score: 0.282, Precision: 0.169, Recall: 0.846
Have you found any solution to that?
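For what it's worth, the reported F1 is consistent with the logged precision and recall (F1 is their harmonic mean), and the combination of high recall with very low precision suggests the model is firing on far too many positives. A quick check in plain Python:

```python
# Harmonic mean of the precision and recall reported in the log above.
precision, recall = 0.169, 0.846
f1 = 2 * precision * recall / (precision + recall)

assert round(f1, 3) == 0.282  # matches the logged "F1 Score:0.282"
```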

Owner

brightmart commented Aug 7, 2018 via email


lreaderl commented Aug 8, 2018

My dataset has 19 classes, with about 100,000 training samples, and the average length of a training sample is about 150.

@kevinsay
Author

@f20500909 OK, thanks.

@kevinsay
Author

@brightmart Does the length of the training samples affect the accuracy of the model? In one of my datasets the average sample length is 10, but I pad_sequences them to 20, 50, or 100 when training the model, and accuracy is low.

@brightmart
Owner

If implemented correctly, padding should have minimal impact on performance, since you can mask out the embeddings of the pad token.
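The masking idea can be sketched in plain numpy (toy sizes, hypothetical setup): zero the pad-token embedding row and average only over non-pad positions, so adding extra padding leaves the sequence representation unchanged:

```python
import numpy as np

# Hypothetical toy setup: vocab of 5 tokens, embedding dim 4, pad id = 0.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 4))
emb[0] = 0.0  # zero out the pad-token embedding row

def masked_mean(token_ids, emb, pad_id=0):
    """Average token embeddings over non-pad positions only."""
    ids = np.asarray(token_ids)
    mask = (ids != pad_id).astype(float)   # 1 for real tokens, 0 for pad
    vecs = emb[ids]                        # (seq_len, dim) embedding lookup
    return (vecs * mask[:, None]).sum(0) / mask.sum()

short = [3, 1, 4]               # original short sequence
padded = [3, 1, 4, 0, 0, 0, 0]  # same sequence padded to a longer length

# With masking, padding to 20, 50 or 100 would not change the representation.
assert np.allclose(masked_mean(short, emb), masked_mean(padded, emb))
```

Without the mask (a plain mean over all positions), longer padding dilutes the representation toward zero, which could explain the accuracy drop.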
