
why are the indices all the same? #19

Closed
18398639574 opened this issue Apr 3, 2020 · 14 comments

Comments

@18398639574

18398639574 commented Apr 3, 2020

Hi~ I tried different parameters for different models, but the results are really all the same: the AUC is a little high, but the other metrics like precision, recall, and F1-score are all very low, on both train and test. Can you help me? Thanks very much.

@RandolphVI
Owner

@18398639574

What does your data format look like?

@18398639574
Author

> @18398639574
>
> What does your data format look like?

Like this:

```json
{"testid": 1, "features_content": ["好", "了", "你", "觉得", "对", "小区", "有", "什么", "缺点", "吗"], "labels_index": [21], "labels_num": 1}
{"testid": 2, "features_content": ["你", "把", "学区", "还", "可以", "是", "吧"], "labels_index": [21], "labels_num": 1}
{"testid": 3, "features_content": ["得", "还", "不错", "的", "感觉", "除了", "价钱", "方面", "其他", "的", "还有", "哪", "些", "问题", "吗"], "labels_index": [21], "labels_num": 1}
```
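For multi-label training, each record's `labels_index` is typically expanded into a multi-hot target vector. A minimal sketch, assuming 23 classes as mentioned later in the thread (the helper name is hypothetical, not the repo's API):

```python
# Expand a record's labels_index into a multi-hot target vector.
# num_classes=23 follows the thread; the function name is illustrative.
def to_multi_hot(labels_index, num_classes=23):
    vec = [0.0] * num_classes
    for idx in labels_index:
        vec[idx] = 1.0  # mark each active label
    return vec

record = {"testid": 1, "labels_index": [21], "labels_num": 1}
print(to_multi_hot(record["labels_index"]))  # 1.0 only at position 21
```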

@18398639574
Author

> @18398639574
>
> What does your data format look like?

I have 23 classes, and every sentence has one or more labels.

@RandolphVI
Owner

@18398639574

So it looks like you are using your own data. Did you change the number of classes in param_parser.py?
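For reference, a number-of-classes setting like the one in param_parser.py is usually just a command-line hyperparameter flag. A minimal sketch of such a parser — the flag names and defaults here are illustrative, not necessarily the repo's actual ones:

```python
import argparse

# Illustrative hyperparameter parser; the real param_parser.py may use
# different flag names and defaults.
def parameter_parser():
    parser = argparse.ArgumentParser(description="Model hyperparameters.")
    parser.add_argument("--num-classes", type=int, default=23,
                        help="Number of label classes in the dataset.")
    parser.add_argument("--topK", type=int, default=2,
                        help="Number of top-scoring labels to predict.")
    parser.add_argument("--pad-seq-len", type=int, default=150,
                        help="Padded sequence length for model input.")
    return parser

args = parameter_parser().parse_args([])
print(args.num_classes, args.topK)
```

Forgetting to update such a flag when switching datasets is a common cause of shape mismatches or silently wrong metrics.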

@18398639574
Author

> @18398639574
>
> So it looks like you are using your own data. Did you change the number of classes in param_parser.py?

Yes, my own data, and I changed the parameters, but the result always looks like:

```
2020-04-03 13:57:28,300 - INFO - All Validation set: Loss 3.6648 | AUC 0.813869 | AUPRC 0.264027
2020-04-03 13:57:28,300 - INFO - Predict by threshold: Precision 0.44, Recall 0.377143, F 0.406154
2020-04-03 13:57:28,300 - INFO - Predict by topK:
2020-04-03 13:57:28,300 - INFO - Top1: Precision 0.44, Recall 0.377143, F 0.406154
2020-04-03 13:57:28,301 - INFO - Top2: Precision 0.291667, Recall 0.5, F 0.368421
```

@RandolphVI
Owner

RandolphVI commented Apr 3, 2020

@18398639574

By "the result always like", do you mean that every time you evaluate, the metric values are exactly the same as the last time? Like the AUC is always 0.813869 and never changes at all?

@18398639574
Author

> @18398639574
>
> Do you mean that every time you evaluate, the metric values are exactly the same as the last time? Like the AUC is always 0.813869 and never changes at all?

Yes, this problem is like the one in this link: #7 (comment)

@RandolphVI
Owner

RandolphVI commented Apr 3, 2020

@18398639574

Can you provide the train.log, test.log, and predictions.json?

@18398639574
Author

18398639574 commented Apr 3, 2020

> @18398639574
>
> Can you provide the train.log, test.log, and predictions.json?

predictions.txt
Train-Fri Apr 3 13_54_20 2020.log
Test-Fri Apr 3 14_00_18 2020.log

@RandolphVI
Owner

@18398639574

It's kind of weird, since on every evaluation the loss is different but the metric values don't change at all. I guess you may have changed the code?

Could you provide the data samples you use — 10 records each from Train_sample.json and Validation_sample.json would be enough.
Also provide the train_fast.py and test_fast.py you are using now, and I will check them.

@18398639574
Author

> @18398639574
>
> It's kind of weird, since on every evaluation the loss is different but the metric values don't change at all. I guess you may have changed the code?
>
> Could you provide the data samples you use — 10 records each from Train_sample.json and Validation_sample.json would be enough.
> Also provide the train_fast.py and test_fast.py you are using now, and I will check them.

train_fast.txt
test_fast.txt
train_sample.txt
val_sample.txt

@RandolphVI
Owner

@18398639574

Hi, I used the data and the code you provided, and everything seems okay.

[Screenshot: Screen Shot 2020-04-03 at 15 29 40]

Note: since you didn't provide the word2vec file (it is too large to upload), I used my own Chinese word2vec model file. I also set the number of classes to 23 and topK to 2.

Here is the train log file.

@18398639574
Author

18398639574 commented Apr 3, 2020

> @18398639574
>
> Hi, I used the data and the code you provided, and everything seems okay.
>
> [Screenshot: Screen Shot 2020-04-03 at 15 29 40]
>
> Note: since you didn't provide the word2vec file (it is too large to upload), I used my own Chinese word2vec model file. I also set the number of classes to 23 and topK to 2.
>
> Here is the train log file.

OK, thanks, but why are the metrics so low?

@RandolphVI
Owner

@18398639574

The metric values being low could have many causes, for example:

  1. The data labels are imbalanced. (This could be the main reason, though I can only see the tiny data sample file.)
  2. The pad sequence length (and other parameter settings) you chose is not ideal.
  3. You use Chinese text; maybe the word2vec model file you trained is not good enough, or the word segmentation results need to be refined.
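For point 1, a quick way to check for label imbalance is to count how often each label appears in the training file. A minimal sketch, assuming the JSON-lines format shown earlier in the thread:

```python
import json
from collections import Counter

# Count label frequencies across a JSON-lines dataset (one record per line,
# each with a "labels_index" field, as in the thread's sample data).
def label_counts(lines):
    counts = Counter()
    for line in lines:
        rec = json.loads(line)
        counts.update(rec["labels_index"])
    return counts

sample = [
    '{"testid": 1, "labels_index": [21], "labels_num": 1}',
    '{"testid": 2, "labels_index": [21], "labels_num": 1}',
    '{"testid": 3, "labels_index": [3, 21], "labels_num": 2}',
]
print(label_counts(sample))  # label 21 dominates -> imbalanced
```

If one label (like 21 in the sample) accounts for most of the records, metrics such as precision and recall on the rare labels will be low even when AUC looks acceptable.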
