
why are the indices all the same? #19

Closed
18398639574 opened this issue Apr 3, 2020 · 14 comments

Comments

@18398639574

18398639574 commented Apr 3, 2020

Hi~ I tried different parameters for different models, but the results are really all the same: the AUC is a little high, but the other metrics like precision, recall, and F1-score are all very low, on both train and test. Can you help me? Thanks very much.

@RandolphVI
Owner

@18398639574

What does your data format look like?

@18398639574
Author

> @18398639574
>
> What does your data format look like?

Like this:

```json
{"testid": 1, "features_content": ["好", "了", "你", "觉得", "对", "小区", "有", "什么", "缺点", "吗"], "labels_index": [21], "labels_num": 1}
{"testid": 2, "features_content": ["你", "把", "学区", "还", "可以", "是", "吧"], "labels_index": [21], "labels_num": 1}
{"testid": 3, "features_content": ["得", "还", "不错", "的", "感觉", "除了", "价钱", "方面", "其他", "的", "还有", "哪", "些", "问题", "吗"], "labels_index": [21], "labels_num": 1}
```
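For multi-label training, each record's `labels_index` is typically expanded into a multi-hot target vector. A minimal sketch, assuming 23 classes as mentioned later in the thread (the helper name is hypothetical, not the repo's API):

```python
# Expand a record's labels_index into a multi-hot target vector.
# num_classes=23 follows the thread; the function name is illustrative.
def to_multi_hot(labels_index, num_classes=23):
    vec = [0.0] * num_classes
    for idx in labels_index:
        vec[idx] = 1.0  # mark each active label
    return vec

record = {"testid": 1, "labels_index": [21], "labels_num": 1}
print(to_multi_hot(record["labels_index"]))  # 1.0 only at position 21
```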

@18398639574
Author

> @18398639574
>
> What does your data format look like?

I have 23 classes, and every sentence has one or more labels.

@RandolphVI
Owner

@18398639574

So it looks like you are using your own data. Did you change the number of classes in param_parser.py?
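For reference, a number-of-classes setting like the one in param_parser.py is usually just a command-line hyperparameter flag. A minimal sketch of such a parser — the flag names and defaults here are illustrative, not necessarily the repo's actual ones:

```python
import argparse

# Illustrative hyperparameter parser; the real param_parser.py may use
# different flag names and defaults.
def parameter_parser():
    parser = argparse.ArgumentParser(description="Model hyperparameters.")
    parser.add_argument("--num-classes", type=int, default=23,
                        help="Number of label classes in the dataset.")
    parser.add_argument("--topK", type=int, default=2,
                        help="Number of top-scoring labels to predict.")
    parser.add_argument("--pad-seq-len", type=int, default=150,
                        help="Padded sequence length for model input.")
    return parser

args = parameter_parser().parse_args([])
print(args.num_classes, args.topK)
```

Forgetting to update such a flag when switching datasets is a common cause of shape mismatches or silently wrong metrics.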

@18398639574
Author

> @18398639574
>
> So it looks like you are using your own data. Did you change the number of classes in param_parser.py?

Yes, my own data, and I changed the parameters, but the result always looks like:

```
2020-04-03 13:57:28,300 - INFO - All Validation set: Loss 3.6648 | AUC 0.813869 | AUPRC 0.264027
2020-04-03 13:57:28,300 - INFO - Predict by threshold: Precision 0.44, Recall 0.377143, F 0.406154
2020-04-03 13:57:28,300 - INFO - Predict by topK:
2020-04-03 13:57:28,300 - INFO - Top1: Precision 0.44, Recall 0.377143, F 0.406154
2020-04-03 13:57:28,301 - INFO - Top2: Precision 0.291667, Recall 0.5, F 0.368421
```

@RandolphVI
Owner

RandolphVI commented Apr 3, 2020

@18398639574

By "the result always like", do you mean that every time you evaluate, the metric values are exactly the same as the last time? Like the AUC is always 0.813869 and never changes at all?

@18398639574
Author

> @18398639574
>
> Do you mean that every time you evaluate, the metric values are exactly the same as the last time? Like the AUC is always 0.813869 and never changes at all?

Yes, this problem is like the one in this link: #7 (comment)

@RandolphVI
Owner

RandolphVI commented Apr 3, 2020

@18398639574

Can you provide the train.log, test.log, and predictions.json?

@18398639574
Author

18398639574 commented Apr 3, 2020

> @18398639574
>
> Can you provide the train.log, test.log, and predictions.json?

predictions.txt
Train-Fri Apr 3 13_54_20 2020.log
Test-Fri Apr 3 14_00_18 2020.log

@RandolphVI
Owner

@18398639574

It's kind of weird, since on every evaluation the loss is different but the metric values don't change at all. I guess you may have changed the code?

Could you provide the data samples you use — 10 records each from Train_sample.json and Validation_sample.json would be enough.
Also provide the train_fast.py and test_fast.py you are using now, and I will check them.

@18398639574
Author

> @18398639574
>
> It's kind of weird, since on every evaluation the loss is different but the metric values don't change at all. I guess you may have changed the code?
>
> Could you provide the data samples you use — 10 records each from Train_sample.json and Validation_sample.json would be enough.
> Also provide the train_fast.py and test_fast.py you are using now, and I will check them.

train_fast.txt
test_fast.txt
train_sample.txt
val_sample.txt

@RandolphVI
Owner

@18398639574

Hi, I used the data and the code you provided, and everything seems okay.

[Screenshot: Screen Shot 2020-04-03 at 15 29 40]

Note: since you didn't provide the word2vec file (it is too large to upload), I used my own Chinese word2vec model file. I also set the number of classes to 23 and topK to 2.

Here is the train log file.

@18398639574
Author

18398639574 commented Apr 3, 2020

> @18398639574
>
> Hi, I used the data and the code you provided, and everything seems okay.
>
> [Screenshot: Screen Shot 2020-04-03 at 15 29 40]
>
> Note: since you didn't provide the word2vec file (it is too large to upload), I used my own Chinese word2vec model file. I also set the number of classes to 23 and topK to 2.
>
> Here is the train log file.

OK, thanks, but why are the metrics so low?

@RandolphVI
Owner

@18398639574

The metric values being low could have many causes, for example:

  1. The data labels are imbalanced. (This could be the main reason, though I can only see the tiny data sample file.)
  2. The pad sequence length (and other parameter settings) you chose is not ideal.
  3. You use Chinese text; maybe the word2vec model file you trained is not good enough, or the word segmentation results need to be refined.
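For point 1, a quick way to check for label imbalance is to count how often each label appears in the training file. A minimal sketch, assuming the JSON-lines format shown earlier in the thread:

```python
import json
from collections import Counter

# Count label frequencies across a JSON-lines dataset (one record per line,
# each with a "labels_index" field, as in the thread's sample data).
def label_counts(lines):
    counts = Counter()
    for line in lines:
        rec = json.loads(line)
        counts.update(rec["labels_index"])
    return counts

sample = [
    '{"testid": 1, "labels_index": [21], "labels_num": 1}',
    '{"testid": 2, "labels_index": [21], "labels_num": 1}',
    '{"testid": 3, "labels_index": [3, 21], "labels_num": 2}',
]
print(label_counts(sample))  # label 21 dominates -> imbalanced
```

If one label (like 21 in the sample) accounts for most of the records, metrics such as precision and recall on the rare labels will be low even when AUC looks acceptable.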
