Note: I decided to split the dataset into two parts: one contains udmap, the other does not.
[
{"key9":"unknown"},
{"key8":"unknown"},
{"key7":"unknown"},
{"key6": "unknown"},
{"key1":"unknown","key2":"unknown","key6":"unknown"},
{"key3":"unknown","key1":"unknown","key4":"unknown","key2":"unknown","key5":"unknown"},
{"key4":"unknown","key5":"unknown"},
{"key3": "unknown"},
{"key3":"unknown","key2":"unknown"},
{"key1":"unknown","key2":"unknown"}
]
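The udmap split described above can be sketched with pandas. The column name `udmap` and the `"unknown"` marker for missing dicts are assumptions based on the sample rows; the toy frame stands in for the real data file.

```python
import pandas as pd

# Toy frame standing in for the competition data; the 'udmap' column name
# and the literal "unknown" marker for absent dicts are assumptions.
df = pd.DataFrame({
    "uuid": [0, 1, 2],
    "udmap": ['{"key3":"unknown"}', "unknown", '{"key1":"unknown","key2":"unknown"}'],
})

has_udmap = df[df["udmap"] != "unknown"].copy()  # rows carrying a udmap dict
no_udmap = df[df["udmap"] == "unknown"].copy()   # rows without one

print(len(has_udmap), len(no_udmap))  # → 2 1
```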
First, use a Transformer to map the 9 keys into 5 embedded features. These embeddings capture the relationships and relative importance among the keys, turning high-dimensional discrete features into low-dimensional continuous vectors better suited to the downstream task.

Then, concatenate these 5 embedded features with the remaining 10 features in the dataset (15 features in total). Combining the raw features with the embeddings enriches the model's input and should improve performance.

Next, feed the 15 features into a fully connected layer, which can learn complex interactions between features and produce a higher-level representation.

Finally, output the binary-classification scores and compare them with the ground truth. Computing the loss and backpropagating lets the model gradually adjust its parameters to minimize the loss, improving performance on the binary task.
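The four steps above can be sketched in PyTorch. This is a minimal illustration, not the actual model: the per-key encoding (absent/unknown/known), `d_model`, head count, and hidden width are all placeholder assumptions; only the 9-key → 5-feature → +10 raw → FC → binary-logit shape comes from the notes.

```python
import torch
import torch.nn as nn

class KeyEmbedNet(nn.Module):
    """Sketch: Transformer over 9 key slots -> 5 features, concat 10 raw feats."""
    def __init__(self, n_keys=9, d_model=16, n_embed=5, n_raw=10):
        super().__init__()
        # Per-key state id (e.g. 0=absent, 1=unknown, 2=known) -- an assumption.
        self.key_embed = nn.Embedding(3, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.to_embed = nn.Linear(n_keys * d_model, n_embed)  # 9 keys -> 5 features
        self.fc = nn.Sequential(                              # 5 + 10 = 15 inputs
            nn.Linear(n_embed + n_raw, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, key_states, raw_feats):
        h = self.encoder(self.key_embed(key_states))        # (B, 9, d_model)
        emb = self.to_embed(h.flatten(1))                   # (B, 5)
        return self.fc(torch.cat([emb, raw_feats], dim=1))  # (B, 1) logit

model = KeyEmbedNet()
logits = model(torch.zeros(4, 9, dtype=torch.long), torch.randn(4, 10))
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(4, 1))  # compare with labels
```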
- Merge; fill unknowns via feature engineering (row counts in brackets: k = 1,000, w = 10,000)
- eid = 26: [17w] {key2, key3}
- eid = 40 [4k] {key2, key3}
- eid = 3 [2k] {key2, key3}
- eid = 38 [240] {key2, key3}
- eid = 25 [5k] {key2, key3}, {unknown}
- eid = 12 [6k] {key2, key3}, {unknown}
- eid = 7 [7] {key2, key3}, {unknown}
- Merge; drop the redundant key2
- eid = 0 [5k] {key3}
- eid = 27 [5k] {key3}
- eid = 34: [5w] {key3 all, key2 partial}
- Its own group
- eid = 2: [5w] {key4, key5}
- Fill in key4 and key5
- eid = 5: [3w] {key3, key2}, {key1, key2, key3, key4, key5}
- Merge; drop the redundant key6
- eid = 41: [2w] {key1, key2}
- eid = 36 [900] {key1, key2}
- eid = 31 [100] {key1, key2}
- eid = 30 [1k5] {key1, key2, key6}
- Handle individually
- eid = 4 [800] {key9}
- eid = 1 [736] {key9}
- eid = 19 [700] {key8}
- eid = 13 [700] {key7}
- eid = 15 [3k] {key6}
- eid = 20 [1k5] {unknown}
- eid = 10 [1k] {unknown}
- eid = 9 [1k5] {unknown}
- eid = 29 [2k] {unknown}
- eid = 37 [3k5] {unknown}
- eid = 32: [6k] {unknown}
- eid = 21: [3w] {unknown}
- eid = 39: [2w] {unknown}
- eid = 35: [8w] {unknown}
- eid = 11: [5w] {unknown}
- eid = 8: [5w] {unknown}
- eid = 33 [800] {unknown}
- eid = 42 [528] {unknown}
- eid = 28 [300] {unknown}
- eid = 14 [210] {unknown}
- eid = 16 [6] {unknown}
- eid = 23 [3] {unknown}
- eid = 6 [2] {unknown}
- eid = 22, 18, 17, 24 [1] {unknown}
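The eid routing above can be expressed as a small lookup. The group names and the catch-all `"individual"` bucket are hypothetical labels I introduce for illustration; the eid memberships follow the notes.

```python
import pandas as pd

# Hypothetical routing of eids into the handling groups listed above.
KEY23_MERGE = {26, 40, 3, 38, 25, 12, 7}  # merge; fill unknowns via feature eng.
KEY3_ONLY   = {0, 27, 34}                 # merge; drop the redundant key2
KEY12_MERGE = {41, 36, 31, 30}            # merge; drop the redundant key6

def route(eid):
    if eid in KEY23_MERGE: return "key2_key3"
    if eid in KEY3_ONLY:   return "key3"
    if eid == 2:           return "key4_key5"   # its own group; fill key4/key5
    if eid == 5:           return "key5_full"   # fill key4 and key5
    if eid in KEY12_MERGE: return "key1_key2"
    return "individual"                          # everything else, handled alone

df = pd.DataFrame({"eid": [26, 0, 2, 41, 20]})
df["group"] = df["eid"].map(route)
print(df["group"].tolist())
# → ['key2_key3', 'key3', 'key4_key5', 'key1_key2', 'individual']
```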
torch.Size([188637, 16]) torch.Size([62766, 15])
torch.Size([60065, 15]) torch.Size([19884, 14])
torch.Size([84198, 19]) torch.Size([28089, 18])
torch.Size([287456, 14]) torch.Size([96046, 13])
- 0th (key2_key3) class: good
BinaryClassifier(
(fc1): Linear(in_features=14, out_features=1024, bias=True)
(fc2): Linear(in_features=1024, out_features=128, bias=True)
(fc3): Linear(in_features=128, out_features=1, bias=True)
(relu): ReLU()
)
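The printed module above can be reconstructed directly; the layer shapes come from the log, but the forward wiring (ReLU between layers, raw logit out) is an assumption since the log only shows the submodules.

```python
import torch
import torch.nn as nn

class BinaryClassifier(nn.Module):
    """Reconstruction of the printed module: 14 -> 1024 -> 128 -> 1."""
    def __init__(self, input_size=14, hidden_size1=1024, hidden_size2=128):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.fc3 = nn.Linear(hidden_size2, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)  # logit; sigmoid + 0.5 threshold at inference

out = BinaryClassifier()(torch.randn(8, 14))  # (8, 1) logits
```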
Loaded data: E:\project\NewUserPredict\data_generate/train_2_3.pt
- Precision: 0.897599851728292
- Recall: 0.8413829047949966
- F_score: 0.8685827018786711
- Accuracy: 0.9814825344477929
- input_size: 14
- hidden_size1: 1024
- hidden_size2: 128
- lr: 0.01
- division_rate: 0.8
- num_epochs: 1200
- batch_size: 512
- threshold: 0.5
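The reported precision/recall/F_score/accuracy follow the standard definitions with the 0.5 threshold above; a toy sketch with stand-in labels and scores:

```python
# Toy labels/scores standing in for the real validation split.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
threshold = 0.5
y_pred = [int(p > threshold) for p in y_prob]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_score = 2 * precision * recall / (precision + recall)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f_score, accuracy)  # → 0.75 0.75 0.75 0.75
```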
- 2nd (key4_key5) class: good
- precision: 0.9848484848484849
- recall: 1.0
- f_score: 0.9923664122137404