LosFurina/NewUserPredict

NewUserPredict

Notice: I decided to divide the dataset into two parts: one that contains udmap and one that does not.

1. udmap analysis

There are ten distinct key combinations in udmap:

[
   {"key9":"unknown"},
   {"key8":"unknown"},
   {"key7":"unknown"},
   {"key6":"unknown"},
   {"key1":"unknown","key2":"unknown","key6":"unknown"},
   {"key3":"unknown","key1":"unknown","key4":"unknown","key2":"unknown","key5":"unknown"},
   {"key4":"unknown","key5":"unknown"},
   {"key3":"unknown"},
   {"key3":"unknown","key2":"unknown"},
   {"key1":"unknown","key2":"unknown"}
]
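
One way to arrive at a list of key combinations like the one above is to reduce every raw udmap value to the sorted set of keys it contains and count the distinct results. The sketch below assumes udmap is stored as a JSON string, with the literal string "unknown" when it is absent; the function name `udmap_signature` and the sample rows are hypothetical and only illustrate the idea.

```python
import json
from collections import Counter


def udmap_signature(raw):
    """Return the sorted tuple of keys in a udmap value, or () if there is none."""
    if raw is None or raw == "unknown":
        return ()
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        # Not valid JSON: treat the row as having no udmap.
        return ()
    if not isinstance(parsed, dict):
        return ()
    return tuple(sorted(parsed))


# Hypothetical rows; the values are made up for illustration.
rows = [
    '{"key3":"67804","key2":"650"}',
    '{"key2":"484","key3":"484"}',
    '{"key1":"3","key2":"2"}',
    "unknown",
]

counts = Counter(udmap_signature(r) for r in rows)

# Split described in the notice: rows with udmap vs. rows without.
with_udmap = [r for r in rows if udmap_signature(r)]
without_udmap = [r for r in rows if not udmap_signature(r)]

for signature, n in counts.items():
    print(signature or "(no udmap)", n)
```

Sorting the keys makes `{"key3", "key2"}` and `{"key2", "key3"}` count as the same type, which matches how the combinations are listed here.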

2. Process

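  1. First, a Transformer model converts the 9 keys into 5 embedded features. The embedding captures the relationships among the features and their relative importance, and maps high-dimensional discrete features to low-dimensional continuous vectors that are easier to use in the later learning steps.

  2. Next, these 5 embedded features are combined with the remaining 10 features of the dataset, giving 15 features in total. Combining the original features with the embedded ones enriches the model input, with the aim of improving performance.

  3. The 15 features are then fed into a fully connected layer, which can learn complex relationships among the features and produce higher-level representations.

  4. Finally, a binary classification head outputs two values, which are compared with the ground-truth labels. The loss is computed and backpropagated so that the model gradually adjusts its parameters to minimize the loss on the binary classification task.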
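
The four steps above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the class name `UdmapPipeline`, the model width, the number of heads and layers, the mean pooling, and the encoding of each key as a single float are all hypothetical choices made to keep the example small.

```python
import torch
import torch.nn as nn


class UdmapPipeline(nn.Module):
    """Hypothetical sketch: 9 keys -> 5 embedded features -> concat with 10 -> FC -> 2 logits."""

    def __init__(self, num_keys=9, d_model=16, emb_features=5,
                 other_features=10, hidden=64):
        super().__init__()
        # Assumption: every key value is encoded as one float (e.g. 0 when missing).
        self.value_proj = nn.Linear(1, d_model)
        # Learned embedding that tells the encoder which key a token belongs to.
        self.key_embed = nn.Embedding(num_keys, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=32, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # Step 1: reduce the encoded keys to 5 embedded features.
        self.to_features = nn.Linear(d_model, emb_features)
        # Steps 3 and 4: fully connected layers ending in two output values.
        self.fc = nn.Sequential(
            nn.Linear(emb_features + other_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, key_values, other):
        # key_values: (batch, 9) floats, other: (batch, 10) floats
        ids = torch.arange(key_values.size(1), device=key_values.device)
        tokens = self.value_proj(key_values.unsqueeze(-1)) + self.key_embed(ids)
        encoded = self.encoder(tokens)               # (batch, 9, d_model)
        emb = self.to_features(encoded.mean(dim=1))  # (batch, 5)
        combined = torch.cat([emb, other], dim=1)    # step 2: (batch, 15)
        return self.fc(combined)                     # (batch, 2)


# One training step on random stand-in data.
torch.manual_seed(0)
model = UdmapPipeline()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
key_values, other = torch.rand(8, 9), torch.rand(8, 10)
labels = torch.randint(0, 2, (8,))

logits = model(key_values, other)
loss = nn.functional.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Cross-entropy over two logits is used here because step 4 mentions two output values; a single logit with a sigmoid and binary cross-entropy would be an equivalent alternative.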

3. eid statistics

The following form one group of their own

  • Merge; fill in unknown values through feature engineering
  1. eid = 26: [170k] {key2, key3}
  2. eid = 40: [4k] {key2, key3}
  3. eid = 3: [2k] {key2, key3}
  4. eid = 38: [240] {key2, key3}
  5. eid = 25: [5k] {key2, key3}, {unknown}
  6. eid = 12: [6k] {key2, key3}, {unknown}
  7. eid = 7: [7] {key2, key3}, {unknown}

The following form one group; in the end only key3 is kept

  • Merge; drop the extra key2
  1. eid = 0: [5k] {key3}
  2. eid = 27: [5k] {key3}
  3. eid = 34: [50k] {key3 all, key2 partial}

The following form one group of their own; missing keys are filled in

  • Standalone group
  1. eid = 2: [50k] {key4, key5}
  • Fill in key4 and key5
  1. eid = 5: [30k] {key3, key2}, {key1, key2, key3, key4, key5}

The following are grouped into one class and all treated as unknown

  • Merge; drop the extra key6
  1. eid = 41: [20k] {key1, key2}
  2. eid = 36: [900] {key1, key2}
  3. eid = 31: [100] {key1, key2}
  4. eid = 30: [1.5k] {key1, key2, key6}
  • Handled separately
  1. eid = 4: [800] {key9}
  2. eid = 1: [736] {key9}
  3. eid = 19: [700] {key8}
  4. eid = 13: [700] {key7}
  5. eid = 15: [3k] {key6}
  6. eid = 20: [1.5k] {unknown}
  7. eid = 10: [1k] {unknown}
  8. eid = 9: [1.5k] {unknown}
  9. eid = 29: [2k] {unknown}
  10. eid = 37: [3.5k] {unknown}
  11. eid = 32: [6k] {unknown}
  12. eid = 21: [30k] {unknown}
  13. eid = 39: [20k] {unknown}
  14. eid = 35: [80k] {unknown}
  15. eid = 11: [50k] {unknown}
  16. eid = 8: [50k] {unknown}
  17. eid = 33: [800] {unknown}
  18. eid = 42: [528] {unknown}
  19. eid = 28: [300] {unknown}
  20. eid = 14: [210] {unknown}
  21. eid = 16: [6] {unknown}
  22. eid = 23: [3] {unknown}
  23. eid = 6: [2] {unknown}
  24. eid = 22, 18, 17, 24: [1] {unknown}
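
The grouping listed above can be written down as a lookup table so that each row is routed to one of the four classes by its eid. The sketch below copies the eid values from the lists and takes the class names from the labels used under "Record result"; the names `EID_GROUPS`, `EID_TO_GROUP`, and `group_for`, and the choice to send unseen eids to "unknown", are assumptions made for illustration.

```python
# eid values per class, copied from the lists above.
EID_GROUPS = {
    "key2_key3": [26, 40, 3, 38, 25, 12, 7],
    "key3": [0, 27, 34],
    "key4_key5": [2, 5],
    "unknown": [
        41, 36, 31, 30,            # merged, extra key6 dropped
        4, 1, 19, 13, 15,          # key9 / key8 / key7 / key6 only
        20, 10, 9, 29, 37, 32, 21, 39, 35, 11, 8,
        33, 42, 28, 14, 16, 23, 6,
        22, 18, 17, 24,
    ],
}

# Invert to eid -> class name for fast lookup.
EID_TO_GROUP = {
    eid: name for name, eids in EID_GROUPS.items() for eid in eids
}


def group_for(eid):
    """Return the class name for an eid; unseen eids fall back to 'unknown' (assumption)."""
    return EID_TO_GROUP.get(eid, "unknown")


print(group_for(26), group_for(34), group_for(5), group_for(15))
```

With this table, the four lists together account for every eid from 0 to 42 exactly once.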

Input setting

torch.Size([188637, 16]) torch.Size([62766, 15])

torch.Size([60065, 15]) torch.Size([19884, 14])

torch.Size([84198, 19]) torch.Size([28089, 18])

torch.Size([287456, 14]) torch.Size([96046, 13])

Record result

  • 0th (key2_key3) class: good
  BinaryClassifier(
    (fc1): Linear(in_features=14, out_features=1024, bias=True)
    (fc2): Linear(in_features=1024, out_features=128, bias=True)
    (fc3): Linear(in_features=128, out_features=1, bias=True)
    (relu): ReLU()
  )
  Loaded data: E:\project\NewUserPredict\data_generate/train_2_3.pt
    • Precision: 0.897599851728292
    • Recall: 0.8413829047949966
    • F_score: 0.8685827018786711
    • Accuracy: 0.9814825344477929
    • input_size: 14
    • hidden_size1: 1024
    • hidden_size2: 128
    • lr: 0.01
    • division_rate: 0.8
    • num_epochs: 1200
    • batch_size: 512
    • threshold: 0.5
  • 1st (key3) class: bad
  • 2nd (key4_key5) class: good
    • precision: 0.9848484848484849
    • recall: 1.0
    • f_score: 0.9923664122137404
  • 3rd (unknown) class: no result recorded
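
For reference, the reported metrics can be reproduced from predicted probabilities with a few lines of plain Python. This is a sketch under assumptions: the function names are hypothetical, a prediction counts as positive when its probability is at least the threshold (0.5 in the settings above), and the small label and probability lists are made up. The last lines check that the F_score values recorded above equal the harmonic mean of the recorded precision and recall.

```python
def binary_metrics(y_true, probs, threshold=0.5):
    """Compute precision, recall, F-score and accuracy for binary labels."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return precision, recall, f_score(precision, recall), accuracy


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Made-up example data, only to show the call.
y_true = [1, 1, 0, 0, 1]
probs = [0.9, 0.4, 0.6, 0.1, 0.8]
precision, recall, f1, accuracy = binary_metrics(y_true, probs)

# Consistency check against the values recorded above.
f_class0 = f_score(0.897599851728292, 0.8413829047949966)
f_class2 = f_score(0.9848484848484849, 1.0)
print(round(f_class0, 4), round(f_class2, 4))
```

Note that accuracy for the key2_key3 class is much higher than its F_score, which is what one would expect when the positive class is rare, so the F_score is probably the more informative number to compare across classes.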
