NewUserPredict

Notice: I decided to divide the dataset into two parts: one containing udmap and the other not containing udmap.

1. udmap analysis

There are ten different types of udmap:

[
   {"key9": "unknown"},
   {"key8": "unknown"},
   {"key7": "unknown"},
   {"key6": "unknown"},
   {"key1": "unknown", "key2": "unknown", "key6": "unknown"},
   {"key3": "unknown", "key1": "unknown", "key4": "unknown", "key2": "unknown", "key5": "unknown"},
   {"key4": "unknown", "key5": "unknown"},
   {"key3": "unknown"},
   {"key3": "unknown", "key2": "unknown"},
   {"key1": "unknown", "key2": "unknown"}
]
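A minimal sketch of how these key combinations could be counted. It assumes each udmap cell is either the literal string `unknown` (no udmap) or a JSON object string like the entries above; `udmap_signature` and `count_udmap_types` are hypothetical helpers, not code from this repo.

```python
import json
from collections import Counter


def udmap_signature(raw):
    """Return the sorted tuple of keys in a udmap string, or None if the row has no udmap."""
    if raw == "unknown":
        return None
    return tuple(sorted(json.loads(raw).keys()))


def count_udmap_types(udmaps):
    """Count how many rows share each key combination; rows without udmap are skipped."""
    return Counter(sig for sig in map(udmap_signature, udmaps) if sig is not None)


# Hypothetical sample rows: key order inside the JSON does not matter.
rows = [
    '{"key3": "unknown", "key2": "unknown"}',
    '{"key2": "unknown", "key3": "unknown"}',
    '{"key9": "unknown"}',
    "unknown",
]
print(count_udmap_types(rows))
```

Sorting the keys makes `{"key3", "key2"}` and `{"key2", "key3"}` fall into the same type, which is how the list above can be reduced to ten distinct combinations.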

2. Process

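First, a Transformer model converts the 9 keys into 5 embedding features. These embeddings can capture the relationships between the features and their importance, turning high-dimensional discrete features into low-dimensional continuous vectors that are easier to use in the downstream learning task.
Then, these 5 embedding features are combined with the remaining 10 features in the dataset (15 features in total). Combining the original features with the embedding features enriches the model's input and is meant to improve its performance.
Next, the 15 features are fed into a fully connected layer, which can learn complex relationships between the features and produce higher-level feature representations.
Finally, the binary classification head outputs two values, which are compared with the ground-truth labels. The loss is computed and backpropagated so that the model gradually adjusts its parameters to minimize the loss, improving its performance on the binary classification task.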
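The four steps above can be sketched in PyTorch as follows. This is a hypothetical illustration, not the repository's model: it assumes each of the 9 key values has already been mapped to an integer id (with a reserved id for missing keys), and the class name `UdmapTransformerClassifier`, the vocabulary size, `d_model`, the head count and the layer counts are all made up for the example. The 9 encoded key vectors are flattened and projected to 5 features, concatenated with the 10 remaining features, and passed through fully connected layers that emit two logits for `CrossEntropyLoss`.

```python
import torch
import torch.nn as nn


class UdmapTransformerClassifier(nn.Module):
    """Hypothetical sketch: 9 key tokens -> Transformer -> 5 features, + 10 features -> FC -> 2 logits."""

    def __init__(self, vocab_size=1000, num_keys=9, d_model=32, embed_out=5,
                 num_other=10, hidden=64):
        super().__init__()
        # Embeds each key's (already integer-encoded) value.
        self.value_embedding = nn.Embedding(vocab_size, d_model)
        # Learned per-key embedding so the encoder knows which key each token came from.
        self.key_position = nn.Parameter(torch.randn(num_keys, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Compress the 9 encoded key vectors into 5 embedding features.
        self.project = nn.Linear(num_keys * d_model, embed_out)
        # Fully connected head over the 5 + 10 = 15 combined features.
        self.head = nn.Sequential(
            nn.Linear(embed_out + num_other, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # two logits for the binary task
        )

    def forward(self, key_ids, other_features):
        # key_ids: (batch, 9) integer ids; other_features: (batch, 10) floats
        x = self.value_embedding(key_ids) + self.key_position     # (batch, 9, d_model)
        x = self.encoder(x)                                         # (batch, 9, d_model)
        udmap_features = self.project(x.flatten(start_dim=1))      # (batch, 5)
        combined = torch.cat([udmap_features, other_features], 1)  # (batch, 15)
        return self.head(combined)                                  # (batch, 2)


torch.manual_seed(0)
model = UdmapTransformerClassifier()
key_ids = torch.randint(0, 1000, (8, 9))  # random stand-in key ids
other = torch.randn(8, 10)                # random stand-in for the 10 other features
labels = torch.randint(0, 2, (8,))        # random stand-in 0/1 labels

logits = model(key_ids, other)
loss = nn.CrossEntropyLoss()(logits, labels)  # compare the two outputs with the labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer.zero_grad()
loss.backward()   # backpropagate the loss
optimizer.step()  # one parameter update
print(logits.shape, loss.item())
```

Flattening the encoded keys before the projection is one simple way to reach exactly 5 features; mean pooling over the key dimension followed by a smaller projection would be an equally reasonable choice.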

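3. eid statistics

Counts in square brackets are approximate row counts per eid (k = thousand, w = 万 = ten thousand; e.g. 17w ≈ 170,000, 1k5 ≈ 1,500). Braces list the udmap key combinations that occur for that eid.

The following eids form a separate group

Merge them; fill in the unknown values using feature engineering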

eid = 26 [17w] {key2, key3}
eid = 40 [4k] {key2, key3}
eid = 3 [2k] {key2, key3}
eid = 38 [240] {key2, key3}
eid = 25 [5k] {key2, key3}, {unknown}
eid = 12 [6k] {key2, key3}, {unknown}
eid = 7 [7] {key2, key3}, {unknown}

The following eids form one group; in the end only key3 is kept

Merge them and drop the extra key2

eid = 0 [5k] {key3}
eid = 27 [5k] {key3}
eid = 34 [5w] {key3 (all rows), key2 (some rows)}

The following each form a separate group; missing keys are filled in

Separate group

eid = 2 [5w] {key4, key5}

Fill in key4 and key5

eid = 5 [3w] {key3, key2}, {key1, key2, key3, key4, key5}

The following are merged into one class and all treated as unknown

Merge them and drop the extra key6

eid = 41 [2w] {key1, key2}
eid = 36 [900] {key1, key2}
eid = 31 [100] {key1, key2}
eid = 30 [1k5] {key1, key2, key6}
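The key-level rules in this section (keep only key3 and drop key2; drop the extra key6; fill in key4 and key5) can all be expressed as restricting a udmap dict to a target key list. A minimal sketch, assuming udmap has already been parsed into a Python dict; `normalize_udmap` is a hypothetical helper, and it fills missing keys with the placeholder `"unknown"` rather than the feature-engineered values mentioned above.

```python
UNKNOWN = "unknown"  # placeholder for keys that have to be filled in


def normalize_udmap(udmap, keep_keys):
    """Restrict a udmap dict to keep_keys: extra keys are dropped, missing ones get UNKNOWN."""
    return {key: udmap.get(key, UNKNOWN) for key in keep_keys}


# key3 group: the extra key2 (as in eid 34) is dropped.
print(normalize_udmap({"key3": "42", "key2": "7"}, ["key3"]))
# -> {'key3': '42'}

# key1/key2 rows: the extra key6 (as in eid 30) is dropped.
print(normalize_udmap({"key1": "3", "key2": "8", "key6": "1"}, ["key1", "key2"]))
# -> {'key1': '3', 'key2': '8'}

# Filling key4 and key5 for a row that only has key2/key3 (as in eid 5).
print(normalize_udmap({"key2": "5", "key3": "6"}, ["key2", "key3", "key4", "key5"]))
# -> {'key2': '5', 'key3': '6', 'key4': 'unknown', 'key5': 'unknown'}
```

Iterating over `keep_keys` rather than over the input dict also gives every row in a group the same key order, which keeps the resulting feature columns aligned.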

Handled separately

eid = 4 [800] {key9}
eid = 1 [736] {key9}
eid = 19 [700] {key8}
eid = 13 [700] {key7}
eid = 15 [3k] {key6}
eid = 20 [1k5] {unknown}
eid = 10 [1k] {unknown}
eid = 9 [1k5] {unknown}
eid = 29 [2k] {unknown}
eid = 37 [3k5] {unknown}
eid = 32 [6k] {unknown}
eid = 21 [3w] {unknown}
eid = 39 [2w] {unknown}
eid = 35 [8w] {unknown}
eid = 11 [5w] {unknown}
eid = 8 [5w] {unknown}
eid = 33 [800] {unknown}
eid = 42 [528] {unknown}
eid = 28 [300] {unknown}
eid = 14 [210] {unknown}
eid = 16 [6] {unknown}
eid = 23 [3] {unknown}
eid = 6 [2] {unknown}
eid = 22, 18, 17, 24 [1] {unknown}

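Input setting

Shapes of the training and test input tensors for each group (each training tensor has one more column than the matching test tensor):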

torch.Size([188637, 16]) torch.Size([62766, 15])

torch.Size([60065, 15]) torch.Size([19884, 14])

torch.Size([84198, 19]) torch.Size([28089, 18])

torch.Size([287456, 14]) torch.Size([96046, 13])

Recorded results

0th (key2_key3) class: good

  BinaryClassifier(
    (fc1): Linear(in_features=14, out_features=1024, bias=True)
    (fc2): Linear(in_features=1024, out_features=128, bias=True)
    (fc3): Linear(in_features=128, out_features=1, bias=True)
    (relu): ReLU()
  )
  Loaded data: E:\project\NewUserPredict\data_generate/train_2_3.pt

Precision: 0.897599851728292
Recall: 0.8413829047949966
F_score: 0.8685827018786711
Accuracy: 0.9814825344477929
input_size: 14
hidden_size1: 1024
hidden_size2: 128
lr: 0.01
division_rate: 0.8
num_epochs: 1200
batch_size: 512
threshold: 0.5
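The printed `BinaryClassifier` suggests a three-layer MLP (14 → 1024 → 128 → 1). The sketch below reproduces that layer layout and plugs in the recorded `lr`, `division_rate`, `batch_size` and `threshold`. The ReLU placement in `forward`, the `BCEWithLogitsLoss` loss, the SGD optimizer and the random stand-in data are assumptions (the printout only lists the modules), and the epoch count is cut from 1200 to 2 to keep it quick.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


class BinaryClassifier(nn.Module):
    """Layer layout matching the printed model: 14 -> 1024 -> 128 -> 1."""

    def __init__(self, input_size=14, hidden_size1=1024, hidden_size2=128):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.fc3 = nn.Linear(hidden_size2, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Assumed forward pass: ReLU after the two hidden layers, raw logit out.
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)


torch.manual_seed(0)
# Random stand-in data in place of the .pt files: 14 features and a 0/1 label per row.
features = torch.randn(1000, 14)
labels = torch.randint(0, 2, (1000, 1)).float()
dataset = TensorDataset(features, labels)

division_rate = 0.8  # fraction of rows used for training
train_size = int(len(dataset) * division_rate)
train_set, val_set = random_split(dataset, [train_size, len(dataset) - train_size])

model = BinaryClassifier()
criterion = nn.BCEWithLogitsLoss()                       # sigmoid + binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(train_set, batch_size=512, shuffle=True)

for epoch in range(2):  # the recorded run used num_epochs = 1200
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

threshold = 0.5
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(features[val_set.indices]))
    preds = (probs >= threshold).float()  # 1 = positive class
print(model)
print(preds.shape)
```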
1st (key3) class: bad
2nd (key4_key5) class: good
- precision: 0.9848484848484849
- recall: 1.0
- f_score: 0.9923664122137404
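The recorded F-scores agree with the standard definition F = 2PR / (P + R). For the 2nd class, P = 65/66 and R = 1 give F = 130/131 ≈ 0.99237. A minimal pure-Python sketch that derives all four metrics from predicted probabilities and the threshold is shown below; `binary_metrics` is a hypothetical helper, and treating `p >= threshold` as positive is an assumption.

```python
def binary_metrics(probs, labels, threshold=0.5):
    """Precision, recall, F-score and accuracy for predicted probabilities vs 0/1 labels."""
    preds = [1 if p >= threshold else 0 for p in probs]  # assumed: ties count as positive
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    correct = sum(1 for p, y in pairs if p == y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": correct / len(labels)}


m = binary_metrics([0.9, 0.8, 0.3, 0.6, 0.1], [1, 0, 1, 1, 0])
print(m["accuracy"])  # 0.6

# Consistency check against the recorded 0th-class numbers.
p, r = 0.897599851728292, 0.8413829047949966
print(round(2 * p * r / (p + r), 4))  # 0.8686
```

The high accuracy next to a lower F-score for the 0th class (0.98 vs 0.87) is typical of an imbalanced label, which is why precision, recall and F-score are worth recording alongside accuracy.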
3rd (unknown) class:
