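## Deep Learning Features

Extract deep learning features from imaging data such as CT, MRI, endoscopy, and X-ray.

### Onekey Steps

1. Convert the data to be processed into jpg images, for example with the Onekey tools OKT-convert2jpg or OKT-crop_max_roi.
2. Collect all image files in the specified directory.
3. Choose the model whose deep learning features you want to extract. Onekey currently supports mainstream deep learning models. (Consider using Onekey for transfer learning.)
4. Extract the features and save them to a feature file.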
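
Step 2 can be sketched with the standard library alone. This is a minimal illustration, not the Onekey implementation: `list_images` and `IMAGE_SUFFIXES` are hypothetical names, and the case-insensitive suffix match and sorted output are assumptions made here so that the sample order is reproducible.

```python
from pathlib import Path
from typing import List

# Assumption: only these two suffixes count as images, matched case-insensitively
IMAGE_SUFFIXES = {'.png', '.jpg'}


def list_images(directory: str) -> List[str]:
    """Return the sorted paths of image files located directly in `directory`."""
    root = Path(directory).expanduser()
    return sorted(
        str(p) for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Sorting is optional, but it keeps the row order of the resulting feature file stable across runs and operating systems.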

In [1]:
import os
from onekey_algo import get_param_in_cwd
# Directory mode
os.makedirs('features/', exist_ok=True)
mydir = get_param_in_cwd('data_pattern')
directory = os.path.expanduser(mydir)
test_samples = [os.path.join(directory, p) for p in os.listdir(directory) if p.endswith('.png') or p.endswith('.jpg')]
test_samples

['F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\bianjiePVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\bijinhuanPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caibinPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caichengshunPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caigengyuanPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caihaimingPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caijinlinPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caiqingxiangPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caiyonglinPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caiyoushunPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caiyunboPVP.nii.png',
 'F:\\\\wlx_OK_PDAC\\\\OS_prediction-V3\\\\data\\\\crop2d\\caizhengqingPVP.nii.png',
 'F:\\\\

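## Determine Which Features to Extract

Use a keyword to select the layer whose features should be extracted. Call `init_from_onekey` with a path that points all the way down to the `viz` directory.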

In [2]:
from onekey_algo.custom.components.comp2 import extract, print_feature_hook, reg_hook_on_module, \
    init_from_model, init_from_onekey

sel_model = get_param_in_cwd('dl_sel_model')
model_root = os.path.join(get_param_in_cwd('model_root'), sel_model) 
model, transformer, device = init_from_onekey(os.path.join(model_root, 'viz'))
for n, m in model.named_modules():
    print('Feature name:', n, "|| Module:", m)

[2025-11-10 14:12:00 - <frozen onekey_algo.custom.components.comp2>: 234]	INFO	Model parameters: {'pretrained': False, 'model_name': 'resnet152', 'num_classes': 1, 'in_channels': 3}


Feature name:  || Module: ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kerne

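## Extract Features

The name printed after `Feature name:` is the name of the layer to extract, for example `layer3.0.conv2`. Deep learning features are usually taken from the last layer before the classifier, for example `avgpool`.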

In [3]:
from functools import partial
from onekey_algo.custom.components.comp2 import feature_layer_mapping

if sel_model in feature_layer_mapping:
    feature_name = feature_layer_mapping[sel_model]
else:
    # You need to set the layer to extract features from yourself!
    feature_name = 'avgpool'
with open('features/features.csv', 'w') as outfile:
    hook = partial(print_feature_hook, fp=outfile)
    find_num = reg_hook_on_module(feature_name, model, hook)
    results = extract(test_samples, model, transformer, device, fp=outfile)
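
The internals of `reg_hook_on_module` and `print_feature_hook` are not shown in this notebook. As a rough illustration of the mechanism they presumably build on, the sketch below uses PyTorch's own `register_forward_hook` on a tiny stand-in network. `TinyNet`, `save_output`, and the exact-name match are assumptions made for this example, not Onekey's behavior.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Stand-in network; the layer names here are illustrative only."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, 1)

    def forward(self, x):
        x = self.avgpool(torch.relu(self.conv(x)))
        return self.fc(torch.flatten(x, 1))


model = TinyNet().eval()
captured = []


def save_output(module, inputs, output):
    # Flatten (N, C, 1, 1) to (N, C): one feature vector per sample
    captured.append(output.detach().flatten(1))


feature_name = 'avgpool'
# Register the hook on every module whose name equals feature_name
handles = [m.register_forward_hook(save_output)
           for n, m in model.named_modules() if n == feature_name]

with torch.no_grad():
    model(torch.randn(2, 3, 32, 32))  # two random 32x32 RGB "images"

for h in handles:
    h.remove()  # detach the hooks once the features are collected

print(captured[0].shape)  # torch.Size([2, 8])
```

The hook fires during the forward pass, so the features are a side effect of ordinary inference and the model's final output can be ignored.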

## Read the Data

In [4]:
import pandas as pd
features = pd.read_csv('features/features.csv', header=None)
features.columns=['ID'] + [f"DL_{c}" for c in list(features.columns[1:])]
features.to_csv('features/features.csv', index=False)
features.head()

| | ID | DL_1 | DL_2 | DL_3 | DL_4 | DL_5 | DL_6 | DL_7 | DL_8 | DL_9 | ... | DL_2039 | DL_2040 | DL_2041 | DL_2042 | DL_2043 | DL_2044 | DL_2045 | DL_2046 | DL_2047 | DL_2048 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | bianjiePVP.nii.png | 0.812 | 0.036 | 1.552 | 0.188 | 0.784 | 0.38 | 0.56 | 0.231 | 0.193 | ... | 0.407 | 0.207 | 0.359 | 0.048 | 0.065 | 0.614 | 0.158 | 0.426 | 0.251 | 0.345 |
| 1 | bijinhuanPVP.nii.png | 0.521 | 0.25 | 1.801 | 0.461 | 1.756 | 0.397 | 0.428 | 0.689 | 0.496 | ... | 0.519 | 0.598 | 0.086 | 0.088 | 0.11 | 0.47 | 0.102 | 0.331 | 0.317 | 0.238 |
| 2 | caibinPVP.nii.png | 0.741 | 0.389 | 0.145 | 0.179 | 0.242 | 0.457 | 0.354 | 0.087 | 0.268 | ... | 0.384 | 0.38 | 0.262 | 0.33 | 0.276 | 0.866 | 0.864 | 0.244 | 0.053 | 0.104 |
| 3 | caichengshunPVP.nii.png | 1.331 | 0.346 | 0.938 | 0.233 | 0.074 | 0.225 | 0.847 | 0.265 | 1.031 | ... | 0.919 | 0.967 | 0.327 | 0.473 | 0.176 | 0.494 | 0.289 | 0.23 | 0.629 | 0.039 |
| 4 | caigengyuanPVP.nii.png | 0.944 | 0.355 | 2.958 | 0.659 | 0.224 | 0.684 | 0.363 | 0.674 | 0.791 | ... | 0.505 | 0.771 | 0.662 | 0.769 | 0.223 | 0.578 | 0.556 | 0.318 | 0.249 | 0.218 |
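
To see where the `ID` and `DL_n` column names come from, the renaming step can be replayed on a tiny in-memory file. The two rows below are made-up stand-ins for `features/features.csv`, and the two sanity checks are suggestions added here rather than part of the Onekey workflow.

```python
import io

import pandas as pd

# Hypothetical stand-in for features/features.csv: file name, then feature values
raw = io.StringIO("a.png,0.1,0.2,0.3\nb.png,0.4,0.5,0.6\n")

# With header=None, pandas labels the columns 0, 1, 2, ...
features = pd.read_csv(raw, header=None)

# Column 0 holds the file name; the integer labels 1..n become DL_1..DL_n
features.columns = ['ID'] + [f"DL_{c}" for c in list(features.columns[1:])]

# Optional sanity checks: one row per image and no missing feature values
assert features['ID'].is_unique
assert not features.isna().any().any()

print(features.columns.tolist())  # ['ID', 'DL_1', 'DL_2', 'DL_3']
```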


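### Deep Feature Compression

Compress the deep learning features. Note that the target dimension must be smaller than the number of samples.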

```python
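def compress_df_feature(features: pd.DataFrame, dim: int, not_compress: Union[str, List[str]] = None,
                        prefix='') -> pd.DataFrame:
    """
    Compress deep learning features.
    Args:
        features: DataFrame of features.
        dim: Target dimension; this value must be smaller than the number of samples.
        not_compress: Column(s) to exclude from compression.
        prefix: Prefix for all features.

    Returns:

    """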
```

In [5]:
from onekey_algo.custom.components.comp1 import compress_df_feature

cm_features = compress_df_feature(features=features, dim=256, prefix='2DL_', not_compress='ID')
cm_features.to_csv('features/compress_features_2D.csv', header=True, index=False)
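
This notebook does not state which algorithm `compress_df_feature` uses. Assuming a PCA-style projection purely for illustration, the sketch below shows why the target dimension is bounded by the sample count: after mean-centering, data from n samples has rank at most n - 1, so no more than n - 1 components can carry any variance. `pca_compress` is a hypothetical helper, not part of Onekey.

```python
import numpy as np


def pca_compress(x: np.ndarray, dim: int) -> np.ndarray:
    """Project the rows of `x` (samples x features) onto the top `dim` principal components."""
    n_samples, n_features = x.shape
    if dim >= n_samples or dim > n_features:
        raise ValueError(
            f"dim={dim} must be < n_samples={n_samples} and <= n_features={n_features}"
        )
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 64))  # 10 samples, 64 features (random stand-in data)
z = pca_compress(x, 4)
print(z.shape)  # (10, 4)
```

Under this assumption, the `dim=256` used in the cell above would need more than 256 samples in `features`.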