## Deep Learning Features

Extract deep learning features from 2D images.


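## Selecting the files to process

Two batch-processing modes are available:
1. Directory mode: extract features from every `.jpg` and `.png` file in the given directory.
2. File mode: the samples to process are listed in a file, one sample per line.

You can also list a handful of files by hand (the custom mode at the end of the code).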
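
As a hedged sketch of the directory and file modes described above, the helpers below collect sample paths with `pathlib`. The names `list_images`, `read_sample_list`, and `IMAGE_SUFFIXES` are hypothetical and not part of the notebook's library. Unlike the notebook code, the suffix match here is case-insensitive and also accepts `.jpeg`, which is an assumption about what you may want.

```python
from pathlib import Path
from typing import List

# Assumed set of image suffixes; adjust to your data.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}


def list_images(directory: str) -> List[str]:
    """Directory mode: return sorted paths of all image files in `directory`."""
    root = Path(directory).expanduser()
    return sorted(
        str(p) for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def read_sample_list(list_file: str) -> List[str]:
    """File mode: return one sample per non-empty line of `list_file`."""
    with open(list_file, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]
```

Sorting the result makes the sample order reproducible, since `os.listdir` and `Path.iterdir` return entries in arbitrary order.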

In [1]:
import os
# Directory mode
mydir = r'E:\function\pm_data\skin4clf_out\images'
# mydir = r'C:\Users\onekey\Project\OnekeyDS\CT\full'
directory = os.path.expanduser(mydir)
test_samples = [os.path.join(directory, p) for p in os.listdir(directory) if p.endswith(('.png', '.jpg'))]

# File mode
# test_file = ''
# with open(test_file) as f:
#     test_samples = [l.strip() for l in f.readlines()]

# Custom mode
# test_samples = ['path2jpg']
test_samples

['E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000021_downsampled.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000012.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000002.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000018_downsampled.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000015.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000017_downsampled.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000022_downsampled.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000008.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000013.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000020_downsampled.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000019_downsampled.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000016.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000007.jpg',
 'E:\\function\\pm_data\\skin4clf_out\\images\\ISIC_0000006.jpg',
 'E:

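## Choosing the features to extract

Use a keyword (the layer name) to select which layer's features to extract.

### Supported model names

Set the `model_name` variable in the code to one of the model names below.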

| **Model family** | **Model name**                                             |
| ------------ | ------------------------------------------------------------ |
| AlexNet      | alexnet                                                      |
| VGG          | vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19_bn, vgg19 |
| ResNet       | resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d, wide_resnet50_2, wide_resnet101_2 |
| DenseNet     | densenet121, densenet169, densenet201, densenet161           |
| Inception    | googlenet, inception_v3                                      |
| SqueezeNet   | squeezenet1_0, squeezenet1_1                                 |
| ShuffleNetV2 | shufflenet_v2_x2_0, shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5 |
| MobileNet    | mobilenet_v2, mobilenet_v3_large, mobilenet_v3_small         |
| MNASNet      | mnasnet0_5, mnasnet0_75, mnasnet1_0, mnasnet1_3              |

In [2]:
from pixelmed_calc.custom.components.comp2 import extract, print_feature_hook, reg_hook_on_module, \
    init_from_model, init_from_onekey

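model_name = 'resnet50'
# Alternative: initialize directly from a model name in the table above.
# model, transformer, device = init_from_model(model_name=model_name)
model, transformer, device = init_from_onekey(r'E:\function\note2-深度学习分类\ViT\viz')
# Print every module name; choose `feature_name` from these names later.
for n, m in model.named_modules():
    print('Feature name:', n, "|| Module:", m)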

Feature name:  || Module: ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kerne

In [10]:
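from functools import partial
# Keyword identifying the layer to extract (see the names printed above).
feature_name = 'avgpool'
with open('feature.csv', 'w') as outfile:
    # Bind the output file so the hook writes the layer's features to it.
    hook = partial(print_feature_hook, fp=outfile)
    find_num = reg_hook_on_module(feature_name, model, hook)
    results = extract(test_samples, model, transformer, device, fp=outfile)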
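
The internals of `reg_hook_on_module` and `print_feature_hook` are not shown here. The general PyTorch mechanism for capturing an intermediate layer's output is a forward hook, and the sketch below illustrates it under that assumption. The toy model, `save_output_hook`, and `register_on_named_module` are hypothetical stand-ins, not the library's code.

```python
import torch
from torch import nn

# Hypothetical toy model standing in for the real network.
toy_model = nn.Sequential()
toy_model.add_module('conv', nn.Conv2d(3, 8, kernel_size=3, padding=1))
toy_model.add_module('relu', nn.ReLU())
toy_model.add_module('avgpool', nn.AdaptiveAvgPool2d(1))
toy_model.add_module('flatten', nn.Flatten())
toy_model.add_module('fc', nn.Linear(8, 2))
toy_model.eval()

captured = []


def save_output_hook(module, inputs, output):
    # Store the layer output as one flat feature vector per sample.
    captured.append(output.detach().flatten(start_dim=1).cpu())


def register_on_named_module(model, name, hook):
    # Attach `hook` to every module whose name equals `name`.
    handles = []
    for module_name, module in model.named_modules():
        if module_name == name:
            handles.append(module.register_forward_hook(hook))
    return handles


handles = register_on_named_module(toy_model, 'avgpool', save_output_hook)
with torch.no_grad():
    toy_model(torch.randn(4, 3, 32, 32))  # batch of 4 random "images"
for handle in handles:
    handle.remove()  # detach the hooks once extraction is done

layer_features = captured[0]  # one row per sample, one column per channel
```

Removing the handles afterwards keeps later forward passes from appending to `captured` again.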

## Loading the data

In [11]:
import pandas as pd
features = pd.read_csv('feature.csv', header=None)
features.columns=['ID'] + list(features.columns[1:])
features.head()

|  | ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 2039 | 2040 | 2041 | 2042 | 2043 | 2044 | 2045 | 2046 | 2047 | 2048 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ISIC_0000001.jpg | 54747.355 | 166057.344 | 5436.148 | 0.0 | 119955.273 | 2101.748 | 0.0 | 140884.547 | 53276.652 | ... | 103776.562 | 0.0 | 0.0 | 100722.789 | 9796.09 | 68264.102 | 0.0 | 0.0 | 28731.861 | 3060.981 |
| 1 | ISIC_0000003.jpg | 79645.805 | 245659.266 | 8167.623 | 0.0 | 179749.781 | 2565.139 | 0.0 | 210881.672 | 79810.539 | ... | 154177.469 | 0.0 | 0.0 | 151622.125 | 14972.128 | 103304.211 | 0.0 | 0.0 | 42523.684 | 2895.241 |
| 2 | ISIC_0000004.jpg | 119449.711 | 336417.156 | 13710.838 | 0.0 | 232565.672 | 2502.433 | 0.0 | 320563.438 | 99225.461 | ... | 215937.109 | 0.0 | 0.0 | 217406.891 | 21768.592 | 135883.484 | 0.0 | 0.0 | 74625.125 | 15678.768 |
| 3 | ISIC_0000006.jpg | 61479.816 | 177317.594 | 4999.097 | 0.0 | 126871.156 | 1256.23 | 0.0 | 150319.625 | 54623.867 | ... | 109484.406 | 0.0 | 0.0 | 106950.609 | 10188.238 | 71460.656 | 0.0 | 0.0 | 33407.156 | 4361.681 |
| 4 | ISIC_0000007.jpg | 68658.477 | 199238.203 | 5319.55 | 0.0 | 142386.188 | 1465.874 | 0.0 | 166115.312 | 61373.887 | ... | 120519.062 | 0.0 | 0.0 | 119316.969 | 11074.032 | 80214.641 | 0.0 | 0.0 | 35550.031 | 4248.044 |


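### Compressing the deep features

Compress the deep learning features to a lower dimension. Note that the target dimension must be smaller than the number of samples.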

```python
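from typing import List, Union

def compress_df_feature(features: pd.DataFrame, dim: int, not_compress: Union[str, List[str]] = None,
                        prefix='') -> pd.DataFrame:
    """
    Compress deep learning features.
    Args:
        features: DataFrame of features.
        dim: Target dimension; must be smaller than the number of samples.
        not_compress: Column(s) to leave uncompressed.
        prefix: Prefix applied to all feature names.

    Returns:
        The compressed feature DataFrame.
    """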
```

In [None]:
from pixelmed_calc.custom.components.comp1 import compress_df_feature

cm_features = compress_df_feature(features=features, dim=32, prefix='DL_', not_compress='ID')
cm_features.to_csv('compress_features.csv', header=True, index=False)
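
The implementation of `compress_df_feature` is not shown here. One common way to reduce feature dimensionality is PCA, and its rank limit would explain the constraint on `dim`: after centering, n samples span at most n - 1 directions. The sketch below illustrates compression under that assumption, using scikit-learn. `pca_compress` and the synthetic data are hypothetical and are not the library's method.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def pca_compress(features: pd.DataFrame, dim: int,
                 id_col: str = 'ID', prefix: str = 'DL_') -> pd.DataFrame:
    """Reduce all columns except `id_col` to `dim` PCA components."""
    ids = features[[id_col]].reset_index(drop=True)
    values = features.drop(columns=[id_col]).to_numpy(dtype=float)
    n_samples, n_features = values.shape
    if dim >= n_samples or dim > n_features:
        raise ValueError(
            f'dim={dim} must be < n_samples={n_samples} '
            f'and <= n_features={n_features}'
        )
    reduced = PCA(n_components=dim).fit_transform(values)
    columns = [f'{prefix}{i}' for i in range(dim)]
    return pd.concat([ids, pd.DataFrame(reduced, columns=columns)], axis=1)


# Synthetic example: 10 samples with 20 features each, compressed to 4.
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.rand(10, 20))
demo.insert(0, 'ID', [f'sample_{i}.jpg' for i in range(10)])
compressed = pca_compress(demo, dim=4)
print(compressed.shape)
```

With only a few dozen images, a target such as `dim=32` can exceed the limit, so check the sample count before choosing `dim`.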