# MedNIST study: classification of medical images

Develop an artificial intelligence able to recognize the category of a medical image.

Project context

MedNIST is a database of 58,954 medical images (grayscale, 64x64 pixels). The images are labeled in six categories: AbdomenCT, BreastMRI, CXR, ChestCT, Hand and HeadCT. The goal is to predict the category of each image from its pixels.

47,163 images are provided. The goal is to train the best-performing Machine Learning or Deep Learning model possible: any strategy is allowed, only performance counts!

The remaining 11,791 images form the test set: they are not provided and will be used by the instructor to evaluate your model's performance.




#cp "/content/drive/MyDrive/Colab Notebooks/2022_02_09_MedNIST/MedNIST_Training_Dataset.zip" "."

#!unzip "MedNIST_Training_Dataset.zip" -d "/content/drive/MyDrive/Colab Notebooks/2022_02_09_MedNIST/"

#from PIL import Image
#Image.open('MedNIST Training Dataset/CXR/006568.jpeg')

In [1]:
import joblib
import numpy as np
import pandas as pd
import sklearn
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [2]:
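# Full option names: the short forms are ambiguous in recent pandas (styler.render.* options)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_rows", 500)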

### 1 - Preprocessing

In [3]:
dataset_dir = 'Data'
dataset_ab = dataset_dir + '/AbdomenCT'
dataset_br = dataset_dir + '/BreastMRI'
dataset_ct = dataset_dir + '/ChestCT'
dataset_c = dataset_dir + '/CXR'
dataset_hd = dataset_dir + '/Hand'
dataset_h = dataset_dir + '/HeadCT'

In [4]:
import os.path


def count_files(path):
    """Return the number of regular files (images) directly inside `path`."""
    return len([f for f in os.listdir(path)
                if os.path.isfile(os.path.join(path, f))])


num_files_ab = count_files(dataset_ab)
num_files_br = count_files(dataset_br)
num_files_ct = count_files(dataset_ct)
num_files_c = count_files(dataset_c)
num_files_hd = count_files(dataset_hd)
num_files_h = count_files(dataset_h)

In [5]:
# Total number of images, used as the batch size so that one batch holds the whole dataset
batch_size = num_files_ab + num_files_br + num_files_ct + num_files_c + num_files_hd + num_files_h
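The six near-identical counting blocks above boil down to one idea: count the regular files in each class subfolder, then add the counts up. A minimal sketch with `pathlib`, assuming one subfolder per class directly under the dataset directory; the helper name `count_images_per_class` is hypothetical and not part of the original notebook:

```python
from pathlib import Path


def count_images_per_class(dataset_dir):
    """Return {class_name: number_of_files} for each subfolder of dataset_dir."""
    root = Path(dataset_dir)
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


# Usage with the folder layout assumed in this notebook:
# counts = count_images_per_class('Data')
# batch_size = sum(counts.values())
```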

In [6]:
train_datagen = ImageDataGenerator()

In [7]:
train_generator = train_datagen.flow_from_directory(
    dataset_dir,
    target_size=(64, 64),
    batch_size=batch_size,     # the whole dataset in a single batch
    class_mode="categorical",  # one-hot labels, classes indexed by sorted folder name
)

Found 47169 images belonging to 6 classes.
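`flow_from_directory` numbers the classes by sorting the subfolder names, and the resulting mapping is exposed as `train_generator.class_indices`. The sketch below reproduces that ordering in plain Python for the six MedNIST folder names (the helper `class_indices_from_folders` is hypothetical). `CXR` sorts before `ChestCT` because uppercase letters compare lower than lowercase ones. Assuming `Data` contains exactly these six folders, this is the index order behind the rows of the confusion matrix and the classification report further down.

```python
def class_indices_from_folders(folder_names):
    """Mimic Keras flow_from_directory: sorted folder names -> indices 0..n-1."""
    return {name: index for index, name in enumerate(sorted(folder_names))}


mednist_classes = ["AbdomenCT", "BreastMRI", "CXR", "ChestCT", "Hand", "HeadCT"]
print(class_indices_from_folders(mednist_classes))
# {'AbdomenCT': 0, 'BreastMRI': 1, 'CXR': 2, 'ChestCT': 3, 'Hand': 4, 'HeadCT': 5}
```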


In [8]:
# Load the single batch once: images (N, 64, 64, 3) and one-hot labels (N, 6)
X_train, y_train = train_generator[0]

In [9]:
X_train.shape, y_train.shape

((47169, 64, 64, 3), (47169, 6))
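Because `batch_size` equals the number of images, `train_generator[0]` materialises the entire training set in memory at once. The grayscale images also come back with 3 channels, since `flow_from_directory` defaults to `color_mode="rgb"`. A quick size estimate, assuming the generator returns `float32` arrays (the Keras default) with the shapes printed above:

```python
# Shapes printed above: 47169 images of 64x64 pixels with 3 channels
n_images, height, width, channels = 47169, 64, 64, 3
bytes_per_value = 4  # float32

n_bytes = n_images * height * width * channels * bytes_per_value
print(n_bytes)                             # 2318450688
print(round(n_bytes / 1024**3, 2), "GiB")  # 2.16 GiB
```

Roughly 2.2 GiB for `X_train` alone, before VGG16 activations are computed, which is worth keeping in mind on a Colab instance.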

In [10]:
IMG_SHAPE = (64, 64, 3)
vgg = VGG16(input_shape = IMG_SHAPE, include_top = False,
            weights="imagenet")
vgg.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 64, 64, 3)]       0         
                                                                 
 block1_conv1 (Conv2D)       (None, 64, 64, 64)        1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 64, 64, 64)        36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 32, 32, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 32, 32, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 32, 32, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 16, 16, 128)       0     

In [11]:
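# Flatten the VGG16 conv features (2x2x512 per image) into 2048-d vectors.
# Raw pixel values (0-255) are fed as-is; vgg16.preprocess_input is not applied.
feature_train = vgg.predict(X_train).reshape(len(X_train), -1)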

In [12]:
feature_train.shape

(47169, 2048)
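The 2048 columns follow from the architecture: with `include_top=False`, VGG16's convolutions keep the spatial size ("same" padding) and each of its five blocks ends with a 2x2 max-pool of stride 2, leaving a 2x2x512 feature map for a 64x64 input. A small illustrative helper (hypothetical name) that reproduces this arithmetic:

```python
def vgg16_flat_feature_size(height, width, n_pools=5, last_channels=512):
    """Flattened size of VGG16 conv features (include_top=False) for an input size.

    Each of the 5 blocks ends with a 2x2 max-pool (stride 2, 'valid' padding),
    so every spatial dimension is floor-divided by 2 five times.
    """
    for _ in range(n_pools):
        height, width = height // 2, width // 2
    return height * width * last_channels


print(vgg16_flat_feature_size(64, 64))    # 2 * 2 * 512 = 2048
print(vgg16_flat_feature_size(224, 224))  # 7 * 7 * 512 = 25088
```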

### 2 - save the training set in numpy

In [None]:
#np.save('feature_train_all', feature_train)
#np.save('y_train_all', y_train)

In [None]:
#np.save('X_train_all', X_train)

### 3 - load the training set

In [None]:
#feature_train = np.load('feature_train_all.npy')
#X_train = np.load('X_train_all.npy')
#y_train = np.load('y_train_all.npy')
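The commented cells rely on `np.save` appending the `.npy` extension to the file name, which is why they save to `'feature_train_all'` but load from `'feature_train_all.npy'`. A self-contained round trip on small random stand-in arrays, written to a temporary directory:

```python
import os
import tempfile

import numpy as np

# Small stand-ins for feature_train / y_train (random data, for illustration only)
features = np.random.rand(4, 2048).astype(np.float32)
labels = np.eye(6, dtype=np.float32)[[0, 2, 5, 1]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feature_train_all")
    np.save(path, features)             # writes feature_train_all.npy
    restored = np.load(path + ".npy")   # the extension must be given when loading

print(restored.shape, np.array_equal(restored, features))  # (4, 2048) True
```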

In [13]:
y_train_list = y_train.argmax(axis=1)
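With `class_mode="categorical"` the labels are one-hot rows; `argmax(axis=1)` returns, for each row, the column that holds the 1, i.e. the integer class index that scikit-learn's metrics expect. A toy example:

```python
import numpy as np

# One-hot rows as produced by class_mode="categorical" (toy example)
y_onehot = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
], dtype=np.float32)

# argmax along axis=1 gives the position of the 1 in each row, i.e. the class index
y_indices = y_onehot.argmax(axis=1)
print(y_indices)  # [0 3 5]
```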

### 5 - load model

In [14]:
best_model_lr = joblib.load('logreg_model_final.jb')
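Step 4, which would produce `logreg_model_final.jb`, does not appear in this notebook. A hedged sketch of what such a step could look like, assuming a scikit-learn `LogisticRegression` fitted on VGG16-style features and persisted with `joblib`; the synthetic data and `max_iter=1000` are illustrative choices, not the author's settings:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-ins for feature_train (n_samples x 2048) and y_train_list
X = rng.normal(size=(60, 2048))
y = np.repeat(np.arange(6), 10)
X[np.arange(60), y] += 5.0  # make each class stand out along one feature

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "logreg_model_final.jb")
    joblib.dump(clf, model_path)       # persist the fitted model
    reloaded = joblib.load(model_path)  # what the cell above does

print((reloaded.predict(X) == clf.predict(X)).all())  # True
```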

### 6 - model predict (X_train)

In [15]:
# prediction
y_train_pred = best_model_lr.predict(feature_train)

### 7 - confusion matrix

In [16]:
cm_train = confusion_matrix(y_train_list, y_train_pred)
cm_train

array([[8000,    0,    0,    0,    0,    0],
       [   0, 7163,    0,    0,    0,    0],
       [   0,    0, 8000,    0,    0,    0],
       [   0,    0,    0, 8000,    0,    0],
       [   0,    0,    0,    0, 8000,    0],
       [   0,    0,    0,    0,    0, 8006]], dtype=int64)
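In scikit-learn's `confusion_matrix`, row *i* counts the samples whose true class is *i* and column *j* those predicted as *j*, so the diagonal holds the correct predictions. Per-class recall, per-class precision and overall accuracy can be read straight off the matrix printed above:

```python
import numpy as np

# Training-set confusion matrix printed above (rows = true class, columns = predicted)
cm = np.array([[8000,    0,    0,    0,    0,    0],
               [   0, 7163,    0,    0,    0,    0],
               [   0,    0, 8000,    0,    0,    0],
               [   0,    0,    0, 8000,    0,    0],
               [   0,    0,    0,    0, 8000,    0],
               [   0,    0,    0,    0,    0, 8006]])

correct = np.diag(cm)
recall = correct / cm.sum(axis=1)     # correct / true samples of each class
precision = correct / cm.sum(axis=0)  # correct / predictions of each class
accuracy = correct.sum() / cm.sum()   # trace / total number of samples

print(recall)
print(precision)
print(accuracy)  # 1.0
```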

### 8 - accuracy score

In [17]:
sklearn.metrics.accuracy_score(y_train_list, y_train_pred)

1.0
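This score is computed on the training images themselves. If `logreg_model_final.jb` was fitted on these same features (the notebook does not show it), 1.0 reflects how well the model fits its training data rather than an estimate for the hidden 11,791-image test set. A minimal hold-out sketch with scikit-learn's `train_test_split`, on synthetic stand-in features; the 20% split and the `random_state` values are arbitrary choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-ins for feature_train (n_samples x n_features) and y_train_list
features = rng.normal(size=(300, 64))
labels = rng.integers(0, 6, size=300)
features[np.arange(300), labels] += 3.0  # give each class a weak signal

# Keep 20% of the labelled images aside, stratified so class proportions are preserved
X_tr, X_val, y_tr, y_val = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
train_acc = accuracy_score(y_tr, clf.predict(X_tr))
val_acc = accuracy_score(y_val, clf.predict(X_val))
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")
```

In the notebook, `features` and `labels` would be `feature_train` and `y_train_list`; the validation score is the one to compare candidate models with.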

In [18]:
print(classification_report(y_train_list, y_train_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      8000
           1       1.00      1.00      1.00      7163
           2       1.00      1.00      1.00      8000
           3       1.00      1.00      1.00      8000
           4       1.00      1.00      1.00      8000
           5       1.00      1.00      1.00      8006

    accuracy                           1.00     47169
   macro avg       1.00      1.00      1.00     47169
weighted avg       1.00      1.00      1.00     47169
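The report identifies classes only by index (0 to 5). Passing `target_names` prints the folder names instead; in the notebook, `list(train_generator.class_indices)` should give them in index order, since that dict is built in index order. A sketch on toy labels, with the class list assumed from the sorted MedNIST folder names:

```python
from sklearn.metrics import classification_report

# Class names in the index order Keras assigns (sorted folder names), assumed to
# match train_generator.class_indices for the 'Data' folder used above
class_names = ["AbdomenCT", "BreastMRI", "CXR", "ChestCT", "Hand", "HeadCT"]

# Toy labels standing in for y_train_list / y_train_pred
y_true = [0, 1, 2, 3, 4, 5, 5]
y_pred = [0, 1, 2, 3, 4, 5, 4]

report = classification_report(y_true, y_pred, target_names=class_names)
print(report)
```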


