In [3]:
!git clone https://github.com/kuku-sichuan/ComparisonDetector

Cloning into 'ComparisonDetector'...



In [2]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split

# Path to the directory that contains the class folders
base_dir = 'ComparisonDetector/images'  # Replace with the path to your own directory

# List of classes (folder names)
classes = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']

# List that collects one row per image
data = []

# Iterate over every folder (class)
for class_folder in os.listdir(base_dir):
    class_path = os.path.join(base_dir, class_folder)
    
    # Only process directories whose name is one of the known classes
    if os.path.isdir(class_path) and class_folder in classes:
        # Index of the current class in `classes` (folder '1' -> index 0)
        class_index = classes.index(class_folder)
        
        # Iterate over every file in the current folder
        for filename in os.listdir(class_path):
            if filename.endswith(('.jpg', '.jpeg', '.png')):  # Keep image files only
                # Path in the form "WSI/class_folder/filename", followed by one 0 per class
                row = [f'WSI/{class_folder}/{filename}'] + [0] * len(classes)
                # Set the column of this file's class to 1 (+1 skips the image_name column)
                row[class_index + 1] = 1
                data.append(row)

# Build the DataFrame with its columns
columns = ['image_name'] + classes
df = pd.DataFrame(data, columns=columns)

# Split the data into Train (70%), Validation (15%), and Test (15%) sets
train_df, test_val_df = train_test_split(df, test_size=0.3, random_state=42)
val_df, test_df = train_test_split(test_val_df, test_size=0.5, random_state=42)

# Save the DataFrames to CSV files
train_df.to_csv('Train.csv', index=False)
val_df.to_csv('Val.csv', index=False)
test_df.to_csv('Test.csv', index=False)

print("Created Train.csv, Val.csv, and Test.csv")

Created Train.csv, Val.csv, and Test.csv
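The split above is purely random, so the share of each class can drift between Train, Validation and Test. A minimal sketch of a stratified alternative is shown below, assuming every row has exactly one `1` among the class columns; the function name `stratified_split` is hypothetical and not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def stratified_split(df, classes, seed=42):
    """Split a one-hot labelled DataFrame 70/15/15 while preserving class proportions.

    Assumes exactly one of the `classes` columns is 1 in each row.
    """
    # Recover a single label per row from the one-hot columns
    labels = df[classes].idxmax(axis=1)
    train_df, rest_df = train_test_split(
        df, test_size=0.3, random_state=seed, stratify=labels
    )
    # Stratify the second split on the labels of the remaining rows
    rest_labels = labels.loc[rest_df.index]
    val_df, test_df = train_test_split(
        rest_df, test_size=0.5, random_state=seed, stratify=rest_labels
    )
    return train_df, val_df, test_df
```

Calling `train_df, val_df, test_df = stratified_split(df, classes)` would replace the two `train_test_split` calls; the sizes stay the same, only the per-class balance changes.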


## Name: Comparison Detector

In [4]:
import pandas as pd
train = pd.read_csv('Train.csv')
test = pd.read_csv('Test.csv')
val = pd.read_csv('Val.csv')
print(train.shape)
print(test.shape)
print(val.shape)

(1094, 12)
(235, 12)
(234, 12)
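The uneven 235/234 sizes follow from how scikit-learn rounds a fractional `test_size`: the test part gets `ceil(test_size * n)` samples and the train part gets the rest. The sketch below reproduces the two chained splits under that rule; `split_sizes` is a hypothetical helper and the total of 1,563 images is simply the sum of the three shapes printed above.

```python
import math


def split_sizes(n_samples, first_test=0.3, second_test=0.5):
    """Reproduce the sizes produced by the two chained train_test_split calls.

    Assumes scikit-learn's rule for a float test_size: the test part gets
    ceil(test_size * n) samples and the train part gets the remainder.
    """
    n_rest = math.ceil(first_test * n_samples)   # test_val_df
    n_train = n_samples - n_rest                 # train_df
    n_test = math.ceil(second_test * n_rest)     # test_df (test part of 2nd split)
    n_val = n_rest - n_test                      # val_df (train part of 2nd split)
    return n_train, n_val, n_test


print(split_sizes(1094 + 234 + 235))  # -> (1094, 234, 235)
```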


In [5]:
train.head()

      image  1  2  3  4  5  6  7  8  9  10  11
0  1357.jpg  0  0  0  0  1  0  0  0  0   0   0
1   326.jpg  0  0  0  0  0  0  1  0  0   0   0
2  1327.jpg  0  1  0  0  0  0  0  0  0   0   0
3  1308.jpg  0  1  0  0  0  0  0  0  0   0   0
4  1067.jpg  0  0  0  0  0  0  0  0  1   0   0


In [8]:
# List of classes (folder names)
classes = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']

# Count the number of samples of each class
def count_samples_by_class(df, classes):
    counts = {}
    for class_name in classes:
        counts[class_name] = df[class_name].sum()
    return counts

# Per-class sample counts for each split
train_counts = count_samples_by_class(train, classes)
val_counts = count_samples_by_class(val, classes)
test_counts = count_samples_by_class(test, classes)

# Combine the counts into a DataFrame to display them as a table
counts_df = pd.DataFrame({
    'Train': train_counts,
    'Validation': val_counts,
    'Test': test_counts
})

print("Number of samples per class in each split:")
print(counts_df)

Number of samples per class in each split:
    Train  Validation  Test
1      94          26    21
2      97          19    23
3      96          25    22
4     101          27    21
5     111          15    21
6     104          24    20
7      95          18    17
8     109          19    16
9      92          23    32
10     97          21    24
11     98          17    18
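`count_samples_by_class` loops over the columns one by one; pandas can sum all one-hot columns at once. The sketch below does that and also checks that every image carries exactly one label, which the counting implicitly assumes. The helper names `class_counts` and `counts_table` are hypothetical.

```python
import pandas as pd


def class_counts(df, classes):
    """Per-class sample counts for a one-hot labelled split, with a sanity check."""
    onehot = df[classes]
    # Each image should belong to exactly one class
    bad_rows = onehot.sum(axis=1) != 1
    if bad_rows.any():
        raise ValueError(f"{int(bad_rows.sum())} rows do not have exactly one label")
    # Column-wise sum gives the same numbers as count_samples_by_class
    return onehot.sum()


def counts_table(splits, classes):
    """Combine the counts of several named splits into one table (one column per split)."""
    return pd.DataFrame({name: class_counts(df, classes) for name, df in splits.items()})
```

For example, `counts_table({'Train': train, 'Validation': val, 'Test': test}, classes)` should reproduce the table above, and calling `.sum()` on it gives the size of each split.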


- 1: hsil
- 2: ascus
- 3: agc
- 4: asch
- 5: trich
- 6: lsil
- 7: cand
- 8: actin
- 9: flora
- 10: scc
- 11: herps
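The mapping above can be kept in code to turn the numeric class IDs into readable names. A minimal sketch, where `CLASS_NAMES` and `decode_label` are hypothetical names and the abbreviations are copied verbatim from the list:

```python
# Class IDs (folder names / CSV column names) mapped to the abbreviations listed above
CLASS_NAMES = {
    "1": "hsil", "2": "ascus", "3": "agc", "4": "asch", "5": "trich", "6": "lsil",
    "7": "cand", "8": "actin", "9": "flora", "10": "scc", "11": "herps",
}


def decode_label(onehot_row):
    """Return the category name for a one-hot row given as {class_id: 0 or 1}."""
    active = [cid for cid, flag in onehot_row.items() if flag == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active class, got {active}")
    return CLASS_NAMES[active[0]]
```

With pandas, `counts_df.rename(index=CLASS_NAMES)` labels the count table by category, and `decode_label(train.iloc[0][list(CLASS_NAMES)])` works on a single row because a `Series` also provides `.items()`.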

![Some instances in the 11 categories](ComparisonDetector/images/README/categories.png)

# NOTICE!
You can get the code and dataset at the new [address](https://github.com/CVIU-CSU/ComparisonDetector).
# A NEW VERSION WILL BE RELEASED!

## Comparison-Based Convolutional Neural Networks for Cervical Cell/Clumps Detection in the Limited Data Scenario

### Abstract
Automated detection of cervical cancer cells or cell clumps has the potential to significantly
reduce the error rate and increase productivity in cervical cancer screening. However, most traditional
methods rely on accurate cell segmentation and on the extraction of discriminative hand-crafted features.
Recently, deep learning-based methods have emerged that train convolutional neural networks to classify
image patches, but they are computationally expensive. In this paper we propose an end-to-end object
detection method for cervical cancer detection. More importantly, we develop the Comparison detector, based on
Faster R-CNN with a Feature Pyramid Network (the baseline model), to deal with the limited-data problem.
The key idea is to classify region proposals by comparing them with prototype representations of each category,
which are learned from reference images. In addition, we propose to learn the prototype representation of the background
from the data instead of manually choosing reference images by heuristic rules. The Comparison detector shows a significant
improvement on the small dataset, achieving a mean Average Precision (mAP) of __26.3%__ and an Average Recall (AR) of __35.7%__,
both about __20__ points higher than the baseline model. Moreover, when trained on the medium dataset, the Comparison detector
achieves a better mAP than the baseline model and improves AR by __4.6__ points. Our method is promising for the development
of automation-assisted cervical cancer screening systems.
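To make the comparison idea concrete, here is a toy NumPy sketch; it is not the authors' implementation, which builds on Faster R-CNN with FPN in TensorFlow. It assumes that each category prototype is the mean of its reference-image embeddings, that the background prototype is simply a given vector (in the paper it is learned from data), and that proposals are scored by a softmax over negative squared Euclidean distances. All function names are hypothetical.

```python
import numpy as np


def class_prototypes(ref_embeddings):
    """Average the reference embeddings of each category into one prototype.

    ref_embeddings: dict {class_name: array of shape (n_refs, dim)}.
    """
    names = sorted(ref_embeddings)
    protos = np.stack([ref_embeddings[n].mean(axis=0) for n in names])
    return names, protos


def compare_classify(proposal_feats, names, protos, background):
    """Classify proposals by their distance to each prototype.

    proposal_feats: (n_proposals, dim); protos: (n_classes, dim);
    background: (dim,) background prototype (learned in the paper, passed in here).
    """
    all_protos = np.vstack([background[None, :], protos])   # row 0 = background
    # Squared Euclidean distance between every proposal and every prototype
    diff = proposal_feats[:, None, :] - all_protos[None, :, :]
    dists = (diff ** 2).sum(axis=-1)                         # (n_proposals, n_classes + 1)
    # Softmax over negative distances: closer prototype -> higher probability
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    labels = ["background"] + names
    return [labels[i] for i in probs.argmax(axis=1)], probs
```

Because the classifier only compares against prototypes, adding reference images changes the prototypes without adding per-class weights, which is the intuition behind using comparison in the limited-data setting.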

### Environment
* CUDA==9.1
* cuDNN==7.0
* tensorflow==1.8.0

### Downloading Data and Weights
To check the results, download the test set [here](https://pan.baidu.com/s/1BYU3DsX8J8AiaKbE43Iqgw) and put it under `tfdata/tct`. You must also download the model [weights](https://pan.baidu.com/s/1fC3fsKzwfGxq7BxvMjzC1Q) and unzip them in the home directory.
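Before running the notebooks, a quick check that the test set landed in the expected place can save a confusing failure later. This is a minimal sketch that only checks the `tfdata/tct` directory named above; it makes no assumption about the file names inside it or the name of the unzipped weights folder, and `missing_test_data` is a hypothetical helper.

```python
from pathlib import Path


def missing_test_data(repo_root="."):
    """Return a list of problems with the expected test-set location, empty if OK."""
    tct_dir = Path(repo_root) / "tfdata" / "tct"
    if not tct_dir.is_dir():
        return [f"missing directory: {tct_dir}"]
    # An existing but empty directory usually means the download was not extracted
    if not any(tct_dir.iterdir()):
        return [f"directory is empty: {tct_dir}"]
    return []


if __name__ == "__main__":
    problems = missing_test_data()
    print("test set found" if not problems else "\n".join(problems))
```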

### Evaluation and Prediction

We provide `evaluate_network.ipynb` to verify our results, and `predict.ipynb` to predict the results for a single image.

### Dataset
The dataset consists of 7,410 cervical microscopic images cropped from whole slide images (WSIs) obtained with a Pannoramic MIDI II digital slide scanner. In total, the dataset contains 48,587 instances belonging to 11 categories. We randomly divide the dataset into a training set D<sub>f</sub> of 6,666 images and a test set of 744 images. The small training set D<sub>s</sub> contains 762 images randomly chosen from D<sub>f</sub>.
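The split sizes fit together as 7,410 = 6,666 + 744, with D<sub>s</sub> drawn as a subset of D<sub>f</sub>. The sketch below mirrors that procedure on image indices; the seed and the index-based representation are assumptions for illustration, since the actual split is fixed by the released dataset files, and `split_dataset` is a hypothetical name.

```python
import random


def split_dataset(n_images=7410, n_test=744, n_small=762, seed=0):
    """Randomly split image indices into D_f and a test set, then draw D_s from D_f."""
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)
    test_set = indices[:n_test]
    d_f = indices[n_test:]              # 7410 - 744 = 6666 training images
    d_s = rng.sample(d_f, n_small)      # small training set, a subset of D_f
    return d_f, d_s, test_set


d_f, d_s, test_set = split_dataset()
print(len(d_f), len(d_s), len(test_set))  # -> 6666 762 744
```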

__Original image cropped from WSI__
<p align="center">
  <img width="450" src="https://github.com/kuku-sichuan/ComparisonDetector/blob/master/images/README/orig.jpg" />
</p>

__Some instances in the 11 categories__
<p align="center">
  <img width="500" src="https://github.com/kuku-sichuan/ComparisonDetector/blob/master/images/README/categories.png" />
</p>

The dataset is available on Google Drive [here](https://drive.google.com/drive/folders/1YzPkv6rHLNQXA6QmEUoCl9mWV9fQFsik).
