# Data Challenge: [Help a Hematologist out!](https://helmholtz-data-challenges.de/web/challenges/challenge-page/93/overview) 
*** 
<b> Group: </b> 
> #      $ BLAMAD $  
<b> Members: </b> 
> Bashir K., 
> Lea G., 
> Ankita N., 
> Martin B., 
> Arnab M., 
> Dawit H. 
***

![logo](https://github.com/christinab12/Data-challenge-logo/blob/main/logo.jpg?raw=true)

## Getting started


This notebook is a short guide to getting started with the challenge (found [here](https://helmholtz-data-challenges.de/web/challenges/challenge-page/93/overview)). It shows how to download the datasets and their labels, explore and analyze the input and output data of the challenge, run a baseline model, and create a submission file to upload to the leaderboard.

***

<b>dataset:</b>

Three datasets, each constituting a different domain, will be used for this challenge:
> 1. The Acevedo_20 dataset with labels
> 2. The Matek_19 dataset with labels
> 3. The WBC dataset <b> without labels </b> (Used for domain adaptation and performance measurement)

The Acevedo_20 and Matek_19 datasets are labeled and should be used to train the model for the domain generalization task.
A small subset of the WBC dataset, WBC1, is available for download from the start of the challenge. It is unlabeled and should be used for evaluation and domain adaptation techniques.

A second, similar subset of the WBC dataset, WBC2, becomes available for download during phase 2 of the challenge, i.e. on the last day, 24 hours before submissions close.

***
<b>Goal: </b> 

The challenge is about transfer learning, <b> specifically domain generalization (DG) and domain adaptation (DA) </b> techniques. The focus lies on using deep neural networks to classify single white blood cell images obtained from peripheral blood smears.
<b> The goal of this challenge is to achieve high performance, in particular a high macro F1 score, on the WBC2 dataset. </b>

***
<b>Notes: </b>

This challenge aims to motivate research in domain generalization and adaptation techniques:

For deep learning to be useful in medical routine, the techniques must work in realistic cases. If a peripheral blood smear is acquired from a patient and classified by a neural network, the classification has to be reliable. However, the patient's blood smear will very likely differ from the image domains used to train the network, which makes the results untrustworthy. Overcoming this obstacle and building robust, domain-invariant classifiers requires research in domain generalization and adaptation.

***
<b>f1_score: </b>
[Wikipedia](https://en.wikipedia.org/wiki/F-score)

> sklearn.metrics.f1_score(y_true, y_pred, *, labels=None, pos_label=1,<b> average='macro' </b>, sample_weight=None, zero_division='warn')

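The formula can be seen in the [scikit-learn source code](https://github.com/scikit-learn/scikit-learn/blob/36958fb24/sklearn/metrics/_classification.py#L1001) and is given as

> F1 = 2 * (precision * recall) / (precision + recall)

With `average='macro'`, F1 is computed separately for each class and the per-class scores are averaged without weighting, so rare classes count as much as frequent ones.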

<img src="/beegfs/desy/user/hailudaw/blamad/figures/f1_score.PNG" width="384" height="681">
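
As a rough illustration of what the macro average does, the sketch below recomputes it in plain Python. The helper name `macro_f1` and the toy labels are made up for this example and are not part of the challenge code; for actual scoring, `sklearn.metrics.f1_score(..., average='macro')` is the function to use.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was missed
    scores = []
    for c in labels:
        # 2*TP / (2*TP + FP + FN) is the same as 2*P*R / (P + R)
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example with three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(macro_f1(y_true, y_pred))
```

In the toy example the per-class scores are 0.5, 0.8 and 2/3, and the macro score is their plain average.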

***


<b> Downloading the data </b>

Uncomment the code below to download the datasets. Make sure you adjust the paths to match the location you download them to.

 

In [None]:
# !wget --user YraZEdrHytaCSza --password BgZL3j8DT4 https://hmgubox2.helmholtz-muenchen.de/public.php/webdav/Acevedo_20.zip -O Acevedo_20.zip #(230M) [application/zip]
# !wget --user YraZEdrHytaCSza --password BgZL3j8DT4 https://hmgubox2.helmholtz-muenchen.de/public.php/webdav/Matek_19.zip -O Matek_19.zip #(5.7G) [application/zip]
# !wget --user YraZEdrHytaCSza --password BgZL3j8DT4 https://hmgubox2.helmholtz-muenchen.de/public.php/webdav/WBC1.zip -O WBC1.zip #(357M) [application/zip]
# !wget --user YraZEdrHytaCSza --password BgZL3j8DT4 https://hmgubox2.helmholtz-muenchen.de/public.php/webdav/val_dummy.csv -O val_dummy.csv #44834 (44K) [text/csv]
# !wget --user YraZEdrHytaCSza --password BgZL3j8DT4 https://hmgubox2.helmholtz-muenchen.de/public.php/webdav/metadata2.csv -O metadata2.csv #2019059 (1.9M) [text/csv]
# print('download complete') 

# import shutil

# shutil.unpack_archive('Acevedo_20.zip', 'Datasets/Acevedo_20')
# shutil.unpack_archive('Matek_19.zip', 'Datasets/Matek_19')
# shutil.unpack_archive('WBC1.zip', 'Datasets/WBC1')
# !ls

<b> datapath </b>

In [None]:
import os
# Root folder that holds the unpacked datasets
Dataset_path = os.path.join(os.getcwd(), 'Datasets')
data_path = { 'Ace_20': os.path.join(Dataset_path, 'Acevedo_20'),
              'Mat_19': os.path.join(Dataset_path, 'Matek_19'),
              'WBC1': os.path.join(Dataset_path, 'WBC1')}
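
Before going further it can help to confirm that every dataset folder actually exists. The snippet below is only a sketch: `missing_dirs` is a hypothetical helper, and the demo runs against a temporary directory instead of the real `Datasets` folder.

```python
import os
import tempfile

def missing_dirs(paths):
    """Return the keys of `paths` whose folder does not exist (hypothetical helper)."""
    return [name for name, path in paths.items() if not os.path.isdir(path)]

# Demo on a throwaway directory: only 'Acevedo_20' is created, 'WBC1' is not
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'Acevedo_20'))
    demo_paths = {'Ace_20': os.path.join(root, 'Acevedo_20'),
                  'WBC1': os.path.join(root, 'WBC1')}
    demo_missing = missing_dirs(demo_paths)

print(demo_missing)
```

Calling `missing_dirs(data_path)` on the dictionary defined above would list any dataset that still needs to be downloaded or unpacked.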

<b> labels </b>

In [None]:
# Common classes of the datasets and their labels: 
# Highly underrepresented classes like atypical lymphocytes and smudge cells were left out.

label_map_all = {'basophil': 0, 'eosinophil': 1, 'erythroblast': 2, 'myeloblast' : 3, 'promyelocyte': 4, 'myelocyte': 5, 'metamyelocyte': 6, 'neutrophil_banded': 7, 'neutrophil_segmented': 8, 'monocyte': 9, 'lymphocyte_typical': 10}
label_map_reverse = { 0: 'basophil', 1: 'eosinophil', 2: 'erythroblast', 3: 'myeloblast', 4: 'promyelocyte', 5: 'myelocyte', 6: 'metamyelocyte', 7: 'neutrophil_banded', 8: 'neutrophil_segmented', 9: 'monocyte', 10: 'lymphocyte_typical'}

# The unlabeled WBC dataset gets the class name 'DATA-VAL' for every image
label_map_pred = {'DATA-VAL': 0 }
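
Keeping a forward and a reverse label map in sync by hand is error-prone. One option, sketched here on a shortened example map, is to derive the reverse map from the forward one. `invert_label_map` and `demo_map` are illustrative names and not part of the challenge code.

```python
def invert_label_map(label_map):
    """Build an index -> name map and fail if two names share an index."""
    reverse = {}
    for name, idx in label_map.items():
        if idx in reverse:
            raise ValueError(
                'index {} used by both {!r} and {!r}'.format(idx, reverse[idx], name))
        reverse[idx] = name
    return reverse

# Shortened example map (three of the eleven classes)
demo_map = {'basophil': 0, 'eosinophil': 1, 'erythroblast': 2}
demo_reverse = invert_label_map(demo_map)
print(demo_reverse)
```

Comparing `invert_label_map(label_map_all)` with `label_map_reverse` would reveal any spelling mismatch between the two dictionaries.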

In [None]:
# Data processing helpers such as finding_classes, metadata_generator, compute_mean,
# data_report, data_plot and crop (plus data_loader, data_augmentation, data_preprocessing,
# data_splitting, data_normalization, data_visualization)
import data_processing as dp
# Dataset generator helpers such as dataset_generator and dataset_generator_pred
import dataset_generator as dg

In [None]:
print("Labels for the datapath Ace_20: ", dp.finding_classes(data_path['Ace_20']))

metadata = dp.metadata_generator(data_path)

In [None]:
print((metadata.values[:,1]).tolist()[0:10])

In [None]:
import skimage.io as io
import matplotlib.pyplot as plt

# Load two sample images by file path (column 1 of the metadata table)
image = io.imread(metadata.values[:,1].tolist()[0])
image2 = io.imread(metadata.values[:,1].tolist()[15000])
print(image.shape)

# Show each colour channel with intensity profiles through the image centre
for ch in range(3):
    plt.subplot(1,3,ch+1, title='Channel {}'.format(ch+1), xticks=[], yticks=[])
    plt.imshow(image[:,:,ch])
    plt.plot(image[:,image.shape[1]//2,ch], color='gray', linewidth=0.5)  # centre column
    plt.plot(image[image.shape[0]//2,:,ch], color='gray', linewidth=0.5)  # centre row
    plt.axis('off')
plt.show()

# np.mean(image, axis=(0,1))

In [None]:
import tqdm
import numpy as np
import pandas as pd

# Per-channel statistics of the sample image loaded above.
# Statistics for every image are written to the metadata table further below.
for ch in range(3):
    print('Channel {}:'.format(ch+1))
    print('Mean: ', np.mean(image[:,:,ch]))
    print('Std: ', np.std(image[:,:,ch]))
    print('Max: ', np.max(image[:,:,ch]))
    print('Min: ', np.min(image[:,:,ch]))
    print('')
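
The same per-channel statistics can be computed without a Python loop by reducing over the pixel axis. This is a minimal sketch on a synthetic image; `channel_stats` and `demo` are hypothetical names, and it assumes an image array of shape (height, width, channels) as returned by `io.imread` for RGB files.

```python
import numpy as np

def channel_stats(image):
    """Mean, std, max and min of every channel of an (H, W, C) image."""
    # Flatten the spatial axes so each column holds one channel's pixels
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    return {'mean': pixels.mean(axis=0),
            'std': pixels.std(axis=0),
            'max': pixels.max(axis=0),
            'min': pixels.min(axis=0)}

# Synthetic 4x4 RGB image: constant, ramp (0..15) and saturated channel
demo = np.zeros((4, 4, 3), dtype=np.uint8)
demo[..., 0] = 10
demo[..., 1] = np.arange(16).reshape(4, 4)
demo[..., 2] = 255

stats = channel_stats(demo)
for name, values in stats.items():
    print(name, values)
```

Each entry of the returned dictionary is an array with one value per channel, which maps directly onto column groups such as `mean1`, `mean2`, `mean3`.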


In [None]:
# Append empty columns for the per-channel statistics (mean, std, max, min) to the metadata table
added = pd.DataFrame(columns=["Image", "file", "label", "dataset", "set", 'mean1', 'mean2', 'mean3', 'std1', 'std2', 'std3', 'max1', 'max2', 'max3', 'min1', 'min2', 'min3'])
metadata = pd.concat([metadata, added], axis=0)

<b> Compute per-image channel statistics in parallel and store them in the pandas DataFrame </b> 

In [274]:
import ray
if not ray.is_initialized():
    ray.init()

@ray.remote
def make_stat(metadata, idx):
    # Copy the row so the shared table in the object store is not modified
    added = metadata.values[idx].copy()
    image = io.imread(added[1])  # column 1 holds the image file path
    for ch in range(3):
        added[5+ch] = np.mean(image[:,:,ch])
        added[8+ch] = np.std(image[:,:,ch])
        added[11+ch] = np.max(image[:,:,ch])
        added[14+ch] = np.min(image[:,:,ch])
    return added

# Store the table once in the object store and pass the reference to every task
meta_ray = ray.put(metadata)
x = ray.get([make_stat.remote(meta_ray, idx) for idx in tqdm.tqdm(range(len(metadata)))])
# Rebuild the table from the computed rows (all rows, including the last one)
metadata = pd.DataFrame(x, columns=metadata.columns, index=metadata.index)

In [None]:
# The statistics were computed in the previous cell; save the table to disk
metadata.to_csv('metadata.csv', index=False)

In [273]:
metadata.head(21)

ray.shutdown()


###  Authors

> Armin Gruber

> Ali Boushehri

> Christina Bukas

> Dawit Hailu