## Classify_df

### Useful links

#### <u>Datasets</u>
Complete dataset: the full set of images used can be found [here](https://drive.google.com/drive/folders/1Rz0JrjUCU-4VmDkolbkwUcM8xW1jd9pl?usp=drive_link).

Cropped datasets: SAM was run on the complete dataset to segment the cells (the segmentations include some noise). This produced:
- these [crops](https://drive.google.com/drive/folders/1Rz0JrjUCU-4VmDkolbkwUcM8xW1jd9pl?usp=drive_link)
- these [CSV files](https://frbautneduar-my.sharepoint.com/:u:/g/personal/ntaurozzi_frba_utn_edu_ar/EYKi5F-wXGRNkAqjPSRVhvUByTsnsEB10OrZiJHclkOPWQ?e=tVwmkS), with the information of each crop.

Input/target dataset: some images from the complete dataset have been tagged by the biologists. These 58 images (for now) can be found [here](https://frbautneduar-my.sharepoint.com/:u:/g/personal/ntaurozzi_frba_utn_edu_ar/EQbvUOwADihJsJLAyVfBdYwBDvHJDMS5GQuyyP_PzUeCLQ?e=z8A7Tn), each with its corresponding target. The images here are named by the IDs assigned to them in a JSON file.

#### <u>Auxiliary files</u>
The input/target dataset was created from the complete dataset using this [JSON file](https://drive.google.com/file/d/1ydQ2fIOllwPPU64Kneda4mVidUww1X9T/view?usp=drive_link), which maps each image ID to its original file name.
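The notebook only reads the JSON through its `images` list, where each entry has an `id` and a `file_name`. A minimal sketch, assuming that structure, of building the ID-to-name lookup once as a dictionary instead of searching the list for every image; `build_id_to_name` is a hypothetical helper and the sample entries are invented:

```python
import os


def build_id_to_name(images):
    """Map each image id to its file name without extension.

    Assumes `images` is a list of dicts with 'id' and 'file_name' keys,
    as accessed via data['images'] in this notebook.
    """
    return {img["id"]: os.path.splitext(img["file_name"])[0] for img in images}


# Invented entries, for illustration only
sample_images = [
    {"id": 1, "file_name": "slide_A.jpg"},
    {"id": 2, "file_name": "slide_B.png"},
]
id_to_name = build_id_to_name(sample_images)
print(id_to_name[2])  # slide_B
```

With `data` loaded from the JSON, `build_id_to_name(data['images'])[int(img_name)]` would return the same name as `find_image_name` for IDs that exist, and raise a `KeyError` for IDs that don't.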

#### <u>Models</u>
The predictions are made with these [models](https://frbautneduar-my.sharepoint.com/:f:/g/personal/lmareque_frba_utn_edu_ar/EiDo8WYptOpEiyzJHhQIbwoBUAfsoULwRRKEm-fmgzQD-g?e=6TE9yu).

### Code walkthrough

This notebook will process all the images from the input/target dataset. It will use the json file to map the IDs to their original filenames. Then, it will search for all the crops belonging to these images and use all the models to predict whether each crop is noise or a cell. After the prediction, it will store the results in the corresponding CSV file for each image.
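The notebook finds an image's crops with `crop.startswith(real_name)` and reads the cell ID as the third `_`-separated field of the crop name. A minimal sketch of grouping all crops by image in a single pass, assuming (hypothetically) crop names of the form `<image>_<tag>_<cell_id>.<ext>` with no underscores in `<image>`; `group_crops_by_image` and the file names below are invented for illustration:

```python
import os
from collections import defaultdict


def group_crops_by_image(crop_files, image_names):
    """Group crop file names by the image they were cut from.

    Hypothetical naming assumed: '<image>_<tag>_<cell_id>.<ext>', with no
    underscores in <image>, consistent with crop_name.split('_')[2].
    Returns {image: [(crop_file, cell_id), ...]}.
    """
    wanted = set(image_names)
    groups = defaultdict(list)
    for crop in sorted(crop_files):
        stem, _ = os.path.splitext(crop)
        parts = stem.split("_")
        # Skip names that don't follow the assumed pattern or belong to other images
        if len(parts) < 3 or parts[0] not in wanted:
            continue
        groups[parts[0]].append((crop, int(parts[2])))
    return groups


# Invented file names, for illustration only
crop_files = ["imgA_crop_3.png", "imgA_crop_1.png", "imgAB_crop_7.png", "imgB_crop_2.png"]
groups = group_crops_by_image(crop_files, ["imgA", "imgB"])
print(groups["imgA"])  # [('imgA_crop_1.png', 1), ('imgA_crop_3.png', 3)]
```

Under the assumed naming, comparing the whole first field rather than using `startswith` also keeps an image named `imgA` from collecting the crops of `imgAB`.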

The output is one folder per model, under `../output/evaluated/` and named after the model file. Each folder contains one CSV per image, named by the image ID, with the bounding box information of each crop and its classification (cell or noise) in the `is_cell` column.
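To check the results, the per-image CSVs of one model can be read back and counted. A minimal sketch, assuming the `is_cell` column written by the notebook (rows that received no prediction are left empty and skipped); `summarize_model_dir` is a hypothetical helper, demonstrated on a toy CSV in a temporary folder:

```python
import os
import tempfile

import pandas as pd


def summarize_model_dir(model_dir):
    """Count crops classified as cell / noise in each per-image CSV."""
    rows = []
    for fname in sorted(os.listdir(model_dir)):
        if not fname.endswith(".csv"):
            continue
        df = pd.read_csv(os.path.join(model_dir, fname))
        # Drop rows without a prediction; compare as strings so that both
        # bool and object columns (bools mixed with NaN) are handled alike
        labels = df["is_cell"].dropna().astype(str)
        rows.append({
            "image": os.path.splitext(fname)[0],
            "cells": int((labels == "True").sum()),
            "noise": int((labels == "False").sum()),
        })
    return pd.DataFrame(rows, columns=["image", "cells", "noise"])


with tempfile.TemporaryDirectory() as tmp:
    # Toy per-image result, written with its index as the notebook does
    pd.DataFrame({"cell_id": [1, 2, 3], "is_cell": [True, False, True]}).to_csv(
        os.path.join(tmp, "12.csv"))
    summary = summarize_model_dir(tmp)

print(summary)
```

On the real output it would be called as `summarize_model_dir("../output/evaluated/<model_name>")`.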

### Imports

```python
import pandas as pd
import numpy as np
import os
import sys
import matplotlib.pyplot as plt
import joblib
import math
import keras
import cv2 as cv
import json
sys.path.insert(0, "../packages/python")
```



### Paths

```python
CSV_PATH = '../output/sam_uploaded_out/'
CROPS_PATH = '../output/cropped_cells_full/'
MODELS_PATH = '../models/'
IMAGES_PATH = '../media/data/input/'
JSON_PATH = '../media/corte-28-10-2023.json'
```

### Functions

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(model_path):
    """Load a Keras model once and reuse it for the following batches."""
    return keras.models.load_model(model_path)


def predict_cell(model_path, image_path, images_batch):
    """
    Given a batch of image names, return the model's predictions for that batch.

    Args:
        model_path: path to the Keras model to use.
        image_path: path to the folder containing the images.
        images_batch: list of the image file names to include in the batch.

    Returns:
        A NumPy array of predictions, one row per image.
    """
    images = []
    for image in images_batch:
        full_path = os.path.join(image_path, image)
        img = cv.imread(full_path)  # loaded as a 3-channel BGR image
        if img is None:
            raise FileNotFoundError(f"Could not read image: {full_path}")
        img = cv.resize(img, (128, 128))
        img = img / 255.0  # scale pixel values to [0, 1]
        images.append(img)

    model = load_model(model_path)  # cached: loaded once per model, not once per batch
    batch = np.stack(images)
    return model.predict(batch, verbose=0)


def find_image_name(data, image_id):
    """
    Given a list of image records, return the file name (without extension) for an id.

    Args:
        data: list of image records (dicts with 'id' and 'file_name' keys).
        image_id: id of the image whose file name is returned.

    Returns:
        The file name of the image, without its extension.

    Raises:
        ValueError: if no record has the given id.
    """
    for img in data:
        if img['id'] == image_id:
            base_name, _ = os.path.splitext(img['file_name'])
            return base_name
    raise ValueError(f"No image with id {image_id} in the JSON file")


def process_images_in_batches(string_list, batch_size=10):
    """
    Split a list of strings into consecutive batches of a given size.

    Args:
        string_list: the list of strings to process.
        batch_size: the size of each batch (the last one may be smaller).

    Yields:
        (batch_num, batch) tuples, where batch_num is 1-indexed.
    """
    for i in range(0, len(string_list), batch_size):
        batch_num = i // batch_size + 1
        yield batch_num, string_list[i:i + batch_size]
```
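`predict_cell` stacks the resized crops into one array, and the main loop then thresholds the first output column at 0.5. A small NumPy sketch of the shapes involved, using random arrays as stand-ins for crops already read and resized (so neither `cv2` nor a model is needed), and assuming the model returns a single probability per crop, i.e. an output of shape `(n, 1)`:

```python
import numpy as np

# Stand-ins for three crops already resized to 128x128 with 3 channels (uint8),
# as cv.imread + cv.resize would produce; random data for illustration only
rng = np.random.default_rng(0)
crops = [rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8) for _ in range(3)]

# Same preprocessing as predict_cell: scale to [0, 1] and stack into one batch
batch = np.stack([c / 255.0 for c in crops])
print(batch.shape)  # (3, 128, 128, 3)

# Assumed model output: one probability per crop, shape (n, 1)
prediction = np.array([[0.91], [0.50], [0.12]])
is_cell = (prediction[:, 0] >= 0.5).tolist()
print(is_cell)  # [True, True, False]
```

The vectorized comparison gives the same booleans as the per-row list comprehension in the prediction loop; a probability of exactly 0.5 counts as a cell.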

### Lists of elements to use

```python
csvs = sorted(os.listdir(CSV_PATH))  # SAM detection CSV of each image
crops = sorted(os.listdir(CROPS_PATH))  # crops made from the SAM detections on the full images
models = sorted(os.listdir(MODELS_PATH))  # models to use for the predictions
og_images = sorted(os.listdir(IMAGES_PATH))  # input images (named by JSON id) whose crops are classified
with open(JSON_PATH, 'r') as f:  # JSON mapping image ids to their original file names
    data = json.load(f)
```

### Model prediction

```python
batch_size = 30
for model_idx, model in enumerate(models):
    base_model, _ = os.path.splitext(model)
    output = f"../output/evaluated/{base_model}"
    os.makedirs(output, exist_ok=True)  # one output folder per model

    for og_image_idx, og_image in enumerate(og_images):
        img_name, _ = os.path.splitext(og_image)
        # Find the original name of the image the crops were made from
        real_name = find_image_name(data['images'], int(img_name))
        # Get all the crops of that image
        images = sorted([crop for crop in crops if crop.startswith(real_name)])
        # Read the SAM CSV of that image
        df = pd.read_csv(os.path.join(CSV_PATH, f"{real_name}.csv"))

        n_batches = math.ceil(len(images) / batch_size)
        for batch_idx, batch in process_images_in_batches(images, batch_size=batch_size):
            print(f"Model: {model} ({model_idx+1}/{len(models)}) - Image: {og_image} ({og_image_idx+1}/{len(og_images)}) - Batch {batch_idx}/{n_batches}", end='\r')

            batch_prediction = predict_cell(model_path=os.path.join(MODELS_PATH, model), image_path=CROPS_PATH, images_batch=batch)
            # A crop is classified as a cell when the model output is >= 0.5
            is_cell = [bool(p[0] >= 0.5) for p in batch_prediction]

            # Store the prediction of each crop in the CSV
            for crop_file, crop_is_cell in zip(batch, is_cell):
                crop_name, _ = os.path.splitext(crop_file)
                cell_id = int(crop_name.split('_')[2])

                mask = (df['cell_id'] == cell_id) & (df['image'] == real_name)
                df.loc[mask, 'is_cell'] = crop_is_cell
                df.loc[mask, 'image'] = img_name
        df.to_csv(os.path.join(output, f"{img_name}.csv"))
```
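The prediction loop builds one boolean mask over the whole DataFrame for every crop. A minimal sketch of an alternative, assuming the predictions of one image have first been collected into a `{cell_id: is_cell}` dictionary (a hypothetical intermediate that the notebook does not build): a single `Series.map` call fills `is_cell` for all rows at once, keeping the same restriction to rows whose `image` equals `real_name`. The table below is toy data with only the columns the notebook uses:

```python
import pandas as pd

# Toy SAM table: three detections of the current image and one of another image
df = pd.DataFrame({
    "image": ["slideA", "slideA", "slideA", "slideB"],
    "cell_id": [1, 2, 3, 1],
})
real_name, img_name = "slideA", "12"  # hypothetical original name and JSON id

# Predictions gathered for this image's crops, keyed by cell id (toy values)
predictions = {1: True, 2: False}

rows = df["image"] == real_name
# map() looks up each cell_id; ids without a prediction, and rows of
# other images (masked out by where), are left as NaN
df["is_cell"] = df["cell_id"].map(predictions).where(rows)
df.loc[rows, "image"] = img_name
print(df)
```

One lookup per image replaces one full-table scan per crop, which matters for images with many SAM detections.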