# Introduction: Detecting Dividing Cells

Welcome to Class 3 of the "Projects in Data Science" course! In our previous session, we explored the world of labeling and annotating cells using Napari. Today, we will take a deep dive into cell detection and focus on building a function that can identify cells undergoing division.

In this Jupyter notebook, we will leverage the knowledge and skills acquired in the previous class to develop an algorithm that detects dividing cells within images. We will use Napari to load and visualize cell images and to annotate dividing cells, and then build a custom function for automatic detection.

Throughout the notebook, we will cover the following key steps:

1. Leveraging Napari to visualize the cell images and perform manual annotations of dividing cells.


2. Exploring techniques for feature extraction and identifying distinguishing characteristics of dividing cells.


3. Implementing a detection algorithm based on the extracted features.


4. Evaluating the performance of our detection function and fine-tuning it if necessary.


By the end of this notebook, you will have gained practical experience in using Napari for cell annotation and built a functional algorithm capable of detecting dividing cells. 


Let's get started and unlock the secrets hidden within cell division!

## Setting up the Environment
To begin our exploration of cell division detection, we will first set up the environment and import the necessary libraries. In this notebook, we will be utilizing **Jupyter Lab**, a powerful interactive development environment for data science projects. Jupyter Lab provides a user-friendly interface with various features that enhance code readability and streamline the development process.

### Installing Jupyter Lab
If you haven't installed Jupyter Lab already, you can do so by running the following command in your terminal or command prompt:


`pip install jupyterlab`

### Importing Required Libraries
Let's start by importing the libraries we will be using throughout this notebook:
```python
import os

import napari
import numpy as np
import pandas as pd  # used for the annotation dataframe
import matplotlib.pyplot as plt
from skimage import io
import utils
```

### Reloading Changes with %reload_ext autoreload
During development, it is common to edit the `utils.py` file and re-run the notebook. To make sure the updated code is picked up without restarting the kernel, we use IPython's `autoreload` extension: `%reload_ext autoreload` loads the extension, and `%autoreload 2` tells it to reload all imported modules before every cell is executed.

### Downloading the Data
To perform image analysis in this notebook, we will need to download some data from a provided link. Please follow the instructions below:

1. Open the link: https://weizmann.box.com/s/m7gs1cidmlloei6905hdsqqgin2uuxjp in your web browser.

2. Click on the "Download" button to download the data file to your local machine.

3. Once the download is complete, locate the downloaded file on your machine. It may be in your "Downloads" folder or the default download location of your web browser.

4. Create a folder named "data" in the same directory where your Jupyter Notebook is located. You can do this using the file explorer or by running the following command in a code cell in your Jupyter Notebook:

```python
import os

# Create a directory named "data" (no error if it already exists)
os.makedirs("data", exist_ok=True)
```

5. Move the downloaded data file into the "data" folder that you just created.

```python
# Reload edited modules (e.g. utils.py) automatically before each cell runs
%reload_ext autoreload
%autoreload 2

import os

import napari
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import skimage
from utils import CellAnnotation
```

### Cell Annotation and Division Detection
In the previous class, we learned how to use Napari for cell annotation and performed manual annotations on a single image. Now, let's take the next step and extend our annotation process to multiple images. We will create a loop that loads the images one by one, opens each of them in Napari, and lets us annotate the cells that are dividing.

Additionally, we will generate a dataframe that captures the relevant information for each annotated pair of cells: the image name, the IDs of the two cells, and a flag indicating whether the pair is dividing or not.

### Cell Annotation Class
Now, we will create a class called `CellAnnotation` in the `utils.py` file. This class will serve as a utility for cell annotation and division detection. It will have methods to load the segmentation data and annotate cells, and it will keep track of the pairs of cells that are dividing.

In the annotate method of the `CellAnnotation` class, you will need to open the image in Napari and create two new layers: one for cells that are dividing and another one for cells that are close but not dividing. This additional layer for non-dividing cells will be useful for evaluating the results later.

Make sure to update the `utils.py` file with the following methods and attributes:

1. **upload_seg(segmentation_path):** This method takes a segmentation path as input and loads the segmentation data. You will need to implement the logic to read and process the segmentation data in this method.

2. **annotate():** This method opens the image in Napari and creates the two annotation layers described above. Inside this method, you will use the Napari library to display the image and let the user annotate the cells.

3. **cells_in_division:** This attribute stores a list of pairs of cells that are dividing. You will update it while annotating the cells in the `annotate()` method. The loop further below also reads a companion attribute, `cells_not_in_division`, which holds the pairs of close cells that are not dividing.

Your task is to complete the implementation of the `upload_seg` and `annotate` methods in the `utils.py` file. In `upload_seg`, you need to handle the logic for loading the segmentation data. In `annotate`, you need to open the image in Napari and create two new layers for cell annotation. While annotating the cells, update `cells_in_division` with the pairs of cells that are dividing (and `cells_not_in_division` with the close but non-dividing pairs).

Once you have completed the implementation of the `CellAnnotation` class, you can use it in the Jupyter Notebook to load the segmentation, annotate the cells, and capture the pairs of dividing cells.
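One possible way to fill in the two methods is sketched below, under assumptions that you should check against your own files. The `.seg.npy` file is assumed to hold a pickled dictionary with the integer label image under `"masks"` (the layout Cellpose uses for its `_seg.npy` files), and the annotator is assumed to mark each pair by drawing a line from one cell to the other in a Napari Shapes layer, for example one created inside `annotate()` with `viewer.add_shapes(name="dividing")`. `load_masks` and `lines_to_pairs` are hypothetical helper names, not part of `utils.py`; they cover only the Napari-independent part: loading the mask and turning drawn lines into pairs of cell IDs.

```python
import numpy as np


def load_masks(seg_path):
    """Load the 2D integer label image from a .seg.npy file (assumed Cellpose-style dict)."""
    seg = np.load(seg_path, allow_pickle=True).item()
    return seg['masks']


def lines_to_pairs(lines, masks):
    """Turn line annotations into sorted (cell_id, cell_id) pairs.

    `lines` is a list of (2, D) arrays of endpoint coordinates whose last two
    columns are (row, col), e.g. the `.data` list of a napari Shapes layer.
    """
    max_rc = np.array(masks.shape[-2:]) - 1
    pairs = []
    for line in lines:
        # Round endpoints to pixel indices and keep them inside the image
        pts = np.clip(np.round(np.asarray(line)[:, -2:]).astype(int), 0, max_rc)
        (r0, c0), (r1, c1) = pts
        id_a, id_b = int(masks[r0, c0]), int(masks[r1, c1])
        # Skip lines that start or end on background (0) or stay inside one cell
        if id_a == 0 or id_b == 0 or id_a == id_b:
            continue
        pairs.append(tuple(sorted((id_a, id_b))))
    return pairs
```

Once you have finished drawing, `annotate()` could then set something like `self.cells_in_division = lines_to_pairs(viewer.layers["dividing"].data, self.masks)` (with `self.masks` a hypothetical attribute filled by `upload_seg`), and do the same for the layer of close but non-dividing cells.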

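```python
# Paths are relative to the notebook: adjust them if your "data" folder is not
# one level above it (e.g. use 'data/...' if it sits next to the notebook).
img_1_path = r'../data/fov_1_hyb_0.nd2'
seg_path = r'../data/fov_1_hyb_0.seg.npy'

img_1_annot = CellAnnotation(img_1_path)
img_1_annot.upload_seg(seg_path)
img_1_annot.annotate()
img_1_annot.cells_in_division
```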

### Annotating Multiple Images and Building the DataFrame
Now that we have the `CellAnnotation` class implemented, let's use it to loop over all the files in a directory, annotate each image, and store the annotation information in a dataframe. The dataframe will have four columns: "image_name", "pair1", "pair2", and "is_dividing". The "pair1" and "pair2" columns hold the IDs of the two cells in a pair, and the "is_dividing" column indicates whether that pair is dividing or not.

**Looping over Image Files and Annotation**
```python
import os

import pandas as pd
from utils import CellAnnotation

# Folder that holds the images and their segmentations (adjust to your setup)
image_directory = '../data'

# Collect one dict per annotated pair and build the dataframe once at the end;
# DataFrame.append was removed in pandas 2.0.
rows = []

# Loop over all the image files in the directory and annotate each one
for filename in sorted(os.listdir(image_directory)):
    if not filename.endswith('.nd2'):
        continue

    # Full path to the image and to its segmentation
    # (assumes the <name>.nd2 / <name>.seg.npy naming used above)
    image_path = os.path.join(image_directory, filename)
    seg_path = os.path.splitext(image_path)[0] + '.seg.npy'

    # Create a CellAnnotation instance, load the segmentation and annotate
    annotation = CellAnnotation(image_path)
    annotation.upload_seg(seg_path)
    viewer = annotation.annotate()  # assumes annotate() returns the Napari viewer

    # Dividing pairs get is_dividing=True, close-but-not-dividing pairs get False
    for pairs, is_dividing in [(annotation.cells_in_division, True),
                               (annotation.cells_not_in_division, False)]:
        for pair in pairs:
            rows.append({
                'image_name': filename,
                'pair1': pair[0],
                'pair2': pair[1],
                'is_dividing': is_dividing,
            })

    # Close the Napari viewer before moving on to the next image
    viewer.close()

annotation_df = pd.DataFrame(rows, columns=['image_name', 'pair1', 'pair2', 'is_dividing'])

# Save the annotations to disk (the file name is only an example)
annotation_df.to_csv('annotations.csv', index=False)
```

At the end of the loop, you will have populated the `annotation_df` dataframe with the annotation information for all the images.

**Make sure that you save your annotation file after annotating!**

Feel free to modify the code according to your specific requirements or if you have additional steps to perform during the annotation process.

### Calculating Fold Change of Regionprops Features
In the previous class, we learned how to calculate `regionprops`. Now, let's further analyze the regionprops data by calculating the fold change of each feature for the pairs of cells in the `annotation_df`.

To calculate the fold change of each feature, follow these steps:

1. Iterate over each pair in the annotation_df dataframe.

2. Retrieve the regionprops data for the corresponding image using the image name as the identifier.

3. Extract the regionprops data for the two cells in the pair using their respective cell IDs.

4. Calculate the fold change for each feature by dividing the value of the feature for the second cell by the value of the feature for the first cell.

5. Append the fold change values to the annotation_df dataframe.

This information can provide insights into the changes and differences between the dividing and non-dividing cells based on the calculated region properties.
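A minimal sketch of steps 1-5 follows. It assumes you already have one label mask per image (for example loaded from its `.seg.npy` file) and keep the per-image `regionprops` tables in a dictionary keyed by image name. `props_table`, `add_fold_changes`, the `FEATURES` list and the `fc_` column prefix are illustrative choices, not names from `utils.py`. Only shape features are used, so no intensity image is needed.

```python
import pandas as pd
from skimage.measure import regionprops_table

# Illustrative shape features; pick the ones you care about
FEATURES = ['area', 'perimeter', 'eccentricity', 'solidity']


def props_table(masks, features=FEATURES):
    """regionprops features for every labeled cell, indexed by cell ID."""
    props = regionprops_table(masks, properties=['label'] + list(features))
    return pd.DataFrame(props).set_index('label')


def add_fold_changes(annotation_df, props_by_image, features=FEATURES):
    """Add one fc_<feature> column per feature: value(pair2) / value(pair1)."""
    fold = {f'fc_{feature}': [] for feature in features}
    for _, row in annotation_df.iterrows():
        # regionprops table of the image this pair comes from
        props = props_by_image[row['image_name']]
        for feature in features:
            first = props.loc[row['pair1'], feature]
            second = props.loc[row['pair2'], feature]
            fold[f'fc_{feature}'].append(second / first)
    return annotation_df.assign(**fold)
```

Because the order of `pair1` and `pair2` is arbitrary, `B/A` and `A/B` give different numbers for the same pair; if that matters for a feature, an order-independent variant such as `abs(np.log2(fold_change))` may be easier to interpret. Features that can be zero (e.g. the eccentricity of a round cell) produce `inf` or `nan` fold changes, which you may want to filter out before plotting.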

```python
# Iterate over each pair in the annotation_df dataframe.

###############
# Your Answer #
###############

# Retrieve the regionprops data for the corresponding image using the image name as the identifier.

###############
# Your Answer #
###############

# Extract the regionprops data for the two cells in the pair using their respective cell IDs.

###############
# Your Answer #
###############

# Calculate the fold change for each feature by dividing the value of the feature for the second cell by the value of the feature for the first cell.

###############
# Your Answer #
###############

# Append the fold change values to the annotation_df dataframe.

###############
# Your Answer #
###############
```


### Exploring Features for Separating Dividing and Non-Dividing Cells
Now that we have calculated the fold change of regionprops features for the pairs of cells in the annotation_df, let's explore the features further to see if any of them can effectively separate dividing and non-dividing cells. We will plot the distributions of these features for both types of cells on the same plot to facilitate visual comparison.

To explore the features, follow these steps:

1. Choose a subset of features from the calculated regionprops data that you would like to investigate for their potential to separate dividing and non-dividing cells. Let's select six features for demonstration purposes.

2. Filter the annotation_df to include only the rows corresponding to dividing cells (is_dividing=True) and non-dividing cells (is_dividing=False).

3. For each selected feature, extract the values for dividing cells and non-dividing cells separately.

4. Plot the distributions of each feature for dividing and non-dividing cells on the same plot using appropriate visualization techniques such as overlapping histograms or kernel density plots.

5. Analyze the distributions and observe if any patterns emerge. Look for features that demonstrate clear separation or significant differences in the distributions between dividing and non-dividing cells.

Plotting both distributions of each feature on the same axes lets you compare dividing and non-dividing cells directly. Features whose distributions are clearly shifted or barely overlap between the two groups are the most promising for telling them apart; features with heavily overlapping distributions are unlikely to help on their own.
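A sketch of steps 2-4 using overlapping, density-normalized histograms that share bin edges per feature, so the two groups are directly comparable even when one is much larger. It assumes the fold-change columns from the previous step exist (names such as `fc_area` are illustrative) and drops `inf`/`nan` values, which division by zero can produce. `plot_feature_distributions` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def _finite(values):
    """Return the finite numeric values of a column as a float array."""
    values = pd.to_numeric(values, errors='coerce').to_numpy(dtype=float)
    return values[np.isfinite(values)]


def plot_feature_distributions(annotation_df, features, bins=20, n_cols=3):
    """Overlay dividing vs non-dividing histograms, one panel per feature."""
    is_div = annotation_df['is_dividing'].astype(bool)
    n_rows = int(np.ceil(len(features) / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 3 * n_rows), squeeze=False)

    for ax, feature in zip(axes.flat, features):
        dividing = _finite(annotation_df.loc[is_div, feature])
        non_dividing = _finite(annotation_df.loc[~is_div, feature])
        # Shared bin edges make the two histograms directly comparable
        edges = np.histogram_bin_edges(np.concatenate([dividing, non_dividing]), bins=bins)
        ax.hist(dividing, bins=edges, alpha=0.5, density=True, label='dividing')
        ax.hist(non_dividing, bins=edges, alpha=0.5, density=True, label='not dividing')
        ax.set_title(feature)
        ax.legend()

    # Hide any unused panels in the grid
    for ax in axes.flat[len(features):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig
```

With six features and the default `n_cols=3`, this produces a 2 x 3 grid; kernel density plots would be a drop-in alternative to the histograms.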

```python
# Choose 6 features from the calculated regionprops data that you would like to investigate for their potential to separate dividing and non-dividing cells.

###############
# Your Answer #
###############

# Filter the annotation_df to include only the rows corresponding to dividing cells and non-dividing cells.

###############
# Your Answer #
###############

# For each selected feature, extract the values for dividing cells and non-dividing cells separately.

###############
# Your Answer #
###############

# Plot the distributions of each feature for dividing and non-dividing cells on the same plot using appropriate visualization techniques.

###############
# Your Answer #
###############
```

In addition to plotting the distributions of selected features, another approach to explore the potential separation between dividing and non-dividing cells is to use scatter plots. Scatter plots allow us to visualize the relationship between two features and observe if there is a clear separation between the two cell types.

To explore the features using scatter plots, follow these steps:

1. Choose two features that you would like to investigate for their potential to separate dividing and non-dividing cells.

2. Filter the `annotation_df` to include only the rows corresponding to dividing cells (is_dividing=True) and non-dividing cells (is_dividing=False).

3. Create a scatter plot with the selected features on the x-axis and y-axis, respectively. Use different colors or markers to distinguish between dividing and non-dividing cells.

4. Analyze the scatter plot and observe if there is a clear separation or clustering of the two cell types based on the selected features.

**Look for patterns such as distinct groups, clusters, or clear boundaries that differentiate the two cell types based on the selected features.**

Keep in mind that the effectiveness of separation may vary depending on the chosen features and the characteristics of your dataset. **Experimenting with different combinations of features can provide valuable insights into the discriminatory power of the selected features.**
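A minimal scatter-plot sketch for steps 2-3, with a different marker per group. Since fold changes are ratios, the optional `log=True` switches both axes to a log scale so that halving and doubling sit symmetrically around 1. `scatter_two_features` and the column names in the usage line are illustrative.

```python
import matplotlib.pyplot as plt


def scatter_two_features(annotation_df, x_feature, y_feature, log=False):
    """Scatter two features, marking dividing and non-dividing pairs differently."""
    is_div = annotation_df['is_dividing'].astype(bool)
    fig, ax = plt.subplots(figsize=(5, 4))
    for mask, label, marker in [(is_div, 'dividing', 'o'), (~is_div, 'not dividing', 'x')]:
        subset = annotation_df[mask]
        ax.scatter(subset[x_feature], subset[y_feature], marker=marker, alpha=0.7, label=label)
    if log:
        # Log axes treat a 0.5x and a 2x fold change symmetrically
        ax.set_xscale('log')
        ax.set_yscale('log')
    ax.set_xlabel(x_feature)
    ax.set_ylabel(y_feature)
    ax.legend()
    return fig, ax
```

For example, `scatter_two_features(annotation_df, 'fc_area', 'fc_solidity', log=True)`; note that on log axes, non-positive values are not shown.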

```python
# Choose two features that you would like to investigate for their potential to separate dividing and non-dividing cells.

###############
# Your Answer #
###############

# Filter the annotation_df to include only the rows corresponding to dividing cells and non-dividing cells.

###############
# Your Answer #
###############

# Create a scatter plot with the selected features on the x-axis and y-axis, respectively.

###############
# Your Answer #
###############

# Analyze the scatter plot and observe if there is a clear separation or clustering of the two cell types based on the selected features.

###############
# Your Answer #
###############
```


### Exploring Additional Features and Feature Engineering
While `regionprops` provides valuable features for analyzing cell properties, other features outside of `regionprops` may also help distinguish dividing from non-dividing cells. Feature engineering means creating new features, or combining existing ones, to capture different aspects of the data and improve classification or separation.

To explore additional features and feature engineering techniques, consider the following steps:

1. Think about cell characteristics that may be relevant to cell division but are not captured by the regionprops features. These could include features related to cell shape, texture, intensity, or spatial relationships.

2. Explore different combinations or transformations of existing features that may reveal new patterns or relationships. For example, you could calculate ratios, differences, or sums of existing features to create new composite features.

3. Consider incorporating domain knowledge or biological insights into the feature engineering process. This can help identify features that are known to be relevant to cell division or cellular processes.

4. Experiment with different feature sets and evaluate their effectiveness in separating dividing and non-dividing cells using visualization techniques, statistical tests, or machine learning algorithms.
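As an example of step 2 and of the "spatial relationships" idea in step 1, the sketch below derives pair-level features from a 2D label mask: the distance between the two centroids normalized by the equivalent diameter of the pair's mean area, an order-independent size asymmetry, and the combined area. `pair_features` and the column names are hypothetical, and whether any of them actually helps has to be checked on your own plots.

```python
import numpy as np
import pandas as pd
from skimage.measure import regionprops_table


def pair_features(masks, pairs):
    """Hypothetical pair-level features computed from a 2D label mask."""
    props = pd.DataFrame(
        regionprops_table(masks, properties=['label', 'centroid', 'area'])
    ).set_index('label')

    rows = []
    for a, b in pairs:
        pa, pb = props.loc[a], props.loc[b]
        # Distance between the two centroids, in pixels
        dist = np.hypot(pa['centroid-0'] - pb['centroid-0'],
                        pa['centroid-1'] - pb['centroid-1'])
        # Diameter of a circle whose area is the pair's mean area
        mean_diam = 2 * np.sqrt((pa['area'] + pb['area']) / 2 / np.pi)
        rows.append({
            'pair1': a,
            'pair2': b,
            'norm_centroid_dist': dist / mean_diam,
            # Order-independent size asymmetry: same value for (a, b) and (b, a)
            'abs_log2_area_ratio': abs(np.log2(pb['area'] / pa['area'])),
            'area_sum': pa['area'] + pb['area'],
        })
    return pd.DataFrame(rows)
```

To combine these with `annotation_df`, you could compute them per image, add an `image_name` column, and merge on `image_name`, `pair1` and `pair2`.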

```python
###############
# Your Answer #
###############
```

### Building a Classification Function to Determine Cell Division
Now that we have explored different features and potential feature engineering techniques, it's time to build a classification function that can determine whether two cells are dividing or not. Based on your knowledge of the features and insights gained from the analysis, you can use various approaches to classify the cells.

Here are a few suggested methods you can consider:

1. **Threshold-based approach:** Define thresholds for one or more features that can effectively separate dividing and non-dividing cells. For example, you can set a threshold on the aspect ratio and classify cells with aspect ratios above the threshold as dividing cells. Similarly, you can set thresholds on other relevant features based on your domain knowledge.

2. **Statistical methods:** Utilize statistical methods such as clustering or Gaussian mixture models to identify distinct groups or clusters of cells. Assign cells to the dividing or non-dividing group based on their cluster membership.

3. **Machine learning algorithms:** Train a classification model using machine learning algorithms such as logistic regression, decision trees, random forests, or support vector machines. Use a labeled dataset with features and corresponding labels (dividing or non-dividing) to train the model, and then use it to predict the labels for new unseen data.
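A sketch of the threshold-based approach for a single feature: it tries every observed value as a cut-off, in both directions, and keeps the one with the best balanced accuracy (the mean of the true-positive and true-negative rates). Balanced accuracy is used here, as an assumption about your data, because it is less misleading than plain accuracy if non-dividing pairs far outnumber dividing ones. `fit_threshold` and `predict_threshold` are hypothetical helpers, and the labels must contain both classes.

```python
import numpy as np


def fit_threshold(values, labels):
    """Choose a cut-off on one feature that maximizes balanced accuracy.

    Returns (score, threshold, direction); 'above' means "dividing if
    value >= threshold", 'below' means "dividing if value <= threshold".
    Both classes must be present in `labels`.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = (-1.0, None, None)
    for t in np.unique(values):
        for direction in ('above', 'below'):
            pred = predict_threshold(values, t, direction)
            # Balanced accuracy: mean of true-positive and true-negative rates
            score = (np.mean(pred[labels]) + np.mean(~pred[~labels])) / 2
            if score > best[0]:
                best = (score, t, direction)
    return best


def predict_threshold(values, threshold, direction):
    """Predict True (dividing) for values on the chosen side of the threshold."""
    values = np.asarray(values, dtype=float)
    return values >= threshold if direction == 'above' else values <= threshold
```

Fit the threshold on training pairs only and report its performance on held-out pairs; otherwise the score will be optimistic.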

To build your classification function, consider the following steps:

1. Determine the features or combination of features that you believe are most indicative of cell division based on your analysis.

2. Choose an appropriate classification method based on your problem and available resources (e.g., threshold-based, statistical, or machine learning).

3. Implement the classification function using the chosen method. If using machine learning algorithms, you can leverage libraries such as scikit-learn to facilitate the model training and prediction process.

4. Evaluate the performance of your classification function with appropriate metrics such as accuracy, precision, recall, or F1-score. Split your labeled dataset into training and test sets so that performance is measured on data the model has not seen.
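A sketch of steps 3-4 with scikit-learn, assuming the feature columns already exist in `annotation_df`. A random forest is just one of the options listed above; `train_division_classifier`, the 70/30 stratified split and `class_weight='balanced'` are illustrative choices, not a recommended configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split


def train_division_classifier(annotation_df, feature_cols, test_size=0.3, seed=0):
    """Train a random forest on labeled pairs and report held-out performance."""
    # Fold changes can be inf/nan (division by zero); drop those rows
    data = annotation_df.replace([np.inf, -np.inf], np.nan).dropna(subset=feature_cols)
    X = data[feature_cols].to_numpy(dtype=float)
    y = data['is_dividing'].astype(bool).to_numpy()

    # Stratify so both classes appear in the training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)

    # class_weight='balanced' compensates if one class is much rarer
    clf = RandomForestClassifier(n_estimators=200, class_weight='balanced', random_state=seed)
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)
    print(classification_report(y_test, y_pred, target_names=['not dividing', 'dividing']))
    return clf, f1_score(y_test, y_pred, pos_label=True)
```

Pairs from the same image can be correlated, so if you have several images, holding out whole images (e.g. with `GroupShuffleSplit` and `groups=annotation_df['image_name']`) gives a more honest estimate than a random split of pairs.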



```python
###############
# Your Answer #
###############
```