# QAFKA

## Welcome to the QAFKA Jupyter notebook


To run QAFKA, follow the instructions in the blocks below.
For more information, please visit [link to paper](https://www.google.com).

### Software Installation
The next block installs the required third-party packages and imports the modules QAFKA uses.

In [10]:
!pip install tiffcapture
!pip install torch
!pip install ipympl

from datasets import *
from dataloaders import *
from neural_network import *
from trainers import *
from utils import *
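import numpy as np  # used directly in later blocks (np.save, np.arange, np.min)
import torch  # used directly in later blocks (torch.FloatTensor, torch.optim.Adam)
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split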

print("Packages installation completed successfully")




### Parameters Initialization

Now you should specify some important parameters for your run.

**numOfBins** - Specifies the number of bins in the histogram of the number of blinking events (default: 20)

**chop** - Specifies which frames of the experiment you would like to analyze. For example: chop = \[0, 1000\] will cause QAFKA to analyze the experiment between the first frame and the 1000th frame.

**pixel_length** - Specifies the experiment's pixel size \[nm\]

**scale_size** - Specifies the resolution scaling. For example: scale_size = 3 and pixel_length = 150 \[nm\] would result in a reconstructed image with a grid size of 50 \[nm\].

**merging_radius** - Specifies the emitter merging radius for detection \[nm\]. Two clusters located within this radius of each other are treated as a single cluster.

**numOfClusters** - Specifies the number of simulated clusters in each simulated experiment (set later, in the Simulated Data Setup block).

**file_names** - Specifies the names of the TIFF files (at least one experiment is required). For example: 'first_exp.tif'.

**qualityThreshold** - Specifies the minimal fitting score for the localization block. (default: 0.85)
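
As a quick sanity check on the parameters above, the following sketch computes the reconstruction grid size from `pixel_length` and `scale_size`, and the number of frames selected by `chop`. The helper names `reconstruction_grid_nm` and `frames_analyzed` are hypothetical and not part of QAFKA, and the sketch assumes `chop` behaves like a half-open Python slice, which the notebook does not state explicitly.

```python
def reconstruction_grid_nm(pixel_length_nm, scale_size):
    """Grid size of the reconstructed image, in nanometres."""
    if scale_size <= 0:
        raise ValueError("scale_size must be positive")
    return pixel_length_nm / scale_size


def frames_analyzed(chop):
    """Number of frames in the window, assuming chop = [start, stop) like a slice."""
    start, stop = chop
    if stop <= start:
        raise ValueError("chop must satisfy start < stop")
    return stop - start


print(reconstruction_grid_nm(150, 3))  # 50.0, the example quoted above
print(reconstruction_grid_nm(157, 3))  # about 52.3, the values used in the next block
print(frames_analyzed([0, 2000]))      # 2000
```

With the values used in the next block, a `merging_radius` of 50 \[nm\] is therefore slightly smaller than one reconstruction grid cell.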

In [11]:
numOfBins = 20
chop = [0, 2000]
pixel_length = 157 #[nm]
scale_size = 3
merging_radius = 50 #[nm]
file_names = [r'D:\Project\data\CTLA4\mEos3.2.tif', r'D:\Project\data\CTLA4\mEos3.2 (1).tif']
qualityThreshold = 0.85

### Run Configuration

**LoadData** - Determines whether to load and analyze new experimental data (True) or to reuse the features of a previously analyzed experiment (False).

**FilterBeads** - Determines if an additional beads filtration algorithm is needed for the experimental data.

**CreateSimulatedData** - Determines whether to create a new simulated training set (True) or to load the previously created training set (False).

**TrainNet** - Determines if we want to train the neural network (True) or not (False).

**preTrainedModel** - Specifies the pre-trained model to load in case we do not want to train the net.

In [12]:
LoadData = True
FilterBeads = False
CreateSimulatedData = True
TrainNet = True
preTrainedModel = 'model_final_gauss'
# Add training time of the net

### Analysis Pipeline


In [13]:
if(chop[1]-chop[0]<2000):
    max_size = int((chop[1]-chop[0])/500)
else:
    max_size = int((chop[1]-chop[0])/1000)
    
resolution_nm = pixel_length/scale_size #[nm]

if(LoadData):
    trajectories, clusterCoordinations = [], []
    for i, file in enumerate(file_names):
        print("**** Analyzing Tiff number {} ****".format(i+1))
        # Load TIFF files and create data_set
        Data_Set = CreateDataSet(file, chop)
        
        # Segment the experiment before and after laser activation
        seg = segment(Data_Set, threshold=0.15, window_size=100)
        
        # Filter beads (if True)
        if(FilterBeads):
            Data_Set = Filter_beads(Data_Set)
        
        # Background noise cleaning
        Data_Set = clean_bg_noise(Data_Set, patch_length=5)
        
        # Clusters localization
        Max_Data_Set = CreateMaxDataSet(Data_Set, max_size, seg)
        DataThreshold, MaxThreshold = calc_threshold(Data_Set, Max_Data_Set)
        coordinates = LocalizeEmitters(Max_Data_Set, MaxThreshold, qualityThreshold, pixel_length, resolution_nm, merging_radius)
        
        # Create time traces for each cluster
        timeTraces = ExtractTimeTraces(Data_Set[seg:, :, :], coordinates, pixel_length, resolution_nm, qualityThreshold, DataThreshold, merging_radius)
        
        # Save the time traces and clusters locations of all experiments in a list
        trajectories.append(timeTraces)
        clusterCoordinations.append(coordinates)
        
    # The coordinates of all experiments are saved as 'clusterCoordinations.npy'
    np.save('clusterCoordinations', clusterCoordinations)

    # Extract the features that would serve as the neural network's input
    X_test = feature_extraction(trajectories, DataThreshold, numOfBins)
else:
    # Load features of an already analyzed experiment
    X_test = LoadFinalDataSet()

print("Experimental Data was loaded successfully")

**** Analyzing Tiff number 1 ****
-I- Found segmentation in frame: 240
-I- Background noise was filtered
Emitter is out of bound: 439
Bad fitting grade: 11
Emitters intensity is too low: 44
-I- found 105 emitters




Blink too far from emitters: 5745
Emitter is out of bound: 121
Bad fitting grade: 40
Emitters intensity is too low: 4518
-I- updated emitters time traces
**** Analyzing Tiff number 2 ****
-I- Found segmentation in frame: 266
-I- Background noise was filtered
Emitter is out of bound: 9
Bad fitting grade: 20
Emitters intensity is too low: 36
-I- found 244 emitters
Blink too far from emitters: 6263
Emitter is out of bound: 28
Bad fitting grade: 9
Emitters intensity is too low: 5189
-I- updated emitters time traces
Experimental Data was loaded successfully




### Plot Histograms
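
The implementation of `feature_extraction` is not shown in this notebook. As a rough illustration of what each row of `X_test` might represent, the sketch below builds a histogram of the number of blinking events per cluster with one bin per integer count from 1 to `numOfBins`. The function name `blink_count_histogram` and the example counts are hypothetical, and the real feature vector may be binned or normalized differently.

```python
import numpy as np


def blink_count_histogram(blinks_per_cluster, num_bins=20):
    """Count how many clusters showed 1, 2, ..., num_bins blinking events."""
    counts = np.asarray(blinks_per_cluster)
    # Edges 0.5, 1.5, ..., num_bins + 0.5 give exactly one bin per integer count.
    edges = np.arange(0.5, num_bins + 1.5)
    hist, _ = np.histogram(counts, bins=edges)
    return hist


# Hypothetical blink counts for seven clusters.
example_counts = [1, 1, 2, 3, 3, 3, 7]
hist = blink_count_histogram(example_counts, num_bins=20)
print(hist)
```

In this sketch, clusters with more than `num_bins` blinks fall outside the last edge and are dropped; whether QAFKA clips or discards such clusters is not stated here.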

In [14]:
%matplotlib widget

for i in range(X_test.shape[0]):
    plt.figure()
    plt.plot(np.arange(1, numOfBins + 1), X_test[i])
    plt.xlabel('Bins')
    plt.ylabel('Counts')
    plt.xticks(np.arange(1, numOfBins + 1))
    plt.title("Exp " + str(i+1) + " histogram")
    plt.show()


### Export to Excel

In [15]:
for i in range(X_test.shape[0]):
    np.savetxt("Exp_"+str(i+1)+"_histogram.csv", X_test[i], delimiter=',')
    np.savetxt("Exp_"+str(i+1)+"_localization.csv", clusterCoordinations[i], delimiter=',')

print("Data export completed successfully")

Data export completed successfully


### Visualize Localizations
The next block plots a maximum-projection image of the last experiment with the localizations marked on it.

In [16]:
if(LoadData):
    debug_entire_exp(Max_Data_Set, coordinates, scale_size)


### Simulated Data Setup

If you chose to simulate the training data, specify the following parameters:

**numOfClusters** - Specifies the number of simulated clusters in each simulation (relevant only if CreateSimulatedData is set to True).

**bleach_proba** - Specifies the bleaching probability of the used fluorophore.

**TrainSetSize** - Specifies the number of simulated experiments to be created.

In [12]:
numOfClusters = 200
bleach_proba = 0.41
TrainSetSize = 10000

### Create Simulated Training Data

In [13]:
if(CreateSimulatedData):
    [X, y] = CreateSimulatedDataSet(TrainSetSize, numOfClusters, bleach_proba, numOfBins)
else:
    [X, y] = LoadSimulatedDataSet()

X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.75)
[X_train, X_val, X_test] = Normalization(X_train, X_val, X_test)
[X_train, X_val, X_test] = BiasTrick(X_train, X_val, X_test)
y_train = torch.FloatTensor(y_train)
y_val = torch.FloatTensor(y_val)

print("-I- Simulated Data was created successfully")

-I- Simulated Data was created successfully


### Build Model

In the next block we will build the neural network model.

**lr** - Specifies the training phase learning rate.

**betas** - Specifies the (beta1, beta2) coefficients of the Adam optimizer.

**batch_size** - Specifies the batch size of the training phase.

**epochs** - Specifies the maximum number of training epochs.

**early_stopping** - Specifies how many epochs the training tolerates without improvement in the validation loss. For example: early_stopping = 5 would stop the training phase if the validation loss did not improve for 5 consecutive epochs.
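
The early-stopping rule described above can be sketched in a few lines. This is a minimal illustration under the assumption that "improvement" means a strictly lower validation loss than the best seen so far; the `Trainer` class used in this notebook may implement it differently, and the function name `epoch_of_early_stop` and the loss values are hypothetical.

```python
def epoch_of_early_stop(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses: the best value is reached at epoch 5.
losses = [0.30, 0.20, 0.21, 0.22, 0.19, 0.20, 0.20, 0.21, 0.22, 0.23]
stop_epoch = epoch_of_early_stop(losses, patience=5)
print(stop_epoch)
```

With the defaults in the next block (epochs = 1000), the expression `np.min((int(epochs/5), 15))` evaluates to 15, so training would tolerate 15 epochs without improvement.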

In [14]:
lr = 1e-5
betas = (0.99, 0.999)
batch_size = 4
epochs = 1000
early_stopping = np.min((int(epochs/5), 15))

### Training Phase

In [44]:
if(TrainNet):
    model = CustomNet(torch.numel(X_train[0]), [128, 128, 128, 128])
    
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=betas)

    dl_train = CreateDataLoader(X_train, y_train, batch_size=batch_size)
    dl_val = CreateDataLoader(X_val, y_val, batch_size=1)

    # ================= Train Net ================
    trainer = Trainer(model, criterion, optimizer)
    trainer.fit(dl_train, dl_val, num_epochs=epochs, early_stopping=early_stopping, print_every=1)
    torch.save(trainer.model.state_dict(), 'model_final_gauss')

--- EPOCH 1/1000 ---




Epoch 1 : Train loss = 0.032010581344366074
Epoch 1 : Validation loss = 0.0034867837093770504
--- EPOCH 2/1000 ---
Epoch 2 : Train loss = 0.002741472329944372
Epoch 2 : Validation loss = 0.0021230459678918123
--- EPOCH 3/1000 ---
Epoch 3 : Train loss = 0.0019213355844840407
Epoch 3 : Validation loss = 0.0016796514391899109
--- EPOCH 4/1000 ---
Epoch 4 : Train loss = 0.0016077193431556225
Epoch 4 : Validation loss = 0.0015210711862891912
--- EPOCH 5/1000 ---
Epoch 5 : Train loss = 0.0014739680336788297
Epoch 5 : Validation loss = 0.0014502544654533267
--- EPOCH 6/1000 ---
Epoch 6 : Train loss = 0.0014031744794920087
Epoch 6 : Validation loss = 0.0014080804539844394
--- EPOCH 7/1000 ---
Epoch 7 : Train loss = 0.0013563685351982713
Epoch 7 : Validation loss = 0.0013794208643957973
--- EPOCH 8/1000 ---
Epoch 8 : Train loss = 0.0013223503483459353
Epoch 8 : Validation loss = 0.001358283800072968
--- EPOCH 9/1000 ---
Epoch 9 : Train loss = 0.001295644324272871
Epoch 9 : Validation loss = 0.0

### Load Pre-trained Model

In [18]:
model = CustomNet(torch.numel(X_train[0]), [128, 128, 128, 128])
model.load_state_dict(torch.load(preTrainedModel))
print("Pretrained model loaded successfully")

Pretrained model loaded successfully


### Testing Phase

In [19]:
y_val_pred = model(X_val)
y_test_pred = model(X_test).squeeze()
y_test_pred = torch.max(y_test_pred, torch.zeros(y_test_pred.shape))

# Mean absolute error (MAE) between the predictions and the validation labels
val_mae = torch.mean(torch.abs(y_val_pred.squeeze() - y_val))
print("Neural Network Validation MAE [%]:", 100 * val_mae.item())

print("Printing dimer percentage per experiment:")
if(y_test_pred.shape == torch.Size([])):
    print("1: ", 100 * y_test_pred.item())
else:
    for i in range(y_test_pred.shape[0]):
        print(str(i+1)+": ", 100 * y_test_pred[i].item())

Neural Network Validation MAE [%]: 2.1616650745272636
Dimers Percentage Predictions Per Experiment:
1:  44.82960104942322
2:  33.570170402526855


### Detection Efficiency Correction
Please specify the detection efficiency in your experiment.

In [20]:
detection_efficiency = 0.78

### Calculate Final Predictions

In [22]:
print("Printing corrected dimer percentage per experiment:")
if(y_test_pred.shape == torch.Size([])):
    print("1: ", 100 * find_actual_dimers_percentage(y_test_pred.item(), detection_efficiency))
else:
    for i in range(y_test_pred.shape[0]):
        print(str(i+1)+": ", 100 * find_actual_dimers_percentage(y_test_pred[i].item(), detection_efficiency))

Printing corrected dimer percentage per experiment:
1:  65.79285869963569
2:  47.54001045759694
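
The implementation of `find_actual_dimers_percentage` is not shown in this notebook. One simple model that agrees with the corrected values printed above to several digits assumes that every molecule is detected independently with probability p (the detection efficiency). A true dimer is then observed as a dimer with probability p², as a monomer with probability 2p(1-p), and a true monomer is observed with probability p. For a true dimer fraction d this gives an observed fraction d·p / (1 + d·(1 - p)), which can be inverted in closed form. The function name `corrected_dimer_fraction` is hypothetical, and the actual QAFKA function may use a different model or a numerical solver.

```python
def corrected_dimer_fraction(observed, p):
    """Invert observed = d*p / (1 + d*(1 - p)) for the true dimer fraction d."""
    if not 0.0 < p <= 1.0:
        raise ValueError("detection efficiency must be in (0, 1]")
    denominator = p - observed * (1.0 - p)
    if denominator <= 0.0:
        raise ValueError("observed fraction is too high for this detection efficiency")
    return observed / denominator


# Uncorrected predictions from the Testing Phase output, as fractions.
for observed in (0.4482960104942322, 0.33570170402526855):
    print(100 * corrected_dimer_fraction(observed, 0.78))
```

With a perfect detection efficiency (p = 1) the correction leaves the prediction unchanged, which is a useful check on any alternative model.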
