## Combining and visualising .csv data

#### Code created by Deniz Bekat (*deniz.bekat@crick.ac.uk*)

This code is designed to combine data from different images, saved in the form of `.csv` files (as in the stack property and cell counting Notebooks). In this notebook we:

- Collect all `.csv` files in your folder and combine them into one `pandas` DataFrame
- Normalise your measurements where needed, to account for differences between samples
- Produce a plot of your data using `seaborn` to visualise your quantitative analysis!

### **Importing libraries**

Python has several **modules and packages** that provide the tools to analyse our images; we use the `import` statement to make them available within this code:

In [None]:
#loading required modules

import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import glob
from scipy.stats import linregress
import seaborn as sns

### Combining your data into one DataFrame

The module `pandas` can store `.csv` files in the form of **DataFrames**, which can then be used for easy analysis of your data. 

This section finds your folder of `.csv` files and **combines** them into one DataFrame for further analysis. Each row is tagged with the name of the file it came from in a new `Image` column.

In [None]:


folder_path = "" #path to the folder holding your .csv files (an empty string searches the current working directory)
csv_files = glob.glob(os.path.join(folder_path, "*.csv")) #finds every file ending in .csv (should be all of your data!)

dfs = []
for file in csv_files:
    df = pd.read_csv(file)
    df["Image"] = os.path.basename(file).replace(".csv", "")
    dfs.append(df)

# Combine all
all_data = pd.concat(dfs, ignore_index=True)

In [None]:
all_data.head() #shows the first 5 entries of data - make sure that the data is how you expect it to look!
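
The combining step can also be wrapped in a small function. The following is a minimal sketch, not part of the original notebook: the function name `combine_csvs`, the file names and the example values are hypothetical, and it assumes you would rather get a clear error when a folder contains no `.csv` files than a less obvious one from `pd.concat`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_csvs(folder):
    """Read every .csv file in `folder` into one DataFrame, tagged by file name."""
    csv_paths = sorted(Path(folder).glob("*.csv"))
    if not csv_paths:
        # pd.concat cannot combine an empty list, so stop early with a clear message
        raise FileNotFoundError(f"No .csv files found in {folder!r}")

    frames = []
    for path in csv_paths:
        frame = pd.read_csv(path)
        frame["Image"] = path.stem  # file name without the .csv extension
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demonstration on two made-up files written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame(
        {"Imaging depth": [0, 50, 100], "nuclei_count": [12, 9, 4]}
    ).to_csv(Path(tmp) / "image_A.csv", index=False)
    pd.DataFrame(
        {"Imaging depth": [0, 100], "nuclei_count": [20, 7]}
    ).to_csv(Path(tmp) / "image_B.csv", index=False)
    demo = combine_csvs(tmp)

print(demo)
```

To use it on your own data, call `combine_csvs` with the path to your folder in place of the temporary one.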

### OPTIONAL: Normalise your data!

Your data may come from different microscope systems or acquisition settings, giving different intensity ranges that make the files difficult to combine. Here, we rescale your *mean* and *standard deviation* data to a percentage scale, where 0% is the minimum and 100% is the maximum value in each dataset.

This is applied per `.csv` file, so measurements from different images end up on a comparable scale.

In [None]:

def normalise_mean(df): #rescales mean intensity to 0-100% using the minimum and maximum mean intensity in this dataset

    dmin = df['Mean Intensity'].min()
    dmax = df['Mean Intensity'].max()
    df['Normalised Mean'] = (df['Mean Intensity'] - dmin) / (dmax - dmin) * 100

    return df

def normalise_std(df): #rescales the standard deviation to 0-100% using the minimum and maximum standard deviation in this dataset

    dmin = df['Std Dev'].min()
    dmax = df['Std Dev'].max()
    df['Normalised Std Dev'] = (df['Std Dev'] - dmin) / (dmax - dmin) * 100

    return df


# Apply normalisation per image
all_data = all_data.groupby("Image", group_keys=False).apply(normalise_mean)
all_data = all_data.groupby("Image", group_keys=False).apply(normalise_std)
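
As an alternative sketch, the same min-max rescaling can be written with `groupby(...).transform`, which works on one column at a time and leaves the `Image` column untouched. Recent pandas releases warn when `groupby(...).apply` operates on the grouping column, which this approach avoids. The helper name `minmax_percent` and the example values are made up for illustration; the column names match those used above.

```python
import numpy as np
import pandas as pd


def minmax_percent(values):
    """Rescale a Series to 0-100%; a constant Series gives NaN rather than dividing by zero."""
    span = values.max() - values.min()
    if span == 0:
        return pd.Series(np.nan, index=values.index)
    return (values - values.min()) / span * 100


# Made-up measurements for two images
demo = pd.DataFrame({
    "Image": ["A", "A", "A", "B", "B"],
    "Mean Intensity": [10.0, 20.0, 30.0, 100.0, 300.0],
    "Std Dev": [1.0, 2.0, 3.0, 5.0, 5.0],
})

# Rescale each column separately within each image
for column, new_name in [
    ("Mean Intensity", "Normalised Mean"),
    ("Std Dev", "Normalised Std Dev"),
]:
    demo[new_name] = demo.groupby("Image")[column].transform(minmax_percent)

print(demo)
```

Image `B` has the same standard deviation in every row, so its normalised standard deviation is `NaN`: with no spread in the values there is no meaningful 0-100% scale.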


### Plotting your data with `seaborn`

The library `seaborn` builds on `matplotlib` to make simple, high-quality plots. In this case, we draw a line plot for each image in your dataset to show how your variables change with imaging depth; just change the `y=` line of code to plot whichever measurement you want (and update the y-axis label to match)!

In [None]:
plt.figure(figsize=(9, 6))

for t, df_t in all_data.groupby("Image"):
    sns.lineplot(
        data=df_t,
        x="Imaging depth",
        y="nuclei_count", #change this to whatever you want to measure!
        units="Image",
        estimator=None,
        lw=1.5,
        marker='o',
        label=t  # legend entry for this image
    )


plt.grid(True)
plt.xlabel("Imaging depth (%)", fontsize=20)
plt.ylabel("Cell count", fontsize=20)

plt.tight_layout()
plt.show() # copy the plot and use it wherever! Enjoy the 'clear' results :)
