# Processing, Clustering and Visualising 3k PBMCs in MDV

The data analysis part of this tutorial is based on the scanpy tutorial "Preprocessing and clustering 3k PBMCs (legacy workflow)", found here: https://scanpy.readthedocs.io/en/stable/tutorials/index.html . If you would like a fuller explanation of the analysis steps, please check the scanpy tutorial.

The data consists of 3k PBMCs from a healthy donor and is available from 10x Genomics. To obtain it, download the archive from http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz, create a new directory named "data" at the same level as this Jupyter notebook, and unpack the archive inside the "data" directory. On a Unix system, both steps can be done from the command line.
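
The download-and-unpack step described above can also be scripted in Python. The following is a minimal sketch using only the standard library; the helper name `download_and_unpack` and the constant `PBMC3K_URL` are illustrative and not part of scanpy or mdvtools. The call is left commented out, so that nothing is downloaded until you choose to run it.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL quoted in the text above
PBMC3K_URL = (
    "http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/"
    "pbmc3k_filtered_gene_bc_matrices.tar.gz"
)


def download_and_unpack(url: str, data_dir: str = "data") -> Path:
    """Download a .tar.gz archive into data_dir and unpack it there."""
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)

    # name the local file after the last component of the URL
    archive = target / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the download if the archive is already present
        urllib.request.urlretrieve(url, archive)

    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target


# uncomment to download and unpack the data next to this notebook
# download_and_unpack(PBMC3K_URL)
```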

#### Importing the required packages for data preprocessing

In [13]:
import numpy as np
import pandas as pd
import scanpy as sc

#### Importing the required packages for MDV set up and visualisation

In [14]:
import os
import json
from mdvtools.mdvproject import MDVProject
from mdvtools.charts.row_chart import RowChart
from mdvtools.charts.dot_plot import DotPlot
from mdvtools.charts.histogram_plot import HistogramPlot
from mdvtools.charts.scatter_plot_3D import ScatterPlot3D
from mdvtools.charts.scatter_plot import ScatterPlot
from mdvtools.charts.table_plot import TablePlot

## Data analysis section

In [15]:
# scanpy parameters for feedback level setting
sc.settings.verbosity = 3  # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header() # printing a header of introductory information about the environment/library used

scanpy==1.10.1 anndata==0.9.2 umap==0.5.6 numpy==1.26.4 scipy==1.13.0 pandas==2.2.2 scikit-learn==1.4.2 statsmodels==0.14.2 igraph==0.11.4 pynndescent==0.5.12


In [16]:
# reading the AnnData object from an .h5ad file

adata = sc.read_h5ad("../../../../../Documents/TAURUS_data/bcell_viz_ready_revised.h5ad")

#### Data preprocessing

In [17]:
# Saving count data
adata.layers["counts"] = adata.X.copy()

# total-count normalising the anndata object to 10,000 reads per cell, so that counts become comparable among cells
sc.pp.normalize_total(adata, target_sum=1e4)

# logarithmising the data
sc.pp.log1p(adata)


normalizing counts per cell
    finished (0:00:00)
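
To make the two preprocessing steps concrete, here is a small NumPy sketch of the arithmetic on a toy count matrix (rows are cells, columns are genes). It is an illustration of total-count normalisation followed by log1p, not scanpy's implementation; the function name `normalise_and_log` and the handling of all-zero cells are assumptions of this sketch.

```python
import numpy as np


def normalise_and_log(counts, target_sum=1e4):
    """Scale every cell (row) to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # assumption: leave all-zero cells as zeros
    scaled = counts / totals * target_sum
    return np.log1p(scaled)


# two toy cells with very different sequencing depth
toy_counts = np.array([[1, 3],
                       [10, 10]])
toy_normalised = normalise_and_log(toy_counts)

# after undoing the log, every cell sums to the same total
print(np.expm1(toy_normalised).sum(axis=1))
```

After this transformation the first cell's values correspond to 2,500 and 7,500 normalised counts and the second cell's to 5,000 each, so the cells can be compared despite their different depths.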


#### Converting the anndata object to pandas dataframes

In [18]:
# cells dataframe 

cells_df = pd.DataFrame(adata.obs)

# adding the umap data to the cells dataframe
umap_np = np.array(adata.obsm["X_umap"])
cells_df["UMAP 1"] = umap_np[:, 0]
cells_df["UMAP 2"] = umap_np[:, 1]

cells_df["Cell ID"] = adata.obs.index

cells_df.rename(columns={"sub_bucket": "Cell type", "final_analysis": "Cell state", "MM_scaled": "Inflammation score", 
                         "sample_id": "Sample ID"}, inplace=True)

# keeping a subset of the columns, selected by position
cells_df = cells_df.iloc[:, [0,16,17,18,19,20,21,22,23,24,25,26,27]]


# genes dataframe
gene_table = adata.var.copy()  # copy, so that adding a column leaves adata.var unchanged
gene_table["gene_ids"] = gene_table.index
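
Selecting columns by position, as in the cell above, depends on the exact column order of `adata.obs`. The sketch below shows the same steps (attach the embedding, rename, subset) on a toy dataframe, selecting columns by name instead. The toy values, the `unused` column, and the `keep` list are made up for illustration; only the renamed column labels come from the notebook.

```python
import numpy as np
import pandas as pd

# toy stand-ins for adata.obs and adata.obsm["X_umap"]
obs = pd.DataFrame(
    {"sub_bucket": ["B naive", "B memory"],
     "sample_id": ["s1", "s2"],
     "unused": [0, 1]},
    index=["cell_a", "cell_b"],
)
embedding = np.array([[0.1, 1.2],
                      [2.3, 3.4]])

cells = obs.copy()
cells["UMAP 1"] = embedding[:, 0]
cells["UMAP 2"] = embedding[:, 1]
cells["Cell ID"] = cells.index

# rename to display-friendly labels, then keep columns by name
cells = cells.rename(columns={"sub_bucket": "Cell type", "sample_id": "Sample ID"})
keep = ["Cell ID", "Cell type", "Sample ID", "UMAP 1", "UMAP 2"]
cells = cells[keep]

print(cells)
```

Selecting by name fails with a clear `KeyError` if a column is missing, whereas positional selection silently picks up whichever column happens to sit at that index.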

## MDV charts set up section

The different types of charts are defined in the mdvtools/charts/ folder. These are:
1. Abundance box plot
2. Box plot
3. Density scatter plot
4. Dot plot
5. Heatmap plot
6. Histogram plot
7. Multi-line plot
8. Ring chart
9. Row chart
10. Sankey plot
11. Scatter plot
12. Scatter plot 3D
13. Stacked row plot
14. Table
15. Violin plot
16. Wordcloud

In [19]:
# Cells charts

# creating a row chart based on the Row Chart implementation to show the leiden clustering
# row_chart = RowChart(
#     title="leiden",
#     param="leiden",
#     position=[380, 270],
#     size=[260, 260]
# )

# # configuring the row chart
# row_chart.set_axis_properties("x", {"textSize": 13, "label": "", "tickfont": 10})


#creating a dot plot based on the DotPlot implementation to show the gene expression of selected gene markers
# dot_plot = DotPlot(
#     title="leiden",
#     params=["Disease", "HSPE1", "HSPD1", "ICOS", "ID1", "ID2", "ID3", "ID4"],
#     size=[400, 250],
#     position=[10, 10]
# )

# # configuring the dot plot
# dot_plot.set_axis_properties("x", {"label": "", "textSize": 13, "tickfont": 10})
# dot_plot.set_axis_properties("y", {"label": "", "textSize": 13, "tickfont": 10})
# dot_plot.set_axis_properties("ry", {"label": "", "textSize": 13, "tickfont": 10})
# dot_plot.set_color_scale(log_scale=False)
# dot_plot.set_color_legend(True, [40, 10])
# dot_plot.set_fraction_legend(True, [0, 0])


# # creating a histogram plot based on the HistogramPlot implementation to show the distribution of the number of genes detected per cell
# histogram_plot = HistogramPlot(
#     title="n_genes_by_counts",
#     param="n_genes_by_counts",
#     bin_number=17,
#     display_min=0,
#     display_max=2500,
#     size=[360, 250],
#     position=[10, 280]
# )

# # configuring the histogram
# histogram_plot.set_x_axis(size=30, label="n_genes_by_counts", textsize=13, tickfont=10)
# histogram_plot.set_y_axis(size=45, label="frequency", textsize=13, tickfont=10, rotate_labels=False)

# creating a scatter plot based on the ScatterPlot implementation to show the two UMAP components of the B cells
scatter_plot = ScatterPlot(
    title="B cells",
    params=["UMAP 1", "UMAP 2"],
    size=[250, 250],
    position=[420, 10],
    default_color="#377eb8",
    brush="default",
    on_filter="hide",
    radius=5,
    opacity=0.8,
)

# configuring the scatter plot
#scatter_plot.set_color_by("leiden")
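
Each chart above takes a `size` and a `position` as a pair of numbers. Assuming these are `[width, height]` and `[x, y]` in pixels, positions for several charts can be computed rather than typed by hand. The helper below is a hypothetical convenience, not part of mdvtools: it fills a grid row by row with a fixed margin.

```python
def grid_positions(sizes, columns=2, margin=10):
    """Return one [x, y] position per chart, filling a grid row by row.

    sizes is a sequence of (width, height) pairs, in the order the
    charts should appear.
    """
    positions = []
    x = y = margin
    row_height = 0
    for i, (width, height) in enumerate(sizes):
        if i > 0 and i % columns == 0:  # start a new row
            x = margin
            y += row_height + margin
            row_height = 0
        positions.append([x, y])
        x += width + margin
        row_height = max(row_height, height)
    return positions


# three chart sizes taken from the chart definitions in this notebook
chart_sizes = [(400, 250), (250, 250), (360, 250)]
layout = grid_positions(chart_sizes, columns=2)
print(layout)  # [[10, 10], [420, 10], [10, 270]]
```

The first two positions match the hand-written values used for the dot plot and the scatter plot in this notebook.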


In [20]:
# Genes charts

# creating a scatter plot using the ScatterPlot implementation to show the 2 PCA clustering components
# scatter2D_plot = ScatterPlot(
#     title="PCs_1 x PCs_2",
#     params=["PCs_1", "PCs_2"],
#     size=[250, 250],
#     position=[270, 10]
# )

# # configuring the scatter plot
# scatter2D_plot.set_default_color("#377eb8")
# scatter2D_plot.set_brush("poly")
# scatter2D_plot.set_opacity(0.8)
# scatter2D_plot.set_radius(2)
# scatter2D_plot.set_color_legend(display=True, position=[20, 1])
# scatter2D_plot.set_axis_properties("x", {"label": "PCs_1", "textSize": 13, "tickfont": 10})
# scatter2D_plot.set_axis_properties("y", {"label": "PCs_2", "textSize": 13, "tickfont": 10})
# scatter2D_plot.set_color_by("dispersions")

# # creating a table using the TablePlot implementation to show all the data associated with the genes dataframe
# table_plot = TablePlot(
#     title="Data table",
#     params= gene_table.columns.values.tolist() ,
#     size=[250, 500],
#     position=[10, 10]
# )

# creating a histogram using the HistogramPlot implementation to show the distribution of the number of cells per gene
# histogram_plot_2 = HistogramPlot(
#     title="n_cells",
#     param="n_cells",
#     bin_number=17,
#     display_min=0,
#     display_max=2500,
#     size=[240, 240],
#     position=[270, 280]
# )

# # configuring the histogram
# histogram_plot_2.set_x_axis(size=30, label="n_cells", textsize=13, tickfont=10)
# histogram_plot_2.set_y_axis(size=45, label="frequency", textsize=13, tickfont=10, rotate_labels=False)

## Set up and serve the MDV project

In [21]:
# setting up and serving the MDV project
base = os.path.expanduser('~/mdv')
project_path = os.path.join(base, 'taurus') # defining the location where the project metadata will be stored
p = MDVProject(project_path, delete_existing=True)  # project_path is already expanded via base

# adding the two data sources to the project
p.add_datasource("cells", cells_df)
p.add_datasource("genes", gene_table)

# linking the two datasources so that the expression of selected genes can be added to the cells datasource as columns
p.add_rows_as_columns_link("cells", "genes", "gene_ids", "Gene Expression")
# adding the values saved in the "counts" layer as the "Gene scores" subgroup of the link
p.add_rows_as_columns_subgroup("cells", "genes", "Gene scores", adata.layers["counts"].toarray())

# converting the chart implementation outputs to JSON and setting up the project view
list_charts_cells = []
list_charts_genes = []

# cells panel
# list_charts_cells.append(row_chart.plot_data)
#list_charts_cells.append(dot_plot.plot_data)
# list_charts_cells.append(histogram_plot.plot_data)
list_charts_cells.append(scatter_plot.plot_data)

# genes panel
# list_charts_genes.append(table_plot.plot_data)
# list_charts_genes.append(scatter2D_plot.plot_data)
#list_charts_genes.append(histogram_plot_2.plot_data)

# setting the config combining the two panels
view_config = {'initialCharts': {"cells": list_charts_cells, "genes":list_charts_genes}}

# adding the view to the project configuration
p.set_view("default", view_config)
p.set_editable(True)
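
The view configuration built in the cell above is a plain dictionary: `initialCharts` maps each datasource name to a list of chart configurations. Before handing it to the project, it can be useful to inspect it as JSON. The sketch below uses a placeholder chart dictionary; its keys are assumptions for illustration, since the real content comes from each chart object's `plot_data`. The helper `summarise_view` is hypothetical.

```python
import json

# placeholder chart config; real ones come from chart.plot_data
cell_charts = [{"title": "B cells", "param": ["UMAP 1", "UMAP 2"]}]
gene_charts = []

view_config = {"initialCharts": {"cells": cell_charts, "genes": gene_charts}}


def summarise_view(view):
    """Map each datasource name to the titles of its initial charts."""
    return {
        datasource: [chart.get("title", "<untitled>") for chart in charts]
        for datasource, charts in view["initialCharts"].items()
    }


# serialising confirms that the config contains only JSON-compatible values
serialised = json.dumps(view_config, indent=2)
summary = summarise_view(json.loads(serialised))
print(summary)  # {'cells': ['B cells'], 'genes': []}
```

An empty list, as for the genes panel here, simply means that the datasource has no initial charts in this view.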

In [22]:
# serving the project
p.serve()


created Flask <Flask 'mdvtools.server'>
 * Serving Flask app 'mdvtools.server'
 * Debug mode: on


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5050
 * Running on http://129.67.46.115:5050
Press CTRL+C to quit
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET / HTTP/1.1" 200 -


recieved request to project_index


127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /static/assets/mdv.css HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /static/js/mdv.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /static/assets/createSvgIcon-6gFOQ5xZ.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /static/assets/csvWorker-e8cMUEqV.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /datasources.json HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /state.json HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /views.json HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "POST /get_view HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /static/assets/filteredIndexWorker-CEl1713S.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /static/assets/filteredIndexWorker-CEl1713S.js HTTP/1.1" 304 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "POST /get_data HTTP/1.1" 200 -
127.0.0.1 - - [29/Oct/2024 10:37:13] "GET /static/img/fa-solid-900.woff2 HTTP/1.1" 20
