# scMultiomics Analysis-Data

## A longitudinal single-cell atlas of treatment response in pediatric AML

### 1. Data Download and Setup

The data analysed here come from the paper "A longitudinal single-cell atlas of treatment response in pediatric AML" (PMID: 37977148). The raw archive (GSE235063_RAW.tar) can be downloaded from GEO at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE235063 via the https download link on that page.
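If you prefer to script the download instead of clicking the link, a minimal sketch could look like the following. The endpoint and the `format=file` query are assumptions based on the usual form of GEO's download link and are not stated in this notebook; `geo_raw_url` and `download_geo_raw` are hypothetical helper names.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen
import shutil

# Assumed GEO download endpoint (not taken from the notebook itself)
GEO_DOWNLOAD = "https://www.ncbi.nlm.nih.gov/geo/download/"


def geo_raw_url(accession):
    """Build the (assumed) download URL for a series' RAW.tar archive."""
    return GEO_DOWNLOAD + "?" + urlencode({"acc": accession, "format": "file"})


def download_geo_raw(accession, dest_dir="data"):
    """Stream <accession>_RAW.tar into dest_dir, skipping an existing file."""
    dest = Path(dest_dir) / f"{accession}_RAW.tar"
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        with urlopen(geo_raw_url(accession)) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


# Usage (a large download, so it is left commented out):
# download_geo_raw("GSE235063")
```

Because this sketch writes the archive straight into `data/`, the manual move step further down would not be needed when it is used.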

In [28]:
# !pip install scanpy
# !pip install doubletdetection
# !pip install notebook

In [3]:
import os
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt
import gzip

In [4]:
# Create a "data" directory in the working folder (-p: no error if it already exists)
!mkdir -p data

In [7]:
# Move the downloaded GSE235063_RAW.tar archive into the new "data" folder (adjust the source path to your download location)
!mv /Users/thorsten/code/ThorstenCodes/Bioinformatics_TK/Projects/Multiomics/GSE235063_RAW.tar ./data

In [16]:
# Extract the archive into the data folder; -C sets the extraction directory, so no directory change is needed
!tar -xf data/GSE235063_RAW.tar -C data


In [30]:
os.listdir('data')

['GSM7494282_AML17_DX_raw_features.tsv.gz',
 'GSM7494271_AML7_DX_processed_matrix.mtx.gz',
 'GSM7494316_AML25_DX_processed_metadata.tsv.gz',
 'GSM7494302_AML22_REM_processed_metadata.tsv.gz',
 'GSM7494266_AML15_DX_processed_metadata.tsv.gz',
 'GSM7494291_AML9_REM_raw_features.tsv.gz',
 'GSM7494308_AML24_REM_processed_metadata.tsv.gz',
 'GSM7494318_AML25_REM_raw_matrix.mtx.gz',
 'GSM7494269_AML3_DX_processed_matrix.mtx.gz',
 'GSM7494321_AML26_REM_processed_metadata.tsv.gz',
 'GSM7494310_AML23_REL_processed_barcodes.tsv.gz',
 'GSM7494267_AML15_REL_processed_features.tsv.gz',
 'GSM7494313_AML28_REM_processed_barcodes.tsv.gz',
 'GSM7494260_AML6_DX_processed_matrix.mtx.gz',
 'GSM7494267_AML15_REL_raw_barcodes.tsv.gz',
 'GSM7494264_AML2_REL_raw_barcodes.tsv.gz',
 'GSM7494286_AML27_DX_processed_matrix.mtx.gz',
 'GSM7494259_AML16_REM_processed_metadata.tsv.gz',
 'GSM7494305_AML21_REM_processed_barcodes.tsv.gz',
 'GSM7494309_AML23_DX_processed_matrix.mtx.gz',
 'GSM7494326_AML12_DX_processed_fea
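The file names in this listing appear to follow the pattern `<GSM id>_<patient>_<timepoint>_<raw|processed>_<kind>`. As a rough sanity check after extraction, the sketch below groups the files by sample and reports samples that lack a matrix, barcodes or features file. The regular expression is inferred from the names visible above, so names elsewhere in the archive may not fit it; `group_by_sample` and `incomplete_samples` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Pattern inferred from the listing above; other names in the archive may differ.
NAME_RE = re.compile(
    r"^(?P<gsm>GSM\d+)_(?P<patient>[^_]+)_(?P<timepoint>[^_]+)"
    r"_(?P<stage>raw|processed)_(?P<kind>[a-z]+)\.(?:tsv|mtx)\.gz$"
)


def group_by_sample(filenames):
    """Map (gsm, patient, timepoint, stage) -> set of file kinds found."""
    groups = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name)
        if match:  # names that do not fit the pattern are skipped
            key = (match["gsm"], match["patient"], match["timepoint"], match["stage"])
            groups[key].add(match["kind"])
    return dict(groups)


def incomplete_samples(filenames, required=("matrix", "barcodes", "features")):
    """Return samples that lack one of the required file kinds, with what is missing."""
    return {
        key: sorted(set(required) - kinds)
        for key, kinds in group_by_sample(filenames).items()
        if not set(required) <= kinds
    }


# Usage in the notebook:
# incomplete_samples(os.listdir("data"))
```

An empty dictionary would mean that every recognised sample has its full set of files.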

In [37]:
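# Peek into a gzipped (.gz) file: use gzcat on macOS; on Linux the equivalent command is zcat.
# head closes the pipe after 10 lines, so the "Broken pipe" messages in the output are harmless.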
!gzcat data/GSM7494282_AML17_DX_raw_features.tsv.gz | head

ENSG00000243485	MIR1302-2HG
ENSG00000237613	FAM138A
ENSG00000186092	OR4F5
ENSG00000238009	AL627309.1
ENSG00000239945	AL627309.3
ENSG00000239906	AL627309.2
ENSG00000241599	AL627309.4
ENSG00000236601	AL732372.1
ENSG00000284733	OR4F29
ENSG00000235146	AC114498.1
gzcat: error writing to output: Broken pipe
gzcat: data/GSM7494282_AML17_DX_raw_features.tsv.gz: uncompress failed
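The same peek can be done in Python, which avoids both the gzcat/zcat difference between platforms and the "Broken pipe" messages. This is a small sketch using only the standard library; `peek_gz` is a hypothetical helper name.

```python
import gzip
from itertools import islice


def peek_gz(path, n=10):
    """Return the first n lines of a gzipped text file, without trailing newlines."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Usage in the notebook:
# for line in peek_gz("data/GSM7494282_AML17_DX_raw_features.tsv.gz"):
#     print(line)
```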


In [38]:
# Store the names of all files in data/ in the list 'files' (same listing as shown above)
files = os.listdir('data/')

In [39]:
# List comprehension keeping only the files whose name contains 'feature' (raw and processed)
feature_files = [x for x in files if 'feature' in x]

In [41]:
# Create a temporary directory for the rewritten feature files
!mkdir -p temp

In [44]:
# For each feature file, read the gzipped two-column table (Ensembl ID, gene symbol) from data/
# and write a copy to temp/ with a third, tab-separated column containing "Gene Expression"
# (the three-column layout used by 10x Genomics features.tsv files).
# Only the trailing newline is removed before the new column is appended.
# Note: re-running this cell after the files were moved back to data/ would append the column again.

for ff in feature_files:
    with gzip.open('data/' + ff, 'rt') as f_in, gzip.open('temp/' + ff, 'wt') as f_out:
        for line in f_in:
            f_out.write(line.rstrip('\n') + "\tGene Expression\n")

In [46]:
!gzcat temp/GSM7494282_AML17_DX_raw_features.tsv.gz | head

ENSG00000243485	MIR1302-2HG	Gene Expression
ENSG00000237613	FAM138A	Gene Expression
ENSG00000186092	OR4F5	Gene Expression
ENSG00000238009	AL627309.1	Gene Expression
ENSG00000239945	AL627309.3	Gene Expression
ENSG00000239906	AL627309.2	Gene Expression
ENSG00000241599	AL627309.4	Gene Expression
ENSG00000236601	AL732372.1	Gene Expression
ENSG00000284733	OR4F29	Gene Expression
ENSG00000235146	AC114498.1	Gene Expression
gzcat: error writing to output: Broken pipe
gzcat: temp/GSM7494282_AML17_DX_raw_features.tsv.gz: uncompress failed
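Looking at one file with `head` only checks ten lines of a single sample. A possible way to check every rewritten file is sketched below; it flags any feature file in which some line does not have exactly three tab-separated columns, which would also catch a file that was rewritten twice. `column_counts` and `files_not_three_columns` are hypothetical helpers, and the glob pattern assumes the file naming seen in the listing.

```python
import gzip
from pathlib import Path


def column_counts(path):
    """Return the set of tab-separated column counts found in a gzipped TSV."""
    with gzip.open(path, "rt") as handle:
        return {len(line.rstrip("\n").split("\t")) for line in handle}


def files_not_three_columns(directory, pattern="*features.tsv.gz"):
    """List feature files in which some line does not have exactly three columns."""
    return sorted(
        path.name
        for path in Path(directory).glob(pattern)
        if column_counts(path) != {3}
    )


# Usage in the notebook (an empty list means every file has three columns throughout):
# files_not_three_columns("temp")
```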


In [47]:
# Overwrite the original feature files in data/ with the three-column versions
!mv temp/* data/

So now we are set to go 🚀
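To illustrate what the prepared files can be used for, here is a hedged sketch of loading a single sample with scanpy. It assumes that each sample has a complete `<prefix>matrix.mtx.gz`, `<prefix>barcodes.tsv.gz` and `<prefix>features.tsv.gz` triplet in `data/`, and it relies on the `prefix` argument of `sc.read_10x_mtx`. `raw_prefixes` and `load_sample` are hypothetical helper names, not part of the original workflow.

```python
from pathlib import Path


def raw_prefixes(filenames, suffix="raw_matrix.mtx.gz"):
    """Derive per-sample prefixes such as 'GSM7494318_AML25_REM_raw_'."""
    return sorted(
        name[: -len("matrix.mtx.gz")] for name in filenames if name.endswith(suffix)
    )


def load_sample(data_dir, prefix):
    """Read one sample's matrix/barcodes/features triplet into an AnnData object."""
    import scanpy as sc  # imported lazily so raw_prefixes works without scanpy

    return sc.read_10x_mtx(Path(data_dir), prefix=prefix)


# Usage in the notebook:
# prefix = raw_prefixes(os.listdir("data"))[0]
# adata = load_sample("data", prefix)
```

Raw matrices contain all barcodes before any filtering, so loading one may take noticeably more memory than the processed version.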

### 2. Removing ambient RNA using CellBender (Command Line)

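**Note:** CellBender (https://github.com/broadinstitute/CellBender) is a command-line tool that is prone to dependency conflicts. It requires Python 3.7, so we use pyenv to create an isolated environment. The `pyenv virtualenv` and `pyenv activate` commands below are provided by the pyenv-virtualenv plugin, which must be installed alongside pyenv.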

⚙️ Step-by-Step Setup (Using pyenv)

    Run all the steps in your terminal, not inside this notebook.

1️⃣ **Install Python 3.7.17**

`pyenv install 3.7.17`

2️⃣ **Create a New Virtual Environment**

`pyenv virtualenv 3.7.17 CellBender`

3️⃣ **(Optional) Verify Environment Creation**

`pyenv versions`

You should see something like:

  3.7.17/envs/CellBender

4️⃣ **Activate the Environment**

`pyenv activate CellBender`

Once activated, your prompt will change to show:

(CellBender) ➜

✅ You're Now Ready to Install and Run CellBender

`pip install cellbender`  
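Before installing, it can be worth confirming that the activated environment really runs Python 3.7. The snippet below is a small sketch of such a check, to be run with the environment's `python`; `matches_required` is a hypothetical helper and the required version is simply the one stated above.

```python
import sys


def matches_required(version_info=sys.version_info, required=(3, 7)):
    """True if the interpreter's major.minor version equals the required one."""
    return tuple(version_info[:2]) == tuple(required)


if not matches_required():
    print(f"Warning: running Python {sys.version.split()[0]}, expected 3.7.x")
```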