--------

## <span style="color:cornflowerblue"> 1. Loading and writing hepatoblastoma (HB) patient samples</span>
This single-cell sequencing analysis is based on two studies: Wu PV, et al. (Sep 12, 2024), *Single cell RNA sequencing of primary human hepatoblastoma tumoroids*, and Münter D., et al. (Feb 07, 2025), *Multiomic analysis uncovers a continuous spectrum of differentiation and Wnt-MDK-driven immune evasion in hepatoblastoma (snRNA-seq)*. Both datasets were obtained from the [Gene Expression Omnibus](https://www.ncbi.nlm.nih.gov/geo/) (GEO) portal:

1. **First study:** Wu PV, et al. (Sep 12, 2024) [Single cell RNA sequencing of primary human hepatoblastoma tumoroids](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE233923). The 10x matrix files were downloaded for a subset of samples: 10 hepatoblastoma tumor samples, as a *tar.gz* archive (494.2 Mb).

2. **Second study:** Münter D., et al. (Feb 07, 2025) [Multiomic analysis uncovers a continuous spectrum of differentiation and Wnt-MDK-driven immune evasion in hepatoblastoma (snRNA-seq)](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE283205). A *tar.gz* archive (83.6 Mb) containing 5 files, corresponding to tumor samples, was downloaded.
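
The cells in this notebook expect one folder per sample (for example `GSM7439126_HB1`) holding that sample's three 10x files. As a minimal sketch of how a downloaded archive could be unpacked into that layout, the helper below groups archive members by their file-name prefix. The function names and the archive path are hypothetical, and the sketch assumes the archive contains files named `<sample>_barcodes.tsv.gz`, `<sample>_features.tsv.gz` and `<sample>_matrix.mtx.gz`, as seen later in this notebook.

```python
import shutil
import tarfile
from collections import defaultdict
from pathlib import Path

# Suffixes of the three files that make up one 10x matrix in this notebook.
TENX_SUFFIXES = ("_barcodes.tsv.gz", "_features.tsv.gz", "_matrix.mtx.gz")


def sample_name(filename):
    """Return the sample part of a 10x file name, or None for any other file."""
    for suffix in TENX_SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return None


def extract_by_sample(archive, out_dir):
    """Extract 10x files from a tar archive into one folder per sample.

    Hypothetical helper: returns {sample_name: [extracted file names]}.
    """
    out_dir = Path(out_dir)
    extracted = defaultdict(list)
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(archive, mode="r:*") as tar:
        for member in tar.getmembers():
            name = Path(member.name).name  # drop any directory part
            sample = sample_name(name)
            if sample is None or not member.isfile():
                continue  # skip anything that is not a 10x matrix file
            target_dir = out_dir / sample
            target_dir.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src, open(target_dir / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted[sample].append(name)
    return dict(extracted)
```

Writing each member explicitly, rather than calling `extractall`, keeps files inside `out_dir` even if the archive stores unexpected directory paths.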

After downloading the files and before preprocessing, the data are cleaned so that each object keeps only the number of genes detected per cell (n_genes), the gene identifiers (gene_ids), and the number of cells in which each gene is detected (n_cells).


### <span style="color:navy">1.0 Environment and data setup</span>

We load the required Python libraries. The Scanpy API provides access to a full set of single-cell analysis tools built on NumPy and SciPy. pandas handles tabular data import, and the AnnData package provides the annotated data structure that Scanpy operates on. Matplotlib and Seaborn are imported for visualization.

In [1]:
# For downloads and file handling
import gdown, os, gzip, shutil

# Basic data handling and plotting
import pandas as pd
import numpy as np
from scipy import sparse
import matplotlib.pyplot as plt

# Scanpy fundamentals
import anndata as ad
import scanpy as sc
import seaborn as sb

# sc.settings.set_figure_params(dpi=200, frameon=False)
sc.set_figure_params(figsize=(6, 6))
import scvi



In [2]:
%cd /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB

/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB




In [3]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439126_HB1"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439126_HB1_barcodes.tsv.gz",
    "GSM7439126_HB1_features.tsv.gz",
    "GSM7439126_HB1_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439126_HB1/GSM7439126_HB1_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439126_HB1/GSM7439126_HB1_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439126_HB1/GSM7439126_HB1_matrix.mtx.gz
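
The same decompression step is repeated for every sample in this notebook. A minimal sketch of how it could be wrapped in a reusable function is shown here. `decompress_10x` and its `overwrite` flag are hypothetical names, not part of the original notebook. It assumes every `*.gz` file in the folder should be decompressed next to its source.

```python
import gzip
import shutil
from pathlib import Path


def decompress_10x(sample_dir, overwrite=False):
    """Decompress every *.gz file in sample_dir; return the paths written."""
    written = []
    for gz_path in sorted(Path(sample_dir).glob("*.gz")):
        # with_suffix("") strips only the trailing ".gz",
        # e.g. "x_matrix.mtx.gz" -> "x_matrix.mtx".
        out_path = gz_path.with_suffix("")
        if out_path.exists() and not overwrite:
            continue  # keep existing files unless overwrite is requested
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        written.append(out_path)
    return written
```

A call such as `decompress_10x(base_path)` would then stand in for the explicit file list and loop. Unlike `file.replace('.gz', '')`, `with_suffix("")` cannot alter a `.gz` substring that appears earlier in the file name.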


In [6]:
adata_HB1 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439126_HB1",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439126_HB1_"
)

In [7]:
adata_HB1.var_names_make_unique()

In [8]:
sc.pp.filter_genes(adata_HB1, min_cells =10)
sc.pp.filter_cells(adata_HB1, min_genes =200)

In [9]:
adata_HB1.var = adata_HB1.var.drop(columns=['feature_types'])
adata_HB1.var_names_make_unique()
adata_HB1

AnnData object with n_obs × n_vars = 4340 × 15995
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'
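
The read, filter and clean-up cells above are applied unchanged to every sample that follows. As a sketch under stated assumptions, they could be collected into one helper. `load_sample` and `file_prefix` are hypothetical names, and the sketch assumes each sample folder is named like its file prefix without the trailing underscore, as is the case for `GSM7439126_HB1`. The Scanpy calls and thresholds are the ones already used in this notebook.

```python
from pathlib import Path


def file_prefix(sample_dir):
    """Derive the 10x file prefix from the folder name (assumed convention)."""
    return Path(sample_dir).name + "_"


def load_sample(sample_dir, min_cells=10, min_genes=200):
    """Read one 10x sample folder and apply the same filters as the cells above."""
    # Imported here so the helpers can be defined even where scanpy is absent.
    import scanpy as sc

    adata = sc.read_10x_mtx(
        sample_dir,
        var_names="gene_symbols",
        cache=False,
        prefix=file_prefix(sample_dir),
    )
    adata.var_names_make_unique()
    # Keep genes seen in at least min_cells cells (adds var['n_cells']).
    sc.pp.filter_genes(adata, min_cells=min_cells)
    # Keep cells with at least min_genes detected genes (adds obs['n_genes']).
    sc.pp.filter_cells(adata, min_genes=min_genes)
    # Drop the column that is not needed downstream.
    adata.var = adata.var.drop(columns=["feature_types"])
    return adata
```

With this helper, `adata_HB1 = load_sample("/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439126_HB1")` would be expected to reproduce the object shown above, since it runs the same steps in the same order.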

------------

In [10]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439127_HB4"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439127_HB4_barcodes.tsv.gz",
    "GSM7439127_HB4_features.tsv.gz",
    "GSM7439127_HB4_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439127_HB4/GSM7439127_HB4_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439127_HB4/GSM7439127_HB4_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439127_HB4/GSM7439127_HB4_matrix.mtx.gz


In [11]:
adata_HB4 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439127_HB4",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439127_HB4_"
)

In [12]:
adata_HB4.var_names_make_unique()

In [13]:
sc.pp.filter_genes(adata_HB4, min_cells =10)
sc.pp.filter_cells(adata_HB4, min_genes =200)

In [14]:
adata_HB4.var = adata_HB4.var.drop(columns=['feature_types'])
adata_HB4.var_names_make_unique()
adata_HB4

AnnData object with n_obs × n_vars = 7856 × 17356
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

------------

In [15]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439128_HB6"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439128_HB6_barcodes.tsv.gz",
    "GSM7439128_HB6_features.tsv.gz",
    "GSM7439128_HB6_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439128_HB6/GSM7439128_HB6_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439128_HB6/GSM7439128_HB6_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439128_HB6/GSM7439128_HB6_matrix.mtx.gz


In [16]:
adata_HB6 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439128_HB6",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439128_HB6_"
)

In [17]:
adata_HB6.var_names_make_unique()

In [18]:
sc.pp.filter_genes(adata_HB6, min_cells =10)
sc.pp.filter_cells(adata_HB6, min_genes =200)

In [19]:
adata_HB6.var = adata_HB6.var.drop(columns=['feature_types'])
adata_HB6.var_names_make_unique()
adata_HB6

AnnData object with n_obs × n_vars = 4490 × 18295
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

-------------

In [20]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439129_HB7"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439129_HB7_barcodes.tsv.gz",
    "GSM7439129_HB7_features.tsv.gz",
    "GSM7439129_HB7_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439129_HB7/GSM7439129_HB7_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439129_HB7/GSM7439129_HB7_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439129_HB7/GSM7439129_HB7_matrix.mtx.gz


In [22]:
adata_HB7 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439129_HB7",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439129_HB7_"
)

In [23]:
adata_HB7.var_names_make_unique()

In [24]:
sc.pp.filter_genes(adata_HB7, min_cells =10)
sc.pp.filter_cells(adata_HB7, min_genes =200)

In [25]:
adata_HB7.var = adata_HB7.var.drop(columns=['feature_types'])
adata_HB7.var_names_make_unique()
adata_HB7

AnnData object with n_obs × n_vars = 8378 × 18151
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

------------

In [26]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439130_HB12"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439130_HB12_barcodes.tsv.gz",
    "GSM7439130_HB12_features.tsv.gz",
    "GSM7439130_HB12_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439130_HB12/GSM7439130_HB12_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439130_HB12/GSM7439130_HB12_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439130_HB12/GSM7439130_HB12_matrix.mtx.gz


In [27]:
adata_HB12 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439130_HB12",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439130_HB12_"
)

In [28]:
adata_HB12.var_names_make_unique()

In [29]:
sc.pp.filter_genes(adata_HB12, min_cells =10)
sc.pp.filter_cells(adata_HB12, min_genes =200)

In [30]:
adata_HB12.var = adata_HB12.var.drop(columns=['feature_types'])
adata_HB12.var_names_make_unique()
adata_HB12

AnnData object with n_obs × n_vars = 2318 × 15273
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

--------------

In [31]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439131_HB13"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439131_HB13_barcodes.tsv.gz",
    "GSM7439131_HB13_features.tsv.gz",
    "GSM7439131_HB13_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439131_HB13/GSM7439131_HB13_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439131_HB13/GSM7439131_HB13_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439131_HB13/GSM7439131_HB13_matrix.mtx.gz


In [32]:
adata_HB13 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439131_HB13",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439131_HB13_"
)

In [33]:
adata_HB13.var_names_make_unique()

In [34]:
sc.pp.filter_genes(adata_HB13, min_cells =10)
sc.pp.filter_cells(adata_HB13, min_genes =200)

In [35]:
adata_HB13.var = adata_HB13.var.drop(columns=['feature_types'])
adata_HB13.var_names_make_unique()
adata_HB13

AnnData object with n_obs × n_vars = 3394 × 16009
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

----------------

In [36]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439132_HB14"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439132_HB14_barcodes.tsv.gz",
    "GSM7439132_HB14_features.tsv.gz",
    "GSM7439132_HB14_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439132_HB14/GSM7439132_HB14_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439132_HB14/GSM7439132_HB14_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439132_HB14/GSM7439132_HB14_matrix.mtx.gz


In [37]:
adata_HB14 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439132_HB14",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439132_HB14_"
)

In [38]:
adata_HB14.var_names_make_unique()

In [39]:
sc.pp.filter_genes(adata_HB14, min_cells =10)
sc.pp.filter_cells(adata_HB14, min_genes =200)

In [40]:
adata_HB14.var = adata_HB14.var.drop(columns=['feature_types'])
adata_HB14.var_names_make_unique()
adata_HB14

AnnData object with n_obs × n_vars = 4870 × 13891
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

------------

In [41]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439133_HB15"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439133_HB15_barcodes.tsv.gz",
    "GSM7439133_HB15_features.tsv.gz",
    "GSM7439133_HB15_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439133_HB15/GSM7439133_HB15_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439133_HB15/GSM7439133_HB15_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439133_HB15/GSM7439133_HB15_matrix.mtx.gz


In [42]:
adata_HB15 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439133_HB15",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439133_HB15_"
)

In [43]:
adata_HB15.var_names_make_unique()

In [44]:
sc.pp.filter_genes(adata_HB15, min_cells =10)
sc.pp.filter_cells(adata_HB15, min_genes =200)

In [45]:
adata_HB15.var = adata_HB15.var.drop(columns=['feature_types'])
adata_HB15.var_names_make_unique()
adata_HB15

AnnData object with n_obs × n_vars = 2704 × 16179
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

------------------

In [46]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439134_HB16"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439134_HB16_barcodes.tsv.gz",
    "GSM7439134_HB16_features.tsv.gz",
    "GSM7439134_HB16_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439134_HB16/GSM7439134_HB16_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439134_HB16/GSM7439134_HB16_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439134_HB16/GSM7439134_HB16_matrix.mtx.gz


In [47]:
adata_HB16 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439134_HB16",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439134_HB16_"
)

In [48]:
adata_HB16.var_names_make_unique()

In [49]:
sc.pp.filter_genes(adata_HB16, min_cells =10)
sc.pp.filter_cells(adata_HB16, min_genes =200)

In [50]:
adata_HB16.var = adata_HB16.var.drop(columns=['feature_types'])
adata_HB16.var_names_make_unique()
adata_HB16

AnnData object with n_obs × n_vars = 2869 × 16014
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

-----------------

In [51]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439135_HB17"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM7439135_HB17_barcodes.tsv.gz",
    "GSM7439135_HB17_features.tsv.gz",
    "GSM7439135_HB17_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439135_HB17/GSM7439135_HB17_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439135_HB17/GSM7439135_HB17_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439135_HB17/GSM7439135_HB17_matrix.mtx.gz


In [52]:
adata_HB17 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM7439135_HB17",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM7439135_HB17_"
)

In [53]:
adata_HB17.var_names_make_unique()

In [54]:
sc.pp.filter_genes(adata_HB17, min_cells =10)
sc.pp.filter_cells(adata_HB17, min_genes =200)

In [55]:
adata_HB17.var = adata_HB17.var.drop(columns=['feature_types'])
adata_HB17.var_names_make_unique()
adata_HB17

AnnData object with n_obs × n_vars = 1420 × 16081
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

---------------

In [56]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657808_H678"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM8657808_H678_barcodes.tsv.gz",
    "GSM8657808_H678_features.tsv.gz",
    "GSM8657808_H678_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657808_H678/GSM8657808_H678_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657808_H678/GSM8657808_H678_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657808_H678/GSM8657808_H678_matrix.mtx.gz


In [57]:
adata_HB78 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657808_H678",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM8657808_H678_"
)

In [58]:
adata_HB78.var_names_make_unique()

In [59]:
sc.pp.filter_genes(adata_HB78, min_cells =10)
sc.pp.filter_cells(adata_HB78, min_genes =200)

In [60]:
adata_HB78.var = adata_HB78.var.drop(columns=['feature_types'])
adata_HB78.var_names_make_unique()
adata_HB78

AnnData object with n_obs × n_vars = 1601 × 12251
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

----------------

In [61]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657809_H679"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM8657809_H679_barcodes.tsv.gz",
    "GSM8657809_H679_features.tsv.gz",
    "GSM8657809_H679_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657809_H679/GSM8657809_H679_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657809_H679/GSM8657809_H679_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657809_H679/GSM8657809_H679_matrix.mtx.gz


In [62]:
adata_HB79 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657809_H679",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM8657809_H679_"
)

In [63]:
adata_HB79.var_names_make_unique()

In [64]:
sc.pp.filter_genes(adata_HB79, min_cells =10)
sc.pp.filter_cells(adata_HB79, min_genes =200)

In [65]:
adata_HB79.var = adata_HB79.var.drop(columns=['feature_types'])
adata_HB79.var_names_make_unique()
adata_HB79

AnnData object with n_obs × n_vars = 3568 × 13658
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

--------------

In [66]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657810_H691"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM8657810_H691_barcodes.tsv.gz",
    "GSM8657810_H691_features.tsv.gz",
    "GSM8657810_H691_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657810_H691/GSM8657810_H691_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657810_H691/GSM8657810_H691_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657810_H691/GSM8657810_H691_matrix.mtx.gz


In [67]:
adata_HB91 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657810_H691",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM8657810_H691_"
)

In [68]:
adata_HB91.var_names_make_unique()

In [69]:
sc.pp.filter_genes(adata_HB91, min_cells =10)
sc.pp.filter_cells(adata_HB91, min_genes =200)

In [70]:
adata_HB91.var = adata_HB91.var.drop(columns=['feature_types'])
adata_HB91.var_names_make_unique()
adata_HB91

AnnData object with n_obs × n_vars = 7291 × 15158
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

-----------

In [71]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657811_H692"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM8657811_H692_barcodes.tsv.gz",
    "GSM8657811_H692_features.tsv.gz",
    "GSM8657811_H692_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657811_H692/GSM8657811_H692_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657811_H692/GSM8657811_H692_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657811_H692/GSM8657811_H692_matrix.mtx.gz


In [72]:
adata_HB92 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657811_H692",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM8657811_H692_"
)

In [73]:
adata_HB92.var_names_make_unique()

In [74]:
sc.pp.filter_genes(adata_HB92, min_cells =10)
sc.pp.filter_cells(adata_HB92, min_genes =200)

In [75]:
adata_HB92.var = adata_HB92.var.drop(columns=['feature_types'])
adata_HB92.var_names_make_unique()
adata_HB92

AnnData object with n_obs × n_vars = 621 × 9982
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'

---------------

In [76]:
import gzip
import shutil
import os

# Base path where the files are located
base_path = "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657812_H693"

# Compressed input files (each output drops the .gz suffix)
files = [
    "GSM8657812_H693_barcodes.tsv.gz",
    "GSM8657812_H693_features.tsv.gz",
    "GSM8657812_H693_matrix.mtx.gz",
]

# Decompress each file
for file in files:
    input_path = os.path.join(base_path, file)
    output_path = os.path.join(base_path, file.replace('.gz', ''))
    with gzip.open(input_path, 'rb') as f_in:
        with open(output_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
            print(f"Descomprimido: {input_path}")

Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657812_H693/GSM8657812_H693_barcodes.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657812_H693/GSM8657812_H693_features.tsv.gz
Descomprimido: /home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657812_H693/GSM8657812_H693_matrix.mtx.gz


In [77]:
adata_HB93 = sc.read_10x_mtx(
    "/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB/GSM8657812_H693",
    var_names="gene_symbols",
    cache=False,
    prefix="GSM8657812_H693_"
)

In [78]:
adata_HB93.var_names_make_unique()

In [79]:
sc.pp.filter_genes(adata_HB93, min_cells =10)
sc.pp.filter_cells(adata_HB93, min_genes =200)

In [80]:
adata_HB93.var = adata_HB93.var.drop(columns=['feature_types'])
adata_HB93.var_names_make_unique()
adata_HB93

AnnData object with n_obs × n_vars = 3401 × 14030
    obs: 'n_genes'
    var: 'gene_ids', 'n_cells'
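
With all fifteen samples loaded, it can help to tabulate the filtered sizes before writing anything to disk. The sketch below copies the `n_obs × n_vars` values from the AnnData summaries printed above and totals the cells per GEO series. The helper names are hypothetical, and the rule that sample IDs starting with `HB` belong to GSE233923 while the others belong to GSE283205 is an assumption based on the file names used in this notebook.

```python
# (n_obs, n_vars) per sample, copied from the AnnData summaries in this notebook.
SHAPES = {
    "HB1": (4340, 15995), "HB4": (7856, 17356), "HB6": (4490, 18295),
    "HB7": (8378, 18151), "HB12": (2318, 15273), "HB13": (3394, 16009),
    "HB14": (4870, 13891), "HB15": (2704, 16179), "HB16": (2869, 16014),
    "HB17": (1420, 16081),
    "H678": (1601, 12251), "H679": (3568, 13658), "H691": (7291, 15158),
    "H692": (621, 9982), "H693": (3401, 14030),
}


def study_of(sample_id):
    """Map a sample ID to its GEO series (assumed naming convention)."""
    return "GSE233923" if sample_id.startswith("HB") else "GSE283205"


def cells_per_study(shapes):
    """Sum the number of cells (n_obs) over the samples of each series."""
    totals = {}
    for sample_id, (n_obs, _n_vars) in shapes.items():
        study = study_of(sample_id)
        totals[study] = totals.get(study, 0) + n_obs
    return totals


totals = cells_per_study(SHAPES)
# Sample with the fewest cells after filtering.
smallest = min(SHAPES, key=lambda sample_id: SHAPES[sample_id][0])
print(totals, smallest)
```

A summary like this makes small samples easy to spot. Here the sample with the fewest cells after filtering is H692, with 621 cells.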

-------------
### <span style="color:navy">1.3 Creating h5ad files</span>

In [81]:
pwd

'/home/mcgonzalez/Servicio_Social/Data/CellTypist_HB'

In [82]:
%cd /home/mcgonzalez/Servicio_Social/Data/HB_h5ad

/home/mcgonzalez/Servicio_Social/Data/HB_h5ad




In [83]:
adata_HB1.write_h5ad('tumor1')

In [85]:
adata_HB4.write_h5ad('tumor2')

In [86]:
adata_HB6.write_h5ad('tumor3')

In [87]:
adata_HB7.write_h5ad('tumor4')

In [88]:
adata_HB12.write_h5ad('tumor5')

In [89]:
adata_HB13.write_h5ad('tumor6')

In [90]:
adata_HB14.write_h5ad('tumor7')

In [91]:
adata_HB15.write_h5ad('tumor8')

In [92]:
adata_HB16.write_h5ad('tumor9')

In [93]:
adata_HB17.write_h5ad('tumor10')

In [94]:
adata_HB78.write_h5ad('tumor11')

In [95]:
adata_HB79.write_h5ad('tumor12')

In [96]:
adata_HB91.write_h5ad('tumor13')

In [97]:
adata_HB92.write_h5ad('tumor14')

In [98]:
adata_HB93.write_h5ad('tumor15')
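
The fifteen write calls above follow a fixed pattern: samples are numbered `tumor1` to `tumor15` in loading order. A minimal sketch of the same step as a loop is given here. `SAMPLE_ORDER`, `tumor_filenames` and `write_all` are hypothetical names. The `.h5ad` suffix is an assumption added so the files carry the conventional extension. Passing `suffix=""` reproduces the extension-less names used in the cells above.

```python
from pathlib import Path

# Sample IDs in the order they were loaded in this notebook.
SAMPLE_ORDER = [
    "HB1", "HB4", "HB6", "HB7", "HB12", "HB13", "HB14", "HB15", "HB16", "HB17",
    "H678", "H679", "H691", "H692", "H693",
]


def tumor_filenames(sample_ids, suffix=".h5ad"):
    """Map each sample ID to 'tumor<i><suffix>', numbering from 1 in the given order."""
    return {sid: f"tumor{i}{suffix}" for i, sid in enumerate(sample_ids, start=1)}


def write_all(adatas, out_dir, suffix=".h5ad"):
    """Write each object in the ordered dict `adatas` with its write_h5ad method."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = tumor_filenames(list(adatas), suffix)
    paths = {}
    for sample_id, adata in adatas.items():
        path = out_dir / names[sample_id]
        adata.write_h5ad(path)  # same AnnData method as in the cells above
        paths[sample_id] = path
    return paths
```

A call might look like `write_all({"HB1": adata_HB1, "HB4": adata_HB4}, "/home/mcgonzalez/Servicio_Social/Data/HB_h5ad")`, with the dictionary extended to all fifteen objects in loading order. Keeping the mapping in one place records which tumor number corresponds to which GEO sample.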