## Loading the data

The data came from GEO accession GSE171524 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE171524); download the GSE171524_RAW.tar file and extract it.

In [None]:
import pandas as pd

file_path = r"GSM5226574_C51ctr_raw_counts.csv.gz"

# Read the compressed CSV file (this takes about 55 seconds)
df = pd.read_csv(file_path, compression='gzip')

# Display the first few rows of the DataFrame
print(df.head())

In [None]:
!pip install scanpy

In [None]:
import scanpy as sc

file_path = r"GSM5226574_C51ctr_raw_counts.csv.gz"

# Read the compressed CSV file into an AnnData object (this takes about 60 seconds)
adata = sc.read_csv(file_path)

# Transpose the data to have cells as rows and genes as columns
adata = adata.T

# Display the contents of the AnnData object
print(adata)

In [None]:
# Cell Barcodes
adata.obs

In [None]:
# genes
adata.var

In [None]:
adata.X.shape # Number of cells by the number of genes

## Doublet Removal

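When preparing single-cell libraries, two or more cells can sometimes end up in the same droplet. Such a droplet is called a doublet: its barcode carries the mixed expression of more than one cell, so we want to remove it before any downstream analysis.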

In [None]:
!pip install scvi-tools

In [None]:
import scvi

In [None]:
adata # shows the dimensions (cells by genes)
# before testing whether a cell is a doublet, we need to narrow down the 34546 genes

In [None]:
sc.pp.filter_genes(adata, min_cells= 10) # keep genes that are found in at least 10 cells
adata # now we are at 19896 genes

In [None]:
# Keep the top 2000 variable genes
sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True, flavor='seurat_v3')

In [None]:
adata #observe how there are now only 2000 genes

In [None]:
# model setup
scvi.model.SCVI.setup_anndata(adata)
# train vae model
vae = scvi.model.SCVI(adata)
vae.train()

In [None]:
# train the solo model, which predicts doublets
solo = scvi.external.SOLO.from_scvi_model(vae)

In [None]:
solo.train()

In [None]:
solo.predict()
# index: cell barcode
# columns: scores for doublet and singlet (the higher score is the predicted label)

In [None]:
# pass soft=False to make a new column that is the predicted label (doublet or singlet)
df = solo.predict()
df['prediction'] = solo.predict(soft=False)

df

In [None]:
# count how many doublets and singlets we have
df.groupby('prediction').count()

In [None]:
# make a new column that is the difference in probability of doublet and singlet columns
df['dif'] = df.doublet - df.singlet
df

In [None]:
import seaborn as sns

In [None]:
sns.displot(df[df.prediction == 'doublet'], x='dif')

The histogram plots the difference in prediction (doublet probability - singlet probability) on the x-axis and the number of cell barcodes in each bin on the y-axis.
The histogram is fairly level overall, but the counts drop sharply and then taper off near x = 0.25.
This suggests that cell barcodes with x < 0.25 could still be singlets, which we do not want to remove.
So we set the cutoff at x = 0.25 and treat barcodes with 0.25 < x < 1 as doublets. We then reload the raw cell barcode file into a new object, label the barcodes that match this doublet list, and eliminate them, keeping only singlets.
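The cutoff logic can be sketched on a toy table. The barcodes and scores below are invented for illustration and only mimic the two-column (`doublet`, `singlet`) layout used in this notebook; this is a minimal sketch of the thresholding step, not of the SOLO model itself.

```python
import pandas as pd

# Hypothetical scores in the same layout as the prediction table used here:
# one row per cell barcode, one column per class.
toy = pd.DataFrame(
    {"doublet": [0.90, 0.60, 0.55, 0.20], "singlet": [0.10, 0.40, 0.45, 0.80]},
    index=["AAAC-1", "AAAG-1", "AACC-1", "AAGT-1"],
)

# Hard label: whichever class has the higher score.
toy["prediction"] = toy[["doublet", "singlet"]].idxmax(axis=1)

# Margin between the two scores; a large positive value is a confident doublet call.
toy["dif"] = toy["doublet"] - toy["singlet"]

# Remove only confident doublets (margin above the cutoff read off the histogram).
CUTOFF = 0.25
confident_doublets = toy[(toy["prediction"] == "doublet") & (toy["dif"] > CUTOFF)]

print(confident_doublets.index.tolist())
```

Three of the four toy cells are labeled as doublets, but only the one with a large margin passes the cutoff. The other two are kept, which is the conservative behaviour we want here.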

In [None]:
doublets = df[(df.prediction == 'doublet') & (df.dif > 0.24)]
doublets

In [None]:
# Read the compressed CSV file into an AnnData object
adata = sc.read_csv(file_path)

# Transpose the data to have cells as rows and genes as columns
adata = adata.T

In [None]:
adata.obs['doublet'] = adata.obs.index.isin(doublets.index)

In [None]:
adata.obs

In [None]:
# ~ inverts the boolean mask, so only cells where doublet is False are kept
adata = adata[~adata.obs.doublet]

In [None]:
adata # 5424 cell barcodes that are now only singlets
# this makes sense as the doublet list was 675 cell barcodes, the raw data was 6099 cell barcodes. (6099-675=5424) 

Congrats, the doublets are now removed.
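One detail is worth checking before trusting the mask: `isin` only matches barcodes that are spelled identically in both tables. The function later in this notebook strips the last two characters from the prediction index (`x[:-2]`), which suggests that the predicted barcodes can carry a two-character suffix. The sketch below uses invented barcodes and an assumed `-0` suffix to show how such a mismatch silently flags nothing.

```python
import pandas as pd

# Hypothetical barcodes as they appear in the raw count matrix.
raw_barcodes = pd.Index(["AAAC-1", "AAAG-1", "AACC-1"])

# Hypothetical doublet calls whose index carries an extra "-0" suffix (assumption).
doublet_index = pd.Index(["AAAG-1-0"])

# Without cleaning, nothing matches and no cell would be removed.
naive_mask = raw_barcodes.isin(doublet_index)

# Strip the suffix only where it is present, then match again.
cleaned = doublet_index.str.replace(r"-0$", "", regex=True)
fixed_mask = raw_barcodes.isin(cleaned)

print(naive_mask.sum(), fixed_mask.sum())  # flagged cells before and after cleaning
```

A quick sanity check is to compare `adata.obs['doublet'].sum()` with the number of rows in `doublets`; if the first number is zero, the two indices probably do not line up.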

## Preprocessing

In [None]:
adata

In [None]:
# gene names
adata.var

In [None]:
# mitochondrial gene names start with "MT-"; flag them in a new 'mt' column
adata.var['mt'] = adata.var.index.str.startswith('MT-')

In [None]:
adata.var

In [None]:
# next we need to label ribosomal genes
# we import a list of known ribosomal genes (from the Broad Institute) with pandas
import pandas as pd

In [None]:
ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
ribo_genes = pd.read_table(ribo_url, skiprows=2, header=None)

In [None]:
ribo_genes

In [None]:
ribo_genes[0].values

In [None]:
# as with the mitochondrial genes, flag the ribosomal genes in adata.var
adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)

In [None]:
adata.var

In [None]:
adata.obs

In [None]:
# calculate qc metrics
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)

In [None]:
adata.var

pct_dropout_by_counts = percentage of cells in which the gene was not detected
n_cells_by_counts = number of cells in which the gene was found
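To make the two gene-level columns concrete, here is a minimal sketch that recomputes them with numpy on an invented count matrix. It assumes the definitions given above and does not call scanpy.

```python
import numpy as np

# Toy count matrix: 4 cells (rows) x 3 genes (columns); values are invented.
counts = np.array([
    [5, 0, 0],
    [3, 0, 1],
    [8, 0, 0],
    [2, 0, 0],
])

# Number of cells in which each gene has at least one count.
n_cells_by_counts = (counts > 0).sum(axis=0)

# Percentage of cells in which each gene was not detected.
pct_dropout_by_counts = 100 * (1 - n_cells_by_counts / counts.shape[0])

print(n_cells_by_counts)
print(pct_dropout_by_counts)
```

The first toy gene behaves like MALAT1 (found in every cell, 0 percent dropout), while the second is never detected (100 percent dropout).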

In [None]:
adata.obs

total_counts_mt = total mitochondrial counts for that cell
pct_counts_mt = percentage of that cell's counts that are mitochondrial
pct_counts_ribo = percentage of that cell's counts that are ribosomal
n_genes_by_counts = number of genes with at least one count in that cell
total_counts = total number of UMIs in that cell
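The cell-level columns can be reproduced the same way. The sketch below is only an illustration with invented gene names and counts; it assumes mitochondrial genes are identified by the "MT-" prefix, as in the cell above.

```python
import numpy as np

# Toy count matrix: 3 cells x 4 genes; gene names and counts are invented.
gene_names = np.array(["MT-CO1", "RPL3", "ACTB", "MALAT1"])
counts = np.array([
    [10, 5, 0, 85],
    [ 0, 2, 8, 10],
    [50, 0, 0, 50],
])

is_mt = np.char.startswith(gene_names, "MT-")    # mitochondrial genes by prefix

total_counts = counts.sum(axis=1)                # UMIs per cell
n_genes_by_counts = (counts > 0).sum(axis=1)     # genes detected per cell
total_counts_mt = counts[:, is_mt].sum(axis=1)   # mitochondrial UMIs per cell
pct_counts_mt = 100 * total_counts_mt / total_counts

print(total_counts, n_genes_by_counts, pct_counts_mt)
```

The third toy cell has half of its counts in a mitochondrial gene, which is the kind of cell a `pct_counts_mt` cutoff is meant to catch.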

In [None]:
adata.var.sort_values('n_cells_by_counts')

Sorting by n_cells_by_counts shows that some genes, like MALAT1, are found in nearly every cell, while others, like AL445072.1, are found in no cells.
We are going to filter out genes that were not found in at least 3 cells.

In [None]:
sc.pp.filter_genes(adata, min_cells=3)

In [None]:
adata.var.sort_values('n_cells_by_counts')
# now every gene listed is in at least 3 cells

In [None]:
# let's now pre-process adata.obs
adata.obs # look at total_counts

In [None]:
adata.obs.sort_values('total_counts')

The lowest total_counts is 401, so it seems that the authors who published these cell barcodes already filtered out cells with 400 counts or fewer. The data are therefore already processed in this respect, and we do not need to apply this filter ourselves.

However, if the cell barcodes had not been filtered already, we would filter on total counts with: sc.pp.filter_cells(adata, min_counts=401)

In [None]:
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'pct_counts_ribo'],
             jitter=0.4, multi_panel=True)

We want to use these QC metrics to eliminate outliers. In 'n_genes_by_counts', if a cell has far more detected genes than the average, there is a chance that it is some artifact; the same goes for 'total_counts'. If a cell has a high mitochondrial percentage, it could be a sequencing artifact or the cell could have been dying.

'pct_counts_mt': no cell has a value above 10.
'pct_counts_ribo': most cells are near 0, and the highest value is about 6.

We will use numpy to filter cells at the 98th percentile of 'n_genes_by_counts'.

In [None]:
import numpy as np
upper_lim = np.quantile(adata.obs.n_genes_by_counts.values, 0.98)

In [None]:
upper_lim # 'value' or the y-axis cutoff for filtering
# you can also do "upper_lim = 3000"

In [None]:
adata = adata[adata.obs.n_genes_by_counts < upper_lim]

In [None]:
adata.obs # n_genes_by_counts should all be less than 2216.24

In [None]:
adata = adata[adata.obs.pct_counts_mt < 20] # 20 percent cutoff; removes no cells here (all are below 10 percent) but is good practice

In [None]:
adata = adata[adata.obs.pct_counts_ribo < 2]

In [None]:
adata

## Normalization

In single-cell RNA sequencing, there is a lot of variation between cells, even between cells of the same type, due to sequencing biases and other technical effects. Normalization lets us compare cells and genes appropriately despite these wide variations.
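As a rough illustration of the two steps used in this section (scaling every cell to the same total, then taking log(1 + x)), here is a numpy sketch on invented counts. It mirrors what we expect from `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`, but it is a hand-written approximation, not scanpy's code.

```python
import numpy as np

# Toy counts: 2 cells x 3 genes with very different sequencing depth (invented values).
counts = np.array([
    [ 10.0,  30.0,  60.0],   # 100 UMIs in total
    [100.0, 300.0, 600.0],   # 1000 UMIs in total, same proportions
])

target_sum = 1e4

# Scale every cell so that its counts add up to target_sum.
scaled = counts / counts.sum(axis=1, keepdims=True) * target_sum

# Natural log of (1 + x) compresses the range of the scaled values.
logged = np.log1p(scaled)

print(scaled.sum(axis=1))                 # per-cell totals after scaling
print(np.allclose(logged[0], logged[1]))  # do the two cells now share one profile?
```

The two toy cells differ tenfold in depth but have the same gene proportions, so after normalization they become directly comparable. Note that after the log step the per-cell sums are no longer equal to the target.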

In [None]:
adata.X.sum(axis=1) # total counts per cell; we need to normalize each cell so that its total counts add up to the same value

In [None]:
sc.pp.normalize_total(adata, target_sum=1e4) # normalize every cell to 10,000 UMI

In [None]:
adata.X.sum(axis=1) # each value (cell) got modified based on the starting number of counts for that cell.

In [None]:
sc.pp.log1p(adata) #change to log counts

In [None]:
adata.X.sum(axis=1)

We now need to freeze the data in its current state before we begin filtering on variable genes, regressing out covariates, and scaling the data.

In [None]:
adata.raw = adata

## Clustering

In [None]:
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

In [None]:
adata.var

In [None]:
sc.pl.highly_variable_genes(adata)

In [None]:
# Filter out the non-highly variable genes
adata = adata[:, adata.var.highly_variable]

In [None]:
adata #we now have only the top 2000 genes that are highly variable

In [None]:
# we're going to regress out the differences that arise from the total number of counts, the mitochondrial counts, and the ribosomal counts
# this removes some variation that is due to processing, sample quality, sequencing artifacts, etc.
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt', 'pct_counts_ribo'])

In [None]:
#normalize each gene to the unit variance of that gene
sc.pp.scale(adata, max_value=10)

In [None]:
# Run principal component analysis to further reduce the dimensions of the data
sc.tl.pca(adata, svd_solver='arpack') # this calculates 50 pcs

In [None]:
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=50)

This plot shows how much each PC contributes to the variance.
We look for the point where the curve flattens, so that adding more PCs no longer makes a noticeable difference in the y-value; here that is at about x = 30.
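The quantity on the y-axis can be sketched with plain numpy. The example below runs PCA through an SVD on random toy data and then picks the number of components with a cumulative-variance rule. That rule and its 95% threshold are assumptions swapped in for illustration; in this notebook the cutoff is chosen by eye from the plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 "cells" x 10 "genes" built from 2 strong hidden signals plus weak noise.
signal = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
data = signal + 0.1 * rng.normal(size=(200, 10))

# PCA via SVD of the centered matrix.
centered = data - data.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)

# Fraction of the total variance explained by each component (sorted high to low).
variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Smallest number of components that together explain at least 95% of the variance.
n_pcs = int(np.searchsorted(np.cumsum(variance_ratio), 0.95) + 1)

print(variance_ratio.round(3), n_pcs)
```

On real data the curve rarely has such a clean break, so looking at the plot, as we do here, is still a reasonable way to choose `n_pcs`.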

In [None]:
sc.pp.neighbors(adata, n_pcs= 30)

In [None]:
sc.tl.umap(adata)

In [None]:
sc.pl.umap(adata)

This map shows the single cells, but they haven't been assigned to clusters yet.

In [None]:
# leiden algorithm
!pip install leidenalg

In [None]:
sc.tl.leiden(adata, resolution = 0.5)

In [None]:
adata.obs

In [None]:
sc.pl.umap(adata, color = ['leiden'])

We have now plotted our single cells and labeled them by their clusters using the leiden algorithm.

## Integration

Integration is only necessary if you are running an analysis with multiple samples. The experiment sampled 19 COVID-19 patients and 7 control patients, for a total of 26 samples, so integration is necessary here.

We are now essentially going to repeat what we did for (1) loading the data, (2) doublet removal, (3) preprocessing, and (4) clustering.
However, we will do this for all samples. To make this more efficient, we wrap the per-sample steps in a function and call it on each sample file.

In [None]:
import scanpy as sc
import scvi
import pandas as pd
import numpy as np

def pp(csv_path):
    
    # Load the counts and transpose to cells x genes
    adata = sc.read_csv(csv_path).T
    sc.pp.filter_genes(adata, min_cells = 10)
    sc.pp.highly_variable_genes(adata, n_top_genes = 2000, subset = True, flavor = 'seurat_v3')
    scvi.model.SCVI.setup_anndata(adata)
    vae = scvi.model.SCVI(adata)
    vae.train()
    solo = scvi.external.SOLO.from_scvi_model(vae)
    solo.train()
    df = solo.predict()
    df['prediction'] = solo.predict(soft = False)
    df.index = df.index.map(lambda x: x[:-2])
    df['dif'] = df.doublet - df.singlet
    doublets = df[(df.prediction == 'doublet') & (df.dif > 0.25)]
    
    adata = sc.read_csv(csv_path).T
    adata.obs['Sample'] = csv_path.split('_')[2] #'raw_counts/<sample_identifier>_raw_counts.csv'
    
    adata.obs['doublet'] = adata.obs.index.isin(doublets.index)
    adata = adata[~adata.obs.doublet]
    
    
    sc.pp.filter_cells(adata, min_genes=200) #get rid of cells with fewer than 200 genes
    #sc.pp.filter_genes(adata, min_cells=3) #get rid of genes that are found in fewer than 3 cells
    adata.var['mt'] = adata.var_names.str.startswith('MT-')  # annotate mitochondrial genes as 'mt' (human gene names use the "MT-" prefix, as above)
    
    ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
    ribo_genes = pd.read_table(ribo_url, skiprows=2, header=None)
    
    adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)
    upper_lim = np.quantile(adata.obs.n_genes_by_counts.values, .98)
    adata = adata[adata.obs.n_genes_by_counts < upper_lim]
    adata = adata[adata.obs.pct_counts_mt < 20]
    adata = adata[adata.obs.pct_counts_ribo < 2]
    
    return adata

In [None]:
import os

out = []
for file in os.listdir('GSE171524_RAW/'):
    file_path = os.path.join('GSE171524_RAW', file)  # Construct the full file path
    out.append(pp(file_path))
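The `pp` function above takes the sample label from `csv_path.split('_')[2]`. That works for paths like `GSE171524_RAW/GSM5226574_C51ctr_raw_counts.csv.gz` because the folder name itself contains one underscore, but it depends on the directory layout. A minimal sketch of a less fragile variant is shown below; the helper name `sample_id_from_path` and the second example path are hypothetical.

```python
import os

def sample_id_from_path(csv_path):
    """Return the sample label from a file named <GSM accession>_<sample>_raw_counts.csv.gz."""
    file_name = os.path.basename(csv_path)   # drop any directory part first
    return file_name.split("_")[1]           # the second field is the sample label

original = "GSE171524_RAW/GSM5226574_C51ctr_raw_counts.csv.gz"
moved = "D:/data/GSM5226574_C51ctr_raw_counts.csv.gz"   # same file in another folder

print(sample_id_from_path(original), sample_id_from_path(moved))
print(original.split("_")[2], moved.split("_")[2])      # index-based parsing on the full path
```

Parsing only the file name keeps the label stable if the data folder is renamed or moved.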

In [None]:
import scanpy as sc
import scvi
import pandas as pd
import numpy as np
import os

def pp(csv_path):
    adata = sc.read_csv(csv_path).T
    sc.pp.filter_genes(adata, min_cells=10)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True, flavor='seurat_v3')
    scvi.model.SCVI.setup_anndata(adata)
    vae = scvi.model.SCVI(adata)
    vae.train()
    solo = scvi.external.SOLO.from_scvi_model(vae)
    solo.train()
    df = solo.predict()
    df['prediction'] = solo.predict(soft=False)
    df.index = df.index.map(lambda x: x[:-2])
    df['dif'] = df.doublet - df.singlet
    doublets = df[(df.prediction == 'doublet') & (df.dif > 0.25)]
    
    adata = sc.read_csv(csv_path).T
    adata.obs['Sample'] = csv_path.split('_')[2]  # Adjust based on your file naming convention
    adata.obs['doublet'] = adata.obs.index.isin(doublets.index)
    adata = adata[~adata.obs.doublet]
    
    sc.pp.filter_cells(adata, min_genes=200)
    adata.var['mt'] = adata.var_names.str.startswith('MT-')  # human mitochondrial genes use the "MT-" prefix
    
    ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
    ribo_genes = pd.read_table(ribo_url, skiprows=2, header=None)
    adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)
    
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)
    
    upper_lim = np.quantile(adata.obs.n_genes_by_counts.values, .98)
    adata = adata[adata.obs.n_genes_by_counts < upper_lim]
    adata = adata[adata.obs.pct_counts_mt < 20]
    adata = adata[adata.obs.pct_counts_ribo < 2]
    
    return adata

def batch_process(input_dir, output_dir, batch_size=6):
    files = os.listdir(input_dir)
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
    
    for i, batch in enumerate(batches):
        out = []
        for file in batch:
            file_path = os.path.join(input_dir, file)
            out.append(pp(file_path))
        
        batch_adata = sc.concat(out)
        batch_adata.write(os.path.join(output_dir, f'batch_{i}.h5ad'))
        print(f'Processed batch {i} and saved to disk.')

# Set your directories and batch size
input_dir = 'GSE171524_RAW/'
output_dir = 'processed_batches/'
os.makedirs(output_dir, exist_ok=True)

# Run batch processing
batch_process(input_dir, output_dir, batch_size=6)

# Later, you can concatenate the batches
batches = [sc.read(os.path.join(output_dir, f)) for f in os.listdir(output_dir) if f.endswith('.h5ad')]
adata = sc.concat(batches)
sc.pp.filter_genes(adata, min_cells=10)
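The chunking step inside `batch_process` can be looked at in isolation. `os.listdir` returns entries in arbitrary order and includes every file in the folder, so the sketch below filters and sorts the names first. The helper name `make_batches` and the file names are hypothetical; only the slicing pattern is taken from the function above.

```python
def make_batches(file_names, batch_size):
    """Split a list of file names into consecutive chunks of at most batch_size items."""
    # Keep only the count matrices and sort so that batch contents are reproducible.
    wanted = sorted(f for f in file_names if f.endswith(".csv.gz"))
    return [wanted[i:i + batch_size] for i in range(0, len(wanted), batch_size)]

# Hypothetical directory listing: 26 sample files plus one unrelated file.
listing = [f"GSM{5226574 + i}_sample{i}_raw_counts.csv.gz" for i in range(26)] + ["README.txt"]

batches = make_batches(listing, batch_size=6)
print([len(b) for b in batches])
```

With 26 samples, a smaller `batch_size` produces more batch files but keeps fewer AnnData objects in memory at the same time.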

In [None]:
import scanpy as sc
import scvi
import pandas as pd
import numpy as np
import os

def pp(csv_path):
    adata = sc.read_csv(csv_path).T
    sc.pp.filter_genes(adata, min_cells=10)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True, flavor='seurat_v3')
    scvi.model.SCVI.setup_anndata(adata)
    vae = scvi.model.SCVI(adata)
    vae.train()
    solo = scvi.external.SOLO.from_scvi_model(vae)
    solo.train()
    df = solo.predict()
    df['prediction'] = solo.predict(soft=False)
    df.index = df.index.map(lambda x: x[:-2])
    df['dif'] = df.doublet - df.singlet
    doublets = df[(df.prediction == 'doublet') & (df.dif > 0.25)]
    
    adata = sc.read_csv(csv_path).T
    adata.obs['Sample'] = csv_path.split('_')[2]  # Adjust based on your file naming convention
    adata.obs['doublet'] = adata.obs.index.isin(doublets.index)
    adata = adata[~adata.obs.doublet]
    
    sc.pp.filter_cells(adata, min_genes=200)
    adata.var['mt'] = adata.var_names.str.startswith('MT-')  # human mitochondrial genes use the "MT-" prefix
    
    ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
    ribo_genes = pd.read_table(ribo_url, skiprows=2, header=None)
    adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)
    
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)
    
    upper_lim = np.quantile(adata.obs.n_genes_by_counts.values, .98)
    adata = adata[adata.obs.n_genes_by_counts < upper_lim]
    adata = adata[adata.obs.pct_counts_mt < 20]
    adata = adata[adata.obs.pct_counts_ribo < 2]
    
    return adata

def batch_process(input_dir, output_dir, batch_size=4):
    files = os.listdir(input_dir)
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
    
    for i, batch in enumerate(batches):
        out = []
        for file in batch:
            file_path = os.path.join(input_dir, file)
            out.append(pp(file_path))
        
        batch_adata = sc.concat(out)
        batch_adata.write(os.path.join(output_dir, f'batch_{i}.h5ad'))
        print(f'Processed batch {i} and saved to disk.')

In [None]:
# Set your directories and batch size
input_dir = 'GSE171524_RAW/'
output_dir = 'D:/processed_batches/'
os.makedirs(output_dir, exist_ok=True)

# Run batch processing
batch_process(input_dir, output_dir, batch_size=4)

# Later, you can concatenate the batches
batches = [sc.read(os.path.join(output_dir, f)) for f in os.listdir(output_dir) if f.endswith('.h5ad')]
adata = sc.concat(batches)
sc.pp.filter_genes(adata, min_cells=10)

In [1]:
import gc  # Import garbage collector
import scanpy as sc
import scvi
import pandas as pd
import numpy as np
import os

def pp(csv_path):
    adata = sc.read_csv(csv_path).T
    sc.pp.filter_genes(adata, min_cells=10)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True, flavor='seurat_v3')
    scvi.model.SCVI.setup_anndata(adata)
    vae = scvi.model.SCVI(adata)
    vae.train()
    solo = scvi.external.SOLO.from_scvi_model(vae)
    solo.train()
    df = solo.predict()
    df['prediction'] = solo.predict(soft=False)
    df.index = df.index.map(lambda x: x[:-2])
    df['dif'] = df.doublet - df.singlet
    doublets = df[(df.prediction == 'doublet') & (df.dif > 0.25)]
    
    adata = sc.read_csv(csv_path).T
    adata.obs['Sample'] = csv_path.split('_')[2]
    adata.obs['doublet'] = adata.obs.index.isin(doublets.index)
    adata = adata[~adata.obs.doublet]
    
    sc.pp.filter_cells(adata, min_genes=200)
    adata.var['mt'] = adata.var_names.str.startswith('MT-')  # human mitochondrial genes use the "MT-" prefix
    
    ribo_url = "http://software.broadinstitute.org/gsea/msigdb/download_geneset.jsp?geneSetName=KEGG_RIBOSOME&fileType=txt"
    ribo_genes = pd.read_table(ribo_url, skiprows=2, header=None)
    adata.var['ribo'] = adata.var_names.isin(ribo_genes[0].values)
    
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)
    
    upper_lim = np.quantile(adata.obs.n_genes_by_counts.values, .98)
    adata = adata[adata.obs.n_genes_by_counts < upper_lim]
    adata = adata[adata.obs.pct_counts_mt < 20]
    adata = adata[adata.obs.pct_counts_ribo < 2]
    
    return adata

def batch_process(input_dir, output_dir, batch_size=2):  # Reduce batch size to 2
    files = os.listdir(input_dir)
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
    
    for i, batch in enumerate(batches):
        out = []
        for file in batch:
            file_path = os.path.join(input_dir, file)
            out.append(pp(file_path))
        
        batch_adata = sc.concat(out)
        batch_adata.write(os.path.join(output_dir, f'batch_{i}.h5ad'))
        print(f'Processed batch {i} and saved to disk.')
        
        del out, batch_adata  # Clear variables to free memory
        gc.collect()  # Force garbage collection



In [2]:
# Set your directories and batch size
input_dir = 'GSE171524_RAW/'
output_dir = 'D:/processed_batches/'
os.makedirs(output_dir, exist_ok=True)

# Run batch processing with reduced batch size
batch_process(input_dir, output_dir, batch_size=2)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
c:\Users\agarc\anaconda3\envs\bioenv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.


Epoch 400/400: 100%|██████████| 400/400 [15:05<00:00,  2.23s/it, v_num=1, train_loss_step=331, train_loss_epoch=323]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.

c:\Users\agarc\anaconda3\envs\bioenv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.

Epoch 167/400:  42%|████▏     | 167/400 [01:02<01:26,  2.68it/s, v_num=1, train_loss_step=0.316, train_loss_epoch=0.3]  
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.285. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [11:13<00:00,  1.71s/it, v_num=1, train_loss_step=405, train_loss_epoch=396]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.

Epoch 143/400:  36%|███▌      | 143/400 [00:40<01:13,  3.51it/s, v_num=1, train_loss_step=0.357, train_loss_epoch=0.306] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.281. Signaling Trainer to stop.




Processed batch 0 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [17:35<00:00,  2.61s/it, v_num=1, train_loss_step=389, train_loss_epoch=331]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.

Epoch 303/400:  76%|███████▌  | 303/400 [02:08<00:41,  2.36it/s, v_num=1, train_loss_step=0.562, train_loss_epoch=0.31] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.292. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [10:50<00:00,  1.61s/it, v_num=1, train_loss_step=295, train_loss_epoch=308]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.

Epoch 163/400:  41%|████      | 163/400 [00:43<01:03,  3.73it/s, v_num=1, train_loss_step=0.195, train_loss_epoch=0.247]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.237. Signaling Trainer to stop.




Processed batch 1 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [14:06<00:00,  2.12s/it, v_num=1, train_loss_step=311, train_loss_epoch=305]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.

Epoch 155/400:  39%|███▉      | 155/400 [00:53<01:24,  2.89it/s, v_num=1, train_loss_step=0.224, train_loss_epoch=0.228]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.247. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [10:08<00:00,  1.52s/it, v_num=1, train_loss_step=330, train_loss_epoch=326]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.

Epoch 231/400:  58%|█████▊    | 231/400 [00:58<00:42,  3.98it/s, v_num=1, train_loss_step=0.219, train_loss_epoch=0.255] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.247. Signaling Trainer to stop.


Processed batch 2 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [11:45<00:00,  1.76s/it, v_num=1, train_loss_step=291, train_loss_epoch=288]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 243/400:  61%|██████    | 243/400 [01:10<00:45,  3.43it/s, v_num=1, train_loss_step=0.0697, train_loss_epoch=0.251]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.250. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [07:34<00:00,  1.15s/it, v_num=1, train_loss_step=397, train_loss_epoch=332]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 208/400:  52%|█████▏    | 208/400 [00:39<00:36,  5.24it/s, v_num=1, train_loss_step=0.204, train_loss_epoch=0.272]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.297. Signaling Trainer to stop.


Processed batch 3 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [12:28<00:00,  1.88s/it, v_num=1, train_loss_step=537, train_loss_epoch=473]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 215/400:  54%|█████▍    | 215/400 [01:05<00:56,  3.28it/s, v_num=1, train_loss_step=0.302, train_loss_epoch=0.352]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.358. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [09:01<00:00,  1.34s/it, v_num=1, train_loss_step=260, train_loss_epoch=259]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 202/400:  50%|█████     | 202/400 [00:44<00:43,  4.51it/s, v_num=1, train_loss_step=0.315, train_loss_epoch=0.275]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.292. Signaling Trainer to stop.


Processed batch 4 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [07:30<00:00,  1.13s/it, v_num=1, train_loss_step=341, train_loss_epoch=344]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 273/400:  68%|██████▊   | 273/400 [00:53<00:24,  5.14it/s, v_num=1, train_loss_step=0.467, train_loss_epoch=0.319]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.325. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [18:53<00:00,  2.79s/it, v_num=1, train_loss_step=350, train_loss_epoch=361]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 180/400:  45%|████▌     | 180/400 [01:21<01:39,  2.22it/s, v_num=1, train_loss_step=0.376, train_loss_epoch=0.299]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.284. Signaling Trainer to stop.


Processed batch 5 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [12:35<00:00,  1.87s/it, v_num=1, train_loss_step=391, train_loss_epoch=339]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 299/400:  75%|███████▍  | 299/400 [01:32<00:31,  3.25it/s, v_num=1, train_loss_step=0.474, train_loss_epoch=0.355] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.318. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [10:18<00:00,  1.54s/it, v_num=1, train_loss_step=371, train_loss_epoch=341]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 230/400:  57%|█████▊    | 230/400 [00:58<00:42,  3.96it/s, v_num=1, train_loss_step=0.293, train_loss_epoch=0.285]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.281. Signaling Trainer to stop.


Processed batch 6 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [08:52<00:00,  1.34s/it, v_num=1, train_loss_step=342, train_loss_epoch=339]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 180/400:  45%|████▌     | 180/400 [00:40<00:49,  4.41it/s, v_num=1, train_loss_step=0.391, train_loss_epoch=0.253] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.247. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [03:51<00:00,  1.76it/s, v_num=1, train_loss_step=316, train_loss_epoch=310]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 217/400:  54%|█████▍    | 217/400 [00:21<00:18,  9.87it/s, v_num=1, train_loss_step=0.286, train_loss_epoch=0.286]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.296. Signaling Trainer to stop.


Processed batch 7 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [08:06<00:00,  1.24s/it, v_num=1, train_loss_step=266, train_loss_epoch=356]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 256/400:  64%|██████▍   | 256/400 [00:52<00:29,  4.92it/s, v_num=1, train_loss_step=0.327, train_loss_epoch=0.343]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.336. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [09:32<00:00,  1.53s/it, v_num=1, train_loss_step=320, train_loss_epoch=318]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 226/400:  56%|█████▋    | 226/400 [00:53<00:41,  4.23it/s, v_num=1, train_loss_step=0.4, train_loss_epoch=0.317]  
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.327. Signaling Trainer to stop.


Processed batch 8 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [11:56<00:00,  1.80s/it, v_num=1, train_loss_step=437, train_loss_epoch=370]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 295/400:  74%|███████▍  | 295/400 [01:31<00:32,  3.23it/s, v_num=1, train_loss_step=0.299, train_loss_epoch=0.298]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.290. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [10:02<00:00,  1.49s/it, v_num=1, train_loss_step=308, train_loss_epoch=353]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 122/400:  30%|███       | 122/400 [00:30<01:08,  4.04it/s, v_num=1, train_loss_step=0.284, train_loss_epoch=0.295]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.300. Signaling Trainer to stop.


Processed batch 9 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [04:28<00:00,  1.50it/s, v_num=1, train_loss_step=402, train_loss_epoch=381]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 313/400:  78%|███████▊  | 313/400 [00:36<00:10,  8.51it/s, v_num=1, train_loss_step=0.36, train_loss_epoch=0.29]  
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.302. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [11:24<00:00,  1.71s/it, v_num=1, train_loss_step=321, train_loss_epoch=339]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.
Epoch 219/400:  55%|█████▍    | 219/400 [01:02<00:51,  3.52it/s, v_num=1, train_loss_step=0.235, train_loss_epoch=0.266] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.245. Signaling Trainer to stop.


Processed batch 10 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [06:55<00:00,  1.10s/it, v_num=1, train_loss_step=303, train_loss_epoch=304]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.


Epoch 330/400:  82%|████████▎ | 330/400 [00:58<00:12,  5.60it/s, v_num=1, train_loss_step=0.162, train_loss_epoch=0.186] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.212. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [06:10<00:00,  1.09it/s, v_num=1, train_loss_step=383, train_loss_epoch=417]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.


Epoch 213/400:  53%|█████▎    | 213/400 [00:33<00:29,  6.39it/s, v_num=1, train_loss_step=0.356, train_loss_epoch=0.351]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.345. Signaling Trainer to stop.


Processed batch 11 and saved to disk.


Epoch 400/400: 100%|██████████| 400/400 [08:28<00:00,  1.27s/it, v_num=1, train_loss_step=319, train_loss_epoch=322]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.


Epoch 227/400:  57%|█████▋    | 227/400 [00:47<00:36,  4.73it/s, v_num=1, train_loss_step=0.439, train_loss_epoch=0.292] 
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.283. Signaling Trainer to stop.


Epoch 400/400: 100%|██████████| 400/400 [17:18<00:00,  2.60s/it, v_num=1, train_loss_step=319, train_loss_epoch=341]

`Trainer.fit` stopped: `max_epochs=400` reached.


INFO     Creating doublets, preparing SOLO model.


Epoch 172/400:  43%|████▎     | 172/400 [01:11<01:34,  2.41it/s, v_num=1, train_loss_step=0.223, train_loss_epoch=0.314]
Monitored metric validation_loss did not improve in the last 30 records. Best score: 0.293. Signaling Trainer to stop.


Processed batch 12 and saved to disk.


In [3]:
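import os
import scanpy as sc

# Later, you can concatenate the batches (this loads every batch into memory at once)
batches = [sc.read(os.path.join(output_dir, f)) for f in os.listdir(output_dir) if f.endswith('.h5ad')]
adata = sc.concat(batches)
# Keep genes that are found in at least 10 cells
sc.pp.filter_genes(adata, min_cells=10)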

MemoryError: Unable to allocate 1.24 GiB for an array with shape (9647, 34546) and data type float32
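
The size in this error can be checked by hand: a dense float32 matrix needs 4 bytes per entry, so the allocation is rows x columns x 4 bytes. A minimal sketch of that arithmetic follows; the helper name `dense_gib` is made up for illustration and is not part of scanpy.

```python
def dense_gib(n_rows, n_cols, bytes_per_value=4):
    """Size in GiB of a dense matrix; float32 uses 4 bytes per value."""
    return n_rows * n_cols * bytes_per_value / 2**30

# Shape taken from the MemoryError above: 9647 cells x 34546 genes
print(f"{dense_gib(9647, 34546):.2f} GiB")  # → 1.24 GiB
```

Because the cost grows linearly with the number of cells, every additional batch held as a dense array adds to the total that must fit in RAM at the same time.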

1. Sequential Processing with Incremental Merging: Instead of loading all .h5ad files at once, process them one by one or in smaller groups, then combine the results incrementally.

In [5]:
import gc
import os
import scanpy as sc

output_dir = 'D:/processed_batches/'
merged_adata = None

# Read one batch at a time and fold it into the running merged object
for file in os.listdir(output_dir):
    if file.endswith('.h5ad'):
        print(f"Processing {file}...")
        adata = sc.read(os.path.join(output_dir, file))

        if merged_adata is None:
            merged_adata = adata
        else:
            merged_adata = sc.concat([merged_adata, adata])

        # Free the per-batch object before reading the next file
        del adata
        gc.collect()

# Save the final merged AnnData object
merged_adata.write(os.path.join(output_dir, 'merged_data.h5ad'))


Processing batch_0.h5ad...
Processing batch_1.h5ad...
Processing batch_2.h5ad...
Processing batch_3.h5ad...
Processing batch_4.h5ad...
Processing batch_5.h5ad...


MemoryError: Unable to allocate 7.08 GiB for an array with shape (55051, 34546) and data type float32
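
Incremental merging still fails here because the merged matrix is stored densely, so its size keeps growing with every batch. One possible workaround, which was not tried in this notebook, is to keep the counts in a sparse format, since single-cell count matrices are mostly zeros. The sketch below uses a toy random matrix (the sizes and sparsity are made up for illustration) to compare dense and CSR storage. In the notebook this would presumably correspond to something like `adata.X = scipy.sparse.csr_matrix(adata.X)` on each batch before concatenating, which is an assumption rather than a tested fix.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
# Toy count matrix (illustrative sizes): 1000 cells x 2000 genes, mostly zeros
dense = rng.poisson(0.05, size=(1000, 2000)).astype(np.float32)

# CSR keeps only the nonzero values plus two index arrays
csr = sparse.csr_matrix(dense)

dense_mb = dense.nbytes / 2**20
csr_mb = (csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes) / 2**20
print(f"dense: {dense_mb:.1f} MiB, CSR: {csr_mb:.1f} MiB")
```

How much is saved depends on the fraction of zero entries in the real data, so the ratio printed for the toy matrix should not be read as a prediction for this dataset.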

In [None]:
sc.pp.filter_genes(merged_adata, min_cells=10)