This notebook covers the ways you can load data into SAM and save results from it.

In [1]:
from SAM import SAM
import pandas as pd

We can use `sam.load_data` to load data from files (namely `.csv`, `.txt`, and `.h5ad` files). For `.csv` and `.txt` files, the `sep` argument is the delimiter used in those files (usually `,` for `.csv` and `\t` for `.txt`). By default, `sep=','`.
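To make the role of `sep` concrete, here is a minimal sketch that uses only Python's standard `csv` module. It does not show how `load_data` parses files internally; the helper `parse_table` and the toy table are made up for illustration.

```python
import csv
import io


def parse_table(text, sep=","):
    """Split delimited text into rows of fields using the given separator."""
    return list(csv.reader(io.StringIO(text), delimiter=sep))


# The same toy table, once comma-separated and once tab-separated.
csv_text = "gene,cell_1,cell_2\ng1,0,3\ng2,5,0\n"
tsv_text = csv_text.replace(",", "\t")

rows_csv = parse_table(csv_text, sep=",")
rows_tsv = parse_table(tsv_text, sep="\t")

# Using the wrong separator does not raise an error; the rows simply
# are not split into fields.
rows_wrong = parse_table(tsv_text, sep=",")

print(rows_csv == rows_tsv)
print([len(row) for row in rows_wrong])
```

Parsing the tab-separated text with `sep=','` leaves each row as a single unsplit field, which is the typical symptom of a mismatched delimiter.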

In [2]:
sam=SAM()
sam.__dict__

{'run_args': {}, 'preprocess_args': {}}

`sam.__dict__` shows all the current attributes of the SAM object. We can see that the SAM object is currently empty.

In [3]:
sam.load_data('../../example_data/schisto2.5_tpm.csv.gz' , sep = ',')
sam.__dict__

{'run_args': {},
 'preprocess_args': {},
 'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     layers: 'X_disp'}

Now the SAM object is populated with two `AnnData` objects. `adata_raw` holds the loaded data and remains untouched. `adata` is initially set equal to `adata_raw`; it is the object that preprocessing modifies and where the results of the SAM analysis are stored.

When loading a dense expression table, such as one from a `.csv` or `.txt` file, you can use the `save_sparse_file` argument to save a sparse representation of the data (in the `AnnData` file format, `.h5ad`) for much faster loading in the future:

In [8]:
sam=SAM()
sam.load_data('../../example_data/schisto2.5_tpm.csv.gz' , sep = ',', save_sparse_file = '../../example_data/sparse_data.h5ad')

In [9]:
sam=SAM()
sam.load_data('../../example_data/sparse_data.h5ad')
sam.__dict__

{'run_args': {},
 'preprocess_args': {},
 'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     layers: 'X_disp'}
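As a rough illustration of why a sparse representation suits expression tables that are mostly zeros, the sketch below compares the in-memory size of a dense NumPy array with its `scipy.sparse` CSR equivalent. The toy matrix is invented, and the comparison is about storage only; it does not measure SAM's loading time or reproduce what `save_sparse_file` writes.

```python
import numpy as np
from scipy import sparse

# Toy expression table: 200 cells x 500 genes, almost entirely zeros.
dense = np.zeros((200, 500), dtype=np.float64)
dense[::10, ::25] = 1.0  # 20 rows x 20 columns = 400 nonzero entries

csr = sparse.csr_matrix(dense)

# A dense array stores every entry; CSR stores only the nonzero values
# plus their column indices and one row-pointer array.
dense_bytes = dense.nbytes
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print(f"nonzero entries: {csr.nnz}")
print(f"dense storage:   {dense_bytes} bytes")
print(f"sparse storage:  {sparse_bytes} bytes")
```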

We can also pass data directly into the SAM constructor via the `counts` argument. We can pass in an `AnnData` object:

In [10]:
example_anndata = sam.adata_raw
sam=SAM(counts = example_anndata)
sam.__dict__

{'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     layers: 'X_disp',
 'run_args': {},
 'preprocess_args': {}}

We can also pass in a Pandas DataFrame:

In [13]:
df = pd.read_csv('../../example_data/schisto2.5_tpm.csv.gz', sep=',', index_col=0).T
sam=SAM(counts = df)
sam.__dict__

{'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     layers: 'X_disp',
 'run_args': {},
 'preprocess_args': {}}
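The `.T` in the cell above matters because the resulting object has cells as observations (rows) and genes as variables (columns). The following sketch shows the same transposition on a tiny made-up table, assuming, as the `.T` suggests, that the file stores genes in rows and cells in columns. The gene and cell names are hypothetical.

```python
import io

import pandas as pd

# Toy table laid out as genes (rows) x cells (columns).
csv_text = "gene,cell_1,cell_2,cell_3\ng1,0,3,1\ng2,5,0,2\n"

genes_by_cells = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)

# Transpose so that each row is a cell and each column is a gene.
cells_by_genes = genes_by_cells.T

print(genes_by_cells.shape)
print(cells_by_genes.shape)
print(list(cells_by_genes.index))
print(list(cells_by_genes.columns))
```

Checking the shape and the index before constructing the SAM object is a quick way to catch a table that is oriented the wrong way.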

We can also pass in a tuple (scipy.sparse matrix, list of gene IDs, list of cell IDs):

In [14]:
sparse_data = sam.adata_raw.X
genes = sam.adata_raw.var_names
cells = sam.adata_raw.obs_names

sam=SAM(counts = (sparse_data,genes,cells))
sam.__dict__

{'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     layers: 'X_disp',
 'run_args': {},
 'preprocess_args': {}}

We can also pass in a tuple (NumPy array, list of gene IDs, list of cell IDs):

In [15]:
dense_data = sparse_data.toarray()

sam=SAM(counts = (dense_data,genes,cells))
sam.__dict__

{'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     layers: 'X_disp',
 'run_args': {},
 'preprocess_args': {}}
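The two tuple forms above reuse data from an existing SAM object. As a minimal sketch of how such tuples could be assembled from scratch, the example below builds a small matrix with rows as cells and columns as genes, mirroring `adata_raw.X`, where rows correspond to `obs_names` and columns to `var_names`. The values and IDs are invented, and the sketch stops short of calling `SAM` itself.

```python
import numpy as np
from scipy import sparse

genes = ["g1", "g2", "g3"]
cells = ["cell_1", "cell_2"]

# Rows are cells, columns are genes.
dense_matrix = np.array([[0.0, 3.0, 0.0],
                         [5.0, 0.0, 0.0]])
sparse_matrix = sparse.csr_matrix(dense_matrix)

# Sanity check: the matrix shape must agree with the ID lists.
for matrix in (sparse_matrix, dense_matrix):
    assert matrix.shape == (len(cells), len(genes))

sparse_counts = (sparse_matrix, genes, cells)
dense_counts = (dense_matrix, genes, cells)

# Either tuple could then be passed as SAM(counts=...), as in the cells above.
print(np.array_equal(sparse_matrix.toarray(), dense_matrix))
```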

To save the SAM object, we can write the AnnData `sam.adata` to a `.h5ad` file. The `.h5ad` file will also contain the raw data in `sam.adata_raw`:

In [16]:
sam.preprocess_data()
sam.run()
sam.save_anndata('../../example_data/sam_results.h5ad')
sam.__dict__

RUNNING SAM
Iteration: 0, Convergence: 0.43119513581604074
Iteration: 1, Convergence: 0.11154262887407886
Iteration: 2, Convergence: 0.06843432389262005
Iteration: 3, Convergence: 0.020232217785839057
Iteration: 4, Convergence: 0.007733197047618673
Computing the UMAP embedding...
Elapsed time: 14.95327353477478 seconds


{'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     var: 'mask_genes', 'spatial_dispersions', 'weights'
     uns: 'preprocess_args', 'ranked_genes', 'pca_obj', 'X_processed', 'neighbors', 'run_args'
     obsm: 'X_pca', 'X_umap'
     layers: 'X_disp', 'X_knn_avg',
 'run_args': {'max_iter': 10,
  'verbose': True,
  'projection': 'umap',
  'stopping_condition': 0.005,
  'num_norm_avg': 50,
  'k': 20,
  'distance': 'correlation',
  'preprocessing': 'Normalizer',
  'npcs': None,
  'n_genes': None,
  'weight_PCs': True,
  'proj_kwargs': {}},
 'preprocess_args': {'div': 1,
  'downsample': 0,
  'sum_norm': None,
  'include_genes': None,
  'exclude_genes': None,
  'norm': 'log',
  'min_expression': 1,
  'thresh': 0.01,
  'filter_genes': True}}

Now, we can initialize an empty SAM object,

In [17]:
sam=SAM()
sam.__dict__

{'run_args': {}, 'preprocess_args': {}}

and use the `load_data` function to load the data back into the SAM object:

In [18]:
sam.load_data('../../example_data/sam_results.h5ad')
sam.__dict__

{'run_args': {'max_iter': 10,
  'verbose': True,
  'projection': 'umap',
  'stopping_condition': 0.005,
  'num_norm_avg': 50,
  'k': 20,
  'distance': 'correlation',
  'preprocessing': 'Normalizer',
  'npcs': None,
  'n_genes': None,
  'weight_PCs': True,
  'proj_kwargs': {}},
 'preprocess_args': {'div': 1,
  'downsample': 0,
  'sum_norm': None,
  'include_genes': None,
  'exclude_genes': None,
  'norm': 'log',
  'min_expression': 1,
  'thresh': 0.01,
  'filter_genes': True},
 'adata_raw': AnnData object with n_obs × n_vars = 338 × 10782 ,
 'adata': AnnData object with n_obs × n_vars = 338 × 10782 
     var: 'mask_genes', 'spatial_dispersions', 'weights'
     uns: 'preprocess_args', 'ranked_genes', 'pca_obj', 'X_processed', 'neighbors', 'run_args'
     obsm: 'X_pca', 'X_umap'
     layers: 'X_disp', 'X_knn_avg'}