# Advanced Usage
This section demonstrates AnnSQL's extended functionality by performing the following operations:

- Calculate total counts per cell
- Calculate total counts per gene
- Normalize the counts to 10,000 per cell
- Log-transform the counts
- Identify the most highly variable genes

### Install the AnnSQL package

```bash
pip install annsql
```

### Import libraries, build, then open the database

```python
from AnnSQL import AnnSQL
from AnnSQL.MakeDb import MakeDb
import scanpy as sc
import os

# Load the PBMC 3k example dataset and make gene names unique
adata = sc.datasets.pbmc3k()
adata.var_names_make_unique()

# Remove any previously built database so it can be rebuilt from scratch
if os.path.exists("db/pbmc3k.asql"):
    os.remove("db/pbmc3k.asql")

# Build the database, then open it
MakeDb(adata=adata, db_name="pbmc3k", db_path="db/")
asql = AnnSQL(db="db/pbmc3k.asql")
```

```text
Time to make var_names unique:  18.618839740753174
Time to create X table structure:  0.2079753875732422
Time to insert X data:  7.7958338260650635
```


### Calculate total counts per cell

```python
# If system memory is low, lower chunk_size to 200 or less
asql.calculate_total_counts(chunk_size=950)
```

```text
Total Counts Calculation Started
Total Counts Calculation Complete
```
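To make the `chunk_size` parameter concrete, the sketch below accumulates per-cell totals a few gene columns at a time. It is an illustration under stated assumptions, not AnnSQL's implementation: it uses Python's built-in `sqlite3` as a stand-in for DuckDB, the toy table layout (one row per cell, one column per gene) is inferred from the queries in this walkthrough, and the helper `add_total_counts` and the gene names `g1` to `g5` are hypothetical.

```python
import sqlite3

# Toy stand-in for the X table: one row per cell, one column per gene.
# Gene names and values are made up for illustration.
genes = ["g1", "g2", "g3", "g4", "g5"]
cells = [
    ("cell_a", 1, 0, 3, 0, 2),
    ("cell_b", 0, 0, 5, 1, 0),
    ("cell_c", 4, 2, 0, 0, 1),
]

con = sqlite3.connect(":memory:")
gene_cols = ", ".join(f"{g} REAL" for g in genes)
con.execute(
    f"CREATE TABLE X (cell_id TEXT, {gene_cols}, total_counts REAL DEFAULT 0)"
)
placeholders = ", ".join("?" for _ in range(len(genes) + 1))
con.executemany(
    f"INSERT INTO X (cell_id, {', '.join(genes)}) VALUES ({placeholders})",
    cells,
)


def add_total_counts(con, genes, chunk_size):
    """Hypothetical helper: sum at most chunk_size gene columns per statement."""
    for start in range(0, len(genes), chunk_size):
        chunk = genes[start:start + chunk_size]
        expr = " + ".join(chunk)
        # Each statement touches only one chunk of columns, which keeps
        # the size of any single SQL expression bounded.
        con.execute(f"UPDATE X SET total_counts = total_counts + ({expr})")


add_total_counts(con, genes, chunk_size=2)
totals = dict(con.execute("SELECT cell_id, total_counts FROM X"))
print(totals)
```

A smaller `chunk_size` means more statements, each over fewer columns; the final totals are the same regardless of the chunk size chosen.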


### View the total counts

```python
asql.query("SELECT total_counts FROM X ORDER BY total_counts DESC LIMIT 5")
```

|   | total_counts |
|---|---|
| 0 | 15844.0 |
| 1 | 15301.0 |
| 2 | 10783.0 |
| 3 | 10359.0 |
| 4 | 10282.0 |


### Calculate counts per gene

```python
asql.calculate_gene_counts(chunk_size=950)
```

```text
Updating Var Table
Gene Counts Calculation Started
Gene Counts Calculation Complete
```
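Conceptually, a per-gene count is the sum of one gene column over all cells, stored next to the gene's metadata. The following is a minimal sketch of that idea, again using `sqlite3` as a stand-in for DuckDB. The toy tables, gene names, and values are assumptions for illustration, and AnnSQL's actual SQL may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Toy X table: one row per cell, one column per gene (hypothetical data).
con.execute("CREATE TABLE X (cell_id TEXT, g1 REAL, g2 REAL, g3 REAL)")
con.executemany(
    "INSERT INTO X VALUES (?, ?, ?, ?)",
    [("cell_a", 1, 0, 3), ("cell_b", 0, 0, 5), ("cell_c", 4, 2, 0)],
)

# Toy var table: one row per gene, holding the aggregated count.
con.execute("CREATE TABLE var (gene_names TEXT, gene_counts REAL)")

genes = ["g1", "g2", "g3"]

# One SUM per gene column, evaluated in a single pass over X.
select_sums = ", ".join(f"SUM({g})" for g in genes)
sums = con.execute(f"SELECT {select_sums} FROM X").fetchone()

# Store each gene's total alongside its name.
con.executemany("INSERT INTO var VALUES (?, ?)", zip(genes, sums))

gene_counts = con.execute(
    "SELECT gene_names, gene_counts FROM var ORDER BY gene_counts DESC"
).fetchall()
print(gene_counts)
```

The final query mirrors the `ORDER BY gene_counts DESC` pattern used in this walkthrough to list the most abundant genes first.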


### View the gene counts

```python
asql.query("SELECT * FROM var ORDER BY gene_counts DESC LIMIT 5")
```

|   | gene_ids | gene_names_orig | gene_names | gene_counts |
|---|---|---|---|---|
| 0 | ENSG00000251562 | MALAT1 | MALAT1 | 161685.0 |
| 1 | ENSG00000205542 | TMSB4X | TMSB4X | 124210.0 |
| 2 | ENSG00000166710 | B2M | B2M | 121363.0 |
| 3 | ENSG00000147403 | RPL10 | RPL10 | 88517.0 |
| 4 | ENSG00000167526 | RPL13 | RPL13 | 77111.0 |


### Normalize cell expression counts to 10,000

```python
# Lower chunk_size if system memory is low.
# The maximum supported chunk_size is 950 (a DuckDB limitation).
asql.expression_normalize(chunk_size=950)
```

```text
Expression Normalization Started
Expression Normalization Complete
```
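Normalizing to 10,000 counts per cell means rescaling each cell so that its values sum to 10,000. The sketch below shows that arithmetic on a single toy cell. The function name `normalize_cell`, the zero-total handling, and the sample counts are assumptions for illustration rather than AnnSQL's own code.

```python
TARGET_SUM = 10_000  # target total per cell, as stated in this walkthrough


def normalize_cell(counts, target_sum=TARGET_SUM):
    """Hypothetical helper: rescale one cell's counts to sum to target_sum."""
    total = sum(counts)
    if total == 0:
        # Assumption: a cell with no counts stays all zeros.
        return [0.0 for _ in counts]
    return [c * target_sum / total for c in counts]


cell = [1, 0, 3, 0, 4]  # raw counts for five genes, total = 8
normalized = normalize_cell(cell)
print(normalized)
print(sum(normalized))
```

After rescaling, expression values are comparable across cells that were sequenced to different depths.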


### View a normalized gene

```python
asql.query("SELECT RER1 FROM X ORDER BY RER1 DESC LIMIT 5")
```

|   | RER1 |
|---|---|
| 0 | 219.324249 |
| 1 | 204.081635 |
| 2 | 35.587189 |
| 3 | 18.939394 |
| 4 | 17.401392 |


### Log transform the expression values

```python
# log_type can be LN, LOG2, or LOG10
asql.expression_log(log_type="LOG2", chunk_size=950)
```

```text
Log Transform Started
Log Transform Complete
```


### Examine a log transformed value

```python
asql.query("SELECT RER1 FROM X ORDER BY RER1 DESC LIMIT 5")
```

|   | RER1 |
|---|---|
| 0 | 7.783484 |
| 1 | 7.680055 |
| 2 | 5.193267 |
| 3 | 4.31755 |
| 4 | 4.201743 |
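The RER1 values before and after the transform are consistent with a `log2(1 + x)` transform, where the added 1 keeps zero counts at zero. The check below recomputes the transformed values from the normalized ones shown earlier. It only verifies the numbers printed in this walkthrough and makes no claim about how AnnSQL computes them internally. Because the logarithm is monotonic, the descending order of the two result sets lines up row by row.

```python
import math

# Top five normalized RER1 values, copied from the query output above.
normalized = [219.324249, 204.081635, 35.587189, 18.939394, 17.401392]

# Top five RER1 values after expression_log(log_type="LOG2"), also from above.
reported_log2 = [7.783484, 7.680055, 5.193267, 4.31755, 4.201743]

# Recompute log2(1 + x) for each normalized value.
recomputed = [math.log2(1 + v) for v in normalized]

for value, got, want in zip(normalized, recomputed, reported_log2):
    print(f"{value:>12.6f} -> {got:.6f} (reported {want})")

# Largest disagreement between recomputed and reported values.
max_abs_diff = max(abs(g - w) for g, w in zip(recomputed, reported_log2))
```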


### Calculate Highly Variable Genes

```python
asql.calculate_variable_genes(chunk_size=950)
```

```text
Updating Var Table
Variance Calculation Complete
```
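The `variance` column suggests that genes are ranked by the variance of their expression across cells, computed here on the normalized, log-transformed values because of the order of the steps in this walkthrough. The sketch below illustrates such a ranking with Python's `statistics` module. The gene names and values are made up, and the use of sample variance (rather than population variance) is an assumption, since the source does not say which one AnnSQL uses.

```python
import statistics

# Hypothetical log-transformed expression values: gene -> one value per cell.
expression = {
    "g1": [0.0, 0.0, 0.0, 0.0],  # constant across cells, so zero variance
    "g2": [0.0, 2.0, 4.0, 6.0],  # widely spread
    "g3": [1.0, 1.0, 2.0, 2.0],  # mildly spread
}

# Sample variance per gene (assumption: n - 1 in the denominator).
variance = {gene: statistics.variance(vals) for gene, vals in expression.items()}

# Rank genes from most to least variable.
ranked = sorted(variance, key=variance.get, reverse=True)
print(ranked)
```

Genes whose expression barely changes between cells fall to the bottom of the ranking, while genes that separate cell populations tend to rise to the top.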


### View the top 50 highly variable genes

```python
asql.query("SELECT * FROM var ORDER BY variance DESC LIMIT 50")
```

|   | gene_ids | gene_names_orig | gene_names | gene_counts | variance |
|---|---|---|---|---|---|
| 0 | ENSG00000090382 | LYZ | LYZ | 27666.0 | 7.49079 |
| 1 | ENSG00000163220 | S100A9 | S100A9 | 16326.0 | 6.883412 |
| 2 | ENSG00000204287 | HLA-DRA | HLA_DRA | 16467.0 | 6.731869 |
| 3 | ENSG00000101439 | CST3 | CST3 | 11857.0 | 6.039524 |
| 4 | ENSG00000011600 | TYROBP | TYROBP | 9381.0 | 5.810699 |
| 5 | ENSG00000143546 | S100A8 | S100A8 | 8515.0 | 5.380419 |
| 6 | ENSG00000105374 | NKG7 | NKG7 | 7165.0 | 5.09594 |
| 7 | ENSG00000100097 | LGALS1 | LGALS1 | 8055.0 | 5.019437 |
| 8 | ENSG00000019582 | CD74 | CD74 | 23018.0 | 4.922016 |
| 9 | ENSG00000161570 | CCL5 | CCL5 | 5491.0 | 4.770338 |
