# SeqFu2
Summary of [SeqFu](https://github.com/telatin/seqfu2) results from project: `[{{ project().name }}]`

## Description
[SeqFu](https://github.com/telatin/seqfu2) provides an overview of sequence statistics for the genomes in the dataset.

## Genome Statistics Overview

In [None]:
import pandas as pd
from pathlib import Path
import altair as alt
import warnings
warnings.filterwarnings('ignore')

In [None]:
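# Locate the report tables relative to this notebook
report_dir = Path("../")
seqfu_table = report_dir / "tables/df_seqfu_stats.csv"
gtdb_table = report_dir / "tables/df_gtdb_meta.csv"

# Index both tables by genome_id so they can be aligned row by row
df_seqfu = pd.read_csv(seqfu_table)
df_seqfu = df_seqfu.rename(columns={'File': 'genome_id'}).set_index('genome_id')
df_gtdb = pd.read_csv(gtdb_table).set_index('genome_id')

# Join the SeqFu statistics and GTDB metadata column-wise on genome_id
df = pd.concat([df_seqfu, df_gtdb], axis=1).reset_index()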

In [None]:
source = df

chart = alt.Chart(source).mark_circle(size=150).encode(
    alt.X("gc", type='quantitative', title='GC (%)', scale=alt.Scale(zero=False)).axis(format='.2%'),
    alt.Y("N50", type='quantitative', scale=alt.Scale(zero=False)),
    color=alt.Color('Total', scale=alt.Scale(domain=[df['Total'].min(), df['Total'].max()], range=['blue', 'red'])),
    tooltip=['genome_id', 'Count', 'Total', 'gc', 'N50', 'N75', 'N90', 'AuN', 'Min', 'Max', 'Organism']
).properties(
    title="Genome QC Statistics",
    width=500,
    height=500,
).interactive()

chart = chart.configure_title(fontSize=20, offset=10, orient='top', anchor='middle')

chart

| Statistic | Description |
|-|-|
| `#Seq` | The number of sequences in the input file. |
| `Total bp` | The total number of base pairs across all sequences. |
| `Avg` | The average sequence length in base pairs. |
| `N50`, `N75`, `N90` | The length of the shortest sequence in the smallest set of longest sequences that together cover at least 50%, 75%, or 90% of the total length. |
| `auN` | The area under the Nx curve, a single-number summary of contiguity that does not depend on one fixed threshold such as 50%. |
| `Min` | The length of the shortest sequence in base pairs. |
| `Max` | The length of the longest sequence in base pairs. |

These statistics can be useful for assessing the quality and characteristics of a set of sequences, such as a genome assembly or a set of reads.
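
To make the `N50`, `N75`, `N90`, and `auN` definitions concrete, here is a minimal sketch that computes them from a list of sequence lengths. The helpers `nx` and `aun` and the `contigs` list are hypothetical names and values introduced only for illustration. They follow the textbook definitions and are not taken from SeqFu's source code, so SeqFu may handle edge cases differently.

```python
def nx(lengths, percent):
    """Textbook Nx: walk the sequences from longest to shortest and return
    the length at which the running sum first reaches `percent` of the total.
    Illustrative helper, not SeqFu's implementation."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * percent / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must not be empty")


def aun(lengths):
    """Area under the Nx curve: sum of squared lengths divided by total length."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("lengths must contain at least one non-empty sequence")
    return sum(length * length for length in lengths) / total


# Hypothetical assembly with five contigs (300 bp in total)
contigs = [100, 80, 50, 40, 30]

n50 = nx(contigs, 50)   # 100 + 80 = 180 >= 150, so N50 is 80
n75 = nx(contigs, 75)   # 100 + 80 + 50 = 230 >= 225, so N75 is 50
n90 = nx(contigs, 90)   # 100 + 80 + 50 + 40 = 270 >= 270, so N90 is 40
area = aun(contigs)     # 21400 / 300, roughly 71.33

print(n50, n75, n90, round(area, 2))
```

Because `auN` weights every sequence by its own length, it changes smoothly when a contig is split or joined, whereas `N50` can jump or stay flat.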

[Download Table]({{ project().file_server() }}/tables/df_seqfu_stats.csv){:target="_blank" .md-button}

## References
<font size="2">
{% for i in project().rule_used['seqfu']['references'] %}
- *{{ i }}*
{% endfor %}
</font>