# Download DisGeNet SQLite file

Download the DisGeNet data source as a gzipped SQLite database file from https://www.disgenet.org/downloads (registration and login are required), and save it in the data directory of this project.

Then decompress the gzipped SQLite database file:

```bash
gzip -d disgenet_2020.db.gz
```
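On systems without the `gzip` command line tool, the same step can be sketched with Python's standard library. The helper name `gunzip_file` and the example paths are assumptions for illustration; unlike `gzip -d`, this sketch leaves the original `.gz` file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dst):
    """Decompress the gzip file `src` into `dst` and return the output path."""
    dst_path = Path(dst)
    # Stream in chunks so a large database file is never held fully in memory.
    with gzip.open(src, "rb") as f_in, open(dst_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst_path


# Example call (paths assumed from the project layout described above):
# gunzip_file("../data/disgenet_2020.db.gz", "../data/disgenet_2020.db")
```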

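The following Python and SQLite code shows the inclusion ratio, i.e. the fraction of rows in `geneDiseaseNetwork` whose score is at or above a threshold, for several GDA (gene-disease association) score thresholds.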

# Connect to SQLite db

In [79]:
import sqlite3

In [80]:
con = sqlite3.connect("../data/disgenet_2020.db")

In [81]:
cur = con.cursor()
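`sqlite3.connect` silently creates a new, empty database file when the path does not exist, so a quick look at the table list can catch a wrong path early. The following is a minimal sketch; the helper name `list_tables` is an assumption, and the in-memory table only stands in for the real file.

```python
import sqlite3


def list_tables(con):
    """Return the names of all tables in the connected database, sorted."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Demonstration on an in-memory database with a stand-in table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE geneDiseaseNetwork (score REAL)")
print(list_tables(demo))
demo.close()
```

Against the real file, an empty list from `list_tables(con)` would suggest that the path passed to `sqlite3.connect` is wrong.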

# Inclusion ratios of different GDA score thresholds

In [82]:
# Fraction of gene-disease associations whose score is at or above the bound threshold.
# The aggregate returns exactly one row, so no LIMIT is needed.
query_gene_disease_network_threshold_steps = """
SELECT
    SUM(
        CASE
            WHEN score >= ? THEN 1.0
            ELSE 0.0
        END
    ) / COUNT(*) AS inclusion_ratio
FROM
    geneDiseaseNetwork
"""
for threshold in [0.06, 0.2, 0.4, 0.6]:
    for r in cur.execute(query_gene_disease_network_threshold_steps, [threshold]):
        print(f"threshold: {threshold}\tinclusion ratio: {r[0]:.3f}")

threshold: 0.06	inclusion ratio: 0.659
threshold: 0.2	inclusion ratio: 0.254
threshold: 0.4	inclusion ratio: 0.184
threshold: 0.6	inclusion ratio: 0.100
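The loop above scans the table once per threshold. As a possible alternative, all ratios can be computed in a single pass, because a comparison in SQLite evaluates to 0 or 1 and its average is the matching fraction. This is a sketch on toy data, not the DisGeNet file; the helper name `inclusion_ratios` and the four demo scores are assumptions, and it presumes `score` contains no NULLs (`AVG` skips them, whereas `COUNT(*)` counts them).

```python
import sqlite3


def inclusion_ratios(con, thresholds):
    """Return {threshold: fraction of rows with score >= threshold}."""
    # One AVG(...) column per threshold, each bound to its own parameter.
    columns = ", ".join("AVG(score >= ?)" for _ in thresholds)
    row = con.execute(
        f"SELECT {columns} FROM geneDiseaseNetwork", list(thresholds)
    ).fetchone()
    return dict(zip(thresholds, row))


# Toy stand-in for the real table (scores are made up for illustration).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE geneDiseaseNetwork (score REAL)")
demo.executemany(
    "INSERT INTO geneDiseaseNetwork VALUES (?)",
    [(0.05,), (0.1,), (0.3,), (0.7,)],
)
for threshold, ratio in inclusion_ratios(demo, [0.06, 0.2, 0.4, 0.6]).items():
    print(f"threshold: {threshold}\tinclusion ratio: {ratio:.3f}")
```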


Note:

If you have already loaded DisGeNet into PostgreSQL, you can use the `FILTER` clause, which yields the same result with more concise code.

```sql
SELECT
    (count(*) FILTER(WHERE score >= 0.06))::float / count(*)::float -- --> 0.66
FROM
    disgenet.gene_disease as gd 
;
```
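SQLite itself accepts the `FILTER` clause on aggregate functions from version 3.30.0 onward, so the concise form may also work without PostgreSQL. The sketch below checks the SQLite library version bundled with Python and falls back to a `SUM` over the comparison otherwise; the helper name `inclusion_ratio_with_filter` and the toy scores are assumptions.

```python
import sqlite3


def inclusion_ratio_with_filter(con, threshold):
    """Fraction of rows with score >= threshold, using FILTER when available."""
    if sqlite3.sqlite_version_info >= (3, 30, 0):
        # Aggregate FILTER is supported by this SQLite build.
        query = (
            "SELECT CAST(COUNT(*) FILTER (WHERE score >= ?) AS REAL) / COUNT(*) "
            "FROM geneDiseaseNetwork"
        )
    else:
        # Older SQLite: sum the 0/1 result of the comparison instead.
        query = (
            "SELECT CAST(SUM(score >= ?) AS REAL) / COUNT(*) "
            "FROM geneDiseaseNetwork"
        )
    return con.execute(query, [threshold]).fetchone()[0]


# Toy stand-in for the real table (scores are made up for illustration).
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE geneDiseaseNetwork (score REAL)")
demo_con.executemany(
    "INSERT INTO geneDiseaseNetwork VALUES (?)",
    [(0.05,), (0.1,), (0.3,), (0.7,)],
)
print(inclusion_ratio_with_filter(demo_con, 0.06))
```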