# Download data needed for CNV analyses
This notebook downloads all of the underlying data for the analyses of CNVs in ClinVar and of how CNVs from a real-world microarray data set match these variants. It also downloads the intermediate output files generated while running these CNV analyses, so that users can skip the long-running computations in the matching analysis.

In [1]:
import os
from pathlib import Path

import boto3
from botocore.config import Config
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

In [4]:
s3 = boto3.resource("s3", config=Config(region_name="us-east-2"))

In [4]:
# List every object under the CNV data prefix, sorted by key in descending order
bucket = sorted(
    s3.Bucket("nch-igm-wagner-lab-public").objects.filter(Prefix="variation-normalizer-manuscript/cnv_data/"),
    key=lambda obj: obj.key,
    reverse=True,
)
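The cell above lists every object under the `variation-normalizer-manuscript/cnv_data/` prefix and sorts the results by key in descending order; the next cell then saves each object under the last path component of its key. That key-to-path mapping can be sketched in plain Python as below. The `plan_downloads` helper and the example keys are hypothetical, and skipping keys that end in `/` (the zero-byte "folder" placeholders some S3 tools create) is a defensive assumption.

```python
from pathlib import Path, PurePosixPath


def plan_downloads(keys, prefix, dest_dir):
    """Map S3 object keys under `prefix` to local file paths in `dest_dir`.

    Keys outside the prefix and keys ending in "/" (folder placeholders)
    are skipped. Results follow the descending key order used above.
    """
    plan = []
    for key in sorted(keys, reverse=True):
        if not key.startswith(prefix) or key.endswith("/"):
            continue
        # Keep only the final path component of the key as the local file name
        plan.append((key, Path(dest_dir) / PurePosixPath(key).name))
    return plan


# Hypothetical keys, for illustration only
example_keys = [
    "variation-normalizer-manuscript/cnv_data/",
    "variation-normalizer-manuscript/cnv_data/a.tsv",
    "variation-normalizer-manuscript/cnv_data/b.tsv",
    "variation-normalizer-manuscript/other/c.tsv",
]
for key, path in plan_downloads(example_keys, "variation-normalizer-manuscript/cnv_data/", "cnv_data"):
    print(key, "->", path)
```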

In [5]:

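# Create the local output directory (no error if it already exists)
Path("cnv_data").mkdir(exist_ok=True)

for obj in bucket:
    fn = obj.key.split("/")[-1]
    if not fn:
        # Skip S3 "folder" placeholder keys, which end in "/" and have no file name
        continue
    with open(os.path.join("cnv_data", fn), "wb") as f:
        obj.Object().download_fileobj(f)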
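Each run of the download loop fetches every file again. If the data are large, one option is to skip files whose local copy already has the same size as the remote object, which boto3 exposes as the `size` attribute (in bytes) of each listed object. The `needs_download` helper below is a hypothetical sketch, and a size match is only a heuristic: it does not detect a remote file whose contents changed while its size stayed the same.

```python
import os


def needs_download(local_path, remote_size):
    """Return True unless `local_path` exists with exactly `remote_size` bytes.

    A size match is a cheap stand-in for "already downloaded"; it does not
    compare contents or checksums.
    """
    try:
        return os.path.getsize(local_path) != remote_size
    except FileNotFoundError:
        # No local copy yet
        return True
```

Inside the loop, the download call could be guarded with this check, passing the local path and the listed object's `size`.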

### ClinVar normalized variant data
If you have not already run the notebooks in the ClinVar analysis directory, you will need to download the normalized ClinVar variants. You can do this either by running `clinvar_variation_analysis.ipynb` in the `clinvar` directory or by running the following cell:

In [6]:
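# Create the ClinVar directory if it does not already exist
Path("../clinvar").mkdir(exist_ok=True)

# Use a low-level client (kept separate from the `s3` resource above) to fetch a single object
s3_client = boto3.client("s3", config=Config(region_name="us-east-2"))
with open("../clinvar/output-variation_identity-vrs-1.3.ndjson.gz", "wb") as f:
    s3_client.download_fileobj(
        "nch-igm-wagner-lab-public",
        "variation-normalizer-manuscript/output-variation_identity-vrs-1.3.ndjson.gz",
        f,
    )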
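As its `.ndjson.gz` extension indicates, the downloaded file is gzip-compressed newline-delimited JSON: one JSON value per line. A minimal sketch for streaming the records without decompressing the whole file to disk follows. The `iter_ndjson_gz` name is hypothetical, and the sketch makes no assumption about which fields each record contains.

```python
import gzip
import json
import os


def iter_ndjson_gz(path):
    """Yield one parsed JSON value per non-blank line of a gzipped NDJSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


path = "../clinvar/output-variation_identity-vrs-1.3.ndjson.gz"
if os.path.exists(path):
    first = next(iter_ndjson_gz(path), None)
    # Print only the top-level keys to keep the output short (if the record is an object)
    print(sorted(first) if isinstance(first, dict) else first)
```

Streaming line by line keeps memory use flat regardless of how many variants the file holds.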