**Table of contents**<a id='toc0_'></a>    
- [Download Data](#toc1_)    
  - [Download data needed for CNV analyses](#toc1_1_)    
  - [ClinVar normalized variant data](#toc1_2_)    
  - [ClinVar processed CNV data](#toc1_3_)

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Download Data](#toc0_)

Download the data used by the analysis notebooks.

In [1]:
import requests
from pathlib import Path

In [2]:
MANUSCRIPT_S3_URL = "https://nch-igm-wagner-lab-public.s3.us-east-2.amazonaws.com/variation-normalizer-manuscript/2025"

In [3]:
def download_s3(url: str, outfile_path: Path) -> None:
    """Download an object from a public S3 bucket.

    :param url: URL of the file in the S3 bucket
    :param outfile_path: Path where the file should be saved
    """
    # Stream the response so large files are never held fully in memory;
    # the timeout keeps a stalled connection from hanging the notebook
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(outfile_path, "wb") as h:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:
                    h.write(chunk)
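
Because some of these files are large, it can be convenient to skip a download when the file is already on disk. The following is a minimal sketch of that idea, not part of the original notebook: `download_if_missing` is a hypothetical helper, and it takes the download function (for example `download_s3` above) as an argument so that it stays self-contained.

```python
from pathlib import Path
from typing import Callable


def download_if_missing(
    url: str, outfile_path: Path, downloader: Callable[[str, Path], None]
) -> bool:
    """Call ``downloader`` only when ``outfile_path`` is absent or empty.

    :param url: URL passed through to ``downloader``
    :param outfile_path: Path where the file should be saved
    :param downloader: Function that fetches ``url`` into ``outfile_path``
    :return: True if a download was performed, False if it was skipped
    """
    # Treat a zero-byte file as a failed earlier attempt and fetch it again
    if outfile_path.exists() and outfile_path.stat().st_size > 0:
        return False
    # Create the target directory (and any parents) before writing
    outfile_path.parent.mkdir(parents=True, exist_ok=True)
    downloader(url, outfile_path)
    return True
```

Under these assumptions, a call would look like `download_if_missing(url, outfile_path, download_s3)`. Note that a size check cannot detect a partially written file; it only guards against empty ones.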

## <a id='toc1_1_'></a>[Download data needed for CNV analyses](#toc0_)

This notebook downloads all of the underlying data for the analyses of copy number variants (CNVs) in ClinVar and of how a real-world data set of CNVs from microarrays matches those variants. It also downloads the intermediate output files generated while running these CNV analyses, so that you can avoid re-running the long computations in the matching analysis.

In [4]:
path = "cnvs/cnv_data"
# parents=True also creates "cnvs" if it does not exist yet
Path(path).mkdir(parents=True, exist_ok=True)

for fn in [
    "NCH-microarray-CNVs-cleaned.csv",
    "NCH-microarray-CNVs.csv",
    "NCH-normalizer-results.json",
]:
    url = f"{MANUSCRIPT_S3_URL}/cnv_data/{fn}"
    outfile_path = Path(f"{path}/{fn}")
    download_s3(url, outfile_path)
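
After the cell above has run, a quick check that every expected file arrived can save a confusing failure later in the matching analysis. This is a minimal sketch, not part of the original notebook; `missing_files` is a hypothetical helper, and the file names are the ones listed in the cell above.

```python
from pathlib import Path


def missing_files(directory: Path, filenames: list[str]) -> list[str]:
    """Return the names in ``filenames`` that are absent or empty in ``directory``."""
    return [
        fn
        for fn in filenames
        if not (directory / fn).is_file() or (directory / fn).stat().st_size == 0
    ]


expected = [
    "NCH-microarray-CNVs-cleaned.csv",
    "NCH-microarray-CNVs.csv",
    "NCH-normalizer-results.json",
]
# An empty list means all three files are present and non-empty
print(missing_files(Path("cnvs/cnv_data"), expected))
```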

## <a id='toc1_2_'></a>[ClinVar normalized variant data](#toc0_)

If you have not already run the notebooks in the ClinVar analysis directory, you will need to pull down the normalized ClinVar variants. This can be done by running `clinvar_variation_analysis.ipynb` in the `clinvar` directory, or by running the following cell:

In [5]:
path = "clinvar"
Path(path).mkdir(exist_ok=True)

url = f"{MANUSCRIPT_S3_URL}/clinvar/vi-normalized-with-liftover.jsonl.gz"
outfile_path = Path(f"{path}/vi-normalized-with-liftover.jsonl.gz")
download_s3(url, outfile_path)
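
To confirm that the download is readable without loading the whole file, the first few records can be inspected directly. This is a minimal sketch that assumes the file follows the JSON Lines convention its `.jsonl.gz` suffix suggests (one JSON value per line, gzip-compressed); `peek_jsonl_gz` is a hypothetical helper, and nothing is assumed about the fields inside each record.

```python
import gzip
import json
from itertools import islice
from pathlib import Path


def peek_jsonl_gz(path: Path, n: int = 3) -> list:
    """Return the first ``n`` non-empty JSON records from a gzipped JSON Lines file."""
    # Text mode decompresses lazily, so only the first few lines are read
    with gzip.open(path, "rt", encoding="utf-8") as f:
        lines = (line for line in f if line.strip())
        return [json.loads(line) for line in islice(lines, n)]


# Example call, once the file has been downloaded:
# peek_jsonl_gz(Path("clinvar/vi-normalized-with-liftover.jsonl.gz"))
```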

## <a id='toc1_3_'></a>[ClinVar processed CNV data](#toc0_)

If you have not already run `prep_clinvar_cnvs.ipynb`, you can run the following cell to download the list of processed ClinVar CNVs that it generates. This file is provided as input to the `query_match_nch_clinvar_cnvs.ipynb` notebook.

In [6]:
path = "cnvs/cnv_data"
# parents=True also creates "cnvs" if it does not exist yet
Path(path).mkdir(parents=True, exist_ok=True)

url = f"{MANUSCRIPT_S3_URL}/clinvar/ClinVar-CNVs-normalized.csv.gzip"
outfile_path = Path(f"{path}/ClinVar-CNVs-normalized.csv.gzip")
download_s3(url, outfile_path)
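
The downloaded file uses a `.csv.gzip` suffix rather than the more common `.csv.gz`, so tools that infer compression from the file extension may not recognise it. The sketch below opens it with `gzip` explicitly. It assumes the file is a gzip-compressed CSV with a header row, which the name suggests but this notebook does not state; `read_gzipped_csv_header` is a hypothetical helper.

```python
import csv
import gzip
from pathlib import Path


def read_gzipped_csv_header(path: Path) -> list[str]:
    """Return the first row of a gzip-compressed CSV file, or [] if it is empty."""
    # newline="" lets the csv module handle line endings itself
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return next(csv.reader(f), [])


# Example call, once the file has been downloaded:
# read_gzipped_csv_header(Path("cnvs/cnv_data/ClinVar-CNVs-normalized.csv.gzip"))
```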