1. Install dependencies

In [1]:
%pip install pandas pybigwig tqdm pyfaidx

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
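
After restarting the kernel, it can help to confirm that the packages are importable. Note that the import name `pyBigWig` differs in case from the pip name `pybigwig`. The following is a minimal sketch; `missing_modules` is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names for the packages installed above
required = ["pandas", "pyBigWig", "tqdm", "pyfaidx"]
print("Missing modules:", missing_modules(required))
```

An empty list means every package can be imported.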


2. Download PhyloP scores for 100 vertebrate species (in bigWig format)

In [2]:
!wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/phyloP100way/hg38.phyloP100way.bw

--2025-08-02 14:51:11--  http://hgdownload.soe.ucsc.edu/goldenPath/hg38/phyloP100way/hg38.phyloP100way.bw
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)... 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9870053206 (9.2G)
Saving to: ‘hg38.phyloP100way.bw’


2025-08-02 14:57:17 (25.8 MB/s) - ‘hg38.phyloP100way.bw’ saved [9870053206/9870053206]
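
Because the file is about 9.2 GB, an interrupted download is easy to miss. The sketch below compares the size on disk with the byte count reported in the wget log above. `download_is_complete` is a hypothetical helper, and the expected size is an assumption that holds only while UCSC serves the same file.

```python
import os

# "Length" reported by wget for hg38.phyloP100way.bw in the log above
EXPECTED_BYTES = 9870053206

def download_is_complete(path, expected_bytes=EXPECTED_BYTES):
    """Return True if the file exists and has exactly the expected size."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes

print(download_is_complete("hg38.phyloP100way.bw"))
```

If this prints `False`, re-running wget with the `-c` option resumes a partial download.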



3. Extract the PhyloP scores

In [4]:
import pandas as pd
import pyBigWig
from tqdm import tqdm

# Configure file paths
input_csv = "all.csv"                    # Path to your variant CSV file
bw_file = "hg38.phyloP100way.bw"         # PhyloP bigWig file downloaded from UCSC
output_csv = "phylop.csv"                # Output CSV path

# Load CSV
df = pd.read_csv(input_csv)
df = df.head(100)  # Keep only the first 100 rows for a quick test; remove this line to process every variant

# Open bigWig file
bw = pyBigWig.open(bw_file)

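# Query PhyloP scores with progress bar
phylop_scores = []
for i, row in tqdm(df.iterrows(), total=len(df), desc="Querying PhyloP"):
    # UCSC bigWig files use "chr"-prefixed names; add the prefix if missing
    chrom = str(row["#CHROM"])
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    pos = int(row["POS"])  # Assumed to be 1-based, as in VCF
    try:
        # pyBigWig intervals are 0-based and half-open: [pos - 1, pos)
        score = bw.values(chrom, pos - 1, pos)[0]
    except Exception:
        # Chromosome absent from the bigWig file or position out of range
        score = None
    phylop_scores.append(score)

# Close the bigWig file
bw.close()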

# Add scores to DataFrame
df["PhyloP"] = phylop_scores

# Save result
df.to_csv(output_csv, index=False)
print(f"Done! Results saved to: {output_csv}")

Querying PhyloP: 100%|██████████| 100/100 [00:00<00:00, 2008.74it/s]

Done! Results saved to: phylop.csv
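
The extraction cell relies on two conventions: UCSC chromosome names carry a `chr` prefix, and pyBigWig takes 0-based, half-open intervals while a VCF-style `POS` is 1-based. pyBigWig also returns `nan` for positions that have no score. The helpers below are a minimal sketch of these conversions as standalone functions. Their names are hypothetical, and the `MT` to `chrM` mapping assumes the input may use Ensembl-style mitochondrial naming.

```python
import math

def to_ucsc_chrom(chrom):
    """Convert names such as '1', 'chr1' or 'MT' to UCSC style ('chr1', 'chrM')."""
    name = str(chrom)
    if name.startswith("chr"):
        name = name[3:]
    if name == "MT":  # Assumption: Ensembl-style mitochondrial name
        name = "M"
    return "chr" + name

def vcf_pos_to_interval(pos):
    """Convert a 1-based position to a 0-based, half-open (start, end) pair."""
    pos = int(pos)
    return pos - 1, pos

def clean_score(score):
    """Map a missing score (None or nan) to None; otherwise return a float."""
    if score is None or math.isnan(score):
        return None
    return float(score)

print(to_ucsc_chrom("MT"), vcf_pos_to_interval(100), clean_score(float("nan")))
```

Inside the loop, these could be combined as `start, end = vcf_pos_to_interval(row["POS"])` followed by `clean_score(bw.values(to_ucsc_chrom(row["#CHROM"]), start, end)[0])`.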



