### Download and Extract the Latest Gene IDs from WormBase

The cell below downloads the latest *C. elegans* gene ID file from WormBase and extracts the live genes from it, in three steps:

1. Get the latest WormBase version (release), e.g. `WS293`.
2. Download the gene ID file for that version, following the WormBase file naming convention.
3. Extract the live genes to a CSV file.
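The naming convention in step 2 can be illustrated with a small sketch. Judging from the file name printed in this notebook's output (`c_elegans.PRJNA13758.WS293.geneIDs.txt.gz`), the name appears to combine species, BioProject, release version and file type. The helpers `gene_ids_filename` and `release_number` are hypothetical, not part of `pub_worm`, and the pattern is inferred from that single example rather than from WormBase documentation.

```python
import re

# Hypothetical helpers: the file name pattern is inferred from this notebook's output,
# not taken from pub_worm or official WormBase documentation.
def gene_ids_filename(version, species="c_elegans", bioproject="PRJNA13758", compressed=True):
    """Build a geneIDs file name such as c_elegans.PRJNA13758.WS293.geneIDs.txt.gz."""
    if not re.fullmatch(r"WS\d+", version):
        raise ValueError(f"Unexpected WormBase version string: {version!r}")
    name = f"{species}.{bioproject}.{version}.geneIDs.txt"
    return name + ".gz" if compressed else name

def release_number(version):
    """Return the numeric part of a release string ('WS293' -> 293) so releases compare numerically."""
    return int(version[2:])

print(gene_ids_filename("WS293"))
# Plain string comparison would rank "WS99" above "WS293"; comparing numbers avoids that
print(max(["WS290", "WS293", "WS99"], key=release_number))
```

Comparing the numeric part matters once release numbers differ in digit count, since lexicographic order would pick the wrong "latest" version.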

In [3]:
from pub_worm.wormbase.wormbase_util import current_wormbase_version, download_gene_ids, extract_live_gene_ids

output_dir = "./wormbase_data"

# Look up the most recent WormBase release (e.g. WS293)
wormbase_version = current_wormbase_version()
print(f"The latest Wormbase Version is {wormbase_version}")

# Download the gene ID file for this WormBase release into output_dir
download_gene_ids(wormbase_version, output_dir)

# Keep only the live genes and save them as CSV (the downloaded .gz is then removed)
extract_live_gene_ids(wormbase_version, output_dir)


The latest Wormbase Version is WS293
Downloaded: ./wormbase_data/c_elegans.PRJNA13758.WS293.geneIDs.txt.gz
Removed: ./wormbase_data/c_elegans.PRJNA13758.WS293.geneIDs.txt.gz
Processed file saved to: ./wormbase_data/c_elegans.PRJNA13758.WS293.geneIDs.csv
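Step 3 (keeping only live genes) can be sketched with the standard library alone. This is not `extract_live_gene_ids` itself; it assumes the decompressed WormBase `geneIDs.txt` is comma-separated, has no header, and has six columns (taxon ID, WBGene ID, public name, sequence name, Live/Dead status, gene type). Verify that layout against the actual release before relying on it. The function name and the output column names are hypothetical, except `Gene_Type`, which matches the column read in the next cell.

```python
import csv
import gzip

# Assumed column layout of geneIDs.txt (verify against the actual release):
# taxon ID, WBGene ID, public name, sequence name, Live/Dead status, gene type.
# Only "Gene_Type" is known to match the CSV produced above; the other names are hypothetical.
ASSUMED_COLUMNS = ["Taxon_ID", "Wormbase_Id", "Gene_Name", "Sequence_Id", "Status", "Gene_Type"]

def extract_live_genes_sketch(gz_path, csv_path):
    """Copy rows whose status column is 'Live' from a gzipped geneIDs file into a CSV with a header.

    Returns the number of live genes written.
    """
    live_count = 0
    with gzip.open(gz_path, "rt", newline="") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(ASSUMED_COLUMNS)
        for row in csv.reader(src):
            # Skip blank or malformed lines instead of failing on them
            if len(row) != len(ASSUMED_COLUMNS):
                continue
            if row[4] == "Live":
                writer.writerow(row)
                live_count += 1
    return live_count
```

Because the output above shows the `.gz` file being removed after processing, running this sketch on real data would need a freshly downloaded copy of the gene ID file.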


## Summarize Gene IDs by Gene Type

In [5]:
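import os
import pandas as pd

# Build the CSV path from the version and directory used above instead of hard-coding WS293
csv_path = os.path.join(output_dir, f"c_elegans.PRJNA13758.{wormbase_version}.geneIDs.csv")
gene_ids_df = pd.read_csv(csv_path)

# Count how many genes fall into each gene type
gene_type_counts = gene_ids_df["Gene_Type"].value_counts()
print(gene_type_counts)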

Gene_Type
protein_coding_gene      19983
piRNA_gene               15363
ncRNA_gene                8487
pseudogene                2131
gene                      1523
tRNA_gene                  634
snoRNA_gene                346
miRNA_gene                 261
lincRNA_gene               193
snRNA_gene                 129
antisense_lncRNA_gene      100
rRNA_gene                   22
scRNA_gene                   1
Name: count, dtype: int64
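To put these counts in proportion, pandas' `value_counts(normalize=True)` returns fractions instead of raw counts. The sketch below uses a small made-up DataFrame so it runs on its own; with the real data, substitute the `gene_ids_df` loaded above.

```python
import pandas as pd

# Toy stand-in for gene_ids_df; the values are illustrative, not WormBase data
df = pd.DataFrame({"Gene_Type": ["protein_coding_gene"] * 3 + ["piRNA_gene"] + ["ncRNA_gene"] * 2})

counts = df["Gene_Type"].value_counts()                # absolute number of genes per type
shares = df["Gene_Type"].value_counts(normalize=True)  # fraction of all genes per type

# Combine both into one table; the two Series align on the gene type index
summary = pd.DataFrame({"count": counts, "percent": (shares * 100).round(1)})
print(summary)
print(f"Total genes: {counts.sum()}")
```

Keeping counts and percentages in one table makes it easy to see, for example, how much of the gene list is protein coding versus small RNA genes.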


### Download WormCat CSV File

This code downloads the WormCat whole-genome CSV file from wormcat.com and saves it to the given output directory.
It creates the directory if it does not exist and skips the download if the file is already there.


#### Example Execution:
- `download_wormcat_csv("./wormbase_data")` downloads `whole_genome_v2_nov-11-2021.csv` from the WormCat website into `./wormbase_data`; any other directory can be passed instead.

In [11]:
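import os
import shutil

import requests

def _download_url(file_url, output_file_path, timeout=60):
    """Stream file_url to output_file_path. Return True on success, False otherwise."""
    # stream=True avoids holding the whole file in memory; the timeout stops a
    # stalled server from hanging the notebook indefinitely
    with requests.get(file_url, stream=True, timeout=timeout) as response:
        if response.status_code != 200:
            print(f"Failed to download: {file_url} (status code: {response.status_code})")
            return False
        # Undo any gzip/deflate Content-Encoding so the saved file is plain CSV
        response.raw.decode_content = True
        with open(output_file_path, 'wb') as f:
            shutil.copyfileobj(response.raw, f)
    print(f"Downloaded: {output_file_path}")
    return True

def download_wormcat_csv(output_dir="./"):
    """Download the WormCat whole-genome CSV into output_dir, skipping it if already present."""
    url = "http://www.wormcat.com/static/download/whole_genome_v2_nov-11-2021.csv"
    output_filename = url.split("/")[-1]  # Get the filename from the URL

    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)
    output_file_path = os.path.join(output_dir, output_filename)

    if os.path.exists(output_file_path):
        print(f"File already exists: {output_file_path}. Skipping download.")
        return

    # Only report the file as downloaded if the request actually succeeded
    if _download_url(url, output_file_path):
        print(f"File downloaded to: {output_file_path}")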



In [12]:
download_wormcat_csv("./wormbase_data")

Downloaded: ./wormbase_data/whole_genome_v2_nov-11-2021.csv
File downloaded to: ./wormbase_data/whole_genome_v2_nov-11-2021.csv
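One caveat with the skip-if-exists check in `download_wormcat_csv`: if a download is interrupted, a partial file is left at the final path, and later runs will skip it as if it were complete. A common remedy, sketched here as an assumption rather than how the code above behaves, is to write into a temporary file in the same directory and rename it into place only after all data has arrived. `write_atomically` is a hypothetical helper; with `requests`, its chunks could come from `response.iter_content(chunk_size=8192)`.

```python
import os
import tempfile

def write_atomically(output_file_path, chunks):
    """Write an iterable of bytes chunks so output_file_path only ever holds a complete file.

    Data goes to a temporary file in the same directory, which is then renamed with os.replace.
    If anything fails (including an interrupt), the temporary file is deleted and the error re-raised.
    """
    directory = os.path.dirname(os.path.abspath(output_file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                if chunk:  # skip empty keep-alive chunks
                    tmp.write(chunk)
        # Atomic rename when source and target are on the same filesystem
        os.replace(tmp_path, output_file_path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

The temporary file is created in the target directory on purpose: `os.replace` is only atomic within a single filesystem, so a temporary file elsewhere (for example in `/tmp`) could lose that guarantee.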


# Appendix