## 📦 Install Required Library

### tqdm is used to display a progress bar during download

In [2]:
!pip install tqdm

Collecting tqdm
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.67.1





## 📚 Import Required Modules

In [3]:
import requests          # To send HTTP requests
import os                # For file operations (imported but not used in the steps below)
from tqdm import tqdm    # To show a progress bar while downloading

## 🧬 Step 1: Provide List of Compound IDs

### These are compound IDs from the Natural Products Atlas (NPAtlas); the list has 292 entries, some of them repeated

In [4]:
compound_ids = [
    "NPA001610", "NPA034958", "NPA008722", "NPA002681", "NPA002681", "NPA007612",
    "NPA007612", "NPA005699", "NPA006564", "NPA006564", "NPA010979", "NPA010979",
    "NPA020779", "NPA023382", "NPA010713", "NPA010713", "NPA011983", "NPA011983",
    "NPA020777", "NPA012230", "NPA016762", "NPA002361", "NPA033178", "NPA035539",
    "NPA033178", "NPA035539", "NPA019878", "NPA008672", "NPA011609", "NPA019587",
    "NPA034210", "NPA033083", "NPA015833", "NPA015833", "NPA020869", "NPA017664",
    "NPA017664", "NPA035985", "NPA035986", "NPA020869", "NPA001814", "NPA001814",
    "NPA017591", "NPA017591", "NPA035554", "NPA035555", "NPA024138", "NPA032735",
    "NPA023171", "NPA032735", "NPA022599", "NPA022594", "NPA020780", "NPA020780",
    "NPA031144", "NPA031144", "NPA025659", "NPA027081", "NPA033082", "NPA033082",
    "NPA026594", "NPA033083", "NPA034956", "NPA021555", "NPA012669", "NPA015553",
    "NPA020976", "NPA031748", "NPA031748", "NPA009208", "NPA009208", "NPA000421",
    "NPA000421", "NPA034211", "NPA011627", "NPA013943", "NPA030144", "NPA030144",
    "NPA034909", "NPA034909", "NPA002235", "NPA017166", "NPA034141", "NPA016668",
    "NPA030460", "NPA030461", "NPA027105", "NPA032736", "NPA027245", "NPA027246",
    "NPA027245", "NPA027246", "NPA036583", "NPA036583", "NPA032682", "NPA007995",
    "NPA007995", "NPA036452", "NPA032736", "NPA012967", "NPA004548", "NPA027909",
    "NPA011811", "NPA011811", "NPA016918", "NPA001117", "NPA031946", "NPA031946",
    "NPA027249", "NPA027251", "NPA014446", "NPA001985", "NPA001985", "NPA027249",
    "NPA027251", "NPA027248", "NPA027250", "NPA027248", "NPA027250", "NPA012122",
    "NPA014249", "NPA012122", "NPA034911", "NPA034911", "NPA014966", "NPA014966",
    "NPA002051", "NPA007240", "NPA002051", "NPA031202", "NPA031203", "NPA031202",
    "NPA031203", "NPA031202", "NPA031203", "NPA027878", "NPA025660", "NPA011649",
    "NPA000147", "NPA007240", "NPA011613", "NPA006328", "NPA034577", "NPA006328",
    "NPA021618", "NPA031511", "NPA036592", "NPA027256", "NPA027257", "NPA004503",
    "NPA027256", "NPA027257", "NPA022951", "NPA022952", "NPA031038", "NPA031038",
    "NPA035890", "NPA035890", "NPA035890", "NPA021547", "NPA021550", "NPA022951",
    "NPA022952", "NPA015133", "NPA033022", "NPA021549", "NPA006854", "NPA006854",
    "NPA006854", "NPA001734", "NPA012535", "NPA024462", "NPA004055", "NPA004718",
    "NPA004718", "NPA003304", "NPA003516", "NPA003516", "NPA006597", "NPA013904",
    "NPA029463", "NPA029463", "NPA029463", "NPA011835", "NPA001859", "NPA021965",
    "NPA021965", "NPA021965", "NPA021965", "NPA021965", "NPA009861", "NPA009861",
    "NPA023450", "NPA032792", "NPA032792", "NPA034969", "NPA016029", "NPA016029",
    "NPA023325", "NPA022306", "NPA022919", "NPA016029", "NPA030462", "NPA002258",
    "NPA002258", "NPA002655", "NPA006209", "NPA035361", "NPA031200", "NPA031201",
    "NPA031200", "NPA031201", "NPA031200", "NPA031201", "NPA009556", "NPA009556",
    "NPA016578", "NPA016578", "NPA016578", "NPA021548", "NPA033404", "NPA001743",
    "NPA001743", "NPA013780", "NPA007154", "NPA012825", "NPA012825", "NPA030372",
    "NPA030372", "NPA018844", "NPA018844", "NPA012179", "NPA012179", "NPA029930",
    "NPA002256", "NPA013197", "NPA032382", "NPA017763", "NPA003034", "NPA003034",
    "NPA019109", "NPA019109", "NPA019109", "NPA009848", "NPA032683", "NPA025662",
    "NPA025662", "NPA002308", "NPA015724", "NPA017922", "NPA017922", "NPA017922",
    "NPA017922", "NPA017922", "NPA033388", "NPA017640", "NPA017640", "NPA005178",
    "NPA010756", "NPA030273", "NPA030273", "NPA006820", "NPA006820", "NPA010756",
    "NPA010756", "NPA009407", "NPA005057", "NPA009058", "NPA010074", "NPA018865",
    "NPA011149", "NPA011149", "NPA004260", "NPA005178", "NPA005178", "NPA009430",
    "NPA005057", "NPA005057", "NPA034185", "NPA010200", "NPA018980", "NPA018980",
    "NPA034275", "NPA007565", "NPA023330", "NPA023331", "NPA023330", "NPA023331",
    "NPA014954", "NPA014954", "NPA012620", "NPA034823"
]
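
Because the list repeats many IDs, each repeated compound is requested and written more than once. If one record per compound is enough, the list can be de-duplicated before downloading. The snippet below is a minimal sketch on a small sample; `sample_ids` and `unique_ids` are illustrative names that do not appear elsewhere in this notebook.

```python
# Small sample with repeats, standing in for the full compound_ids list
sample_ids = ["NPA001610", "NPA002681", "NPA002681",
              "NPA007612", "NPA007612", "NPA001610"]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_ids = list(dict.fromkeys(sample_ids))

print(unique_ids)
print(f"{len(sample_ids)} IDs listed, {len(unique_ids)} unique")
```

Applying the same line to `compound_ids` would shorten the download loop, at the cost of losing the repeats if they were intentional.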

## 🌐 Step 2: Define API Base URL

### This is the NPAtlas API URL template; the `{}` placeholder is replaced by a compound ID to fetch that compound in SDF format

In [5]:
base_url = 'https://www.npatlas.org/api/v1/compound/{}/mol?encode=sdf'
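
To see what the template expands to, it can be formatted with a single ID and printed before any request is made. This is only an illustration; `example_url` is a name introduced here, and the ID is the first entry of the list above.

```python
base_url = 'https://www.npatlas.org/api/v1/compound/{}/mol?encode=sdf'

# str.format substitutes the compound ID for the {} placeholder
example_url = base_url.format("NPA001610")
print(example_url)
```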

## 💾 Step 3: Set Output File Path

### This is the file where all downloaded compounds will be merged and saved

In [6]:
merged_sdf_path = 'npatlas_merged.sdf'

## 📥 Step 4: Download and Merge Compounds

### This block downloads the SDF record for each compound and appends it to a single merged SDF file

In [7]:
with open(merged_sdf_path, 'wb') as merged_file:
    # Using tqdm to display progress
    for compound_id in tqdm(compound_ids, desc="Downloading compounds"):
        try:
            # Construct the full URL for each compound
            full_url = base_url.format(compound_id)
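            # The timeout (30 s, an arbitrary choice) keeps one stalled request from hanging the loop
            response = requests.get(full_url, timeout=30)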

            # If request is successful, write the SDF content to the output file
            if response.status_code == 200:
                merged_file.write(response.content)
                merged_file.write(b'\n$$$$\n')  # Standard separator for SDF files
            else:
                print(f"❌ Failed: {compound_id} (Status {response.status_code})")

        except requests.RequestException as e:  # Network-level failures: timeouts, connection errors
            print(f"⚠️ Error downloading {compound_id}: {e}")

Downloading compounds: 100%|█████████████████████████████████████████████████████████| 292/292 [04:30<00:00,  1.08it/s]
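
The loop above always appends `$$$$` after each response. Whether the NPAtlas endpoint already ends its SDF output with that terminator is not shown in this notebook; if it does, the merged file would contain empty records. One way to guard against that, sketched here under that assumption, is a small helper that writes exactly one terminator per record. `append_record` and `SDF_TERMINATOR` are hypothetical names, and the byte strings are made-up stand-ins for real responses.

```python
import io

SDF_TERMINATOR = b"$$$$"

def append_record(out_file, sdf_bytes):
    """Write one SDF record followed by exactly one $$$$ terminator line."""
    body = sdf_bytes.rstrip()
    # Drop a terminator the server may already have sent, so it is not doubled
    if body.endswith(SDF_TERMINATOR):
        body = body[:-len(SDF_TERMINATOR)].rstrip()
    out_file.write(body + b"\n" + SDF_TERMINATOR + b"\n")

# Demonstration with an in-memory buffer instead of the real output file
buffer = io.BytesIO()
append_record(buffer, b"mol A\nM  END\n")          # response without terminator
append_record(buffer, b"mol B\nM  END\n$$$$\n")    # response with terminator
print(buffer.getvalue().decode())
```

In the download loop, `append_record(merged_file, response.content)` could stand in for the two `merged_file.write(...)` calls.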


## ✅ Step 5: Confirmation Message

In [8]:
print("\n✅ All compounds downloaded and merged into:", merged_sdf_path)


✅ All compounds downloaded and merged into: npatlas_merged.sdf
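
The confirmation message is printed regardless of how many requests succeeded, so it can help to count the records actually present in the merged file. The sketch below counts lines that consist only of `$$$$`; `count_sdf_records` is a hypothetical helper, and the demonstration runs on a throwaway file rather than on `npatlas_merged.sdf`.

```python
import os
import tempfile

def count_sdf_records(path):
    """Count lines consisting only of the $$$$ record terminator."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.strip() == b"$$$$")

# Demonstration on a temporary file with two records
with tempfile.NamedTemporaryFile("wb", suffix=".sdf", delete=False) as tmp:
    tmp.write(b"mol A\nM  END\n$$$$\nmol B\nM  END\n$$$$\n")
    demo_path = tmp.name

record_count = count_sdf_records(demo_path)
os.remove(demo_path)
print(record_count)
```

Calling `count_sdf_records(merged_sdf_path)` and comparing the result with `len(compound_ids)` would show whether any downloads were skipped, or whether terminators were doubled.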
