# PyEuropePMC FTP Downloader Demo

This notebook demonstrates the core functionality of the PyEuropePMC FTP downloader: checking availability, downloading, and extracting full-text article PDFs from Europe PMC.

## Quick Setup

In [9]:
from pyeuropepmc.ftp_downloader import FTPDownloader
from pathlib import Path
import logging

# Set up logging to see what's happening
logging.basicConfig(level=logging.INFO)

# Create downloader and output directory
downloader = FTPDownloader()
downloads_dir = Path("downloads")
downloads_dir.mkdir(exist_ok=True)

print("✅ Setup complete!")

✅ Setup complete!


## Basic Usage: Download a Single Article

In [10]:
# Download a single PMC article
pmcid = "1911200"  # Example PMC ID
print(f"📥 Downloading PMC{pmcid}...")

try:
    results = downloader.bulk_download_and_extract([pmcid], downloads_dir)
    
    if results[pmcid]["status"] == "success":
        pdf_path = results[pmcid]["pdf_paths"][0]
        print(f"✅ Success! Downloaded to: {pdf_path}")
        print(f"📄 File size: {pdf_path.stat().st_size / 1024:.1f} KB")
    else:
        print(f"❌ Failed: {results[pmcid]['error']}")
        
except Exception as e:
    print(f"❌ Error: {e}")

INFO:pyeuropepmc.ftp_downloader:Querying 1 PMC IDs in FTP server
INFO:pyeuropepmc.ftp_downloader:Searching in targeted directories: ['PMCxxxx1199', 'PMCxxxx1200', 'PMCxxxx199', 'PMCxxxx200', 'PMCxxxx201']
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx1200/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 52 ZIP files in PMCxxxx1200
INFO:pyeuropepmc.ftp_downloader:Found PMC1911200 in PMCxxxx1200

📥 Downloading PMC1911200...

INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx1199/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 56 ZIP files in PMCxxxx1199
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx201/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 591 ZIP files in PMCxxxx201
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx200/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 612 ZIP files in PMCxxxx200
INFO:pye

✅ Success! Downloaded to: downloads/extracted/PMC1911200.pdf
📄 File size: 54.7 KB
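
The cell above passes only the numeric part of the ID (`"1911200"`) and adds the `PMC` prefix when printing. If IDs arrive in mixed forms, normalising them before the call may help. The sketch below uses a hypothetical helper, `normalize_pmcid`, which is not part of PyEuropePMC; it assumes the downloader expects digits only, as in the example above.

```python
import re


def normalize_pmcid(raw: str) -> str:
    """Return the numeric part of a PMC ID.

    Hypothetical helper for illustration, not a PyEuropePMC API.
    Accepts "PMC1911200", "pmc1911200" or "1911200", with optional
    surrounding whitespace.
    """
    match = re.fullmatch(r"\s*(?:PMC)?(\d+)\s*", raw, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a valid PMC ID: {raw!r}")
    return match.group(1)


# Mixed input forms all reduce to the bare number
ids = [normalize_pmcid(x) for x in ["PMC1911200", "1976993", " pmc1034000 "]]
print(ids)
```

The resulting list can then be passed to `downloader.bulk_download_and_extract(ids, downloads_dir)` in the same way as the hard-coded IDs in this notebook.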


## Bulk Download: Multiple Articles

In [11]:
# Download multiple articles at once
pmcids = ["1911200", "1976993", "1034000"]
print(f"📦 Downloading {len(pmcids)} articles...")

try:
    results = downloader.bulk_download_and_extract(pmcids, downloads_dir)
    
    print("\n📊 Results:")
    for pmcid, result in results.items():
        status = result["status"]
        if status == "success":
            pdf_count = len(result["pdf_paths"])
            print(f"  PMC{pmcid}: ✅ {pdf_count} PDF(s) downloaded")
        else:
            print(f"  PMC{pmcid}: ❌ {result['error']}")
            
except Exception as e:
    print(f"❌ Error: {e}")

INFO:pyeuropepmc.ftp_downloader:Querying 3 PMC IDs in FTP server
INFO:pyeuropepmc.ftp_downloader:Searching in targeted directories: ['PMCxxxx000', 'PMCxxxx001', 'PMCxxxx1199', 'PMCxxxx1200', 'PMCxxxx199', 'PMCxxxx200', 'PMCxxxx201', 'PMCxxxx992', 'PMCxxxx993', 'PMCxxxx994']
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx1200/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 52 ZIP files in PMCxxxx1200
INFO:pyeuropepmc.ftp_downloader:Found PMC1911200 in PMCxxxx1200
INFO:pyeuropepm

📦 Downloading 3 articles...


INFO:pyeuropepmc.ftp_downloader:Found 591 ZIP files in PMCxxxx201
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx992/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 532 ZIP files in PMCxxxx992
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx200/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 612 ZIP files in PMCxxxx200
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx001/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ft


📊 Results:
  PMC1911200: ✅ 1 PDF(s) downloaded
  PMC1976993: ✅ 1 PDF(s) downloaded
  PMC1034000: ✅ 1 PDF(s) downloaded
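
For larger batches, a per-status summary can be easier to act on than a line per article. The sketch below assumes only what the cell above shows: `results` maps each PMC ID to a dict with a `"status"` key, and anything other than `"success"` counts as a failure. `summarize_results` is a hypothetical helper and the sample data is made up for illustration.

```python
from typing import Any


def summarize_results(results: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Group PMC IDs by outcome (hypothetical helper, not a PyEuropePMC API)."""
    summary: dict[str, list[str]] = {"success": [], "failed": []}
    for pmcid, result in results.items():
        # Mirror the notebook's if/else: only "success" counts as success
        key = "success" if result.get("status") == "success" else "failed"
        summary[key].append(pmcid)
    return summary


# Made-up sample in the same shape as the results used above;
# the failure status string and error text are assumptions
sample = {
    "1911200": {"status": "success", "pdf_paths": ["downloads/extracted/PMC1911200.pdf"]},
    "0000000": {"status": "failed", "error": "not found"},
}
summary = summarize_results(sample)
print(f"{len(summary['success'])} succeeded, {len(summary['failed'])} failed")
```

The `failed` list can be fed back into another download attempt or written to a file for later inspection.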


## Check Available Files

In [None]:
# Check which PMC IDs are available before downloading
test_pmcids = ["1911200", "1976993", "9999999"]  # Mix of available and unavailable
print(f"🔍 Checking availability of {len(test_pmcids)} PMC IDs...")

available_files = downloader.query_pmcids_in_ftp(test_pmcids)

print("\n📋 Availability Report:")
for pmcid in test_pmcids:
    file_info = available_files.get(pmcid)  # Falsy when the ID was not found
    if file_info:
        size_kb = file_info["size"] / 1024
        print(f"  PMC{pmcid}: ✅ Available ({size_kb:.1f} KB)")
    else:
        print(f"  PMC{pmcid}: ❌ Not found")

INFO:pyeuropepmc.ftp_downloader:Searching in targeted directories: ['PMCxxxx1199', 'PMCxxxx1200', 'PMCxxxx199', 'PMCxxxx200', 'PMCxxxx201', 'PMCxxxx992', 'PMCxxxx993', 'PMCxxxx994', 'PMCxxxx998', 'PMCxxxx999']
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx1200/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 52 ZIP files in PMCxxxx1200
INFO:pyeuropepmc.ftp_downloader:Found PMC1911200 in PMCxxxx1200
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx1199/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 56 ZIP files

🔍 Checking availability of 3 PMC IDs...


INFO:pyeuropepmc.ftp_downloader:Found 591 ZIP files in PMCxxxx201
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx992/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 532 ZIP files in PMCxxxx992
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx999/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:Found 545 ZIP files in PMCxxxx999
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ftp/pdf/PMCxxxx200/ succeeded with status 200
INFO:pyeuropepmc.ftp_downloader:FTP GET request to https://europepmc.org/ft


📋 Availability Report:
  PMC1911200: ✅ Available (43.0 KB)
  PMC1976993: ✅ Available (111.0 KB)
  PMC9999999: ❌ Not found
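
An availability check is most useful when its result decides what gets downloaded. The sketch below splits a list of IDs into found and missing. It assumes, based on the report above, that the mapping holds a falsy value (or no entry) for IDs that were not found. `split_by_availability` is a hypothetical helper, and the sample mapping is made up to resemble the report.

```python
from typing import Any, Mapping, Sequence


def split_by_availability(
    pmcids: Sequence[str], availability: Mapping[str, Any]
) -> tuple[list[str], list[str]]:
    """Return (found, missing) ID lists (hypothetical helper)."""
    found: list[str] = []
    missing: list[str] = []
    for pmcid in pmcids:
        # .get() treats both a missing key and a falsy value as "not found"
        (found if availability.get(pmcid) else missing).append(pmcid)
    return found, missing


# Made-up sample; the real mapping may carry more fields than "size"
sample_availability = {
    "1911200": {"size": 44032},
    "1976993": {"size": 113664},
    "9999999": None,
}
found, missing = split_by_availability(
    ["1911200", "1976993", "9999999"], sample_availability
)
print("found:", found)
print("missing:", missing)
```

With the real objects from this notebook, `found` could then be passed to `downloader.bulk_download_and_extract(found, downloads_dir)` so that no request is spent on IDs known to be absent.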


## What Gets Downloaded

In [17]:
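# Show what files were created. rglob() walks every subdirectory and only the
# file name is printed, so the same name can appear more than once.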
print("📁 Downloaded files:")
for file_path in sorted(downloads_dir.rglob("*")):
    if file_path.is_file():
        size_kb = file_path.stat().st_size / 1024
        print(f"  {file_path.name}: {size_kb:.1f} KB")

📁 Downloaded files:
  PMC1911200.pdf: 54.7 KB
  PMC3312970.pdf: 2376.5 KB
  PMCPMC3257301.xml: 120.7 KB
  PMC3312970.pdf: 2376.5 KB
  PMC4123456.pdf: 118.7 KB
  PMC5678901.pdf: 680.8 KB
  PMC1911200.pdf: 54.7 KB
  PMC1034000.pdf: 257.0 KB
  PMC1911200.pdf: 54.7 KB
  PMC1976993.pdf: 116.3 KB
  PMC1716993.pdf: 129.8 KB
  PMC1716993.pdf: 129.8 KB
  PMC1911200.pdf: 54.7 KB
  PMC1716993.pdf: 129.8 KB
  PMC5251083.pdf: 631.8 KB
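
Several names repeat in this listing because files with the same name sit in different subdirectories of `downloads`. Printing each path relative to the root makes the entries distinguishable. The sketch below shows the idea on a temporary directory; `list_files` is a hypothetical helper and the directory layout is invented for the demo, not the layout PyEuropePMC produces.

```python
import tempfile
from pathlib import Path


def list_files(root: Path) -> list[tuple[str, int]]:
    """Return (path relative to root, size in bytes) for every file under root."""
    return [
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]


# Demo on a throwaway directory with two same-named files
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["extracted/PMC1911200.pdf", "other_run/PMC1911200.pdf"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(b"%PDF-")  # 5 placeholder bytes, not a real PDF
    listing = list_files(root)

for rel, size in listing:
    print(f"  {rel}: {size / 1024:.1f} KB")
```

Calling `list_files(downloads_dir)` in this notebook would show which subdirectory each copy lives in.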


## Key Features

- **Automatic Discovery**: Finds PMC articles in the Europe PMC FTP server
- **Bulk Download**: Download multiple articles efficiently
- **PDF Extraction**: Automatically extracts PDFs from ZIP files
- **Error Handling**: Clear error messages for missing or unavailable articles
- **Smart Directory Navigation**: Searches only a few candidate directories per PMC ID (such as `PMCxxxx1200` in the logs above) rather than the whole directory tree
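
The directory search can be illustrated from the log output alone. In the logs earlier in this notebook, each ID leads to three-digit directory names around its last three digits: `1911200` gives `PMCxxxx199`, `PMCxxxx200` and `PMCxxxx201`, while `1034000` gives only `PMCxxxx000` and `PMCxxxx001`. The sketch below reproduces just that observed pattern. It is an inference from the logs, not the library's actual implementation, and it does not attempt the four-digit names (such as `PMCxxxx1199`) that also appear. `candidate_dirs` is a hypothetical name.

```python
def candidate_dirs(pmcid: str) -> list[str]:
    """Guess three-digit directory names near the ID's last three digits.

    Assumption inferred from the notebook's log output, not taken from
    the PyEuropePMC source.
    """
    suffix = int(pmcid[-3:])
    # Keep the suffix and its neighbours, dropping values outside 000-999
    neighbours = [n for n in (suffix - 1, suffix, suffix + 1) if 0 <= n <= 999]
    return [f"PMCxxxx{n:03d}" for n in neighbours]


for pmcid in ["1911200", "1976993", "1034000", "9999999"]:
    print(pmcid, candidate_dirs(pmcid))
```

Checking neighbouring directories as well as the exact match suggests that articles are not always stored in the directory their ID would predict, though the notebook does not state the reason.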

## Common Use Cases

1. **Research Data Collection**: Download papers for systematic reviews
2. **Text Mining**: Bulk download for NLP and text analysis projects
3. **Backup**: Archive important papers locally
4. **Offline Reading**: Download papers for offline access

That's it! The FTP downloader makes it easy to get full-text articles from Europe PMC.