# Code for downloading CORDEX projections (and others)

You can find data at:
https://esgf-metagrid.cloud.dkrz.de/

### Choose the data
![](img/ESGFmetagrid.png)

### Download a list of target files

To do so, add your selection to the cart.  
Then download the `wget` list from the cart.
![](img/ESGFdownload.png)

### Put the downloaded .sh file in the right place
For example, in the `data` folder of this project.
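
Before running the download code, it may help to confirm that the script landed where it is expected. The following is only a minimal sketch: the helper name `find_wget_scripts` is hypothetical, and the `data` folder name is taken from the example above.

```python
from pathlib import Path


def find_wget_scripts(folder="data"):
    """Return the .sh files found directly in `folder`, sorted by name."""
    folder = Path(folder)
    if not folder.is_dir():
        # A missing folder is reported as "no scripts" rather than an error.
        return []
    return sorted(folder.glob("*.sh"))


if __name__ == "__main__":
    scripts = find_wget_scripts("data")
    if scripts:
        for script in scripts:
            print("Found wget script:", script)
    else:
        print("No .sh file found in 'data'; check where the script was saved.")
```
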


### Run the code
The cell below reads every `.sh` file in the `data` folder, extracts the download URLs it lists, and saves the files to `download_folder`.
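
The extraction step in the code cell that follows can be illustrated on a small, made-up excerpt of a wget script. This is a sketch only: the file names, URLs and checksums in `SAMPLE_SCRIPT` are placeholders, and `parse_download_block` is a hypothetical helper that uses the same regular-expression idea as the cell to return both the file name and the URL.

```python
import re

# Made-up excerpt imitating the download_files block of an ESGF wget script.
# The file names, URLs and checksums are placeholders, not real data.
SAMPLE_SCRIPT = """\
download_files="$(cat <<EOF--dataset.file.url.chksum_type.chksum
'file_a.nc' 'http://example.org/data/file_a.nc' 'SHA256' 'aaaa'
'file_b.nc' 'http://example.org/data/file_b.nc' 'SHA256' 'bbbb'
EOF--dataset.file.url.chksum_type.chksum
)"
"""

# The heredoc marker that opens and closes the block.
MARKER = r"EOF--dataset\.file\.url\.chksum_type\.chksum"
BLOCK_PATTERN = re.compile(
    r'download_files="\$\(cat <<' + MARKER + r"(.*?)" + MARKER, re.DOTALL
)


def parse_download_block(script_text):
    """Return (filename, url) pairs from the download_files block."""
    match = BLOCK_PATTERN.search(script_text)
    if match is None:
        return []
    entries = []
    for line in match.group(1).splitlines():
        # Each entry line holds single-quoted fields; the first two are
        # the file name and the URL.
        fields = re.findall(r"'([^']+)'", line)
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


for name, url in parse_download_block(SAMPLE_SCRIPT):
    print(name, "->", url)
```

Returning the file name alongside the URL is a design choice of this sketch; the cell itself only keeps the URL and derives the local name from it.
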


In [7]:
import re
import os
import requests
from pathlib import Path

# looking for .sh files in the folder
data_folder = Path(r'data')
dest_folder = "download_folder"  # Replace with your desired folder.
files = data_folder.glob('*.sh')

# Create helper functions
def extract_urls_from_sh(sh_file_path):
    """
    Reads the shell file and extracts download URLs from the download_files block.
    """
    with open(sh_file_path, 'r') as f:
        content = f.read()

    # Use a regex to extract the block between the starting and ending markers.
    pattern = r'download_files="\$\(cat <<EOF--dataset\.file\.url\.chksum_type\.chksum(.*?)EOF--dataset\.file\.url\.chksum_type\.chksum'
    match = re.search(pattern, content, re.DOTALL)
    if not match:
        print("Could not find the download_files block.")
        return []
    
    block = match.group(1)
    urls = []
    # Process each line in the block.
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line should have 4 quoted strings: filename, URL, checksum type, and checksum.
        parts = re.findall(r"'([^']+)'", line)
        if len(parts) >= 2:
            urls.append(parts[1])
    return urls

def download_urls(urls, dest_folder):
    """
    Downloads each URL and saves it to dest_folder.
    """
    # Create the destination folder if it does not exist yet.
    os.makedirs(dest_folder, exist_ok=True)
    
    for url in urls:
        local_filename = os.path.join(dest_folder, os.path.basename(url))
        
        if not Path(local_filename).exists():
            print(f"Downloading {url} to {local_filename}...")
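            try:
                # Stream to a temporary ".part" file so that large NetCDF files
                # are not held in memory and an interrupted run leaves no
                # partial file under the final name.
                part_filename = local_filename + ".part"
                with requests.get(url, stream=True, timeout=60) as response:
                    response.raise_for_status()
                    with open(part_filename, 'wb') as f:
                        for chunk in response.iter_content(chunk_size=1024 * 1024):
                            f.write(chunk)
                os.replace(part_filename, local_filename)
                print("Downloaded:", local_filename)
            except Exception as e:
                print(f"Error downloading {url} : {e}")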
        else:
            print("Already exists:", local_filename)
            
# Download the files
for f0 in files:
    urls = extract_urls_from_sh(f0)
    if urls:
        print("Found URLs:")
        for u in urls:
            print("  ", u)
        download_urls(urls, dest_folder)
    else:
        print("No URLs found.")


Found URLs:
   http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_195001-195012.nc
   http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_195101-196012.nc
   http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_196101-197012.nc
   http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_197101-198012

Error downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_195001-195012.nc : 404 Client Error: Not Found for url: https://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_195001-195012.nc
Downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/historical/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_195101-196012.nc to download_folder\pr_AFR-44_CNRM-CERFACS-CNRM-CM5_historical_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_195101-196012.nc...
Error downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AF

Error downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/rcp45/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp45_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_203101-204012.nc : 404 Client Error: Not Found for url: https://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/rcp45/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp45_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_203101-204012.nc
Downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/rcp45/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp45_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_204101-205012.nc to download_folder\pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp45_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_204101-205012.nc...
Error downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/r

Error downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/rcp85/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_203101-204012.nc : 404 Client Error: Not Found for url: https://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/rcp85/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_203101-204012.nc
Downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/rcp85/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/pr/v20140401/pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_204101-205012.nc to download_folder\pr_AFR-44_CNRM-CERFACS-CNRM-CM5_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_204101-205012.nc...
Error downloading http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/AFR-44/CLMcom/CNRM-CERFACS-CNRM-CM5/r

KeyboardInterrupt: 
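
The log above shows several 404 errors and an interrupted run, so it may be useful to check which files actually reached `download_folder` before running the cell again. A minimal sketch follows; `split_by_presence` is a hypothetical helper, and treating zero-byte files as missing is an assumption, not something the original code does.

```python
import os
from pathlib import Path


def split_by_presence(urls, dest_folder):
    """Split URLs into (present, missing) based on files in dest_folder."""
    dest = Path(dest_folder)
    present, missing = [], []
    for url in urls:
        # Same naming rule as the download code: the local name is the
        # last path component of the URL.
        target = dest / os.path.basename(url)
        # Assumption: an empty file is the trace of a failed download.
        if target.is_file() and target.stat().st_size > 0:
            present.append(url)
        else:
            missing.append(url)
    return present, missing
```

Calling `split_by_presence(urls, dest_folder)` with the list returned by `extract_urls_from_sh` gives the URLs that still need attention as the second element of the result.
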