# CHELSA Monthly Climate Data Downloader

This script automates the download of monthly climate data from the CHELSA dataset. It reads a list of URLs from the "envidatS3paths.txt" file and saves each file into a folder named after the first keyword found in its filename (for example "pr" or "tasmax"), so that each data type ends up in its own folder.

## Prerequisites

1. **Python Dependencies**: Ensure you have the required Python dependencies installed. You can install them using the following command:

   ```bash
   pip install requests
   ```

## Usage

1. **URL List**: Prepare a text file named "envidatS3paths.txt" containing the URLs of the data files you want to download. Each URL should be on a separate line.

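2. **Keywords**: Update the `keywords` list with the keywords for the data types you want to download (e.g., "pr", "rsds", "tasmax", "tasmin", "vpd"). Each keyword is matched as a substring of the filename, and the first keyword in the list that matches decides the folder. Files that match no keyword are skipped.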

3. **Open the Jupyter Notebook**: Launch your Jupyter Notebook environment.

4. **Run the Notebook Cells**: Execute the notebook cells in order, from top to bottom, by selecting each cell and pressing Shift + Enter.

5. **Output**: The script will create a subfolder for each keyword (e.g., "pr", "rsds") in the current working directory and download the respective files into those folders. Completeness is checked by comparing the file size with the `Content-Length` reported by the server. An existing file whose size does not match is deleted and downloaded again. A newly downloaded file whose size does not match is deleted. In both cases the URL is appended to "incomplete_files.txt" so that the affected downloads can be tracked.
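The way URLs are routed to folders in steps 1, 2, and 5 can be sketched as follows. This is a minimal illustration rather than the script itself: `plan_downloads` is a hypothetical helper, and the URLs are placeholders, not real CHELSA paths. It only computes destination paths and downloads nothing.

```python
import os

# Keywords as listed above; order matters because the first match wins.
KEYWORDS = ["pr", "rsds", "tasmax", "tasmin", "vpd"]


def plan_downloads(lines, keywords=KEYWORDS):
    """Map each URL to its destination path, relative to the working directory.

    Hypothetical helper for illustration; it is not part of the script.
    """
    plan = {}
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines in the URL list
        filename = os.path.basename(url)
        # Same rule as the script: first keyword found anywhere in the filename.
        keyword = next((kw for kw in keywords if kw in filename), None)
        if keyword is None:
            continue  # no keyword matched, so the file would be skipped
        plan[url] = os.path.join(keyword, filename)
    return plan


# Placeholder URLs; real entries come from envidatS3paths.txt.
example_lines = [
    "https://example.org/monthly/pr/CHELSA_pr_01_1980_V.2.1.tif\n",
    "\n",
    "https://example.org/monthly/tasmax/CHELSA_tasmax_01_1980_V.2.1.tif\n",
    "https://example.org/monthly/other/CHELSA_unknown_01_1980.tif\n",
]
for url, destination in plan_downloads(example_lines).items():
    print(destination)
```

Because matching is by substring, a short keyword such as "pr" would also match any filename that happens to contain those two letters, so it is worth checking the planned destinations before starting a large download.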

## Notes

- The script uses the `requests` library to handle HTTP requests and download files. Make sure you have a stable internet connection while running the script.
- This script is tailored for the specific requirements of the CHELSA dataset. Adapt it accordingly if you're using it for a different dataset or scenario.
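- The script only compares file sizes, which catches truncated downloads but not every kind of corruption. After the run, check "incomplete_files.txt" and verify the downloaded files before using them.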
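A post-run size check could look like the sketch below. It assumes the expected byte counts have already been collected (for example from `Content-Length` headers) into a dictionary. `find_size_mismatches` and the file names are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def find_size_mismatches(expected_sizes, root="."):
    """Return the relative paths whose size on disk differs from the expected size.

    `expected_sizes` maps a relative path such as "pr/a.tif" to a byte count.
    Files that are missing are reported as mismatches as well.
    """
    mismatches = []
    for relative_path, expected in expected_sizes.items():
        path = os.path.join(root, relative_path)
        actual = os.path.getsize(path) if os.path.isfile(path) else None
        if actual != expected:
            mismatches.append(relative_path)
    return sorted(mismatches)


# Small demonstration in a temporary directory, with made-up files and sizes.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "pr"))
    with open(os.path.join(demo_root, "pr", "a.tif"), "wb") as f:
        f.write(b"\x00" * 10)  # 10 bytes, matches the expected size below
    expected = {"pr/a.tif": 10, "pr/b.tif": 5}  # b.tif was never written
    print(find_size_mismatches(expected, demo_root))  # prints ['pr/b.tif']
```

A matching size does not prove the contents are intact. If checksums are available for the files, comparing those would be a stronger test.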

## Author

Script written by Luca Ferrari

Contact: luca.ferrari@usys.ethz.ch

For inquiries or assistance, please contact the author.

**Note:** Always ensure compliance with terms of use and copyright restrictions when downloading and using external datasets.

This README content was generated with the assistance of an AI language model from OpenAI. The provided content is based on user input and has been tailored to the specific requirements of the project.

In [None]:
import os
import requests
from multiprocessing import Pool

keywords = ["pr", "rsds", "tasmax", "tasmin", "vpd"]
with open("envidatS3paths.txt") as f:
    # Strip whitespace and drop blank lines so that no empty URL is processed
    urls = [line.strip() for line in f if line.strip()]

def download_file(url):
    try:
        # Extract the filename from the URL
        filename = os.path.basename(url)

        # Determine the keyword that identifies the data
        keyword = next((kw for kw in keywords if kw in filename), None)

        # If no keyword is found, skip the file
        if not keyword:
            print(f"Skipping file {filename}")
            return

        # Determine the destination folder based on the keyword
        folder = os.path.join(os.getcwd(), keyword)

        # Create the folder if needed; exist_ok avoids an error when two
        # worker processes try to create the same folder at the same time
        os.makedirs(folder, exist_ok=True)

        # Download the file to the destination folder
        output_file = os.path.join(folder, filename)
        if os.path.exists(output_file):
            # Compare the local file size with the size reported by the server
            # (a missing Content-Length header counts as 0 and forces a re-download)
            existing_size = os.path.getsize(output_file)
            headers = requests.head(url).headers
            remote_size = int(headers.get("Content-Length", 0))
            if remote_size != existing_size:
                # Delete the existing file and download the new file
                print(f"Replacing incomplete or corrupted file {output_file}")
                os.remove(output_file)
                response = requests.get(url)
                # Raise on HTTP errors so an error page is never saved as data
                response.raise_for_status()
                with open(output_file, "wb") as f:
                    f.write(response.content)
                with open("incomplete_files.txt", "a") as f:
                    f.write(url + "\n")
            else:
                print(f"Skipping existing complete file {output_file}")
        else:
            # Download the file to the destination folder
            print(f"Downloading file {filename}")
            response = requests.get(url)
            # Raise on HTTP errors so an error page is never saved as data
            response.raise_for_status()
            with open(output_file, "wb") as f:
                f.write(response.content)
            # Check if the downloaded file is complete and not corrupted
            downloaded_size = len(response.content)
            headers = requests.head(url).headers
            content_length = int(headers.get("Content-Length", 0))
            if downloaded_size != content_length:
                # Delete the incomplete or corrupted file and add its URL to the incomplete_files.txt file
                print(f"Deleting incomplete or corrupted file {output_file}")
                os.remove(output_file)
                with open("incomplete_files.txt", "a") as f:
                    f.write(url + "\n")
    except Exception as e:
        print(f"Error downloading file {url}: {e}")

if __name__ == "__main__":
    with Pool(8) as p:
        p.map(download_file, urls)
