<a href="https://colab.research.google.com/github/CarmenTheodoraCraciun/ML-Sleep-Quality-Based-EEG-Signals/blob/main/1_Sleep_EDF_Ext_download.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning Systems for Sleep Quality Assessment Based on EEG Signals

---

* **Author**: Carmen-Theodora Craciun
* **Status**: Done
* **Previous step**: None
* **Purpose of this notebook**: This notebook *downloads and organizes* the EDF files of the two studies provided by PhysioNet.
* **Dataset**: [Sleep-EDFX Database (PhysioNet)](https://www.physionet.org/content/sleep-edfx/1.0.0/)
  * **Studies used:**
    * Sleep Cassette (SC) - study on healthy people
    * Sleep Telemetry (ST) - study on people with mild difficulty falling asleep, treated in this project as mild insomnia
* **Input**: A text file with the credentials for logging into the [PhysioNet](https://physionet.org/login/) platform.
* **Output**: A zip file containing 4 folders:
  * SC_healthy_hypno (153 EDF files) - hypnogram files of the SC study
  * SC_healthy_psg (153 EDF files) - PSG files of the SC study
  * ST_insomnia_hypno (44 EDF files) - hypnogram files of the ST study
  * ST_insomnia_psg (44 EDF files) - PSG files of the ST study
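
The expected layout can be checked once the files are in place. The sketch below is only an illustration and is not part of the original notebook: `EXPECTED_COUNTS` restates the counts listed above, and `count_edf_files` and `find_mismatches` are hypothetical helper names.

```python
import os

# Expected number of EDF files per folder, restated from the output list above.
EXPECTED_COUNTS = {
    "SC_healthy_hypno": 153,
    "SC_healthy_psg": 153,
    "ST_insomnia_hypno": 44,
    "ST_insomnia_psg": 44,
}


def count_edf_files(base_path):
    """Return {folder_name: number of .edf files} for each expected folder."""
    counts = {}
    for folder in EXPECTED_COUNTS:
        folder_path = os.path.join(base_path, folder)
        if os.path.isdir(folder_path):
            names = os.listdir(folder_path)
        else:
            names = []  # a missing folder counts as zero files
        counts[folder] = sum(1 for name in names if name.lower().endswith(".edf"))
    return counts


def find_mismatches(base_path):
    """Return {folder_name: (found, expected)} for folders with a wrong count."""
    counts = count_edf_files(base_path)
    return {
        folder: (counts[folder], expected)
        for folder, expected in EXPECTED_COUNTS.items()
        if counts[folder] != expected
    }
```

Calling `find_mismatches('/content/drive/MyDrive/sleep_data/raw_dataset')` would return an empty dictionary when all four folders hold the listed number of files.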

# Importing

In [None]:
import os
import time
import warnings

import requests
import urllib3
from bs4 import BeautifulSoup
from google.colab import files  # needed by upload_credentials()

# Connect

In [None]:
def upload_credentials():
  '''
  Upload the credentials file for PhysioNet.
  Return:
   - the username and password (str).
  '''
  print("Upload the file with username and password (first line: username, second line: password)")
  uploaded = files.upload()

  # Only the first uploaded file is read.
  for filename in uploaded.keys():
      with open(filename, 'r') as f:
          lines = f.read().splitlines()
      if len(lines) < 2:
          raise ValueError("File must contain at least two lines: username and password")
      return lines[0].strip(), lines[1].strip()

  # Nothing was uploaded, for example because the dialog was cancelled.
  raise ValueError("No credentials file was uploaded")
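
The parsing step can be separated from the Colab upload widget so that it can be exercised without uploading anything. The following is a minimal sketch under the same two-line file layout; `parse_credentials` is a hypothetical helper, not part of the notebook. Unlike the function above, it skips blank lines, which is an assumption about what a user would want.

```python
def parse_credentials(text):
    """Return (username, password) from text with one value per line."""
    # Assumption: blank lines are ignored, so a stray empty line or a
    # trailing newline does not shift the username or the password.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("File must contain at least two lines: username and password")
    return lines[0], lines[1]
```

With this split, the upload function would only need to read the uploaded file and pass its text to `parse_credentials`.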

In [None]:
from google.colab import files
username, password = upload_credentials()

print(f"Username: {username}")
print("Password: [PROTECTED]")

Upload the file with username and password (first line: username, second line: password)


Saving physhionet.txt to physhionet (4).txt
Username: carmen-theodora
Password: [PROTECTED]


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Download

In [None]:
class SleepEdFxDownloader:
    def __init__(self, username, password, base_path='/content/drive/MyDrive/sleep_data/raw_dataset'):
        self.auth = (username, password)
        self.base_url = "https://physionet.org/files/sleep-edfx/1.0.0/"

        self.base_path = base_path

        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}
        self.study_urls = [
            "https://physionet.org/content/sleep-edfx/1.0.0/sleep-cassette/",
            "https://physionet.org/content/sleep-edfx/1.0.0/sleep-telemetry/"
        ]

        self.directories = [
            os.path.join(base_path, 'SC_healthy_psg'),
            os.path.join(base_path, 'SC_healthy_hypno'),
            os.path.join(base_path, 'ST_insomnia_psg'),
            os.path.join(base_path, 'ST_insomnia_hypno')
        ]

        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    def __get_all_files_from_studies(self):
      '''Get all files from both studies.

      Return:
        - A list of all files.
      '''
      all_files = []
      for url in self.study_urls:
          response = requests.get(url, timeout=60)
          response.raise_for_status()
          soup = BeautifulSoup(response.text, 'html.parser')

          for link in soup.find_all('a', href=True):
              href = link['href']
              if href.endswith(('.edf', '.hypnogram')):
                  all_files.append(href)

      print(f"Found {len(all_files)} total files to download")
      return all_files

    def __get_file_info(self, filename):
      '''Get the study, output directory and type for a file.

      Return:
        - study subfolder the file belongs to
        - output directory
        - file type ("psg" or "hypno")
        All three are None if the file is not recognized.
      '''
      is_psg = 'PSG' in filename
      is_hypno = 'Hypnogram' in filename

      if filename.startswith('SC'):
          if is_psg:
              return "sleep-cassette", f"{self.base_path}/SC_healthy_psg", "psg"
          elif is_hypno:
              return "sleep-cassette", f"{self.base_path}/SC_healthy_hypno", "hypno"
      elif filename.startswith('ST'):
          if is_psg:
              return "sleep-telemetry", f"{self.base_path}/ST_insomnia_psg", "psg"
          elif is_hypno:
              return "sleep-telemetry", f"{self.base_path}/ST_insomnia_hypno", "hypno"

      return None, None, None

    def __check_file_exist(self, filename, output_path, file_type, downloaded_list):
        '''Check if the file already exists and exceeds the minimum expected size.'''
        if os.path.exists(output_path):
            file_size = os.path.getsize(output_path)
            min_size = 1000000 if file_type == "psg" else 1000

            if file_size > min_size:
                downloaded_list.append(filename)
                return True

        return False

    def __check_file_size(self, downloaded_list, filename, output_path, file_type, failed_list):
        '''Check that the downloaded file exceeds the minimum expected size; delete it otherwise.'''
        if not os.path.exists(output_path):
            failed_list.append(filename)
            return False

        file_size = os.path.getsize(output_path)

        min_size = 1000000 if file_type == "psg" else 1000

        if file_size > min_size:
            downloaded_list.append(filename)
            return True
        else:
            os.remove(output_path)
            failed_list.append(filename)
            return False

    def download_files(self):
        '''Main method to execute the logic.'''
        # Get the list of the files
        file_list = self.__get_all_files_from_studies()

        # Create the directories if they do not exist
        os.makedirs(self.base_path, exist_ok=True)
        for directory in self.directories:
            os.makedirs(directory, exist_ok=True)

        downloaded = []
        failed = []

        print(f"Starting check/download process for {len(file_list)} files...")

        for filename in file_list:
            # Get the output_dir for the file
            subfolder, output_dir, file_type = self.__get_file_info(filename)

            if subfolder is None:
                continue

            output_path = os.path.join(output_dir, filename)

            # Skip the file if it already exists
            if self.__check_file_exist(filename, output_path, file_type, downloaded):
                continue

            # Otherwise, download the file
            url = f"{self.base_url}{subfolder}/{filename}"

            try:
                print(f"Downloading {filename}...")
                # Request to PhysioNet
                response = requests.get(url, auth=self.auth, headers=self.headers, timeout=60, verify=False, stream=True)

                if response.status_code != 200:
                    print(f"HTTP {response.status_code} for {filename}")
                    failed.append(filename)
                    continue

                # Write the file
                with open(output_path, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)

                if not self.__check_file_size(downloaded, filename, output_path, file_type, failed):
                    print(f"Validation Failed: {filename}")

            except Exception as e:
                print(f"Error downloading {filename}: {str(e)}")
                failed.append(filename)

        print("\n=== SUMMARY ===")
        print(f"Total files (existing + downloaded): {len(downloaded)}")
        print(f"Failed downloads: {len(failed)}")

        return downloaded, failed
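
`download_files` writes each response straight to its final path, so an interrupted transfer can leave a truncated file there, and a partial PSG file larger than the 1 MB threshold would pass the existence check on the next run. One possible variation, sketched here under assumptions, writes to a temporary `.part` file and renames it only after the size check passes. `write_chunks_atomically` is a hypothetical helper, and `chunks` stands in for `response.iter_content(chunk_size=8192)`. Whether the rename is truly atomic on a mounted Google Drive folder is not something this sketch verifies.

```python
import os


def write_chunks_atomically(chunks, output_path, min_size):
    """Write byte chunks to output_path only if the total size exceeds min_size.

    Data goes to '<output_path>.part' first, so output_path never holds a
    partially written file. Returns True on success, False otherwise.
    """
    temp_path = output_path + ".part"
    try:
        with open(temp_path, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    f.write(chunk)

        if os.path.getsize(temp_path) > min_size:
            # Rename within the same directory once the size check passes.
            os.replace(temp_path, output_path)
            return True
        return False
    finally:
        # Remove the leftover temporary file after a failed check or an exception.
        if os.path.exists(temp_path):
            os.remove(temp_path)
```

Inside the download loop, the call could look like `write_chunks_atomically(response.iter_content(chunk_size=8192), output_path, min_size)`, with `min_size` chosen as in `__check_file_size`.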

In [None]:
try:
    downloader = SleepEdFxDownloader(username, password)
    downloaded, failed = downloader.download_files()
except Exception as e:
    print(f"Download process failed: {e}")
    downloaded, failed = [], []

Found 394 total files to download
Starting check/download process for 394 files...

=== SUMMARY ===
Total files (existing + downloaded): 394
Failed downloads: 0


---
# Conclusions and Next Step

In this notebook we have:
1. Downloaded the EDF files.
2. Sorted the files into 4 separate folders.
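
The header lists a zip file as the notebook's output, while the cells shown here stop at the four folders. A minimal sketch of the archiving step follows, assuming the standard-library `shutil.make_archive` is acceptable for this purpose; `zip_dataset` and any archive name passed to it are hypothetical.

```python
import os
import shutil


def zip_dataset(base_path, archive_stem):
    """Zip everything under base_path and return the path of the created archive.

    archive_stem is the target path without the '.zip' extension.
    """
    if not os.path.isdir(base_path):
        raise FileNotFoundError(f"Dataset folder not found: {base_path}")
    # root_dir makes paths inside the archive relative to base_path,
    # so the study folders sit at the top level of the zip.
    return shutil.make_archive(archive_stem, "zip", root_dir=base_path)
```

The archive stem should point outside `base_path` so that the zip file is not written into the folder it is archiving.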

The next step in the pipeline is: [**Data Visualization**](https://github.com/CarmenTheodoraCraciun/ML-Sleep-Quality-Based-EEG-Signals/blob/main/2_Sleep_EDF_Ext_visualization.ipynb)