# GONG Data Archive

1. The GONG data is hosted on the gong2 archive server: https://gong2.nso.edu/archive/patch.pl?menutype=s
1. There is an FTP site organized with different subfolders: https://nispdata.nso.edu/ftp/
1. The target data is initially fetched from https://nispdata.nso.edu/ftp/oQR/zqa/202402/
1. The subfolder `oQR` [1] holds the Quick-Reduce Outputs. Each minute, weather permitting, the GONG network observes the Sun at two spectral wavelengths: 676.78 nm (a Ni I absorption line) and 656.28 nm (the H-alpha absorption line).
1. The subfolder `zqa` is a file type. Because GONG was not originally designed for precise calibration and removal of non-solar magnetic field bias, a separate zeropoint-corrected (ZPC) “zqa” [2] file is normally created from each “bqa” file. The two are identical except for the subtraction of a planar zeropoint correction.
1. The GONG network has telescopes strategically placed at six locations around the world. Each site represents one of the six longitudinal bands that allow the network to make 24-hour-a-day observations of the Sun. Current coverage with the network is around 87%. Subfolders [3] correspond to the six locations:
   - bb = Big Bear Solar Observatory, California
   - ct = Cerro Tololo Interamerican Observatory, Chile
   - le = Learmonth Solar Observatory, Australia
   - td = El Teide Observatory, Canary Islands
   - ud = Udaipur Solar Observatory, India
   - ?? = Mauna Loa Observatory, Hawaii, USA
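
The site codes above also appear as the first two characters of archive file names such as `bbzqa240202t1954_dim-860.jpg` (taken from the download log later in this notebook). The following is a minimal sketch of splitting such a name into its parts. It assumes the layout is a two-letter site code, a three-letter file type, a `YYMMDD` date, the letter `t`, and an `HHMM` time; that layout is inferred from the file names rather than taken from GONG documentation, and `GONG_SITES` and `parse_gong_name` are hypothetical names introduced here for illustration.

```python
import re
from datetime import datetime

# Site codes listed above. The Mauna Loa code is not given in this document,
# so it is deliberately left out of the table.
GONG_SITES = {
    "bb": "Big Bear Solar Observatory, California",
    "ct": "Cerro Tololo Interamerican Observatory, Chile",
    "le": "Learmonth Solar Observatory, Australia",
    "td": "El Teide Observatory, Canary Islands",
    "ud": "Udaipur Solar Observatory, India",
}

# Assumed layout: <site:2><filetype:3><YYMMDD>t<HHMM>[anything else]
_NAME_RE = re.compile(
    r"^(?P<site>[a-z]{2})(?P<filetype>[a-z]{3})(?P<date>\d{6})t(?P<time>\d{4})"
)


def parse_gong_name(file_name):
    """Split a GONG file name into site, file type, and observation time."""
    match = _NAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"Unrecognised GONG file name: {file_name}")
    observed = datetime.strptime(match["date"] + match["time"], "%y%m%d%H%M")
    site = match["site"]
    return {
        "site": site,
        "site_name": GONG_SITES.get(site, "unknown site"),
        "filetype": match["filetype"],
        "observed": observed,
    }


info = parse_gong_name("bbzqa240202t1954_dim-860.jpg")
print(info["site_name"], info["observed"].isoformat())
```

Codes that are missing from the table are reported as "unknown site" rather than raising an error, so the parser still works for sites whose code is not recorded here.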

# References
- [1] https://catalog.data.gov/dataset/global-oscillation-network-group-gong-quick-reduce-outputs-oqr
- [2] https://ccmc.gsfc.nasa.gov/static/files/CCMC_SWPC_annex_final_report.pdf
- [3] https://nso.edu/telescopes/nisp/gong/

In [3]:
pip install --upgrade pip && pip install requests beautifulsoup4 -q

Note: you may need to restart the kernel to use updated packages.


# Fetch data

To download every file whose name contains "dim-860.jpg" from the subfolders of https://nispdata.nso.edu/ftp/oQR/zqa/202402/, I use the requests and BeautifulSoup libraries to scrape the directory listings for links, then use the os library to create local directories for the downloaded files.
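
The link-filtering step can be tried offline before touching the server. The sketch below works on a plain list of `href` values instead of a live listing and uses `urllib.parse.urljoin` from the standard library to build absolute URLs. The helper name `select_targets`, the sample `href` values, and the `bbzqa240202/` subfolder name are hypothetical and only illustrate the kinds of links a directory listing may contain.

```python
from urllib.parse import urljoin


def select_targets(base_url, hrefs, target_file):
    """Return absolute URLs for hrefs that name a file containing target_file."""
    urls = []
    for href in hrefs:
        # Skip missing hrefs, sub-directories, and column-sort query links.
        if not href or href.endswith("/") or href.startswith("?"):
            continue
        if target_file in href:
            urls.append(urljoin(base_url, href))
    return urls


# Hypothetical hrefs, as they might appear in one sub-folder listing.
sample_hrefs = [
    "?C=N;O=D",
    "/ftp/oQR/zqa/202402/",
    "bbzqa240202t1954.fits.gz",
    "bbzqa240202t1954_dim-860.jpg",
    None,
]

# Hypothetical sub-folder URL under the real base directory.
folder_url = "https://nispdata.nso.edu/ftp/oQR/zqa/202402/bbzqa240202/"

for url in select_targets(folder_url, sample_hrefs, "dim-860.jpg"):
    print(url)
```

Filtering out empty `href` values and directory links up front keeps anchors without a target and parent-directory links from breaking the crawl or producing malformed URLs.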

In [6]:
# The scratch directory is listed in .gitignore so that it is never committed to git
import os

%env SCRATCH=../scratch
! [ -e "${SCRATCH}" ] || mkdir -p "${SCRATCH}"

scratch_path = os.environ.get('SCRATCH', './scratch')

env: SCRATCH=../scratch


In [14]:
# %%writefile ../scripts/fetch_data.py
import requests
from bs4 import BeautifulSoup
import os

# Base URL of the directory listing to scrape.
base_url = 'https://nispdata.nso.edu/ftp/oQR/zqa/202402/'

# Reuse the SCRATCH environment variable if it is set; otherwise fall back to ../scratch.
scratch_path = os.environ.get('SCRATCH', '../scratch')

# Downloads a file from a given URL and saves it to a specified folder.
def download_file(url, dest_folder):
    os.makedirs(dest_folder, exist_ok=True)
    file_name = url.split('/')[-1]
    file_path = os.path.join(dest_folder, file_name)

    # Stream the response so the whole file is never held in memory at once.
    response = requests.get(url, stream=True, timeout=60)
    if response.status_code == 200:
        with open(file_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print(f"Downloaded: {file_name}")
    else:
        print(f"Failed to download: {file_name} (HTTP {response.status_code})")

# Finds all subfolders of the base URL whose listing contains the target file (dim-860.jpg)
def find_folders_with_file(url, target_file):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    folders = []

    for link in soup.find_all('a'):
        href = link.get('href')
        # Skip anchors without an href, plus absolute, parent-directory, and sort links.
        if not href or href.startswith(('/', '?', '.', 'http')):
            continue
        if href.endswith('/'):
            folder_url = url + href
            folder_response = requests.get(folder_url)
            folder_soup = BeautifulSoup(folder_response.text, 'html.parser')
            # a.get('href') can be None, so fall back to an empty string.
            if any(target_file in (a.get('href') or '') for a in folder_soup.find_all('a')):
                folders.append(folder_url)
    return folders

# Downloads only the files whose name contains target_file from the identified folders
def download_files_from_folders(folders, target_file):
    for folder in folders:
        response = requests.get(folder)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Folder URLs end with '/', so the folder name is the second-to-last segment.
        folder_name = folder.split('/')[-2]
        for link in soup.find_all('a'):
            # link.get('href') can be None, so fall back to an empty string.
            file_href = link.get('href') or ''
            if target_file in file_href and not file_href.endswith('/'):
                file_url = folder + file_href
                download_file(file_url, os.path.join(scratch_path, folder_name))

# Specify the target file, find the folders that contain it, then download only the matching files from those folders
if __name__ == '__main__':
    target_file = 'dim-860.jpg'
    folders = find_folders_with_file(base_url, target_file)
    download_files_from_folders(folders, target_file)


Downloaded: bbzqa240202t1954_dim-860.jpg
Downloaded: bbzqa240202t2014_dim-860.jpg
Downloaded: bbzqa240202t2144_dim-860.jpg
Downloaded: bbzqa240202t2154_dim-860.jpg
Downloaded: bbzqa240203t1944_dim-860.jpg
Downloaded: bbzqa240203t1954_dim-860.jpg
Downloaded: bbzqa240203t2004_dim-860.jpg
Downloaded: bbzqa240203t2014_dim-860.jpg
Downloaded: bbzqa240203t2024_dim-860.jpg
Downloaded: bbzqa240203t2034_dim-860.jpg
Downloaded: bbzqa240203t2044_dim-860.jpg
Downloaded: bbzqa240203t2054_dim-860.jpg
Downloaded: bbzqa240203t2104_dim-860.jpg
Downloaded: bbzqa240203t2124_dim-860.jpg
Downloaded: bbzqa240203t2144_dim-860.jpg
Downloaded: bbzqa240203t2214_dim-860.jpg
Downloaded: bbzqa240207t1804_dim-860.jpg
Downloaded: bbzqa240207t1814_dim-860.jpg
Downloaded: bbzqa240208t2034_dim-860.jpg
Downloaded: bbzqa240208t2044_dim-860.jpg
Downloaded: bbzqa240208t2054_dim-860.jpg
Downloaded: bbzqa240208t2104_dim-860.jpg
Downloaded: bbzqa240208t2114_dim-860.jpg
Downloaded: bbzqa240208t2124_dim-860.jpg
Downloaded: bbzq

KeyboardInterrupt: 

In [None]:
# %run ../scripts/fetch_data.py