# GONG Data Archive

## Data source:

1. The GONG data archive is hosted at https://gong2.nso.edu/archive/patch.pl?menutype=s.
1. An FTP site organizes the data into subfolders: https://nispdata.nso.edu/ftp/
1. The data for this demo was initially fetched from https://nispdata.nso.edu/ftp/oQR/zqa/202402/

## Subfolder breakdown:

1. `oQR` [1] stands for Quick-Reduce Outputs. Each minute, weather permitting, the GONG network observes the Sun at two spectral wavelengths: 676.78 nm (a Ni I absorption line) and 656.28 nm (the H-alpha absorption line).
1. `zqa` is a file type. Because GONG was not originally designed for precise calibration and removal of non-solar magnetic field bias, a separate zeropoint-corrected (ZPC) "zqa" [2] file is normally created from each "bqa" file. The two are identical except for the subtraction of a planar zeropoint correction.
1. `202402` is the YYYYMM folder for February 2024, which covers the days around February 22, 2024, when three top-tier X-class solar flares erupted from the Sun.
1. The GONG network has telescopes strategically placed at six locations around the world. Each site represents one of six longitudinal bands, which allows the network to observe the Sun 24 hours a day. Current network coverage is around 87%. Subfolders [3] correspond to the six locations:
    1. `bb` = Big Bear Solar Observatory, California
    1. `ct` = Cerro Tololo Inter-American Observatory, Chile
    1. `le` = Learmonth Solar Observatory, Australia
    1. `td` = El Teide Observatory, Canary Islands
    1. `ud` = Udaipur Solar Observatory, India
    1. `??` = Mauna Loa Observatory, Hawaii, USA
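
The file names that appear in the upload log later in this notebook, such as `tdzqa240202t1604_dim-860.jpg`, seem to combine the site code, the file type, and a timestamp. The sketch below parses a name under that assumption. The pattern `<site><type><YYMMDD>t<HHMM>_<suffix>` is inferred from those names and is not an official GONG specification, and `parse_gong_filename` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed layout: <site><type><YYMMDD>t<HHMM>_<suffix>, inferred from names
# such as "tdzqa240202t1604_dim-860.jpg"; not an official specification.
FILENAME_RE = re.compile(
    r"^(?P<site>[a-z]{2})(?P<product>[a-z]{3})"
    r"(?P<date>\d{6})t(?P<time>\d{4})_(?P<suffix>.+)$"
)

# Site codes taken from the list above (the sixth code is not given there).
SITES = {
    "bb": "Big Bear Solar Observatory, California",
    "ct": "Cerro Tololo Inter-American Observatory, Chile",
    "le": "Learmonth Solar Observatory, Australia",
    "td": "El Teide Observatory, Canary Islands",
    "ud": "Udaipur Solar Observatory, India",
}


def parse_gong_filename(name):
    """Return site, file type, and observation time parsed from a file name, or None."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    observed = datetime.strptime(match["date"] + match["time"], "%y%m%d%H%M")
    return {
        "site": SITES.get(match["site"], match["site"]),
        "product": match["product"],
        "observed": observed,
        "suffix": match["suffix"],
    }


info = parse_gong_filename("tdzqa240202t1604_dim-860.jpg")
print(info["site"], info["product"], info["observed"].isoformat())
```

Unknown site codes fall back to the raw two-letter prefix, so the helper does not fail on the site missing from the list.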

# References
- [1] https://catalog.data.gov/dataset/global-oscillation-network-group-gong-quick-reduce-outputs-oqr
- [2] https://ccmc.gsfc.nasa.gov/static/files/CCMC_SWPC_annex_final_report.pdf
- [3] https://nso.edu/telescopes/nisp/gong/

In [1]:
%pip install --upgrade pip requests beautifulsoup4 tqdm -q

Note: you may need to restart the kernel to use updated packages.


# Fetch data

To download every file whose name contains "dim-860.jpg" from https://nispdata.nso.edu/ftp/oQR/zqa/202402/, I use the requests and BeautifulSoup libraries to scrape the directory listing pages for links. I then use the os library to create the destination directory and requests to download the files.
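
Directory listing pages usually contain links that are not data: column-sort links, a parent-directory link, and sometimes anchors with no `href` at all. The sketch below shows one way to separate subfolder links from file links. It uses the standard library's `HTMLParser` in place of BeautifulSoup so that it runs without extra packages. The `SAMPLE` listing, including the `20240222/` folder name, is hypothetical and only shaped like a typical auto-generated index page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href are skipped
                self.hrefs.append(href)


def split_listing(html):
    """Split a directory listing into subfolder links and file links."""
    collector = LinkCollector()
    collector.feed(html)
    folders, files = [], []
    for href in collector.hrefs:
        # Skip sort links ("?C=N;O=D") and parent or absolute links ("/ftp/...", "../")
        if href.startswith(("?", "/", "..")):
            continue
        (folders if href.endswith("/") else files).append(href)
    return folders, files


# Hypothetical listing used only for illustration
SAMPLE = """
<a href="?C=N;O=D">Name</a>
<a href="/ftp/oQR/zqa/">Parent Directory</a>
<a name="top">no href here</a>
<a href="20240222/">20240222/</a>
<a href="tdzqa240222t1604_dim-860.jpg">tdzqa240222t1604_dim-860.jpg</a>
"""

folders, files = split_listing(SAMPLE)
print(folders)
print(files)
```

Filtering before following links keeps a crawler from walking back up into the parent directory.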

In [None]:
%%writefile ../solar_flare_demo/fetch_data.py

import requests
from bs4 import BeautifulSoup
import os
import time
from tqdm import tqdm

# The base URL of the directory is specified.
base_url = 'https://nispdata.nso.edu/ftp/oQR/zqa/202402/'

# Local destination folder for the downloaded files
data = '../data/raw'  # Change this to your own data path if needed

# Downloads a file from a given URL and saves it to a specified folder.
def download_file(url, dest_folder):
    os.makedirs(dest_folder, exist_ok=True)
    file_name = url.split('/')[-1]
    file_path = os.path.join(dest_folder, file_name)

    response = requests.get(url, stream=True)
    if response.status_code == 200:
        with open(file_path, 'wb') as f:
            for chunk in response.iter_content(1024):
                f.write(chunk)
        return True
    return False

# Finds all subfolders in the base URL that contain the target file (dim-860.jpg)
def find_folders_with_file(url, target_file):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    folders = []
    
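    for link in soup.find_all('a'):
        href = link.get('href')
        # Skip anchors without an href, plus sort links and parent/absolute links
        if not href or href.startswith(('?', '/', '..')):
            continue
        if href.endswith('/'):
            folder_url = url + href
            folder_response = requests.get(folder_url)
            folder_soup = BeautifulSoup(folder_response.text, 'html.parser')
            # `or ''` guards against anchors that have no href attribute
            if any(target_file in (a.get('href') or '') for a in folder_soup.find_all('a')):
                folders.append(folder_url)
    return folders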

# Downloads only the files containing 'dim-860.jpg' from the identified folders into a single folder
def download_files_from_folders(folders, target_file, dest_folder):
    file_count = 0
    for folder in tqdm(folders, desc="Processing folders"):
        response = requests.get(folder)
        soup = BeautifulSoup(response.text, 'html.parser')
        for link in soup.find_all('a'):
            file_href = link.get('href')
            # file_href is None for anchors without an href, so check it first
            if file_href and target_file in file_href and not file_href.endswith('/'):
                file_url = folder + file_href
                if download_file(file_url, dest_folder):
                    file_count += 1
    return file_count

# Starts by specifying the target file and then finds the relevant folders. Finally, it downloads all the files from these folders into a single folder
if __name__ == '__main__':
    start_time = time.time()
    target_file = 'dim-860.jpg'
    folders = find_folders_with_file(base_url, target_file)
    file_count = download_files_from_folders(folders, target_file, data)
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Total files downloaded: {file_count}")
    print(f"Time taken: {elapsed_time:.2f} seconds")


In [None]:
%run ../solar_flare_demo/fetch_data.py

In [None]:
# Sync to Object Bucket

In [2]:
# install requirements

%pip install -U minio python-dotenv -q

Note: you may need to restart the kernel to use updated packages.


In [3]:
# import minio and dependencies

from minio import Minio
import os
import glob
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

In [15]:
s3_client.list_buckets()

[Bucket('data')]

In [16]:
import os
import glob
from minio import Minio
from dotenv import load_dotenv

# assuming Minio is deployed, populate the environment variables

!  echo "AWS_S3_BUCKET=${AWS_S3_BUCKET:-data}" > .env
!  echo "AWS_S3_ENDPOINT=${AWS_S3_ENDPOINT:-http://minio.minio.svc:9000}" >> .env
!  echo "AWS_ACCESS_KEY_ID=$(oc -n minio extract secret/minio-root-user --keys=MINIO_ROOT_USER --to=-)" >> .env
!  echo "AWS_SECRET_ACCESS_KEY=$(oc -n minio extract secret/minio-root-user --keys=MINIO_ROOT_PASSWORD --to=-)" >> .env

load_dotenv()  # take environment variables from .env.

# fetch s3 env variable - these values will be fetched from Data Connection setup

access_key = os.getenv("AWS_ACCESS_KEY_ID", "minioadmin")
secret_key = os.getenv("AWS_SECRET_ACCESS_KEY", "minioadmin")
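s3_endpoint = os.getenv("AWS_S3_ENDPOINT", "localhost:9000")
# str.lstrip removes a set of characters, not a prefix, so drop the scheme explicitly
s3_endpoint = s3_endpoint.split("://", 1)[-1]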
bucket_name = os.getenv("AWS_S3_BUCKET", "data")

# Initialize MinIO client

s3_client = Minio(
    s3_endpoint, access_key=access_key, secret_key=secret_key, secure=False
)

def upload_local_directory_to_s3(bucket_name, local_path):
    """
    Upload a local directory to an S3 bucket.

    :param bucket_name: Name of the S3 bucket
    :param local_path: Path to the local directory
    """
    assert os.path.isdir(local_path), f"{local_path} is not a directory."

    # Object keys keep the directory name as a prefix: ../data/raw/x.jpg -> raw/x.jpg
    base_path = os.path.dirname(os.path.normpath(local_path))

    # The recursive glob already descends into subdirectories,
    # so directories are skipped instead of being recursed into again
    for local_file in glob.glob(os.path.join(local_path, '**'), recursive=True):
        if not os.path.isfile(local_file):
            continue

        print("local file:", local_file.replace(os.sep, "/"))

        remote_file = os.path.relpath(local_file, base_path).replace(os.sep, "/")

        try:
            # stat_object raises when the object does not exist yet
            s3_client.stat_object(bucket_name, remote_file)
            print("remote exists:", remote_file)
        except Exception:
            s3_client.fput_object(bucket_name, remote_file, local_file)

def download_all_from_s3(local_path):
    """
    Download all files from all buckets in S3 to a local path.

    :param local_path: Path to the local directory
    """
    for bucket in s3_client.list_buckets():
        for item in s3_client.list_objects(bucket.name, recursive=True):
            local_file = os.path.join(local_path, item.object_name)

            if os.path.exists(local_file):
                print("local exists:", local_file)
            else:
                s3_client.fget_object(bucket.name, item.object_name, local_file)

# MINIO_ROOT_USER
# MINIO_ROOT_PASSWORD
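
The `Minio` client expects its endpoint as `host:port` without a scheme, so the value of `AWS_S3_ENDPOINT` has to be trimmed before use. Note that `str.lstrip` takes a set of characters rather than a prefix, which can silently eat the start of a host name. The sketch below shows the pitfall and one possible alternative based on `urllib.parse`. `split_endpoint` is a hypothetical helper and `test.svc` is a made-up host name.

```python
from urllib.parse import urlparse


def split_endpoint(endpoint):
    """Return (host:port, secure) for an endpoint given with or without a scheme."""
    # urlparse only fills netloc when the "//" separator is present
    parsed = urlparse(endpoint if "://" in endpoint else "//" + endpoint)
    return parsed.netloc, parsed.scheme == "https"


# lstrip strips every leading character found in the set {h, t, p, :, /},
# so the first "t" of the host name is removed as well
print("http://test.svc:9000".lstrip("http://"))

print(split_endpoint("http://test.svc:9000"))
print(split_endpoint("localhost:9000"))
```

The second element of the returned tuple could be passed as the `secure` argument instead of hard-coding `secure=False`.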


In [18]:
# check if the bucket already exists
if not s3_client.bucket_exists(bucket_name):
    s3_client.make_bucket(bucket_name)
    print(f"Bucket '{bucket_name}' created successfully.")
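
Before uploading, it can be useful to preview which object key each local file would receive. The sketch below is a dry run that touches no bucket. It assumes the key should be the file's path relative to the parent of the uploaded directory, so that `../data/raw/x.jpg` maps to `raw/x.jpg`. `plan_object_keys` is a hypothetical helper, and the demonstration runs on a throwaway directory tree.

```python
import os
import tempfile


def plan_object_keys(local_path):
    """Map each file under local_path to an object key prefixed by the directory name."""
    base_path = os.path.dirname(os.path.normpath(local_path))
    plan = {}
    for root, _dirs, files in os.walk(local_path):
        for name in files:
            local_file = os.path.join(root, name)
            # Object keys always use forward slashes, whatever the local separator is
            key = os.path.relpath(local_file, base_path).replace(os.sep, "/")
            plan[local_file] = key
    return plan


# Demonstration on a temporary directory with one nested subfolder
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw")
    os.makedirs(os.path.join(raw, "nested"))
    for rel in ("a.jpg", os.path.join("nested", "b.jpg")):
        open(os.path.join(raw, rel), "wb").close()
    keys = sorted(plan_object_keys(raw).values())

print(keys)
```

Deriving keys with `os.path.relpath` preserves nested subfolders in the bucket rather than flattening every file into a single prefix.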

In [14]:
# upload dataset
try:
    upload_local_directory_to_s3(bucket_name, "../data/raw")

except Exception as err:
    print(f"Failed to upload files to bucket '{bucket_name}': {err}")

local file: ../data/raw/tdzqa240202t1604_dim-860.jpg
local file: ../data/raw/udzqa240224t0554_dim-860.jpg
local file: ../data/raw/udzqa240222t1034_dim-860.jpg
local file: ../data/raw/udzqa240211t0424_dim-860.jpg
local file: ../data/raw/lezqa240202t0914_dim-860.jpg
local file: ../data/raw/ctzqa240228t1424_dim-860.jpg
local file: ../data/raw/lezqa240229t0424_dim-860.jpg
local file: ../data/raw/udzqa240220t0914_dim-860.jpg
local file: ../data/raw/tdzqa240219t1154_dim-860.jpg
local file: ../data/raw/lezqa240227t0354_dim-860.jpg
local file: ../data/raw/tdzqa240220t1114_dim-860.jpg
local file: ../data/raw/tdzqa240229t1254_dim-860.jpg
local file: ../data/raw/tdzqa240205t1014_dim-860.jpg
local file: ../data/raw/lezqa240207t0304_dim-860.jpg
local file: ../data/raw/udzqa240229t1004_dim-860.jpg
local file: ../data/raw/ctzqa240226t1644_dim-860.jpg
local file: ../data/raw/lezqa240221t0024_dim-860.jpg
local file: ../data/raw/lezqa240209t0314_dim-860.jpg
local file: ../data/raw/tdzqa240216t1154_dim-8