# SOHO analysis 
The purpose of this notebook is to analyze visible-light images of the sun from the SOHO satellite, which sits at the L1 Lagrange point (so the images share the Earth's perspective), and to use this data to determine the rotational period of the sun.

In [8]:
import os
from datetime import datetime

Recompute = False #Make this True to run the computation from scratch (takes > 1 minute)

# Function to extract the image paths and their timestamps
def get_files_with_times(root_dir = "data"):
    file_paths = []
    times = []
    for day_dir in sorted(os.listdir(root_dir)):
        day_path = os.path.join(root_dir,day_dir)
        if os.path.isdir(day_path):
            for file in sorted(os.listdir(day_path)):
                if file.endswith(".jpg"):
                    time_str = file.split('_')[1] # Extract time (hhmm)
                    time = datetime.strptime(f"{day_dir}_{time_str}", r"%Y%m%d_%H%M")
                    file_paths.append(os.path.join(day_path,file))
                    times.append(time)
    return file_paths, times

file_paths, times = get_files_with_times()

The first step of data processing is preprocessing and identification of the sunspots. This is done via the utility `image_processing.py`. The utility uses contouring to identify the centroids of features on the sun which *should* correspond well to the sunspots.
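
The internals of `image_processing.py` are not shown in this notebook, so the following is only a rough illustration of what "finding the centroids of dark features" involves. It is a minimal sketch that thresholds a tiny grayscale grid and averages the pixel positions of each connected dark region. It uses a plain flood fill as a stand-in for OpenCV contouring, and the function name `find_dark_centroids`, the threshold, and the toy image are all assumptions made for the example.

```python
from collections import deque

def find_dark_centroids(image, threshold=100):
    """Return (x, y) centroids of 4-connected regions darker than `threshold`."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Breadth-first flood fill over neighbouring dark pixels
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((x, y))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] < threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Centroid = mean pixel position of the region
            cx = sum(p[0] for p in pixels) / len(pixels)
            cy = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((cx, cy))
    return centroids

# Toy 4x5 "image": bright background with one 2x2 dark blob and one dark pixel
toy_image = [
    [200, 200, 200, 200, 200],
    [200,  50,  50, 200, 200],
    [200,  50,  50, 200, 200],
    [200, 200, 200, 200,  40],
]
toy_centroids = find_dark_centroids(toy_image)
print(toy_centroids)
```

A real pipeline would also need to mask out everything beyond the solar limb, since the dark sky background would otherwise be detected as one large feature.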

In [7]:
import matplotlib.pyplot as plt
from ipywidgets import interact, Dropdown
from utils.image_processing import detect_sunspots
import cv2

# Create a dropdown to select days
day_dirs = sorted([d for d in os.listdir("data") if os.path.isdir(os.path.join("data", d))])
@interact(day=Dropdown(options=day_dirs, description="Select Day:"))
def show_day_images(day):
    day_path = os.path.join("data", day)
    files = sorted([f for f in os.listdir(day_path) if f.endswith(".jpg")])[:12]  # Only the first 12 images (3x4 grid)
    
    fig, axes = plt.subplots(3, 4, figsize=(15, 10))
    for ax, file in zip(axes.flat, files):
        img, centroids, solar_center, solar_radius = detect_sunspots(os.path.join(day_path, file))
        print(centroids)
        ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        ax.scatter([c[0] for c in centroids], [c[1] for c in centroids], s=5, c='blue')
        ax.set_title(file.split('_')[1])  # Show time (hhmm)
        ax.axis('off')
    plt.tight_layout()


The visualization above includes a dropdown for selecting a day. It then shows the first 12 images from that date, with the detected centroid coordinates printed above the figure. The sunspot centroids are marked with blue dots.

---

The next part of the data analysis involves tracking the sunspots between frames. This took a fair amount of trial and error, but the main issue turned out to be that I was using Carrington coordinates instead of Stonyhurst coordinates. Carrington is a reference frame that rotates with the sun, so a feature fixed on the surface keeps a nearly constant Carrington longitude, which meant that all of my velocities were centered around 0. Stonyhurst longitude is measured from the Sun-Earth line, so the sun's rotation shows up directly as a change in longitude.
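
To make the coordinate issue concrete, here is a minimal sketch comparing the apparent drift of one feature in the two frames. In heliographic Stonyhurst coordinates, longitude is measured from the Sun-Earth line; Carrington longitude is approximately the Stonyhurst longitude plus L0, the Carrington longitude of the central meridian, which decreases by one full turn per synodic Carrington rotation (27.2753 days on average). The spot rate of 13.2 deg/day and the helper names are assumptions for illustration, and the constant-rate L0 is an approximation.

```python
CARRINGTON_SYNODIC_PERIOD_DAYS = 27.2753  # mean synodic Carrington rotation period
CARRINGTON_RATE_DEG_PER_DAY = 360.0 / CARRINGTON_SYNODIC_PERIOD_DAYS

def stonyhurst_to_carrington(lon_stonyhurst_deg, days_elapsed, l0_start_deg=0.0):
    """Approximate Carrington longitude, left unwrapped so it can be differenced."""
    # L0 decreases with time as the sun rotates under the Sun-Earth line
    l0 = l0_start_deg - CARRINGTON_RATE_DEG_PER_DAY * days_elapsed
    return lon_stonyhurst_deg + l0

def mean_rate(longitudes_deg, days):
    """Average angular velocity (deg/day) between the first and last sample."""
    return (longitudes_deg[-1] - longitudes_deg[0]) / (days[-1] - days[0])

# Hypothetical spot drifting across the disk at 13.2 deg/day as seen from Earth
days = [0.0, 1.0, 2.0, 3.0]
stonyhurst_lons = [-30.0 + 13.2 * t for t in days]
carrington_lons = [stonyhurst_to_carrington(lon, t) for lon, t in zip(stonyhurst_lons, days)]

v_stonyhurst = mean_rate(stonyhurst_lons, days)
v_carrington = mean_rate(carrington_lons, days)
print(f"Stonyhurst: {v_stonyhurst:.3f} deg/day, Carrington: {v_carrington:.3f} deg/day")
```

The Stonyhurst rate recovers the rotation signal, while the Carrington rate is close to zero, which matches the symptom described above.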

In [13]:
'''
This block computes the longitudinal angular velocity of the sunspots between successive
images via a nearest-neighbour algorithm. Due to the sheer amount of computation (and inefficient code),
this block takes about 1-2 minutes to run, so the results are saved in a JSON file called sunspot_data.json.
To run this block, set <Recompute> to True in the first code block of this notebook.
'''

#This function is for saving to JSON
def skycoord_to_dict(coord):
    return {
        'lon': coord.lon.deg,
        'lat': coord.lat.deg,
        'frame': coord.frame.name,
        'unit': 'deg'
    }

if Recompute:
    from utils.feature_tracking import SunspotTracker
    from astropy.coordinates import SkyCoord
    import json
    
    _, _, solar_center, solar_radius = detect_sunspots(file_paths[0]) #Initial value for solar radius from the first image
    tracker = SunspotTracker(solar_center, solar_radius, 1)

    #main feature tracking loop
    for path, time in zip(file_paths, times):
        # Only the centroids are needed here; the annotated image is discarded
        _, centroids, solar_center, solar_radius = detect_sunspots(path)

        tracker.process_frame(time, centroids)
    
    #Filter the data
    filtered_tracks = [t for t in tracker.tracks if len(t['velocities']) >= 1]
    
    #Write the data to JSON
    #First convert necessary datatypes to str
    for entry in filtered_tracks:
        if 'times' in entry:
            entry['times'] = [t.isoformat() if isinstance(t, datetime) else t for t in entry['times']]
        if 'positions_helio' in entry:
            entry['positions_helio'] = [skycoord_to_dict(coord) if isinstance(coord, SkyCoord) else coord for coord in entry['positions_helio']]
   
    with open(file='sunspot_data.json',mode='w') as f:
        json.dump(filtered_tracks, f, indent=4)

The data is saved to a JSON file, and the following code block loads it back.

An important thing to remember is that the longitudinal angular velocity is higher the closer a spot is to the equator (differential rotation). Additionally, the sun's rotation axis is not perfectly vertical from our viewpoint (it is tilted by about 7.25° relative to the ecliptic normal), which might influence the measured velocities as well. Individual velocities are erratic, but further analysis can try to separate out these systematic and random effects and isolate the rotation.
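
As a minimal sketch of how noisy per-track velocities could be turned into a rotation period: take a robust average (the median here) and correct for the fact that the observer orbits the sun along with the Earth, so the measured rate is synodic rather than sidereal. The velocity values below are made up for illustration, and `synodic_rate_to_sidereal_period` is a hypothetical helper, not part of the notebook's utilities.

```python
from statistics import median

EARTH_ORBITAL_PERIOD_DAYS = 365.256  # sidereal year

def synodic_rate_to_sidereal_period(synodic_deg_per_day):
    """Convert an Earth-view angular velocity (deg/day) to a sidereal period (days)."""
    # The Earth orbits in the same direction as the sun rotates, so the
    # rate seen from Earth is lower than the true rate by the orbital rate.
    sidereal_rate = synodic_deg_per_day + 360.0 / EARTH_ORBITAL_PERIOD_DAYS
    return 360.0 / sidereal_rate

# Made-up longitudinal velocities (deg/day), including one obvious outlier
example_velocities = [12.9, 13.4, 13.1, 25.0, 13.3]
typical_rate = median(example_velocities)  # robust against the outlier
period_days = synodic_rate_to_sidereal_period(typical_rate)
print(f"median rate = {typical_rate} deg/day, sidereal period = {period_days:.2f} days")
```

The median is used instead of the mean so that a single mismatched track does not dominate the estimate.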

In [9]:
'''
This code is to recover the data from the JSON
'''
from astropy.coordinates import SkyCoord
from datetime import datetime
import json

def dict_to_skycoord(d):
    return SkyCoord(lon=d['lon'], lat=d['lat'],
                    frame=d['frame'], unit=d['unit'])

with open("sunspot_data.json", 'r') as f:
    data = json.load(f)

for entry in data:
    if 'times' in entry:
        entry['times'] = [datetime.fromisoformat(t) if isinstance(t, str) else t for t in entry['times']]
    if 'positions_helio' in entry:
        # After json.load the coordinates are plain dicts, so check for dict (not SkyCoord) before converting
        entry['positions_helio'] = [dict_to_skycoord(coord) if isinstance(coord, dict) else coord for coord in entry['positions_helio']]


We can now reuse the visualization code from before and match the coordinates of the tracked features with the centroids. The tracking is not perfect, as I used a rudimentary, greedy assignment method. Possible improvements:
- a 1:1 matching method rather than a FIFO approach;
- using velocity as a predictor, with the closest candidate winning the identifier;
- merging short tracks (although it would be hard to know which to merge);
- adding a Kalman filter for predictive smoothing.
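
One of the suggested improvements, letting the closest candidate win the identifier, can be sketched as follows. This is not the code in `feature_tracking.py` (which is not shown here); `match_closest_first`, its arguments, and the distance gate are assumptions for illustration. All track/detection pairs are ranked by distance and assigned closest first, so each track and each detection is used at most once.

```python
import math

def match_closest_first(track_positions, detections, max_distance):
    """1:1 matching in which the closest track/detection pairs are assigned first.

    track_positions: dict mapping track id -> last known (x, y)
    detections: list of (x, y) centroids in the new frame
    Returns a dict mapping track id -> index into `detections`.
    """
    candidates = []
    for track_id, (tx, ty) in track_positions.items():
        for det_idx, (dx, dy) in enumerate(detections):
            dist = math.hypot(tx - dx, ty - dy)
            if dist <= max_distance:  # ignore implausibly large jumps
                candidates.append((dist, track_id, det_idx))

    candidates.sort()  # smallest distance first
    assignments = {}
    used_detections = set()
    for dist, track_id, det_idx in candidates:
        if track_id in assignments or det_idx in used_detections:
            continue  # one of the two has already been claimed by a closer pair
        assignments[track_id] = det_idx
        used_detections.add(det_idx)
    return assignments

# Two tracks compete for the same detection; the nearer track (id 1) wins
tracks = {0: (10.0, 10.0), 1: (12.0, 10.0)}
new_centroids = [(12.5, 10.0), (30.0, 30.0)]
result = match_closest_first(tracks, new_centroids, max_distance=5.0)
print(result)
```

With a first-come-first-served scheme, track 0 would have claimed the detection simply because it was processed first. This approach is still greedy rather than globally optimal; a full assignment solver (for example the Hungarian algorithm) would be the next step up.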

Aside from that, the identifiers can be visualized below.

In [12]:
import matplotlib.pyplot as plt
from ipywidgets import interact, Dropdown
from utils.image_processing import detect_sunspots
import cv2

# Create a dropdown to select days
day_dirs = sorted([d for d in os.listdir("data") if os.path.isdir(os.path.join("data", d))])
@interact(day=Dropdown(options=day_dirs, description="Select Day:"))
def show_day_images(day):
    day_path = os.path.join("data", day)
    files = sorted([f for f in os.listdir(day_path) if f.endswith(".jpg")])[:12]  # Only the first 12 images (3x4 grid)
    
    fig, axes = plt.subplots(3, 4, figsize=(15, 10))
    for ax, file in zip(axes.flat, files):
        img, centroids, solar_center, solar_radius = detect_sunspots(os.path.join(day_path, file))
        print(centroids)
        
        ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        ax.scatter([c[0] for c in centroids], [c[1] for c in centroids], s=5, c='red')
        # Match each centroid to the track(s) that recorded the same pixel position
        matches = []
        for i, coords in enumerate(centroids):
            for track_idx, track in enumerate(data):
                if list(coords) in track['positions_px']:
                    matches.append([i, track_idx])

        # Label each matched centroid with its track identifier
        for centroid_idx, track_id in matches:
            x, y = centroids[centroid_idx]
            ax.text(x + 3, y + 3, str(track_id), color='blue', fontsize=8, weight='bold')

        ax.set_title(file.split('_')[1])  # Show time (hhmm)
        ax.axis('off')
    plt.tight_layout()


Notice that the same spot often jumps to another identifier. This shouldn't be too much of an issue because, as you can see below, the average track length is 4, which is a pretty decent sample size.

In [10]:
# Number of velocity samples for each tracked feature
lengths = []
for i, entry in enumerate(data):
    n = len(entry['velocities'])  # len(entry) would count the dict's keys, not the samples
    print(f"point {i}: len = {n}")
    lengths.append(n)
print(sum(lengths) / len(lengths))  # Average track length

point 0: len = 4
point 1: len = 2
point 2: len = 7
point 3: len = 8
point 4: len = 3
point 5: len = 11
point 6: len = 11
point 7: len = 3
point 8: len = 6
point 9: len = 8
point 10: len = 1
point 11: len = 2
point 12: len = 2
point 13: len = 3
point 14: len = 6
point 15: len = 2
point 16: len = 15
point 17: len = 2
point 18: len = 8
point 19: len = 1
point 20: len = 3
point 21: len = 1
point 22: len = 1
point 23: len = 1
point 24: len = 3
point 25: len = 3
point 26: len = 1
point 27: len = 1
point 28: len = 1
point 29: len = 4
point 30: len = 1
point 31: len = 2
point 32: len = 2
point 33: len = 1
point 34: len = 3
point 35: len = 2
point 36: len = 2
point 37: len = 3
point 38: len = 1
point 39: len = 1
point 40: len = 7
point 41: len = 1
point 42: len = 1
point 43: len = 2
point 44: len = 1
point 45: len = 2
point 46: len = 2
point 47: len = 1
point 48: len = 10
point 49: len = 3
point 50: len = 6
point 51: len = 2
point 52: len = 5
point 53: len = 2
point 54: len = 12
point 55: len =

Things to try:
- Plotting angular velocity by latitude
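
A possible starting point for the latitude idea, as a minimal sketch: group velocity samples into latitude bins and take the median of each bin, which gives one point per bin to plot. It assumes that (latitude, velocity) pairs have already been pulled out of the tracks; the helper name `median_velocity_by_latitude`, the bin width, and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import median

def median_velocity_by_latitude(samples, bin_width_deg=10.0):
    """Group (latitude_deg, velocity_deg_per_day) pairs into |latitude| bins.

    Returns a dict mapping each bin's lower edge (deg) to the median velocity.
    """
    bins = defaultdict(list)
    for lat, vel in samples:
        # Fold both hemispheres together, assuming rotation is symmetric about the equator
        bin_start = (abs(lat) // bin_width_deg) * bin_width_deg
        bins[bin_start].append(vel)
    return {start: median(vals) for start, vals in sorted(bins.items())}

# Made-up samples: (latitude in degrees, longitudinal velocity in deg/day)
example_samples = [(5.0, 13.4), (-8.0, 13.2), (14.0, 13.0), (-17.0, 12.8), (25.0, 12.5)]
binned = median_velocity_by_latitude(example_samples)
for start, vel in binned.items():
    print(f"|lat| in [{start:.0f}, {start + 10:.0f}) deg: median velocity = {vel:.2f} deg/day")
```

The keys and values of `binned` can then be passed to `plt.plot` to look for the expected decrease in angular velocity away from the equator.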