# Project 3 - Sagittarius A* Exploration
### By Angela Less & Steven Dahms
GitHub: https://github.com/Ang-M31/Project3-BlackHoles.git

Streamlit Model: https://project3-s2orbit.streamlit.app/

Streamlit Teaching Aid: https://project3-blackholes.streamlit.app/

## Project Goal
The goal of this project was to take the reported orbital data of the star S2 and fit an ellipse to its orbit around Sagittarius A* (Sgr A*), the supermassive black hole at the center of our galaxy. The research behind this measurement earned Reinhard Genzel and Andrea Ghez a share of the 2020 Nobel Prize in Physics. Radio emission from the direction of the Galactic Center has been known since the 1930s, but it was the tracking of stellar orbits such as S2's that showed the central object is so massive and compact that it could only be a black hole. It was subsequently imaged in 2022.

## Distribution of Work
Angela - Orbit modeling in Streamlit, obtaining data from Jacob Parzych

Steven - Streamlit teaching tool and higher percentage of work on the PowerPoint and Jupyter Notebook.

Collaboration on PowerPoint presentation and Jupyter Notebook creation.

## Challenges
Data availability was our main challenge. The data was eventually provided by Jacob Parzych's team, with instructor permission. Although the data is hosted on VizieR, Cursor was unable to pull it through a query, so we linked directly to the archive instead:

https://cdsarc.cds.unistra.fr/ftp/J/ApJ/707/L114/

The AI would sometimes add redundant code as a backup without clearly indicating it was doing so, which led to confusion until we discovered it.

While AI is a fantastic tool, it tends to be overly helpful, which can lead to excess decision making or idea generation. Either can take you down paths other than the one you are on.

## AI Statement
This project was made in collaboration with AI. A small amount of code for the diagnostic tool was copied from Claude and provided to Cursor as a reference. All remaining code was predominantly AI-generated from our prompts, then carefully monitored, adjusted, and edited.

## Project Potential

The Streamlit apps could be made more interactive for the user if they are intended as a teaching tool vs. sharing research. 

As more data is gathered and more advancements are made, we can do more with the data to learn about the universe.

## So How Did We Do It?
If this work is winning a Nobel Prize, how did two undergrads manage to do it? A mix of youthful vigor and caffeine mostly, but we also have the power of published data and confirmation bias on our side.

We arranged our code into three files: one Python script for data handling and two Streamlit apps, each with its own purpose.

## Initial Python Script
We initially tried to pull data through Astroquery. This first block of code queries SIMBAD for the data on Sgr A*, which we use later in the Streamlit app. We later switched to pulling the data directly from the VizieR archive.

In [None]:
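import pandas as pd
from astroquery.gaia import Gaia
from astroquery.simbad import Simbad
from astroquery.vizier import Vizier


def get_sagittarius_a_data():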
    """
    Retrieve data for Sagittarius A* (Sgr A*), the supermassive black hole at the Galactic Center.

    Returns:
    --------
    pandas.DataFrame : SIMBAD data for Sgr A*
    dict : Additional properties extracted from the data
    """
    print("Querying SIMBAD for Sagittarius A*...")

    # Customize Simbad to return useful columns
    custom_simbad = Simbad()
    custom_simbad.add_votable_fields('otype', 'sp', 'flux(V)', 'flux(B)', 'plx', 'pm', 'rv', 'z_value')

    try:
        # Query Sagittarius A*
        result_table = custom_simbad.query_object("Sagittarius A*")

        if result_table is not None and len(result_table) > 0:
            df = result_table.to_pandas()

            # Extract key properties
            properties = {
                'name': df['MAIN_ID'].iloc[0] if 'MAIN_ID' in df.columns else 'Sagittarius A*',
                'ra': df['RA'].iloc[0] if 'RA' in df.columns else None,
                'dec': df['DEC'].iloc[0] if 'DEC' in df.columns else None,
                'object_type': df['OTYPE'].iloc[0] if 'OTYPE' in df.columns else None,
            }

            print(f"Found data for {properties['name']}")
            return df, properties
        else:
            print("No data found for Sagittarius A*")
            return pd.DataFrame(), {}

    except Exception as e:
        print(f"Error querying SIMBAD: {e}")
        return pd.DataFrame(), {}

The second block does the same thing for our reference point S2. This time we take a little more information, like proper motion and radial velocity.

Note: this one needs a fair bit of error handling in case S2 isn't easy to find. We search by name first, then by coordinates if that doesn't work, and finally trim out the excess entries.

In [None]:
def get_s2_star_data():
    """
    Retrieve data for star S2 (also known as S0-2), which orbits Sgr A*.

    Returns:
    --------
    pandas.DataFrame : SIMBAD data for S2
    dict : Additional properties
    """
    print("Querying SIMBAD for star S2 (S0-2)...")

    custom_simbad = Simbad()
    custom_simbad.add_votable_fields('otype', 'sp', 'flux(V)', 'flux(B)', 'plx', 'pm', 'rv', 'z_value')

    try:
        # Try different names for S2
        names_to_try = ['S2', 'S0-2', 'S0 2', 'S02', 'Sgr A* S2']

        result_table = None
        for name in names_to_try:
            try:
                result_table = custom_simbad.query_object(name)
                if result_table is not None and len(result_table) > 0:
                    print(f"Found S2 using name: {name}")
                    break
            except Exception:
                continue

        if result_table is None or len(result_table) == 0:
            # Query by coordinates near Sgr A*
            sgr_a_coords = (266.4168, -29.0078)  # Sgr A* coordinates in degrees
            result_table = custom_simbad.query_region(
                f"{sgr_a_coords[0]} {sgr_a_coords[1]}",
                radius="0d1m0s"  # 1 arcminute radius
            )

        if result_table is not None and len(result_table) > 0:
            df = result_table.to_pandas()

            # Filter for S2 if multiple results (look for S2 in the name)
            if len(df) > 1:
                s2_mask = df['MAIN_ID'].str.contains('S2|S0-2|S02', case=False, na=False)
                if s2_mask.any():
                    df = df[s2_mask]

            properties = {
                'name': df['MAIN_ID'].iloc[0] if 'MAIN_ID' in df.columns else 'S2',
                'ra': df['RA'].iloc[0] if 'RA' in df.columns else None,
                'dec': df['DEC'].iloc[0] if 'DEC' in df.columns else None,
                'spectral_type': df['SP_TYPE'].iloc[0] if 'SP_TYPE' in df.columns else None,
                'proper_motion_ra': df['PMRA'].iloc[0] if 'PMRA' in df.columns else None,
                'proper_motion_dec': df['PMDEC'].iloc[0] if 'PMDEC' in df.columns else None,
                'radial_velocity': df['RV_VALUE'].iloc[0] if 'RV_VALUE' in df.columns else None,
            }

            print(f"Found data for {properties['name']}")
            return df, properties
        else:
            print("No data found for S2")
            return pd.DataFrame(), {}

    except Exception as e:
        print(f"Error querying SIMBAD: {e}")
        return pd.DataFrame(), {}
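
The name filter in `get_s2_star_data` is a case-insensitive substring match, so `'S2'` also matches longer identifiers that merely contain those characters. A stricter whole-token match is one possible refinement. The sketch below uses only the standard library; the candidate names are made up for illustration and are not real SIMBAD output, and `is_s2_name` is a hypothetical helper, not part of the project code.

```python
import re

# Match "S2", "S0-2", "S0 2" or "S02" only as a whole token.
# The lookarounds stop the pattern from matching inside longer names.
S2_PATTERN = re.compile(r"(?<![\w-])S(?:2|0[- ]?2)(?![\w-])", re.IGNORECASE)


def is_s2_name(name: str) -> bool:
    """Return True if the identifier contains S2 or S0-2 as a standalone token."""
    return S2_PATTERN.search(name) is not None


# Made-up identifiers for illustration only.
candidates = ["NAME S2", "[XYZ] S0-2", "S2-16", "S24", "IRS 16"]
matches = [name for name in candidates if is_s2_name(name)]
print(matches)
```

With pandas, the same compiled pattern could be passed to `Series.str.contains` in place of the plain string.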

This function searches VizieR catalogs from common astronomy journals for records within a 2-arcminute radius of Sgr A* and combines the resulting tables into one DataFrame.

In [None]:
def get_s2_orbital_data_from_vizier():
    """
    Query Vizier catalogs for published S2 orbital data and parameters.
    Searches for catalogs containing orbital elements and observational data.

    Returns:
    --------
    pandas.DataFrame : Orbital parameters and observational data
    """
    print("Querying Vizier for S2 orbital data...")

    # Sgr A* coordinates
    sgr_a_coords = (266.4168, -29.0078)

    try:
        v = Vizier(columns=['**'], row_limit=500)

        # Search in a small region around Sgr A*
        result_list = v.query_region(
            f"{sgr_a_coords[0]} {sgr_a_coords[1]}",
            radius="0d2m0s",  # 2 arcminute radius
            catalog=["J/ApJ/*", "J/A+A/*", "J/MNRAS/*"]  # Common astronomy journals
        )

        if result_list:
            # Combine all results
            all_data = []
            for table in result_list:
                df = table.to_pandas()
                all_data.append(df)

            if all_data:
                combined_df = pd.concat(all_data, ignore_index=True)
                print(f"Found {len(combined_df)} records from Vizier")
                return combined_df

        return pd.DataFrame()

    except Exception as e:
        print(f"Error querying Vizier: {e}")
        return pd.DataFrame()

We combine that with data from the following method...

In [None]:
def get_gaia_data_for_region(ra=266.4168, dec=-29.0078, radius_arcsec=30):
    """
    Query Gaia catalog for stellar data in the region around Sgr A*.
    This can provide proper motions and positions for stars in the region.

    Parameters:
    -----------
    ra : float
        Right ascension in degrees
    dec : float
        Declination in degrees
    radius_arcsec : float
        Search radius in arcseconds

    Returns:
    --------
    pandas.DataFrame : Gaia catalog data
    """
    print(f"Querying Gaia for stellar data within {radius_arcsec} arcsec of Sgr A*...")

    try:
        # Query Gaia DR3
        job = Gaia.launch_job_async(
            f"""
            SELECT TOP 1000
                source_id, ra, dec,
                parallax, parallax_error,
                pmra, pmra_error, pmdec, pmdec_error,
                phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
                radial_velocity, radial_velocity_error,
                astrometric_excess_noise
            FROM gaiadr3.gaia_source
            WHERE 1=CONTAINS(
                POINT('ICRS', ra, dec),
                CIRCLE('ICRS', {ra}, {dec}, {radius_arcsec/3600.0})
            )
            AND parallax IS NOT NULL
            ORDER BY phot_g_mean_mag
            """
        )

        result_table = job.get_results()

        if result_table is not None and len(result_table) > 0:
            df = result_table.to_pandas()
            print(f"Found {len(df)} stars in Gaia catalog")
            return df
        else:
            print("No Gaia data found")
            return pd.DataFrame()

    except Exception as e:
        print(f"Error querying Gaia: {e}")
        return pd.DataFrame()

This queries the Gaia DR3 catalog for more data on the surrounding region. We now have two data sources queried for information on the neighborhood of Sgr A*.

Finally, we query SIMBAD for one last source.
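
The `CIRCLE` clause in the Gaia query takes its radius in degrees, which is why the query divides `radius_arcsec` by 3600. As a rough offline check of what that cone covers, the sketch below computes the angular separation between two sky positions with the haversine formula. `angular_separation_arcsec` is a hypothetical helper, and the second position is an arbitrary offset chosen for illustration.

```python
import math


def angular_separation_arcsec(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation between two sky positions, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    # Haversine formula: numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


SGR_A_RA, SGR_A_DEC = 266.4168, -29.0078  # degrees, as used in the queries
radius_arcsec = 30
radius_deg = radius_arcsec / 3600.0  # the value passed to CIRCLE

# Arbitrary test position 20 arcsec north of Sgr A* (illustration only).
sep = angular_separation_arcsec(SGR_A_RA, SGR_A_DEC, SGR_A_RA, SGR_A_DEC + 20 / 3600.0)
print(f"radius = {radius_deg:.6f} deg, separation = {sep:.2f} arcsec, "
      f"inside = {sep <= radius_arcsec}")
```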

In [None]:
def get_galactic_center_stars():
    """
    Query for multiple stars in the Galactic Center region that orbit Sgr A*.
    This includes the S-stars cluster.

    Returns:
    --------
    pandas.DataFrame : Data for Galactic Center stars
    """
    print("Querying for Galactic Center stars...")

    sgr_a_coords = (266.4168, -29.0078)

    try:
        custom_simbad = Simbad()
        custom_simbad.add_votable_fields('otype', 'sp', 'pm', 'rv')

        # Query region around Sgr A*
        result_table = custom_simbad.query_region(
            f"{sgr_a_coords[0]} {sgr_a_coords[1]}",
            radius="0d2m0s"  # 2 arcminute radius
        )

        if result_table is not None and len(result_table) > 0:
            df = result_table.to_pandas()

            # Filter for stars (remove extended sources, galaxies, etc.)
            if 'OTYPE' in df.columns:
                star_mask = ~df['OTYPE'].str.contains('G|Neb|HII|PN', case=False, na=False)
                df = df[star_mask]

            print(f"Found {len(df)} stars in the Galactic Center region")
            return df
        else:
            return pd.DataFrame()

    except Exception as e:
        print(f"Error querying for Galactic Center stars: {e}")
        return pd.DataFrame()

Then we compile the data...

In [None]:
def compile_all_data():
    """
    Compile all available data for Sgr A* and S2.
    This is the main function to call for getting all relevant data.

    Returns:
    --------
    dict : Dictionary containing all retrieved data
    """
    print("=" * 70)
    print("Compiling data for Sagittarius A* and Star S2 (S0-2)")
    print("=" * 70)

    all_data = {}

    # Get Sgr A* data
    sgr_a_df, sgr_a_props = get_sagittarius_a_data()
    all_data['sgr_a_star'] = sgr_a_df
    all_data['sgr_a_properties'] = sgr_a_props

    # Get S2 star data
    s2_df, s2_props = get_s2_star_data()
    all_data['s2_star'] = s2_df
    all_data['s2_properties'] = s2_props

    # Get S2 orbital data from Vizier
    s2_orbital = get_s2_orbital_data_from_vizier()
    all_data['s2_orbital_data'] = s2_orbital

    # Get Gaia data for the region
    gaia_data = get_gaia_data_for_region()
    all_data['gaia_data'] = gaia_data

    # Get other Galactic Center stars
    gc_stars = get_galactic_center_stars()
    all_data['galactic_center_stars'] = gc_stars

    print("\n" + "=" * 70)
    print("Data compilation complete!")
    print("=" * 70)

    return all_data

And save everything to a data file.

In [None]:
def save_data_to_files(data_dict, prefix="sgr_a_s2_data"):
    """
    Save all retrieved data to CSV files.

    Parameters:
    -----------
    data_dict : dict
        Dictionary containing dataframes from compile_all_data()
    prefix : str
        Prefix for output filenames
    """
    import os

    for key, value in data_dict.items():
        if isinstance(value, pd.DataFrame) and not value.empty:
            filename = f"{prefix}_{key}.csv"
            value.to_csv(filename, index=False)
            print(f"Saved {key} to {filename}")
        elif isinstance(value, dict):
            # Save properties as a simple text file
            filename = f"{prefix}_{key}.txt"
            with open(filename, 'w') as f:
                for k, v in value.items():
                    f.write(f"{k}: {v}\n")
            print(f"Saved {key} to {filename}")


# Example usage and main execution
if __name__ == "__main__":
    print("\n" + "=" * 70)
    print("Sagittarius A* and Star S2 (S0-2) Data Retrieval")
    print("=" * 70 + "\n")

    # Compile all data
    data = compile_all_data()

    # Display summary
    print("\n" + "=" * 70)
    print("DATA SUMMARY")
    print("=" * 70)

    if not data['sgr_a_star'].empty:
        print("\nSagittarius A* Data:")
        print(data['sgr_a_star'][['MAIN_ID', 'RA', 'DEC', 'OTYPE']].to_string())

    if not data['s2_star'].empty:
        print("\nS2 Star Data:")
        print(data['s2_star'][['MAIN_ID', 'RA', 'DEC', 'SP_TYPE']].to_string())

    if not data['s2_orbital_data'].empty:
        print(f"\nS2 Orbital Data: {len(data['s2_orbital_data'])} records")
        print("Columns:", list(data['s2_orbital_data'].columns))

    if not data['gaia_data'].empty:
        print(f"\nGaia Data: {len(data['gaia_data'])} stars")
        print("Sample columns:", list(data['gaia_data'].columns[:10]))

    if not data['galactic_center_stars'].empty:
        print(f"\nGalactic Center Stars: {len(data['galactic_center_stars'])} stars")

    # Optionally save to files
    # save_data_to_files(data)

    print("\n" + "=" * 70)
    print("Data retrieval complete!")
    print("=" * 70)

## Streamlit App 1: Black Hole Teaching Tool
The Schwarzschild radius, commonly identified with the ‘Event Horizon’ of a black hole, is the radius to which an object must be compressed before it necessarily becomes a black hole. It is also the distance at which the escape velocity equals the speed of light, so nothing inside it can escape. In other words: if we took an object, like our Sun, and compressed all of its mass into a smaller and smaller volume, eventually it would become so dense that it would have no alternative but to become a black hole. It is a colossal tipping point.
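
As a worked example of that tipping point, the sketch below evaluates `r_s = 2GM/c^2` for one solar mass and checks that the Newtonian escape velocity `sqrt(2GM/r)` at that radius equals the speed of light. The constants are hard-coded so the example is self-contained, and the function names are illustrative rather than taken from the app.

```python
import math

G = 6.67430e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 299_792_458.0      # speed of light, m/s
SOLAR_MASS = 1.989e30  # kg


def schwarzschild_radius_m(mass_kg):
    """Radius (m) at which the escape velocity reaches the speed of light."""
    return 2 * G * mass_kg / C ** 2


def escape_velocity(mass_kg, radius_m):
    """Newtonian escape velocity (m/s) at a given distance from a point mass."""
    return math.sqrt(2 * G * mass_kg / radius_m)


r_s = schwarzschild_radius_m(SOLAR_MASS)
ratio = escape_velocity(SOLAR_MASS, r_s) / C
print(f"One solar mass: r_s = {r_s / 1000:.2f} km")
print(f"Escape velocity at r_s, as a fraction of c: {ratio:.6f}")
```

Because the radius scales linearly with mass, doubling the mass doubles the Schwarzschild radius.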

This first Streamlit app takes a user-defined mass, in solar masses, and creates a 3D graph of the resulting black hole and the size of its accretion disk.

This block calculates the properties of the black hole from the user-defined mass and provides a general description of how large the radius may be to provide context to the user.

In [None]:

# Calculate properties
solar_mass = 1.989e30  # kg
mass_kg = mass_solar * solar_mass

# Schwarzschild radius
schwarzschild_radius = (2 * G * mass_kg) / (c ** 2)
schwarzschild_radius_km = schwarzschild_radius / 1000
schwarzschild_radius_miles = schwarzschild_radius_km * 0.621371

size_comparisons = [
    (0.1, "about the length of a football field"),
    (1, "roughly the height of the Empire State Building"),
    (5, "similar to the width of Manhattan"),
    (16, "close to the width of the Grand Canyon"),
    (100, "about the distance across Los Angeles"),
    (6371, "comparable to Earth's radius"),
    (696340, "approaching the Sun's radius"),
]
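
The excerpt does not show how `size_comparisons` is consumed. One plausible approach, sketched here as an assumption, is to pick the entry whose reference size is closest to the radius on a logarithmic scale. `describe_radius` is a hypothetical helper, and the first element of each tuple is treated as a size in kilometres.

```python
import math

size_comparisons = [
    (0.1, "about the length of a football field"),
    (1, "roughly the height of the Empire State Building"),
    (5, "similar to the width of Manhattan"),
    (16, "close to the width of the Grand Canyon"),
    (100, "about the distance across Los Angeles"),
    (6371, "comparable to Earth's radius"),
    (696340, "approaching the Sun's radius"),
]


def describe_radius(radius_km, comparisons=size_comparisons):
    """Return the description whose reference size (km) is nearest in log space."""
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    _, text = min(comparisons, key=lambda item: abs(math.log10(radius_km / item[0])))
    return text


# 2.95 km is roughly the Schwarzschild radius of one solar mass.
print(describe_radius(2.95))
print(describe_radius(1.2e7))
```

Comparing in log space keeps the choice sensible across the many orders of magnitude a mass slider can span.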

And this block plots out the black hole with the measurements we calculated above using a 3D plot projection. Multiple sections are needed to describe the black hole representation as well as the accretion disk.

In [None]:
    # Create a 3D visualization
    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(111, projection="3d")

    # Event horizon sphere
    u = np.linspace(0, 2 * np.pi, 60)
    v = np.linspace(0, np.pi, 30)
    x = schwarzschild_radius_km * np.outer(np.cos(u), np.sin(v))
    y = schwarzschild_radius_km * np.outer(np.sin(u), np.sin(v))
    z = schwarzschild_radius_km * np.outer(np.ones_like(u), np.cos(v))
    ax.plot_surface(x, y, z, color="black", alpha=0.8, linewidth=0, shade=True)

    # Accretion disk (flat annulus approximation)
    disk_r_outer = schwarzschild_radius_km * 3
    disk_r_inner = schwarzschild_radius_km * 1.2
    disk_u = np.linspace(0, 2 * np.pi, 200)
    disk_v = np.linspace(disk_r_inner, disk_r_outer, 2)
    disk_u, disk_v = np.meshgrid(disk_u, disk_v)
    disk_x = disk_v * np.cos(disk_u)
    disk_y = disk_v * np.sin(disk_u)
    disk_z = np.zeros_like(disk_x)
    ax.plot_surface(
        disk_x, disk_y, disk_z, color="orange", alpha=0.3, linewidth=0, shade=False
    )

All things considered, it's actually rather simple. Most of the lines come from the fact that there's simply more data required in plotting a 3D object.



## Streamlit App 2: Orbital Visualization

The Streamlit site is a way to 'build' the model with the user. Selecting the buttons from left to right shows how the model changed over time. The first shows the AI's interpretation based on the basic data it pulled. 'All Data' shows the data points from VizieR, color-coded by telescope. 'Initial Data Model' shows a rough fit of the orbital model with chi-squared information. Lastly, 'Adjusted Model' shows the final model of the orbit. Although its chi-squared is still a bit high, it is the best fit thus far. This view also includes a diagnostic tool that helped in getting a closer fit than prior iterations.
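
For reference, a chi-squared statistic of the kind the app reports can be computed from the residuals and measurement errors as in the minimal sketch below. The numbers are made up for illustration, and `chi_squared` and `reduced_chi_squared` are hypothetical helpers rather than the app's own functions.

```python
def chi_squared(observed, model, sigma):
    """Sum of squared, error-normalised residuals."""
    return sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))


def reduced_chi_squared(observed, model, sigma, n_params):
    """Chi-squared per degree of freedom; values near 1 suggest a good fit."""
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi_squared(observed, model, sigma) / dof


# Made-up positions and errors (arcsec), for illustration only.
observed = [0.10, 0.12, 0.15, 0.11]
model = [0.11, 0.12, 0.13, 0.11]
sigma = [0.01, 0.01, 0.01, 0.01]

chi2 = chi_squared(observed, model, sigma)
chi2_red = reduced_chi_squared(observed, model, sigma, n_params=1)
print(f"chi2 = {chi2:.2f}, reduced chi2 = {chi2_red:.2f}")
```

A reduced chi-squared well above 1 usually means the model misses structure in the data or the quoted errors are too small.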

Start with imports and loading in the data from the URL:


In [None]:
import streamlit as st
import numpy as np
import matplotlib.pyplot as plt
from scipy.constants import c, G
from scipy.optimize import minimize, curve_fit
import pandas as pd
import os
import re
import urllib.request
from io import StringIO



def load_cds_data(use_galactic=False):
    """
    Load observational data from CDS archive URL.
    Data source: https://cdsarc.cds.unistra.fr/ftp/J/ApJ/707/L114/table1.dat
    
    Fixed-width format:
    Columns 1-8:   Year (F8.3)
    Columns 10-14: oRA in mas (F5.1)
    Columns 16-19: e_oRA in mas (F4.1)
    Columns 21-25: oDE in mas (F5.1)
    Columns 27-30: e_oDE in mas (F4.1)
    Columns 32-37: Telescope (A6)
    Columns 39-46: Ep.V year (F8.3, optional)
    Columns 48-52: VLSR in km/s (I5, optional)
    Columns 54-56: e_VLSR in km/s (I3, optional)
    Columns 58-61: Tel.V (A4, optional)
    
    Returns DataFrame with ra, dec, ra_err, dec_err in arcseconds.
    If use_galactic=True, converts to galactic coordinates (l, b).
    """
    ...
            try:
                # Parse fixed-width columns
                year_str = line[0:8].strip()
                oRA_str = line[9:14].strip()
                e_oRA_str = line[15:19].strip()
                oDE_str = line[20:25].strip()
                e_oDE_str = line[26:30].strip()
                tel_str = line[31:37].strip() if len(line) > 31 else ""
                ...
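
The column layout in the docstring maps onto zero-based Python slices by subtracting one from each start column. The sketch below applies that to a single record and converts milliarcseconds to arcseconds. The sample line is synthetic, built to match the documented widths rather than copied from `table1.dat`, and `parse_position_line` is a hypothetical helper.

```python
def parse_position_line(line):
    """Parse the astrometric columns (1-37) of one fixed-width record."""
    return {
        "year": float(line[0:8]),                # columns 1-8
        "ra": float(line[9:14]) / 1000.0,        # columns 10-14, mas -> arcsec
        "ra_err": float(line[15:19]) / 1000.0,   # columns 16-19, mas -> arcsec
        "dec": float(line[20:25]) / 1000.0,      # columns 21-25, mas -> arcsec
        "dec_err": float(line[26:30]) / 1000.0,  # columns 27-30, mas -> arcsec
        "telescope": line[31:37].strip(),        # columns 32-37
    }


# Synthetic record with made-up values, formatted to the documented widths.
sample = f"{2000.5:8.3f} {-12.3:5.1f} {1.5:4.1f} {45.6:5.1f} {2.0:4.1f} {'Keck':<6}"
record = parse_position_line(sample)
print(record)
```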

The orbital calculations fit RA and Dec simultaneously (as opposed to separately), account for jitter, and use a Keplerian orbit to get a good-fitting model:

In [None]:
def calculate_orbit_from_params(times, a_arcsec, e, i_deg, Omega_deg, omega_deg, T0_year, P_year, distance_gc_kpc, use_damped_solver=False):
    ...
    # Mean anomaly, Kepler’s equation solve (optional damped Newton for Adjusted),
    # true anomaly ν, radius r, 3D rotation (ω, i, Ω), projection to sky, flatten x/y.
    return np.concatenate([x_sky, y_sky])

def fit_orbit_parameters(obs_data, initial_params, use_gaussian=False):
    ...
    if use_gaussian:
        # jittered errors, Keck weighting, Adjusted bounds, staged fitting
        ...
        # Stage 1: geometry (a, e, P, T0) with fixed orientation
        # Stage 2: orientation (i, Ω, ω) with fixed geometry
        # Stage 3: joint refinement using calculate_orbit_from_params(...)
    return fitted_params, residuals_ra, residuals_dec, (x_fit, y_fit), diagnostics_md

# "Adjusted Model" - independent fitting with Gaussian error assumptions
diagnostics_adjusted = None
if st.session_state.show_table_data_3 and obs_data is not None:
    fitted_params_adjusted, residuals_ra_adjusted, residuals_dec_adjusted, (x_fit_obs_adjusted, y_fit_obs_adjusted), diagnostics_adjusted = fit_orbit_parameters(obs_data, params, use_gaussian=True)
    ...
    pos_fit_full_adjusted = calculate_orbit_from_params(
        t_array,
        fitted_params_adjusted['a'], fitted_params_adjusted['e'], fitted_params_adjusted['i'],
        fitted_params_adjusted['Omega'], fitted_params_adjusted['omega'], fitted_params_adjusted['T0'],
        fitted_params_adjusted['P'], fitted_params_adjusted['distance_gc']
    )

The end result is the full Streamlit app and the graph it creates to visualize S2's orbit! There is, of course, more code in the Streamlit Python script. However, most of it is geared to the display of the information, and as such isn't as important as the astronomical calculations themselves.
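
To make the elided steps in `calculate_orbit_from_params` more concrete, here is a minimal standard-library sketch of a Keplerian sky projection: mean anomaly, a Newton solve of Kepler's equation, position in the orbital plane from the eccentric anomaly, and rotation by the orbital angles using the Thiele-Innes constants. It illustrates the technique under assumed conventions (x toward north, y toward east) and is not the app's implementation. `solve_kepler` and `sky_position` are hypothetical names, and the parameter values are rounded, S2-like numbers rather than the fitted results.

```python
import math


def solve_kepler(mean_anomaly, e, tol=1e-12, max_iter=100):
    """Solve M = E - e*sin(E) for the eccentric anomaly E by Newton's method."""
    M = mean_anomaly % (2 * math.pi)
    E = math.pi  # starting at pi converges monotonically for 0 <= e < 1
    for _ in range(max_iter):
        step = (E - e * math.sin(E) - M) / (1 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E


def sky_position(t, a, e, i_deg, Omega_deg, omega_deg, T0, P):
    """Projected offset (x north, y east) at time t, in the units of a."""
    E = solve_kepler(2 * math.pi * (t - T0) / P, e)
    # Position in the orbital plane: focus at the origin, periapsis along +X.
    X = a * (math.cos(E) - e)
    Y = a * math.sqrt(1 - e ** 2) * math.sin(E)
    i, Om, om = map(math.radians, (i_deg, Omega_deg, omega_deg))
    # Thiele-Innes constants (the factor a is already included in X and Y).
    A = math.cos(om) * math.cos(Om) - math.sin(om) * math.sin(Om) * math.cos(i)
    B = math.cos(om) * math.sin(Om) + math.sin(om) * math.cos(Om) * math.cos(i)
    F = -math.sin(om) * math.cos(Om) - math.cos(om) * math.sin(Om) * math.cos(i)
    G = -math.sin(om) * math.sin(Om) + math.cos(om) * math.cos(Om) * math.cos(i)
    return A * X + F * Y, B * X + G * Y


# Rounded, S2-like values for illustration only (not the app's fitted results).
params = dict(a=0.125, e=0.88, i_deg=134.0, Omega_deg=227.0,
              omega_deg=65.0, T0=2002.33, P=16.0)
for year in (2002.33, 2006.0, 2010.0, 2018.33):
    x, y = sky_position(year, **params)
    print(f"{year:8.2f}: x = {x:+.4f} arcsec, y = {y:+.4f} arcsec")
```

A fitting routine would evaluate a function like this at every observation epoch and compare the result with the measured RA and Dec offsets.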

### Closing thoughts
This was a fun project. Having the ability to follow in the footsteps of such important work and confirm the findings of esteemed scientists was a really enjoyable experience. Black holes are a fascinating class of astronomical objects, and it was great getting to learn more about them.