# Stellar Classification using Photometric data

# Summary

# Results and Discussion

First of all, let's import the packages we will use to carry out the analysis.

In astronomy, it is useful to classify stars based on certain characteristics. In the earliest days, classification was based on mass and temperature; the modern system, the *Morgan–Keenan (MK)* system, groups stars into 7 classes based on their spectral characteristics. Under the MK system, astronomers analyse the electromagnetic radiation from a star to determine its class. The star's spectrum contains dark absorption lines, which reveal which elements are present in the star and how abundant they are. The 7 classes in the MK system (*O, B, A, F, G, K*, and *M*) are ordered from the hottest (*O* type) to the coolest (*M* type), and this temperature sequence also shows up in a characteristic that is easy to observe: colour. Hence, in this report we will classify stars using photometric data, and in the Discussion section we will evaluate whether this is a reliable alternative to the traditional method.
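
As a small illustration of the ordering described above, the sketch below encodes the seven MK classes as an ordered categorical in pandas, so that sorting runs from hottest to coolest. The names `MK_CLASSES` and `example_classes` are our own, and the example letters are for illustration only; this is not part of the analysis pipeline.

```python
import pandas as pd

# MK classes listed from hottest (O) to coolest (M)
MK_CLASSES = ["O", "B", "A", "F", "G", "K", "M"]

# illustrative class letters (hypothetical sample, not real results)
example_classes = pd.Series(
    ["K", "M", "G", "K"],
    dtype=pd.CategoricalDtype(categories=MK_CLASSES, ordered=True),
)

# sorting an ordered categorical follows the category order, i.e. hottest first
sorted_classes = example_classes.sort_values().tolist()
hottest = example_classes.min()

print(sorted_classes)
print(hottest)
```

Keeping the classes ordered, rather than as plain strings, means comparisons such as "hotter than *G*" can be expressed directly if we need them later.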

## Definitions

**Photometry**: the measurement of the flux or intensity of an astronomical object's electromagnetic radiation

The photometric system we're using to classify star types is the *Sloan* system (add reference) used by the Sloan Digital Sky Survey. The system measures the intensity of electromagnetic radiation from stars in 5 bands:
- *u* (345nm)
- *g* (475nm which is a light blue in the visible spectrum)
- *r* (622nm which is orange)
- *i* (763nm which is deep red)
- *z* (905nm)

The NASA Exoplanet Archive collects data from various sources, including ground-based observatories and space telescopes such as the Kepler Space Telescope and the Transiting Exoplanet Survey Satellite (TESS). The dataset we're using is their [Planetary Systems dataset](https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=PS), with the columns for planet name, spectral type, and Sloan photometric measurements selected.

In [5]:
import pandas as pd
import requests
import os
from datetime import datetime

We then download the dataset of interest: the Planetary Systems dataset from NASA's Exoplanet Archive, which contains measurements of planets and their host stars. We are interested in the spectral type of a star given a subset of these measurements.

In [43]:
current_date = datetime.now().date().strftime("%Y-%m-%d")

raw_data_dir = os.path.join("data", "raw")
raw_data_path = os.path.join(raw_data_dir, f"{current_date}_planet-systems.csv")

# make directory where we store our raw data
os.makedirs(raw_data_dir, exist_ok=True)

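# check if we already have a dataset downloaded; if so, reuse the most recent file
# (the file name is date-stamped, so an older download has a different name than today's)
existing_files = sorted(f for f in os.listdir(raw_data_dir) if f.endswith("_planet-systems.csv"))
if existing_files:
    raw_data_path = os.path.join(raw_data_dir, existing_files[-1])
    print(len(existing_files))
    print(f"Using already existing dataset under {raw_data_dir}")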
else:
    # download the raw data as CSV under the raw data directory
    url = "https://exoplanetarchive.ipac.caltech.edu/cgi-bin/IceTable/nph-iceTblDownload"
    print(f"Downloading Planet Systems dataset from {url}")

    # define an HTTP request
    payload = {
        "workspace": "2024.02.29_21.58.35_020450/TblView/2024.03.02_14.52.28_004142",
        "useTimestamp": "1",
        "table": "/exodata/kvmexoweb/ExoTables/PS.tbl",
        "format": "CSV",
        "user": "",
        "label": "*",
        "columns": "pl_name_display,st_spectype,sy_umagstr,sy_gmagstr,sy_rmagstr,sy_imagstr,sy_zmagstr",
        "rows": "both",
        "mission": "ExoplanetArchive"
    }
    response = requests.get(url, params=payload)

    # stop with an error if the request failed, then access the downloaded content
    response.raise_for_status()
    raw_data = response.content

    # write downloaded content into a file under the raw data directory
    with open(raw_data_path, "wb") as f:
        f.write(raw_data)

# df holds the exoplanet dataset as a DataFrame object
df = pd.read_csv(
    raw_data_path,
    header = 23, # column names are on line 24 of the file (zero-based index 23)
    dtype = {'pl_name' : 'string', 'st_spectype' : 'string'}
)

# remove columns in the dataset that have 'err' in their name
filtered_columns = [col for col in df.columns if 'err' not in col]

df = df[filtered_columns]

df

1
Using already existing dataset under data/raw


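| Unnamed: 0 | pl_name | st_spectype | sy_umag | sy_gmag | sy_rmag | sy_imag | sy_zmag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 11 Com b | | | | | | |
| 1 | 11 Com b | G8 III | | | | | |
| 2 | 11 Com b | G8 III | | | | | |
| 3 | 11 UMi b | | | | | | |
| 4 | 11 UMi b | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 35843 | ups And d | | | | | | |
| 35844 | ups Leo b | G9 III | | | | | |
| 35845 | xi Aql b | | | | | | |
| 35846 | xi Aql b | | | | | | |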
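
The `header = 23` argument above hard-codes the position of the column-name row, which would break if the archive changed the length of its file preamble. A possible alternative, assuming the preamble lines all start with `#` (we have not verified this for every download format), is to let pandas skip comment lines. The file content below is a made-up stand-in, not real archive output, and `toy_file` and `toy_df` are names we chose for this sketch.

```python
import io

import pandas as pd

# toy stand-in for a downloaded file: a '#'-prefixed preamble, then the CSV proper
toy_file = io.StringIO(
    "# hypothetical preamble line 1\n"
    "# hypothetical preamble line 2\n"
    "#\n"
    "pl_name,st_spectype,sy_umag\n"
    "11 Com b,G8 III,\n"
    "ups Leo b,G9 III,\n"
    "toy planet b,K2 V,14.9\n"
)

# fully commented lines are ignored, so the first remaining line becomes the header
toy_df = pd.read_csv(
    toy_file,
    comment="#",
    dtype={"pl_name": "string", "st_spectype": "string"},
)

print(toy_df)
```

One caveat: `comment="#"` also cuts off the rest of any data line that contains a `#`, so this only works if no field value uses that character.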


This dataset from NASA's Exoplanet Archive includes all planets and their host stars. Therefore, we will wrangle the dataset so that it only contains stars with a spectral type and Sloan magnitudes for all five photometric bands.
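
Before dropping rows, it can help to see how much data is actually missing. The sketch below counts missing values per column and the number of fully populated rows on a tiny frame built from a few rows shown in this notebook; `toy` and `required` are names we made up, and the same two expressions could be applied to `df`.

```python
import pandas as pd

# columns that must all be present for a row to be usable
required = ["st_spectype", "sy_umag", "sy_gmag", "sy_rmag", "sy_imag", "sy_zmag"]

# illustrative rows only; the real dataset has tens of thousands of rows
toy = pd.DataFrame({
    "pl_name": ["11 Com b", "11 Com b", "CD Cet b"],
    "st_spectype": [None, "G8 III", "M"],
    "sy_umag": [None, None, 17.4608],
    "sy_gmag": [None, None, 15.0603],
    "sy_rmag": [None, None, 13.9494],
    "sy_imag": [None, None, 14.1848],
    "sy_zmag": [None, None, 13.2956],
})

# number of missing entries in each required column
missing_per_column = toy[required].isna().sum()

# number of rows where every required column has a value
complete_rows = int(toy[required].notna().all(axis=1).sum())

print(missing_per_column)
print(complete_rows)
```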

Variables:

- st_spectype: classification of the star based on its spectral characteristics, following the Morgan–Keenan system
- sy_umag: brightness of the host star as measured in the Sloan Digital Sky Survey (SDSS) *u* band, in units of magnitudes
- sy_gmag: brightness of the host star as measured in the SDSS *g* band, in units of magnitudes
- sy_rmag: brightness of the host star as measured in the SDSS *r* band, in units of magnitudes
- sy_imag: brightness of the host star as measured in the SDSS *i* band, in units of magnitudes
- sy_zmag: brightness of the host star as measured in the SDSS *z* band, in units of magnitudes
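
Since magnitudes are logarithmic (a smaller magnitude means a brighter source), the difference between two bands, known as a colour index, is a compact way to express colour. The sketch below computes *g − r* for two stars from the filtered table in this notebook and converts it to a flux ratio, assuming the standard logarithmic (Pogson) magnitude scale and ignoring zero-point differences between bands. The column names `g_minus_r` and `flux_ratio_r_over_g` are our own.

```python
import pandas as pd

# magnitudes copied from two rows of the filtered table in this notebook
stars = pd.DataFrame({
    "pl_name": ["BD+20 2457 b", "CD Cet b"],
    "sy_gmag": [10.5155, 15.0603],
    "sy_rmag": [9.27739, 13.9494],
})

# colour index: larger values mean the star is relatively brighter in r than in g (redder)
stars["g_minus_r"] = stars["sy_gmag"] - stars["sy_rmag"]

# under the Pogson scale, m_g - m_r = -2.5 * log10(F_g / F_r),
# so F_r / F_g = 10 ** (0.4 * (m_g - m_r))
stars["flux_ratio_r_over_g"] = 10 ** (0.4 * stars["g_minus_r"])

# sanity check of the scale: 5 magnitudes correspond to a factor of 100 in flux
flux_ratio_for_5_mag = 10 ** (0.4 * 5)

print(stars)
```

Whether colour indices or raw magnitudes make better features is something we would need to test; the sketch only shows how they relate.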

In [42]:
only_stars_data = df.dropna(
    subset = [
        'st_spectype',
        'sy_umag',
        'sy_gmag',
        'sy_rmag',
        'sy_imag',
        'sy_zmag'
    ]
)

# we are only interested in the first letter of the spectral type, which becomes our y later
# .copy() avoids the SettingWithCopyWarning pandas raises when assigning to a filtered DataFrame
only_stars_data = only_stars_data.copy()
only_stars_data["st_spectype"] = only_stars_data["st_spectype"].str[0]

only_stars_data

| Unnamed: 0 | pl_name | st_spectype | sy_umag | sy_gmag | sy_rmag | sy_imag | sy_zmag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 180 | BD+20 2457 b | K | 14.9129 | 10.5155 | 9.277390 | 8.84049 | 8.97784 |
| 182 | BD+20 2457 c | K | 14.9129 | 10.5155 | 9.277390 | 8.84049 | 8.97784 |
| 183 | BD+20 274 b | K | 14.3374 | 10.1863 | 8.941711 | 8.49228 | 8.90440 |
| 210 | BD-17 63 b | K | 14.1468 | 10.2994 | 9.211040 | 8.86964 | 9.00956 |
| 215 | CD Cet b | M | 17.4608 | 15.0603 | 13.949400 | 14.18480 | 13.29560 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 35641 | XO-1 b | G | 14.8365 | 11.7844 | 11.049500 | 10.86850 | 11.31440 |
| 35701 | XO-5 b | G | 15.3529 | 15.0080 | 11.988900 | 11.82070 | 13.65610 |
| 35702 | XO-5 b | G | 15.3529 | 15.0080 | 11.988900 | 11.82070 | 13.65610 |
| 35703 | XO-5 b | G | 15.3529 | 15.0080 | 11.988900 | 11.82070 | 13.65610 |
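
The output above contains repeated measurements: XO-5 b appears three times with identical values, and the two BD+20 2457 planets share the same stellar magnitudes. One possible way to handle this, assuming that identical spectral class and magnitudes indicate the same host star, is to drop duplicates on the stellar columns and then look at the class balance. This is a sketch on a few rows copied from the table, with names (`sample`, `star_columns`, `unique_stars`, `class_counts`) that we made up; it is not a step the analysis has committed to.

```python
import pandas as pd

# stellar columns: class label plus the five Sloan magnitudes
star_columns = ["st_spectype", "sy_umag", "sy_gmag", "sy_rmag", "sy_imag", "sy_zmag"]

# rows copied from the filtered table above
sample = pd.DataFrame(
    [
        ["BD+20 2457 b", "K", 14.9129, 10.5155, 9.27739, 8.84049, 8.97784],
        ["BD+20 2457 c", "K", 14.9129, 10.5155, 9.27739, 8.84049, 8.97784],
        ["CD Cet b", "M", 17.4608, 15.0603, 13.9494, 14.1848, 13.2956],
        ["XO-5 b", "G", 15.3529, 15.0080, 11.9889, 11.8207, 13.6561],
        ["XO-5 b", "G", 15.3529, 15.0080, 11.9889, 11.8207, 13.6561],
        ["XO-5 b", "G", 15.3529, 15.0080, 11.9889, 11.8207, 13.6561],
    ],
    columns=["pl_name"] + star_columns,
)

# keep one row per distinct set of stellar measurements
unique_stars = sample.drop_duplicates(subset=star_columns)

# number of distinct stars per spectral class
class_counts = unique_stars["st_spectype"].value_counts()

print(unique_stars)
print(class_counts)
```

Without a step like this, stars with many planets or many archive entries would be counted several times, which could bias both training and evaluation.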
