# Stellar Classification using Photometric data

# Summary

# Introduction

In astronomy, it is useful to classify stars based on certain characteristics. In the earliest days, classification was based on mass and temperature; however, the modern system has evolved, and we now classify stars with the *Morgan–Keenan (MK)* system, which groups stars into 7 classes based on their spectral characteristics. Under the MK system, astronomers analyse the electromagnetic radiation from a star to determine its class. The star's spectrum contains dark absorption lines, which reveal which elements are present in the star and how abundant they are. The 7 classes in the MK system - *O, B, A, F, G, K*, and *M* - are sequenced from the hottest (*O* type) to the coolest (*M* type). This temperature sequence also shows up in a very visible characteristic: colour. Hence, in this report we classify stars using photometric data, and in the Discussion section we evaluate whether this is a reliable alternative to the traditional spectroscopic method.

## Definitions

**Photometry**: the measurement of the flux or intensity of an astronomical object's electromagnetic radiation

The photometric system we're using to classify star types is the *Sloan* system (add reference) used by the Sloan Digital Sky Survey. The system measures the intensity of electromagnetic radiation from stars in 5 bands:
- *u* (345nm)
- *g* (475nm which is a light blue in the visible spectrum)
- *r* (622nm which is orange)
- *i* (763nm which is deep red)
- *z* (905nm)

The NASA Exoplanet Archive collects data from various sources, including ground-based observatories and space telescopes such as the Kepler Space Telescope and the Transiting Exoplanet Survey Satellite (TESS). The dataset we're using is their [Planetary Systems dataset](https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=PS), with the columns for planet name, spectral type and Sloan photometric measurements selected.

In [1]:
import pandas as pd

# header=13: the column names are on line index 13 of the CSV; the lines above it are skipped
exoplanet_data = pd.read_csv('data/2024-03-01-exoplanet_dataset.csv',
                             header=13,
                             dtype={'pl_name': 'string', 'st_spectype': 'string'})
print(exoplanet_data)

         pl_name st_spectype  sy_umag  sy_gmag  sy_rmag  sy_imag  sy_zmag
0       11 Com b        <NA>      NaN      NaN      NaN      NaN      NaN
1       11 Com b      G8 III      NaN      NaN      NaN      NaN      NaN
2       11 Com b      G8 III      NaN      NaN      NaN      NaN      NaN
3       11 UMi b        <NA>      NaN      NaN      NaN      NaN      NaN
4       11 UMi b        <NA>      NaN      NaN      NaN      NaN      NaN
...          ...         ...      ...      ...      ...      ...      ...
35843  ups And d        <NA>      NaN      NaN      NaN      NaN      NaN
35844  ups Leo b      G9 III      NaN      NaN      NaN      NaN      NaN
35845   xi Aql b        <NA>      NaN      NaN      NaN      NaN      NaN
35846   xi Aql b        <NA>      NaN      NaN      NaN      NaN      NaN
35847   xi Aql b      K0 III      NaN      NaN      NaN      NaN      NaN

[35848 rows x 7 columns]


This dataset from NASA's Exoplanet Archive includes all planets and their host stars, many of them without a recorded spectral type or Sloan magnitudes. We will therefore wrangle the dataset so that it only contains stars with a spectral type and Sloan magnitudes in all five bands.
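Before filtering, it can help to count how many rows actually carry each measurement. The following is a minimal sketch on a small hypothetical frame: `sample`, its values and the name `required_columns` are made up for illustration, and only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the archive table; the values are illustrative only.
sample = pd.DataFrame({
    "pl_name": ["planet a", "planet b", "planet c"],
    "st_spectype": pd.array(["G8 III", pd.NA, "K0 III"], dtype="string"),
    "sy_umag": [np.nan, 14.9, 17.5],
    "sy_gmag": [np.nan, 10.5, 15.1],
    "sy_rmag": [np.nan, 9.3, 13.9],
    "sy_imag": [np.nan, 8.8, 14.2],
    "sy_zmag": [np.nan, 8.6, 13.5],
})

required_columns = ["st_spectype", "sy_umag", "sy_gmag",
                    "sy_rmag", "sy_imag", "sy_zmag"]

# Number of missing entries in each required column
missing_per_column = sample[required_columns].isna().sum()
print(missing_per_column)

# Number of rows where every required column is present
complete_rows = int(sample[required_columns].notna().all(axis=1).sum())
print(complete_rows)
```

Running the same two expressions on `exoplanet_data` would show how much of the table survives the filtering step.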

Variables:

st_spectype: Classification of the star based on its spectral characteristics, following the Morgan–Keenan system

sy_umag: Brightness of the host star as measured using the Sloan Digital Sky Survey (SDSS) u band, in units of magnitudes

sy_gmag: Brightness of the host star as measured using the Sloan Digital Sky Survey (SDSS) g band, in units of magnitudes

sy_rmag: Brightness of the host star as measured using the Sloan Digital Sky Survey (SDSS) r band, in units of magnitudes

sy_imag: Brightness of the host star as measured using the Sloan Digital Sky Survey (SDSS) i band, in units of magnitudes

sy_zmag: Brightness of the host star as measured using the Sloan Digital Sky Survey (SDSS) z band, in units of magnitudes
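Because a magnitude is a logarithmic measure of brightness, the difference between the magnitudes in two bands (a colour index) describes a star's colour: the larger the index, the redder the star. The sketch below computes adjacent-band colour indices with pandas. It is an illustration under assumptions, not necessarily the features this report uses: the helper `add_colour_indices` and the column names `u_g`, `g_r` and `r_i` are hypothetical. The magnitudes are copied from two rows of the filtered table printed in this report; the z band is cut off in that printout, so it is left out here.

```python
import pandas as pd

# Two rows (BD+20 2457 b and CD Cet b) copied from the report's filtered table.
# The z magnitudes are not visible in that printout, so they are omitted.
stars = pd.DataFrame({
    "st_spectype": ["K", "M"],
    "sy_umag": [14.9129, 17.4608],
    "sy_gmag": [10.5155, 15.0603],
    "sy_rmag": [9.277390, 13.949400],
    "sy_imag": [8.84049, 14.18480],
})


def add_colour_indices(df):
    """Return a copy of df with adjacent-band colour indices (hypothetical helper)."""
    out = df.copy()
    out["u_g"] = out["sy_umag"] - out["sy_gmag"]
    out["g_r"] = out["sy_gmag"] - out["sy_rmag"]
    out["r_i"] = out["sy_rmag"] - out["sy_imag"]
    return out


colours = add_colour_indices(stars)
print(colours[["st_spectype", "u_g", "g_r", "r_i"]])
```

Working on a copy inside the helper leaves the input frame untouched, so the raw magnitudes stay available for comparison.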

In [6]:
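# Keep only rows that have a spectral type and a magnitude in every Sloan band
only_stars_data = exoplanet_data.dropna(subset = ['st_spectype',
                                                  'sy_umag',
                                                  'sy_gmag',
                                                  'sy_rmag',
                                                  'sy_imag',
                                                  'sy_zmag'])
print(only_stars_data.dtypes)
# Reduce the full spectral type (e.g. 'G8 III') to its MK class letter (e.g. 'G')
only_stars_data["st_spectype"] = only_stars_data['st_spectype'].transform(lambda x: x[0])
print(only_stars_data)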

pl_name        string[python]
st_spectype    string[python]
sy_umag               float64
sy_gmag               float64
sy_rmag               float64
sy_imag               float64
sy_zmag               float64
dtype: object
            pl_name st_spectype  sy_umag  sy_gmag    sy_rmag   sy_imag  \
180    BD+20 2457 b           K  14.9129  10.5155   9.277390   8.84049   
182    BD+20 2457 c           K  14.9129  10.5155   9.277390   8.84049   
183     BD+20 274 b           K  14.3374  10.1863   8.941711   8.49228   
210      BD-17 63 b           K  14.1468  10.2994   9.211040   8.86964   
215        CD Cet b           M  17.4608  15.0603  13.949400  14.18480   
...             ...         ...      ...      ...        ...       ...   
35641        XO-1 b           G  14.8365  11.7844  11.049500  10.86850   
35701        XO-5 b           G  15.3529  15.0080  11.988900  11.82070   
35702        XO-5 b           G  15.3529  15.0080  11.988900  11.82070   
35703        XO-5 b           G  15.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  only_stars_data["st_spectype"] = only_stars_data['st_spectype'].transform(lambda x: x[0])
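The warning above appears because `only_stars_data` is derived from `exoplanet_data` by `dropna`, so pandas flags the later column assignment as possibly writing to a copy. One common way to avoid it is to take an explicit copy after filtering. The sketch below shows this on a small hypothetical frame (`exoplanet_sample` and its values stand in for the real dataset) and also uses the `.str` accessor to take the first character, which is equivalent to the `lambda` for non-empty strings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the archive table; the values are illustrative only.
exoplanet_sample = pd.DataFrame({
    "pl_name": ["planet a", "planet b", "planet c", "planet d"],
    "st_spectype": pd.array(["G8 III", pd.NA, "K0 III", "K0 III"], dtype="string"),
    "sy_gmag": [11.0, 12.0, np.nan, 10.5],
})

# .copy() makes the filtered frame independent of the original,
# so assigning to one of its columns no longer triggers the warning.
only_stars_sample = exoplanet_sample.dropna(subset=["st_spectype", "sy_gmag"]).copy()

# Keep only the MK class letter, e.g. 'G8 III' -> 'G'
only_stars_sample["st_spectype"] = only_stars_sample["st_spectype"].str[0]
print(only_stars_sample)
```

The original frame keeps the full spectral types, which is useful if the luminosity class (e.g. `III`) is needed later.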
