Uses code from a notebook by Jake Vanderplas, edited by A. Mann (information below). 

<small><i>This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com). Source and license info is on [GitHub](https://github.com/jakevdp/sklearn_tutorial/).</i></small>

Written by Donovan Schlekat for ASTR 519.

# Random Forest Aging of Star Clusters

This notebook attempts to estimate the ages of star clusters with a supervised machine-learning method, the random forest classifier, using data from the Milky Way Star Cluster (MWSC) Catalog.

<small><i>(available at: [https://heasarc.gsfc.nasa.gov/W3Browse/star-catalog/mwsc.html](https://heasarc.gsfc.nasa.gov/W3Browse/star-catalog/mwsc.html))</i></small>

In [8]:
# Important imports

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

In [16]:
# Read in the file
# File is the entire MWSC Catalog saved in csv format
csv_file_path = 'MWSC_Catalog.csv'
df = pd.read_csv(csv_file_path)
df_dropped = df.drop([
    'broad_type',
    'cluster_status',
    'log_age_error',
    'metallicity_error'
    ], axis=1)

# Filter data to only contain open clusters with metallicity data.
# The mask is built from df_dropped itself, and .copy() makes the result an
# independent DataFrame so that new columns can be added to it later.
open_metal_df = df_dropped[
    (df_dropped['metallicity'].notna()) &
    (df_dropped['class'] == 'OPEN STAR CLUSTER')
].copy()

# Filter data to only contain globular clusters with metallicity data
glob_metal_df = df_dropped[
    (df_dropped['metallicity'].notna()) &
    (df_dropped['class'] == 'GLOBULAR CLUSTER EXTENDED GALACTIC OR EXTRAGALACTIC')
].copy()

### Cluster Age Classification Using Random Forests

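The following cells label each cluster with one of four age categories based on its `log_age` value ($\log_{10}$ of the age in years): young ($10^6$ to $10^7$ yr), intermediate ($10^7$ to $10^8$ yr), old ($10^8$ to $10^9$ yr), or ancient ($10^9$ yr and older).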

In [17]:
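# Classify data into the age categories described above
def categorize_star_cluster_age(log_age):
    """Return the age-group label for a log10(age / yr) value."""
    if 6 <= log_age < 7:
        return 'Young Star Cluster'
    elif 7 <= log_age < 8:
        return 'Intermediate-Age Star Cluster'
    elif 8 <= log_age < 9:
        return 'Old Star Cluster'
    else:
        # Catch-all branch: covers log_age >= 9, but also any value
        # below 6 and missing (NaN) values.
        return 'Ancient Star Cluster'
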
open_metal_df['age_group'] = open_metal_df['log_age'].apply(categorize_star_cluster_age)
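
The same binning can also be written as a vectorized operation. The following is a minimal sketch, not the notebook's method: it assumes `pd.cut` with the same bin edges, and the names `AGE_BIN_EDGES`, `AGE_BIN_LABELS`, and `bin_log_age` are hypothetical. Unlike the `else` branch above, values below 6 or missing values are left as NaN rather than labeled ancient.

```python
import numpy as np
import pandas as pd

# Bin edges in log10(age / yr); the last bin is open-ended for the oldest group.
AGE_BIN_EDGES = [6, 7, 8, 9, np.inf]
AGE_BIN_LABELS = [
    'Young Star Cluster',
    'Intermediate-Age Star Cluster',
    'Old Star Cluster',
    'Ancient Star Cluster',
]

def bin_log_age(log_age):
    """Map log10(age) values to age-group labels.

    right=False makes each bin closed on the left, e.g. [6, 7), matching
    the comparisons in categorize_star_cluster_age. Values outside all
    bins, and NaN, come back as NaN.
    """
    return pd.cut(log_age, bins=AGE_BIN_EDGES, labels=AGE_BIN_LABELS, right=False)

# Illustrative values only, not taken from the catalog.
example = pd.Series([6.0, 6.95, 7.5, 8.2, 9.4, 5.5, np.nan])
example_groups = bin_log_age(example)
print(example_groups.tolist())
```

Applied to the catalog, this would read `bin_log_age(open_metal_df['log_age'])`; any NaN labels would then need to be dropped or handled before training.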

In [18]:
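# List the available columns
headers = open_metal_df.columns
print(headers)
# X holds every column here (including names and labels),
# so it is only used to count the clusters that passed the filter
X = open_metal_df.to_numpy()
print(len(X))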

Index(['name', 'ra', 'dec', 'lii', 'bii', 'core_radius', 'central_radius',
       'cluster_radius', 'num_core_stars', 'num_central_stars',
       'num_cluster_stars', 'distance', 'e_bv', 'log_age', 'reference_code',
       'cluster_type', 'metallicity', 'class', 'age_group'],
      dtype='object')
225
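
With only 225 clusters, it may be worth checking how evenly the four age groups are populated before training a classifier. The sketch below shows one way to do that with `value_counts`; the DataFrame `toy_df` is a hypothetical stand-in with made-up values, not catalog data.

```python
import pandas as pd

# Hypothetical stand-in for open_metal_df; values are illustrative only.
toy_df = pd.DataFrame({
    'log_age': [6.5, 7.2, 8.1, 8.7, 9.3, 8.4],
    'age_group': [
        'Young Star Cluster',
        'Intermediate-Age Star Cluster',
        'Old Star Cluster',
        'Old Star Cluster',
        'Ancient Star Cluster',
        'Old Star Cluster',
    ],
})

# Number of clusters per age group, and the same as fractions of the total.
group_counts = toy_df['age_group'].value_counts()
group_fractions = toy_df['age_group'].value_counts(normalize=True)

print(group_counts)
print(group_fractions.round(2))
```

On the real data the equivalent call would be `open_metal_df['age_group'].value_counts()`. A strongly unbalanced result would suggest reading overall accuracy with care.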
