<a href="https://colab.research.google.com/github/UmutMD/Star-Prediction-K-Fold/blob/main/star_prediction_k_fold.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Star Prediction with K-fold model

https://www.kaggle.com/datasets/deepu1109/star-dataset


This dataset contains several features of stars, including:

- Absolute Temperature (in K)
- Relative Luminosity (L/Lo)
- Relative Radius (R/Ro)
- Absolute Magnitude (Mv)
- Star Color (White, Red, Blue, Yellow, Yellow-Orange, etc.)
- Spectral Class (O, B, A, F, G, K, M)
- Star Type **(Red Dwarf, Brown Dwarf, White Dwarf, Main Sequence, SuperGiants, HyperGiants)**

Lo = 3.828 x 10^26 Watts (Avg Luminosity of Sun)

Ro = 6.9551 x 10^8 m (Avg Radius of Sun)
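
Because luminosity and radius are stored relative to the Sun, recovering absolute values is a single multiplication by the reference constants quoted above. The following is a minimal sketch; the helper name `to_si` is hypothetical and not part of the notebook, and the sample values are taken from the first row of the dataset.

```python
# Solar reference values quoted in the dataset description.
L_SUN_WATTS = 3.828e26    # average luminosity of the Sun, in watts
R_SUN_METERS = 6.9551e8   # average radius of the Sun, in meters


def to_si(rel_luminosity, rel_radius):
    """Convert (L/Lo, R/Ro) into (watts, meters). Hypothetical helper."""
    return rel_luminosity * L_SUN_WATTS, rel_radius * R_SUN_METERS


# First row of the dataset: L/Lo = 0.0024, R/Ro = 0.17
lum_w, rad_m = to_si(0.0024, 0.17)
print(f"{lum_w:.3e} W, {rad_m:.3e} m")
```

The model in this notebook works on the relative values directly, so this conversion is only useful for interpreting individual rows.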



# Libraries

In [2]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


Google Drive connection

In [3]:
from google.colab import drive
drive.mount('/content/drive/')


Mounted at /content/drive/


In [4]:
starData = pd.read_csv('/content/drive/MyDrive/Projects/Star-Prediction/6 class csv.csv')

In [5]:
starData

Unnamed: 0,Temperature (K),Luminosity(L/Lo),Radius(R/Ro),Absolute magnitude(Mv),Star type,Star color,Spectral Class
0,3068,0.002400,0.1700,16.12,0,Red,M
1,3042,0.000500,0.1542,16.60,0,Red,M
2,2600,0.000300,0.1020,18.70,0,Red,M
3,2800,0.000200,0.1600,16.65,0,Red,M
4,1939,0.000138,0.1030,20.06,0,Red,M
...,...,...,...,...,...,...,...
235,38940,374830.000000,1356.0000,-9.93,5,Blue,O
236,30839,834042.000000,1194.0000,-10.63,5,Blue,O
237,8829,537493.000000,1423.0000,-10.73,5,White,A
238,9235,404940.000000,1112.0000,-11.23,5,White,A


In [6]:
starData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Temperature (K)         240 non-null    int64  
 1   Luminosity(L/Lo)        240 non-null    float64
 2   Radius(R/Ro)            240 non-null    float64
 3   Absolute magnitude(Mv)  240 non-null    float64
 4   Star type               240 non-null    int64  
 5   Star color              240 non-null    object 
 6   Spectral Class          240 non-null    object 
dtypes: float64(3), int64(2), object(2)
memory usage: 13.2+ KB


In [7]:
pd.get_dummies(starData[ 'Star color'])

Unnamed: 0,Blue,Blue.1,Blue White,Blue white,Blue white.1,Blue-White,Blue-white,Orange,Orange-Red,Pale yellow orange,Red,White,White-Yellow,Whitish,Yellowish,Yellowish White,white,yellow-white,yellowish
0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
235,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
236,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
237,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
238,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0


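The color labels are inconsistent (capitalization, trailing spaces, hyphens), which splits the same color across several dummy columns. A color mapping is needed to merge equivalent labels.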


In [8]:
def encode(starData, column, prefix):
  starData = starData.copy()
  dummies = pd.get_dummies(starData[column], prefix = prefix)
  starData = pd.concat([starData, dummies], axis=1)
  starData = starData.drop(column, axis=1 )
  return starData
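
To show what the helper produces, here is a small usage sketch on a toy frame. The toy rows and the prefix `class` are illustrative assumptions; the function body repeats the same logic as `encode` so that the block runs on its own.

```python
import pandas as pd


def encode(df, column, prefix):
    # Same logic as the notebook's encode helper: one-hot encode `column`,
    # append the dummy columns, and drop the original column.
    df = df.copy()
    dummies = pd.get_dummies(df[column], prefix=prefix)
    df = pd.concat([df, dummies], axis=1)
    return df.drop(column, axis=1)


# Toy frame (illustrative values only, not the full dataset).
toy = pd.DataFrame({
    "Temperature (K)": [3068, 38940, 8829],
    "Spectral Class": ["M", "O", "A"],
})

encoded = encode(toy, "Spectral Class", "class")
print(encoded.columns.tolist())
```

Each category becomes its own indicator column named `<prefix>_<category>`, and the original text column is removed, which is what a numeric scaler or model needs.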

In [9]:
starData['Star color'].unique()

array(['Red', 'Blue White', 'White', 'Yellowish White', 'Blue white',
       'Pale yellow orange', 'Blue', 'Blue-white', 'Whitish',
       'yellow-white', 'Orange', 'White-Yellow', 'white', 'Blue ',
       'yellowish', 'Yellowish', 'Orange-Red', 'Blue white ',
       'Blue-White'], dtype=object)

# Preprocessing the data


In [10]:
color_mapping = {
        'white' : 'White',
        'Blue ' : 'Blue',
        'Blue white': 'Blue White',
        'Blue-White' : 'Blue White',
        'Blue-white' :  'Blue White',
        'Blue white ': 'Blue White',
        'yellow-white' : 'Yellowish',
        'yellowish' : 'Yellowish',
        'White-Yellow': 'Yellowish'

   }

In [11]:
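def preprocess(starData):
  starData = starData.copy()

  # Merge spelling, spacing, and capitalization variants of the same color
  color_mapping = {
        'white' : 'White',
        'Blue ' : 'Blue',
        'Blue white': 'Blue White',
        'Blue-White' : 'Blue White',
        'Blue-white' :  'Blue White',
        'Blue white ': 'Blue White',
        'yellow-white' : 'Yellowish',
        'yellowish' : 'Yellowish',
        'White-Yellow': 'Yellowish'
   }

  # Apply the mapping; labels not listed above are left unchanged
  starData['Star color'] = starData['Star color'].replace(color_mapping)

  return starData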

In [12]:
starData['Star color'] = starData['Star color'].replace(color_mapping)

In [13]:
starData['Star color'].unique()

array(['Red', 'Blue White', 'White', 'Yellowish White',
       'Pale yellow orange', 'Blue', 'Whitish', 'Yellowish', 'Orange',
       'Orange-Red'], dtype=object)

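Some near-duplicate labels still remain, such as 'Whitish' alongside 'White' and 'Yellowish White' alongside 'Yellowish'.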
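
One possible way to handle the leftovers is to normalize the labels programmatically (trim, lowercase, treat hyphens as spaces) and then apply a small merge table. This is a sketch, not the notebook's method: the function names and the `EXTRA_MERGES` table are assumptions, and whether 'whitish' should really count as 'white' is a judgment call.

```python
def normalize_color(label):
    """Lowercase, trim, and treat hyphens as spaces."""
    return " ".join(label.replace("-", " ").lower().split())


# Hypothetical merges for near-synonyms; adjust to taste.
EXTRA_MERGES = {
    "yellow white": "yellowish",
    "white yellow": "yellowish",
    "yellowish white": "yellowish",
    "whitish": "white",
}


def clean_color(label):
    normalized = normalize_color(label)
    return EXTRA_MERGES.get(normalized, normalized)


# The 19 raw labels reported by starData['Star color'].unique()
raw_labels = [
    'Red', 'Blue White', 'White', 'Yellowish White', 'Blue white',
    'Pale yellow orange', 'Blue', 'Blue-white', 'Whitish',
    'yellow-white', 'Orange', 'White-Yellow', 'white', 'Blue ',
    'yellowish', 'Yellowish', 'Orange-Red', 'Blue white ',
    'Blue-White',
]

normalized = sorted({normalize_color(label) for label in raw_labels})
cleaned = sorted({clean_color(label) for label in raw_labels})
print(len(raw_labels), len(normalized), len(cleaned))
```

On the real DataFrame the same function could be applied with `starData['Star color'].map(clean_color)`.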

In [16]:
model = LogisticRegression()

In [15]:
y = starData['Star type']
X = starData.drop('Star type', axis=1)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
# NOTE: X still contains the string columns 'Star color' and 'Spectral Class'.
# StandardScaler only accepts numeric input, so scaler.fit raises a ValueError here.
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

ValueError: ignored
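
The error occurs because 'Star color' and 'Spectral Class' are still text when the scaler is fitted. A minimal sketch of one way forward is shown below: one-hot encode the two text columns, then evaluate a scaled logistic regression with K-fold cross-validation. The frame `demo` is a synthetic stand-in with random values, not the Kaggle data, and the choices of 5 folds, `max_iter=1000`, and a pipeline are assumptions rather than the notebook's own settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for starData (random values, NOT the real dataset).
rng = np.random.default_rng(1)
n = 60
demo = pd.DataFrame({
    "Temperature (K)": rng.integers(2000, 40000, size=n),
    "Radius(R/Ro)": rng.uniform(0.01, 1500.0, size=n),
    "Star color": rng.choice(["Red", "Blue", "White"], size=n),
    "Spectral Class": rng.choice(["M", "O", "A"], size=n),
    "Star type": np.arange(n) % 3,
})

y = demo["Star type"]
X = demo.drop("Star type", axis=1)

# One-hot encode the text columns so every feature is numeric.
X = pd.get_dummies(X, columns=["Star color", "Spectral Class"], dtype=float)

# The pipeline refits the scaler on each training fold, so no statistics
# from the held-out fold leak into the scaling step.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(pipeline, X, y, cv=kfold)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Since the features here are random, the reported accuracy is meaningless; the point is only the shape of the workflow. With the real data, `demo` would be replaced by the cleaned `starData`.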