# Poppy Universe – Star Data Cleaning

Welcome to the **Poppy Universe star data cleaning notebook**!  
This notebook applies the insights from our exploration notebook (Stars_Exploration) to the Gaia star dataset. Here, we’ll drop, keep, or adjust columns based on what we discovered, making the dataset consistent and ready for further analysis and exportation in Poppy Universe.

---

## Goals

1. **Clean the dataset**  
   - Drop irrelevant or redundant columns  
   - Adjust or correct columns where needed  
   - Ensure consistency and proper units

2. **Prepare for analysis & exportation**  
   - Keep only the columns required for HR diagrams, clustering, and visualizations  
   - Flag columns for later Poppy Universe lore or derived calculations

3. **Apply exploration findings**  
   - Implement filters, cuts, or transformations identified earlier  
   - Handle any missing or anomalous values as determined in the exploration notebook

4. **Set up a clean dataset**  
   - Save a cleaned version ready for downstream analysis  
   - Ensure every star is properly identified and structured

---

## Folder & File References

- **Data/Stars/1_Start/gaia_data.csv** → Original Gaia dataset  
- **Data/Stars/4_Final/gaia_data_cleaned.csv** → Cleaned and prepared dataset

---

> Note: This notebook directly uses the **findings from the star exploration notebook**. Star names are not included at this stage; all stars are identified by their Gaia `Source` ID.


## 0) Imports

In [1]:
import pandas as pd
import numpy as np
import time
from datetime import datetime

In [2]:
# --- START TOTAL TIMER ---
total_start = time.time()

In [3]:
# =============================
# Preparation section
# =============================
prep_start = time.time()

## 1) Loading the dataset

In [4]:
prep_start = time.time()

In [5]:
Dataframe = pd.read_csv('../../Data/Stars/3_Names/dataGaia2_Names.csv')

In [6]:
Dataframe.head()

Unnamed: 0.1,Star_Name,Source,RA_ICRS,DE_ICRS,Gmag,BPmag,RPmag,orig_index,Unnamed: 0,e_RA_ICRS,...,Lum-Flame,Mass-Flame,Age-Flame,z-Flame,Evol,SpType-ELS,Flags-HS,EWHa,e_EWHa,f_EWHa
0,LB 2844,1306361548360576,44.589012,2.195298,16.908537,16.761435,17.13404,0,0,0.0655,...,298.48514,,,1.507248,,O,92,0.02128,0.19309,0
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,16.962143,16.841173,17.193855,1,1,0.0658,...,292.6721,,,1.332909,,O,92,0.02144,0.176,0
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,16.407494,16.382404,16.429598,2,2,0.0627,...,506.7328,,,1.582338,,O,92,0.13726,0.13364,0
3,PG 0310+149,31009771252186752,48.404909,15.105912,15.607131,15.4976,15.672352,3,3,0.05,...,9.055018,,,1.561845,,O,93,0.0947,0.10548,0
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,16.372738,16.3549,16.38479,4,4,0.0521,...,311.50284,,,1.521734,,O,92,0.08682,0.07831,0


In [7]:
prep_end = time.time()
prep_elapsed = prep_end - prep_start

## 3) Variable information

### Applying Exploration Findings: Variable Selection

Based on the analysis in the **star exploration notebook**, we have identified which variables are practical to keep, which may be too complex, and which can be ignored for the initial cleaning and preparation of the Gaia star dataset.

#### Variables to Keep

These are straightforward, interpretable, and directly useful for analysis:

- **Source** — unique identifier for each star  
- **RA_ICRS**, **DE_ICRS** — celestial positions  
- **Plx** — parallax, needed for distance calculations  
- **PM**, **pmRA**, **pmDE** — proper motion measurements  
- **Gmag**, **BPmag**, **RPmag**, **GRVSmag** — brightness and color indicators  
- **logg**, **[Fe/H]**, **Teff** — basic physical properties  
- **Dist** — derived distance in parsecs  
- **E(BP-RP)** — color excess for dust correction  
- **GMAG**, **Rad**, **Lum-Flame**, **Mass-Flame**, **Age-Flame** — intrinsic stellar properties  
- **SpType-ELS** — spectral type classification  

#### Variables to Reconsider or Simplify

Useful but potentially complex or redundant early on:

- Measurement uncertainties: **e_RA_ICRS**, **e_DE_ICRS**, **e_Plx**, **e_pmRA**, **e_pmDE**, **e_Gmag**, **e_BPmag**, **e_RPmag**, **e_GRVSmag**, **e_EWHa**  
- Extinction corrections: **A0**, **AG**, **ABP**, **ARP**  
- Classification probabilities: **PQSO**, **PGal**, **Pstar**, **PWD**, **Pbin**  
- Data quality/flags: **RUWE**, **Flags-HS**, **f_EWHa**  

#### Variables Too Complex to Keep (for now)

Require domain-specific modeling or deeper interpretation:

- **z-Flame**, **Evol**, **EWHa**  
- Model-dependent outputs: **Rad-Flame**, **Mass-Flame**, **Lum-Flame**, **Age-Flame**  

#### Summary

- Start with **positional, photometric, and basic physical** parameters.  
- Keep the dataset lean and interpretable.  
- More complex or model-derived columns can be revisited later for advanced astrophysical analysis.


---
### Variables Likely to Keep
---

In [8]:
# =============================
# ✅ Keep section
# =============================
keep_start = time.time()

In [9]:
Final_DF = pd.DataFrame()

## Column Copy & Type Conversion

We often need to copy columns from the original DataFrame to our cleaned one, convert their type, and verify they match.  

This **function** does all of that in one step:  

- Copies a column to the new DataFrame  
- Converts its type (manual or from the original)  
- Prints the original and new column heads, plus the new DataFrame head  
- Confirms the copied column matches the original  

```python
# Example usage
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Source", dtype=int)
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Star_Name", dtype=str)


In [10]:
def copy_and_cast_column(old_df, new_df, column_name, dtype=None, copy_dtype=False):
    """
    Copies a column from old_df to new_df with optional type conversion.
    Prints the head of the column before and after, and the head of new_df.
    Confirms if the copied column matches the original (content-wise) and
    checks that all previously copied columns are still aligned.
    
    Parameters:
    - old_df: source DataFrame
    - new_df: target DataFrame
    - column_name: str, name of the column to copy
    - dtype: type to cast to (e.g., int, str, float). Ignored if copy_dtype=True
    - copy_dtype: bool, if True, copy dtype from old_df
    """
    
    # Show original column
    print(f"Original '{column_name}' column head:")
    print(old_df[column_name].head(), "\n")
    
    # Determine dtype to use
    if copy_dtype:
        dtype_to_use = old_df[column_name].dtype
    elif dtype is not None:
        dtype_to_use = dtype
    else:
        dtype_to_use = old_df[column_name].dtype
    
    # Copy and cast column
    try:
        new_df[column_name] = old_df[column_name].astype(dtype_to_use)
    except Exception as e:
        print(f"Error casting column '{column_name}' to {dtype_to_use}: {e}")
        new_df[column_name] = old_df[column_name]  # fallback
    
    # Show new column and new_df head
    print(f"Copied '{column_name}' column head in new DataFrame:")
    print(new_df[column_name].head(), "\n")
    
    print("New DataFrame head:")
    print(new_df.head(), "\n")
    
    # Check if this column matches the original
    if new_df[column_name].equals(old_df[column_name].astype(new_df[column_name].dtype)):
        print(f"✅ '{column_name}' successfully copied and matches original.\n")
    else:
        print(f"⚠️ '{column_name}' does NOT match the original.\n")
    
    # Check overall alignment of all columns in new_df that exist in old_df
    common_cols = old_df.columns.intersection(new_df.columns)
    if new_df[common_cols].equals(old_df[common_cols].astype(new_df[common_cols].dtypes)):
        print("✅ All copied columns are aligned with the original DataFrame.\n")
    else:
        print("⚠️ There are mismatches or row shifts in the copied columns!\n")
    
    return new_df


### 3.1) `Star_Name` → unique star name (Source if no readable name found)  

In [11]:
# Show the original column head first
print(Dataframe["Star_Name"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Star_Name", copy_dtype=True)


0                      LB  2844
1        GALEX J022125.9+085919
2    Gaia DR3 27109837867995776
3                   PG 0310+149
4              UCAC4 508-006228
Name: Star_Name, dtype: object 

Original 'Star_Name' column head:
0                      LB  2844
1        GALEX J022125.9+085919
2    Gaia DR3 27109837867995776
3                   PG 0310+149
4              UCAC4 508-006228
Name: Star_Name, dtype: object 

Copied 'Star_Name' column head in new DataFrame:
0                      LB  2844
1        GALEX J022125.9+085919
2    Gaia DR3 27109837867995776
3                   PG 0310+149
4              UCAC4 508-006228
Name: Star_Name, dtype: object 

New DataFrame head:
                    Star_Name
0                    LB  2844
1      GALEX J022125.9+085919
2  Gaia DR3 27109837867995776
3                 PG 0310+149
4            UCAC4 508-006228 

✅ 'Star_Name' successfully copied and matches original.

✅ All copied columns are aligned with the original DataFrame.



### 3.1) `Source` → unique star ID  

In [12]:
# Show the original column head first
print(Dataframe["Source"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Source", copy_dtype=True)


0     1306361548360576
1    23700286669971584
2    27109837867995776
3    31009771252186752
4    36876009385300352
Name: Source, dtype: int64 

Original 'Source' column head:
0     1306361548360576
1    23700286669971584
2    27109837867995776
3    31009771252186752
4    36876009385300352
Name: Source, dtype: int64 

Copied 'Source' column head in new DataFrame:
0     1306361548360576
1    23700286669971584
2    27109837867995776
3    31009771252186752
4    36876009385300352
Name: Source, dtype: int64 

New DataFrame head:
                    Star_Name             Source
0                    LB  2844   1306361548360576
1      GALEX J022125.9+085919  23700286669971584
2  Gaia DR3 27109837867995776  27109837867995776
3                 PG 0310+149  31009771252186752
4            UCAC4 508-006228  36876009385300352 

✅ 'Source' successfully copied and matches original.

✅ All copied columns are aligned with the original DataFrame.



### 3.2 `RA_ICRS` → right ascension  

In [13]:
# Show the original column head first
print(Dataframe["RA_ICRS"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "RA_ICRS", copy_dtype=True)


0    44.589012
1    35.358035
2    44.450767
3    48.404909
4    57.092838
Name: RA_ICRS, dtype: float64 

Original 'RA_ICRS' column head:
0    44.589012
1    35.358035
2    44.450767
3    48.404909
4    57.092838
Name: RA_ICRS, dtype: float64 

Copied 'RA_ICRS' column head in new DataFrame:
0    44.589012
1    35.358035
2    44.450767
3    48.404909
4    57.092838
Name: RA_ICRS, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS
0                    LB  2844   1306361548360576  44.589012
1      GALEX J022125.9+085919  23700286669971584  35.358035
2  Gaia DR3 27109837867995776  27109837867995776  44.450767
3                 PG 0310+149  31009771252186752  48.404909
4            UCAC4 508-006228  36876009385300352  57.092838 

✅ 'RA_ICRS' successfully copied and matches original.

✅ All copied columns are aligned with the original DataFrame.



In [14]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS
0,LB 2844,1306361548360576,44.589012
1,GALEX J022125.9+085919,23700286669971584,35.358035
2,Gaia DR3 27109837867995776,27109837867995776,44.450767
3,PG 0310+149,31009771252186752,48.404909
4,UCAC4 508-006228,36876009385300352,57.092838


### 3.3) `DE_ICRS` → declination  

In [15]:
# Show the original column head first
print(Dataframe["DE_ICRS"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "DE_ICRS", copy_dtype=True)

0     2.195298
1     8.988813
2    10.079118
3    15.105912
4    11.550927
Name: DE_ICRS, dtype: float64 

Original 'DE_ICRS' column head:
0     2.195298
1     8.988813
2    10.079118
3    15.105912
4    11.550927
Name: DE_ICRS, dtype: float64 

Copied 'DE_ICRS' column head in new DataFrame:
0     2.195298
1     8.988813
2    10.079118
3    15.105912
4    11.550927
Name: DE_ICRS, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS
0                    LB  2844   1306361548360576  44.589012   2.195298
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118
3                 PG 0310+149  31009771252186752  48.404909  15.105912
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927 

✅ 'DE_ICRS' successfully copied and matches original.

✅ All copied columns are aligned with the original DataFrame.



In [16]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS
0,LB 2844,1306361548360576,44.589012,2.195298
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118
3,PG 0310+149,31009771252186752,48.404909,15.105912
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927


### 3.4) `Plx` → parallax (for distance)  

In [17]:
# Show the original column head first
print(Dataframe["Plx"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Plx", copy_dtype=True)

0    0.2384
1    0.1666
2    0.3544
3    0.5962
4    0.4507
Name: Plx, dtype: float64 

Original 'Plx' column head:
0    0.2384
1    0.1666
2    0.3544
3    0.5962
4    0.4507
Name: Plx, dtype: float64 

Copied 'Plx' column head in new DataFrame:
0    0.2384
1    0.1666
2    0.3544
3    0.5962
4    0.4507
Name: Plx, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS     Plx
0                    LB  2844   1306361548360576  44.589012   2.195298  0.2384
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813  0.1666
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118  0.3544
3                 PG 0310+149  31009771252186752  48.404909  15.105912  0.5962
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927  0.4507 

✅ 'Plx' successfully copied and matches original.

✅ All copied columns are aligned with the original DataFrame.



In [18]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507


### 3.5) `PM` → total proper motion  

In [19]:
# Show the original column head first
print(Dataframe["PM"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "PM", copy_dtype=True)

0    2.901
1    4.402
2    3.154
3    2.745
4    4.918
Name: PM, dtype: float64 

Original 'PM' column head:
0    2.901
1    4.402
2    3.154
3    2.745
4    4.918
Name: PM, dtype: float64 

Copied 'PM' column head in new DataFrame:
0    2.901
1    4.402
2    3.154
3    2.745
4    4.918
Name: PM, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM  
0  0.2384  2.901  
1  0.1666  4.402  
2  0.3544  3.154  
3  0.5962  2.745  
4  0.4507  4.918   

✅ 'PM' successfully copied and matches original.

✅ All copied columns are aligned with the original Dat

In [20]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918


### 3.6) `pmRA` → proper motion in RA  

In [21]:
# Show the original column head first
print(Dataframe["pmRA"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "pmRA", copy_dtype=True)

0    2.088
1   -0.242
2    2.722
3    2.460
4   -2.851
Name: pmRA, dtype: float64 

Original 'pmRA' column head:
0    2.088
1   -0.242
2    2.722
3    2.460
4   -2.851
Name: pmRA, dtype: float64 

Copied 'pmRA' column head in new DataFrame:
0    2.088
1   -0.242
2    2.722
3    2.460
4   -2.851
Name: pmRA, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA  
0  0.2384  2.901  2.088  
1  0.1666  4.402 -0.242  
2  0.3544  3.154  2.722  
3  0.5962  2.745  2.460  
4  0.4507  4.918 -2.851   

✅ 'pmRA' successfully copied and matches original.



In [22]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851


### 3.7) `pmDE` → proper motion in Dec 

In [23]:
# Show the original column head first
print(Dataframe["pmDE"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "pmDE", copy_dtype=True)

0   -2.014
1   -4.396
2    1.593
3    1.218
4   -4.008
Name: pmDE, dtype: float64 

Original 'pmDE' column head:
0   -2.014
1   -4.396
2    1.593
3    1.218
4   -4.008
Name: pmDE, dtype: float64 

Copied 'pmDE' column head in new DataFrame:
0   -2.014
1   -4.396
2    1.593
3    1.218
4   -4.008
Name: pmDE, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE  
0  0.2384  2.901  2.088 -2.014  
1  0.1666  4.402 -0.242 -4.396  
2  0.3544  3.154  2.722  1.593  
3  0.5962  2.745  2.460  1.218  
4  0.4507  4.918 -2.851 -4.008   

✅ 'pmDE' s

In [24]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008


### 3.8) `Dist` → distance (derived from parallax) 

In [25]:
# Show the original column head first
print(Dataframe["Dist"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Dist", copy_dtype=True)

0    19867.748
1    21021.896
2    14943.434
3    11625.332
4    12459.044
Name: Dist, dtype: float64 

Original 'Dist' column head:
0    19867.748
1    21021.896
2    14943.434
3    11625.332
4    12459.044
Name: Dist, dtype: float64 

Copied 'Dist' column head in new DataFrame:
0    19867.748
1    21021.896
2    14943.434
3    11625.332
4    12459.044
Name: Dist, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist  
0  0.2384  2.901  2.088 -2.014  19867.748  
1  0.1666  4.402 -0.242 -4.396  21021.896  
2  0.3544  3.154  

In [26]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044


### 3.9) `Gmag` → Gaia G-band magnitude  

In [27]:
# Show the original column head first
print(Dataframe["Gmag"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Gmag", copy_dtype=True)

0    16.908537
1    16.962143
2    16.407494
3    15.607131
4    16.372738
Name: Gmag, dtype: float64 

Original 'Gmag' column head:
0    16.908537
1    16.962143
2    16.407494
3    15.607131
4    16.372738
Name: Gmag, dtype: float64 

Copied 'Gmag' column head in new DataFrame:
0    16.908537
1    16.962143
2    16.407494
3    15.607131
4    16.372738
Name: Gmag, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag  
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  
1  0.1666  4.402 -0.242 -4.396  21021.89

In [28]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738


### 3.10) `BPmag` → blue-band magnitude  

In [29]:
# Show the original column head first
print(Dataframe["BPmag"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "BPmag", copy_dtype=True)

0    16.761435
1    16.841173
2    16.382404
3    15.497600
4    16.354900
Name: BPmag, dtype: float64 

Original 'BPmag' column head:
0    16.761435
1    16.841173
2    16.382404
3    15.497600
4    16.354900
Name: BPmag, dtype: float64 

Copied 'BPmag' column head in new DataFrame:
0    16.761435
1    16.841173
2    16.382404
3    15.497600
4    16.354900
Name: BPmag, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag  
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  
1  0.1666  4.

In [30]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549


### 3.11) `RPmag` → red-band magnitude  

In [31]:
# Show the original column head first
print(Dataframe["RPmag"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "RPmag", copy_dtype=True)

0    17.134040
1    17.193855
2    16.429598
3    15.672352
4    16.384790
Name: RPmag, dtype: float64 

Original 'RPmag' column head:
0    17.134040
1    17.193855
2    16.429598
3    15.672352
4    16.384790
Name: RPmag, dtype: float64 

Copied 'RPmag' column head in new DataFrame:
0    17.134040
1    17.193855
2    16.429598
3    15.672352
4    16.384790
Name: RPmag, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.

In [32]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479


### 3.12) `Teff` → effective temperature 

In [33]:
# Show the original column head first
print(Dataframe["Teff"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Teff", copy_dtype=True)

0    18148.611
1    17500.236
2    19761.363
3    19486.800
4    18453.346
Name: Teff, dtype: float64 

Original 'Teff' column head:
0    18148.611
1    17500.236
2    19761.363
3    19486.800
4    18453.346
Name: Teff, dtype: float64 

Copied 'Teff' column head in new DataFrame:
0    18148.611
1    17500.236
2    19761.363
3    19486.800
4    18453.346
Name: Teff, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.1340

In [34]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346


### 3.13) `logg` → surface gravity  

In [35]:
# Show the original column head first
print(Dataframe["logg"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "logg", copy_dtype=True)

0    4.5728
1    4.4948
2    4.5526
3    4.5211
4    4.5807
Name: logg, dtype: float64 

Original 'logg' column head:
0    4.5728
1    4.4948
2    4.5526
3    4.5211
4    4.5807
Name: logg, dtype: float64 

Copied 'logg' column head in new DataFrame:
0    4.5728
1    4.4948
2    4.5526
3    4.5211
4    4.5807
Name: logg, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.134040   
1  0.1666  4.402 -0.242 -4.396  21021.8

In [36]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff,logg
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611,4.5728
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236,4.4948
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363,4.5526
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8,4.5211
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346,4.5807


### 3.14) `[Fe/H]` → metallicity  

In [37]:
# Show the original column head first
print(Dataframe["[Fe/H]"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "[Fe/H]", copy_dtype=True)

0   -0.9759
1   -1.1661
2   -0.9919
3   -0.9392
4   -0.9965
Name: [Fe/H], dtype: float64 

Original '[Fe/H]' column head:
0   -0.9759
1   -1.1661
2   -0.9919
3   -0.9392
4   -0.9965
Name: [Fe/H], dtype: float64 

Copied '[Fe/H]' column head in new DataFrame:
0   -0.9759
1   -1.1661
2   -0.9919
3   -0.9392
4   -0.9965
Name: [Fe/H], dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.134040   
1  0.1666  4.402 -0.242 -4.39

In [38]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff,logg,[Fe/H]
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611,4.5728,-0.9759
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236,4.4948,-1.1661
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363,4.5526,-0.9919
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8,4.5211,-0.9392
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346,4.5807,-0.9965


### 3.15) `Rad` → radius  

In [39]:
# Show the original column head first
print(Dataframe["Rad"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Rad", copy_dtype=True)

0    1.6635
1    1.7470
2    1.8439
3    1.8604
4    1.6652
Name: Rad, dtype: float64 

Original 'Rad' column head:
0    1.6635
1    1.7470
2    1.8439
3    1.8604
4    1.6652
Name: Rad, dtype: float64 

Copied 'Rad' column head in new DataFrame:
0    1.6635
1    1.7470
2    1.8439
3    1.8604
4    1.6652
Name: Rad, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.134040   
1  0.1666  4.402 -0.242 -4.396  21021.896  1

In [40]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff,logg,[Fe/H],Rad
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611,4.5728,-0.9759,1.6635
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236,4.4948,-1.1661,1.747
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363,4.5526,-0.9919,1.8439
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8,4.5211,-0.9392,1.8604
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346,4.5807,-0.9965,1.6652


### 3.16) `Lum-Flame` → luminosity

In [41]:
# Show the original column head first
print(Dataframe["Lum-Flame"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Lum-Flame", copy_dtype=True)

0    298.485140
1    292.672100
2    506.732800
3      9.055018
4    311.502840
Name: Lum-Flame, dtype: float64 

Original 'Lum-Flame' column head:
0    298.485140
1    292.672100
2    506.732800
3      9.055018
4    311.502840
Name: Lum-Flame, dtype: float64 

Copied 'Lum-Flame' column head in new DataFrame:
0    298.485140
1    292.672100
2    506.732800
3      9.055018
4    311.502840
Name: Lum-Flame, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  

In [42]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff,logg,[Fe/H],Rad,Lum-Flame
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611,4.5728,-0.9759,1.6635,298.48514
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236,4.4948,-1.1661,1.747,292.6721
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363,4.5526,-0.9919,1.8439,506.7328
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8,4.5211,-0.9392,1.8604,9.055018
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346,4.5807,-0.9965,1.6652,311.50284


### 3.17) `Mass-Flame` → mass 

In [43]:
# Show the original column head first
print(Dataframe["Mass-Flame"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Mass-Flame", copy_dtype=True)

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: Mass-Flame, dtype: float64 

Original 'Mass-Flame' column head:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: Mass-Flame, dtype: float64 

Copied 'Mass-Flame' column head in new DataFrame:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: Mass-Flame, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.134040   
1  0.1666  4.402 -0.242 -4.396  21021.896  16.962143  16.841173  17.1

In [44]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff,logg,[Fe/H],Rad,Lum-Flame,Mass-Flame
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611,4.5728,-0.9759,1.6635,298.48514,
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236,4.4948,-1.1661,1.747,292.6721,
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363,4.5526,-0.9919,1.8439,506.7328,
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8,4.5211,-0.9392,1.8604,9.055018,
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346,4.5807,-0.9965,1.6652,311.50284,


### 3.18) `Age-Flame` → age  

In [45]:
# Show the original column head first
print(Dataframe["Age-Flame"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "Age-Flame", copy_dtype=True)

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: Age-Flame, dtype: float64 

Original 'Age-Flame' column head:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: Age-Flame, dtype: float64 

Copied 'Age-Flame' column head in new DataFrame:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: Age-Flame, dtype: float64 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.134040   
1  0.1666  4.402 -0.242 -4.396  21021.896  16.962143  16.841173  17.193855

In [46]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff,logg,[Fe/H],Rad,Lum-Flame,Mass-Flame,Age-Flame
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611,4.5728,-0.9759,1.6635,298.48514,,
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236,4.4948,-1.1661,1.747,292.6721,,
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363,4.5526,-0.9919,1.8439,506.7328,,
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8,4.5211,-0.9392,1.8604,9.055018,,
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346,4.5807,-0.9965,1.6652,311.50284,,


### 3.19) `SpType-ELS` → spectral type  

In [47]:
# Show the original column head first
print(Dataframe["SpType-ELS"].head(), "\n")

# Copy column to Final_DF using its original dtype
Final_DF = copy_and_cast_column(Dataframe, Final_DF, "SpType-ELS", copy_dtype=True)

0    O      
1    O      
2    O      
3    O      
4    O      
Name: SpType-ELS, dtype: object 

Original 'SpType-ELS' column head:
0    O      
1    O      
2    O      
3    O      
4    O      
Name: SpType-ELS, dtype: object 

Copied 'SpType-ELS' column head in new DataFrame:
0    O      
1    O      
2    O      
3    O      
4    O      
Name: SpType-ELS, dtype: object 

New DataFrame head:
                    Star_Name             Source    RA_ICRS    DE_ICRS  \
0                    LB  2844   1306361548360576  44.589012   2.195298   
1      GALEX J022125.9+085919  23700286669971584  35.358035   8.988813   
2  Gaia DR3 27109837867995776  27109837867995776  44.450767  10.079118   
3                 PG 0310+149  31009771252186752  48.404909  15.105912   
4            UCAC4 508-006228  36876009385300352  57.092838  11.550927   

      Plx     PM   pmRA   pmDE       Dist       Gmag      BPmag      RPmag  \
0  0.2384  2.901  2.088 -2.014  19867.748  16.908537  16.761435  17.134040 

In [48]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,BPmag,RPmag,Teff,logg,[Fe/H],Rad,Lum-Flame,Mass-Flame,Age-Flame,SpType-ELS
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,16.761435,17.13404,18148.611,4.5728,-0.9759,1.6635,298.48514,,,O
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,16.841173,17.193855,17500.236,4.4948,-1.1661,1.747,292.6721,,,O
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,16.382404,16.429598,19761.363,4.5526,-0.9919,1.8439,506.7328,,,O
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,15.4976,15.672352,19486.8,4.5211,-0.9392,1.8604,9.055018,,,O
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,16.3549,16.38479,18453.346,4.5807,-0.9965,1.6652,311.50284,,,O


In [49]:
keep_end = time.time()
keep_elapsed = keep_end - keep_start

---
### Variables to Reconsider / Simplify
---

In [50]:
# =============================
# ⚠️ Reconsider section
# =============================
reconsider_start = time.time()

### 3.20) `GRVSmag` → radial velocity magnitude  

In [51]:
Dataframe["GRVSmag"].tail() # .tail becaise .head is all NaN

626011    12.879037
626012    13.860483
626013    13.792645
626014    10.761367
626015    14.094444
Name: GRVSmag, dtype: float64

In [52]:
# describtion 
Dataframe['GRVSmag'].describe()

count    509709.000000
mean         12.415666
std           1.361668
min           2.856422
25%          11.654520
50%          12.835275
75%          13.474037
max          14.099996
Name: GRVSmag, dtype: float64

In [53]:
# Compute median of GRVSmag ignoring NaNs
grvs_median = Dataframe["GRVSmag"].median()

# Add only the new categorical column to Final_DF
Final_DF["GRVSmag_category"] = np.where(
    Dataframe["GRVSmag"].isna(), "unknown",
    np.where(Dataframe["GRVSmag"] < grvs_median, "approaching", "receding")
)

In [54]:
# Check the new column
Final_DF["GRVSmag_category"].tail()

626011       receding
626012       receding
626013       receding
626014    approaching
626015       receding
Name: GRVSmag_category, dtype: object

In [55]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,...,RPmag,Teff,logg,[Fe/H],Rad,Lum-Flame,Mass-Flame,Age-Flame,SpType-ELS,GRVSmag_category
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,...,17.13404,18148.611,4.5728,-0.9759,1.6635,298.48514,,,O,unknown
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,...,17.193855,17500.236,4.4948,-1.1661,1.747,292.6721,,,O,unknown
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,...,16.429598,19761.363,4.5526,-0.9919,1.8439,506.7328,,,O,unknown
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,...,15.672352,19486.8,4.5211,-0.9392,1.8604,9.055018,,,O,unknown
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,...,16.38479,18453.346,4.5807,-0.9965,1.6652,311.50284,,,O,unknown


### 3.21) `RV` → radial velocity 

In [56]:
Dataframe["RV"].tail()

626011      2.59
626012    -98.80
626013    135.18
626014    -16.75
626015    -35.44
Name: RV, dtype: float64

In [57]:
# describtion 
Dataframe['RV'].describe()

count    510348.000000
mean         -5.882808
std          41.978304
min        -885.200000
25%         -26.390000
50%          -4.970000
75%          17.120000
max         808.890000
Name: RV, dtype: float64

In [58]:
# Compute median of RV ignoring NaNs
rv_median = Dataframe["RV"].median()

# Add only the new categorical column to Final_DF
Final_DF["RV_category"] = np.where(
    Dataframe["RV"].isna(), "unknown",
    np.where(Dataframe["RV"] < rv_median, "approaching", "receding")
)


In [59]:
# Check the new column
Final_DF[["RV_category"]].tail()


Unnamed: 0,RV_category
626011,receding
626012,approaching
626013,receding
626014,approaching
626015,approaching


In [60]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,...,Teff,logg,[Fe/H],Rad,Lum-Flame,Mass-Flame,Age-Flame,SpType-ELS,GRVSmag_category,RV_category
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,...,18148.611,4.5728,-0.9759,1.6635,298.48514,,,O,unknown,unknown
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,...,17500.236,4.4948,-1.1661,1.747,292.6721,,,O,unknown,unknown
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,...,19761.363,4.5526,-0.9919,1.8439,506.7328,,,O,unknown,unknown
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,...,19486.8,4.5211,-0.9392,1.8604,9.055018,,,O,unknown,unknown
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,...,18453.346,4.5807,-0.9965,1.6652,311.50284,,,O,unknown,unknown


### 3.22) `z-Flame` → redshift / derived parameter  

In [61]:
Dataframe["z-Flame"].head()

0    1.507248
1    1.332909
2    1.582338
3    1.561845
4    1.521734
Name: z-Flame, dtype: float64

In [62]:
# describtion 
Dataframe['z-Flame'].describe()

count    611293.000000
mean          0.445233
std           0.233019
min           0.001505
25%           0.313836
50%           0.480814
75%           0.570397
max           1.616032
Name: z-Flame, dtype: float64

In [63]:
# Compute quantiles for z-Flame
z_25 = Dataframe["z-Flame"].quantile(0.33)
z_66 = Dataframe["z-Flame"].quantile(0.66)

# Add only the new categorical column to Final_DF
Final_DF["z-Flame_category"] = np.where(
    Dataframe["z-Flame"].isna(), "unknown",
    np.where(Dataframe["z-Flame"] <= z_25, "nearby",
    np.where(Dataframe["z-Flame"] <= z_66, "moderate distance", "far"))
)

In [64]:
Final_DF.head()

Unnamed: 0,Star_Name,Source,RA_ICRS,DE_ICRS,Plx,PM,pmRA,pmDE,Dist,Gmag,...,logg,[Fe/H],Rad,Lum-Flame,Mass-Flame,Age-Flame,SpType-ELS,GRVSmag_category,RV_category,z-Flame_category
0,LB 2844,1306361548360576,44.589012,2.195298,0.2384,2.901,2.088,-2.014,19867.748,16.908537,...,4.5728,-0.9759,1.6635,298.48514,,,O,unknown,unknown,far
1,GALEX J022125.9+085919,23700286669971584,35.358035,8.988813,0.1666,4.402,-0.242,-4.396,21021.896,16.962143,...,4.4948,-1.1661,1.747,292.6721,,,O,unknown,unknown,far
2,Gaia DR3 27109837867995776,27109837867995776,44.450767,10.079118,0.3544,3.154,2.722,1.593,14943.434,16.407494,...,4.5526,-0.9919,1.8439,506.7328,,,O,unknown,unknown,far
3,PG 0310+149,31009771252186752,48.404909,15.105912,0.5962,2.745,2.46,1.218,11625.332,15.607131,...,4.5211,-0.9392,1.8604,9.055018,,,O,unknown,unknown,far
4,UCAC4 508-006228,36876009385300352,57.092838,11.550927,0.4507,4.918,-2.851,-4.008,12459.044,16.372738,...,4.5807,-0.9965,1.6652,311.50284,,,O,unknown,unknown,far


### 3.23)`Evol` → evolutionary stage code  

In [65]:
Dataframe["Evol"].tail()

626011     746.0
626012    1217.0
626013     292.0
626014    1288.0
626015     519.0
Name: Evol, dtype: float64

In [66]:
# describtion 
Dataframe['Evol'].describe()

count    590797.000000
mean        364.190439
std         238.863381
min         100.000000
25%         216.000000
50%         286.000000
75%         457.000000
max        1735.000000
Name: Evol, dtype: float64

In [67]:
# Compute quantiles for Evol ignoring NaNs
evol_25 = Dataframe["Evol"].quantile(0.25)
evol_50 = Dataframe["Evol"].quantile(0.50)
evol_75 = Dataframe["Evol"].quantile(0.75)

# Add only the new categorical column to Final_DF
Final_DF["Evol_category"] = np.where(
    Dataframe["Evol"].isna(), "unknown",
    np.where(Dataframe["Evol"] <= evol_25, "early",
    np.where(Dataframe["Evol"] <= evol_50, "mid",
    np.where(Dataframe["Evol"] <= evol_75, "late", "final"))))


In [68]:
# Check the new column
Final_DF[["Evol_category"]].tail()

Unnamed: 0,Evol_category
626011,final
626012,final
626013,late
626014,final
626015,final


### 3.24) `A0`, `AG`, `ABP`, `ARP`, `E(BP-RP)` → extinction / reddening corrections 

In [69]:
# Missing Values
extinction_columns = ['A0', 'AG', 'ABP', 'ARP', 'E(BP-RP)']

for col in extinction_columns:
    print(Dataframe[col], "\n")


0         0.0076
1         0.0071
2         0.5045
3         0.2342
4         0.5265
           ...  
626011    3.2158
626012    9.9721
626013    6.8247
626014    4.5530
626015    9.5857
Name: A0, Length: 626016, dtype: float64 

0         0.0076
1         0.0071
2         0.5001
3         0.2343
4         0.5193
           ...  
626011    2.3015
626012    5.7604
626013    5.2256
626014    2.7308
626015    5.9190
Name: AG, Length: 626016, dtype: float64 

0         0.0088
1         0.0082
2         0.5830
3         0.2718
4         0.6062
           ...  
626011    3.0893
626012    8.6747
626013    6.8761
626014    4.1371
626015    8.5459
Name: ABP, Length: 626016, dtype: float64 

0         0.0047
1         0.0044
2         0.3134
3         0.1457
4         0.3267
           ...  
626011    1.8487
626012    5.1878
626013    3.9422
626014    2.4100
626015    5.1573
Name: ARP, Length: 626016, dtype: float64 

0         0.0041
1         0.0038
2         0.2696
3         0.1260
4         

In [70]:
for col in extinction_columns:
    # describtion 
    print(Dataframe[col].describe(), "\n")

count    626016.000000
mean          1.439635
std           1.465222
min           0.000000
25%           0.489000
50%           0.930400
75%           1.972000
max          10.000000
Name: A0, dtype: float64 

count    626016.000000
mean          1.134374
std           1.060210
min           0.000000
25%           0.401300
50%           0.770100
75%           1.611000
max           7.185600
Name: AG, dtype: float64 

count    626016.000000
mean          1.466110
std           1.424287
min           0.000000
25%           0.512400
50%           0.972300
75%           2.052700
max           9.754800
Name: ABP, dtype: float64 

count    626016.000000
mean          0.841649
std           0.827668
min           0.000000
25%           0.291900
50%           0.554700
75%           1.171100
max           5.640000
Name: ARP, dtype: float64 

count    626016.000000
mean          0.624461
std           0.598200
min           0.000000
25%           0.219600
50%           0.417200
75%           0.

In [71]:
def add_quantile_category(old_df, new_df, column_name, new_column_name, labels=["low", "moderate", "high", "very high"]):
    """
    Adds a new categorical column to new_df based on quantiles of a numeric column from old_df.
    - Missing values are ignored (kept as NaN)
    - Splits the data into 4 quantiles by default
    """
    # Compute the 25%, 50%, 75% quantiles
    q25, q50, q75 = old_df[column_name].quantile([0.25, 0.5, 0.75])
    
    # Add the categorical column to new_df
    new_df[new_column_name] = np.where(
        old_df[column_name].isna(), "unknown",
        np.where(old_df[column_name] <= q25, labels[0],
        np.where(old_df[column_name] <= q50, labels[1],
        np.where(old_df[column_name] <= q75, labels[2], labels[3])))
    )


In [72]:
for col in extinction_columns:
    add_quantile_category(Dataframe, Final_DF, col, f"{col}_category")


In [73]:
for col in extinction_columns:
    print(Final_DF[f"{col}_category"])

0               low
1               low
2          moderate
3               low
4          moderate
            ...    
626011    very high
626012    very high
626013    very high
626014    very high
626015    very high
Name: A0_category, Length: 626016, dtype: object
0               low
1               low
2          moderate
3               low
4          moderate
            ...    
626011    very high
626012    very high
626013    very high
626014    very high
626015    very high
Name: AG_category, Length: 626016, dtype: object
0               low
1               low
2          moderate
3               low
4          moderate
            ...    
626011    very high
626012    very high
626013    very high
626014    very high
626015    very high
Name: ABP_category, Length: 626016, dtype: object
0               low
1               low
2          moderate
3               low
4          moderate
            ...    
626011    very high
626012    very high
626013    very high
626014    ve

### 3.25) `RUWE` → Gaia astrometric fit quality  

In [74]:
Dataframe["RUWE"].head()

0    1.079
1    0.940
2    1.060
3    1.052
4    1.051
Name: RUWE, dtype: float64

In [75]:
# describtion 
Dataframe['RUWE'].describe()

count    626016.000000
mean          1.476731
std           2.211868
min           0.462000
25%           0.986000
50%           1.036000
75%           1.129000
max          80.725000
Name: RUWE, dtype: float64

In [76]:
# Add only the new categorical column to Final_DF
Final_DF["RUWE_category"] = np.where(
    Dataframe["RUWE"].isna(), "unknown",
    np.where(Dataframe["RUWE"] <= 1.4, "reliable", "potentially unreliable")
)

In [77]:
# Check the new column
Final_DF["RUWE_category"].head()

0    reliable
1    reliable
2    reliable
3    reliable
4    reliable
Name: RUWE_category, dtype: object

### 3.26) `Rad-Flame` → radius from FLAME pipeline

In [78]:
Dataframe["Rad-Flame"].head()

0    1.7096
1    1.8249
2    1.9063
3    0.2579
4    1.7220
Name: Rad-Flame, dtype: float64

In [79]:
# describtion 
Dataframe['Rad-Flame'].describe()

count    611293.000000
mean          6.850684
std          17.021461
min           0.046300
25%           1.343500
50%           2.017800
75%           3.695100
max         181.063300
Name: Rad-Flame, dtype: float64

In [80]:
# Compute quantiles for Rad-Flame ignoring NaNs
rad_33 = Dataframe["Rad-Flame"].quantile(0.33)
rad_66 = Dataframe["Rad-Flame"].quantile(0.66)

# Add only the new categorical column to Final_DF
Final_DF["Rad-Flame_category"] = np.where(
    Dataframe["Rad-Flame"].isna(), "unknown",
    np.where(Dataframe["Rad-Flame"] <= rad_33, "small",
    np.where(Dataframe["Rad-Flame"] <= rad_66, "medium", "large"))
)

In [81]:
# Check the new column
Final_DF["Rad-Flame_category"].head()

0    medium
1    medium
2    medium
3     small
4    medium
Name: Rad-Flame_category, dtype: object

### 2.27) `EWHa`, `f_EWHa`, `e_EWHa` → emission line measurements  

In [82]:
# Missing Values
extinction_columns = ['EWHa', 'f_EWHa', 'e_EWHa']

for col in extinction_columns:
    print(Dataframe[col], "\n")

0         0.02128
1         0.02144
2         0.13726
3         0.09470
4         0.08682
           ...   
626011   -0.06731
626012   -0.41858
626013    0.03248
626014    0.12118
626015   -0.08326
Name: EWHa, Length: 626016, dtype: float64 

0         0
1         0
2         0
3         0
4         0
         ..
626011    1
626012    1
626013    0
626014    1
626015    1
Name: f_EWHa, Length: 626016, dtype: int64 

0         0.19309
1         0.17600
2         0.13364
3         0.10548
4         0.07831
           ...   
626011    0.18151
626012    0.40188
626013    0.12931
626014    0.05450
626015    0.19256
Name: e_EWHa, Length: 626016, dtype: float64 



In [83]:
for col in extinction_columns:
    # describtion 
    print(Dataframe[col].describe(), "\n")

count    626016.000000
mean          0.158441
std           0.227429
min          -6.651180
25%           0.045490
50%           0.147890
75%           0.301950
max           1.883830
Name: EWHa, dtype: float64 

count    626016.000000
mean          0.211773
std           0.408565
min           0.000000
25%           0.000000
50%           0.000000
75%           0.000000
max           1.000000
Name: f_EWHa, dtype: float64 

count    626016.000000
mean          0.037198
std           0.040021
min           0.005400
25%           0.018830
50%           0.027910
75%           0.039490
max           3.296320
Name: e_EWHa, dtype: float64 



In [84]:
def add_activity_category(old_df, new_df, columns, new_column_names=None, labels=["inactive", "moderate", "active"]):
    """
    Adds categorical columns to new_df based on numeric activity-related columns from old_df.
    - Splits values into 3 quantiles (low/mid/high)
    - Missing values are labeled 'unknown'
    
    Parameters:
    - columns: list of column names in old_df
    - new_column_names: optional list of names for new categorical columns
    - labels: list of 3 labels (low/mid/high)
    """
    if new_column_names is None:
        new_column_names = [f"{col}_category" for col in columns]
    
    for col, new_col in zip(columns, new_column_names):
        q33 = old_df[col].quantile(0.33)
        q66 = old_df[col].quantile(0.66)
        
        new_df[new_col] = np.where(
            old_df[col].isna(), "unknown",
            np.where(old_df[col] <= q33, labels[0],
            np.where(old_df[col] <= q66, labels[1], labels[2]))
        )

In [85]:
add_activity_category(Dataframe, Final_DF, extinction_columns)

In [86]:
for col in extinction_columns:
    print(Final_DF[f"{col}_category"])

0         inactive
1         inactive
2         moderate
3         moderate
4         moderate
            ...   
626011    inactive
626012    inactive
626013    inactive
626014    moderate
626015    inactive
Name: EWHa_category, Length: 626016, dtype: object
0         inactive
1         inactive
2         inactive
3         inactive
4         inactive
            ...   
626011      active
626012      active
626013    inactive
626014      active
626015      active
Name: f_EWHa_category, Length: 626016, dtype: object
0         active
1         active
2         active
3         active
4         active
           ...  
626011    active
626012    active
626013    active
626014    active
626015    active
Name: e_EWHa_category, Length: 626016, dtype: object


In [87]:
reconsider_end = time.time()
reconsider_elapsed = reconsider_end - reconsider_start

---
### Variables to Drop / Ignore (Technical / redundant / non-essential)
---

In [88]:
# =============================
# 🗑️ Drop / Ignore section
# =============================
drop_start = time.time()

### 3.28) `Unnamed: 0` → old CSV index  
  Not included because it is an artifact from the original CSV and does not provide meaningful information for star recommendations.

### 3.29) `e_RA_ICRS`, `e_DE_ICRS`, `e_Plx`, `e_pmRA`, `e_pmDE` → measurement errors  
  Not included because the recommendation engine focuses on stellar properties, not the precision of their measurements.

### 3.30) `e_Gmag`, `e_BPmag`, `e_RPmag`, `e_GRVSmag` → magnitude errors  
  Not included as users do not need to see observational uncertainties; the engine uses corrected or nominal values for brightness.

### 3.31) `PQSO`, `PGal`, `Pstar`, `PWD`, `Pbin` → classification probabilities  
  Not included because the engine uses definitive star classifications rather than probabilistic estimates, simplifying recommendations for users.

### 3.32) `Flags-HS` → internal Gaia flags 
  Not included since these flags are technical details irrelevant to general users and do not influence star selection in Poppy Universe.


In [89]:
# --- put your "drop/ignore" code here ---
drop_end = time.time()
drop_elapsed = drop_end - drop_start

---
## End of ceating Final_DF
---

## 4) Final_DF check and Exportation

In [90]:
# =============================
# 🛠️ Exportation section
# =============================
Export_start = time.time()

In [91]:
# Trim all string/object columns in Final_DF
for col in Final_DF.select_dtypes(include="object").columns:
    Final_DF[col] = Final_DF[col].str.strip()

### DataFrame Equality Check

To ensure that our cleaning steps haven’t accidentally shifted rows or changed values, we created a **simple function** to compare two DataFrames.  

- Confirms that **column names and order** match  
- Confirms that **all values are identical**  
- Prints a clear message whether the DataFrames are exactly the same or not  

```python
# Example usage
check_dataframes_equal(Final_DF, Dataframe)


In [92]:
def detailed_dataframe_check(df1, df2):
    """
    Compares two DataFrames and prints column names for each category:
    - Fully matching columns
    - Columns differing but excused (_category in df2 has corresponding column in df1)
    - Columns differing unexpectedly
    - Columns in df1 missing in df2
    - Columns where all row values match ignoring dtype
    """
    matching = []
    excused = []
    not_matching = []

    # Columns in df1 but not in df2 or its _category version
    missing_in_df2 = [col for col in df1.columns if col not in df2.columns and (col + "_category") not in df2.columns]

    for col in df2.columns:
        if col in df1.columns:
            if df2[col].equals(df1[col]):
                matching.append(col)
            else:
                # Check if excused: df2 column is _category and df1 has the original column
                if col.endswith("_category") and col.replace("_category", "") in df1.columns:
                    excused.append(col)
                else:
                    not_matching.append(col)
        else:
            # Column only exists in df2
            if col.endswith("_category") and col.replace("_category", "") in df1.columns:
                excused.append(col)
            else:
                not_matching.append(col)

    # Columns where all row values match ignoring dtype differences
    fully_matching_rows = [col for col in df2.columns 
                           if col in df1.columns 
                           and (df2[col].fillna("NaN") == df1[col].fillna("NaN")).all()]

    print("✅ Fully matching columns:")
    print(matching, "\n")
    
    print("ℹ️ Differing but excused (_category columns with original in df1):")
    print(excused, "\n")
    
    print("⚠️ Differing columns (not excused):")
    print(not_matching, "\n")
    
    print("❌ Columns in df1 but not in df2:")
    print(missing_in_df2, "\n")
    
    print("🔹 Columns where all row values match (ignoring dtype differences):")
    print(fully_matching_rows, "\n")
    
    return {
        "matching": matching,
        "excused": excused,
        "not_matching": not_matching,
        "missing_in_df2": missing_in_df2,
        "fully_matching_rows": fully_matching_rows
    }


In [93]:
detailed_dataframe_check(Dataframe, Final_DF)

✅ Fully matching columns:
['Star_Name', 'Source', 'RA_ICRS', 'DE_ICRS', 'Plx', 'PM', 'pmRA', 'pmDE', 'Dist', 'Gmag', 'BPmag', 'RPmag', 'Teff', 'logg', '[Fe/H]', 'Rad', 'Lum-Flame', 'Mass-Flame', 'Age-Flame'] 

ℹ️ Differing but excused (_category columns with original in df1):
['GRVSmag_category', 'RV_category', 'z-Flame_category', 'Evol_category', 'A0_category', 'AG_category', 'ABP_category', 'ARP_category', 'E(BP-RP)_category', 'RUWE_category', 'Rad-Flame_category', 'EWHa_category', 'f_EWHa_category', 'e_EWHa_category'] 

⚠️ Differing columns (not excused):
['SpType-ELS'] 

❌ Columns in df1 but not in df2:
['orig_index', 'Unnamed: 0', 'e_RA_ICRS', 'e_DE_ICRS', 'e_Plx', 'e_pmRA', 'e_pmDE', 'e_Gmag', 'e_BPmag', 'e_RPmag', 'e_GRVSmag', 'PQSO', 'PGal', 'Pstar', 'PWD', 'Pbin', 'GMAG', 'Flags-HS'] 

🔹 Columns where all row values match (ignoring dtype differences):
['Star_Name', 'Source', 'RA_ICRS', 'DE_ICRS', 'Plx', 'PM', 'pmRA', 'pmDE', 'Dist', 'Gmag', 'BPmag', 'RPmag', 'Teff', 'logg', '[

{'matching': ['Star_Name',
  'Source',
  'RA_ICRS',
  'DE_ICRS',
  'Plx',
  'PM',
  'pmRA',
  'pmDE',
  'Dist',
  'Gmag',
  'BPmag',
  'RPmag',
  'Teff',
  'logg',
  '[Fe/H]',
  'Rad',
  'Lum-Flame',
  'Mass-Flame',
  'Age-Flame'],
 'excused': ['GRVSmag_category',
  'RV_category',
  'z-Flame_category',
  'Evol_category',
  'A0_category',
  'AG_category',
  'ABP_category',
  'ARP_category',
  'E(BP-RP)_category',
  'RUWE_category',
  'Rad-Flame_category',
  'EWHa_category',
  'f_EWHa_category',
  'e_EWHa_category'],
 'not_matching': ['SpType-ELS'],
 'missing_in_df2': ['orig_index',
  'Unnamed: 0',
  'e_RA_ICRS',
  'e_DE_ICRS',
  'e_Plx',
  'e_pmRA',
  'e_pmDE',
  'e_Gmag',
  'e_BPmag',
  'e_RPmag',
  'e_GRVSmag',
  'PQSO',
  'PGal',
  'Pstar',
  'PWD',
  'Pbin',
  'GMAG',
  'Flags-HS'],
 'fully_matching_rows': ['Star_Name',
  'Source',
  'RA_ICRS',
  'DE_ICRS',
  'Plx',
  'PM',
  'pmRA',
  'pmDE',
  'Dist',
  'Gmag',
  'BPmag',
  'RPmag',
  'Teff',
  'logg',
  '[Fe/H]',
  'Rad',
  'Lum-

In [94]:
# Export the entire Final_DF to CSV
Final_DF.to_csv("../../Data/Stars/4_Final/dataGaia2_Final.csv", index=False)

print("✅ Full Final_DF exported as 'Final_DF.csv'")


✅ Full Final_DF exported as 'Final_DF.csv'


In [95]:
# --- put your "drop/ignore" code here ---
Export_elapsed = time.time()

---
## Notebook Runtime
---

In [96]:
# =============================
# TOTAL RUNTIME
# =============================
total_end = time.time()
total_elapsed = total_end - total_start

# --- CALCULATE PERCENTAGES ---
prep_pct = (prep_elapsed / total_elapsed) * 100
keep_pct = (keep_elapsed / total_elapsed) * 100
reconsider_pct = (reconsider_elapsed / total_elapsed) * 100
drop_pct = (drop_elapsed / total_elapsed) * 100
Export_pct = (Export_elapsed / total_elapsed) * 100

# --- PRINT FINAL SUMMARY ---
print("📝 Notebook Runtime Summary")
print(f"🛠️ Preparation: {prep_elapsed:.2f} sec ({prep_pct:.2f}%)")
print(f"✅ Keep section: {keep_elapsed:.2f} sec ({keep_pct:.2f}%)")
print(f"⚠️ Reconsider section: {reconsider_elapsed:.2f} sec ({reconsider_pct:.2f}%)")
print(f"🗑️ Drop/Ignore section: {drop_elapsed:.2f} sec ({drop_pct:.2f}%)")
print(f"💾 Exportation: {Export_elapsed:.2f} sec ({Export_pct:.2f}%)")
print(f"⏱️ Total runtime: {total_elapsed:.2f} sec")
print(f"🕒 Finished at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

📝 Notebook Runtime Summary
🛠️ Preparation: 5.72 sec (13.37%)
✅ Keep section: 6.91 sec (16.14%)
⚠️ Reconsider section: 3.87 sec (9.03%)
🗑️ Drop/Ignore section: 0.01 sec (0.03%)
💾 Exportation: 1765823217.28 sec (4126438319.13%)
⏱️ Total runtime: 42.79 sec
🕒 Finished at: 2025-12-15 19:26:57
