<h1 style='font-size: 35px; color: crimson; font-family: Colonna MT; text-align: center; font-weight: 600'>Exploring Invalid Entries in Data Types</h1>

---


Exploring invalid entries means identifying values that do not match the expected data type, format, or category of their column. Typical examples are numeric values in categorical fields, incorrectly formatted data, and unexpected symbols or typos. Invalid entries can cause errors in analysis and degrade model performance, so it is essential to standardize data types and correct anomalies.
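One common way to surface unexpected symbols in a field that should be numeric is to coerce it with `pd.to_numeric` and look at what fails to parse. The sketch below is only an illustration: the `sample` values and the helper name `find_non_numeric` are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample: a numeric column that contains stray text entries
sample = pd.DataFrame({"Plant Height (cm)": ["12.5", "13.1", "n/a", "14,2", "15.0"]})

def find_non_numeric(series):
    """Return the non-missing entries of `series` that cannot be parsed as numbers."""
    converted = pd.to_numeric(series, errors="coerce")
    # Entries that were present originally but became NaN after coercion are invalid
    return series[converted.isna() & series.notna()]

invalid = find_non_numeric(sample["Plant Height (cm)"])
print(invalid.tolist())
```

Entries that were already missing are excluded on purpose, so the result lists only values that are present but malformed.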

In [2]:
import pandas as pd  
import numpy as np 

pd.set_option('display.max_columns', 10) 
filepath = "../Datasets/Fertilizer and Light Exposure Experiment Dataset.csv"
df = pd.read_csv(filepath)

def simplify_dtype(value):
    """Map a single cell value to a coarse type label."""
    # Missing values are checked first: NaN is a float and would otherwise count as Numeric
    if pd.api.types.is_scalar(value) and pd.isna(value): return 'Missing'
    elif isinstance(value, (pd.Timestamp, np.datetime64)): return 'Datetime'
    # bool is a subclass of int, so exclude it explicitly
    elif isinstance(value, (int, float, np.number)) and not isinstance(value, bool): return 'Numeric'
    elif isinstance(value, str): return 'String'
    else: return 'Other'

def analyze_column_dtypes(df):
    # Fixed column order (iterating over a set of strings can change order between runs)
    all_dtypes = ['Numeric', 'Missing', 'Other', 'Datetime', 'String']
    results = pd.DataFrame('-', index=df.columns, columns=all_dtypes)

    for column in df.columns:
        # Share of each type label among the column's values, as a percentage
        percentages = df[column].map(simplify_dtype).value_counts(normalize=True) * 100
        for dtype, percent in percentages.items():
            results.at[column, dtype] = f'{percent:.2f}%'  # Labels that never occur keep the '-' placeholder
    return results

results = analyze_column_dtypes(df)
display(results)

Column,Numeric,Missing,Other,Datetime,String
Fertilizer,-,-,-,-,100.00%
Light Exposure,-,-,-,-,100.00%
Plant Height (cm),100.00%,-,-,-,-
Leaf Area (cm²),100.00%,-,-,-,-
Chlorophyll Content (SPAD units),100.00%,-,-,-,-
Root Length (cm),100.00%,-,-,-,-
Biomass (g),100.00%,-,-,-,-
Flower Count (number),100.00%,-,-,-,-
Seed Yield (g),100.00%,-,-,-,-
Stomatal Conductance (mmol/m²/s),100.00%,-,-,-,-
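
A column that is 100% String can still hold invalid entries, because typos and inconsistent casing are strings too. One possible follow-up is to compare each categorical column against a list of expected labels. In the sketch below, the `sample` values, the `allowed` set, and the helper `find_unexpected_labels` are all assumptions made for illustration; the dataset's actual categories may differ.

```python
import pandas as pd

# Hypothetical sample and allowed labels; the real dataset's categories may differ
sample = pd.DataFrame({"Fertilizer": ["Organic", "Synthetic", "organic ", "Synthetc", "Control"]})
allowed = {"Organic", "Synthetic", "Control"}

def find_unexpected_labels(series, allowed_values):
    """Return counts of non-missing labels that are not in `allowed_values`."""
    present = series.dropna()
    unexpected = present[~present.isin(allowed_values)]
    return unexpected.value_counts()

unexpected = find_unexpected_labels(sample["Fertilizer"], allowed)
print(unexpected)
```

The comparison is exact, so a label with different casing or trailing whitespace is reported as unexpected rather than silently accepted.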
