# Assignment 1:
### The automated_stat_analyzer Function

Scenario: A retail company needs a utility to quickly summarize sales data. Students must create a function that identifies the "Central Tendency" and "Dispersion" of any numerical column.

### Requirements:

* Accept a Pandas DataFrame and a column name.

* Calculate the Mean, Median, and Standard Deviation.

* Identify if the data is "Skewed" by comparing the Mean and Median.


* Bonus: If the column is categorical, return the Mode instead.
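
The mean-versus-median comparison in the requirements can be illustrated with a small sketch. The sales figures below are hypothetical and were chosen so that one large order pulls the mean well above the median. `Series.skew()` is included only as a cross-check; the requirements do not ask for it.

```python
import pandas as pd

# Hypothetical daily sales figures with one unusually large order
sales = pd.Series([10, 12, 11, 13, 200])

mean_val = sales.mean()      # pulled upward by the large value
median_val = sales.median()  # middle value, barely affected by it

# Mean above median hints at a right skew; mean below median hints at a left skew
if mean_val > median_val:
    direction = "Right Skewed"
elif mean_val < median_val:
    direction = "Left Skewed"
else:
    direction = "Symmetrical"

print(mean_val, median_val, direction)  # 49.2 12.0 Right Skewed

# Cross-check with pandas' sample skewness: positive means a right skew
print(sales.skew() > 0)  # True
```

The comparison is only a rule of thumb: it gives the direction of the skew, not its strength.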

### Your Data

In [4]:
import pandas as pd
import numpy as np

def automated_stat_analyzer(df, column):

    # Check column exists
    if column not in df.columns:
        raise ValueError("Column not found in DataFrame.")

    series = df[column]

    # If numeric column
    if pd.api.types.is_numeric_dtype(series):

        mean_val = series.mean()
        median_val = series.median()
        std_val = series.std()

        # Detect skew
        if mean_val > median_val:
            skew = "Right Skewed"
        elif mean_val < median_val:
            skew = "Left Skewed"
        else:
            skew = "Symmetrical"

        return {
            "Mean": mean_val,
            "Median": median_val,
            "Standard Deviation": std_val,
            "Skewness": skew
        }

    else:
        # categorical
        mode_val = series.mode().iloc[0]

        return {
            "Mode": mode_val
        }

# Example usage
data = {
    'Age': [25, 30, 35, 40, 45],        
    'Salary': [50000, 60000, 75000, 85000, 95000],
    'Department': ['IT', 'HR', 'Finance', 'Marketing', 'IT']
}
df = pd.DataFrame(data)
print(automated_stat_analyzer(df, 'Age'))
print(automated_stat_analyzer(df, 'Salary'))
print(automated_stat_analyzer(df, 'Department'))

{'Mean': 35.0, 'Median': 35.0, 'Standard Deviation': 7.905694150420948, 'Skewness': 'Symmetrical'}
{'Mean': 73000.0, 'Median': 75000.0, 'Standard Deviation': 18234.582528810468, 'Skewness': 'Left Skewed'}
{'Mode': 'IT'}
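
One detail of the categorical branch is worth noting: `Series.mode()` returns every value tied for the highest count, in sorted order, so `.iloc[0]` keeps only the first of them. A minimal sketch with hypothetical department labels:

```python
import pandas as pd

# Hypothetical labels where "IT" and "HR" are tied for most frequent
departments = pd.Series(["IT", "HR", "IT", "HR", "Finance"])

modes = departments.mode()   # all tied modes, sorted
print(modes.tolist())        # ['HR', 'IT']

first_mode = modes.iloc[0]   # what the function above would report
print(first_mode)            # HR
```

If ties matter for the analysis, returning `modes.tolist()` instead of a single value may be a better choice.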


In [7]:
df_test = pd.DataFrame(data)


In [6]:
import pandas as pd

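def automated_stat_analyzer(df, column):
    """Summarize a column: mean/median/std/skew if numeric, mode otherwise."""

    # Fail early if the requested column is missing
    if column not in df.columns:
        raise ValueError("Column not found in DataFrame.")

    # Drop missing values so they do not affect the statistics
    series = df[column].dropna()

    # Guard against a column with no usable values (mode().iloc[0] would fail)
    if series.empty:
        raise ValueError("Column has no non-null values.")

    if pd.api.types.is_numeric_dtype(series):

        mean_val = series.mean()
        median_val = series.median()
        std_val = series.std()  # sample standard deviation (ddof=1)

        # Compare mean and median to estimate the direction of skew
        if mean_val > median_val:
            skew = "Right Skewed"
        elif mean_val < median_val:
            skew = "Left Skewed"
        else:
            skew = "Symmetrical"

        return {
            "Mean": mean_val,
            "Median": median_val,
            "Standard Deviation": std_val,
            "Skewness": skew
        }

    else:
        # Categorical column: report the most frequent value
        return {
            "Mode": series.mode().iloc[0]
        }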


# Assignment 2:
### The null_handling_strategy Function

Scenario: Incoming user data often has missing values. Students must implement a flexible strategy to handle these "Null Values" to prepare data for Machine Learning.

### Requirements:

* Check for null values in the DataFrame.

* Apply a strategy based on parameters: "drop_rows", "fill_mean", or "fill_median".

* Ensure the function only fills numerical columns when using mean or median.
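
As a rough sketch of the first and last requirements, the snippet below counts nulls per column and picks out the numeric columns. The `users` DataFrame and its column names are hypothetical, made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical user data with one missing value in each column
users = pd.DataFrame({
    "Age": [25, np.nan, 35],
    "City": ["Pune", "Delhi", None],
})

# Per-column null counts and a single yes/no answer for the whole frame
null_counts = users.isnull().sum()
has_nulls = bool(users.isnull().values.any())

# Only these columns are candidates for a mean or median fill
numeric_cols = users.select_dtypes(include="number").columns.tolist()

print(null_counts)
print(has_nulls)      # True
print(numeric_cols)   # ['Age']
```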

In [None]:
def null_handling_strategy(df, strategy="fill_mean"):
    """Return a cleaned copy of df; mean/median fills touch numeric columns only."""

    # Work on a copy so the caller's DataFrame is never modified or aliased
    cleaned_df = df.copy()

    # Nothing to do if there are no null values
    if not cleaned_df.isnull().values.any():
        return cleaned_df

    if strategy == "drop_rows":
        # Remove every row that contains at least one null
        cleaned_df = cleaned_df.dropna()

    elif strategy == "fill_mean":
        # Fill numeric columns with their own column mean
        numeric_cols = cleaned_df.select_dtypes(include='number').columns
        cleaned_df[numeric_cols] = cleaned_df[numeric_cols].fillna(
            cleaned_df[numeric_cols].mean()
        )

    elif strategy == "fill_median":
        # Fill numeric columns with their own column median
        numeric_cols = cleaned_df.select_dtypes(include='number').columns
        cleaned_df[numeric_cols] = cleaned_df[numeric_cols].fillna(
            cleaned_df[numeric_cols].median()
        )

    else:
        raise ValueError("Invalid strategy selected.")

    return cleaned_df


In [9]:
clean_df = null_handling_strategy(df_test, "fill_mean")
print(clean_df.isnull().sum())


Age           0
Salary        0
Department    0
dtype: int64
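
The check above passes trivially because `df_test` contains no missing values. To see the mean fill at work, here is a minimal sketch on a hypothetical DataFrame (`raw`, not part of the assignment data) that repeats the same `select_dtypes` and `fillna` steps the function uses:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing number and a missing label
raw = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0],
    "Department": ["IT", None, "HR"],
})

filled = raw.copy()

# Restrict the fill to numeric columns, as the requirements ask
numeric_cols = filled.select_dtypes(include="number").columns
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())

print(filled["Age"].tolist())                 # [20.0, 30.0, 40.0]
print(filled["Department"].isnull().sum())    # 1
```

The missing `Department` value is left in place, since a mean has no meaning for text. A separate strategy, such as filling with the mode, would be needed for categorical columns.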
