# Assignment 1:
### The automated_stat_analyzer Function
Scenario: A retail company needs a utility to quickly summarize sales data. Students must create a function that reports the central tendency and dispersion of any numerical column.

### Requirements:

* Accept a Pandas DataFrame and a column name.

* Calculate the Mean, Median, and Standard Deviation.

* Identify if the data is "Skewed" by comparing the Mean and Median.


* Bonus: If the column is categorical, return the Mode instead.
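For the bonus requirement, it helps to know that `Series.mode()` returns a Series rather than a single value, because several values can tie for most frequent. The sketch below uses a made-up `colors` Series (an assumption for illustration, not part of the assignment data) to show what a tie looks like and why taking only the first element is a simplification.

```python
import pandas as pd

# Hypothetical data for illustration: 'red' and 'blue' each appear twice
colors = pd.Series(['red', 'blue', 'red', 'blue', 'green'])

modes = colors.mode()        # Series holding every tied mode, in sorted order
first_mode = modes.iloc[0]   # the single value you get by taking the first element

print(list(modes))
print(first_mode)
```

If ties matter for the business question, returning the whole `modes` Series may be the safer choice.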

### Your Data

In [2]:
import pandas as pd
import numpy as np

# Create a synthetic Company Sales Dataset
data = {
    'Transaction_ID': range(1, 11),
    'Product_Category': ['Electronics', 'Home', 'Electronics', 'Sports', 'Home', 
                         'Electronics', 'Home', 'Sports', 'Electronics', 'Electronics'],
    'Sales_Amount': [150, 200, 155, 300, 210, 180, 205, 1000, 190, 160], # 1000 is an Outlier
    'Customer_Age': [25, 34, np.nan, 45, 23, 31, 29, np.nan, 38, 40],    # Contains Nulls (NaN)
    'Rating': [5, 4, 3, 5, 2, 4, 5, 2, 4, 3]
}

df_test = pd.DataFrame(data)

# Save to CSV so students can practice loading files
df_test.to_csv('company_sales_test.csv', index=False)
print("Test dataset created successfully!")


Test dataset created successfully!
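To practice the loading step, the saved file can be read back with `pd.read_csv`. The sketch below is self-contained: it writes a small stand-in DataFrame (`df_small`, a hypothetical name) to a temporary file called `sales_sample.csv` (also hypothetical) and reloads it, assuming the same `index=False` setting used above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in for df_test with one missing age (illustrative data only)
df_small = pd.DataFrame({
    'Transaction_ID': [1, 2, 3],
    'Customer_Age': [25, np.nan, 45],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'sales_sample.csv')  # hypothetical file name
    df_small.to_csv(csv_path, index=False)  # index=False avoids writing an extra index column
    df_loaded = pd.read_csv(csv_path)       # empty cells are read back as NaN

print(df_loaded.columns.tolist())
print(df_loaded['Customer_Age'].isnull().sum())
```

The missing age survives the round trip as NaN, so the null-handling exercise works the same on the loaded file.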


In [3]:
import pandas as pd
import numpy as np

def automated_stat_analyzer(df, column_name):
    print(f"--- Analyzing: {column_name} ---")
    
    # Extract the column into a variable to make it easier to work with
    col_data = df[column_name]
    
    if pd.api.types.is_numeric_dtype(col_data):
        
        # Compute the statistical measures (pandas skips NaN values by default)
        mean_val = col_data.mean()
        median_val = col_data.median()
        std_val = col_data.std()
        
        print(f"Mean: {mean_val:.2f}, Median: {median_val:.2f}, Std Dev: {std_val:.2f}")
        
        # Skewness logic: compare the mean with the median
        if mean_val > median_val:
            status = "Right Skewed (Positively)"
        elif mean_val < median_val:
            status = "Left Skewed (Negatively)"
        else:
            # Equal mean and median suggests symmetry, not necessarily a normal distribution
            status = "Approximately Symmetric"
            
        print(f"Distribution Status: {status}")
        
    else:
        # If the column is not numeric (Bonus), compute the mode
        # Hint: col_data.mode() returns a Series; take the first element with [0]
        mode_val = col_data.mode()[0]
        print(f"Category Mode: {mode_val}")

    print("-" * 30)

# --- Test area ---
# Uses the df_test DataFrame created in the "Your Data" cell above
automated_stat_analyzer(df_test, 'Sales_Amount')     # numeric (contains an outlier)
automated_stat_analyzer(df_test, 'Product_Category') # text (Bonus)

--- Analyzing: Sales_Amount ---
Mean: 275.00, Median: 195.00, Std Dev: 258.31
Distribution Status: Right Skewed (Positively)
------------------------------
--- Analyzing: Product_Category ---
Category Mode: Electronics
------------------------------
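The "Right Skewed" result above is driven largely by the single 1000 outlier. As a rough illustration, the sketch below recomputes the mean and median with and without that value, and also prints `Series.skew()`, which is an addition here and not part of the assignment's mean-versus-median rule.

```python
import pandas as pd

# Sales_Amount values from the test dataset
sales = pd.Series([150, 200, 155, 300, 210, 180, 205, 1000, 190, 160])

# Same data with the known outlier removed
sales_no_outlier = sales[sales != 1000]

for label, s in [("with outlier", sales), ("without outlier", sales_no_outlier)]:
    # The mean shifts far more than the median when the outlier is dropped
    print(f"{label}: mean={s.mean():.2f}, median={s.median():.2f}, skew={s.skew():.2f}")
```

The median barely moves (195 to 190) while the mean drops from 275 to roughly 194, which is why the median is often the more robust summary for data like this.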


In [4]:
df_test.head()

   Transaction_ID Product_Category  Sales_Amount  Customer_Age  Rating
0               1      Electronics           150          25.0       5
1               2             Home           200          34.0       4
2               3      Electronics           155           NaN       3
3               4           Sports           300          45.0       5
4               5             Home           210          23.0       2



# Assignment 2:
### The null_handling_strategy Function

Scenario: Incoming user data often has missing values. Students must implement a flexible strategy to handle these null values and prepare the data for machine learning.

### Requirements:

* Check for null values in the DataFrame.

* Apply a strategy based on the `strategy` parameter: "drop_rows", "fill_mean", or "fill_median".

* Ensure the function only fills numerical columns when using mean or median.
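The first and third requirements can be sketched separately before writing the full function. The example below uses a small made-up DataFrame (`df_demo`, hypothetical) to count nulls per column and to list which columns would qualify for a mean or median fill.

```python
import numpy as np
import pandas as pd

# Illustrative data only: one numeric column with NaNs, one text column with a missing value
df_demo = pd.DataFrame({
    'Sales_Amount': [150, 200, 155],
    'Customer_Age': [25, np.nan, np.nan],
    'Product_Category': ['Electronics', None, 'Home'],
})

# Requirement 1: count missing values in each column
null_counts = df_demo.isnull().sum()

# Requirement 3: only these columns are eligible for mean/median filling
numeric_cols = df_demo.select_dtypes(include=[np.number]).columns.tolist()

print(null_counts)
print(numeric_cols)
```

Note that the missing `Product_Category` value would remain after a mean or median fill, since that column is not numeric.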

In [None]:
import pandas as pd
import numpy as np

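def null_handling_strategy(df, strategy="fill_mean"):
    """
    Company Task: Clean a dataset by resolving missing (NaN) values.

    strategy: "drop_rows", "fill_mean", or "fill_median".
    Returns a cleaned copy; the input DataFrame is not modified.
    """
    # Reject unknown strategies instead of silently returning unchanged data
    if strategy not in ("drop_rows", "fill_mean", "fill_median"):
        raise ValueError(f"Unknown strategy: {strategy!r}")

    df_clean = df.copy()

    print(f"--- Strategy Applied: {strategy} ---")
    print(f"Missing values before: {df_clean.isnull().sum().sum()}")

    # Only numerical columns are filled when using mean or median
    numeric_cols = df_clean.select_dtypes(include=[np.number]).columns

    if strategy == "drop_rows":
        # Strategy 1: drop every row that contains at least one NaN
        df_clean = df_clean.dropna()
    elif strategy == "fill_mean":
        # Strategy 2: fill each numeric column with its own mean
        df_clean[numeric_cols] = df_clean[numeric_cols].fillna(df_clean[numeric_cols].mean())
    else:
        # Strategy 3: fill each numeric column with its own median
        df_clean[numeric_cols] = df_clean[numeric_cols].fillna(df_clean[numeric_cols].median())

    # Report the remaining nulls for every strategy, including drop_rows
    print(f"Missing values after: {df_clean.isnull().sum().sum()}")
    return df_clean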

print("\n=== Test 1: Fill with Mean ===")
cleaned_df_mean = null_handling_strategy(df_test, strategy="fill_mean")
print(cleaned_df_mean[['Customer_Age']].head(10)) # inspect the result

print("\n=== Test 2: Fill with Median ===")
cleaned_df_median = null_handling_strategy(df_test, strategy="fill_median")
print(cleaned_df_median[['Customer_Age']].head(10))


=== Test 1: Fill with Mean ===
--- Strategy Applied: fill_mean ---
Missing values before: 2
Missing values after: 0
   Customer_Age
0        25.000
1        34.000
2        33.125
3        45.000
4        23.000
5        31.000
6        29.000
7        33.125
8        38.000
9        40.000

=== Test 2: Fill with Median ===
--- Strategy Applied: fill_median ---
Missing values before: 2
Missing values after: 0
   Customer_Age
0          25.0
1          34.0
2          32.5
3          45.0
4          23.0
5          31.0
6          29.0
7          32.5
8          38.0
9          40.0
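The two tests above cover the fill strategies but not "drop_rows". As a minimal sketch, assuming that strategy corresponds to a plain `DataFrame.dropna()` call as in the function above, the example below applies it to the `Customer_Age` values from the test dataset (the `ages` and `dropped` names are illustrative).

```python
import numpy as np
import pandas as pd

# Transaction_ID and Customer_Age values from the test dataset
ages = pd.DataFrame({
    'Transaction_ID': range(1, 11),
    'Customer_Age': [25, 34, np.nan, 45, 23, 31, 29, np.nan, 38, 40],
})

# Drop every row that contains at least one NaN
dropped = ages.dropna()

print(f"Rows before: {len(ages)}, rows after: {len(dropped)}")
print(dropped.index.tolist())  # original index labels are kept; 2 and 7 are gone
```

Dropping loses two of the ten transactions here, which is the trade-off against filling: no invented values, but less data.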
