**1. Import Necessary Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math

**2. Read the Data into Python**

In [None]:
college = pd.read_csv('cleaned_college_data.csv')
print(college.head())

**3. Produce a Numerical Summary of the Variables in the Data Set**

In [None]:
college.describe()

* All attributes have 777 entries, indicating **no missing values** for any of the numerical features.
* For the applications attribute `Apps`:
    - min = 81, mean = 3001, max = 48094, std = 3870
    - the maximum (48094) is about 16 times the mean (3001) and roughly 11.7 standard deviations above it, so `Apps` very likely contains **outliers**
* The attributes `Accept`, `Enroll`, `F.Undergrad`, `P.Undergrad`, `Outstate`, and `Expend` are also suspected to contain outliers.
* `Grad.Rate` has a maximum value of 118, which points to a data error, since a graduation rate cannot exceed 100%.
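The observations above can be checked programmatically rather than read off the summary table. The following is a minimal sketch on a small made-up DataFrame: the name `sample` and its values are hypothetical stand-ins, not rows from `cleaned_college_data.csv`. Substituting `college` for `sample` would apply the same checks to the real data.

```python
import pandas as pd

# Hypothetical stand-in for `college`; the values are illustrative only.
sample = pd.DataFrame({
    "Apps": [81, 1500, 3000, 48094],
    "Grad.Rate": [45, 72, 118, 90],
})

# Number of missing values per column (all zeros means every column is complete).
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows whose graduation rate exceeds the logical maximum of 100%.
invalid_grad_rate = sample[sample["Grad.Rate"] > 100]
print(invalid_grad_rate)

# Distance between the maximum and the mean, measured in standard deviations.
apps = sample["Apps"]
max_z = (apps.max() - apps.mean()) / apps.std()
print(f"Apps max is {max_z:.2f} standard deviations above the mean")
```

A large distance between the maximum and the mean is only a hint of outliers; the later steps look at dispersion and skewness to support it.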

In [None]:
outlier_suspected_columns = ['Apps', 'Accept', 'Enroll', 'F.Undergrad', 'P.Undergrad', 'Outstate', 'Expend']

**4. Coefficient Of Variation**

In [None]:
print("--- Coefficient of Variation (CV) ---")
for col in outlier_suspected_columns:
    mean = college[col].mean()
    std = college[col].std()
    if mean != 0:
        cv = (std / mean) * 100
        print(f"CV for {col}: {cv:.2f}%")
    else:
        print(f"CV for {col}: Mean is zero, cannot calculate CV.")
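For reference, the coefficient of variation is the standard deviation expressed as a percentage of the mean, $CV = (s / \bar{x}) \times 100$. The loop above can also be written as one vectorized expression. A minimal sketch, assuming a small hypothetical DataFrame `sample` in place of `college[outlier_suspected_columns]`:

```python
import pandas as pd

# Hypothetical data; the real analysis would use college[outlier_suspected_columns].
sample = pd.DataFrame({
    "Apps": [100, 200, 300, 5000],
    "Enroll": [50, 60, 70, 80],
})

# Column-wise CV in percent. pandas' std() uses the sample standard deviation (ddof=1).
cv_percent = sample.std() / sample.mean() * 100
print(cv_percent.round(2))
```

In this made-up example the single large `Apps` value inflates its CV well beyond that of the evenly spread `Enroll` column. Unlike the loop, this version does not guard against a zero mean, so it suits columns known to have non-zero means.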

**5. Median and Mode**

In [None]:
print("--- Median and Mode ---")
for col in outlier_suspected_columns:
    median = college[col].median()
    mode = college[col].mode()
    mean = college[col].mean()
    print(f"{col}:")
    print(f"  Mean: {mean:.2f}")
    print(f"  Median: {median:.2f}")
    # mode() returns every value tied for the highest frequency, so print them all
    if not mode.empty:
        print(f"  Mode: {', '.join(mode.astype(str).tolist())}")
    else:
        print("  Mode: not available (column has no non-missing values)")
print("\n")

**6. Skewness Coefficient**

In [None]:
print("--- Skewness Coefficient ---")
for col in outlier_suspected_columns:
    skewness = college[col].skew()
    print(f"Skewness for {col}: {skewness:.2f}")
    # Rule of thumb: |skew| > 1 is high, 0.5 to 1 is moderate, below 0.5 is fairly symmetrical
    if skewness > 1:
        print(f"  - {col} is highly right-skewed.")
    elif skewness < -1:
        print(f"  - {col} is highly left-skewed.")
    elif abs(skewness) <= 0.5:
        print(f"  - {col} is fairly symmetrical.")
    else:
        direction = "right" if skewness > 0 else "left"
        print(f"  - {col} is moderately {direction}-skewed.")

**Conclusion of Steps 4, 5, and 6: Presence of Potential Outliers**

Since
* the Coefficient of Variation of every attribute is high, meaning the standard deviation is large relative to the mean,
* the comparison of Mean, Median, and Mode shows:
    - For `Apps`, Mean > Median > Mode
    - For `Accept`, Mean > Median > Mode
    - For `Enroll`, Mean > Median > Mode
    - For `F.Undergrad`, Mean > Median > Mode
    - For `P.Undergrad`, Mean > Median > Mode
    - For `Outstate`, Mean > Median > Mode
* the Skewness Coefficient of every attribute is greater than 0.5,

we conclude that all of these attributes are **right-skewed** and contain **potential outliers**. In other words, a few large institutions account for disproportionately high values.

For `Expend`, however, multiple values are tied for the highest frequency, so the mode is not an informative measure of central tendency for this near-continuous attribute. Although mean > median, we will confirm its skewness through visualizations (histograms and box plots).
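The mean > median > mode ordering used above is a common, though not universal, signature of right skew. A minimal sketch on made-up numbers (the `values` series is hypothetical, not taken from the college data) shows how a couple of large values pull the mean upward while the median and mode stay near the bulk of the data:

```python
import pandas as pd

# Hypothetical right-skewed sample: most values are small, two are large.
values = pd.Series([1, 1, 2, 2, 2, 3, 3, 4, 5, 9, 20])

mean = values.mean()
median = values.median()
mode = values.mode().iloc[0]  # first mode if several values are tied
skewness = values.skew()      # positive for a longer right tail

print(f"Mean: {mean:.2f}, Median: {median:.2f}, Mode: {mode}")
print(f"Skewness: {skewness:.2f}")
```

When a column has many tied modes, as noted for `Expend`, the mode part of this comparison carries little weight and the mean versus median gap plus the skewness value are the more reliable indicators.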

----
**FURTHER EXPLORATIONS WILL BE IN EXPLORATORY DATA ANALYSIS (EDA)**

-----