**1. Import Necessary Libraries**

In [4]:
import pandas as pd

**2. Read the Data into Python**

In [5]:
college = pd.read_csv('cleaned_college_data.csv')
print(college.head())

                        College Private  Apps  Accept  Enroll  Top10perc  \
0  Abilene Christian University     yes  1660    1232     721         23   
1            Adelphi University     yes  2186    1924     512         16   
2                Adrian College     yes  1428    1097     336         22   
3           Agnes Scott College     yes   417     349     137         60   
4     Alaska Pacific University     yes   193     146      55         16   

   Top25perc  Fundergrad  Pundergrad  Outstate  RoomBoard  Books  Personal  \
0         52        2885         537      7440       3300    450      2200   
1         29        2683        1227     12280       6450    750      1500   
2         50        1036          99     11250       3750    400      1165   
3         89         510          63     12960       5450    450       875   
4         44         249         869      7560       4120    800      1500   

   PhD  Terminal  SFRatio  percalumni  Expend  GradRate  
0   70        78

**3. Produce a Numerical Summary of the Variables in the Data Set**

In [6]:
college.describe()

| | Apps | Accept | Enroll | Top10perc | Top25perc | Fundergrad | Pundergrad | Outstate | RoomBoard | Books | Personal | PhD | Terminal | SFRatio | percalumni | Expend | GradRate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 | 777.0 |
| mean | 3001.638353 | 2018.804376 | 779.972973 | 27.558559 | 55.796654 | 3699.907336 | 855.298584 | 10440.669241 | 4357.526384 | 549.380952 | 1340.642214 | 72.660232 | 79.702703 | 14.089704 | 22.743887 | 9660.171171 | 65.46332 |
| std | 3870.201484 | 2451.113971 | 929.17619 | 17.640364 | 19.804778 | 4850.420531 | 1522.431887 | 4023.016484 | 1096.696416 | 165.10536 | 677.071454 | 16.328155 | 14.722359 | 3.958349 | 12.391801 | 5221.76844 | 17.17771 |
| min | 81.0 | 72.0 | 35.0 | 1.0 | 9.0 | 139.0 | 1.0 | 2340.0 | 1780.0 | 96.0 | 250.0 | 8.0 | 24.0 | 2.5 | 0.0 | 3186.0 | 10.0 |
| 25% | 776.0 | 604.0 | 242.0 | 15.0 | 41.0 | 992.0 | 95.0 | 7320.0 | 3597.0 | 470.0 | 850.0 | 62.0 | 71.0 | 11.5 | 13.0 | 6751.0 | 53.0 |
| 50% | 1558.0 | 1110.0 | 434.0 | 23.0 | 54.0 | 1707.0 | 353.0 | 9990.0 | 4200.0 | 500.0 | 1200.0 | 75.0 | 82.0 | 13.6 | 21.0 | 8377.0 | 65.0 |
| 75% | 3624.0 | 2424.0 | 902.0 | 35.0 | 69.0 | 4005.0 | 967.0 | 12925.0 | 5050.0 | 600.0 | 1700.0 | 85.0 | 92.0 | 16.5 | 31.0 | 10830.0 | 78.0 |
| max | 48094.0 | 26330.0 | 6392.0 | 96.0 | 100.0 | 31643.0 | 21836.0 | 21700.0 | 8124.0 | 2340.0 | 6800.0 | 103.0 | 100.0 | 39.8 | 64.0 | 56233.0 | 118.0 |


* All attributes have a count of 777, indicating **no missing values** in any of the numerical features.
* For the application attribute `Apps`:
    - min = 81, mean = 3001, max = 48094, std = 3870
    - Since the mean is 3001 but the maximum is 48094 (more than 11 standard deviations above the mean), `Apps` likely contains **outliers**.
* The attributes `Accept`, `Enroll`, `Fundergrad`, `Pundergrad`, `Outstate`, and `Expend` are also suspected to contain outliers.
* In addition, `GradRate` has a maximum value of 118, which points to a data error, since a graduation rate cannot exceed 100%.
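
The two data-quality observations above (no missing values, and an impossible `GradRate` maximum) can also be checked programmatically. The following is a minimal sketch, assuming a small made-up DataFrame stands in for `college`; the helper name `rows_above_limit` is hypothetical and not part of pandas.

```python
import pandas as pd

def rows_above_limit(df, column, limit=100):
    """Return the rows whose value in `column` exceeds `limit`."""
    return df[df[column] > limit]

# Toy stand-in for the `college` DataFrame (values are illustrative only).
toy = pd.DataFrame({
    "College": ["A", "B", "C"],
    "GradRate": [65, 118, 100],
})

# Number of missing values per column; all zeros means nothing is missing.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows with an impossible graduation rate (greater than 100%).
bad_rows = rows_above_limit(toy, "GradRate")
print(bad_rows)
```

On the real data, `rows_above_limit(college, 'GradRate')` would list the institutions behind the maximum of 118.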

In [7]:
outlier_suspected_columns = ['Apps', 'Accept', 'Enroll', 'Fundergrad', 'Pundergrad', 'Outstate', 'Expend']

**4. Coefficient Of Variation**

In [8]:
print("--- Coefficient of Variation (CV) ---")
for col in outlier_suspected_columns:
    mean = college[col].mean()
    std = college[col].std()
    if mean != 0:
        cv = (std / mean) * 100
        print(f"CV for {col}: {cv:.2f}%")
    else:
        print(f"CV for {col}: Mean is zero, cannot calculate CV.")

--- Coefficient of Variation (CV) ---
CV for Apps: 128.94%
CV for Accept: 121.41%
CV for Enroll: 119.13%
CV for Fundergrad: 131.10%
CV for Pundergrad: 178.00%
CV for Outstate: 38.53%
CV for Expend: 54.05%
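
As a quick check, the CV for `Apps` follows directly from the summary table: 3870.201484 / 3001.638353 × 100 ≈ 128.94%. The same calculation can be done for all columns at once without a loop. This is a sketch only, run on a toy DataFrame; the helper name `coefficient_of_variation` is an assumption for illustration.

```python
import pandas as pd

def coefficient_of_variation(df, columns):
    """Return the CV in percent per column: sample std / mean * 100.

    Columns whose mean is zero yield NaN instead of raising an error.
    """
    means = df[columns].mean()
    stds = df[columns].std()  # pandas default is the sample std (ddof=1)
    return (stds / means.where(means != 0)) * 100

# Toy data: "x" varies, "y" is constant (so its CV is 0).
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [10, 10, 10, 10, 10]})
cv = coefficient_of_variation(toy, ["x", "y"])
print(cv.round(2))
```

Because the CV is unitless, it allows a comparison of spread between columns on very different scales, such as `Apps` and `Outstate`.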


**5. Median and Mode**

In [10]:
print("--- Median and Mode ---")
for col in outlier_suspected_columns:
    median = college[col].median()
    mode = college[col].mode()
    mean = college[col].mean()
    print(f"{col}:")
    print(f"  Mean: {mean:.2f}")
    print(f"  Median: {median:.2f}")
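    if not mode.empty:
        # mode() returns every value tied for the highest frequency
        print(f"  Mode: {', '.join(mode.astype(str).tolist())}")
    else:
        # mode() is empty only when the column has no non-missing values
        print("  Mode: undefined (no non-missing values)")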
print("\n")

--- Median and Mode ---
Apps:
  Mean: 3001.64
  Median: 1558.00
  Mode: 440, 663, 1006
Accept:
  Mean: 2018.80
  Median: 1110.00
  Mode: 452
Enroll:
  Mean: 779.97
  Median: 434.00
  Mode: 177, 295
Fundergrad:
  Mean: 3699.91
  Median: 1707.00
  Mode: 500, 662, 959, 1115, 1306, 1345, 1707
Pundergrad:
  Mean: 855.30
  Median: 353.00
  Mode: 30
Outstate:
  Mean: 10440.67
  Median: 9990.00
  Mode: 6550
Expend:
  Mean: 9660.17
  Median: 8377.00
  Mode: 4900, 5935, 6333, 6413, 6433, 6562, 6716, 6719, 6898, 6971, 7041, 7114, 7309, 7348, 7762, 7881, 7940, 8118, 8135, 8189, 8324, 8355, 8604, 8686, 8847, 8954, 9084, 9158, 9209, 9431, 10872, 10912, 10922
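
A long list of tied modes, as for `Expend`, usually means that each of those values occurs only a few times, so it helps to report the frequency of the mode alongside its value. The sketch below does that on a toy Series; `mode_with_count` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def mode_with_count(series):
    """Return (sorted list of modes, number of times each mode occurs)."""
    counts = series.value_counts()
    top = counts.max()
    modes = sorted(counts[counts == top].index.tolist())
    return modes, int(top)

# Toy data with nearly unique values: two values are tied at a count of 2.
nearly_unique = pd.Series([4900, 5935, 5935, 6333, 6333, 7041])
modes, freq = mode_with_count(nearly_unique)
print(modes, freq)
```

A mode that occurs only two or three times among hundreds of rows says little about where the data is centred.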




**6. Skewness Coefficient**

In [11]:
print("--- Skewness Coefficient ---")
for col in outlier_suspected_columns:
    skewness = college[col].skew()
    print(f"Skewness for {col}: {skewness:.2f}")
    if skewness > 1:
        print(f"  - {col} is highly right-skewed.")
    elif skewness < -1:
        print(f"  - {col} is highly left-skewed.")
    elif -0.5 <= skewness <= 0.5:
        print(f"  - {col} is fairly symmetrical.")
    else:
        # 0.5 < |skewness| <= 1
        print(f"  - {col} shows moderate skewness.")

--- Skewness Coefficient ---
Skewness for Apps: 3.72
  - Apps is highly right-skewed.
Skewness for Accept: 3.42
  - Accept is highly right-skewed.
Skewness for Enroll: 2.69
  - Enroll is highly right-skewed.
Skewness for Fundergrad: 2.61
  - Fundergrad is highly right-skewed.
Skewness for Pundergrad: 5.69
  - Pundergrad is highly right-skewed.
Skewness for Outstate: 0.51
  - Outstate shows moderate skewness.
Skewness for Expend: 3.46
  - Expend is highly right-skewed.
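
A rough cross-check that ties the mean/median comparison of step 5 to skewness is Pearson's second (median) skewness coefficient, 3 × (mean − median) / std. Note that this is a different statistic from the moment-based coefficient returned by `Series.skew()`, so the numbers will not match; only the sign and rough ordering are comparable. The sketch below plugs in values copied from the `describe()` output; the function name is an assumption for illustration.

```python
def pearson_median_skewness(mean, median, std):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    return 3 * (mean - median) / std

# Mean, median (50%) and std copied from the describe() output above.
apps = pearson_median_skewness(3001.638353, 1558.0, 3870.201484)
outstate = pearson_median_skewness(10440.669241, 9990.0, 4023.016484)
print(f"Apps: {apps:.2f}, Outstate: {outstate:.2f}")
```

Both values are positive, which agrees with the right skew reported above, and `Outstate` is again much closer to symmetric than `Apps`.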


**Conclusion of Steps 4, 5, and 6: Presence of Potential Outliers**

Since,
* the Coefficient of Variation is high: it exceeds 100% for `Apps`, `Accept`, `Enroll`, `Fundergrad`, and `Pundergrad`, and is lower but still sizeable for `Expend` (54.05%) and `Outstate` (38.53%),
* the comparison of Mean, Median and Mode gives:
    - For `Apps`, Mean > Median > Mode
    - For `Accept`, Mean > Median > Mode
    - For `Enroll`, Mean > Median > Mode
    - For `Fundergrad`, Mean > Median ≥ Mode (the largest of the tied modes, 1707, equals the median)
    - For `Pundergrad`, Mean > Median > Mode
    - For `Outstate`, Mean > Median > Mode
* the Skewness Coefficient of every attribute is greater than 0.5,

we conclude that all these attributes are **right-skewed** and have **potential outliers**, meaning that a few large institutions account for disproportionately high values. `Outstate` is the mildest case, with a skewness of 0.51, only just above 0.5.

For `Expend`, however, many values are tied for the highest frequency. This suggests that the mode is not an informative measure of central tendency for this near-continuous attribute. Although mean > median, we will confirm its skewness through visualizations (histograms and box plots).
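
One common way to turn "potential outliers" into a count is Tukey's 1.5 × IQR rule, which is also the rule most box plots use for their whiskers. This is an assumed technique for illustration, not necessarily the one used in the EDA that follows. The sketch below runs on a five-value toy Series built from the five-number summary of `Apps` in the `describe()` output (min, 25%, 50%, 75%, max); these are not real data rows, and the helper names are hypothetical.

```python
import pandas as pd

def tukey_fences(series, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(series, k=1.5):
    """Count the values lying outside the Tukey fences."""
    lower, upper = tukey_fences(series, k)
    return int(((series < lower) | (series > upper)).sum())

# Five-number summary of Apps, used here as a toy Series.
toy = pd.Series([81, 776, 1558, 3624, 48094])
lower, upper = tukey_fences(toy)
print(lower, upper, count_outliers(toy))
```

With Q1 = 776 and Q3 = 3624, the IQR is 2848 and the upper fence is 3624 + 1.5 × 2848 = 7896, so the `Apps` maximum of 48094 lies far outside it. On the real data, `count_outliers(college[col])` could be applied to each column in `outlier_suspected_columns`.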

----
**FURTHER EXPLORATIONS WILL BE IN EXPLORATORY DATA ANALYSIS (EDA)**

-----