**1. Import Necessary Libraries**

In [2]:
import pandas as pd
import numpy as np

**2. Read the Data into Python**

In [3]:
college = pd.read_csv('College.csv')
print(college.head())

                     Unnamed: 0 Private  Apps  Accept  Enroll  Top10perc  \
0  Abilene Christian University     Yes  1660    1232     721         23   
1            Adelphi University     Yes  2186    1924     512         16   
2                Adrian College     Yes  1428    1097     336         22   
3           Agnes Scott College     Yes   417     349     137         60   
4     Alaska Pacific University     Yes   193     146      55         16   

   Top25perc  F.Undergrad  P.Undergrad  Outstate  Room.Board  Books  Personal  \
0         52         2885          537      7440        3300    450      2200   
1         29         2683         1227     12280        6450    750      1500   
2         50         1036           99     11250        3750    400      1165   
3         89          510           63     12960        5450    450       875   
4         44          249          869      7560        4120    800      1500   

   PhD  Terminal  S.F.Ratio  perc.alumni  Expend  Grad.Rate
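The `Unnamed: 0` column appears because the first header cell of the CSV is presumably empty. The sketch below uses a hypothetical two-row excerpt held in memory instead of the real `College.csv`. It shows the default behaviour and the `index_col=0` alternative, which uses the college names as the row index rather than as a data column. This notebook keeps the default and renames the column later, so the alternative is only an option to consider.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for College.csv; the first header
# cell is assumed to be empty, as the "Unnamed: 0" column above suggests.
csv_text = (
    ",Private,Apps,Accept\n"
    "Abilene Christian University,Yes,1660,1232\n"
    "Adelphi University,Yes,2186,1924\n"
)

# Default behaviour: pandas auto-names the empty header "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# Alternative: treat the first column as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.loc["Adelphi University", "Apps"])
```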

**3. Check for Missing Values**

In [4]:
print(college.isnull().sum())

Unnamed: 0     0
Private        0
Apps           0
Accept         0
Enroll         0
Top10perc      0
Top25perc      0
F.Undergrad    0
P.Undergrad    0
Outstate       0
Room.Board     0
Books          0
Personal       0
PhD            0
Terminal       0
S.F.Ratio      0
perc.alumni    0
Expend         0
Grad.Rate      0
dtype: int64
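Every count above is zero, so the College data has no gaps. A non-zero count would look different. The sketch below runs the same `isnull().sum()` check on a small hypothetical frame that contains one missing value, and then totals the counts across all columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in "Apps".
df = pd.DataFrame({
    "Apps": [1660, 2186, np.nan],
    "Private": ["Yes", "Yes", "No"],
})

# Per-column count of missing cells, then a single overall total.
missing_per_column = df.isnull().sum()
print(missing_per_column)
print("Total missing:", int(missing_per_column.sum()))
```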


**4. Check for Duplicate Entries**

In [5]:
print(college.duplicated().sum())

0
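`duplicated()` only flags rows that are identical in every column. The sketch below uses hypothetical toy rows, including a made-up near-duplicate that differs in one value. It shows that checking a key column with `subset=` can reveal repeated colleges that a whole-row check misses, and how `drop_duplicates` would remove them.

```python
import pandas as pd

# Hypothetical rows: an exact duplicate, plus a near-duplicate (417 vs 418).
df = pd.DataFrame({
    "College": ["Adrian College", "Adrian College",
                "Agnes Scott College", "Agnes Scott College"],
    "Apps": [1428, 1428, 417, 418],
})

# Whole-row duplicates: only the exact copy is flagged.
full_row_dups = df.duplicated().sum()

# Duplicates by college name: both repeated names are flagged.
name_dups = df.duplicated(subset=["College"]).sum()

# Keep the first occurrence of each college name.
deduped = df.drop_duplicates(subset=["College"], keep="first")
print(full_row_dups, name_dups, len(deduped))
```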


**5. Check Data Types**

In [6]:
college.info()  # info() prints its report directly and returns None, so no print() is needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 777 entries, 0 to 776
Data columns (total 19 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   777 non-null    object 
 1   Private      777 non-null    object 
 2   Apps         777 non-null    int64  
 3   Accept       777 non-null    int64  
 4   Enroll       777 non-null    int64  
 5   Top10perc    777 non-null    int64  
 6   Top25perc    777 non-null    int64  
 7   F.Undergrad  777 non-null    int64  
 8   P.Undergrad  777 non-null    int64  
 9   Outstate     777 non-null    int64  
 10  Room.Board   777 non-null    int64  
 11  Books        777 non-null    int64  
 12  Personal     777 non-null    int64  
 13  PhD          777 non-null    int64  
 14  Terminal     777 non-null    int64  
 15  S.F.Ratio    777 non-null    float64
 16  perc.alumni  777 non-null    int64  
 17  Expend       777 non-null    int64  
 18  Grad.Rate    777 non-null    int64  
dtypes: float64(1), int64(16), object(2)
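For a compact view of the same information, the columns can be split into numeric and non-numeric groups. The sketch below uses a small hypothetical frame with College-like column names. The `SFRatio` values are made up. The split shows which columns should be text (the college name and `Private`) and confirms that all remaining columns are numeric.

```python
import pandas as pd

# Hypothetical sample with College-like columns (SFRatio values are made up).
df = pd.DataFrame({
    "College": ["Abilene Christian University", "Adelphi University"],
    "Private": ["Yes", "Yes"],
    "Apps": [1660, 2186],
    "SFRatio": [18.1, 12.2],
})

# How many columns share each dtype.
print(df.dtypes.value_counts())

# Partition columns into numeric and non-numeric groups.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Non-numeric:", other_cols)
```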

**6. Standardize Categorical Data**

In [7]:
college['Private'] = college['Private'].str.lower()
college['Private'] = college['Private'].str.strip()

print(college['Private'].unique())

['yes' 'no']
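The two string operations can also be chained. It helps to follow them with a check that only the expected labels remain. The sketch below uses hypothetical messy inputs to show why both steps matter: `.str.strip()` removes both leading and trailing whitespace, and `.str.lower()` removes differences in case.

```python
import pandas as pd

# Hypothetical messy values of the kind this step guards against.
s = pd.Series(["Yes", " yes", "NO ", "No"])

# Remove surrounding whitespace, then normalise case.
cleaned = s.str.strip().str.lower()
print(sorted(cleaned.unique()))

# Fail loudly if any label other than "yes"/"no" survives.
unexpected = set(cleaned) - {"yes", "no"}
if unexpected:
    raise ValueError(f"Unexpected Private values: {unexpected}")
```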


**7. Rename Columns**

In [8]:
college = college.rename(columns={'Unnamed: 0': 'College', 'F.Undergrad': 'Fundergrad', 'P.Undergrad': 'Pundergrad', 'Room.Board': 'RoomBoard', 'S.F.Ratio': 'SFRatio', 'perc.alumni': 'percalumni', 'Grad.Rate': 'GradRate'})
print(college.head())

                        College Private  Apps  Accept  Enroll  Top10perc  \
0  Abilene Christian University     yes  1660    1232     721         23   
1            Adelphi University     yes  2186    1924     512         16   
2                Adrian College     yes  1428    1097     336         22   
3           Agnes Scott College     yes   417     349     137         60   
4     Alaska Pacific University     yes   193     146      55         16   

   Top25perc  Fundergrad  Pundergrad  Outstate  RoomBoard  Books  Personal  \
0         52        2885         537      7440       3300    450      2200   
1         29        2683        1227     12280       6450    750      1500   
2         50        1036          99     11250       3750    400      1165   
3         89         510          63     12960       5450    450       875   
4         44         249         869      7560       4120    800      1500   

   PhD  Terminal  SFRatio  percalumni  Expend  GradRate  
0   70        78
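Instead of listing every dotted name by hand, the dots could be stripped programmatically. The sketch below is an alternative, not the notebook's mapping. It applies the approach to a hypothetical empty frame that has the original column names. One difference is deliberate: removing dots keeps the existing capitalisation, so `F.Undergrad` becomes `FUndergrad` rather than the `Fundergrad` chosen above.

```python
import pandas as pd

# Hypothetical empty frame carrying the original column names.
df = pd.DataFrame(columns=["Unnamed: 0", "Private", "F.Undergrad", "P.Undergrad",
                           "Room.Board", "S.F.Ratio", "perc.alumni", "Grad.Rate"])

# Give the unlabeled first column a meaningful name.
df = df.rename(columns={"Unnamed: 0": "College"})

# Remove every literal dot from the column names (regex=False: "." is not a pattern).
df.columns = df.columns.str.replace(".", "", regex=False)
print(df.columns.tolist())
```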

**8. Check for Inconsistent Data (Negative Values)**

In [9]:
numerical_cols = college.select_dtypes(include=np.number).columns
for col in numerical_cols:
    invalid = college[college[col] < 0]

    if not invalid.empty:
        print(f"Invalid (negative) values found in '{col}':\n", invalid)
        print("-" * 30)
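The loop above prints nothing, which means no negative values were found. The same test can be done in one vectorised step. Negativity is not the only possible inconsistency, so the sketch below also adds two domain checks. Both are assumptions about the data, not steps from this notebook: `GradRate` is treated as a percentage that should lie in [0, 100], and `Accept` should not exceed `Apps`. The toy rows, including the out-of-range value 118, are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; GradRate 118 is a made-up out-of-range value.
df = pd.DataFrame({
    "Apps": [1660, 2186, 1428],
    "Accept": [1232, 1924, 1097],
    "GradRate": [60, 56, 118],
})

# Vectorised negative-value count per numeric column.
numeric = df.select_dtypes(include=np.number)
negatives_per_column = (numeric < 0).sum()
print(negatives_per_column)

# Assumption: percentage columns must lie in [0, 100].
pct_cols = ["GradRate"]
out_of_range = df[((df[pct_cols] < 0) | (df[pct_cols] > 100)).any(axis=1)]
print(out_of_range)

# Assumption: a college cannot accept more applicants than applied.
accept_gt_apps = df[df["Accept"] > df["Apps"]]
print(accept_gt_apps)
```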

**9. Save Cleaned Data**

In [10]:
college.to_csv('cleaned_college_data.csv', index=False)
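A quick way to confirm the saved file matches the frame in memory is to read it back and compare. The sketch below does this with a small hypothetical frame in a temporary directory, so it does not touch the real `cleaned_college_data.csv`. The `SFRatio` values are made up. Passing `index=False` keeps the RangeIndex out of the file, so the reloaded frame has the same columns as the original.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned rows (SFRatio values are made up).
df = pd.DataFrame({
    "College": ["Adrian College", "Agnes Scott College"],
    "Private": ["yes", "yes"],
    "Apps": [1428, 417],
    "SFRatio": [13.9, 7.7],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaned_college_data.csv")
    df.to_csv(path, index=False)   # same call pattern as the cell above
    reloaded = pd.read_csv(path)   # read the file back

# Raises AssertionError if values, columns, or dtypes differ.
pd.testing.assert_frame_equal(df, reloaded)
print("Round trip OK")
```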

**Conclusion:**

- No missing values.
- No duplicate records.
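- Appropriate data types for all attributes.
- Converted the values of the categorical attribute `Private` to lowercase and removed leading and trailing whitespace.
- Renamed the unlabeled first column to `College` and removed dots from column names, so that columns can be used with attribute-style access such as `college.GradRate`.
- No negative values in any numeric attribute.
- Saved the cleaned data to `cleaned_college_data.csv`.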
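The checks summarised above can be bundled into one reusable function. The sketch below is a hypothetical helper, `quality_report`, which is not part of the notebook. It returns the missing-value, duplicate, negative-value, and dotted-column counts for any DataFrame, and here it runs on a small hypothetical frame.

```python
import numpy as np
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper bundling this notebook's data-quality checks."""
    numeric = df.select_dtypes(include=np.number)
    return {
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "negative_values": int((numeric < 0).sum().sum()),
        "columns_with_dots": [c for c in df.columns if "." in c],
    }


# Hypothetical cleaned sample.
df = pd.DataFrame({
    "College": ["Adrian College", "Agnes Scott College"],
    "Private": ["yes", "yes"],
    "Apps": [1428, 417],
})
report = quality_report(df)
print(report)
```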