Step 1: Getting to know the data
Open the CSV file and view the first 10 rows.
Check the column names and the number of columns.
Count the missing values in each column.
Get a sense of the data types (string, numeric, date, JSON).
Step 2: Basic string cleaning
Strip leading and trailing whitespace from every string column (strip).
Replace empty strings with NaN or null.
Step 3: Cleaning numbers and dates
Clean numeric columns such as age, score, GPA, attendance, and money_spent and convert them to int or float.
Convert the date columns to pandas datetime.
Step 4: Email and phone validation
Convert emails to lower case and flag emails with an invalid format.
Clean phone numbers and bring them to a single standard format (for example: +998...).
Step 5: JSON parsing
Parse the JSON column (profile_json).
Split the data inside the JSON into separate columns: hobbies, skills, family, devices.
Check arrays and nested objects inside the JSON and flatten them.
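
The profile_json values displayed in this notebook are not strict JSON: most look like Python dict literals with single quotes, some have unquoted keys ({hobbies:[...]}), and some are the placeholder INVALID_JSON_DATA. A minimal sketch of Step 5 under those assumptions follows. The function name parse_profile, the sample strings, and the profile_ column prefix are illustrative and not part of the original notebook, and the key-quoting regex is a heuristic that could misfire on values that themselves contain text like "word:".

```python
import ast
import json
import re

import pandas as pd


def parse_profile(text):
    """Parse a messy JSON-like string into a dict; return None when that fails."""
    if not isinstance(text, str):
        return None
    # Try strict JSON first, then Python dict literals (single quotes).
    for loader in (json.loads, ast.literal_eval):
        try:
            result = loader(text)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(result, dict):
            return result
    # Heuristic repair for bare keys such as {hobbies:[...]}: quote them and retry.
    repaired = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)
    try:
        result = ast.literal_eval(repaired)
    except (ValueError, SyntaxError, TypeError):
        return None
    return result if isinstance(result, dict) else None


# Invented sample values that imitate the three shapes seen in the data.
sample = pd.Series([
    "{'hobbies': ['gun', 'nice'], 'skills': {'tech': ['python']}}",
    "{hobbies:['against', 'good']}",
    "INVALID_JSON_DATA",
])
parsed = sample.apply(parse_profile)

# Flatten nested objects into columns such as profile_skills_tech; lists stay as lists.
records = [p if isinstance(p, dict) else {} for p in parsed]
flat = pd.json_normalize(records, sep='_').add_prefix('profile_')
```

On the real frame the same idea would be applied to df['profile_json'], and the flattened columns could then be joined back by index. Rows that fail to parse end up with missing values in every profile_ column, which makes them easy to count.
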
Step 6: Address parsing
Split the address column into separate parts and keys: city, district, postal code.
Keep the raw address column and create new columns (addr_city, addr_district, addr_postal).
Step 7: Checking duplicates and missing data
Find and remove duplicate rows.
Check for missing values in the important columns and decide how to handle them (fillna or dropna).
Step 8: Data normalization
Bring the gender column to a standard format: Male / Female / Unknown.
Map the course column to consistent names: Data Science / Python / Other.
Bring the status column to a standard format (lower case or otherwise uniform).
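
The course labels displayed in this notebook include variants such as DATA SCIENCE, data-sciens, data_sciense, and ds. One possible way to collapse them into Data Science / Python / Other is sketched below. The function normalize_course and its matching rules are assumptions made for illustration, so the rules may need extending once all distinct labels in the real column have been listed.

```python
import re

import pandas as pd


def normalize_course(value):
    """Map a raw course label to 'Data Science', 'Python', or 'Other'."""
    if not isinstance(value, str):
        return 'Other'
    # Keep letters only so 'data-sciens', 'data_sciense' and 'DATA SCIENCE' compare alike.
    key = re.sub(r'[^a-z]', '', value.lower())
    if key == 'ds' or (key.startswith('data') and 'scien' in key):
        return 'Data Science'
    if key == 'python':
        return 'Python'
    return 'Other'


# Labels taken from the rows displayed in this notebook, plus one invented label ('Excel').
courses = pd.Series(['Data Science', 'DATA SCIENCE', 'data-sciens', 'data_sciense', 'ds', 'python', 'Excel'])
normalized = courses.map(normalize_course)
```

Running df['course'].value_counts() before and after the mapping is a quick way to confirm that no unexpected label was silently sent to Other.
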
Step 9: Final type conversion and export
Verify that every column has the correct type: string, int, float, datetime.
Bring dates to a single format (YYYY-MM-DD HH:MM:SS).
Save the cleaned data to a CSV file (super_dirty_students_cleaned.csv).
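
A small sketch of Step 9, assuming a cleaned frame with a handful of columns. The frame named cleaned, the expected_kinds mapping, and the in-memory buffer are illustrative; in the notebook the target would be the file super_dirty_students_cleaned.csv.

```python
import io

import pandas as pd

# Invented stand-in for the cleaned frame.
cleaned = pd.DataFrame({
    'student_id': [1, 2],
    'gpa': [3.72, 1.88],
    'date_of_join': pd.to_datetime(['2022-09-03 23:22:44', '2017-08-29 00:00:00']),
})

# Expected NumPy dtype kinds: 'i' = integer, 'f' = float, 'M' = datetime.
expected_kinds = {'student_id': 'i', 'gpa': 'f', 'date_of_join': 'M'}
mismatched = {
    col: cleaned[col].dtype.kind
    for col, kind in expected_kinds.items()
    if cleaned[col].dtype.kind != kind
}

# date_format writes every datetime as YYYY-MM-DD HH:MM:SS;
# index=False keeps the row index out of the file.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False, date_format='%Y-%m-%d %H:%M:%S')
csv_text = buffer.getvalue()
```

Passing a file name instead of the buffer writes to disk. An empty mismatched dictionary means every checked column has the expected kind.
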
Step 10: QA checks
Compare the row counts of the original and cleaned data.
Check the number of missing emails and phones.
Check that values in the numeric columns (GPA, attendance, score) fall within valid ranges.
Confirm that no duplicate rows remain.
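
The QA checks can be gathered into one small report, sketched below. The helper qa_report and the ranges used here (GPA 0 to 4, attendance and score 0 to 100) are assumptions; the notebook itself does not state which ranges count as valid.

```python
import pandas as pd


def qa_report(original, cleaned, ranges):
    """Collect the Step 10 checks into a dictionary of counts."""
    report = {
        'rows_original': len(original),
        'rows_cleaned': len(cleaned),
        'missing_email': int(cleaned['email'].isna().sum()),
        'missing_phone': int(cleaned['phone'].isna().sum()),
        'duplicate_rows': int(cleaned.duplicated().sum()),
    }
    for col, (low, high) in ranges.items():
        values = cleaned[col].dropna()  # missing values are counted separately
        report[f'{col}_out_of_range'] = int((~values.between(low, high)).sum())
    return report


# Invented rows: the third row repeats the first, the second has out-of-range values.
original = pd.DataFrame({
    'email': ['a@x.com', None, 'a@x.com'],
    'phone': ['+1-619-379-4152', None, '+1-619-379-4152'],
    'gpa': [3.72, 15.5, 3.72],
    'attendance': [95.0, 110.0, 95.0],
    'score': [90.0, None, 90.0],
})
cleaned = original.drop_duplicates()
report = qa_report(original, cleaned, {'gpa': (0, 4), 'attendance': (0, 100), 'score': (0, 100)})
```

A non-zero out_of_range count does not say what to do with the rows; values such as a GPA of 15.5 or an attendance of 110 still need an explicit decision (clip, set to missing, or drop).
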

In [20]:
import numpy as np
import pandas as pd
df = pd.read_csv('super_dirty_students.csv')
print(df.columns)
print(df.shape)
df.head(10)


(1000, 18)


Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,$135,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20,Female,ninety,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,$152,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20,,ninety,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,twenty,fmale,81,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0,good,175 USD,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,$282,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
5,6,David Martinez,20,Male,ninety,001-074-828-6016x937,Hugheshaven,nathangibbs@hotmail.com,1603493647,DATA SCIENCE,110,pending,15.5,average,8500,1983-04-30T01:46:36,"Apartment 42, North Jamesfurt district, Tashke...","{hobbies:['order', 'keep']}"
6,7,,18 years,FEMALE,71,314.018.0928x3134,Sharpfurt,nguyenlisa@@horn.org,1982-01-16,data science,,active,0,excellent,275.0,19/11/2004 11:42 PM,"Lake Tracishire 29-kv, dom 8, Tashkent","{'hobbies': ['fall', 'teacher'], 'skills': {'t..."
7,8,Nicholas Dennis,21,FEMALE,ninety,8091556011,Matthewfort,avoid,2000.06.26,python,63,active,four point five,excellent,135,1978/09/23,"70576 Butler Harbor Suite 847, Janetberg",INVALID_JSON_DATA
8,9,,twenty,fmale,77,854-964-5423,South Paulchester,stacysmith@taylor.biz,2000-03-15,ds,110%,active,15.5,excellent,104 USD,1995-10-20,"Williamsborough 2-kv, dom 18, Tashkent","{'hobbies': ['require', 'open'], 'skills': {'t..."
9,10,Elizabeth Villegas,22,femlae,,(205)467-4476x0583,South Dianatown,anthonywalker@@gmail.com,2013-02-11T07:12:29,data_sciense,99,inactive,37,average,21500,2014-12-16T06:44:51,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['management', 'to'], 'skills': {'..."


In [3]:
df.isnull().sum()

student_id        0
name            335
age             108
gender          219
score           246
phone           371
city              0
email           151
date_of_join      0
course            0
attendance      326
status            0
gpa             141
remarks         204
money_spent       0
event_time        0
address_raw       0
profile_json      0
dtype: int64
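
Raw counts are easier to judge next to the column type and the share of rows affected; with 1000 rows, the 371 missing phone values above are 37.1% of the column. A compact overview along those lines could look like the sketch below. The helper overview and the demo frame are invented for illustration.

```python
import numpy as np
import pandas as pd


def overview(frame):
    """Summarize dtype, missing count, missing percentage, and distinct values per column."""
    return pd.DataFrame({
        'dtype': frame.dtypes.astype(str),
        'missing': frame.isna().sum(),
        'missing_pct': (frame.isna().mean() * 100).round(1),
        'n_unique': frame.nunique(),
    })


# Tiny invented frame that mimics the mixed types in the real columns.
demo = pd.DataFrame({
    'age': ['20', 'twenty', None, '18 years'],
    'gpa': [3.72, np.nan, 0.0, 4.37],
})
summary = overview(demo)
```

Calling overview(df) on the full frame would also show which numeric-looking columns (age, score, gpa, money_spent) were read as text and therefore still need cleaning.
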

In [24]:
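# Strip leading/trailing whitespace in every text column; the result must be assigned back.
columns = df.select_dtypes(include=['object']).columns
for column in columns:
    df[column] = df[column].str.strip()
df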

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,$135,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20,Female,ninety,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,$152,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20,,ninety,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,twenty,fmale,81,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0,good,175 USD,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,$282,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22,fmale,,,Gabriellefurt,amy97@woodard.net,1992.11.13,DATA SCIENCE,,inactive,,excellent,$64,14/10/2007 06:53 PM,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {..."
996,997,Scott Clark,20,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024.10.05,DATA SCIENCE,,pending,37,,143,2021-06-21T07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA
997,998,,20,MALE,,264-577-8585,South Mary,someonegmail.com,19/11/2021 02:23 AM,data_sciense,95,inactive,-2,excellent,79.0,1975/05/11,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te..."
998,999,Brittany Barrett,21,FEMALE,86,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,1625723463,data_sciense,110%,active,3.06,average,23800,2015.05.24,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec..."


In [25]:
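# Step 2: treat empty or whitespace-only strings as missing values.
# (Filling with the text 'Nan' would break the float conversions in the later cells.)
df = df.replace(r'^\s*$', np.nan, regex=True)
df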

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,$135,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20,Female,ninety,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,$152,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20,,ninety,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,twenty,fmale,81,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0,good,175 USD,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,$282,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22,fmale,,,Gabriellefurt,amy97@woodard.net,1992.11.13,DATA SCIENCE,,inactive,,excellent,$64,14/10/2007 06:53 PM,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {..."
996,997,Scott Clark,20,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024.10.05,DATA SCIENCE,,pending,37,,143,2021-06-21T07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA
997,998,,20,MALE,,264-577-8585,South Mary,someonegmail.com,19/11/2021 02:23 AM,data_sciense,95,inactive,-2,excellent,79.0,1975/05/11,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te..."
998,999,Brittany Barrett,21,FEMALE,86,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,1625723463,data_sciense,110%,active,3.06,average,23800,2015.05.24,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec..."


In [39]:
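# Write 'twenty' as digits, drop the 'years' suffix, and convert to float.
# Missing values are real NaN here, so they pass through the .str methods untouched.
df['age'] = df['age'].str.replace('twenty', '20', regex=False)
df['age'] = df['age'].str.replace('years', '', regex=False).str.strip()
df['age'] = df['age'].astype(float)
df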

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20.0,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,$135,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20.0,Female,ninety,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,$152,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20.0,,ninety,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,20.0,fmale,81,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0,good,175 USD,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,$282,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,fmale,,,Gabriellefurt,amy97@woodard.net,1992.11.13,DATA SCIENCE,,inactive,,excellent,$64,14/10/2007 06:53 PM,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {..."
996,997,Scott Clark,20.0,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024.10.05,DATA SCIENCE,,pending,37,,143,2021-06-21T07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA
997,998,,20.0,MALE,,264-577-8585,South Mary,someonegmail.com,19/11/2021 02:23 AM,data_sciense,95,inactive,-2,excellent,79.0,1975/05/11,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te..."
998,999,Brittany Barrett,21.0,FEMALE,86,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,1625723463,data_sciense,110%,active,3.06,average,23800,2015.05.24,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec..."


In [41]:
# Write the spelled-out score 'ninety' as digits and convert to float.
df['score'] = df['score'].str.replace('ninety', '90', regex=False)
df['score'] = df['score'].astype(float)
df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20.0,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,$135,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,$152,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20.0,,90.0,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,20.0,fmale,81.0,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0,good,175 USD,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,$282,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,fmale,,,Gabriellefurt,amy97@woodard.net,1992.11.13,DATA SCIENCE,,inactive,,excellent,$64,14/10/2007 06:53 PM,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {..."
996,997,Scott Clark,20.0,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024.10.05,DATA SCIENCE,,pending,37,,143,2021-06-21T07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA
997,998,,20.0,MALE,,264-577-8585,South Mary,someonegmail.com,19/11/2021 02:23 AM,data_sciense,95,inactive,-2,excellent,79.0,1975/05/11,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te..."
998,999,Brittany Barrett,21.0,FEMALE,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,1625723463,data_sciense,110%,active,3.06,average,23800,2015.05.24,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec..."


In [42]:
# Drop the percent sign (e.g. '110%') and convert to float.
df['attendance'] = df['attendance'].str.strip('%')
df['attendance'] = df['attendance'].astype(float)
df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20.0,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,$135,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,$152,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20.0,,90.0,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,20.0,fmale,81.0,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0,good,175 USD,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,$282,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,fmale,,,Gabriellefurt,amy97@woodard.net,1992.11.13,DATA SCIENCE,,inactive,,excellent,$64,14/10/2007 06:53 PM,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {..."
996,997,Scott Clark,20.0,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024.10.05,DATA SCIENCE,,pending,37,,143,2021-06-21T07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA
997,998,,20.0,MALE,,264-577-8585,South Mary,someonegmail.com,19/11/2021 02:23 AM,data_sciense,95.0,inactive,-2,excellent,79.0,1975/05/11,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te..."
998,999,Brittany Barrett,21.0,FEMALE,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,1625723463,data_sciense,110.0,active,3.06,average,23800,2015.05.24,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec..."


In [44]:
# Write the spelled-out value as digits, use '.' as the decimal separator, and convert to float.
df['gpa'] = df['gpa'].str.replace('four point five', '4.5', regex=False)
df['gpa'] = df['gpa'].str.replace(',', '.', regex=False)
df['gpa'] = df['gpa'].astype(float)
df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20.0,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,$135,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,$152,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20.0,,90.0,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,20.0,fmale,81.0,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0.00,good,175 USD,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,$282,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,fmale,,,Gabriellefurt,amy97@woodard.net,1992.11.13,DATA SCIENCE,,inactive,,excellent,$64,14/10/2007 06:53 PM,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {..."
996,997,Scott Clark,20.0,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024.10.05,DATA SCIENCE,,pending,3.70,,143,2021-06-21T07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA
997,998,,20.0,MALE,,264-577-8585,South Mary,someonegmail.com,19/11/2021 02:23 AM,data_sciense,95.0,inactive,-2.00,excellent,79.0,1975/05/11,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te..."
998,999,Brittany Barrett,21.0,FEMALE,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,1625723463,data_sciense,110.0,active,3.06,average,23800,2015.05.24,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec..."


In [None]:
# Remove the currency markers ('$', 'USD'), use '.' as the decimal separator, and convert to float.
df['money_spent'] = df['money_spent'].str.strip('$')
df['money_spent'] = df['money_spent'].str.replace(',', '.', regex=False)
df['money_spent'] = df['money_spent'].str.replace('USD', '', regex=False).str.strip()
df['money_spent'] = df['money_spent'].astype(float)
df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,attendance,status,gpa,remarks,money_spent,event_time,address_raw,profile_json
0,1,Claudia Short,20.0,,,+1-619-379-4152x102,Katieland,someonegmail.com,1662247364,Data Science,,active,3.72,good,135.0,1629312830,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'..."
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017/08/29,Data Science,,active,1.88,excellent,152.0,11/10/2001 04:19 AM,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}"
2,3,Kathryn Moyer,20.0,,90.0,,Lake Stevenmouth,,2017-08-14,DATA SCIENCE,,pending,,excellent,185.0,1657837622,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t..."
3,4,Ruben Wilson,20.0,fmale,81.0,,Port Pamelafort,special,1973-09-17,data-sciens,,inactive,0.00,good,175.0,1682795130,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech..."
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023/11/25,data_sciense,,pending,4.37,average,282.0,2013-10-13,"BROKEN,ADDRESS,DATA,,,","{'hobbies': ['sort', 'science'], 'skills': {'t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,fmale,,,Gabriellefurt,amy97@woodard.net,1992.11.13,DATA SCIENCE,,inactive,,excellent,64.0,14/10/2007 06:53 PM,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {..."
996,997,Scott Clark,20.0,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024.10.05,DATA SCIENCE,,pending,3.70,,143.0,2021-06-21T07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA
997,998,,20.0,MALE,,264-577-8585,South Mary,someonegmail.com,19/11/2021 02:23 AM,data_sciense,95.0,inactive,-2.00,excellent,79.0,1975/05/11,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te..."
998,999,Brittany Barrett,21.0,FEMALE,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,1625723463,data_sciense,110.0,active,3.06,average,238.0,2015.05.24,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec..."
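
The per-column cells above repeat the same pattern: replace spelled-out numbers, remove unit markers, fix the decimal separator, convert. A single helper is one way to express that pattern, sketched here under the assumption that a comma always means a decimal separator (as the cells above treat it) and that the word list covers every spelled-out value. The names WORD_NUMBERS and to_number and the sample values are illustrative.

```python
import re

import pandas as pd

# Spelled-out numbers seen in the displayed rows; extend as more turn up.
WORD_NUMBERS = {'twenty': '20', 'ninety': '90', 'four point five': '4.5'}


def to_number(series):
    """Convert messy numeric text ('$135', '175 USD', '110%', '18 years') to float."""
    def clean(value):
        if pd.isna(value):
            return None
        text = str(value).strip().lower()
        for word, digits in WORD_NUMBERS.items():
            text = text.replace(word, digits)
        text = text.replace(',', '.')          # comma is treated as a decimal separator
        return re.sub(r'[^0-9.\-]', '', text)  # drop '$', 'usd', '%', 'years', spaces

    # Anything that still is not a number becomes NaN instead of raising.
    return pd.to_numeric(series.map(clean), errors='coerce').astype('float64')


# Invented sample values in the styles seen across age, attendance, gpa and money_spent.
raw = pd.Series(['$135', '175 USD', '238,00', 'twenty', '18 years', '110%', None, 'four point five', '-2'])
numbers = to_number(raw)
```

Unlike astype(float), this version does not stop on an unexpected string, so it is worth comparing the missing-value count before and after the conversion to see how many values were coerced to NaN.
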


In [None]:
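def convert(value):
    # Numeric values are Unix timestamps in seconds; everything else is parsed as a date string.
    try:
        return pd.to_datetime(float(value), unit='s')
    except (TypeError, ValueError):
        # Note: ambiguous strings such as '11/10/2001' are read month-first by default.
        return pd.to_datetime(value)

df['date_of_join'] = df['date_of_join'].apply(convert)
df['event_time'] = df['event_time'].apply(convert)
df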
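
The date columns mix Unix timestamps with several string layouts, and strings such as 11/10/2001 are ambiguous between month-first and day-first. The sketch below makes that choice explicit and returns NaT for unparseable text. The function parse_date is hypothetical, and treating every all-digit cell as seconds since 1970 is an assumption.

```python
import pandas as pd


def parse_date(value, dayfirst=False):
    """Parse one cell that is either a Unix timestamp in seconds or a date string."""
    if pd.isna(value):
        return pd.NaT
    text = str(value).strip()
    if text.isdigit():
        # All-digit cells are treated as seconds since 1970-01-01 (an assumption).
        return pd.to_datetime(int(text), unit='s')
    # Unparseable strings become NaT instead of raising.
    return pd.to_datetime(text, errors='coerce', dayfirst=dayfirst)


epoch = parse_date('1662247364')
month_first = parse_date('11/10/2001 04:19 AM')
day_first = parse_date('11/10/2001 04:19 AM', dayfirst=True)
broken = parse_date('not a date')
```

Values such as 19/11/2021 elsewhere in the data can only be day-first, which suggests checking with the data owner whether dayfirst=True is the right setting for the whole column.
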

In [None]:
df['email']=df['email'].str.lower()
df
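
Lower-casing alone does not flag the malformed addresses visible in the data (someonegmail.com, nguyenlisa@@horn.org, special). A simple pattern check can mark them, as in the sketch below. EMAIL_PATTERN is a deliberately loose, illustrative rule rather than a full address validator, and the last sample value is invented.

```python
import pandas as pd

# Loose rule: local part, exactly one '@', then a domain containing at least one dot.
EMAIL_PATTERN = r'[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+'

emails = pd.Series([
    'someonegmail.com',
    'psmith@chen.com',
    None,
    'nguyenlisa@@horn.org',
    'special',
    ' Amy97@Woodard.NET ',
])
normalized = emails.str.strip().str.lower()
# fullmatch requires the whole string to fit the pattern; missing values count as invalid.
is_valid = normalized.str.fullmatch(EMAIL_PATTERN, na=False)
```

On the real frame the mask could be stored in a separate column (for example a hypothetical email_valid) so that invalid addresses stay visible for review instead of being overwritten.
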

In [113]:
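# Drop the extension after 'x', keep digits only, and take the last 10 digits.
digits = df['phone'].str.split('x').str[0].str.replace(r'\D', '', regex=True).str[-10:]
# Keep only complete 10-digit numbers; anything shorter becomes missing.
df['new_phone'] = digits.where(digits.str.len() == 10)
# Format as +1-AAA-BBB-CCCC (missing numbers stay NaN).
df['phone'] = '+1-' + df['new_phone'].str[0:3] + '-' + df['new_phone'].str[3:6] + '-' + df['new_phone'].str[6:]
df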

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,...,money_spent,event_time,address_raw,profile_json,add_house,add_district,add_city,add_country,add_postal,new_phone
0,1,Claudia Short,20.0,unknown,,+1-619-379-4152,Katieland,someonegmail.com,2022-09-03 23:22:44,Data Science,...,135.0,2021-08-18 18:53:50,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'...",Apartment 37,South Kevin district,Tashkent,UZ,100539,6193794152
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017-08-29 00:00:00,Data Science,...,152.0,2001-11-10 04:19:00,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}",UZ 100332 Tashkent South Patricia,,,,,
2,3,Kathryn Moyer,20.0,unknown,90.0,,Lake Stevenmouth,,2017-08-14 00:00:00,Data Science,...,185.0,2022-07-14 22:27:02,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t...",Wendyshire 12-kv,dom 1,Tashkent,,,
3,4,Ruben Wilson,20.0,Female,81.0,,Port Pamelafort,special,1973-09-17 00:00:00,Data Science,...,175.0,2023-04-29 19:05:30,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech...",Apartment 16,North Tamara district,Tashkent,UZ,100097,
4,5,Robert Pruitt,20.0,Female,,+1-182-659-5631,Kingburgh,six,2023-11-25 00:00:00,Data Science,...,282.0,2013-10-13 00:00:00,,"{'hobbies': ['sort', 'science'], 'skills': {'t...",,,,,,1826595631
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,Female,,,Gabriellefurt,amy97@woodard.net,1992-11-13 00:00:00,Data Science,...,64.0,2007-10-14 18:53:00,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {...",Apartment 12,Laurenhaven district,Tashkent,UZ,100254,
996,997,Scott Clark,20.0,unknown,,+1-205-065-8737,Rogerborough,katiepage@hensley.com,2024-10-05 00:00:00,Data Science,...,143.0,2021-06-21 07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA,Patrickmouth 4-kv,dom 6,Tashkent,,,2050658737
997,998,,20.0,Male,,+1-264-577-8585,South Mary,someonegmail.com,2021-11-19 02:23:00,Data Science,...,79.0,1975-05-11 00:00:00,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te...",Apartment 38,New Brendan district,Tashkent,UZ,100394,2645778585
998,999,Brittany Barrett,21.0,Female,86.0,+1-663-118-1327,Port Rachael,richardgreen@@shannon-jenkins.info,2021-07-08 05:51:03,Data Science,...,238.0,2015-05-24 00:00:00,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec...",25998 Martinez Grove Apt. 473,West Scott,,,,6631181327


Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,...,remarks,money_spent,event_time,address_raw,profile_json,add_house,add_district,add_city,add_country,add_postal
0,1,Claudia Short,20.0,unknown,,+1-619-379-4152x102,Katieland,someonegmail.com,2022-09-03 23:22:44,Data Science,...,good,135.0,2021-08-18 18:53:50,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'...",Apartment 37,South Kevin district,Tashkent,UZ,100539
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017-08-29 00:00:00,Data Science,...,excellent,152.0,2001-11-10 04:19:00,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}",UZ 100332 Tashkent South Patricia,,,,
2,3,Kathryn Moyer,20.0,unknown,90.0,,Lake Stevenmouth,,2017-08-14 00:00:00,Data Science,...,excellent,185.0,2022-07-14 22:27:02,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t...",Wendyshire 12-kv,dom 1,Tashkent,,
3,4,Ruben Wilson,20.0,Female,81.0,,Port Pamelafort,special,1973-09-17 00:00:00,Data Science,...,good,175.0,2023-04-29 19:05:30,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech...",Apartment 16,North Tamara district,Tashkent,UZ,100097
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023-11-25 00:00:00,Data Science,...,average,282.0,2013-10-13 00:00:00,,"{'hobbies': ['sort', 'science'], 'skills': {'t...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,Female,,,Gabriellefurt,amy97@woodard.net,1992-11-13 00:00:00,Data Science,...,excellent,64.0,2007-10-14 18:53:00,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {...",Apartment 12,Laurenhaven district,Tashkent,UZ,100254
996,997,Scott Clark,20.0,unknown,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024-10-05 00:00:00,Data Science,...,,143.0,2021-06-21 07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA,Patrickmouth 4-kv,dom 6,Tashkent,,
997,998,,20.0,Male,,264-577-8585,South Mary,someonegmail.com,2021-11-19 02:23:00,Data Science,...,excellent,79.0,1975-05-11 00:00:00,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te...",Apartment 38,New Brendan district,Tashkent,UZ,100394
998,999,Brittany Barrett,21.0,Female,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,2021-07-08 05:51:03,Data Science,...,average,238.0,2015-05-24 00:00:00,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec...",25998 Martinez Grove Apt. 473,West Scott,,,


In [68]:
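# Drop trailing commas and blank out the 'BROKEN,ADDRESS,DATA' placeholder.
# Note: this overwrites address_raw; copy it to another column first if the raw text must be kept.
df['address_raw'] = df['address_raw'].str.strip(',')
df['address_raw'] = df['address_raw'].str.replace('BROKEN,ADDRESS,DATA', '', regex=False)
# Split into at most five comma-separated parts (n=4) and trim the spaces around each part.
parts = df['address_raw'].str.split(',', n=4, expand=True).apply(lambda col: col.str.strip())
df[['add_house', 'add_district', 'add_city', 'add_country', 'add_postal']] = parts
df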

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,...,remarks,money_spent,event_time,address_raw,profile_json,add_house,add_district,add_city,add_country,add_postal
0,1,Claudia Short,20.0,,,+1-619-379-4152x102,Katieland,someonegmail.com,2022-09-03 23:22:44,Data Science,...,good,135.0,2021-08-18 18:53:50,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'...",Apartment 37,South Kevin district,Tashkent,UZ,100539
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017-08-29 00:00:00,Data Science,...,excellent,152.0,2001-11-10 04:19:00,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}",UZ 100332 Tashkent South Patricia,,,,
2,3,Kathryn Moyer,20.0,,90.0,,Lake Stevenmouth,,2017-08-14 00:00:00,DATA SCIENCE,...,excellent,185.0,2022-07-14 22:27:02,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t...",Wendyshire 12-kv,dom 1,Tashkent,,
3,4,Ruben Wilson,20.0,fmale,81.0,,Port Pamelafort,special,1973-09-17 00:00:00,data-sciens,...,good,175.0,2023-04-29 19:05:30,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech...",Apartment 16,North Tamara district,Tashkent,UZ,100097
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023-11-25 00:00:00,data_sciense,...,average,282.0,2013-10-13 00:00:00,,"{'hobbies': ['sort', 'science'], 'skills': {'t...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,fmale,,,Gabriellefurt,amy97@woodard.net,1992-11-13 00:00:00,DATA SCIENCE,...,excellent,64.0,2007-10-14 18:53:00,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {...",Apartment 12,Laurenhaven district,Tashkent,UZ,100254
996,997,Scott Clark,20.0,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024-10-05 00:00:00,DATA SCIENCE,...,,143.0,2021-06-21 07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA,Patrickmouth 4-kv,dom 6,Tashkent,,
997,998,,20.0,MALE,,264-577-8585,South Mary,someonegmail.com,2021-11-19 02:23:00,data_sciense,...,excellent,79.0,1975-05-11 00:00:00,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te...",Apartment 38,New Brendan district,Tashkent,UZ,100394
998,999,Brittany Barrett,21.0,FEMALE,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,2021-07-08 05:51:03,data_sciense,...,average,238.0,2015-05-24 00:00:00,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec...",25998 Martinez Grove Apt. 473,West Scott,,,
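
Splitting on commas only works for rows that follow the house, district, city, country, postal layout; a row such as UZ 100332 Tashkent South Patricia ends up entirely in add_house. A pattern-based alternative is sketched below. It assumes a six-digit postal code, the literal country code UZ, and the literal city Tashkent, all inferred from the displayed rows, and the sample strings are reconstructed or invented for illustration. The column names follow the addr_ naming from the plan in Step 6.

```python
import pandas as pd

# Illustrative addresses modeled on the shapes shown in the output above.
addresses = pd.Series([
    'Apartment 37, South Kevin district, Tashkent, UZ, 100539',
    'UZ 100332 Tashkent South Patricia',
    'Wendyshire 12-kv, dom 1, Tashkent',
    'BROKEN,ADDRESS,DATA,,,',
])

# Each field is extracted independently of where the commas are.
parsed = pd.DataFrame({
    'addr_postal': addresses.str.extract(r'\b(\d{6})\b', expand=False),
    'addr_country': addresses.str.extract(r'\b(UZ)\b', expand=False),
    'addr_city': addresses.str.extract(r'\b(Tashkent)\b', expand=False),
    'addr_district': addresses.str.extract(r'([^,]+?)\s+district\b', expand=False).str.strip(),
})
```

Because the raw series is never modified, this approach also keeps the original address text intact, as the plan asks.
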


In [69]:
# drop_duplicates() returns a new DataFrame, so assign the result back.
df = df.drop_duplicates()
df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,...,remarks,money_spent,event_time,address_raw,profile_json,add_house,add_district,add_city,add_country,add_postal
0,1,Claudia Short,20.0,,,+1-619-379-4152x102,Katieland,someonegmail.com,2022-09-03 23:22:44,Data Science,...,good,135.0,2021-08-18 18:53:50,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'...",Apartment 37,South Kevin district,Tashkent,UZ,100539
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017-08-29 00:00:00,Data Science,...,excellent,152.0,2001-11-10 04:19:00,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}",UZ 100332 Tashkent South Patricia,,,,
2,3,Kathryn Moyer,20.0,,90.0,,Lake Stevenmouth,,2017-08-14 00:00:00,DATA SCIENCE,...,excellent,185.0,2022-07-14 22:27:02,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t...",Wendyshire 12-kv,dom 1,Tashkent,,
3,4,Ruben Wilson,20.0,fmale,81.0,,Port Pamelafort,special,1973-09-17 00:00:00,data-sciens,...,good,175.0,2023-04-29 19:05:30,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech...",Apartment 16,North Tamara district,Tashkent,UZ,100097
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023-11-25 00:00:00,data_sciense,...,average,282.0,2013-10-13 00:00:00,,"{'hobbies': ['sort', 'science'], 'skills': {'t...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,fmale,,,Gabriellefurt,amy97@woodard.net,1992-11-13 00:00:00,DATA SCIENCE,...,excellent,64.0,2007-10-14 18:53:00,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {...",Apartment 12,Laurenhaven district,Tashkent,UZ,100254
996,997,Scott Clark,20.0,,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024-10-05 00:00:00,DATA SCIENCE,...,,143.0,2021-06-21 07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA,Patrickmouth 4-kv,dom 6,Tashkent,,
997,998,,20.0,MALE,,264-577-8585,South Mary,someonegmail.com,2021-11-19 02:23:00,data_sciense,...,excellent,79.0,1975-05-11 00:00:00,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te...",Apartment 38,New Brendan district,Tashkent,UZ,100394
998,999,Brittany Barrett,21.0,FEMALE,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,2021-07-08 05:51:03,data_sciense,...,average,238.0,2015-05-24 00:00:00,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec...",25998 Martinez Grove Apt. 473,West Scott,,,
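
The duplicate-removal step can be checked by comparing row counts before and after. The following is a minimal sketch on a small made-up frame; `demo`, `deduped` and `rows_removed` are hypothetical names used only for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy frame standing in for the student data.
demo = pd.DataFrame({
    "student_id": [1, 2, 2, 3],
    "name": ["Ann", "Bob", "Bob", "Cy"],
})

rows_before = len(demo)

# drop_duplicates() does not modify the frame in place by default,
# so the returned frame is kept under a new name.
deduped = demo.drop_duplicates().reset_index(drop=True)

rows_removed = rows_before - len(deduped)
print(f"rows before: {rows_before}, after: {len(deduped)}, removed: {rows_removed}")
```

If `rows_removed` is 0 on the real data, the displayed row count will not change, which is expected when every row is unique.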


In [88]:
# Normalize gender to Male / Female / Unknown.
df['gender'] = df['gender'].str.lower()
# Map known misspellings and the lower-cased values to their final labels in one pass.
df['gender'] = df['gender'].replace({
    'fmale': 'Female', 'femlae': 'Female', 'female': 'Female', 'male': 'Male',
})
# Missing values are real NaN (not the string 'NaN'), so fillna is what handles them.
df['gender'] = df['gender'].fillna('Unknown')

df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,...,remarks,money_spent,event_time,address_raw,profile_json,add_house,add_district,add_city,add_country,add_postal
0,1,Claudia Short,20.0,unknown,,+1-619-379-4152x102,Katieland,someonegmail.com,2022-09-03 23:22:44,Data Science,...,good,135.0,2021-08-18 18:53:50,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'...",Apartment 37,South Kevin district,Tashkent,UZ,100539
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017-08-29 00:00:00,Data Science,...,excellent,152.0,2001-11-10 04:19:00,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}",UZ 100332 Tashkent South Patricia,,,,
2,3,Kathryn Moyer,20.0,unknown,90.0,,Lake Stevenmouth,,2017-08-14 00:00:00,Data Science,...,excellent,185.0,2022-07-14 22:27:02,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t...",Wendyshire 12-kv,dom 1,Tashkent,,
3,4,Ruben Wilson,20.0,Female,81.0,,Port Pamelafort,special,1973-09-17 00:00:00,Data Science,...,good,175.0,2023-04-29 19:05:30,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech...",Apartment 16,North Tamara district,Tashkent,UZ,100097
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023-11-25 00:00:00,Data Science,...,average,282.0,2013-10-13 00:00:00,,"{'hobbies': ['sort', 'science'], 'skills': {'t...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,Female,,,Gabriellefurt,amy97@woodard.net,1992-11-13 00:00:00,Data Science,...,excellent,64.0,2007-10-14 18:53:00,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {...",Apartment 12,Laurenhaven district,Tashkent,UZ,100254
996,997,Scott Clark,20.0,unknown,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024-10-05 00:00:00,Data Science,...,,143.0,2021-06-21 07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA,Patrickmouth 4-kv,dom 6,Tashkent,,
997,998,,20.0,Male,,264-577-8585,South Mary,someonegmail.com,2021-11-19 02:23:00,Data Science,...,excellent,79.0,1975-05-11 00:00:00,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te...",Apartment 38,New Brendan district,Tashkent,UZ,100394
998,999,Brittany Barrett,21.0,Female,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,2021-07-08 05:51:03,Data Science,...,average,238.0,2015-05-24 00:00:00,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec...",25998 Martinez Grove Apt. 473,West Scott,,,
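
The gender cleanup above can also be written as a single lookup table, which makes the cell safe to re-run. This is a sketch under assumptions: `GENDER_MAP` and `normalize_gender` are hypothetical names, and, unlike the cell above, any value not in the table (including a lower-case `unknown` left over from an earlier run) is sent to `Unknown`.

```python
import pandas as pd

# Hypothetical lookup table: keys are lower-cased raw values seen in the data.
GENDER_MAP = {
    "male": "Male",
    "female": "Female",
    "fmale": "Female",
    "femlae": "Female",
}

def normalize_gender(series: pd.Series) -> pd.Series:
    """Return Male / Female / Unknown for every row of a text column."""
    cleaned = series.str.strip().str.lower()
    # map() yields NaN for anything not in the table; fillna turns that into Unknown.
    return cleaned.map(GENDER_MAP).fillna("Unknown")

sample = pd.Series(["MALE", " fmale", None, "Female", "unknown"])
normalized = normalize_gender(sample)
print(normalized.tolist())
```

Running the function twice gives the same result, because `Male` and `Female` lower-case back to keys that are in the table.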


In [86]:
# Normalize course names: lower-case first, then map known misspellings and abbreviations.
df['course'] = df['course'].str.lower()
df['course'] = df['course'].replace({
    'data science': 'Data Science', 'data-sciens': 'Data Science',
    'data_sciense': 'Data Science', 'ds': 'Data Science', 'd.s.': 'Data Science',
    'python': 'Python', 'pythno': 'Python', 'pyhton': 'Python',
})

df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,...,remarks,money_spent,event_time,address_raw,profile_json,add_house,add_district,add_city,add_country,add_postal
0,1,Claudia Short,20.0,unknown,,+1-619-379-4152x102,Katieland,someonegmail.com,2022-09-03 23:22:44,Data Science,...,good,135.0,2021-08-18 18:53:50,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'...",Apartment 37,South Kevin district,Tashkent,UZ,100539
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017-08-29 00:00:00,Data Science,...,excellent,152.0,2001-11-10 04:19:00,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}",UZ 100332 Tashkent South Patricia,,,,
2,3,Kathryn Moyer,20.0,unknown,90.0,,Lake Stevenmouth,,2017-08-14 00:00:00,Data Science,...,excellent,185.0,2022-07-14 22:27:02,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t...",Wendyshire 12-kv,dom 1,Tashkent,,
3,4,Ruben Wilson,20.0,Female,81.0,,Port Pamelafort,special,1973-09-17 00:00:00,Data Science,...,good,175.0,2023-04-29 19:05:30,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech...",Apartment 16,North Tamara district,Tashkent,UZ,100097
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023-11-25 00:00:00,Data Science,...,average,282.0,2013-10-13 00:00:00,,"{'hobbies': ['sort', 'science'], 'skills': {'t...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,Female,,,Gabriellefurt,amy97@woodard.net,1992-11-13 00:00:00,Data Science,...,excellent,64.0,2007-10-14 18:53:00,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {...",Apartment 12,Laurenhaven district,Tashkent,UZ,100254
996,997,Scott Clark,20.0,unknown,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024-10-05 00:00:00,Data Science,...,,143.0,2021-06-21 07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA,Patrickmouth 4-kv,dom 6,Tashkent,,
997,998,,20.0,Male,,264-577-8585,South Mary,someonegmail.com,2021-11-19 02:23:00,Data Science,...,excellent,79.0,1975-05-11 00:00:00,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te...",Apartment 38,New Brendan district,Tashkent,UZ,100394
998,999,Brittany Barrett,21.0,Female,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,2021-07-08 05:51:03,Data Science,...,average,238.0,2015-05-24 00:00:00,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec...",25998 Martinez Grove Apt. 473,West Scott,,,
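
The plan for Step 8 lists three target labels for course: Data Science / Python / Other. The cell above leaves unrecognized names unchanged, so here is a possible sketch of the `Other` fallback. `COURSE_MAP` and `normalize_course` are hypothetical names, and keeping missing values as missing (rather than labeling them `Other`) is an assumption.

```python
import pandas as pd

# Hypothetical lookup table built from the spellings handled in the cell above.
COURSE_MAP = {
    "data science": "Data Science",
    "data-sciens": "Data Science",
    "data_sciense": "Data Science",
    "ds": "Data Science",
    "d.s.": "Data Science",
    "python": "Python",
    "pythno": "Python",
    "pyhton": "Python",
}

def normalize_course(series: pd.Series) -> pd.Series:
    """Map course names to Data Science / Python / Other, keeping NaN as NaN."""
    cleaned = series.str.strip().str.lower()
    mapped = cleaned.map(COURSE_MAP)
    # Keep mapped labels and genuinely missing rows; everything else becomes Other.
    keep = mapped.notna() | cleaned.isna()
    return mapped.where(keep, "Other")

sample = pd.Series(["DATA SCIENCE", "pythno", "Java", None])
courses = normalize_course(sample)
print(courses.tolist())
```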


In [85]:
df['remarks']=df['remarks'].str.strip()
df

Unnamed: 0,student_id,name,age,gender,score,phone,city,email,date_of_join,course,...,remarks,money_spent,event_time,address_raw,profile_json,add_house,add_district,add_city,add_country,add_postal
0,1,Claudia Short,20.0,unknown,,+1-619-379-4152x102,Katieland,someonegmail.com,2022-09-03 23:22:44,Data Science,...,good,135.0,2021-08-18 18:53:50,"Apartment 37, South Kevin district, Tashkent, ...","{'hobbies': ['gun', 'nice'], 'skills': {'tech'...",Apartment 37,South Kevin district,Tashkent,UZ,100539
1,2,,20.0,Female,90.0,,Dawnburgh,psmith@chen.com,2017-08-29 00:00:00,Data Science,...,excellent,152.0,2001-11-10 04:19:00,UZ 100332 Tashkent South Patricia,"{hobbies:['against', 'good']}",UZ 100332 Tashkent South Patricia,,,,
2,3,Kathryn Moyer,20.0,unknown,90.0,,Lake Stevenmouth,,2017-08-14 00:00:00,Data Science,...,excellent,185.0,2022-07-14 22:27:02,"Wendyshire 12-kv, dom 1, Tashkent","{'hobbies': ['fast', 'clearly'], 'skills': {'t...",Wendyshire 12-kv,dom 1,Tashkent,,
3,4,Ruben Wilson,20.0,Female,81.0,,Port Pamelafort,special,1973-09-17 00:00:00,Data Science,...,good,175.0,2023-04-29 19:05:30,"Apartment 16, North Tamara district, Tashkent,...","{'hobbies': ['left', 'role'], 'skills': {'tech...",Apartment 16,North Tamara district,Tashkent,UZ,100097
4,5,Robert Pruitt,20.0,Female,,001-182-659-5631x02803,Kingburgh,six,2023-11-25 00:00:00,Data Science,...,average,282.0,2013-10-13 00:00:00,,"{'hobbies': ['sort', 'science'], 'skills': {'t...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Gary Sawyer,22.0,Female,,,Gabriellefurt,amy97@woodard.net,1992-11-13 00:00:00,Data Science,...,excellent,64.0,2007-10-14 18:53:00,"Apartment 12, Laurenhaven district, Tashkent, ...","{'hobbies': ['collection', 'his'], 'skills': {...",Apartment 12,Laurenhaven district,Tashkent,UZ,100254
996,997,Scott Clark,20.0,unknown,,001-205-065-8737,Rogerborough,katiepage@hensley.com,2024-10-05 00:00:00,Data Science,...,,143.0,2021-06-21 07:46:27,"Patrickmouth 4-kv, dom 6, Tashkent",INVALID_JSON_DATA,Patrickmouth 4-kv,dom 6,Tashkent,,
997,998,,20.0,Male,,264-577-8585,South Mary,someonegmail.com,2021-11-19 02:23:00,Data Science,...,excellent,79.0,1975-05-11 00:00:00,"Apartment 38, New Brendan district, Tashkent, ...","{'hobbies': ['street', 'read'], 'skills': {'te...",Apartment 38,New Brendan district,Tashkent,UZ,100394
998,999,Brittany Barrett,21.0,Female,86.0,001-663-118-1327x207,Port Rachael,richardgreen@@shannon-jenkins.info,2021-07-08 05:51:03,Data Science,...,average,238.0,2015-05-24 00:00:00,"25998 Martinez Grove Apt. 473, West Scott","{'hobbies': ['game', 'total'], 'skills': {'tec...",25998 Martinez Grove Apt. 473,West Scott,,,
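
Stripping whitespace can leave empty strings behind, and the plan also asks for status-like columns to be lower-case or otherwise uniform. A minimal sketch of both steps follows; `normalize_status` is a hypothetical helper, and treating an empty string as missing is an assumption taken from Step 2 of the plan.

```python
import pandas as pd

def normalize_status(series: pd.Series) -> pd.Series:
    """Strip and lower-case a text column; empty strings become missing."""
    cleaned = series.str.strip().str.lower()
    # mask() replaces the rows where the condition holds with NaN.
    return cleaned.mask(cleaned == "")

sample = pd.Series([" Good ", "EXCELLENT", "   ", None])
remarks_clean = normalize_status(sample)
print(remarks_clean.tolist())
```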
