# Introduction
Your manager is impressed with your progress but points out that the data is messy. <br> Before we can analyze it effectively, we need to **clean and structure the data** properly.

## Your task is to:
- Handle missing values
- Remove duplicate or inconsistent data
- Standardize the data format

Let's get started!

# Task 1: Identify Issues in the Data
<h3 > Your manager provides you with an example dataset in which some records are incomplete or incorrect. Here's an example:</h3>
<pre>
{
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
        {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": [102]},
        {"id": 3, "name": "", "friends": [1], "liked_pages": [101, 103]},
        {"id": 4, "name": "Sara", "friends": [2, 2], "liked_pages": [104]},
        {"id": 5, "name": "Amit", "friends": [], "liked_pages": []}
    ],
    "pages": [
        {"id": 101, "name": "Python Developers"},
        {"id": 102, "name": "Data Science Enthusiasts"},
        {"id": 103, "name": "AI & ML Community"},
        {"id": 104, "name": "Web Dev Hub"},
        {"id": 104, "name": "Web Development"}
    ]
}   
</pre>


Problems:

**1.** User **ID 3** has an empty name.<br>
**2.** User **ID 4** has a duplicate friend entry.<br>
**3.** User **ID 5** has no connections or liked pages (inactive user).<br>
**4.** The **pages list** contains a duplicate page ID (**104** appears twice with different names).<br>
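
The four problems above can also be detected programmatically rather than by eye. The following is a minimal sketch, assuming the dataset has the same `users` / `pages` shape as the example; `find_issues` and `sample` are hypothetical names used only for illustration.

```python
from collections import Counter

def find_issues(data):
    """Return a list of human-readable problems found in the dataset."""
    issues = []
    for user in data["users"]:
        uid = user["id"]
        # Empty or whitespace-only name
        if not user["name"].strip():
            issues.append(f"User {uid} has an empty name")
        # A set drops duplicates, so a shorter set means repeated friend IDs
        if len(user["friends"]) != len(set(user["friends"])):
            issues.append(f"User {uid} has duplicate friend entries")
        # No friends and no liked pages: treated as inactive
        if not user["friends"] and not user["liked_pages"]:
            issues.append(f"User {uid} has no friends or liked pages")
    # Count how often each page ID occurs
    page_counts = Counter(page["id"] for page in data["pages"])
    for page_id, count in page_counts.items():
        if count > 1:
            issues.append(f"Page ID {page_id} appears {count} times")
    return issues

# The example dataset from above
sample = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
        {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": [102]},
        {"id": 3, "name": "", "friends": [1], "liked_pages": [101, 103]},
        {"id": 4, "name": "Sara", "friends": [2, 2], "liked_pages": [104]},
        {"id": 5, "name": "Amit", "friends": [], "liked_pages": []},
    ],
    "pages": [
        {"id": 101, "name": "Python Developers"},
        {"id": 102, "name": "Data Science Enthusiasts"},
        {"id": 103, "name": "AI & ML Community"},
        {"id": 104, "name": "Web Dev Hub"},
        {"id": 104, "name": "Web Development"},
    ],
}

for issue in find_issues(sample):
    print(issue)
```

Running a check like this before and after cleaning is one way to confirm that the cleaning step removed every issue it was meant to.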



In [17]:
import json

In [18]:
def loadData(filename):
    # Read the JSON file and return its contents as a Python dict
    with open(filename, "r") as f:
        data = json.load(f)
    return data

In [19]:
data=loadData("02_dump-data.json")

In [20]:
data

{'users': [{'id': 1, 'name': 'Akib', 'friends': [2, 3], 'liked_pages': [101]},
  {'id': 2, 'name': 'Sakib', 'friends': [1, 4], 'liked_pages': [102]},
  {'id': 3, 'name': '', 'friends': [1], 'liked_pages': [101, 103]},
  {'id': 4, 'name': 'Runa', 'friends': [2, 2], 'liked_pages': [104]},
  {'id': 5, 'name': 'Nisan', 'friends': [], 'liked_pages': []}],
 'pages': [{'id': 101, 'name': 'Python Developers'},
  {'id': 102, 'name': 'Data Science Enthusiasts'},
  {'id': 103, 'name': 'AI & ML Community'},
  {'id': 104, 'name': 'Web Dev Hub'},
  {'id': 104, 'name': 'Web Development'}]}

In [26]:
def cleanedData(filename):
    # Load the data from the file instead of relying on the global `data`
    data = loadData(filename)

    # Remove users with a missing (empty or whitespace-only) name
    data['users'] = [u for u in data['users'] if u['name'].strip()]

    # Remove duplicate friend entries
    for user in data['users']:
        user['friends'] = list(set(user['friends']))

    # Remove inactive users (no friends and no liked pages);
    # an empty list is falsy, so `or` keeps users that have at least one of the two
    data['users'] = [u for u in data['users'] if u['friends'] or u['liked_pages']]

    # Remove duplicate pages: a dict keeps one entry per key, so a later page
    # with the same id overwrites the earlier one
    unique = {}
    for page in data['pages']:
        unique[page['id']] = page  # e.g. unique[101] = {"id": 101, "name": "Python Developers"}
    data['pages'] = list(unique.values())

    return data
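
One caveat about `list(set(...))`: a set has no defined order, so a friends list may come back reordered. If the original order matters, a minimal sketch of an order-preserving alternative is shown below. It relies on `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (Python 3.7+). `dedupe_keep_order` is a hypothetical helper name, not part of the notebook.

```python
def dedupe_keep_order(items):
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so this keeps the first occurrence of every item
    return list(dict.fromkeys(items))

print(dedupe_keep_order([2, 2]))           # friends of user 4 in the data
print(dedupe_keep_order([4, 1, 4, 3, 1]))  # order of first occurrences is kept
```

For the small lists in this dataset both approaches give the same result; the difference only matters when list order carries meaning.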
        


In [27]:
cleaned_data=cleanedData("02_dump-data.json")

In [28]:
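# Use a context manager so the file is flushed and closed after writing.
# Note: this overwrites the input file with the cleaned data.
with open("02_dump-data.json", "w") as f:
    json.dump(cleaned_data, f)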

In [29]:
cleaned_data

{'users': [{'id': 1, 'name': 'Akib', 'friends': [2, 3], 'liked_pages': [101]},
  {'id': 2, 'name': 'Sakib', 'friends': [1, 4], 'liked_pages': [102]},
  {'id': 4, 'name': 'Runa', 'friends': [2], 'liked_pages': [104]}],
 'pages': [{'id': 101, 'name': 'Python Developers'},
  {'id': 102, 'name': 'Data Science Enthusiasts'},
  {'id': 103, 'name': 'AI & ML Community'},
  {'id': 104, 'name': 'Web Development'}]}
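
Note that in the cleaned output user 1 still lists friend `3`, even though user 3 was removed for having an empty name. Whether to drop such dangling references is a judgment call the notebook does not make. If you do want to remove them, a minimal sketch could look like this; `prune_dangling_friends` is a hypothetical helper, and `cleaned` simply repeats the output shown above.

```python
def prune_dangling_friends(data):
    # IDs of the users that survived cleaning
    valid_ids = {u["id"] for u in data["users"]}
    for user in data["users"]:
        # Keep only friends that still exist in the dataset
        user["friends"] = [f for f in user["friends"] if f in valid_ids]
    return data

cleaned = {
    "users": [
        {"id": 1, "name": "Akib", "friends": [2, 3], "liked_pages": [101]},
        {"id": 2, "name": "Sakib", "friends": [1, 4], "liked_pages": [102]},
        {"id": 4, "name": "Runa", "friends": [2], "liked_pages": [104]},
    ],
    "pages": [
        {"id": 101, "name": "Python Developers"},
        {"id": 102, "name": "Data Science Enthusiasts"},
        {"id": 103, "name": "AI & ML Community"},
        {"id": 104, "name": "Web Development"},
    ],
}

pruned = prune_dangling_friends(cleaned)
print(pruned["users"][0]["friends"])
```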