# Welcome to CodeBook – Your Data Science Internship Begins!

## Introduction

Congratulations! You have just been hired as a Data Scientist Intern at CodeBook – The Social Media for Coders. This Delhi-based company is offering you a ₹10 LPA job if you successfully complete this 1-month internship. But before you get there, you must prove your skills using only Python—no pandas, NumPy, or fancy libraries!

Your manager Puneet Kumar has assigned you your first task: analyzing a data dump of CodeBook users using pure Python. Your job is to load and explore the data to understand its structure.

In [None]:
# Let's write the function to load the data

import json

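def load_data(filename):
    """Load the CodeBook data dump from a JSON file and return the parsed data."""
    # Explicit UTF-8 keeps non-ASCII names readable regardless of platform defaults
    with open(filename, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data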

In [None]:
data = load_data("file.json")
data

In [None]:
# Write a function to display users and their connections

def display_users(data):
    """Print each user's friends and liked pages, followed by the list of pages."""
    print("----------| Users and their Connections |----------\n")
    for user in data['users']:
        print(f"ID{user['id']}: {user['name']} is friends with {user['friends']} and has liked pages {user['liked_pages']}")

    print("\n\n------------| Page Information |------------\n")
    for page in data['pages']:
        print(f"{page['id']}: {page['name']}")

display_users(data)


# Cleaning and Structuring the Data

## Introduction
Your manager is impressed with your progress but points out that the data is messy. Before we can analyze it effectively, we need to clean and structure the data properly.

Your task is to:

- Handle missing values
- Remove duplicate or inconsistent data
- Standardize the data format

Let's get started!

## Task 1: Identify Issues in the Data
Your manager provides you with an example dataset where some records are incomplete or incorrect.

### Problems:
- User ID 3 has an empty name.
- User ID 4 has a duplicate friend entry.
- User ID 5 has no connections or liked pages (inactive user).
- The pages list contains duplicate page IDs.
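
As a rough sketch, the checks above could be automated with a small helper. The `sample_data` dictionary and the `find_issues` function below are hypothetical: the sample only mimics the problems listed and is not the real CodeBook dump, and it assumes the same `users`/`pages` layout used by `display_users`.

```python
from collections import Counter

# Hypothetical sample that mirrors the listed problems (not the real data dump)
sample_data = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
        {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": [102]},
        {"id": 3, "name": "", "friends": [1], "liked_pages": []},
        {"id": 4, "name": "Sara", "friends": [2, 2], "liked_pages": [102]},
        {"id": 5, "name": "Karan", "friends": [], "liked_pages": []},
    ],
    "pages": [
        {"id": 101, "name": "Python Developers"},
        {"id": 101, "name": "Python Developers"},
        {"id": 102, "name": "Data Science Enthusiasts"},
    ],
}

def find_issues(data):
    """Return a list of human-readable descriptions of problems in the data."""
    issues = []
    for user in data["users"]:
        # Treat a missing, None, or whitespace-only name as empty
        if not (user.get("name") or "").strip():
            issues.append(f"User ID {user['id']} has an empty name.")
        # A set drops duplicates, so a length mismatch means repeated friends
        if len(user["friends"]) != len(set(user["friends"])):
            issues.append(f"User ID {user['id']} has a duplicate friend entry.")
        if not user["friends"] and not user["liked_pages"]:
            issues.append(f"User ID {user['id']} is inactive.")
    # Count how often each page ID occurs
    page_counts = Counter(page["id"] for page in data["pages"])
    for page_id, count in page_counts.items():
        if count > 1:
            issues.append(f"Page ID {page_id} appears {count} times.")
    return issues

for issue in find_issues(sample_data):
    print(issue)
```

Reporting issues before changing anything makes it easier to confirm that the cleaning step later removes exactly what you expect.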

## Task 2: Clean the Data
We will:

- Remove users with missing names.
- Remove duplicate friend entries.
- Remove inactive users (users with no friends and no liked pages).
- Deduplicate pages based on IDs.
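
One possible way to implement these four steps in pure Python is sketched below. The function name `clean_data` and the `sample_data` dictionary are assumptions made for illustration; the sketch also assumes each user has `id`, `name`, `friends`, and `liked_pages` keys, and that the first occurrence of a duplicated page should be kept.

```python
def clean_data(data):
    """Return a cleaned copy of the data; the input is left unmodified."""
    cleaned_users = []
    for user in data["users"]:
        name = (user.get("name") or "").strip()
        if not name:
            continue  # step 1: drop users with missing names
        # step 2: dict.fromkeys removes duplicates while preserving order
        friends = list(dict.fromkeys(user["friends"]))
        liked_pages = list(user["liked_pages"])
        if not friends and not liked_pages:
            continue  # step 3: drop inactive users
        cleaned_users.append(
            {"id": user["id"], "name": name,
             "friends": friends, "liked_pages": liked_pages}
        )

    # step 4: keep only the first page seen for each ID
    unique_pages = {}
    for page in data["pages"]:
        unique_pages.setdefault(page["id"], page)

    return {"users": cleaned_users, "pages": list(unique_pages.values())}

# Hypothetical sample that mirrors the issues from Task 1
sample_data = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
        {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": [102]},
        {"id": 3, "name": "", "friends": [1], "liked_pages": []},
        {"id": 4, "name": "Sara", "friends": [2, 2], "liked_pages": [102]},
        {"id": 5, "name": "Karan", "friends": [], "liked_pages": []},
    ],
    "pages": [
        {"id": 101, "name": "Python Developers"},
        {"id": 101, "name": "Python Developers"},
        {"id": 102, "name": "Data Science Enthusiasts"},
    ],
}

cleaned = clean_data(sample_data)
print([user["id"] for user in cleaned["users"]])
print(cleaned["pages"])
```

Note that this sketch does not prune friend IDs that point to removed users (Amit still lists ID 3), so a later pass may be needed if dangling references matter.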

## Next Steps
Your manager is happy with the cleaned data and says: "Great! Now that our data is structured, let's start analyzing it." You are only an intern, but he is so confident in your skills that he asks: "Can you build a 'People You May Know' feature?" Let's do that next!


# Finding "People You May Know"

Now that our data is cleaned and structured, your manager assigns you a new task: Build a 'People You May Know' feature!

In social networks, this feature helps users connect with others by suggesting friends based on mutual connections. Your job is to analyze mutual friends and recommend potential connections.

## Task 1: Understand the Logic
### How 'People You May Know' Works:
- If User A and User B are not friends but have mutual friends, we suggest User B to User A and vice versa.
- More mutual friends = higher priority recommendation.
    
### Example:

- Amit (ID: 1) is friends with Priya (ID: 2) and Rahul (ID: 3).
- Priya (ID: 2) is friends with Sara (ID: 4).
- Amit is not directly friends with Sara, but they share Priya as a mutual friend.
- So we suggest Sara to Amit under "People You May Know".

Sometimes there is more than one candidate for "People You May Know". In those cases, the more mutual friends a candidate shares with the user, the more likely it is that the user knows them, so that candidate is ranked higher.
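
The Amit and Sara example can be checked with a set intersection. The `friends_of` mapping and the `mutual_friends` helper below are hypothetical names; the mapping simply encodes the friendships from the example, assuming friendships are mutual.

```python
# Friendships from the example, keyed by user ID (assumed to be symmetric)
friends_of = {
    1: {2, 3},  # Amit: Priya, Rahul
    2: {1, 4},  # Priya: Amit, Sara
    3: {1},     # Rahul: Amit
    4: {2},     # Sara: Priya
}

def mutual_friends(friends_of, user_a, user_b):
    """Return the set of IDs that are friends with both users."""
    return friends_of[user_a] & friends_of[user_b]

amit, sara = 1, 4
print(sara in friends_of[amit])                 # False: not direct friends
print(mutual_friends(friends_of, amit, sara))   # {2}: Priya is the mutual friend
```

The size of the returned set is the mutual-friend count used for ranking.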

## Task 2: Implement the Algorithm
We'll create a function that:

- Finds all friends of a given user.
- Identifies mutual friends between non-friends.
- Ranks recommendations by the number of mutual friends.
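
A minimal sketch of such a function follows. The name `find_people_you_may_know`, the tie-breaking rule (lower ID first), and the `sample_data` dictionary (including a fifth user, Neha) are assumptions made for illustration; the data layout matches the `users` list used earlier.

```python
def find_people_you_may_know(user_id, data):
    """Return candidate user IDs, ranked by number of mutual friends (descending)."""
    # Map each user ID to the set of that user's friends
    friends_of = {user["id"]: set(user["friends"]) for user in data["users"]}
    if user_id not in friends_of:
        return []

    direct_friends = friends_of[user_id]
    mutual_counts = {}
    for friend in direct_friends:
        # Every friend-of-a-friend reached through `friend` shares one mutual friend
        for candidate in friends_of.get(friend, set()):
            if candidate != user_id and candidate not in direct_friends:
                mutual_counts[candidate] = mutual_counts.get(candidate, 0) + 1

    # Most mutual friends first; ties broken by the lower user ID
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked]

# Hypothetical sample data
sample_data = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": []},
        {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": []},
        {"id": 3, "name": "Rahul", "friends": [1, 4, 5], "liked_pages": []},
        {"id": 4, "name": "Sara", "friends": [2, 3], "liked_pages": []},
        {"id": 5, "name": "Neha", "friends": [3], "liked_pages": []},
    ]
}

print(find_people_you_may_know(1, sample_data))  # [4, 5]
```

Here Sara (ID 4) ranks above Neha (ID 5) for Amit because she shares two mutual friends (Priya and Rahul) while Neha shares only one (Rahul).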

## Next Steps
Your manager is excited about your progress and now says: "Great job! Next, find 'Pages You Might Like' based on your connections and preferences."

Let's make sure we live up to his expectations.