### Task 1: Detect Missing Values during Data Ingestion
**Description**: You have a CSV file with missing values in some columns. Write a Python script to detect and report missing values during the ingestion process.

**Steps**:
1. Load data
2. Check for missing values
3. Report missing values

In [1]:
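import pandas as pd

# Load the CSV; read_csv raises FileNotFoundError if data.csv is not in the working directory.
df = pd.read_csv("data.csv")

# Count missing values per column and keep only the columns that have any.
missing_report = df.isnull().sum()
missing_report = missing_report[missing_report > 0]
print(missing_report)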


FileNotFoundError: [Errno 2] No such file or directory: 'data.csv'
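
Since `data.csv` is not available in this environment, the same steps can be sketched against an in-memory sample. This is a minimal illustration, not the original dataset: the `SAMPLE_CSV` contents and the `report_missing` helper are assumptions introduced here, and the percentage column is an optional addition to the plain count.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data.csv; empty fields become NaN on load.
SAMPLE_CSV = """id,name,age,city
1,Alice,30,Paris
2,Bob,,London
3,,25,
4,David,41,Berlin
"""


def report_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages for the columns that have gaps."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(1),
    })
    return report[report["missing_count"] > 0]


# Step 1: load the data (from a string buffer instead of a file on disk).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Steps 2 and 3: check for missing values and report them.
missing_report = report_missing(df)
print(missing_report)
```

Swapping `io.StringIO(SAMPLE_CSV)` for a real file path should give the same kind of report once the file exists.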

### Task 2: Validate Data Types during Extraction
**Description**: You have a JSON file in which each field should have a specific data type. Write a script to validate whether the actual data types match the expected schema.

**Steps**:
1. Define expected schema
2. Validate data types

In [2]:
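import json
import pandas as pd

# Expected Python type for each field.
expected_schema = {"id": int, "name": str, "age": int, "active": bool}

# Load the records; open() raises FileNotFoundError if data.json is not in the working directory.
with open("data.json") as f:
    data = json.load(f)
df = pd.DataFrame(data)

# For each field, count the values whose type differs from the expected one (as a plain int).
type_mismatches = {col: int(df[col].map(type).ne(dtype).sum()) for col, dtype in expected_schema.items()}
# Keep only the fields that have at least one mismatch.
type_mismatches = {k: v for k, v in type_mismatches.items() if v > 0}
print(type_mismatches)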


FileNotFoundError: [Errno 2] No such file or directory: 'data.json'
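
Since `data.json` is not available either, here is a minimal sketch of the same validation run on an in-memory sample. The `SAMPLE_JSON` records and the `find_type_mismatches` helper are assumptions made for illustration. The sketch checks the raw records rather than a DataFrame, because building a DataFrame can coerce types (for example, an integer column with a missing value is stored as floats), and it also reports fields that are absent from a record.

```python
import json

# Hypothetical records standing in for data.json.
SAMPLE_JSON = """[
  {"id": 1, "name": "Alice", "age": 30, "active": true},
  {"id": "2", "name": "Bob", "age": 41.5, "active": 1},
  {"id": 3, "name": null, "active": false}
]"""

# Step 1: define the expected schema (same as in the cell above).
expected_schema = {"id": int, "name": str, "age": int, "active": bool}


def find_type_mismatches(records, schema):
    """Return (record index, field, expected type name, actual type name) tuples."""
    problems = []
    for index, record in enumerate(records):
        for field, expected in schema.items():
            if field not in record:
                problems.append((index, field, expected.__name__, "missing"))
                continue
            actual = type(record[field])
            # Exact type comparison: isinstance(True, int) is True in Python,
            # so isinstance would let booleans pass as integers.
            if actual is not expected:
                problems.append((index, field, expected.__name__, actual.__name__))
    return problems


# Step 2: validate the data types record by record.
records = json.loads(SAMPLE_JSON)
mismatches = find_type_mismatches(records, expected_schema)
for index, field, expected, actual in mismatches:
    print(f"record {index}: {field!r} expected {expected}, got {actual}")
```

Reporting the record index alongside the field makes it easier to locate the offending entry than a per-column count alone.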

### Task 3: Remove Duplicate Records in Data
**Description**: You have a dataset with duplicate entries. Write a Python script to find and remove duplicate records using Pandas.

**Steps**:
1. Find duplicate records
2. Remove duplicates
3. Report results

In [3]:
import pandas as pd

data = {
    "id": [1, 2, 3, 3, 4, 5, 5],
    "name": ["Alice", "Bob", "Charlie", "Charlie", "David", "Eve", "Eve"],
    "amount": [100, 200, 300, 300, 400, 500, 500]
}

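df = pd.DataFrame(data)

# Rows that exactly repeat an earlier row (the first occurrence is not flagged).
duplicates = df[df.duplicated()]

# Drop the repeats, keeping the first occurrence of each row.
df = df.drop_duplicates()

# Report the removed rows, then the deduplicated data.
print(duplicates)
print(df)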

   id     name  amount
3   3  Charlie     300
6   5      Eve     500
   id     name  amount
0   1    Alice     100
1   2      Bob     200
2   3  Charlie     300
4   4    David     400
5   5      Eve     500
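
The cell above only catches rows that match in every column. As a hedged variation, the sketch below deduplicates on a key column instead and prints row counts for the reporting step. The `orders` data is hypothetical (the second row for id 5 has a different amount), and choosing `keep="last"` is an assumption about which record should win, not something the task specifies.

```python
import pandas as pd

# Hypothetical data: ids 3 and 5 repeat, but the two id-5 rows differ in amount.
orders = pd.DataFrame({
    "id": [1, 2, 3, 3, 4, 5, 5],
    "name": ["Alice", "Bob", "Charlie", "Charlie", "David", "Eve", "Eve"],
    "amount": [100, 200, 300, 300, 400, 500, 550],
})

# Full-row comparison only catches the exact copy (index 3).
exact_duplicates = orders[orders.duplicated()]

# Comparing on the key column alone; keep="last" retains the later row per id,
# so the earlier rows for ids 3 and 5 (indices 2 and 5) are the ones flagged.
key_duplicates = orders[orders.duplicated(subset=["id"], keep="last")]
deduplicated = orders.drop_duplicates(subset=["id"], keep="last").reset_index(drop=True)

# Report the results as simple counts.
print(f"rows before: {len(orders)}")
print(f"exact duplicates: {len(exact_duplicates)}")
print(f"duplicates by id: {len(key_duplicates)}")
print(f"rows after: {len(deduplicated)}")
```

Whether to keep the first or last occurrence depends on the data; `keep=False` would instead flag every row that shares a key, which can be useful for manual review.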
