### Task 1: Basic Data Profiling of a CSV File
**Description**: Load a CSV file and generate a Pandas-Profiling report. The library is now published as ydata-profiling, which is the package the code imports.

**Steps**:
1. Load a CSV File: Make sure you have a CSV file (e.g., data.csv). Load it using pandas.
2. Generate a Profile Report.
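
Step 1 can be sketched as follows. This is a minimal illustration rather than the notebook's own cell: the sample rows and the temporary file location are assumptions, chosen so the snippet runs even when no data.csv exists yet.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample rows, written to disk so read_csv has a real file to load.
csv_text = (
    "ID,Name,Age,Grade\n"
    "1,Alice,23,A\n"
    "2,Bob,35,B\n"
    "3,Charlie,45,A\n"
    "4,David,29,C\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data.csv"
    csv_path.write_text(csv_text, encoding="utf-8")

    # Load the CSV into a DataFrame while the temporary file still exists.
    df = pd.read_csv(csv_path)

print(df.shape)
print(list(df.columns))
```

The resulting DataFrame can be passed to ProfileReport in the same way as the in-memory DataFrame used in the cell that follows.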

In [10]:
import pandas as pd
from ydata_profiling import ProfileReport

# Sample data to avoid FileNotFoundError
data = {
    "ID": [1, 2, 3, 4],
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [23, 35, 45, 29],
    "Grade": ["A", "B", "A", "C"],
    "Email": ["alice@example.com", "bob@example.com", "charlie@example.com", "david@example.com"],
}

df = pd.DataFrame(data)

# Generate profile report with every correlation type disabled.
# ProfileReport rejects a plain dict passed as config=, so the settings go
# through the correlations keyword, one {"calculate": False} dict per type.
profile = ProfileReport(
    df,
    title="Sample Data Profiling Report",
    explorative=True,
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)

profile.to_file("sample_data_report.html")
print("✅ Sample data profiling report generated: 'sample_data_report.html'")

### Task 2: Understanding Missing Values with Pandas-Profiling

**Description**: Identify missing values in your dataset using Pandas-Profiling.

**Steps**: 
1. Generate a Profile Report to Analyze Missing Values
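
As a quick cross-check on the report's missing-values section, the same counts can be computed directly in pandas. The sketch below uses hypothetical sample data (an assumption, since data.csv is not included here) and only standard pandas calls.

```python
import pandas as pd

# Hypothetical sample data with a few gaps; replace with pd.read_csv("data.csv").
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "Age": [25, 30, None, 22, 40],
        "Grade": ["A", "B", "A", None, "C"],
    }
)

# Count of missing cells per column.
missing_counts = df.isna().sum()

# Share of missing cells per column, as a percentage.
missing_percent = (df.isna().mean() * 100).round(1)

summary = pd.DataFrame({"missing": missing_counts, "percent": missing_percent})
print(summary)
```

If these numbers disagree with the bar or matrix diagrams in the report, the usual cause is placeholder values (such as empty strings) that pandas does not treat as missing.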


In [3]:
import pandas as pd
from ydata_profiling import ProfileReport

# Replace with your actual CSV filename
file_path = "data.csv"

try:
    # Load dataset
    df = pd.read_csv(file_path)

    # Generate profile report focused on missing values
    profile = ProfileReport(
        df,
        title="🔍 Missing Values Analysis Report",
        explorative=True,
        correlations={"pearson": {"calculate": False}},  # Faster if you skip correlation
        missing_diagrams={
            "bar": True,
            "matrix": True,
            "heatmap": True,
        }
    )

    # Save the report as an HTML file
    profile.to_file("missing_values_report.html")

    print("✅ Missing values report generated: 'missing_values_report.html'")

except FileNotFoundError:
    print(f"❌ Error: '{file_path}' not found. Please make sure the file is in the current directory.")

❌ Error: 'data.csv' not found. Please make sure the file is in the current directory.


### Task 3: Analyze Data Types Using Pandas-Profiling
**Description**: Use Pandas-Profiling to analyze and check data types of your dataset.
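
Before opening the report, it can help to see what pandas itself inferred for each column. The sketch below is an illustration using a subset of the sample columns from the cell that follows. It shows one common surprise: a single missing value turns an integer-looking column into float64. The cast to the nullable Int64 dtype is an optional step added here, not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "Age": [25, 30, None, 22, 40],
        "Grade": ["A", "B", "A", None, "C"],
    }
)

# Inferred dtype of every column.
print(df.dtypes)

# The None in Age forces the column to float64 (the gap is stored as NaN).
age_is_float = pd.api.types.is_float_dtype(df["Age"])
id_is_integer = pd.api.types.is_integer_dtype(df["ID"])

# Optional: the nullable Int64 dtype keeps whole numbers plus a missing marker.
age_nullable = df["Age"].astype("Int64")
print(age_nullable.dtype)
```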

In [9]:
import pandas as pd
from ydata_profiling import ProfileReport

# Sample data to simulate students.csv content
data = {
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, None, 22, 40],
    "Grade": ["A", "B", "A", None, "C"],
    "Email": [
        "alice@example.com",
        "bob_at_example.com",  # intentionally invalid email format
        "charlie@example.com",
        None,
        "eva@example.com",
    ],
}

# Create DataFrame
df = pd.DataFrame(data)

# Generate profile report with every correlation type disabled.
# Each entry must be a dict with a "calculate" flag; a bare True/False
# raises "TypeError: argument of type 'bool' is not iterable".
profile = ProfileReport(
    df,
    title="Sample Data Profiling Report",
    explorative=True,
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)

# Save the report to an HTML file
profile.to_file("sample_data_report.html")

print("✅ Profile report generated successfully: 'sample_data_report.html'")

### Task 4: Detect Unique Values and Duplicates
**Description**: Use Pandas-Profiling to detect unique values and duplicates in your dataset.
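
The figures the report shows for duplicates and distinct values can be reproduced with plain pandas. The sketch below is a minimal illustration that uses the ID and Name columns from the sample dataset in the cell that follows. The variable names are chosen here for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 4, 5, 6, 6, 6],
        "Name": ["Alice", "Bob", "Charlie", "David", "David",
                 "Eve", "Frank", "Frank", "Frank"],
    }
)

# Rows that repeat an earlier row exactly (the first occurrence is not counted).
duplicate_rows = int(df.duplicated().sum())

# Number of distinct values in each column.
unique_counts = df.nunique()

# The dataset with exact duplicate rows removed.
deduplicated = df.drop_duplicates()

print(duplicate_rows)
print(unique_counts)
print(len(deduplicated))
```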

In [6]:
import pandas as pd
from ydata_profiling import ProfileReport

# Step 1: Create a sample dataset with duplicates and unique values
data = {
    "ID": [1, 2, 3, 4, 4, 5, 6, 6, 6],  # Duplicate IDs 4 and 6
    "Name": ["Alice", "Bob", "Charlie", "David", "David", "Eve", "Frank", "Frank", "Frank"],
    "Age": [25, 30, 35, 40, 40, 45, 50, 50, 50],
    "Email": [
        "alice@example.com",
        "bob@example.com",
        "charlie@example.com",
        "david@example.com",
        "david@example.com",
        "eve@example.com",
        "frank@example.com",
        "frank@example.com",
        "frank@example.com",
    ],
}

df = pd.DataFrame(data)

# Step 2: Generate the profile report
profile = ProfileReport(df, title="Unique Values & Duplicates Report", explorative=True)

# Step 3: Save the report
profile.to_file("uniques_duplicates_report.html")

print("✅ Report generated: uniques_duplicates_report.html")

100%|██████████| 4/4 [00:00<00:00, 133.56it/s]
Summarize dataset: 100%|██████████| 17/17 [00:00<00:00, 24.26it/s, Completed]
Generate report structure: 100%|██████████| 1/1 [00:01<00:00,  1.85s/it]
Render HTML: 100%|██████████| 1/1 [00:00<00:00,  3.47it/s]
Export report to file: 100%|██████████| 1/1 [00:00<00:00, 650.38it/s]

✅ Report generated: uniques_duplicates_report.html



