### Task 1: Data Profiling to Understand Data Quality
**Description**: Use basic statistical methods to profile a dataset and identify potential quality issues.

**Steps**:
1. Load the dataset using pandas in Python.
2. Understand the data by checking its basic statistics.
3. Identify null values.
4. Check unique values for categorical columns.
5. Review outliers using box plots.
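
The steps above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the column names (`age`, `income`, `score`, `region`) are borrowed from the test cell in this notebook, while the sample values and the helper name `profile_dataframe` are made up for the example. A real run would load a file with `pd.read_csv` instead of building the frame in memory. The outlier count uses the 1.5 × IQR rule, which is the same rule a box plot uses to decide which points to draw beyond the whiskers.

```python
import numpy as np
import pandas as pd


def profile_dataframe(df: pd.DataFrame) -> dict:
    """Collect basic profiling facts: summary stats, nulls, categories, outliers."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    # 1.5 * IQR fences, computed per numeric column (NaN values are ignored).
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        "summary": df.describe(include="all"),
        "null_counts": df.isnull().sum(),
        "unique_values": {
            col: df[col].dropna().unique().tolist() for col in categorical.columns
        },
        "outlier_counts": outlier_mask.sum(),
    }


# Hypothetical sample data; replace with pd.read_csv("<your file>") for a real dataset.
sample = pd.DataFrame(
    {
        "age": [22, 25, 28, 30, 31, 33, 35, 95],
        "income": [31000, 42000, np.nan, 51000, 48000, 39000, 45000, 47000],
        "score": [88, 64, 71, 90, 55, 79, 83, 68],
        "region": ["north", "south", "south", None, "east", "north", "east", "south"],
    }
)

report = profile_dataframe(sample)
print(report["null_counts"])
print(report["outlier_counts"])

# With matplotlib installed, sample.boxplot(column=["age"]) draws the matching box plot.
```

Returning a dictionary keeps each profiling result separately inspectable in later cells.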

In [None]:
import unittest

# Ques_1.py must be importable (for example, saved in the notebook's working directory);
# otherwise this import raises ModuleNotFoundError.
from Ques_1 import generate_synthetic_data, clean_data, validate_data, detect_anomalies, normalize_data


class TestDataQualityStrategies(unittest.TestCase):

    def test_generate_synthetic_data(self):
        """Test for generating synthetic data"""
        df = generate_synthetic_data(n_samples=1000)
        
        # Check the shape of the generated DataFrame
        self.assertEqual(df.shape, (1000, 4), "DataFrame shape mismatch")
        
        # Check the columns
        self.assertTrue({'age', 'income', 'score', 'region'}.issubset(df.columns), "Missing columns in the generated data")
        
        # The generator is expected to inject some missing values
        self.assertGreater(df.isnull().sum().sum(), 0, "No missing values detected in the generated data")

    def test_clean_data(self):
        """Test for cleaning the data"""
        df = generate_synthetic_data(n_samples=1000)
        
        # Clean the data
        df_cleaned = clean_data(df)
        
        # Check that there are no missing values
        self.assertEqual(df_cleaned.isnull().sum().sum(), 0, "Missing values not handled correctly")
        
        # Check for duplicate removal
        self.assertEqual(df_cleaned.duplicated().sum(), 0, "Duplicates not removed")

    def test_validate_data(self):
        """Test for data validation"""
        df = generate_synthetic_data(n_samples=1000)
        
        # Validate the data
        df_validated = validate_data(df)
        
        # Check if income is non-negative
        self.assertTrue((df_validated['income'] >= 0).all(), "Income contains negative values")
        
        # Check if score is between 0 and 100
        self.assertTrue((df_validated['score'] >= 0).all() and (df_validated['score'] <= 100).all(), "Score contains values out of range")

    def test_detect_anomalies(self):
        """Test for anomaly detection"""
        df = generate_synthetic_data(n_samples=1000)
        
        # Clean the data
        df_cleaned = clean_data(df)
        
        # Detect anomalies
        df_with_anomalies = detect_anomalies(df_cleaned)
        
        # Check that the anomaly column exists and every row received a label
        self.assertIn('anomaly', df_with_anomalies.columns, "Anomaly column missing")
        self.assertTrue(df_with_anomalies['anomaly'].notna().all(), "Some rows have no anomaly label")

    def test_normalize_data(self):
        """Test for data normalization"""
        df = generate_synthetic_data(n_samples=1000)
        
        # Normalize the data
        df_normalized = normalize_data(df)
        
        # Check if the mean of 'age', 'income', and 'score' is approximately 0 after normalization
        self.assertAlmostEqual(df_normalized['age'].mean(), 0, delta=0.1, msg="Age normalization failed")
        self.assertAlmostEqual(df_normalized['income'].mean(), 0, delta=0.1, msg="Income normalization failed")
        self.assertAlmostEqual(df_normalized['score'].mean(), 0, delta=0.1, msg="Score normalization failed")

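if __name__ == '__main__':
    # Inside a notebook, sys.argv holds kernel arguments and unittest.main() calls sys.exit();
    # pass a dummy argv and exit=False so the tests run within the cell.
    unittest.main(argv=['first-arg-is-ignored'], exit=False)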


ModuleNotFoundError: No module named 'Ques_1'

### Task 2: Implement Simple Data Validation
**Description**: Write a Python script to validate the data types and constraints of each column in a dataset.

**Steps**:
1. Define constraints for each column.
2. Validate each column based on its constraints.
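
One way to approach these two steps is a plain dictionary of rules plus a function that checks each column against its rule. The sketch below is only an illustration under assumptions: the constraint keys (`dtype`, `min`, `max`, `allowed`, `nullable`), the column names, the allowed regions, and the helper `validate_dataframe` are all hypothetical and would need to match the real dataset.

```python
import pandas as pd

# Hypothetical constraints; adjust the columns and limits to the actual dataset.
CONSTRAINTS = {
    "age": {"dtype": "number", "min": 0, "max": 120, "nullable": False},
    "income": {"dtype": "number", "min": 0, "nullable": True},
    "score": {"dtype": "number", "min": 0, "max": 100, "nullable": False},
    "region": {
        "dtype": "string",
        "allowed": {"north", "south", "east", "west"},
        "nullable": False,
    },
}


def validate_dataframe(df: pd.DataFrame, constraints: dict) -> list:
    """Return a list of human-readable constraint violations (empty if none)."""
    errors = []
    for column, rule in constraints.items():
        if column not in df.columns:
            errors.append(f"{column}: column is missing")
            continue

        series = df[column]
        present = series.dropna()

        if not rule.get("nullable", True) and series.isnull().any():
            errors.append(f"{column}: {int(series.isnull().sum())} null value(s)")

        # Type checks come first; range checks are skipped if the type is wrong.
        if rule["dtype"] == "number" and not pd.api.types.is_numeric_dtype(series):
            errors.append(f"{column}: expected a numeric dtype, got {series.dtype}")
            continue
        if rule["dtype"] == "string" and not present.map(lambda v: isinstance(v, str)).all():
            errors.append(f"{column}: expected string values")
            continue

        if "min" in rule and (present < rule["min"]).any():
            errors.append(f"{column}: value(s) below {rule['min']}")
        if "max" in rule and (present > rule["max"]).any():
            errors.append(f"{column}: value(s) above {rule['max']}")
        if "allowed" in rule and not present.isin(rule["allowed"]).all():
            errors.append(f"{column}: value(s) outside the allowed set")
    return errors


bad = pd.DataFrame(
    {
        "age": [25, None, 40],
        "income": [30000.0, -5.0, None],
        "score": [88, 150, 42],
        "region": ["north", "mars", "east"],
    }
)
for message in validate_dataframe(bad, CONSTRAINTS):
    print(message)
```

Collecting all violations instead of raising on the first one makes it easier to see every problem in a single pass.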

In [None]:
# Write your code here

### Task 3: Detect Missing Data Patterns
**Description**: Analyze and visualize missing data patterns in a dataset.

**Steps**:
1. Visualize missing data using a heatmap.
2. Identify patterns in missing data.
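
A possible starting point for both steps is sketched below. It assumes matplotlib for the heatmap (the task does not name a plotting library, so `seaborn.heatmap` or the `missingno` package would be equally valid choices). The helper names and the sample data are hypothetical. Correlating the null indicators of the columns is one simple way to spot columns that tend to be missing together.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column."""
    mask = df.isnull()
    return pd.DataFrame(
        {"missing_count": mask.sum(), "missing_pct": mask.mean() * 100}
    ).sort_values("missing_count", ascending=False)


def missing_cooccurrence(df: pd.DataFrame) -> pd.DataFrame:
    """Correlation between null indicators of columns that have any gaps.

    Values near 1 mean two columns tend to be missing in the same rows.
    A column that is missing in every row has no variance and yields NaN.
    """
    mask = df.isnull()
    mask = mask.loc[:, mask.any()]
    return mask.astype(int).corr()


def plot_missing_heatmap(df: pd.DataFrame, ax=None):
    """Draw rows x columns as an image where dark cells mark missing values."""
    # Imported here so the two analysis helpers work without matplotlib installed.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))
    ax.imshow(
        df.isnull().to_numpy().astype(int),
        aspect="auto",
        cmap="gray_r",
        interpolation="nearest",
    )
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns, rotation=45, ha="right")
    ax.set_ylabel("row index")
    ax.set_title("Missing values (dark = missing)")
    return ax


# Hypothetical sample in which income and score are always missing together.
sample = pd.DataFrame(
    {
        "age": [25, 32, np.nan, 51, 38, 29],
        "income": [30000, np.nan, 45000, np.nan, 48000, 39000],
        "score": [88, np.nan, 71, np.nan, 64, 90],
        "region": ["north", "south", "south", "east", "north", "east"],
    }
)

summary = missing_summary(sample)
cooccurrence = missing_cooccurrence(sample)
print(summary)
print(cooccurrence)

# In a notebook cell, plot_missing_heatmap(sample) renders the heatmap.
```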

In [None]:
# Write your code here

### Task 4: Integrate Automated Data Quality Checks
**Description**: Integrate automated data quality checks using the Great Expectations library for a dataset.

**Steps**:
1. Install and initialize Great Expectations.
2. Set up Great Expectations (for example, create a data context and an expectation suite).
3. Add further checks and validate.

In [None]:
# Write your code here