### Task 1: Measure Data Accuracy using a Trusted Source

**Description**: You have two datasets of product prices: `company_prices.csv` and
`trusted_prices.csv`. Check whether the prices in `company_prices.csv` match the prices in
`trusted_prices.csv`. Assume both files have a "product_id" and a "price" column.

In [1]:
# Write your code from here
import pandas as pd

# Load the datasets
company_prices = None  # Initialize to None
trusted_prices = None  # Initialize to None
try:
    company_prices = pd.read_csv('company_prices.csv')
    trusted_prices = pd.read_csv('trusted_prices.csv')
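except FileNotFoundError:
    # No exit() here: it does not stop the cell in a notebook (see the output below),
    # and the None check further down already handles the missing data.
    print("Error: One or both of the CSV files were not found. Please ensure 'company_prices.csv' and 'trusted_prices.csv' are in the correct directory.")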

# Rename the 'price' column in the trusted dataset for clarity during merging
if trusted_prices is not None:
    trusted_prices = trusted_prices.rename(columns={'price': 'trusted_price'})

# Merge the two dataframes based on 'product_id'
if company_prices is not None and trusted_prices is not None:
    merged_df = pd.merge(company_prices, trusted_prices, on='product_id', how='inner')

    # Stop early if the two datasets share no product IDs at all
    if merged_df.empty:
        print("No matching product IDs found between the two datasets. Cannot assess price accuracy.")
    else:
        # Compare the 'price' from the company with the 'trusted_price'
        merged_df['price_match'] = merged_df['price'] == merged_df['trusted_price']

        # Calculate the accuracy
        accuracy = merged_df['price_match'].mean() * 100

        # Display the comparison and accuracy
        print("Comparison of Company Prices with Trusted Prices:")
        print(merged_df[['product_id', 'price', 'trusted_price', 'price_match']])
        print(f"\nAccuracy of Company Prices compared to Trusted Prices: {accuracy:.2f}%")

        # You can also analyze discrepancies
        mismatched_prices = merged_df[~merged_df['price_match']]
        if not mismatched_prices.empty:
            print("\nProducts with Mismatched Prices:")
            print(mismatched_prices[['product_id', 'price', 'trusted_price']])
        else:
            print("\nAll matched product prices are accurate.")
else:
    print("Could not proceed with price comparison due to missing datasets.")

Error: One or both of the CSV files were not found. Please ensure 'company_prices.csv' and 'trusted_prices.csv' are in the correct directory.
Could not proceed with price comparison due to missing datasets.
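
Because the CSV files were not available when the cell ran, the comparison logic can be checked against small in-memory frames instead. The following is a minimal sketch under stated assumptions: the sample rows, the helper name `price_accuracy`, and the tolerance `atol=0.005` are all hypothetical and not part of the original task. It also swaps the exact `==` comparison for `numpy.isclose`, so that floating-point rounding noise is not counted as a mismatch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the two CSV files
company_prices = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "price": [9.99, 20.00, 5.50, 12.00],
})
trusted_prices = pd.DataFrame({
    "product_id": [1, 2, 3, 5],
    "price": [9.99, 19.50, 5.50, 7.25],
})


def price_accuracy(company, trusted, atol=0.005):
    """Return (merged frame, accuracy in percent) for products found in both frames."""
    merged = company.merge(
        trusted.rename(columns={"price": "trusted_price"}),
        on="product_id",
        how="inner",
    )
    if merged.empty:
        # No shared product IDs, so accuracy is undefined
        return merged, float("nan")
    # Treat prices within `atol` of each other as equal (assumed tolerance)
    merged["price_match"] = np.isclose(
        merged["price"], merged["trusted_price"], rtol=0.0, atol=atol
    )
    return merged, merged["price_match"].mean() * 100


merged_df, accuracy = price_accuracy(company_prices, trusted_prices)
print(merged_df)
print(f"Accuracy: {accuracy:.2f}%")
```

With this sample, products 4 and 5 are excluded by the inner merge, so accuracy is measured only over the three products that appear in both frames.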


### Task 2: Detect Incorrect Values

**Description**: In `company_prices.csv`, detect any negative price values, which are invalid for prices.

In [2]:
# Write your code from here
import pandas as pd

# Load the dataset
company_prices = None  # Initialize to None
try:
    company_prices = pd.read_csv('company_prices.csv')
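except FileNotFoundError:
    # No exit() here: it does not stop the cell in a notebook (see the output below),
    # and the None check further down already handles the missing data.
    print("Error: 'company_prices.csv' was not found. Please ensure the file is in the correct directory.")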

# Detect negative price values only if the DataFrame was loaded
if company_prices is not None:
    incorrect_prices = company_prices[company_prices['price'] < 0]

    # Check if any incorrect prices were found
    if not incorrect_prices.empty:
        print("Detected Incorrect (Negative) Price Values in company_prices.csv:")
        print(incorrect_prices)
        print(f"\nNumber of incorrect price values found: {len(incorrect_prices)}")
    else:
        print("No incorrect (negative) price values found in company_prices.csv.")
else:
    print("Could not proceed with detecting incorrect prices because 'company_prices.csv' was not loaded.")

Error: 'company_prices.csv' was not found. Please ensure the file is in the correct directory.
Could not proceed with detecting incorrect prices because 'company_prices.csv' was not loaded.
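
The negative-price check can be tried without the CSV file. This is a minimal sketch, assuming hypothetical sample rows and a hypothetical helper name `find_negative_prices`. It additionally coerces the column with `pd.to_numeric(errors="coerce")`, an extra step not in the original cell, so that a price column read in as text does not raise on the `< 0` comparison.

```python
import pandas as pd

# Hypothetical sample data standing in for company_prices.csv
company_prices = pd.DataFrame({
    "product_id": [101, 102, 103, 104],
    "price": [15.0, -3.5, 0.0, -12.0],
})


def find_negative_prices(df, column="price"):
    """Return the rows whose price is below zero."""
    # Non-numeric entries become NaN, and NaN < 0 evaluates to False
    numeric = pd.to_numeric(df[column], errors="coerce")
    return df[numeric < 0]


incorrect_prices = find_negative_prices(company_prices)
print(incorrect_prices)
print(f"Number of negative prices: {len(incorrect_prices)}")
```

A price of exactly zero is not flagged here; whether zero counts as invalid depends on the dataset and is left as a judgment call.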


### Task 3: Check Missing Data Rates

**Description**: Calculate the percentage of missing values in `customer_data.csv`.

In [3]:
# Write your code from here
import pandas as pd

# Load the dataset
customer_data = None  # Initialize to None
try:
    customer_data = pd.read_csv('customer_data.csv')
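except FileNotFoundError:
    # No exit() here: it does not stop the cell in a notebook (see the output below),
    # and the None check further down already handles the missing data.
    print("Error: 'customer_data.csv' was not found. Please ensure the file is in the correct directory.")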

# Calculate the number of missing values per column only if the DataFrame was loaded
if customer_data is not None:
    missing_values_count = customer_data.isnull().sum()

    # Total number of rows in the DataFrame (the denominator for each column's rate)
    total_entries = len(customer_data)

    # Calculate the percentage of missing values per column
    missing_percentage = (missing_values_count / total_entries) * 100

    # Display the missing data rates
    print("Percentage of Missing Values per Column in customer_data.csv:")
    print(missing_percentage)

    # Optionally, you can display columns with any missing values
    columns_with_missing = missing_percentage[missing_percentage > 0]
    if not columns_with_missing.empty:
        print("\nColumns with Missing Data:")
        print(columns_with_missing)
    else:
        print("\nNo missing data found in any column.")
else:
    print("Could not proceed with calculating missing data rates because 'customer_data.csv' was not loaded.")

Error: 'customer_data.csv' was not found. Please ensure the file is in the correct directory.
Could not proceed with calculating missing data rates because 'customer_data.csv' was not loaded.
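
The missing-rate calculation can be illustrated on a small in-memory frame. This is a minimal sketch with hypothetical sample rows; it uses `isna().mean()`, which is equivalent to dividing the per-column count of missing values by the number of rows, and also computes one overall rate across all cells.

```python
import pandas as pd

# Hypothetical sample data standing in for customer_data.csv
customer_data = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", None],
    "phone_number": ["555-0100", "555-0101", None, "555-0103"],
})

# Per-column rate: the mean of a boolean mask is the fraction of True values
missing_percentage = customer_data.isna().mean() * 100

# Overall rate: missing cells divided by all cells in the frame
overall_missing = customer_data.isna().to_numpy().mean() * 100

print(missing_percentage)
print(f"Overall missing rate: {overall_missing:.2f}%")
```

The per-column view shows where the gaps are concentrated, while the overall figure gives a single number for the whole file.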


### Task 4: Handling Partially Available Records

**Description**: In `customer_data.csv`, identify records with a missing "email" or "phone number" and decide whether to drop or fill them.

In [None]:
# Write your code from here
import pandas as pd

# Load the dataset
customer_data = None  # Initialize to None
try:
    customer_data = pd.read_csv('customer_data.csv')
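except FileNotFoundError:
    # No exit() here: it does not stop the cell in a notebook (see the output below),
    # and the None check further down already handles the missing data.
    print("Error: 'customer_data.csv' was not found. Please ensure the file is in the correct directory.")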

# Identify records with missing "email" or "phone_number" only if the DataFrame was loaded
if customer_data is not None:
    missing_contact_info = customer_data[customer_data['email'].isnull() | customer_data['phone_number'].isnull()]

    print("Records with Missing 'email' or 'phone_number':")
    print(missing_contact_info)
    print(f"\nNumber of records with missing 'email' or 'phone_number': {len(missing_contact_info)}")

    # --- Decision Point: Drop or Fill ---
    decision = input("\nDo you want to (drop) these records or (fill) the missing values? (drop/fill): ").lower()

    if decision == 'drop':
        customer_data_cleaned = customer_data[~(customer_data['email'].isnull() | customer_data['phone_number'].isnull())].copy()
        print("\nRecords with missing 'email' OR 'phone_number' dropped.")
        print(f"Shape of original data: {customer_data.shape}")
        print(f"Shape of data after dropping: {customer_data_cleaned.shape}")
    elif decision == 'fill':
        fill_choice = input("\nHow do you want to fill the missing values? (e.g., 'unknown', specific value): ")
        customer_data_filled = customer_data.copy()
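        # Assign the result back instead of calling fillna(inplace=True) on a column,
        # which triggers a chained-assignment warning in recent pandas versions
        customer_data_filled['email'] = customer_data_filled['email'].fillna(fill_choice)
        customer_data_filled['phone_number'] = customer_data_filled['phone_number'].fillna(fill_choice)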
        print(f"\nMissing 'email' and 'phone_number' values filled with '{fill_choice}'.")
        print("\nData after filling:")
        print(customer_data_filled[customer_data['email'].isnull() | customer_data['phone_number'].isnull()])
    else:
        print("\nInvalid decision. No changes were made to the data.")

else:
    print("Could not proceed with handling partially available records because 'customer_data.csv' was not loaded.")

# You can further work with customer_data_cleaned (if dropped) or customer_data_filled (if filled)

Error: 'customer_data.csv' was not found. Please ensure the file is in the correct directory.
Could not proceed with handling partially available records because 'customer_data.csv' was not loaded.
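
The drop-or-fill decision can also be expressed without `input()`, which makes it easier to rerun and test. This is a minimal sketch, assuming hypothetical sample rows and a hypothetical helper `handle_missing_contacts` whose `strategy` and `fill_value` parameters are illustrative names, not part of the original cell.

```python
import pandas as pd

# Hypothetical sample data standing in for customer_data.csv
customer_data = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", None],
    "phone_number": ["555-0100", "555-0101", None, "555-0103"],
})


def handle_missing_contacts(df, strategy="drop", fill_value="unknown"):
    """Return a new frame with missing contact fields dropped or filled."""
    contact_cols = ["email", "phone_number"]
    if strategy == "drop":
        # Drop a row if either contact column is missing
        return df.dropna(subset=contact_cols).copy()
    if strategy == "fill":
        # Fill only the contact columns and leave other columns untouched
        return df.fillna({col: fill_value for col in contact_cols})
    raise ValueError("strategy must be 'drop' or 'fill'")


cleaned = handle_missing_contacts(customer_data, strategy="drop")
filled = handle_missing_contacts(customer_data, strategy="fill")
print(cleaned)
print(filled)
```

Dropping keeps only fully contactable customers at the cost of losing rows, while filling keeps every row but marks the gaps with a placeholder; which one fits depends on how the contact fields are used downstream.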

