### Task 1: Measure Data Accuracy using a Trusted Source

**Description**: You have two datasets of product prices: `company_prices.csv` and
`trusted_prices.csv`. Check whether the prices in `company_prices.csv` match those in
`trusted_prices.csv`. Assume both files have "product_id" and "price" columns.

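```python
# Write your code from here
import numpy as np
import pandas as pd

def measure_price_accuracy(company_file, trusted_file):
    """
    Checks if product prices in one CSV match those in a trusted source CSV.

    Args:
        company_file (str): Path to the company prices CSV file.
        trusted_file (str): Path to the trusted prices CSV file.

    Returns:
        pandas.DataFrame: A DataFrame showing the comparison of prices
                          and indicating if they match. Returns None if a
                          file is missing or lacks the required columns.
    """
    try:
        company_prices_df = pd.read_csv(company_file)
        trusted_prices_df = pd.read_csv(trusted_file)
    except FileNotFoundError as e:
        print(f"Error: One or both files not found: {e}")
        return None

    # Validate the required columns before merging, so a missing column
    # produces a clear message instead of a KeyError
    required = {'product_id', 'price'}
    for name, df in (('company', company_prices_df), ('trusted', trusted_prices_df)):
        missing = required - set(df.columns)
        if missing:
            print(f"Error: The {name} file is missing column(s): {sorted(missing)}")
            return None

    # Inner merge on 'product_id': only products present in both files are compared
    merged_df = pd.merge(company_prices_df, trusted_prices_df, on='product_id',
                         suffixes=('_company', '_trusted'))

    # Coerce prices to numbers; unparseable entries become NaN and count as mismatches
    company_price = pd.to_numeric(merged_df['price_company'], errors='coerce')
    trusted_price = pd.to_numeric(merged_df['price_trusted'], errors='coerce')

    # Compare with a small floating-point tolerance; exact == on floats can
    # report false mismatches caused by rounding
    merged_df['price_match'] = np.isclose(company_price, trusted_price)

    # Accuracy = share of compared products whose prices match
    if len(merged_df) > 0:
        accuracy = merged_df['price_match'].mean() * 100
        print(f"Price accuracy: {accuracy:.2f}% of {len(merged_df)} compared products")

    return merged_df

# Example usage:
company_file = 'company_prices.csv'
trusted_file = 'trusted_prices.csv'
price_comparison = measure_price_accuracy(company_file, trusted_file)

if price_comparison is not None:
    print("Price Comparison:")
    print(price_comparison)
```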

```text
Error: One or both files not found: [Errno 2] No such file or directory: 'company_prices.csv'
```
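Since the CSV files are not available in this run, the sketch below uses small in-memory DataFrames (hypothetical sample data) to show one way to turn the comparison into an accuracy figure. The helper `price_accuracy_report` and its `atol` parameter are illustrative assumptions, not part of the task. It uses an outer merge with `indicator=True` so products that appear in only one file are reported rather than silently dropped, and compares prices within an absolute tolerance of half a cent.

```python
import numpy as np
import pandas as pd

def price_accuracy_report(company_df, trusted_df, atol=0.005):
    """Hypothetical helper: compare prices and report accuracy and coverage."""
    # Outer merge keeps every product; the '_merge' column records its source(s)
    merged = pd.merge(company_df, trusted_df, on='product_id', how='outer',
                      suffixes=('_company', '_trusted'), indicator=True)
    both = merged[merged['_merge'] == 'both']
    # rtol=0 so only the absolute tolerance (half a cent by default) applies
    matches = np.isclose(both['price_company'], both['price_trusted'],
                         rtol=0, atol=atol)
    return {
        'compared': len(both),
        'matching': int(matches.sum()),
        'accuracy_pct': 100 * matches.mean() if len(both) else float('nan'),
        'only_in_company': merged.loc[merged['_merge'] == 'left_only', 'product_id'].tolist(),
        'only_in_trusted': merged.loc[merged['_merge'] == 'right_only', 'product_id'].tolist(),
    }

# Hypothetical sample data standing in for the two CSV files
company = pd.DataFrame({'product_id': [1, 2, 3, 4], 'price': [9.99, 5.00, 12.50, 3.10]})
trusted = pd.DataFrame({'product_id': [1, 2, 3, 5], 'price': [9.99, 4.50, 12.50, 7.00]})

report = price_accuracy_report(company, trusted)
print(report)
```

Here two of the three shared products agree, product 4 exists only in the company data, and product 5 only in the trusted data. Reporting coverage alongside accuracy matters because a high match rate over a few shared products can hide many unverifiable ones.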


### Task 2: Detect Incorrect Values

**Description**: In `company_prices.csv`, detect any negative price values, which are invalid for prices.

```python
# Write your code from here
import pandas as pd

def detect_negative_prices(file_path):
    """
    Detects any negative price values in a CSV file.

    Args:
        file_path (str): Path to the CSV file containing product prices.

    Returns:
        pandas.DataFrame: A DataFrame containing rows with negative price values.
                          Returns None if the file is not found or the 'price'
                          column is missing.
    """
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError as e:
        print(f"Error: File not found: {e}")
        return None

    if 'price' not in df.columns:
        print("Error: The CSV file must have a 'price' column.")
        return None

    # Coerce prices to numbers so a stray text value (e.g. "$9.99") becomes NaN
    # instead of making the comparison below raise a TypeError
    prices = pd.to_numeric(df['price'], errors='coerce')
    negative_prices_df = df[prices < 0]
    return negative_prices_df

# Example usage:
company_file = 'company_prices.csv'
negative_prices = detect_negative_prices(company_file)

if negative_prices is not None:
    print("\nRows with Negative Prices:")
    print(negative_prices)
```

```text
Error: File not found: [Errno 2] No such file or directory: 'company_prices.csv'
```
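A negative number is only one kind of incorrect price; a value that cannot be parsed as a number at all is another. The sketch below separates the two, using a hypothetical helper `classify_price_issues` and in-memory sample data in place of `company_prices.csv`. It relies on `pd.to_numeric(..., errors='coerce')`, which turns unparseable entries into NaN; a value that is NaN after coercion but was present before is therefore non-numeric.

```python
import pandas as pd

def classify_price_issues(df, column='price'):
    """Hypothetical helper: split price problems into negative and non-numeric."""
    # errors='coerce' turns unparseable entries into NaN
    numeric = pd.to_numeric(df[column], errors='coerce')
    # Present in the raw data but NaN after coercion -> not a number
    non_numeric = df[numeric.isna() & df[column].notna()]
    # NaN < 0 is False, so missing and non-numeric values are not counted here
    negative = df[numeric < 0]
    return negative, non_numeric

# Hypothetical sample standing in for company_prices.csv
sample = pd.DataFrame({'product_id': [1, 2, 3, 4, 5],
                       'price': ['9.99', '-2.50', '$4.00', '0', None]})

negative, non_numeric = classify_price_issues(sample)
print(negative)
print(non_numeric)
```

A zero price is not flagged; whether zero is valid (for example, a free item) is a business rule the task leaves open.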


### Task 3: Check Missing Data Rates

**Description**: Calculate the percentage of missing values in `customer_data.csv`.

```python
# Write your code from here
import pandas as pd

def calculate_missing_percentage(file_path):
    """
    Calculates the percentage of missing values for each column in a CSV file.

    Args:
        file_path (str): Path to the CSV file.

    Returns:
        pandas.Series: A Series containing the percentage of missing values
                       for each column. Returns None if the file is not found
                       or contains no rows.
    """
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError as e:
        print(f"Error: File not found: {e}")
        return None

    # Guard against dividing by zero rows
    if len(df) == 0:
        print("Error: The CSV file contains no rows.")
        return None

    # isna().mean() is the fraction of missing values per column
    missing_percentage = df.isna().mean() * 100
    return missing_percentage

# Example usage:
customer_file = 'customer_data.csv'
missing_rates = calculate_missing_percentage(customer_file)

if missing_rates is not None:
    print("\nMissing Data Rates:")
    print(missing_rates)
```

```text
Error: File not found: [Errno 2] No such file or directory: 'customer_data.csv'
```
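Per-column rates are one view of missingness; the share of all cells and the share of rows with at least one gap are often useful too. The sketch below computes all three with a hypothetical helper `missing_summary` on in-memory sample data. It also assumes that whitespace-only strings should count as missing: `read_csv` turns empty fields into NaN, but a field containing only spaces stays a string.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Hypothetical helper: missing-value rates at column, cell and row level."""
    # Treat whitespace-only strings as missing (an assumption about the data)
    cleaned = df.replace(r'^\s*$', np.nan, regex=True)
    is_missing = cleaned.isna()
    return {
        'per_column_pct': is_missing.mean() * 100,
        'overall_pct': 100 * is_missing.to_numpy().mean(),
        'rows_with_any_missing_pct': 100 * is_missing.any(axis=1).mean(),
    }

# Hypothetical sample standing in for customer_data.csv
sample = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'email': ['a@example.com', None, '  ', 'd@example.com'],
    'phone number': ['555-0100', '555-0101', None, None],
})

summary = missing_summary(sample)
print(summary['per_column_pct'])
print(summary['overall_pct'], summary['rows_with_any_missing_pct'])
```

In this sample each contact column is 50% missing and a third of all cells are empty, yet 75% of rows have at least one gap, which is why row-level rates matter when deciding whether dropping incomplete records is affordable.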


### Task 4: Handling Partially Available Records

**Description**: In `customer_data.csv`, identify records with missing "email" or "phone number" values and decide whether to drop or fill them.

```python
# Write your code from here
import pandas as pd

def handle_partially_available_records(file_path, columns_to_check):
    """
    Identifies records with missing values in specified columns and suggests handling.

    Args:
        file_path (str): Path to the CSV file.
        columns_to_check (list): Column names to check, e.g. ['email', 'phone number'].

    Returns:
        pandas.DataFrame: A DataFrame containing rows with missing values
                          in the specified columns. Returns None if the file
                          is not found or specified columns are missing.
    """
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError as e:
        print(f"Error: File not found: {e}")
        return None

    # Check that the columns exist, as the docstring promises; otherwise
    # df[columns_to_check] would raise a KeyError
    absent = [col for col in columns_to_check if col not in df.columns]
    if absent:
        print(f"Error: The CSV file is missing column(s): {absent}")
        return None

    # Keep rows where at least one of the checked columns is missing
    missing_records_df = df[df[columns_to_check].isna().any(axis=1)]

    print(f"\nRecords with missing {' or '.join(repr(c) for c in columns_to_check)} "
          f"({len(missing_records_df)} of {len(df)}):")
    print(missing_records_df)

    # Decision on handling (adapt these rules to how the data will be used)
    print("\nDecision on Handling:")
    print("If only a small share of records is affected, dropping them is usually acceptable. "
          "If many are affected, dropping discards too much data, so fill the gaps with a placeholder instead.")
    print("Contact details cannot be meaningfully imputed, so prefer a placeholder such as "
          "'no_email_provided' over an estimated value.")
    print("For 'email', if missing, you might not be able to contact the customer via email.")
    print("For 'phone number', if missing, you might not be able to contact the customer via phone.")

    # Example of dropping rows with missing 'email' or 'phone number'
    # df_cleaned = df.dropna(subset=columns_to_check)
    # print("\nDataFrame after dropping rows with missing 'email' or 'phone number':")
    # print(df_cleaned)

    # Example of filling missing 'email' with a placeholder
    # (assign the result back; chained fillna(..., inplace=True) is deprecated in recent pandas)
    # df['email'] = df['email'].fillna('no_email_provided')
    # print("\nDataFrame after filling missing 'email':")
    # print(df.head())  # Show first few rows

    return missing_records_df

# Example usage:
customer_file = 'customer_data.csv'
columns_to_check = ['email', 'phone number']
partially_available = handle_partially_available_records(customer_file, columns_to_check)
```

```text
Error: File not found: [Errno 2] No such file or directory: 'customer_data.csv'
```
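The function above prints advice but leaves the drop-or-fill decision open. The sketch below shows one possible policy, which is an assumption rather than the task's prescribed answer: drop records missing both contact fields (no way to reach the customer), and keep records missing only one, filling the gap with a placeholder and recording which fields were filled. The helper `apply_contact_policy`, the `placeholder` value, and the `*_was_missing` flag columns are all hypothetical.

```python
import pandas as pd

def apply_contact_policy(df, contact_cols=('email', 'phone number'),
                         placeholder='not_provided'):
    """Hypothetical policy: drop unreachable records, fill the rest with a placeholder."""
    cols = list(contact_cols)
    missing = df[cols].isna()
    # A record with no contact channel at all is dropped
    reachable = ~missing.all(axis=1)
    result = df[reachable].copy()
    # Flag filled fields so placeholders are never mistaken for real data
    for col in cols:
        result[f'{col}_was_missing'] = missing.loc[reachable, col]
    # Partially available records keep their remaining contact channel
    result[cols] = result[cols].fillna(placeholder)
    return result

# Hypothetical sample standing in for customer_data.csv
sample = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'email': ['a@example.com', None, None, 'd@example.com'],
    'phone number': ['555-0100', '555-0101', None, None],
})

cleaned = apply_contact_policy(sample)
print(cleaned)
```

Customer 3 has neither an email nor a phone number and is dropped; customers 2 and 4 are kept with one placeholder each. The flag columns let later steps (for example, an email campaign) skip placeholder values without re-deriving which fields were originally empty.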
