### Task 1: Measure Data Accuracy using a Trusted Source

**Description**: You have two datasets of product prices: `company_prices.csv` and `trusted_prices.csv`. Check whether the prices in `company_prices.csv` match those in `trusted_prices.csv`. Assume both files have `product_id` and `price` columns.

In [None]:
# Write your code from here

In [None]:
import pandas as pd

# Load the datasets
company_prices = pd.read_csv('company_prices.csv')
trusted_prices = pd.read_csv('trusted_prices.csv')

# Merge the datasets on 'product_id' to compare prices
merged_data = company_prices.merge(trusted_prices, on='product_id', suffixes=('_company', '_trusted'))

# Check for mismatched prices
mismatched_prices = merged_data[merged_data['price_company'] != merged_data['price_trusted']]

# Display mismatched prices
print("Mismatched Prices:")
print(mismatched_prices)
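The merge above is an inner join, so products that appear in only one file are silently left out of the comparison. A minimal sketch of a coverage check uses an outer merge with `indicator=True`. It assumes small hypothetical in-memory frames (`company_demo`, `trusted_demo`) in place of the two CSV files, named so they do not overwrite the notebook's real DataFrames:

```python
import pandas as pd

# Hypothetical demo data standing in for the two CSV files
company_demo = pd.DataFrame({'product_id': [1, 2, 3], 'price': [9.99, 4.50, 12.00]})
trusted_demo = pd.DataFrame({'product_id': [1, 2, 4], 'price': [9.99, 4.75, 7.25]})

# An outer merge keeps every product; indicator=True adds a '_merge' column
# whose values are 'both', 'left_only', or 'right_only'
coverage_demo = company_demo.merge(
    trusted_demo,
    on='product_id',
    how='outer',
    suffixes=('_company', '_trusted'),
    indicator=True,
)

# Listed by the company but absent from the trusted source (cannot be verified)
unverifiable = coverage_demo.loc[coverage_demo['_merge'] == 'left_only', 'product_id'].tolist()
# Listed by the trusted source but missing from the company file
missing_from_company = coverage_demo.loc[coverage_demo['_merge'] == 'right_only', 'product_id'].tolist()

print("Not in trusted source:", unverifiable)
print("Missing from company file:", missing_from_company)
```

Products in `unverifiable` can be judged neither accurate nor inaccurate, so it may be clearer to report them separately from price mismatches.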

In [None]:
import pandas as pd

# Load the datasets
try:
    company_prices = pd.read_csv('company_prices.csv')
    trusted_prices = pd.read_csv('trusted_prices.csv')
except FileNotFoundError as e:
    print(f"Error: {e}")
    raise

# Ensure the required columns exist in both datasets
required_columns = ['product_id', 'price']
for df_name, df in [('company_prices', company_prices), ('trusted_prices', trusted_prices)]:
    for col in required_columns:
        if col not in df.columns:
            raise ValueError(f"Missing required column '{col}' in {df_name}")

# Inner merge on 'product_id': only products present in both files are compared
merged_data = company_prices.merge(trusted_prices, on='product_id', how='inner', suffixes=('_company', '_trusted'))

# Rows whose prices differ (a missing price on either side also counts as a mismatch)
mismatched_prices = merged_data[merged_data['price_company'] != merged_data['price_trusted']]

# Display mismatched prices
if mismatched_prices.empty:
    print("No mismatched prices found.")
else:
    print("Mismatched Prices:")
    print(mismatched_prices)
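To turn the comparison into a single accuracy figure, one option is the percentage of compared products whose price agrees with the trusted source. The sketch below uses hypothetical demo data and an assumed tolerance `TOLERANCE` of 0.0001 (a hundredth of a cent) so that floating-point noise is not reported as a mismatch. `np.isclose` returns `False` when either price is missing, so missing prices still count against accuracy:

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: product 2 differs for real, product 3 only by float noise
company_demo = pd.DataFrame({'product_id': [1, 2, 3], 'price': [9.99, 4.50, 12.0000001]})
trusted_demo = pd.DataFrame({'product_id': [1, 2, 3], 'price': [9.99, 4.75, 12.00]})

merged_demo = company_demo.merge(trusted_demo, on='product_id', suffixes=('_company', '_trusted'))

# Assumed tolerance: prices within a hundredth of a cent count as equal
TOLERANCE = 1e-4
# Element-wise comparison; NaN on either side yields False (a mismatch)
matches = np.isclose(merged_demo['price_company'], merged_demo['price_trusted'],
                     rtol=0.0, atol=TOLERANCE)

# Accuracy = percentage of compared products whose price agrees with the trusted source
accuracy = matches.mean() * 100
print(f"Price accuracy: {accuracy:.1f}% ({matches.sum()} of {len(matches)} products)")
print(merged_demo.loc[~matches])
```

An absolute tolerance suits prices in a single currency; for values spanning very different magnitudes, a relative tolerance (`rtol`) may fit better.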

### Task 2: Detect Incorrect Values

**Description**: In `company_prices.csv`, detect any negative prices, since a price cannot be negative.

In [None]:
# Write your code from here

In [None]:
# Detect negative prices (reuses the company_prices DataFrame loaded in Task 1)
negative_prices = company_prices[company_prices['price'] < 0]

# Display negative price values
if negative_prices.empty:
    print("No negative price values found.")
else:
    print("Negative Price Values:")
    print(negative_prices)
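The check above assumes pandas parsed `price` as a numeric column. If a single malformed entry (for example `abc`) makes it load as text, `company_prices['price'] < 0` raises a `TypeError`. A hedged sketch that coerces with `pd.to_numeric(errors='coerce')` and flags both negative and unparseable prices, on hypothetical demo data (`raw_demo`):

```python
import pandas as pd

# Hypothetical raw prices as they might load from a CSV with one bad entry
raw_demo = pd.DataFrame({'product_id': [1, 2, 3, 4, 5],
                         'price': ['9.99', '-3.50', 'abc', None, '0']})

# Coerce to numbers; unparseable entries become NaN instead of raising
prices = pd.to_numeric(raw_demo['price'], errors='coerce')

# Negative prices, and prices that were present but not numeric.
# Genuinely missing prices (None) are left to the missing-data check in Task 3.
is_negative = prices < 0
is_non_numeric = prices.isna() & raw_demo['price'].notna()

# Label each flagged row with the reason it failed
reason = pd.Series('', index=raw_demo.index)
reason[is_negative] = 'negative'
reason[is_non_numeric] = 'not a number'

invalid_prices = raw_demo.assign(reason=reason)[is_negative | is_non_numeric]
print(invalid_prices)
```

A price of zero passes this check; whether zero is valid (for example, a free item) is a business rule the task does not specify.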

### Task 3: Check Missing Data Rates

**Description**: Calculate the percentage of missing values in `customer_data.csv`.

In [None]:
# Write your code from here

In [None]:
import pandas as pd

# Load the customer data
try:
    customer_data = pd.read_csv('customer_data.csv')
except FileNotFoundError as e:
    print(f"Error: {e}")
    raise

# Calculate the percentage of missing values for each column
missing_data_percentage = customer_data.isnull().mean() * 100

# Display the percentage of missing values
print("Percentage of Missing Values:")
print(missing_data_percentage)
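`isnull()` only counts values pandas recognizes as missing. `read_csv` already converts blank fields and tokens such as `N/A` to NaN, but a placeholder like `unknown` stays as text unless it is passed through `na_values`. The sketch below, on a hypothetical inline CSV (`customer_demo`), also reports an overall missing rate (missing cells over all cells) next to the per-column rates:

```python
import io

import pandas as pd

# Hypothetical CSV where missing values appear as blanks, 'N/A', and 'unknown'
csv_text = """customer_id,email,phone number
1,a@example.com,555-0100
2,,555-0101
3,N/A,unknown
4,d@example.com,
"""

# Blanks and 'N/A' are NaN by default; 'unknown' is added via na_values
customer_demo = pd.read_csv(io.StringIO(csv_text), na_values=['unknown'])

# Per-column missing rate in percent, highest first
per_column = customer_demo.isnull().mean().mul(100).sort_values(ascending=False)

# Overall missing rate: missing cells divided by total cells
overall = customer_demo.isnull().to_numpy().mean() * 100

print(per_column)
print(f"Overall missing rate: {overall:.1f}%")
```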

### Task 4: Handling Partially Available Records

**Description**: In `customer_data.csv`, identify records with a missing `email` or `phone number` and decide whether to drop or fill them.

In [None]:
# Write your code from here

In [None]:
# Identify records with missing "email" or "phone number"
missing_contact_info = customer_data[customer_data['email'].isnull() | customer_data['phone number'].isnull()]

# Display records with missing contact information
if missing_contact_info.empty:
    print("No records with missing contact information found.")
else:
    print("Records with Missing Contact Information:")
    print(missing_contact_info)

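# Decide whether to drop or fill them.
# Here every record is kept and the gaps are filled with clearly fake placeholders
# (example.com is a reserved domain, so no real address is implied).
# Chained fillna(..., inplace=True) on a single column stops updating the
# DataFrame under pandas Copy-on-Write, so the result is assigned instead.
customer_data = customer_data.fillna({
    'email': 'no_email_provided@example.com',
    'phone number': '000-000-0000',
})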

# Display the updated dataset
print("Updated Customer Data:")
print(customer_data)
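Filling every gap is only one policy. A sketch of a mixed rule, which is an assumption rather than a standard: drop records with neither an email nor a phone number (they cannot be contacted at all), fill the rest, and keep a `contact_filled` flag (a hypothetical column name) so placeholders can be excluded later, for example from a mailing list. The demo data (`customer_demo`) is hypothetical:

```python
import pandas as pd

# Hypothetical customer records with different gaps
customer_demo = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'email': ['a@example.com', None, None, 'd@example.com'],
    'phone number': ['555-0100', '555-0101', None, None],
})

email_missing = customer_demo['email'].isnull()
phone_missing = customer_demo['phone number'].isnull()

# Assumed rule: no contact channel at all -> drop; one channel -> keep and fill
no_contact = email_missing & phone_missing
cleaned = customer_demo.loc[~no_contact].copy()

# Flag rows whose contact details will contain a placeholder
cleaned['contact_filled'] = email_missing[~no_contact] | phone_missing[~no_contact]
cleaned = cleaned.fillna({'email': 'no_email_provided@example.com',
                          'phone number': '000-000-0000'})

print(f"Dropped {no_contact.sum()} record(s) with no contact information")
print(cleaned)
```

The flag keeps the fill reversible: downstream code can filter on `contact_filled` rather than matching placeholder strings.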