### Task 1: Measure Data Accuracy using a Trusted Source

**Description**: You have two datasets of product prices, `company_prices.csv` and
`trusted_prices.csv`. Check whether each price in `company_prices.csv` matches the price
in `trusted_prices.csv` for the same product. Assume both files have `product_id` and `price` columns.

In [1]:
# Write your code from here
import pandas as pd

# Sample data standing in for company_prices.csv and trusted_prices.csv
company_prices = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'price': [20.0, 35.5, 15.0, 40.0]
})

trusted_prices = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'price': [20.0, 35.0, 15.0, 40.0]
})

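# Merge on product_id to compare prices
# (the default inner join keeps only products present in both sources)
merged = company_prices.merge(trusted_prices, on='product_id', suffixes=('_company', '_trusted'))

# Check if prices match, using a small tolerance instead of exact float equality
merged['price_match'] = (merged['price_company'] - merged['price_trusted']).abs() < 1e-9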

print("Price comparison:")
print(merged[['product_id', 'price_company', 'price_trusted', 'price_match']])

# Summary of mismatches
mismatches = merged[~merged['price_match']]
print(f"\nNumber of price mismatches: {len(mismatches)}")
print(mismatches)


Price comparison:
   product_id  price_company  price_trusted  price_match
0         101           20.0           20.0         True
1         102           35.5           35.0        False
2         103           15.0           15.0         True
3         104           40.0           40.0         True

Number of price mismatches: 1
   product_id  price_company  price_trusted  price_match
1         102           35.5           35.0        False
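
The inner merge above silently drops any product that appears in only one of the two sources. A minimal sketch of one way to surface those products and summarize accuracy as a single rate follows; the sample values (including product IDs 105 and 106) are hypothetical and chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical sample data: product 105 is missing from the trusted source,
# and product 106 is missing from the company data.
company = pd.DataFrame({"product_id": [101, 102, 105], "price": [20.0, 35.5, 12.0]})
trusted = pd.DataFrame({"product_id": [101, 102, 106], "price": [20.0, 35.0, 9.0]})

# An outer merge keeps unmatched rows; indicator=True adds a `_merge` column
# with the values 'both', 'left_only', or 'right_only'.
compared = company.merge(
    trusted,
    on="product_id",
    how="outer",
    suffixes=("_company", "_trusted"),
    indicator=True,
)

# Accuracy is measured only on products present in both sources.
in_both = compared[compared["_merge"] == "both"]
matches = (in_both["price_company"] - in_both["price_trusted"]).abs() < 1e-9
accuracy_pct = matches.mean() * 100

# Products that could not be checked because one source lacks them.
unmatched_ids = sorted(compared.loc[compared["_merge"] != "both", "product_id"].tolist())

print(f"Accuracy on shared products: {accuracy_pct:.1f}%")
print(f"Products present in only one source: {unmatched_ids}")
```

Reporting the unmatched products separately keeps the accuracy rate from looking better than it is when the trusted source does not cover every product.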


### Task 2: Detect Incorrect Values

**Description**: In `company_prices.csv`, detect any negative prices, which are invalid values for a price.

In [2]:
# Write your code from here
# Detect negative prices
negative_prices = company_prices[company_prices['price'] < 0]

print("Negative prices found:")
print(negative_prices)


Negative prices found:
Empty DataFrame
Columns: [product_id, price]
Index: []
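
The sample data contains no negative prices, so the check above returns an empty DataFrame. The sketch below runs a similar check on hypothetical data that does contain a bad value, and assumes prices may arrive as text, as can happen when a CSV column holds stray non-numeric entries. The column values and product IDs are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one negative price and one non-numeric entry
raw_prices = pd.DataFrame({
    "product_id": [201, 202, 203, 204],
    "price": ["19.99", "-5.00", "n/a", "42.50"],
})

# errors="coerce" turns unparseable values into NaN instead of raising
numeric_price = pd.to_numeric(raw_prices["price"], errors="coerce")

# Negative prices and unparseable prices are reported separately
negative_ids = raw_prices.loc[numeric_price < 0, "product_id"].tolist()
unparseable_ids = raw_prices.loc[numeric_price.isna(), "product_id"].tolist()

print("Products with negative prices:", negative_ids)
print("Products with unparseable prices:", unparseable_ids)
```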


### Task 3: Check Missing Data Rates

**Description**: Calculate the percentage of missing values in each column of `customer_data.csv`.

In [3]:
# Write your code from here
# Sample customer_data with some missing values
customer_data = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'email': ['a@example.com', None, 'c@example.com', 'd@example.com'],
    'phone_number': ['123-456', '234-567', None, '456-789'],
    'age': [25, 30, None, 22]
})

missing_percent = customer_data.isnull().mean() * 100

print("Percentage of missing values per column:")
print(missing_percent)


Percentage of missing values per column:
customer_id      0.0
email           25.0
phone_number    25.0
age             25.0
dtype: float64
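
Per-column rates do not show how much of the dataset is affected overall. As a sketch, the same `isnull()` mask can also give the share of missing cells and the share of rows with at least one gap. The data below repeats the sample from this task; the variable names are illustrative.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", "d@example.com"],
    "phone_number": ["123-456", "234-567", None, "456-789"],
    "age": [25, 30, None, 22],
})

is_missing = customers.isnull()

# Share of all cells in the table that are missing
overall_pct = is_missing.to_numpy().mean() * 100

# Share of rows that have at least one missing value
incomplete_rows_pct = is_missing.any(axis=1).mean() * 100

print(f"Missing cells overall: {overall_pct:.2f}%")
print(f"Rows with at least one missing value: {incomplete_rows_pct:.2f}%")
```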


### Task 4: Handling Partially Available Records

**Description**: In `customer_data.csv`, identify records with a missing `email` or `phone_number` and decide whether to drop or fill them.

In [4]:
# Find rows missing email or phone_number
missing_contact = customer_data[customer_data['email'].isnull() | customer_data['phone_number'].isnull()]

print("Records missing email or phone number:")
print(missing_contact)

# Decide: for example, drop rows missing both, fill others
# Drop rows missing both email and phone_number
drop_condition = customer_data['email'].isnull() & customer_data['phone_number'].isnull()
cleaned_data = customer_data[~drop_condition].copy()

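# Fill remaining missing emails or phones with a placeholder.
# Assign the result back: fillna(..., inplace=True) on a selected column
# may not update the DataFrame in newer pandas versions.
cleaned_data['email'] = cleaned_data['email'].fillna('no_email_provided@example.com')
cleaned_data['phone_number'] = cleaned_data['phone_number'].fillna('000-000-0000')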

print("\nCleaned customer data:")
print(cleaned_data)


Records missing email or phone number:
   customer_id          email phone_number   age
1            2           None      234-567  30.0
2            3  c@example.com         None   NaN

Cleaned customer data:
   customer_id                          email  phone_number   age
0            1                  a@example.com       123-456  25.0
1            2  no_email_provided@example.com       234-567  30.0
2            3                  c@example.com  000-000-0000   NaN
3            4                  d@example.com       456-789  22.0
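
Placeholder values such as `000-000-0000` can be mistaken for real contact details later on. One possible alternative, sketched below on hypothetical data, is to leave the gaps as missing and add flag columns that record which contact channels exist. The `has_email` and `has_phone` column names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data: customer 4 has neither an email nor a phone number
contacts = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "email": ["a@example.com", None, "c@example.com", None, "e@example.com"],
    "phone_number": ["123-456", "234-567", None, None, "567-890"],
})

# Drop only rows where every contact column is missing
reachable = contacts.dropna(subset=["email", "phone_number"], how="all").copy()

# Record which channels are available instead of writing placeholder values
reachable["has_email"] = reachable["email"].notna()
reachable["has_phone"] = reachable["phone_number"].notna()

print(reachable)
```

Whether to fill, flag, or drop depends on how the contact fields are used downstream, so this is one option rather than a recommendation.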

