### Task 1: Measure Data Accuracy using a Trusted Source

**Description**: You have two datasets of product prices: `company_prices.csv` and
`trusted_prices.csv`. Check whether the prices in `company_prices.csv` match those in
`trusted_prices.csv`. Assume both files have `product_id` and `price` columns.

```python
import pandas as pd
from io import StringIO

# Sample datasets as CSV strings
company_csv = """product_id,price
101,19.99
102,5.49
103,12.00
104,7.99
105,24.50
"""

trusted_csv = """product_id,price
101,19.99
102,5.49
103,12.49
104,7.99
105,24.50
"""

# Read datasets from strings
company_df = pd.read_csv(StringIO(company_csv))
trusted_df = pd.read_csv(StringIO(trusted_csv))

# Merge on product_id (inner join: only IDs present in both files are kept)
merged_df = pd.merge(company_df, trusted_df, on="product_id", suffixes=('_company', '_trusted'))

# Compare prices
merged_df["price_match"] = merged_df["price_company"] == merged_df["price_trusted"]

# Calculate accuracy as the share of matching rows
accuracy = merged_df["price_match"].mean()

# Print accuracy
print(f"Price Accuracy: {accuracy:.2%}")

# Print mismatches
mismatches = merged_df[~merged_df["price_match"]]
print("\nMismatched Records:")
print(mismatches[["product_id", "price_company", "price_trusted"]])
```

```text
Price Accuracy: 80.00%

Mismatched Records:
   product_id  price_company  price_trusted
2         103           12.0          12.49
```
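
The exact `==` comparison works for this sample because both files spell matching prices identically, and the inner merge silently drops any `product_id` that appears in only one file. A minimal sketch of a more defensive variant follows. It assumes a half-cent tolerance (`atol=0.005`) is acceptable, and it uses a hypothetical variation of the sample data in which product 106 exists only in the company file and product 105 only in the trusted file.

```python
import numpy as np
import pandas as pd
from io import StringIO

# Hypothetical variation of the sample: 106 is company-only, 105 is trusted-only
company_csv = """product_id,price
101,19.99
102,5.49
103,12.00
104,7.99
106,3.25
"""

trusted_csv = """product_id,price
101,19.99
102,5.49
103,12.49
104,7.99
105,24.50
"""

company_df = pd.read_csv(StringIO(company_csv))
trusted_df = pd.read_csv(StringIO(trusted_csv))

# Outer join keeps every product_id; indicator=True adds a "_merge" column
# saying whether each row came from the left file, the right file, or both
merged_df = pd.merge(
    company_df,
    trusted_df,
    on="product_id",
    how="outer",
    suffixes=("_company", "_trusted"),
    indicator=True,
)

in_both = merged_df["_merge"] == "both"

# Compare with an absolute tolerance (assumed: half a cent) instead of ==
close = np.isclose(
    merged_df["price_company"].to_numpy(),
    merged_df["price_trusted"].to_numpy(),
    atol=0.005,
    rtol=0,
)
merged_df["price_match"] = in_both & close

# Accuracy over products that can actually be compared
accuracy = merged_df.loc[in_both, "price_match"].mean()

# Products that exist in only one of the two files
unmatched = merged_df.loc[~in_both, ["product_id", "_merge"]]

print(f"Price Accuracy (matched products only): {accuracy:.2%}")
print(unmatched)
```

Under these assumptions, accuracy is computed only over products present in both files (3 of 4 here), and `unmatched` lists the IDs that cannot be checked against the trusted source.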


### Task 2: Detect Incorrect Values

**Description**: In `company_prices.csv`, detect negative prices, which are invalid values for a price.

```python
import pandas as pd
from io import StringIO

# Sample dataset with possible incorrect values
company_csv = """product_id,price
101,19.99
102,-5.49
103,12.00
104,0.00
105,-24.50
106,8.75
"""

# Read the dataset
company_df = pd.read_csv(StringIO(company_csv))

# Detect negative prices
incorrect_values = company_df[company_df['price'] < 0]

# Output results
print("Incorrect Price Values (Negative Prices):")
print(incorrect_values)
```

```text
Incorrect Price Values (Negative Prices):
   product_id  price
1         102  -5.49
4         105 -24.50
```
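
The `< 0` filter assumes the `price` column was parsed as numeric. If the file contains a non-numeric entry, pandas reads the column as text and the comparison typically fails with a `TypeError`. A minimal sketch of one way to guard against that is shown here, using `pd.to_numeric(..., errors="coerce")` and a hypothetical sample in which product 103 carries the placeholder `TBD`. Treating unparseable prices as invalid alongside negative ones is an assumption of this sketch, not part of the task statement.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample: product 103 has a non-numeric placeholder
company_csv = """product_id,price
101,19.99
102,-5.49
103,TBD
104,0.00
105,-24.50
106,8.75
"""

company_df = pd.read_csv(StringIO(company_csv))

# Convert to numbers; anything that cannot be parsed becomes NaN
price_num = pd.to_numeric(company_df["price"], errors="coerce")

negative = price_num < 0          # NaN compares as False, so it is not flagged here
unparseable = price_num.isna()    # also catches prices that were missing to begin with
invalid = negative | unparseable

# Share of rows with an invalid price
invalid_rate = invalid.mean()

print("Negative prices:")
print(company_df[negative])
print("\nUnparseable or missing prices:")
print(company_df[unparseable])
print(f"\nInvalid price rate: {invalid_rate:.2%}")
```

Keeping the two masks separate makes it possible to report negative and unparseable prices independently before combining them into a single rate.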


### Task 3: Check Missing Data Rates

**Description**: Calculate the percentage of missing values in `customer_data.csv`.

```python
import pandas as pd
from io import StringIO

# Sample dataset with missing values
customer_csv = """customer_id,name,email,age
1,John Doe,john@example.com,28
2,Jane Smith,,34
3,Bob Johnson,bob@example.com,
4,,susan@example.com,45
5,Alice White,, 
"""

# Read the dataset
customer_df = pd.read_csv(StringIO(customer_csv))

# Calculate missing data rate for each column
missing_percentage = customer_df.isnull().mean() * 100

# Output results
print("Missing Data Percentage by Column:")
print(missing_percentage)
```

```text
Missing Data Percentage by Column:
customer_id     0.0
name           20.0
email          40.0
age            20.0
dtype: float64
```
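
Note that `age` is reported as 20% missing even though two of the five rows carry no age. The last row holds a single space, which `read_csv` keeps as text rather than treating it as missing. A minimal sketch of one way to handle this is to convert whitespace-only strings to `NaN` before counting. The regex replacement and the overall cell-level rate are additions for illustration, and the sample rows are written with explicit `\n` so the stray space stays visible.

```python
import numpy as np
import pandas as pd
from io import StringIO

# Same sample rows; the last row's age is a single space, not an empty field
customer_csv = (
    "customer_id,name,email,age\n"
    "1,John Doe,john@example.com,28\n"
    "2,Jane Smith,,34\n"
    "3,Bob Johnson,bob@example.com,\n"
    "4,,susan@example.com,45\n"
    "5,Alice White,, \n"
)

customer_df = pd.read_csv(StringIO(customer_csv))

# Treat empty or whitespace-only strings as missing
normalized_df = customer_df.replace(r"^\s*$", np.nan, regex=True)

# Per-column missing rate, in percent
missing_by_column = normalized_df.isna().mean() * 100

# Overall missing rate: missing cells divided by all cells, in percent
overall_missing = normalized_df.isna().to_numpy().mean() * 100

print("Missing Data Percentage by Column (blanks counted as missing):")
print(missing_by_column)
print(f"\nOverall missing rate: {overall_missing:.1f}%")
```

With blanks normalized, `age` counts two missing values out of five, and the overall rate covers 5 missing cells out of 20.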


### Task 4: Handling Partially Available Records

**Description**: In `customer_data.csv`, identify records with a missing "email" or "phone number" and decide whether to drop or fill them.

```python
import pandas as pd
from io import StringIO

# Sample dataset with missing email or phone_number
customer_csv = """customer_id,name,email,phone_number
1,John Doe,john@example.com,1234567890
2,Jane Smith,,9876543210
3,Bob Johnson,bob@example.com,
4,Susan Lee,, 
5,Alice White,alice@example.com,5551234567
"""

# Read the dataset
customer_df = pd.read_csv(StringIO(customer_csv))

# Identify records with missing email or phone_number
missing_contact = customer_df[customer_df[['email', 'phone_number']].isnull().any(axis=1)]

# Drop records with missing contact info
cleaned_df = customer_df.dropna(subset=['email', 'phone_number'])

# Output
print("Records with Missing Email or Phone Number:")
print(missing_contact)
print("\nData After Dropping Incomplete Contact Info:")
print(cleaned_df)
```

```text
Records with Missing Email or Phone Number:
   customer_id         name            email phone_number
1            2   Jane Smith              NaN   9876543210
2            3  Bob Johnson  bob@example.com          NaN
3            4    Susan Lee              NaN             

Data After Dropping Incomplete Contact Info:
   customer_id         name              email phone_number
0            1     John Doe   john@example.com   1234567890
4            5  Alice White  alice@example.com   5551234567
```
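
The task also allows filling instead of dropping. One possible policy, sketched here as an assumption rather than a recommendation: keep any customer who has at least one contact channel, fill the missing channel with a placeholder (`"unknown"`, a hypothetical choice), and set aside only the records with neither. The sample rows are repeated so the block runs on its own, with row 4's whitespace-only phone number written as an empty field, and `phone_number` is read as text so the digits are not converted to floats.

```python
import pandas as pd
from io import StringIO

customer_csv = """customer_id,name,email,phone_number
1,John Doe,john@example.com,1234567890
2,Jane Smith,,9876543210
3,Bob Johnson,bob@example.com,
4,Susan Lee,,
5,Alice White,alice@example.com,5551234567
"""

# Read phone numbers as strings so they are not parsed as numbers
customer_df = pd.read_csv(StringIO(customer_csv), dtype={"phone_number": str})

contact_cols = ["email", "phone_number"]

# Number of missing contact fields per record (0, 1, or 2)
missing_count = customer_df[contact_cols].isna().sum(axis=1)

# Records with no contact channel at all: candidates for dropping
unreachable_df = customer_df[missing_count == len(contact_cols)]

# Records with at least one channel: keep them and fill the gap
reachable_df = customer_df[missing_count < len(contact_cols)].copy()
reachable_df[contact_cols] = reachable_df[contact_cols].fillna("unknown")

print("Kept (missing channel filled with 'unknown'):")
print(reachable_df)
print("\nSet aside (no contact channel):")
print(unreachable_df)
```

Compared with dropping every incomplete record, this keeps four of the five customers and sets aside only the one who cannot be reached by either channel.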
