### Task 1: Measure Data Accuracy using a Trusted Source

**Description**: You have two datasets of product prices: `company_prices.csv` and
`trusted_prices.csv`. Check whether the prices in `company_prices.csv` match those in
`trusted_prices.csv`. Assume both files have a "product_id" and a "price" column.

In [1]:
# Write your code from here
import pandas as pd
from io import StringIO

# Sample data for company_prices.csv (replace with your actual file)
company_data = """product_id,price
101,25.50
102,12.00
103,45.75
104,9.99
105,25.50
"""
company_prices_df = pd.read_csv(StringIO(company_data))

# Sample data for trusted_prices.csv (replace with your actual file)
trusted_data = """product_id,price
101,25.50
102,12.50
103,45.75
104,9.99
106,15.00
"""
trusted_prices_df = pd.read_csv(StringIO(trusted_data))

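# Left-merge on 'product_id' so every company product is kept; products absent
# from the trusted source get NaN for price_trusted and count as mismatches
merged_df = pd.merge(company_prices_df, trusted_prices_df, on='product_id', suffixes=('_company', '_trusted'), how='left')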

# Compare the prices
merged_df['price_match'] = merged_df['price_company'] == merged_df['price_trusted']

# Calculate the accuracy rate
accuracy_rate = (merged_df['price_match'].sum() / len(merged_df)) * 100
print(f"Data Accuracy Rate: {accuracy_rate:.2f}%")

# Identify the products with mismatched prices
mismatched_prices = merged_df[~merged_df['price_match']]
print("\nProducts with Mismatched Prices:")
print(mismatched_prices)

Data Accuracy Rate: 60.00%

Products with Mismatched Prices:
   product_id  price_company  price_trusted  price_match
1         102           12.0           12.5        False
4         105           25.5            NaN        False
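
In the result above, product 105 is counted as a mismatch even though the trusted source simply has no entry for it. The following is a minimal sketch of one way to separate "unverifiable" rows from true mismatches, using the `indicator` option of `merge` and a tolerance-based comparison with `numpy.isclose`. The tolerance of half a cent and the variable names are assumptions for illustration, not part of the original task.

```python
import numpy as np
import pandas as pd

# Same sample values as in the cell above
company = pd.DataFrame({"product_id": [101, 102, 103, 104, 105],
                        "price": [25.50, 12.00, 45.75, 9.99, 25.50]})
trusted = pd.DataFrame({"product_id": [101, 102, 103, 104, 106],
                        "price": [25.50, 12.50, 45.75, 9.99, 15.00]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
merged = company.merge(trusted, on="product_id", how="left",
                       suffixes=("_company", "_trusted"), indicator=True)

# Rows with no trusted counterpart cannot be verified either way
unverified = merged[merged["_merge"] == "left_only"]
comparable = merged[merged["_merge"] == "both"].copy()

# Compare with a small absolute tolerance (assumed: half a cent)
# instead of exact float equality
comparable["price_match"] = np.isclose(
    comparable["price_company"], comparable["price_trusted"], atol=0.005
)

accuracy_verified = comparable["price_match"].mean() * 100
print(f"Accuracy among verifiable products: {accuracy_verified:.2f}%")
print(f"Products missing from trusted source: {unverified['product_id'].tolist()}")
```

Whether unverifiable products should lower the accuracy rate is a policy choice; reporting them separately keeps both numbers visible.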


### Task 2: Detect Incorrect Values

**Description**: In `company_prices.csv`, detect any negative price values, which are invalid for prices.

In [2]:
# Write your code from here
import pandas as pd
from io import StringIO

# Sample data for company_prices.csv (replace with your actual file)
company_data = """product_id,price
101,25.50
102,12.00
103,45.75
104,9.99
105,-5.00
106,30.25
107,-0.50
108,15.00
"""
company_prices_df = pd.read_csv(StringIO(company_data))

# Identify incorrect (negative) price values
incorrect_prices_df = company_prices_df[company_prices_df['price'] < 0]

# Calculate the number and percentage of incorrect values
num_incorrect_prices = len(incorrect_prices_df)
total_prices = len(company_prices_df)
percentage_incorrect = (num_incorrect_prices / total_prices) * 100 if total_prices > 0 else 0

print("Detection of Incorrect Price Values (Negative Prices):")
print("\nIncorrect Price Values:")
print(incorrect_prices_df)
print(f"\nNumber of Incorrect Price Values: {num_incorrect_prices}")
print(f"Percentage of Incorrect Price Values: {percentage_incorrect:.2f}%")

Detection of Incorrect Price Values (Negative Prices):

Incorrect Price Values:
   product_id  price
4         105   -5.0
6         107   -0.5

Number of Incorrect Price Values: 2
Percentage of Incorrect Price Values: 25.00%
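
Real files sometimes store prices as text, in which case the `< 0` comparison above would fail or miss unparseable entries. As a hedged sketch, the check could be wrapped in a small helper that coerces the column to numeric first. The helper name `flag_invalid_prices`, the `is_invalid` column, and the choice to treat unparseable values as invalid are all assumptions for illustration.

```python
import pandas as pd

def flag_invalid_prices(df, column="price"):
    """Return a copy of df with a boolean 'is_invalid' column (hypothetical helper)."""
    out = df.copy()
    # Values that cannot be parsed as numbers become NaN
    numeric = pd.to_numeric(out[column], errors="coerce")
    # Flag negative prices and, by assumption, unparseable ones as well
    out["is_invalid"] = numeric.isna() | (numeric < 0)
    return out

# Illustrative sample with prices stored as strings
sample = pd.DataFrame({"product_id": [101, 105, 107, 109],
                       "price": ["25.50", "-5.00", "-0.50", "n/a"]})
flagged = flag_invalid_prices(sample)
print(flagged[flagged["is_invalid"]])
```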


### Task 3: Check Missing Data Rates

**Description**: Calculate the percentage of missing values in `customer_data.csv`.

In [4]:
import pandas as pd
from io import StringIO

# Sample data for customer_data.csv (as provided)
customer_data = """CustomerID,Email,Country,LastOrderDate,Status,Age
1,john.doe@example.com,US,2025-05-10,Active,30
2,jane.smith@example.co.uk,GB,2025-05-05,Inactive,
3,alice.wonder@example.in,,2025-05-12,Active,25
4,bob.builder@example.ca,CA,2025-04-28,,40
5,charlie.brown@example.com.au,AU,,Inactive,
6,david.jones@example.us,US,2025-05-15,Active,35
7,eve.adams@example.de,DE,2025-05-08,Inactive,
8,frank.zappa@example.fr,FR,2025-05-11,Active,50
9,grace.hopper@example.jp,JP,2025-05-03,Inactive,
10,heidi.klum@example.ch,,2025-05-14,Active,45
"""

# Load the CSV data from the string into a Pandas DataFrame
customer_data_df = pd.read_csv(StringIO(customer_data))

# Calculate the percentage of missing values for each column
missing_percentage = (customer_data_df.isnull().sum() / len(customer_data_df)) * 100

print("Percentage of Missing Values per Column:")
print(missing_percentage)

# Calculate the overall percentage of missing values in the entire DataFrame
total_missing = customer_data_df.isnull().sum().sum()
total_values = customer_data_df.size
overall_missing_percentage = (total_missing / total_values) * 100 if total_values > 0 else 0

print(f"\nOverall Percentage of Missing Values in the Dataset: {overall_missing_percentage:.2f}%")

Percentage of Missing Values per Column:
CustomerID        0.0
Email             0.0
Country          20.0
LastOrderDate    10.0
Status           10.0
Age              40.0
dtype: float64

Overall Percentage of Missing Values in the Dataset: 13.33%


### Task 4: Handling Partially Available Records

**Description**: In `customer_data.csv`, identify records with a missing "email" or "phone number" and decide whether to drop or fill them.

In [None]:
# Write your code from here
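
The cell above is left empty in the source. One possible approach, sketched under stated assumptions: the sample rows and the `Phone` column are hypothetical (the customer data shown in Task 3 has no phone column), and the policy of dropping records with no contact channel while filling a single missing field with a placeholder is just one reasonable choice.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample; the Phone column and all values are assumptions
customer_data = """CustomerID,Email,Phone
1,john.doe@example.com,555-0101
2,,555-0102
3,alice.wonder@example.in,
4,,
5,david.jones@example.us,555-0105
"""
df = pd.read_csv(StringIO(customer_data), dtype={"Email": str, "Phone": str})

contact_cols = ["Email", "Phone"]

# Records where at least one contact field is missing
partial = df[df[contact_cols].isna().any(axis=1)]
print("Partially available records:")
print(partial)

# Assumed policy: drop records with no contact channel at all...
cleaned = df.dropna(subset=contact_cols, how="all").copy()
# ...and fill a single missing field with an explicit placeholder
cleaned[contact_cols] = cleaned[contact_cols].fillna("unknown")
print("\nAfter dropping/filling:")
print(cleaned)
```

Filling with a visible placeholder such as `"unknown"` keeps the record usable through the remaining channel while making the gap easy to filter for later.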