6.1 Introduction
Data validation is the process of ensuring that the data entered into a system meets certain criteria or standards. It is essential for maintaining the accuracy and integrity of data, preventing errors, and ensuring that the data is suitable for analysis and decision-making.

Definition
Data validation involves checking data against a set of rules or constraints to ensure that it is accurate, complete, and reliable. This process includes verifying data types, ranges, and formats, as well as identifying and handling inconsistencies or anomalies.
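As a minimal sketch of those rule types, the snippet below runs a type check, a range check, and a format check on a small hypothetical DataFrame using pandas, the library used throughout this chapter. The column names, the 1 to 100 quantity range, and the Order ID pattern are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical records; column names and rules are illustrative assumptions.
df = pd.DataFrame({
    "Order ID": ["A001", "A002", "B3", "A004"],
    "Quantity": [2, -1, "five", None],
})

# Type check: coerce to numbers; anything that cannot be parsed becomes NaN.
qty = pd.to_numeric(df["Quantity"], errors="coerce")
df["Type OK"] = qty.notna()

# Range check: a valid quantity lies between 1 and 100 inclusive.
# NaN compares as False, so unparseable or missing values fail here too.
df["Range OK"] = qty.between(1, 100)

# Format check: Order ID must be "A" followed by exactly three digits.
df["Format OK"] = df["Order ID"].str.fullmatch(r"A\d{3}")

print(df)
```

Coercing the column once and reusing the numeric result keeps the range check from failing on non-numeric text such as "five".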

Objective
The objectives of data validation are to:
A. Ensure data accuracy and consistency.
B. Prevent entry of incorrect or invalid data.
C. Improve data quality and reliability.
D. Facilitate better data-driven decision-making.

Importance
Accuracy: Catches incorrect entries early, reducing errors in downstream analysis.
Consistency: Maintains uniformity in data entry and format.
Reliability: Enhances the trustworthiness of data used in decision-making.
Efficiency: Catching errors at entry time saves the time and resources otherwise spent correcting them during data processing.

6.2 Validation Techniques
1. Consistency Checks
2. Range Checks
3. Uniqueness Checks
4. Cross-Field Validation
5. Pattern Matching
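Each of the five techniques can be expressed as a boolean pandas expression. The sketch below assumes a hypothetical product table: the SKU, Sale Price, and Stock columns, the 0 to 200 stock range, and the SKU pattern are illustrative and do not come from Products.csv. It produces one True/False column per technique.

```python
import pandas as pd

# Hypothetical product table; every column name here is an illustrative assumption.
df = pd.DataFrame({
    "SKU": ["W-001", "W-002", "W-002", "X9"],
    "Price": [19.99, -5.00, 29.99, 15.00],
    "Sale Price": [14.99, 4.00, 34.99, 12.00],
    "Stock": [10, 0, 3, 250],
})

checks = pd.DataFrame(index=df.index)

# 1. Consistency check: a price must be positive.
checks["consistent"] = df["Price"] > 0
# 2. Range check: stock must fall within 0..200 inclusive.
checks["in_range"] = df["Stock"].between(0, 200)
# 3. Uniqueness check: keep=False marks every row that shares a SKU.
checks["unique"] = ~df["SKU"].duplicated(keep=False)
# 4. Cross-field validation: sale price must not exceed the regular price.
checks["cross_field"] = df["Sale Price"] <= df["Price"]
# 5. Pattern matching: SKU must be one uppercase letter, a hyphen, three digits.
checks["pattern"] = df["SKU"].str.fullmatch(r"[A-Z]-\d{3}")

print(checks)
```

Keeping the results in a separate frame makes it easy to combine them, for example with `checks.all(axis=1)` to find rows that pass every rule.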

6.2.1 Consistency Checks

Introduction:
Consistency checks verify that data values are logically coherent and adhere to predefined rules. They ensure that data fields are not contradictory and meet the expected relationships between different data points.
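Relationships between fields can be checked in the same vectorized style. A minimal sketch, assuming a hypothetical orders table with Status, Order Date, and Ship Date columns: a shipment must not precede its order, and a "Shipped" status must come with a ship date while "Pending" must not have one.

```python
import pandas as pd

# Hypothetical order records; columns and rules are illustrative assumptions.
orders = pd.DataFrame({
    "Status": ["Shipped", "Pending", "Shipped", "Pending"],
    "Order Date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"]),
    "Ship Date": pd.to_datetime(["2024-01-07", None, "2024-01-03", "2024-01-09"]),
})

# Rule 1: a shipment cannot happen before the order was placed.
# Comparisons with NaT are False, so the rule is only enforced
# on rows that actually have a Ship Date.
has_ship = orders["Ship Date"].notna()
dates_ok = ~has_ship | (orders["Ship Date"] >= orders["Order Date"])

# Rule 2: Status must agree with Ship Date
# ("Shipped" needs a date, "Pending" must not have one).
status_ok = (orders["Status"] == "Shipped") == has_ship

orders["Consistent"] = dates_ok & status_ok
print(orders)
```

Row 2 fails because it ships before it was ordered; row 3 fails because a pending order already has a ship date.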

In [1]:
import pandas as pd

# Load the dataset (adjust the path to where Products.csv is stored on your machine)
df = pd.read_csv('D:/Projects/Data-cleaning-series/Chapter06 Data Validation/Products.csv')

# Example consistency check: flag whether each 'Price' value is positive
# (NaN > 0 evaluates to False, so missing prices are flagged as well)
df['Price Consistent'] = df['Price'].apply(lambda x: x > 0)

# Print results
print("Consistency Check Results:")
print(df[['Product Name', 'Price', 'Price Consistent']])


Consistency Check Results:
  Product Name  Price  Price Consistent
0     Widget A  19.99              True
1     Widget B  29.99              True
2          NaN  15.00              True
3     Widget D    NaN             False
4     Widget E   9.99              True
5     Widget F  25.00              True
6     Widget G    NaN             False
7     Widget H  39.99              True
8     Widget I    NaN             False
9     Widget J  49.99              True


Explanation:

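Purpose: Checks whether each value in the 'Price' column is positive. Missing prices (NaN) also evaluate to False, which is why Widget D, Widget G, and Widget I fail the check even though no negative price appears in the output.

Code Breakdown:
  df['Price'].apply(lambda x: x > 0): Applies a lambda to each 'Price' value and returns True when the value is greater than zero.
  df['Price Consistent']: Adds a new column indicating whether each 'Price' value passes the consistency check.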
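Because the check above reports a missing price and a non-positive price the same way, it can help to separate the two. The sketch below uses a small stand-in for Products.csv (the rows, including "Widget X" with a negative price, are hypothetical) and replaces the per-row lambda with the vectorized comparison `df["Price"] > 0`, which gives the same True/False result in a single operation.

```python
import pandas as pd

# Small stand-in for Products.csv (hypothetical values, same column names as above).
df = pd.DataFrame({
    "Product Name": ["Widget A", "Widget D", "Widget X"],
    "Price": [19.99, None, -3.50],
})

# Vectorized comparison: same result as apply(lambda x: x > 0).
df["Price Consistent"] = df["Price"] > 0

# Label each row so a missing price is not reported as an invalid one.
df["Price Status"] = "ok"
df.loc[df["Price"].isna(), "Price Status"] = "missing"
df.loc[df["Price"] <= 0, "Price Status"] = "non-positive"

print(df)
```

Missing values can then be routed to a completeness check, while only genuinely non-positive prices are treated as consistency failures.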