# CSI 4142 Fundamentals of Data Science, Winter 2025
# Assignment 2 - Data Cleaning

## Group A - 72
### Hilaire Junior Kalala - 300289737
### Johann Rajosefa - 300300054

## Dataset 1 - CLEAN DATA CHECKER
## Introduction
This assignment, part of the Fundamentals of Data Science course, focuses on data cleaning techniques using Python. We were tasked with developing a clean data checker, a tool designed to identify potentially invalid data and duplicate entries within a dataset. Additionally, we were required to implement and evaluate various imputation methods for handling missing data.

The first notebook presents different methods for assessing dataset validity. Below, I outline the core requirements that define the structure of the clean data checker for this assignment.

The clean data checker is designed to operate with minimal user input while generating a comprehensive report on potential data inconsistencies. It assumes that the user is familiar with the dataset and can specify relevant attributes and validation rules. The tool systematically identifies errors based on predefined parameters and provides a structured analysis.

For each type of data issue, the notebook follows a four-cell structure:

1. Error Type – Name and description of the identified issue.
2. Parameters/Rules – Configurable settings that allow the user to adjust detection criteria.
3. Detection Code – Python script that executes the validation process based on the provided parameters.
4. Results & Explanation – A summary of findings, including descriptive sentences and illustrative examples.

To simplify user interaction, we replace real-time input requests with configurable parameters in Cell 2. Users can modify these parameters directly, ensuring flexibility while maintaining ease of use.
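
The parameter-cell pattern described above can be pictured with a small sketch. It is only an illustration under assumptions: the toy DataFrame, the `params` dictionary, and the `run_check` helper are hypothetical names that do not appear in the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({"RATING": [7.5, 0.4, 9.1, 11.0]})

# "Cell 2": parameters the user edits directly instead of typing input at run time.
params = {"attribute": "RATING", "min_val": 1.0, "max_val": 10.0}

# "Cell 3": detection code that only reads the parameters.
def run_check(df, p):
    col = df[p["attribute"]]
    return df[(col < p["min_val"]) | (col > p["max_val"])]

# "Cell 4": a short report of the findings.
flagged = run_check(toy, params)
print(f"{len(flagged)} of {len(toy)} rows fall outside "
      f"[{params['min_val']}, {params['max_val']}] for {params['attribute']}")
```

Keeping the settings in one place means a different attribute or threshold can be tested by editing a single cell and re-running the ones after it.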

## Dataset description: MOVIES DATASET FOR FEATURE EXTRACTION, PREDICTION
Context
The data was successfully scraped from IMDb's top Netflix movies and TV shows. This dataset requires clever programming for feature extraction; you can also use it to build a recommendation system or a genre prediction model.

Content
The dataset contains more than 9 columns that describe the data pattern. I scraped the data from the IMDb website using BeautifulSoup. It took a day to learn; I am a beginner in data science, but I quickly learned web scraping with advanced Python. I gained a lot through this process, and I would note that the data featuring part of this project takes time.

In [194]:
# Initialization of libraries and dataset I am going to work with for the Data Clean Checker part
import os
import numpy as np
import pandas as pd

#Initialization of the dataset
movie_dt = pd.read_csv("https://raw.githubusercontent.com/KugleBlitz007/CSI4142/refs/heads/main/movies.csv")
movie_dt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   MOVIES    9999 non-null   object 
 1   YEAR      9355 non-null   object 
 2   GENRE     9919 non-null   object 
 3   RATING    8179 non-null   float64
 4   ONE-LINE  9999 non-null   object 
 5   STARS     9999 non-null   object 
 6   VOTES     8179 non-null   object 
 7   RunTime   7041 non-null   float64
 8   Gross     460 non-null    object 
dtypes: float64(2), object(7)
memory usage: 703.2+ KB


### 1) Data type check
In this test, we will verify the type of the values of a chosen attribute.

In [195]:
# Please enter the attribute and the parameters to perform the test
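attribute = "RunTime"
expected_type = float  # Python type that each value must be an instance of


In [196]:
# Flag the rows whose value is not an instance of the expected type.
# Note: NaN is itself a float, so missing values pass this check.
invalid_values = movie_dt[~movie_dt[attribute].apply(lambda x: isinstance(x, expected_type))]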


In [197]:
#Results
invalid_values
# We can see that no value in that column has an invalid type


Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross
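
A type check is more informative on a column that pandas stores as `object`, where values of different Python types can coexist. The sketch below is a hypothetical variant, not the notebook's code: the toy column and the `check_type` helper are assumptions, and missing values are deliberately left to the presence check.

```python
import pandas as pd

# Hypothetical toy column mixing floats, a string, and a missing value.
toy = pd.DataFrame({"RunTime": [121.0, "44 min", None, 23.0]})

def check_type(df, attribute, expected_type):
    """Return rows whose non-missing value is not an instance of expected_type."""
    col = df[attribute]
    wrong_type = ~col.apply(lambda x: isinstance(x, expected_type))
    # Exclude missing values here; they are a presence problem, not a type problem.
    return df[wrong_type & col.notna()]

# The column dtype is a quick first signal: mixed content is stored as object.
print(toy["RunTime"].dtype)

invalid = check_type(toy, "RunTime", float)
print(invalid)
```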


### 2) Range check
In this test, we will check that all the values of a specified attribute respect the range requirements.
The user has to specify the attribute and the minimum and maximum values.

In [198]:
# Please enter the attribute
attribute = "RATING"
# Enter the min and max value
min_val = 1.1
max_val = 9.9

In [199]:
#The code to execute the verification task
def check_Range():
    boolean_index = (movie_dt[attribute] < min_val) | (movie_dt[attribute] > max_val)
    outOfRange_values = movie_dt[boolean_index]
    return outOfRange_values

In [200]:
#Execution and results of the verification
check_Range()
# Here again, no RATING values fall outside the range defined by min 1.1 and max 9.9
# For testing purposes, let's change the range limits
min_val = 2.0
max_val = 9.9
check_Range()
# Two data points have a rating below 2.0, as shown below

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross
1166,Raketsonyeondan,(2021– ),"\nComedy, Drama, Sport",1.1,\nA city kid is brought to the countryside by ...,"\n \n Stars:\nKim Sang-kyung, \n...",25629,80.0,
5365,Defcon 2012,(2010),\nSci-Fi,1.8,"\nOn October 30, 2009 an independent filmmaker...",\n Director:\nR. Christian Anderson\n| \n ...,377,92.0,
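
The range check can also be written so that the attribute and limits are passed as arguments instead of being read from global variables. This is a minimal sketch under assumptions: the toy data and the `check_range` signature are hypothetical, and it uses `Series.between`, which is inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical toy ratings, including one missing value.
toy = pd.DataFrame({"RATING": [6.1, 1.1, None, 9.95, 1.8]})

def check_range(df, attribute, min_val, max_val):
    """Return rows whose non-missing value lies outside [min_val, max_val]."""
    col = df[attribute]
    # between() returns False for NaN, so notna() keeps missing values
    # out of the report instead of flagging them as out of range.
    out_of_range = col.notna() & ~col.between(min_val, max_val)
    return df[out_of_range]

flagged = check_range(toy, "RATING", 2.0, 9.9)
print(flagged)
```

Passing the limits explicitly makes it possible to run the same function with several ranges without reassigning shared variables between calls.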


### 3) Format check
For this test, we will find the values that have an invalid format in a specified attribute.
Note: I used ChatGPT to find out how to perform a format check on numeric values.

In [201]:
# Please enter the attribute and the expected format
attribute = "VOTES"
attr_pattern = r'^\d{1,3}(,\d{3})*$'


In [202]:
# Code to execute the verification
def check_format(attr, attr_pattern):
    # Use the attribute passed in rather than a hard-coded column name
    invalid_values = movie_dt[~movie_dt[attr].astype(str).str.match(attr_pattern, na=False)]
    return invalid_values

In [203]:
#Execution and results
check_format(attribute,attr_pattern)
# There are 1890 data points whose VOTES value has an invalid format.
# Because astype(str) turns NaN into the string 'nan', missing values are counted too.

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross
4,Army of Thieves,(2021),"\nAction, Crime, Horror",,"\nA prequel, set before the events of Army of ...",\n Director:\nMatthias Schweighöfer\n| \n ...,,,
24,He-Man and the Masters of the Universe,(2021– ),"\nAnimation, Action, Adventure",,\nEternia's Prince Adam discovers the power of...,\n,,,
214,Sing 2,(2021),"\nAnimation, Adventure, Comedy",,\nBuster Moon and his friends must persuade re...,\n Director:\nGarth Jennings\n| \n Stars...,,,
217,Knives Out 2,(2022),"\nComedy, Crime, Drama",,\nPlot unknown. Sequel to the 2019 film 'Knive...,\n Director:\nRian Johnson\n| \n Stars:\...,,,
222,Don't Look Up,(2021),\nComedy,,"\nThe story of two low-level astronomers, who ...",\n Director:\nAdam McKay\n| \n Stars:\nT...,,145.0,
...,...,...,...,...,...,...,...,...,...
9994,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n \n Stars:\nMorgan Taylor Camp...,,,
9995,Arcane,(2021– ),"\nAnimation, Action, Adventure",,\nAdd a Plot\n,\n,,,
9996,Heart of Invictus,(2022– ),"\nDocumentary, Sport",,\nAdd a Plot\n,\n Director:\nOrlando von Einsiedel\n| \n ...,,,
9997,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n Director:\nJovanka Vuckovic\n| \n Sta...,,,
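
Since missing values and malformed values call for different fixes, it can help to report them separately. The following sketch assumes a hypothetical toy column and a hypothetical two-value return; it relies on `Series.str.fullmatch`, which is available in pandas 1.1 and later.

```python
import pandas as pd

# Hypothetical toy votes: two well-formed, two malformed, one missing.
toy = pd.DataFrame({"VOTES": ["21,062", "885805", None, "1,23", "377"]})
pattern = r"\d{1,3}(,\d{3})*"

def check_format(df, attribute, pattern):
    """Split problem rows into malformed (present but wrong) and missing."""
    col = df[attribute]
    # fullmatch requires the whole string to match; na=False marks NaN as non-matching.
    matches = col.str.fullmatch(pattern, na=False)
    malformed = df[col.notna() & ~matches]
    missing = df[col.isna()]
    return malformed, missing

malformed, missing = check_format(toy, "VOTES", pattern)
print(len(malformed), "malformed;", len(missing), "missing")
```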


### 4) Consistency check
For this test, we will check the consistency of values in the dataset by comparing the RATING and VOTES attributes.
The user has to specify the thresholds that determine whether a row is consistent.

In [204]:
# Please enter the values to define thresholds for inconsistency
high_rating_low_votes = [9.5,1000]
low_rating_high_votes = [3.5,500000]


In [205]:
# The code of consistency check
def check_consistency(tab1,tab2):
    # Start by creating a new column NUMERIC_VOTES
    movie_dt['NUMERIC_VOTES'] = movie_dt['VOTES'].str.replace(',', '').astype(float)

    # Define thresholds for inconsistency
    high_rating_low_votes_check = (movie_dt['RATING'] >= tab1[0]) & (movie_dt['NUMERIC_VOTES'] < tab1[1])
    low_rating_high_votes_check = (movie_dt['RATING'] <= tab2[0]) & (movie_dt['NUMERIC_VOTES'] > tab2[1])

    # Find inconsistent rows
    inconsistent_ratings_votes = movie_dt[high_rating_low_votes_check | low_rating_high_votes_check]
    return inconsistent_ratings_votes

In [206]:
#Execution and results
check_consistency(high_rating_low_votes, low_rating_high_votes)
# The results show that 3 data points don't meet the consistency requirements between the RATING and VOTES attributes

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross,NUMERIC_VOTES
7870,The Dragon Prince,(2018– ),"\nAnimation, Adventure, Drama",9.6,\nThe human alliance led by King Viren start a...,\n Director:\nVillads Spangsberg\n| \n S...,727,,,727.0
8656,Meerkat Manor,(2005–2008),\nDocumentary,9.6,"\nWith family size dwindling, the Whiskers are...",\n \n Star:\nStockard Channing\n,7,22.0,,7.0
8721,Power Rangers Beast Morphers,(2019–2020),"\nAction, Adventure, Drama",9.5,\nNate makes a horrifying discovery and must r...,\n Director:\nSimon Bennett\n| \n Stars:...,26,,,26.0
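
The consistency check above adds a `NUMERIC_VOTES` column to the shared DataFrame, which is why that column appears in the outputs of the later tests. A possible alternative, sketched here on hypothetical toy data, parses the votes into a local Series and leaves the input untouched.

```python
import pandas as pd

# Hypothetical toy rows; votes are strings with thousands separators.
toy = pd.DataFrame({
    "RATING": [9.6, 8.2, 3.0, 9.5, None],
    "VOTES": ["727", "885,805", "600,000", "26", None],
})

def check_consistency(df, high_rating_low_votes, low_rating_high_votes):
    """Flag high ratings backed by few votes, or low ratings with many votes."""
    # Parse into a local Series; unparseable or missing values become NaN.
    votes = pd.to_numeric(df["VOTES"].str.replace(",", "", regex=False), errors="coerce")
    high_low = (df["RATING"] >= high_rating_low_votes[0]) & (votes < high_rating_low_votes[1])
    low_high = (df["RATING"] <= low_rating_high_votes[0]) & (votes > low_rating_high_votes[1])
    return df[high_low | low_high]

flagged = check_consistency(toy, [9.5, 1000], [3.5, 500000])
print(flagged)
```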


### 5) Uniqueness check
In this test, we will check that a specified attribute only has unique entries. The user only has to enter the attribute they would like to test.


In [207]:
# Please enter the attribute
attr = "MOVIES"

In [208]:
#The function that checks for unique values
def check_unique(attr):
    duplicates = movie_dt[movie_dt.duplicated(subset=[attr], keep='last')]
    return duplicates

In [209]:
# Execution and results
check_unique(attr)
#There are 3182 data points for which the MOVIES attribute was not unique.

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross,NUMERIC_VOTES
34,Kingdom,(2019– ),"\nAction, Drama, History",8.4,\nWhile strange rumors about their ill King gr...,"\n \n Stars:\nJu Ji-Hoon, \nBae ...",34906,45.0,,34906.0
129,Avatar: The Last Airbender,(2005–2008),"\nAnimation, Action, Adventure",9.3,"\nIn a war-torn world of elemental magic, a yo...","\n \n Stars:\nDee Bradley Baker,...",265845,23.0,,265845.0
139,Heist,(2021– ),"\nDocumentary, Crime, Mystery",6.8,\nMillions in stolen cash. Missing luxury bour...,"\n \n Stars:\nWilliam Guirola, \...",846,41.0,,846.0
177,Snowpiercer,(2020– ),"\nAction, Drama, Sci-Fi",6.9,\nSeven years after the world has become a fro...,"\n \n Stars:\nDaveed Diggs, \nId...",39433,60.0,,39433.0
235,Sexy Beasts,(2021– ),"\nReality-TV, Romance",4.7,"\nHoping to say goodbye to superficial dating,...",\n \n Star:\nRob Delaney\n,592,,,592.0
...,...,...,...,...,...,...,...,...,...,...
9989,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9990,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9991,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9994,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n \n Stars:\nMorgan Taylor Camp...,,,,
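
With `keep='last'`, the last occurrence of each repeated value is left out of the report. If every member of a duplicate group should be visible, `keep=False` is one option. The sketch below uses hypothetical toy titles and a hypothetical return value that also counts the rows per repeated title.

```python
import pandas as pd

# Hypothetical toy titles with two repeated values.
toy = pd.DataFrame({"MOVIES": ["Kingdom", "Heist", "Kingdom", "1899", "1899", "1899"]})

def check_unique(df, attribute):
    """Return all rows sharing a repeated value, plus a count per value."""
    # keep=False marks every row of a repeated value, not only the extra copies.
    dupes = df[df.duplicated(subset=[attribute], keep=False)]
    counts = dupes[attribute].value_counts()
    return dupes, counts

dupes, counts = check_unique(toy, "MOVIES")
print(counts)
```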


### 6) Presence check
We will test that there's no field left blank or NaN for a specified attribute. 

In [210]:
# Please enter the attribute you would like to check for blank fields and NaN
attr = "VOTES"

In [211]:
# The function that checks for NaN values
def check_presence(attr):
    bool_index = movie_dt[attr].isnull()
    results = movie_dt[bool_index]
    return results

In [212]:
#Execution and results 
check_presence(attr)

# There are 1820 data points with null values in the VOTES attribute

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross,NUMERIC_VOTES
4,Army of Thieves,(2021),"\nAction, Crime, Horror",,"\nA prequel, set before the events of Army of ...",\n Director:\nMatthias Schweighöfer\n| \n ...,,,,
24,He-Man and the Masters of the Universe,(2021– ),"\nAnimation, Action, Adventure",,\nEternia's Prince Adam discovers the power of...,\n,,,,
214,Sing 2,(2021),"\nAnimation, Adventure, Comedy",,\nBuster Moon and his friends must persuade re...,\n Director:\nGarth Jennings\n| \n Stars...,,,,
217,Knives Out 2,(2022),"\nComedy, Crime, Drama",,\nPlot unknown. Sequel to the 2019 film 'Knive...,\n Director:\nRian Johnson\n| \n Stars:\...,,,,
222,Don't Look Up,(2021),\nComedy,,"\nThe story of two low-level astronomers, who ...",\n Director:\nAdam McKay\n| \n Stars:\nT...,,145.0,,
...,...,...,...,...,...,...,...,...,...,...
9994,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n \n Stars:\nMorgan Taylor Camp...,,,,
9995,Arcane,(2021– ),"\nAnimation, Action, Adventure",,\nAdd a Plot\n,\n,,,,
9996,Heart of Invictus,(2022– ),"\nDocumentary, Sport",,\nAdd a Plot\n,\n Director:\nOrlando von Einsiedel\n| \n ...,,,,
9997,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n Director:\nJovanka Vuckovic\n| \n Sta...,,,,
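
The function above detects null values only. To also catch fields that are present but blank, one possibility is to treat empty and whitespace-only strings as missing. This is a sketch under assumptions: the toy column is hypothetical, and whether blanks occur in the real dataset has not been verified here.

```python
import pandas as pd

# Hypothetical toy column with a null, an empty string, and a whitespace-only string.
toy = pd.DataFrame({"VOTES": ["21,062", None, "", "   ", "377"]})

def check_presence(df, attribute):
    """Return rows where the value is null or a blank string."""
    col = df[attribute]
    is_null = col.isna()
    # Only strings can be blank; other types are left to the null test.
    is_blank = col.apply(lambda x: isinstance(x, str) and x.strip() == "")
    return df[is_null | is_blank]

missing = check_presence(toy, "VOTES")
print(missing)
```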


### 7) Length check
A length check is performed to ensure that a string is neither too long nor too short.
The user has to enter the attribute name and the minimum and maximum values for the length.

In [213]:
# Please enter the attribute, the minimum length and the maximum length
attr = "STARS"
min_length = 100
max_length = 400

In [214]:
# The function check_length performs the verification of the length
def check_length(attr,min_l,max_l):
    bool_index = (movie_dt[attr].str.len() < min_l) | (movie_dt[attr].str.len() > max_l)
    invalid_entries = movie_dt[bool_index]
    return invalid_entries

In [215]:
# Execution and results
check_length(attr, min_length, max_length)

# There are 5154 data points whose STARS value is shorter than 100 or longer than 400 characters

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross,NUMERIC_VOTES
1,Masters of the Universe: Revelation,(2021– ),"\nAnimation, Action, Adventure",5.0,\nThe war for Eternia begins again in what may...,"\n \n Stars:\nChris Wood, \nSara...",17870,25.0,,17870.0
2,The Walking Dead,(2010–2022),"\nDrama, Horror, Thriller",8.2,\nSheriff Deputy Rick Grimes wakes up from a c...,"\n \n Stars:\nAndrew Lincoln, \n...",885805,44.0,,885805.0
3,Rick and Morty,(2013– ),"\nAnimation, Adventure, Comedy",9.2,\nAn animated series that follows the exploits...,"\n \n Stars:\nJustin Roiland, \n...",414849,23.0,,414849.0
5,Outer Banks,(2020– ),"\nAction, Crime, Drama",7.6,\nA group of teenagers from the wrong side of ...,"\n \n Stars:\nChase Stokes, \nMa...",25858,50.0,,25858.0
7,Dexter,(2006–2013),"\nCrime, Drama, Mystery",8.6,"\nBy day, mild-mannered Dexter is a blood-spat...","\n \n Stars:\nMichael C. Hall, \...",665387,53.0,,665387.0
...,...,...,...,...,...,...,...,...,...,...
9991,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9992,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9994,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n \n Stars:\nMorgan Taylor Camp...,,,,
9995,Arcane,(2021– ),"\nAnimation, Action, Adventure",,\nAdd a Plot\n,\n,,,,
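
The STARS values shown above contain newlines and padding spaces, which count toward the measured length. A hedged variant is to collapse whitespace before measuring, so the thresholds refer to visible text. The toy strings and the thresholds of 5 and 22 below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy values imitating the padded layout of the STARS column.
toy = pd.DataFrame({"STARS": [
    "\n    \n    Stars:\nRob Delaney\n",
    "\n",
    "\n    Director:\nBaran bo Odar\n",
]})

def check_length(df, attribute, min_len, max_len):
    """Return rows whose whitespace-normalized text is too short or too long."""
    # Collapse each run of whitespace to one space, then trim both ends.
    cleaned = df[attribute].str.replace(r"\s+", " ", regex=True).str.strip()
    lengths = cleaned.str.len()
    return df[(lengths < min_len) | (lengths > max_len)]

flagged = check_length(toy, "STARS", 5, 22)
print(flagged)
```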


### 8) Look-up errors check 
In this test, using a table of acceptable values for a specified attribute, we will check if the values of the dataset fit in the limited set of values.
The user needs to provide the attribute to look up and the set of possible values.

In [216]:
# Please enter the attribute and a list of possible values
attr = "GENRE"
values = ["Action, Horror, Thriller", "Comedy", "Drama, Horror, Thriller", 
          "Adventure, Drama, Fantasy", "Crime, Drama, Mystery", "Animation, Adventure, Comedy", "Documentary, Sport"]


In [217]:
# The function look_up takes the attribute and a list of acceptable values as parameters
def look_up(attr, acc_values):
    invalid_genres = movie_dt[~movie_dt[attr].isin(acc_values)]
    return invalid_genres

In [218]:
# Execution and results
look_up(attr,values)
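# The function returns the rows whose value for the attribute is not in the list of acceptable values.
# Matching is exact: the GENRE strings in this dataset start with "\n" (see row 0 below),
# so they are flagged unless the acceptable values include the same leading whitespace.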

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross,NUMERIC_VOTES
0,Blood Red Sky,(2021),"\nAction, Horror, Thriller",6.1,\nA woman with a mysterious illness is forced ...,\n Director:\nPeter Thorwarth\n| \n Star...,21062,121.0,,21062.0
1,Masters of the Universe: Revelation,(2021– ),"\nAnimation, Action, Adventure",5.0,\nThe war for Eternia begins again in what may...,"\n \n Stars:\nChris Wood, \nSara...",17870,25.0,,17870.0
2,The Walking Dead,(2010–2022),"\nDrama, Horror, Thriller",8.2,\nSheriff Deputy Rick Grimes wakes up from a c...,"\n \n Stars:\nAndrew Lincoln, \n...",885805,44.0,,885805.0
3,Rick and Morty,(2013– ),"\nAnimation, Adventure, Comedy",9.2,\nAn animated series that follows the exploits...,"\n \n Stars:\nJustin Roiland, \n...",414849,23.0,,414849.0
4,Army of Thieves,(2021),"\nAction, Crime, Horror",,"\nA prequel, set before the events of Army of ...",\n Director:\nMatthias Schweighöfer\n| \n ...,,,,
...,...,...,...,...,...,...,...,...,...,...
9994,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n \n Stars:\nMorgan Taylor Camp...,,,,
9995,Arcane,(2021– ),"\nAnimation, Action, Adventure",,\nAdd a Plot\n,\n,,,,
9996,Heart of Invictus,(2022– ),"\nDocumentary, Sport",,\nAdd a Plot\n,\n Director:\nOrlando von Einsiedel\n| \n ...,,,,
9997,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n Director:\nJovanka Vuckovic\n| \n Sta...,,,,
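
Because `isin` compares strings exactly, surrounding whitespace in the data prevents otherwise valid values from matching. One way to handle this, sketched on hypothetical toy data, is to strip the values before the look-up while still returning the original rows.

```python
import pandas as pd

# Hypothetical toy genres with the leading newline seen in the dataset output.
toy = pd.DataFrame({"GENRE": [
    "\nAction, Horror, Thriller",
    "\nComedy            ",
    "\nSci-Fi",
    None,
]})
accepted = ["Action, Horror, Thriller", "Comedy"]

def look_up(df, attribute, accepted_values):
    """Return rows whose stripped value is not in accepted_values."""
    normalized = df[attribute].str.strip()
    # Missing values are not in the accepted list, so they are flagged as well.
    return df[~normalized.isin(accepted_values)]

invalid = look_up(toy, "GENRE", accepted)
print(invalid)
```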


### 9) Exact duplicate test
This test will determine whether there are exact duplicates in the dataset. 
Because we take all the attributes into account, the user does not need to enter any parameter.

In [219]:
# The function find_exactDuplicates returns every row that is an exact copy of an earlier row.
def find_exactDuplicates():
    duplicates = movie_dt[movie_dt.duplicated(keep = 'first')]
    return duplicates

In [220]:
# Execution and results
find_exactDuplicates()
# There are 431 data points that have exact duplicates in our dataset

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross,NUMERIC_VOTES
6833,Mighty Little Bheem,(2019– ),"\nAnimation, Short, Adventure",,\nAdd a Plot\n,"\n Directors:\nRajiv Chilaka, \nKrishna Moh...",,,,
6835,Mighty Little Bheem,(2019– ),"\nAnimation, Short, Adventure",9.0,\nAdd a Plot\n,"\n Directors:\nRajiv Chilaka, \nKrishna Moh...",6,,,6.0
6836,Mighty Little Bheem,(2019– ),"\nAnimation, Short, Adventure",9.0,\nAdd a Plot\n,"\n Directors:\nRajiv Chilaka, \nKrishna Moh...",6,,,6.0
6837,Mighty Little Bheem,(2019– ),"\nAnimation, Short, Adventure",,\nAdd a Plot\n,"\n Directors:\nRajiv Chilaka, \nKrishna Moh...",,,,
6838,Mighty Little Bheem,(2019– ),"\nAnimation, Short, Adventure",,\nAdd a Plot\n,"\n Directors:\nRajiv Chilaka, \nKrishna Moh...",,,,
...,...,...,...,...,...,...,...,...,...,...
9989,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9990,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9991,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,
9992,1899,(2022– ),"\nDrama, History, Horror",,\nAdd a Plot\n,\n Director:\nBaran bo Odar\n,,,,


### 10) Near duplicate search
The purpose of this test is to find near duplicates, which can be really hard to spot. The user needs to provide a list of attributes to use as the subset for the duplicate search.

In [221]:
# Please enter the table of attributes
tab = ['GENRE', 'RATING', 'YEAR', 'VOTES','RunTime','Gross']

In [222]:
# The function find_nearDuplicates finds the duplicates based on the list of attributes given by the user as a parameter
def find_nearDuplicates(tab):
    duplicates = movie_dt[movie_dt.duplicated(subset = tab)]
    return duplicates

In [223]:
# Execution and results
find_nearDuplicates(tab)
# Based on the attributes selected, there are 1299 near duplicates

Unnamed: 0,MOVIES,YEAR,GENRE,RATING,ONE-LINE,STARS,VOTES,RunTime,Gross,NUMERIC_VOTES
238,Hotel Transylvania: Transformania,(2021),"\nAnimation, Adventure, Comedy",,"\nDrac's Pack is back, like you've never seen ...","\n Directors:\nDerek Drymon, \nJennifer Klu...",,,,
834,Havoc,(2022),"\nAction, Thriller",,\nThe story is set after a drug deal gone wron...,\n Director:\nGareth Evans\n| \n Stars:\...,,,,
1396,Love Hard,(2021),"\nComedy, Romance",,\nA woman travels to her online date's hometow...,\n Director:\nHernan Jimenez\n| \n Stars...,,,,
1428,The Division,,"\nAction, Adventure, Drama",,"\nIn the near future, a pandemic virus is spre...",\n Director:\nRawson Marshall Thurber\n| \n...,,,,
1433,Luckiest Girl Alive,(2022),"\nDrama, Mystery",,"\nA woman in New York, who seems to have thing...",\n Director:\nMike Barker\n| \n Stars:\n...,,,,
...,...,...,...,...,...,...,...,...,...,...
9994,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n \n Stars:\nMorgan Taylor Camp...,,,,
9995,Arcane,(2021– ),"\nAnimation, Action, Adventure",,\nAdd a Plot\n,\n,,,,
9996,Heart of Invictus,(2022– ),"\nDocumentary, Sport",,\nAdd a Plot\n,\n Director:\nOrlando von Einsiedel\n| \n ...,,,,
9997,The Imperfects,(2021– ),"\nAdventure, Drama, Fantasy",,\nAdd a Plot\n,\n Director:\nJovanka Vuckovic\n| \n Sta...,,,,
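
Matching on a subset of columns is one way to approximate near duplicates. Another common approach, shown here only as a sketch, is to normalize a text attribute (case and whitespace) and then look for repeated keys. The toy rows, the `find_near_duplicates` name, and the `_key` helper column are all hypothetical.

```python
import pandas as pd

# Hypothetical toy rows: the first two differ only in case and spacing.
toy = pd.DataFrame({
    "MOVIES": ["The Imperfects", " the imperfects ", "Arcane", "Havoc"],
    "YEAR": ["(2021– )", "(2021– )", "(2021– )", "(2022)"],
})

def find_near_duplicates(df, text_attribute, other_attributes):
    """Return rows that match on a normalized text key plus the other attributes."""
    # Normalized key: lowercase, single-spaced, trimmed.
    key = (df[text_attribute].str.lower()
           .str.replace(r"\s+", " ", regex=True)
           .str.strip())
    work = df[other_attributes].assign(_key=key)
    # keep=False shows every member of each near-duplicate group.
    return df[work.duplicated(keep=False)]

near = find_near_duplicates(toy, "MOVIES", ["YEAR"])
print(near)
```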


## Conclusion
In this assignment, we implemented a clean data checker for the Movies Dataset. It runs data type, range, format, consistency, uniqueness, presence, length, and look-up checks, as well as exact and near duplicate searches, to identify potential errors in the dataset. Identifying these issues is a first step toward improving data quality and making the data more suitable for analysis and feature extraction. Further improvements could include applying advanced imputation techniques for missing values, automating error detection, and integrating data cleaning pipelines for larger datasets. Future work could also leverage machine learning models to detect anomalies and inconsistencies more efficiently.