### What are Conflicting Labels? 

The check searches for samples that have identical feature values but different labels. This can happen when the data is mislabeled, or when the collected data lacks the features needed to separate the labels. Mislabeled data can confuse the model and lower its performance.
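To make the idea concrete, here is a minimal, library-independent sketch of how conflicting labels could be detected. The toy samples and the names `samples`, `labels_by_features`, and `conflict_ratio` are illustrative assumptions. This is not necessarily how deepchecks implements the check.

```python
from collections import defaultdict

# Hypothetical toy samples: a (urlLength, numDigits) feature tuple plus a label.
samples = [
    ((10, 2), 0),
    ((10, 2), 1),  # same features as the row above, different label
    ((25, 0), 1),
    ((25, 0), 1),  # duplicate features, same label: not a conflict
    ((40, 5), 0),
]

# Collect the set of labels seen for each distinct feature tuple.
labels_by_features = defaultdict(set)
for features, label in samples:
    labels_by_features[features].add(label)

# A sample is conflicting if its feature tuple maps to more than one label.
conflicting = [
    (features, label)
    for features, label in samples
    if len(labels_by_features[features]) > 1
]

# Share of samples that take part in a label conflict.
conflict_ratio = len(conflicting) / len(samples)

print(conflicting)
print(conflict_ratio)
```

Restricting the comparison to fewer features, as the `columns` argument of the check does, can only group more samples together, so it tends to surface more conflicts.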

In [3]:
import pandas as pd

from deepchecks.tabular import Dataset
from deepchecks.tabular.checks.integrity import ConflictingLabels
from deepchecks.tabular.datasets.classification.phishing import load_data

#### Load Data

In [4]:
phishing_dataframe = load_data(as_train_test=False, data_format='Dataframe')
phishing_dataset = Dataset(
    phishing_dataframe,
    label='target',
    features=['urlLength', 'numDigits', 'numParams', 'num_%20', 'num_@',
              'bodyLength', 'numTitles', 'numImages', 'numLinks', 'specialChars'],
)

It is recommended to initialize Dataset with categorical features by doing "Dataset(df, cat_features=categorical_list)". No categorical features were passed, therefore heuristically inferring categorical features in the data.
3 categorical features were inferred: numParams, num_%20, num_@


#### Run the Check

In [5]:
ConflictingLabels().run(phishing_dataset)


In [7]:
ConflictingLabels(columns=['urlLength', 'numDigits']).run(phishing_dataset)


#### Add a Condition to the Check

In [8]:
check = ConflictingLabels()
check.add_condition_ratio_of_conflicting_labels_not_greater_than(0)
result = check.run(phishing_dataset)
result.show(show_additional_outputs=False)
