
QST: Question regarding documentation #988

Closed
2 tasks done
borisRa opened this issue Mar 9, 2022 · 10 comments
Labels
question Further information is requested

Comments


borisRa commented Mar 9, 2022

Research

Link to question on StackOverflow

https://docs.deepchecks.com/en/stable/user-guide/when_should_you_use.html#when-should-you-use-new-data

Question about deepchecks

Hi,
Based on your documentation page => https://docs.deepchecks.com/en/stable/user-guide/when_should_you_use.html#when-should-you-use-new-data

There are 4 phases:
1) New Data: Single Dataset Validation -> for this you can use single_dataset_integrity()
2) After Splitting the Data: Train-Test Validation -> for this you can use train_test_validation()
3) After Training a Model: Analysis & Validation -> model_evaluation()
4) General Overview: Full Suite -> full_suite()

My question: where can I see examples of using these 4 options?

Thanks,
Boris

@borisRa borisRa added the question Further information is requested label Mar 9, 2022
noamzbr (Collaborator) commented Mar 10, 2022

Hi @borisRa, you can see an example of using the three suites in our use-case example: https://docs.deepchecks.com/en/stable/examples/use-cases/phishing_urls.html

When to use each built-in suite:

  1. The single_dataset_integrity() suite contains checks for validating a single dataset (usually pre-processed, and before any train-test split).
  2. The train_test_validation() suite is run on two datasets (the two splits) to make sure the split was done correctly from a distribution and methodology standpoint - checking for various kinds of data drift and leakage.
  3. The model_evaluation() suite is used once the model is trained (and thus requires both datasets and a model) to check the resulting model itself, searching for "simple" performance issues as well as more complex problems (such as weak segments).
  4. The full_suite() is typically used when you already had a complete pipeline before adopting deepchecks and now wish to validate your whole process in retrospect, or to do a final signoff on an existing modeling process. This suite runs all of the checks belonging to the other suites detailed above.

@noamzbr noamzbr closed this as completed Mar 10, 2022
borisRa (Author) commented Mar 12, 2022

Thanks !

2. The train_test_validation() -> in the use-case example this function is used with a pre-trained model. Why?

vsuite = train_test_validation()
vsuite.run(model=logreg, train_dataset=ds_train, test_dataset=ds_test)

From the explanation above, model_evaluation() should use the model, while train_test_validation() should just validate the split distributions.

Thanks !
Boris

noamzbr (Collaborator) commented Mar 13, 2022

> 2) After Splitting the Data: Train-Test Validation -> for this you can use train_test_validation()

You are right: for the checks in the train_test_validation() suite it is not necessary to pass the model. If a model is passed, it will be used to calculate feature importance, which prioritizes the different features in the check displays.

borisRa (Author) commented Mar 13, 2022


Thanks!
Tried this and then wanted to view the results as an HTML file.
Got the error below.

My code:

```python
from deepchecks.suites import train_test_validation

vsuite = train_test_validation()

vsuite.run(train_dataset=df_full, test_dataset=df_full)

vsuite.save_as_html('my_suite.html')
```

and the error is:

```
AttributeError: 'Suite' object has no attribute 'save_as_html'
```

Used this page as an example : https://docs.deepchecks.com/en/stable/examples/guides/save_suite_result_as_html.html

How can I fix this ?

Thanks !
Boris

matanper (Contributor) commented

@borisRa You are probably on an older version of deepchecks. Can you try updating (pip install -U deepchecks) and see if that works?

borisRa (Author) commented Mar 13, 2022

> pip install -U deepchecks

Already on version 0.5.0 (the latest one from PyPI).

matanper (Contributor) commented

@borisRa Sorry, I hadn't looked at your code properly! The suite.run method returns a SuiteResult object; you should run it like this:

```python
vsuite = train_test_validation()

result = vsuite.run(train_dataset=df_full, test_dataset=df_full)

result.save_as_html('my_suite.html')
```

borisRa (Author) commented Mar 13, 2022

> @borisRa Sorry, I hadn't looked at your code properly! The suite.run method returns a SuiteResult object; you should run it like this:
>
> vsuite = train_test_validation()
> result = vsuite.run(train_dataset=df_full, test_dataset=df_full)
> result.save_as_html('my_suite.html')

Thanks, this worked!

Now I'm getting an HTML file without any plots comparing the distributions. Why?
This is what I get:

[screenshot: suite report rendered without distribution plots]

matanper (Contributor) commented

@borisRa Do you see that there are tabs? Please have a look at all 3 of them.

borisRa (Author) commented Mar 13, 2022

> @borisRa Do you see there are tabs? Please have a look in all 3 of them

Thanks!
Found the problem: you can't use a pandas DataFrame as input to train_test_validation(); you must use deepchecks' built-in Dataset.
