
QST: Question regarding documentation #988

Closed
2 tasks done
borisRa opened this issue Mar 9, 2022 · 10 comments
Labels
question Further information is requested

Comments


borisRa commented Mar 9, 2022

Research

Link to question on StackOverflow

https://docs.deepchecks.com/en/stable/user-guide/when_should_you_use.html#when-should-you-use-new-data

Question about deepchecks

Hi,
Based on your documentation page => https://docs.deepchecks.com/en/stable/user-guide/when_should_you_use.html#when-should-you-use-new-data

There are 4 phases:
1) New Data: Single Dataset Validation -> for this you can use single_dataset_integrity()
2) After Splitting the Data: Train-Test Validation -> for this you can use train_test_validation()
3) After Training a Model: Analysis & Validation -> model_evaluation()
4) General Overview: Full Suite -> full_suite()

My question: where can I see examples of using these 4 options?

Thanks,
Boris

@borisRa borisRa added the question Further information is requested label Mar 9, 2022
noamzbr (Collaborator) commented Mar 10, 2022

Hi @borisRa, you can see an example of using the three suites in our use-case example: https://docs.deepchecks.com/en/stable/examples/use-cases/phishing_urls.html

When to use each built-in suite:

  1. The single_dataset_integrity() suite contains checks for validating a single dataset (usually pre-processed, and before any train-test split).
  2. The train_test_validation() suite is run on two datasets (the two splits) to make sure the split was done correctly from a distribution and methodology standpoint - checking for various kinds of data drift and leakage.
  3. The model_evaluation() suite is used once the model is trained (and thus requires both datasets and a model) to check the resulting model itself, searching for "simple" performance issues as well as more complex problems (such as weak segments).
  4. The full_suite() is typically used when you already had a complete pipeline before adopting deepchecks and now wish to validate your whole process in retrospect, or to do a final signoff on an existing modeling process. This suite runs all of the checks belonging to the other suites detailed above.

@noamzbr noamzbr closed this as completed Mar 10, 2022
borisRa (Author) commented Mar 12, 2022

Thanks !

2. The train_test_validation() -> in the use-case example this function is used with a pre-trained model. Why?

vsuite = train_test_validation()
vsuite.run(model=logreg, train_dataset=ds_train, test_dataset=ds_test)

From the explanation above, model_evaluation() should use the model, while train_test_validation() should just validate the split distributions.

Thanks !
Boris

noamzbr (Collaborator) commented Mar 13, 2022

> 2) After Splitting the Data: Train-Test Validation -> for this you can use train_test_validation()

You are right: for the checks in the train_test_validation() suite it is not necessary to pass the model. If a model is passed, it will be used to calculate feature importance, which prioritizes the different features in the check displays.

borisRa (Author) commented Mar 13, 2022


Thanks!
Tried this and then wanted to view the results as an HTML file.
Got the error below.

My code:

```python
from deepchecks.suites import train_test_validation

vsuite = train_test_validation()

vsuite.run(train_dataset=df_full, test_dataset=df_full)

vsuite.save_as_html('my_suite.html')
```

and the error is:

```
AttributeError: 'Suite' object has no attribute 'save_as_html'
```

Used this page as an example : https://docs.deepchecks.com/en/stable/examples/guides/save_suite_result_as_html.html

How can I fix this ?

Thanks !
Boris

matanper (Contributor) commented

@borisRa You are probably on an older version of deepchecks. Can you try updating (pip install -U deepchecks) and see if that works?

borisRa (Author) commented Mar 13, 2022

> pip install -U deepchecks

Already on version 0.5.0 (the latest one from PyPI).

matanper (Contributor) commented

@borisRa Sorry, I hadn't looked at your code properly! The suite.run method returns a SuiteResult object; you should run it like this:

```python
vsuite = train_test_validation()

result = vsuite.run(train_dataset=df_full, test_dataset=df_full)

result.save_as_html('my_suite.html')
```

borisRa (Author) commented Mar 13, 2022

> @borisRa Sorry, I hadn't looked at your code properly! The suite.run method returns a SuiteResult object; you should run it like this:
>
> vsuite = train_test_validation()
> result = vsuite.run(train_dataset=df_full, test_dataset=df_full)
> result.save_as_html('my_suite.html')

Thanks, this worked!

Now I'm getting an HTML file without any plots comparing the distributions. Why?
This is what I get:

[screenshot: suite report rendered without distribution plots]

matanper (Contributor) commented

@borisRa Do you see that there are tabs? Please have a look at all 3 of them.

borisRa (Author) commented Mar 13, 2022

> @borisRa Do you see there are tabs? Please have a look in all 3 of them

Thanks!
Found the problem: you can't use a pandas DataFrame as input to train_test_validation(); you must use deepchecks' built-in Dataset.
