Description
Once we find that the distributions of the training and test data differ, we can use adversarial validation to identify a subset of the training data that is similar to the test data.
This subset can then be used as the validation set. That way we get a good idea of how our model will perform on the test set, which comes from a different distribution than our training set.
The pseudo-code for adversarial validation can be as follows:
"""
Args:
train : training dataframe
test : testing dataframe
clf : classifier used to seperate train and test
threshold : threshold , default = 0.5
Returns:
adv_val_set : validation set.
"""
train['istest']=0
test['istest']=1
df = pd.concat([train,test],axis=0)
y = df['istest']
X = df.drop(columns=['istest'],axis=1)
proba = cross_val_predict(clf,X,y,cv=3,method='predict_proba')
df['test_proba'] = proba[:,1]
adv_val_set = df.query('istest==0 and test_proba > {threshold}')
return adv_val_set
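A minimal usage sketch, assuming the adversarial_validation function above; the toy DataFrames (train_df, test_df) and the choice of RandomForestClassifier are purely illustrative:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy data where the test feature f1 is shifted relative to train
rng = np.random.default_rng(0)
train_df = pd.DataFrame({'f1': rng.normal(0, 1, 500), 'f2': rng.normal(0, 1, 500)})
test_df = pd.DataFrame({'f1': rng.normal(1, 1, 200), 'f2': rng.normal(0, 1, 200)})

clf = RandomForestClassifier(n_estimators=100, random_state=0)
adv_val_set = adversarial_validation(train_df, test_df, clf, threshold=0.5)

# Drop the helper columns before using these rows for validation
adv_val_set = adv_val_set.drop(columns=['istest', 'test_proba'])
print(f"{len(adv_val_set)} training rows selected as test-like validation data")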
More information can be found here and here.