![giskard_logo.png](https://raw.githubusercontent.com/Giskard-AI/giskard/main/readme/Logo_full_darkgreen.png)

# About Giskard

Open-Source CI/CD platform for ML teams. Deliver ML products, better & faster. 

*   Collaborate faster with feedback from business stakeholders.
*   Deploy automated tests to eliminate regressions, errors & biases.

🏡 [Website](https://giskard.ai/)

📗 [Documentation](https://docs.giskard.ai/)

# Start by creating an ML model 🚀🚀🚀

Let's create a credit scoring model using the German Credit Scoring dataset ([link](https://github.com/Giskard-AI/giskard-client/tree/main/sample_data/classification) to download the dataset).

In [None]:
import pandas as pd
from sklearn import model_selection
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


In [None]:
# The CSV can also be downloaded from https://github.com/Giskard-AI/giskard-client/tree/main/sample_data/classification
url = 'https://raw.githubusercontent.com/Giskard-AI/giskard-client/main/sample_data/classification/credit/german_credit_prepared.csv'
credit = pd.read_csv(url, sep=',', engine="python")

In [None]:
column_types = {'default':"category",
               'account_check_status':"category", 
               'duration_in_month':"numeric",
               'credit_history':"category",
               'purpose':"category",
               'credit_amount':"numeric",
               'savings':"category",
               'present_emp_since':"category",
               'installment_as_income_perc':"numeric",
               'sex':"category",
               'personal_status':"category",
               'other_debtors':"category",
               'present_res_since':"numeric",
               'property':"category",
               'age':"numeric",
               'other_installment_plans':"category",
               'housing':"category",
               'credits_this_bank':"numeric",
               'job':"category",
               'people_under_maintenance':"numeric",
               'telephone':"category",
               'foreign_worker':"category"}
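
Before going further, it can help to confirm that `column_types` and the DataFrame agree, since a misspelled or missing column name would otherwise only surface later. The sketch below uses a hypothetical helper, `check_column_types`, and a tiny made-up frame; neither comes from the Giskard library.

```python
import pandas as pd


def check_column_types(df, column_types):
    """Return (missing, undeclared) column names; hypothetical helper."""
    declared = set(column_types)
    actual = set(df.columns)
    missing = sorted(declared - actual)      # declared but absent from df
    undeclared = sorted(actual - declared)   # present in df but not declared
    return missing, undeclared


# Toy frame standing in for the credit data (illustrative values only)
toy = pd.DataFrame({"age": [31, 45], "housing": ["own", "rent"], "extra": [1, 2]})
toy_types = {"age": "numeric", "housing": "category", "default": "category"}

missing, undeclared = check_column_types(toy, toy_types)
print(missing, undeclared)
```

On the notebook's data, the equivalent call would be `check_column_types(credit, column_types)`; two empty lists mean the dictionary and the DataFrame match.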

In [None]:
# Features are every declared column except the target
feature_types = {i: column_types[i] for i in column_types if i != 'default'}

# Numeric columns: impute missing values with the median, then standardize
columns_to_scale = [key for key in feature_types.keys() if feature_types[key] == "numeric"]

numeric_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

# Categorical columns: fill missing values with a constant, then one-hot encode
columns_to_encode = [key for key in feature_types.keys() if feature_types[key] == "category"]

# Note: scikit-learn 1.2 renamed `sparse` to `sparse_output` (and 1.4 removed `sparse`);
# on recent versions use OneHotEncoder(handle_unknown='ignore', sparse_output=False)
categorical_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse=False))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, columns_to_scale),
        ('cat', categorical_transformer, columns_to_encode)
    ]
)
clf_logistic_regression = Pipeline(steps=[('preprocessor', preprocessor),
                                          ('classifier', LogisticRegression(max_iter=1000))])

# Hold out 20% of the rows, keeping the class balance of the target
Y = credit['default']
X = credit.drop(columns="default")
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=0.20, random_state=30, stratify=Y)

In [None]:
clf_logistic_regression.fit(X_train, Y_train)
clf_logistic_regression.score(X_test, Y_test)

In [None]:
train_data = pd.concat([X_train, Y_train], axis=1)
test_data = pd.concat([X_test, Y_test ], axis=1)
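
`pd.concat(..., axis=1)` joins on the index, which is why the features and the target stay matched row by row even though `train_test_split` shuffles the rows. A minimal sketch on made-up data (the column `amount` and its values are illustrative, not part of the credit dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: the target is derived from the feature so alignment can be checked
X_toy = pd.DataFrame({"amount": range(20)})
y_toy = pd.Series(["high" if v >= 10 else "low" for v in X_toy["amount"]], name="default")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=30, stratify=y_toy
)

# concat(axis=1) joins on the index, so each row keeps its own label
train_toy = pd.concat([X_tr, y_tr], axis=1)
expected = train_toy["amount"].map(lambda v: "high" if v >= 10 else "low")
print((train_toy["default"] == expected).all())
print(train_toy.shape)
```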

# Upload the model to Giskard 🚀🚀🚀


### Install the Giskard library



In [None]:
!pip install giskard


### Initiate a project

In [None]:
from giskard.giskard_client import GiskardClient

url1 = "http://gsk1.giskard.ai:10000"
token1 = "eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJhZG1pbiIsInRva2VuX3R5cGUiOiJBUEkiLCJhdXRoIjoiUk9MRV9BRE1JTiIsImV4cCI6MTY2MjkzMDE3Mn0.A0hdmCnddvdhVj62mRCMvQ_N-Cor13SdcHeLa7e8J9YqEucWlZRpTt8hbK6PKIa1yfgCrwN7EQQ4Q4mYMNNeXQ"

#url = "http://localhost:19000" #If Giskard is installed locally
#token = "eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJhZG1pbiIsInRva2VuX3R5cGUiOiJBUEkiLCJhdXRoIjoiUk9MRV9BRE1JTiIsImV4cCI6MTY2Mjc1Nzg5Nn0.vKOmgNqi3wMFq1nABvmlpi-nq1zLLFGEJwLKREXl0fF6_8kGX4a-MwQn3TszxRUngC_bElR_Ui2uivjyCZ9Tgg"
#Find your token in the Admin tab of your app (login: admin; password: admin)


client = GiskardClient(url1, token1)

credit_scoring = client.create_project("credit_scoring", "German Credit Scoring", "Project to predict if user will default")

# If you've already created a project with the key "credit_scoring", use:
# credit_scoring = client.get_project("credit_scoring")
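
A token pasted into a notebook is easy to leak when the notebook is shared. One common alternative is to read the URL and token from the environment. The variable names `GISKARD_URL` and `GISKARD_TOKEN` and the helper `load_giskard_settings` below are assumptions chosen for this sketch, not names the Giskard client looks for.

```python
import os


def load_giskard_settings(default_url="http://localhost:19000"):
    """Read the server URL and API token from environment variables.

    GISKARD_URL and GISKARD_TOKEN are hypothetical names used for illustration.
    """
    gsk_url = os.environ.get("GISKARD_URL", default_url)
    gsk_token = os.environ.get("GISKARD_TOKEN")
    if not gsk_token:
        raise RuntimeError("Set GISKARD_TOKEN before running this cell")
    return gsk_url, gsk_token


# For demonstration only; normally the variable is set outside the notebook
os.environ.setdefault("GISKARD_TOKEN", "dummy-token-for-illustration")
gsk_url, gsk_token = load_giskard_settings()
# client = GiskardClient(gsk_url, gsk_token)  # same constructor call as in the cell above
```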


### Upload your model and a dataset (see [documentation](https://docs.giskard.ai/start/guides/upload-your-model))

In [None]:
credit_scoring.upload_model_and_df(
    prediction_function=clf_logistic_regression.predict_proba,
    model_type='classification',
    df=test_data,  # the dataset you want to use to inspect your model
    column_types=column_types,  # the types of all columns of df
    target='default',  # the column of df that holds the actual target variable (ground truth)
    feature_names=list(feature_types.keys()),  # the feature names expected by prediction_function
    classification_labels=clf_logistic_regression.classes_,
    model_name='logistic_regression_v1',
    dataset_name='test_data'
)
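
The call above passes `classes_` as `classification_labels` because scikit-learn orders the columns of `predict_proba` the same way as `classes_` (sorted labels). The sketch below shows this on a toy classifier; the data and label strings are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary problem with string labels (illustrative values only)
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array(["Not default", "Not default", "Not default",
                  "Default", "Default", "Default"])

model = LogisticRegression().fit(X_toy, y_toy)

# classes_ is sorted, and predict_proba columns follow that same order
labels = list(model.classes_)
proba = model.predict_proba(np.array([[5.0]]))[0]
predicted = labels[int(np.argmax(proba))]
print(labels, predicted)
```

Passing a hand-written label list in a different order would silently attach each probability to the wrong class, so reusing `classes_` is the safer choice.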

### 🌟 If you want to upload a dataset without a model

For example, let's upload the train set to Giskard. This is key to creating drift tests in Giskard.


In [None]:
credit_scoring.upload_df(
    df=train_data,
    column_types=column_types,  # the types of all columns of df
    target="default",  # omit this parameter if the dataset doesn't contain the target column
    name="train_data"
)
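
To make "drift" concrete: a drift test compares the distribution of a column in a reference dataset (here, the train set) with the same column in newer data. The sketch below computes the Population Stability Index (PSI) for a categorical column, one common drift measure. It is an illustration only and not Giskard's implementation; the helper name `categorical_psi` and the toy values are assumptions.

```python
import numpy as np
import pandas as pd


def categorical_psi(reference, actual, eps=1e-6):
    """Population Stability Index between two categorical series."""
    categories = sorted(set(reference) | set(actual))
    # Share of each category, floored at eps to avoid log(0) and division by zero
    ref = reference.value_counts(normalize=True).reindex(categories, fill_value=0).clip(lower=eps)
    act = actual.value_counts(normalize=True).reindex(categories, fill_value=0).clip(lower=eps)
    return float(((act - ref) * np.log(act / ref)).sum())


reference = pd.Series(["own"] * 70 + ["rent"] * 30)
same = pd.Series(["own"] * 7 + ["rent"] * 3)       # same proportions as the reference
shifted = pd.Series(["own"] * 30 + ["rent"] * 70)  # proportions swapped

psi_same = categorical_psi(reference, same)
psi_shifted = categorical_psi(reference, shifted)
print(psi_same, psi_shifted)
```

Identical proportions give a PSI of about zero, and the value grows as the two distributions move apart.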

You can also upload new production data to use as a validation set for your existing model. In that case, you might not have the ground truth target variable.

In [None]:
production_data = credit.drop(columns="default")

In [None]:
credit_scoring.upload_df(
    df=production_data,
    column_types=feature_types, #all the column types without the target
    name="production_data"
)

### 🌟 If you just want to upload a model without a dataframe 

This happens, for instance, when you have built a new version of the model and want to inspect it using a validation dataframe that is already in Giskard.

For example, let's create a second version of the model using a random forest:

In [None]:
clf_random_forest = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', RandomForestClassifier(max_depth=10,random_state=0))])

clf_random_forest.fit(X_train, Y_train)
clf_random_forest.score(X_test, Y_test)

In [None]:
credit_scoring.upload_model(
    prediction_function=clf_random_forest.predict_proba,
    model_type='classification',
    feature_names=list(feature_types.keys()),  # the feature names expected by prediction_function
    name='random_forest',
    validate_df=train_data,  # Optional. The validation df is not uploaded to the app; it is only used to check that the model has the right format
    target="default",  # Optional. target should be a column of validate_df
    classification_labels=clf_random_forest.classes_  # labels in the same order as the predict_proba columns
)
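
As a rough idea of what a format check on a prediction function can look like, the sketch below verifies locally that the function returns one probability column per label and that each row sums to 1. The helper `check_proba_output` and the toy data are hypothetical; the checks Giskard actually runs on `validate_df` may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def check_proba_output(prediction_function, features, labels):
    """Hypothetical local check of a classifier's probability output."""
    proba = np.asarray(prediction_function(features))
    if proba.shape != (len(features), len(labels)):
        raise ValueError(f"unexpected output shape {proba.shape}")
    if not np.allclose(proba.sum(axis=1), 1.0):
        raise ValueError("probabilities do not sum to 1 for every row")
    return proba


# Toy data standing in for the validation dataframe (illustrative values only)
toy_X = pd.DataFrame({"amount": [1, 2, 3, 10, 11, 12]})
toy_y = ["Not default"] * 3 + ["Default"] * 3
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(toy_X, toy_y)

proba = check_proba_output(forest.predict_proba, toy_X, forest.classes_)
print(proba.shape)
```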

### Happy exploring! 🧑‍🚀