![giskard_logo.png](https://raw.githubusercontent.com/Giskard-AI/giskard/main/readme/Logo_full_darkgreen.png)


# Basic text classification

This tutorial adapts TensorFlow's [basic text classification tutorial](https://www.tensorflow.org/tutorials/keras/text_classification) so that the trained model and its data can be uploaded to [Giskard](https://www.giskard.ai/). It starts from plain text files stored on disk. You'll train a binary classifier to perform sentiment analysis on an IMDB dataset.


In [1]:
#!pip install tensorflow

In [1]:
import os
import shutil
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.keras import layers
from tensorflow.keras import losses

In [3]:
import sys
 
print("User Current Version:-", sys.version)
import giskard
giskard.__version__
import mlflow
mlflow.__version__
tf.__version__

User Current Version:- 3.8.16 (default, Dec  7 2022, 01:36:19) 
[Clang 13.0.0 (clang-1300.0.29.30)]


'2.11.0'

## Sentiment analysis

This notebook trains a sentiment analysis model to classify movie reviews as *positive* or *negative*, based on the text of the review. This is an example of *binary*—or two-class—classification, an important and widely applicable kind of machine learning problem.

You'll use the [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) that contains the text of 50,000 movie reviews from the [Internet Movie Database](https://www.imdb.com/). These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are *balanced*, meaning they contain an equal number of positive and negative reviews.


### Download and explore the IMDB dataset

Let's download and extract the dataset, then explore the directory structure.

In [2]:
url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"

dataset = tf.keras.utils.get_file("aclImdb_v1", url,
                                    untar=True, cache_dir='.',
                                    cache_subdir='')

dataset_dir = os.path.join(os.path.dirname(dataset), 'aclImdb')

In [3]:
os.listdir(dataset_dir)

['imdbEr.txt', 'test', 'imdb.vocab', 'README', 'train']

In [4]:
train_dir = os.path.join(dataset_dir, 'train')
os.listdir(train_dir)

['urls_unsup.txt',
 'neg',
 'urls_pos.txt',
 'unsup',
 'urls_neg.txt',
 'pos',
 'unsupBow.feat',
 'labeledBow.feat']

The `aclImdb/train/pos` and `aclImdb/train/neg` directories contain many text files, each of which is a single movie review. Let's take a look at one of them.

In [5]:
sample_file = os.path.join(train_dir, 'pos/1181_9.txt')
with open(sample_file) as f:
  print(f.read())

Rachel Griffiths writes and directs this award winning short film. A heartwarming story about coping with grief and cherishing the memory of those we've loved and lost. Although, only 15 minutes long, Griffiths manages to capture so much emotion and truth onto film in the short space of time. Bud Tingwell gives a touching performance as Will, a widower struggling to cope with his wife's death. Will is confronted by the harsh reality of loneliness and helplessness as he proceeds to take care of Ruth's pet cow, Tulip. The film displays the grief and responsibility one feels for those they have loved and lost. Good cinematography, great direction, and superbly acted. It will bring tears to all those who have lost a loved one, and survived.


To prepare a dataset for binary classification, you will need two folders on disk, corresponding to `class_a` and `class_b`. These will be the positive and negative movie reviews, which can be found in `aclImdb/train/pos` and `aclImdb/train/neg`. As the IMDB dataset contains additional folders, you will remove them before loading the data with the `text_dataset_from_directory` utility.
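
As a rough illustration of the layout this expects, the sketch below builds a tiny throwaway directory tree and infers integer labels from the sorted subdirectory names, which is how `text_dataset_from_directory` assigns labels by default. The helper `infer_labels` and the sample file contents are hypothetical, and the code is plain Python rather than the TensorFlow implementation.

```python
import os
import tempfile

def infer_labels(root):
    """Return (class_names, samples) for a directory with one subfolder per class."""
    # Class names are the subdirectory names in sorted order; the position is the label.
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        for filename in sorted(os.listdir(class_dir)):
            with open(os.path.join(class_dir, filename), encoding="utf-8") as f:
                samples.append((f.read(), label))
    return class_names, samples

# Build a miniature stand-in for aclImdb/train with two class folders.
with tempfile.TemporaryDirectory() as root:
    for name, text in [("pos", "Loved it."), ("neg", "Dull and slow.")]:
        os.makedirs(os.path.join(root, name))
        with open(os.path.join(root, name, "0.txt"), "w", encoding="utf-8") as f:
            f.write(text)
    class_names, samples = infer_labels(root)

print(class_names)
print(samples)
```

Under this scheme an extra folder such as `unsup` would be picked up as a third class, which is why it has to go before loading.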

In [6]:
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir)

Next, you will use the `text_dataset_from_directory` utility to create a labeled `tf.data.Dataset`. [tf.data](https://www.tensorflow.org/guide/data) is a powerful collection of tools for working with data. 

When running a machine learning experiment, it is a best practice to divide your dataset into three splits: [train](https://developers.google.com/machine-learning/glossary#training_set), [validation](https://developers.google.com/machine-learning/glossary#validation_set), and [test](https://developers.google.com/machine-learning/glossary#test-set). 

The IMDB dataset has already been divided into train and test, but it lacks a validation set. Let's create a validation set using an 80:20 split of the training data by using the `validation_split` argument below.

In [7]:
batch_size = 32
seed = 42

raw_train_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/train', 
    batch_size=batch_size, 
    validation_split=0.2, 
    subset='training', 
    seed=seed)

Found 25000 files belonging to 2 classes.
Using 20000 files for training.


In [8]:
raw_val_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/train', 
    batch_size=batch_size, 
    validation_split=0.2, 
    subset='validation', 
    seed=seed)

Found 25000 files belonging to 2 classes.
Using 5000 files for validation.


In [9]:
raw_test_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/test', 
    batch_size=batch_size)

Found 25000 files belonging to 2 classes.


## Convert the data to pandas DataFrames for upload to Giskard

In [10]:
train_dataset = {'Review':[], 'Label':[]}
for text_batch, label_batch in raw_train_ds.take(625):
  for i in range(32):
    train_dataset['Review'].append(text_batch.numpy()[i])
    train_dataset['Label'].append(label_batch.numpy()[i])

train_df = pd.DataFrame.from_dict(train_dataset)

In [11]:
print("Label 0 corresponds to", raw_train_ds.class_names[0])
print("Label 1 corresponds to", raw_train_ds.class_names[1])

Label 0 corresponds to neg
Label 1 corresponds to pos


In [12]:
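val_dataset = {'Review':[], 'Label':[]}
# 5,000 validation reviews give 157 batches of up to 32; the last batch holds only 8.
# Only the first 8 reviews of each batch are kept, so the fixed index range is always valid.
for text_batch, label_batch in raw_val_ds.take(157):
  for i in range(8):
    val_dataset['Review'].append(text_batch.numpy()[i])
    val_dataset['Label'].append(label_batch.numpy()[i])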

val_df = pd.DataFrame.from_dict(val_dataset)


In [13]:
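test_dataset = {'Review':[], 'Label':[]}
# 25,000 test reviews give 782 batches of up to 32; the last batch holds only 8.
# Only the first 8 reviews of each batch are kept, so the fixed index range is always valid.
for text_batch, label_batch in raw_test_ds.take(782):
  for i in range(8):
    test_dataset['Review'].append(text_batch.numpy()[i])
    test_dataset['Label'].append(label_batch.numpy()[i])

# Build the dataframe once, after all batches have been collected
test_df = pd.DataFrame.from_dict(test_dataset)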
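
The batch counts passed to `take` in the cells above follow from the split sizes and the batch size of 32. A small arithmetic sketch (the helper `batch_layout` is hypothetical) makes the numbers explicit, including the short final batch of the validation and test sets:

```python
import math

def batch_layout(num_examples, batch_size):
    """Return (number of batches, size of the final batch)."""
    num_batches = math.ceil(num_examples / batch_size)
    last_batch = num_examples - (num_batches - 1) * batch_size
    return num_batches, last_batch

batch_size = 32
layouts = {}
for name, n in [("train", 20000), ("validation", 5000), ("test", 25000)]:
    layouts[name] = batch_layout(n, batch_size)
    print(name, layouts[name])
```

Because the last validation and test batches hold only 8 reviews, indexing a fixed 32 positions per batch would fail on them.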

### Prepare the dataset for training

Next, you will standardize, tokenize, and vectorize the data using the helpful `tf.keras.layers.TextVectorization` layer. 

Standardization refers to preprocessing the text, typically to remove punctuation or HTML elements to simplify the dataset. Tokenization refers to splitting strings into tokens (for example, splitting a sentence into individual words, by splitting on whitespace). Vectorization refers to converting tokens into numbers so they can be fed into a neural network. All of these tasks can be accomplished with this layer.
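
To make the three steps concrete, here is a minimal plain-Python sketch. It is not how `TextVectorization` is implemented; the function names and the toy vocabulary are hypothetical, and the only conventions borrowed from the layer are that id 0 is used for padding and id 1 for out-of-vocabulary tokens.

```python
import string

def standardize(text):
    """Lowercase the text and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def tokenize(text):
    """Split on whitespace."""
    return text.split()

def vectorize(tokens, vocab, sequence_length):
    """Map tokens to integer ids, then pad or truncate to a fixed length."""
    # Ids 0 and 1 are reserved here for padding and out-of-vocabulary tokens.
    ids = [vocab.get(token, 1) for token in tokens][:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

vocab = {"the": 2, "movie": 3, "was": 4, "great": 5}  # hypothetical toy vocabulary
tokens = tokenize(standardize("The movie was GREAT, truly!"))
vector = vectorize(tokens, vocab, sequence_length=8)
print(tokens)
print(vector)
```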

Many reviews contain HTML tags like `<br />`. These tags are not removed by the default standardizer in the `TextVectorization` layer (which converts text to lowercase and strips punctuation, but doesn't strip HTML). The original TensorFlow tutorial writes a custom standardization function to remove them; this notebook keeps the built-in `lower_and_strip_punctuation` standardizer, which leaves fragments such as `br` in the text.

Note: To prevent [training-testing skew](https://developers.google.com/machine-learning/guides/rules-of-ml#training-serving_skew) (also known as training-serving skew), it is important to preprocess the data identically at train and test time. To facilitate this, the `TextVectorization` layer can be included directly inside your model, as shown later in this tutorial.

In [14]:
max_features = 10000
sequence_length = 250
vectorize_layer = tf.keras.layers.TextVectorization(
    standardize='lower_and_strip_punctuation',
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=sequence_length)

The cell above creates the `TextVectorization` layer that standardizes, tokenizes, and vectorizes the data. Setting `output_mode` to `int` creates a unique integer index for each token.

Note that the layer uses the default split function and the built-in `lower_and_strip_punctuation` standardizer. The cell also defines some constants for the model, like an explicit maximum `sequence_length`, which causes the layer to pad or truncate sequences to exactly `sequence_length` values.

Next, you will call `adapt` to fit the state of the preprocessing layer to the dataset. This causes the layer to build an index that maps strings to integers.

Note: It's important to only use your training data when calling `adapt` (using the test set would leak information).
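
Conceptually, adapting the layer amounts to counting tokens in the training text and keeping the most frequent ones. The sketch below is a plain-Python approximation under that assumption; `build_vocab` and the two sample sentences are hypothetical, and ids 0 and 1 are reserved for padding and unknown tokens as in the layer's `int` output mode.

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    """Build a token -> id mapping from training texts, most frequent first."""
    counts = Counter(token for text in texts for token in text.lower().split())
    # Ids 0 and 1 are reserved for padding and unknown tokens, so at most
    # max_tokens - 2 real tokens are kept.
    kept = [token for token, _ in counts.most_common(max_tokens - 2)]
    return {token: index + 2 for index, token in enumerate(kept)}

train_texts = ["good good good movie", "bad movie"]  # hypothetical training split
vocab = build_vocab(train_texts, max_tokens=4)
print(vocab)
print(vocab.get("bad", 1))  # a token outside the vocabulary maps to the unknown id
```

Since the index is derived only from the texts passed in, feeding it test reviews would let test-set vocabulary shape the model's inputs.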

In [15]:
# Make a text-only dataset (without labels), then call adapt
train_text = raw_train_ds.map(lambda x, y: x)
vectorize_layer.adapt(train_text)

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089


Let's create a function to see the result of using this layer to preprocess some data.

In [16]:
def vectorize_text(text, label):
  text = tf.expand_dims(text, -1)
  return vectorize_layer(text), label

In [17]:
# retrieve a batch (of 32 reviews and labels) from the dataset
text_batch, label_batch = next(iter(raw_train_ds))
first_review, first_label = text_batch[0], label_batch[0]
print("Review", first_review)
print("Label", raw_train_ds.class_names[first_label])
print("Vectorized review", vectorize_text(first_review, first_label))

Review tf.Tensor(b'Great movie - especially the music - Etta James - "At Last". This speaks volumes when you have finally found that special someone.', shape=(), dtype=string)
Label neg
Vectorized review (<tf.Tensor: shape=(1, 250), dtype=int64, numpy=
array([[  87,   18,  259,    2,  223,    1,  566,   31,  228,   11, 2422,
           1,   52,   23,   26,  400,  250,   12,  308,  280,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
       

You are nearly ready to train your model. As a final preprocessing step, you will apply the TextVectorization layer you created earlier to the train, validation, and test dataset.

In [18]:
train_ds = raw_train_ds.map(vectorize_text)
val_ds = raw_val_ds.map(vectorize_text)
test_ds = raw_test_ds.map(vectorize_text)

### Configure the dataset for performance

These are two important methods you should use when loading data to make sure that I/O does not become blocking.

`.cache()` keeps data in memory after it's loaded off disk. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache, which is more efficient to read than many small files.

`.prefetch()` overlaps data preprocessing and model execution while training. 

You can learn more about both methods, as well as how to cache data to disk in the [data performance guide](https://www.tensorflow.org/guide/data_performance).

In [19]:
AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)

### Create the model

It's time to create your neural network:

In [20]:
embedding_dim = 16

In [21]:
model = tf.keras.Sequential([
  layers.Embedding(max_features + 1, embedding_dim),
  layers.Dropout(0.2),
  layers.GlobalAveragePooling1D(),
  layers.Dropout(0.2),
  layers.Dense(1)
  ])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 16)          160016    
                                                                 
 dropout (Dropout)           (None, None, 16)          0         
                                                                 
 global_average_pooling1d (G  (None, 16)               0         
 lobalAveragePooling1D)                                          
                                                                 
 dropout_1 (Dropout)         (None, 16)                0         
                                                                 
 dense (Dense)               (None, 1)                 17        
                                                                 
Total params: 160,033
Trainable params: 160,033
Non-trainable params: 0
__________________________________________________

The layers are stacked sequentially to build the classifier:

1. The first layer is an `Embedding` layer. This layer takes the integer-encoded reviews and looks up an embedding vector for each word-index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: `(batch, sequence, embedding)`. To learn more about embeddings, check out the [Word embeddings](https://www.tensorflow.org/text/guide/word_embeddings) tutorial.
2. Next, a `GlobalAveragePooling1D` layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model to handle input of variable length, in the simplest way possible.
3. The last layer is a densely connected (`Dense`) layer with a single output node, fed directly by the 16-dimensional pooled vector.

The `Dropout` layers after the embedding and pooling steps randomly zero 20% of their inputs during training to reduce overfitting.
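
The pooling step and the parameter counts in the summary can be checked with a short plain-Python sketch. The helper `global_average_pool` and the toy embeddings are hypothetical; the constants are the ones used in this notebook.

```python
def global_average_pool(sequence):
    """Average a list of embedding vectors over the sequence dimension."""
    length = len(sequence)
    return [sum(values) / length for values in zip(*sequence)]

# Two "reviews" of different lengths, each token embedded in 2 dimensions.
short_review = [[1.0, 2.0], [3.0, 4.0]]
long_review = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [6.0, 4.0]]
pooled_short = global_average_pool(short_review)
pooled_long = global_average_pool(long_review)
print(pooled_short, pooled_long)  # both have length 2, whatever the review length

# Parameter counts for the model in this notebook.
max_features, embedding_dim = 10000, 16
embedding_params = (max_features + 1) * embedding_dim
dense_params = embedding_dim * 1 + 1  # one weight per pooled feature plus one bias
print(embedding_params, dense_params, embedding_params + dense_params)
```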

### Loss function and optimizer

A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a single logit (the final `Dense` layer has one unit and no activation), you'll use the `losses.BinaryCrossentropy` loss function with `from_logits=True`.

Now, configure the model to use an optimizer and a loss function:

In [22]:
model.compile(loss=losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=tf.metrics.BinaryAccuracy(threshold=0.0))
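
With `from_logits=True`, the loss works on the raw output of the last layer rather than on a probability, and the accuracy threshold of `0.0` on that raw output plays the role of `0.5` on a probability. The following plain-Python sketch checks both statements numerically; the function names are hypothetical and this is not the Keras implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_logit(y_true, logit):
    """Binary cross-entropy computed directly from a logit (numerically stable form)."""
    return max(logit, 0.0) - logit * y_true + math.log1p(math.exp(-abs(logit)))

def bce_from_probability(y_true, p):
    """Binary cross-entropy computed from a probability."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

results = []
for logit in (-2.0, 0.0, 1.5):
    p = sigmoid(logit)
    # Both loss formulations agree, and logit > 0.0 gives the same decision as p > 0.5.
    results.append((bce_from_logit(1, logit), bce_from_probability(1, p),
                    logit > 0.0, p > 0.5))
    print(logit, round(p, 4), round(results[-1][0], 4), results[-1][2])
```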

### Train the model

You will train the model by passing the `dataset` object to the fit method.

In [23]:
epochs = 1
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs)



### Evaluate the model

Let's see how the model performs. Two values will be returned. Loss (a number which represents our error, lower values are better), and accuracy.

In [24]:
loss, accuracy = model.evaluate(test_ds)

print("Loss: ", loss)
print("Accuracy: ", accuracy)

Loss:  0.6239201426506042
Accuracy:  0.7626000046730042


After a single training epoch, this fairly naive approach achieves an accuracy of about 76% on the test set.

## Export the model

In the code above, you applied the `TextVectorization` layer to the dataset before feeding text to the model. If you want to make your model capable of processing raw strings (for example, to simplify deploying it), you can include the `TextVectorization` layer inside your model. To do so, you can create a new model using the weights you just trained.

In [25]:
export_model = tf.keras.Sequential([
    vectorize_layer,
    model,
    layers.Activation('sigmoid')
])

export_model.compile(
    loss=losses.BinaryCrossentropy(from_logits=False), optimizer="adam", metrics=['accuracy']
)

# Test it with `raw_test_ds`, which yields raw strings
loss, accuracy = export_model.evaluate(raw_test_ds)
print(accuracy)

0.7626000046730042


### Inference on new data

To get predictions for new examples, you can simply call `predict()` on the exported model.

In [26]:
examples = [
  "The movie was great!",
  "The movie was okay.",
  "The movie was terrible..."
]

export_model.predict(examples)



array([[0.5226061 ],
       [0.513277  ],
       [0.50991774]], dtype=float32)
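
Each value is the sigmoid output for one example, so it can be read as the score for label 1 (`pos`). A minimal sketch of turning these scores into class names, assuming a decision threshold of 0.5 (the helper `to_label` is hypothetical):

```python
class_names = ["neg", "pos"]  # label 0 and label 1, as printed earlier in the notebook
probabilities = [0.5226061, 0.513277, 0.50991774]  # values from the output above

def to_label(probability, threshold=0.5):
    """Map a sigmoid output to a class name using a decision threshold."""
    return class_names[int(probability > threshold)]

labels = [to_label(p) for p in probabilities]
print(labels)
```

After one epoch all three scores sit just above 0.5, so even the negative example is labelled `pos`, although the ranking of the three reviews is already sensible.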

## Initiate Project On Giskard

### Giskard requires a prediction function that takes a dataframe as input and returns prediction probabilities for classification models

In [27]:
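def predict(test_dataset):
    # Flatten the one-column dataframe into a list of review strings
    test_dataset = test_dataset.squeeze(axis=1)
    test_dataset = list(test_dataset)
    # export_model returns P(label 1) with shape (n, 1)
    predictions = export_model.predict(test_dataset)
    # Prepend P(label 0) so the columns follow classification_labels=[0, 1]
    predictions = np.insert(predictions, 0, 1 - predictions[:, 0], axis=1)
    return predictions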
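
The exported model returns a single probability per review, while a classification prediction function is expected to return one column per label. The sketch below shows that expansion in plain Python, assuming that columns are matched to `classification_labels` by position; `to_two_columns` and the sample values are hypothetical.

```python
def to_two_columns(positive_probabilities):
    """Expand P(label 1) into rows of [P(label 0), P(label 1)]."""
    return [[1.0 - p, p] for p in positive_probabilities]

classification_labels = [0, 1]
rows = to_two_columns([0.9, 0.25])
predicted = []
for row in rows:
    # The index of the largest probability selects the predicted label.
    predicted.append(classification_labels[row.index(max(row))])
    print(row, predicted[-1])
```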

In [28]:
from giskard import GiskardClient

url = "http://localhost:9000" #if Giskard is installed locally (for installation, see: https://docs.giskard.ai/start/guides/installation)
#url = "http://app.giskard.ai" # If you want to upload on giskard URL
token = "xxx" #you can generate your API token in the Admin tab of the Giskard application (for installation, see: https://docs.giskard.ai/start/guides/installation)

client = GiskardClient(url, token)

#your_project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION")
# Choose the arguments you want. But "project_key" should be unique and in lower case
#tensorflow_text_classification = client.create_project("tensorflow_text_classification", "Tensorflow_text_Classification", "Classification Of Text using Tensorflow Neural Network")

# If you've already created a project with the key "tensorflow_text_classification" use
tensorflow_text_classification = client.get_project("tensorflow_text_classification")

In [29]:
# Declare the type of each column in the dataset(example: category, numeric, text)
column_types = {'Review':"text",
               'Label':"category"}

#### Old way to upload the model and dataset

In [30]:
#tensorflow_text_classification.upload_model_and_df(
#    prediction_function=predict, # Python function which takes pandas dataframe as input and returns probabilities for classification model OR returns predictions for regression model
#    model_type='classification', # "classification" for classification model OR "regression" for regression model
#    df=test_df, # the dataset you want to use to inspect your model
#    column_types=column_types, # A dictionary with columns names of df as key and types(category, numeric, text) of columns as values
#    target='Label', # The column name in df corresponding to the actual target variable (ground truth).
#    feature_names=['Review'], # List of the feature names of prediction_function
#    classification_labels=[0,1],  # List of the classification labels of your prediction
#    model_name='Tensorflow', # Name of the model
#    dataset_name='test_data' # Name of the dataset
#)

Old error:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/giskard/client/project.py in _validate_model_is_pickleable(prediction_function)
    678         try:
--> 679             pickled_model = cloudpickle.dumps(prediction_function)
    680             unpickled_model = cloudpickle.loads(pickled_model)

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol)
    101             cp = CloudPickler(file, protocol=protocol)
--> 102             cp.dump(obj)
    103             return file.getvalue()

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    631         try:
--> 632             return Pickler.dump(self, obj)
    633         except RuntimeError as e:

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in dump(self, obj)
    436             self.framer.start_framing()
--> 437         self.save(obj)
    438         self.write(STOP)

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py in save_function(self, obj, name)
    818                 return self._save_reduce_pickle5(
--> 819                     *self._dynamic_function_reduce(obj), obj=obj
    820                 )

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py in _save_reduce_pickle5(self, func, args, state, listitems, dictitems, state_setter, obj)
    760             save(obj)  # simple BINGET opcode as obj is already memoized.
--> 761             save(state)
    762             write(pickle.TUPLE2)

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save_tuple(self, obj)
    773             for element in obj:
--> 774                 save(element)
    775             # Subtle.  Same as in the big comment below.

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/dill/_dill.py in save_module_dict(pickler, obj)
   1185             pickler._first_pass = False
-> 1186         StockPickler.save_dict(pickler, obj)
   1187         logger.trace(pickler, "# D2")

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save_dict(self, obj)
    858         self.memoize(obj)
--> 859         self._batch_setitems(obj.items())
    860 

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in _batch_setitems(self, items)
    884                     save(k)
--> 885                     save(v)
    886                 write(SETITEMS)

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/dill/_dill.py in save_module_dict(pickler, obj)
   1185             pickler._first_pass = False
-> 1186         StockPickler.save_dict(pickler, obj)
   1187         logger.trace(pickler, "# D2")

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save_dict(self, obj)
    858         self.memoize(obj)
--> 859         self._batch_setitems(obj.items())
    860 

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in _batch_setitems(self, items)
    884                     save(k)
--> 885                     save(v)
    886                 write(SETITEMS)

/usr/local/Cellar/python@3.7/3.7.16/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    523             if reduce is not None:
--> 524                 rv = reduce(self.proto)
    525             else:

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/engine/training.py in __reduce__(self)
    366                 pickle_utils.deserialize_model_from_bytecode,
--> 367                 (pickle_utils.serialize_model_as_bytecode(self),),
    368             )

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/pickle_utils.py in serialize_model_as_bytecode(model)
     72     except Exception as e:
---> 73         raise e
     74     else:

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/pickle_utils.py in serialize_model_as_bytecode(model)
     68         filepath = os.path.join(temp_dir, "model.keras")
---> 69         saving_lib.save_model(model, filepath)
     70         with open(filepath, "rb") as f:

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/experimental/saving_lib.py in save_model(model, filepath)
    152     except Exception as e:
--> 153         raise e
    154     finally:

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/experimental/saving_lib.py in save_model(model, filepath)
    142             inner_path="",
--> 143             visited_trackables=set(),
    144         )

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/experimental/saving_lib.py in _save_state(trackable, weights_handler, assets_handler, inner_path, visited_trackables)
    252                 inner_path=tf.io.gfile.join(inner_path, child_attr),
--> 253                 visited_trackables=visited_trackables,
    254             )

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/experimental/saving_lib.py in _save_container_state(container, weights_handler, assets_handler, inner_path, visited_trackables)
    316                 inner_path=tf.io.gfile.join(inner_path, name),
--> 317                 visited_trackables=visited_trackables,
    318             )

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/experimental/saving_lib.py in _save_state(trackable, weights_handler, assets_handler, inner_path, visited_trackables)
    244                 inner_path=tf.io.gfile.join(inner_path, child_attr),
--> 245                 visited_trackables=visited_trackables,
    246             )

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/saving/experimental/saving_lib.py in _save_state(trackable, weights_handler, assets_handler, inner_path, visited_trackables)
    223     if hasattr(trackable, "_save_own_variables"):
--> 224         trackable._save_own_variables(weights_handler.make(inner_path))
    225     if hasattr(trackable, "_save_assets"):

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/keras/engine/base_layer.py in _save_own_variables(self, store)
   3432         for i, v in enumerate(all_vars):
-> 3433             store[f"{i}"] = v.numpy()
   3434 

AttributeError: 'VocabWeightHandler' object has no attribute 'numpy'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-36-7f56dfbe720a> in <module>
      8     classification_labels=[0,1],  # List of the classification labels of your prediction
      9     model_name='Tensorflow', # Name of the model
---> 10     dataset_name='test_data' # Name of the dataset
     11 )

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/giskard/client/project.py in upload_model_and_df(self, prediction_function, model_type, df, column_types, feature_names, target, model_name, dataset_name, classification_threshold, classification_labels)
    396             prediction_function,
    397             target,
--> 398             df,
    399         )
    400         data_res = self._post_data(column_types, data, dataset_name, raw_column_types, target)

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/giskard/client/project.py in _validate_model(self, classification_labels, classification_threshold, feature_names, model_type, prediction_function, target, validate_df)
    222             validate_df,
    223     ):
--> 224         prediction_function = self._validate_model_is_pickleable(prediction_function)
    225         transformed_pred_func = self.transform_prediction_function(
    226             prediction_function, feature_names

~/Documents/giskard/python-client/.venv/lib/python3.7/site-packages/giskard/client/project.py in _validate_model_is_pickleable(prediction_function)
    680             unpickled_model = cloudpickle.loads(pickled_model)
    681         except Exception:
--> 682             raise ValueError("Unable to pickle or unpickle model on Giskard")
    683         return unpickled_model
    684 

ValueError: Unable to pickle or unpickle model on Giskard
```

In [32]:
test_df.head()

| | Review | Label |
|---|---|---|
| 0 | `b"Pilot Mitch MacAfee (Jeff Morrow) sees a UFO...` | 0 |
| 1 | `b"This movie could have been great. It is not ...` | 0 |
| 2 | `b'I had to write a review for this film after ...` | 0 |
| 3 | `b'Now that I have seen it, it was NOT what I w...` | 1 |
| 4 | `b'the movie is complete disaster. i don\'t kno...` | 0 |


#### New way to upload the model and dataset

In [34]:
from giskard import Model, TensorFlowModel, GiskardClient, Dataset

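# Wrap the Keras model with TensorFlowModel from Giskard.
# The data preparation function turns the one-column dataframe into a list of review strings.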
def data_preparation_function(df):
    test_dataset = df.squeeze(axis=1)
    test_dataset = list(test_dataset)
    return test_dataset

my_model = TensorFlowModel(name="TextClassification",
                        clf=export_model,
                        model_type="classification",
                        classification_labels=['0','1'],
                        data_preparation_function=data_preparation_function,
                        feature_names=['Review'])

# Wrap your dataset with Dataset from Giskard
my_test_dataset = Dataset(test_df, name="test dataset", target="Label", column_meanings=column_types)

# save model and dataset to Giskard server
mid = my_model.save(client, "tensorflow_text_classification", validate_ds=my_test_dataset)
did = my_test_dataset.save(client, "tensorflow_text_classification")



INFO:tensorflow:Assets written to: /var/folders/jp/b7681vg128nf8s2hw47sl6380000gn/T/giskard-model-h3n9ihky/data/model/assets
2023-01-18 11:48:26,872 pid:85650 MainThread tensorflow   INFO     Assets written to: /var/folders/jp/b7681vg128nf8s2hw47sl6380000gn/T/giskard-model-h3n9ihky/data/model/assets
Hint: "Your target variable values are numeric. It is recommended to have Human readable string as your target values to make results more understandable in Giskard."
2023-01-18 11:48:39,427 pid:85650 MainThread giskard.ml_worker.core.dataset INFO     Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}
2023-01-18 11:48:40,032 pid:85650 MainThread giskard.ml_worker.utils.logging INFO     Predicted dataset with shape (10, 2) executed in 0:00:00.606969
2023-01-18 11:48:40,035 pid:85650 MainThread giskard.ml_worker.core.dataset INFO     Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}
2023-01-18 11:48:40,152 pid:85650 MainThread giskard.ml_work



INFO:tensorflow:Assets written to: /var/folders/jp/b7681vg128nf8s2hw47sl6380000gn/T/giskard-model-9czvbc54/data/model/assets
2023-01-18 11:48:42,018 pid:85650 MainThread tensorflow   INFO     Assets written to: /var/folders/jp/b7681vg128nf8s2hw47sl6380000gn/T/giskard-model-9czvbc54/data/model/assets
Model successfully uploaded to project key 'tensorflow_text_classification' with ID = f9125162-2461-4574-ba36-a69c48b4a41b
Dataset successfully uploaded to project key 'tensorflow_text_classification' with ID = 1797cdbd-270e-4e32-b325-187ed37d50c3


In [None]:
#tensorflow_text_classification.upload_df(
#    df=train_df, # The dataset you want to upload
#    column_types=column_types, # All the column types without the target
#    name="train_data"  # Name of the dataset
#)

In [None]:
#tensorflow_text_classification.upload_df(
#    df=val_df, # The dataset you want to upload
#    column_types=column_types, # All the column types without the target
#    name="val_data"  # Name of the dataset
#)