# Semantic Similarity with BERT

**Author:** [Mohamad Merchant](https://twitter.com/mohmadmerchant1)<br>
**Date created:** 2020/08/15<br>
**Last modified:** 2020/08/29<br>
**Description:** Natural Language Inference by fine-tuning BERT model on SNLI Corpus.

## Introduction

Semantic Similarity is the task of determining how similar
two sentences are, in terms of what they mean.
This example demonstrates the use of SNLI (Stanford Natural Language Inference) Corpus
to predict sentence semantic similarity with Transformers.
We will fine-tune a BERT model that takes two sentences as inputs
and that outputs a similarity score for these two sentences.

### References

* [BERT](https://arxiv.org/pdf/1810.04805.pdf)
* [SNLI](https://nlp.stanford.edu/projects/snli/)

## Setup

Note: install HuggingFace `transformers` via `pip install transformers` (version >= 2.11.0).

In [None]:
!pip install transformers



In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import transformers

## Configuration

In [None]:
max_length = 256  # Maximum length of input sentence to the model.
batch_size = 16
epochs = 3
tf.random.set_seed(42)

# Labels in our dataset.
labels = ["contradiction", "entailment"]
query = "The chemical compound or material has a bulk modulus value and unit."
# query = "Bulk modulus can be measured from chemical compound or material."

## Load the Data

In [None]:
# Load the pre-split train, validation, and test sets from CSV files.
train_df = pd.read_csv("train_combined.csv")
valid_df = pd.read_csv("valid_combined.csv")
test_df = pd.read_csv("test_combined.csv")

valid_df['sentence2'] = query
train_df['sentence2'] = query
test_df['sentence2'] = query

# Shape of the data
print(f"Total train samples : {train_df.shape[0]}")
print(f"Total validation samples: {valid_df.shape[0]}")
print(f"Total test samples: {test_df.shape[0]}")

Total train samples : 244
Total validation samples: 28
Total test samples: 31


Dataset Overview:

- sentence1: The premise caption that was supplied to the author of the pair.
- sentence2: The hypothesis caption that was written by the author of the pair.
- similarity: This is the label chosen by the majority of annotators.
Where no majority exists, the label "-" is used (we will skip such samples here).

Here are the "similarity" label values in our dataset:

- Contradiction: The sentences share no similarity.
- Entailment: The sentences have similar meaning.
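- Neutral: The sentences are neither clearly similar nor contradictory. This SNLI label does not occur in the training or validation distributions shown below.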

Let's look at the first few samples from the dataset:

In [None]:
train_df.head()

| Unnamed: 0 | sentence1 | similarity | sentence2 |
|---|---|---|---|
| 0 | A deeper understanding of the diverse properti... | contradiction | The chemical compound or material has a bulk m... |
| 1 | The bulk modulus of brucite is thus likely to ... | entailment | The chemical compound or material has a bulk m... |
| 2 | The ternary potentials accurately capture the ... | contradiction | The chemical compound or material has a bulk m... |
| 3 | The relaxed adiabatic bulk modulus, derived fr... | entailment | The chemical compound or material has a bulk m... |
| 4 | Recent studies on quenchedhigh density ZrO2 an... | entailment | The chemical compound or material has a bulk m... |


In [None]:
print(f"Sentence1: {train_df.loc[1, 'sentence1']}")
print(f"Sentence2: {train_df.loc[1, 'sentence2']}")
print(f"Similarity: {train_df.loc[1, 'similarity']}")

Sentence1: The bulk modulus of brucite is thus likely to lie closer to 40 than to 50 GPa; measurements of elastic constants by ultrasonic methods or Brillouin scattering are needed to obtain a more accurate value. A
Sentence2: The chemical compound or material has a bulk modulus value and unit.
Similarity: entailment


## Preprocessing

Distribution of our training targets.

In [None]:
print("Train Target Distribution")
print(train_df.similarity.value_counts())

Train Target Distribution
contradiction    143
entailment       101
Name: similarity, dtype: int64


Distribution of our validation targets.

In [None]:
print("Validation Target Distribution")
print(valid_df.similarity.value_counts())

Validation Target Distribution
contradiction    18
entailment       10
Name: similarity, dtype: int64
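
Both splits are moderately imbalanced, so it helps to know what accuracy a trivial classifier would reach before reading any model scores. The sketch below computes the majority-class baseline from the counts printed above; the `majority_baseline` helper is a hypothetical name introduced only for this illustration.

```python
# Counts copied from the target distributions printed above.
train_counts = {"contradiction": 143, "entailment": 101}
valid_counts = {"contradiction": 18, "entailment": 10}


def majority_baseline(counts):
    """Return the most frequent label and the accuracy of always predicting it."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


for name, counts in [("train", train_counts), ("validation", valid_counts)]:
    label, acc = majority_baseline(counts)
    print(f"{name}: always predicting '{label}' gives accuracy {acc:.3f}")
```

A model that only matches these figures has probably learned the class prior rather than anything about the sentences.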


In the full SNLI corpus, the value "-" appears among the targets and such samples are skipped.
The distributions above contain no "-" labels, so the filtering cell below is left commented out.

In [None]:
# train_df = (
#     train_df[train_df.similarity != "-"]
#     .sample(frac=1.0, random_state=42)
#     .reset_index(drop=True)
# )
# valid_df = (
#     valid_df[valid_df.similarity != "-"]
#     .sample(frac=1.0, random_state=42)
#     .reset_index(drop=True)
# )

One-hot encode training, validation, and test labels.

In [None]:
train_df["label"] = train_df["similarity"].apply(
    lambda x: 0 if x == "contradiction" else 1 
)
y_train = tf.keras.utils.to_categorical(train_df.label, num_classes=2)

valid_df["label"] = valid_df["similarity"].apply(
    lambda x: 0 if x == "contradiction" else 1 
)
y_val = tf.keras.utils.to_categorical(valid_df.label, num_classes=2)

test_df["label"] = test_df["similarity"].apply(
    lambda x: 0 if x == "contradiction" else 1 
)
y_test = tf.keras.utils.to_categorical(test_df.label, num_classes=2)
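
To make the encoding step concrete, here is a minimal NumPy sketch of what the cell above produces for a handful of labels. It is an illustration rather than the notebook's implementation: `one_hot_labels` is a hypothetical helper, and it assumes the same mapping as the lambda above (index 0 for "contradiction", index 1 for everything else).

```python
import numpy as np


def one_hot_labels(similarities, num_classes=2):
    """Map label strings to one-hot rows: contradiction -> [1, 0], otherwise [0, 1]."""
    ids = np.array([0 if s == "contradiction" else 1 for s in similarities])
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[ids]


encoded = one_hot_labels(["contradiction", "entailment", "entailment"])
print(encoded)
```

Note that the `else` branch sends any value other than "contradiction" to class 1, so a stray "neutral" or "-" label would silently be treated as entailment.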

## Keras Custom Data Generator

In [None]:

class BertSemanticDataGenerator(tf.keras.utils.Sequence):
    """Generates batches of data.

    Args:
        sentence_pairs: Array of premise and hypothesis input sentences.
        labels: Array of labels.
        batch_size: Integer batch size.
        shuffle: boolean, whether to shuffle the data.
        include_targets: boolean, whether to include the labels.

    Returns:
        Tuples `([input_ids, attention_mask, token_type_ids], labels)`
        (or just `[input_ids, attention_mask, token_type_ids]`
         if `include_targets=False`)
    """

    def __init__(
        self,
        sentence_pairs,
        labels,
        batch_size=batch_size,
        shuffle=True,
        include_targets=True,
    ):
        self.sentence_pairs = sentence_pairs
        self.labels = labels
        self.shuffle = shuffle
        self.batch_size = batch_size
        self.include_targets = include_targets
        # Load our BERT Tokenizer to encode the text.
        # We will use the bert-base-uncased pretrained model.
        self.tokenizer = transformers.BertTokenizer.from_pretrained(
            "bert-base-uncased", do_lower_case=True
        )
        self.indexes = np.arange(len(self.sentence_pairs))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch; a trailing partial batch is dropped.
        return len(self.sentence_pairs) // self.batch_size

    def __getitem__(self, idx):
        # Retrieves the batch of index.
        indexes = self.indexes[idx * self.batch_size : (idx + 1) * self.batch_size]
        sentence_pairs = self.sentence_pairs[indexes]

        # The tokenizer's batch_encode_plus encodes each sentence pair together,
        # with the two sentences separated by a [SEP] token.
        encoded = self.tokenizer.batch_encode_plus(
            sentence_pairs.tolist(),
            add_special_tokens=True,
            max_length=max_length,
            truncation=True,  # Explicitly truncate pairs longer than max_length.
            padding="max_length",  # Replaces the deprecated pad_to_max_length=True.
            return_attention_mask=True,
            return_token_type_ids=True,
            return_tensors="tf",
        )

        # Convert batch of encoded features to numpy array.
        input_ids = np.array(encoded["input_ids"], dtype="int32")
        attention_masks = np.array(encoded["attention_mask"], dtype="int32")
        token_type_ids = np.array(encoded["token_type_ids"], dtype="int32")

        # Set to true if data generator is used for training/validation.
        if self.include_targets:
            labels = np.array(self.labels[indexes], dtype="int32")
            return [input_ids, attention_masks, token_type_ids], labels
        else:
            return [input_ids, attention_masks, token_type_ids]

    def on_epoch_end(self):
        # Shuffle indexes after each epoch if shuffle is set to True.
        if self.shuffle:
            np.random.RandomState(42).shuffle(self.indexes)


## Build the Model

In [None]:
# Create the model under a distribution strategy scope.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Encoded token ids from BERT tokenizer.
    input_ids = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="input_ids"
    )
    # Attention masks indicate to the model which tokens should be attended to.
    attention_masks = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="attention_masks"
    )
    # Token type ids are binary masks identifying different sequences in the model.
    token_type_ids = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="token_type_ids"
    )
    # Loading pretrained BERT model.
    bert_model = transformers.TFBertModel.from_pretrained("bert-base-uncased")
    # Freeze the BERT model to reuse the pretrained features without modifying them.
    bert_model.trainable = False

    # sequence_output, pooled_output = bert_model(
    #     input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids
    # )
    bert_output = bert_model(
        input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids
    )
    sequence_output = bert_output.last_hidden_state
    pooled_output = bert_output.pooler_output

    # Add trainable layers on top of frozen layers to adapt the pretrained features on the new data.
    bi_lstm = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)
    )(sequence_output)
 
    ###########################################
    # Applying hybrid pooling approach to bi_lstm sequence output.
    avg_pool = tf.keras.layers.GlobalAveragePooling1D()(bi_lstm)
    max_pool = tf.keras.layers.GlobalMaxPooling1D()(bi_lstm)
    concat = tf.keras.layers.concatenate([avg_pool, max_pool])
    dropout = tf.keras.layers.Dropout(0.3)(concat)
    output = tf.keras.layers.Dense(2, activation="softmax")(dropout)
    model = tf.keras.models.Model(
        inputs=[input_ids, attention_masks, token_type_ids], outputs=output
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="categorical_crossentropy",
        metrics=["acc"],
    )


print(f"Strategy: {strategy}")
model.summary()

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)


Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
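
The hybrid pooling step in the model definition can be sketched in plain NumPy to show what happens to the tensor shapes. This is an illustration under stated assumptions: the input is random data standing in for the Bi-LSTM output (64 units per direction, so 128 features per token), no masking is applied, and `hybrid_pool` is a hypothetical helper.

```python
import numpy as np


def hybrid_pool(sequence):
    """Concatenate mean- and max-pooling over the time axis of (batch, time, features)."""
    avg_pool = sequence.mean(axis=1)  # (batch, features)
    max_pool = sequence.max(axis=1)   # (batch, features)
    return np.concatenate([avg_pool, max_pool], axis=-1)


rng = np.random.default_rng(0)
# Stand-in for the Bi-LSTM output: 2 examples, 256 tokens, 128 features.
bi_lstm_output = rng.normal(size=(2, 256, 128))
pooled = hybrid_pool(bi_lstm_output)
print(pooled.shape)  # (2, 256)
```

Average pooling summarizes the whole sequence, while max pooling keeps the strongest activation per feature; concatenating them lets the final dense layer use both views.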



Create train and validation data generators

In [None]:
train_data = BertSemanticDataGenerator(
    train_df[["sentence1", "sentence2"]].values.astype("str"),
    y_train,
    batch_size=batch_size,
    shuffle=True,
)
valid_data = BertSemanticDataGenerator(
    valid_df[["sentence1", "sentence2"]].values.astype("str"),
    y_val,
    batch_size=batch_size,
    shuffle=False,
)
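
As a rough check on how much data each generator serves per epoch, the sketch below applies the floor division from `__len__` to the sample counts printed in the data-loading step (244, 28, and 31) with `batch_size = 16`. The helper name `batches_per_epoch` is an assumption made for this illustration.

```python
def batches_per_epoch(num_samples, batch_size):
    """Same rule as the generator's __len__: partial batches are not counted."""
    return num_samples // batch_size


batch_size = 16
splits = {"train": 244, "validation": 28, "test": 31}

coverage = {}
for name, num_samples in splits.items():
    n_batches = batches_per_epoch(num_samples, batch_size)
    used = n_batches * batch_size
    coverage[name] = (n_batches, used, num_samples - used)
    print(f"{name}: {n_batches} batches, {used} samples used, {num_samples - used} dropped")
```

With splits this small, the dropped remainder is a large share of the validation and test data. If every sample should be scored, one option is to return `math.ceil(len(self.sentence_pairs) / self.batch_size)` from `__len__`, since the index slicing in `__getitem__` already handles a shorter final batch.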

## Train the Model

Training is done only for the top layers to perform "feature extraction",
which will allow the model to use the representations of the pretrained model.

In [None]:
history = model.fit(
    train_data,
    validation_data=valid_data,
    epochs=epochs,
    use_multiprocessing=True,
    workers=-1,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Epoch 1/3
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).

Epoch 2/3
Epoch 3/3


## Fine-tuning

This step must only be performed after the feature extraction model has
been trained to convergence on the new data.

This is an optional last step where `bert_model` is unfrozen and retrained
with a very low learning rate. This can deliver meaningful improvement by
incrementally adapting the pretrained features to the new data.

In [None]:
# Unfreeze the bert_model.
bert_model.trainable = True
# Recompile the model to make the change effective.
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_ids (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
attention_masks (InputLayer)    [(None, 256)]        0                                            
__________________________________________________________________________________________________
token_type_ids (InputLayer)     [(None, 256)]        0                                            
__________________________________________________________________________________________________
tf_bert_model (TFBertModel)     TFBaseModelOutputWit 109482240   input_ids[0][0]                  
                                                                 attention_masks[0][0]        

### Train the entire model end-to-end

In [None]:
history = model.fit(
    train_data,
    validation_data=valid_data,
    epochs=epochs,
    use_multiprocessing=True,
    workers=-1,
)

Epoch 1/3




Epoch 2/3
Epoch 3/3


## Evaluate model on the test set

In [None]:
test_data = BertSemanticDataGenerator(
    test_df[["sentence1", "sentence2"]].values.astype("str"),
    y_test,
    batch_size=batch_size,
    shuffle=False,
)
model.evaluate(test_data, verbose=1)



[0.009088022634387016, 1.0]
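
The list returned by `model.evaluate` holds the loss first, followed by the compiled metrics, so the two numbers above are the categorical cross-entropy and the accuracy. The sketch below shows how both quantities are computed on a toy batch; the probabilities are made-up values chosen for illustration, not outputs of the model.

```python
import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class matches the one-hot target."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))


# Hypothetical one-hot targets and predicted probabilities for two samples.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_prob = np.array([[0.99, 0.01], [0.02, 0.98]])

loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")
```

A loss close to zero together with an accuracy of 1.0 means the model assigns nearly all probability mass to the correct class on every evaluated sample.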

In [None]:
# def check_similarity(sentence1, sentence2):
#     sentence_pairs = np.array([[str(sentence1), str(sentence2)]])
#     test_data = BertSemanticDataGenerator(
#         sentence_pairs, labels=None, batch_size=1, shuffle=False, include_targets=False,
#     )

#     proba = model.predict(test_data)[0]
#     idx = np.argmax(proba)
#     proba = f"{proba[idx]: .2f}%"
#     pred = labels[idx]
#     return pred, proba


In [None]:
# preds = []
# probs = []

# for i in range(len(test_df)):
#   sentence1 = train_df['sentence2'][0]
#   sentence2 = test_df.iloc[i, 0]
#   pred, prob = check_similarity(sentence1, sentence2)
#   preds.append(pred)
#   probs.append(prob)

# copy = test_df.copy()
# copy['prediction'] = preds
# copy['probs'] = probs

In [None]:
# acc= np.mean(copy['prediction'] == copy['similarity'])
# print(f'Accuracy: {acc * 100}%')

In [None]:
# copy

In [None]:
# from sklearn.metrics import precision_recall_fscore_support

# eval = precision_recall_fscore_support(copy['similarity'], copy['prediction'], average='macro')
# print(f'precision: {eval[0]}')
# print(f'recall: {eval[1]}')
# print(f'f1: {eval[2]}')

## Inference on custom sentences

In [None]:
def check_similarity(model, sentence1, sentence2, max_length):
    sentence_pairs = np.array([[str(sentence1), str(sentence2)]])
    test_data = BertSemanticDataGenerator(
        sentence_pairs, labels=None, batch_size=1, shuffle=False, include_targets=False
    )
    
    labels = ["contradiction", "entailment"]
    proba = model.predict(test_data)[0]
    idx = np.argmax(proba)
    print(proba)
    proba = f"{round(proba[idx] * 100)}%"
    pred = labels[idx]
    print('-----------------------------------------')
    print(sentence2)
    print(f'pred: {pred}, prob: {proba}')
    print('-----------------------------------------')
    return pred, proba
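
The post-processing inside `check_similarity` can be tried on its own, without loading the model. The sketch below isolates the argmax and percentage formatting and feeds it probability vectors taken from the outputs printed further down; `format_prediction` is a hypothetical helper, and the explicit `float(...)` conversion is an addition so that `round` returns a plain Python integer.

```python
import numpy as np

LABELS = ["contradiction", "entailment"]


def format_prediction(proba, labels=LABELS):
    """Pick the most probable label and format its probability as a whole percent."""
    proba = np.asarray(proba, dtype="float64")
    idx = int(np.argmax(proba))
    return labels[idx], f"{round(float(proba[idx]) * 100)}%"


print(format_prediction([0.01154541, 0.9884546]))   # ('entailment', '99%')
print(format_prediction([0.83049184, 0.16950814]))  # ('contradiction', '83%')
```

Because the percentage is rounded to a whole number, a probability such as 0.9983 is reported as 100%, so the raw vector printed by `check_similarity` is the better place to compare confidences.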

In [None]:
query = "Bulk modulus can be measured from chemical compound or material."

ent_sents = [
    'Bulk modulus of Ti was found to be 25 GPa.',
    "The density, adiabatic bulk modulus and P-wave velocity of liquid Fe calculated up to 328.9 GPa.",
    "The calculated bulk modulus of Nb 3 Ir is 328.6 GPa, which is much larger than that of the other Nb Ir compounds.",
    "The first-principles calculation reveals that the marcasite-type NiN2 is a narrow-gap semiconductor, and high-pressure in-situ X-ray diffraction measurements revealed a zero-pressure bulk modulus of 172(6) GPa.",
    "The bulk modulus, shear modulus and Young's modulus of Mo 3 Al are 220.7 GPa, 123.5 GPa and 312.3 GPa, respectively, which are larger than the other Mo–Al alloys.",
]

for s in ent_sents:
    check_similarity(model, query, s, max_length)

[0.01154541 0.9884546 ]
-----------------------------------------
Bulk modulus of Ti was found to be 25 GPa.
pred: entailment, prob: 99%
-----------------------------------------


[0.08876456 0.9112354 ]
-----------------------------------------
The density, adiabatic bulk modulus and P-wave velocity of liquid Fe calculated up to 328.9 GPa.
pred: entailment, prob: 91%
-----------------------------------------


[0.00699744 0.9930026 ]
-----------------------------------------
The calculated bulk modulus of Nb 3 Ir is 328.6 GPa, which is much larger than that of the other Nb Ir compounds.
pred: entailment, prob: 99%
-----------------------------------------


[0.27677885 0.7232212 ]
-----------------------------------------
The first-principles calculation reveals that the marcasite-type NiN2 is a narrow-gap semiconductor, and high-pressure in-situ X-ray diffraction measurements revealed a zero-pressure bulk modulus of 172(6) GPa.
pred: entailment, prob: 72%
-----------------------------------------


[0.01928779 0.9807122 ]
-----------------------------------------
The bulk modulus, shear modulus and Young's modulus of Mo 3 Al are 220.7 GPa, 123.5 GPa and 312.3 GPa, respectively, which are larger than the other Mo–Al alloys.
pred: entailment, prob: 98%
-----------------------------------------


In [None]:
cont_sents = [
    'We found an increase of 25% in sheer bulk modulus of Ni.',
    'Sheer bulk modulus of Ni was found to increase in this experiment.',
    'Ni has a curie temperature of 38.5 C.',
    'Some men are playing a sport',
    "Using Lyakhov and Oganov's model [19], the hardness of C 64 is predicted to be 60.2 GPa.",
    "Similarly, the elastic constants, bulk modulus and other mechanical properties obtained from our calculations are in very good agreement with the values available in the literature.",
    "We calculate elastic constants C ij (GPa), bulk modulus B (GPa), shear modulus G (GPa), Young's modulus Y (GPa) and B/G ratio of PtAlTM.",
]

query = "Bulk modulus can be measured from chemical compound or material."

for s in cont_sents:
    check_similarity(model, query, s, max_length)

[0.9851     0.01489999]
-----------------------------------------
We found an increase of 25% in sheer bulk modulus of Ni.
pred: contradiction, prob: 99%
-----------------------------------------


[0.9921968  0.00780319]
-----------------------------------------
Sheer bulk modulus of Ni was found to increase in this experiment.
pred: contradiction, prob: 99%
-----------------------------------------


[0.99430877 0.00569126]
-----------------------------------------
Ni has a curie temperature of 38.5 C.
pred: contradiction, prob: 99%
-----------------------------------------


[0.9983499  0.00165007]
-----------------------------------------
Some men are playing a sport
pred: contradiction, prob: 100%
-----------------------------------------


[0.97017896 0.02982105]
-----------------------------------------
Using Lyakhov and Oganov's model [19], the hardness of C 64 is predicted to be 60.2 GPa.
pred: contradiction, prob: 97%
-----------------------------------------


[0.99438125 0.00561883]
-----------------------------------------
Similarly, the elastic constants, bulk modulus and other mechanical properties obtained from our calculations are in very good agreement with the values available in the literature.
pred: contradiction, prob: 99%
-----------------------------------------


[0.83049184 0.16950814]
-----------------------------------------
We calculate elastic constants C ij (GPa), bulk modulus B (GPa), shear modulus G (GPa), Young's modulus Y (GPa) and B/G ratio of PtAlTM.
pred: contradiction, prob: 83%
-----------------------------------------


Save the trained model.

In [None]:
model.save('sem_sim_ep2_bs16_ml256_query_diff')





INFO:tensorflow:Assets written to: sem_sim_ep2_bs16_ml256_query_diff/assets




In [None]:
sentence1 = train_df['sentence2'][0]
sentence2 = "Bulk modulus of Ti was found to be 25 GPa."
check_similarity(model, sentence1, sentence2, max_length)

Check results on some example sentence pairs.

In [None]:
sentence1 = train_df['sentence2'][0]
sentence2 = "We found an increase of 25% in sheer bulk modulus of Ni."
check_similarity(model, sentence1, sentence2, max_length)

In [None]:
sentence1 = train_df['sentence2'][0]
sentence2 = "Sheer bulk modulus of Ni was found to increase in this experiment."
check_similarity(model, sentence1, sentence2, max_length)

In [None]:
sentence1 = train_df['sentence2'][0]
sentence2 = "Ni has a curie temperature of 38.5 C."
check_similarity(model, sentence1, sentence2, max_length)

Check results on some example sentence pairs.

In [None]:
sentence1 = train_df['sentence2'][0]
sentence2 = "Some men are playing a sport"
check_similarity(model, sentence1, sentence2, max_length)