# Instructions

This demo shows how to load our fine-tuned transformer models for a given topic, define example text to feed through them, and inspect the prediction each model makes.

# Code

## Loading Models

First, choose the topic that the models will classify for (in this example we choose prayer, "תפלה"):

In [None]:
# CHOOSE YOUR TOPIC, options are ישראל, למוד, תורה, תפלה, תשובה
TOPIC = 'תפלה'

Next we define the paths to the files we'll need for our models:

In [None]:
topic_to_english = {
    'ישראל' : 'yisrael',
    'תורה' : 'torah',
    'תשובה' : 'teshuva',
    'תפלה' : 'tefillah',
    'למוד' : 'limmud'
}
def get_transformer_model_path(topic, base_model_name):
  return f't4-project/{topic_to_english[topic]}-{base_model_name}'

BEREL_BASE_PATH = 't4-project/BEREL-base'
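
Because the topic is used as a dictionary key, a slightly different Hebrew spelling would surface as a bare `KeyError`. The following is a minimal sketch of a stricter variant of the path helper, assuming the same `t4-project/<topic>-<base model>` naming shown above. The names `TOPIC_TO_ENGLISH` and `resolve_model_path` are hypothetical and not part of the original notebook.

```python
# Hypothetical helper, not part of the original notebook: resolve a model path
# and fail with a readable message when the topic key is not recognized.
TOPIC_TO_ENGLISH = {
    'ישראל': 'yisrael',
    'תורה': 'torah',
    'תשובה': 'teshuva',
    'תפלה': 'tefillah',
    'למוד': 'limmud',
}

def resolve_model_path(topic, base_model_name):
    """Return the 't4-project/<topic>-<base model>' path, validating the topic first."""
    if topic not in TOPIC_TO_ENGLISH:
        options = ', '.join(TOPIC_TO_ENGLISH)
        raise ValueError(f'Unknown topic {topic!r}; expected one of: {options}')
    return f't4-project/{TOPIC_TO_ENGLISH[topic]}-{base_model_name}'

# Print the paths for the three fine-tuned models used in this demo.
for base in ('alephBERT', 'BEREL', 'heBERT'):
    print(resolve_model_path('תפלה', base))
```

The explicit check trades a few extra lines for an error message that lists the accepted topic spellings.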

Next we install the Hugging Face `transformers` library:

In [None]:
! pip install transformers

Finally, we load the models:

In [None]:
from transformers import AutoModelForSequenceClassification, TextClassificationPipeline
from transformers import BertTokenizerFast



In [None]:
from transformers import BertTokenizer
from rabtokenizer import RabbinicTokenizer
from transformers import AutoTokenizer

In [None]:
alephBERT_tokenizer = BertTokenizerFast.from_pretrained('onlplab/alephbert-base')
alephBERT_loaded_model = AutoModelForSequenceClassification.from_pretrained(get_transformer_model_path(TOPIC, 'alephBERT'), num_labels=2)
alephBERT_pipe = TextClassificationPipeline(model=alephBERT_loaded_model, tokenizer=alephBERT_tokenizer, return_all_scores=True)



In [None]:
berel_loaded_model = AutoModelForSequenceClassification.from_pretrained(get_transformer_model_path(TOPIC, 'BEREL'), num_labels=2)
berel_tokenizer = RabbinicTokenizer(BertTokenizer.from_pretrained(BEREL_BASE_PATH, model_max_length=512))
berel_pipe = TextClassificationPipeline(model=berel_loaded_model, tokenizer=berel_tokenizer, return_all_scores=True)



In [None]:
heBERT_tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT", model_max_length=512)
heBERT_loaded_model = AutoModelForSequenceClassification.from_pretrained(get_transformer_model_path(TOPIC, 'heBERT'), num_labels=2)
heBERT_pipe = TextClassificationPipeline(model=heBERT_loaded_model, tokenizer=heBERT_tokenizer, return_all_scores=True)



## Inference

Here you can define any text as a string to feed into the models. For this example, we choose one sentence that is highly correlated with the topic (it translates to "standing before the Holy One Blessed be He, and calling out with all of one's heart about one's troubles") and one sentence that is not at all correlated (it translates to "it is nothing more than frivolity and light-headedness, and a son of Torah should not engage in these matters").

Note that for the related sentence, we choose a sentence that does not contain the topic's word, and yet the models are able to deduce the relationship.

In [None]:
is_topic_text = 'עומד לפני הקדוש ברוך הוא וצועק בלב שלם על צרותיו'
is_not_topic_text = 'אין זה אלא שחוק וקלות ראש, ואין לבן תורה לעסוק בענינים אלו'

To read the predictions, note that each result lists a score for LABEL_0, which means "not related to the topic," and a score for LABEL_1, which means "related to the topic."

In [None]:
print(f'Model Predictions for a text that is tagged with the topic:\n {alephBERT_pipe(is_topic_text)}')
print(f'Model Predictions for a text that is NOT tagged with the topic:\n {alephBERT_pipe(is_not_topic_text)}')

Model Predictions for a text that is tagged with the topic:
 [[{'label': 'LABEL_0', 'score': 0.015154956839978695}, {'label': 'LABEL_1', 'score': 0.9848451018333435}]]
Model Predictions for a text that is NOT tagged with the topic:
 [[{'label': 'LABEL_0', 'score': 0.9984642267227173}, {'label': 'LABEL_1', 'score': 0.0015357907395809889}]]


In [None]:
print(f'Model Predictions for a text that is tagged with the topic:\n {berel_pipe(is_topic_text)}')
print(f'Model Predictions for a text that is NOT tagged with the topic:\n {berel_pipe(is_not_topic_text)}')

Model Predictions for a text that is tagged with the topic:
 [[{'label': 'LABEL_0', 'score': 0.0011187914060428739}, {'label': 'LABEL_1', 'score': 0.9988811612129211}]]
Model Predictions for a text that is NOT tagged with the topic:
 [[{'label': 'LABEL_0', 'score': 0.3649466931819916}, {'label': 'LABEL_1', 'score': 0.635053277015686}]]


In [None]:
print(f'Model Predictions for a text that is tagged with the topic:\n {heBERT_pipe(is_topic_text)}')
print(f'Model Predictions for a text that is NOT tagged with the topic:\n {heBERT_pipe(is_not_topic_text)}')

Model Predictions for a text that is tagged with the topic:
 [[{'label': 'LABEL_0', 'score': 0.0016728010959923267}, {'label': 'LABEL_1', 'score': 0.9983271956443787}]]
Model Predictions for a text that is NOT tagged with the topic:
 [[{'label': 'LABEL_0', 'score': 0.9907053112983704}, {'label': 'LABEL_1', 'score': 0.009294671006500721}]]


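As expected, all three models assign a high LABEL_1 score to the related sentence. For the unrelated sentence, alephBERT and heBERT assign a high LABEL_0 score, as expected. BEREL, however, leans toward LABEL_1 in this run (about 0.64 versus 0.36), so it misclassifies this particular example.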
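
The nested list-of-dicts output printed above can be awkward to compare by eye. The following is a minimal sketch that flattens one result into a `{label: score}` dictionary and turns it into a yes/no decision. The helper names `to_score_dict` and `is_related` are hypothetical, and the 0.5 threshold is an assumption rather than a value taken from the models.

```python
def to_score_dict(pipe_output):
    """Flatten the pipeline output for a single text into {label: score}."""
    # Unwrap the outer list when the output is nested, as in the printed results;
    # a flat list of dicts is accepted too, in case the shape differs.
    entries = pipe_output[0] if isinstance(pipe_output[0], list) else pipe_output
    return {entry['label']: entry['score'] for entry in entries}

def is_related(pipe_output, threshold=0.5):
    """True when the LABEL_1 ("related to the topic") score reaches the threshold.

    The default threshold of 0.5 is an assumption made for this sketch.
    """
    return to_score_dict(pipe_output)['LABEL_1'] >= threshold

# Output copied from the alephBERT cell above for the on-topic sentence.
example = [[{'label': 'LABEL_0', 'score': 0.015154956839978695},
            {'label': 'LABEL_1', 'score': 0.9848451018333435}]]
print(to_score_dict(example))
print(is_related(example))
```

In the notebook itself, the same helpers could be applied directly to a live result, for instance `is_related(berel_pipe(is_not_topic_text))`.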

To try other texts, pass any string to `alephBERT_pipe`, `berel_pipe`, or `heBERT_pipe`.
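
When trying several texts, it can help to collect the LABEL_1 score from every model in one place. The following is a rough sketch, assuming each pipeline is a callable that returns output in the shape printed earlier. The function name `compare_models` is hypothetical and not part of the original notebook.

```python
def compare_models(pipes, texts):
    """Return {text: {model name: LABEL_1 score}} for every model/text pair.

    `pipes` maps a display name to any callable that takes a string and returns
    a list (optionally nested one level) of {'label': ..., 'score': ...} dicts.
    """
    results = {}
    for text in texts:
        results[text] = {}
        for name, pipe in pipes.items():
            output = pipe(text)
            # Unwrap the outer list when the output is nested.
            entries = output[0] if isinstance(output[0], list) else output
            scores = {entry['label']: entry['score'] for entry in entries}
            results[text][name] = scores['LABEL_1']
    return results

# In the notebook, after the loading cells have run, this could be called as:
# pipes = {'alephBERT': alephBERT_pipe, 'BEREL': berel_pipe, 'heBERT': heBERT_pipe}
# compare_models(pipes, [is_topic_text, is_not_topic_text])
```

Returning a plain dictionary keeps the sketch free of extra dependencies; the result can be printed directly or loaded into a table afterwards.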