<a href="https://colab.research.google.com/github/aLehav/Olami/blob/main/AntisemitismSentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Antisemitism Labeling Project](https://github.com/Nicolas-le/antisemitism-detector)
# [DistilBERT](https://paperswithcode.com/method/distillbert#)
# [PyTorch Sentiment Analysis](https://github.com/bentrevett/pytorch-sentiment-analysis)

In [26]:
!pip install -q transformers
from transformers import pipeline
import spacy
from spacy import displacy
import tweepy

# 1. Get Data

We will try to use [Tweepy](https://www.tweepy.org/) to collect Twitter data related to specific tags.
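As a rough sketch of what that collection step could look like, the snippet below builds a search query from a list of tags and fetches recent tweets through Tweepy's v2 `Client`. The helper names `build_query` and `fetch_tweet_texts`, the treatment of tags as hashtags, and the `lang:en -is:retweet` filters are assumptions made for illustration; the notebook does not specify them. A valid bearer token with search access is also assumed.

```python
from typing import List


def build_query(tags: List[str], lang: str = "en") -> str:
    """Build a Twitter API v2 search query matching any of the given hashtags.

    Hypothetical helper: treating tags as hashtags is an assumption.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    hashtags = " OR ".join("#" + tag.lstrip("#") for tag in tags)
    # Exclude retweets so the same text is not collected repeatedly.
    return f"({hashtags}) lang:{lang} -is:retweet"


def fetch_tweet_texts(bearer_token: str, tags: List[str], limit: int = 10) -> List[str]:
    """Return the text of recent tweets matching the tags (needs API access)."""
    import tweepy  # imported here so build_query works without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    # The recent-search endpoint accepts max_results between 10 and 100.
    response = client.search_recent_tweets(query=build_query(tags), max_results=limit)
    # response.data is None when nothing matches the query.
    return [tweet.text for tweet in (response.data or [])]
```

Calling `fetch_tweet_texts(token, ["tag1", "tag2"])` would return a list of strings that can be passed straight to the models used later in this notebook.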

We import an example message from [CyberWell](https://app.cyberwell.org/index.php):

The text is then tokenized into the `doc` object, and its entities are visualized with displaCy's `render` function.

In the future, an entity linker built from a spaCy `KnowledgeBase` and an `EntityLinker` component can be added.


In [19]:
text = "These crypto-Kabbalist Jewish mystics infiltrated the Vatican through Pope Callixtus III and Pope Alexander VI, both of the House of Borgia, as agents of the Crown of Aragon."

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.render(doc, style="ent", jupyter=True)
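Besides the visual rendering, the recognized entities can be read programmatically from `doc.ents`. The sketch below collects each entity's text and label into a list of tuples; `entity_rows` is a hypothetical helper name, and it only relies on the `ents`, `text`, and `label_` attributes that spaCy documents and spans expose.

```python
from typing import List, Tuple


def entity_rows(doc) -> List[Tuple[str, str]]:
    """Return (entity text, entity label) pairs for a spaCy Doc.

    Hypothetical helper: it accepts any object whose `ents` items
    have `text` and `label_` attributes, as spaCy spans do.
    """
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With the objects from the cell above, `entity_rows(doc)` gives the same entities that displaCy highlights, in a form that is easier to filter or store.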


# 2. Import Models and Run on Data

Models from [Hugging Face](https://huggingface.co/blog/sentiment-analysis-python) can then be loaded through its `pipeline` API for sentiment analysis.

Either the default model or a specific one can be used. Below we also try [RoBERTa](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment), whose labels `LABEL_0`, `LABEL_1`, and `LABEL_2` stand for negative, neutral, and positive. This RoBERTa checkpoint is fine-tuned on tweets, so it is likely a good choice for sentiment analysis of Twitter data.

In [22]:
sentiment_pipeline = pipeline("sentiment-analysis")
data = [text, "I love Jews", "I hate Jews"]
sentiment_pipeline(data)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.8017222285270691},
 {'label': 'POSITIVE', 'score': 0.9994775652885437},
 {'label': 'NEGATIVE', 'score': 0.9971469044685364}]
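The warning in the output above recommends naming the model and revision explicitly. A minimal sketch of that, reusing the model name and revision that the warning itself reports, might look as follows; `pinned_pipeline_kwargs` and `load_pinned_pipeline` are hypothetical helper names.

```python
# Values copied from the warning printed by the default pipeline above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def pinned_pipeline_kwargs() -> dict:
    """Keyword arguments that pin the sentiment model and its revision."""
    return {"task": "sentiment-analysis", "model": MODEL_NAME, "revision": MODEL_REVISION}


def load_pinned_pipeline():
    """Create the pipeline with an explicit model and revision."""
    from transformers import pipeline  # imported here to keep the constants cheap to reuse

    return pipeline(**pinned_pipeline_kwargs())
```

Pinning both values should keep results reproducible if the library's default model changes in a later release.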

In [25]:
# Load the tweet-tuned RoBERTa sentiment model by name; the task is inferred from the model.
specific_model = pipeline(model="cardiffnlp/twitter-roberta-base-sentiment")
specific_model(data)

[{'label': 'LABEL_1', 'score': 0.8552122116088867},
 {'label': 'LABEL_2', 'score': 0.9523991346359253},
 {'label': 'LABEL_0', 'score': 0.9706779718399048}]
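The raw `LABEL_n` names in this output are hard to read next to the `NEGATIVE`/`POSITIVE` labels of the default model. The sketch below renames them using the label order stated earlier (negative, neutral, positive); `LABEL_NAMES` and `readable` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Label order for cardiffnlp/twitter-roberta-base-sentiment, as described above.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def readable(results: List[Dict]) -> List[Dict]:
    """Replace LABEL_n with a readable name; unknown labels are left unchanged."""
    return [
        {"label": LABEL_NAMES.get(r["label"], r["label"]), "score": r["score"]}
        for r in results
    ]
```

Applied to the output above, `readable(specific_model(data))` would label the example message as neutral, "I love Jews" as positive, and "I hate Jews" as negative, with the scores unchanged.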