<a href="https://colab.research.google.com/github/TheophileBlard/french-sentiment-analysis-with-bert/blob/master/colab/french_sentiment_analysis_with_bert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# French sentiment analysis with BERT

This notebook acts as an online demo for [this repository](https://github.com/TheophileBlard/french-sentiment-analysis-with-bert).

With this notebook, you can run inference on your own sentences.
Training is not supported here; to train the model, please clone the repo.

*This is still experimental, so let me know if something doesn't work!*

**Author**: Théophile Blard ([LinkedIn](https://www.linkedin.com/in/theophile-blard))

## Loading data and libraries

Downloads the model archive (> 400 MB) from Google Drive and extracts it.

*Please run these cells, otherwise inference won't work.*

In [1]:
!sudo wget -O /usr/sbin/gdrivedl 'https://f.mjh.nz/gdrivedl'
!sudo chmod +x /usr/sbin/gdrivedl
!gdrivedl https://drive.google.com/open?id=1GL0zdThuAECX6zo1rA_ExKha1CPu1h_h camembert_sentiment.tar.xz
!tar xf camembert_sentiment.tar.xz

--2020-03-29 10:12:02--  https://f.mjh.nz/gdrivedl
Resolving f.mjh.nz (f.mjh.nz)... 104.28.30.233, 104.28.31.233, 2606:4700:3032::681c:1ee9, ...
Connecting to f.mjh.nz (f.mjh.nz)|104.28.30.233|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1362 (1.3K) [application/octet-stream]
Saving to: ‘/usr/sbin/gdrivedl’


2020-03-29 10:12:02 (27.6 MB/s) - ‘/usr/sbin/gdrivedl’ saved [1362/1362]

File ID: 1GL0zdThuAECX6zo1rA_ExKha1CPu1h_h
Downloading: https://docs.google.com/uc?export=download&id=1GL0zdThuAECX6zo1rA_ExKha1CPu1h_h > camembert_sentiment.tar.xz.434.file
Downloading: https://docs.google.com/uc?export=download&id=1GL0zdThuAECX6zo1rA_ExKha1CPu1h_h&confirm=euVD > camembert_sentiment.tar.xz.434.file
Moving: camembert_sentiment.tar.xz.434.file > camembert_sentiment.tar.xz
Saved: camembert_sentiment.tar.xz
DONE!
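
Before loading the model, it can help to confirm that the archive was extracted where the later cells expect it. The snippet below is a minimal sketch, not part of the original notebook: `missing_model_files` is a hypothetical helper, and the file names `config.json` and `tf_model.h5` are an assumption about what the archive contains (they are the names `from_pretrained` usually looks for in a local TensorFlow model directory).

```python
from pathlib import Path

def missing_model_files(model_dir, expected=("config.json", "tf_model.h5")):
    """Return the expected file names that are absent from model_dir.

    The default file names are an assumption about the archive contents.
    """
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]

# "camembert_sentiment" is the directory name used by the MODEL constant below.
missing = missing_model_files("camembert_sentiment")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Model directory looks complete.")
```

If files are reported missing, re-running the download cell is probably the first thing to try.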


In [2]:
!pip install transformers

import numpy as np

import tensorflow as tf
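# Compare the major version as an integer; comparing version strings
# directly is lexicographic and would reject e.g. "10.0".
assert int(tf.__version__.split(".")[0]) >= 2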



## Preprocessing functions


*Please run this cell, otherwise inference won't work.*

In [0]:
from transformers import CamembertTokenizer, TFCamembertForSequenceClassification

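def encode_reviews(tokenizer, reviews, max_length):
    """Encode reviews as zero-padded token ids with a matching attention mask."""
    # One row per review; positions left at 0 are treated as padding.
    token_ids = np.zeros(shape=(len(reviews), max_length),
                         dtype=np.int32)
    for i, review in enumerate(reviews):
        # Token ids for this review, with max_length passed to the tokenizer.
        encoded = tokenizer.encode(review, max_length=max_length)
        token_ids[i, 0:len(encoded)] = encoded
    # 1 for real tokens, 0 for padding.
    attention_mask = (token_ids != 0).astype(np.int32)
    return {"input_ids": token_ids, "attention_mask": attention_mask}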
  
TOKENIZER = "camembert-base" # Downloaded
MODEL = "camembert_sentiment" # Local model
MAX_SEQ_LEN = 400

tokenizer = CamembertTokenizer.from_pretrained(TOKENIZER)
model = TFCamembertForSequenceClassification.from_pretrained(MODEL)
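
To see what `encode_reviews` produces without downloading anything, here is a small sketch that runs the same padding logic with a stand-in tokenizer. `ToyTokenizer` is hypothetical and only splits on whitespace; the real `CamembertTokenizer` uses subword units and adds special tokens, so the actual ids will differ.

```python
import numpy as np

class ToyTokenizer:
    """Hypothetical stand-in for CamembertTokenizer (illustration only)."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text, max_length=None):
        # Give each new word the next non-zero id, so 0 stays free for padding.
        ids = [self.vocab.setdefault(word, len(self.vocab) + 1)
               for word in text.split()]
        return ids[:max_length]

def encode_reviews(tokenizer, reviews, max_length):
    # Same logic as the notebook cell above.
    token_ids = np.zeros(shape=(len(reviews), max_length), dtype=np.int32)
    for i, review in enumerate(reviews):
        encoded = tokenizer.encode(review, max_length=max_length)
        token_ids[i, 0:len(encoded)] = encoded
    attention_mask = (token_ids != 0).astype(np.int32)
    return {"input_ids": token_ids, "attention_mask": attention_mask}

toy = ToyTokenizer()
batch = encode_reviews(toy, ["très bon film", "nul"], max_length=5)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Each row is padded with zeros up to `max_length`, and the attention mask is 1 exactly where a real token id was written.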

## Inference

Here you can enter your own sentences and click the "CLASSIFY !" button to feed your input to the model. You can expand the text area by dragging its bottom-right corner.

*Please run this cell, or nothing will happen. You don't have to click on SHOW CODE.*

In [4]:
#@title
import ipywidgets as widgets
from IPython.display import display

class Color:
    # ANSI escape codes used to style the printed prediction.
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

button = widgets.Button(
    description='CLASSIFY !',
    button_style='success'
  )

text_area = widgets.Textarea(
    value='',
    placeholder='Type something',
    description='',
    disabled=False
)
output = widgets.Output()

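def on_button_clicked(b):
    # Encode the text typed in the widget as a batch of one review.
    text = text_area.value
    X = encode_reviews(tokenizer, [text], MAX_SEQ_LEN)
    scores = model.predict(X)
    # scores[0] holds one row of class scores per review; take the single row's argmax.
    y_pred = np.argmax(scores[0], axis=1)[0]

    # Class 1 is reported as positive, anything else as negative.
    if y_pred == 1:
        prediction = "Positive"
        color = Color.GREEN
    else:
        prediction = "Negative"
        color = Color.RED

    # Print the coloured label followed by the first 50 characters of the input.
    with output:
        print(Color.BOLD + color + f'{prediction}: ' + Color.END + f'"{text[:50]}"')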

button.on_click(on_button_clicked)
display(text_area, button, output)

Textarea(value='', placeholder='Type something')

Button(button_style='success', description='CLASSIFY !', style=ButtonStyle())

Output()
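
The widget only prints the winning label. If you also want a rough confidence value, one option is to apply a softmax to the class scores. The sketch below is not part of the original notebook: `softmax` and `describe` are hypothetical helpers, the example scores are made up, and it assumes that `model.predict(X)[0]` holds unnormalised logits of shape `(1, 2)` with index 1 meaning positive, as in the cell above.

```python
import numpy as np

LABELS = ("Negative", "Positive")  # index 1 = positive, as in the cell above

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def describe(logits):
    """Return (label, probability) for a (1, 2) array of class scores."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    best = int(np.argmax(probs, axis=-1)[0])
    return LABELS[best], float(probs[0, best])

# Made-up scores for one sentence; in the notebook they would come from
# model.predict(X)[0], assuming that array holds unnormalised logits.
label, confidence = describe([[-1.2, 2.3]])
print(f"{label} ({confidence:.1%})")
```

Treat the resulting number as a relative indication rather than a calibrated probability.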