### Example of Model Composition

<img src="images/PatternsMLProduction.png" width="70%" height="40%">

In this short tutorial we are going to use HuggingFace Transformers 🤗 to accomplish three tasks:
 1. Analyse the sentiment of a tweet: Positive or Negative
 2. Translate it into French
 3. Demonstrate the model composition deployment pattern
 
 <img src="images/sentiment_analysis.jpeg" width="70%" height="40%">

#### Install HuggingFace Transformers and Torch modules

In [1]:
%pip install transformers torch

Note: you may need to restart the kernel to use updated packages.


In [2]:
from transformers import TranslationPipeline, TextClassificationPipeline
from transformers import AutoTokenizer, AutoModelWithLMHead, AutoModelForSequenceClassification
import torch
import requests
from ray import serve

These are example 🐦 tweets, some made up, some extracted from a dog lover's Twitter handle. In a real use case,
these could come live from a Twitter handle using [Twitter APIs](https://developer.twitter.com/en/docs/twitter-api/getting-started/getting-access-to-the-twitter-api).

In [3]:
TWEETS = ["Tonight on my walk, I got mad because mom wouldn't let me play with this dog. We stared at each other...he never blinked!",
          "Sometimes. when i am bored. i will stare at nothing. and try to convince the human. that there is a ghost",
          "You little dog shit, you peed and pooed on my new carpet. Bad dog!",
          "I would completely believe you. Dogs and little children - very innocent and open to seeing such things",
          "You've got too much time on your paws. Go check on the skittle. under the, fridge",
          "You sneaky little devil, I can't live without you!!!",
          "It's true what they say about dogs: they are you BEST BUDDY, no matter what!",
          "This dog is way dope, just can't enough of her",
          "This dog is way cool, just can't enough of her",
          "Is a dog really the best pet?",
          "Cats are better than dogs",
          "Totally dissastified with the dog. Worst dog ever",
          "Briliant dog! Reads my moods like a book. Senses my moods and reacts. What a companinon!"
          ]

Utility function to fetch a tweet; these could just as well be live tweets coming from the Twitter API for a user or a #hashtag.

In [4]:
def fetch_tweet_text(i):
    text = TWEETS[i]
    return text

### Sentiment model deployment

Our function deployment analyses a tweet using a pretrained transformer from HuggingFace 🤗.
Note that we set `num_replicas=1` here; to scale the deployment, we can increase the number of replicas, as
we do for the translation model below.

In [5]:
@serve.deployment(num_replicas=1)
def sentiment_model(text: str):
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
    pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer, task="sentiment-analysis")

    # Run the pipeline once and reuse the result for both fields
    result = pipeline(text)[0]
    return result['label'], result['score']
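
As written, `sentiment_model` builds the tokenizer and model inside the function body, so that work happens on each call. The `ComposedModel` class later in this tutorial keeps its state in `__init__` instead. A minimal plain-Python sketch of that "load once, reuse per call" idea follows; the `SentimentModel` class and its dummy pipeline are hypothetical stand-ins, not the tutorial's deployment, and the assumption that a class-based deployment constructs its state once per replica should be checked against the Ray Serve version in use.

```python
class SentimentModel:
    """Plain-Python sketch: load once in the constructor, reuse per call."""

    load_count = 0  # class-level counter, only here to make the effect visible

    def __init__(self):
        # Stand-in for the expensive from_pretrained(...) calls; a real
        # version would build the HuggingFace pipeline here instead.
        SentimentModel.load_count += 1
        self.pipeline = lambda text: [{"label": "POSITIVE", "score": 1.0}]

    def __call__(self, text: str):
        # Only inference happens per request; nothing is reloaded.
        result = self.pipeline(text)[0]
        return result["label"], result["score"]


model = SentimentModel()
outputs = [model(t) for t in ["Good dog", "Bad dog"]]
print(SentimentModel.load_count, outputs)
```

The dummy pipeline always answers `POSITIVE`; the point is only that the constructor runs once no matter how many tweets are classified.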

### Translation model deployment

Our function to translate a tweet from English --> French using a pretrained Transformer from HuggingFace 🤗

In [6]:
# Function to translate a tweet from English --> French 
# using a pretrained Transformer from HuggingFace
@serve.deployment(num_replicas=2)
def translate_model(text: str):
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelWithLMHead.from_pretrained("t5-small")
    use_gpu = 0 if torch.cuda.is_available() else -1
    pipeline = TranslationPipeline(model, tokenizer, task="translation_en_to_fr", device=use_gpu)

    return pipeline(text)[0]['translation_text']

### Use the Model Composition pattern

<img src="images/tweet_composition.png" width="60%" height="25%">

A composed class is deployed with the ServeHandles of both the sentiment analysis and translation models initialized in its constructor.

In [7]:
@serve.deployment(route_prefix="/composed", num_replicas=2)
class ComposedModel:
    def __init__(self):
        # fetch and initialize deployment handles
        self.translate_model = translate_model.get_handle(sync=False)
        self.sentiment_model = sentiment_model.get_handle(sync=False)

    async def __call__(self, starlette_request):
        data = starlette_request.query_params['data']

        sentiment, score = await(await self.sentiment_model.remote(data))
        trans_text = await(await self.translate_model.remote(data))

        return {'Sentiment': sentiment, 'score': score, 'Translated Text': trans_text}
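
The `__call__` method above waits for the sentiment result before it starts the translation call, even though neither result depends on the other. A minimal sketch of overlapping the two calls with plain `asyncio.gather` is shown here. The functions `fake_sentiment`, `fake_translate`, and `compose` are hypothetical stand-ins that only imitate remote model calls; they are not Ray Serve handles, and the returned values are made up.

```python
import asyncio


# Hypothetical stand-ins for the two deployed models. Each one sleeps
# briefly to imitate the latency of a remote model call.
async def fake_sentiment(text: str):
    await asyncio.sleep(0.01)
    label = "NEGATIVE" if "bad" in text.lower() else "POSITIVE"
    return label, 0.99  # made-up score for illustration


async def fake_translate(text: str):
    await asyncio.sleep(0.01)
    return f"[fr] {text}"  # placeholder, not a real translation


async def compose(text: str) -> dict:
    # Schedule both calls before waiting, so they overlap instead of
    # running one after the other.
    (sentiment, score), trans_text = await asyncio.gather(
        fake_sentiment(text), fake_translate(text)
    )
    return {"Sentiment": sentiment, "score": score, "Translated Text": trans_text}


result = asyncio.run(compose("Bad dog!"))
print(result)
```

Whether this pays off in a real deployment depends on how long each model takes and how many replicas are available.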

Start a Ray Serve instance. Note that if a Ray cluster does not exist, this will create one and attach the Ray Serve
instance to it. If one already exists, the instance will run on that Ray cluster.

In [8]:
serve.start()

2022-06-01 16:50:41,942	INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=47597) 2022-06-01 16:50:45,829	INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=47597) 2022-06-01 16:50:45,937	INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:WMenKP:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-06-01 16:50:46,664	INFO api.py:794 -- Started Serve instance in namespace 'serve'.


<ray.serve.api.Client at 0x7f88fa5e4be0>

### Deploy our models 

As seen in other tutorials, deploying a model is as simple and intuitive as invoking `<func_or_class_name>.deploy()`.

In [9]:
sentiment_model.deploy()
translate_model.deploy()
ComposedModel.deploy()

2022-06-01 16:50:46,679	INFO api.py:615 -- Updating deployment 'sentiment_model'. component=serve deployment=sentiment_model
(HTTPProxyActor pid=47605) INFO:     Started server process [47605]
(ServeController pid=47597) 2022-06-01 16:50:46,780	INFO deployment_state.py:1216 -- Adding 1 replicas to deployment 'sentiment_model'. component=serve deployment=sentiment_model
2022-06-01 16:50:48,692	INFO api.py:630 -- Deployment 'sentiment_model' is ready at `http://127.0.0.1:8000/sentiment_model`. component=serve deployment=sentiment_model
2022-06-01 16:50:48,698	INFO api.py:615 -- Updating deployment 'translate_model'. component=serve deployment=translate_model
(ServeController pid=47597) 2022-06-01 16:50:48,779	INFO deployment_state.py:1216 -- Adding 2 replicas to deployment 'translate_model'. component=serve deployment=translate_model
2022-06-01 16:50:50,713	INFO api.py:630 -- Deployment 'translate_model' is ready at `http://127.0.0.1:8000/translate_

### Send HTTP requests to our deployment model

In [None]:
for i in range(len(TWEETS)):
    tweet = fetch_tweet_text(i)
    print(F"Sending tweet request... : {tweet}")
    resp = requests.get("http://127.0.0.1:8000/composed", params={'data': tweet})
    print(resp.json())

Sending tweet request... : Tonight on my walk, I got mad because mom wouldn't let me play with this dog. We stared at each other...he never blinked!


(translate_model pid=47615) For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
(translate_model pid=47615) - Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
(translate_model pid=47615) - If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


{'Sentiment': 'POSITIVE', 'score': 0.965121328830719, 'Translated Text': "Ce soir, j'ai été fou parce que ma mère ne me laisse pas jouer avec ce chien."}
Sending tweet request... : Sometimes. when i am bored. i will stare at nothing. and try to convince the human. that there is a ghost
{'Sentiment': 'NEGATIVE', 'score': 0.99788898229599, 'Translated Text': "Parfois. quand j'ennuie. je ne regarderai rien. et essayerai de convaincre l'homme."}
Sending tweet request... : You little dog shit, you peed and pooed on my new carpet. Bad dog!


(translate_model pid=47616) For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
(translate_model pid=47616) - Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
(translate_model pid=47616) - If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


{'Sentiment': 'NEGATIVE', 'score': 0.9984055161476135, 'Translated Text': "Je n'ai pas eu l'impression d'avoir eu l'impression d'avoir eu l'impression d'avoir eu l'impression d'avoir eu l'impression d'avoir eu l'impression d'avoir eu l'impression d'avoir eu l'impression d'avoir eu l'impression d'avoir eu l'impression"}
Sending tweet request... : I would completely believe you. Dogs and little children - very innocent and open to seeing such things
{'Sentiment': 'POSITIVE', 'score': 0.9997748732566833, 'Translated Text': 'Je vous croyais tout à fait: chiens et petits enfants - très innocents et ouverts à ce genre de choses'}
Sending tweet request... : You've got too much time on your paws. Go check on the skittle. under the, fridge
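
The request loop above passes each tweet through `params=`, which lets `requests` URL-encode the text before it is appended to the query string. The sketch below illustrates that encoding step using only the standard library. The helper `build_request_url` and the sample tweet are hypothetical, and the exact characters `requests` emits may differ in small details from what `urlencode` produces.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://127.0.0.1:8000/composed"  # route used by ComposedModel above


def build_request_url(tweet: str) -> str:
    # Percent-encode the tweet so that spaces, '&', '#', and similar
    # characters survive the trip through the query string.
    return f"{BASE_URL}?{urlencode({'data': tweet})}"


tweet = "Dogs & cats: who's #1?"  # made-up tweet with awkward characters
url = build_request_url(tweet)
print(url)

# Decoding the query string recovers the original tweet text.
decoded = parse_qs(urlsplit(url).query)["data"][0]
print(decoded)
```

On the server side, `starlette_request.query_params['data']` plays the role of the decoding step.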


Gracefully shut down the Ray Serve instance.

In [None]:
serve.shutdown()

### Exercise

1. Add more tweets with different sentiments.
2. Check the score (and if you speak and read French, what do you think of the translation?)

### Homework

1. Instead of French, use a language transformer of your choice
2. What about Neutral tweets? Try using [vaderSentiment](https://github.com/cjhutto/vaderSentiment)
3. Solution for 2) is [here](https://github.com/anyscale/academy/blob/main/ray-serve/05-Ray-Serve-SentimentAnalysis.ipynb)
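
As a starting point for homework item 2, the sketch below maps a VADER-style compound score (a float between -1 and 1) to three labels. The 0.05 cut-offs follow the thresholds suggested in the vaderSentiment README; the helper name `label_from_compound` and the sample scores are hypothetical, and computing the score itself with the library is left to the exercise.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound sentiment score in [-1, 1] to a three-way label."""
    if compound >= threshold:
        return "POSITIVE"
    if compound <= -threshold:
        return "NEGATIVE"
    return "NEUTRAL"


# Made-up scores, only to show the three outcomes.
for score in (0.6, 0.0, -0.6):
    print(score, label_from_compound(score))
```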