# Social media post generator

The goal of this notebook is to build a [social media generator](https://ai21-social-media-generator.streamlit.app/) that creates a post for Twitter or LinkedIn based on an article.

The flow of the process is as follows:
1. Get an article from a URL and summarize it.
2. Create a post based on that summary.
3. Filter the results and choose the best-ranked generation.

### Imports and settings

In [1]:
import requests
import re
import ai21
import pandas as pd

### API key
In order to run this notebook, you will need an API key for AI21 Studio. How can you get it?

Create a free account at [AI21 Studio](https://studio.ai21.com/account/api-key). You will then find your API key in the *API Key* tab under your name.

In [2]:
# TODO: fill in your API key from your AI21 Studio account
ai21.api_key = ""
assert ai21.api_key != "", "You must provide an API key!"

## Read the article and summarize

You can use our out-of-the-box summarizer, which only requires you to input some text (either as a string or a URL) and returns grounded summaries that remain faithful to the original document (i.e., no external information is added during the process). The summaries are formatted as bullet lists, following the original text flow.
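
Because the summarizer accepts either a string or a URL, it can be convenient to pick the source type automatically. The helper below is a minimal sketch: `detect_source_type` is a hypothetical name, and the `"TEXT"` value is assumed to be the counterpart of the `"URL"` value used in this notebook.

```python
import re


def detect_source_type(source: str) -> str:
    """Return "URL" for a bare http(s) link and "TEXT" otherwise (assumed values)."""
    # A source counts as a URL only if the whole string is a single link
    if re.fullmatch(r"https?://\S+", source.strip()):
        return "URL"
    return "TEXT"


print(detect_source_type("https://www.ai21.com/blog/simplifying-our-jurassic-2-offering"))
print(detect_source_type("Jurassic-2 now comes in three sizes."))
```

The result could then be passed as the `sourceType` argument of the summarize call, assuming the API accepts both values.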

For this notebook, we will use a post about [simplifying our Jurassic-2 offering](https://www.ai21.com/blog/simplifying-our-jurassic-2-offering), where we discuss the changes we made to provide a clear and powerful lineup of foundation models. 

In [3]:
url = "https://www.ai21.com/blog/simplifying-our-jurassic-2-offering"

summary = ai21.Summarize.execute(source=url, sourceType="URL")['summary']
print(summary)

We are streamlining our foundation models to three, and renaming them Ultra, Mid and Light.
Three months ago, we launched Jurassic-2, our next generation foundation models. These models include instruct capabilities.
We’ve spent the last three months gathering user feedback, and as always, are constantly on the lookout for new ways to improve our technology, as well as ease of use for our customers.
We found that users were confused by the five different foundation models and the names of the models, Large, Grande and Jumbo, made it difficult to differentiate the models by their relative sizes and capabilities.
We are excited to announce that we are making some adjustments to our Jurassic-2 offering based on our learnings, in order to make the decision making process for our users more simple and intuitive.
We are now offering three foundation models, and all of them include instruct capabilities. These models perform as well as our non-instruct models for both zero-shot and few-shot p

## Let's generate

It's time to create our social media post!

Let's start by making a function that creates the prompt for this, depending on the platform.

In [4]:
def create_prompt(media, summary):    
    post_type = "tweet" if media == "Twitter" else "Linkedin post"
    prompt = f"""Article summary:
    {summary}
    Write a catchy {post_type} to promote an article based on the above summary.
    """
    
    return prompt

Let's take a look at a prompt for our case:

In [5]:
media = "Twitter" # "Twitter", "Linkedin"
prompt = create_prompt(media=media, summary=summary)
print(prompt)

Article summary:
    We are streamlining our foundation models to three, and renaming them Ultra, Mid and Light.
Three months ago, we launched Jurassic-2, our next generation foundation models. These models include instruct capabilities.
We’ve spent the last three months gathering user feedback, and as always, are constantly on the lookout for new ways to improve our technology, as well as ease of use for our customers.
We found that users were confused by the five different foundation models and the names of the models, Large, Grande and Jumbo, made it difficult to differentiate the models by their relative sizes and capabilities.
We are excited to announce that we are making some adjustments to our Jurassic-2 offering based on our learnings, in order to make the decision making process for our users more simple and intuitive.
We are now offering three foundation models, and all of them include instruct capabilities. These models perform as well as our non-instruct models for both zer
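
The printed prompt shows that the first summary line keeps the four-space indentation of the triple-quoted string in `create_prompt`. If that is unwanted, one option is to assemble the prompt from separate lines. This is only a sketch; `create_prompt_flat` is a hypothetical variant, not part of the original flow.

```python
def create_prompt_flat(media, summary):
    """Build the same prompt text as create_prompt, without leading indentation."""
    post_type = "tweet" if media == "Twitter" else "Linkedin post"
    lines = [
        "Article summary:",
        summary.strip(),
        f"Write a catchy {post_type} to promote an article based on the above summary.",
        "",  # empty last item so the joined prompt ends with a newline
    ]
    return "\n".join(lines)


print(create_prompt_flat("Twitter", "First point.\nSecond point."))
```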

We call Jurassic-2 Ultra to generate several completions to choose from:

In [6]:
response = ai21.Completion.execute(prompt=prompt,
                                   model="j2-ultra",
                                   maxTokens=200,
                                   temperature=0.8,
                                   numResults=16  # ask the model for 16 candidate completions
                                   )

for comp in response['completions']:
    print(comp['data']['text'].strip())
    print("=============")

Jurassic-2 just got simpler! Streamlined to three foundation models: Ultra, Mid and Light. Now with instruct capabilities! #LanguageGeneration
Jurassic-2 Ultra, Mid and Light are the next generation of foundation models, offering exceptional quality and affordability. All models include instruct capabilities, so users can hit the ground running faster.
Jurassic-2 Ultra, Mid and Light: three powerful foundation models for complex language generation tasks. Learn more here! #Jurassic2
Jurassic-2 Ultra, Mid, and Light are here to simplify your decision making for complex language generation tasks. All three models include instruct capabilities and are continuously being improved.
Jurassic-2 Ultra, Mid and Light are our new foundation models- all include instruct capabilities and are continuously improving. Learn more now! #machinelearning #deeplearning
Jurassic-2 Ultra, Mid and Light have arrived! Our next generation foundation models now include instruct capabilities and have been rename

## Filter results

It's wise to apply some filters and ranking to the generated outputs, to choose the one that suits our needs best.

In this example, we use two basic filters:

1. **Length:** Posts should not be too short, nor exceed the maximum number of characters allowed on either platform.

2. **Text diversity:** Our posts should be different from the original summary, rather than verbatim.

In [7]:
def is_length_valid(text, media):
    """
    This function makes sure that a given text is between a range of maximum and minimum character limit
    """
    CHAR_LIMIT = {"Twitter": (30, 280), "Linkedin": (100, 1500)}
    min_length, max_length = CHAR_LIMIT[media]
    return min_length <= len(text) <= max_length


def is_diverse(input_text, output_text, th=0.7):
    """
    This function makes sure that an input text and output text do not overlap too much, according to a threshold
    """
    input_words = input_text.strip().split()
    output_words = output_text.strip().split()
    if len(input_words) == 0 or len(output_words) == 0:
        return True
    output_prefix = output_words[:len(input_words)]
    overlap = set(output_prefix) & set(input_words)
    return len(overlap) / len(output_prefix) < th


def apply_filters(completion, prompt, media):
    """
    This function applies both filters from before
    """
    # Only consider completions that ended in a natural way
    if completion["finishReason"]["reason"] != "endoftext":
        return False
    text = completion['data']['text']
    # is_diverse expects the source text first and the generated text second
    return is_length_valid(text, media) and is_diverse(prompt, text)
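
To make the diversity check concrete, the sketch below computes the same overlap ratio on two toy strings. `overlap_ratio` is a hypothetical helper that mirrors the logic of `is_diverse`, and the sentences are invented for illustration.

```python
def overlap_ratio(input_text, output_text):
    """Share of the output's leading words that also occur in the input."""
    input_words = input_text.strip().split()
    output_words = output_text.strip().split()
    if not input_words or not output_words:
        return 0.0
    # Compare at most len(input_words) words from the start of the output
    output_prefix = output_words[:len(input_words)]
    overlap = set(output_prefix) & set(input_words)
    return len(overlap) / len(output_prefix)


source = "We are streamlining our foundation models to three"
copied = "We are streamlining our foundation models today"
fresh = "Three simpler models, one clear lineup"

print(overlap_ratio(source, copied))  # 6 of 7 words overlap
print(overlap_ratio(source, fresh))   # no exact word matches
```

With the default threshold of 0.7, the first candidate would be rejected and the second kept. Note that the comparison is case- and punctuation-sensitive, since words are split on whitespace only ("Three" and "models," do not match "three" and "models").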

Next, let's apply the filters to the given completions and display them in a dataframe with the length of each generated text and its total log-probability (the `prob` column, which is the sum of the token log-probabilities).

In [8]:
completions_filtered = [comp for comp in response['completions'] if apply_filters(comp, summary, media)]
pd.DataFrame([{'text': comp['data']['text'], 'length': len(comp['data']['text']),
               'prob': sum(tok['generatedToken']['logprob'] for tok in comp['data']['tokens'])} for comp in completions_filtered])

|   | text | length | prob |
|---|------|--------|------|
| 0 | Jurassic-2 just got simpler! Streamlined to th... | 142 | -19.037806 |
| 1 | Jurassic-2 Ultra, Mid and Light are the next g... | 211 | -19.112564 |
| 2 | Jurassic-2 Ultra, Mid and Light: three powerfu... | 132 | -20.337919 |
| 3 | Jurassic-2 Ultra, Mid, and Light are here to s... | 197 | -28.87666 |
| 4 | Jurassic-2 Ultra, Mid and Light are our new fo... | 175 | -32.497679 |
| 5 | Jurassic-2 Ultra, Mid and Light have arrived! ... | 188 | -33.754031 |
| 6 | Jurassic-2 just got better! Introducing a stre... | 132 | -38.895818 |
| 7 | Jurassic-2 just got a major upgrade! New strea... | 162 | -39.663603 |
| 8 | Jurassic-2 Ultra, Mid and Light models provide... | 168 | -40.295752 |
| 9 | Jurassic-2 foundation models get simplified to... | 163 | -40.89125 |
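
The `prob` values are sums of token log-probabilities, so values closer to zero indicate more likely completions. Below is a minimal sketch of ranking by that score. It uses mock completions that only mimic the response structure used above; the texts and numbers are made up, and `make_mock` is a hypothetical helper.

```python
import math


def sequence_logprob(completion):
    """Sum the per-token log-probabilities of one completion."""
    return sum(tok["generatedToken"]["logprob"] for tok in completion["data"]["tokens"])


def make_mock(text, logprobs):
    # Hypothetical helper: builds a dict shaped like one entry of response['completions']
    return {"data": {"text": text,
                     "tokens": [{"generatedToken": {"logprob": lp}} for lp in logprobs]}}


mock_completions = [
    make_mock("Post A", [-0.5, -1.5]),
    make_mock("Post B", [-0.25, -0.25]),
    make_mock("Post C", [-2.0, -3.0]),
]

# Highest (least negative) total log-probability first
ranked = sorted(mock_completions, key=sequence_logprob, reverse=True)
for comp in ranked:
    score = sequence_logprob(comp)
    print(comp["data"]["text"], score, math.exp(score))
```

A summed log-probability tends to favor shorter completions, because every extra token adds a negative term. Dividing the sum by the number of tokens is a common alternative when length should not drive the ranking.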


## Post-processing: remove hallucinations and select the top-ranked post

In the generation process, the model may produce links or email addresses. This is expected, since many real posts include them. However, there is a good chance that those links or emails are made up ("hallucinations"), so we replace them with placeholders that the user can fill in.

In [9]:
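def remove_hallucinations(text):
    # Replace links with a placeholder; \S+ stops at whitespace so text after the link is kept
    text = re.sub(r'https?://\S+', '[URL]', text)
    # Replace email addresses with a placeholder ([._-] keeps the hyphen literal, not a range)
    return re.sub(r'([A-Za-z0-9]+[._-])*[A-Za-z0-9]+@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+', '[EMAIL]', text)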
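
A quick way to check the substitution is to run it on an invented post. The sketch below is self-contained: `replace_links_and_emails` is a hypothetical name, and both patterns and the sample text are assumptions for illustration. The URL pattern stops at whitespace so that text following a link is preserved.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z]{2,})+")


def replace_links_and_emails(text):
    """Swap URLs and email addresses for placeholders the user can fill in."""
    text = URL_PATTERN.sub("[URL]", text)
    return EMAIL_PATTERN.sub("[EMAIL]", text)


sample = "Read more at https://example.com/post today or write to info@example.com for details."
print(replace_links_and_emails(sample))
```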

Now we choose the top-ranked generation (according to the probability), remove any hallucinations, and return the post:

In [10]:
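# Pick the completion with the highest summed token log-probability
best = max(completions_filtered,
           key=lambda comp: sum(tok['generatedToken']['logprob'] for tok in comp['data']['tokens']))
post = best['data']['text']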
post = remove_hallucinations(post)
print(post)

Jurassic-2 just got simpler! Streamlined to three foundation models: Ultra, Mid and Light. Now with instruct capabilities! #LanguageGeneration
