# Using Generation Models for Summarization
This notebook demonstrates a simple way of using Cohere's generation models to summarize text.

We will use a simple prompt that includes two examples and a task description: 

`"<input phrase>". In summary: "<summary>"`.
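
A prompt in this format can also be assembled from a list of example pairs rather than typed by hand. The sketch below is illustrative only: `build_summary_prompt` and its arguments are hypothetical names, not part of the Cohere SDK, and the example pair is taken from the prompt used later in this notebook.

```python
def build_summary_prompt(examples, sentence):
    """Build a few-shot prompt from (text, summary) pairs plus the sentence to summarize."""
    blocks = [f'"{text}". In summary: "{summary}"' for text, summary in examples]
    # Leave the final quote open so the model writes the summary and then closes it.
    blocks.append(f'"{sentence}". In summary: "')
    return "\n\n".join(blocks)


# Hypothetical usage with one example pair
examples = [
    ("It is recognizable by its black-and-white patterned body",
     "Its body has a black and white pattern."),
]
demo_prompt = build_summary_prompt(examples, "Killer whales have a diverse diet")
print(demo_prompt)
```

Ending the prompt on an open quotation mark is what lets a `"` stop sequence cut the generation off at the end of the summary.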

In [4]:
# Let's first install Cohere's python SDK
!pip install cohere

Collecting cohere
  Downloading cohere-0.0.13-py3-none-any.whl (6.9 kB)
Installing collected packages: cohere
Successfully installed cohere-0.0.13


In [5]:
import cohere
import time
import pandas as pd
# Paste your API key here. Remember not to share it publicly.
api_key = ''
co = cohere.Client(api_key)


Our prompt asks the model to paraphrase an input sentence in simpler terms. It contains two worked examples, followed by the sentence we want summarized:

**Killer whales have a diverse diet, although individual populations often specialize in particular types of prey.**

In [9]:
prompt = '''"The killer whale or orca (Orcinus orca) is a toothed whale belonging to the oceanic dolphin family, of which it is the largest member". In summary: "The killer whale or orca is the largest type of dolphin"

"It is recognizable by its black-and-white patterned body". In summary:"Its body has a black and white pattern."

"Killer whales have a diverse diet, although individual populations often specialize in particular types of prey". In summary:"'''
print(prompt)

"The killer whale or orca (Orcinus orca) is a toothed whale belonging to the oceanic dolphin family, of which it is the largest member". In summary: "The killer whale or orca is the largest type of dolphin"

"It is recognizable by its black-and-white patterned body". In summary:"Its body has a black and white pattern."

"Killer whales have a diverse diet, although individual populations often specialize in particular types of prey". In summary:"


We request several completions from the model via the API and score each one by summing the likelihoods of its tokens.

In [7]:
n_completions = 5
gens = []
likelihoods = []
for i in range(n_completions):
  prediction = co.generate(
    model='baseline-shark',
    prompt=prompt,
    return_likelihoods = 'GENERATION',
    stop_sequences=['"'],
    max_tokens=50,
    temperature=0.7,
    k=0,
    p=0.75)

  gens.append(prediction.text)
  sum_likelihood = 0
  for t in prediction.token_likelihoods:
    sum_likelihood += t['likelihood']
  likelihoods.append(sum_likelihood)
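
The loop above scores each generation by the sum of its token likelihoods. Assuming these values are log-likelihoods (they are negative in the results), a sum tends to favor shorter generations, since every extra token makes the total lower. The sketch below computes both the sum and the per-token mean so the two rankings can be compared. `score_generation` is a hypothetical helper, and the token values are made up for illustration; they are not real API output.

```python
def score_generation(token_likelihoods):
    """Return (sum, mean) of the per-token likelihoods of one generation."""
    values = [t['likelihood'] for t in token_likelihoods]
    total = sum(values)
    # Guard against an empty generation to avoid dividing by zero
    mean = total / len(values) if values else float('nan')
    return total, mean


# Made-up values for illustration only
short_gen = [{'likelihood': -1.0}, {'likelihood': -1.0}]
long_gen = [{'likelihood': -0.5}] * 6

short_sum, short_mean = score_generation(short_gen)
long_sum, long_mean = score_generation(long_gen)
print(short_sum, short_mean)
print(long_sum, long_mean)
```

With these made-up numbers the shorter generation wins on the sum while the longer one wins on the mean, so the choice of score changes the ranking.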


In [10]:
pd.options.display.max_colwidth = 200
# Create a dataframe for the generated sentences and their likelihood scores
df = pd.DataFrame({'generation':gens, 'likelihood': likelihoods})
# Drop duplicates
df = df.drop_duplicates(subset=['generation'])
# Sort by summed likelihood, highest first
df = df.sort_values('likelihood', ascending=False, ignore_index=True)
print('Candidate summaries for the sentence: \n"Killer whales have a diverse diet, although individual populations often specialize in particular types of prey."')
df

Candidate summaries for the sentence: 
"Killer whales have a diverse diet, although individual populations often specialize in particular types of prey."


| | generation | likelihood |
|---|---|---|
| 0 | Killer whales have a diverse diet, although individual populations often specialize in particular types of prey." | -4.878381 |
| 1 | Killer whales eat many different types of animals." | -10.016438 |
| 2 | They eat fish, marine mammals, and sea birds." | -12.062806 |
| 3 | Their diet consists of fish, seabirds, marine mammals, turtles, and other cetaceans." | -18.373332 |
| 4 | They eat fish, shrimp, squid, seals, other dolphins, birds, and possibly even humans." | -29.302818 |
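
Note that the top-ranked candidate is a verbatim copy of the input sentence, which is not a useful summary. One possible post-processing step, sketched below in plain Python, is to drop candidates that only echo the input before picking the best one. `is_echo` and `_normalize` are hypothetical helper names; the candidates and scores are copied from the results above.

```python
def _normalize(text):
    """Lowercase and strip surrounding whitespace, quotes, and the final period."""
    return text.strip().strip('"').rstrip('.').strip().lower()


def is_echo(candidate, source):
    """Return True when a candidate merely repeats the source sentence."""
    return _normalize(candidate) == _normalize(source)


source = ("Killer whales have a diverse diet, although individual populations "
          "often specialize in particular types of prey.")
candidates = [
    (source + '"', -4.878381),
    ('Killer whales eat many different types of animals."', -10.016438),
    ('They eat fish, marine mammals, and sea birds."', -12.062806),
    ('Their diet consists of fish, seabirds, marine mammals, turtles, and other cetaceans."', -18.373332),
    ('They eat fish, shrimp, squid, seals, other dolphins, birds, and possibly even humans."', -29.302818),
]

# Keep only candidates that differ from the input, then take the highest score
kept = [(text, score) for text, score in candidates if not is_echo(text, source)]
best_text, best_score = max(kept, key=lambda pair: pair[1])
print(best_text.rstrip('"'), best_score)
```

The trailing `"` on each generation appears in the results above, so the sketch strips it before comparing and printing.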


## Hyperparameters
It's worth spending some time learning the various hyperparameters of the generation endpoint. [Temperature](https://docs.cohere.ai/temperature-wiki), for example, tunes the degree of randomness in the generations. Other parameters include `frequency_penalty` and `presence_penalty`, which can reduce the amount of repetition in the model's output. See the [API reference of the generate endpoint](https://docs.cohere.ai/generate-reference) for more details on all the parameters.
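
To build some intuition for `temperature` and `p`, the sketch below shows how these two knobs are commonly defined for sampling from a language model: temperature rescales the token scores before the softmax, and top-p (nucleus) sampling keeps only the most likely tokens whose cumulative probability reaches `p`. This is a minimal illustration in plain Python, not Cohere's actual implementation; the function names and the logits are made up.

```python
import math


def apply_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
sharp = apply_temperature(logits, 0.7)  # lower temperature: more peaked
flat = apply_temperature(logits, 1.5)   # higher temperature: more uniform
nucleus = top_p_filter(sharp, 0.75)     # token indices that survive p=0.75
print(sharp, flat, nucleus)
```

Lower temperatures concentrate probability on the top token, which makes generations more repeatable; higher temperatures spread it out, which gives more varied candidates at the cost of lower likelihood scores.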