# Hello World! Meet Language AI

Here we take a quick tour of what’s possible with language AI via Cohere’s Large Language Model (LLM) API. This is the Hello, World! of language AI, written for developers with little or no background in AI. In fact, we’ll do that by exploring the Hello, World! phrase itself.

Read the accompanying [blog post here](https://txt.cohere.ai/hello-world-p1/).

![Hello World! Meet Language AI](https://github.com/cohere-ai/notebooks/raw/main/notebooks/images/hello-world/hello-world-feat.png)

We’ll cover three groups of tasks that you will typically work on when dealing with language data:
- Generating text
- Classifying text
- Analyzing text


The first step is to install the Cohere Python SDK. Next, create an API key, which you can generate from the Cohere [dashboard](https://os.cohere.ai/register) or [CLI tool](https://docs.cohere.ai/cli-key).

In [None]:
# Install the libraries
! pip install cohere altair umap-learn > /dev/null

In [None]:
# Import the libraries
import cohere
import pandas as pd
import numpy as np
import altair as alt
import textwrap as tr

# Set up the Cohere client
api_key = 'api_key' # Paste your API key here. Remember not to share it publicly
co = cohere.Client(api_key)
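
Since the key should not be shared, one option is to keep it out of the notebook entirely and read it from an environment variable. The following is a minimal sketch under that assumption; the helper `load_api_key` and the variable name `COHERE_API_KEY` are chosen for this illustration and are not taken from the SDK.

```python
import os

def load_api_key(env=None, var_name="COHERE_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is a name picked for this sketch, not one required by the SDK.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key

# Usage (assumes the variable was exported before the notebook was started):
# co = cohere.Client(load_api_key())
```

The `env` argument exists only so the helper can be exercised with a plain dictionary instead of the real environment.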

# 1 - Generating Text

The Cohere Generate endpoint generates text given an input called a “prompt”. The prompt provides the context for the text we want the model to generate. To illustrate this, let’s start with a simple prompt as the input.

### Try a Simple Prompt

In [None]:
# Create a simple one-line prompt
prompt = "Hello World is a program that"

# Generate text by calling the Generate endpoint
response = co.generate(
  model='base',
  prompt=prompt,
  max_tokens=75,
  temperature=0.4)

output = response.generations[0].text
output = tr.fill(output, width=100)
print(output)

 prints "Hello World" to the screen.  To write a Hello World program, you need to create a new file
and save it as HelloWorld.py. Then, you can write the following code into the file:  print("Hello
World!")  Save the file and run it using Python.  If you don't have Python installed,


### Create a Better Prompt

The output is not bad, but it can be better. We need a way to steer the output closer to what we want, which is where *prompt engineering* comes in.
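
One common shape for an engineered prompt is a short task description followed by a few worked examples and then an open slot for the new input. The sketch below assembles such a prompt from parts. The helper `build_prompt` and its arguments are hypothetical names for this illustration, and the example paragraphs are shortened to their first sentence to keep the sketch compact.

```python
def build_prompt(description, examples, new_title, separator="--"):
    """Assemble a few-shot prompt: description, worked examples, then the new title."""
    parts = [description]
    for title, paragraph in examples:
        parts.append(f"Blog Title: {title}\nFirst Paragraph: {paragraph}")
    # The last block is left open so that the model completes the paragraph
    parts.append(f"Blog Title: {new_title}\nFirst Paragraph:")
    return f"\n{separator}\n".join(parts)

# Example paragraphs shortened for the sketch
blog_examples = [
    ("Best Activities in Toronto",
     "Looking for fun things to do in Toronto?"),
    ("Mastering Dynamic Programming",
     "In this piece, we'll help you understand the fundamentals of dynamic programming."),
]

sketch_prompt = build_prompt(
    "This program will generate the first paragraph of a blog post given a blog title.",
    blog_examples,
    "Learning to Code with Hello, World!")
print(sketch_prompt)
```

The separator doubles as a natural stop sequence, since the model tends to emit it once it has finished a paragraph.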

In [None]:
# Create a prompt containing a short description, examples, and stop sequences
prompt = """
This program will generate the first paragraph of a blog post given a blog title.
--
Blog Title: Best Activities in Toronto
First Paragraph: Looking for fun things to do in Toronto? When it comes to exploring Canada's
largest city, there's an ever-evolving set of activities to choose from. Whether you're looking to
visit a local museum or sample the city's varied cuisine, there is plenty to fill any itinerary. In
this blog post, I'll share some of my favorite recommendations
--
Blog Title: Mastering Dynamic Programming
First Paragraph: In this piece, we'll help you understand the fundamentals of dynamic programming,
and when to apply this optimization technique. We'll break down bottom-up and top-down approaches to
solve dynamic programming problems.
--
Blog Title: Learning to Code with Hello, World!
First Paragraph:"""

# Generate text by calling the Generate endpoint
response = co.generate(
  model='base',
  prompt=prompt,
  max_tokens=75,
  temperature=0.4,
  stop_sequences=["--"])

output = response.generations[0].text
output = tr.fill(output, width=100)
print(output.strip())

Coding is a fun and exciting way to learn the basics of computer science. In this article, we'll
review the fundamentals of programming, including variables, functions, conditional statements, and
loops. We'll also discuss how to use Python to write code that prints "Hello, World!" --
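
Note that the output above still ends with the `--` stop sequence. If that is not wanted, the text can be cut at the first stop sequence it contains. This is a small sketch of one way to do that; `trim_at_stop` is a hypothetical helper, not part of the SDK.

```python
def trim_at_stop(text, stop_sequences):
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()

sample = 'We\'ll also discuss how to use Python to write code that prints "Hello, World!" --'
print(trim_at_stop(sample, ["--"]))
```

Cutting at the first match, rather than replacing every occurrence, also discards anything the model may have produced after the stop sequence.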


### Automating the Process

In real applications, you will likely need to produce these text generations on an ongoing basis, given different inputs. Let’s simulate that with our example.

In [None]:
# A function that generates text given a base prompt and a new topic
def generate_text(base_prompt, current_prompt):
  """
  Generate text given a prompt
  Arguments:
    base_prompt(str): the base prompt containing the examples
    current_prompt(str): the new topic to generate
  Returns:
    generation(str): the newly generated output text
  """
  # Generate text by calling the Generate endpoint
  response = co.generate(
    model='base',
    prompt = base_prompt + current_prompt,
    max_tokens=75,
    temperature=0.4,
    stop_sequences=["--"])
  generation = response.generations[0].text

  return generation

We create a base prompt containing the examples, and then append the current prompt, which contains the new topic, to it.

In [None]:
# The base prompt
base_prompt = """
This program will generate the first paragraph of a blog post given a blog title.
--
Blog Title: Best Activities in Toronto
First Paragraph: Looking for fun things to do in Toronto? When it comes to exploring Canada's
largest city, there's an ever-evolving set of activities to choose from. Whether you're looking to
visit a local museum or sample the city's varied cuisine, there is plenty to fill any itinerary. In
this blog post, I'll share some of my favorite recommendations
--
Blog Title: Mastering Dynamic Programming
First Paragraph: In this piece, we'll help you understand the fundamentals of dynamic programming,
and when to apply this optimization technique. We'll break down bottom-up and top-down approaches to
solve dynamic programming problems.
--
Blog Title:"""

In [None]:
# The list of topics
topics = ["How to Grow in Your Career",
          "The Habits of Great Software Developers",
          "Ideas for a Relaxing Weekend"]

In [None]:
# Keep the generations in a list of paragraphs
paragraphs = []

for topic in topics:
  current_prompt = " " + topic + "\n" + "First Paragraph:"
  para = generate_text(base_prompt, current_prompt)
  para = para.strip().replace("--","")
  paragraphs.append(para)

In [None]:
# Display the generated paragraphs
for topic,para in zip(topics,paragraphs):
  print(f"Topic: {topic}")
  print(f"First Paragraph: {para}")
  print("-"*10)

Topic: How to Grow in Your Career
First Paragraph: If you've been working in the same position for a while, you may be wondering how
to grow in your career. In this article, we'll discuss how to advance your career and take your
skills to the next level. We'll cover how to get promoted, how to get a raise, and how to find a new
job.

----------
Topic: The Habits of Great Software Developers
First Paragraph: What makes a great software developer? What separates them from the rest? What are
the habits of great software developers? In this post, I will share with you the habits of great
software developers.

----------
Topic: Ideas for a Relaxing Weekend
First Paragraph: If you're looking for ideas for a relaxing weekend, there are plenty of options
available. Whether you're hoping to spend some time with friends or family, or you're looking for a
quiet weekend at home, you can find a relaxing weekend activity that fits your needs. Here are
some ideas for a relaxing weekend.

----------


# 2 - Classifying Text

Cohere’s Classify endpoint makes it easy to take a list of texts and predict their categories, or classes. A typical machine learning model requires many training examples to perform text classification, but with the Classify endpoint, you can get started with as few as 5 examples per class.
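
Before sending examples to the endpoint, it can help to check that every class has enough of them. The sketch below counts examples per label using plain `(text, label)` tuples. The helper name and the `min_per_class` parameter are assumptions for this illustration; the value to use depends on your own data.

```python
from collections import Counter

def labels_below_minimum(labeled_examples, min_per_class=5):
    """Return {label: count} for every class with fewer than `min_per_class` examples."""
    counts = Counter(label for _, label in labeled_examples)
    return {label: n for label, n in counts.items() if n < min_per_class}

toy_examples = [
    ("The service was amazing", "positive"),
    ("I hate this place", "negative"),
    ("I am really frustrated", "negative"),
]
print(labels_below_minimum(toy_examples, min_per_class=2))  # {'positive': 1}
```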

### Sentiment Analysis

In [None]:
# Create the training examples for the classifier
from cohere.responses.classify import Example

examples = [Example("I’m so proud of you", "positive"), 
            Example("What a great time to be alive", "positive"), 
            Example("That’s awesome work", "positive"), 
            Example("The service was amazing", "positive"), 
            Example("I love my family", "positive"), 
            Example("They don't care about me", "negative"), 
            Example("I hate this place", "negative"), 
            Example("The most ridiculous thing I've ever heard", "negative"), 
            Example("I am really frustrated", "negative"), 
            Example("This is so unfair", "negative"),
            Example("This made me think", "neutral"), 
            Example("The good old days", "neutral"), 
            Example("What's the difference", "neutral"), 
            Example("You can't ignore this", "neutral"), 
            Example("That's how I see it", "neutral")            
            ]

In [None]:
# Enter the inputs to be classified
inputs=["Hello, world! What a beautiful day",
        "It was a great time with great people",
        "Great place to work",
        "That was a wonderful evening",
        "Maybe this is why",
        "Let's start again",
        "That's how I see it",
        "These are all facts",
        "This is the worst thing",
        "I cannot stand this any longer",
        "This is really annoying",
        "I am just plain fed up"
        ]

In [None]:
# A function that classifies a list of inputs given the examples
def classify_text(inputs, examples):
  """
  Classify a list of input texts
  Arguments:
    inputs(list[str]): a list of input texts to be classified
    examples(list[Example]): a list of example texts and class labels
  Returns:
    classifications(list): each result contains the text, labels, and conf values
  """
  # Classify text by calling the Classify endpoint
  response = co.classify(
    model='embed-english-v2.0',
    inputs=inputs,
    examples=examples)
  
  classifications = response.classifications
  
  return classifications

In [None]:
# Classify the inputs
predictions = classify_text(inputs,examples)

# Display the classification outcomes
for inp,pred in zip(inputs,predictions):
  class_pred = pred.prediction
  class_conf = pred.confidence

  print(f"Input: {inp}")
  print(f"Prediction: {class_pred}")
  print(f"Confidence: {class_conf:.2f}")
  print("-"*10)

Input: Hello, world! What a beautiful day
Prediction: positive
Confidence: 0.83
----------
Input: It was a great time with great people
Prediction: positive
Confidence: 0.99
----------
Input: Great place to work
Prediction: positive
Confidence: 0.91
----------
Input: That was a wonderful evening
Prediction: positive
Confidence: 0.96
----------
Input: Maybe this is why
Prediction: neutral
Confidence: 0.70
----------
Input: Let's start again
Prediction: neutral
Confidence: 0.83
----------
Input: That's how I see it
Prediction: neutral
Confidence: 1.00
----------
Input: These are all facts
Prediction: neutral
Confidence: 0.78
----------
Input: This is the worst thing
Prediction: negative
Confidence: 0.93
----------
Input: I cannot stand this any longer
Prediction: negative
Confidence: 0.93
----------
Input: This is really annoying
Prediction: negative
Confidence: 0.99
----------
Input: I am just plain fed up
Prediction: negative
Confidence: 1.00
----------
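
The confidence values can be used to decide which predictions to trust and which to send for manual review. The following sketch splits results at a cutoff. The 0.8 threshold is an arbitrary assumption for illustration, and the three rows are copied from the output above.

```python
def split_by_confidence(results, threshold=0.8):
    """Split (text, label, confidence) tuples into confident and uncertain lists."""
    confident = [r for r in results if r[2] >= threshold]
    uncertain = [r for r in results if r[2] < threshold]
    return confident, uncertain

# Three rows copied from the classification output
sample_results = [
    ("Hello, world! What a beautiful day", "positive", 0.83),
    ("Maybe this is why", "neutral", 0.70),
    ("These are all facts", "neutral", 0.78),
]

confident, uncertain = split_by_confidence(sample_results)
for text, label, conf in uncertain:
    print(f"Needs review: {text!r} ({label}, {conf:.2f})")
```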


# 3 - Analyzing Text

Cohere’s Embed endpoint takes a piece of text and turns it into a vector embedding. Embeddings represent text as numbers that capture its meaning and context. This lets you turn unstructured text data into a structured form, which opens up ways to analyze it and extract insights.


### Get Embeddings

Here we have a list of 50 top web search terms about Hello, World! taken from a keyword tool. Let’s look at a few examples:

In [None]:
# Get a list of texts and add to a dataframe
df = pd.read_csv("https://github.com/cohere-ai/notebooks/raw/main/notebooks/data/hello-world-kw.csv", names=["search_term"])
df.head()

| | search_term |
|---|---|
| 0 | how to print hello world in python |
| 1 | what is hello world |
| 2 | how do you write hello world in an alert box |
| 3 | how to print hello world in java |
| 4 | how to write hello world in eclipse |


We use the Embed endpoint to get the embeddings for each of these search terms.

In [None]:
# A function that turns a list of texts into embeddings
def embed_text(texts):
  """
  Turns a list of texts into embeddings
  Arguments:
    texts(list[str]): the texts to be turned into embeddings
  Returns:
    embedding(list[list[float]]): the embeddings, one per input text
  """
  # Embed text by calling the Embed endpoint
  output = co.embed(
                model="embed-english-v2.0",
                texts=texts)
  embedding = output.embeddings

  return embedding

In [None]:
# Get embeddings of all search terms
df["search_term_embeds"] = embed_text(df["search_term"].tolist())
embeds = np.array(df["search_term_embeds"].tolist())
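
The call above sends all 50 search terms in a single request. For much longer lists, you may prefer to send the texts in smaller batches. This is a sketch of that pattern, assuming an embedding function that maps a list of texts to a list of vectors, as `embed_text` does. The helper name and the batch size of 32 are arbitrary choices for illustration; check the API reference for the actual per-request limit.

```python
def embed_in_batches(texts, embed_fn, batch_size=32):
    """Call `embed_fn` on consecutive slices of `texts` and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        embeddings.extend(embed_fn(batch))
    return embeddings

# Stand-in for an embedding call so that the sketch runs offline
def fake_embed(batch):
    return [[float(len(t))] for t in batch]

vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
```

In the notebook, `embed_text` would take the place of `fake_embed`.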

### Semantic Search

We’ll look at a couple of example applications. The first example is semantic search. Given a new query, our "search engine" must return the most similar FAQs, where the FAQs are the 50 search terms we loaded earlier.


In [None]:
# Add a new query
new_query = "what is the history of hello world"

# Get embeddings of the new query
new_query_embeds = embed_text([new_query])[0]

We use cosine similarity to compare the new query with each of the FAQs.
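
For intuition, the cosine similarity of two vectors is their dot product divided by the product of their lengths, so it measures the angle between them rather than their magnitude. The following dependency-free sketch computes it for one pair of toy vectors. It assumes neither vector is all zeros, and the helper name `cosine` is chosen for this illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (||u|| * ||v||). Assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine([1.0, 0.0], [1.0, 0.0]), 2))  # same direction -> 1.0
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 2))  # orthogonal -> 0.0
print(round(cosine([1.0, 1.0], [1.0, 0.0]), 2))  # 45 degrees apart -> 0.71
```

Two texts with similar meaning should have embeddings that point in similar directions, which gives a score close to 1.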

In [None]:
# Calculate cosine similarity

from sklearn.metrics.pairwise import cosine_similarity

def get_similarity(target, candidates):
  """
  Computes the similarity between a target text and a list of other texts
  Arguments:
    target(list[float]): the embedding of the target text
    candidates(list[list[float]]): the embeddings of the other texts, or candidates
  Returns:
    sim(list[tuple]): candidate IDs and the similarity scores
  """
  # Turn list into array
  candidates = np.array(candidates)
  target = np.expand_dims(np.array(target),axis=0)

  # Calculate cosine similarity
  sim = cosine_similarity(target,candidates)
  sim = np.squeeze(sim).tolist()

  # Sort by descending order in similarity
  sim = list(enumerate(sim))
  sim = sorted(sim, key=lambda x:x[1], reverse=True)

  # Return similarity scores
  return sim

Finally, we display the top 5 FAQs that match the new query.

In [None]:
# Get the similarity between the new query and existing queries
similarity = get_similarity(new_query_embeds,embeds)

# Display the top 5 FAQs
print("New query:")
print(new_query,'\n')

print("Similar queries:")
for idx,score in similarity[:5]:
  print(f"Similarity: {score:.2f};", df.iloc[idx]["search_term"])

New query:
what is the history of hello world 

Similar queries:
Similarity: 0.91; how did hello world originate
Similarity: 0.88; where did hello world come from
Similarity: 0.86; what is hello world
Similarity: 0.77; why is hello world so famous
Similarity: 0.70; why hello world


### Semantic Exploration

In the second example, we take the same idea as semantic search and apply it more broadly: exploring large volumes of text and analyzing their semantic relationships.

We'll use the same 50 top web search terms about Hello, World! There are different techniques we can use to compress the embeddings down to just 2 dimensions while retaining as much information as possible. We'll use a technique called UMAP. Once the embeddings are down to 2 dimensions, we can plot them on a 2D chart.
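
UMAP's `n_neighbors` parameter sets how many nearby points are considered for each point: smaller values emphasize local structure and larger values emphasize the global picture. With only 50 terms, the value has to stay below the number of samples. The helper below is a hypothetical sketch of clamping a preferred value to a usable range; the default of 15 matches the library's own default.

```python
def pick_n_neighbors(n_samples, preferred=15):
    """Clamp `preferred` to the range [2, n_samples - 1]."""
    if n_samples < 3:
        raise ValueError("need at least 3 samples")
    return max(2, min(preferred, n_samples - 1))

print(pick_n_neighbors(50, preferred=49))   # 49
print(pick_n_neighbors(50, preferred=200))  # 49
print(pick_n_neighbors(50))                 # 15
```

The result can be passed as `umap.UMAP(n_neighbors=...)`. Also passing a fixed `random_state` makes the layout repeatable across runs.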

In [None]:
# Reduce the embeddings' dimensions to 2 using UMAP
import umap
reducer = umap.UMAP(n_neighbors=49) 
umap_embeds = reducer.fit_transform(embeds)

# Add the 2 dimensions to the dataframe
df['x'] = umap_embeds[:,0]
df['y'] = umap_embeds[:,1]

In [None]:
# Plot the 2-dimension embeddings on a chart
chart = alt.Chart(df).mark_circle(size=500).encode(
  x=
  alt.X('x',
      scale=alt.Scale(zero=False),
      axis=alt.Axis(labels=False, ticks=False, domain=False)
  ),

  y=
  alt.Y('y',
      scale=alt.Scale(zero=False),
      axis=alt.Axis(labels=False, ticks=False, domain=False)
  ),
  
  tooltip=['search_term']
  )

text = chart.mark_text(align='left', dx=15, size=12, color='black'
          ).encode(text='search_term', color= alt.value('black'))

result = (chart + text).configure(background="#FDF7F0"
      ).properties(
      width=1000,
      height=700,
      title="2D Embeddings"
      )

result.interactive()