# Fueling Generative Content with Keyword Research

Generative models have proven extremely useful for generating content ideas, but they don’t take user search demand or trends into account. In this notebook, we’ll address that by adding keyword research to the workflow.

Read the accompanying [blog post here](https://txt.cohere.ai/generative-content-keyword-research/).

In [None]:
# Install packages
! pip install cohere topically > /dev/null

In [None]:
# Wrap long lines in cell output instead of scrolling horizontally
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)


In [None]:
import cohere
import numpy as np
import pandas as pd
from topically import Topically
from sklearn.cluster import KMeans

co = cohere.Client('api_key') # Add your Cohere API key here

# Step 1: Get a List of High-Performing Keywords

In [None]:
# Download the pre-created dataset (or replace it with your own CSV file containing two columns: "keyword" and "volume")
!wget "https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/remote_teams.csv" -O remote_teams.csv

In [None]:
# Create a dataframe
df = pd.read_csv('remote_teams.csv')
df.columns = ["keyword","volume"]
df.head()

|   | keyword | volume |
|---|---|---|
| 0 | managing remote teams | 1000 |
| 1 | remote teams | 390 |
| 2 | collaboration tools for remote teams | 320 |
| 3 | online games for remote teams | 320 |
| 4 | how to manage remote teams | 260 |
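The notebook assumes a clean two-column CSV. If you use your own keyword export instead, a small loader can normalize it before embedding. This is a sketch: `load_keywords` is a hypothetical helper, not part of the original notebook. Its normalization choices are assumptions you may want to change: lowercasing, dropping rows with non-numeric volume, and keeping the highest-volume duplicate.

```python
import io

import pandas as pd


def load_keywords(source) -> pd.DataFrame:
    """Load a keyword CSV (path or file-like) into 'keyword' and 'volume' columns."""
    raw = pd.read_csv(source)
    if raw.shape[1] < 2:
        raise ValueError("expected two columns: keyword and search volume")
    # Keep the first two columns and give them the names used in this notebook
    df = raw.iloc[:, :2].copy()
    df.columns = ["keyword", "volume"]
    # Non-numeric volumes (e.g. "unknown") become NaN and are dropped below
    df["volume"] = pd.to_numeric(df["volume"], errors="coerce")
    # Normalize keywords so "Remote Teams " and "remote teams" count as duplicates
    df["keyword"] = df["keyword"].astype("string").str.strip().str.lower()
    df = df.dropna(subset=["keyword", "volume"])
    df = df[df["keyword"] != ""]
    # Keep the highest-volume row for each keyword, largest volumes first
    df = (df.sort_values("volume", ascending=False, kind="mergesort")
            .drop_duplicates(subset="keyword", keep="first")
            .reset_index(drop=True))
    df["volume"] = df["volume"].astype(int)
    return df


# Example with an in-memory CSV; in the notebook you would call
# load_keywords('remote_teams.csv') instead
sample_csv = io.StringIO(
    "keyword,volume\n"
    "remote teams,390\n"
    "Remote Teams ,100\n"
    "managing remote teams,1000\n"
    "bad row,unknown\n"
)
print(load_keywords(sample_csv))
```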


# Step 2: Group the Keywords into Topics 

### Embed the Keywords with co.embed

In [None]:
def embed_text(texts):
  # Send a list of strings to co.embed and return one embedding vector per string
  output = co.embed(
    model='large',
    texts=texts)
  return output.embeddings


embeds = np.array(embed_text(df['keyword'].tolist()))
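A larger keyword list may exceed the number of texts the embed endpoint accepts in a single request. One option is to send the keywords in batches and concatenate the results in order, as in the sketch below. `embed_in_batches` is a hypothetical helper. The default `batch_size=96` is an assumed per-request limit, so check the Cohere documentation for the current value. The helper takes any embedding function, such as `embed_text` above, which also makes it easy to test without an API key.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 96,  # assumed per-request limit; verify against the API docs
) -> List[List[float]]:
    """Embed texts in fixed-size batches, preserving the input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors = embed_fn(batch)
        # Guard against silently misaligned keywords and vectors
        if len(vectors) != len(batch):
            raise RuntimeError("embed_fn returned a different number of vectors than inputs")
        embeddings.extend(vectors)
    return embeddings


# In the notebook:
# embeds = np.array(embed_in_batches(df['keyword'].tolist(), embed_text))
```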

### Cluster the Keywords into Topics with scikit-learn

In [None]:
NUM_TOPICS = 4
kmeans = KMeans(n_clusters=NUM_TOPICS, random_state=21, n_init="auto").fit(embeds)
df['topic'] = list(kmeans.labels_)
df.head()

|   | keyword | volume | topic |
|---|---|---|---|
| 0 | managing remote teams | 1000 | 2 |
| 1 | remote teams | 390 | 2 |
| 2 | collaboration tools for remote teams | 320 | 0 |
| 3 | online games for remote teams | 320 | 1 |
| 4 | how to manage remote teams | 260 | 2 |
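Before naming the clusters, it can help to see how keywords and search volume are spread across them. A lopsided split may suggest trying a different `NUM_TOPICS`. The following is a minimal pandas sketch. `summarize_topics` is a hypothetical helper, and the sample rows are the five shown in the output above.

```python
import pandas as pd


def summarize_topics(df: pd.DataFrame) -> pd.DataFrame:
    """Count keywords and total search volume per topic, highest volume first."""
    return (df.groupby("topic")
              .agg(keywords=("keyword", "count"), total_volume=("volume", "sum"))
              .sort_values("total_volume", ascending=False))


sample = pd.DataFrame({
    "keyword": ["managing remote teams", "remote teams",
                "collaboration tools for remote teams",
                "online games for remote teams", "how to manage remote teams"],
    "volume": [1000, 390, 320, 320, 260],
    "topic": [2, 2, 0, 1, 2],
})
print(summarize_topics(sample))
# In the notebook: summarize_topics(df)
```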


### Generate Topic Names with Topically

In [None]:
# Load topically
app = Topically('api_key') # Add your Cohere API key here

# Name clusters
df['topic_name'], _ = app.name_topics((df['keyword'], df['topic']))

df.head()

|   | keyword | volume | topic | topic_name |
|---|---|---|---|---|
| 0 | managing remote teams | 1000 | 2 | Managing remote teams |
| 1 | remote teams | 390 | 2 | Managing remote teams |
| 2 | collaboration tools for remote teams | 320 | 0 | Collaboration tools for remote teams |
| 3 | online games for remote teams | 320 | 1 | Virtual games for remote teams |
| 4 | how to manage remote teams | 260 | 2 | Managing remote teams |


In [None]:
# View the list of topics
topic2name = dict(df.groupby('topic')['topic_name'].first())  # topic ID -> topic name
for key, value in topic2name.items():
  print(value)

Collaboration tools for remote teams
Virtual games for remote teams
Managing remote teams
remote team building activities
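Topically is one way to name the clusters. If you would rather avoid the extra dependency, a much cruder fallback is to label each cluster with its highest-volume keyword. This is a swapped-in heuristic and not what Topically does. `name_topics_by_top_keyword` is a hypothetical helper, and the sample rows come from the output above.

```python
import pandas as pd


def name_topics_by_top_keyword(df: pd.DataFrame) -> dict:
    """Map each topic ID to its highest-volume keyword (first one wins on ties)."""
    top_rows = df.loc[df.groupby("topic")["volume"].idxmax()]
    return dict(zip(top_rows["topic"], top_rows["keyword"]))


sample = pd.DataFrame({
    "keyword": ["managing remote teams", "remote teams",
                "collaboration tools for remote teams",
                "online games for remote teams", "how to manage remote teams"],
    "volume": [1000, 390, 320, 320, 260],
    "topic": [2, 2, 0, 1, 2],
})
print(name_topics_by_top_keyword(sample))
```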


# Step 3: Generate Blog Post Ideas for Each Topic

### Take the Top Keywords from Each Topic

In [None]:
TOP_N = 10

# Group the DataFrame by topic and select the top N keywords sorted by volume
top_keywords = (df.groupby('topic')
                        .apply(lambda x: x.nlargest(TOP_N, 'volume'))
                        .reset_index(drop=True))


# Convert the DataFrame to a nested dictionary keyed by topic
topic2name = dict(df.groupby('topic')['topic_name'].first())  # topic ID -> name, computed once
content_by_topic = {}
for topic, group in top_keywords.groupby('topic'):
    keywords = ', '.join(group['keyword'])
    content_by_topic[topic] = {'topic_name': topic2name[topic], 'keywords': keywords}
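The `groupby(...).apply(lambda x: x.nlargest(...))` step above can also be written by sorting once and taking `head()` per group. This avoids `apply` on a grouped frame, which some recent pandas versions flag with a deprecation warning when the grouping column is passed to the function. The sketch below makes the same selection, though it is not the notebook's original code. `top_keywords_per_topic` is a hypothetical helper, and the sample rows come from the earlier output.

```python
import pandas as pd


def top_keywords_per_topic(df: pd.DataFrame, top_n: int = 10) -> dict:
    """Return {topic: comma-separated top-N keywords, highest volume first}."""
    # A stable sort keeps the original row order among keywords with equal volume
    top = (df.sort_values("volume", ascending=False, kind="mergesort")
             .groupby("topic")
             .head(top_n))
    return {topic: ", ".join(group["keyword"])
            for topic, group in top.groupby("topic")}


sample = pd.DataFrame({
    "keyword": ["managing remote teams", "remote teams",
                "collaboration tools for remote teams",
                "online games for remote teams", "how to manage remote teams"],
    "volume": [1000, 390, 320, 320, 260],
    "topic": [2, 2, 0, 1, 2],
})
print(top_keywords_per_topic(sample, top_n=2))
```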

In [None]:
# Print the topics and their top keywords
content_by_topic

{0: {'topic_name': 'Collaboration tools for remote teams',
  'keywords': 'collaboration tools for remote teams, best collaboration tools for remote teams, tools for remote teams, zapier remote teams, best communication tools for remote teams, free collaboration tools for remote teams, free retrospective tools for remote teams, project management tools for remote teams, best tools for remote teams, collaboration remote teams'},
 1: {'topic_name': 'Virtual games for remote teams',
  'keywords': 'online games for remote teams, games for remote teams, retro ideas for remote teams, retrospective games for remote teams, virtual games for remote teams, agile games for remote teams, fun games for remote teams, whiteboard for remote teams, always on video for remote teams, best games for remote teams'},
 2: {'topic_name': 'Managing remote teams',
  'keywords': 'managing remote teams, remote teams, how to manage remote teams, leading remote teams, managing remote teams best practices, remote tea

### Create a Prompt with These Keywords

In [None]:
def generate_blog_ideas(keywords):
  # Ask the model for three blog post ideas grounded in the keyword list
  prompt = f"""{keywords}\n\nThe above is a list of high-traffic keywords obtained from a keyword research tool.
Suggest three blog post ideas that are highly relevant to these keywords.
For each idea, write a one-paragraph abstract about the topic.
Use this format:
Blog title: <text>
Abstract: <text>"""

  response = co.generate(
    model='command',
    prompt=prompt,
    max_tokens=300,
    temperature=0.9)
  return response.generations[0].text
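Keeping the prompt template separate from the API call lets you inspect or test it without spending API calls. The `Blog title:` / `Abstract:` format that the prompt requests can then be parsed back into structured pairs. The sketch below assumes the model follows that format. Generations that deviate from it will simply yield fewer pairs. `PROMPT_TEMPLATE`, `build_prompt` and `parse_ideas` are hypothetical names, and the template mirrors the prompt in `generate_blog_ideas`.

```python
import re
from typing import List, Tuple

PROMPT_TEMPLATE = """{keywords}

The above is a list of high-traffic keywords obtained from a keyword research tool.
Suggest three blog post ideas that are highly relevant to these keywords.
For each idea, write a one-paragraph abstract about the topic.
Use this format:
Blog title: <text>
Abstract: <text>"""

# Title is a single line; the abstract runs until the next "Blog title:" or the end
IDEA_PATTERN = re.compile(
    r"Blog title:\s*([^\n]+)\n\s*Abstract:\s*(.+?)(?=\n\s*Blog title:|\Z)",
    re.DOTALL,
)


def build_prompt(keywords: str) -> str:
    """Fill the keyword list into the prompt template."""
    return PROMPT_TEMPLATE.format(keywords=keywords)


def parse_ideas(text: str) -> List[Tuple[str, str]]:
    """Extract (title, abstract) pairs from a generation in the requested format."""
    return [(title.strip(), abstract.strip())
            for title, abstract in IDEA_PATTERN.findall(text)]


# In the notebook, the generation call itself is unchanged, e.g.:
# text = co.generate(model='command', prompt=build_prompt(kw),
#                    max_tokens=300, temperature=0.9).generations[0].text
# ideas = parse_ideas(text)
```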


### Generate Content Ideas

In [None]:
# Generate content ideas for each topic
for value in content_by_topic.values():
  value['ideas'] = generate_blog_ideas(value['keywords'])


# Print the results
for value in content_by_topic.values():
  print(f"Topic Name: {value['topic_name']}")
  print(f"Top Keywords: {value['keywords']}")
  print(f"Blog Post Ideas: {value['ideas']}")
  print("-"*50)

Topic Name: Collaboration tools for remote teams
Top Keywords: collaboration tools for remote teams, best collaboration tools for remote teams, tools for remote teams, zapier remote teams, best communication tools for remote teams, free collaboration tools for remote teams, free retrospective tools for remote teams, project management tools for remote teams, best tools for remote teams, collaboration remote teams
Blog Post Ideas: 

Blog title: The Best Collaboration Tools for Remote Teams
Abstract: Are you looking for ways to improve collaboration among your remote team? In this blog post, we'll explore some of the best collaboration tools on the market and how they can help your team stay connected and productive, even when working from different locations. From video conferencing to project management tools, we'll cover everything you need to know to choose the right tools for your team.

Blog title: Free Retrospective Tools for Remote Teams
Abstract: Are you looking for ways to impr