# Generate prompts

We use a separate notebook to generate the prompts for our project. Let's create two versions of each prompt: one that includes context and one that doesn't.

In [8]:
import json

from ml_tooling.llm.prompt_helper import generate_complete_prompt_for_post_link

Let's load the list of links that we need prompts for.

In [None]:
with open("links_to_prompts_map.json", 'r') as f:
    links_to_prompt_map = json.load(f)

In [None]:
# Iterating over a dict yields its keys, i.e. the links.
links_list = list(links_to_prompt_map)

In [None]:
# (link, prompt) pairs, so we can index into them for spot checks.
links_prompts_lst = list(links_to_prompt_map.items())

In [None]:
# Spot-check: if the earlier error is still present, the post behind the
# link won't match its prompt.
print(links_prompts_lst[200][0])
print(links_prompts_lst[200][1])

In [None]:
# Spot-check: if the earlier error is still present, the post behind the
# link won't match its prompt.
print(links_prompts_lst[300][0])
print(links_prompts_lst[300][1])

In [None]:
# Spot-check: if the earlier error is still present, the post behind the
# link won't match its prompt.
print(links_prompts_lst[-1][0])
print(links_prompts_lst[-1][1])
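
Fixed indices such as 200, 300, and -1 always inspect the same entries. As a possible complement, here is a minimal sketch of a seeded random spot check. It assumes the map has plain string prompts as values, as in this first version; `sample_pairs` and the toy data are hypothetical and not part of the project code.

```python
import random


def sample_pairs(link_to_prompt: dict, k: int = 3, seed: int = 0) -> list:
    """Return up to k (link, prompt) pairs chosen reproducibly at random."""
    pairs = list(link_to_prompt.items())
    rng = random.Random(seed)  # seeded, so reruns show the same sample
    return rng.sample(pairs, min(k, len(pairs)))


# Toy stand-in for links_to_prompt_map (hypothetical data).
toy_map = {
    f"https://example.com/post/{i}": f"prompt for post {i}" for i in range(10)
}

for link, prompt in sample_pairs(toy_map, k=3):
    print(link)
    print(prompt)
```

Changing `seed` gives a different sample, which helps cover more of the map over several runs.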

These look good now, so let's save the list of links to disk.

In [None]:
with open("links_list.json", 'w') as f:
    json.dump(links_list, f)

Now let's get the prompts for each link. Our previous attempt at generating prompts assumed that we always want context. Let's create new versions of the prompts, both with and without context, so we can test both.

In [9]:
with open("links_list.json", 'r') as f:
    loaded_links_list = json.load(f)

In [23]:
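def create_prompts_for_each_link(
    links: list[str], task_name: str
) -> dict:
    """Creates a context and a no-context prompt for each link.

    Returns a dict keyed by link. Links that raise an error are logged
    and left out of the result.
    """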
    links_to_prompt_map = {}
    for link in links:
        try:
            context_prompt = generate_complete_prompt_for_post_link(
                link=link,
                task_name=task_name,
                include_context=True,
                only_json_format=True
            )
            no_context_prompt = generate_complete_prompt_for_post_link(
                link=link,
                task_name=task_name,
                include_context=False,
                only_json_format=True
            )
            links_to_prompt_map[link] = {
                "context_prompt": context_prompt,
                "no_context_prompt": no_context_prompt,
                # Track how often adding context actually changes the
                # prompt.
                "prompts_are_equal": context_prompt == no_context_prompt
            }
        except Exception as e:
            # Log the failure and move on to the next link.
            print(f"Error with link {link}: {e}")
    return links_to_prompt_map
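
To check the loop logic without fetching any posts, the prompt generator can be swapped for a stub. The sketch below is an illustration under assumptions: `build_prompt_pairs` mirrors the function above but takes the generator as an argument, and `fake_make_prompt` is a hypothetical stand-in, not the real `generate_complete_prompt_for_post_link`.

```python
from typing import Callable


def build_prompt_pairs(links: list[str], make_prompt: Callable[..., str]) -> dict:
    """Same output shape as the function above, with the generator passed in."""
    result = {}
    for link in links:
        try:
            with_ctx = make_prompt(link=link, include_context=True)
            without_ctx = make_prompt(link=link, include_context=False)
        except Exception as e:
            # A failing link is logged and skipped, as in the original loop.
            print(f"Error with link {link}: {e}")
            continue
        result[link] = {
            "context_prompt": with_ctx,
            "no_context_prompt": without_ctx,
            "prompts_are_equal": with_ctx == without_ctx,
        }
    return result


def fake_make_prompt(link: str, include_context: bool) -> str:
    """Hypothetical stub: fails for one link, adds context only for another."""
    if "broken" in link:
        raise ValueError("post not found")
    prompt = f"Classify the post at {link}."
    if include_context and "with-image" in link:
        prompt += " Context: the post has an image."
    return prompt


demo = build_prompt_pairs(
    [
        "https://example.com/plain",
        "https://example.com/with-image",
        "https://example.com/broken",
    ],
    fake_make_prompt,
)
print(sorted(demo))
```

The broken link is missing from `demo`, and only the link with extra context has `prompts_are_equal` set to `False`.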

Let's run it on a small subset first to make sure that we're on the right track.

In [None]:
links_to_prompt_map = create_prompts_for_each_link(
    loaded_links_list[0:10], "civic_and_political_ideology"
)
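
Because failed links are only printed and then skipped, the returned map can have fewer entries than the input list. A small sketch for listing the links that were dropped; `find_skipped_links` and the toy inputs are hypothetical names used for illustration.

```python
def find_skipped_links(requested: list[str], link_to_prompts: dict) -> list[str]:
    """Links that were requested but have no entry in the result map."""
    return [link for link in requested if link not in link_to_prompts]


# Toy inputs (hypothetical): three links requested, one failed.
requested_links = ["link_a", "link_b", "link_c"]
result_map = {"link_a": {}, "link_c": {}}

skipped = find_skipped_links(requested_links, result_map)
print(f"{len(skipped)} of {len(requested_links)} links were skipped: {skipped}")
```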

Let's take a look at the prompts for one of the links.

In [27]:
example_link = list(links_to_prompt_map.keys())[8]

In [28]:
example_link

'https://bsky.app/profile/jgownder.bsky.social/post/3knji5ltct32a'

In [29]:
example_prompts = links_to_prompt_map[example_link]

In [30]:
print(example_prompts["context_prompt"])
print('#' * 10)
print('#' * 10)
print(example_prompts["no_context_prompt"])
print('#' * 10)
print('#' * 10)
print(example_prompts["prompts_are_equal"])




Pretend that you are a classifier that predicts whether a post has civic content or not. Civic refers to whether a given post is related to politics (government, elections, politicians, activism, etc.) or social issues (major issues that affect a large group of people, such as the economy, inequality, racism, education, immigration, human rights, the environment, etc.). We refer to any content that is classified as being either of these two categories as “civic”; otherwise they are not civic. Please classify the following text denoted in <text> as "civic" or "not civic". 

Then, if the post is civic, classify the text based on the political lean of the opinion or argument it presents. Your options are 'left-leaning', 'moderate', 'right-leaning', or 'unclear'. You are analyzing text that has been pre-identified as 'political' in nature. If the text is not civic, return "unclear".

Think through your response step by step.

Return in a JSON format in the following way:
{
    "civic": <

OK, these look great, so let's run this for all the links.

In [None]:
# runs in ~8 minutes
links_to_prompt_map: dict = create_prompts_for_each_link(
    loaded_links_list, "civic_and_political_ideology"
)
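
The `prompts_are_equal` flag makes it straightforward to summarize how often context changed the prompt. A minimal sketch on a toy map; `share_of_equal_prompts` is a hypothetical helper, and the toy entries keep only the one field the helper reads.

```python
def share_of_equal_prompts(link_to_prompts: dict) -> float:
    """Fraction of links whose context and no-context prompts are identical."""
    if not link_to_prompts:
        return 0.0
    n_equal = sum(
        1 for entry in link_to_prompts.values() if entry["prompts_are_equal"]
    )
    return n_equal / len(link_to_prompts)


# Toy stand-in for links_to_prompt_map (hypothetical data).
toy_map = {
    "link_a": {"prompts_are_equal": True},
    "link_b": {"prompts_are_equal": False},
    "link_c": {"prompts_are_equal": True},
    "link_d": {"prompts_are_equal": True},
}

share = share_of_equal_prompts(toy_map)
print(f"{share:.0%} of prompts are unchanged by context")
```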

Let's spot-check these.

In [35]:
# (link, prompts) pairs, so we can index into them for spot checks.
links_prompts_lst = list(links_to_prompt_map.items())

In [None]:
# Spot-check: if the earlier error is still present, the post behind the
# link won't match its prompt.
print(links_prompts_lst[200][0])
print(links_prompts_lst[200][1]["context_prompt"])

In [38]:
# Spot-check: if the earlier error is still present, the post behind the
# link won't match its prompt.
print(links_prompts_lst[300][0])
print(links_prompts_lst[300][1]["context_prompt"])

https://bsky.app/profile/paulgcornish.bsky.social/post/3knsux5iyet2h


Pretend that you are a classifier that predicts whether a post has civic content or not. Civic refers to whether a given post is related to politics (government, elections, politicians, activism, etc.) or social issues (major issues that affect a large group of people, such as the economy, inequality, racism, education, immigration, human rights, the environment, etc.). We refer to any content that is classified as being either of these two categories as “civic”; otherwise they are not civic. Please classify the following text denoted in <text> as "civic" or "not civic". 

Then, if the post is civic, classify the text based on the political lean of the opinion or argument it presents. Your options are 'left-leaning', 'moderate', 'right-leaning', or 'unclear'. You are analyzing text that has been pre-identified as 'political' in nature. If the text is not civic, return "unclear".

Think through your response step by 

In [37]:
# Spot-check: if the earlier error is still present, the post behind the
# link won't match its prompt.
print(links_prompts_lst[-1][0])
print(links_prompts_lst[-1][1]["context_prompt"])
print(links_prompts_lst[-1][1]["prompts_are_equal"])

https://bsky.app/profile/merrittk.com/post/3kntzspsje52z


Pretend that you are a classifier that predicts whether a post has civic content or not. Civic refers to whether a given post is related to politics (government, elections, politicians, activism, etc.) or social issues (major issues that affect a large group of people, such as the economy, inequality, racism, education, immigration, human rights, the environment, etc.). We refer to any content that is classified as being either of these two categories as “civic”; otherwise they are not civic. Please classify the following text denoted in <text> as "civic" or "not civic". 

Then, if the post is civic, classify the text based on the political lean of the opinion or argument it presents. Your options are 'left-leaning', 'moderate', 'right-leaning', or 'unclear'. You are analyzing text that has been pre-identified as 'political' in nature. If the text is not civic, return "unclear".

Think through your response step by step.

Retur

In [34]:
with open("links_to_prompts_map_v2.json", 'w') as f:
    json.dump(links_to_prompt_map, f)
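
After writing the file, a quick round trip can confirm that what was saved matches what is in memory. The sketch below uses a temporary directory and a toy map (both assumptions for illustration) so that it does not touch the real `links_to_prompts_map_v2.json`.

```python
import json
import os
import tempfile

# Toy stand-in for links_to_prompt_map (hypothetical data).
toy_map = {
    "https://example.com/post/1": {
        "context_prompt": "prompt with context",
        "no_context_prompt": "prompt without context",
        "prompts_are_equal": False,
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "links_to_prompts_map_v2.json")
    with open(path, "w") as f:
        json.dump(toy_map, f)
    with open(path, "r") as f:
        reloaded = json.load(f)

# String keys, string values, and booleans survive a JSON round trip unchanged.
print(reloaded == toy_map)
```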