# Generate Challenge Headlines

## Overview

This notebook generates candidate headlines for challenges fetched from OpenChallenges (OC), first with the OpenAI API and then with Anthropic Claude v2 on Amazon Bedrock.

## Requirements

- Access to OpenChallenges REST API
- Access to OpenAI API
- Access to Amazon Bedrock (for the second half of the notebook)

## List challenges

In [1]:
import openchallenges_client
from pprint import pprint
from openchallenges_client.api import challenge_api

In [2]:
# See configuration.py for a list of all supported configuration parameters.
configuration = openchallenges_client.Configuration(
    host = "https://openchallenges.io/api/v1"
)

In [3]:
# Enter a context with an instance of the API client
challenges = []
with openchallenges_client.ApiClient(configuration) as api_client:
    api_instance = challenge_api.ChallengeApi(api_client)
    
    query = openchallenges_client.ChallengeSearchQuery(page_number=1000, page_size=1)

    try:
        # Fetch the page of challenges selected by the query above
        page = api_instance.list_challenges(query)
        challenges.extend(page.challenges)
    except openchallenges_client.ApiException as e:
        print("Exception when calling ChallengeApi->list_challenges: %s\n" % e)

In [4]:
from dotenv import load_dotenv

load_dotenv()

True

In [5]:
import openai

In [6]:
# Source: https://medium.com/muthoni-wanyoike/implementing-text-summarization-using-openais-gpt-3-api-dcd6be4f6933
def split_text(text):
    max_chunk_size = 2048
    chunks = []
    current_chunk = ""
    for sentence in text.split("."):
        if len(current_chunk) + len(sentence) < max_chunk_size:
            current_chunk += sentence + "."
        else:
            chunks.append(current_chunk.strip())
            current_chunk = sentence + "."
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
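
`split_text` is not called anywhere else in this notebook. As a minimal sketch of how it could be used to break a long description into smaller pieces, the variant below takes the chunk size as an argument (the `max_chunk_size` parameter is an assumption added for illustration; the original hard-codes 2048 characters) and runs on a made-up sample string.

```python
def split_text(text, max_chunk_size=2048):
    """Group sentences into chunks of roughly max_chunk_size characters or fewer."""
    chunks = []
    current_chunk = ""
    for sentence in text.split("."):
        if len(current_chunk) + len(sentence) < max_chunk_size:
            # The sentence still fits: append it to the current chunk.
            current_chunk += sentence + "."
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current_chunk.strip())
            current_chunk = sentence + "."
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks


# Made-up sample text; a small limit forces one sentence per chunk.
sample = "First sentence. Second sentence. Third sentence"
chunks = split_text(sample, max_chunk_size=20)
print(chunks)
```

A single sentence longer than the limit is kept whole, so the limit is a target rather than a hard guarantee.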

In [7]:
def generate_challenge_headline(text):
    prompt=(
        "Please generate five headlines that have a maximum ten words from the following "
        "challenge description. The headline must summarize the goal of the challenge. "
        f"Description: \n{text}"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1024,
        temperature=0.5
    )
    return response['choices'][0]['message']['content']
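
`openai.ChatCompletion.create` is the interface of `openai` package versions before 1.0. As a hedged sketch of the same request in the 1.x client style, the function below takes the client object as a parameter so that it can be exercised without network access. The names `build_prompt` and `generate_challenge_headline_v1` are introduced here for illustration and are not part of the original notebook.

```python
def build_prompt(text):
    """Build the same prompt as the cell above."""
    return (
        "Please generate five headlines that have a maximum ten words from the following "
        "challenge description. The headline must summarize the goal of the challenge. "
        f"Description: \n{text}"
    )


def generate_challenge_headline_v1(text, client):
    """Request headlines through an openai>=1.0 client object."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": build_prompt(text)},
        ],
        max_tokens=1024,
        temperature=0.5,
    )
    # In the 1.x client the response is an object, so fields are read as attributes.
    return response.choices[0].message.content


# Usage (assumes openai>=1.0 is installed and OPENAI_API_KEY is set):
# from openai import OpenAI
# result = generate_challenge_headline_v1(challenge.description, OpenAI())
```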

In [8]:
challenge = challenges[0]
result = generate_challenge_headline(challenge.description)
pprint(result)


('1. NIDDK Data Centric Challenge: Enhancing Repository for AI Research\n'
 '2. Unlocking Insights: NIDDK Challenge Improves Data Quality for AI\n'
 "3. NIDDK-CR's Data Centric Challenge: Advancing AI-driven Discovery\n"
 '4. Bridging the Gap: NIDDK Challenge Boosts Data Collaboration for AI\n'
 '5. NIDDK Repository Challenge: Making Research Data FAIR for AI')


In [28]:
from itertools import compress
import json

raw_headlines = result.splitlines()

def is_raw_headline(raw_headline):
    prefixes = ("1. ", "2. ", "3. ", "4. ", "5. ")
    return raw_headline.startswith(prefixes)

headlines = list(compress(raw_headlines, map(is_raw_headline, raw_headlines)))

obj = {
    "id": challenge.id,
    "slug": challenge.slug,
    "name": challenge.name,
    "headline": challenge.headline,
    "headline_alternatives": headlines
}
json_str = json.dumps(obj, indent=2)

print(json_str)

{
  "id": 279,
  "slug": "niddk-central-repository-data-centric-challenge",
  "name": "NIDDK Central Repository Data-Centric Challenge",
  "headline": "Enhancing NIDDK datasets for future Artificial Intelligence (AI) applications.",
  "headline_alternatives": [
    "1. NIDDK Data Centric Challenge: Enhancing Repository for AI Research",
    "2. Unlocking Insights: NIDDK Challenge Improves Data Quality for AI",
    "3. NIDDK-CR's Data Centric Challenge: Advancing AI-driven Discovery",
    "4. Bridging the Gap: NIDDK Challenge Boosts Data Collaboration for AI",
    "5. NIDDK Repository Challenge: Making Research Data FAIR for AI"
  ]
}
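
The `headline_alternatives` above keep their `1. ` to `5. ` prefixes. If the numbering is not wanted in the stored values, it could be stripped while filtering. The sketch below is one possible approach, assuming the model returns one headline per line behind a numeric marker; `extract_headlines` is a hypothetical helper name, and the sample input is copied from the output above.

```python
import re

# Matches a leading list marker such as "1. " or "2) " and captures the rest of the line.
# The marker format is an assumption about the model's output.
NUMBERED_LINE = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$")


def extract_headlines(raw_text):
    """Return the headline texts from a numbered list, without the numbering."""
    headlines = []
    for line in raw_text.splitlines():
        match = NUMBERED_LINE.match(line)
        if match:
            headlines.append(match.group(1))
    return headlines


raw = (
    "1. NIDDK Data Centric Challenge: Enhancing Repository for AI Research\n"
    "2. Unlocking Insights: NIDDK Challenge Improves Data Quality for AI\n"
)
print(extract_headlines(raw))
```

Lines that do not start with a numeric marker, such as blank lines or an introductory sentence, are skipped.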


## Generate challenge headlines with Amazon Bedrock

### Configure Bedrock client

In [5]:
import json
import os
import sys

import boto3
import botocore

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
os.environ["AWS_PROFILE"] = "cnb"

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

Create new client
  Using region: us-east-1
  Using profile: cnb
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)


### Configure base model options

In [6]:
from langchain.llms.bedrock import Bedrock

inference_modifier = {
    "max_tokens_to_sample": 6000,
    "temperature": 0.6,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": ["\n\nHuman"],
}

textgen_llm = Bedrock(
    model_id="anthropic.claude-v2",
    client=boto3_bedrock,
    model_kwargs=inference_modifier,
)


### Call API and output results

In [9]:
def generate_challenge_headline(text):
    prompt = (
        "Please generate five headlines that have a maximum ten words from the following "
        "challenge description. The headline must summarize the goal of the challenge. "
        f"Description: \n{text}"
    )
    # Reuse the textgen_llm instance configured above instead of building a new one per call.
    return textgen_llm(prompt)

Authenticate with AWS using the command:

```console
aws --profile cnb sso login
```

In [10]:
challenge = challenges[0]
result = generate_challenge_headline(challenge.description)
pprint(result)


(' Here are 5 headlines with a maximum of 10 words summarizing the goal of the '
 'challenge:\n'
 '\n'
 '1. Challenge Seeks to Standardize Data for AI Discovery\n'
 '\n'
 '2. Improve Data Quality for AI Research, Says NIDDK Challenge \n'
 '\n'
 '3. NIDDK Launches Data Challenge to Boost AI Reuse\n'
 '\n'
 '4. Challenge Aims to Ready Data for AI Insights\n'
 '\n'
 '5. Data Challenge Targets Interoperability for AI')
