In [1]:
import warnings
warnings.filterwarnings('ignore')
import os

import tool_pool as tp

from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.memory import ChatMessageHistory

from IPython.display import Markdown, display





Running this notebook requires a Semantic Scholar API key and an OpenAI API key.

In [2]:
api_key = os.getenv("SEMANTIC_SCHOLAR_API_KEY")
openai_key = os.getenv("OPENAI_API_KEY")
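The cell above reads both keys with `os.getenv`, which silently returns `None` when a variable is unset, so a missing key only surfaces later inside a tool call. A minimal sketch of failing early instead, assuming the same two environment variable names; `missing_keys` and `require_keys` are hypothetical helpers, not part of `tool_pool`.

```python
import os

# Environment variable names taken from the notebook cell above.
REQUIRED_KEYS = ("SEMANTIC_SCHOLAR_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the names of required API keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


def require_keys(env=None):
    """Raise a readable error up front instead of failing inside a tool call."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing API key(s): " + ", ".join(missing))
```

Calling `require_keys()` right after this cell would stop the notebook before the agent is built if either key is absent.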

### Tool List
loadpaper : loads a paper from its HTML version; it first returns the section list, and the tool is then run once more to fetch the chosen section  
loadpaper_wo_figure : loadpaper without the figure feature  
loadpaper_wo_figure_wo_section : loadpaper without the figure and section features  
recommendpaper : finds related papers; works in two modes, citation and reference  
codematching : finds the relevant code in the paper's GitHub repository  
  
You can customize the tools directly in tool_pool.py.
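The two `recommendpaper` modes correspond naturally to the Semantic Scholar Graph API's `/paper/{paper_id}/citations` (papers that cite the given one) and `/paper/{paper_id}/references` (papers it cites) endpoints. The sketch below only builds such a request without sending it; it assumes a Semantic Scholar paper ID has already been resolved from the title (for example with the Graph API's paper search endpoint). The `ENDPOINTS` mapping and `build_recommend_request` are hypothetical and may not match how `tool_pool.py` actually works; only the `rec_type` and `rec_num` names come from the agent log below.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Semantic Scholar Graph API base URL for paper lookups.
S2_GRAPH = "https://api.semanticscholar.org/graph/v1/paper"

# Hypothetical mapping from recommendpaper's rec_type to a Graph API sub-resource.
ENDPOINTS = {"citation": "citations", "reference": "references"}


def build_recommend_request(paper_id, rec_type="citation", rec_num=5, api_key=None):
    """Build (but do not send) a request for papers citing or cited by paper_id."""
    if rec_type not in ENDPOINTS:
        raise ValueError(f"rec_type must be one of {sorted(ENDPOINTS)}")
    params = urlencode({"fields": "title,abstract,publicationDate", "limit": rec_num})
    url = f"{S2_GRAPH}/{paper_id}/{ENDPOINTS[rec_type]}?{params}"
    # The API key, if any, is sent in the x-api-key header.
    headers = {"x-api-key": api_key} if api_key else {}
    return Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` would perform the actual call; keeping request construction separate makes the mode switch easy to inspect without network access.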

In [3]:
# tools = [tp.loadpaper, tp.recommendpaper, tp.codematching, tp.loadpaper_wo_figure, tp.loadpaper_wo_figure_wo_section]
tools = [tp.loadpaper_wo_figure_wo_section, tp.recommendpaper]
# Load the agent prompt from LangChain Hub
prompt = hub.pull("hwchase17/openai-tools-agent")

### Select the base model

In [4]:
llm = ChatOpenAI(model="gpt-4o", temperature=1)
# llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=1)

### Usage

In [5]:
class PaperAssistant:
    def __init__(self, llm, verbose):
        self.llm = llm
        self.verbose = verbose

        self.agent_executor = self.build_agent_executor()
        self.agent_with_chat_history = self.build_agent_with_chat_history()

    def build_agent_executor(self):
        # Construct the OpenAI Tools agent
        agent = create_openai_tools_agent(self.llm, tools, prompt)
        # Create an agent executor by passing in the agent and tools
        agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=self.verbose)
        return agent_executor

    def build_agent_with_chat_history(self):
        # A fresh in-memory history; every session id maps to this same object
        memory = ChatMessageHistory(session_id="test-session")

        agent_with_chat_history = RunnableWithMessageHistory(
            self.agent_executor,
            # This is needed because in most real world scenarios, a session id is needed
            # It isn't really used here because we are using a simple in memory ChatMessageHistory
            lambda session_id: memory,
            input_messages_key="input",
            history_messages_key="chat_history",
        )
        return agent_with_chat_history

    def reset_memory(self):
        # Rebuilding the wrapper discards the old history and starts an empty one
        self.agent_with_chat_history = self.build_agent_with_chat_history()

    def send_message(self, message, session_id='temp'):
        output = self.agent_with_chat_history.invoke(
            {"input": message},
            config={"configurable": {"session_id": session_id}},
        )
        # Render the agent's final answer as Markdown in the notebook
        display(Markdown(output['output']))

    def __call__(self, message):
        self.send_message(message)


In [6]:
# Set verbose=False to hide the tool execution steps
assistant = PaperAssistant(llm, verbose=True)
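In `PaperAssistant`, the history factory `lambda session_id: memory` ignores its argument, so every session shares one history and `reset_memory` works by rebuilding the whole wrapper. If separate histories per session were wanted, one option is a store keyed by session id whose `get_history` method is passed in place of the lambda. The `SessionStore` class below is a hypothetical sketch using plain lists as stand-ins; with `RunnableWithMessageHistory` the stored values would need to be `ChatMessageHistory` objects instead.

```python
from collections import defaultdict


class SessionStore:
    """Hypothetical per-session history store (lists stand in for ChatMessageHistory)."""

    def __init__(self):
        # Each new session id gets its own empty history on first access.
        self._histories = defaultdict(list)

    def get_history(self, session_id):
        # Same shape as the session_id -> history factory used above.
        return self._histories[session_id]

    def reset(self, session_id=None):
        # Clear one session, or all sessions when no id is given.
        if session_id is None:
            self._histories.clear()
        else:
            self._histories.pop(session_id, None)
```

With this design, resetting one conversation no longer requires rebuilding the agent wrapper.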

In [7]:
assistant('Explain positional encoding in "Attention is all you need"')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m"Attention is All You Need"는 2017년에 Vaswani et al.가 발표한 논문으로, Transformer라는 모델을 제안합니다. 이 모델은 자연어 처리(NLP)와 관련된 여러 작업에 혁신적 변화를 가져왔습니다. Transformer의 핵심 구성 요소 중 하나는 포지셔널 인코딩(Positional Encoding)입니다.

### 포지셔널 인코딩(Positional Encoding)

포지셔널 인코딩은 Transformer 모델이 입력 시퀀스의 순서를 인식할 수 있도록 하는 중요한 구성 요소입니다. RNN(Recurrent Neural Networks)이나 LSTM(Long Short-Term Memory) 같은 기존의 순환 신경망은 순차적인 특성 덕분에 입력 데이터의 순서를 자연스럽게 인코딩할 수 있습니다. 하지만, Transformer는 순환 구조가 없기 때문에 순서에 대한 정보를 명시적으로 주입해 줄 필요가 있습니다.

포지셔널 인코딩은 이러한 역할을 수행합니다. 아래는 포지셔널 인코딩의 세부적인 설명입니다:

1. **사인과 코사인 함수 사용**:
   Transformer는 다양한 주파수의 사인(sin) 및 코사인(cos) 함수를 사용하여 포지셔널 인코딩을 생성합니다. 위치 \( pos \)와 차원 \( i \)에 대한 포지셔널 인코딩은 다음과 같은 공식으로 정의됩니다:

   \[
   PE(pos, 2i) = \sin \left(\frac{pos}{10000^{\frac{2i}{d}}}\right)
   \]
   \[
   PE(pos, 2i+1) = \cos \left(\frac{pos}{10000^{\frac{2i}{d}}}\right)
   \]

   여기서 \( pos \)는 위치 인덱스, \( i \)는 차원 인덱스, \( d \)는 모델의 차원입니다. 짝수 위치는 사인 함수를, 홀수 위치는 코사인 함수

"Attention is All You Need"는 2017년에 Vaswani et al.가 발표한 논문으로, Transformer라는 모델을 제안합니다. 이 모델은 자연어 처리(NLP)와 관련된 여러 작업에 혁신적 변화를 가져왔습니다. Transformer의 핵심 구성 요소 중 하나는 포지셔널 인코딩(Positional Encoding)입니다.

### Positional Encoding

Positional encoding is the component that lets the Transformer recognize the order of the input sequence. Recurrent networks such as RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory) process their input one step at a time, so order is built into their computation. The Transformer has no recurrence, so order information must be injected explicitly.

Positional encoding serves exactly this purpose. It works as follows:

1. **Sine and cosine functions**:
   The Transformer builds positional encodings from sine and cosine functions of different frequencies. For position \( pos \) and dimension-pair index \( i \), the encoding is defined as:

   \[
   PE(pos, 2i) = \sin \left(\frac{pos}{10000^{\frac{2i}{d}}}\right)
   \]
   \[
   PE(pos, 2i+1) = \cos \left(\frac{pos}{10000^{\frac{2i}{d}}}\right)
   \]

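   Here \( pos \) is the position index, \( d \) is the model dimension, and \( i \) runs from 0 to \( d/2 - 1 \), so \( 2i \) and \( 2i+1 \) index the embedding dimensions. Even dimensions use the sine and odd dimensions use the cosine.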

2. **Added to the embedding vectors**:
   The resulting positional encodings are added to the word embeddings, which have the same dimension \( d \), so each input vector carries the position of its token.

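3. **A range of frequencies**:
   Because the base 10000 is raised to the power \( 2i/d \), the wavelengths form a geometric progression from \( 2\pi \) to \( 10000 \cdot 2\pi \). Low dimensions oscillate quickly with position while high dimensions change slowly, which gives every position a distinct pattern. The authors chose this sinusoidal form in part because it may let the model extrapolate to sequences longer than those seen during training.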
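The formulas above can be computed directly. This is a minimal plain-Python sketch (no NumPy, assuming an even `d_model`) that fills a `max_len x d_model` table with sines in even dimensions and cosines in odd dimensions; it illustrates the formula and is not code from the paper or from this notebook's tools.

```python
import math


def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding as a max_len x d_model nested list."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(d_model // 2):
            angle = pos / (10000 ** (2 * i / d_model))
            pe[pos][2 * i] = math.sin(angle)      # even dimension: sine
            pe[pos][2 * i + 1] = math.cos(angle)  # odd dimension: cosine
    return pe
```

Each row `pe[pos]` would then be added element-wise to the embedding of the token at position `pos`.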

### Example

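For example, with model dimension \( d = 512 \), the dimension indices run from 0 to 511, so \( i \) ranges from 0 to 255. At \( i = 0 \) the sinusoid has wavelength \( 2\pi \), the highest frequency; as \( i \) approaches 255 the wavelength approaches \( 10000 \cdot 2\pi \), the lowest frequency. Dimension 0 is a sine and dimension 511 is a cosine. Combining sinusoids with these different periods gives every position a distinct encoding vector, which helps the model recognize patterns over both short and long ranges.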
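The arithmetic in this example can be checked quickly. The wavelength of the pair at index \( i \) is \( 2\pi \cdot 10000^{2i/d} \); the sketch below evaluates it for \( d = 512 \) (the helper name `wavelength` is just for illustration).

```python
import math


def wavelength(i, d_model):
    """Wavelength, in positions, of the sinusoid used for dimensions 2i and 2i+1."""
    return 2 * math.pi * 10000 ** (2 * i / d_model)


d_model = 512
pairs = d_model // 2                      # i runs from 0 to 255
shortest = wavelength(0, d_model)         # 2*pi: the highest frequency
longest = wavelength(pairs - 1, d_model)  # just under 10000 * 2*pi: the lowest frequency
```

Since \( 10000^{510/512} \approx 9.6 \times 10^3 \), the longest wavelength is roughly 9,600 times the shortest.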

Positional encoding thus lets the Transformer take token order into account and better capture relationships between words in context.
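The original paper motivates the sinusoidal choice by noting that, for any fixed offset \( k \), \( PE_{pos+k} \) can be represented as a linear function of \( PE_{pos} \), which the authors hypothesized would make it easy to attend by relative position. For each (sin, cos) pair this linear map is a 2x2 rotation that depends only on \( k \) and the pair's frequency. The helper names below are illustrative; the sketch checks the identity numerically for one frequency.

```python
import math


def pe_pair(pos, omega):
    """One (sin, cos) pair of the sinusoidal encoding at angular frequency omega."""
    return math.sin(omega * pos), math.cos(omega * pos)


def shift_pair(pair, omega, k):
    """Map the pair at position pos to the pair at pos + k with a fixed 2x2 rotation."""
    s, c = pair
    ck, sk = math.cos(omega * k), math.sin(omega * k)
    # The matrix [[ck, sk], [-sk, ck]] depends only on k and omega, never on pos.
    return ck * s + sk * c, -sk * s + ck * c
```

This follows from the angle-addition identities \( \sin(a+b) = \sin a \cos b + \cos a \sin b \) and \( \cos(a+b) = \cos a \cos b - \sin a \sin b \).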

Would you like help looking at the paper itself?

In [41]:
assistant("Summarize each component of the paper's Method section")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `loadpaper` with `{'title': 'Learning to Evolve via Policy-Level Reflection and Optimization'}`


[0m[36;1m[1;3mHere is the title and section of the paper in HTML
title
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
sections
          Abstract
  1 Introduction
  2 Problem Definition
  3 Methods
    3.1 Belief-aware Decision-Making Process
    3.2 Policy-Level Reflection
    3.3 DFS-based Policy Evolution
  4 Game: Blackjack
    4.1 Results
    4.2 Analyisis
  5 Game: Limit Texas Hold’em
    5.1 Results
    5.2 Analysis on Learning Process
    5.3 Analysis on Policy Evolution
  6 Conclusion
  Limitations
  References
  Appendix A Experiment Details
    A.1 LLMs
    A.2 Baselines
    A.3 Setups For Blackjack
    A.4 Detailed Setup For Texas Hold’em
  Appendix B Introduction of Two Games
    B.1 Blackjack
    B.2 Limit Texas Hold’em
    B.3 The Challenging of two Games
  Appendix C Compl

It appears that "2309.13399" is not the correct arXiv ID for the paper "Learning to Evolve via Policy-Level Reflection and Optimization"; I could not find an arXiv ID matching the title you provided.

Please double-check the paper title or arXiv ID and send the request again with the correct information.
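As described in the tool list, `loadpaper` first returns a section list like the one in the log above and is then called again for a specific section. The sketch below shows one hypothetical way to pick out a section and its numbered subsections (for example "3 Methods" and 3.1 to 3.3) from such a list before that second call. The actual selection logic lives in `tool_pool.py` and in the agent's own reasoning, so `subsections` is illustrative only.

```python
import re

# Matches numbered lines such as "  3 Methods" or "    3.2 Policy-Level Reflection".
NUMBERED = re.compile(r"^\s*(\d+(?:\.\d+)*)\s+(.+?)\s*$")


def subsections(section_list, parent_title):
    """Return the numbered section titled parent_title followed by its children."""
    entries = [m.groups() for m in map(NUMBERED.match, section_list.splitlines()) if m]
    parent = next((num for num, title in entries if title == parent_title), None)
    if parent is None:
        return []
    return [f"{num} {title}" for num, title in entries
            if num == parent or num.startswith(parent + ".")]
```

Unnumbered entries such as "Abstract" and appendix entries such as "A.1 LLMs" are skipped by the pattern.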

In [42]:
# Clear the conversation memory with reset_memory
assistant.reset_memory()

In [43]:
assistant("Summarize each component of the paper's Method section")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m논문의 Method 섹션 요약 작업을 도와드리기 위해 논문을 확인해야 합니다. 논문 제목이나 Arxiv ID를 알려주시면 논문을 불러온 후, Method 섹션의 요약을 제공할 수 있습니다. 논문 정보를 알려주시겠어요?[0m

[1m> Finished chain.[0m


To summarize the Method section, I first need to look at the paper. If you tell me the paper title or arXiv ID, I can load the paper and then summarize its Method section. Which paper would you like me to use?

In [44]:
assistant.reset_memory()

In [45]:
# If a Semantic Scholar error occurs, simply run this cell again.
assistant('Recommend follow-up papers to PROMPTAGENT: STRATEGIC PLANNING WITH LANGUAGE MODELS ENABLES EXPERT-LEVEL PROMPT OPTIMIZATION')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `recommendpaper` with `{'query': 'Strategic Planning with Language Models Enables Expert-Level Prompt Optimization', 'rec_type': 'citation', 'rec_num': 5}`


[0m[33;1m[1;3m[{'title': 'Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization', 'abstract': "Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers necessitate manually crafted prompts to inform task rules and regulate LLM behaviors, inherently incapacitating to address complex dynamic scenarios e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and pro

Here are some follow-up papers citing "Strategic Planning with Language Models Enables Expert-Level Prompt Optimization":

1. **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization**
   - **Abstract**: Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers necessitate manually crafted prompts to inform task rules and regulate LLM behaviors, inherently incapacitating to address complex dynamic scenarios e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, fine-tuning its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games: Blackjack and Texas Hold'em, outperforming vanilla LLM and specialized models. Our results show Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.
   - **Publication Date**: February 27, 2024

2. **RoT: Enhancing Large Language Models with Reflection on Search Trees**
   - **Abstract**: Large language models (LLMs) have demonstrated impressive capability in reasoning and planning when integrated with tree-search-based prompting methods. However, since these methods ignore the previous search experiences, they often make the same mistakes in the search process. To address this issue, we introduce Reflection on search Trees (RoT), an LLM reflection framework designed to improve the performance of tree-search-based prompting methods. It uses a strong LLM to summarize guidelines from previous tree search experiences to enhance the ability of a weak LLM. The guidelines are instructions about solving this task through tree search which can prevent the weak LLMs from making similar mistakes in the past search process. In addition, we proposed a novel state selection method, which identifies the critical information from historical search processes to help RoT generate more specific and meaningful guidelines. In our extensive experiments, we find that RoT significantly improves the performance of LLMs in reasoning or planning tasks with various tree-search-based prompting methods (e.g., BFS and MCTS). Non-tree-search-based prompting methods such as Chain-of-Thought (CoT) can also benefit from RoT guidelines since RoT can provide task-specific knowledge collected from the search experience.
   - **Publication Date**: April 8, 2024

3. **Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments**
   - **Abstract**: Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In this work, we first reveal that the predictive preference of LLMs can be highly brittle and skewed, even with semantically equivalent instructions. We find that fairer predictive preferences from LLMs consistently lead to judgments that are better aligned with humans. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO, which aims to produce fairer preference decisions and improve the alignment of LLM evaluators with human judgments. To this end, we propose a zero-shot learning objective based on the preference decision fairness. ZEPO demonstrates substantial performance improvements over state-of-the-art LLM evaluators, without requiring labeled data, on representative meta-evaluation benchmarks. Our findings underscore the critical correlation between preference fairness and human alignment, positioning ZEPO as an efficient prompt optimizer for bridging the gap between LLM evaluators and human judgments.
   - **Publication Date**: June 17, 2024

4. **Dual-Phase Accelerated Prompt Optimization**
   - **Abstract**: Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfactory performance. In this light, we aim to accelerate prompt optimization process to tackle the challenge of low convergence rate. We propose a dual-phase approach which starts with generating high-quality initial prompts by adopting a well-designed meta-instruction to delve into task-specific information, and iteratively optimize the prompts at the sentence level, leveraging previous tuning experience to expand prompt candidates and accept effective ones. Extensive experiments on eight datasets demonstrate the effectiveness of our proposed method, achieving a consistent accuracy gain over baselines with less than five optimization steps.
   - **Publication Date**: June 19, 2024

5. **Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows**
   - **Abstract**: We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. We propose an end-to-end optimization framework, Trace, which treats the computational workflow of an AI system as a graph akin to neural networks, based on a generalization of back-propagation. Optimization of computational workflows often involves rich feedback (e.g. console output or user's responses), heterogeneous parameters (e.g. prompts, hyper-parameters, codes), and intricate objectives (beyond maximizing a score). Moreover, its computation graph can change dynamically with the inputs and parameters. We frame a new mathematical setup of iterative optimization, Optimization with Trace Oracle (OPTO), to capture and abstract these properties so as to design optimizers that work across many domains. In OPTO, an optimizer receives an execution trace along with feedback on the computed output and updates parameters iteratively. Trace is the tool to implement OPTO in practice. Trace has a Python interface that efficiently converts a computational workflow into an OPTO instance using a PyTorch-like interface. Using Trace, we develop a general-purpose LLM-based optimizer called OptoPrime that can effectively solve OPTO problems. In empirical studies, we find that OptoPrime is capable of first-order numerical optimization, prompt optimization, hyper-parameter tuning, robot controller design, code debugging, etc., and is often competitive with specialized optimizers for each domain. We believe that Trace, OptoPrime and the OPTO framework will enable the next generation of interactive agents that automatically adapt using various kinds of feedback.
   - **Publication Date**: June 23, 2024
   - **Website**: [Trace](https://microsoft.github.io/Trace)

These papers build on the specified work and extend various aspects of LLMs and prompt optimization.
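The cell above suggests simply rerunning it when Semantic Scholar returns an error (for example, when requests are rate limited). A minimal sketch of automating that retry with exponential backoff follows; `call_with_retry` is a hypothetical helper, and which exception type to pass as `retry_on` depends on how `tool_pool.py` reports Semantic Scholar failures, which is not shown here.

```python
import time


def call_with_retry(fn, *args, retries=3, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn, retrying with exponential backoff on the given exception types."""
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # give up after the last retry
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

For instance, `call_with_retry(assistant, message)` would wrap the call in this cell. Narrowing `retry_on` to the specific error raised by the Semantic Scholar client is safer than retrying on every exception.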