In [1]:
import warnings
warnings.filterwarnings('ignore')
import os

import tool_pool as tp

from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.memory import ChatMessageHistory

from IPython.display import Markdown, display



The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /c1/yumin/.cache/huggingface/token
Login successful


An OpenAI API key is required to run this notebook.

In [3]:
openai_key = 'sk-' # Your openai API key here
# openai_key = os.getenv("OPENAI_API_KEY")
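The cell above shows two ways of supplying the key: hard-coding it or reading it from the environment. A minimal sketch of combining the two is given below. The helper name `resolve_openai_key` and the treatment of the bare `'sk-'` string as an unfilled placeholder are assumptions made for illustration, not part of the notebook.

```python
import os


def resolve_openai_key(explicit_key=None, env=os.environ):
    """Return an explicitly supplied key, else fall back to OPENAI_API_KEY.

    Hypothetical helper: the bare 'sk-' string is treated as a placeholder
    that has not been filled in yet.
    """
    placeholder = "sk-"
    if explicit_key and explicit_key != placeholder:
        return explicit_key
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY or pass a key explicitly.")
    return key
```

Since `ChatOpenAI` reads `OPENAI_API_KEY` from the environment by default, one option is to export the resolved value with `os.environ["OPENAI_API_KEY"] = resolve_openai_key(openai_key)` before the model is created.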


### Tool List
loadpaper : Retrieves the section list from the paper's HTML, then works by running the tool once more  
loadpaper_wo_figure : loadpaper without the figure feature  
loadpaper_wo_figure_wo_section : loadpaper without the figure and section features  
recommendpaper : Finds related papers. Works in two modes, citation and reference  
code_matching : Finds related code in the paper's GitHub repository  
  
You can customize the tools directly in tool_pool.py.

In [4]:
# tp.loadpaper, tp.recommendpaper, tp.code_matching, tp.loadpaper_wo_figure, tp.loadpaper_wo_figure_wo_section

# default tools
# tools = [tp.loadpaper, tp.recommendpaper, tp.code_matching]

# light version. Customize it as you wish!
# tools = [tp.loadpaper_wo_figure_wo_section, tp.recommendpaper]

# Recommended tool combination.
tools = [tp.loadpaper_wo_figure, tp.recommendpaper]

# load Agent prompt
prompt = hub.pull("hwchase17/openai-tools-agent")

### Select the base model

In [5]:
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

### Real Use

In [6]:
class PaperAssistant:
    """Tool-calling agent with chat memory for asking questions about papers."""

    def __init__(self, llm, verbose):
        self.memory = None
        self.llm = llm
        self.verbose = verbose

        self.agent_executor = self.build_agent_executor()
        self.agent_with_chat_history = self.build_agent_with_chat_history()

    def build_agent_executor(self):
        # Construct the OpenAI Tools agent from the module-level `tools` and `prompt`
        agent = create_openai_tools_agent(self.llm, tools, prompt)
        # Create an agent executor by passing in the agent and tools
        agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=self.verbose)
        return agent_executor

    def build_agent_with_chat_history(self):
        memory = ChatMessageHistory(session_id="test-session")
        # Keep a handle on the current history so it can be inspected later
        self.memory = memory

        agent_with_chat_history = RunnableWithMessageHistory(
            self.agent_executor,
            # This is needed because in most real world scenarios, a session id is needed
            # It isn't really used here because we are using a simple in memory ChatMessageHistory
            lambda session_id: memory,
            input_messages_key="input",
            history_messages_key="chat_history",
        )
        return agent_with_chat_history

    def reset_memory(self, session_id='temp'):
        # Rebuilding the wrapper creates a fresh, empty history; session_id is unused
        self.agent_with_chat_history = self.build_agent_with_chat_history()

    def send_message(self, message, session_id='temp'):
        output = self.agent_with_chat_history.invoke(
            {"input": message},
            config={"configurable": {"session_id": session_id}},
        )
        # Render the agent's final answer as Markdown in the notebook
        display(Markdown(output['output']))

    def __call__(self, message):
        self.send_message(message)
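In the class above, the factory passed to `RunnableWithMessageHistory` ignores `session_id` and always returns the same history, so every session shares one memory and `reset_memory` has to rebuild the wrapper. A per-session variant could key histories by session id instead. The sketch below only illustrates that lookup pattern: `SessionHistoryStore` is a hypothetical name, and plain lists stand in for `ChatMessageHistory` objects so that the example runs without LangChain.

```python
from collections import defaultdict


class SessionHistoryStore:
    """Hypothetical per-session store; lists stand in for chat histories."""

    def __init__(self):
        # Each unseen session id lazily gets its own empty history
        self._histories = defaultdict(list)

    def get(self, session_id):
        # Same signature as a session-history factory: session_id -> history
        return self._histories[session_id]

    def reset(self, session_id):
        # Drop only the requested session; other sessions are untouched
        self._histories.pop(session_id, None)


store = SessionHistoryStore()
store.get("paper-a").append("first question about paper A")
store.get("paper-b").append("first question about paper B")
store.reset("paper-a")
print(store.get("paper-a"), store.get("paper-b"))
```

With a store like this, a method bound to the store would take the place of the `lambda`, and resetting one conversation would not require rebuilding the agent wrapper. Whether this is worth it depends on whether the notebook ever needs more than one conversation at a time.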


In [7]:
# Set verbose=False to hide the tool execution trace.
assistant = PaperAssistant(llm, verbose=True)

In [8]:
assistant('Explain positional encoding in "Attention is all you need", arxiv id 1706.03762')



> Entering new AgentExecutor chain...

Invoking: `loadpaper` with `{'title': 'Attention is all you need', 'arxiv_id': '1706.03762'}`


Here is the title and section of the paper in HTML
title
Attention Is All You Need
sections
          Abstract
  1 Introduction
  2 Background
  3 Model Architecture
    3.1 Encoder and Decoder Stacks
        Encoder:
        Decoder:
    3.2 Attention
      3.2.1 Scaled Dot-Product Attention
      3.2.2 Multi-Head Attention
      3.2.3 Applications of Attention in our Model
    3.3 Position-wise Feed-Forward Networks
    3.4 Embeddings and Softmax
    3.5 Positional Encoding
  4 Why Self-Attention
  5 Training
    5.1 Training Data and Batching
    5.2 Hardware and Schedule
    5.3 Optimizer
    5.4 Regularization
        Residual Dropout
        Label Smoothing
  6 Results
    6.1 Machine Translation
    6.2 Model Variations
    6.3 English Constituency Parsing
  7 Conclusion
        Acknowledgements
  Reference

The "Attention is All You Need" paper explains positional encoding as follows:

### Positional Encoding

Because the model contains no recurrence and no convolution, information about the relative or absolute position of the tokens must be injected so that the model can make use of the order of the sequence. To this end, "positional encodings" are added to the input embeddings of the encoder and decoder stacks. The positional encodings have the same dimension \(d_{\text{model}}\) as the embeddings, so the two can be summed. There are several choices of positional encodings, including learned and fixed ones.

The paper uses sine and cosine functions of different frequencies:

\[ PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \]

\[ PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \]

Here \(pos\) is the position and \(i\) is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths increase geometrically from \(2\pi\) to \(10000 \cdot 2\pi\). This function was chosen so that the model can easily learn to attend by relative positions, because for any fixed offset \(k\), \(PE_{pos+k}\) can be represented as a linear function of \(PE_{pos}\).

Experiments with learned positional embeddings were also carried out, and the two versions were found to produce nearly identical results (see Table 3, row (E)). We chose the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than those encountered during training.
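The two sinusoidal formulas above can be sketched in NumPy as follows. This is an illustrative sketch rather than code from the paper or from this notebook: the function name `sinusoidal_positional_encoding` and the argument `max_len` are assumptions, and `d_model` is assumed to be even.

```python
import numpy as np


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) table of sinusoidal positional encodings."""
    positions = np.arange(max_len)[:, None]        # pos, shape (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]      # the values 2i, shape (1, d_model / 2)
    angles = positions / np.power(10000.0, two_i / d_model)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions: PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions:  PE(pos, 2i+1)
    return pe


pe = sinusoidal_positional_encoding(max_len=50, d_model=512)
print(pe.shape)
```

The resulting table would be added element-wise to the token embeddings, which is why it shares the dimension \(d_{\text{model}}\) with them.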

In [19]:
assistant("Summarize each component of the paper's Method section")



> Entering new AgentExecutor chain...

Invoking: `loadpaper` with `{'title': 'Attention is all you need', 'arxiv_id': '1706.03762'}`


Here is the title and section of the paper in HTML
title
Attention Is All You Need
sections
          Abstract
  1 Introduction
  2 Background
  3 Model Architecture
    3.1 Encoder and Decoder Stacks
        Encoder:
        Decoder:
    3.2 Attention
      3.2.1 Scaled Dot-Product Attention
      3.2.2 Multi-Head Attention
      3.2.3 Applications of Attention in our Model
    3.3 Position-wise Feed-Forward Networks
    3.4 Embeddings and Softmax
    3.5 Positional Encoding
  4 Why Self-Attention
  5 Training
    5.1 Training Data and Batching
    5.2 Hardware and Schedule
    5.3 Optimizer
    5.4 Regularization
        Residual Dropout
        Label Smoothing
  6 Results
    6.1 Machine Translation
    6.2 Model Variations
    6.3 English Constituency Parsing
  7 Conclusion
        Acknowledgements
  Reference

The "Method" section of the paper "Attention is All You Need" is detailed under the "Model Architecture" section. Here is a summary of each element:

### 3 Model Architecture

#### 3.1 Encoder and Decoder Stacks

- **Encoder**: 
  - Composed of 6 identical layers.
  - Each layer has two sub-layers: a multi-head self-attention mechanism and a position-wise fully connected feed-forward network.
  - Residual connections and layer normalization are applied around each sub-layer.
  - All sub-layers and embedding layers produce outputs of dimension \( d_{\text{model}} = 512 \).

- **Decoder**: 
  - Also composed of 6 identical layers.
  - Each layer has an additional third sub-layer for multi-head attention over the encoder's output.
  - Similar to the encoder, residual connections and layer normalization are applied.
  - Self-attention sub-layer is modified to prevent positions from attending to subsequent positions, ensuring auto-regressive property.

#### 3.2 Attention

- **Attention Function**: 
  - Maps a query and a set of key-value pairs to an output.
  - Output is a weighted sum of the values, with weights computed by a compatibility function of the query with the corresponding key.

##### 3.2.1 Scaled Dot-Product Attention

- Computes dot products of the query with all keys, divides by \(\sqrt{d_k}\), and applies a softmax function to obtain weights on the values.
- Efficiently implemented using matrix multiplication.

##### 3.2.2 Multi-Head Attention

- Projects queries, keys, and values multiple times with different learned linear projections.
- Performs attention function in parallel on these projections.
- Concatenates and projects the results to form the final values.
- Allows the model to attend to information from different representation subspaces at different positions.
- Uses 8 parallel attention layers (heads), each with \( d_k = d_v = d_{\text{model}} / h = 64 \).

##### 3.2.3 Applications of Attention in the Model

- **Encoder-Decoder Attention**: Queries from the decoder, keys, and values from the encoder.
- **Self-Attention in Encoder**: Keys, values, and queries from the same layer output.
- **Self-Attention in Decoder**: Similar to encoder but prevents leftward information flow to maintain auto-regressive property.

#### 3.3 Position-wise Feed-Forward Networks

- Each layer in the encoder and decoder contains a fully connected feed-forward network.
- Applied to each position separately and identically.
- Consists of two linear transformations with a ReLU activation in between.
- Input and output dimensionality is \( d_{\text{model}} = 512 \), and inner-layer dimensionality is \( d_{ff} = 2048 \).

#### 3.4 Embeddings and Softmax

- Uses learned embeddings to convert input and output tokens to vectors of dimension \( d_{\text{model}} \).
- Uses a learned linear transformation and softmax function to convert decoder output to predicted next-token probabilities.
- Shares the same weight matrix between embedding layers and pre-softmax linear transformation, scaled by \( \sqrt{d_{\text{model}}} \).

#### 3.5 Positional Encoding

- Adds positional encodings to input embeddings to incorporate sequence order.
- Positional encodings have the same dimension \( d_{\text{model}} \) as embeddings.
- Uses sine and cosine functions of different frequencies for encoding.
- Allows the model to learn to attend by relative positions.
- Experimented with learned positional embeddings but found sinusoidal version to be equally effective and potentially better for extrapolation to longer sequences.

This summary encapsulates the key components and mechanisms of the Transformer model as described in the "Attention is All You Need" paper.
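The scaled dot-product attention step summarized under 3.2.1 above, together with the decoder masking mentioned under 3.1, can be sketched in NumPy as follows. This is a single-head, unbatched illustration under stated assumptions: the function name, the `causal` flag, and the 2-D input shapes are choices made here, not code from the paper or from this notebook.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, causal=False):
    """Single-head attention for q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                # query-key compatibility, (n_q, n_k)

    if causal:
        # Block attention to subsequent positions (column index > row index)
        n_q, n_k = scores.shape
        future = np.triu(np.ones((n_q, n_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights                    # weighted sum of the values


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output, weights = scaled_dot_product_attention(x, x, x, causal=True)
print(output.shape, weights.shape)
```

Multi-head attention, as summarized under 3.2.2, would apply this function to several learned linear projections of the queries, keys, and values and then concatenate the results; those projections are omitted here to keep the sketch short.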

In [20]:
# Clear the conversation memory with reset_memory().
assistant.reset_memory()

In [21]:
assistant("Summarize each component of the paper's Method section")



> Entering new AgentExecutor chain...
If you tell me the title of the paper, I will load its Method section and summarize it.

> Finished chain.


If you tell me the title of the paper, I will load its Method section and summarize it.

In [22]:
assistant.reset_memory()

In [24]:
# If a Semantic Scholar error occurs, simply run this cell again.
assistant('Recommend follow-up papers to PROMPTAGENT: STRATEGIC PLANNING WITH LANGUAGE MODELS ENABLES EXPERT-LEVEL PROMPT OPTIMIZATION')



> Entering new AgentExecutor chain...

Invoking: `recommendpaper` with `{'query': 'PROMPTAGENT: STRATEGIC PLANNING WITH LANGUAGE MODELS ENABLES EXPERT-LEVEL PROMPT OPTIMIZATION', 'rec_type': 'citation'}`


[{'title': 'Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization', 'abstract': "Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers necessitate manually crafted prompts to inform task rules and regulate LLM behaviors, inherently incapacitating to address complex dynamic scenarios e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and prog

Here are some recent papers that cite "PROMPTAGENT: STRATEGIC PLANNING WITH LANGUAGE MODELS ENABLES EXPERT-LEVEL PROMPT OPTIMIZATION":

1. **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization**
   - **Abstract**: Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers necessitate manually crafted prompts to inform task rules and regulate LLM behaviors, inherently incapacitating to address complex dynamic scenarios e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, fine-tuning its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games: Blackjack and Texas Hold'em, outperforming vanilla LLM and specialized models. Our results show Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.
   - **Publication Date**: 2024-02-27
   - **Influential Citation Count**: 1

2. **RoT: Enhancing Large Language Models with Reflection on Search Trees**
   - **Abstract**: Large language models (LLMs) have demonstrated impressive capability in reasoning and planning when integrated with tree-search-based prompting methods. However, since these methods ignore the previous search experiences, they often make the same mistakes in the search process. To address this issue, we introduce Reflection on search Trees (RoT), an LLM reflection framework designed to improve the performance of tree-search-based prompting methods. It uses a strong LLM to summarize guidelines from previous tree search experiences to enhance the ability of a weak LLM. The guidelines are instructions about solving this task through tree search which can prevent the weak LLMs from making similar mistakes in the past search process. In addition, we proposed a novel state selection method, which identifies the critical information from historical search processes to help RoT generate more specific and meaningful guidelines. In our extensive experiments, we find that RoT significantly improves the performance of LLMs in reasoning or planning tasks with various tree-search-based prompting methods (e.g., BFS and MCTS). Non-tree-search-based prompting methods such as Chain-of-Thought (CoT) can also benefit from RoT guidelines since RoT can provide task-specific knowledge collected from the search experience.
   - **Publication Date**: 2024-04-08
   - **Influential Citation Count**: 1

3. **Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments**
   - **Abstract**: Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In this work, we first reveal that the predictive preference of LLMs can be highly brittle and skewed, even with semantically equivalent instructions. We find that fairer predictive preferences from LLMs consistently lead to judgments that are better aligned with humans. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO, which aims to produce fairer preference decisions and improve the alignment of LLM evaluators with human judgments. To this end, we propose a zero-shot learning objective based on the preference decision fairness. ZEPO demonstrates substantial performance improvements over state-of-the-art LLM evaluators, without requiring labeled data, on representative meta-evaluation benchmarks. Our findings underscore the critical correlation between preference fairness and human alignment, positioning ZEPO as an efficient prompt optimizer for bridging the gap between LLM evaluators and human judgments.
   - **Publication Date**: 2024-06-17
   - **Influential Citation Count**: 0

4. **Dual-Phase Accelerated Prompt Optimization**
   - **Abstract**: Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfactory performance. In this light, we aim to accelerate prompt optimization process to tackle the challenge of low convergence rate. We propose a dual-phase approach which starts with generating high-quality initial prompts by adopting a well-designed meta-instruction to delve into task-specific information, and iteratively optimize the prompts at the sentence level, leveraging previous tuning experience to expand prompt candidates and accept effective ones. Extensive experiments on eight datasets demonstrate the effectiveness of our proposed method, achieving a consistent accuracy gain over baselines with less than five optimization steps.
   - **Publication Date**: 2024-06-19
   - **Influential Citation Count**: 0

5. **Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows**
   - **Abstract**: We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. We propose an end-to-end optimization framework, Trace, which treats the computational workflow of an AI system as a graph akin to neural networks, based on a generalization of back-propagation. Optimization of computational workflows often involves rich feedback (e.g. console output or user's responses), heterogeneous parameters (e.g. prompts, hyper-parameters, codes), and intricate objectives (beyond maximizing a score). Moreover, its computation graph can change dynamically with the inputs and parameters. We frame a new mathematical setup of iterative optimization, Optimization with Trace Oracle (OPTO), to capture and abstract these properties so as to design optimizers that work across many domains. In OPTO, an optimizer receives an execution trace along with feedback on the computed output and updates parameters iteratively. Trace is the tool to implement OPTO in practice. Trace has a Python interface that efficiently converts a computational workflow into an OPTO instance using a PyTorch-like interface. Using Trace, we develop a general-purpose LLM-based optimizer called OptoPrime that can effectively solve OPTO problems. In empirical studies, we find that OptoPrime is capable of first-order numerical optimization, prompt optimization, hyper-parameter tuning, robot controller design, code debugging, etc., and is often competitive with specialized optimizers for each domain. We believe that Trace, OptoPrime and the OPTO framework will enable the next generation of interactive agents that automatically adapt using various kinds of feedback.
   - **Publication Date**: 2024-06-23
   - **Influential Citation Count**: 0

These papers build upon the concepts introduced in "PROMPTAGENT" and explore various aspects of prompt optimization, reflection, and learning in large language models.
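The note in the cell above says to re-run the cell when a Semantic Scholar error occurs. One way to automate that is a small retry wrapper, sketched below. The name `call_with_retry` and its parameters are hypothetical, and because the exact exception raised by the tool is not shown in this notebook, the sketch catches `Exception` broadly.

```python
import time


def call_with_retry(fn, *args, retries=3, delay_s=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying up to `retries` times on any exception."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as err:  # assumption: the concrete error type is unknown
            last_error = err
            if attempt < retries:
                # Wait a little longer after each failed attempt
                sleep(delay_s * attempt)
    raise last_error
```

In this notebook it could be used as `call_with_retry(assistant, '...')`. Each retry sends the request to the model again, so a small `retries` value keeps API usage bounded.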

In [55]:
assistant.reset_memory()

In [56]:
assistant('Summarize the abstract of the paper A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems, arxiv id 2402.18013')



> Entering new AgentExecutor chain...

Invoking: `loadpaper` with `{'title': 'A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems', 'arxiv_id': '2402.18013'}`


Here is the title and section of the paper in HTML
title
A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems
sections
          Abstract.
  1. INTRODUCTION
    1.1. What is Multi-turn Dialogue System?
    1.2. Why a Survey on LLM-based Multi-turn Dialogue System?
    1.3. Contribution of this Survey
  2. GENERAL METHODS
    2.1. Decoder-only Transformer Architecture
      2.1.1. Causal Decoder
        GPT Series
          GPT-1.
          GPT-2.
          GPT-3.
          GPT-3.5.
          GPT-4.
          ChatGPT.
          GPTs.
        LLAMA Series
          LLAMA.
          LLAMA2.
          LLAMA2 CHAT.
          CODE LLAMA.
      2.1.2. Prefix Decoder
        GLM
        ChatGLM series
          ChatGLM-6B.
          ChatGLM2-6B.
          ChatGLM3

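The abstract of this paper is as follows:

This survey provides a comprehensive review of research on multi-turn dialogue systems, with a particular focus on multi-turn dialogue systems based on large language models (LLMs). The paper aims to:
(a) summarize existing LLMs and approaches for adapting LLMs to downstream tasks;
(b) elaborate on recent advances, datasets, and evaluation metrics for multi-turn dialogue systems, including LLM-based open-domain dialogue (ODD) and task-oriented dialogue (TOD) systems;
(c) discuss future emphasis and recent research problems arising from the development of LLMs and the increasing demands on multi-turn dialogue systems.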

In [None]:
assistant('')