<div align="center" id="top">
 <img src="https://github.com/user-attachments/assets/10ba11e4-4ced-400e-a400-ee0f72541780" alt="julep" width="640" height="320" />
</div>

<p align="center">
  <br />
  <a href="https://docs.julep.ai" rel="dofollow">Explore Docs (wip)</a>
  ·
  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>
  ·
  <a href="https://x.com/julep_ai" rel="dofollow">𝕏</a>
  ·
  <a href="https://www.linkedin.com/company/julep-ai" rel="dofollow">LinkedIn</a>
</p>

<p align="center">
    <a href="https://www.npmjs.com/package/@julep/sdk"><img src="https://img.shields.io/npm/v/%40julep%2Fsdk?style=social&amp;logo=npm&amp;link=https%3A%2F%2Fwww.npmjs.com%2Fpackage%2F%40julep%2Fsdk" alt="NPM Version"></a>
    <span>&nbsp;</span>
    <a href="https://pypi.org/project/julep"><img src="https://img.shields.io/pypi/v/julep?style=social&amp;logo=python&amp;label=PyPI&amp;link=https%3A%2F%2Fpypi.org%2Fproject%2Fjulep" alt="PyPI - Version"></a>
    <span>&nbsp;</span>
    <a href="https://hub.docker.com/u/julepai"><img src="https://img.shields.io/docker/v/julepai/agents-api?sort=semver&amp;style=social&amp;logo=docker&amp;link=https%3A%2F%2Fhub.docker.com%2Fu%2Fjulepai" alt="Docker Image Version"></a>
    <span>&nbsp;</span>
    <a href="https://choosealicense.com/licenses/apache/"><img src="https://img.shields.io/github/license/julep-ai/julep" alt="GitHub License"></a>
</p>

## Task Definition: Develop a Personalized Research Paper Recommendation Assistant

### Overview

This task builds a personal research paper assistant that searches arXiv and recommends papers to a user based on their persona. It then uses the LlamaParse tool to scrape the recommended papers and extract their contents.

### Task Tools:

**brave_search**: An `integration` type tool that searches the web for a given query.

**arxiv_search**: An `integration` type tool that searches arXiv for research papers.

**llama_parse_scrape**: An `integration` type tool that fetches a document from a given URL and extracts its contents.

**get_user_from_id**: A `system` type tool that looks up a user by the `ppid` stored in their metadata.

**list_user_docs**: A `system` type tool that can list the user docs.

**get_user_docs**: A `system` type tool that can get the user docs.

**create_agent_doc**: A `system` type tool that can create an agent doc.

**create_user_doc**: A `system` type tool that can create a user doc.

### Task Input:

**topics**: A list of topics to search for.

**user_id**: The ID of the user to recommend papers for. It is matched against the `ppid` stored in the user's metadata.

### Task Output:

**output**: A list of research papers that are recommended to the user.
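
As a rough illustration of the input shape, the sketch below builds a payload and checks it by hand. It follows the `input_schema` defined later in this notebook, which names the second field `user_id`. The `validate_task_input` helper is hypothetical and not part of the Julep SDK.

```python
import uuid


def validate_task_input(payload: dict) -> list[str]:
    """Return a list of problems found in the payload (empty if it looks valid)."""
    problems = []
    topics = payload.get("topics")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        problems.append("topics must be a list of strings")
    if not isinstance(payload.get("user_id"), str):
        problems.append("user_id must be a string")
    return problems


example_input = {
    "topics": ["Quantum Computing", "LLMs"],
    "user_id": str(uuid.uuid4()),  # placeholder ID for illustration only
}
print(validate_task_input(example_input))
```

Checking the payload locally before creating an execution can surface a malformed input earlier than a failed run would.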

### Task Flow

1. **Brave Search**: The `brave_search` tool is called to search the web for each topic.

2. **Prompt Step**: The prompt step is used to extract keywords from the search results.

3. **Evaluate**: The keyword list is truncated to the first 8 keywords.

4. **Arxiv Search**: The `arxiv_search` tool is called to search arXiv for each keyword, returning up to 2 papers per keyword.

5. **Evaluate**: The arXiv results are stored so that later steps can refer to them.

6. **Get User**: The `get_user_from_id` tool is called to look up the user by the `ppid` in their metadata.

7. **Get User Docs**: The `get_user_docs` tool is called to get the user docs. In this case, we are getting the user persona.

8. **Prompt Step**: The prompt step is used to select the top 5 research papers based on the user persona and the arXiv results.

9. **Evaluate**: The model's reply is parsed into a list of paper titles and URLs.

10. **Llama Parse Scrape**: The `llama_parse_scrape` tool is called to scrape each selected paper.

11. **Evaluate**: The scraped results are converted into the paper contents.

12. **Create User Doc**: The `create_user_doc` tool is called to create a user doc with the paper contents.

```plaintext
+----------------+     +----------------+     +----------------+     +----------------+
|  Brave Search  |     |  Extract       |     |  Evaluate      |     |  Arxiv Search  |
|  (for topics)  | --> |  Keywords      | --> |  Top 8         | --> |  (for keywords)|
|                |     |  (Prompt Step) |     |  Keywords      |     |                |
+----------------+     +----------------+     +----------------+     +----------------+
                                                                             |
+----------------+     +----------------+     +----------------+     +----------------+
|  Select Top 5  |     |  Get User      |     |  Get User      |     |  Evaluate      |
|  Papers        | <-- |  Docs (Persona)| <-- |  from PPID     | <-- |  Arxiv Results |
|  (Prompt Step) |     |                |     |                |     |                |
+----------------+     +----------------+     +----------------+     +----------------+
        |
+----------------+     +----------------+     +----------------+     +----------------+
|  Evaluate      |     |  Llama Parse   |     |  Evaluate      |     |  Create User   |
|  Paper Titles  | --> |  Scrape        | --> |  Paper Contents| --> |  Doc           |
|  and URLs      |     |  (for URLs)    |     |                |     |                |
+----------------+     +----------------+     +----------------+     +----------------+
```



## Implementation

To recreate the notebook and see the code implementation for this task, you can access the Google Colab notebook using the link below:

<a target="_blank" href="https://colab.research.google.com/github/julep-ai/julep/blob/dev/cookbooks/07-personalized-research-assistant.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Additional Information

For more details about the task or if you have any questions, please don't hesitate to contact the author:

**Author:** Julep AI  
**Contact:** [hey@julep.ai](mailto:hey@julep.ai) or  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>

In [1]:
# Global UUID is generated for agent and task
from dotenv import load_dotenv
from julep import Client
import os
import uuid

load_dotenv()

# NOTE: fixed UUIDs let the cells below call the `create_or_update` methods instead of
# the `create` methods, so re-running a cell does not create new resources every time.
AGENT_UUID = uuid.uuid4()
TASK_UUID = uuid.uuid4()
SESSION_UUID = uuid.uuid4()
USER_UUID = uuid.uuid4()
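
The effect of reusing a fixed ID can be seen with a toy in-memory store. This is only a sketch of the idea: `ToyStore` is a hypothetical stand-in written for this example and does not reflect how the Julep client or server is implemented.

```python
import uuid


class ToyStore:
    """In-memory stand-in for a resource API (not the Julep client)."""

    def __init__(self):
        self.items = {}

    def create(self, **fields):
        # A plain create always mints a new ID, so reruns pile up duplicates.
        new_id = uuid.uuid4()
        self.items[new_id] = fields
        return new_id

    def create_or_update(self, item_id, **fields):
        # Reusing a caller-supplied ID overwrites the existing record instead.
        self.items[item_id] = fields
        return item_id


store = ToyStore()
AGENT_ID = uuid.uuid4()  # generated once, then reused by every rerun

for _ in range(3):  # simulate running the same cell three times
    store.create_or_update(AGENT_ID, name="Research Assistant")
print(len(store.items))  # 1

for _ in range(3):
    store.create(name="Research Assistant")
print(len(store.items))  # 4
```

Note that re-running the cell that generates the UUIDs produces new IDs, so the benefit applies to re-running the later cells only.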

### Creating Julep Client with the API Key

In [2]:
from julep import Client
import os

# api_key = os.getenv("JULEP_API_KEY")
api_key = os.getenv("JULEP_API_KEY_LOCAL")

# Create a Julep client
client = Client(api_key=api_key, environment="local_multi_tenant")

## Creating an "agent"

An agent is the object to which LLM settings (such as the model and temperature) and tools are scoped.

To learn more about the agent, please refer to the [documentation](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#agent).

In [3]:
# Defining the agent
name = "Research Assistant"
about = "The Research Assistant is a research assistant that can search the arxiv for information."
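# NOTE: these settings are not passed to `create_or_update` below, so they have no effect as written
default_settings = {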
    "temperature": 0.7,
    "top_p": 1,
    "min_p": 0.01,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "length_penalty": 1.0,
    "max_tokens": 150,
}

# Create the agent
agent = client.agents.create_or_update(
    agent_id=AGENT_UUID,
    name=name,
    about=about,
    model="gpt-4o",
)


## Creating a "user"

A user is the object to which persona, metadata, and documents are scoped.

To learn more about the user, please refer to the [documentation](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#user).

In [4]:
# Create a user with metadata
client.users.create_or_update(
    user_id=USER_UUID,
    name="user_new_1",
    about="This is a test user",
    metadata={
        "ppid": str(USER_UUID),  # UUID as a string; matched later by `metadata_filter`
        "age": 25,
        "state": "California",
        "city": "San Francisco",
        # Lists are stored as strings here; a JSON-serializable format would also work
        "research_interests": str(["Quantum Computing", "Artificial Intelligence", "Machine Learning"]),
        "favorite_scientists": str(["Richard Feynman", "Alan Turing", "Geoffrey Hinton"]),
        "top_research_area": "Quantum Computing",
        "top_scientist": "Richard Feynman",
        "latest_research_read": "Quantum Machine Learning",
        "top_journals": str(["Nature", "Science", "IEEE Transactions on Quantum Engineering"]),
    },
)

ResourceCreated(id='e5480cb5-fa51-47bb-b1b7-ed25bb7782a3', created_at=datetime.datetime(2024, 11, 28, 17, 48, 17, 901945, tzinfo=datetime.timezone.utc), jobs=[])
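
The metadata above stores list values with `str(...)`, which produces a Python representation rather than JSON. The short sketch below, using only the standard library, shows the difference and how each form is read back. Whether JSON is preferable depends on how the metadata is consumed later, which this notebook does not specify.

```python
import ast
import json

interests = ["Quantum Computing", "Artificial Intelligence", "Machine Learning"]

as_repr = str(interests)         # Python repr: single-quoted items
as_json = json.dumps(interests)  # JSON text: double-quoted items

print(as_repr)
print(as_json)

# Reading the values back requires the matching parser.
assert ast.literal_eval(as_repr) == interests
assert json.loads(as_json) == interests

# The repr string is not valid JSON, so json.loads rejects it.
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False
print(repr_is_json)
```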

## Creating a "user" document

User documents are used to store the persona of the user.

In [5]:
# Create a user document
client.users.docs.create(
    user_id=USER_UUID,
    title="Google Scholar Persona",
    embed_instruction="Represent the query for User persona used for document retrieval: ",
    # Adjacent string literals are joined into one string
    content=(
        "This is a test google scholar persona. The user is a 25 year old living in San Francisco. "
        "He is a fan of Google Scholar and is a student at Stanford University. "
        "He uses Google Scholar to stay updated with the latest research in quantum computing and artificial intelligence. "
        "Over the last 3 months, he has read 100+ research papers on quantum computing and artificial intelligence. "
        "He is particularly interested in the latest research in quantum machine learning and quantum error correction. "
        "He is also interested in how LLMs are being used to solve problems in quantum mechanics."
    ),
)

ResourceCreated(id='b4fbd9c8-a533-4e07-9076-f45f3a18f7fc', created_at=datetime.datetime(2024, 11, 28, 17, 48, 21, 122998, tzinfo=datetime.timezone.utc), jobs=['98e1c0e8-b821-4aa0-bea4-e9efd34812c0'])

## Defining a Task

Tasks in Julep are GitHub Actions-style workflows that define long-running, multi-step actions.

You can use them to conduct complex actions by defining them step-by-step.

To learn more about tasks, please refer to the `Tasks` section in [Julep Concepts](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#tasks).
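
The task definition in this notebook refers to `_` (the previous step's output), `inputs[0]` (the task input), and `outputs[n]` (the output of step `n`). The toy runner below is a rough mental model of how those names relate. It is a simplified sketch written for this example, with a hypothetical `run_steps` function, and it is not Julep's execution engine.

```python
def run_steps(steps, task_input):
    """Run steps in order; each step sees the previous output as `_`."""
    inputs = [task_input]   # mirrors `inputs[0]` in the task definition
    outputs = []            # mirrors `outputs[n]`, one entry per finished step
    current = task_input    # mirrors `_`; the first step sees the task input
    for step in steps:
        current = step(current, inputs, outputs)
        outputs.append(current)
    return current, outputs


steps = [
    # step 0: pretend to search for each topic
    lambda _, inputs, outputs: [f"news about {t}" for t in _["topics"]],
    # step 1: keep the first item, as an `evaluate` step might
    lambda _, inputs, outputs: {"first": _[0]},
    # step 2: combine the original input with an earlier step's output
    lambda _, inputs, outputs: {
        "user_id": inputs[0]["user_id"],
        "all_results": outputs[0],
        "first": _["first"],
    },
]

final, outputs = run_steps(steps, {"topics": ["LLMs", "Qubits"], "user_id": "u-1"})
print(final)
```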

In [25]:
import yaml
task_def = yaml.safe_load("""
name: Personalized Research Paper Recommendation Assistant

input_schema:
  type: object
  properties:
    topics:
      type: array
      items:
        type: string
      description: The topics to search for.
    user_id:
      type: string
      description: The user id to search for.

tools:
                          
- name: brave_search
  type: integration
  integration:
    provider: brave
    setup:
      api_key: "DEMO_API_KEY"
      
- name: arxiv_search
  type: integration
  integration:
    provider: arxiv
                          
- name: llama_parse_scrape
  type: integration
  integration:
    provider: llama_parse
    setup:
      llamaparse_api_key: "DEMO_API_KEY"
      params:
        result_format: "text"
        num_workers: 3

- name: get_user_from_id
  description: Get a user from the user id
  type: system
  system:
    resource: user
    operation: list

- name: list_user_docs
  description: List the user docs
  type: system
  system:
    resource: user
    subresource: doc
    operation: list

- name: get_user_docs
  description: Get user docs
  type: system
  system:
    resource: user
    subresource: doc
    operation: list
                          
- name : create_agent_doc
  description: Create an agent doc
  type: system
  system:
    resource: agent
    subresource: doc
    operation: create
                          
- name: create_user_doc
  description: Create a user doc
  type: system
  system:
    resource: user
    subresource: doc
    operation: create

main:

# Calling brave search for each topic
- over: _.topics
  parallelism: 2
  map:
    tool: brave_search
    arguments:
      query: "'the latest news about ' + _"

# Prompting the agent to extract keywords from the news snippet
- prompt:
  - role: system
    content: |
      You are a research assistant.
      Based on the following latest html news snippet, generate keywords:
                        
      {% for docs in _ %}
        {% for doc in docs['result'] %}
          {{doc['snippet']}}
        {% endfor %}
      {% endfor %}
                          
      Your response should be a list of keywords, separated by commas. Do not add any other text.
      Example: `KEYWORDS: keyword1, keyword2, keyword3`
  unwrap: true

# Extracting the top 8 keywords
- evaluate:
    keywords: str(_).split(',')[:8]

# Calling arxiv search for each keyword
- over: _.keywords
  parallelism: 4
  map:
    tool: arxiv_search
    arguments:
      query: _
      max_results: "2"

# Get the arxiv results
- evaluate:
    arxiv_results: _
                          
# Get the user from the id using metadata_filter (returns a list)
- tool: get_user_from_id
  arguments:
    limit: "1"
    sort_by: "'created_at'"
    direction: "'desc'"
    metadata_filter:
      ppid: inputs[0]['user_id']

# Unwrap the list to get the user
- evaluate:
    user_info: _[0]

# Get the user persona from user docs
- tool: get_user_docs
  arguments:
    user_id: inputs[0]['user_id']
    limit: "1"
    sort_by: "'created_at'"
    direction: "'desc'"

# Unwrap the list to get the user persona
- evaluate:
    persona: _[0].content[0]

# Prompting the agent to select the top 5 papers
- prompt:
  - role: system
    content: |
      As an expert in recommending research papers tailored to individual user personas, your task is to select the top 5 papers that align closely with the user's interests and professional needs.
      This careful selection ensures that the recommendations are both relevant and beneficial to the user's ongoing research and professional development.

      Below is a detailed description of the user's persona:
      {{_.persona}}

      Based on this persona, please review the following list of research paper titles. 
      These titles have been gathered from a comprehensive search to match the user's interests:

      {% for result in outputs[4].arxiv_results %}
        {% for doc in result['result'] %}
          - (Title: {{doc['title']}}, 
              
          - URL: {{doc['pdf_url']}})
        {% endfor %}
      {% endfor %}

      After reviewing the titles, please compile a list of the top 5 relevant research papers. 
      Your response should be a list of Titles and URLs of the papers, separated by double commas. 
      Do not add any other text.
      Example: `Title1,url1,,Title2,url2,,Title3,url3`
  unwrap: true

# Extracting the titles and urls
- evaluate:
    paper_titles_and_urls: |-
      list(
        zip(
          [pair.split(',')[0] for pair in str(_).split(',,')],
          [pair.split(',')[1].replace("http://", "https://") for pair in str(_).split(',,')]
        )
      )
                          
# Calling llama parse scrape for each url
- over: _['paper_titles_and_urls']
  parallelism: 5
  map:
    tool: llama_parse_scrape
    arguments:
      file: _[1]
      filename: _[0]
      params:
        result_format: "'text'"

# Extracting the paper contents
- evaluate:
    paper_contents: str(_)

# Create a new persona document
- tool: create_user_doc
  arguments:
    user_id: inputs[0]['user_id']
    data:
      embed_instruction: "'Represent the query for User persona used for document retrieval: '"
      title: "'Research Papers'"
      content: _['paper_contents']
""")

<span style="color:olive;">Notes:</span>
- The `unwrap: true` option in a prompt step unwraps the model output, returning `choices[0].message.content` instead of the full response.
- The `metadata_filter` argument filters users by the `ppid` stored in their metadata.
- The `persona` is the user persona that guides the selection of research papers.
- `paper_titles_and_urls` is the list of (title, URL) pairs passed to the `llama_parse_scrape` tool.
- In prompts, `{{ }}` inserts the value of an expression, such as the previous step's output `_`, whereas `{% %}` is used for Jinja control statements such as `for` loops.
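
The two `evaluate` expressions that parse model replies can be tried in plain Python. The sketch below copies them into hypothetical helper functions and feeds them made-up replies in the formats the prompts request. The titles and `example.org` URLs are placeholders, not real papers.

```python
def parse_keywords(text: str, limit: int = 8) -> list[str]:
    """Mirror `str(_).split(',')[:8]` from the task definition."""
    return str(text).split(",")[:limit]


def parse_titles_and_urls(text: str) -> list[tuple[str, str]]:
    """Mirror the `paper_titles_and_urls` expression from the task definition."""
    pairs = str(text).split(",,")
    return list(
        zip(
            [pair.split(",")[0] for pair in pairs],
            [pair.split(",")[1].replace("http://", "https://") for pair in pairs],
        )
    )


# Made-up model replies in the formats the prompts ask for.
keywords = parse_keywords("KEYWORDS: qubits, error correction, LLMs")
print(keywords)

papers = parse_titles_and_urls(
    "Paper A,http://example.org/a.pdf,,Paper B,https://example.org/b.pdf"
)
print(papers)
```

Two details are visible in the output: the first keyword keeps the `KEYWORDS:` prefix and later keywords keep their leading space, and a title that itself contains a comma would shift the URL out of position. Stripping whitespace or asking the model for a stricter format are possible refinements.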

## Creating/Updating a task

In [26]:
# Create the task
task = client.tasks.create_or_update(
    task_id=TASK_UUID,
    agent_id=AGENT_UUID,
    **task_def
)
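
The `over` / `map` steps in the task definition apply one tool to every item of a list, with `parallelism` limiting how many calls run at once. A loose Python analogy using a thread pool is sketched below. `fake_search` and `map_over` are hypothetical names for this example, and the real scheduling inside Julep may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_search(query: str) -> dict:
    """Hypothetical stand-in for a search tool call; returns a canned result."""
    return {"result": [{"snippet": f"snippet for {query}"}]}


def map_over(items, fn, parallelism):
    """Apply fn to every item with at most `parallelism` calls in flight."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map preserves the order of `items` in its results.
        return list(pool.map(fn, items))


topics = ["Quantum Computing", "LLMs"]
results = map_over(
    topics,
    lambda topic: fake_search("the latest news about " + topic),
    parallelism=2,
)
for docs in results:
    for doc in docs["result"]:
        print(doc["snippet"])
```

The nested loop at the end has the same shape as the Jinja loop in the first prompt step, which walks each search result and prints its snippets.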

## Creating an Execution

Executions are the objects that run the task.

To learn more about executions, please refer to the [documentation](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#execution).

In [27]:
execution = client.executions.create(
    task_id=TASK_UUID,
    input={
        "topics": ["Quantum Computing", "LLMs"],
        "user_id": str(USER_UUID),
    },
)

There are multiple ways to get the execution details and the output:

1. **Get Execution Details**: This method retrieves the details of the execution, including the output of the last transition that took place.

2. **List Transitions**: This method lists all the task steps executed so far. Transitions are returned in reverse chronological order, so for a successful execution the final output is the output of the first transition in the list, which should have the type `finish`.


<span style="color:olive;">Note: You need to wait for a few seconds for the execution to complete before you can get the final output, so feel free to run the following cells multiple times until you get the final output.</span>
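
Instead of re-running cells by hand, the wait can be wrapped in a small polling loop. The helper below is a generic sketch with a hypothetical name, `wait_for_status`. It takes any callable that returns the current status, and the demo uses a fake status source so it runs without a server.

```python
import time


def wait_for_status(fetch_status, done_statuses, timeout_s=60.0, interval_s=2.0):
    """Poll fetch_status() until it returns a value in done_statuses or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in done_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        time.sleep(interval_s)


# Demo with a fake status source so the sketch runs without a server.
fake_statuses = iter(["queued", "running", "running", "succeeded"])
final_status = wait_for_status(
    lambda: next(fake_statuses),
    done_statuses={"succeeded", "failed", "cancelled"},
    interval_s=0.01,
)
print(final_status)
```

With the client from this notebook, the status source might look like `lambda: client.executions.get(execution.id).status`. That call and the status names used above are assumptions here and should be checked against the installed SDK version.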


In [29]:
# Lists all the task steps that have been executed up to this point in time
transitions = client.executions.transitions.list(execution_id=execution.id).items

# Transitions are retrieved in reverse chronological order
for transition in reversed(transitions):
    print("Transition type: ", transition.type)
    print("Transition output: ", transition.output)
    print("-"*50)

Transition type:  init
Transition output:  {'topics': ['Quantum Computing', 'LLMs'], 'user_id': 'e5480cb5-fa51-47bb-b1b7-ed25bb7782a3'}
--------------------------------------------------
Transition type:  init_branch
Transition output:  LLMs
--------------------------------------------------
Transition type:  init_branch
Transition output:  Quantum Computing
--------------------------------------------------
Transition type:  finish_branch
Transition output:  {'result': [{'link': 'https://thequantuminsider.com/', 'snippet': 'Find <strong>the</strong> <strong>latest</strong> <strong>Quantum</strong> <strong>Computing</strong> <strong>news</strong>, data, market research, and insights. To stay up to date with <strong>the</strong> <strong>quantum</strong> market click here!', 'title': 'Quantum Computing News & Top Stories | The Quantum Insider'}, {'link': 'https://www.sciencedaily.com/news/computers_math/quantum_computers/', 'snippet': 'Oct. 24, 2024 \x97 <strong>Scientists have used high

In [30]:
import pprint
# Retrieves the output of the last transition that took place
output = client.executions.transitions.list(execution_id=execution.id).items[0].output
# Pretty printing the output
pprint.pprint(output)


{'created_at': '2024-11-28T18:03:28.106661Z',
 'id': 'efbd0645-3ac1-4e0b-a4ec-47418cdf11a3',
 'jobs': ['283822bc-221f-4dfd-9505-24ba4f631a39']}
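
Taking `items[0]` returns whatever transition happened last, which is the final output only once the execution has finished. A slightly more defensive sketch is to look for the transition of type `finish` explicitly. The `final_output` helper and the fake transition objects below are made up for illustration and only mimic the `type` and `output` attributes printed above.

```python
from types import SimpleNamespace


def final_output(transitions):
    """Return the output of the `finish` transition, or None if there is none yet."""
    for transition in transitions:
        if transition.type == "finish":
            return transition.output
    return None


# Fake transitions, newest first, shaped like the objects printed above.
fake_transitions = [
    SimpleNamespace(type="finish", output={"id": "doc-123"}),
    SimpleNamespace(type="step", output={"paper_contents": "..."}),
    SimpleNamespace(type="init", output={"topics": ["LLMs"]}),
]
print(final_output(fake_transitions))
print(final_output(fake_transitions[1:]))
```

A `None` result signals that the execution has not produced a `finish` transition yet, so the caller can wait and try again.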
