In [1]:
import nest_asyncio
nest_asyncio.apply()

import sys
sys.path.append("../")

from desci_sense.shared_functions.init import init_multi_chain_parser_config
from desci_sense.shared_functions.parsers.multi_chain_parser import MultiChainParser

config = init_multi_chain_parser_config(llm_type="google/gemma-7b-it",
                                        post_process_type="combined")

In [2]:
multi_chain_parser = MultiChainParser(config)

2024-04-22 10:21:37.155 | INFO     | desci_sense.shared_functions.parsers.multi_chain_parser:__init__:64 - Initializing MultiChainParser. PostProcessType=combined
2024-04-22 10:21:37.162 | INFO     | desci_sense.shared_functions.parsers.multi_chain_parser:__init__:71 - Initializing post parsers...
2024-04-22 10:21:37.163 | INFO     | desci_sense.shared_functions.parsers.post_parser_chain:__init__:26 - Initializing parser chain 'refs_tagger' 
2024-04-22 10:21:37.209 | INFO     | desci_sense.shared_functions.parsers.post_parser_chain:__init__:26 - Initializing parser chain 'topics' 
2024-04-22 10:21:37.237 | INFO     | desci_sense.shared_functions.parsers.post_parser_chain:__init__:26 - Initializing parser chain 'keywords' 


In [5]:
parser = multi_chain_parser.pparsers["topics"]

In [6]:
parser.chat("Hi!")

"\nHello! 👋 I'm happy to chat with you. What would you like to talk about today? 😊"

In [7]:
test_input = """
You are an expert annotator tasked with converting social media posts about scientific research to a structured semantic format. The post contains external references in the form of links (URLs). Your job is to select, for each reference, the tags best characterizing the relation of the post to the reference.

The tags are to be selected from a predefined set of tags. The available tag types are:
<announce> the reference is a new research output being announced by the post. The announcement is likely made by the authors but may be made by a third party. A research output could be a paper, dataset or other type of research that is being announced publicly.
<discussion> this post discusses how the cited reference relates to other facts or claims. For example, the post might discuss how the cited reference informs questions, provides evidence, or supports or opposes claims.
<review> the reference is being reviewed by the post. The reference could be a book, article or movie, or other media content. The review could be positive or negative.
<job> the reference is a job listing, for example a call for graduate students or faculty applications.
<event> the reference is an invitation to an event, either a real-world or an online event. Any kind of event is relevant, some examples of such events could be seminars, meetups, or hackathons. This tag should only be used for invitations to events, not for posts describing other kinds of events.
<reading> this post describes the reading status of the author in relation to this reference, which could be a book, article or other written media. The author may have read the reference in the past, may be reading it in the present, or may be looking forward to reading it in the future.
<listening> this post describes the listening status of the author in relation to this reference, such as a podcast or radio station. The author may have listened to the content in the past, may be listening to it in the present, or may be looking forward to listening to it in the future.
<watching> this post describes the watching status of the author in relation to this reference, such as a video or movie. The author may have watched the content in the past, may be watching it in the present, or may be looking forward to watching it in the future.
<recommendation> The author is recommending the reference, which can be any kind of content: an article, a movie, podcast, book, another post, etc. This tag can also be used for cases of implicit recommendation, where the author is expressing enjoyment of some content but not explicitly recommending it.
<quote> this post is quoting text from the reference. Symbols like ">" or quotation marks are often used to indicate quotations. 
<question> this post is raising a question or questions about the reference. The content could be a research paper or other media like a podcast, video or blog post.

A user will pass in a post, and you should think step by step before selecting, for each reference, the set of tags that best characterizes that reference's relation to the post.

Each reference will be marked by a number for convenient identification, in order of appearance in the post. The first reference will be number 1, the second 2, etc.

Your final answer should be structured as follows:
Reference Number: (number of current reference)
Reasoning Steps: (your reasoning steps)
Candidate Tags: (For each potential tag you choose, explain why you chose it.)
Final Answer: (a set of final tags, based on the Candidate Tags. The final tags must be included in the Candidate Tags list!)

# Input post text:
Author: Yogi Jaeger 💙 @yoginho@spore.social
Content: I just (re)discovered this recording of my "How Organisms Come to Know the World" talk at the Max Planck in Leipzig in 2022: https://cbs.mpg.de/cbs-coconut/video/jaeger.

It covers this paper: https://frontiersin.org/articles/10.3389/fevo.2021.806283/full.

A wonderful collaboration with Andrea Roli & Stu Kauffman.

References:
1: https://cbs.mpg.de/cbs-coconut/video/jaeger
2: https://frontiersin.org/articles/10.3389/fevo.2021.806283/full

# Output:
"""

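The "References:" block at the end of the prompt lists each URL once, numbered in order of appearance, with the sentence-final period that follows it in the post removed. The notebook writes this block by hand. Below is a minimal sketch of generating it from the post text. It assumes a plain regex is enough to find the links, and `extract_references` / `format_references` are hypothetical helpers, not part of desci_sense.

```python
import re

# Matches http(s) URLs up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_references(content: str) -> list[str]:
    """Return the distinct URLs in `content`, in order of first appearance."""
    refs: list[str] = []
    for match in URL_PATTERN.finditer(content):
        # Heuristic: drop sentence punctuation swallowed by the regex,
        # e.g. the trailing "." in ".../video/jaeger."
        url = match.group(0).rstrip(".,;:!?)")
        if url not in refs:
            refs.append(url)
    return refs


def format_references(refs: list[str]) -> str:
    """Render the 'References:' block with 1-based numbering, as in the prompt."""
    lines = ["References:"]
    lines += [f"{i}: {url}" for i, url in enumerate(refs, start=1)]
    return "\n".join(lines)


post = (
    'I just (re)discovered this recording of my "How Organisms Come to Know the World" '
    "talk at the Max Planck in Leipzig in 2022: https://cbs.mpg.de/cbs-coconut/video/jaeger.\n\n"
    "It covers this paper: https://frontiersin.org/articles/10.3389/fevo.2021.806283/full.\n"
)
print(format_references(extract_references(post)))
```

Stripping trailing punctuation is only a heuristic. It would also cut a closing parenthesis that genuinely belongs to a URL.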
In [10]:
res = parser.chat(test_input)
print(res)

**Reference 1:** 1
**Reasoning Steps:** The author directly links to the reference and mentions it covers the content of the recording.
**Candidate Tags:** <announce>, <review>
**Final Answer:** <announce>


**Reference 2:** 2
**Reasoning Steps:** The author explicitly mentions the paper as a collaboration. 
**Candidate Tags:** <recommend>, <discussion> 
**Final Answer:** <discussion>
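
This reply uses bold `**Reference 1:**` headers instead of the requested `Reference Number:` field. It also proposes `<recommend>` as a candidate, which is not in the tag set (the prompt defines `<recommendation>`). Below is a sketch of a post-hoc check that collects the tags per reference and reports unknown tags or final tags missing from the candidates. It assumes tags are written as `<name>`. `parse_tagged_answer` and `find_tag_problems` are hypothetical helpers, not desci_sense's own post-processing.

```python
import re

# The eleven tag names defined in the prompt above.
ALLOWED_TAGS = {
    "announce", "discussion", "review", "job", "event", "reading",
    "listening", "watching", "recommendation", "quote", "question",
}
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def parse_tagged_answer(text: str) -> dict[int, dict[str, list[str]]]:
    """Collect candidate and final tags per reference from a plain-text reply."""
    results: dict[int, dict[str, list[str]]] = {}
    ref = None
    section = None
    for line in text.replace("*", "").splitlines():  # drop markdown bold markers
        stripped = line.strip()
        # A line starting with "Reference" opens a new block; take its first number.
        header = re.match(r"Reference\D*?(\d+)", stripped)
        if header:
            ref = int(header.group(1))
            results[ref] = {"candidates": [], "final": []}
            section = None
            continue
        if ref is None:
            continue
        if stripped.startswith("Candidate Tags:"):
            section = "candidates"
        elif stripped.startswith("Final Answer:"):
            section = "final"
        elif stripped.startswith("Reasoning Steps:"):
            section = None
        if section is not None:
            results[ref][section].extend(TAG_PATTERN.findall(stripped))
    return results


def find_tag_problems(parsed: dict[int, dict[str, list[str]]]) -> list[str]:
    """Report unknown tags and final tags that were never listed as candidates."""
    problems = []
    for ref, tags in parsed.items():
        for tag in tags["candidates"] + tags["final"]:
            if tag not in ALLOWED_TAGS:
                problems.append(f"ref {ref}: unknown tag <{tag}>")
        for tag in tags["final"]:
            if tag not in tags["candidates"]:
                problems.append(f"ref {ref}: final tag <{tag}> not among candidates")
    return problems


# Copy of the Gemma reply above.
gemma_reply = """**Reference 1:** 1
**Reasoning Steps:** The author directly links to the reference and mentions it covers the content of the recording.
**Candidate Tags:** <announce>, <review>
**Final Answer:** <announce>

**Reference 2:** 2
**Reasoning Steps:** The author explicitly mentions the paper as a collaboration.
**Candidate Tags:** <recommend>, <discussion>
**Final Answer:** <discussion>
"""

parsed = parse_tagged_answer(gemma_reply)
print(parsed)
print(find_tag_problems(parsed))
```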


In [11]:
test_input_2 = """
You are an expert annotator tasked with converting social media posts about scientific research to a structured semantic format. The post contains external references in the form of links (URLs). Your job is to select, for each reference, the tags best characterizing the relation of the post to the reference.

The tags are to be selected from a predefined set of tags. The available tag types are:
<announce> the reference is a new research output being announced by the post. The announcement is likely made by the authors but may be made by a third party. A research output could be a paper, dataset or other type of research that is being announced publicly.
<discussion> this post discusses how the cited reference relates to other facts or claims. For example, the post might discuss how the cited reference informs questions, provides evidence, or supports or opposes claims.
<review> the reference is being reviewed by the post. The reference could be a book, article or movie, or other media content. The review could be positive or negative.
<job> the reference is a job listing, for example a call for graduate students or faculty applications.
<event> the reference is an invitation to an event, either a real-world or an online event. Any kind of event is relevant, some examples of such events could be seminars, meetups, or hackathons. This tag should only be used for invitations to events, not for posts describing other kinds of events.
<reading> this post describes the reading status of the author in relation to this reference, which could be a book, article or other written media. The author may have read the reference in the past, may be reading it in the present, or may be looking forward to reading it in the future.
<listening> this post describes the listening status of the author in relation to this reference, such as a podcast or radio station. The author may have listened to the content in the past, may be listening to it in the present, or may be looking forward to listening to it in the future.
<watching> this post describes the watching status of the author in relation to this reference, such as a video or movie. The author may have watched the content in the past, may be watching it in the present, or may be looking forward to watching it in the future.
<recommendation> The author is recommending the reference, which can be any kind of content: an article, a movie, podcast, book, another post, etc. This tag can also be used for cases of implicit recommendation, where the author is expressing enjoyment of some content but not explicitly recommending it.
<quote> this post is quoting text from the reference. Symbols like ">" or quotation marks are often used to indicate quotations. 
<question> this post is raising a question or questions about the reference. The content could be a research paper or other media like a podcast, video or blog post.

A user will pass in a post, and you should think step by step before selecting, for each reference, the set of tags that best characterizes that reference's relation to the post.

Each reference will be marked by a number for convenient identification, in order of appearance in the post. The first reference will be number 1, the second 2, etc.

Your final answer should be structured as follows:
Reference Number: (number of current reference)
Reasoning Steps: (your reasoning steps)
Candidate Tags: (For each potential tag you choose, explain why you chose it.)
Final Answer: (a set of final tags, based on the Candidate Tags. The final tags must be included in the Candidate Tags list!)

# Input post text:
Author: Shashank Gupta ✈️ ICLR'24
Content: 🚨New Paper Alert🚨

Beware! While personas excel at refining LLM behavior, they can bring deep-rooted biases to the surface, diminishing LLM's core competencies 😲

Our study reveals a surprising finding – Personas can degrade LLMs' reasoning by a massive 70%! 🤯

🧵https://arxiv.org/abs/2311.04892

📌Project: https://allenai.github.io/persona-bias
📌Code: https://github.com/allenai/persona-bias
📌Dataset: https://huggingface.co/datasets/allenai/persona-bias

References:
1: https://arxiv.org/abs/2311.04892
2: https://allenai.github.io/persona-bias
3: https://github.com/allenai/persona-bias
4: https://huggingface.co/datasets/allenai/persona-bias

# Output:
"""

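The prompts in this notebook repeat the same preamble and tag definitions. They differ only in the post and in the output-format section. Below is a sketch of assembling such a prompt from parts, so that the format section can be swapped without copying the tag list again (plain text here, a JSON schema in a later cell). `build_prompt`, `PREAMBLE`, `TAGS` and `TEXT_FORMAT` are hypothetical names, and their texts are abridged from the cells above.

```python
# Hypothetical prompt builder; the texts below are abridged stand-ins for the
# full instruction blocks pasted into the notebook cells.
PREAMBLE = (
    "You are an expert annotator tasked with converting social media posts about "
    "scientific research to a structured semantic format."
)

# Abridged tag definitions (the notebook defines eleven tags in full).
TAGS = {
    "announce": "the reference is a new research output being announced by the post.",
    "job": "the reference is a job listing.",
}

TEXT_FORMAT = (
    "Your final answer should be structured as follows:\n"
    "Reference Number: (number of current reference)\n"
    "Reasoning Steps: (your reasoning steps)\n"
    "Candidate Tags: (For each potential tag you choose, explain why you chose it.)\n"
    "Final Answer: (a set of final tags, based on the Candidate Tags.)"
)


def build_prompt(preamble: str, tags: dict[str, str], answer_format: str,
                 author: str, content: str, refs: list[str]) -> str:
    """Join instructions, post text and numbered references into one prompt string."""
    tag_lines = "\n".join(f"<{name}> {desc}" for name, desc in tags.items())
    ref_lines = "\n".join(f"{i}: {url}" for i, url in enumerate(refs, start=1))
    sections = [
        preamble,
        "The available tag types are:\n" + tag_lines,
        answer_format,
        f"# Input post text:\nAuthor: {author}\nContent: {content}",
        f"References:\n{ref_lines}",
        "# Output:\n",
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    PREAMBLE, TAGS, TEXT_FORMAT,
    author="Shashank Gupta ✈️ ICLR'24",
    content="🚨New Paper Alert🚨 (abridged)",
    refs=[
        "https://arxiv.org/abs/2311.04892",
        "https://allenai.github.io/persona-bias",
        "https://github.com/allenai/persona-bias",
        "https://huggingface.co/datasets/allenai/persona-bias",
    ],
)
print(prompt)
```

Passing a JSON-schema string as `answer_format` would give the JSON variant of the prompt from the same tag dictionary.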
In [14]:
res = parser.chat(test_input_2)
print(res)

**Reference 1: https://arxiv.org/abs/2311.0 viciss**

Reasoning Steps: The author explicitly mentions a new research paper that reveals a surprising finding about the negative impact of personas on LLMs' reasoning abilities.


Candidate Tags: 
- <announce> - The author is announcing the release of a new research paper.
- <review> - The paper is being reviewed by the post.


Final Answer: **<announce><review>**


**Reference 2: https://allenai.github.io/persona-bias**

Reasoning Steps: The project page provides information about the paper and related resources. 


Candidate Tags: 
- <review> - The page reviews and summarizes the paper.
- <recommendation> - The page recommends the paper.


Final Answer: **<review><recommendation>**


**Reference 3: https://github.com/allenai/persona-bias**

Reasoning Steps: The code page is related to the paper and provides access to the associated code.


Candidate Tags: 
- <review> - The page reviews the paper's code. 
- <job> - This could potentially 

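The reply above stops in the middle of reference 3 and never reaches reference 4. This is possibly due to an output-length limit, though the notebook does not say. Below is a small check that flags references with no `Final Answer:` line. It assumes each reference block starts with a line beginning with "Reference", which holds for both the Gemma and the Claude replies in this notebook. `missing_final_answers` is a hypothetical helper.

```python
import re


def missing_final_answers(reply: str, n_refs: int) -> list[int]:
    """Return the reference numbers 1..n_refs whose block lacks a 'Final Answer:' line."""
    clean = reply.replace("*", "")  # drop markdown bold markers
    # With one capturing group, re.split returns [prefix, num1, body1, num2, body2, ...].
    parts = re.split(r"^\s*Reference\D*?(\d+)", clean, flags=re.MULTILINE)
    answered = {
        int(num) for num, body in zip(parts[1::2], parts[2::2])
        if "Final Answer:" in body
    }
    return [i for i in range(1, n_refs + 1) if i not in answered]


# Abridged copy of the truncated Gemma reply above (reasoning lines omitted).
gemma_reply = """**Reference 1: https://arxiv.org/abs/2311.0 viciss**
Candidate Tags:
- <announce> - The author is announcing the release of a new research paper.
Final Answer: **<announce><review>**

**Reference 2: https://allenai.github.io/persona-bias**
Final Answer: **<review><recommendation>**

**Reference 3: https://github.com/allenai/persona-bias**
Candidate Tags:
- <review> - The page reviews the paper's code.
- <job> - This could potentially
"""

print(missing_final_answers(gemma_reply, n_refs=4))
```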
In [15]:
config = init_multi_chain_parser_config(llm_type="anthropic/claude-3-opus",
                                        post_process_type="combined")
multi_chain_parser = MultiChainParser(config)
parser = multi_chain_parser.pparsers["topics"]


2024-04-22 10:36:06.412 | INFO     | desci_sense.shared_functions.parsers.multi_chain_parser:__init__:64 - Initializing MultiChainParser. PostProcessType=combined
2024-04-22 10:36:06.415 | INFO     | desci_sense.shared_functions.parsers.multi_chain_parser:__init__:71 - Initializing post parsers...
2024-04-22 10:36:06.415 | INFO     | desci_sense.shared_functions.parsers.post_parser_chain:__init__:26 - Initializing parser chain 'refs_tagger' 
2024-04-22 10:36:06.453 | INFO     | desci_sense.shared_functions.parsers.post_parser_chain:__init__:26 - Initializing parser chain 'topics' 
2024-04-22 10:36:06.482 | INFO     | desci_sense.shared_functions.parsers.post_parser_chain:__init__:26 - Initializing parser chain 'keywords' 


In [16]:
res = parser.chat(test_input_2)
print(res)

Reference Number: 1
Reasoning Steps:
The post is announcing a new research paper, with a link to the paper on arxiv.org. This suggests the post is primarily focused on announcing this new research output.

The post also provides a brief summary of the key findings from the paper, discussing how personas can introduce biases and degrade the reasoning abilities of large language models. This suggests there is also some high-level discussion of the paper's contents and implications.

Candidate Tags:
<announce> - The post is announcing a new research paper, so this tag is clearly applicable.
<discussion> - The post provides a brief discussion of the key findings and implications of the paper, so this tag could potentially apply. However, the discussion is fairly brief and high-level, so it may not be substantial enough to warrant this tag.

Final Answer: <announce>

Reference Number: 2
Reasoning Steps: 
The link points to a webpage that seems to be a project page associated with the resear

In [20]:
test_input_json = """
You are an expert annotator tasked with converting social media posts about scientific research to a structured semantic format. The post contains external references in the form of links (URLs). Your job is to select, for each reference, the tags best characterizing the relation of the post to the reference.

The tags are to be selected from a predefined set of tags. The available tag types are:
<announce> the reference is a new research output being announced by the post. The announcement is likely made by the authors but may be made by a third party. A research output could be a paper, dataset or other type of research that is being announced publicly.
<discussion> this post discusses how the cited reference relates to other facts or claims. For example, the post might discuss how the cited reference informs questions, provides evidence, or supports or opposes claims.
<review> the reference is being reviewed by the post. The reference could be a book, article or movie, or other media content. The review could be positive or negative.
<job> the reference is a job listing, for example a call for graduate students or faculty applications.
<event> the reference is an invitation to an event, either a real-world or an online event. Any kind of event is relevant, some examples of such events could be seminars, meetups, or hackathons. This tag should only be used for invitations to events, not for posts describing other kinds of events.
<reading> this post describes the reading status of the author in relation to this reference, which could be a book, article or other written media. The author may have read the reference in the past, may be reading it in the present, or may be looking forward to reading it in the future.
<listening> this post describes the listening status of the author in relation to this reference, such as a podcast or radio station. The author may have listened to the content in the past, may be listening to it in the present, or may be looking forward to listening to it in the future.
<watching> this post describes the watching status of the author in relation to this reference, such as a video or movie. The author may have watched the content in the past, may be watching it in the present, or may be looking forward to watching it in the future.
<recommendation> The author is recommending the reference, which can be any kind of content: an article, a movie, podcast, book, another post, etc. This tag can also be used for cases of implicit recommendation, where the author is expressing enjoyment of some content but not explicitly recommending it.
<quote> this post is quoting text from the reference. Symbols like ">" or quotation marks are often used to indicate quotations. 
<question> this post is raising a question or questions about the reference. The content could be a research paper or other media like a podcast, video or blog post.

A user will pass in a post, and you should think step by step before selecting, for each reference, the set of tags that best characterizes that reference's relation to the post.

Each reference will be marked by a number for convenient identification, in order of appearance in the post. The first reference will be number 1, the second 2, etc.

Your final answer should be structured as a list of objects in JSON format with the following schema:

```
class SubAnswer
	ref_number: int # ID number of current reference
	reasoning_steps: str # your reasoning steps
	candidate_tags: str # For each potential tag you choose, explain why you chose it.
	final_answer: List[str] # a set of final tags, based on the Candidate Tags. The final tags must be included in the Candidate Tags list!
```
    
# Input post text:
Author: Yogi Jaeger 💙 @yoginho@spore.social
Content: I just (re)discovered this recording of my "How Organisms Come to Know the World" talk at the Max Planck in Leipzig in 2022: https://cbs.mpg.de/cbs-coconut/video/jaeger.

It covers this paper: https://frontiersin.org/articles/10.3389/fevo.2021.806283/full.

A wonderful collaboration with Andrea Roli & Stu Kauffman.

References:
1: https://cbs.mpg.de/cbs-coconut/video/jaeger
2: https://frontiersin.org/articles/10.3389/fevo.2021.806283/full

# Output:
"""

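The JSON prompt describes a `SubAnswer` schema in pseudocode. Below is a sketch of checking a reply against it, assuming the reply contains a single JSON list, possibly preceded by a sentence such as "Here is the output in the requested JSON format:". The reply to this prompt further down contains a raw line break inside the `candidate_tags` string, which the default `json.loads` rejects. Passing `strict=False` accepts control characters inside strings. `SubAnswer` as a dataclass and `parse_sub_answers` are one possible rendering, not desci_sense code. The sample reply is made up and only mirrors the shape of the model output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SubAnswer:
    """Python rendering of the SubAnswer schema given in the prompt."""
    ref_number: int
    reasoning_steps: str
    candidate_tags: str
    final_answer: List[str]


def parse_sub_answers(reply: str) -> List[SubAnswer]:
    """Extract the JSON list from a model reply and check it against SubAnswer."""
    # Skip any prose before and after the list.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON list found in reply")
    # strict=False allows literal newlines and other control characters in strings.
    items = json.loads(reply[start:end + 1], strict=False)
    answers = [SubAnswer(**item) for item in items]  # TypeError on missing/extra keys
    for ans in answers:
        # The prompt requires every final tag to appear among the candidates.
        for tag in ans.final_answer:
            if tag not in ans.candidate_tags:
                raise ValueError(f"ref {ans.ref_number}: {tag} not among candidate tags")
    return answers


# Made-up reply with the same shape as the model output (not a real model answer).
sample_reply = """Here is the output in the requested JSON format:

[
  {
    "ref_number": 1,
    "reasoning_steps": "placeholder reasoning",
    "candidate_tags": "<watching> placeholder
<recommendation> placeholder",
    "final_answer": ["<watching>"]
  }
]"""

for answer in parse_sub_answers(sample_reply):
    print(answer.ref_number, answer.final_answer)
```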
In [18]:
res = parser.chat(test_input_2)
print(res)

Reference Number: 1
Reasoning Steps:
1. The post announces a new research paper, with a link to the paper on arXiv.
2. The post provides a brief summary of the key findings from the paper.
3. This reference is the primary focus of the post, with the other references providing supplementary information.

Candidate Tags:
1. <announce>: The post is announcing a new research paper, which is linked to by this reference. This tag is appropriate because the post is sharing and publicizing this new research output.

Final Answer: <announce>

Reference Number: 2
Reasoning Steps:
1. This reference links to a project page related to the research paper announced in the post.
2. The project page likely provides additional details, visualizations, or interactive elements to supplement the research paper.
3. While related to the main research output, this reference is not the primary focus of the post.

Candidate Tags:
1. <announce>: While not the main research output, this project page is being anno

In [21]:
res = parser.chat(test_input_json)
print(res)

Here is the output in the requested JSON format:

[
  {
    "ref_number": 1,
    "reasoning_steps": "This reference is a link to a video recording of the author giving a talk titled 'How Organisms Come to Know the World' at the Max Planck Institute in Leipzig in 2022. The post indicates the author just rediscovered this recording, implying they had seen it before but are mentioning it again now.",
    "candidate_tags": "<announce> While the author is sharing this video, it doesn't seem to be a new research output they are announcing for the first time, so this tag doesn't fit.
<discussion> The post doesn't go into details discussing or relating the content of the talk to other facts or claims. 
<review> The author expresses no opinion on the quality of the talk, positive or negative, so this is not a review.
<recommendation> By sharing the link and drawing attention to it, the author seems to be recommending that others watch this recording of their talk. The positive phrasing 'I just 