In [1]:
import sys
sys.path.append("../")  # make the parent directory importable so that desci_sense resolves from this notebook

from enum import Enum

from desci_sense.parsers.multi_stage_parser import MultiStageParser
from desci_sense.configs import init_multi_stage_parser_config
from desci_sense.web_extractors.metadata_extractors import MetadataExtractionType, RefMetadata, extract_metadata_by_type

In [2]:
url = "https://www.sciencedirect.com/science/article/pii/S2352250X23002324"
md = extract_metadata_by_type(url, MetadataExtractionType.CITOID)

In [3]:
print(md.to_str())

url: https://www.sciencedirect.com/science/article/pii/S2352250X23002324
item_type: journalArticle
title: Updating the identity-based model of belief: From false belief to the spread of misinformation
summary: The spread of misinformation threatens democratic societies, hampering informed decision-making. Partisan identity biases perceptions of reality, promoting false beliefs. The Identity-based Model of Political Belief explains how social identity shapes information processing and contributes to misinf


In [4]:
config = init_multi_stage_parser_config()
config

{'general': {'parser_type': 'multi_stage', 'ref_metadata_method': 'citoid'},
 'model': {'model_name': 'mistralai/mistral-7b-instruct', 'temperature': 0.6},
 'ontology': {'versions': None},
 'prompt': {'template_dir': 'desci_sense/prompting/jinja/',
  'zero_ref_template_name': 'zero_ref_template.j2',
  'single_ref_template_name': 'single_ref_template.j2',
  'multi_ref_template_name': 'multi_ref_template.j2'},
 'wandb': {'entity': 'common-sense-makers', 'project': 'st-demo-sandbox'}}

In [5]:

parser = MultiStageParser(config)

2024-01-30 15:26:53.581 | INFO     | desci_sense.parsers.multi_stage_parser:__init__:88 - Loading parser model (type=mistralai/mistral-7b-instruct)...
                    headers was transferred to model_kwargs.
                    Please confirm that headers is what you intended.
2024-01-30 15:26:53.594 | INFO     | desci_sense.parsers.multi_stage_parser:__init__:98 - Loading ontology...


In [6]:
from desci_sense.schema.post import RefPost

from desci_sense.dataloaders import convert_text_to_ref_post

In [7]:
text = """Our new paper describes that "The Identity-based Model of Political Belief" and explains how social identity shapes information processing and contributes to the belief and spread of #misinformation
Partisanship involves cognitive and motivational aspects that shape party members' beliefs and actions. This includes whether they seek further evidence, where they seek that evidence, and which sources they trust. 
Understanding the interplay between social identity and accuracy is crucial in addressing misinformation.
To read the full paper:  https://www.sciencedirect.com/science/article/pii/S2352250X23002324"""

In [8]:
rp = convert_text_to_ref_post(text)
rp

RefPost(author='deafult_author', content='Our new paper describes that "The Identity-based Model of Political Belief" and explains how social identity shapes information processing and contributes to the belief and spread of #misinformation\nPartisanship involves cognitive and motivational aspects that shape party members\' beliefs and actions. This includes whether they seek further evidence, where they seek that evidence, and which sources they trust. \nUnderstanding the interplay between social identity and accuracy is crucial in addressing misinformation.\nTo read the full paper:  https://www.sciencedirect.com/science/article/pii/S2352250X23002324', url='', source_network='default_source', ref_urls=['https://www.sciencedirect.com/science/article/pii/S2352250X23002324'])
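The `ref_urls` field above holds the link found in the post text. How `convert_text_to_ref_post` finds it is not shown here, so the following is only a minimal sketch of one way to pull URLs out of free text with a regular expression. The helper `extract_urls` and its pattern are hypothetical and not part of `desci_sense`.

```python
import re

# Hypothetical helper, not the desci_sense implementation:
# a simple pattern that matches http(s) URLs up to the next whitespace or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str) -> list[str]:
    """Return the URLs found in text, in order of first appearance, without duplicates."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


post = "To read the full paper:  https://www.sciencedirect.com/science/article/pii/S2352250X23002324"
print(extract_urls(post))
```

A regex like this is deliberately loose; a production extractor would likely need to handle shortened links and punctuation that is legitimately part of a URL.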

In [9]:
from desci_sense.parsers.multi_stage_parser import PromptCase

In [18]:
res = parser.process_by_case(rp, PromptCase.SINGLE_REF, metadata_list=[md])

In [19]:
print(res["answer"]["reasoning"])

[Reasoning Steps]

1. The post refers to a scientific paper titled "Updating the identity-based model of belief: From false belief to the spread of misinformation" and provides a brief summary of its content.
2. The paper discusses the role of social identity in shaping information processing and contributing to the belief and spread of misinformation.
3. The author of the post is likely interested in the paper and is sharing it with others.
4. The post does not contain any direct recommendation or review of the paper.
5. The post does not contain any direct question or discussion about the paper.
6. The post does not contain any direct quote from the paper.
7. The post does not contain any direct event or job announcement related to the paper.

[Candidate Tags]

1. <reading>: This tag is suitable as the author of the post is likely interested in the paper and is sharing it with others.
2. <announce>: This tag is also suitable as the post is about a new research paper.


In [20]:
print(res["answer"]["final_answer"])

<reading>, <announce>
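The final answer is a plain string of angle-bracketed tags. For downstream use it is often handier as a list of tag names. The sketch below assumes the answer always follows the `<tag>, <tag>` shape seen above; `parse_tags` is a hypothetical helper, not a `desci_sense` function.

```python
import re


def parse_tags(final_answer: str) -> list[str]:
    """Extract tag names written as <tag> from a model answer string."""
    # Tag names are assumed to consist of word characters and hyphens.
    return re.findall(r"<([\w-]+)>", final_answer)


print(parse_tags("<reading>, <announce>"))  # ['reading', 'announce']
```

If the model emits free text with no tags, the function returns an empty list, which a caller can treat as "no label assigned".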


In [21]:
print(res["full_prompt"])

You are an expert annotator tasked with converting social media posts about scientific research to a structured semantic format. The input post contains a reference to an external URL. Your job is to select the tags best characterizing the relation of the post to the external reference, from a predefined set of tags. 

The available tag types are:
<watching>: this post describes the watching status of the author in relation to a reference, such as a video or movie. The author may have watched the content in the past, is watching the content in the present, or is looking forward to watching the content in the future.
<reading>: this post describes the reading status of the author in relation to a reference, such as a book or article. The author may either have read the reference in the past, is reading the reference in the present, or is looking forward to reading the reference in the future.
<listening>: this post describes the listening status of the author in relation to a reference,

In [14]:
parser.prompt_case_dict[PromptCase.ZERO_REF]["type_templates"]

[{'display_name': '⬛ possible-missing-reference',
  'URI': None,
  'label': 'missing-ref',
  'prompt': 'this post seems to be referring to a reference by name but has not explicitly provided a URL link to the reference. For example, a post that discusses a book and mentions it by title, but contains no link to the book.',
  'notes': None,
  'valid_subject_types': ['post'],
  'valid_object_types': ['nan'],
  'versions': ['v0']},
 {'display_name': '🔭 discourse-graph/observation',
  'URI': None,
  'label': 'dg-observation',
  'prompt': 'this post is articulating a single, highly observation. The intuition is that observation notes should be as close to “the data” as possible. They should be similar to how results are described in results sections of academic publications.',
  'notes': None,
  'valid_subject_types': ['post'],
  'valid_object_types': ['nan'],
  'versions': ['v0']},
 {'display_name': '🫴 discourse-graph/claim',
  'URI': None,
  'label': 'dg-claim',
  'prompt': 'this post is a

In [15]:
parser.prompt_case_dict.keys()

dict_keys([<PromptCase.ZERO_REF: 'ZERO_REF'>, <PromptCase.SINGLE_REF: 'SINGLE_REF'>])
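The parser keys its prompt templates by the number of references in the post. As a rough illustration of that dispatch, the sketch below picks a case from the length of `ref_urls`. It uses a stand-in enum, `PromptCaseSketch`, rather than the real `PromptCase`, and the `MULTI_REF` member is an assumption suggested only by the `multi_ref_template_name` entry in the config; it does not appear among the keys printed above.

```python
from enum import Enum
from typing import Sequence


class PromptCaseSketch(Enum):
    """Stand-in for PromptCase, defined here so the example is self-contained."""
    ZERO_REF = "ZERO_REF"
    SINGLE_REF = "SINGLE_REF"
    MULTI_REF = "MULTI_REF"  # assumption: not among the keys shown above


def select_prompt_case(ref_urls: Sequence[str]) -> PromptCaseSketch:
    """Choose a prompt case from the number of referenced URLs."""
    if len(ref_urls) == 0:
        return PromptCaseSketch.ZERO_REF
    if len(ref_urls) == 1:
        return PromptCaseSketch.SINGLE_REF
    return PromptCaseSketch.MULTI_REF


print(select_prompt_case(["https://www.sciencedirect.com/science/article/pii/S2352250X23002324"]))
```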

In [16]:
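# Local re-definition for experimenting with value lookup.
# Note: this shadows the MetadataExtractionType imported in In [1].
class MetadataExtractionType(Enum):
    NONE = "none"
    CITOID = "citoid"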

In [17]:
MetadataExtractionType(MetadataExtractionType.NONE.value)

<MetadataExtractionType.NONE: 'none'>
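Calling an `Enum` class with a value, as above, looks up the member that carries it. This is how a config string such as `'ref_metadata_method': 'citoid'` could be turned into an enum member. The sketch below shows that lookup with a fallback for unknown strings; `ExtractionTypeSketch` and `parse_extraction_type` are hypothetical names, and falling back to `NONE` is an assumed design choice, not documented library behaviour.

```python
from enum import Enum


class ExtractionTypeSketch(Enum):
    """Stand-in enum mirroring the two members defined above."""
    NONE = "none"
    CITOID = "citoid"


def parse_extraction_type(value: str) -> ExtractionTypeSketch:
    """Map a config string to an enum member, defaulting to NONE if unknown."""
    try:
        return ExtractionTypeSketch(value)  # lookup by value
    except ValueError:
        # Enum raises ValueError when no member has this value.
        return ExtractionTypeSketch.NONE


print(parse_extraction_type("citoid"))
print(parse_extraction_type("unknown-method"))
```

Silently falling back hides typos in the config; raising the `ValueError` instead may be preferable when misconfiguration should fail fast.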