In [1]:
%load_ext autoreload
%autoreload 2

In [None]:
from daemon_analysis_tools.file_handling import load_and_process_csv
from daemon_analysis_tools.data.utils import group_questions_by_journal
from daemon_analysis_tools.data.publisher import Publisher
from daemon_analysis_tools.file_handling import (
    save_answers_to_yaml,
    load_answers_from_yaml,
)

Load and process the data:
- Group the answers by publisher and journal, unifying names that are written in slightly different ways.
- Store the result in a DataFrame.
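The notebook does not show how `load_and_process_csv` unifies names. Below is a minimal sketch of one common approach, assuming names should collapse to lowercase snake_case keys like the `physical_review_c` key used later in this notebook. `normalize_name` is a hypothetical helper for illustration, not part of `daemon_analysis_tools`.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Map a free-text journal/publisher name to a snake_case key (hypothetical helper)."""
    # Decompose accented characters and drop the combining marks (e.g. "é" -> "e")
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    # Lowercase, then collapse every run of non-alphanumeric characters into "_"
    name = re.sub(r"[^a-z0-9]+", "_", name.lower())
    # Drop underscores left over from leading/trailing spaces or punctuation
    return name.strip("_")


variants = ["Physical Review C", "physical review C ", "Physical-Review-C"]
print({normalize_name(v) for v in variants})  # {'physical_review_c'}
```

Rules this simple will not merge abbreviations (e.g. "Phys. Rev. C"); those would still need an explicit alias table.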

In [None]:
data = load_and_process_csv("../../data/raw/rdp.csv")

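Build a nested `dict`: the outer level is keyed by publisher name, the next level by journal name, and the innermost level maps question numbers to `Question` instances. The `.answer` attribute of each `Question` holds the answers given by the respondents, together with the explanation text each respondent wrote to justify their answer.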
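The nesting can be pictured with plain Python containers. This sketch uses made-up flat records and `collections.defaultdict`; it illustrates the shape only and is not the implementation of `group_questions_by_journal`. Where the real code stores a `Question` instance, the sketch stores a list of answer strings, one per respondent.

```python
from collections import defaultdict

# Hypothetical flat records: (publisher, journal, question number, respondent answer)
records = [
    ("APS", "physical_review_c", 13, "Code sharing encouraged but optional."),
    ("APS", "physical_review_c", 13, "Code sharing not mentioned."),
    ("APS", "physical_review_a", 13, "Code sharing encouraged but optional."),
]

# publisher -> journal -> question number -> list of answers (one per respondent)
grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
for publisher, journal, number, answer in records:
    grouped[publisher][journal][number].append(answer)

print(grouped["APS"]["physical_review_c"][13])
# ['Code sharing encouraged but optional.', 'Code sharing not mentioned.']
```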

In [4]:
grouped_questions = group_questions_by_journal(data)

## Resolve discrepancies

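When respondents disagree, the `Question` class's `.resolve_discrepancy` method updates `Question.answers` with the correct answer. It takes the index of the correct respondent and a short reason for the discrepancy.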
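To make the discrepancy workflow concrete, here is a toy stand-in for `Question`. The class name `ToyQuestion`, its fields, and the rule "a discrepancy exists when respondents give more than one distinct answer" are assumptions for illustration; the library's actual logic may differ (for instance, it may also compare explanations).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyQuestion:
    """Illustrative stand-in for the library's `Question` class (not its actual code)."""

    text: str
    answers: List[str]  # one answer string per respondent
    resolved_index: Optional[int] = None
    discrepancy_reason: Optional[str] = None

    def has_discrepancies(self) -> bool:
        # Unresolved, and respondents gave more than one distinct answer
        return self.resolved_index is None and len(set(self.answers)) > 1

    def resolve_discrepancy(self, correct_answer: int, discrepancy_reason: str) -> None:
        # Record which respondent is correct and why the others disagreed
        if not 0 <= correct_answer < len(self.answers):
            raise IndexError(f"no respondent with index {correct_answer}")
        self.resolved_index = correct_answer
        self.discrepancy_reason = discrepancy_reason

    def get_final_answer(self) -> str:
        # A resolved question returns the chosen respondent's answer
        if self.resolved_index is not None:
            return self.answers[self.resolved_index]
        # An unresolved disagreement has no single final answer yet
        if self.has_discrepancies():
            raise ValueError("answers disagree; call resolve_discrepancy first")
        # All respondents agree, so any of them will do
        return self.answers[0]


q = ToyQuestion(
    "13. Code sharing requirements",
    ["Code sharing encouraged but optional.", "Code sharing not mentioned."],
)
print(q.has_discrepancies())  # True
q.resolve_discrepancy(correct_answer=0, discrepancy_reason="Language understanding")
print(q.get_final_answer())  # Code sharing encouraged but optional.
```

Raising on an unresolved disagreement, rather than silently picking respondent 0, keeps unreviewed questions from leaking into the final results.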

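For example, let's consider the APS journals and print every question on which the respondents disagree. In Physical Review C, question 13 (code sharing requirements) has discrepancies.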

In [None]:
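# Print every APS question on which the respondents disagree.
# Use `questions` rather than `data` so the DataFrame loaded above is not overwritten.
for journal, questions in grouped_questions["APS"].items():
    print("#############################################################")
    print(journal)
    for question_number, question in questions.items():
        if question.has_discrepancies():
            question.print_qa()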

#############################################################
physical_review_a
#############################################################
physical_review_applied
#############################################################
physical_review_b
#############################################################
physical_review_c
13. Code sharing requirements
  Resp. 0:
    Answer: Code sharing encouraged but optional.
    Explanation: Authors should, when possible, include a Data Availability Statement in the paragraph before the Acknowledgments that indicates the availability of data, software, code, and other materials
  Resp. 1:
    Answer: Code sharing not mentioned.
    Explanation: Data Availability Statements

Sharing data supports new research and the independent verification of findings. The Physical Review journals strongly encourage authors to share relevant data, code, software, and other materials that support their reported results by depositing them in open data repositories 

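Discrepancies can be resolved manually by passing the index of the correct respondent (`correct_answer`) and the reason for the disagreement (`discrepancy_reason`). For the four journals below, respondent 0 gave the correct answer to question 13.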

In [None]:
# Question 13 (code sharing requirements): respondent 0 is correct for these journals
for journal in [
    "physical_review_c",
    "physical_review_letters",
    "physical_review_research",
    "prx_quantum",
]:
    grouped_questions["APS"][journal][13].resolve_discrepancy(
        correct_answer=0, discrepancy_reason="Language understanding"
    )

Language understanding
Language understanding
Language understanding
Language understanding


In [None]:
save_answers_to_yaml(
    grouped_questions,
    parent_folder="../../data/processed/all_answers",
    save_only=["APS"],
)

After a discrepancy has been resolved, the `.get_final_answer()` method returns the answer of the respondent marked as correct.