In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from daemon_analysis_tools.file_handling import load_and_process_csv
from daemon_analysis_tools.data.utils import group_questions_by_journal
from daemon_analysis_tools.data.publisher import Publisher
from daemon_analysis_tools.file_handling import (
    save_answers_to_yaml,
    load_answers_from_yaml,
)

Load and process data:
- Group answers by publisher and journal, normalizing names that are written in slightly different ways.
- Store in a DataFrame
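
To illustrate what normalizing names can look like, here is a minimal sketch. The helper `normalize_name` is hypothetical and is not taken from `daemon_analysis_tools`; whether `load_and_process_csv` works this way is an assumption. It only shows one way spelling variants could collapse into snake_case keys such as `analytical_letters`.

```python
import re


def normalize_name(name: str) -> str:
    """Hypothetical helper: map spelling variants of a name to one snake_case key."""
    # Lowercase the name and spell out "&" so both spellings give the same key.
    key = name.strip().lower().replace("&", " and ")
    # Collapse every run of non-alphanumeric characters into one underscore.
    key = re.sub(r"[^a-z0-9]+", "_", key)
    return key.strip("_")


variants = ["Analytical Letters", "analytical letters ", "Analytical-Letters"]
keys = {normalize_name(v) for v in variants}
print(keys)
```

All three variants end up under a single key, so their answers would be grouped together.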

In [3]:
data = load_and_process_csv("../../data/raw/rdp.csv")

Get a `dict` keyed by publisher name, whose values are `dict`s keyed by journal name, whose values are in turn `dict`s of `Question` instances. The `.answer` attribute of each `Question` holds the answers given by the respondents, together with the explanatory text that motivates them.
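
The nesting can be walked with three loops. The sketch below uses a hypothetical stand-in class, `QuestionStub`, because the real `Question` class is not shown here; its fields are assumptions. Only the publisher, journal, and question-number layout mirrors what is described above.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionStub:
    """Hypothetical stand-in for `Question`; the real class may differ."""

    text: str
    answer: list = field(default_factory=list)  # one entry per respondent


# publisher -> journal -> question number -> question
grouped = {
    "Taylor & Francis": {
        "analytical_letters": {
            7: QuestionStub(
                "Timing of data release",
                [
                    "Required data must be available prior to official publication.",
                    "Required data must be available prior to review process.",
                ],
            ),
        },
    },
}

# Flatten the three levels into (publisher, journal, number, n_respondents) rows.
rows = [
    (publisher, journal, number, len(question.answer))
    for publisher, journals in grouped.items()
    for journal, questions in journals.items()
    for number, question in questions.items()
]
print(rows)
```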

In [4]:
grouped_questions = group_questions_by_journal(data)

## Resolve discrepancies

The `Question` class has a `.resolve_discrepancy` method, which updates `Question.answer` with the correct answer.

For example, consider the Taylor & Francis journals. The cell below prints every question that has discrepancies; for several journals this is question 7.

In [None]:
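# Use `questions` as the loop variable so that `data` (the loaded DataFrame) is not overwritten.
for journal, questions in grouped_questions["Taylor & Francis"].items():
    print(journal)
    for number, question in questions.items():
        if question.has_discrepancies():
            question.print_qa()
    print("\n\n")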

advanced_composite_materials
7. Timing of data release
  Resp. 0:
    Answer: Required data must be available prior to official publication.
    Explanation: At the point of submission, you will be asked if there is a data set associated with the paper. If you reply yes, you will be asked to provide the DOI, pre-registered DOI, hyperlink, or other persistent identifier associated with the data set(s).
  Resp. 1:
    Answer: Required data must be available prior to review process.
    Explanation: At the point of submission, you will be asked if there is a data set associated with the paper. If you reply yes, you will be asked to provide the DOI, pre-registered DOI, hyperlink, or other persistent identifier associated with the data set(s). If you have selected to provide a pre-registered DOI, please be prepared to share the reviewer URL associated with your data deposit, upon request by reviewers.



analytical_letters
7. Timing of data release
  Resp. 0:
    Answer: Required data must 

In [10]:
a = grouped_questions["Taylor & Francis"]["international_journal_of_nanomedicine"][3]
a.resolve_discrepancy?

Signature:
a.resolve_discrepancy(
    correct_answer: Union[str, int, NoneType] = None,
    discrepancy_reason: Optional[str] = None,
) -> None
Docstring:
Resolve the discrepancy by choosing the correct answer.

:param correct_answer: The correct answer to resolve the discrepancy.
:param discrepancy_reason: The reason for the discrepancy, if any.
Select one of Text missing
            Language understanding
            Difficulty in matching information and question
            Other: free text
File:      ~/Documents/papers/rdp_daemon/Journal-Research-Data-Policy/src/daemon_analy

Discrepancies can be resolved manually by passing the index of the respondent whose answer is correct.
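
To make the mechanics concrete, here is a minimal sketch of how resolving by respondent index might behave. `QuestionStub` is a hypothetical stand-in and not the library's implementation; its attributes, the reason check, and the error it raises are all assumptions. Only the method names, the parameter names, and the list of reasons come from the docstring above.

```python
from typing import Optional, Union

# Reasons listed in the docstring; "Other: ..." carries free text.
ALLOWED_REASONS = (
    "Text missing",
    "Language understanding",
    "Difficulty in matching information and question",
)


class QuestionStub:
    """Hypothetical stand-in for `Question`; the real class may differ."""

    def __init__(self, answers):
        self.answers = list(answers)  # one answer per respondent
        self.final_answer = None
        self.discrepancy_reason = None

    def has_discrepancies(self) -> bool:
        # Unresolved and the respondents gave different answers.
        return self.final_answer is None and len(set(self.answers)) > 1

    def resolve_discrepancy(
        self,
        correct_answer: Union[str, int, None] = None,
        discrepancy_reason: Optional[str] = None,
    ) -> None:
        # An int is read as the index of the respondent who answered correctly.
        if isinstance(correct_answer, int):
            correct_answer = self.answers[correct_answer]
        if discrepancy_reason is not None and not (
            discrepancy_reason in ALLOWED_REASONS
            or discrepancy_reason.startswith("Other:")
        ):
            raise ValueError(f"Unknown discrepancy reason: {discrepancy_reason!r}")
        self.final_answer = correct_answer
        self.discrepancy_reason = discrepancy_reason


q = QuestionStub(
    [
        "Required data must be available prior to official publication.",
        "Required data must be available prior to review process.",
    ]
)
before = q.has_discrepancies()
q.resolve_discrepancy(correct_answer=1, discrepancy_reason="Language understanding")
after = q.has_discrepancies()
print(before, after, q.final_answer)
```

Checking the reason against a fixed list in this sketch catches typos early, which matters when the same string is repeated across many calls.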

In [11]:
q_to_fix = 7
correct_id = 1
for j in [
    "advanced_composite_materials",
    "analytical_letters",
    "green_chemistry_letters_and_reviews",
    "journal_of_macromolecular_science_part_b",
    "materials_research_letters",
    "molecular_physics",
    "waves_in_random_and_complex_media",
    "science_and_technology_of_advanced_materials",
    "polycyclic_aromatic_compounds",
    "liquid_crystals",
    "materials_research_letters",
    "molecular_physics",
]:

    grouped_questions["Taylor & Francis"][j][q_to_fix].resolve_discrepancy(
        correct_answer=correct_id, discrepancy_reason="Language understanding"
    )

for i in range(1, 14):
    if i != 6:
        grouped_questions["Taylor & Francis"]["ferroelectrics"][i].resolve_discrepancy(
            correct_answer=0, discrepancy_reason="Language understanding"
        )

# Questions 3, 4, 8 and 13 share the same correct respondent and the same reason.
for q in (3, 4, 8, 13):
    grouped_questions["Taylor & Francis"]["international_journal_of_nanomedicine"][
        q
    ].resolve_discrepancy(
        correct_answer=1,
        discrepancy_reason="Difficulty in matching information and question",
    )

grouped_questions["Taylor & Francis"]["journal_of_macromolecular_science_part_b"][
    13
].resolve_discrepancy(correct_answer=0, discrepancy_reason="Language understanding")

grouped_questions["Taylor & Francis"]["molecular_crystals_and_liquid_crystals"][
    13
].resolve_discrepancy(correct_answer=0, discrepancy_reason="Language understanding")

grouped_questions["Taylor & Francis"]["nanotoxicology"][8].resolve_discrepancy(
    correct_answer=1,
    discrepancy_reason="Difficulty in matching information and question",
)

grouped_questions["Taylor & Francis"]["nanotoxicology"][10].resolve_discrepancy(
    correct_answer=0,
    discrepancy_reason="Difficulty in matching information and question",
)

grouped_questions["Taylor & Francis"]["nanotoxicology"][13].resolve_discrepancy(
    correct_answer=1,
    discrepancy_reason="Difficulty in matching information and question",
)

Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Difficulty in matching information and question
Difficulty in matching information and question
Difficulty in matching information and question
Difficulty in matching information and question
Language understanding
Difficulty in matching information and question
Difficulty in matching information and question
Difficulty in matching information and question


In [12]:
save_answers_to_yaml(
    grouped_questions,
    parent_folder="../../data/processed/all_answers",
    save_only=["Taylor & Francis"],
)

Once the discrepancies are resolved, the `.get_final_answer()` method returns the correct answer.