In [5]:
%load_ext autoreload
%autoreload 2

In [6]:
from daemon_analysis_tools.data.publisher import Publisher
from daemon_analysis_tools.data.utils import group_questions_by_journal
from daemon_analysis_tools.file_handling import (
    load_and_process_csv,
    load_answers_from_yaml,
    save_answers_to_yaml,
)

Load and process the data:
- Group answers by publisher and journal, normalizing names that were written in slightly different ways.
- Store the result in a DataFrame.

In [7]:
data = load_and_process_csv("../../data/raw/rdp.csv")
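
The name normalization mentioned above is handled inside `load_and_process_csv`, and its exact rules are not shown in this notebook. As a rough illustration only, a normalizer of this kind could look like the sketch below. The function `normalize_name` and all of its rules are assumptions, not the library's implementation.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical normalizer: map spelling variants of a name to one key."""
    # Decompose accented characters, drop anything non-ASCII, and lowercase.
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Spell out ampersands so "A & B" and "A and B" agree.
    text = text.replace("&", " and ")
    # Replace everything except letters, digits, whitespace, and hyphens.
    text = re.sub(r"[^a-z0-9\s-]", " ", text)
    # Collapse runs of whitespace into single underscores.
    return re.sub(r"\s+", "_", text.strip())


variants = ["Advanced Energy Materials", "  advanced  energy materials "]
keys = {normalize_name(v) for v in variants}
```

Both variants collapse to a single key, which is what allows answers for the same journal to be grouped together.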

Build a `dict` keyed by publisher name, whose values are `dict`s keyed by journal name, whose values are in turn `dict`s of `Question` instances. The `.answer` attribute of each `Question` holds the answers given by the respondents and the explanatory text supporting them.

In [8]:
grouped_questions = group_questions_by_journal(data)
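
To make the nesting concrete, here is a small sketch that flattens a structure of the same shape into `(publisher, journal, number, question)` tuples. The helper `iter_questions` is hypothetical, and the toy data uses placeholder strings where the real structure holds `Question` instances. Integer question keys are assumed from the indexing used later in this notebook.

```python
from typing import Any, Dict, Iterator, Tuple

# publisher -> journal -> question number -> question object
Grouped = Dict[str, Dict[str, Dict[int, Any]]]


def iter_questions(grouped: Grouped) -> Iterator[Tuple[str, str, int, Any]]:
    """Walk the three nested levels and yield one tuple per question."""
    for publisher, journals in grouped.items():
        for journal, questions in journals.items():
            for number, question in questions.items():
                yield publisher, journal, number, question


# Toy data with placeholder strings instead of Question instances.
toy = {
    "Wiley": {
        "small": {1: "question 1", 4: "question 4"},
        "chirality": {2: "question 2"},
    },
}
flat = list(iter_questions(toy))
```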

## Resolve discrepancies

The `Question` class has a `.resolve_discrepancy` method, which updates `Question.answers` with the correct answer.

For example, consider Wiley's journals. The cell below prints, journal by journal, every question whose answers have discrepancies.

In [9]:
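# Use `questions` rather than `data` so the DataFrame loaded above is not overwritten.
for journal, questions in grouped_questions["Wiley"].items():
    print(journal)
    # Each value is a Question instance; print only those with conflicting answers.
    for number, question in questions.items():
        if question.has_discrepancies():
            question.print_qa()
    print("\n\n")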

advanced_energy_materials
1. Existence of research data policy
  Resp. 0:
    Answer: Research Data Policy (RDP) exists.
    Explanation: Wiley’s Data Sharing Policies

Wiley is committed to a more open research landscape, facilitating faster and more effective research discovery by enabling reproducibility and verification of data, methodology and reporting standards. We encourage authors of articles published in our journals to share their research data including, but not limited to: raw data, processed data, software, algorithms, protocols, methods, materials.

Refer to the table below to understand the various standardized data sharing policy categories:

| | Data availability statement is published (1) | Data has been shared (2) | Data has been peer reviewed (3) | Example Wiley journals |
| Encourages Data Sharing | Optional | Optional | Optional | |
| Expects Data Sharing | Required | Optional | Optional | British Journal of Social Psychology |
| Mandates Data Sharing | Required | Requi

Discrepancies can be resolved manually by passing the index of the respondent whose answer is correct, together with the reason for the discrepancy.

In [10]:
for j in ["chirality"]:
    for i in [2, 3, 4, 5, 7, 8]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=1,
            discrepancy_reason="Text not found",
        )
    
    for i in [11, 12]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=0,
            discrepancy_reason="Language understanding",
        )

for j in ["ce-papers",
          "journal_of_physical_organic_chemistry"]:
    for i in [1, 2, 3, 4, 5, 8, 11, 12]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=0,
            discrepancy_reason="Text not found",
        )

for j in ["advanced_energy_materials",
          "advanced_functional_materials",
          "advanced_materials",
          "carbon_energy",
          "infomat",
          "microscopy_research_and_technique",
          "small",
          ]:
    for i in [1, 4, 5, 8]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=0,
            discrepancy_reason="Text not found",
        )

    for i in [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=0,
            discrepancy_reason="Language understanding",
        )

for j in ["angewandte_chemie",
          "angewandte_chemie_international_edition",
          "batteries_and_supercaps",
          "chemistry_a_european_journal",
          "chemistryselect"]:
    for i in [2, 4, 5, 7, 8]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=0,
            discrepancy_reason="Text not found",
        )
    
    for i in [10, 11, 12]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=0,
            discrepancy_reason="Language understanding",
        )
    
    for i in [13]:
        grouped_questions["Wiley"][j][i].resolve_discrepancy(
            correct_answer=1,
            discrepancy_reason="Language understanding",
        )

Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Language understanding
Language understanding
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Text not found
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Text not found
Text not found
Text not found
Text not found
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Language understanding
Text not found
Text not found
Text not found
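
The loops above repeat the same pattern for each group of journals. One possible way to shorten them is to drive the calls from a table of rules, as sketched below. The helper `apply_rules` and the `StubQuestion` stand-in are hypothetical and only record the calls they receive; the real `Question.resolve_discrepancy` is assumed to accept the `correct_answer` and `discrepancy_reason` keywords used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class StubQuestion:
    """Stand-in that only records calls; not the real Question class."""
    calls: List[Tuple[int, str]] = field(default_factory=list)

    def resolve_discrepancy(self, correct_answer: int, discrepancy_reason: str) -> None:
        self.calls.append((correct_answer, discrepancy_reason))


# (question numbers, index of the correct respondent, reason)
Rule = Tuple[Sequence[int], int, str]


def apply_rules(journals: Dict[str, dict], names: Sequence[str], rules: Sequence[Rule]) -> None:
    """Apply every rule to every named journal."""
    for name in names:
        for numbers, correct_answer, reason in rules:
            for number in numbers:
                journals[name][number].resolve_discrepancy(
                    correct_answer=correct_answer,
                    discrepancy_reason=reason,
                )


# Toy publisher with one journal and stub questions numbered 1 to 13.
wiley = {"chirality": {n: StubQuestion() for n in range(1, 14)}}
apply_rules(
    wiley,
    ["chirality"],
    [
        ([2, 3, 4, 5, 7, 8], 1, "Text not found"),
        ([11, 12], 0, "Language understanding"),
    ],
)
```

With the real data, `journals` would be `grouped_questions["Wiley"]`, and each group of journals would get its own list of rules.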

In [11]:
save_answers_to_yaml(
    grouped_questions,
    parent_folder="../../data/processed/all_answers",
    save_only=["Wiley"],
)

Once the discrepancies have been resolved, the `.get_final_answer()` method returns the correct answer.
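
The internals of `Question` are not shown in this notebook. The toy model below is only a guess at how resolving a discrepancy and then reading the final answer could fit together. `ToyQuestion`, its fields, and the choice to raise an error on unresolved discrepancies are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyQuestion:
    """Toy model of a question answered by several respondents (hypothetical)."""
    answers: List[str]
    resolved_index: Optional[int] = None
    reason: Optional[str] = None

    def has_discrepancies(self) -> bool:
        # Unresolved and at least two respondents disagree.
        return self.resolved_index is None and len(set(self.answers)) > 1

    def resolve_discrepancy(self, correct_answer: int, discrepancy_reason: str) -> None:
        # Remember which respondent was right and why the others differed.
        self.resolved_index = correct_answer
        self.reason = discrepancy_reason

    def get_final_answer(self) -> str:
        if self.has_discrepancies():
            raise ValueError("resolve the discrepancy first")
        # Without a resolution all answers agree, so the first one is used.
        index = self.resolved_index if self.resolved_index is not None else 0
        return self.answers[index]


question = ToyQuestion(answers=["yes", "no"])
had_discrepancy = question.has_discrepancies()
question.resolve_discrepancy(correct_answer=1, discrepancy_reason="Text not found")
final = question.get_final_answer()
```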