7/29 (Tue)

---

# Frequency Analysis of Converted Codes in L2 Japanese Pragmatics Instruction Research

To synthesize the features of targets, treatments, learners, and outcome measures, this notebook calculates the frequency and proportion of the identified codes.
Before counting, it converts the original coding results into more abstract codes.

The following code block imports Python packages.

In [1]:
import json
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


The following code block defines a global variable holding the path to the data directory.

In [2]:
DATA_DIR = Path.cwd().parents[1] / "data"
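
As a quick illustration of how `parents[1]` resolves, the sketch below uses a made-up working directory. The path `/home/user/project/notebooks/analysis` is an assumption for illustration, not the actual layout of this project. `parents[0]` is the immediate parent, so `parents[1]` points two levels up.

```python
from pathlib import PurePosixPath

# Hypothetical working directory; the real project layout may differ.
cwd = PurePosixPath("/home/user/project/notebooks/analysis")

# parents[0] is the immediate parent, parents[1] is two levels up.
data_dir = cwd.parents[1] / "data"
print(data_dir)
```

If the notebook is started from a different directory, the same expression resolves to a different location, so it may be worth checking that `DATA_DIR` exists before loading files.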

### Utility Functions 

The following code block defines a function that searches a hierarchical dictionary for a value and returns the top-level key under which it is found.

In [3]:
def find_top_key(d: dict[str, dict|list|str], target: str) -> str|None:
    for key, value in d.items():
        if isinstance(value, dict):
            # search recursively
            found = find_top_key(value, target)
            if found:  # found in a lower layer, so return the key of this upper layer
                return key
        elif isinstance(value, list):
            if target in value:
                return key  # found in the list below, so return the key of this upper layer
        else:
            if value == target:
                return key
    return None  # return None if the target is not found
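
To make the behavior concrete, here is a minimal usage sketch. The function body is repeated (without type hints) so that the snippet runs on its own, and `toy_hierarchy` is a hypothetical dictionary made up for illustration; it is not the content of `code_hierarchy_ver_1.json`.

```python
def find_top_key(d, target):
    """Return the top-level key of `d` under which `target` appears, else None."""
    for key, value in d.items():
        if isinstance(value, dict):
            if find_top_key(value, target):
                return key
        elif isinstance(value, list):
            if target in value:
                return key
        elif value == target:
            return key
    return None


# Hypothetical hierarchy, for illustration only.
toy_hierarchy = {
    "Production": {
        "Written": ["Discourse Completion Task"],
        "Oral": ["Role-plays", "Oral Discourse Completion Task"],
    },
    "Comprehension/Perception": ["Multiple-Choice Tests"],
}

top_for_role_play = find_top_key(toy_hierarchy, "Role-plays")
top_for_mc = find_top_key(toy_hierarchy, "Multiple-Choice Tests")
top_for_unknown = find_top_key(toy_hierarchy, "Unknown Task")
print(top_for_role_play, top_for_mc, top_for_unknown)
```

Note that the function returns the key at the top level of the dictionary it is given, not the nearest parent, so `"Role-plays"` maps to `"Production"` rather than `"Oral"`.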

The following code block defines a function to remove redundant commas from multi-value cells.

In [4]:
def remove_redundant_commas(multi_val: str) -> str:
    if pd.isna(multi_val):
        return multi_val

    cleaned_val_list = []
    for val in multi_val.split(", "):
        if val == "":
            continue
        if val == ",":
            continue
        cleaned_val_list.append(val)

    return ", ".join(cleaned_val_list)
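
Because the function above splits on the two-character separator `", "`, a trailing comma with no following space stays attached to the last item. The sketch below shows one possible stricter variant; `strip_empty_items` is a hypothetical helper, not part of the notebook, and it assumes that no individual code contains a comma of its own.

```python
def strip_empty_items(multi_val):
    """Hypothetical variant: split on bare commas and drop empty pieces."""
    if not isinstance(multi_val, str):  # leave NaN/None untouched
        return multi_val
    parts = [part.strip() for part in multi_val.split(",")]
    return ", ".join(part for part in parts if part)


cleaned_a = strip_empty_items(", Production, , ")
cleaned_b = strip_empty_items(", Oral Role-plays with an L2,")
print(cleaned_a)
print(cleaned_b)
```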

---

## 1. Data Loading & Preprocess

### 1.1. Data Loading

The following code block loads the table of coding results (data/external/coding_result_ver_1.tsv).

In [5]:
tsv_path = DATA_DIR / "external/coding_result_ver_1.tsv"

df_coding_result_raw = pd.read_table(tsv_path, sep="\t", skiprows=[0, 1], header=0)

The following code block loads code hierarchy dictionary (data/external/code_hierarchy_ver_1.json).

In [6]:
with open(DATA_DIR / "external/code_hierarchy_ver_1.json") as f:
    code_hierarcy = json.load(f)

### 1.2. Preprocessing

The following code block recodes the Group column as either Explicit or Implicit, depending on whether meta-pragmatic information was provided.

In [7]:
mask_explicit = df_coding_result_raw["Meta Pragmatic Information"]

# assign explicit/implicit codes to experimental design studies.
mask_experimental = df_coding_result_raw["Group"].str.contains("Experimental")


df_coding_result_raw.loc[mask_experimental & mask_explicit, "Group"] \
    = df_coding_result_raw.loc[mask_experimental & mask_explicit, "Group"].str.replace("Experimental", "Explicit")
df_coding_result_raw.loc[mask_experimental & ~mask_explicit, "Group"] \
    = df_coding_result_raw.loc[mask_experimental & ~mask_explicit, "Group"].str.replace("Experimental", "Implicit")

# assign explicit/implicit codes to pre/posttest design studies.
mask_prepost_design = df_coding_result_raw["Group"].isna()

df_coding_result_raw.loc[mask_prepost_design & mask_explicit, "Group"] = "Explicit"
df_coding_result_raw.loc[mask_prepost_design & ~mask_explicit, "Group"] = "Implicit"
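
The recoding rule can be checked on a small example. The sketch below applies the same logic to a toy table; the rows are invented for illustration, and `na=False` is added here as an assumption so that rows with a missing Group are treated as non-experimental when building the mask.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame(
    {
        "Group": ["Experimental", "Experimental Expanded", None, "Control"],
        "Meta Pragmatic Information": [True, False, True, False],
    }
)

is_explicit = toy["Meta Pragmatic Information"]
is_experimental = toy["Group"].str.contains("Experimental", na=False)
is_prepost = toy["Group"].isna()  # pre/posttest designs have no group label

for mask, label in [(is_explicit, "Explicit"), (~is_explicit, "Implicit")]:
    rows = is_experimental & mask
    # Keep suffixes such as "Expanded" by replacing only the first word.
    toy.loc[rows, "Group"] = toy.loc[rows, "Group"].str.replace("Experimental", label)
    toy.loc[is_prepost & mask, "Group"] = label

print(toy["Group"].tolist())
```

Control rows are left unchanged because they match neither mask.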

The following code block sets Study ID, Authors, Year, and Group as multiple indices.

In [8]:
multiple_indices = []

prev_study_id = ""
prev_authors = ""
prev_year = ""
for idx in df_coding_result_raw.index:
    cur_study_id = df_coding_result_raw.at[idx, "Study ID"]
    if pd.isna(cur_study_id):
        cur_study_id = prev_study_id

    cur_authors = df_coding_result_raw.at[idx, "Authors"]
    if pd.isna(cur_authors):
        cur_authors = prev_authors

    cur_year = df_coding_result_raw.at[idx, "Year"]
    if pd.isna(cur_year):
        cur_year = prev_year

    cur_group = df_coding_result_raw.at[idx, "Group"]

    multiple_indices.append((cur_study_id, cur_authors, cur_year, cur_group))

    prev_study_id = cur_study_id
    prev_authors = cur_authors
    prev_year = cur_year

multiple_indices = pd.MultiIndex.from_tuples(multiple_indices, names=["Study ID", "Authors", "Year", "Group"])

df_coding_result = df_coding_result_raw.drop(["Study ID", "Paper IDs", "Authors", "Year", "Group"], axis=1)
df_coding_result.index = multiple_indices
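
For reference, the same kind of index can also be built without an explicit loop by forward-filling the study-level columns and calling `set_index`. This is only a sketch of an alternative on a toy table with invented values; it is not the code used to produce the results in this notebook.

```python
import pandas as pd

# Toy rows: study-level cells are blank on the second row of a study.
toy = pd.DataFrame(
    {
        "Study ID": [1.0, None, 2.0],
        "Authors": ["Author A", None, "Author B"],
        "Year": ["2001", None, "2012"],
        "Group": ["Explicit", "Implicit", "Explicit"],
        "N Participants": [13, 14, 21],
    }
)

id_cols = ["Study ID", "Authors", "Year"]
toy[id_cols] = toy[id_cols].ffill()  # propagate study-level values downward
toy_indexed = toy.set_index(id_cols + ["Group"])

print(toy_indexed.index.tolist())
```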

The following code blocks apply the following conversions:
1. Class year level to proficiency levels
    - (1st Year → Novice; 2nd Year → Intermediate; 3rd Year → Intermediate; 4th Year → Advanced)
2. *a soo desu ka* to *soo desu ka*
3. Remove fill-in-the-blank tests from production task
    - If fill-in-the-blank is the only task used as production practice, set False in the Production column
4. Remove Metacognitive Strategy Questionnaire from outcome measures and results
    - See comments on Google Spreadsheet.

In [9]:
# Class Year Lvl → Prof. Lvl
converter = {
    "1st Year": "Novice",
    "2nd Year": "Intermediate",
    "3rd Year": "Intermediate",
    "4th Year": "Advanced"
}

for class_year_level, proficiency in converter.items():
    mask_prof_is_nan = df_coding_result["Proficiency Level"].isna()
    mask_class_year_level = df_coding_result["Class Year Level"].str.contains(class_year_level)
    mask = mask_prof_is_nan & mask_class_year_level  # rows where Prof. Lvl is NA and the class year level matches

    df_coding_result.loc[mask, "Proficiency Level"] = proficiency

In [10]:
# a soo desu ka → soo desu ka
converted_linguistic_items_col = df_coding_result["Linguistic Items"].str.replace("a soo desu ka", "soo desu ka")
df_coding_result.loc[:, "Linguistic Items"] = converted_linguistic_items_col

In [11]:
# remove fill-in-the-blank in Production Task col.
converted_production_task_col = df_coding_result["Production Task"].str.replace(
    "Audio-visual Fill-in-the-blank Task", ""
)
converted_production_task_col = converted_production_task_col.str.replace(
    "Fill-in-the-blank Task", ""
)
df_coding_result.loc[:, "Production Task"] = converted_production_task_col

mask_no_production_task = converted_production_task_col == ""
mask_production_is_true = df_coding_result["Production"]
mask = mask_no_production_task & mask_production_is_true

df_coding_result.loc[mask, "Production"] = False
df_coding_result.loc[mask_no_production_task, "Production Task"] = np.nan
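
The order of the two replacements matters, because the shorter label is a substring of the longer one. The plain-string sketch below illustrates this with a made-up cell value (an assumption for illustration, not a row from the coding table).

```python
# Made-up cell value for illustration.
cell = "Audio-visual Fill-in-the-blank Task, Oral Role-plays with an L2"

# Replacing the shorter label first leaves a stray fragment behind.
wrong = cell.replace("Fill-in-the-blank Task", "")

# Replacing the longer label first removes the whole item.
right = cell.replace("Audio-visual Fill-in-the-blank Task", "").replace(
    "Fill-in-the-blank Task", ""
)

print(repr(wrong))
print(repr(right))
```

Either way, the separator of the removed item remains in the cell, so multi-value cells can still start or end with a comma after this step.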

In [12]:
# remove metacognitive strategy questionnaire
df_coding_result.loc[9, "Outcome Measure 1"] = np.nan

result_col_1_start = "Within-group Results"
result_col_2_end = "Evidence of Potential Moderators/Reasons for Variety in Between-group Results"
df_coding_result.loc[9, result_col_1_start:result_col_2_end] = np.nan

### 1.3. Brief Summarization of Table

This subsection briefly summarizes the converted table to show research settings (i.e., country, institution, N participants, L1, proficiency levels), pragmatic targets, treatment type (i.e., explicit vs. implicit), outcome measures, Analysis (i.e., Qual. vs. Quan.), and Results (e.g., positive, negative, & no impact) for within-group and between-group results, respectively.

#### 1.3.1. Within-Group Results

The following code block drops unrelated rows (i.e., control groups).

In [13]:
mask_control = df_coding_result.index.get_level_values(3).str.contains("Control")
df_within_res = df_coding_result[~mask_control].copy(deep=True)

The following code blocks summarize the coding results.

In [14]:
# --- generate the context column, which contains values like "US University" ---
country = df_within_res.loc[:, "Country"].copy(deep=True)
institution = df_within_res.loc[:, "Institution"].copy(deep=True)
context_col = country + " " + institution
context_col = context_col.ffill()

In [15]:
# --- generate the participant column, which contains values like "English L1 Novice (n=13)" ---
## convert L1 columns
l1 = df_within_res.loc[:, "L1"].copy(deep=True)
l1[l1.isna()] = "Not Specified"
l1[l1.str.contains(",")] = "Mixed"
l1 = l1.apply(lambda lang: f"{lang} L1")

## convert N participant columns
n_participant = df_within_res.loc[:, "N Participants"].copy(deep=True)
n_participant = n_participant.apply(lambda n: f"(n={n})")

proficiency = df_within_res.loc[:, "Proficiency Level"].copy(deep=True)

## join L1, proficiency, and sample size information
participant_col = l1 + " " + proficiency + " " + n_participant

In [16]:
# --- generate the abstract pragmatic target column ---
pragmatic_target_col = df_within_res.loc[:, "Pragmatic Target"].copy(deep=True)

## get abstract codes
particle_list = code_hierarcy["Target Feature"]["Pragmatic Target"]["Particle"]
pragmatic_target_col = pragmatic_target_col.apply(lambda target: target if target not in particle_list else "Particle")

## fill NaN
pragmatic_target_col = pragmatic_target_col.ffill()

In [17]:
# --- generate the outcome measures column ---
outcome_measures = df_within_res.loc[:, "Outcome Measure 1":"Outcome Measure 4"].copy(deep=True)

## get abstract codes
outcome_measures = outcome_measures.map(
    lambda measure: find_top_key(code_hierarcy["Assessment Feature"], measure), na_action="ignore"
)

## join all outcome measures
outcome_measures_col = outcome_measures.loc[:, "Outcome Measure 1"].str.cat(
    [
        outcome_measures.loc[:, "Outcome Measure 2"],
        outcome_measures.loc[:, "Outcome Measure 3"],
        outcome_measures.loc[:, "Outcome Measure 4"]
    ],
    sep=", ",
    na_rep=""
)

## fill NaN
outcome_measures_col.loc[outcome_measures_col == ", , , "] = np.nan
outcome_measures_col = outcome_measures_col.ffill()
outcome_measures_col = outcome_measures_col.apply(remove_redundant_commas)

In [18]:
# --- generate the analysis column ---
analysis = df_within_res.loc[:, "Qualitative Analysis":"Quantitative Report Type"]

# judge mixed, qual. or quan
analysis_col = pd.Series(np.full(len(analysis), ""), index=analysis.index)
analysis_col[analysis["Qualitative Analysis"] & analysis["Quantitative Analysis"]] = "Mixed"
analysis_col[analysis["Qualitative Analysis"] & ~analysis["Quantitative Analysis"]] = "Qual"
analysis_col[~analysis["Qualitative Analysis"] & analysis["Quantitative Analysis"]] = "Quan"

# judge whether stats. tests were used
analysis_col[analysis["Quantitative Report Type"].str.contains("Statistical Test", na=False)] += "+"

# fill NaN
analysis_col.loc[analysis_col == ""] = np.nan
analysis_col = analysis_col.ffill()
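
The labeling rule in the cell above can be restated as a small pure function, which may make the intended categories easier to check. `label_analysis` is a hypothetical helper written for illustration; it assumes that a row with neither flag set yields no label (`None`), which the notebook instead handles through forward-filling.

```python
def label_analysis(qualitative, quantitative, report_type=None):
    """Return "Mixed", "Qual", or "Quan", with "+" when a statistical test was reported."""
    if qualitative and quantitative:
        label = "Mixed"
    elif qualitative:
        label = "Qual"
    elif quantitative:
        label = "Quan"
    else:
        return None  # assumption of this sketch: no analysis flag, no label

    # Append "+" when the report type mentions a statistical test.
    if isinstance(report_type, str) and "Statistical Test" in report_type:
        label += "+"
    return label


print(label_analysis(True, True, "Descriptive Statistics, Statistical Test"))
print(label_analysis(True, False))
print(label_analysis(False, True, "Descriptive Statistics"))
```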

In [19]:
# --- generate the result column ---
target_columns = [
    "Within-group Results",
    "Within-group Results.1",
    "Within-group Results.2",
    "Within-group Results.3",
]
results = df_within_res.loc[:, target_columns].copy(deep=True)

result_col = results["Within-group Results"].str.cat(
    [
        results["Within-group Results.1"],
        results["Within-group Results.2"],
        results["Within-group Results.3"],
    ],
    sep=", ",
    na_rep=""
)

result_col.loc[result_col == ", , , "] = np.nan
result_col = result_col.bfill()
result_col = result_col.apply(remove_redundant_commas)

In [20]:
# --- add generated columns and summarize brief within-group results ---

df_within_res.loc[:, "Context"] = context_col
df_within_res.loc[:, "Participant"] = participant_col
df_within_res.loc[:, "Target"] = pragmatic_target_col
df_within_res.loc[:, "Outcome Measures"] = outcome_measures_col
df_within_res.loc[:, "Analysis"] = analysis_col
df_within_res.loc[:, "Results"] = result_col

df_within_res.loc[:, "Context":"Results"].sort_index(level=1)

Study ID,Authors,Year,Group,Context,Participant,Target,Outcome Measures,Analysis,Results
9.0,"Cohen & Ishihara, Ishihara, Ishihara","2005, 2006, 2007",Explicit,US University,"English L1 Novice, Intermediate (n=22)",Speech Act,Production,Mixed,Positive
1.0,Gyogi,2016,Implicit,UK University,Mixed L1 Intermediate (n=14),Particle,"Production, Decision Making",Qual,"Positive, Positive"
8.0,"Hoshi, Hoshi","2017, 2022",Explicit,US University,Not Specified L1 Intermediate (n=14),Particle,"Comprehension/Perception, Comprehension/Percep...",Mixed,"Positive, Positive, Positive, Positive"
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Fall,US University,Mixed L1 Novice (n=14),Particle,Comprehension/Perception,Mixed,Positive
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Full,US University,English L1 Novice (n=6),Particle,Comprehension/Perception,Mixed,Positive
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Spring,US University,Mixed L1 Novice (n=12),Particle,Comprehension/Perception,Mixed,Positive
15.0,Iwai,2010,Explicit,US University,Not Specified L1 Novice (n=14),Small Talk,Production,Mixed,Positive
15.0,Iwai,2010,Explicit Expanded,US University,Not Specified L1 Novice (n=15),Small Talk,Production,Mixed,Positive
4.0,Kakegawa,2009,Explicit,US University,Mixed L1 Intermediate (n=11),Particle,Production,Mixed,Positive
14.0,Katayama,2012,Explicit,Canada University,Mixed L1 Intermediate (n=21),Particle,Production,Quan+,Positive


The table suggests that almost all studies showed positive effects of pragmatic instruction regardless of learning target, proficiency level, instruction type, or outcome measure.
However, two studies, Tateyama (1998, 2001) and Tsujihara (2023), observed no change in pragmatic performance.
Both studies focused on speech acts.
In addition, Tsujihara (2023) reported positive effects of instruction on the decision-making and self-evaluation measures, so the lack of change was limited to production.
Thus, learning to produce Japanese speech acts might be relatively more difficult than learning other pragmatic targets.
Meanwhile, other studies targeting speech acts (e.g., Ishihara, 2005, 2006, 2007; Tateyama, 2006, 2007, 2008, 2009; Tsujihara, 2023) showed positive results for speech act production.

To further analyze those differences in more detail, the following code block shows the table of treatment features.

In [21]:
target_study_id = [6, 9, 11, 5, 12]

treatment_feature_columns = [
    "Intervention Length",
    "Meta Pragmatic Information",
    "Input",
    "Input Enhancement",
    "Inductive Consciousness-Raising",
    "Deductive Consciousness-Raising",
    "Production",
    "Journal Writing/ Self-reflection",
    "Feedback",
    "Discussion"
]

df_within_res.loc[target_study_id, ["Results", "Outcome Measures"] + treatment_feature_columns].style.apply(
    lambda col: ["background-color: #d65f5f" if flag else "background-color: #5fba7d" for flag in col],
    subset=treatment_feature_columns[1:]
)

Study ID,Authors,Year,Group,Results,Outcome Measures,Intervention Length,Meta Pragmatic Information,Input,Input Enhancement,Inductive Consciousness-Raising,Deductive Consciousness-Raising,Production,Journal Writing/ Self-reflection,Feedback,Discussion
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit,"Positive, Positive, Positive, Positive","Production, Production, Production, Comprehension/Perception",Medium Treatments,True,True,False,True,False,True,True,True,True
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit Expanded,"Positive, Positive, Positive, Positive","Production, Production, Production, Comprehension/Perception",Medium Treatments,True,True,False,False,False,True,False,False,False
9.0,"Cohen & Ishihara, Ishihara, Ishihara","2005, 2006, 2007",Explicit,Positive,Production,Medium Treatments,True,True,False,True,False,True,True,True,False
11.0,Tsujihara,2023,Explicit,"Positive, Positive","Production, Decision Making",Medium Treatments,True,True,False,False,True,True,True,True,True
5.0,"Tateyama, Tateyama","1998, 2001",Explicit,"No Impact, No Impact","Comprehension/Perception, Production",Medium Treatments,True,True,False,False,False,True,True,False,True
5.0,"Tateyama, Tateyama","1998, 2001",Implicit,"No Impact, No Impact","Comprehension/Perception, Production",Medium Treatments,False,True,False,True,False,True,True,False,False
12.0,Tsujihara,2023,Explicit,"No Impact, Positive, Positive","Production, Decision Making, Self-Evaluation",Long Treatments,True,True,False,False,True,True,True,True,True


One difference among the studies was the instruction type (i.e., explicit vs. implicit).
To facilitate learners' ability to judge the appropriateness of speech act use, it might be better to provide meta-pragmatic information.

Another difference among the studies was the opportunity for consciousness-raising activities.
Theoretically, consciousness-raising activities can enhance learners' awareness of the link between forms and contexts and ultimately facilitate their ability to judge the appropriateness of speech act use or change their decision-making processes.
Thus, the absence of consciousness-raising activities might explain why the explicit group in Tateyama (1998, 2001) did not improve its comprehension/perception ability, whereas the explicit group in Tsujihara (2023) improved its decision-making ability.
However, the regular explicit group in Tateyama (2006, 2007, 2008, 2009) showed positive changes on the comprehension/perception measure, so a more detailed comparison between those studies would be necessary.

Moreover, regarding production, all groups received opportunities for production practice, so the table does not suggest any factor that might explain the differences in within-group results.
Thus, it might be better to inspect which production tasks were used during instruction and testing in those studies.

To further compare those studies, the following table shows more detailed information on the instructional and outcome measure materials.

In [22]:
target_study_id = [6, 9, 11, 5, 12]

material_columns = [
    "Production Task",
    "Outcome Measure 1",
    "Outcome Measure 2",
    "Outcome Measure 3",
    "Outcome Measure 4"
]

df_within_res.loc[target_study_id, ["Results", "Outcome Measures"] + material_columns]

Study ID,Authors,Year,Group,Results,Outcome Measures,Production Task,Outcome Measure 1,Outcome Measure 2,Outcome Measure 3,Outcome Measure 4
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit,"Positive, Positive, Positive, Positive","Production, Production, Production, Comprehens...","Oral Role-plays with an L2, Oral Role-plays wi...",Discourse Completion Task,Oral Discourse Completion Task,Role-plays,Acceptability Judgment
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit Expanded,"Positive, Positive, Positive, Positive","Production, Production, Production, Comprehens...",Oral Role-plays with an L2,,,,
9.0,"Cohen & Ishihara, Ishihara, Ishihara","2005, 2006, 2007",Explicit,Positive,Production,Discourse Completion Task,,Oral Discourse Completion Task,,
11.0,Tsujihara,2023,Explicit,"Positive, Positive","Production, Decision Making",Discourse Completion Task,Discourse Completion Task,Retrospective Interview,,
5.0,"Tateyama, Tateyama","1998, 2001",Explicit,"No Impact, No Impact","Comprehension/Perception, Production",Discourse Completion Task,Multiple-Choice Tests,Role-plays,,
5.0,"Tateyama, Tateyama","1998, 2001",Implicit,"No Impact, No Impact","Comprehension/Perception, Production",Discourse Completion Task,,,,
12.0,Tsujihara,2023,Explicit,"No Impact, Positive, Positive","Production, Decision Making, Self-Evaluation","Discourse Completion Task, Oral Role-plays wit...",Discourse Completion Task,Retrospective Interview,Self-evaluation,


Comparing Tateyama (1998, 2001) with Tateyama (2006, 2007, 2008, 2009), the table suggests a modality gap between the production practice tasks and the outcome measures.
More specifically, learners in the explicit group in Tateyama (2006, 2007, 2008, 2009) engaged in oral production practice and took oral production tests.
On the other hand, the explicit group in Tateyama (1998, 2001) practiced with DCTs, while the test consisted of role-plays.
This modality gap might be one reason for the no-impact results in Tateyama (1998, 2001).

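※ However, the Ishihara study also has a modality gap between practice and test. In that case the shift is DCT → Oral DCT, so could a difference in both modality and task design (rather than modality alone) be one cause?
※ In addition, Tateyama (1998, 2001) targeted lower-proficiency students, whereas Ishihara targeted intermediate-level learners, so this difference in proficiency may also be relevant.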

However, in the explicit group in Tsujihara (2023), there was no modality gap between practice and assessment.
Given that another explicit group in Tsujihara (2023) engaged in the same or similar instruction and assessments, it might be necessary to compare those groups qualitatively.

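※ For the Tsujihara study, a ceiling effect is possible. It targeted advanced learners, who were already able to use a variety of forms from the outset.
※ In addition, performance was slightly better when SCOBAs were available, and there were fewer practice opportunities than in the pilot study, so learners may have reached the level of verbalization but not the level of internalization.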

In [37]:
material_columns = [
    "Production Task",
    "Outcome Measure 1",
    "Outcome Measure 2",
    "Outcome Measure 3",
    "Outcome Measure 4"
]

df_within_res.loc[:, ["Results", "Outcome Measures"] + material_columns]

Study ID,Authors,Year,Group,Results,Outcome Measures,Production Task,Outcome Measure 1,Outcome Measure 2,Outcome Measure 3,Outcome Measure 4
1.0,Gyogi,2016,Implicit,"Positive, Positive","Production, Decision Making",Text Translation,Text Translation Task,Retrospective Interview,,
2.0,Yoshimi,2001,Explicit,Positive,Production,Oral Story Telling Task with an L1,Oral Storytelling Task,,,
3.0,Utashiro & Kawai,2009,Explicit,"Positive, Positive","Production, Comprehension/Perception",", Oral Role-plays with an L2,",Oral Proficiency Interview,Video-based Meaning Asking Task,,
4.0,Kakegawa,2009,Explicit,Positive,Production,Email Correspondence,Email Correspondence,,,
5.0,"Tateyama, Tateyama","1998, 2001",Explicit,"No Impact, No Impact","Comprehension/Perception, Production",Discourse Completion Task,Multiple-Choice Tests,Role-plays,,
5.0,"Tateyama, Tateyama","1998, 2001",Implicit,"No Impact, No Impact","Comprehension/Perception, Production",Discourse Completion Task,,,,
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit,"Positive, Positive, Positive, Positive","Production, Production, Production, Comprehens...","Oral Role-plays with an L2, Oral Role-plays wi...",Discourse Completion Task,Oral Discourse Completion Task,Role-plays,Acceptability Judgment
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit Expanded,"Positive, Positive, Positive, Positive","Production, Production, Production, Comprehens...",Oral Role-plays with an L2,,,,
7.0,"Narita, Narita","2009, 2012",Implicit,"Positive, Positive, Positive","Comprehension/Perception, Comprehension/Percep...",,Multiple-Choice Tests,Multiple-Choice Tests,Oral Discourse Completion Task,
8.0,"Hoshi, Hoshi","2017, 2022",Explicit,"Positive, Positive, Positive, Positive","Comprehension/Perception, Comprehension/Percep...","Oral Translation, Oral Unscripted Interaction ...",Metapragmatic Knowledge Explanation Task,Multiple-Choice Tests,Oral Spontaneous Interaction,Self-evaluation


Utashiro & Kawai targeted reactive tokens, and there is a gap between the practice task and the outcome measure task (role-plays vs. OPIs).
↔ Since gains were nevertheless observed, speech acts may be the more difficult target...?

#### 1.3.2. Between-Group Results

This subsection briefly summarizes the coding results of between-group comparisons.
More specifically, I separate studies that compare experimental and control groups from studies that compare two types of instruction, and summarize the results of each.

The following code block drops unrelated rows (i.e., pretest-posttest design studies).

In [24]:
mask = df_coding_result["Study Design"].ffill() == "Quasi-Experimental Design"

df_between_res = df_coding_result[mask].copy(deep=True)

The following code block summarizes the coded between-group results.

In [25]:
# --- generate the result column ---
target_columns = [
    "Between-group Results",
    "Between-group Results.1",
    "Between-group Results.2",
    "Between-group Results.3",
]
results = df_between_res.loc[:, target_columns].copy(deep=True)

result_col = results["Between-group Results"].str.cat(
    [
        results["Between-group Results.1"],
        results["Between-group Results.2"],
        results["Between-group Results.3"],
    ],
    sep=", ",
    na_rep=""
)

result_col.loc[result_col == ", , , "] = np.nan
result_col = result_col.apply(remove_redundant_commas)

The following code block (1) merges the summarized columns of the within-group table (i.e., Participant, Target, Outcome Measures, Analysis) and (2) adds the above Results column to the between-group table.

In [26]:
df_between_res = pd.merge(
    df_between_res, df_within_res.loc[:, "Participant":"Analysis"], how="left", left_index=True, right_index=True
)

df_between_res.loc[:, "Results"] = result_col
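
A left merge on the index keeps every row of the between-group table, and rows without a counterpart in the within-group table (the control groups, which were dropped from it earlier) receive missing values in the merged columns. The sketch below reproduces this on a toy pair of tables; all values are placeholders for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(2.0, "Explicit"), (2.0, "Control")], names=["Study ID", "Group"]
)

# Toy between-group table: contains both the treatment and the control row.
between = pd.DataFrame(
    {"Study Design": ["Quasi-Experimental Design"] * 2}, index=index
)

# Toy within-group table: control rows were removed beforehand.
within = pd.DataFrame({"Participant": ["Mixed L1 Intermediate (n=5)"]}, index=index[:1])

merged = pd.merge(between, within, how="left", left_index=True, right_index=True)
print(merged)
```

This is why the control rows appear with empty Participant, Target, Outcome Measures, and Analysis cells in the tables that follow.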

***Experimental vs. Control Group***

The following code block shows the experimental vs. control group comparison results.

In [27]:
exp_vs_cont_study_id = [2.0, 7.0, 8.0, 10.0, 15.0]
exceptional_groups = [
    (10.0, "Ishida, Ishida, Ishida", "2007, 2009, 2009", "Explicit Full"),
    (15.0, "Iwai", "2010", "Explicit Expanded")
]

df_between_res.loc[exp_vs_cont_study_id, "Participant":"Results"].drop(exceptional_groups)
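
The selection above combines two steps: `.loc` with a list of study IDs selects on the first index level, and `.drop` with a list of tuples removes specific groups. A minimal sketch on a toy two-level index (a simplification of the four-level index used in this notebook, with placeholder values) looks like this:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        (10.0, "Explicit Fall"),
        (10.0, "Explicit Full"),
        (13.0, "Explicit"),
        (15.0, "Explicit"),
        (15.0, "Explicit Expanded"),
    ],
    names=["Study ID", "Group"],
)
toy = pd.DataFrame({"Results": ["Positive"] * 5}, index=index)

# Step 1: keep only the studies of interest (first index level).
selected = toy.loc[[10.0, 15.0]]

# Step 2: drop individual groups by their full index tuples.
kept = selected.drop([(10.0, "Explicit Full"), (15.0, "Explicit Expanded")])
print(kept.index.tolist())
```

With the real four-level index, each tuple passed to `.drop` has to spell out all four levels exactly, which is why the author and year strings are repeated in `exceptional_groups`.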

Study ID,Authors,Year,Group,Participant,Target,Outcome Measures,Analysis,Results
2.0,Yoshimi,2001,Explicit,Mixed L1 Intermediate (n=5),Particle,Production,Mixed,Positive
2.0,Yoshimi,2001,Control,,,,,
7.0,"Narita, Narita","2009, 2012",Implicit,English L1 Intermediate (n=22),Hearsay Evidential Marker,"Comprehension/Perception, Comprehension/Percep...",Mixed+,"Positive, Positive, Positive"
7.0,"Narita, Narita","2009, 2012",Control,,,,,
8.0,"Hoshi, Hoshi","2017, 2022",Explicit,Not Specified L1 Intermediate (n=14),Particle,"Comprehension/Perception, Comprehension/Percep...",Mixed,"Positive, Positive, Positive, Positive"
8.0,"Hoshi, Hoshi","2017, 2022",Control,,,,,
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Fall,Mixed L1 Novice (n=14),Particle,Comprehension/Perception,Mixed,Positive
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Control Fall,,,,,
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Spring,Mixed L1 Novice (n=12),Particle,Comprehension/Perception,Mixed,Positive
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Control Spring,,,,,


The table suggests that pragmatic instruction is generally effective, in line with previous review studies.

***Two Instruction Comparison***

The following code block shows the explicit vs. implicit comparison results.

In [28]:
exp_vs_imp_study_id = [5.0, 14.0]

df_between_res.loc[exp_vs_imp_study_id, "Participant":"Results"]

Study ID,Authors,Year,Group,Participant,Target,Outcome Measures,Analysis,Results
5.0,"Tateyama, Tateyama","1998, 2001",Explicit,Mixed L1 Novice (n=13),Speech Act,"Comprehension/Perception, Production",Mixed+,"No Impact, Negative"
5.0,"Tateyama, Tateyama","1998, 2001",Implicit,Mixed L1 Novice (n=14),Speech Act,"Comprehension/Perception, Production",Mixed+,
14.0,Katayama,2012,Explicit,Mixed L1 Intermediate (n=21),Particle,Production,Quan+,No Impact
14.0,Katayama,2012,Implicit,Mixed L1 Intermediate (n=19),Particle,Production,Quan+,


The table generally suggests that there is no difference between the explicit and implicit groups, which contradicts previous reviews of pragmatic instruction.
Moreover, Tateyama (1998, 2001) showed that the implicit group outperformed the explicit group on the production measures.

To further explore those studies, the following table shows the detailed instructional features.

In [29]:
exp_vs_imp_study_id = [5.0, 14.0]

treatment_feature_columns = [
    "Intervention Length",
    "Meta Pragmatic Information",
    "Input",
    "Input Enhancement",
    "Inductive Consciousness-Raising",
    "Deductive Consciousness-Raising",
    "Production",
    "Journal Writing/ Self-reflection",
    "Feedback",
    "Discussion"
]

df_between_res.loc[
    exp_vs_imp_study_id, 
    ["Proficiency Level", "Results", "Outcome Measures"] + treatment_feature_columns
].style.apply(
    lambda col: ["background-color: #d65f5f" if flag else "background-color: #5fba7d" for flag in col],
    subset=treatment_feature_columns[1:]
)

Study ID,Authors,Year,Group,Proficiency Level,Results,Outcome Measures,Intervention Length,Meta Pragmatic Information,Input,Input Enhancement,Inductive Consciousness-Raising,Deductive Consciousness-Raising,Production,Journal Writing/ Self-reflection,Feedback,Discussion
5.0,"Tateyama, Tateyama","1998, 2001",Explicit,Novice,"No Impact, Negative","Comprehension/Perception, Production",Medium Treatments,True,True,False,False,False,True,True,False,True
5.0,"Tateyama, Tateyama","1998, 2001",Implicit,Novice,,"Comprehension/Perception, Production",Medium Treatments,False,True,False,True,False,True,True,False,False
14.0,Katayama,2012,Explicit,Intermediate,No Impact,Production,Short Treatments,True,True,True,False,False,False,False,False,False
14.0,Katayama,2012,Implicit,Intermediate,,Production,Short Treatments,False,True,True,True,False,False,False,False,False


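※ Katayama gave a short instruction and administered an immediate posttest. Tateyama, on the other hand, conducted multiple intervention sessions and administered the posttest on a separate day.
Could the timing of these posttests have had an effect...?
In other words, both groups in Katayama showed positive results, but might these be immediate results only? In fact, scores dropped on the delayed posttest, so the knowledge does not seem to have been retained.

→ Just in case, take a closer look at which tasks were used for assessment.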

In [30]:
exp_vs_imp_study_id = [5.0, 14.0]

material_columns = [
    "Outcome Measure 1",
    "Outcome Measure 2",
    "Outcome Measure 3",
    "Outcome Measure 4"
]

df_between_res.loc[exp_vs_imp_study_id, ["Results", "Outcome Measures"] + material_columns]

Study ID,Authors,Year,Group,Results,Outcome Measures,Outcome Measure 1,Outcome Measure 2,Outcome Measure 3,Outcome Measure 4
5.0,"Tateyama, Tateyama","1998, 2001",Explicit,"No Impact, Negative","Comprehension/Perception, Production",Multiple-Choice Tests,Role-plays,,
5.0,"Tateyama, Tateyama","1998, 2001",Implicit,,"Comprehension/Perception, Production",,,,
14.0,Katayama,2012,Explicit,No Impact,Production,Discourse Completion Task,,,
14.0,Katayama,2012,Implicit,,Production,,,,


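※ Katayama used a DCT as the outcome measure. DCTs are generally considered easier than role-plays (learners can think for as long as they like, and because the task is written, revising is easy), so might that also be a reason why within-group gains were obtained...?

※ Based on these results, it is too early to conclude that there is no difference between explicit and implicit instruction in teaching L2 Japanese.
First, Tateyama found no within-group differences in either group to begin with; that is, neither explicit nor implicit instruction led to gains. Indeed, as Taguchi (2015) points out, there is a gap between the instruction and the test, so these results cannot tell us whether explicit or implicit instruction is better.
Second, Katayama's study administered an immediate posttest right after instruction and a delayed posttest 12 days later. Although delayed posttest scores were significantly higher than pretest scores, they were lower than immediate posttest scores, so the results do not show that the knowledge was retained. In other words, learners showed no development with respect to n desu/n desu ka, and only a practice effect was observed. In addition, the study was conducted in a laboratory setting with a short intervention period. Therefore, further investigation is needed into whether long-term explicit and implicit instruction in the classroom contributes to the development of durable L2 Japanese pragmatic competence.
Also, Katayama's explicit instruction may have been too simple to surpass the implicit instruction. According to Taguchi (2015), the combination of meta-pragmatic instruction and production practice is effective, so noticing while using the language may be what helps (cf. the noticing hypothesis).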

The following code block shows the explicit vs. explicit-expanded comparison results.

In [31]:
exp_vs_expexp_study_id = [6.0, 10.0, 13.0, 15.0]
exceptional_groups = [
    (10.0, "Ishida, Ishida, Ishida", "2007, 2009, 2009", "Control Fall"),
    (10.0, "Ishida, Ishida, Ishida", "2007, 2009, 2009", "Control Spring"),
    (15.0, "Iwai", "2010", "Control")
]

df_between_res.loc[exp_vs_expexp_study_id, "Participant":"Results"].drop(exceptional_groups)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Participant,Target,Outcome Measures,Analysis,Results
Study ID,Authors,Year,Group,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit,Mixed L1 Intermediate (n=22),Speech Act,"Production, Production, Production, Comprehens...",Mixed+,"No Impact, No Impact, No Impact, No Impact"
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit Expanded,Mixed L1 Intermediate (n=24),Speech Act,"Production, Production, Production, Comprehens...",Mixed+,
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Full,English L1 Novice (n=6),Particle,Comprehension/Perception,Mixed,Positive
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Fall,Mixed L1 Novice (n=14),Particle,Comprehension/Perception,Mixed,Positive
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Spring,Mixed L1 Novice (n=12),Particle,Comprehension/Perception,Mixed,Positive
13.0,Kim,2016,Explicit Expanded,"Mixed L1 Intermediate, Advanced (n=33)",Particle,Comprehension/Perception,Quan+,No Impact
13.0,Kim,2016,Explicit,"Mixed L1 Intermediate, Advanced (n=32)",Particle,Comprehension/Perception,Quan+,
15.0,Iwai,2010,Explicit,Not Specified L1 Novice (n=14),Small Talk,Production,Mixed,Positive
15.0,Iwai,2010,Explicit Expanded,Not Specified L1 Novice (n=15),Small Talk,Production,Mixed,Positive


The table shows that in half of the studies (Ishida; Iwai) the expanded explicit groups outperformed the regular explicit groups, while the other half (Tateyama; Kim) found no difference between the two.
Because each study defines expanded explicit instruction differently, the following table lists the treatment features to allow a more detailed comparison.
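
The half-and-half split described above can be checked with a small tally. This is only a sketch: the `Results` strings are copied from the first row of each study in the table output above, while the names `between_group_results` and `collapse_result` and the fallback label `"Mixed"` are hypothetical helpers introduced for illustration, not part of the original notebook.

```python
from collections import Counter

# Between-group "Results" per Study ID, copied from the table output above.
between_group_results = {
    6.0: "No Impact, No Impact, No Impact, No Impact",  # Tateyama
    10.0: "Positive",                                   # Ishida
    13.0: "No Impact",                                  # Kim
    15.0: "Positive",                                   # Iwai
}


def collapse_result(result: str) -> str:
    """Collapse a multi-value cell into one label per study.

    A study whose outcome measures all agree keeps that label; otherwise it
    gets the (hypothetical) label "Mixed".
    """
    unique = sorted(set(result.split(", ")))
    return unique[0] if len(unique) == 1 else "Mixed"


tally = Counter(collapse_result(r) for r in between_group_results.values())
print(dict(tally))
```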

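Interestingly, the studies that ran statistical tests found no significant difference. Ishida's and Iwai's studies might therefore also show no difference if they were tested statistically.
→ In fact, an ANOVA interaction tests whether the groups differ in how much they improve, whereas looking only at the frequency of forms or at mean scores tends to reduce to a comparison of posttest results. What matters here is whether the pretest-to-posttest gain is larger for expanded explicit instruction, and in that sense I think Ishida's results offer little evidence (because of a ceiling effect, explicit full gains little to begin with; however, explicit fall and explicit spring used different pre- and posttests, so the "complete" pre- to post-instruction gain of explicit full cannot be compared).

→ For Iwai, one could, for example, build a Poisson regression with the frequency of n desu as the response variable and time, group, and time x group as explanatory variables, and check whether time x group is a significant term.
(Time significant → instruction increases the number of n desu uses; group significant → one group uses more or fewer n desu; time x group significant → group moderates the change in n desu frequency. Judging from the frequencies, group and time x group may turn out to be significant predictors...?)
↔ However, the small talk that Iwai targets is a composite act, so even if n desu frequency differs significantly between groups, that does not show that small talk itself differs.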
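
As a rough illustration of the Poisson idea in the note above, the sketch below tests the time x group interaction for a 2 x 2 design (pre/post by regular/expanded). With only those two factors and their interaction the Poisson model is saturated, so the interaction coefficient has a closed form (the difference between the two groups' log post/pre rate ratios) and its Wald standard error is the square root of the summed reciprocals of the four cell totals. Everything here is an assumption for illustration: the function name `interaction_wald_test` is hypothetical, the counts are invented toy numbers and not Iwai's data, and the test treats all observations as independent, which ignores that the same learners are measured twice.

```python
import math


def interaction_wald_test(
    pre_regular: list[int],
    post_regular: list[int],
    pre_expanded: list[int],
    post_expanded: list[int],
) -> dict[str, float]:
    """Wald test of time x group in a saturated 2 x 2 Poisson model.

    Each argument holds per-learner counts (e.g., n desu tokens) in one cell.
    """
    cells = [pre_regular, post_regular, pre_expanded, post_expanded]
    totals = [sum(cell) for cell in cells]
    if min(totals) == 0:
        raise ValueError("every cell needs at least one observed token")
    means = [total / len(cell) for total, cell in zip(totals, cells)]

    # Log rate ratio (post / pre) within each group.
    log_rr_regular = math.log(means[1] / means[0])
    log_rr_expanded = math.log(means[3] / means[2])

    # Interaction = how much larger the expanded group's log gain is.
    interaction = log_rr_expanded - log_rr_regular
    se = math.sqrt(sum(1 / total for total in totals))
    z = interaction / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"interaction": interaction, "se": se, "z": z, "p_value": p_value}


# Invented toy counts for four learners per cell (NOT data from Iwai, 2010).
result = interaction_wald_test(
    pre_regular=[1, 0, 2, 1],
    post_regular=[3, 2, 4, 3],
    pre_expanded=[1, 1, 0, 2],
    post_expanded=[6, 5, 7, 6],
)
print(result)
```

In these toy numbers the expanded group's rate grows sixfold and the regular group's threefold, so `exp(interaction)` is 2. For real data, a model that accounts for repeated measures (for example, a mixed-effects Poisson regression) would be more appropriate than this independence-based sketch.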

In [32]:
exp_vs_expexp_study_id = [6.0, 10.0, 13.0, 15.0]

exceptional_groups = [
    (10.0, "Ishida, Ishida, Ishida", "2007, 2009, 2009", "Control Fall"),
    (10.0, "Ishida, Ishida, Ishida", "2007, 2009, 2009", "Control Spring"),
    (15.0, "Iwai", "2010", "Control")
]

treatment_feature_columns = [
    "Intervention Length",
    "Meta Pragmatic Information",
    "Input",
    "Input Enhancement",
    "Inductive Consciousness-Raising",
    "Deductive Consciousness-Raising",
    "Production",
    "Journal Writing/ Self-reflection",
    "Feedback",
    "Discussion"
]

df_between_res.loc[
    exp_vs_expexp_study_id, 
    ["Proficiency Level", "Results", "Outcome Measures"] + treatment_feature_columns
].drop(
    exceptional_groups
).style.apply(
    lambda col: ["background-color: #d65f5f" if flag else "background-color: #5fba7d" for flag in col],
    subset=treatment_feature_columns[1:]
)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Proficiency Level,Results,Outcome Measures,Intervention Length,Meta Pragmatic Information,Input,Input Enhancement,Inductive Consciousness-Raising,Deductive Consciousness-Raising,Production,Journal Writing/ Self-reflection,Feedback,Discussion
Study ID,Authors,Year,Group,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit,Intermediate,"No Impact, No Impact, No Impact, No Impact","Production, Production, Production, Comprehension/Perception",Medium Treatments,True,True,False,True,False,True,True,True,True
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit Expanded,Intermediate,,"Production, Production, Production, Comprehension/Perception",Medium Treatments,True,True,False,False,False,True,False,False,False
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Full,Novice,Positive,Comprehension/Perception,Long Treatments,True,True,False,True,False,True,False,False,True
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Fall,Novice,Positive,Comprehension/Perception,Long Treatments,True,False,False,False,False,True,False,False,True
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Spring,Novice,Positive,Comprehension/Perception,Long Treatments,True,True,False,True,False,True,False,False,True
13.0,Kim,2016,Explicit Expanded,"Intermediate, Advanced",No Impact,Comprehension/Perception,Short Treatments,True,True,False,False,False,False,False,False,True
13.0,Kim,2016,Explicit,"Intermediate, Advanced",,Comprehension/Perception,Short Treatments,True,True,False,False,False,False,False,False,True
15.0,Iwai,2010,Explicit,Novice,Positive,Production,Long Treatments,True,True,False,False,True,False,False,True,True
15.0,Iwai,2010,Explicit Expanded,Novice,Positive,Production,Long Treatments,True,True,False,False,True,False,True,True,True


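Iwai's two groups: regular vs. expanded differ in the presence of an additional treatment and self-reflection.
Kim compares qualitatively different ways of presenting meta-pragmatic information.
Ishida ran the intervention over two consecutive semesters: intervention length increases in the order explicit fall = explicit spring < explicit fall + spring (i.e., explicit full), and the presence of input/inductive consciousness-raising differs in the order explicit fall < explicit spring = explicit full.
Tateyama's two groups differ in the presence of an additional activity.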

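First, Ishida and Iwai, who reported a difference between groups, both targeted novice learners. When proficiency is low, perhaps learners simply gain more (or the effect is easier to see) when instruction is longer or includes self-reflection or inductive consciousness-raising...?
In contrast, Kim and Tateyama report that their two groups differ qualitatively.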

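Looking at it a bit more neutrally:
1. Differences in how meta-pragmatic information is presented do not affect the results (cf. Kim).
2. Without production practice, the presence of an additional treatment (e.g., self-evaluation) may affect the difference in gains...? (Iwai, Tateyama)
3. With production practice, the length of the intervention may matter more than differences in activities...? (Ishida, Tateyama)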

To see in detail how the statistical tests were conducted, look at the details of the analysis methods.

In [33]:
exp_vs_expexp_study_id = [6.0, 10.0, 13.0, 15.0]

exceptional_groups = [
    (10.0, "Ishida, Ishida, Ishida", "2007, 2009, 2009", "Control Fall"),
    (10.0, "Ishida, Ishida, Ishida", "2007, 2009, 2009", "Control Spring"),
    (15.0, "Iwai", "2010", "Control")
]

analysis_columns = [
    "Quantitative Report Type",
    "Outcome Measure 1",
    "Outcome Measure 2",
    "Outcome Measure 3",
    "Outcome Measure 4"
]

df_between_res.loc[
    exp_vs_expexp_study_id, 
    ["Results", "Outcome Measures"] + analysis_columns
].drop(
    exceptional_groups
)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Results,Outcome Measures,Quantitative Report Type,Outcome Measure 1,Outcome Measure 2,Outcome Measure 3,Outcome Measure 4
Study ID,Authors,Year,Group,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit,"No Impact, No Impact, No Impact, No Impact","Production, Production, Production, Comprehens...","Frequency, Statistical Test, Descriptive Stati...",Discourse Completion Task,Oral Discourse Completion Task,Role-plays,Acceptability Judgment
6.0,"Tateyama, Tateyama, Tateyama, Tateyama","2006, 2007, 2008, 2009",Explicit Expanded,,"Production, Production, Production, Comprehens...",,,,,
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Full,Positive,Comprehension/Perception,"Frequency, Descriptive Statistics",Acceptability Judgment,,,
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Fall,Positive,Comprehension/Perception,,,,,
10.0,"Ishida, Ishida, Ishida","2007, 2009, 2009",Explicit Spring,Positive,Comprehension/Perception,,,,,
13.0,Kim,2016,Explicit Expanded,No Impact,Comprehension/Perception,"Statistical Test, Descriptive Statistics",Multiple-Choice Tests,,,
13.0,Kim,2016,Explicit,,Comprehension/Perception,,,,,
15.0,Iwai,2010,Explicit,Positive,Production,Frequency,Oral Spontaneous Interaction,,,
15.0,Iwai,2010,Explicit Expanded,Positive,Production,,,,,


---

## 2. Frequency Analysis

### 2.1. Target Factors

This subsection calculates frequency and proportion of learning targets.