This markdown is meant to ensure the reproducibility of my paper, *Cross-paradigmatic Metaphorical Mapping: A case of Indonesian MAJU vs MUNDUR* (2026).

It has two parts: the first (`Data Retrieval`) shows how I retrieved the data, and the second (`Data Analysis`) shows how I analysed it.

Note that, as described in the paper, the data retrieval is semi-automatic, and the second part works only on the manually annotated data (root, voice, and meaning). The script in `Data Retrieval` therefore reproduces only the 'raw' keyword-in-context (KWIC) concordance, which still contains false positives.

The `Data Retrieval` part assumes that the corpus file `ind_news_2024_1M-sentences.txt` is in your working directory; you may download the corpus [here](https://cls.corpora.uni-leipzig.de/en?corpusLanguage=ind#tblselect). The `Data Analysis` part needs no local copy of `cross-paradigmatic-metaphorical-mapping_data.csv`: it fetches the dataset directly from this repository and outputs the contingency tables, chi-squared statistics, and p-values discussed in the paper (see [README](https://github.com/DerrySAP/cross-paradigmatic-metaphorical-mapping/tree/main)).

### Data Retrieval

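```r
# Install dplyr and remotes from CRAN; vctrs is installed together with its dependencies
install.packages(c("dplyr", "remotes"))
install.packages("vctrs", dependencies = TRUE)
# corplingr is installed from its GitHub repository
remotes::install_github("gederajeg/corplingr")
```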

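```r
# Retrieve every word-form containing "maju" or "mundur" from the Leipzig corpus file,
# then store a lowercased copy of the matched form (node) in the column `term`
kwic <- corplingr::concord_leipzig(
  leipzig_path = "ind_news_2024_1M-sentences.txt",
  pattern = "\\b(?:\\w*(maju|mundur)\\w*)\\b"
) |>
  dplyr::mutate(term = tolower(node))
```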
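Because retrieval is only semi-automatic, it can help to see what the search pattern actually captures before annotating. The sketch below applies the same regular expression with base R's `gregexpr()` to a made-up, lowercase sentence (not taken from the corpus). The second step, which labels forms as `di-root-kan` or `me-root-kan` by their affixes, is a hypothetical pre-filter for illustration only; it is not the annotation procedure used in the paper and, for instance, misses forms with enclitics such as *-nya*.

```r
# Same pattern as in the concordance call above
pattern <- "\\b(?:\\w*(maju|mundur)\\w*)\\b"

# Made-up example sentence (hypothetical, not from the Leipzig corpus)
sentence <- "pemerintah ingin memajukan daerah, tetapi rapat dimundurkan dan kemajuan belum terlihat"

# Extract every word-form that contains "maju" or "mundur"
hits <- regmatches(sentence, gregexpr(pattern, sentence, perl = TRUE))[[1]]
hits

# Hypothetical affix-based pre-filter: anything that is not di-ROOT-kan
# or me-ROOT-kan stays NA and would still need manual checking
voice <- rep(NA_character_, length(hits))
voice[grepl("^di(maju|mundur)kan$", hits)] <- "di-root-kan"
voice[grepl("^me(maju|mundur)kan$", hits)] <- "me-root-kan"
data.frame(term = hits, voice = voice)
```

Here `hits` contains `memajukan`, `dimundurkan`, and `kemajuan`; the last one is a *ke-...-an* nominalisation that falls outside both voice categories, which is the kind of hit that manual annotation has to sort out.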

### Data Analysis

```r
# Install the packages needed for the analysis
install.packages("dplyr")
install.packages("vctrs", dependencies = TRUE)
```

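```r
# Read the manually annotated data directly from the repository
# and drop rows whose `target` annotation is missing
kwic <- read.csv("https://raw.githubusercontent.com/DerrySAP/cross-paradigmatic-metaphorical-mapping/refs/heads/main/cross-paradigmatic-metaphorical-mapping_data.csv") |>
  dplyr::filter(!is.na(target))
```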

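```r
### TESTING SIGNIFICANCE BETWEEN ROOT AND VOICE (FIGURE 1) ###
# Cross-tabulate root (maju/mundur) by voice (di-root-kan/me-root-kan)
root_to_voice_tab <- table(kwic$root, kwic$morph)
root_to_voice_tab

# Pearson's chi-squared test without Yates' continuity correction;
# suppressWarnings() silences the warning chisq.test() gives when expected counts are small
root_to_voice_chisq <- suppressWarnings(chisq.test(root_to_voice_tab, correct = FALSE))
root_to_voice_chisq$statistic
format(root_to_voice_chisq$p.value, scientific = FALSE)
```

Output (root by voice):

| root | di-root-kan | me-root-kan |
|---|---:|---:|
| maju | 44 | 751 |
| mundur | 7 | 8 |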
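The Figure 1 cell wraps `chisq.test()` in `suppressWarnings()`, which hides R's warning that the chi-squared approximation may be incorrect. That warning is triggered by small expected counts, and the `mundur` row is small. A minimal sketch of how to check this, rebuilding the table from the counts printed above so it runs without the CSV: it inspects the expected frequencies and, as an extra robustness check that is not part of the original notebook, runs Fisher's exact test, which does not rely on the large-sample approximation.

```r
# Counts copied from the root-by-voice table printed above
fig1_tab <- as.table(matrix(
  c(44, 7, 751, 8),  # filled column by column: di-root-kan, then me-root-kan
  nrow = 2,
  dimnames = list(
    root = c("maju", "mundur"),
    voice = c("di-root-kan", "me-root-kan")
  )
))

# Same test as in the cell above: Pearson chi-squared without Yates' correction
fig1_chisq <- suppressWarnings(chisq.test(fig1_tab, correct = FALSE))
fig1_chisq$statistic

# Expected counts under independence; cells below 5 trigger the hidden warning
round(fig1_chisq$expected, 2)

# Fisher's exact test conditions on the margins and needs no approximation
fig1_fisher <- fisher.test(fig1_tab)
fig1_fisher$p.value
```

With these counts the expected frequency for `mundur` in `di-root-kan` is about 0.94 (15 x 51 / 810), well below the usual threshold of 5.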

```r
### TESTING SIGNIFICANCE BETWEEN VOICE AND TARGET (FIGURE 2) ###
## MAJU ##
# Keep only tokens of the root maju, then cross-tabulate voice by target domain
maju_kwic <- kwic |>
  dplyr::filter(root == "maju")
maju_voice_to_target_tab <- table(maju_kwic$morph, maju_kwic$target)
maju_voice_to_target_tab

# Pearson's chi-squared test (the continuity correction only affects 2 x 2 tables)
maju_voice_to_target_chisq <- suppressWarnings(chisq.test(maju_voice_to_target_tab, correct = FALSE))
maju_voice_to_target_chisq$statistic
format(maju_voice_to_target_chisq$p.value, scientific = FALSE)
```

Output (MAJU, voice by target):

| voice | advancement | physical | pol. nom. | quality | time |
|---|---:|---:|---:|---:|---:|
| di-root-kan | 1 | 3 | 18 | 1 | 21 |
| me-root-kan | 31 | 1 | 17 | 698 | 4 |
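The same caveat applies to the `maju` voice-by-target table, where several cells have small expected counts. One option, sketched below under the assumption that a simulation-based p-value is acceptable for your purposes, is `chisq.test()` with `simulate.p.value = TRUE`, which computes a Monte Carlo p-value conditional on the table margins instead of using the chi-squared distribution. The table is rebuilt from the printed counts; the seed and `B` are arbitrary choices.

```r
# Counts copied from the MAJU voice-by-target table printed above
maju_tab <- as.table(matrix(
  c(1, 3, 18, 1, 21,
    31, 1, 17, 698, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    voice = c("di-root-kan", "me-root-kan"),
    target = c("advancement", "physical", "pol. nom.", "quality", "time")
  )
))

# Count the cells whose expected frequency under independence is below 5
maju_expected <- suppressWarnings(chisq.test(maju_tab))$expected
sum(maju_expected < 5)

# Monte Carlo p-value: B tables are simulated with the observed margins fixed
set.seed(2026)  # arbitrary seed, only to make the simulation repeatable
maju_mc <- chisq.test(maju_tab, simulate.p.value = TRUE, B = 9999)
maju_mc$statistic
maju_mc$p.value
```

With `B = 9999`, the smallest p-value the simulation can report is 1 / (B + 1) = 0.0001, so a very strong association shows up as that floor rather than as an exact tiny number.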

```r
## MUNDUR ##
# Keep only tokens of the root mundur, then cross-tabulate voice by target domain
mundur_kwic <- kwic |>
  dplyr::filter(root == "mundur")
mundur_voice_to_target_tab <- table(mundur_kwic$morph, mundur_kwic$target)
mundur_voice_to_target_tab

# Pearson's chi-squared test (the continuity correction only affects 2 x 2 tables)
mundur_voice_to_target_chisq <- suppressWarnings(chisq.test(mundur_voice_to_target_tab, correct = FALSE))
mundur_voice_to_target_chisq$statistic
format(mundur_voice_to_target_chisq$p.value, scientific = FALSE)
```

Output (MUNDUR, voice by target):

| voice | advancement | physical | pol. nom. | quality | time |
|---|---:|---:|---:|---:|---:|
| di-root-kan | 0 | 1 | 2 | 1 | 3 |
| me-root-kan | 1 | 1 | 0 | 1 | 5 |
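For `mundur` the table contains only 15 tokens, so every expected count falls below 5. With a sample this small, Fisher's exact test on the full 2 x 5 table is cheap to compute and avoids the approximation altogether. This is again a sketch built from the printed counts, offered as a robustness check rather than as the test reported in the paper.

```r
# Counts copied from the MUNDUR voice-by-target table printed above
mundur_tab <- as.table(matrix(
  c(0, 1, 2, 1, 3,
    1, 1, 0, 1, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    voice = c("di-root-kan", "me-root-kan"),
    target = c("advancement", "physical", "pol. nom.", "quality", "time")
  )
))

# Pearson chi-squared as in the cell above, plus a check of the expected counts
mundur_chisq <- suppressWarnings(chisq.test(mundur_tab, correct = FALSE))
mundur_chisq$statistic
all(mundur_chisq$expected < 5)

# Fisher's exact test, which fisher.test() also supports for r x c tables
mundur_fisher <- fisher.test(mundur_tab)
mundur_fisher$p.value
```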