cwb_makeall() may ignore registry provided #31

Closed
ablaette opened this issue Dec 9, 2021 · 1 comment

Comments

@ablaette
Collaborator

ablaette commented Dec 9, 2021

For cwb_makeall() I offer the following example:

library(RcppCWB)

# Use the package registry if its files are in place, otherwise a temporary one
registry <- if (!check_pkg_registry_files()) use_tmp_registry() else get_pkg_registry()
home_dir <- system.file(package = "RcppCWB", "extdata", "cwb", "indexed_corpora", "unga")

# Create a temporary registry directory and a temporary data directory
tmpdir <- normalizePath(tempdir(), winslash = "/")
tmp_regdir <- file.path(tmpdir, "registry_tmp", fsep = "/")
tmp_data_dir <- file.path(tmpdir, "indexed_corpora", fsep = "/")
tmp_unga_dir <- file.path(tmp_data_dir, "unga", fsep = "/")
if (!file.exists(tmp_regdir)) dir.create(tmp_regdir)
if (!file.exists(tmp_data_dir)) dir.create(tmp_data_dir)
if (!file.exists(tmp_unga_dir)) {
  dir.create(tmp_unga_dir)
} else {
  # Start from an empty corpus directory
  file.remove(list.files(tmp_unga_dir, full.names = TRUE))
}

# Write a registry file whose HOME points to the temporary corpus directory
regfile <- readLines(file.path(registry, "unga"))
regfile[grep("^HOME", regfile)] <- sprintf('HOME "%s"', tmp_unga_dir)
writeLines(text = regfile, con = file.path(tmp_regdir, "unga"))

# Copy the indexed corpus files into the temporary corpus directory
for (x in list.files(home_dir, full.names = TRUE)) {
  file.copy(from = x, to = tmp_unga_dir)
}

# perform cwb_makeall (equivalent to cwb-makeall command line utility)
cwb_makeall(corpus = "UNGA", p_attribute = "word", registry = tmp_regdir)

Surprisingly, the generated files are not written to tmp_unga_dir, which the registry file unga in tmp_regdir declares as the home directory, but to the unga directory within the installed package.
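One way to confirm which directory cwb_makeall() actually wrote to is to compare file modification times before and after the call. The following is a minimal base-R sketch; snapshot_mtimes() and changed_since() are hypothetical helper names, not part of RcppCWB, and the check assumes the file system records modification times finely enough to register a rewrite.

```r
# Hypothetical helpers (not part of RcppCWB): detect files in a directory
# that were created or rewritten between two points in time.

# Record the last-modified time of every file in `dir`.
snapshot_mtimes <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  stats::setNames(as.numeric(file.mtime(files)), files)
}

# Return the paths in `dir` that are new, or have a later modification
# time, compared with the snapshot `before`.
changed_since <- function(before, dir) {
  after <- snapshot_mtimes(dir)
  idx <- match(names(after), names(before))
  is_new <- is.na(idx)
  is_newer <- !is_new & after > before[idx]
  names(after)[is_new | is_newer]
}

# Usage with the objects from the example above (commented out because it
# needs RcppCWB and the temporary registry set up there):
# before_pkg <- snapshot_mtimes(home_dir)
# before_tmp <- snapshot_mtimes(tmp_unga_dir)
# cwb_makeall(corpus = "UNGA", p_attribute = "word", registry = tmp_regdir)
# changed_since(before_pkg, home_dir)      # non-empty if the package dir was written
# changed_since(before_tmp, tmp_unga_dir)  # non-empty if the temporary dir was written
```

If the first call returns file names and the second returns none, the output went to the installed package rather than to the temporary corpus directory.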

My hypothesis is that the registry directory passed to cwb_makeall() is ignored if the corpus is already loaded. If so, calling cl_delete_corpus("UNGA") first is necessary to trigger a reload of the corpus.
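To test this hypothesis, the corpus can be unloaded right before the index is rebuilt. The sketch below wraps both steps; makeall_fresh() is a hypothetical name, not an RcppCWB function, and it uses cl_delete_corpus() in the same call form as quoted above. How cl_delete_corpus() behaves when the corpus is not loaded has not been checked here.

```r
# Hypothetical wrapper (not part of RcppCWB): unload the corpus, then run
# cwb_makeall() so that the registry passed here is read afresh.
makeall_fresh <- function(corpus, p_attribute, registry) {
  # Drop the in-memory representation of the corpus, if there is one
  RcppCWB::cl_delete_corpus(corpus)
  # Rebuild the index files using the registry given by the caller
  RcppCWB::cwb_makeall(
    corpus = corpus,
    p_attribute = p_attribute,
    registry = registry
  )
}

# Usage with the objects from the example above:
# makeall_fresh(corpus = "UNGA", p_attribute = "word", registry = tmp_regdir)
```

If the hypothesis holds, the index files should then appear in tmp_unga_dir rather than in the package directory.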

@ablaette
Collaborator Author

ablaette commented Dec 9, 2021

The corpus is now force-reloaded, which fixes this issue.

@ablaette ablaette closed this as completed Dec 9, 2021