textTopic has fatal error #173

scm1210 · 2024-04-17T15:06:41Z

I'm trying to run textTopics() to create a topic model in R 4.3.3, using the following code with the CRAN version of text (v1.2):

topics_result <- textTopics(
  data = textData,
  variable_name = "Text",  
  embedding_model = "miniLM",  
  umap_model = "default",
  hdbscan_model = "default",
  vectorizer_model = "default",
  representation_model = "mmr",
  num_top_words = 10,
  n_gram_range = c(1, 3),
  stopwords = "english",
  min_df = 5,
  bm25_weighting = FALSE,
  reduce_frequent_words = TRUE,
  set_seed = 8,
  save_dir = "~/study1_results") # save results

However, whenever I run the script, R hits a fatal error and the session is aborted without any explanatory message (e.g., no "vector memory exhausted" error). The last output printed in the console is:

2024-04-17 10:59:23,493 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

Does anyone have insights into how to possibly resolve this issue?


scm1210 · 2024-04-17T15:46:55Z

This seems to be the root of the issue:

*** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

 *** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

 *** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

OscarKjell · 2024-04-18T05:12:02Z

Thanks for the feedback.

Are you able to run the package's own test? Please try running:

  # Load and prepare data
  data1 <- Language_based_assessment_data_8[c("satisfactiontexts", "swlstotal")]
  colnames(data1) <- c("text", "score")

  data2 <- Language_based_assessment_data_8[c("harmonytexts", "hilstotal")]
  colnames(data2) <- c("text", "score")

  data3 <- Language_based_assessment_data_3_100[1:2]
  colnames(data3) <- c("text", "score")

  data <- dplyr::bind_rows(data1, data2, data3)

  # Create BERTopic model trained on data["text"] help(textTopics)
  bert_model <- textTopics(data = data,
                           variable_name = "text",
                           embedding_model = "distilroberta",
                           min_df = 2,
                           set_seed = 8,
                           save_dir="./results")

scm1210 · 2024-04-18T13:01:51Z

Running this code gives the same result: a fatal error, and the R session is aborted.

*** caught segfault ***

 *** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

 *** caught segfault ***
address 0x600, cause 'memory not mapped'

 *** caught segfault ***
address 0x600, cause 'memory not mapped'

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

Traceback:
 1: py_call_impl(callable, call_args$unnamed, call_args$named)
 2: create_bertopic_model(data = data, data_var = variable_name,     embedding_model = embedding_model, umap_model = umap_model,     hdbscan_model = hdbscan_model, vectorizer_model = vectorizer_model,     representation_model = representation_model, top_n_words = num_top_words,     n_gram_range = n_gram_range, min_df = min_df, bm25_weighting = bm25_weighting,     reduce_frequent_words = reduce_frequent_words, stop_words = stopwords,     seed = set_seed, save_dir = save_dir)
 3: 
Traceback:

Traceback:

Traceback:

Traceback:

OscarKjell · 2024-04-18T17:40:57Z

After installing the GitHub version, have you rerun:

textrpp_install()
textrpp_initialize(save_profile = TRUE)

scm1210 · 2024-04-18T23:43:28Z

Yes, I did.

I also did a hard uninstall of my conda environment and found a file permission issue on my end (not sure how that happened), so I addressed it using this.

I just ran

data1 <- Language_based_assessment_data_8[c("satisfactiontexts", "swlstotal")]
colnames(data1) <- c("text", "score")

data2 <- Language_based_assessment_data_8[c("harmonytexts", "hilstotal")]
colnames(data2) <- c("text", "score")

data3 <- Language_based_assessment_data_3_100[1:2]
colnames(data3) <- c("text", "score")

data <- dplyr::bind_rows(data1, data2, data3)

# Create BERTopic model trained on data["text"] help(textTopics)
bert_model <- textTopics(data = data,
                         variable_name = "text",
                         embedding_model = "distilroberta",
                         min_df = 2,
                         set_seed = 8,
                         save_dir="./results")

again, and I'm still getting the fatal error.

OscarKjell · 2024-04-22T13:51:08Z

Hmm, which OS are you using?

The example above is tested on several operating systems; you can find more specifics here: https://github.com/OscarKjell/text/actions
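When answering a question like this, it can help to paste basic environment details next to the crash log. The following is a minimal sketch in base R; the helper name `report_environment()` is hypothetical and not part of the text package. The optional `reticulate::py_config()` call reports which Python the session is bound to (the traceback above goes through reticulate's `py_call_impl`), and is only attempted if reticulate is installed.

```r
# Hypothetical helper (not part of the text package): collect basic
# OS and R details to paste into a bug report.
report_environment <- function() {
  info <- Sys.info()
  list(
    os         = info[["sysname"]],
    os_release = info[["release"]],
    machine    = info[["machine"]],
    r_version  = R.version.string
  )
}

env <- report_environment()
str(env)

# Optional: show the Python configuration reticulate is using.
# Note: py_config() may initialize Python in the current session.
if (requireNamespace("reticulate", quietly = TRUE)) {
  tryCatch(
    print(reticulate::py_config()),
    error = function(e) message("Python config unavailable: ", conditionMessage(e))
  )
}
```

The base R part has no dependencies, so it can be run even when the Python side is what is crashing.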
