textTopic has fatal error #173

Open
scm1210 opened this issue Apr 17, 2024 · 6 comments

Comments

@scm1210

scm1210 commented Apr 17, 2024

I'm trying to run textTopics() to create a topic model in R 4.3.3, using the CRAN version of text (v. 1.2) and the following code:

topics_result <- textTopics(
  data = textData,
  variable_name = "Text",  
  embedding_model = "miniLM",  
  umap_model = "default",
  hdbscan_model = "default",
  vectorizer_model = "default",
  representation_model = "mmr",
  num_top_words = 10,
  n_gram_range = c(1, 3),
  stopwords = "english",
  min_df = 5,
  bm25_weighting = FALSE,
  reduce_frequent_words = TRUE,
  set_seed = 8,
  save_dir = "~/study1_results") #save results

However, whenever I run the script, I hit a fatal error that crashes R and aborts the session without any explanatory message (for example, there is no "vector memory exhausted" error). The last output printed to the console is:

2024-04-17 10:59:23,493 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

Does anyone have insights into how to possibly resolve this issue?
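
When an R session aborts outright, the IDE console usually disappears before any diagnostic text can be read. One way to keep that output is to run the script in a separate R process and send its output to log files. The sketch below uses only base R; the helper name `run_in_subprocess` and the file names are hypothetical and not part of the text package.

```r
# Run an R script in a fresh Rscript process and keep its output on disk,
# so that messages survive even if the child process crashes.
# `run_in_subprocess` is a hypothetical helper, not part of the text package.
run_in_subprocess <- function(script,
                              out_file = "stdout.log",
                              err_file = "stderr.log") {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript,
                    args = shQuote(script),
                    stdout = out_file,
                    stderr = err_file)
  list(status = status, stdout = out_file, stderr = err_file)
}

# Example usage (assumes the textTopics() call is saved in "topics.R"):
# res <- run_in_subprocess("topics.R")
# res$status              # non-zero if the child process failed
# readLines(res$stderr)   # whatever the child wrote to stderr before exiting
```

Because the crash happens in the child process, the calling session stays alive and the log files can be attached to the issue.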

@scm1210
Author

scm1210 commented Apr 17, 2024

This seems to be the root of the issue:

*** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

 *** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

 *** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

@OscarKjell
Owner

Thanks for the feedback.

Are you able to run the test within the package? Please try running:

  # Load the text package and prepare data
  library(text)
  data1 <- Language_based_assessment_data_8[c("satisfactiontexts", "swlstotal")]
  colnames(data1) <- c("text", "score")

  data2 <- Language_based_assessment_data_8[c("harmonytexts", "hilstotal")]
  colnames(data2) <- c("text", "score")

  data3 <- Language_based_assessment_data_3_100[1:2]
  colnames(data3) <- c("text", "score")

  data <- dplyr::bind_rows(data1, data2, data3)

  # Create a BERTopic model trained on data["text"]; see help(textTopics)
  bert_model <- textTopics(data = data,
                           variable_name = "text",
                           embedding_model = "distilroberta",
                           min_df = 2,
                           set_seed = 8,
                           save_dir = "./results")

@scm1210
Author

scm1210 commented Apr 18, 2024

Running this code generates the same result: fatal error, R session aborted

*** caught segfault ***

 *** caught segfault ***

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

 *** caught segfault ***
address 0x600, cause 'memory not mapped'

 *** caught segfault ***
address 0x600, cause 'memory not mapped'

 *** caught segfault ***
address 0x600, cause 'memory not mapped'
address 0x600, cause 'memory not mapped'

Traceback:
 1: py_call_impl(callable, call_args$unnamed, call_args$named)
 2: create_bertopic_model(data = data, data_var = variable_name,     embedding_model = embedding_model, umap_model = umap_model,     hdbscan_model = hdbscan_model, vectorizer_model = vectorizer_model,     representation_model = representation_model, top_n_words = num_top_words,     n_gram_range = n_gram_range, min_df = min_df, bm25_weighting = bm25_weighting,     reduce_frequent_words = reduce_frequent_words, stop_words = stopwords,     seed = set_seed, save_dir = save_dir)
 3: 
Traceback:

Traceback:

Traceback:

Traceback:

@OscarKjell
Owner

After installing the GitHub version, have you rerun:

textrpp_install()
textrpp_initialize(save_profile = TRUE)
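
After reinstalling, it can help to confirm which Python installation the R session is bound to and whether the topic-modelling modules import from it. The sketch below assumes the reticulate package is installed (the `py_call_impl` frame in the traceback suggests text calls Python through it). The module names "bertopic", "umap" and "hdbscan" are an assumption about what textTopics() needs, based on the BERTopic log line and the `umap_model` and `hdbscan_model` arguments; `check_python_modules` is a hypothetical helper.

```r
# Report, for each Python module name, whether it can be imported from the
# Python installation that reticulate is using.
# `check_python_modules` is a hypothetical helper, not part of the text package.
check_python_modules <- function(modules = c("bertopic", "umap", "hdbscan")) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    # Without reticulate nothing can be checked; report NA for every module.
    return(stats::setNames(rep(NA, length(modules)), modules))
  }
  vapply(modules, reticulate::py_module_available, logical(1))
}

# Example usage:
# reticulate::py_config()    # shows the Python binary and environment in use
# check_python_modules()     # named logical vector, one entry per module
```

A `FALSE` entry would point to an incomplete Python environment rather than to the R code itself.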

@scm1210
Author

scm1210 commented Apr 18, 2024

Yes, I did.

I also did a hard uninstall of my conda environment and found a file permission issue on my end (not sure how that happened), so I addressed it using this.
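
A permission problem of this kind can be checked from within R. The sketch below uses base R's `file.access()`; the helper name `check_access` and the conda path in the usage comment are placeholders, since the thread does not say which files were affected.

```r
# Report read/write access for a set of paths.
# file.access() returns 0 when access is permitted and -1 otherwise.
# `check_access` is a hypothetical helper, not part of the text package.
check_access <- function(paths) {
  data.frame(
    path     = paths,
    exists   = file.exists(paths),
    readable = file.access(paths, mode = 4) == 0,
    writable = file.access(paths, mode = 2) == 0,
    stringsAsFactors = FALSE
  )
}

# Example usage; the second path is a placeholder for the real environment:
# check_access(c("~/study1_results", "/path/to/conda/envs/my_env"))
```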

I just ran

data1 <- Language_based_assessment_data_8[c("satisfactiontexts", "swlstotal")]
colnames(data1) <- c("text", "score")

data2 <- Language_based_assessment_data_8[c("harmonytexts", "hilstotal")]
colnames(data2) <- c("text", "score")

data3 <- Language_based_assessment_data_3_100[1:2]
colnames(data3) <- c("text", "score")

data <- dplyr::bind_rows(data1, data2, data3)

# Create a BERTopic model trained on data["text"]; see help(textTopics)
bert_model <- textTopics(data = data,
                         variable_name = "text",
                         embedding_model = "distilroberta",
                         min_df = 2,
                         set_seed = 8,
                         save_dir = "./results")

again and am still getting the fatal error.

@OscarKjell
Owner

Hmm, which OS are you using?

The example above is tested on several operating systems; you can find the specifics here: https://github.com/OscarKjell/text/actions
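
The details asked for here can be collected in one call. The sketch below uses only base R and utils; `collect_env_info` is a hypothetical helper, and the choice of fields is an assumption about what is useful for comparing against the tested configurations.

```r
# Gather the operating system, machine architecture, R version and the
# installed version of the text package (NA if it is not installed).
# `collect_env_info` is a hypothetical helper, not part of the text package.
collect_env_info <- function() {
  si <- Sys.info()
  list(
    os           = si[["sysname"]],
    release      = si[["release"]],
    machine      = si[["machine"]],
    r_version    = R.version.string,
    text_version = tryCatch(
      as.character(utils::packageVersion("text")),
      error = function(e) NA_character_
    )
  )
}

# Print a compact summary that can be pasted into the issue:
str(collect_env_info())
```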
