I’m having a strange issue with Homework 2. When I try to tokenize the bill titles from the training set, the call appears to run for about a minute and then crashes R with a message that R needed to abort. If I try to knit the RMarkdown document instead, it hangs forever at the chunk containing the tokenizer code. It does not crash if I pass the whole tibble to the tokenizer instead of just the vector of titles, but that obviously just results in an empty tokenizer. I’ve experienced these issues on two machines, both running RStudio 1.2.1335 on Windows 10, but with different versions of R: 3.5.3 and 3.6.0. All of my latest code is up on my repo, but I’ve copied the pertinent section below:
```{r prepare-data, cache=TRUE}
max_words = 10000
max_length = 100
input_train = congress_train$Title
tokenizer = text_tokenizer(num_words = max_words)
tokenizer %>% fit_text_tokenizer(input_train) # appears to freeze/crash here
sequences = texts_to_sequences(tokenizer, input_train)
word_indices = tokenizer$word_index
word_counts = tokenizer$word_counts
data = pad_sequences(sequences, max_length)
```