Confused about the data loader function #40

Open
A11en0 opened this issue Dec 28, 2021 · 6 comments

Comments

@A11en0

A11en0 commented Dec 28, 2021

Hi, thanks for your wonderful work. I'm confused about the data loader function. Details below:

parser.add_argument('--data_path', type=str, default='data/20ng', help='directory containing data')
  1. I can't find any code that reads the '--data_path' argument, so why do we need to pass it in the following command?
python main.py --mode train --dataset 20ng --data_path data/20ng --num_topics 50 --train_embeddings 1 --epochs 1000
  2. What do the two parameters doc_terms_file_name and terms_filename do? I don't understand them, and I can't find 'tf_idf_doc_terms_matrix_time_window_1' anywhere, including in the provided dataset directory.
vocab, training_set, valid, test_1, test_2 = data.get_data(doc_terms_file_name="tf_idf_doc_terms_matrix_time_window_1",
                                                           terms_filename="tf_idf_terms_time_window_1")
@liuh236

liuh236 commented Feb 28, 2022

same question...

@lxkkk117

Me too, I also ran into this problem...

@zhaoLLL

zhaoLLL commented Mar 26, 2022

For the second question, you can find where those two files are written in data_espy_tweets.py:
savemat(path_save.joinpath('tf_idf_doc_terms_matrix_time_window_1'), {"doc_terms_matrix": doc_terms_matrix})
savemat(path_save.joinpath('tf_idf_terms_time_window_1'), {"terms" : terms})
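
To make the round trip concrete, here is a minimal sketch of how two files with these names could be written with `scipy.io.savemat` and read back with `scipy.io.loadmat`. It is only an illustration under stated assumptions: the toy `terms` and `doc_terms_matrix` values are made up, the temporary directory stands in for `path_save`, and the way `data.get_data` actually reads the files in this repo may differ.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical toy data standing in for the real TF-IDF output.
terms = ["apple", "banana", "cherry"]
doc_terms_matrix = np.array([[0.5, 0.0, 0.2],
                             [0.0, 0.7, 0.1]])

with tempfile.TemporaryDirectory() as tmp:
    path_save = Path(tmp)  # assumption: stand-in for the script's path_save
    # Spell out the ".mat" suffix and pass plain strings so that the
    # written and read file names are guaranteed to match.
    matrix_file = str(path_save / "tf_idf_doc_terms_matrix_time_window_1.mat")
    terms_file = str(path_save / "tf_idf_terms_time_window_1.mat")

    # Same dictionary keys as in the two savemat calls quoted above.
    savemat(matrix_file, {"doc_terms_matrix": doc_terms_matrix})
    # An object array is stored as a MATLAB cell array of strings.
    savemat(terms_file, {"terms": np.array(terms, dtype=object)})

    # loadmat returns a dict keyed by the variable names used when saving.
    loaded_matrix = loadmat(matrix_file)["doc_terms_matrix"]
    # Each cell comes back as a small string array; unwrap it to a str.
    loaded_terms = [str(t[0]) for t in loadmat(terms_file)["terms"].ravel()]

print(loaded_matrix.shape)
print(loaded_terms)
```

So the two names passed to `data.get_data` would refer to files produced by the preprocessing script rather than files shipped in the dataset directory, which would explain why they cannot be found there until that script has been run.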

@manueltonneau

I have the same problem.

@zhaoLLL thanks for your reply, but how do bow_X_tokens.mat and bow_X_counts.mat map to these two TF-IDF matrices?

@manueltonneau

Since this repo doesn't seem to be maintained anymore, I suggest using another repo I just discovered: https://github.com/lffloyd/embedded-topic-model
I was able to use ETM very easily with it.

@Littleele

same question!
