
Unable to find files #1

Open
JayaLekhrajani opened this issue Oct 31, 2017 · 11 comments

Comments

@JayaLekhrajani commented Oct 31, 2017

Hi Sudipta,

I am unable to find DATA_FILES_LIST, ORIGINAL_DATA_DIR, or RAW_DATA_PATH in the config.py file.
Also, what is the mb_train_trial_test_new_prs.csv file for? The training and test data are in JSON format.

@JayaLekhrajani (Author)

Also, you have "from features import lexical, syntactic, writing_density, sentiments, embeddings, generic_field_vectorizer" in extract_features.py, but I could not find all of those files in the features folder. Let me know if I am missing something.

@cryptexcode (Owner)

This source code is part of a larger project structure. The provided code mainly demonstrates the deep learning architectures; this codebase is not a running system.
config.py is used to keep track of file paths and feature combinations in the project, so you can safely ignore the irrelevant entries. You have to supply your own files and change the paths.
About the features: the lexical, embedding, and sentiment features are relevant, so just ignore the rest. The SenticNet features are extracted during the preprocessing step, and that code is in prepare_data.

For the project, the raw data files from the organizers were preprocessed first, and then the experiments were run, so there were several preprocessed files. You have to write code to generate them; the actual files are not included.
Thanks
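To make the "supply your own files and change the paths" step concrete, here is a hypothetical sketch of what such config.py entries might look like. The constant names (DATA_FILES_LIST, ORIGINAL_DATA_DIR, RAW_DATA_PATH) and the two CSV filenames come from this thread; every path value below is a placeholder to replace with your own data locations.

```python
import os

# Hypothetical reconstruction of the config.py entries discussed in this
# thread; the paths are placeholders, not the author's actual layout.
ORIGINAL_DATA_DIR = os.path.join("data", "original")  # organizers' files
RAW_DATA_PATH = os.path.join("data", "raw")           # unpacked raw data

# Preprocessed files you generate yourself (names taken from this thread)
DATA_FILES_LIST = [
    "headline_train_trial_test.csv",
    "mb_train_trial_test_new_prs.csv",
]
```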

@JayaLekhrajani (Author)

Thank you, Sudipta, for clarifying most of my doubts. I am still a little confused about how you got the following files:
headline_train_trial_test.csv
mb_train_trial_test_new_prs.csv

For the first one, did you get the CSV file by merging the train, trial, and test JSON files?

@cryptexcode (Owner) commented Oct 31, 2017

Exactly. All the data were merged into a single csv for easy manipulation.
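For anyone following along, a minimal standard-library sketch of such a merge. The record fields ("text", "sentiment") and sample values are hypothetical; in the real project you would json.load the organizers' train/trial/test files instead of the inline strings below. Only the output filename comes from this thread.

```python
import csv
import json

# Hypothetical shape: each split's JSON maps an id to a record.
raw = {
    "train": '{"1": {"text": "Stocks rally", "sentiment": "0.6"}}',
    "trial": '{"2": {"text": "Shares slip", "sentiment": "-0.3"}}',
    "test":  '{"3": {"text": "Market flat", "sentiment": ""}}',
}
splits = {name: json.loads(s) for name, s in raw.items()}

rows = []
for split_name, records in splits.items():
    for rec_id, rec in records.items():
        # remember which split each row came from
        rows.append({"id": rec_id, "split": split_name, **rec})

with open("headline_train_trial_test.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "split", "text", "sentiment"])
    writer.writeheader()
    writer.writerows(rows)
```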

@JayaLekhrajani (Author)

Hi Sudipta,
Sorry to bother you again, but while merging the three JSON files, what values did you use for the sentiment score of the test data?

@cryptexcode (Owner)

Hi, if you go through the paper you will get an idea of the process. We used SenticNet.

@JayaLekhrajani (Author)

Hi Sudipta,
I read your paper and it has clarified most of my doubts. The only doubts I still have are:
(a) Was the doc_to_sequence_csv module not used for the microblogs data?
(b) SenticConceptsTfidfVectorizer is defined in the sentiments module of the features package, but it is not in the repository. The SenticNet features were extracted during the preprocessing step, so how did you create SenticConceptsTfidfVectorizer?

@cryptexcode (Owner)

a) We used everything in two versions, one for the microblogs and another for the headlines, so one piece of code generates the processed data for both datasets.
b) We ultimately didn't use it, since we created the concept vectors during the preprocessing step. But if you want to use it, you can use a simple tf-idf vectorizer, as it will be modeled as a bag of concepts.

Hope this helps. All the best.
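A minimal sketch of the bag-of-concepts idea from (b), using scikit-learn's TfidfVectorizer as the "simple tf-idf vectorizer". The concept strings below are made up for illustration; the only assumption taken from the thread is that each document's SenticNet concepts are treated as atomic tokens, so tf-idf over them yields a bag of concepts rather than a bag of words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical input: per-document SenticNet concepts from preprocessing,
# joined into space-separated strings. Multi-word concepts are
# underscore-joined so each concept remains a single token.
docs_concepts = [
    "rise_stock positive_outlook",
    "fall_stock negative_outlook",
    "rise_stock negative_outlook",
]

# token_pattern=r"\S+" keeps every whitespace-delimited concept intact,
# giving a bag-of-concepts tf-idf matrix.
vectorizer = TfidfVectorizer(token_pattern=r"\S+", lowercase=False)
X = vectorizer.fit_transform(docs_concepts)
```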

@JayaLekhrajani (Author) commented Nov 4, 2017 via email

@leckie-chn

Hi Sudipta,

You mentioned that "About the features: the lexical, embedding, and sentiment features are relevant." But I can only see embeddings.py and lexical.py under the features directory. Is there a sentiments.py, or can I just drop all the missing modules?

Thanks.

@cryptexcode (Owner)

The sentiment feature extraction code is in the preprocessing step. The code was written in a hurry, so it is not well structured.
