
Unable to find files #1

Open
JayaLekhrajani opened this issue Oct 31, 2017 · 11 comments

Comments

@JayaLekhrajani commented Oct 31, 2017

Hi Sudipta,

I am unable to find DATA_FILES_LIST, ORIGINAL_DATA_DIR, or RAW_DATA_PATH in the config.py file.
Also, what is the mb_train_trial_test_new_prs.csv file for? The training and test data are in JSON format.

@JayaLekhrajani (Author)

Also, you have "from features import lexical, syntactic, writing_density, sentiments, embeddings, generic_field_vectorizer" in extract_features.py, but I could not find all of those files in the features folder. Let me know if I am missing something.

@cryptexcode (Owner)

This source code is part of a larger project structure. The provided code mainly demonstrates the deep learning architectures; this codebase is not a running system.
config.py is used to keep track of file paths and feature combinations in the project, so you can safely ignore the irrelevant entries. You have to supply your own files and change the paths.
About the features: the lexical, embedding, and sentiment features are relevant, so just ignore the rest. The SenticNet features are extracted during the preprocessing step, and that code is in prepare_data.

For the project, the raw data files from the organizers were preprocessed first, and then the experiments were run, so there were several preprocessed files. You have to write code to generate them; the actual files are not included.
Thanks
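To make the "supply your own files and change the paths" step concrete, here is a hypothetical sketch of what such config.py entries might look like. The constant names (DATA_FILES_LIST, ORIGINAL_DATA_DIR, RAW_DATA_PATH) and the two CSV filenames come from this thread; every path value below is a placeholder to replace with your own data locations.

```python
import os

# Hypothetical reconstruction of the config.py entries discussed in this
# thread; the paths are placeholders, not the author's actual layout.
ORIGINAL_DATA_DIR = os.path.join("data", "original")  # organizers' files
RAW_DATA_PATH = os.path.join("data", "raw")           # unpacked raw data

# Preprocessed files you generate yourself (names taken from this thread)
DATA_FILES_LIST = [
    "headline_train_trial_test.csv",
    "mb_train_trial_test_new_prs.csv",
]
```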

@JayaLekhrajani (Author)

Thank you, Sudipta, for clarifying most of my doubts. I am still a little confused about how you got the following files:
headline_train_trial_test.csv
mb_train_trial_test_new_prs.csv

For the first one, did you get the CSV file by merging the train, trial, and test JSON files?

@cryptexcode (Owner) commented Oct 31, 2017

Exactly. All the data were merged into a single csv for easy manipulation.
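For anyone following along, a minimal standard-library sketch of such a merge. The record fields ("text", "sentiment") and sample values are hypothetical; in the real project you would json.load the organizers' train/trial/test files instead of the inline strings below. Only the output filename comes from this thread.

```python
import csv
import json

# Hypothetical shape: each split's JSON maps an id to a record.
raw = {
    "train": '{"1": {"text": "Stocks rally", "sentiment": "0.6"}}',
    "trial": '{"2": {"text": "Shares slip", "sentiment": "-0.3"}}',
    "test":  '{"3": {"text": "Market flat", "sentiment": ""}}',
}
splits = {name: json.loads(s) for name, s in raw.items()}

rows = []
for split_name, records in splits.items():
    for rec_id, rec in records.items():
        # remember which split each row came from
        rows.append({"id": rec_id, "split": split_name, **rec})

with open("headline_train_trial_test.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "split", "text", "sentiment"])
    writer.writeheader()
    writer.writerows(rows)
```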

@JayaLekhrajani (Author)

Hi Sudipta,
Sorry to bother you again, but while merging the three JSON files, what values did you use for the sentiment score of the test data?

@cryptexcode (Owner)

Hi, if you go through the paper you will get an idea of the process. We used SenticNet.

@JayaLekhrajani (Author)

Hi Sudipta,
I read your paper and it has clarified most of my doubts. The only doubts I still have are:
(a) Was the doc_to_sequence_csv module not used for the microblogs data?
(b) SenticConceptsTfidfVectorizer is defined in the sentiments module of the features package, but it is not in the repository. The SenticNet features were extracted during the preprocessing step, so how did you create SenticConceptsTfidfVectorizer?

@cryptexcode (Owner)

a) We used everything in two versions, one for the microblogs and another for the headlines, so one piece of code generates the processed data for both datasets.
b) We ultimately didn't use it, since we created the concept vectors during the preprocessing step. But if you want to use it, you can use a simple tf-idf vectorizer, as it will be modeled as a bag of concepts.

Hope this helps. All the best.
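A minimal sketch of the bag-of-concepts idea from (b), using scikit-learn's TfidfVectorizer as the "simple tf-idf vectorizer". The concept strings below are made up for illustration; the only assumption taken from the thread is that each document's SenticNet concepts are treated as atomic tokens, so tf-idf over them yields a bag of concepts rather than a bag of words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical input: per-document SenticNet concepts from preprocessing,
# joined into space-separated strings. Multi-word concepts are
# underscore-joined so each concept remains a single token.
docs_concepts = [
    "rise_stock positive_outlook",
    "fall_stock negative_outlook",
    "rise_stock negative_outlook",
]

# token_pattern=r"\S+" keeps every whitespace-delimited concept intact,
# giving a bag-of-concepts tf-idf matrix.
vectorizer = TfidfVectorizer(token_pattern=r"\S+", lowercase=False)
X = vectorizer.fit_transform(docs_concepts)
```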

@JayaLekhrajani (Author) commented Nov 4, 2017 via email

@leckie-chn

Hi Sudipta,

You mentioned that "About the features: the lexical, embedding, and sentiment features are relevant." But I can only see embeddings.py and lexical.py under the features directory. Is there a sentiments.py, or can I just drop all the missing modules?

Thanks.

@cryptexcode (Owner)

The sentiment feature extraction code is in the preprocessing step. The code was written in a hurry, so it is not well structured.
