Dynamic Topic Model for Cognitive Science
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
R
data_processing
data_retrieval
doc
docs
dtm_input_data
figures
gensim
lab_paper
modeling_scripts
output
pdf_extract_text
raw_pdf
reference_papers
result
specific_papers
text_data
topic_graphs
.gitignore
1979_ConferenceProgram.pdf
LICENSE
README.md
Results.ipynb
Rise_of_probability_topic.ipynb
analysis_of_a_lab.ipynb
create_topic_topword_graphs.ipynb
doc_topic_correlation_matrix.ipynb
doc_word_freq.ipynb
dtm_c_notebook.ipynb
generate_dtm_input.py
graph_model_comparison.py
highest_freq.txt
indiv_articles.ipynb
maxmin_entropy.txt
notes.md
pickle_to_csv.ipynb
run_all.s
run_chain_vars.s
run_prevlda_year.s
run_upto_year.s
session_names.txt
settings.txt
term_change_withintopics.ipynb
test.py
topic_entropy.ipynb
topic_relations.ipynb
topicnames.p

README.md

Topics and Trends in Cognitive Science

This project aims to uncover the trends and topics of the annual Cognitive Science Society conference using dynamic topic modeling.

Results

Paper: Topics and Trends in Cognitive Science (2000-2017) - to be published in the Proceedings of the 40th Annual Conference of the Cognitive Science Society.

Website: Find similar papers!

Workflow

1. Obtaining CogSci papers

Download PDFs from Archives of the Cognitive Science Society Conference Proceedings

Copy PDFs into text_data_new/

2. Preprocessing

Process PDFs:

generate_dtm_input.py
- input:  PDFs in text_data_new/volume_{}/
- output: dtm_input_data/dtm_input-mult.dat
          dtm_input_data/dtm_input-seq.dat

3. Modeling

Use our DTM version: alexanderrich/dtm

Talk to your local High Performance Cluster correspondent how to set everything up.

Run the script run_all.s

4. Postprocessing

@Alex: How did you create dtm_processed_output.p?

Exporting model output into csv tables:

pickle_to_csv.ipynb
- input:  output/dtm_processed_output.p
- output: output/csv/year_doc_topic.csv
          output/csv/topicnames.csv
          output/csv/year_topic_word.csv

Exporting original data into csv tables:

doc_word_freq.ipynb
- input:  dtm_input_data/dtm_input-mult.dat
- output: output/csv/doc_word_freq.csv

5. Analysis & Figures

See R scripts in the folder R. The R scripts save all figures into the folder figures.

Due to the exploratory nature of this project there are several scripts and figures that did not make their way into the paper.