About this repo:
Welcome to the Text-Analysis repository for Cyberinfrastructure for Digital Humanities.
Already familiar with Python and don't really need any help? Then go ahead and dive right in with our introductory Python word frequency scripts. Once in the WordFrequencies folder, choose the algorithm/output you want (ngrams, streamgraphs, wordClouds, mostFrequentWords), then choose the scripts folder. The scripts have minimal directions and are ready to go.
For those who are new to Python and need a little assistance, please see our introductory Jupyter Notebooks in the WordFrequencies folder. As with the scripts, first choose the algorithm/output you want, then choose the notebooks folder. These notebooks explain the code in more detail and include sample output.
We have notebooks and scripts for:
- text preparation
- wordclouds (plain text and twitter)
- ngram wordclouds (plain text and twitter)
- top ten words (plain text and twitter)
- streamgraphs (plain text and twitter)
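To give a rough idea of what a word frequency workflow involves, here is a minimal sketch using only the Python standard library. It is not the code from this repository's scripts or notebooks; the function names (`tokenize`, `top_words`, `top_ngrams`), the tokenization rule, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (hypothetical helper)."""
    # Assumed rule: runs of letters, optionally with one internal apostrophe.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_words(text, n=10, stopwords=frozenset()):
    """Return the n most frequent words as (word, count) pairs."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    return Counter(tokens).most_common(n)


def top_ngrams(text, size=2, n=10):
    """Return the n most frequent word n-grams as (ngram, count) pairs."""
    tokens = tokenize(text)
    # Zip the token list against shifted copies of itself to form n-grams.
    grams = zip(*(tokens[i:] for i in range(size)))
    return Counter(" ".join(g) for g in grams).most_common(n)


# Illustrative sample text and stopword list (not from the repository).
sample = "the cat sat on the mat and the cat slept"
print(top_words(sample, n=3, stopwords={"the", "on", "and"}))
print(top_ngrams(sample, size=2, n=2))
```

In practice the input would likely be read from a plain text file or a collection of tweets, and the stopword list would be considerably longer than the three words used here.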
More Advanced Topics
Sentiment Analysis
Jupyter Notebooks for sentiment analysis of Twitter data, adapted from VADER (1).
(1) Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
Topic Modeling
Jupyter Notebooks using