Beautiful visualizations of how language differs among document types.
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
Notebooks for the Seattle PyData 2017 talk on Scattertext
Interpretable data visualizations for understanding how texts differ at the word level
Text analysis with networks.
2018 Computational Text Analysis Notebooks, University of Mannheim
A package for replicating the estimates and findings of the article "Factionalism and the Red Guards under Mao's China: Ideal Point Estimation Using Text Data."
A tool for Semantic Scaling of Political Text (branch of Topfish, a suite of tools for Political Text Analysis)
Summer 2017 Social Media Analytics Workshop Series
'dictvectoR' measures the similarity between a concept dictionary and documents, using fastText word vectors. Implements the "Distributed-Dictionary-Representation" (Garten et al. 2018) method in R.
LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn
An automated web crawler for extracting central bankers' speeches
The ABC of Computational Text Analysis. BA Seminar, Spring 2022, University of Lucerne
A small showcase for topic modeling with the tmtoolkit Python package. I use a corpus of articles from the German online news website Spiegel Online (SPON) to create a topic model for before and during the COVID-19 pandemic.
Original corpus of articles relating to refugees, scraped from the Tennessee newspaper The Chattanoogan, along with simple code for a text-as-data word cloud.
Empirical framework applied to parliamentary discourse and Twitter data, with a Discourse Polarization Index.
Material from my Machine Learning for the Social Sciences course
Collection of text corpora for publicly available speeches from Mexican president Andres Manuel Lopez Obrador (AMLO) sourced from YouTube. The dataset includes his daily morning conferences (conferencias mañaneras) 😴🪿