Log-Entropy-Weighted-PCA

Matrix normalization and weighting are things that distant readers do all the time. The question I want to raise is whether we can use different weightings strategically, in order to capture valuable features of a textual corpus. The information retrieval technique Latent Semantic Analysis (LSA) has an extensive literature devoted to alternate weight schemes and their impacts on different tasks. How might we use a semantic model like LSA in existing distant reading practices? The similarity of LSA to a common technique for pattern finding and featurization in distant reading (i.e. PCA) suggests that we can profitably apply its weight schemes to work that we are already doing.
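As a rough illustration of the kind of weighting at stake, here is a minimal sketch of the standard log-entropy scheme from the LSA literature, applied to a toy document-term matrix with NumPy. (The function name and toy matrix are my own for illustration; this is not code from the notebook.) Each count gets a local weight of log(1 + tf), scaled by a global weight of 1 plus the normalized entropy of the term's distribution across documents, so that terms spread evenly over the corpus are down-weighted toward zero.

```python
import numpy as np

def log_entropy_weight(dtm):
    """Log-entropy weighting for a document-term matrix
    (rows = documents, columns = terms)."""
    dtm = np.asarray(dtm, dtype=float)
    n_docs = dtm.shape[0]
    # Local weight: dampen raw counts with log(1 + tf)
    local = np.log1p(dtm)
    # Global weight: 1 + (normalized entropy of each term's
    # distribution over documents; this sum is <= 0)
    gf = dtm.sum(axis=0)  # global frequency of each term
    with np.errstate(divide="ignore", invalid="ignore"):
        p = np.where(gf > 0, dtm / gf, 0.0)          # p_ij = tf_ij / gf_j
        plogp = np.where(p > 0, p * np.log(p), 0.0)  # 0 * log 0 := 0
    global_w = 1.0 + plogp.sum(axis=0) / np.log(n_docs)
    return local * global_w

dtm = np.array([[2, 0, 1],
                [0, 3, 1],
                [1, 1, 1]])
weighted = log_entropy_weight(dtm)
```

In this toy example the third term occurs once in every document, so its global weight (and thus its weighted column) is driven to zero, while terms concentrated in fewer documents keep more of their weight. The weighted matrix can then be handed directly to a PCA or SVD routine.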

This repository contains a Jupyter Notebook with code and output that supports the blog post "A Naive Empirical Post about DTM Weighting."
