This repository contains the code that I used to prepare the raw data and generate the figures for the MixDir paper.
The `src` folder contains eight R Markdown notebooks:
- Comparison on simulated data (html preview)
- Performance on data from a mixed membership model (html preview)
- Data wrangling code for the Young Lives data (html preview)
- Clustering of the Young Lives data (html preview)
- Analysis of the OSMI Mental Health in Tech Survey 2016 (html preview)
- Data wrangling code for the National Cancer Patient Experience Survey (NCPES) (html preview)
- Analysis of the National Cancer Patient Experience Survey (NCPES) (html preview)
- A more detailed analysis of the difference between the Dirichlet and the Dirichlet process priors (html preview)
The NCPES and the Young Lives data were downloaded from the UK Data Service portal and are accessible there under the identifiers 8163 and 7483. The Mental Health in Tech data was downloaded from Kaggle and is licensed under the Creative Commons ShareAlike 4.0 Licence.