Vocation Map: Linking Social Media-Predicted Personality Traits and Occupations
This repository contains code and data accompanying the publication “Social Media-Predicted Personality Traits Can Help Match People to their Ideal Jobs” [Kern et al, PNAS’20].
Reference:
[Kern et al, PNAS'20] Kern, Margaret L., Paul X. McCarthy, Deepanjan Chakrabarty, and Marian-Andrei Rizoiu.
2019. “Social Media-Predicted Personality Traits and Values Can Help Match People to Their Ideal Jobs.”
Proceedings of the National Academy of Sciences 116(52):26459–64. https://doi.org/10.1073/pnas.1917942116
Repository content:
This repository contains the following code scripts:
scripts/build-professions-profiles.R– R script that starts fromdata/all_users_data.csv.xz, and builds the profession profiles (stored in the filedata/profession-profiles.csv);scripts/vocation-map.R– R script that starts fromdata/profession-profiles.csvto build the Vocation Map starting from occupation psychological profiles;scripts/predict-profession-python.ipynb– Python Jupyter notebook to build predictors for forecasting user occupation.scripts/prediction-step1-run-prediction.sh– Bash script to transform the Jupyter notebook to apyfile and run it without a graphical interface;scripts/prediction-step2-read-prediction-models-from-Python.R– R script (using thereticulatepackage, which reads Python data sctructures produced byscripts/predict-profession-python.ipynband builds R structures);scripts/prediction-step3-plot-prediction.R– R script that plots the prediction performance indicators (Precision, Recall, F1 score and Accuracy);scripts/construct-confusion-matrix.R– R script that loads the prediction results (seedata/prediction-results) and builds the confusion matrix.scripts/utils.R– additional functions for reading, writing data and plotting.
The following data and plots is also available:
data/profession-profiles.csv– contains the psychological profiles for each occupation in the Vocation Map.data/prediction-results/*– CSV files containing the model prediction for each classifier, and each fold.plots/vocation-map-static.pdf– a static version of the Vocation Map.plots/vocation-map-interactive.html– the interactive version of the Vocation Map. Also available at http://bit.ly/vocation-map-interactive .plots/confusion-heatmap-dendogram.pdf– the confusion map for the XGBoost classifier (based ondata/prediction-results/*)
Additional data file:
The following files cannot be publicly shared due to the Twitter’s and IBM Watson’s Terms of Service. However, these files could be provided privately upon request, requests are evaluated at a case-by-case basis.
data/all_users_data.csv.xz– contains the psychological profiles of all 128,278 users in our study;data/10_professions_data.csv.xz– contains the Big5 and the personal values (10 features) for 38,073 users in the occupation prediction part of the paper;data/10_professions_data_big5.csv.xz– contains solely the Big5 traits for the 38,073 users above (useful for the ablation study);data/10_professions_data_values.csv.xz– contains solely the personal values for the 38,073 users above (useful for the ablation study).
License
Both data set and code are distributed under the General Public License v3 (GPLv3) license, a copy of which is included in this repository, in the LICENSE file. If you require a different license and for other questions, please contact us at Marian-Andrei@rizoiu.eu

