Visualize correlations between a given dependent variable and explanatory variables, as well as the intercorrelations between the explanatory variables, as a solar map.
The relationships between the dependent variable (the "Sun") and the explanatory variables (the "planets") are depicted as a solar system, where planets orbit around the Sun. The closer a planet is to the Sun, the stronger their relationship, as indicated by a higher absolute Pearson correlation coefficient.
Furthermore, some of these planets have their own moons. These moons represent explanatory variables that are closely related to the planet, with a correlation coefficient score over 0.8.
You can also regard the planets as the primary predictors (or main parameters) of the dependent variable and the moons as the parameters that are collinear with those main parameters.
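The planet and moon grouping described above can be sketched in a few lines of Julia. This is only an illustrative sketch, not the code in this repo: the function name `planets_and_moons`, the greedy assignment order, and the toy data are all assumptions made for the example. Only the 0.8 cutoff comes from the description above.

```julia
using Statistics  # stdlib: provides cor (Pearson correlation)

# Hypothetical toy data. y is the dependent variable (the "Sun").
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X = (
    a = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],  # exact linear function of y
    b = [1.0, 2.1, 3.0, 4.2, 5.0, 6.1],    # nearly collinear with a
    c = [6.0, 1.0, 4.0, 2.0, 5.0, 3.0],    # weakly (negatively) related to y
)

# Assumed greedy scheme: rank variables by |r| with the Sun, then make each
# variable a moon of the first planet it is collinear with, else a new planet.
function planets_and_moons(y, vars::NamedTuple; moon_cutoff = 0.8)
    vnames = collect(keys(vars))
    r_sun = Dict(n => cor(vars[n], y) for n in vnames)
    ranked = sort(vnames; by = n -> abs(r_sun[n]), rev = true)

    planets = Symbol[]
    moons = Dict{Symbol,Vector{Symbol}}()
    for n in ranked
        idx = findfirst(p -> abs(cor(vars[n], vars[p])) > moon_cutoff, planets)
        if idx === nothing
            push!(planets, n)      # not collinear with any planet so far
            moons[n] = Symbol[]
        else
            push!(moons[planets[idx]], n)
        end
    end
    return planets, moons, r_sun
end

planets, moons, r_sun = planets_and_moons(y, X)
```

With this toy data, `a` and `c` end up as planets and `b` becomes a moon of `a`, because `b` is almost a copy of `a`.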
This work builds on Stefan Zapf and Christopher Kraushaar's 2017 work but differs in a few key ways. This work uses fewer colors in the graph, can display more explanatory variables through numerical representation, and tries to position markers to avoid overlap or excessive closeness.
Note: correlations with absolute values between 0.0 and 0.1 are not plotted. This is to create more space in the plot for variables that have a stronger, and presumably more important, association with the dependent variable.
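As a rough sketch of that filtering rule, the snippet below drops weak correlations before plotting. The variable names and values are hypothetical, and treating exactly 0.1 as "plotted" is an assumption, since the boundary case is not stated here.

```julia
# Hypothetical correlations between explanatory variables and the dependent variable.
r = Dict(:x1 => 0.70, :x2 => -0.74, :x3 => 0.08, :x4 => 0.25)

# Keep only variables whose absolute correlation is at least 0.1 (assumed boundary).
plotted = filter(kv -> abs(last(kv)) >= 0.1, r)
```

Here `:x3` is left out, while the strongly negative `:x2` is kept because the rule uses the absolute value.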
Clone the repo from:
https://github.com/cbsteh/SolarCorrMap.git
The three important Julia source files are in the src folder. They are: correlations.jl, drawmap.jl, and SolarCorrMap.jl. The fourth file, main.jl, is an example file (see below).
Call the viz function to read the CSV data file and plot the correlations as a solar map (see main.jl).
using SolarCorrMap
viz("data/housing.csv", :medv)
where housing.csv is a CSV file (in this case, the Boston Housing data), and :medv is the dependent variable in the provided CSV file.
The plot result is:
Negative correlations are drawn in red and positive correlations in black. The legend on the left indicates the significance level of the correlation between each explanatory variable and the dependent variable: * for p<0.05, ** for p<0.01, and ns for p>0.05.
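The legend symbols can be expressed as a small lookup on the p-value. The helper name `siglabel` is hypothetical and is not part of this package. The sketch also assumes that a p-value of exactly 0.05 counts as ns.

```julia
# Map a p-value to the legend symbol: ** for p<0.01, * for p<0.05, otherwise ns.
function siglabel(p::Real)
    p < 0.01 && return "**"
    p < 0.05 && return "*"
    return "ns"
end

labels = [siglabel(p) for p in (0.003, 0.03, 0.2)]
```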
O'Reilly article. This article also explains the example plot above.