Scatterplot matrix #286
Comments
This is something for which there should be a nice shortcut (like there is for plotting functions, etc.). Here's a goofy way you can fake it in the meantime:
using Gadfly, Compose, DataFrames
df = DataFrame(x=rand(100), y=rand(100), z=rand(100))
n = size(df, 2)
cs = Array(Canvas, (n, n))
for i in 1:n
    for j in 1:n
        cs[i, j] = render(plot(x=df[i], y=df[j]))
    end
end
draw(PNG("plotmatrix.png", 8inch, 8inch), gridstack(cs)) |
How would I render it in an IJulia notebook? |
Just remove the filename from the call to draw: draw(PNG(8inch, 8inch), gridstack(cs)) |
Nice. It even works with D3 for some zooming and panning action. |
It's possible to make a nicer version of this now with |
I've been making use of a hand-rolled function, but it is not very elegant or flexible: lots of hard-coded values and such.
using Gadfly, Compose, DataFrames
x=randn(100)
df = DataFrame(firstvar=x, secondvar=x.+randn(100).+100, thirdvar=x.-randn(100), fourthvar=randn(100))
function scatterplot_matrix(df::AbstractDataFrame, cols::Vector{Symbol})
    n = length(cols)
    cs = Array(Context, (n, n))
    for (i, coli) in enumerate(cols), (j, colj) in enumerate(cols)
        if i==j
            cs[i, j] = compose(context(), text(0.5, 0.5, string(coli), hcenter, vcenter), linewidth(0.01mm),
                               stroke(color("black")), fontsize(5))
        else
            cs[i, j] = render(plot(x=df[coli], y=df[colj], Guide.xlabel(nothing), Guide.ylabel(nothing),
                                   Guide.yticks(label=false), Guide.xticks(label=false)))
        end
    end
    draw(SVGJS((5*n)cm, (5*n)cm), gridstack(cs))
end
scatterplot_matrix(df, [:firstvar, :secondvar, :thirdvar, :fourthvar]) |
Hi, I just want to follow up on this question and ask whether there has been any progress on a shortcut for this type of plot. I would also like to suggest a slightly different representation. Since the upper and lower triangles carry essentially the same information, one of them could be replaced with the correlation coefficients between variables i and j. And since the diagonal panels are trivial (each variable plotted against itself), they could be replaced with histograms of the variable in question. I've seen this type of visualization in¹ and have also used it myself. An example prepared with matlab2tikz is attached. Here I had two clusters. The (default) behavior of a generic scatterplot matrix would of course show only one color and a single combined histogram. ¹Hair, Joseph F. "Multivariate data analysis." (2010). |
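The layout suggested in the comment above (histograms on the diagonal, correlation coefficients in one triangle, scatterplots in the other) can be sketched without any plotting library. This is only an illustration: `panel_kind`, the toy data, and the choice of the upper triangle for correlations are assumptions, not part of Gadfly.

```julia
using Statistics

# Hypothetical helper: which kind of panel goes in cell (i, j) of the matrix.
# Assumption: correlations go in the upper triangle, scatterplots in the lower.
panel_kind(i::Integer, j::Integer) =
    i == j ? :histogram : (i < j ? :correlation : :scatter)

# Assumed toy data: three named columns of equal length.
cols = (a = [1.0, 2.0, 3.0, 4.0],
        b = [2.0, 4.1, 5.9, 8.2],
        c = [4.0, 1.0, 3.0, 2.0])
vars = collect(keys(cols))
n = length(vars)

# n-by-n matrix of symbols describing what each cell should show.
layout = [panel_kind(i, j) for i in 1:n, j in 1:n]

# Correlation coefficients for the upper-triangle cells only.
corrs = Dict((vars[i], vars[j]) => cor(cols[vars[i]], cols[vars[j]])
             for i in 1:n for j in (i + 1):n)
```

A plotting routine would then loop over `layout` and draw a histogram, a text label built from `corrs`, or a scatterplot in each cell.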
See corrplot in MLPlots/Plots: https://github.com/JuliaML/MLPlots.jl You can use Gadfly as a backend if you want.
|
Thanks for this, @tbreloff. Looks good. Still, something like Geom.scattermatrix would be great. |
I needed this as part of my research and decided to write a fairly reusable function that reproduces what @e-neu described, so I'm sharing it in case someone else needs something similar. It's probably not symmetrical enough for use in a publication, because of the way Gadfly handles stacked plots with and without axis information, but it is good enough for inspection.
function scatterMatrix(olddf, colorido=[], legenda=true)
    df = olddf[complete_cases(olddf),:]
    n = size(df, 2)
    nomes = names(df)
    if colorido!=[]
        n = n-1
        splice!(nomes,findfirst(x -> string(x) == colorido, nomes))
    end
    M = Array(Compose.Context, (n,n))
    for (i,indexi) in zip(nomes,1:length(nomes))
        nowcor = false
        for (j,indexj) in zip(nomes,1:length(nomes))
            gdplot = Geom.point
            xTickMarks=yTickMarks=false
            xName=yName=""
            kps=:none
            if j == nomes[1]
                yName=string(i)
                yTickMarks=true
            end
            if i == nomes[end]
                xName=string(j)
                xTickMarks=true
            end
            if nowcor # correlation info
                index1 = complete_cases(df[:,[i,j]])
                text0 = "Corr: "*string(trunc(cor(df[index1,i],df[index1,j]),4))
                M[indexi,indexj] = compose(context(),
                    (context(), text(0.5, 0.5, text0, hcenter)),
                    (context(0.1w, 0.1h, 0.8w, 0.8h), rectangle(), fill("white"), stroke("black")))
            elseif i == j # histogram
                gdplot = Geom.histogram
                if legenda
                    kps=:right
                end
                M[indexi,indexj] = render(plot(df, x=string(j), y=string(i), color=colorido, gdplot(maxbincount=20),
                    Guide.xlabel(xName), Guide.ylabel(yName), Guide.xticks(label=xTickMarks), Guide.yticks(label=yTickMarks),
                    Theme(grid_line_width=1pt, panel_stroke=colorant"black", key_position=kps)))
                nowcor=true
            else # scatterplots
                M[indexi,indexj] = render(plot(df, x=string(j), y=string(i), color=colorido, gdplot,
                    Guide.xlabel(xName), Guide.ylabel(yName), Guide.xticks(label=xTickMarks), Guide.yticks(label=yTickMarks),
                    Theme(panel_stroke=colorant"black", key_position=kps)))
            end
        end
    end
    return gridstack(M)
end
The first parameter is the DataFrame in question, the second is the name of the column to be used for the color information, and the third is a boolean indicating whether the plots should show a color key. The function assumes all the values in the DataFrame are numeric so that the correlation values can be calculated (with the exception of the column used for color information, which does not appear on the plots). Here is an example of how to use it:
using RDatasets
iris = dataset("datasets", "iris")
scmIris = scatterMatrix(iris, "Species", false) |
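The correlation label in the function above filters out incomplete rows and truncates the coefficient to four digits. The calls it uses (`complete_cases`, the two-argument `trunc`) appear to come from older Julia and DataFrames versions, so here is a minimal sketch of the same step for plain vectors on Julia 1.x. `corr_label` is a hypothetical helper written for illustration, not part of Gadfly or DataFrames.

```julia
using Statistics

# Hypothetical helper: build a "Corr: ..." label from two vectors,
# using only the positions where both values are present.
function corr_label(x::AbstractVector, y::AbstractVector; digits::Integer=4)
    keep = .!ismissing.(x) .& .!ismissing.(y)   # pairwise-complete positions
    r = cor(Float64.(x[keep]), Float64.(y[keep]))
    return "Corr: " * string(trunc(r; digits=digits))
end

# The third pair is dropped because x is missing there.
label = corr_label([1.0, 2.5, missing, 4.0], [2.0, 1.0, 6.0, 3.5])
```

Filtering per pair of columns, as the function above does inside the loop, keeps more data than dropping every row that has a missing value anywhere.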
Here is an attempt at a (near) publication-quality scatterplot matrix for Gadfly. You can find this on the branch scatplotmat at http://github.com/Mattriks/Gadfly.jl. I'll do a PR soon. The function Note that the order of layers seems important, and Plotting histograms along the diagonal is problematic (try it by uncommenting the
using Colors, DataFrames, RDatasets
iris = dataset("datasets","iris");
Dbase, D, Dcor, Dhist = scatterplotmatdata(iris, colorid=:Species)
theme(x) = Theme(default_color=parse(Colorant, x))
p = plot(
    Dbase, xgroup=:g1, ygroup=:g2,
    Geom.subplot_grid(
        layer(x=:x,y=:y, Geom.point, theme("transparent")),
        # layer(Dhist, xgroup=:g1, ygroup=:g2, x=:x, theme("gray"), Geom.histogram(density=true)),
        layer(Dcor, xgroup=:g1, ygroup=:g2, x=:x,y=:y, label=:label, Geom.label(position=:centered)),
        layer(D, xgroup=:g1, ygroup=:g2, x=:x, y=:y, Geom.point, color=:col),
        free_y_axis=true,
        free_x_axis=true
    ),
    Guide.title("Iris Data"),
    Guide.colorkey("Species")
)
|
Bit of a bump, but does Gadfly natively support scatter matrices yet? |
Would it be possible for Gadfly to support creating scatterplot matrices? Like http://home.centurytel.net/~mjm/NScatterplotMatrix.gif