Option to Compress Graphs for pgf-backend #5983

Closed
overdetermined opened this issue Feb 9, 2016 · 7 comments

@overdetermined
Contributor

Working with the awesome PGF backend, I hit a limit on how pretty I can make the graphs look:

TeX capacity exceeded, sorry [main memory size={large number here}]

The reason is obvious: when I produce a scatter plot with tens of thousands of objects, it inevitably produces very large *.pgf documents. I may have overlooked some features and am not an expert on this, but in some cases it may be beneficial to have an option to save the graph itself as a PNG and let LaTeX do only the annotations (after all, it's the beautiful typesetting we are after).

As an example, my current workaround (a "hack") is posted below. Obviously the quality of the graphs is reduced, and in my example the PDF has a bigger file size, but I have produced *.pgf files of several MB that shrink to a few kB as PDF (if they can be compiled at all). In my example the *.pgf size is reduced by a third.
I would try to implement this myself in matplotlib, but I am not sure where to start.

Is this a good feature, or are there better ways to achieve what I want?

```python
import numpy as np
import matplotlib as mpl
import matplotlib.image as mpimg

mpl.rc('pdf', fonttype=42)

pgf_with_latex = {                      # set up matplotlib to use LaTeX for output
    "pgf.texsystem": "pdflatex",        # change this if using xelatex or lualatex
    "text.usetex": True,                # use LaTeX to write all text
    "font.family": "serif",
    }
mpl.rcParams.update(pgf_with_latex)

import matplotlib.pyplot as plt

def newfig(width):
    plt.clf()
    fig = plt.figure()  # width is not used in this example
    ax = fig.add_subplot(111)
    return fig, ax

def savefig(filename):
    plt.savefig('{}.pgf'.format(filename))
    plt.savefig('{}.pdf'.format(filename))

np.random.seed(42)
randomData = np.random.rand(50, 4)
fig, ax = newfig(0.9)
tmp = ax.scatter(x=randomData[:, 0],
                 y=randomData[:, 1]*10,
                 s=randomData[:, 2]*1000,
                 c=randomData[:, 3])

# remember the axes and figure sizes (in display pixels)
frame_ax = ax.get_window_extent().get_points()
frame_fig = fig.get_window_extent().get_points()

# keep the data limits so the image can be placed on the same axes
ylim = ax.get_ylim()
xlim = ax.get_xlim()
extent = [xlim[0], xlim[1], ylim[0], ylim[1]]
#ax.annotate('help', xy=[0.5, 0.5], xytext=[0.4, 0.4])  # annotating has to be done after
savefig('large')

ax.axis('off')
plt.savefig('empty.png')
ax.axis('on')

tmp.remove()  # remove the scatter from the axes
img = mpimg.imread('empty.png')

# get only the axes part; display y runs bottom-up, image rows run top-down
h, w = img.shape[:2]
x1 = int(round(frame_ax[0][0] / frame_fig[1][0] * w))
x2 = int(round(frame_ax[1][0] / frame_fig[1][0] * w))
y1 = int(round((1 - frame_ax[1][1] / frame_fig[1][1]) * h))
y2 = int(round((1 - frame_ax[0][1] / frame_fig[1][1]) * h))
img2 = img[y1:y2, x1:x2]  # crop image to the axes area

ax.imshow(img2, extent=extent, aspect='auto')

savefig('small')

plt.show()
```

Notes:
In this code I plot the data, then save an image of just the plot, with the axes turned off, as 'empty.png'.
The problem is that this image has the size of the whole figure, not of the graph itself, so I have to crop it.
The cropped image is then added back using imshow.
I looked in the source and saw that the pgf backend has a draw_image function, but got lost trying to figure out how it determines whether something being plotted is an image or not.
I may have confused my x's and y's.
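A possibly simpler way to get an image of only the axes area, instead of cropping the full-figure PNG by hand: `savefig` accepts a `bbox_inches` bounding box given in inches, so the axes extent can be converted from display pixels to inches and passed in directly, which sidesteps the x/y and bottom-up/top-down bookkeeping. This is a minimal sketch, assuming the Agg backend, an in-memory PNG instead of 'empty.png', and arbitrary figure size and data; it is not a built-in matplotlib feature for this purpose, just a combination of existing calls.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display or LaTeX needed
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.random((50, 4))

fig, ax = plt.subplots(figsize=(6.4, 4.8), dpi=100)
sc = ax.scatter(data[:, 0], data[:, 1] * 10, s=data[:, 2] * 1000, c=data[:, 3])
extent = [*ax.get_xlim(), *ax.get_ylim()]  # data limits, reused by imshow below

# Axes extent in display pixels, converted to inches (what bbox_inches expects).
ax_box = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())

ax.axis("off")  # hide ticks, labels and spines so only the data is captured
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100, bbox_inches=ax_box)
ax.axis("on")
buf.seek(0)
img = mpimg.imread(buf)  # already the size of the axes area, no manual slicing

# Swap the heavy scatter for the bitmap, keeping the original data limits.
sc.remove()
ax.imshow(img, extent=extent, aspect="auto")
```

The resulting figure can then be saved to .pgf as in the workaround; only the bitmap replaces the scatter, while axes and labels are still typeset by LaTeX.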

@jenshnielsen
Member

I didn't have time to read your example in detail, but I think set_rasterized(True) may do what you want.

Something like:

```python
import matplotlib.pyplot as plt
import numpy as np

a = np.random.rand(10000)
b = np.random.rand(10000)
c = plt.scatter(a, b)
c.set_rasterized(True)  # draw this artist as a bitmap in vector output
```

This should ensure that the scatter is rendered as a bitmap.

Hope that is helpful.
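A slightly extended version of the suggestion above, as a sketch under stated assumptions: it writes a PDF to memory so it runs without LaTeX (the PGF backend needs a LaTeX installation), and the dpi value and data are arbitrary. `rasterized=True` can also be passed as a keyword when the artist is created, and the `dpi` given to `savefig` controls the resolution of the rasterized part, while axes, ticks and text stay vector. As far as I know, when saving to a .pgf file the PGF backend writes each rasterized part to a separate PNG file next to the .pgf and includes it from there, so those PNG files have to be kept alongside the .pgf.

```python
import io

import matplotlib
matplotlib.use("Agg")  # PDF output here, so no LaTeX installation is required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10000)
b = rng.random(10000)

fig, ax = plt.subplots()
# rasterized=True as a keyword is equivalent to calling c.set_rasterized(True).
c = ax.scatter(a, b, rasterized=True)
ax.set_xlabel("a")  # axes, ticks and labels are still written as vector graphics
ax.set_ylabel("b")

buf = io.BytesIO()
# dpi only affects rasterized artists; higher values trade file size for quality.
fig.savefig(buf, format="pdf", dpi=300)
pdf_bytes = buf.getvalue()
```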

@overdetermined
Contributor Author

That is exactly what I want. Thank you!
I spent almost as much time looking for a solution as I did trying to hack a workaround, but I guess I learned something. Thank you for the great support and the very quick reply.

@jenshnielsen
Member

Great. I think we need to document that better. Is it OK with you if I rename this issue to reflect that?

@overdetermined
Contributor Author

Sure. I was looking for set_rasterized in the docs after you mentioned it, and I guess it is somewhat hard to find.

@jenshnielsen
Member

I think the section about PGF should explain this. We actually have a test for this in the pgf backend tests (https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/tests/test_backend_pgf.py#L160), but I don't think the docs at http://matplotlib.org/users/pgf.html mention it.
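A related option that such a docs section could mention: instead of flagging each artist, `Axes.set_rasterization_zorder(z)` rasterizes every artist in the axes whose zorder is below `z` when saving to a vector format. This is a minimal sketch with arbitrary data, zorder and dpi values; it saves a PDF to memory so it runs without LaTeX, and the same settings should apply when saving to .pgf.

```python
import io

import matplotlib
matplotlib.use("Agg")  # PDF output here, so no LaTeX installation is required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(5000)
y = rng.random(5000)

fig, ax = plt.subplots()
# Artists with a zorder below 0 are rasterized in vector output.
ax.set_rasterization_zorder(0)
dense = ax.scatter(x, y, s=2, zorder=-1)        # below the threshold: bitmap
trend = ax.plot([0, 1], [0, 1], color="k")[0]   # default zorder 2: stays vector

buf = io.BytesIO()
fig.savefig(buf, format="pdf", dpi=200)
```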

@tacaswell added this to the 1.5.2 (Critical bug fix release) milestone Feb 14, 2016

@jenshnielsen
Member

Closing now that the docs update has been merged.

@jenshnielsen
Member

Thanks for the work!
