How do I know how many iterations to run? #3
OK, I've done some more digging on this myself, and it seems that if the difference between my low and high values is 2 to 3 orders of magnitude, then 5 to 10 iterations looks like it creates the desired effect. But 4 orders of magnitude requires the max 99 iterations (which takes forever), and 5+ orders of magnitude renders the plugin useless, as the distortion effect isn't noticeable even at 99 iterations. I know that I could massage my data in order to artificially limit the differences in value, and I know that you can't really depict 5 orders of magnitude of difference visually anyway (there aren't enough pixels on the screen), but it adds an additional data manipulation step that I was hoping to avoid. Am I doing something wrong?
@bjquinn For visualization purposes, fewer iterations are entirely reasonable (generally speaking). You are right that the range in your data will dictate how many iterations to do, but realistically, you don't need that many to emphasize differences... and if your values are significantly different from the relative areas, then the algorithm will get close to equilibrium pretty quickly. For reference, I rarely use more than 10 iterations.
Thanks for the response! However, if I have data that ranges from 1 to 25,000 (for example), 10 iterations creates a map that's nearly indistinguishable from the original. 99 iterations (which takes forever) looks good. But if my range is larger (say, 1 to 250,000), then even 99 iterations creates a map indistinguishable from the original. Perhaps I'm doing something wrong? I know I could artificially massage the data to narrow the range (since who could see a 250,000:1 ratio on a map anyway?), but I was hoping not to add an additional step of complexity. In my case I'd be having to train end users on data manipulation and the definition of "orders of magnitude", and I don't see that ending well. :)
Ah, OK, that is a huge range... hmm, I'm not sure there's much you can do about that... maybe log-scale the data?
Right, I was thinking about that, but in my case the place to log-scale it would be WITHIN the plugin -- I don't have a lot of control over the data before it hits QGIS. I thought about digging into the code, finding where the data is loaded, detecting whether the range spans more than 3 orders of magnitude, and scaling it if it does. Or perhaps I could add a checkbox to enable log-scaling, or a dropdown for the maximum order-of-magnitude range, or something. Do you have a suggestion as to where I'd start? Would that be in doCartogram.py?
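For reference, the log-scaling suggested above could be applied right where the attribute values are read in, before the cartogram solver sees them. A minimal sketch, not plugin code -- the function name and the +1 offset are my own choices (the offset keeps a raw value of 1 from collapsing to zero weight):

```python
import math

def log_scale(value, base=10.0):
    """Compress a wide-ranging positive value onto a log scale."""
    return math.log(value + 1.0, base)

# A 1-to-250,000 range collapses to roughly 0.3-to-5.4 --
# comfortably inside the 2-3 orders of magnitude the solver handles well.
weights = [log_scale(v) for v in (1, 25000, 250000)]
```

Note this is exactly the trade-off discussed further down the thread: log-scaling also compresses differences between the large values, which may not be what you want on a cartogram.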
Also thought about attempting to adjust the number of iterations to the data, but I've already seen that that doesn't scale well. 3 orders of magnitude = 5 to 10 iterations, 4 orders of magnitude = 99 iterations, 5 orders of magnitude = whoops, I broke it.
We could add more iterations, but I'm thinking much more than 99 is a lot!
Actually, I was thinking more along the lines of adding a checkbox to optionally scale down the data to span no more than 3 orders of magnitude.
Do you have a suggestion as to where I'd start if I were to overwrite and re-scale the data that was passed into the plugin? If you could point me at where that (array?) gets set, I could add some logic to manipulate the data right there.
I modified the getInfo function in doCartogram.py. I added two small sections. Right below dTotalValue = 0.00, I added the following lines of code --
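The snippet itself is missing above, but from the description that follows it evidently found the maximum attribute value and derived a floor of maxval/1000 from it. A hedged reconstruction of the idea -- the helper name and signature are my own; the original was presumably a few inline lines in getInfo, not a function:

```python
def derive_floor(values, max_span=1000.0):
    """Pick the smallest attribute value we're willing to keep.

    max_span=1000 caps the data at roughly 3 orders of magnitude,
    matching the maxval/1000 floor described below.
    """
    maxval = max(values)
    minval = maxval / max_span
    return minval, maxval

# e.g. the 1-to-25,000 range from the earlier comment:
minval, maxval = derive_floor([1, 10, 250, 25000])
```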
Under lfeat.dValue = atMap[index].toInt()[0] I added one more line of code --

if lfeat.dValue < minval: lfeat.dValue = minval

This has the effect of limiting how far the minimum value can fall below the maximum. Since the larger values are the ones visible on the map anyway, we pick a reasonable minimum allowed value, which I set as maxval/1000. Any value under maxval/1000 then simply gets set to maxval/1000. I thought about log-scaling, but that hides large differences between large numbers, which is not what you want on a cartogram. What I'm usually dealing with is a few stray low values (some 1's, 10's, etc.) mixed in with the normal values (5,000, 50,000). The downside is that we mask the differences between values like 1, 10, and 100 when the max value is 100,000. But could you have seen the difference between 1, 10, and 100 anyway when the max value is 100,000? They would all be reduced to a few pixels regardless.
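The effect of that one-line clamp, shown on the example values from the comment above (the helper name is mine; the actual plugin change was the bare if statement on lfeat.dValue):

```python
def clamp_value(value, minval):
    # Mirrors: if lfeat.dValue < minval: lfeat.dValue = minval
    return minval if value < minval else value

maxval = 100000
minval = maxval / 1000  # floor of 100, i.e. 3 orders of magnitude of spread
# 1, 10, and 100 all collapse to the floor of 100 -- exactly the
# differences the comment argues were invisible at this scale anyway.
clamped = [clamp_value(v, minval) for v in (1, 10, 100, 5000, 100000)]
```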
Also, for reference: on some data -- especially data with only moderate variation among the larger values (say, between 15,000 and 25,000) and just a few smaller values (some 100's, some 2,000's) -- the areas with small values STILL don't get distorted anywhere near as much as one would expect, even though the "hack" shown above works beautifully on other data sets. The code above would limit the low value to 25 given a max value of 25,000, and compared to 25,000 you'd expect 25 to show up as basically a couple of pixels, but it doesn't. I suppose it would if you ran enough iterations. Anyway, I'm just playing around with numbers here, but the best balance I've found so far is to limit minval to maxval/250 and run 15 iterations.
Good to see this is working for you now. However, I don't think I'll be adding these adjustments to the plugin code. Ultimately, doing clever things like this is likely to confuse users who aren't expecting it more than it helps... It might be worth adding a note of some kind to the documentation, but for now, I think I'll leave it out.
That's fine. I will mention that a large percentage of the data I've run across and tried to use as a basis for a cartogram has exhibited this behavior, so it might be worth implementing a user-friendly version of this at some point. In any case, the modifications I made are working for me, so I'm fine merging them into newer releases myself as necessary.
Thanks for your help on this!
@bjquinn If you'd like to submit a pull request for an update that offers these adjustments as options (via GUI and code), I would be happy to consider it :-)
What I'll probably do is wait until the fixes for the errors I'm having when using 2.4 are merged, then I'll merge my changes with that code, test it again, and submit a pull request. Right now I'm still using 1.8. Thanks!
If I'm looking for an "accurate" result, where the polygons end up reasonably correctly sized relative to the data column being used, how do I know how many iterations to run? I've already seen that I need to run more iterations to get a reasonably distorted-looking result when the low and high ends of the data are separated by several orders of magnitude. But what I don't know is how to determine how many iterations I "should" run for the result to be mathematically accurate. Does running too many iterations endlessly exaggerate differences between polygons? Is there a way to know the correct number of iterations to run?