This repository has been archived by the owner on May 8, 2021. It is now read-only.

How do I know how many iterations to run? #3

Closed
bjquinn opened this issue Jul 9, 2014 · 16 comments

Comments

@bjquinn

bjquinn commented Jul 9, 2014

If I'm looking for an "accurate" result, where the polygons get reasonably correctly sized relative to the data column being used, how do I know how many iterations to run? I've already seen that it looks like I need to run more iterations in order to get a reasonably-looking distorted result when the low to high range of the data is separated by several orders of magnitude. But what I don't know is how to determine how many iterations I "should" run in order for my result to be mathematically accurate. Does running too many iterations endlessly exaggerate differences between polygons? Is there a way to know what the correct number of iterations I should run is?

@bjquinn
Author

bjquinn commented Jul 9, 2014

Ok, I've done some more digging on this myself, and it seems that if the difference between my low and high values is 2 to 3 orders of magnitude, then 5 to 10 iterations looks like it creates the desired effect. But 4 orders of magnitude requires the max 99 iterations (which takes forever), and 5+ orders of magnitude renders the plugin useless, as the distortion effect isn't noticeable even at 99 iterations. I know that I could massage my data in order to artificially limit the differences in value, and I know that you can't really depict 5 order of magnitude differences visually anyway (there aren't enough pixels on the screen), but it adds an additional data manipulation step that I was hoping to avoid. Am I doing something wrong?

@carsonfarmer
Owner

@bjquinn For visualization purposes, fewer iterations is generally entirely reasonable. You are right that the range in your data will dictate how many iterations to do, but realistically you don't need that many to emphasize differences... and if your values are significantly different from the relative areas, the algorithm will get close to equilibrium pretty quickly. For reference, I rarely use more than 10 iterations.

@bjquinn
Author

bjquinn commented Jul 10, 2014

Thanks for the response! However, if I have data that ranges from 1 to 25,000 (for example), 10 iterations creates a map that's nearly indistinguishable from the original. 99 iterations (which takes forever) looks good. But if my range is larger (say, 1 to 250,000), then even 99 iterations creates a map indistinguishable from the original. Perhaps I'm doing something wrong? I know I could artificially massage the data to narrow the range (since who could see a 250,000:1 ratio on a map anyway?), but I was hoping not to add an additional step of complexity. In my case I'd have to train end users on data manipulation and the definition of "orders of magnitude", and I don't see that ending well. :)

@carsonfarmer
Owner

Ah ok, that is a huge range... hmm, with such a large range there really isn't much you can do about it... maybe log-scale the data?

@bjquinn
Author

bjquinn commented Jul 10, 2014

Right, I was thinking about that, but in my case the place to log-scale it would be WITHIN the plugin. I don't have a lot of control over the data before it hits QGIS. I thought about digging into the code, finding where the data gets loaded, detecting whether the range spans more than 3 orders of magnitude, and scaling it if it does. Or perhaps I could add a checkbox to enable log-scaling, or a dropdown for the maximum order-of-magnitude range, or something. Do you have a suggestion as to where I'd start? Would that be in doCartogram.py?
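
For illustration, the kind of conditional pre-scaling I have in mind might look roughly like this (a standalone sketch, not the plugin's actual code; the function name and the plain-list input are just for illustration):

    import math

    def prescale(values, max_orders=3):
        # Log-scale positive values if they span more than `max_orders`
        # orders of magnitude; otherwise leave them untouched.
        positive = [v for v in values if v > 0]
        if not positive:
            return values
        span = math.log10(max(positive)) - math.log10(min(positive))
        if span <= max_orders:
            return values
        return [math.log10(v) if v > 0 else 0.0 for v in values]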

@bjquinn
Author

bjquinn commented Jul 10, 2014

I also thought about adjusting the number of iterations to the data, but I've already seen that that doesn't scale well: 3 orders of magnitude = 5 to 10 iterations, 4 orders of magnitude = 99 iterations, 5 orders of magnitude = whoops, I broke it.

@carsonfarmer
Owner

We could add more iterations, but I'm thinking much more than 99 is a lot!

@bjquinn
Author

bjquinn commented Jul 10, 2014

Actually, I was thinking more along the lines of adding a checkbox to optionally scale down the data to span no more than 3 orders of magnitude.

@bjquinn
Author

bjquinn commented Jul 14, 2014

Do you have a suggestion as to where I'd start if I were to overwrite and re-scale the data that was passed into the plugin? If you could point me at where that (array?) gets set, I could add some logic to manipulate the data right there.

@bjquinn
Author

bjquinn commented Jul 16, 2014

I modified the getInfo function in doCartogram.py. I added two small sections. Right below dTotalValue = 0.00, I added the following lines of code --

    maxval = 0  # running maximum of the attribute values (initialized before the scan)
    while provider.nextFeature(feat):
        atMap = feat.attributeMap()
        # track the largest attribute value seen in the column
        if atMap[index].toInt()[0] > maxval: maxval = atMap[index].toInt()[0]
    minval = maxval / 1000  # floor: no value may sit more than 3 orders of magnitude below the max
    if minval <= 0: minval = 1  # guard against a zero floor when maxval is small

Under lfeat.dValue = atMap[index].toInt()[0] I added one more line of code --

    if lfeat.dValue < minval: lfeat.dValue = minval  # clamp any value below the floor up to minval

This has the effect of capping how far the minimum value can fall below the maximum. Since the larger values are the ones that will be visible on the map anyway, we pick a reasonable minimum allowed value, which I set to maxval/1000; any value under maxval/1000 simply gets set to maxval/1000. I thought about log-scaling, but that hides large differences between large numbers, which is not what you want on a cartogram. What I'm usually dealing with is a few stray low values (some 1s, 10s, etc.) mixed in with the normal values (5,000, 50,000). The downside is that we mask the differences between values like 1, 10, and 100 when the max value is 100,000. But could you have seen the difference between 1, 10, and 100 anyway when the max value is 100,000? They would all be reduced to a few pixels regardless.
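
To make that concrete, here's a tiny standalone comparison on made-up numbers (not plugin code), showing why I preferred the clamp over log-scaling:

    import math

    values = [1, 10, 100, 5000, 50000, 100000]
    maxval = max(values)
    minval = max(maxval / 1000, 1)       # floor at maxval/1000, never below 1

    clamped = [max(v, minval) for v in values]
    logged = [round(math.log10(v), 2) for v in values]

    print(clamped)  # [100, 100, 100, 5000, 50000, 100000] -- small values collapse, big ratios survive
    print(logged)   # [0.0, 1.0, 2.0, 3.7, 4.7, 5.0]       -- 50,000 vs 100,000 shrinks to 4.7 vs 5.0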

@bjquinn
Author

bjquinn commented Jul 16, 2014

Also, for reference: with some data (especially data with only moderate variation among the larger values, say between 15,000 and 25,000, and just a few smaller values, some 100s, some 2,000s), the areas with small values STILL don't get distorted anywhere near as much as one would expect. This is even though on other data sets the "hack" shown above works beautifully. Obviously the code above would limit the low value to 25, assuming a max value of 25,000. Compared to 25,000, you'd expect 25 to show up as basically a couple of pixels, but it doesn't. I suppose it would if you ran enough iterations. Anyway, I'm just playing around with numbers here, but the best balance I've found so far is to limit minval to maxval / 250 and run 15 iterations.
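
In case it's useful, that tuning can be written as a single parameter (a hypothetical helper, not part of the plugin):

    def clamp_values(values, ratio=250):
        # Floor every value at max(values) / ratio; ratio=1000 matches the
        # original hack, ratio=250 is the looser floor that paired well
        # with about 15 iterations in my tests.
        maxval = max(values)
        minval = max(maxval / float(ratio), 1)
        return [max(v, minval) for v in values]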

@carsonfarmer
Owner

Good to see this is working for you now; however, I don't think I'll be adding these adjustments to the plugin code. Ultimately, doing clever things like this is more likely to confuse users who aren't expecting it than to help... It might be worth adding a note of some kind to the documentation, but for now I think I'll leave it out.

@bjquinn
Author

bjquinn commented Jul 23, 2014

That's fine. I will mention that a large percentage of the data I've run across and tried to use as a basis for a cartogram has exhibited this behavior, so it might be worth implementing a user-friendly version of this at some point. However, the modifications I made are in fact working for me, so I'm fine merging them into newer releases myself as necessary.

bjquinn closed this as completed Jul 23, 2014
@bjquinn
Author

bjquinn commented Jul 23, 2014

Thanks for your help on this!

@carsonfarmer
Owner

@bjquinn If you'd like to submit a pull request for an update that offers these adjustments as options (via GUI and code), I would be happy to consider it :-)

@bjquinn
Author

bjquinn commented Jul 23, 2014

What I'll probably do is wait until the fixes for the errors I'm seeing with QGIS 2.4 are merged, then merge my changes with that code, test it again, and submit a pull request. Right now I'm still on QGIS 1.8. Thanks!
