This repository has been archived by the owner on May 8, 2021. It is now read-only.

How do I know how many iterations to run? #3

Closed
bjquinn opened this issue Jul 9, 2014 · 16 comments

Comments

@bjquinn

bjquinn commented Jul 9, 2014

If I'm looking for an "accurate" result, where the polygons get reasonably correctly sized relative to the data column being used, how do I know how many iterations to run? I've already seen that it looks like I need to run more iterations in order to get a reasonably-looking distorted result when the low to high range of the data is separated by several orders of magnitude. But what I don't know is how to determine how many iterations I "should" run in order for my result to be mathematically accurate. Does running too many iterations endlessly exaggerate differences between polygons? Is there a way to know what the correct number of iterations I should run is?

@bjquinn
Author

bjquinn commented Jul 9, 2014

Ok, I've done some more digging on this myself, and it seems that if the difference between my low and high values is 2 to 3 orders of magnitude, then 5 to 10 iterations looks like it creates the desired effect. But 4 orders of magnitude requires the max 99 iterations (which takes forever), and 5+ orders of magnitude renders the plugin useless, as the distortion effect isn't noticeable even at 99 iterations. I know that I could massage my data in order to artificially limit the differences in value, and I know that you can't really depict 5 order of magnitude differences visually anyway (there aren't enough pixels on the screen), but it adds an additional data manipulation step that I was hoping to avoid. Am I doing something wrong?

@carsonfarmer
Owner

@bjquinn For visualization purposes, fewer iterations is generally entirely reasonable. You are right that the range in your data will dictate how many iterations to do, but realistically you don't need that many to emphasize differences... and if your values are significantly different from the relative areas, the algorithm will get close to equilibrium pretty quickly. For reference, I rarely use more than 10 iterations.

@bjquinn
Author

bjquinn commented Jul 10, 2014

Thanks for the response! However, if I have data that ranges from 1 to 25,000 (for example), 10 iterations creates a map that's nearly indistinguishable from the original. 99 iterations (which takes forever) looks good. But if my range is larger (say, 1 to 250,000), then even 99 iterations creates a map indistinguishable from the original. Perhaps I'm doing something wrong? I know I could artificially massage the data to narrow the range (since who could see a 250,000:1 ratio on a map anyway?), but I was hoping not to add an additional step of complexity. In my case I'd have to train end users on data manipulation and the definition of "orders of magnitude", and I don't see that ending well. :)

@carsonfarmer
Owner

Ah ok, that is a huge range... hmm, with such a large range there really isn't much you can do about it... maybe log-scale the data?

@bjquinn
Author

bjquinn commented Jul 10, 2014

Right, I was thinking about that, but in my case the place to log-scale it would be WITHIN the plugin. I don't have a lot of control over the data before it hits QGIS. I thought about digging into the code, finding where the data gets loaded, detecting whether the range spans more than 3 orders of magnitude, and scaling it if it does. Or perhaps I could add a checkbox to enable log-scaling, or a dropdown for the maximum order-of-magnitude range, or something. Do you have a suggestion as to where I'd start? Would that be in doCartogram.py?
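
For illustration, the kind of conditional pre-scaling I have in mind might look roughly like this (a standalone sketch, not the plugin's actual code; the function name and the plain-list input are just for illustration):

    import math

    def prescale(values, max_orders=3):
        # Log-scale positive values if they span more than `max_orders`
        # orders of magnitude; otherwise leave them untouched.
        positive = [v for v in values if v > 0]
        if not positive:
            return values
        span = math.log10(max(positive)) - math.log10(min(positive))
        if span <= max_orders:
            return values
        return [math.log10(v) if v > 0 else 0.0 for v in values]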

@bjquinn
Author

bjquinn commented Jul 10, 2014

I also thought about adjusting the number of iterations to the data, but I've already seen that that doesn't scale well: 3 orders of magnitude = 5 to 10 iterations, 4 orders of magnitude = 99 iterations, 5 orders of magnitude = whoops, I broke it.

@carsonfarmer
Owner

We could add more iterations, but I'm thinking much more than 99 is a lot!

@bjquinn
Author

bjquinn commented Jul 10, 2014

Actually, I was thinking more along the lines of adding a checkbox to optionally scale down the data to span no more than 3 orders of magnitude.

@bjquinn
Author

bjquinn commented Jul 14, 2014

Do you have a suggestion as to where I'd start if I were to overwrite and re-scale the data that was passed into the plugin? If you could point me at where that (array?) gets set, I could add some logic to manipulate the data right there.

@bjquinn
Author

bjquinn commented Jul 16, 2014

I modified the getInfo function in doCartogram.py. I added two small sections. Right below dTotalValue = 0.00, I added the following lines of code --

    maxval = 0  # running maximum of the attribute values (initialized before the scan)
    while provider.nextFeature(feat):
        atMap = feat.attributeMap()
        # track the largest attribute value seen in the column
        if atMap[index].toInt()[0] > maxval: maxval = atMap[index].toInt()[0]
    minval = maxval / 1000  # floor: no value may sit more than 3 orders of magnitude below the max
    if minval <= 0: minval = 1  # guard against a zero floor when maxval is small

Under lfeat.dValue = atMap[index].toInt()[0] I added one more line of code --

    if lfeat.dValue < minval: lfeat.dValue = minval  # clamp any value below the floor up to minval

This has the effect of capping how far the minimum value can fall below the maximum. Since the larger values are the ones that will be visible on the map anyway, we pick a reasonable minimum allowed value, which I set to maxval/1000; any value under maxval/1000 simply gets set to maxval/1000. I thought about log-scaling, but that hides large differences between large numbers, which is not what you want on a cartogram. What I'm usually dealing with is a few stray low values (some 1s, 10s, etc.) mixed in with the normal values (5,000, 50,000). The downside is that we mask the differences between values like 1, 10, and 100 when the max value is 100,000. But could you have seen the difference between 1, 10, and 100 anyway when the max value is 100,000? They would all be reduced to a few pixels regardless.
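
To make that concrete, here's a tiny standalone comparison on made-up numbers (not plugin code), showing why I preferred the clamp over log-scaling:

    import math

    values = [1, 10, 100, 5000, 50000, 100000]
    maxval = max(values)
    minval = max(maxval / 1000, 1)       # floor at maxval/1000, never below 1

    clamped = [max(v, minval) for v in values]
    logged = [round(math.log10(v), 2) for v in values]

    print(clamped)  # [100, 100, 100, 5000, 50000, 100000] -- small values collapse, big ratios survive
    print(logged)   # [0.0, 1.0, 2.0, 3.7, 4.7, 5.0]       -- 50,000 vs 100,000 shrinks to 4.7 vs 5.0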

@bjquinn
Author

bjquinn commented Jul 16, 2014

Also, for reference: with some data (especially data with only moderate variation among the larger values, say between 15,000 and 25,000, and just a few smaller values, some 100s, some 2,000s), the areas with small values STILL don't get distorted anywhere near as much as one would expect. This is even though on other data sets the "hack" shown above works beautifully. Obviously the code above would limit the low value to 25, assuming a max value of 25,000. Compared to 25,000, you'd expect 25 to show up as basically a couple of pixels, but it doesn't. I suppose it would if you ran enough iterations. Anyway, I'm just playing around with numbers here, but the best balance I've found so far is to limit minval to maxval / 250 and run 15 iterations.
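
In case it's useful, that tuning can be written as a single parameter (a hypothetical helper, not part of the plugin):

    def clamp_values(values, ratio=250):
        # Floor every value at max(values) / ratio; ratio=1000 matches the
        # original hack, ratio=250 is the looser floor that paired well
        # with about 15 iterations in my tests.
        maxval = max(values)
        minval = max(maxval / float(ratio), 1)
        return [max(v, minval) for v in values]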

@carsonfarmer
Owner

Good to see this is working for you now; however, I don't think I'll be adding these adjustments to the plugin code. Ultimately, doing clever things like this is more likely to confuse users who aren't expecting it than to help... It might be worth adding a note of some kind to the documentation, but for now I think I'll leave it out.

@bjquinn
Author

bjquinn commented Jul 23, 2014

That's fine. I will mention that a large percentage of the data I've run across and tried to use as a basis for a cartogram has exhibited this behavior, so it might be worth implementing a user-friendly version of this at some point. However, the modifications I made are in fact working for me, so I'm fine merging them into newer releases myself as necessary.

bjquinn closed this as completed Jul 23, 2014
@bjquinn
Author

bjquinn commented Jul 23, 2014

Thanks for your help on this!

@carsonfarmer
Owner

@bjquinn If you'd like to submit a pull request for an update that offers these adjustments as options (via GUI and code), I would be happy to consider it :-)

@bjquinn
Author

bjquinn commented Jul 23, 2014

What I'll probably do is wait until the fixes for the errors I'm seeing with QGIS 2.4 are merged, then merge my changes with that code, test it again, and submit a pull request. Right now I'm still on QGIS 1.8. Thanks!
