weights function enhancement #1751

Closed
lixun910 opened this issue Nov 16, 2018 · 17 comments

@lixun910
Member

from Luc:

1. Extend the distance weights functionality to more than two dimensions. Right now we have an x and a y coordinate, which should remain the default, but it would be nice to be able to add more variables so that a distance in multi-attribute space could be computed.

  • For example, this could create so-called socio-economic weights.

2. Generalize the distance metric beyond Euclidean distance; at the very least, include Manhattan distance?
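
To make the request concrete, here is a minimal Python sketch of a distance computed over an arbitrary number of variables with a selectable metric. It is only an illustration of the idea, not GeoDa's implementation; the function name `attribute_distance` and the `metric` parameter are hypothetical.

```python
import math
from typing import Sequence


def attribute_distance(a: Sequence[float], b: Sequence[float],
                       metric: str = "euclidean") -> float:
    """Distance between two observations in multi-attribute space.

    Hypothetical helper for illustration; not part of GeoDa.
    """
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if metric == "euclidean":
        # Square root of the sum of squared differences.
        return math.sqrt(sum(d * d for d in diffs))
    if metric == "manhattan":
        # Sum of absolute differences, with no squaring and no square root.
        return float(sum(diffs))
    raise ValueError(f"unknown metric: {metric}")


# Two observations described by three non-geographic variables.
obs_a = [0.0, 0.0, 1.0]
obs_b = [3.0, 4.0, 1.0]
print(attribute_distance(obs_a, obs_b, "euclidean"))
print(attribute_distance(obs_a, obs_b, "manhattan"))
```

With only x and y passed in, this reduces to the usual geographic distance, so the two-coordinate case can stay the default.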
@lixun910
Member Author

For example:
screen shot 2018-11-15 at 9 23 13 pm
screen shot 2018-11-15 at 9 23 37 pm

lixun910 added a commit to lixun910/geoda that referenced this issue Nov 16, 2018
lixun910 added a commit to lixun910/geoda that referenced this issue Nov 22, 2018
lixun910 added a commit to lixun910/geoda that referenced this issue Nov 27, 2018
add adaptive kernel weights for social weights creation
lixun910 added a commit to lixun910/geoda that referenced this issue Nov 28, 2018
lixun910 added a commit to lixun910/geoda that referenced this issue Nov 28, 2018
@lixun910 lixun910 self-assigned this Nov 29, 2018
@lixun910
Member Author

v179

@Ashitacarl

GeoDa 1.12.1.181 (macOS Mojave, 10.14.1). Dec. 10 build.

Enhancement verified:

  1. non-geographical distance allowed;
  2. Manhattan distance added.

1751

@lixun910
Member Author

@jkoschinsky I would leave it open, since I haven't gotten feedback from Luc yet, and I am not sure whether @Ashitacarl verified only the UI part or did a thorough check of all the possible combinations and outputs of social weights creation.

@lixun910 lixun910 reopened this Dec 19, 2018
@jkoschinsky
Collaborator

sounds good, @lixun910.
@Ashitacarl: could you also test the functionality beyond the UI?
@lanselin: does this look like what you wanted?

@Ashitacarl

@lixun910 and @jkoschinsky Sure, I will test the functionality beyond the UI. Is there a routine or list to follow to check all the possible combinations using weights (the GeoDa workbook?)?

@jkoschinsky
Collaborator

@Ashitacarl -- great, thanks. There's no script to follow since this is new functionality, so just try to set it up in ways that increase the likelihood of breaking it (e.g. use grouped variables, a table with missing values, and any other data characteristics that aren't typical of sample/textbook data).

@Ashitacarl

Ashitacarl commented Dec 19, 2018

GeoDa 1.12.1.181 (macOS Mojave, 10.14.1). Dec. 17 build.

@lixun910
Tested social distance weights. A few observations:

  1. When creating the weights, GeoDa requires a distance metric to be specified: either Manhattan or Euclidean distance. I guess they are just "labels" or units in this case. Do we need another "distance metric" name for this social-distance purpose, e.g. custom distance or social distance?

1751-1

  2. In the weights manager dialog, the "distance var" is not specified. Shall we include it? "Type" is specified as "threshold". I am not sure this name is clear enough.

  3. Also, in the same image, the min neighbor is 0, which should not be the case since I used the default distance bandwidth/threshold (which guarantees at least one neighbor). Relatedly, when I changed the bandwidth downward, I did not receive any warning message about creating isolates.

1751-2

  4. In the Multivariate Local Geary's C Cluster Map, one cannot select multiple time points from a single group. Could this be a problem? The same is also found in Multivariate Local Join Count.

1751-3

No problems with grouped variables or a table with missing values. No problems creating a project file. No problems with Moran scatter plots, Cluster Maps, or Cluster Analysis. I tested using datasets from external sources.

@lanselin
Collaborator

lanselin commented Dec 19, 2018 via email

lixun910 added a commit to lixun910/geoda that referenced this issue Jan 14, 2019
GeoDaCenter#1751 (socio-weights creation): when changing the bandwidth downward causes the min neighbor count to be 0, GeoDa should raise a warning message about creating isolates.
lixun910 added a commit to lixun910/geoda that referenced this issue Jan 14, 2019
Social-weights creation: In the weights manager dialogue, the "distance var" is not displayed.
lixun910 added a commit to lixun910/geoda that referenced this issue Jan 15, 2019
In Multivariate Local Geary's C Cluster Map, one cannot select multiple time points from a single group.
lixun910 added a commit to lixun910/geoda that referenced this issue Jan 15, 2019
apply the same fix for GeoDaCenter#1751 to Multivariate Local Join Count
@lixun910
Member Author

@Ashitacarl can you help verify it? Thanks!

@Ashitacarl

GeoDa 1.12.1.189 (macOS Mojave, 10.14.1). Jan. 15 build.

Datasets: Guerry sample and external.

One problem remains: when there are outliers in the distance var, GeoDa produces isolates. Also, missing values are problematic here. I don't know the underlying algorithm, but observations with missing values seem to have a lot of neighbors. See the figures and explanations below.

All other fixes verified.

1791-3

In the first figure, the selected polygons (except for one observation) have missing values in the distance var. They seem to have a lot of neighbors (see column NUM_NBRS). The first selected polygon, as an outlier, has 0 neighbors. I used the default bandwidth, and no message popped up to warn me about potentially creating isolates (and it actually creates isolates). The figure below is another example of creating isolates without a warning (default bandwidth used). The second example uses the Guerry sample data and the 'Area' variable as the distance var.

1791-4

@lanselin
Collaborator

I need to check more carefully, but this could have something to do with the scale of the variable. Since all the distances are based on squared values, there may be overflow issues if the original values are large to begin with.

One option would be to standardize all the variables before computing the distances, similar to what is done in the cluster modules.
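
As a rough illustration of the standardization idea, the Python sketch below converts a variable to z-scores before any distance is computed. It assumes the sample standard deviation; which transformation the cluster modules actually apply is not stated in this thread, and the function name `standardize` is hypothetical.

```python
import statistics
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Return z-scores: (value - mean) / sample standard deviation.

    Assumption for illustration: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("variable is constant; cannot standardize")
    return [(v - mean) / sd for v in values]


# A variable on a large scale (e.g. an area in the thousands) ends up on the
# same scale as any other standardized variable.
area = [1000.0, 2000.0, 3000.0]
print(standardize(area))
```

After this step every variable has mean 0 and unit variance, so squared differences stay small and no single large-scale variable dominates the distance.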

@lixun910
Member Author

@Ashitacarl Thanks! I was able to replicate this and found the bug: when "Manhattan distance" is selected, the bandwidth is computed with a square root, which is not correct; the square root should be applied only when "Euclidean distance" is selected. This will be fixed in the next build, V191.

@lanselin Yes. In the current implementation, the "Transformation:" options, which are the same as in the cluster methods, have been added to allow the user to scale the variables.
screen shot 2019-01-16 at 10 21 26 am
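
The square-root bug described in this comment can be illustrated with a small Python sketch. It assumes the default bandwidth is the largest nearest-neighbor distance across observations (the smallest threshold that leaves every observation with at least one neighbor); the names `pair_distance` and `default_bandwidth` are hypothetical and this is not GeoDa's code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def pair_distance(a: Point, b: Point, metric: str) -> float:
    if metric == "euclidean":
        # The square root belongs to the Euclidean metric only.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "manhattan":
        # Sum of absolute differences; taking a square root of this sum
        # would shrink the bandwidth and leave distant observations isolated.
        return float(sum(abs(x - y) for x, y in zip(a, b)))
    raise ValueError(f"unknown metric: {metric}")


def default_bandwidth(points: Sequence[Point], metric: str = "euclidean") -> float:
    """Largest nearest-neighbor distance (assumed default threshold)."""
    if len(points) < 2:
        raise ValueError("need at least two observations")
    nearest = [
        min(pair_distance(p, q, metric) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return max(nearest)


points = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
print(default_bandwidth(points, "euclidean"))
print(default_bandwidth(points, "manhattan"))
```

In this example the Manhattan bandwidth is 7. If a square root were wrongly applied, it would drop to about 2.65, which is smaller than the distance from the first point to its nearest neighbor, so that point would become an isolate even at the "default" bandwidth.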

@lixun910
Member Author

@Ashitacarl Empty values are not handled in the current implementation. This is not a problem when creating spatial weights, since all geometries are valid, but it is a case unique to socio-weights creation.

I think we can simply treat observations with empty values as "islands" when creating socio-weights (this will be in V191). Let me know if I am wrong, @lanselin. Thanks!
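
A minimal Python sketch of the proposed behavior, assuming a single distance variable and `None` as the marker for an empty value: observations with empty values are skipped during neighbor search, so they end up with no neighbors and can be reported in a warning. The function names are hypothetical and this is not the V191 code.

```python
from typing import Dict, List, Optional, Sequence


def threshold_neighbors(values: Sequence[Optional[float]],
                        bandwidth: float) -> Dict[int, List[int]]:
    """Distance-band neighbors on one variable; empty values get no neighbors."""
    n = len(values)
    neighbors: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        vi = values[i]
        if vi is None:
            continue  # an empty value cannot be compared, so it stays an island
        for j in range(i + 1, n):
            vj = values[j]
            if vj is None:
                continue
            if abs(vi - vj) <= bandwidth:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def find_islands(neighbors: Dict[int, List[int]]) -> List[int]:
    """Indices of observations without any neighbor."""
    return [i for i, nbrs in neighbors.items() if not nbrs]


values = [1.0, 1.5, None, 10.0]
weights = threshold_neighbors(values, bandwidth=1.0)
islands = find_islands(weights)
if islands:
    print(f"Warning: {len(islands)} island(s) detected: {islands}")
```

Here observation 2 is an island because its value is empty, and observation 3 is an island because it is an outlier beyond the bandwidth; both cases are surfaced by the same check.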

lixun910 added a commit to lixun910/geoda that referenced this issue Jan 17, 2019
GeoDaCenter#1751 (socio-weights creation): when "Manhattan distance" is selected, the bandwidth is computed with a square root, which is not correct; it should be applied only when "Euclidean distance" is selected
lixun910 added a commit to lixun910/geoda that referenced this issue Jan 17, 2019
treat observations with empty values as "islands" when creating socio-weights

raise warning if islands detected
@Ashitacarl

GeoDa 1.12.1.191 (macOS Mojave, 10.14.1). Jan. 16 build.

Fix verified.

@lixun910
In contrast to geometry-based weights, I think the social-weighting variable is more likely to have similar values among observations. This potentially creates more problems with KNN. Observations with identical values are very likely to be divided into different groups. See the example below with KNN=4. In this dataset, the 15 observations with value 0 in the weighting variable are divided into 2 groups. Maybe make them into a single category with 14 neighbors?

1751-2-1
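
The tie problem can be sketched in Python for a single weighting variable. The example below shows two possible policies: returning exactly k neighbors with ties broken by observation index, or keeping every observation tied with the k-th nearest one. Both are illustrative assumptions rather than GeoDa's algorithm, and `knn_neighbors` and `include_ties` are hypothetical names.

```python
from typing import Dict, List, Sequence


def knn_neighbors(values: Sequence[float], k: int,
                  include_ties: bool = False) -> Dict[int, List[int]]:
    """k-nearest neighbors on one variable, with an explicit tie policy."""
    n = len(values)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    result: Dict[int, List[int]] = {}
    for i in range(n):
        # Sort by (distance, index) so ties are resolved deterministically.
        ranked = sorted((abs(values[i] - values[j]), j)
                        for j in range(n) if j != i)
        chosen = ranked[:k]
        if include_ties:
            # Keep everything as close as the k-th neighbor; this can
            # return more than k neighbors when values repeat.
            cutoff = chosen[-1][0]
            chosen = [pair for pair in ranked if pair[0] <= cutoff]
        result[i] = [j for _, j in chosen]
    return result


values = [0.0, 0.0, 0.0, 0.0, 5.0]
print(knn_neighbors(values, k=2))
print(knn_neighbors(values, k=2, include_ties=True))
```

With `include_ties=True`, the four observations that share the value 0 all become neighbors of one another, which corresponds to the "single category" suggestion; with the default, each observation gets exactly k neighbors and the choice among tied candidates is arbitrary but reproducible.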

lixun910 added a commit to lixun910/geoda that referenced this issue Mar 19, 2019
Fix a bug: if there are many similar values among obs, KNN weights will generate neighbors with more than K neighbors.
@lixun910 lixun910 mentioned this issue May 16, 2019
@bsalas11

GeoDa Windows 1.14

Fix verified.

No errors detected in functionality. Warning appears to indicate the presence of isolates.

Capture

One potential usability improvement: when selecting multiple distance weight variables, it would be nice to have the list display more than three variables at a time. When the list of variables is lengthy, it becomes tedious to scroll up and down while having to remember the variables already selected. Perhaps turn it into a drop-down list or allow the user to increase the size of the box to display more than three variables?

variables

@lixun910
Member Author

lixun910 commented Aug 2, 2019

Fix verified. The UI enhancement suggested by @bsalas11 is not supported by wxWidgets. I will keep an eye out for a possible enhancement.

@lixun910 lixun910 closed this as completed Aug 2, 2019