Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter issue when using polygonGate() #86

Closed
mehrnoushmalek opened this issue Sep 19, 2017 · 28 comments
Closed

filter issue when using polygonGate() #86

mehrnoushmalek opened this issue Sep 19, 2017 · 28 comments

Comments

@mehrnoushmalek
Copy link

Hello,

I created a gate and was using the polygonGate function in flowCore along with add() in flowWorkspace. When I compared my cell count before and after this, I realised that there's 1 cell difference.

I looked at the c++ code for filter in "inPolygon_c" function and it seems that when there's a point that lies on the vertex with the highest y-value, it gets excluded.

data[i+nrd] >= min(p1y, p2y) && data[i+nrd] < max(p1y, p2y) 
# when data[i+nrd]== max(vertices[j+nrv]) j=1,2...

It could be something else that caused the difference?
but this is the most probable thing that happened.

It only needs to cover the case when cells are on the vertex with highest y-value.

Best,

@mikejiang
Copy link
Member

mikejiang commented Sep 19, 2017

It is not clear to me which count you were comparing to?flowJo?

@mehrnoushmalek
Copy link
Author

flowJo, flowDensity, or any other method. If there's a cell on the vertex of the polygone with highest y-value, it gets excluded, when the gate is applied.

@mikejiang
Copy link
Member

Adding or excluding the events falling on the edge of the gates do not significantly affect the population statistics (especially percentage). When comparing the counts from different software, we typically don't expect them to be exactly same (even if we add these edge events). It should be considered to be reproducible within a certain tolerance. So far we haven't seen any issue related to this regarding to the data analysis.

@gfinak
Copy link
Member

gfinak commented Sep 19, 2017

I think @mehrnoushmalek means that the cell count is off by one, suggesting that other software includes the cells that lie on the boundary, whereas we my be excluding them?

mikejiang pushed a commit to RGLab/cytolib that referenced this issue Sep 19, 2017
@mikejiang
Copy link
Member

I've made the change and pull the trunk from cytolib.(You need trunk from flowWorkspace as well). I did not observe any difference in my test cases with this change. Not sure if this will get that cell back.

@mehrnoushmalek
Copy link
Author

mehrnoushmalek commented Sep 19, 2017

polygonmissingmaxy
Here's an example with 1-cell difference, after polygoneGate()
I also made an extreme case, where the difference can be more than one:
polygonmissingmaxy_2

@mikejiang
Copy link
Member

Let me know if this correct your counts once you update the software

@mehrnoushmalek
Copy link
Author

Also the filter in flowCore?

mikejiang pushed a commit that referenced this issue Sep 19, 2017
@mikejiang
Copy link
Member

done

@mehrnoushmalek
Copy link
Author

Thanks @mikejiang, that one is solved, but you added >=max(p1y,p2y) , so whatever that has a y-value equal to p2y (when p2y doesn't have the highest y-value) would be counted twice, but then you have
res[i] = counter % 2 > 0;, so those cells would be excluded.
Here's the example:
polygonmissingequalyvals_3

@mikejiang
Copy link
Member

This is the inherent known problem with ray cast algorithm. I don't think I have solution to it.

@mehrnoushmalek
Copy link
Author

mehrnoushmalek commented Sep 19, 2017

The old code would work, if you just add a condition for when
data[i+nrd]== max(vertices[nrv:j+nrv])
wouldn't that work?
This is using flowCore version 1.42.3
polygonmissing_flowcore1 42 3

mikejiang pushed a commit that referenced this issue Oct 6, 2017
@gfinak
Copy link
Member

gfinak commented Aug 17, 2018

Can this be closed?

@mehrnoushmalek
Copy link
Author

I don't know if it got solved, but as far as I remember, the function that was used inside the polygoneGate had problem identifying the points on the top vertex.
So, yes, go ahead, and close it, we are conscious about the differences in counts.

@gfinak
Copy link
Member

gfinak commented Aug 17, 2018

@mehrnoushmalek
If you have a solution (as it seems above) would you be willing to submit a PR?

@mehrnoushmalek
Copy link
Author

I'm actually using the point.in.polygon (sp), and didn't have that issue. That's also what I'm using in flowDensity.

@gfinak
Copy link
Member

gfinak commented Aug 17, 2018

@mikejiang shall we update the code to use that method if it doesn't have this issue?

@gfinak
Copy link
Member

gfinak commented Aug 17, 2018

Reading through the thread above, I think this is resolved. @mikejiang can you confirm?

@mikejiang
Copy link
Member

@mehrnoushmalek , I am having a hard time to understand your statement

that one is solved, but you added >=max(p1y,p2y) , so whatever that has a y-value equal to p2y (when p2y doesn't have the highest y-value) would be counted twice

So which points are currently excluded in your example? Still edge points or the one inside of polygon?(It would be helpful if you could provide the example gates that failed the current algorithm)

@mehrnoushmalek
Copy link
Author

Hello Mike,
this is from almost a year ago, and it was just a suggestion to modify the algorithm. I don't exactly recall that being solved. I remember we talked about it, and you mentioned that it's the problem of ra cast algorithm.
I'm really busy at the moment, so go ahead and close this issue, I can revisit later, if I face this issue again.

@gfinak gfinak closed this as completed Aug 17, 2018
@mikejiang
Copy link
Member

@mehrnoushmalek , @gfinak I was able to simulate your use case and verified that flowCore is accurate (i.e. it does not exclude any edge points). Below is the reproducible code.

vertices <- matrix(c(300,300
   	                     ,600,300
   	                     ,800, 200
   	                     ,650,100
   	                     ,310, 100
   	                     , 100, 200
   	), byrow = TRUE,ncol=2)
   	colnames(vertices) <- c("FSC-H", "SSC-H")
   	p   = polygonGate(vertices)
   	fr <- flowFrame(vertices)
   	pts <- fr@exprs[1,]
   	pts[2] <- pts[2] - 20
   	fr@exprs <- rbind(fr@exprs, pts)
   	pts <- fr@exprs[2,]
   	pts[2] <- pts[2] - 20
   	fr@exprs <- rbind(fr@exprs, pts)
   	x   = fr %in% p
   	expect_equal(sum(x),8)

plot(rbind(p@boundaries, fr@exprs), type = "n")
   	polygon(p@boundaries)
   	points(fr@exprs, col = "red")
   	title(sum(x))

image

mikejiang pushed a commit that referenced this issue Aug 17, 2018
@gfinak
Copy link
Member

gfinak commented Aug 18, 2018

Super! Thanks Mike.

@mikejiang
Copy link
Member

Looking at the code below (https://github.com/RGLab/flowCore/blob/trunk/src/inPolygon.cpp#L58-L73),

 if(data[i+nrd] >= min(p1y, p2y) && data[i+nrd] <= max(p1y, p2y) &&
         data[i] <= max(p1x, p2x))
      {
  	      xinters = (data[i+nrd]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x;
  	      /*if intersection x coordinate == point x coordinate it lies on the
  	      boundary of the polygon, which means "in"*/
        	if(xinters==data[i])
      	  {
        	  counter=1;
        	  break;
        	}
        	/*count how many vertices are passed by the ray*/
        	if (xinters > data[i])
      	  {
        	  counter++;
        	}
      }

At first glance, this line will cause a bug

  	      xinters = (data[i+nrd]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x;

Because when p2y == p1y, it leads to undefined divide-by-zero behavior, which is NA in my system. And it will fail the following two xinters comparisons (generates false for both condition tests).
However, the further investigation 4 different relevant use cases can prove this logic actually handle them correctly.

library(flowCore)
vertices <- matrix(c(300,300
                     ,600,300
                     ,800, 200
                     ,650,100
                     ,310, 100
                     , 100, 200
), byrow = TRUE,ncol=2)
colnames(vertices) <- c("FSC-H", "SSC-H")
p   = polygonGate(vertices)
pt2 <- vertices[1,]
pt1 <- pt3 <- pt2
pt1[1] <- pt2[1] - 50
pt3[1] <- pt2[1] + 50
pt4 <- vertices[2,]
fr <- flowFrame(rbind(pt1,pt2,pt3,pt4))

x   = fr %in% p
plot(rbind(p@boundaries, fr@exprs), type = "n")
polygon(p@boundaries)
points(fr@exprs, col = sapply(x, function(i)ifelse(i, "blue", "red")))
title(sum(x))

image
For example, the first dot on left, its intersections with polygon will be counted twice (instead of 3 times) since the horizontal edge will not be taken into account due to the above reason. And according to ray (horizontal toward right direction, in this implementation) cast algorithm, this dot is out side of polygon.
For the second dot, it will be counted as 1 (regardless its two intersections count) since it is the special case on the edge. Thus it is odd number and considered to be in.
For the third dot, it won't be computed as on the edge (even in fact it is) because the horizontal edge will fail the test for the above reason. And thus it will only intersect with the right edge once, so it is in.
The fourth dot, again falls on the (non-horizontal) edge and automatically in.

So, still, I didn't find the use case that will fail this particular version of ray cast algorithm. (@mehrnoushmalek , feel free to post one whenever you have time to revisit this).

That said, the undefined behavior is probably not a good idea, I will try to have a zero denominator check to eliminate this potential problem.

@mikejiang mikejiang reopened this Aug 20, 2018
mikejiang pushed a commit to RGLab/cytolib that referenced this issue Aug 20, 2018
mikejiang pushed a commit that referenced this issue Aug 20, 2018
@mikejiang
Copy link
Member

@mehrnoushmalek , I just reproduced the issue you reported, which was introduced by 515d0db that was trying to include the edge case where cells are on the polygon vertex/edge with highest y-value.
image
In this example (the same as you described previously), the cell on the far left is counted as in because the horizontal ray crosses the 3 edges, while the cell in the middle is considered as out since the ray crosses 2 edges (green). It happens when the cell has the exact same y-value as one of the vertices (e.g. vertex in blue),which causes the same ray crossing counted twice.
So you are right, instead of changing < to <= , we want to preserve the original comparisons code data[i+nrd] < max(p1y, p2y) and add the additional logic branch for edge cases where data_y == max(vertices.y). This adds extra computation times especially when nVertices is large, but it seems to be only correct solution to avoid both errors.

@mehrnoushmalek
Copy link
Author

@mikejiang Thanks for working on this. That was the exact problem, points with y-values same as one the vertices. I will install the latest version, and test it.

mikejiang pushed a commit that referenced this issue May 9, 2019
mikejiang pushed a commit to RGLab/cytolib that referenced this issue May 9, 2019
@mikejiang
Copy link
Member

The fix is in trunk. I will merge over to bioc release/devel tomorrow. Here are the tests for the edge cases.
You will need to reinstall flowWorkspace&CytoML as well
image

@mehrnoushmalek
Copy link
Author

Perfect. Will do that tomorrow as well, Thanks Mike.

@itskathylam
Copy link
Contributor

I was having a similar gating issue. Thank you very much for this fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants