Exploring ideas for new shoreline extraction routines for application on label outputs of 4-class segmentation models #168

Closed
dbuscombe-usgs opened this issue Aug 3, 2023 · 10 comments
Labels: Research (investigate if something is possible, experiment); Testing (test case scenarios, automated testing, etc.); V2 (for version 2 of coastseg)

Comments

@dbuscombe-usgs
Member

I wanted to see if I could come up with a solution for shoreline extraction that starts with the 4-class label output as the basis. I put together a script to test this workflow, using an RGB image, the segformer RGB model, and cloud and reference shoreline masks that I made in GIMP for simplicity. The files are here:

inputs_for_shoreline_extarction.zip

This is what they look like
inputs

First, I propose we filter the label image with the reference shoreline mask and the cloud mask. Suppose the 4-class greyscale label is the variable grey, the binary cloud mask is cloud_mask, and the binary shoreline buffer mask is ref_shoreline_mask. Then:

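import numpy as np
# cast to float so that masked pixels can be set to NaN
grey = grey.astype(np.float16)
# discard every pixel where either mask is zero
grey[cloud_mask==0] = np.nan
grey[ref_shoreline_mask==0] = np.nan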

From here, I have devised 3 new methods for shoreline extraction, each of which builds more complexity on top of the last

method 1: grab the land/water contour directly from this masked greyscale image

import matplotlib.pyplot as plt
import numpy as np

def get_shoreline_points(grey):
    ## use matplotlib's contour function to get the contour of the value 1,
    ## which is the water/sand or whitewater/sand interface, depending on the presence of whitewater
    cs = plt.contour(grey,[1])
    plt.close()
    ## the above could be modified instead to get the sand contour, value = 2, e.g.
    ## cs = plt.contour(grey,[2])

    # the shoreline may or may not be segmented, so this list may have one or more entries
    # (cs.allsegs is used here because cs.collections is deprecated since Matplotlib 3.8)
    shoreline_segments = cs.allsegs[0]

    # no contour found: return empty arrays rather than failing in np.hstack
    if len(shoreline_segments) == 0:
        return np.array([]), np.array([])

    ## loop through each segment (an (N, 2) array of vertices) and get the X,Y coordinate values
    X = []; Y = []
    for v in shoreline_segments:
        X.append(v[:,0])
        Y.append(v[:,1])

    # concatenate the per-segment arrays into single numpy arrays
    X = np.hstack(X)
    Y = np.hstack(Y)
    return X, Y

X, Y = get_shoreline_points(grey)

This is what we get:
method1
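To make the level-1 contour concrete, here is a minimal sketch on a synthetic label image (not one of the attached files). It assumes the class encoding implied above (0 = water, 1 = whitewater, 2 = sand) and a scene with no whitewater, so the contour of value 1 should fall halfway between the water and sand pixels. It reads the vertices from `cs.allsegs`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# synthetic label (assumed encoding): columns 0-4 are water (0),
# columns 5-9 are sand (2), and there is no whitewater (1)
label = np.zeros((6, 10))
label[:, 5:] = 2

cs = plt.contour(label, [1])
plt.close()

# allsegs[0] holds one (N, 2) array of (x, y) vertices per segment of the first level
segments = cs.allsegs[0]
X = np.hstack([seg[:, 0] for seg in segments])
Y = np.hstack([seg[:, 1] for seg in segments])

print(np.unique(X))  # sub-pixel column position of the water/sand interface
```

The contour is interpolated, so the coordinates are sub-pixel positions in image space rather than integer pixel indices.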

method 2: same as method 1, but filter out detected shoreline points that are very close to the boundary of the shoreline mask

We could achieve this using a distance function

from scipy.ndimage import distance_transform_cdt
distances = distance_transform_cdt(ref_shoreline_mask, metric='chessboard', return_distances=True, return_indices=False)

Now we have the distance matrix: a matrix of pixel values that encode the distance to the boundary of the reference shoreline mask. We can use it to further filter the label image based on a threshold distance in pixels

thres = 10
grey[distances<thres] = np.nan

## now when we get the shoreline, hopefully many shoreline segments very near the edge will disappear
X, Y = get_shoreline_points(grey)

Nice:

method2
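As a small worked example of the distance filter, the sketch below uses a hypothetical 7x7 buffer mask (a 5x5 block of ones surrounded by zeros) and a threshold of 2 pixels. The mask, the constant-valued label and the threshold are made up for illustration; only the `distance_transform_cdt` call mirrors the snippet above.

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt

# hypothetical buffer mask: 1 inside the reference shoreline buffer, 0 outside
ref_shoreline_mask = np.zeros((7, 7), dtype=int)
ref_shoreline_mask[1:6, 1:6] = 1

# chessboard distance from every nonzero pixel to the nearest zero pixel
distances = distance_transform_cdt(ref_shoreline_mask, metric='chessboard')
print(distances)

# stand-in label image, masked the same way as in the text
grey = np.ones((7, 7), dtype=np.float32)
grey[ref_shoreline_mask == 0] = np.nan

# drop everything closer than `thres` pixels to the buffer boundary
thres = 2
grey[distances < thres] = np.nan

kept = ~np.isnan(grey)
print(kept.sum())  # number of pixels that survive the filter
```

Only the inner 3x3 block survives here, which is the behaviour we want: anything hugging the edge of the buffer is discarded before contouring.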

method 3: same as method 1 or 2, but use dynamic boundary tracing rather than contouring

Dynamic boundary tracing is a way to find the 'least-cost' path through an image, so it is good for finding interfaces

Here are the new libraries and functions we need


import numpy as np

# =========================================================
def dpboundary(imu):
   '''
   dynamic boundary tracing in an image
   (translated from matlab: CMP Vision Algorithms http://visionbook.felk.cvut.cz)
   returns an (m,1) array holding the column index of the path in each row
   '''
   # get the image dimensions
   m,n = np.shape(imu)
   # preallocate two arrays for collecting cost and position
   c = np.zeros((m,n))
   p = np.zeros((m,n))
   # initialize the cost with the first row
   c[0,:] = imu[0,:]
   # cycle through all rows
   for i in range(1,m):
      # accumulated cost of the previous row
      c0 = c[i-1,:]
      # vectorized computation of the least costly step into each pixel:
      # from the same column (c0), the column to the right (tmp1), or the column to the left (tmp2)
      tmp1 = np.squeeze(ascol(np.hstack((c0[1:],c0[-1]))))
      tmp2 = np.squeeze(ascol(np.hstack((c0[0], c0[0:len(c0)-1]))))
      # np.tile is used in place of numpy.matlib.repmat
      d = np.tile( imu[i,:], (3, 1) ) + np.vstack( (c0,tmp1,tmp2) )
      del tmp1, tmp2
      # record where the minimum cost is
      p[i,:] =  np.argmin(d,axis=0)
      # record what the minimum cost is
      c[i,:] =  np.min(d,axis=0)

   # recode the step: 0 = same column, 2 = from the right, 3 = from the left
   p[p==0] = -1
   p = p+1
   # this next loop allocates pixel coordinates to the minimum cost path
   x = np.zeros((m,1))
   #cost = np.min(c[-1,:])
   xpos = np.argmin( c[-1,:] )
   for i in reversed(range(1,m)):
      x[i] = xpos
      # bounds are 0-based here (the matlab original was 1-based)
      if p[i,xpos]==2 and xpos<n-1:
         xpos = xpos+1
      elif p[i,xpos]==3 and xpos>0:
         xpos = xpos-1
   x[0] = xpos
   return x

# =========================================================
def ascol( arr ):
   '''
   reshapes row matrix to be a column matrix (N,1).
   '''
   if len( arr.shape ) == 1: arr = arr.reshape( ( arr.shape[0], 1 ) )
   return arr

Now we make a binary image where shoreline points are -1 and everything else is zero, and apply the dpboundary function to it:

dp_input = np.zeros_like(grey)
for x,y in zip(X,Y):
   dp_input[int(y), int(x)] = -1

# call dpboundary on this image
shoreline = dpboundary(dp_input)

This one extrapolates to the edge of the image ... Nice!

method3
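To show why the least-cost path bridges gaps, here is a compact sketch of the same dynamic-programming idea. It is a simplified re-implementation written for this example, not the `dpboundary` translation above: the function name `least_cost_path` and the synthetic input are assumptions, and it expects finite cost values. The path runs from the top row to the bottom row and may shift by at most one column per row.

```python
import numpy as np

def least_cost_path(cost):
    """Column index, per row, of a top-to-bottom path with minimal summed cost."""
    m, n = cost.shape
    acc = np.zeros((m, n))               # accumulated cost of the best path to each pixel
    step = np.zeros((m, n), dtype=int)   # column offset (-1, 0, +1) to the predecessor
    acc[0] = cost[0]
    offsets = np.array([0, -1, 1])
    for i in range(1, m):
        prev = acc[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))   # predecessor in column j-1
        right = np.concatenate((prev[1:], [np.inf]))   # predecessor in column j+1
        stacked = np.vstack((prev, left, right))
        choice = np.argmin(stacked, axis=0)            # ties favour staying in the column
        acc[i] = cost[i] + stacked[choice, np.arange(n)]
        step[i] = offsets[choice]
    # backtrack from the cheapest pixel in the last row
    cols = np.zeros(m, dtype=int)
    cols[-1] = np.argmin(acc[-1])
    for i in range(m - 1, 0, -1):
        cols[i - 1] = cols[i] + step[i, cols[i]]
    return cols

# synthetic input: shoreline pixels (-1) in column 5, with a gap in rows 3-5
cost = np.zeros((10, 12))
cost[[0, 1, 2, 6, 7, 8, 9], 5] = -1

path = least_cost_path(cost)
print(path)
```

The path stays in column 5 across the gap because leaving that column would forfeit the -1 rewards on either side of it. That is the "extrapolation" behaviour, and it also hints at why the approach only suits interfaces that cross the image from top to bottom.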

We can discuss all this tomorrow, @2320sharon. We need to test on some more images ...

code here: shoreline_detect.zip

@dbuscombe-usgs dbuscombe-usgs added Testing Test Case Scenarios, automated testing, etc. Research Investigate if something is possible, experiment labels Aug 3, 2023
@dbuscombe-usgs dbuscombe-usgs changed the title Exploring ideas more new shoreline extraction routines for label outputs of 4-class segmentation models Exploring ideas for new shoreline extraction routines for application on label outputs of 4-class segmentation models Aug 3, 2023
@dbuscombe-usgs
Member Author

Upon reflection, I guess it's just one method really, with "method 2" being just an additional logic-based filter, and "method 3" being a way to use those found shorelines to trace a "shoreline path" through the entire image

@dbuscombe-usgs
Member Author

I tidied up my script, and added two more examples. It's simple and fast and seems quite robust. I'm quite pleased with it.

image2_3
image3_3

all code and data here:

example_shoreline_detection_workflow.zip

@dbuscombe-usgs
Member Author

I think the advantages are:

  1. it works purely in the image coordinate space, and exploits the fact that in coastseg you generate the cloud and shoreline buffers as rasters
  2. it is therefore very fast, because it doesn't have to use vectors, convert between types or structures, or convert or reproject coordinates (nightmare!)
  3. applying the masks to the 4-class label eliminates the need to binarize the image and, by extension, the need to deal with booleans and binary logic, which can be tricky for both us and the computer to wrap our heads around. Instead, we can find the line separating classes 0, 1 and 2. Using matplotlib's contour function, you can ask for a specific contour, so I ask for the contour of the value 1. It finds the edge of the whitewater, and if whitewater is missing, it finds the edge of the water instead.
  4. the distance filtering is only helpful in situations where shoreline segments are found within a specified distance of the reference shoreline buffer boundary, so we may or may not want to implement that
  5. the dynamic boundary tracing technique allows us to get a continuous shoreline from the shoreline extracted from the label. Like extrapolation. Genius!

@2320sharon
Collaborator

This is seriously cool and this should make extracting shorelines much faster. Would we still want to offer the same settings as coastsat for extracting shorelines with our model or do you think we may have to adapt them a bit? I believe we should adapt the settings to meet our needs when extracting shorelines instead of adhering to what coastsat has already built.

@dbuscombe-usgs
Member Author

dbuscombe-usgs commented Aug 4, 2023

I tested a few more images, this time a few more tricky examples and the results were interesting

These ones worked great. The common theme is coastlines oriented from top to bottom ...

image4_3

image9_3

image6_3

However, at this site, the dynamic boundary tracing caused a problem, extrapolating the shoreline into an impossible area

image7_2
image7_3

I think I know the solution to this problem and will work on it and update here later.

Finally, the dynamic boundary tracing creates issues for non-straight shorelines, as we predicted:

image8_2
image8_3

image5_2
image5_3

It feels so good to get to work on an image processing task!

@dbuscombe-usgs
Member Author

In the last two examples, it failed because the dynamic boundary tracing (DBT) is supposed to only work on straight(ish) interfaces, so I will try to think of a switch I can program in that determines when the DBT is/is not appropriate to apply ...

@dbuscombe-usgs
Member Author

Let's examine why the shoreline failed for this image:
image7

If we examine the input to the DBT algorithm (blue/yellow mask) and the outputs, we realize how powerful this algorithm is at cutting through noise (an alternative we could explore here is the RANSAC algorithm)

tmp

I reasoned that the DBT algo needs more signal to work with because of the large gaps. I therefore dilated the input and that improved things a lot
tmp
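A minimal sketch of that dilation step, assuming the DBT input is the -1/0 image built earlier. The use of `scipy.ndimage.binary_dilation` and the number of iterations are assumptions for illustration; the script attached to this issue may dilate differently.

```python
import numpy as np
from scipy.ndimage import binary_dilation

# toy DBT input: a single shoreline pixel (-1) on a zero background
dp_input = np.zeros((5, 5))
dp_input[2, 2] = -1

# grow the shoreline pixels so the tracer has more signal to follow across gaps
shoreline_pixels = dp_input == -1
dilated = binary_dilation(shoreline_pixels, iterations=1)  # hypothetical amount

# convert back to the -1 / 0 convention expected by the tracer
dp_input_dilated = np.where(dilated, -1.0, 0.0)
print(dp_input_dilated)
```

With the default structuring element, one iteration turns each shoreline pixel into a plus-shaped patch of 5 pixels; more iterations widen the band further at the cost of positional precision.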

The next problem is trickier ... there are too many shorelines so DBT doesn't apply
tmp

... we need a way to tell how complex the shoreline is. One metric that comes to mind is the "standard distance" of the shoreline locations. This is the average distance away from the center of the point cloud. It does a reasonable job of separating the long, straight coastline from the complex coastline. In each of the plots below, the standard distance is shown in the title, and the most complex coast has by far the largest standard distance

image2_stdist1
image4_stdist1
image5_stdist1
image8_stdist1
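A sketch of how the standard distance could be computed from the X, Y shoreline points and used as a switch for DBT. It assumes the common definition (root-mean-square distance of the points from their mean center), which may differ slightly from what the attached script computes. The threshold value and the helper name `dbt_is_appropriate` are hypothetical; a usable threshold would have to be tuned on real images.

```python
import numpy as np

def standard_distance(X, Y):
    """Root-mean-square distance of the points from their mean center."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    dx = X - X.mean()
    dy = Y - Y.mean()
    return float(np.sqrt(np.mean(dx**2 + dy**2)))

# hypothetical threshold in pixels; not a value taken from this issue
STDIST_THRESHOLD = 50.0

def dbt_is_appropriate(X, Y, threshold=STDIST_THRESHOLD):
    """Apply DBT only when the shoreline point cloud is compact enough."""
    return standard_distance(X, Y) < threshold

# four corners of a 2x2 square: every point is sqrt(2) from the center (1, 1)
square = standard_distance([0, 2, 0, 2], [0, 0, 2, 2])
print(square)
```

Because the metric is in pixels, any threshold depends on image size and resolution, so normalizing by the image dimensions may be worth considering.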

@dbuscombe-usgs
Member Author

If I implement all of the above ideas, here are the new outputs for all 9 images. Not bad!

image1_3
image2_3
image3_3
image4_3
image5_3
image6_3
image7_3
image8_3
image9_3

My files:
script_and_masks.zip
images.zip
figs.zip

@2320sharon
Collaborator

Thanks for explaining your thought process here. I gotta say you got some massive improvements with those techniques.

Do you recommend any books, websites, or courses for learning more about image processing? I find it quite interesting and knowing more about it seems to be the key to solving these kinds of problems

@dbuscombe-usgs
Member Author

These routines should only be implemented for the 'zoo method', i.e. when using the segformer models.

One nice thing about the approach is that it filters imagery based on label outputs, not image inputs. My logic is that it is probably a lot easier to identify bad images from model outputs (which are low-dimensional) than bad inputs (which are high dimensional). I'm making the case that it is harder to identify a bad image than a bad model output. The only real downside is the computational expense of pointing the model at each image, rather than only the 'good' ones.

However, I feel like the ideal approach is a two-pronged approach:

  1. a non-aggressive filter on the input images (like the black pixel filter we already have, with a permissive threshold)
  2. an aggressive filter on the model output label images
