Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-1771: Update REST endpoints to support eventually consistent UI updates #1190

Closed
wants to merge 6 commits into from

Conversation

merrimanr
Copy link
Contributor

Contributor Comments

This PR updates several REST endpoints to return the state of an object on update operations. These endpoints include:

  • MetaAlertController

    • create
    • addAlertsToMetaAlert
    • removeAlertsFromMetaAlert
    • updateMetaAlertStatus
  • UpdateController

    • patch
    • replace
    • addCommentToAlert
    • removeCommentFromAlert

The general approach is to change the return type of the various Dao methods that update objects to return the updated document. In most cases this was straightforward but did require small changes in several files due to how the Dao classes are composed and the various implementations that exist (ES, Solr, InMemory, etc).

Perhaps the most significant change was to the InMemoryMetaAlertDao classes used for REST integration testing because they were no longer sufficient after the interface change. Before they stored a mapping between metaalerts and child alerts which worked fine when just returning a true/false on adding/removing child alerts. With this PR the InMemoryMetaAlertDao classes now must return the full object. As I worked through updating these methods to do this, I found myself rewriting the methods in AbstractLuceneMetaAlertUpdateDao. Instead of doing that, I refactored the InMemoryMetaAlertDao classes to more closely match how the MetaAlertDao classes are composed and was able to reuse the implementations in AbstractLuceneMetaAlertUpdateDao. I feel this makes the InMemoryMetaAlertDao classes smaller and easier to maintain moving forward.

Changes Included

  • Updated the various *UpdateDao interfaces and implementations to return Documents
  • Updated the REST services and controllers to match the new interfaces
  • Changed the InMemoryDao classes as described above
  • Updated the various unit and integration tests to verify that correct documents are returned (includes new tests in several cases)
  • Updated and added documentation as needed
  • Fixed a bug in HBaseDaoIntegrationTest.java where tables were not being cleared after each test

Testing

In addition to unit and integration tests passing, I have tested this in full dev with Elasticsearch. The testing approach is the same for each endpoint listed above: send a request and verify the document that is returned includes the expected change. Here are a couple of examples

Testing the patch endpoint

Send a patch request:

curl -X PATCH --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "guid": "1881af3c-88df-4e72-8e81-34538b6e774b",
  "patch": [
    {
      "op":"add",
      "path":"/newField",
      "value":"newValue"
    }
  ],
  "sensorType": "bro"
}' 'http://node1:8082/api/v1/update/patch'

The response should contain the full document along with the update, in this case a newField field added with a value of newValue:

{
  "timestamp": 1536185190979,
  "document": {
    "bro_timestamp": "1536182284.908185",
    "ip_dst_port": 8080,
    ...
    "newField": "newValue"
  },
  "guid": "1881af3c-88df-4e72-8e81-34538b6e774b",
  "sensorType": "bro"
}

Testing the metaalert create endpoint:

Create a metaalert:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "alerts": [
    {
      "guid": "0588e788-3a91-4113-94a0-d80daec00d05",
      "sensorType": "snort"
    }
  ],
  "groups": [
    "group1"
  ]
}' 'http://node1:8082/api/v1/metaalert/create'

This should return the full metaalert along with the child alert:

{
  "timestamp": 1536203159546,
  "document": {
    "average": 10,
    "max": 10,
    "metron_alert": [
      {
        "msg": "'snort test alert'",
        ...
        "metaalerts": [
          "f65e4d0e-bdad-4ac7-ab22-4cf207c3fb2e"
        ]
      }
    ],
    "threat:triage:score": 10,
    "count": 1,
    "groups": [
      "group1"
    ],
    "sum": 10,
    "source:type": "metaalert",
    "min": 10,
    "median": 10,
    "guid": "f65e4d0e-bdad-4ac7-ab22-4cf207c3fb2e",
    "timestamp": 1536203159546,
    "status": "active"
  },
  "guid": "f65e4d0e-bdad-4ac7-ab22-4cf207c3fb2e",
  "sensorType": "metaalert"
}

I also tested every endpoint using the Alerts UI. Everything should continue working as expected.

Outstanding items

There are a couple of changes I'm not quite sure about and would like some input:

  • Are the changes made to the InMemoryUpdateDao classes reasonable?
  • The case of missing documents are handled in 2 different ways as far as I can tell. Either a null is returned or an empty Optional. I just happened to see the null approach first so that's the pattern I followed. However I think we may want to make this consistent and the Optional empty approach feels more appropriate to me. What do you think?
  • In the MultiIndexDao, updates are performed in parallel across the different Daos. Since we're assuming the updates should all be identical I'm returning the first one as the return value. Is there a better approach?

I've made some comments inline in this PR to highlight these and provide some context.

Fortunately I did not have to make any UI changes since the return value of these endpoints is ignored in most cases. We should have a follow on task to properly update the client side code to match these new endpoints. I believe there are a couple outstanding bugs related to displaying inconsistent results in the UI after updates.

I am still working on testing with Solr and adding more code comments to add some clarity around changes that may not be obvious.

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

* @return The updated metaalert with alerts added.
*/
@Override
public Document addAlertsToMetaAlert(String metaAlertGuid, List<GetRequest> alertRequests)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved from the Elasticsearch Dao implementation essentially making it the default method for adding alerts to metaalerts (the logic in the Solr implementation is different). This allows us to reuse this code in the InMemory implementation.

import java.io.IOException;
import java.util.List;

public class InMemoryMetaAlertRetrieveLatestDao implements MetaAlertRetrieveLatestDao {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class doesn't do much but is needed to compose an InMemoryMetaAlertUpdateDao.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding this comment to the javadoc for this class? I can imagine I will ask myself that question a few years down the road.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure no problem.

return createResponse;
public Document createMetaAlert(MetaAlertCreateRequest request)
throws InvalidCreateException, IOException {
return metaAlertUpdateDao.createMetaAlert(request);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're delegating to the AbstractLuceneMetaAlertUpdateDao class instead of duplicating it.

import java.util.List;
import java.util.Optional;

public class InMemoryMetaAlertUpdateDao extends AbstractLuceneMetaAlertUpdateDao {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending the AbstractLuceneMetaAlertUpdateDao class lets us reuse code from that implementation.

@@ -985,6 +1024,30 @@ protected boolean findCreatedDocs(List<GetRequest> getRequests)
throw new OriginalNotFoundException("Count not find guids after " + MAX_RETRIES + "tries");
}

@SuppressWarnings("unchecked")
protected void assertEquals(Map<String, Object> expected, Map<String, Object> actual) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is needed to handle type mismatches (Integer to Long and Float to Double) and child alert ordering.
These fields are not defined in the metaalert template so ES is guessing the type. When read back some field types are different vs when they were written.

@@ -61,8 +60,8 @@
0x54,0x79,0x70,0x65
};

@BeforeClass
public static void startHBase() throws Exception {
@Before
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tables need to be cleared after each test.

if (latest == null) {
return;
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here. We should never see a request to remove a comment from a doc that doesn't exist. So let's throw an exception with a message that we can use to debug the problem.

}

@Override
@SuppressWarnings("unchecked")
public void addCommentToAlert(CommentAddRemoveRequest request, Document latest) throws IOException {
public Document addCommentToAlert(CommentAddRemoveRequest request, Document latest) throws IOException {
if (latest == null || latest.getDocument() == null) {
throw new IOException("Unable to add comment to document that doesn't exist");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to match what I am suggesting we do in the Elasticsearch implementation. If we don't have a document, we should thrown an exception.

I'd also argue that the error message provides no useful information to help debug the problem. But that is a pre-existing condition. Would be nice to clean-up though.

throws IOException {
if (latest == null || latest.getDocument() == null) {
throw new IOException("Unable to remove comment document that doesn't exist");
}
List<Map<String, Object>> commentMap = (List<Map<String, Object>>) latest.getDocument().get(COMMENTS_FIELD);
// Can't remove anything if there's nothing there
if (commentMap == null) {
return;
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also treat this as an exceptional condition?

public Document addAlertsToMetaAlert(String metaAlertGuid, List<GetRequest> alertRequests)
throws IOException {

Document metaAlert = retrieveLatestDao
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need a null check here? What happens if there is no metaAlert?

Document metaAlert = retrieveLatestDao
.getLatest(metaAlertGuid, MetaAlertConstants.METAALERT_TYPE);
if (metaAlert == null) {
return false;
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also treat this as an exceptional condition?

if (latest == null) {
return;
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also treat this as an exceptional condition too?

throws IOException {
if (latest == null) {
return;
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also treat this as an exceptional condition too?

@@ -172,7 +174,7 @@ public void removeCommentFromAlert(CommentAddRemoveRequest request, Document lat
// Can't remove anything if there's nothing there
if (commentMap == null) {
LOG.debug("Provided alert had no comments to be able to remove from");
return;
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also treat this as an exceptional condition too?

@merrimanr
Copy link
Contributor Author

The latest commit updates the various places where looking up a document that doesn't exist returns null. Now an IOException is thrown with a helpful message and guid of the missing object. I added tests to cover these cases. I also added changes that will throw an exception when alerts to be add/removed don't exist. Previously that wasn't covered and the operation would succeed even though the missing alert wasn't added/removed.

For the parallel stream issue, I made it consistent with the rest of the class. It now uses the same approach as the getLatest method and uses the DocumentContainer class. As part of this I moved common code to a separate method.

Let me know what you think about these changes.


import static org.mockito.Mockito.mock;

public class ElasticsearchUpdateDaoTest extends UpdateDaoTest {
Copy link
Contributor

@nickwallen nickwallen Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test class for? There seems to be no actual tests. Maybe you forgot to push a commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out UpdateDaoTest. As I created tests for ElasticsearchUpdateDao, SolrUpdateDao, and HBaseDao I found myself copy/pasting the various tests. So I consolidated them into a single test. I believe there is value in having a single sets of tests for the UpdateDao interface methods that all the Daos must satisfy. This class sets up it's Dao implementation to be used in the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. Could you add that to the javadoc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed the javadoc here.

} catch (Throwable e) {
return new DocumentContainer(e);
}
}).collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like your approach.

}
}
}
}
Copy link
Contributor

@nickwallen nickwallen Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we gain anything from a slight restructuring of your if/else chain that adds a sanity check? It might pay-off to be a little more defensive here, but I could be convinced otherwise.

if(dc.getException().isPresent()) { 
  ..
} else if(dc.getDocument.isPresent()) {
  ..
} else {
  throw new IllegalStateException("Expected either document or exception; got neither");
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. Will make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further review, I'm not sure this is what we want. This caused an integration test to fail because the result from the first DAO did not contain a document or exception while the result from the second DAO did contain a document. Before this change, the document from the second DAO would have been returned. I think we want to return a document if possible, even if all DAOs are not consistent. Do you disagree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Interesting. I don't really understand what happens in that else condition then.

The OCD side of me wants to make sure I understand this to ensure there is not a pre-existing bug. But at the same time I can't fault you for this. All you did was refactor this code out into a method.

It seems that a DocumentContainer should either have an exception (dc.getException().isPresent()) or it should have a document (dc.getDocument.isPresent()). We would hit the else when neither of those are the case. So under what conditions would a DocumentContainer have neither of those?

I guess a null Document or null Throwable is added to the DocumentContainer? I hope this is what we expect to happen.

if(dc.getException().isPresent()) { 
  ..
} else if(dc.getDocument.isPresent()) {
  ..
} else {
 // when does this occur? what do we expect to happen?
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, don't mind working it out here and now. The case that causes the test to fail (although I'm not sure that was the intention of the test) is where 2 DAOs are contained in a MultiIndexDao and the first DAO returns an invalid result (no document or exception) while the second returns a valid result. I don't know what would cause a document to be missing both a document and an exception.

The question is what should happen when this unexpected condition does happen? Should it blow up and throw and exception or return a document if at least one DAO returns a valid one.

Copy link
Contributor

@nickwallen nickwallen Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is what should happen when this unexpected condition does happen? Should it blow up and throw and exception or return a document if at least one DAO returns a valid one.

Ah, OK. I'm groking it now. We want it to return null; at least based on the javadocs defined in RetrieveLatestDao.getLatest and other places as null means nothing was found. So functionally, we don't want it to throw that exception.

But IMO it might benefit from some clarity. I would suggest changing getDocument to getLatestDocument with some commentary and clarity on what it means if there is no Document or exception and what happens if there is an error in one of the underlying indices.

Feel free to tweak to your liking or disregard, but this is what I would do.

  /**
   * Returns the most recent {@link Document} from a list of {@link DocumentContainer}s.
   *
   * @param documentContainers A list of containers; each retrieved from a separate index.
   * @return The latest {@link Document} found.
   * @throws IOException If any of the {@link DocumentContainer}s contain an exception.
   */
  private Document getLatestDocument(List<DocumentContainer> documentContainers) throws IOException {
    Document latestDocument = null;
    List<String> error = new ArrayList<>();

    for(DocumentContainer dc : documentContainers) {
      if(dc.getException().isPresent()) {
        // collect each exception; multiple can occur, one in each index
        Throwable e = dc.getException().get();
        error.add(e.getMessage() + "\n" + ExceptionUtils.getStackTrace(e));

      } else if(dc.getDocument().isPresent()) {
        Document d = dc.getDocument().get();
        // is this the latest document so far?
        if(latestDocument == null || latestDocument.getTimestamp() < d.getTimestamp()) {
          latestDocument = d;
        }

      } else {
        // no document was found in the index
      }
    }
    if(error.size() > 0) {
      // report all of the errors encountered
      throw new IOException(Joiner.on("\n").join(error));
    }
    return latestDocument;
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Latest commit includes this change.

@merrimanr
Copy link
Contributor Author

@nickwallen, the latest commit should address your comments. Let me know what you think.

@nickwallen
Copy link
Contributor

+1 Looks great. Works great. I ran this up in Full Dev and tested manually through the Alerts UI. Looking forward to the follow-on UI changes.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants