Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PyMISP argument to ignore dup indicators during ingestion with MISP server-side support #5257

Closed
github-germ opened this issue Oct 2, 2019 · 34 comments
Assignees
Labels
P: high Priority: high. This issue should be resolved as quickly as possible S: stale Status: stale. This issue has had no activity in a long time, it may not be relevant anymore T: feature request Type: feature request. This issue is requesting a new feature topic: API This issue involves API usage

Comments

@github-germ
Copy link

github-germ commented Oct 2, 2019

2.4.114

In order to increase performance in a ExpandedPyMISP client that is processing a large feed ingestion particularly into existing events that are large, it is more efficient to not first retrieve the entire event into the app to look for dups. Rather simple add the attribute, and deal with the error if it occurs.

I refactored two major feed ingestion clients with this approach and the processing time went for hours to minutes; so I was quite happy to deal with the error return. However, now the logs table is growing rather quickly.

This method does trigger an error return to the ExpandedPyMISP client, and also the insertion of a row into the database logs table -- example below.

Might it make sense to not trigger an error in this case, possibly adding an optional boolean argument to pymisp.add_attribute requesting the error but defaulted to False.

MariaDB [misp]> select * from logs where id=91023522;
+----------+--------------------------------------------------------------------------------------------------+---------------------+-----------+----------+--------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------+-----+
| id       | title                                                                                            | created             | model     | model_id | action | user_id | change                                                                                                                                                                                                                                                                                                                                    | email                      | org     | description | ip  |
+----------+--------------------------------------------------------------------------------------------------+---------------------+-----------+----------+--------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------+-----+
| 91023522 | Attribute dropped due to validation for Event 41069 failed: Network activity/ip-src 154.68.5.145 | 2019-10-02 15:05:22 | Attribute |        0 | add    |      45 | Validation errors: {"value":["A similar attribute already exists for this event."]} Full Attribute: {"uuid":"f2abb7ea-5a1b-43f9-b102-79893cf0ef52","type":"ip-src","value":"154.68.5.145","category":"Network activity","to_ids":1,"disable_correlation":0,"comment":"[10,2019100215]","event_id":"41069","object_id":0,"distribution":5} | api@local.net | X-Feed |             | ::1 |
+----------+--------------------------------------------------------------------------------------------------+---------------------+-----------+----------+--------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------+-----+
1 row in set (0.00 sec)

MariaDB [misp]>
@ancailliau
Copy link

I upvote, and would not raise an error when adding a tag that is already present, or saving an event where nothing changed.

@Rafiot Rafiot self-assigned this Nov 6, 2019
@github-germ
Copy link
Author

github-germ commented Nov 6, 2019

A few ideas:

  1. Add an arg to MISPEvent.add_attribute like ignore_dup=True to stop an error being thrown.
  2. Would also be nice if there was a way to simply revise the attribute timestamp when it's a dup, e.g. something like MISPEvent.add_attribute(..., ignore_dup=False, revise_timestamp_if_dup=True)
    • I tested calling add_attribute with the only diff with the existing attribute being a newer timestamp. The call succeeds; however, the WebUI only displays the prior/old attribute, while the db attributes table has both the new and old timestamped attributes in that event -- an interesting outcome.

@ancailliau
Copy link

ancailliau commented Nov 7, 2019 via email

@github-germ
Copy link
Author

A PyMISP client ingests hourly into fixed events from a premium feed provider. These events can contain 500k+ attributes.

Experience shows that having the client retrieve the full event with attributes in order to discover dups takes too much time.

Therefore, the app calls add_attribute not knowing if the attribute is new or a dup. This is much faster.

We want the attribute timestamp revised as this indicates the indicator is active at that time. For example, if the downstream data consumer needs a list of active indicators in the last 30 days, then the timestamp revision offers accurate results.

@github-germ
Copy link
Author

Our don't ask, don't tell use case is implemented with big performance improvement where:

  • don't ask -- the client does not retrieve the huge event from the server to dedup
  • don't tell -- client simply sends full list of attributes to the server and let's the server dedup.

HOWEVER, we get tons (could 1000s) of inserts in the logs table for the dups which eats cpu and disk space.

What about adding two options to add_attribute -- something like:

  • dup_log default True
  • dup_update_timestamp default False

Thanks for your consideration as this is acausing operational issues in our Production MISP instance.

@github-germ
Copy link
Author

Or another thought, offer a similar method as the Default Feed Delta Merge feature somehow for PyMISP ingestions -- not sure if this is done via PyMISP calls or perhaps as a new feature specified in the Event definition (new column added to events table?).

@Rafiot Rafiot removed their assignment Dec 16, 2019
@Rafiot Rafiot added the T: feature request Type: feature request. This issue is requesting a new feature label Dec 16, 2019
@Rafiot Rafiot changed the title Eliminate the error for a duplicate attribute add attempt into an event On-demand disable logging on API request Dec 16, 2019
@Rafiot
Copy link
Member

Rafiot commented Dec 16, 2019

Okay, I add it in the feature requests queue. It will be implemented as soon as possible, depending on the urgent/paid features requests we receive.

@enjeck enjeck added topic: API This issue involves API usage P: high Priority: high. This issue should be resolved as quickly as possible S: stale Status: stale. This issue has had no activity in a long time, it may not be relevant anymore and removed topic: api labels Nov 22, 2020
@github-germ github-germ changed the title On-demand disable logging on API request Add PyMISP argument to ignore dup indicators during ingestion Dec 11, 2020
@github-germ
Copy link
Author

github-germ commented Dec 11, 2020

Recap: Feature Request to add an argument, perhaps called ignore_dup to PyMISP add_object and add_attribute that will ignore duplicate attributes or objects and NOT log that as an error.

@chrisinmtown
Copy link

Extending PyMISP is the easy part. The server requires an extension, better make sure there's an issue open in MISP for that.

On the server I think this function needs to be extended to accept a new parameter to allow/ignore duplicate attributes:

public function captureAttribute($attribute, $eventId, $user, $objectId = false, $log = false, $parentEvent = false, &$validationErrors = false, $params = array())

I believe the attribute save command indicates error on dupe:

if (!$this->save($attribute, $params)) {

Maybe it's possible to look at the error here and skip over duplicates, but I don't know how, so I am not ready to propose a code patch here.

@github-germ github-germ changed the title Add PyMISP argument to ignore dup indicators during ingestion Add PyMISP argument to ignore dup indicators during ingestion with MISP server-side support Dec 16, 2020
@github-germ
Copy link
Author

I discovered issues with the server side MISPObject deduplication within an Event and have a proposed fix. Please see Proposed fix for deduplicating MISPObjects within an Event via breakOnDuplicate in #6838

@github-germ
Copy link
Author

github-germ commented Jan 12, 2021

Hi @Rafiot -- HNY!! Hope you are well...

I have tested successfully a simple patch to app/Model/Attribute.php as below by setting MISPAttribute.ignoreDup in the MISPEvent and calling update_event in a PyMISP client to prevent the entry for the attribute duplicate being saved into the logs table.

Would appreciate someone with greater expertise reviewing this patch proposal to ensure it's works in all cases.

Thanks!!

$ git diff Attribute.php
diff --git a/app/Model/Attribute.php b/app/Model/Attribute.php
index 8414482..b12e6f5 100644
--- a/app/Model/Attribute.php
+++ b/app/Model/Attribute.php
@@ -4313,6 +4313,9 @@ class Attribute extends AppModel
             $fieldList[] = 'object_relation';
         }
         if (!$this->save(array('Attribute' => $attribute), array('fieldList' => $fieldList))) {
+           if ($this->validationErrors['value'][0] == 'A similar attribute already exists for this event.' and array_key_exists('ignoreDup', $attribute) and $attribute['ignoreDup']) {
+                return 'Duplicate attribute';
+            }
             $attribute_short = (isset($attribute['category']) ? $attribute['category'] : 'N/A') . '/' . (isset($attribute['type']) ? $attribute['type'] : 'N/A') . ' ' . (isset($attribute['value']) ? $attribute['value'] : 'N/A');
             $this->Log = ClassRegistry::init('Log');
             $this->Log->create();
$ 

@github-germ
Copy link
Author

If this seems acceptable, based on the revision being considered in #6838 I'd suggest adding the same logic to not log a duplicate object if ignoreDup in two functions in MispObject.php:

  • saveObject
  • captureObject

Thanks.

@github-germ
Copy link
Author

Status?

@github-germ
Copy link
Author

@iglocska -- Hey Andras, were you able to close all the loose ends in the object dedup; if not, what remains?

@github-germ
Copy link
Author

Hi... was this completed?

@github-germ
Copy link
Author

Revisiting... We still see lots of the "A similar attribute already exists for this event. being logged. Is there any hope of not logging these, perhaps as suggested above

@github-germ
Copy link
Author

Hi @iglocska ... We still see these in 2.4.164 (we check after each MISP release =8^). Should we just forget about this being changed? Thx...

@github-germ
Copy link
Author

Issue persists in 2.4.165.
If there's no intent on addressing this, let me know. Thx.

@github-germ
Copy link
Author

@iglocska I thought maybe you'd get to this one. Perhaps if you're not going to work it, you can close this ticket as "wontdo"?

Thx.

@github-germ
Copy link
Author

This persists in 2.4.169. Gonna do anything with this one :) ? ... If not close, and I'll stop checking. Thanks!

@adulau
Copy link
Member

adulau commented Mar 16, 2023

@righel can you have a look?

@iglocska
Copy link
Member

It makes sense to me. We can simply stop the logging if the flag is present (sorry @github-germ - completely missed this one before).

@github-germ
Copy link
Author

Thanks @iglocska --no worries, it's not critical but would be nice to have.

@righel
Copy link
Contributor

righel commented Mar 23, 2023

Hello @github-germ
Could you check if the changes implemented in these PRs would solve your issue?
#8980
MISP/PyMISP#958

@github-germ
Copy link
Author

github-germ commented Mar 24, 2023

@righel Excellent! I will have time to build and test this next week, and will report back. THANKS.

@github-germ
Copy link
Author

@righel What MISP branch should I build my docker image with, and which PyMISP for client? Hope to get to this within the next week. Thanks again.

@righel
Copy link
Contributor

righel commented Mar 30, 2023

@github-germ
Copy link
Author

@righel Thanks for your patience. IT WORKS!!

Can this be merged in and available in 2.4.170?
Much appreciate you hitting this old request :-)

@adulau
Copy link
Member

adulau commented Apr 12, 2023

@righel Thanks for your patience. IT WORKS!!

Can this be merged in and available in 2.4.170? Much appreciate you hitting this old request :-)

Yes it will be released in 2.4.170 which will be released tomorrow.

@github-germ
Copy link
Author

@righel I am revisiting this now that 2.4.170 is installed here in the context of my client PyMISP apps. Indeed using pymisp.add_attribute(event, attribute, break_on_duplicate=0 works well by revising the attribute timestamp and not inserting the dup message into the db logs table.

  • Cool. Thanks!

Questions:

  1. Is an attribute deemed a dup if the value is the same even if first_seen or last_seen is different?
  2. Is it possible to also support breakOnDup for other use cases than pymisp.add_attribute? If I'm understanding, that's what your change offered, i.e. set it when pushing an attribute direct into MISP.
    2.1 When we were working on dedup for object with Bug with deduplicating MISPObjects within an Event via breakOnDuplicate #6917, the use case there was setting breakOnDuplicate on the MISPEvent or on a MISPObject
    2.2 For example... Can we also set breakOnDuplicate on a MISPAttribute?
    2.3. Here's a use case: Retrieve all attributes into memory for an existing event via pymisp.search, revising a subset of those attributes in-memory in the MISPEvent.Attribute list, and when calling pymisp.update_event those attributes that have breakOnDuplicate = False will not trigger the message inserted into the logs db table? (this would be especially helpful, if an attribute is deemed a dup event when first_seen or last_seen are different)

@righel
Copy link
Contributor

righel commented Apr 25, 2023

Hello

  1. Yes, but if a new/different first_seen or last_seen value is provided it will update the existing record. It's essentially an upsert operation.
  2. It is possible, yes.
    2.1. Unrelated but, I included some fixes for breakOnDuplicate logic for MISPObject, I presume it was not working as intended in some cases.
    2.2 & 2.3. There was a internal discussion with the team and we decided to not support the breakOnDuplicate as a property, and only as a url parameter, so that the rule applies to all the attributes in the operation and not just a subset.

@github-germ
Copy link
Author

@righel Really appreciate your swift replies: thanks!!

  1. So with an upsert like first_seen or last_seen if we do not want the db logs insert, we should set breakOnSuplicate to false
  2. Am I understanding your correctly that since breakOnDuplicate is not a MISPAttribute property, that the only method to enable an upsert without logs insert on what would otherwise trigger dup processing is the direct pymisp.add_attribute, i.e. there's not way to do that with members of a MISPEvent.Attribute list in memory, pushing those via pymisp.update_event? If that's the case, our concern is all the atomic pymisp.add_attribute can become a performance issue for our huge events for example where we need to revise a large set of last_seen.

@righel
Copy link
Contributor

righel commented Apr 25, 2023

  1. Yes.
  2. Yes and No, you can still add a batch of attributes using that pymisp.add_attribute endpoint, it supports arrays (or at least the endpoint does), but not via pymisp.update_event as it does not support the breakOnDuplicate flag.

@github-germ
Copy link
Author

@righel ... Many Thanks!!! I think I now understand how we can take advantage of this feature in our code in a future release. I'll close this ticket, and only if I have more questions come back with a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P: high Priority: high. This issue should be resolved as quickly as possible S: stale Status: stale. This issue has had no activity in a long time, it may not be relevant anymore T: feature request Type: feature request. This issue is requesting a new feature topic: API This issue involves API usage
Projects
None yet
Development

No branches or pull requests

8 participants