Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-string attributes are allowed but not supported by schema #907

Open
1 of 4 tasks
yankovs opened this issue Feb 3, 2024 · 3 comments
Open
1 of 4 tasks

Non-string attributes are allowed but not supported by schema #907

yankovs opened this issue Feb 3, 2024 · 3 comments
Labels
type:bug Something isn't working zone:backend MWDB backend core related tasks zone:frontend MWDB frontend tasks
Milestone

Comments

@yankovs
Copy link
Contributor

yankovs commented Feb 3, 2024

Environment information

  • MWDB version (from /about): 2.10.3
  • Installation method:
    • mwdb.cert.pl service
    • From PyPi (pip install mwdb-core)
    • From docker-compose
    • Other (please explain)
  • Plugins installed:

Behaviour the bug (what happened?)

Attribute value should be a string, as defined in the schema:

class MetakeyValueSchema(Schema):
value = fields.Str(required=True, allow_none=False)
@validates("value")
def validate_value(self, value):
if not value:
raise ValidationError("Value shouldn't be empty")

However, the introduction of the rich attributes feature opened the possibility for attributes that aren't strings. To do this, all we need to do is define a rich attribute template of the form {{value}}. Then we can put in the value section of the attribute some non-string value, for example the number 8, and what will happen is that the object will have the integer value 8 as an attribute value. Not a string but a number. This can be verified by the query functionality, you can treat this attribute as a number in a query and use feature like number ranges for it. This also works for float, so a value of 5.76 works as expected when doing this trick.

The problem here is that as seen in the schema, attribute values are expected to be strings and this can lead to weird behavior when using this trick. Let's say I define an attribute called vt-detections for samples, this is naturally an integer. And let's say I use this trick to make it treated like an integer. Then, when a sample comes in and it has 0 VT detections, it will fail the schema validation. In validate_value we have value==0 thus not value == True and this results in a nice error message. Despite 0 being a legitimate value in this case.

Expected behaviour

I think that instead of enforcing users not to use non-string values, this should be embraced and added as a system feature. I would do the following:

  • Change this attribute value validation to a switch statement based on the type returned by isinstance. Have different validators for strings, integers, floats, and rich attributes
  • Add support for non-strings in the UI, right now when you manually add an attribute through the UI it only has the option of string or object

Also, maybe add to an each attribute its (user defined) type in the DB to prevent the user from having multiple different types. For example not to have some attribute sometimes as a string and sometimes as an integer, as this makes a difference in searching.

Reproduction Steps

  • Add a new rich attribute with the template {{value}}
  • Try to add it to a sample through the UI as an object with the value 0
  • Be greeted with a validation error
@psrok1
Copy link
Member

psrok1 commented Feb 19, 2024

@yankovs Actually they're supported but by Attribute API (https://github.com/CERT-Polska/mwdb-core/blob/master/mwdb/schema/attribute.py#L25-L32). And range queries for integer values are already supported by attribute.*:... query field!

Metakey API you're referencing is an old thing that should not be used and will be removed in next major version.

Add support for non-strings in the UI, right now when you manually add an attribute through the UI it only has the option of string or object

Object is actually the non-string type you're looking for. I see that right now UI is pretty confusing and it's too easy to add numerical IDs both as strings and numbers.

@psrok1
Copy link
Member

psrok1 commented Feb 23, 2024

Ah and I see another problem. Yes, 0 as a number is considered an empty value

image

@psrok1 psrok1 added type:bug Something isn't working zone:backend MWDB backend core related tasks zone:frontend MWDB frontend tasks labels Feb 23, 2024
@yankovs
Copy link
Contributor Author

yankovs commented Mar 1, 2024

@psrok1 Thanks for taking the time to look into this! Your solution solves the issue I presented in this issue.

However, I still think that in general a better validation/typing model should take place for attributes. Let's say I want to have an attribute for PE files called compilation_timestamp. With current MWDB my options are basically either to save a timestamp (integer) or a formatted string of this timestamp (string). With both approaches I don't have:

  • validation of the date, even basic one
  • ability to work with date queries, doing attribute.compilation_timestamp>=1970-01-01 makes no sense

The issue is more complex when taking a look at a json attribute. This compilation timestamp might come as a key-value pair in a more general case:

{
    ...
    "compilation_timestamp": ...
    ...
}

@psrok1 psrok1 added this to the v2.12.0 milestone Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working zone:backend MWDB backend core related tasks zone:frontend MWDB frontend tasks
Projects
None yet
Development

No branches or pull requests

2 participants