Added DB backend for elasticsearch database#97

Merged
matllubos merged 2 commits into master from ElasticsearchBackend
Oct 15, 2021

Conversation

@matllubos (Collaborator)

No description provided.

@matllubos force-pushed the ElasticsearchBackend branch 10 times, most recently from 39ff222 to ab85a08 on September 17, 2021 09:56
- pip install Django~=$DJANGO_VERSION

before_script:
- sleep 10
@matllubos (Collaborator Author):

From the Travis CI documentation.

services:
- elasticsearch

before_install:
@matllubos (Collaborator Author):

I need a specific version; the default is too old.

return a + b

Task result will be automatically logged to the ``security.models.CeleryTaskLog``.

@matllubos (Collaborator Author):

This method was moved to django-celery-extension.


with assert_raises(CommandError):
    call_command('celery_health_check', max_created_at_diff=max_created_at_diff)
from .celery_log import CeleryLogTestCase
@matllubos (Collaborator Author):

All tests were rewritten.

@@ -0,0 +1,69 @@
from io import StringIO
@matllubos (Collaborator Author):

Several test helpers.

from .models import CommandLog, CeleryTaskRunLog, CeleryTaskInvocationLog, InputRequestLog, OutputRequestLog


class store_elasticsearch_log(override_settings):
@matllubos (Collaborator Author):

A helper for project testing. We need a separate index for parallel test processes, so this context manager creates a new index for every log and removes it at the end of the test/block.
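
A minimal sketch of how such a helper could work, assuming elasticsearch-dsl's ``Index`` API; the index naming and the settings handling here are illustrative, not the actual implementation:

import uuid

from django.test.utils import override_settings
from elasticsearch_dsl import Index


class store_elasticsearch_log(override_settings):

    def __enter__(self):
        # A unique index name per test avoids collisions between
        # parallel test processes.
        self._index = Index('test-security-log-{}'.format(uuid.uuid4().hex))
        self._index.create()
        return super().__enter__()

    def __exit__(self, exc_type, exc_value, traceback):
        super().__exit__(exc_type, exc_value, traceback)
        # Remove the per-test index so nothing leaks between tests.
        self._index.delete(ignore=404)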

def backend_receiver(signal):
    def _decorator(func):
        def _wrapper(*args, **kwargs):
            if settings.BACKENDS is None or backend_name in settings.BACKENDS:
@matllubos (Collaborator Author):

We can turn off a backend for storing, so you can read from the log but not write to it (good for tests).
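
For illustration, a test settings module could then keep only one writing backend enabled (the backend identifier here is an assumption, not necessarily the real name):

# test settings: only the SQL backend writes logs; the Elasticsearch
# log stays readable but receives no writes
SECURITY_BACKENDS = ['sql']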

)
input_request_logger = getattr(request, 'input_request_logger', None)
if input_request_logger:
    input_request_logger.update_extra_data({'debug_toolbar': toolbar.render_toolbar()})
@matllubos (Collaborator Author):

Maybe the toolbar is still broken; I will check it in the next pull request.

from security.config import settings


class SecurityLogger(ContextDecorator, local):
@matllubos (Collaborator Author):

Every logger extends this class. Active loggers form a tree (the ``loggers`` property), so you know the parent logger (for example, an output request inside an input request).

self.id = id or (uuid4() if self.name else None)
self.parent = SecurityLogger.loggers[-1] if SecurityLogger.loggers else None
self.related_objects = set(related_objects) if related_objects else set()
self.slug = slug
@matllubos (Collaborator Author):

``related_objects``, ``slug`` and ``extra_data`` are extended from the parent logger.
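
A hedged illustration of the nesting (the constructor arguments are guessed from the snippet above, not the exact signature):

# Hypothetical usage: the inner logger becomes a child of the outer
# one and extends its related_objects, slug and extra_data.
with SecurityLogger(slug='import', related_objects={user}) as outer:
    with SecurityLogger() as inner:
        assert inner.parent is outer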

@matllubos force-pushed the ElasticsearchBranch branch from b8a7e17 to acf5680 on October 4, 2021 12:11

.. attribute:: SECURITY_BACKENDS

With this setting you can select which backends will be used to store logs. Default value is ``None`` which means all installed logs are used.
Member:

Maybe you meant all installed backends are used?

@matllubos (Collaborator Author):

Yes, sure, thanks.


.. attribute:: SECURITY_ELASTICSEARCH_DATABASE

Setting can be used to set ElasticSearch database configuration.
Member:

Elasticsearch


.. attribute:: SECURITY_ELASTICSEARCH_AUTO_REFRESH

Every write to the elasticsearch database will automatically call auto refresh.
Member:

Elasticsearch


Every write to the elasticsearch database will automatically call auto refresh.

.. attribute:: SECURITY_LOG_STING_IO_FLUSH_TIMEOUT
Member:

Do you really mean STING and not STRING?

@matllubos (Collaborator Author):

Yes, string, thanks.

class Command(BaseCommand):

    def handle(self, **options):
        requests.post('http://test.cz/test')
Member:

You shouldn't try to access live servers in tests no matter what. Why not http://localhost?

@matllubos (Collaborator Author):

Good point.

@matllubos (Collaborator Author):

But the tests should mock requests.

Member:

Yeah, but that cannot be guaranteed, and in case of an error you might be touching a live server.

@matllubos (Collaborator Author):

Sure, I will change it.
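
For example, the test could patch the HTTP call so nothing ever leaves the process; a sketch with a hypothetical command name:

from unittest import mock

from django.core.management import call_command
from django.test import TestCase


class CommandLogTestCase(TestCase):

    @mock.patch('requests.post')
    def test_command_should_log_output_request(self, mock_post):
        # The patched call never reaches a live server, even if the
        # test setup is wrong.
        call_command('test_command_with_request')
        mock_post.assert_called_once()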

databases = ['default', 'security']

@data_provider
def create_user(self, username='test', email='test@test.cz'):
Member:

Again, I like test@localhost.

expected_run_succeeded_data) as run_succeeded_receiver, \
set_signal_receiver(celery_task_run_output_updated) as run_output_updated_receiver, \
set_signal_receiver(celery_task_run_failed) as run_failed_receiver, \
set_signal_receiver(celery_task_run_retried) as run_retried_receiver:
@rubickcz (Member), Oct 6, 2021:

Huh, this is sort of clumsy. Wouldn't it be better to just create some helper class that automatically registers for all these signals and keeps internal counters that increment when a signal is fired? I believe this repeated initialization (signal registering) is not necessary. Something like

with TestSignalReceiver() as receiver:
    ... # do your stuff
    assert_equal(receiver.calls['celery_task_run_succeeded'], 1)

or

assert_equal(receiver.calls, {
    'invocation_started_receiver' : 1,
    'run_output_updated_receiver' : 6,
    ...
})

The second way of asserting is even better, because it will fail if any unexpected signal is fired, so you don't have to explicitly assert signals that are not supposed to be sent (the dict will only contain signals that were fired at least once).
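
A rough sketch of such a helper, assuming the celery task signals are ordinary Django signals (the watched set and the module path are assumptions):

from collections import Counter

from security.signals import (
    celery_task_run_failed, celery_task_run_retried, celery_task_run_succeeded,
)

WATCHED_SIGNALS = {
    'celery_task_run_succeeded': celery_task_run_succeeded,
    'celery_task_run_failed': celery_task_run_failed,
    'celery_task_run_retried': celery_task_run_retried,
}


class TestSignalReceiver:

    def __enter__(self):
        self.calls = Counter()
        self._receivers = []
        for name, signal in WATCHED_SIGNALS.items():
            receiver = lambda sender, _name=name, **kwargs: self.calls.update([_name])
            # weak=False keeps the lambda alive for the whole with-block
            signal.connect(receiver, weak=False)
            self._receivers.append((signal, receiver))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for signal, receiver in self._receivers:
            signal.disconnect(receiver)

receiver.calls then only contains signals that were fired at least once, matching the second assertion style.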

@matllubos (Collaborator Author):

Yes, you are right. I will try to use the decorator which I wrote for project log testing, and yes, I should rewrite set_signal_receiver into this decorator. Thanks.

from .models import InputRequestLog


class PerRequestThrottlingValidator(ThrottlingValidator):
Member:

For these throttling classes, I think they should not be imported directly; there should be some factory function/class that returns the right validator according to the backend in use. From the perspective of project code, you don't need to worry which log backend is configured in settings. Also, how do you determine which validator to use when you have multiple backends in use (for example Elasticsearch and SQL)?
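
A hypothetical sketch of such a factory (the module paths are made up for illustration; ``settings.BACKENDS`` is the setting used elsewhere in this PR):

from security.config import settings


def get_throttling_validator_class():
    # Project code asks the factory instead of importing a
    # backend-specific validator directly.
    backends = settings.BACKENDS or ('sql',)
    if 'elasticsearch' in backends:
        from security.elasticsearch.throttling import PerRequestThrottlingValidator
    else:
        from security.sql.throttling import PerRequestThrottlingValidator
    return PerRequestThrottlingValidator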

@matllubos (Collaborator Author):

Yes, you are right. This is the next step; I was thinking about the same thing. But again, I cannot do the whole change in one pull request, so I have follow-up tasks which will solve it. But yes, this is a very good point.

& Q('range', start={'gte': timezone.now() - timedelta(seconds=self.timeframe)})
& Q('slug', slug=slug)
).count()
return count_same_requests < self.throttle_at
Member:

I think that this decision logic should be implemented in the parent. The child classes should only implement counting of the requests, because that is the only thing that differs across the backends.

@matllubos (Collaborator Author):

Yes, you are right. I will solve this in the next PR. I want to add API functions to the backends which will return the number of requests for some input (a universal filter API), and then there will be only one throttling validator. This is a temporary solution.
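
The plan could look roughly like this, with the decision in the parent and only the counting backend-specific (a sketch of the follow-up work, not code from this PR):

class ThrottlingValidator:

    def validate(self, request):
        # One decision rule for every backend.
        return self._count_same_requests(request) < self.throttle_at

    def _count_same_requests(self, request):
        # Each backend implements only the counting.
        raise NotImplementedError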

start__gte=timezone.now() - timedelta(seconds=self.timeframe),
slug=self.slug
).count()
return count_same_requests <= self.throttle_at
Member:

In the Elasticsearch validators you have just the < operator here (not <=), why? That's why I suggested unifying this in the parent class.

@matllubos (Collaborator Author):

It is a problem with asynchronous saving to Elasticsearch. In the relational database the result will contain the currently logged request; in the Elasticsearch database the current request will not be returned yet.
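
In other words, the two hunks quoted above differ deliberately:

# relational backend: the current request is already stored, so it is in the count
return count_same_requests <= self.throttle_at
# Elasticsearch backend: the current request is not indexed yet
return count_same_requests < self.throttle_at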

@matllubos force-pushed the ElasticsearchBackend branch 2 times, most recently from 6a98de3 to ed9addd on October 12, 2021 12:27
@matllubos force-pushed the ElasticsearchBackend branch from ed9addd to e3ee0e5 on October 13, 2021 09:58
@matllubos merged commit 81204fa into master on Oct 15, 2021