Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode 'task_args' and 'task_kwargs' in 'store_result'. #78

Merged
merged 1 commit into from
Oct 20, 2020

Conversation

arnau126
Copy link
Contributor

@arnau126 arnau126 commented Nov 14, 2018

Encode task_args and task_kwargs in DatabaseBackend.store_result() (as we do with meta and result) so we can retrieve them as python objects later on.

Note that this PR changes the serialization format of these fields.
For example, if args=['a', 1, True, None]
Currently it's stored as :
"['a', 1, True, None]"
and after this PR it will be stored as:
'["a", 1, true, null]' (json format)

@auvipy
Copy link
Member

auvipy commented Nov 14, 2018

Is this a backward incompatible change?

@arnau126
Copy link
Contributor Author

arnau126 commented Nov 14, 2018

I think that this change won't break any main functionality, as these fields are just informative or for debugging.
As I said the difference is the way they are stored in the DB:
"['a', 1, True, None]" vs. '["a", 1, true, null]''

For me it's not a problem that the storing format changes.

Maybe it's a problem for someone who has a script or something that parse and use these fields. However, they are quite new (the deploy was 2 days ago).

@kiyoya
Copy link

kiyoya commented Nov 16, 2018

Could we make it an option?

Yes, the new release was landed a few days ago. However the feature was first merged several months ago (#36) and not a few people have been using HEAD instead of 1.0.1 (#61).

@arnau126
Copy link
Contributor Author

I don't like the idea to make it an option. I think that not encoding these fields is a bug, because serializing a list or a dict using str() force you to use eval() to retrieve them as python objects, which is not safe.

@kiyoya
Copy link

kiyoya commented Nov 17, 2018

It's a bit off-topic but there is ast.literal_eval for parsing strings like task_args.

Yes, I agree thattask_args and task_kwargs should have been implemented like this in the first place. Unfortunately they weren't, however, and I'm not sure if it is worth making an incompatible change while there is a safe way to parse them.

Another possibility is to introduce new fields in parallel.

django_celery_results/backends/database.py Show resolved Hide resolved
t/unit/backends/test_database.py Outdated Show resolved Hide resolved
Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rebase before you proceed :)

@arnau126 arnau126 force-pushed the task_args branch 3 times, most recently from ab04076 to 5822d22 Compare May 15, 2019 08:10
@arnau126
Copy link
Contributor Author

Rebased

@auvipy
Copy link
Member

auvipy commented May 15, 2019

Rebased

let's reach a consensus for the changes before the merge.

@arnau126
Copy link
Contributor Author

I'm pretty sure that the code works.
The important issue here is the backward incompatibility, which @kiyoya and I already discussed.
I'm willing to hear other points of view.

@auvipy
Copy link
Member

auvipy commented May 15, 2019

Keeping BC is very important.

@arnau126
Copy link
Contributor Author

arnau126 commented May 15, 2019

So the options are:

  • Make it an option (maybe a setting).
  • Introduce new fields in parallel.
  • Wait until a major release where backward incompatibilities are allowed.
  • Don't do it at all (reject the PR).

Let me know if you come up with other ideas.

@arnau126 arnau126 force-pushed the task_args branch 7 times, most recently from 9bb9578 to 2607311 Compare May 15, 2019 14:19
@auvipy
Copy link
Member

auvipy commented May 15, 2019

So the options are:

  • Make it an option (maybe a setting).
  • Introduce new fields in parallel.
  • Wait until a major release where backward incompatibilities are allowed.
  • Don't do it at all (reject the PR).

Let me know if you come up with other ideas.

I would wait for celery 5 and this package 2.0 version

@KevinPoole
Copy link

It's a bit off-topic but there is ast.literal_eval for parsing strings like task_args.

Yes, I agree thattask_args and task_kwargs should have been implemented like this in the first place. Unfortunately they weren't, however, and I'm not sure if it is worth making an incompatible change while there is a safe way to parse them.

Another possibility is to introduce new fields in parallel.

Looks like this particular PR has already been settled as "not yet" but I just wanted to comment on the above: ast.literal_eval is insufficient for cases where, for example, pickle has been used for serialization and there exists some CustomClass objects in the kwargs. In this case ast.literal_eval chokes invalid syntax when you get to the repr string for the objects.

@ZviBaratz
Copy link
Contributor

Is there anything I can do to help move this forward?

@arnau126 arnau126 force-pushed the task_args branch 3 times, most recently from 3d5237e to b1c96f9 Compare October 10, 2020 09:42
@arnau126
Copy link
Contributor Author

Hope that now that Celery 5 is released, it's a good moment to update this package to version 2.0, and merge this PR.

@auvipy
Copy link
Member

auvipy commented Oct 13, 2020

is it possible to not to break existing systems?

@arnau126
Copy link
Contributor Author

arnau126 commented Oct 13, 2020

This PR will only change how new TaskResult instances will store task_args and task_kwargs. Already existing instances will keep the current format.
So the problem is that we will have old instances with these fields serialized using str() and new instances with these fields serialized using json.dumps()

I already suggested a few options:

  • Make it an option (maybe a setting).
  • Introduce new fields in parallel.
  • Wait until a major release where backward incompatibilities are allowed.
  • Don't do it at all (reject the PR).

However, I think that store a list or a dict using str() is not a good practice, so it shouldn't be an option at all.
That's why I would wait until a major release.
And I think that now is a good moment to do it: celery 5 is released, django1.11, py27 and py35 support has been dropped...

@johnthealy3
Copy link

I just encountered this after solving a similar problem trying to use the JSONField (I was using the postgres contrib one but now merged in Django 3.1: https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/fields/#jsonfield) for task_args and task_kwargs. Perhaps now is the time, with an option or with Django version checking?

Happy to provide some code if this approach has potential.

@ZviBaratz
Copy link
Contributor

@arnau126 I tried to check out this PR and I'm still getting task_args and task_kwargs encoded as strings. Any idea what I'm missing?

@arnau126
Copy link
Contributor Author

arnau126 commented Oct 19, 2020

task_args and task_kwargs still are TextField fields, so when you retrieve a TaskResult instance from DB they are strings. To cast them to list or dict you have to do json.loads.

@ZviBaratz
Copy link
Contributor

@arnau126 that makes sense, sorry 😅

@auvipy
Copy link
Member

auvipy commented Oct 20, 2020

I am accepting this PR, but new fields / better solutions using django new fields etc is wellcome

@auvipy auvipy merged commit 0d90df3 into celery:master Oct 20, 2020
@robvdl
Copy link

robvdl commented Nov 26, 2020

So 2.0.0 didn't fix the CVE yet? issue #154 when I saw a new major release I had my hopes up the CVE was fixed but I don't think it has yet. Has version 2.0.0 been flagged yet or is it a matter of time that version gets flagged too?

@arnau126
Copy link
Contributor Author

@robvdl , this PR wasn't meant to fix #154 at all, but to change the way the input arguments were stored (use json.dumps instead of str).

arnau126 added a commit to lead-ratings/django-celery-results that referenced this pull request Nov 27, 2020
arnau126 added a commit to lead-ratings/django-celery-results that referenced this pull request Nov 27, 2020
@arnau126 arnau126 deleted the task_args branch November 27, 2020 11:37
@igorvoltaic
Copy link

igorvoltaic commented Jan 25, 2021

Encode task_args and task_kwargs in DatabaseBackend.store_result() (as we do with meta and result) so we can retrieve them as python objects later on.

Note that this PR changes the serialization format of these fields.
For example, if args=['a', 1, True, None]
Currently it's stored as :
"['a', 1, True, None]"
and after this PR it will be stored as:
'["a", 1, true, null]' (json format)

Seems like not really working. This is what I need to do to get back task arguments.

>>> from django_celery_results.models import TaskResult
>>> import ast
>>> import json
>>> TaskResult.objects.last()
<TaskResult: <Task: c4e59e69-d324-44fd-95ad-1a8c210fa2bd (SUCCESS)>>
>>> t = TaskResult.objects.last()
>>> t.task_args
'"({\'dataset_id\': 2, \'height\': 2600, \'width\': 2600, \'plot_type\': \'bar\', \'params\': {\'x\': \'month\', \'y\': \'passengers\', \'hue\': None}, \'columns\': [\'month\', \'passengers\'], \'id\': 1},)"'
>>> json.loads(t.task_args)
"({'dataset_id': 2, 'height': 2600, 'width': 2600, 'plot_type': 'bar', 'params': {'x': 'month', 'y': 'passengers', 'hue': None}, 'columns': ['month', 'passengers'], 'id': 1},)"
>>> ast.literal_eval(json.loads(t.task_args))
({'dataset_id': 2, 'height': 2600, 'width': 2600, 'plot_type': 'bar', 'params': {'x': 'month', 'y': 'passengers', 'hue': None}, 'columns': ['month', 'passengers'], 'id': 1},)

>>> ast.literal_eval(json.loads(t.task_args))[0]
{'dataset_id': 2, 'height': 2600, 'width': 2600, 'plot_type': 'bar', 'params': {'x': 'month', 'y': 'passengers', 'hue': None}, 'columns': ['month', 'passengers'], 'id': 1}

Do I correctly understand this commit was aimed to allow simply do json.loads(t.task_args) ?

@auvipy
Copy link
Member

auvipy commented Jan 25, 2021

this is a more recent PR #178 and @arnau126 opened a new PR in celery as well celery/celery#6595

@arnau126
Copy link
Contributor Author

Do I correctly understand this commit was aimed to allow simply do json.loads(t.task_args) ?

Yes, you understand correctly.

Seems like not really working

You're right, unfortunately not working, because of the new Celery task_protocol=2 and the hide-sensitive-information feature

I have a work-in-progress PR (celery/celery#6595) to fix it (to allow simply do json.loads(t.task_args)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants