
rgw/lua: allow access to object data #46550

Merged: 1 commit into ceph:main on Aug 15, 2022


@yuvalif (Contributor) commented Jun 7, 2022

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>

testing

  • using vstart cluster: MON=1 OSD=1 MDS=0 MGR=0 RGW=1 ../src/vstart.sh -n -d
  • sample script (data.lua):
RGWDebugLog("Data[0]="..tostring(Data[0]))
RGWDebugLog("Data[-1]="..tostring(Data[-1]))
RGWDebugLog("Data[99]="..tostring(Data[99]))
RGWDebugLog("Data[1]="..tostring(Data[1]))
RGWDebugLog("Data[8]="..tostring(Data[8]))
RGWDebugLog("Data[9]="..tostring(Data[9]))
RGWDebugLog("#Data="..tostring(#Data))

RGWDebugLog("ipairs");
for i, v in ipairs(Data) do
  RGWDebugLog("Data["..tostring(i).."]="..v)
end

RGWDebugLog("pairs");
for i, v in pairs(Data) do
  RGWDebugLog("Data["..tostring(i).."]="..v)
end
  • upload script: bin/radosgw-admin script put --infile=./data.lua --context=putData
  • local file to upload (data.txt):
ABCDEFG
  • create bucket, then upload and download an object:
aws --endpoint-url http://localhost:8000 s3 mb s3://kaboom
aws --endpoint-url http://localhost:8000 s3 cp data.txt s3://kaboom
aws --endpoint-url http://localhost:8000 s3 cp s3://kaboom/data.txt .
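The probes in data.lua (indices 0, -1, 99, 1, 8 and 9, plus #Data) test the edges of a chunk. Below is a minimal C++ model of the behaviour those probes presumably check. It assumes that Data follows Lua's 1-based array convention, that out-of-range indices yield nil, and that data.txt holds "ABCDEFG" plus a trailing newline (8 bytes). `lua_style_index` and `lua_style_length` are hypothetical helpers, not RGW functions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical model (not the RGW implementation) of a Lua-facing accessor
// over one data chunk. Assumption: indices are 1-based like Lua arrays, and
// anything outside [1, size] yields "no value" (nil on the Lua side).
std::optional<char> lua_style_index(const std::string& chunk, long long index) {
  if (index < 1 || static_cast<std::size_t>(index) > chunk.size()) {
    return std::nullopt;  // e.g. Data[0], Data[-1], Data[99]
  }
  // Lua index 1 maps to byte offset 0.
  return chunk[static_cast<std::size_t>(index - 1)];
}

// Counterpart of #Data under the same assumption: the byte count of the chunk.
std::size_t lua_style_length(const std::string& chunk) {
  return chunk.size();
}
```

Under these assumptions, Data[1] is "A", Data[8] is the newline, Data[0], Data[-1], Data[9] and Data[99] are nil, and #Data is 8.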


@github-actions bot commented Jun 8, 2022

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@yuvalif (Contributor Author) commented Jun 8, 2022

Comments from the RGW refactoring call:

  • split into two contexts (putData and getData)
  • add a parameter that lets the Lua script block the request
  • no need to pass offset and length into Lua

Tracing functions can be used only in `postRequest` context.

- ``Request.Trace.SetAttribute()`` - sets the attribute for the request's trace.
Takes two arguments. The first is the `key`, which should be a string. The second is the value, which can either be a string or a number.
Contributor:

First bit of line 322 isn’t a sentence. Suggest

“the request’s trace, and takes two arguments.”

Also if `key` is in backticks, shouldn’t `value` be as well?

Maybe clarify if “number” means an integer, a real, or either?

Contributor:

Hate to quibble, but I think you added "of" instead of "or"

yuvalif added a commit to yuvalif/ceph that referenced this pull request Jul 11, 2022
(will squash before merge)

Signed-off-by: yuval Lifshitz <ylifshit@redhat.com>
@yuvalif changed the title from "[WIP] rgw/lua: allow access to object data" to "rgw/lua: allow access to object data" on Jul 11, 2022
@yuvalif (Contributor Author) commented Jul 11, 2022

jenkins test make check

}
if (run_lua) {
filter = &*run_lua;
}
Contributor:

don't we want to add this lua filter last so its data isn't transformed by compression or encryption? have you tested these combinations?

Contributor Author:

running lua on compressed and encrypted data won't be useful.
this is why i added it first in the "put" case.
if someone wants to encrypt or compress the data after Lua reads it, there should not be any limitation.

anyway, will test to see that it works, and update the documentation accordingly

Contributor:

> running lua on compressed and encrypted data won't be useful.
> this is why i added it first in the "put" case.

right, i just think you have it backwards. this code is adding wrapper filters around the object processor, so the filter added last is the one on the outside, so it sees the data first

Contributor Author:

what about the "get" case:
https://github.com/ceph/ceph/pull/46550/files#diff-319900fb50bdb02677039fb81949243b9c0e7d98690f6fddf35e6ca5d5d52b2cR2308 ?
here i added the lua filter last, should i switch it to be first (so it runs last)?

Contributor:

the "get" case looks correct as-is. there we're wrapping filters around send_response_data(). you're adding the lua filter before decompress/decrypt filters, so your lua filter should see the data in plain-text. it's still worth testing to confirm

Contributor Author:

the "get" case was reading the compressed data.
i moved it to be first in 3ca1a48
so it is executed last
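The ordering question in this thread comes down to how wrapper filters compose: each filter holds a pointer to the next stage, so the one constructed last is outermost and receives incoming data first. The simplified C++ sketch below illustrates this. The class names and the "enc(...)" transform are hypothetical stand-ins for the RGW compression/encryption filters and the Lua filter, not the real RGW types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of wrapper-filter chaining.
struct DataSink {
  virtual ~DataSink() = default;
  virtual void handle(const std::string& chunk) = 0;
};

// Innermost stage: stands in for the object processor (PUT)
// or send_response_data() (GET).
struct RecordingSink : DataSink {
  std::vector<std::string> received;
  void handle(const std::string& chunk) override { received.push_back(chunk); }
};

// A filter wraps the next stage and forwards (possibly transformed) data to it.
struct Filter : DataSink {
  explicit Filter(DataSink* next) : next(next) {}
  DataSink* next;
};

// Stand-in for compression/encryption: transforms what it forwards.
struct TransformFilter : Filter {
  using Filter::Filter;
  void handle(const std::string& chunk) override { next->handle("enc(" + chunk + ")"); }
};

// Stand-in for the Lua filter: records what it sees, forwards it unchanged.
struct ObserveFilter : Filter {
  using Filter::Filter;
  std::vector<std::string> seen;
  void handle(const std::string& chunk) override {
    seen.push_back(chunk);
    next->handle(chunk);
  }
};

// Builds a two-filter chain in the given order and returns what the
// observer saw for a single chunk.
std::string observed_by_lua(bool lua_added_last, const std::string& chunk) {
  RecordingSink sink;
  if (lua_added_last) {
    TransformFilter transform(&sink);  // added first: inner
    ObserveFilter lua(&transform);     // added last: outer, sees raw data
    lua.handle(chunk);
    return lua.seen.at(0);
  }
  ObserveFilter lua(&sink);            // added first: inner
  TransformFilter transform(&lua);     // added last: outer, transforms first
  transform.handle(chunk);
  return lua.seen.at(0);
}
```

In this model, adding the observer last means it sees the plain bytes, and adding it first means it sees the transformed bytes. This matches the reasoning above for the PUT path. On the GET path, the same rule explains why the Lua filter had to be added first so that it runs after decompression/decryption.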

Global ``RGW`` Table
Data Context
--------------------
Both ``getdata`` and ``putdata`` contexts has a single field named ``Data`` which is read only, optional and iterable (byte by byte).
Contributor:

it's probably worth noting that these scripts only see one chunk of data per execution, right?
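If each execution of a data-context script only sees one chunk, any whole-object figure has to be accumulated in state that survives between executions. The minimal C++ sketch below shows that bookkeeping. `PerObjectSize` and keying by object name are assumptions for illustration. In a Lua script, the persistent state would presumably live somewhere like the Global RGW Table mentioned in the docs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical model: the data script runs once per chunk, so a whole-object
// total must be kept in state that outlives a single execution. The map keyed
// by object name is an illustrative stand-in for that persistent state.
class PerObjectSize {
 public:
  // Called once per chunk, mirroring one script execution.
  void on_chunk(const std::string& object, const std::string& chunk) {
    totals_[object] += chunk.size();
  }

  // Total bytes seen so far for an object (0 if none were seen).
  std::size_t total(const std::string& object) const {
    auto it = totals_.find(object);
    return it == totals_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, std::size_t> totals_;
};
```

A script that only logged the size of Data would report the size of each chunk, not of the object. The accumulated value is only the full object size once the last chunk has been processed.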


- ``Request.Trace.SetAttribute(<key>, <value>)`` - sets the attribute for the request's trace.
The function takes two arguments: the first is the ``key``, which should be a string, and the second is the ``value``, which can either be a string or a number (integer or double).
Using the attribute, you can locate specific traces.
Contributor:

You may then locate specific traces by using this attribute.


- ``Request.Trace.AddEvent(<name>, <attributes>)`` - adds an event to the first span of the request's trace
An event is defined by event name, event time, and zero or more event attributes.
Therefore, the function accepts one or two arguments. A string containing the event ``name`` should be the first argument, followed by the event ``attributes``, which is optional for events without attributes.
Contributor:

The function accepts one or two arguments: A string
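The two tracing calls described in these excerpts record different things. SetAttribute attaches a key (a string) and a value (a string, an integer or a double) to the request's trace. AddEvent appends a named, timestamped event with zero or more attributes. The C++ sketch below models only that data shape. `SpanModel`, `Event` and `AttributeValue` are hypothetical names and are not the RGW or OpenTelemetry tracing API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical data model of what the two tracing calls record.
// An attribute value is a string, an integer, or a double.
using AttributeValue = std::variant<std::string, std::int64_t, double>;
using Attributes = std::map<std::string, AttributeValue>;

// An event is a name, a timestamp, and zero or more attributes.
struct Event {
  std::string name;
  std::chrono::system_clock::time_point time;
  Attributes attributes;
};

struct SpanModel {
  Attributes attributes;
  std::vector<Event> events;

  // Mirrors SetAttribute(<key>, <value>).
  void set_attribute(const std::string& key, AttributeValue value) {
    attributes[key] = std::move(value);
  }

  // Mirrors AddEvent(<name>[, <attributes>]): the attributes are optional.
  void add_event(std::string name, Attributes attrs = {}) {
    events.push_back({std::move(name), std::chrono::system_clock::now(), std::move(attrs)});
  }
};
```

Keeping the value as a variant reflects the docs' "string or number (integer or double)" wording. The event timestamp is taken when the call is made, which is why AddEvent needs only one or two arguments.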

Global ``RGW`` Table
Data Context
--------------------
Both ``getdata`` and ``putdata`` contexts has a single field named ``Data`` which is read only, optional and iterable (byte by byte).
Contributor:

s/has/have/
s/read only/read-only/

@anthonyeleven (Contributor) left a comment:

Docs lgtm, a couple of small requests but not enough to block approval.

A request (pre or post) or data (get or put) context script may be constrained to operations belonging to a specific tenant's users.
The request context script can also access fields in the request and modify certain fields, as well as the `Global RGW Table`_.
The data context script can access the content of the object as well as the request fields and the `Global RGW Table`_.
All Lua language features can be used in all contexts.

By default, all lua standard libraries are available in the script, however, in order to allow for other lua modules to be used in the script, we support adding packages to an allowlist:
Contributor:

Lua capitalized?

By default, this directory would be `/tmp/luarocks/<entity name>`. Its prefix part (`/tmp/luarocks/`) could be set to a different location via the `rgw_luarocks_location` configuration parameter.
Note that this parameter should not be set to one of the default locations where luarocks install packages (e.g. `$HOME/.luarocks`, `/usr/lib64/lua`, `/usr/share/lua`)
By default, this directory would be ``/tmp/luarocks/<entity name>``. Its prefix part (``/tmp/luarocks/``) could be set to a different location via the ``rgw_luarocks_location`` configuration parameter.
Note that this parameter should not be set to one of the default locations where luarocks install packages (e.g. ``$HOME/.luarocks``, ``/usr/lib64/lua``, ``/usr/share/lua``)
Contributor:

Period at the end?

@@ -511,3 +526,39 @@ in `postRequest` context, we can add attributes and events to the request's trac

Request.Trace.AddEvent("second event", event_attrs)

- Calculate the entropy and size of uploaded objects and print to debug log
Contributor:

I'm curious, how is this entropy value useful?

Contributor Author:

could be used to detect encryption of objects. added this to the doc
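Entropy here presumably means Shannon entropy over byte values: H = -Σ p_i log2(p_i). It is measured in bits per byte and ranges from 0 for a constant buffer up to 8 for uniformly distributed bytes. Encrypted (and compressed) data tends toward the top of that range, which is what makes it usable as a heuristic. The C++ sketch below is a generic implementation of the formula, not the Lua example added to the docs. Any cut-off for "looks encrypted" would be an assumption to tune.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0).
// Values close to 8 suggest encrypted or compressed content.
double shannon_entropy(const std::string& data) {
  if (data.empty()) return 0.0;

  // Histogram of the 256 possible byte values.
  std::array<std::size_t, 256> counts{};
  for (unsigned char byte : data) {
    ++counts[byte];
  }

  // Sum -p * log2(p) over the byte values that actually occur.
  double entropy = 0.0;
  const double total = static_cast<double>(data.size());
  for (std::size_t count : counts) {
    if (count == 0) continue;
    const double p = static_cast<double>(count) / total;
    entropy -= p * std::log2(p);
  }
  return entropy;
}
```

Because a data-context script sees one chunk per execution, an entropy computed this way inside the script would describe a chunk. An object-level value would need the byte counts to be accumulated across executions first.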

The data context script can access the content of the object as well as the request fields and the `Global RGW Table`_.
All Lua language features can be used in all contexts.

By default, all Lua standard libraries are available in the script, however, in order to allow for other lua modules to be used in the script, we support adding packages to an allowlist:
Contributor:

There's a second s/lua/Lua/ on this line :)

Contributor Author:

fixed (together with some more lower case "lau") - hope this is the last round :-)

@yuvalif (Contributor Author) commented Jul 19, 2022

jenkins test api

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
@yuvalif (Contributor Author) commented Aug 7, 2022

jenkins test make check

@yuvalif (Contributor Author) commented Aug 7, 2022

http://pulpito.front.sepia.ceph.com/yuvalif-2022-08-07_13:01:20-rgw:verify-wip-yuval-lua-filter-distro-default-smithi/
teuthology is passing, except for the following failures (mostly known valgrind issues):

  • out of 24 failures, 22 are valgrind (the known boost/asio ones)
  • one failure is "reached maximum tries (90) after waiting for 540 seconds"
  • one failure is an SELinux check: "SELinux denials found on ubuntu@smithi117.front.sepia.ceph.com"

@yuvalif removed the needs-qa label on Aug 7, 2022
@cbodley (Contributor) commented Aug 8, 2022

@yuvalif please run the full rgw suite with --subset ~1/3, there are important regression tests in the other subsuites!

@yuvalif (Contributor Author) commented Aug 9, 2022

> @yuvalif please run the full rgw suite with --subset ~1/3, there are important regression tests in the other subsuites!

i ran with:
teuthology-suite --ceph-repo https://github.com/ceph/ceph-ci.git --subset 1/3 -p 75 -s rgw --ceph wip-yuval-lua-filter -m smithi
it is running 58 jobs, is this enough?

@yuvalif merged commit ad51b94 into ceph:main on Aug 15, 2022
@@ -2084,6 +2086,20 @@ int RGWGetObj::get_data_cb(bufferlist& bl, off_t bl_ofs, off_t bl_len)
return send_response_data(bl, bl_ofs, bl_len);
}

int RGWGetObj::get_lua_filter(std::unique_ptr<RGWGetObj_Filter>* filter, RGWGetObj_Filter* cb) {
std::string script;
const auto rc = rgw::lua::read_script(s, store, s->bucket_tenant, s->yield, rgw::lua::context::getData, script);
Contributor:

this broke the build on main. it looks like a conflict with a1e21d0 which changed the signature of read_script()

Contributor:

it's not clear how we're supposed to get that LuaManager all the way into RGWGetObj/PutObj. in the meantime, i prepared a revert at #47612
