rgw/lua: allow read access to object data #47719

Merged
merged 1 commit into from Sep 1, 2022
118 changes: 85 additions & 33 deletions doc/radosgw/lua-scripting.rst
@@ -6,18 +6,26 @@ Lua Scripting

.. contents::

This feature allows users to assign execution context to Lua scripts. The three supported contexts are ``preRequest``" which will execute a script before each
operation is performed, ``postRequest`` which will execute after each operation is performed, and ``background`` which will execute within a specified time interval.
A request context script may be constrained to operations belonging to a specific tenant's users.
The request context script can also access fields in the request and modify some fields. All Lua language features can be used.
This feature allows users to assign execution context to Lua scripts. The supported contexts are:

By default, all lua standard libraries are available in the script, however, in order to allow for other lua modules to be used in the script, we support adding packages to an allowlist:
- ``prerequest`` which will execute a script before each operation is performed
- ``postrequest`` which will execute after each operation is performed
Review comment (Contributor): Suggest s/operation/RGW operation/

- ``background`` which will execute within a specified time interval
- ``getdata`` which will execute on objects' data when objects are downloaded
- ``putdata`` which will execute on objects' data when objects are uploaded

A request (pre or post) or data (get or put) context script may be constrained to operations belonging to a specific tenant's users.
The request context script can also access fields in the request and modify certain fields, as well as the `Global RGW Table`_.
The data context script can access the content of the object as well as the request fields and the `Global RGW Table`_.
All Lua language features can be used in all contexts.

By default, all Lua standard libraries are available in the script. To allow other Lua modules to be used in the script, we support adding packages to an allowlist:

- All packages in the allowlist are being re-installed using the luarocks package manager on radosgw restart. Therefore a restart is needed for adding or removing of packages to take effect
- To add a package that contains C source code that needs to be compiled, use the `--allow-compilation` flag. In this case a C compiler needs to be available on the host
- Lua packages are installed in, and used from, a directory local to the radosgw. Meaning that lua packages in the allowlist are separated from any lua packages available on the host.
By default, this directory would be `/tmp/luarocks/<entity name>`. Its prefix part (`/tmp/luarocks/`) could be set to a different location via the `rgw_luarocks_location` configuration parameter.
Note that this parameter should not be set to one of the default locations where luarocks install packages (e.g. `$HOME/.luarocks`, `/usr/lib64/lua`, `/usr/share/lua`)
- To add a package that contains C source code that needs to be compiled, use the ``--allow-compilation`` flag. In this case a C compiler needs to be available on the host
- Lua packages are installed in, and used from, a directory local to the radosgw. Meaning that Lua packages in the allowlist are separated from any Lua packages available on the host.
By default, this directory would be ``/tmp/luarocks/<entity name>``. Its prefix part (``/tmp/luarocks/``) could be set to a different location via the ``rgw_luarocks_location`` configuration parameter.
Note that this parameter should not be set to one of the default locations where luarocks install packages (e.g. ``$HOME/.luarocks``, ``/usr/lib64/lua``, ``/usr/share/lua``).


.. toctree::
@@ -32,7 +40,7 @@ To upload a script:

::

# radosgw-admin script put --infile={lua-file} --context={preRequest|postRequest|background} [--tenant={tenant-name}]
# radosgw-admin script put --infile={lua-file} --context={prerequest|postrequest|background|getdata|putdata} [--tenant={tenant-name}]


* When uploading a script with the ``background`` context, a tenant name may not be specified.
@@ -42,14 +50,14 @@ To print the content of the script to standard output:

::

# radosgw-admin script get --context={preRequest|postRequest|background} [--tenant={tenant-name}]
# radosgw-admin script get --context={prerequest|postrequest|background|getdata|putdata} [--tenant={tenant-name}]


To remove the script:

::

# radosgw-admin script rm --context={preRequest|postRequest|background} [--tenant={tenant-name}]
# radosgw-admin script rm --context={prerequest|postrequest|background|getdata|putdata} [--tenant={tenant-name}]


Package Management via CLI
Expand Down Expand Up @@ -149,7 +157,7 @@ Request Fields
+----------------------------------------------------+----------+--------------------------------------------------------------+----------+-----------+----------+
| ``Request.Bucket.Tenant`` | string | tenant of the bucket | no | no | yes |
+----------------------------------------------------+----------+--------------------------------------------------------------+----------+-----------+----------+
| ``Request.Bucket.Name`` | string | bucket name (writeable only in `preRequest` context) | no | yes | no |
| ``Request.Bucket.Name`` | string | bucket name (writeable only in ``prerequest`` context) | no | yes | no |
+----------------------------------------------------+----------+--------------------------------------------------------------+----------+-----------+----------+
| ``Request.Bucket.Marker`` | string | bucket marker (initial id) | no | no | yes |
+----------------------------------------------------+----------+--------------------------------------------------------------+----------+-----------+----------+
@@ -306,12 +314,32 @@ Operations Log
~~~~~~~~~~~~~~
The ``Request.Log()`` function prints the requests into the operations log. This function has no parameters. It returns 0 for success and an error code if it fails.

Tracing
~~~~~~~
Tracing functions can be used only in the ``postrequest`` context.

- ``Request.Trace.SetAttribute(<key>, <value>)`` - sets the attribute for the request's trace.
The function takes two arguments: the first is the ``key``, which should be a string, and the second is the ``value``, which can either be a string or a number (integer or double).
You may then locate specific traces by using this attribute.

- ``Request.Trace.AddEvent(<name>, <attributes>)`` - adds an event to the first span of the request's trace
An event is defined by event name, event time, and zero or more event attributes.
The function accepts one or two arguments: A string containing the event ``name`` should be the first argument, followed by the event ``attributes``, which is optional for events without attributes.
An event's attributes must be a table of strings.

Background Context
--------------------
The ``background`` context may be used for purposes that include analytics, monitoring, and caching data for other context executions.
- Background script execution default interval is 5 seconds.

Global ``RGW`` Table
Data Context
--------------------
Both ``getdata`` and ``putdata`` contexts have the following fields:

- ``Data``, which is read-only and iterable (byte by byte). If an object is uploaded or retrieved in multiple chunks, the ``Data`` field holds the data of one chunk at a time.
- ``Offset``, which holds the offset of the chunk within the entire object.
- The ``Request`` fields and the background ``RGW`` table are also available in these contexts.

Global RGW Table
--------------------
The ``RGW`` Lua table is accessible from all contexts and saves data written to it
during execution so that it may be read and used later during other executions, from the same context or a different one.
@@ -330,19 +358,6 @@ to atomically increment and decrement numeric values in it. For that the followi
- if we try to increment or decrement by non-numeric values, the execution of the script would fail
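
The atomic increment described above can be pictured as a map guarded by a mutex. The following is a minimal standalone C++ sketch, not the RGW implementation: ``SharedTable``, ``set`` and ``increment`` are hypothetical names, and throwing an exception stands in for "the execution of the script would fail". Rejecting a missing or non-numeric stored value is an assumption of the sketch; the documentation above only states the rule for non-numeric increments.

```cpp
#include <cstdint>
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>

// Hypothetical illustration only; names and behavior are assumptions.
class SharedTable {
 public:
  using Value = std::variant<std::int64_t, double, std::string>;

  void set(const std::string& key, Value v) {
    std::lock_guard<std::mutex> lock(mutex_);
    map_[key] = std::move(v);
  }

  // Adds delta to the stored value while holding the lock, so concurrent
  // callers never lose an update. Throws on non-numeric input.
  Value increment(const std::string& key, const Value& delta) {
    if (std::holds_alternative<std::string>(delta)) {
      throw std::invalid_argument("increment must be numeric");
    }
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = map_.find(key);
    if (it == map_.end() || std::holds_alternative<std::string>(it->second)) {
      throw std::invalid_argument("value is missing or not numeric: " + key);
    }
    Value& cur = it->second;
    if (std::holds_alternative<std::int64_t>(cur) &&
        std::holds_alternative<std::int64_t>(delta)) {
      // Integer plus integer stays an integer.
      cur = std::get<std::int64_t>(cur) + std::get<std::int64_t>(delta);
    } else {
      // Any floating-point operand promotes the result to double.
      cur = as_double(cur) + as_double(delta);
    }
    return cur;
  }

 private:
  static double as_double(const Value& v) {
    if (const auto* i = std::get_if<std::int64_t>(&v)) {
      return static_cast<double>(*i);
    }
    return std::get<double>(v);
  }

  std::mutex mutex_;
  std::unordered_map<std::string, Value> map_;
};
```

Holding the lock across the read, the addition and the write is what makes the update atomic; a separate read followed by a write from the script could interleave with another request.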


Tracing
~~~~~~~
Tracing functions can be used only in `postRequest` context.

- ``Request.Trace.SetAttribute()`` - sets the attribute for the request's trace.
Takes two arguments. The first is the `key`, which should be a string. The second is the value, which can either be a string or a number.
Using the attribute, you can locate specific traces.

- ``Request.Trace.AddEvent()`` - adds an event to the first span of the request's trace
An event is defined by event name, event time, and zero or more event attributes.
Therefore, the function accepts one or two arguments. A string containing the event name should be the first argument, followed by the event attributes, which is optional for events without attributes.
An event's attributes must be a table of strings.

Lua Code Samples
----------------
- Print information on source and destination objects in case of copy:
@@ -419,15 +434,15 @@ Lua Code Samples

- Add metadata to objects that was not originally sent by the client:

In the `preRequest` context we should add:
In the ``prerequest`` context we should add:

.. code-block:: lua

  if Request.RGWOp == 'put_obj' then
    Request.HTTP.Metadata["x-amz-meta-mydata"] = "my value"
  end

In the `postRequest` context we look at the metadata:
In the ``postrequest`` context we look at the metadata:

.. code-block:: lua

@@ -446,7 +461,7 @@ First we should add the following packages to the allowlist:
# radosgw-admin script-package add --package=luasocket --allow-compilation


Then, do a restart for the radosgw and upload the following script to the `postRequest` context:
Then, restart the radosgw and upload the following script to the ``postrequest`` context:

.. code-block:: lua

@@ -483,20 +498,20 @@ Tracing is disabled by default, so we should enable tracing for this specific bu


If `tracing is enabled <https://docs.ceph.com/en/latest/jaegertracing/#how-to-enable-tracing-in-ceph/>`_ on the RGW, the value of Request.Trace.Enable is true, so we should disable tracing for all other requests that do not match the bucket name.
In the `preRequest` context:
In the ``prerequest`` context:

.. code-block:: lua

  if Request.Bucket.Name ~= "my-bucket" then
    Request.Trace.Enable = false
  end

Note that changing `Request.Trace.Enable` does not change the tracer's state, but disables or enables the tracing for the request only.
Note that changing ``Request.Trace.Enable`` does not change the tracer's state, but disables or enables the tracing for the request only.


- Add Information for requests traces

in `postRequest` context, we can add attributes and events to the request's trace.
In the ``postrequest`` context, we can add attributes and events to the request's trace.

.. code-block:: lua

@@ -511,3 +526,40 @@ in `postRequest` context, we can add attributes and events to the request's trac

Request.Trace.AddEvent("second event", event_attrs)

- The entropy value of an object could be used to detect whether the object is encrypted.
  The following script calculates the entropy and size of uploaded objects, chunk by chunk, and prints them to the debug log.

In the ``putdata`` context, add the following script:

.. code-block:: lua

  function object_entropy()
    -- histogram of byte values; Lua tables are 1-based, so byte b is stored at b+1
    local byte_hist = {}
    local byte_hist_size = 256
    for i = 1,byte_hist_size do
      byte_hist[i] = 0
    end
    local total = 0

    -- Data holds the current chunk and is iterable byte by byte
    for i, c in pairs(Data) do
      local byte = c:byte() + 1
      byte_hist[byte] = byte_hist[byte] + 1
      total = total + 1
    end

    local entropy = 0

    -- entropy with logarithm base 256, so the result is between 0 and 1
    for _, count in ipairs(byte_hist) do
      if count ~= 0 then
        local p = 1.0 * count / total
        entropy = entropy - (p * math.log(p)/math.log(byte_hist_size))
      end
    end

    return entropy
  end

  local full_name = Request.Bucket.Name.."\\"..Request.Object.Name
  RGWDebugLog("entropy of chunk of: " .. full_name .. " at offset:" .. tostring(Offset) .. " is: " .. tostring(object_entropy()))
  RGWDebugLog("payload size of chunk of: " .. full_name .. " is: " .. #Data)
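
To sanity-check the values this script logs, the same per-chunk calculation can be reproduced outside RGW. The following is a minimal C++ sketch and not part of this PR: ``chunk_entropy`` is a hypothetical helper name, and the chunk is assumed to be available as a byte string. Like the Lua script, it uses a logarithm of base 256, so the result ranges from 0 (all bytes identical) to 1 (all 256 byte values equally frequent).

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of RGW): normalized Shannon entropy of one
// chunk, using log base 256 as the Lua sample does.
double chunk_entropy(const std::string& chunk) {
  if (chunk.empty()) {
    return 0.0;  // nothing to measure
  }
  std::array<std::size_t, 256> byte_hist{};  // zero-initialized histogram
  for (unsigned char c : chunk) {
    ++byte_hist[c];
  }
  const double total = static_cast<double>(chunk.size());
  double entropy = 0.0;
  for (std::size_t count : byte_hist) {
    if (count != 0) {
      const double p = static_cast<double>(count) / total;
      entropy -= p * std::log(p) / std::log(256.0);
    }
  }
  return entropy;
}
```

Because the script runs once per chunk, a multi-chunk upload produces one entropy value per chunk rather than one for the whole object; the ``Offset`` field in the log line identifies which chunk each value belongs to.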

5 changes: 3 additions & 2 deletions src/rgw/CMakeLists.txt
@@ -162,8 +162,11 @@ set(librgw_common_srcs
rgw_datalog.cc
rgw_datalog_notify.cc
cls_fifo_legacy.cc
rgw_log.cc
rgw_lua_request.cc
rgw_lua_utils.cc
rgw_lua.cc
rgw_lua_data_filter.cc
rgw_bucket_encryption.cc
rgw_tracer.cc
rgw_lua_background.cc)
@@ -275,8 +278,6 @@ set(rgw_a_srcs
rgw_frontend.cc
rgw_http_client_curl.cc
rgw_loadgen.cc
rgw_log.cc
rgw_lua_request.cc
rgw_period_pusher.cc
rgw_realm_reloader.cc
rgw_realm_watcher.cc
11 changes: 7 additions & 4 deletions src/rgw/rgw_admin.cc
@@ -121,6 +121,9 @@ static inline int posix_errortrans(int r)
return r;
}


static const std::string LUA_CONTEXT_LIST("prerequest, postrequest, background, getdata, putdata");

void usage()
{
cout << "usage: radosgw-admin <cmd> [options...]" << std::endl;
@@ -479,7 +482,7 @@ void usage()
cout << " --subscription pubsub subscription name\n";
cout << " --event-id event id in a pubsub subscription\n";
cout << "\nScript options:\n";
cout << " --context context in which the script runs. one of: preRequest, postRequest, background\n";
cout << " --context context in which the script runs. one of: "+LUA_CONTEXT_LIST+"\n";
cout << " --package name of the lua package that should be added/removed to/from the allowlist\n";
cout << " --allow-compilation package is allowed to compile C code as part of its installation\n";
cout << "\nradoslist options:\n";
@@ -10395,7 +10398,7 @@ int main(int argc, const char **argv)
}
const rgw::lua::context script_ctx = rgw::lua::to_context(*str_script_ctx);
if (script_ctx == rgw::lua::context::none) {
cerr << "ERROR: invalid script context: " << *str_script_ctx << ". must be one of: preRequest, postRequest, background" << std::endl;
cerr << "ERROR: invalid script context: " << *str_script_ctx << ". must be one of: " << LUA_CONTEXT_LIST << std::endl;
return EINVAL;
}
if (script_ctx == rgw::lua::context::background && !tenant.empty()) {
@@ -10417,7 +10420,7 @@ int main(int argc, const char **argv)
}
const rgw::lua::context script_ctx = rgw::lua::to_context(*str_script_ctx);
if (script_ctx == rgw::lua::context::none) {
cerr << "ERROR: invalid script context: " << *str_script_ctx << ". must be one of: preRequest, postRequest, background" << std::endl;
cerr << "ERROR: invalid script context: " << *str_script_ctx << ". must be one of: " << LUA_CONTEXT_LIST << std::endl;
return EINVAL;
}
auto lua_manager = store->get_lua_manager();
@@ -10441,7 +10444,7 @@ int main(int argc, const char **argv)
}
const rgw::lua::context script_ctx = rgw::lua::to_context(*str_script_ctx);
if (script_ctx == rgw::lua::context::none) {
cerr << "ERROR: invalid script context: " << *str_script_ctx << ". must be one of: preRequest, postRequest, background" << std::endl;
cerr << "ERROR: invalid script context: " << *str_script_ctx << ". must be one of: " << LUA_CONTEXT_LIST << std::endl;
return EINVAL;
}
auto lua_manager = store->get_lua_manager();
7 changes: 7 additions & 0 deletions src/rgw/rgw_common.h
@@ -51,6 +51,10 @@ namespace rgw::sal {
using Attrs = std::map<std::string, ceph::buffer::list>;
}

namespace rgw::lua {
class Background;
}

using ceph::crypto::MD5;

#define RGW_ATTR_PREFIX "user.rgw."
@@ -1810,6 +1814,9 @@ struct req_state : DoutPrefixProvider {
//Principal tags that come in as part of AssumeRoleWithWebIdentity
std::vector<std::pair<std::string, std::string>> principal_tags;

rgw::lua::Background* lua_background = nullptr;
rgw::sal::LuaManager* lua_manager = nullptr;

Review comment (Contributor) on lines +1817 to +1819: 👍

req_state(CephContext* _cct, RGWEnv* e, uint64_t id);
~req_state();

10 changes: 10 additions & 0 deletions src/rgw/rgw_lua.cc
@@ -25,6 +25,12 @@ context to_context(const std::string& s)
if (strcasecmp(s.c_str(), "background") == 0) {
return context::background;
}
if (strcasecmp(s.c_str(), "getdata") == 0) {
return context::getData;
}
if (strcasecmp(s.c_str(), "putdata") == 0) {
return context::putData;
}
return context::none;
}
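
The branches above compare context names case-insensitively with `strcasecmp`, so a caller passing `getData` or `GETDATA` resolves to the same context, while `to_string` emits the lowercase spelling. A minimal standalone sketch of that round trip follows. It is an illustration, not the code in `rgw_lua.cc`: the `sketch` namespace, the `entry` lookup table, and the `"none"` fallback string are assumptions, and the `prerequest`/`postrequest` branches (not visible in this hunk) are assumed to follow the same pattern.

```cpp
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

namespace sketch {  // illustrative namespace, not rgw::lua

enum class context { preRequest, postRequest, background, getData, putData, none };

struct entry {
  const char* name;  // canonical lowercase spelling
  context ctx;
};

// Same order as the names listed in LUA_CONTEXT_LIST.
constexpr entry entries[] = {
  {"prerequest", context::preRequest},
  {"postrequest", context::postRequest},
  {"background", context::background},
  {"getdata", context::getData},
  {"putdata", context::putData},
};

// Case-insensitive lookup; unknown names map to context::none.
context to_context(const std::string& s) {
  for (const auto& e : entries) {
    if (strcasecmp(s.c_str(), e.name) == 0) {
      return e.ctx;
    }
  }
  return context::none;
}

// Reverse lookup; the "none" fallback is an assumption of this sketch.
std::string to_string(context ctx) {
  for (const auto& e : entries) {
    if (e.ctx == ctx) {
      return e.name;
    }
  }
  return "none";
}

}  // namespace sketch
```

A single table keeps the two directions in sync when a context is added, at the cost of a linear scan; with five entries that cost is negligible.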

@@ -37,6 +43,10 @@ std::string to_string(context ctx)
return "postrequest";
case context::background:
return "background";
case context::getData:
return "getdata";
case context::putData:
return "putdata";
case context::none:
break;
}
3 changes: 3 additions & 0 deletions src/rgw/rgw_lua.h
@@ -7,6 +7,7 @@
#include "common/dout.h"
#include "rgw_sal_fwd.h"

class DoutPrefixProvider;
class lua_State;
class rgw_user;
class DoutPrefixProvider;
@@ -21,6 +22,8 @@ enum class context {
preRequest,
postRequest,
background,
getData,
putData,
none
};

2 changes: 1 addition & 1 deletion src/rgw/rgw_lua_background.cc
@@ -177,5 +177,5 @@ void Background::create_background_metatable(lua_State* L) {
create_metatable<rgw::lua::RGWTable>(L, true, &rgw_map, &table_mutex);
}

} //namespace lua
} //namespace rgw::lua

2 changes: 1 addition & 1 deletion src/rgw/rgw_lua_background.h
@@ -226,5 +226,5 @@ class Background : public RGWRealmReloader::Pauser {
void resume(rgw::sal::Store* _store) override;
};

} //namespace lua
} //namespace rgw::lua