Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for multiple filenames with same part name #28

Merged
merged 10 commits into from Jun 24, 2020
Merged

Added support for multiple filenames with same part name #28

merged 10 commits into from Jun 24, 2020

Conversation

nikolaydubina
Copy link
Contributor

@nikolaydubina nikolaydubina commented May 13, 2020

Problem

It is not uncommon to attach multiple files at once to the same multipart field. In particular, we found this issue at @fundingasiagroup in one of our plugins. When multiple files are attached in request to the same part name, behavior is unstable. For example in this scenario,

  • if set_simple is not used, multipart data encode and decode works correctly
  • if set_simple is used, one of the files is missing after decode

This is happening because "data" structure holds all data, yet "indexes" structure overwrites indexes for files since they share part name.

Fix

If name of part repeats, append _part_number_ with number of part to the name in "indexes" structure.

Compatibility

  • When this rock is used without repeating "name" parts, all API and behavior is the same as before.
  • This works if "filename" repeats or skipped.
  • This works for any field, not necessary files, although files would be primary scenario according to IETF standards.

Test plan

  • existing tests pass
  • added new test case

More examples

Here is HTTP output of how Postman would encode it:

Screenshot 2020-05-13 at 5 38 14 PM

POST ZZZZZ
Host: FFFFF
Authorization: Bearer  XXXX
Cookie: __cfduid=YYYYYYYContent-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW

----WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="files"; filename="file2.txt"
Content-Type: text/plain

(data)
----WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="files"; filename="file3.txt"
Content-Type: text/plain

(data)
----WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="files"; filename="file1.txt"
Content-Type: text/plain

(data)
----WebKitFormBoundary7MA4YWxkTrZu0gW

@bungle
Copy link
Member

bungle commented May 13, 2020

@nikolaydubina, one question: does this work for all fields or just file fields? It would be best if it works for all, as there is nothing special in my opinion with files.

@nikolaydubina
Copy link
Contributor Author

nikolaydubina commented May 13, 2020

Since it relies on "filename" for key, this would work only for files.

It seems that "name" is supposed to be unique in form data RFC2388. And widely used format for multiple files is to have same "name" 4.3 RFC7578, with filename could be omitted sometimes 4.2 RFC7578. In later case, this code would not work. Moreover, case when "filename" and "name" are the same but content is different is also easy to imagine.

Perhaps a better way would be to add counter to part key instead of filename. This would be more generic and solve issues above. I will update PR.

@nikolaydubina
Copy link
Contributor Author

Hi @bungle, I have updated the PR to handle case you described and fixed few edge cases.

@nikolaydubina
Copy link
Contributor Author

ping @bungle

@nikolaydubina
Copy link
Contributor Author

Hi @subnetmarco @bungle, I appreciate if you guys review and hopefully merge this fix, in that case we would not need to maintain our fork of this repo.

@bungle
Copy link
Member

bungle commented Jun 10, 2020

@nikolaydubina, I try to get it in next few days.

@bungle
Copy link
Member

bungle commented Jun 10, 2020

@nikolaydubina can you give example of how user uses this addition to read multiple values if the field is posted multiple times with same name? I see tests that use something called internal etc., but how does this look on public facing api, like get_all and get (perhaps it needs parameter to return something like an array of values or something?

@nikolaydubina
Copy link
Contributor Author

nikolaydubina commented Jun 11, 2020

Thanks for getting back @bungle. Good call. I have added tests for interface.

@bungle
Copy link
Member

bungle commented Jun 11, 2020

I see, thanks! While the suffix _part_number_<n> is one solution, what happens if user sends files_part_number_2 and two files. I know it is a made up, and not real world problem. But what I was thinking, could we have:

local files = m:get_as_array("files")
print(files[1].value)
print(files[2].value)

local single = m:get_as_array("single")
print(single[1].value)

and

local data = m:get_all_with_arrays()
print(data.files[1])
print(data.files[2])
print(data.single)

Or something like that?

@bungle
Copy link
Member

bungle commented Jun 12, 2020

@nikolaydubina what do you think about the ^

@nikolaydubina
Copy link
Contributor Author

nikolaydubina commented Jun 12, 2020

Hi, if it was only up to me, I void keep original version. However, since you guys maintain Kong you may know better and take precautions. API you suggested sounds reasonable, I will update PR. One clarification, fields that repeat will not be present in :get and :get_all right?

@bungle
Copy link
Member

bungle commented Jun 12, 2020

One clarification, fields that repeat will not be present in :get and :get_all right?

Fields that repeat should stay the same as before (with get / get_all), was it so it takes first or last value (and ignores others, I'd prefer if it just took the first but let's keep the original)?

@bungle
Copy link
Member

bungle commented Jun 12, 2020

The:

local data = m:get_all_with_arrays()

Woud effectively be the same as:
https://github.com/openresty/lua-nginx-module#ngxreqget_post_args
or
https://github.com/openresty/lua-nginx-module#ngxreqget_headers

(but for multipart/form-data input)

When it comes to multiple values (e.g. same name multiple times -> lua array, otherwise string)

@bungle
Copy link
Member

bungle commented Jun 17, 2020

@nikolaydubina, I think if you could make those changes, we could merge this. I will then prepare a new release. Thank you! Let me know if there are issues with my proposal.

@nikolaydubina
Copy link
Contributor Author

Thanks for checking on this! Will update PR sometime this week. Was sidetracked by other projects.

@nikolaydubina
Copy link
Contributor Author

nikolaydubina commented Jun 24, 2020

Hello @bungle, I have updated PR. Now it does not have "_part_number" anymore. All parts have array (table) of indexes. I also added two API changes you requested:

  1. get returns first value for part, and get_all returns mapping part_name -> first part value
  2. get_as_array and get_all_with_arrays return arrays. However, if part name does not repeat, then I still return as array, IMO this is better for users since it is clear from function name that it should return array, if it returns not an array it is confusing.

A couple of methods like delete and set_simple has to be updated to accomodate multiple indexes.

Updates tests. Added test case for new functions. Added whitebox test asserts for indexes.

@bungle
Copy link
Member

bungle commented Jun 24, 2020

Perhaps we need two then:

get_all_with_arrays -- only return repeating as arrays

and

get_all_as_arrays -- return everything as arrays

Where with one works the same as:
https://github.com/openresty/lua-nginx-module#ngxreqget_headers

Right now we don't have equivalent for that, and it would be expected by OR users.

@nikolaydubina
Copy link
Contributor Author

Yes this is better. Added this API and test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants