Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process JSON Body of ListObjects using streaming json parser #254

Closed
wants to merge 1 commit into from

Conversation

ilackarms
Copy link

This PR changes Kubeclient::get_entities as follows:

  1. Use RestClient::Request.execute in place of RestClient::Resource.get to read HTTP Response body in chunks.
  2. Use Json::Streamer to parse chunks as a JSON stream.
  3. If the caller passes a block, we yield entities as they are parsed from the stream of json. Otherwise we collect entities and return them as a Kubeclient::Common::EntityList as before.

@ilackarms
Copy link
Author

@agrare please review; @simon3z cc

@moolitayer
Copy link
Collaborator

moolitayer commented Jun 29, 2017

Just noticed ClientMixin uses rest-client while WatchStream uses http.
Can we get rid of one of the two? Also it seems WatchStream was created for stream processing, maybe the new code should live there?

We should probably have one implementation for streaming.

@ilackarms
Copy link
Author

@moolitayer it's certainly possible to use RestClient in place of HTTP:Client in WatchStream. However I think that is outside the scope of this PR

@ilackarms
Copy link
Author

@moolitayer kubeclient.get_entities parses the nested json inside of a single large kubernetes list object (as a single json object), while watchstream.each parses json objects as strings separated by newlines. there's probably some overlapping functionality that can be refactored there, but they will require different implementations

@@ -115,7 +117,7 @@ def handle_exception
end
err_message = json_error_msg['message'] || e.message
error_klass = e.http_code == 404 ? ResourceNotFoundError : HttpError
raise error_klass.new(e.http_code, err_message, e.response)
raise error_klass.new((e.http_code || 404), err_message, e.response)
Copy link
Member

@NickLaMuro NickLaMuro Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix some of the rubocop errors you are getting you might want to consider a helper method for this:

-  error_klass = e.http_code == 404 ? ResourceNotFoundError : HttpError
-  raise error_klass.new(e.http_code, err_message, e.response)
+  raise http_error(err_message, e.response, e.http_code)

+def http_error(err_message, response, http_code = 404)
+  error_klass = http_code == 404 ? ResourceNotFoundError : HttpError
+  error_klass.new(http_code, err_message, response)
+end

EDIT: Had to fix some errors in my psuedo code. (please use with caution)

@ilackarms ilackarms force-pushed the streamingjson branch 3 times, most recently from 28c4b41 to 468415e Compare June 29, 2017 17:25
@agrare
Copy link
Member

agrare commented Jun 29, 2017

@ilackarms @abonas The interface here is great, without a block this behaves exactly as it used to and with a block I can parse entities one at a time.

err_message = json_error_msg['message'] || e.message
error_klass = e.http_code == 404 ? ResourceNotFoundError : HttpError
raise error_klass.new((e.http_code || 404), err_message, e.response)
end
Copy link
Collaborator

@moolitayer moolitayer Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The net code change here is

- raise error_klass.new(e.http_code, err_message, e.response)
+ raise error_klass.new((e.http_code || 404), err_message, e.response)

looks good

@@ -30,4 +30,5 @@ Gem::Specification.new do |spec|
spec.add_dependency 'rest-client'
spec.add_dependency 'recursive-open-struct', '~> 1.0.4'
spec.add_dependency 'http', '~> 2.2.2'
spec.add_dependency 'json-streamer'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this library has 1 contributor and from my impression low activity/users/forks etc.
why was this specific one selected?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the only one I'm aware of that provides the high-level utility of parsing whole objects out of streaming json bodies. I agree that it's a small project and preferable to use something with a bit of a community around it.

there is https://github.com/dgraham/json-stream which has more stars (if that's an indicator or anything). a bit lower level than json-streamer, it parses tokens from a streaming body. However in order to use it, we'd effectively have to duplicate the work that json-streamer takes care of for us (filtering objects by nesting level, key name, etc.).

I'm open to changing it to a more active/popular project. Do you have any suggestions?

Copy link
Member

@abonas abonas Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unfamiliar so can't recommend anything specific. But I think that it should be taken into consideration shall fixes/bugs will be needed in that lib, and there's only one contributor/maintainer (so there's a risk that he/she also stop developing it in any moment)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would you recommend?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json-streamer seems to be based on json-stream if I understand correctly[1]
(It also depends on it) makes sense not to re-implement it. The only dependency it bring in is json-stream which seems to me of high quality.

@cben @jrafanie @grosser @kirs an emoji of a penny for your thoughts on adding this dependency?

💰
(closest thing I have)

[1] dgraham/json-stream#4 (comment)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben personally i like the way the current patch is designed. one option i would consider is yielding chunks to the passed block if as: :raw option is specified, so the caller can handle parsing on its own. however I'm a big fan of decoupling; we should allow kubeclient to abstract the kube API as much as possible from callers, so they do not have to concern themselves with "low level" things like parsing JSON, specifying API endpoints, etc. just my $0.02

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilackarms @abonas @cben @jrafanie @kirs @moolitayer

Hi fellow developers,

I'm the creator and maintainer of the above mentioned json-streamer gem.

Just wanted to let you know that I'm actively keeping an eye on my open source projects. This one did not require much attention lately hence the last release date is May 2017. I will take a look at the issue you raised. If it's reproducible it shouldn't take long to provide a fix.

Regarding features: IMO it fills a big hole that the currently available JSON parsers leave open and does that in a neat way. I found no similar implementation when I started but json_stream_trigger is actually a good catch. However it doesn't seem to be maintained. I will consider supporting JSONPath in the future.

Regarding community: I know quite a few use it and the number increases steadily since it's likely the best option out there. I'm not sure though how easy it is to find out about it. Since @dgraham decided not to mention it as an alternative to json-stream it's not really advertised.

Regarding maintenance: I'm open to move the gem to an organization if you have any suggestions. For now all you have is my word. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: it looks like we will not be able to use json-streamer to do what we want here. it's currently possible to yield objects one-by-one from the items array in a kubernetes entity list, under the assumption that items is the only top-level array found in the JSON structure. it's not currently possible to specify the array by key name (see discussion at thisismydesign/json-streamer#7 (comment)).

thus we have two options: implement our own json-streamer type of gem, or accept the workaround which would break if Kubernetes change the JSON structure of *list objects.

i would be in favor of writing our own gem, but this will be additional tech debt for us to maintain

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought of a solution since: thisismydesign/json-streamer#7 (comment)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben appears to be working now. updated! 👍


def http_err(json_error_msg, e)
err_message = json_error_msg['message'] || e.message
error_klass = e.http_code == 404 ? ResourceNotFoundError : HttpError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way not to hardcode 404 here and in the row below?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when using Restclient::Request.execute (line 254), RestClient doesn't throw errors on 404 / Resource Not Found. I think looking for a 404 status is the most consistent way to check that the requested resource is not found here. Other than checking for Not Found in the response string (which may or may not be present).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonas was talking about extracting 404 to a constant if I understand correctly. I'm okay with both ways.

@@ -1,4 +1,4 @@
# Kubernetes REST-API Client
module Kubeclient
VERSION = '2.3.0'.freeze
VERSION = '2.4.0'.freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version bumps should be a separate change

@@ -233,6 +233,27 @@ def create_rest_client(path = nil)
RestClient::Resource.new(@api_endpoint.merge(path).to_s, options)
end

def do_rest_request(args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks really similar to create_rest_client. Is there anyway to reuse it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting the common options should be enough actually (common_options?)

Also consider keeping the pattern we have in this file of passing path and headers later on and memoize the client if possible.

rest_client[ns_prefix + resource_name]
  .get({ 'params' => params }.merge(@headers))
end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can probably switch to using do_rest_request everywhere.
except rest_client, create_rest_client are public, in theory someone might depend on them.

another direction: you only did this to be able to pass :block_response, right?
I think RestClient::Resource can pass through any option, including :block_response...
https://github.com/rest-client/rest-client/blob/master/lib/restclient/resource.rb#L51

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(to clarify, I don't mind the textual duplication, but I'd rather avoid going through 2 code paths in RestClient.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben I agree that 2 code paths are bad. Unfortunately, you can't pass :block_response to RestClient.get, et al. (see https://github.com/rest-client/rest-client#block_response-receives-raw-nethttpresponse), so all the other code paths would have to be refactored to use RestClient::Request.execute rather than create_rest_client as they currently do. Since the scope of this change would be pretty large, I didn't want to include it in my PR.

Copy link
Collaborator

@cben cben Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Agreed, rewriting everything to use execute is riskier change, might eventually be worth it, but not for this PR. 👍

Gemfile Outdated
@@ -1,4 +1,6 @@
source 'https://rubygems.org'

gem 'json-streamer', :git => "git://github.com/thisismydesign/json-streamer"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we depending on source and not on a gem @ilackarms

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6d46d6d I saw it was removed in another commit (which also looks redundant)

@@ -114,8 +116,8 @@ def handle_exception
{}
end
err_message = json_error_msg['message'] || e.message
error_klass = e.http_code == 404 ? ResourceNotFoundError : HttpError
raise error_klass.new(e.http_code, err_message, e.response)
error_klass = e.http_code == 404 || e.instance_of?(RestClient::NotFound) ? ResourceNotFoundError : HttpError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is getting a little too complicated for a ternary expression.
Consider switching it to a dumb if

assert_equal(1, pods.size)
assert_equal('Kubeclient::Pod', pods[0].class.to_s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this changes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a kubernetes PodList object has a field Kind which is essentially its class name. Since we're no longer returning a PodList (if block_given?), Kind is no longer available to us. Since i removed the line assert_equal('Pod', pods.kind), I wanted to preserve the logic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh so this is for the old code
👍

@ilackarms
Copy link
Author

@abonas is this ok for merge?

@abonas
Copy link
Member

abonas commented Aug 22, 2017

@abonas is this ok for merge?

I did not see comments from 8 days ago adressed, so I don't think it's ready for merge yet.

resource_version_streamer = Json::Streamer::JsonStreamer.new

resource_version_streamer.get(key: 'resourceVersion') do |value|
resource_version = value unless resource_version
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably avoid the streamer in requests that don't ask for it.
(for example if it is broken it should not effect other requests)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean, use the old model (synchronous GET)? how should the caller ask for it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user did not pass a block we probably want a synchronous request.
See: #263 (comment)

In that case we probably want the resource version code to behave as before

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moolitayer updated

@@ -30,4 +30,5 @@ Gem::Specification.new do |spec|
spec.add_dependency 'rest-client'
spec.add_dependency 'recursive-open-struct', '~> 1.0.4'
spec.add_dependency 'http', '~> 2.2.2'
spec.add_dependency 'json-streamer'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json-streamer seems to be based on json-stream if I understand correctly[1]
(It also depends on it) makes sense not to re-implement it. The only dependency it bring in is json-stream which seems to me of high quality.

@cben @jrafanie @grosser @kirs an emoji of a penny for your thoughts on adding this dependency?

💰
(closest thing I have)

[1] dgraham/json-stream#4 (comment)

@@ -263,7 +284,8 @@ def watch_entities(resource_name, options = {})
# :namespace - the namespace of the entity.
# :label_selector - a selector to restrict the list of returned objects by their labels.
# :field_selector - a selector to restrict the list of returned objects by their fields.
def get_entities(entity_type, klass, resource_name, options = {})
def get_entities(entity_type, klass, resource_name, options = {}, &block)
return get_entities_async(klass, resource_name, options, &block) if block_given?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

refute_empty(services)
assert_equal(2, services.size)
assert_instance_of(Kubeclient::Service, services[0])
assert_instance_of(Kubeclient::Service, services[1])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind checking one service against it's expected value?
Just to be on the safe side

assert_equal(1, pods.size)
assert_equal('Kubeclient::Pod', pods[0].class.to_s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh so this is for the old code
👍

@@ -233,6 +233,27 @@ def create_rest_client(path = nil)
RestClient::Resource.new(@api_endpoint.merge(path).to_s, options)
end

def do_rest_request(args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting the common options should be enough actually (common_options?)

Also consider keeping the pattern we have in this file of passing path and headers later on and memoize the client if possible.

rest_client[ns_prefix + resource_name]
  .get({ 'params' => params }.merge(@headers))
end

raise ResourceNotFoundError.new(404,
'Not Found',
response)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test for the negative flow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some tests to handle exceptions / bad status codes

block: block, method: :get, headers: @headers, params: params)
end

return if block_given?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this true always?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, removed


def http_err(json_error_msg, e)
err_message = json_error_msg['message'] || e.message
error_klass = e.http_code == 404 ? ResourceNotFoundError : HttpError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonas was talking about extracting 404 to a constant if I understand correctly. I'm okay with both ways.

@moolitayer
Copy link
Collaborator

moolitayer commented Sep 5, 2017

@ilackarms can you add a short README note?

@@ -286,6 +308,45 @@ def get_entities(entity_type, klass, resource_name, options = {})
Kubeclient::Common::EntityList.new(entity_type, resource_version, collection)
end

def get_entities_async(klass, resource_name, options = {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method really async? Or it's more of a lazy execution? Usually async is a sign that the procedure is using threads, which I think is not the case here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's async in that it's non-blocking. entities are yielded to the block passed by the user as they are read from a streaming HTTP response

@ilackarms
Copy link
Author

@abonas can this be re-reviewed?

Copy link
Collaborator

@cben cben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very close 👍

Some petty comments, only significant is proposal to change what's yielded.

Also, for each new method, consider if it can be private. Anything public we'll have to keep compatible...

# result['items'] might be nil due to https://github.com/kubernetes/kubernetes/issues/13096
entity_streamer.get(nesting_level: 2, yield_values: false) do |object|
entity = new_entity(object, klass)
yield(entity, resource_version)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure about this API of yielding entity, resource_version.

  • extensibility could also be done by yielding more args in future (ignored by ruby blocks/procs, unless one uses & with a lambda)
  • this code assumes metadata like "resourceVersion": will appear in JSON before "items": [...]. We already discussed it offline, it's true in practice, I doubt k8s will ever break this assumption. But since then it keeps nagging me to hardwire such assumption into the shape of our API...
  • there's also kind this omits. and apiVersion that regular get_entities doesn't expose either but could be handy. and concievably more in future...

Idea: what if this yielded just entity, but at the end returned a "skeletal" variant on EntityList which had attributes .resourceVersion, .kind but did not contain items?

  • More uniform API, extensible.
  • We only expose the metadata after all JSON has been processed.
    I'm not sure if that's good enough, or it's actually useful to have these while processing...

raise HttpError.new(code, resp.message, resp)
end

def common_options
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider rest_client_options — they're common to the 2 rest-client code paths, but we also have http_options for the 3rd http gem path (used by watch)...

}.merge(common_options)
options[:block_response] = args[:block] if args[:block]
resp = RestClient::Request.execute(options)
return unless resp.instance_of?(Net::HTTPClientError)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not returning the response?
Does Request.execute return a Net::HTTPClientError outside of block_response mode?
=> Both OK as it's the only use case for this method. I'd make it unconditionally expect a block.

[style: early return usually used for problems, with flat fallthrough to happy path. doesn't matter.]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben I haven't tested it but I expect that it would; I don't think block_response changes the behavior in this regard.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben i originally used an if block to check for error and fallthrough on the happy path, and rubocop complained that i should replace the if block with a guard clause...

end

# result['items'] might be nil due to https://github.com/kubernetes/kubernetes/issues/13096
entity_streamer.get(nesting_level: 2, yield_values: false) do |object|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, can this be narrowed to only 'items'? if k8s adds another top-level array, it will confuse us.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried doing this and discovered (what i believe is) a bug with json-streamer: thisismydesign/json-streamer#7

though the bug possibly resides with json-stream itself. it may require forking json-streamer or re-implementing some of its features as mentioned above :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that was fixed and you can bump the dependency?
Need not block this, if still in flux/untested, I'm fine with merging this and improving later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!


block = proc do |response|
if response.instance_of?(Net::HTTPNotFound)
raise ResourceNotFoundError.new(404,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're now checking both NotFound here and any error in do_rest_request.
can you check in one place, e.g. using
error_klass = e.http_code == 404 ? ResourceNotFoundError : HttpError

stub_request(:get, %r{/api/v1$})
.to_return(body: open_test_file('core_api_resource_list.json'), status: 200)
stub_request(:get, %r{/services})
.to_return(body: '{"err": "I\'m a teapot!"}', status: 418)
Copy link
Collaborator

@cben cben Sep 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 🍵 👍

@ilackarms ilackarms force-pushed the streamingjson branch 4 times, most recently from 90deb68 to b7f8758 Compare September 19, 2017 14:33
method: args[:method],
headers: (args[:headers] || {}).merge('params' => args[:params])
}.merge(rest_client_options)
options[:block_response] = args[:block] if args[:block]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: To avoid confusion let's call it block_response everywhere.

Because Request.execute supports both regular ruby block and a :block_response param, with different semantics (regular block gets RestClient::Response after redirect/error processing/decoding).
https://github.com/rest-client/rest-client/blob/v2.0.2/lib/restclient/request.rb#L718-L726

@ilackarms ilackarms force-pushed the streamingjson branch 3 times, most recently from 1d5cfc5 to 302ba10 Compare September 25, 2017 20:39
Copy link
Collaborator

@cben cben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep bugging you, almost there.

Github won't let me continue the compression discussion.

https://github.com/rest-client/rest-client/blob/v2.0.2/lib/restclient/request.rb#L802-L803
BTW, another processing we're skipping is Content-encoding. What happens when k8s will enable compression?
(Note Request.decode is gone in upcoming rest-client 2.1: rest-client/rest-client#597)

You said

it looks like RestClient automatically handles decompression

So it depends on version. Try this:

RestClient::Request.execute(method: 'get', url: 'https://httpbin.org/gzip', block_response: proc {|r| pp(r); pp(r.to_hash); r.read_body {|b| puts "> " + b }})

2.1.0rc1 has Net::HTTP handle it so we get nice string (I wonder if really streamed or one ; but rest-client 2.0.2 gives binary mess :-(
We shouldn't bump to 2.1 until it's released as stable.
Perhaps disable compression in streaming mode, for now? (should also confirm if decompression is done streamed or all at once...)

response)
end
response.read_body do |chunk|
entity_streamer.parser << chunk
Copy link
Collaborator

@cben cben Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍
Now we have 3 parsers, but in practice only 1 parser the big items part assuming kind and version come early.

(I tried now using 1 parser with 3 callbacks, and it worked! But json-streamer code doesn't document/test for such usage and not sure from code why it works, so let's not do that. Opened thisismydesign/json-streamer#9 to confirm => UPDATE: 1 parser only supports 1 callback.)

end

{
resource_version: resource_version,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is .resourceVersion on EntityList, should be camelCased here too for consistency :(

I'd also make this a struct/object for more similar API, so both support .kind access.

@ilackarms ilackarms force-pushed the streamingjson branch 3 times, most recently from d4737a1 to bdec9c3 Compare September 27, 2017 20:11
@ilackarms
Copy link
Author

can i get some help with the rubocop failures here?

lib/kubeclient/common.rb:477:5: C: Metrics/CyclomaticComplexity: Cyclomatic complexity for get_entities_async is too high. [9/8]
    def get_entities_async(klass, resource_name, options = {})
    ^^^

lib/kubeclient/common.rb:477:5: C: Metrics/PerceivedComplexity: Perceived complexity for get_entities_async is too high. [9/7]
    def get_entities_async(klass, resource_name, options = {})
    ^^^

seem difficult to fix. i'm not using a lot of conditionals; i don't see a clean way to fix these complaints.

lib/kubeclient/common.rb:488:57: W: Lint/UnusedBlockArgument: Unused block argument - object. If it's necessary, use _ or _object as an argument name to indicate that it won't be used.
      conditions.yield_object = lambda do |aggregator:, object:|
                                                        ^^^^^^

changing the naming of or remobing this variable causes json-streamer to fail parsing. not sure if it's worth opening an issue and requesting changes there in order to make this rubocop failure go away

@enoodle
Copy link
Contributor

enoodle commented Jan 8, 2018

@moolitayer @cben @abonas I see that this is held up due to the rubocop issues with the longer functions. It seems that the functions get_entities_async and some naming problem with json-streamer. are there other issues holding this ? Do we want to reopen this to split this function and see if we can do anything with the argument name?

@moolitayer
Copy link
Collaborator

Do we want to reopen this to split this function and see if we can do anything with the argument name?

That's up to you if you have a use case for this feature. Are you aware of #262 ?
Not the same functionality but might alleviate a similar problem.

@cben
Copy link
Collaborator

cben commented Jan 9, 2018

Our use case in ManageIQ remains, to reduce peak memory usage processing large responses like /api/pods and /oapi/images. (Our processing is item-by-item.)
However as Nick pointed out in #262, interleaving reading / parsing / processing may keep the server connection open for a long time, and it may be nicer to slurp the whole raw output into string or temp file, and then stream-parse however we want, outside of kubeclient.
I also wanted to experiment with avoiding RecursiveOpenStruct.
There was also an idea that simply popping items off the array kubeclient returns as we process them could be a significant win, as RecursiveOpenStruct starts reasonably compact and inflates as you access data inside it.
We don't have concrete numbers to compare these potential wins...

=> IMHO, kubeclient would benefit from a convenient stream parsing API, the API here is good, and I'd like to land this 👍
But if anyone has concerns, #262 makes it easier to experiment outside kubeclient, so we could do that and come back with more data.

@cben cben mentioned this pull request Jan 10, 2018
@cben
Copy link
Collaborator

cben commented Jan 10, 2018

Just learned that k8s finally got response pagination/chunking. Opened #283.
We'll definitely want chunking (it has more benefits on server).
Do we want this too? It'd be useful for older k8s that don't support chunking, but perhaps "use :raw and stream parse yourself" is good enough?

I'd like to understand first what API(s) we'd expose for chunking...

@agrare
Copy link
Member

agrare commented Jan 15, 2018

That's really cool @cben ! I think chunking will cover our needs from the parser side, so we should do similar perf tests using json stream parser with chunking to see if there is any benefit.

As for the API it could be the same as it is here?

@cben
Copy link
Collaborator

cben commented Oct 28, 2018

I'm going to close this for now.
We're getting chunking (#356) which is mostly better, and I feel the total code complexity isn't worth to justify doing both in kubeclient.
And "use :raw and stream parse yourself" sounds reasonable to me, until proven otherwise.

If I'm wrong, e.g. if someone really wants streaming for older k8s that don't do chunking, please speak up / reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet