
Implement stackdriver logging #68

Closed
gregw opened this issue Nov 9, 2016 · 27 comments

@gregw
Contributor

gregw commented Nov 9, 2016

Implement stackdriver logging as described in the jetty-runtime design document.

@gregw
Contributor Author

gregw commented Nov 9, 2016

work in progress will be in branch https://github.com/GoogleCloudPlatform/jetty-runtime/tree/master-68

@gregw gregw self-assigned this Nov 9, 2016
@gregw
Contributor Author

gregw commented Nov 9, 2016

@meltsufin currently logging is split over openjdk-runtime (JSON formatters) and jetty-runtime.

Ultimately, I can see the results of this issue being split over:

  • pull request to google cloud java LoggingHandler
  • a non-Jetty-specific extension of LoggingHandler that could replace the JSON formatters (no longer needed) in openjdk-runtime and add traceid support.
  • jetty specific usage and configuration of the LoggingHandler which belongs in jetty-runtime

I propose that, for simplicity, I develop all the code here in jetty-runtime. Once we are happy with how it is working, we can then distribute it out to the locations above (or wherever is then appropriate) before we merge to master/cloud/open-jdk.

OK?

gregw added a commit that referenced this issue Nov 9, 2016
gregw added a commit that referenced this issue Nov 9, 2016
gregw added a commit that referenced this issue Nov 9, 2016
@meltsufin
Member

@gregw sounds good to develop the code here first.

@gregw
Contributor Author

gregw commented Nov 10, 2016

@meltsufin progress is slow. I'm a bit concerned that the JUL to google-cloud-java approach may be a bit flawed. It works when the log level is INFO, but once you set the level to FINE, the google-cloud-java implementation uses things like netty and HttpURLConnection that themselves log to java.util.logging, so infinite recursion and stack overflows result!

See googleapis/google-cloud-java#1386

We could go straight from the Jetty log to google-cloud-logging, but anything that uses JUL or JCL (e.g. parts of the datastore used for sessions) would be lost (or end up on the console).

It may be that the google-cloud-logging developers have a solution to prevent the recursion, but I think what is really needed is a very simple HTTP client that does no logging itself to talk to the Stackdriver service. Perhaps the google-cloud-java library can be used to do the auth first.

@meltsufin
Member

@gregw Can you use a thread local flag to signal to the handler that it's being recursively called and break the infinite recursion?

@gregw
Contributor Author

gregw commented Nov 10, 2016

@meltsufin we should be able to use a thread local, although care will be needed to ensure that we don't hammer on that for every log line. I've not done that yet, as it is really something that needs to be solved at the google-cloud-java level, but may do so today just to make some progress.

I've been thinking over alternate solutions, and nothing is really as satisfactory as having a JUL handler receive all container logs from all sources and send them to Stackdriver.
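
Something along these lines, i.e. a minimal sketch of the thread-local guard idea (not the google-cloud-java implementation; the sendToStackdriver hook is hypothetical):

    import java.util.logging.Handler;
    import java.util.logging.LogRecord;

    public abstract class GuardedHandler extends Handler {
      // set while this thread is already inside publish(), so that any JUL logging
      // done by the transport (netty, HttpURLConnection, ...) is dropped instead
      // of recursing back into the handler
      private static final ThreadLocal<Boolean> publishing =
          ThreadLocal.withInitial(() -> Boolean.FALSE);

      @Override
      public void publish(LogRecord record) {
        if (publishing.get()) {
          return; // re-entrant call from the logging transport itself
        }
        publishing.set(Boolean.TRUE);
        try {
          sendToStackdriver(record);
        } finally {
          publishing.set(Boolean.FALSE);
        }
      }

      // hypothetical hook that would forward the record to the logging service
      protected abstract void sendToStackdriver(LogRecord record);
    }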

@gregw
Contributor Author

gregw commented Nov 11, 2016

The LoggingHandler does have protection against looping in its implementation. It maintains a list of masked loggers and removes the Stackdriver logger from them. However, for some reason it is not working in my testing. I am getting assistance to analyse it.

Good news is that the logs are appearing in the console... just an infinite number of them :)

@meltsufin
Member

@gregw It actually makes sense that the LoggingHandler would have protection against infinite recursion, but I guess the way we're trying to redirect logs is breaking it. Maybe we're somehow capturing the logs it skips and directing them back to it? Anyway, thanks for the update.

gregw added a commit that referenced this issue Nov 11, 2016
@gregw
Contributor Author

gregw commented Nov 16, 2016

We have not yet found or fixed the log looping problem in googleapis/google-cloud-java#1386, but as they are working on it, and as that indicates loop logging is indeed something the API code will deal with, I don't think we have an architectural issue.

So I will ignore the problem for now (don't set logging to FINE) and finish this task. I'll separately help the google-cloud-java team debug and fix the looping issue.

gregw added a commit that referenced this issue Nov 16, 2016
@meltsufin
Member

Are you ready to create a pull request to start the review process?

@gregw
Contributor Author

gregw commented Nov 16, 2016

Not just yet. Let me upgrade to the just-released RC2 and do a bit more testing. I'd also like to automate some tests, but I don't think we should hold up the PR for that.

@gregw
Contributor Author

gregw commented Nov 17, 2016

@meltsufin I have the logging working now; however it uses a modified LoggingHandler with a threadlocal. So I will not create the PR until googleapis/google-cloud-java#1385 is resolved... unless you want me to create it now so we can do an early review and chat about the code?

@meltsufin
Member

@gregw I will look at the code from the branch for now. Thanks!

@meltsufin
Member

meltsufin commented Nov 17, 2016

@gregw I tried building an image from your branch, but was not able to deploy an app based on this image. Does the branch have your latest change with the use of threadlocal?

[INFO] GCLOUD: 
[INFO] GCLOUD: Updating service [testservice]....................................................................................................................................failed.
[INFO] GCLOUD: ERROR: (gcloud.app.deploy) Error Response: [9] 
[INFO] GCLOUD: Application startup error:
[INFO] GCLOUD:  at com.google.cloud.logging.LoggingOptions$DefaultLoggingRpcFactory.create(LoggingOptions.java:62)
[INFO] GCLOUD:  at com.google.cloud.ServiceOptions.getRpc(ServiceOptions.java:491)
[INFO] GCLOUD:  at com.google.cloud.logging.LoggingImpl.<init>(LoggingImpl.java:96)
[INFO] GCLOUD:  at com.google.cloud.logging.LoggingOptions$DefaultLoggingFactory.create(LoggingOptions.java:43)
[INFO] GCLOUD:  at com.google.cloud.logging.LoggingOptions$DefaultLoggingFactory.create(LoggingOptions.java:38)
[INFO] GCLOUD:  at com.google.cloud.ServiceOptions.getService(ServiceOptions.java:478)
[INFO] GCLOUD:  at com.google.cloud.runtimes.jetty9.LoggingHandler.getLogging(LoggingHandler.java:282)
[INFO] GCLOUD:  at com.google.cloud.runtimes.jetty9.AsyncLoggingHandler.write(AsyncLoggingHandler.java:101)
[INFO] GCLOUD:  at com.google.cloud.runtimes.jetty9.LoggingHandler.flush(LoggingHandler.java:365)
[INFO] GCLOUD:  at com.google.cloud.runtimes.jetty9.LoggingHandler.publish(LoggingHandler.java:298)

@gregw
Contributor Author

gregw commented Nov 17, 2016

@meltsufin

Oops, the push failed... pushed now (and merged with latest master).


@meltsufin
Member

@gregw It's running fine now. I can actually see the logs in the Log Viewer and they have the trace id, but they all appear under the "Global" resource, rather than "GAE Application" -> [service] -> [version]. This is caused by the "resource" not being set up with the necessary metadata in the LogEntry. I wonder if this is something we need to explicitly configure, or whether the cloud-logging library should be doing it automatically.

@gregw
Contributor Author

gregw commented Nov 22, 2016

I'll investigate and ask the google-cloud-java folks if need be.

@gregw
Contributor Author

gregw commented Nov 23, 2016

@meltsufin I'm trying to add a MonitoredResource, but it is causing all the logs to go missing!

I create the resource with

    monitored = MonitoredResource.newBuilder("gae_app")
        .addLabel("project_id", System.getenv("GCLOUD_PROJECT"))
        .addLabel("module_id", System.getenv("GAE_SERVICE"))
        .addLabel("version_id", System.getenv("GAE_VERSION"))
        .addLabel("instance_id", System.getenv("GAE_INSTANCE"))
        .build();
    System.err.println("MonitoredResource = " + monitored);

and when run in env:flex I can see the stderr log in the console reporting:

MonitoredResource = MonitoredResource{type=gae_app, labels={module_id=default, instance_id=aef-default-20161123t200233-vs9b, project_id=jetty9-work, version_id=20161123t200233

For each log record, I now add the resource:

  protected void enhanceLogEntry(Builder builder, LogRecord record) {
    super.enhanceLogEntry(builder, record);
    String traceid = RequestContextScope.getCurrentTraceid();
    // attach the gae_app MonitoredResource built above to every entry
    builder.setResource(monitored);
    if (traceid != null) {
      builder.addLabel("traceid", traceid);
    } else {
      // no trace id available, so label the entry with the current request details instead
      Request request = RequestContextScope.getCurrentRequest();
      if (request != null) {
        builder.addLabel("http-scheme", request.getScheme());
        builder.addLabel("http-method", request.getMethod());
        builder.addLabel("http-uri", request.getOriginalURI());
        builder.addLabel("http-remote-addr", request.getRemoteAddr());
      }
    }
  }

But I cannot find any logs in the console. I'm currently running an instance in jetty9-work that will create a few info log entries every time https://20161123t200233-dot-jetty9-work.appspot-preview.com/dump/info is hit.

Any ideas?

@meltsufin
Member

meltsufin commented Nov 23, 2016

It's a mystery to me too. The Log Viewer might be using some other metadata to organize the logs. I've asked the folks who work on the Log Viewer for help. I'll update when I hear back from them.

@gregw
Contributor Author

gregw commented Nov 24, 2016

I'm printing out the google API LogEntry as they are generated and they look like:

LogEntry
{
  logName=null, 
  resource=MonitoredResource
  {
    type=gae_app, 
    labels=
    {
      module_id=default, 
      instance_id=aef-default-20161124t151259-jezn, 
      project_id=jetty9-work, 
      version_id=20161124t151259
    }
  }, 
  timestamp=null, 
  severity=INFO, 
  insertId=null, 
  httpRequest=null, 
  labels=
  {
    traceid=5612ee32e03c4abe051377e74528d4b4, 
    levelName=INFO, 
    levelValue=800
  }, 
  operation=null, 
  payload=StringPayload
  {
    type=STRING, 
    data=Nov 24, 2016 4:19:41 AM org.eclipse.jetty.server.handler.ContextHandler$Context log INFO: dump servlet context /dump/testlog 
  }
}

@meltsufin
Member

The logName=null might be an issue. Also, it would be interesting to see how this compares to the log entries generated by fluentd when it reads the JSON-formatted application log from the filesystem.

@meltsufin
Member

@gregw I got an update from the Logging team, and they're saying that instance_id is not a valid label and the service should be rejecting the log entry. This makes me wonder how we are surfacing any errors from the logging service. Can we see if the call to the service is failing?

To get the list of valid labels, you can e.g. use our API https://cloud.google.com/logging/docs/api/reference/rest/v2/monitoredResourceDescriptors/list

  {
   "type": "gae_app",
   "displayName": "GAE Application",
   "description": "An application running in Google App Engine (GAE).",
   "labels": [
    {
     "key": "project_id",
     "description": "The identifier of the GCP project associated with this resource (e.g., my-project)."
    },
    {
     "key": "module_id",
     "description": "The service/module name."
    },
    {
     "key": "version_id",
     "description": "The version name."
    }
   ]
  },

@gregw
Contributor Author

gregw commented Nov 28, 2016

Ah, that was me adding the instance_id as I thought it might be useful. I assumed it would just be a label that would be ignored, but that could be filtered on if desired. I will remove it and try again (although it could be good information to include somewhere, perhaps as a log label rather than a resource label?).
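
Roughly, that is just the builder shown earlier trimmed to the supported labels (a sketch):

    monitored = MonitoredResource.newBuilder("gae_app")
        .addLabel("project_id", System.getenv("GCLOUD_PROJECT"))
        .addLabel("module_id", System.getenv("GAE_SERVICE"))
        .addLabel("version_id", System.getenv("GAE_VERSION"))
        .build();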

With regard to errors, I don't think we are getting any, as they would be written to stderr if we were. However I will double-check, and perhaps induce a few forced errors to verify that we can see them.

I'll work on this tomorrow.

gregw added a commit that referenced this issue Nov 30, 2016
Updated to latest gcloud API
removed copied LoggingHandlers
removed instanceid from monitored resource
gregw added a commit that referenced this issue Dec 1, 2016
@gregw
Contributor Author

gregw commented Dec 2, 2016

@meltsufin I did some simple memory usage testing, creating logs in a loop and, every 1000 entries, checking the difference in the memory used by the JVM. I get output like:


total(2591555584)-free(504959536)=used(2086596048) delta=-233666880
total(2591555584)-free(472231168)=used(2119324416) delta=32728368
total(2591555584)-free(446086520)=used(2145469064) delta=26144648
total(2591555584)-free(419942008)=used(2171613576) delta=26144512
total(2591555584)-free(393797568)=used(2197758016) delta=26144440
total(2591555584)-free(367371512)=used(2224184072) delta=26426056
total(2591555584)-free(341227088)=used(2250328496) delta=26144424
total(2591555584)-free(315082536)=used(2276473048) delta=26144552
total(2591555584)-free(288657128)=used(2302898456) delta=26425408
total(2591555584)-free(262512392)=used(2329043192) delta=26144736
total(2591555584)-free(236086928)=used(2355468656) delta=26425464
total(2591555584)-free(209942272)=used(2381613312) delta=26144656
total(2766667776)-free(645374152)=used(2121293624) delta=-260319688
total(2766667776)-free(619806864)=used(2146860912) delta=25567288
total(2766667776)-free(594239624)=used(2172428152) delta=25567240
total(2766667776)-free(568400784)=used(2198266992) delta=25838840
total(2766667776)-free(535326968)=used(2231340808) delta=33073816
total(2766667776)-free(509755720)=used(2256912056) delta=25571248
total(2766667776)-free(483916880)=used(2282750896) delta=25838840
total(2766667776)-free(458349456)=used(2308318320) delta=25567424
total(2766667776)-free(432778136)=used(2333889640) delta=25571320
total(2766667776)-free(406935360)=used(2359732416) delta=25842776
total(2766667776)-free(381019520)=used(2385648256) delta=25915840
total(2766667776)-free(355371080)=used(2411296696) delta=25648440
total(2766667776)-free(329349728)=used(2437318048) delta=26021352
total(2596798464)-free(604057720)=used(1992740744) delta=-444577304
total(2596798464)-free(560980168)=used(2035818296) delta=43077552
total(2596798464)-free(522576296)=used(2074222168) delta=38403872
total(2596798464)-free(496365592)=used(2100432872) delta=26210704
total(2596798464)-free(470154784)=used(2126643680) delta=26210808
total(2596798464)-free(443943976)=used(2152854488) delta=26210808
total(2596798464)-free(417733016)=used(2179065448) delta=26210960
total(2596798464)-free(390986528)=used(2205811936) delta=26746488
total(2596798464)-free(364775584)=used(2232022880) delta=26210944
total(2596798464)-free(338296816)=used(2258501648) delta=26478768
total(2596798464)-free(311818248)=used(2284980216) delta=26478568
total(2596798464)-free(601169792)=used(1995628672) delta=-289351544
total(2596798464)-free(561170936)=used(2035627528) delta=39998856

This suggests that each log is generating about 25KB of garbage!

I think this warrants further investigation and I shall try a profiling tool on it next week.
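
For reference, a sketch of the kind of loop described above (not the exact test code):

    import java.util.logging.Logger;

    public class LogGarbageCheck {
      public static void main(String[] args) {
        Logger log = Logger.getLogger("garbage.check");
        Runtime rt = Runtime.getRuntime();
        long lastUsed = rt.totalMemory() - rt.freeMemory();
        for (int i = 1; i <= 100_000; i++) {
          log.info("test log entry " + i);
          if (i % 1000 == 0) {
            // report the change in used heap since the previous checkpoint
            long total = rt.totalMemory();
            long free = rt.freeMemory();
            long used = total - free;
            System.err.printf("total(%d)-free(%d)=used(%d) delta=%d%n",
                total, free, used, used - lastUsed);
            lastUsed = used;
          }
        }
      }
    }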

@meltsufin
Member

Thanks Greg! This is obviously way too much garbage, if it's true!
Do you mind posting an issue on google-cloud-java once you get a more concrete idea of what's going on?

Also, this issue with netty memory leaks may be related. googleapis/google-cloud-java#1433

gregw added a commit that referenced this issue Dec 5, 2016
Code cleanups after review
gregw added a commit that referenced this issue Dec 23, 2016
Removed stackdriver logging, so this just configures JUL to capture the traceid and log to stderr
gregw added a commit that referenced this issue Jan 4, 2017
gregw added a commit to GoogleCloudPlatform/openjdk-runtime that referenced this issue Jan 13, 2017
meltsufin pushed a commit to GoogleCloudPlatform/openjdk-runtime that referenced this issue Jan 13, 2017
gregw added a commit that referenced this issue Jan 24, 2017
minor clean ups
gregw added a commit that referenced this issue May 11, 2017
* Issue #68 added request context scope

* Issue #68 added request context scope

* Issue #68 added skeleton Logging Handler

* Issue #68 work in progress

* Issue #68 work in progress

* working implementation using a threadlocal to stop looping

* Issue #68 update to latest google-cloud-java LoggingHandler

* Issue #68

Updated to latest gcloud API
removed copied LoggingHandlers
removed instanceid from monitored resource

* Issue #68 Use same labels as nginx

* Issue #68

Code cleanups after review

* Issue #68

Removed stackdriver logging, so this just configures JUL to capture the traceid and log to stderr

* renamed traceid to traceId

* gcloud api not currently used

* Tunnel traceId in parameters #68

* Simplified with traceId lookup in formatter #68

* Use 8.2 beta stackdriver

* Improved formatting

* use logger name rather than source

* run jetty from the webapp working directory so logging configuration may be relative

* Testing logging

Test googleapis/google-cloud-java#1535 and jetty-9.4.1-SNAPSHOT

* use released 9.4.1

* Issue #68

minor clean ups

* Do not CD to non-existent directory

* improved README

* fixed Cloud SDK

* Released beta of google-cloud-logging

* updated google-cloud API to 0.8.3-beta

* added remote test to check logging

* Stackdriver logging testing

improved comments
check for traceid

* upgrade to 0.9.2

* Test for zone

* upgrade to 10.0 gcloud API

* enable gae module only if GAE_INSTANCE environment variable is set

* improved gae.mod documentation

* GCP module

Rename gae module and configuration to gcp
Split the jetty.commands into jetty.commands and gcp.commands
moved commands file to jetty-base/config-scripts
updated setup-env-ext to run the gcp.commands when image is run with a GAE_INSTANCE set

* fixed warnings

* turn off stackdriver logging by default. Added instructions to enable.

* Updates

Update to latest openjdk-runtime with setup-env.d
Update to latest jetty release

* Use the PLATFORM env var

* Improved jetty startup

Added JETTY_PROPERTIES, JETTY_MODULES_ENABLE & JETTY_MODULES_DISABLE
Removed duplicate code from 50-jetty.bash

* use launcher URL

* Trimmed GCP specific configuration

* Added structure tests for jetty setup script

Also fixed unpack bug found as a result

* Support passing just args

* fix merge

* Fixed test workspace paths for cloud build

* review feedback

* working directory is the root webapp

* Improve handling of various types of command line.

Fixed problem with handling of command line like "ls /var"
Requires duplication of test for java from openjdk-runtime docker-entrypoint.bash,
which should be modified to avoid the duplication.

* upgrade gcloud API version to 0.13.0-beta

* use package target

* tested with FINE log level

* Simplify entrypoint args processing

The $@ arg array is kept complete between all scripts.

* remove debug

* remove debug

* Test that the logging dependencies are hidden from webapp classpath

* update README.md with instructions to keep INFO level on io.grpc.netty.level=INFO

* Updated to latest openjdk-runtime

* fixed classloader test

* fixed TODO

* Update to jetty 9.4.5 and gcloud 1.0.1-SNAPSHOT

* upgraded to gcloud-logging 1.0.1

* turn off debug

* updated README for latest 1.0.1 gcloud-logging

* update classpath exclusion


\o/
@sparhomenko

Our team recently started migrating from GAE Standard to Flexible, and I bumped into this issue while investigating the difference in logging behaviour. My understanding is that the change done here will collapse multi-line logs into a single Stackdriver log entry, and will propagate JUL logging levels to Stackdriver. That will indeed make Flexible logging significantly more useful, but it still seems to be missing the feature we found extremely convenient in Standard: grouping Jetty logs by HTTP requests.

Can I ask if this feature is still in the plans, and if not, why? We are considering doing it with a custom JUL logging handler that buffers the logs and flushes the buffer at the end of the request (or if the buffer grows too big), producing a Standard-style Stackdriver log entry of type type.googleapis.com/google.appengine.logging.v1.RequestLog with multiple LogLine records. It would be helpful to know if you have some design or performance considerations that are preventing you from following the same approach.

@meltsufin
Member

Other than collapsing multi-line log messages and setting the levels correctly, in #81 we have support for enhanced logging, which will provide grouping by HTTP request. You just have to make sure you configure the com.google.cloud.logging.LoggingHandler in your JUL logging.properties, so you don't need to implement it yourself.
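
For example, a minimal logging.properties along these lines (a sketch; see the google-cloud-logging documentation for the full set of handler properties, and note the log name here is just an example):

    # route all JUL output through the Stackdriver logging handler
    handlers=com.google.cloud.logging.LoggingHandler

    # optional: minimum level and log name for the handler
    com.google.cloud.logging.LoggingHandler.level=INFO
    com.google.cloud.logging.LoggingHandler.log=my_app.log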
