NPE in LTPAConfigurationImpl.loadConfig #7448

Closed
jwalcorn opened this issue May 8, 2019 · 19 comments
Assignees
Labels
regression This bug is for something that worked in a past release, but no longer does release bug This bug is present in a released version of Open Liberty release:19005 team:OSGi Infrastructure

Comments

@jwalcorn

jwalcorn commented May 8, 2019

I've owned a simple Stock Trader application for the past couple of years. Almost all of the microservices run on Liberty in Docker containers on Kubernetes (I've run it in ICP, IKS, and OCP). All has been fine as I moved up to each new Liberty release, until the new 19.0.0.4 came out last week. First a tech sales guy reported it, then I reproduced the problem myself. All I have to do is build the Docker container for my trader UI microservice against 19.0.0.4, and the server will fail to start, with an NPE early on as the security service tries to initialize. If I go back to 19.0.0.3, all is fine. The log output is as follows:

[INFO    ] CWWKS0007I: The security service is starting...
[ERROR   ] CWWKE0701E: bundle com.ibm.ws.security.token.ltpa:1.0.27.cl190420190419-0642 (67)[com.ibm.ws.security.token.ltpa.LTPAConfiguration(156)] : The activate method has thrown an exception java.lang.NullPointerException
    at com.ibm.ws.security.token.ltpa.internal.LTPAConfigurationImpl.loadConfig(LTPAConfigurationImpl.java:115)
    at [internal classes]

I'll attach the full messages.log, a trace.log with *=info:com.ibm.ws.security.*=all:com.ibm.ws.webcontainer.security.*=all, and my server.xml (the full code is in GitHub at https://github.com/IBMStockTrader/trader). Note my app actually uses JWT, not LTPA, but it appears Liberty loads the LTPA stuff whenever the security service starts. Please let me know if any other traces or other info are needed. I'm holding off on pushing 19.0.0.4-based images to DockerHub until this is fixed, so that at least the people just using the pre-built images are OK - but some people prefer to build it themselves, and they will be broken now.

@jwalcorn
Author

jwalcorn commented May 8, 2019

logs.zip

@jwalcorn
Author

jwalcorn commented May 8, 2019

Discussed in #was-security: https://ibm-cloud.slack.com/archives/C324NP6H5/p1556903477018100

@jwalcorn
Author

jwalcorn commented May 8, 2019

A few random thoughts, which may or may not be relevant:

  1. As I don't actually use LTPA, I'm not copying an ltpa.keys into the Docker container
  2. Perhaps the server is generating one on the fly?
  3. I do have a key.jks and validationKeystore.jks that I refer to in my server.xml
  4. This is my only microservice with a validationKeystore.jks (and the only one failing - could be coincidence)
  5. I believe there may be an expired key in my key.jks (but hasn't been an issue...)
  6. I've heard Liberty defaults to .p12 format for keystores now (new, in 19.0.0.4). I'm still using .jks format, which I understand should still be supported for backward compatibility reasons.
  7. I have appSecurity-2.0 being loaded, as well as mpJwt-1.1
  8. I'm also using jwtSso-1.0 (this is my only microservice doing so, and the only one failing - could be coincidence)
  9. I'm just using basicRegistry
  10. I've reproduced this outside of Kubernetes, just by doing a docker run trader:latest on my Mac.

@teddyjtorres
Contributor

It has to be determined why config admin provided a null value for the "expiration" attribute.

@teddyjtorres
Contributor

The service is not passed the attribute values from metatype, as the activate trace shows:

[5/7/19 16:23:00:866 UTC] 00000020 id=26fe8f0b om.ibm.ws.security.token.ltpa.internal.LTPAConfigurationImpl > activate Entry
org.apache.felix.scr.impl.manager.ComponentContextImpl@1ac6e462
{service.vendor=IBM, component.name=com.ibm.ws.security.token.ltpa.LTPAConfiguration, tokenType=Ltpa2, component.id=185}
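
To illustrate why a missing attribute can surface as an NPE, here is a minimal Java sketch. It is not the actual LTPAConfigurationImpl code, which is not shown in this issue: the class name, method names, and the default value are hypothetical, and the property map simply mirrors the four entries in the trace above. It assumes the component reads "expiration" as a boxed Long and unboxes it, which is one common way a null from config admin turns into a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the real Liberty implementation.
class LtpaConfigSketch {

    // Assumed stand-in for the default that metatype would normally supply.
    static final long ASSUMED_DEFAULT_EXPIRATION_MINUTES = 120L;

    // Casting a null reference to Long succeeds, but auto-unboxing it to a
    // primitive long throws NullPointerException.
    static long readExpirationUnsafe(Map<String, Object> props) {
        return (Long) props.get("expiration");
    }

    // Guarded variant, shown only for contrast: fall back to a default
    // when the attribute is absent or of an unexpected type.
    static long readExpirationSafe(Map<String, Object> props) {
        Object value = props.get("expiration");
        return (value instanceof Long) ? (Long) value : ASSUMED_DEFAULT_EXPIRATION_MINUTES;
    }

    public static void main(String[] args) {
        // Only the properties seen in the activate trace; no metatype defaults.
        Map<String, Object> props = new HashMap<>();
        props.put("service.vendor", "IBM");
        props.put("component.name", "com.ibm.ws.security.token.ltpa.LTPAConfiguration");
        props.put("tokenType", "Ltpa2");
        props.put("component.id", 185L);

        try {
            readExpirationUnsafe(props);
        } catch (NullPointerException e) {
            System.out.println("NPE: 'expiration' was not supplied to activate");
        }
        System.out.println("guarded read: " + readExpirationSafe(props));
    }
}
```

The guarded read is only there to make the failure mode visible. This thread does not describe the eventual fix in code, and the later comments point at the persisted configuration in the server workarea rather than at the LTPA component itself.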

@tjwatson
Member

tjwatson commented May 9, 2019

Is the server configuration available to reproduce this?

@tjwatson
Member

tjwatson commented May 9, 2019

recreate.zip

Unzip recreate.zip and run the recreate7448.sh script to reproduce.

@tjwatson
Member

tjwatson commented May 9, 2019

Add the following to the end of your Dockerfile to work around:

RUN server start --clean && server stop

This also has the added benefit of doing some work in your docker build to cache the configuration changes of your application so it doesn't have to happen each time you start a new container based on your image.

@tjwatson
Member

An alternative workaround is to not run installUtility against your defaultServer configuration. Instead, list the features you want to install explicitly, for example:

RUN installUtility install --acceptLicense jwtSso-1.0 logstashCollector-1.0

@tjwatson tjwatson self-assigned this May 10, 2019
@tjwatson tjwatson added release bug This bug is present in a released version of Open Liberty regression This bug is for something that worked in a past release, but no longer does team:OSGi Infrastructure and removed Needs member attention labels May 10, 2019
@arthurdm

hey @tjwatson - I am curious as to why one of the workarounds is to install the features individually vs doing installUtility install --acceptLicense defaultServer. Can you please elaborate on that?

as fyi:

The Docker images for OL and WL already have a server start / stop warmup optimization which, although it doesn't have the application classes in the cache, does improve startup considerably. It does add about 60-80 MB to the Docker layer, though.

So if you do another server warmup, it would duplicate most of the cache and cause the image to be even more bloated. Yes, it would then have a better cache, but it's probably not worth the double footprint hit. And you would have to worry about changing the group permissions (see the last part of this command)

@tjwatson
Member

> hey @tjwatson - I am curious as to why one of the workarounds is to install the features individually vs doing installUtility install --acceptLicense defaultServer. Can you please elaborate on that?

Running installUtility against a server configuration trashes the configurations persisted in the server's workarea. These are the configurations that were persisted in the workarea for defaultServer by the warmup in the OL and WL Docker images, which you reference below. On a cached restart we depend on those cached configurations for a performance improvement. But when installUtility trashes the persisted configs, the relaunch is doomed to fail, and that results in the NPE from this issue. Also see design issue #7491, which I opened as fallout from this issue.

If you instead run installUtility with an explicit list of features rather than a server configuration, it avoids trashing the workarea of defaultServer, preserving the configuration data we need to start successfully again.

> as fyi:
>
> The Docker images for OL and WL already have a server start / stop warmup optimization which, although it doesn't have the application classes in the cache, does improve startup considerably. It does add about 60-80 MB to the Docker layer, though.
>
> So if you do another server warmup, it would duplicate most of the cache and cause the image to be even more bloated. Yes, it would then have a better cache, but it's probably not worth the double footprint hit. And you would have to worry about changing the group permissions (see the last part of this command)

jwalcorn pushed a commit to IBMStockTrader/trader that referenced this issue May 13, 2019
@jwalcorn
Author

I tried the workaround of starting (with --clean) and stopping the server at the end of my Dockerfile (https://github.com/IBMStockTrader/trader/blob/master/Dockerfile), and it seems to be working. Thanks for the quick suggestion. I'll be happy to test the real fix whenever it's ready, and remove the workaround. Thanks again!

@jwalcorn
Author

So a working 19.0.0.4-based image of my Trader UI microservice is in DockerHub now, available via "docker pull ibmstocktrader/trader:latest". My colleague Greg Hintermeister is about to deploy it to IKS and try it.

@marikaj123

Tom will provide a Dockerfile to John to test.

@tjwatson
Member

I provided John @jwalcorn with a Dockerfile in my fork of trader at https://github.com/tjwatson/trader/tree/LibertyIssue7448

This does require --build-arg to be used when doing a docker build to pass DOWNLOAD_OPTIONS with the --user and --password for accessing the liberty build.

@marikaj123

@jwalcorn - Let us know whether the fix passed verification and the issue can be closed.

@jwalcorn
Author

Just tested it built against the latest 19.0.0.5 GM-candidate Liberty build, without the workaround, and all is good!

@jwalcorn
Author

image

@marikaj123

@jwalcorn - Thank you, and thank you too, Tom. Please close the issue, since code testing passed with the green release build.

6 participants