Second server on second co-located domain always fails on first start #36

Closed
jasonp18 opened this issue Jun 19, 2014 · 10 comments

Comments

@jasonp18

Hi Edwin,

We have two SOA Suite domains (e.g. soa1domain, soa2domain) co-located on two machines. During each automated build, the only repeatable issue is that the second managed server (soa2_ms2) in the second domain (soa2domain) always fails on its first start.

Machine 1: Admin(soa1), Admin(soa2), soa1_ms1, soa2_ms1
Machine 2: soa1_ms2, soa2_ms2

The current workaround is to start the server manually from the command line and enter the credentials when prompted. The server eventually stops on its own and asks to be restarted via Node Manager (a requirement of the consensus migration basis). We then restart Node Manager and start the managed server from the WebLogic console.

<18-Jun-2014 05:56:07 o'clock BST> <Could not decrypt the username attribute value of {AES}246rovWvZxqSyhCS4XAWfoREUsX/DwWJO77hHfM8TMg= from the file /u01/app/oracle/admin/domains/ipt_cdp_domain/servers/cdp_server2/data/nodemanager/boot.properties. If you have copied an encrypted attribute from boot.properties from another domain into /u01/app/oracle/admin/domains/ipt_cdp_domain/servers/cdp_server2/data/nodemanager/boot.properties, change the encrypted attribute to its cleartext value then reboot the server. The attribute will be re-encrypted. Otherwise, change all encrypted attributes to their cleartext values, then reboot the server. All encryptable attributes will be re-encrypted. The decryption failed with the exception weblogic.security.internal.encryption.EncryptionServiceException.>

Please advise. Many thanks, Jason

@biemond
Owner

biemond commented Jun 19, 2014

Strange,

What I basically do when I create a managed server is also add its boot.properties to the current domain on the AdminServer machine:
https://github.com/biemond/biemond-orawls/blob/master/files/providers/wls_server/create.py.erb

When everything is configured, I do a pack domain and a copydomain to the other servers, which does an unpack and enrolls the domain with the Node Manager.

So I don't know what happened here; maybe I should re-create the boot.properties.
Or it is some bug, because it works on the first domain.

Can you check the boot.properties before you first start all the managed servers? At that point it should be all clear text. Maybe there is an error (a mix-up) in the WebLogic password and the WLST domain daemon process.
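The check suggested above can be scripted. The following is only a rough sketch in Python, not part of orawls: the function names are made up for illustration, and it assumes that an encrypted value is one starting with the `{AES}` prefix seen in the error message earlier in this thread (other prefixes are not handled).

```python
def read_boot_properties(path):
    """Return {key: value} for key=value lines, skipping blanks and # comments."""
    props = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first '=' only: encrypted values may end with '='.
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def encryption_state(props):
    """Classify username/password as 'encrypted', 'cleartext' or 'missing'."""
    state = {}
    for key in ("username", "password"):
        value = props.get(key)
        if value is None:
            state[key] = "missing"
        elif value.startswith("{AES}"):  # assumption: only this prefix is checked
            state[key] = "encrypted"
        else:
            state[key] = "cleartext"
    return state
```

Calling `encryption_state(read_boot_properties(path))` on the boot.properties of each server on both nodes, before the first start, would show whether a file is already encrypted at that point.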

@biemond biemond closed this as completed Sep 3, 2014
@jasonp18
Author

jasonp18 commented Dec 3, 2014

Hi Edwin, can this issue be re-opened? It still exists with version 1.0.17, which we are using with WLS 10.3.6 and SOA 11.1.1.7.

On the problem second node I have soa1_domain and soa2_domain, each with its own managed server 2. The AdminServer and managed server 1 of each domain (soa1_domain/soa2_domain) are on the first node.

The strange thing is that the issue is only for soa2_domain managed server 2 (MS2). The MS2 for the soa1_domain on the same host starts fine.

After a successful copy/unpack on the second node, and before the soa2_domain MS2 is started,
/u01/app/oracle/admin/domains/soa2_domain/servers/AdminServer/security/boot.properties is encrypted, and the MS2 server folder does not exist until first start.

Generated by Configuration Wizard on Wed Dec 03 13:27:31 GMT 2014
username=xxxxxxxx
password=xxxxxxxx

On first start of soa2_domain MS2, the error below prevents the server from starting. The only way around it is to start it manually with the startManagedServer script and enter the username/password by hand. The server then starts up (which resets boot.properties) and stops again automatically, complaining that it was not started using Node Manager. After a restart of Node Manager to pick up the new boot settings, the soa2_domain MS2 can be started successfully by Puppet.

Please advise whether there's a fix to avoid the above manual steps!

Thanks again. Jason

@biemond
Owner

biemond commented Dec 3, 2014

Hi,

If the server2 folder and the security folder inside it exist in the domain on the AdminServer, then the pack jar was probably already there. In that case I won't overwrite it, so when adding a new server you should delete the pack jar from the download dir (on the AdminServer).

If you delete it now and do a Puppet run, it should create a new one.

@jasonp18
Author

jasonp18 commented Dec 4, 2014

Hi, the machine and server2 are added before the pack domain step. As suggested, I deleted the pack jar on the admin node, ran Puppet to re-create it, then ran Puppet on the second node (cleaned), but the issue is the same: server2 won't start for soa2_domain.

On the admin server, the server2 folder (and its boot.properties) doesn't exist in soa1_domain, and yet server2 starts fine on the second node.

@biemond
Owner

biemond commented Dec 4, 2014

It should create it (the last step of create.py), but not on a modify:
https://github.com/biemond/biemond-orawls/blob/master/files/providers/wls_server/create.py.erb

So there is also no boot.properties for server1 (domainX/servers/server1/security/boot.properties). You don't need one when the domain is in development mode.

I will do some tests on a plain cluster domain

@jasonp18
Author

jasonp18 commented Dec 4, 2014

FYI, both co-located domains, soa1_domain and soa2_domain, on the two shared machines (each with Admin and MS1 on Node1 and MS2 on Node2) have development_mode: false

Thanks Jason

@biemond
Owner

biemond commented Dec 4, 2014

OK, it works for me on a standard domain; there should be no difference between a SOA domain and a standard WLS domain. It generates boot.properties for both servers and packs them in.

Can you check whether it generates one when you add a new WLS server to your config (it doesn't have to be part of a cluster)? You can use ensure => absent to remove it later.

Otherwise it can only be an old orawls module.

@jasonp18
Author

jasonp18 commented Dec 5, 2014

Thanks Edwin. I will try when I get a chance, but the issue is reproducible when building from scratch, and upgrading to a later version is not possible at the moment since we already went through an upgrade cycle recently. In the interim, the solution may be to automate the manual workaround as part of our build.
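One possible shape for such an automation step, sketched in Python. This is an untested assumption rather than the manual workaround described earlier: instead of starting the server by hand, it follows the hint in the WebLogic error message and replaces the file under servers/<server>/data/nodemanager with cleartext values so they can be re-encrypted on the next start. The function name and arguments are hypothetical, and whether Node Manager still needs a restart afterwards is not established here.

```python
import os


def write_cleartext_boot_properties(domain_home, server_name, username, password):
    """Write a cleartext boot.properties for one managed server; return its path.

    The directory layout mirrors the path in the error message:
    <domain_home>/servers/<server_name>/data/nodemanager/boot.properties
    """
    target_dir = os.path.join(domain_home, "servers", server_name, "data", "nodemanager")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "boot.properties")
    # Create (or truncate) the file with owner-only permissions,
    # because it briefly holds a cleartext password.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("username=%s\n" % username)
        handle.write("password=%s\n" % password)
    os.chmod(target, 0o600)  # also tighten permissions on a pre-existing file
    return target
```

A build step could call this for the soa2_domain MS2 on the second node before Puppet attempts the first start; the credentials would have to come from wherever the build already keeps them.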

@guptasachin1112

Hi Edwin,

I have a scenario where I have installed Admin + managed_1 on machine A, and managed_2 on machine B. I am able to start the admin server and managed_1 on machine A, both from the WebLogic admin console and from the command prompt. I created a domain, ran pack domain on machine A, and then unpack domain on machine B.

I have started the Node Manager on machine B, and it is reachable from the WebLogic console. I have also enrolled the Node Manager on machine B using WLST. But starting the second managed server on machine B fails every time, with this error:
Error: Diagnostics data was not saved to the credential store.
Error: Validate operation has failed.
Need to do the security configuration first!
FINEST NodeManager Waiting for the process to die: 15567
INFO NodeManager Server failed during startup so will not be restarted
NodeManager runMonitor returned, setting finished=true and notifying waiters

Then I tried to start the second managed server from machine A (where Admin is installed) using the startManaged command. The second managed server started, but in the end failed with this error message:

Emergency Security BEA-090087 Server failed to bind to the configured Admin port. The port may already be used by another process.
Critical WebLogicServer BEA-000362 Server failed. Reason: Server failed to bind to any usable port. See preceeding log message for details.
Notice WebLogicServer BEA-000365 Server state changed to FAILED
Error WebLogicServer BEA-000383 A critical service failed. The server will shut itself down
Error Server BEA-002606 Unable to create a server socket for listening on channel "Default". The address 10.152.173.xx might be incorrect or another process is using port 14100: java.net.BindException: Cannot assign requested address.
Notice WebLogicServer BEA-000365 Server state changed to FORCE_SHUTTING_DOWN
Notice Cluster BEA-000163 Stopping "async" replication service

Did I make a mistake while unpacking the domain or starting the Node Manager on machine B?

@biemond
Owner

biemond commented Aug 29, 2015

I think the pack command is executed too early; it should be the last command on A, and after that you can do machine B.
The second issue is understandable, because you defined the listen address and this IP is on machine B, not on A.
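A quick way to confirm the listen address point on either machine is to try binding a socket to the configured address. This is a generic Python sketch, not something orawls provides, and the function name is made up for illustration.

```python
import errno
import socket


def can_bind(address, port=0):
    """Try to bind a TCP socket to address:port on this host.

    Returns (True, None) on success, or (False, reason) where reason is the
    errno name (for example EADDRNOTAVAIL or EADDRINUSE) when one is known.
    Port 0 lets the OS pick a free port, so only the address is tested.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        return True, None
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

Run on machine A with the listen address configured for the second managed server, a result of `(False, 'EADDRNOTAVAIL')` would match the "Cannot assign requested address" message above, meaning the IP is not configured on that host. Passing the real port (14100 in the log) instead of 0 would additionally show `EADDRINUSE` if another process already holds it.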


3 participants