Akash Provider Fails to Start #69

Open
dougbebber opened this issue Mar 12, 2022 · 14 comments

@dougbebber

Everything appears to be working except that the Provider is offline and will not successfully start.
I keep seeing the following in the /var/log/handyhost.log:

AKT: get aggregates query error Error: post failed: Post "http://rpc-1.handyhost.computer/": dial tcp: address rpc-1.handyhost.computer: missing port in address

AKT: get aggregates query error Error: post failed: Post "http://rpc-1.handyhost.computer/": dial tcp: address rpc-1.handyhost.computer: missing port in address

AKT: get aggregates query error Error: post failed: Post "http://rpc-1.handyhost.computer/": dial tcp: address rpc-1.handyhost.computer: missing port in address

AKT: get aggregates query error Error: post failed: Post "http://rpc-1.handyhost.computer/": dial tcp: address rpc-1.handyhost.computer: missing port in address

AKT: get aggregates query error Error: post failed: Post "http://rpc-1.handyhost.computer/": dial tcp: address rpc-1.handyhost.computer: missing port in address

error calling http://rpc-1.handyhost.computer:26659/akash1ghhkmp9c0zwynvktefh2uu0saam2dzw04pf52m Error: connect ECONNREFUSED 72.8.228.138:26659
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1146:16) {
errno: -111,
code: 'ECONNREFUSED',
syscall: 'connect',
address: '72.8.228.138',
port: 26659
}

Any known solution?
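
The ECONNREFUSED entry above means the TCP connection to the RPC host was actively refused. A minimal sketch for checking this independently of HandyHost is shown below; the helper name `can_connect` is hypothetical, and the host and port in the usage comment are simply the ones that appear in the log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (ECONNREFUSED), timeouts, and DNS failures.
        return False


# Example, using the endpoint from the log above:
# print(can_connect("rpc-1.handyhost.computer", 26659))
```

A False result only shows that the endpoint is unreachable from the machine running the check; it does not say whether the cause is on the remote side or in the local network.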

@alexsmith540
Contributor

alexsmith540 commented Mar 12, 2022 via email

Are you on the latest v0.5.2 build?

@dougbebber
Author

Yes, I installed handyhost_v0.5.2.1.deb on Ubuntu 20.04 LTS.

Handy Host shows:
HandyHost v0.5.2
Akash v0.14.1

@alexsmith540
Contributor

I think the errors you saw in the logs were likely caused by an RPC outage on our aggregates node yesterday morning.
If you run the provider now, do you still see any errors?

@dougbebber
Author

dougbebber commented Mar 13, 2022

I no longer receive that error. However, each time I attempt to start the Provider, I first get a dialog saying the Provider started successfully. Then the provider status indicator in the upper right goes dark and the Provider entry in the left sidebar shows a yellow exclamation point.
[Screenshots: Provider1, Provider2]

/var/log/handyhost.log now contains:
akash exists?? true
akash dir exists? true
is positional [ 'api', 'getIP' ]
is positional [ 'api', 'akt', 'getClusterConfig' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'runProvider' ]
should akash run? true
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'getProviderLogs' ]
is positional [ 'api', 'akt', 'getProviderLogs' ]
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'getClusterConfig' ]
is positional [ 'api', 'akt', 'getProviderParams' ]
is positional [ 'api', 'akt', 'getState' ]
akash exists?? true
akash dir exists? true
is positional [ 'api', 'getIP' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'runProvider' ]
should akash run? true
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'providerRegistrationGasEstimate' ]
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'providerRegistrationGasEstimate' ]
is positional [ 'api', 'akt', 'providerRegistrationGasEstimate' ]
is positional [ 'api', 'akt', 'providerRegistrationGasEstimate' ]
is positional [ 'api', 'akt', 'updateProviderRegistration' ]
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'runProvider' ]
should akash run? true
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]
is positional [ 'api', 'akt', 'getMarketplaceOrders' ]
get orders params { page: 1, limit: 10 }
is positional [ 'api', 'akt', 'runProvider' ]
should akash run? true
is positional [ 'api', 'akt', 'getClusterStats' ]
is positional [ 'api', 'akt', 'getMarketAggregates' ]

I tried a few times (with the same result). I even tried to re-register the Provider certificate.

[Screenshot: Provider3]

I have noticed that when I click the "Provider Logs" link and the dialog comes up, clicking "Refresh logs" just hangs (nothing comes back).

Any suggestions on how I can get the Provider to stay up?
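
In the log excerpt above, `runProvider` appears four times, which matches the repeated start attempts described. A small sketch for tallying these entries from /var/log/handyhost.log follows. It assumes each "is positional" line corresponds to one API request, which the log format suggests but the thread does not confirm; the function name `count_api_calls` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: is positional [ 'api', 'akt', 'runProvider' ]
POSITIONAL = re.compile(r"^is positional \[(.*)\]\s*$")


def count_api_calls(lines):
    """Count the last path segment of every 'is positional' log line."""
    counts = Counter()
    for line in lines:
        match = POSITIONAL.match(line.strip())
        if not match:
            continue  # Skip lines such as "should akash run? true".
        parts = re.findall(r"'([^']*)'", match.group(1))
        if parts:
            counts[parts[-1]] += 1
    return counts


# Example:
# with open("/var/log/handyhost.log") as log:
#     print(count_api_calls(log).most_common())
```

A high `runProvider` count over a short period would be consistent with the provider being started again and again rather than staying up.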

@dougbebber
Author

Here's my Cluster Nodes Resource Utilization:
[Screenshot: Cluster]

@alexsmith540
Contributor

Any chance you've tried a system reboot (on the provider system)? Sometimes, as in the case you ran into where the RPC node was down, a provider can get stuck in an endless restart loop, which I suspect may be happening here.

@dougbebber
Author

dougbebber commented Mar 15, 2022

Rebooted several times. Actually, the machine locks up after running for several hours and I am forced to reboot.
Same problem with the Akash provider: I start the provider, a message says the provider started successfully, the provider status in the right-hand corner goes yellow for a few seconds, then the provider is shown as offline and a yellow exclamation point appears next to the "Start Provider" menu option.

The provider is running on the HandyHost machine, not on any of the 3 Akash cluster nodes.

@dougbebber
Author

Is there an Akash provider-specific log I can inspect to try to discover the details of the problem?

@freakinfofa

Not sure if it's the same thing, but this happened to me when I built my first cluster.

Try this: start your provider but uncheck "start at boot". It will probably fail if you had attempted to start the provider multiple times, so reboot your host after a minute. (This way the provider does not try to start on its own, and you have a better point of reference for when you actually started it.)

When your host is back online, start your provider once more, this time with "start provider at boot" checked. The dot will go yellow and then gray, but just give it time (go make some coffee and come back in a few minutes). It will go online. As long as you keep seeing the "Provider Logs" menu on the left, DO NOT TRY TO START IT AGAIN.

If you don't see the "Provider Logs" menu on the left, then the provider is not running.

When I kept trying to start it right after it went gray the first time, it would just never come online. It might be something totally unrelated, but that's my two cents!

@dougbebber
Author

I tried your suggestion. However the provider never started. I eventually clicked the "Provider logs" link and no information was presented. I'd like to be able to view some Akash provider log information to attempt to debug what's happening on my machine.

@freakinfofa

What does your networking look like?
-- Are you able to see if any traffic is being blocked or dropped (any firewalls or fancy routers you might be using)?
-- Are ports 80 and 8443 forwarded/NAT'd to your provider correctly? (In my case I have a reverse proxy in front of port 80, as I host other services under that public IP.)

I know it's probably the obvious... just trying to help :)

@dougbebber
Author

Port 80 is forwarded to the Ingress Controller machine on one of my cluster nodes. Port 8443 is forwarded to my Provider node (machine running Handy Host).

I have tried a number of things with no success getting the Provider running.

@freakinfofa

You must open and forward ports 80 and 30000-32767 to your Ingress Controller.

@avolon42x

Something that sometimes helped on my side: run systemctl stop handyhost followed by killall akash (do it twice, until there is no akash process running), then systemctl start handyhost. Then be patient and let HandyHost start the provider by itself. If you start the provider through the UI, there is a checkbox to start it automatically; if you enable it, HandyHost will keep retrying to start the provider. It just took a few minutes on my setup.
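
The stop / kill / start sequence described above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, it assumes `pgrep` is available to detect leftover `akash` processes (the comment itself only mentions `killall`), and it needs to run with root privileges.

```python
import subprocess
import time


def akash_running(run=subprocess.run) -> bool:
    """Return True if a process named exactly 'akash' exists (pgrep exits 0)."""
    return run(["pgrep", "-x", "akash"], capture_output=True).returncode == 0


def restart_handyhost(run=subprocess.run, sleep=time.sleep, max_kill_attempts=5):
    """Stop handyhost, kill leftover akash processes, then start handyhost."""
    run(["systemctl", "stop", "handyhost"], check=True)
    attempts = 0
    while akash_running(run):
        if attempts >= max_kill_attempts:
            raise RuntimeError("akash processes are still running")
        run(["killall", "akash"])
        sleep(1)  # Give the processes a moment to exit before re-checking.
        attempts += 1
    run(["systemctl", "start", "handyhost"], check=True)


# Example (as root):
# restart_handyhost()
```

The `run` and `sleep` parameters exist only so the sequence can be exercised without touching the real system. After the restart, the advice above still applies: wait a few minutes and let HandyHost bring the provider up on its own.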
