Troubleshooting

Joe Sondow edited this page Dec 8, 2013 · 20 revisions

Logs

Follow the server logs in case of errors:

  • apache-tomcat-x/logs/catalina.out
  • apache-tomcat-x/logs/asgard.log
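Here is a minimal sketch for watching both files at once. It assumes a shell variable CATALINA_HOME that points at your apache-tomcat-x directory. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: follow Asgard's two server logs together.
# Assumptions: CATALINA_HOME points at apache-tomcat-x; asgard_log_dir and
# follow_asgard_logs are hypothetical helper names.

# Print the Tomcat logs directory, failing loudly if CATALINA_HOME is unset.
asgard_log_dir() {
    echo "${CATALINA_HOME:?set CATALINA_HOME to your Tomcat directory}/logs"
}

# Follow catalina.out and asgard.log. -F keeps following a file if it is
# rotated or recreated (supported by GNU and BSD tail).
follow_asgard_logs() {
    local logs
    logs="$(asgard_log_dir)" || return 1
    tail -F "$logs/catalina.out" "$logs/asgard.log"
}
```

For example, set CATALINA_HOME to your apache-tomcat-x path, run follow_asgard_logs, and press Ctrl-C to stop.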

If you cannot determine the solution, search the Asgard Google group for the error. If you don't find a solution, post your question there.

Increase JVM heap size for larger accounts

If your account contains a large number of cloud objects, you might need to increase Asgard’s memory footprint. It’s also advisable to change the JVM garbage collection settings. To make these changes, create a setenv.sh file in apache-tomcat-x/bin/ similar to the following code block, then restart Tomcat.

    if [ "$1" = "start" ]; then
        export JAVA_OPTS=" \
            -verbose:sizes \
            -Xmx4g -Xms4g \
            -Xmn2g \
            -XX:MaxPermSize=256m \
            -XX:+HeapDumpOnOutOfMemoryError \
            -XX:-UseGCOverheadLimit \
            -XX:+ExplicitGCInvokesConcurrent \
            -XX:+PrintGCDateStamps -XX:+PrintGCDetails \
            -XX:+PrintTenuringDistribution \
            -XX:+CMSClassUnloadingEnabled \
            -XX:+UseConcMarkSweepGC \
        "
    else
        export JAVA_OPTS=""
    fi

Problems filling caches

If Asgard still has not finished starting up after several minutes, check your asgard.log file for repeated exceptions filling caches, such as these:

    [2012-11-05 17:08:43,772] [background-process-4] com.netflix.asgard.CachedMap Exception filling cache us-east-1 Security Group
    com.amazonaws.AmazonClientException: Unable to unmarshall response (ParseError at [row,col]:[5097,13]
    Message: Read timed out)
    [2012-11-26 17:47:32,905] [background-process-6] com.netflix.asgard.CachedMap    Exception filling cache ap-northeast-1 Spot Instance Request
    com.amazonaws.AmazonClientException: Unable to execute HTTP request: Connect to ec2.ap-northeast-1.amazonaws.com/ec2.ap-northeast-1.amazonaws.com/27.0.2.68 timed out

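If so, your AWS accounts may contain a lot of metadata (the first example is a read timeout partway through parsing a large response), and/or the network connection between your Asgard instance and some AWS API endpoints may be unreliable (the second example is a connection timeout).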
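To see which caches and regions fail most often, you can use a sketch like the following, which counts the "Exception filling cache" lines in asgard.log. It assumes the message format shown in the sample lines above. summarize_cache_failures is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: tally cache-fill failures in asgard.log by region and cache name.
# Assumptions: log lines contain "Exception filling cache <region> <cache>"
# as in the samples above; summarize_cache_failures is a hypothetical name.
summarize_cache_failures() {
    local log="${1:-asgard.log}"
    # Keep only the "<region> <cache name>" part of each matching line,
    # trim trailing whitespace, then count duplicates, most frequent first.
    grep -o 'Exception filling cache .*' "$log" \
        | sed -e 's/^Exception filling cache //' -e 's/[[:space:]]*$//' \
        | sort | uniq -c | sort -rn
}
```

For example, running summarize_cache_failures apache-tomcat-x/logs/asgard.log prints one line per failing cache, prefixed with its failure count.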

Find stale caches

Check http://localhost:8080/cache/list to identify which caches are failing to load. Note that some caches can take a few minutes to fill.

Manual single cache fill

If you suspect a cache recently failed to load and you want to trigger an immediate attempt to fill it again, find the cache's name (such as Multi-region Queue) in the cache list linked above, then send an HTTP POST request to Asgard like this:

    curl -d "id=Multi-region Queue" http://localhost:8080/cache/fill

If you can wait, there is no need to force a cache to fill, because all caches regularly try to fill themselves anyway.
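If several caches need a push at once, you can send the same POST in a loop. This sketch assumes the /cache/fill endpoint and id parameter shown above. fill_caches and ASGARD_URL are hypothetical names. It uses curl's --data-urlencode so that cache names containing spaces are percent-encoded.

```shell
#!/bin/sh
# Sketch: request an immediate fill for one or more caches by name.
# Assumptions: fill_caches and ASGARD_URL are hypothetical names; the
# /cache/fill endpoint and its "id" parameter come from the example above.
fill_caches() {
    local base="${ASGARD_URL:-http://localhost:8080}"
    local name status=0
    for name in "$@"; do
        # --data-urlencode sends a POST and percent-encodes the value.
        # -f makes curl exit non-zero on HTTP error responses (400 and above).
        if curl -fsS --data-urlencode "id=$name" "$base/cache/fill" >/dev/null; then
            echo "requested fill: $name"
        else
            echo "failed to request fill: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, fill_caches "Multi-region Queue" requests one fill. Pass additional names copied from the cache list as extra arguments.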

Skip cache fill

If you want to start up Asgard and let users access it before all the caches are warm, then some of your cloud object lists will be empty for a while until their caches succeed in pulling data from Amazon and other endpoints. To skip the cache fill initialization process, start up Asgard with this JVM system property flag: -DskipCacheFill=true

Reduce AWS regions

You can also limit the set of AWS regions your Asgard instance will interact with. To specify the region names that you want to use, start up Asgard with a comma-separated list of region names in this JVM system property: -DonlyRegions=us-east-1,us-west-1,us-west-2,eu-west-1,ap-northeast-1

In Tomcat, you can use skipCacheFill and/or onlyRegions by adding them to JAVA_OPTS in your apache-tomcat-x/bin/setenv.sh file:

    if [ "$1" = "start" ]; then
        export JAVA_OPTS=" \
            -DskipCacheFill=true \
            -DonlyRegions=us-east-1,us-west-2,sa-east-1,ap-northeast-1 \
        "
    else
        export JAVA_OPTS=""
    fi
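Both setenv.sh examples on this page assign JAVA_OPTS outright. If you paste one after the other into the same file, only the second takes effect. The sketch below shows one way to merge them by building the variable in steps. The flag values are examples, and the heap settings are abbreviated.

```shell
# apache-tomcat-x/bin/setenv.sh (sketch merging heap and cache settings)
if [ "$1" = "start" ]; then
    # Memory and GC settings (abbreviated; copy the full list from the heap-size example).
    JAVA_OPTS="-Xmx4g -Xms4g -Xmn2g -XX:+UseConcMarkSweepGC"
    # Cache-related system properties appended to the same variable.
    JAVA_OPTS="$JAVA_OPTS -DskipCacheFill=true"
    JAVA_OPTS="$JAVA_OPTS -DonlyRegions=us-east-1,us-west-2"
    export JAVA_OPTS
else
    export JAVA_OPTS=""
fi
```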

Get a thread dump from Tomcat

If you want to diagnose the state of a misbehaving server, run this command to dump the current threads to catalina.out:

    kill -3 $(ps -ef | grep 'java.*tomcat' | grep -v grep | awk '{print $2}')
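The pattern java.*tomcat can match more than the Tomcat JVM, for example any other Java process with "tomcat" in its command line. The following sketch matches Tomcat's main class, org.apache.catalina.startup.Bootstrap, with pgrep, which is common on Linux and macOS. It also shows jstack, a JDK tool that writes the dump to a file you choose. The function names and the output file name are hypothetical.

```shell
#!/bin/sh
# Sketch: locate Tomcat by its main class (Bootstrap in package
# org.apache.catalina.startup) and request a thread dump.
# The [.] brackets match a literal dot, and they stop the pattern from matching
# a shell whose own command line merely contains this pattern text.
# Assumptions: tomcat_pids, thread_dump, thread_dump_to_file are hypothetical.
TOMCAT_PATTERN='org[.]apache[.]catalina[.]startup[.]Bootstrap'

tomcat_pids() {
    pgrep -f "$TOMCAT_PATTERN"
}

# SIGQUIT makes the JVM print every thread's stack to stdout (catalina.out
# under Tomcat) and keep running.
thread_dump() {
    local pid
    for pid in $(tomcat_pids); do
        kill -3 "$pid"
    done
}

# Alternative: jstack ships with the JDK (not the JRE) and writes the dump to
# its own stdout, appended here to a file.
thread_dump_to_file() {
    local out="${1:-tomcat-threads.txt}" pid
    for pid in $(tomcat_pids); do
        jstack -l "$pid" >> "$out"
    done
}
```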

Kill Tomcat process if stuck

On Mac or Linux, if shutdown.sh does not successfully stop Tomcat, run this command to kill the Tomcat process:

    kill -9 $(ps -ef | grep 'java.*tomcat' | grep -v grep | awk '{print $2}')
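Before you use kill -9, it can help to give shutdown.sh a grace period and then try SIGTERM, which lets the JVM run its shutdown hooks. The following sketch makes three assumptions: CATALINA_HOME points at apache-tomcat-x, Tomcat is the only process whose command line matches its Bootstrap main class, and stop_tomcat is a hypothetical name.

```shell
#!/bin/sh
# Sketch: stop Tomcat with shutdown.sh, escalating to signals only if needed.
# Assumptions: CATALINA_HOME points at apache-tomcat-x; stop_tomcat is a
# hypothetical helper. The pattern matches Tomcat's main class (Bootstrap in
# package org.apache.catalina.startup); [.] keeps it from matching shells
# whose command line merely contains the pattern text.
stop_tomcat() {
    local pattern='org[.]apache[.]catalina[.]startup[.]Bootstrap'
    local wait_secs="${1:-30}"
    local i=0

    "${CATALINA_HOME:?set CATALINA_HOME}/bin/shutdown.sh" >/dev/null 2>&1

    # Poll once a second until the JVM exits or the grace period runs out.
    while [ "$i" -lt "$wait_secs" ]; do
        pgrep -f "$pattern" >/dev/null 2>&1 || return 0
        sleep 1
        i=$((i + 1))
    done

    # Still running: SIGTERM first (runs JVM shutdown hooks), then SIGKILL.
    pkill -TERM -f "$pattern"
    sleep 5
    pkill -KILL -f "$pattern"
    return 0
}
```

For example, stop_tomcat 60 waits up to a minute after shutdown.sh before sending any signals.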

Delete Tomcat "work" directory

Sometimes a problem with a remote dependency requires all Asgard users to upgrade. If the upgrade works for everyone except you, try stopping Tomcat and deleting its work directory (apache-tomcat-x/work). This can remove cached files left over from an older Asgard installation.
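The following sketch adds a small guard against clearing the work directory while Tomcat is still using it. It assumes that CATALINA_HOME points at apache-tomcat-x. clear_tomcat_work_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: empty Tomcat's work directory, but only while Tomcat is stopped.
# Assumptions: CATALINA_HOME points at apache-tomcat-x; clear_tomcat_work_dir
# is a hypothetical helper. The pattern matches Tomcat's Bootstrap main class.
clear_tomcat_work_dir() {
    local work="${CATALINA_HOME:?set CATALINA_HOME}/work"
    if pgrep -f 'org[.]apache[.]catalina[.]startup[.]Bootstrap' >/dev/null 2>&1; then
        echo "Tomcat is still running; stop it before clearing $work" >&2
        return 1
    fi
    # Remove the contents only; Tomcat recreates what it needs there on startup.
    rm -rf "$work"/*
}
```

To use it, stop Tomcat, run clear_tomcat_work_dir, then start Tomcat again.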

Check system properties and environment variables

To confirm the environment settings of your Asgard instance, open http://localhost:8080/server/props and look at values such as your JDK vendor and version number, grails.env, GRAILS_HOME, user.home, user.name, java.version, grails.version, and anything else that might be a clue for troubleshooting.

Known Issues

Setting up Asgard fails when the user's home directory is read-only

This mainly seems to affect Windows users, but Linux users have occasionally reported it too. If the Asgard config screen won't save credentials, try setting an ASGARD_HOME environment variable to a known writable directory and restarting Tomcat.
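On Mac or Linux, one way to set the variable for Tomcat is in apache-tomcat-x/bin/setenv.sh, which catalina.sh sources before it launches the JVM. In this sketch /opt/asgard-home is a hypothetical path. Substitute any directory that the user running Tomcat can write to.

```shell
# Sketch for apache-tomcat-x/bin/setenv.sh (Mac or Linux).
# Assumption: /opt/asgard-home is a hypothetical path; use any directory the
# user running Tomcat can write to.
ASGARD_HOME=/opt/asgard-home
if [ ! -d "$ASGARD_HOME" ] || [ ! -w "$ASGARD_HOME" ]; then
    echo "warning: ASGARD_HOME $ASGARD_HOME is missing or not writable" >&2
fi
# Exported so the JVM that catalina.sh launches inherits it.
export ASGARD_HOME
```

If your setenv.sh already contains a JAVA_OPTS block, these lines can sit alongside it. On Windows, the corresponding file is apache-tomcat-x\bin\setenv.bat, which uses a set ASGARD_HOME=... line with a writable path.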

Unable to access Asgard with Firefox

There is currently a sporadic bug on some forms in Asgard when used in Firefox. For safety while we work on a robust fix, Asgard is read-only in Firefox.