
AATAMS_ACOUSTIC_REPORTING: deadlock #311

Closed
xhoenner opened this issue Oct 19, 2015 · 30 comments

@xhoenner
Contributor

Harvester failed on 10-nsp with the following deadlock error returned:

Exception in component iPostgresqlOutput_4
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 1214 waits for AccessExclusiveLock on relation 12888578 of database 382180; blocked by process 9072.
Process 9072 waits for AccessShareLock on relation 27608909 of database 382180; blocked by process 1214.
  Hint: See server log for query details.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:403)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:331)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_5Process(Populate_reporting_views.java:5758)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_4Process(Populate_reporting_views.java:3706)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_3Process(Populate_reporting_views.java:2835)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlRow_1Process(Populate_reporting_views.java:1791)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_1Process(Populate_reporting_views.java:1662)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlConnection_1Process(Populate_reporting_views.java:631)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.runJobInTOS(Populate_reporting_views.java:9836)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.runJob(Populate_reporting_views.java:9711)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_2Process(aatams_acoustic_harvester.java:2282)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_4Process(aatams_acoustic_harvester.java:2162)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:1990)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1818)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4259)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4051)
Exception in component tRunJob_2

This error occurred while populating the reporting views (see the 'Populate_reporting_views' subjob), specifically the "aatams_acoustic_project_all_deployments_view" view. In this subjob, all the harvester does is run queries against dbprod (the destination database), so I don't think this is related to the AATAMS database or the web app.

Importantly, the same harvester ran successfully on 14-nsp at the same time... any idea @danfruehauf?

@danfruehauf
Contributor

@xhoenner No idea. But maybe the answer lies in the question - "at the same time..."

@xhoenner
Contributor Author

So what's the plan? Re-run the harvester tonight on 10-nsp and see if that works?

@danfruehauf
Contributor

I don't know whether those two harvesters running together can lock the same tables, which is why I mentioned it.

Yes, run it on 10-nsp-mel and see what happens. We can also schedule them to run at different times.

@xhoenner
Contributor Author

Alright, we don't have the deadlock issue anymore, but now have a weird connection issue. Have the data bags been changed @danfruehauf?

In finally block
Exception in component tPostgresqlConnection_1
org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:207)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:64)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:136)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:29)
        at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:21)
        at org.postgresql.jdbc3g.Jdbc3gConnection.<init>(Jdbc3gConnection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:393)
        at org.postgresql.Driver.connect(Driver.java:267)
        at java.sql.DriverManager.getConnection(DriverManager.java:571)
        at java.sql.DriverManager.getConnection(DriverManager.java:215)
        at aatams_acoustic_reporting.copy_tables_0_1.Copy_Tables.tPostgresqlConnection_1Process(Copy_Tables.java:771)
        at aatams_acoustic_reporting.copy_tables_0_1.Copy_Tables.runJobInTOS(Copy_Tables.java:11542)
        at aatams_acoustic_reporting.copy_tables_0_1.Copy_Tables.runJob(Copy_Tables.java:11393)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:1938)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1818)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4259)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4051)
Caused by: java.net.ConnectException: Connection timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at java.net.Socket.connect(Socket.java:528)
        at org.postgresql.core.PGStream.<init>(PGStream.java:60)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:101)
        ... 16 more
Exception in component tRunJob_1

@danfruehauf
Contributor

@xhoenner The credentials look OK. No idea...

@xhoenner
Contributor Author

xhoenner commented Dec 9, 2015

@danfruehauf please make changes in chef to run this harvester on another day (Thursday?). I ran it on Monday and Tuesday this week and it didn't throw this error.

@danfruehauf
Contributor

Changed to Thursday.

@xhoenner
Contributor Author

xhoenner commented Jan 3, 2016

This seems to have solved this issue.

@xhoenner xhoenner closed this as completed Jan 3, 2016
@xhoenner xhoenner reopened this Nov 24, 2016
@xhoenner
Contributor Author

Occurred yesterday on 10aws @lwgordonimos

In finally block
Exception in component iPostgresqlOutput_4
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 27575 waits for AccessExclusiveLock on relation 31181122 of database 16415; blocked by process 25913.
Process 25913 waits for AccessShareLock on relation 48016904 of database 16415; blocked by process 27575.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:336)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:282)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlInput_3Process(Populate_layers.java:3660)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlInput_2Process(Populate_layers.java:3180)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlInput_1Process(Populate_layers.java:2027)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlConnection_1Process(Populate_layers.java:841)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.iIncludeSdiLibraries_1Process(Populate_layers.java:718)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.runJobInTOS(Populate_layers.java:4435)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.runJob(Populate_layers.java:4302)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_4Process(aatams_acoustic_harvester.java:2273)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:2153)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1981)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4422)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4214)
Exception in component tRunJob_4
java.lang.RuntimeException: Child job running failed
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_4Process(aatams_acoustic_harvester.java:2284)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:2153)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1981)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4422)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4214)
finish;2016-11-25 03:00:19+11:00 538 minutes

@julian1
Contributor

julian1 commented Nov 24, 2016

neat!

@ghost

ghost commented Nov 25, 2016

2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01ERROR:  deadlock detected
2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01DETAIL:  Process 27575 waits for AccessExclusiveLock on relation 31181122 of database 16415; blocked by process 25913.
        Process 25913 waits for AccessShareLock on relation 48016904 of database 16415; blocked by process 27575.
        Process 27575: TRUNCATE TABLE "dw_aatams_acoustic"."installation_summary"
        Process 25913: LOCK TABLE dw_aatams_acoustic.aatams_acoustic_detections_map IN ACCESS SHARE MODE
2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01HINT:  See server log for query details.
2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01STATEMENT:  TRUNCATE TABLE "dw_aatams_acoustic"."installation_summary"

Unfortunately I can't tell exactly what those PIDs belonged to, but the TRUNCATE was blocked by a process that was trying to lock aatams_acoustic_detections_map.
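A minimal two-session sketch consistent with the log above (which table each session touched first is an interpretation, not something the log shows; session B's first statement is an assumption):

```sql
-- Session A: the harvester (pid 27575), truncating tables inside one long transaction
BEGIN;
TRUNCATE TABLE dw_aatams_acoustic.aatams_acoustic_detections_map;
    -- AccessExclusiveLock on detections_map, held until COMMIT
-- ... long-running copy ...
TRUNCATE TABLE dw_aatams_acoustic.installation_summary;
    -- blocks: session B already holds a lock on installation_summary

-- Session B: a concurrent reader (pid 25913), e.g. a download or backup
BEGIN;
SELECT count(*) FROM dw_aatams_acoustic.installation_summary;
    -- AccessShareLock on installation_summary, held until COMMIT
LOCK TABLE dw_aatams_acoustic.aatams_acoustic_detections_map IN ACCESS SHARE MODE;
    -- blocks: session A holds AccessExclusiveLock on detections_map

-- A waits for B and B waits for A; Postgres detects the cycle and
-- aborts one transaction with SQLSTATE 40P01 (deadlock detected).
```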

@julian1
Contributor

julian1 commented Nov 25, 2016

Is the harvester just copying tables from the aatams database into harvest - in order to perform reporting?

I wonder if it would be possible to move the aatams db under a harvest schema. Then the harvester could be eliminated altogether. If the data is also being used to serve geoserver layers, then presumably that could be handled with some supporting view code?

@ghost

ghost commented Nov 25, 2016

The other thing I've done in the past (not with Postgres, but Oracle/SQL Server with good success) is to use a linked database to transparently access a remote database in a "local" namespace. Could also be an option if combining databases is unsuitable due to size or any other concern which keeps them separate...

Have you used this in Postgres?
https://www.postgresql.org/docs/9.4/static/dblink.html

@julian1
Contributor

julian1 commented Nov 25, 2016

I wasn't aware of dblink - looks interesting. We did use Postgres read replication, which was good in that it separated out the geoserver/WMS/WFS load from the reporting/backup load.

@xhoenner
Contributor Author

Is the harvester just copying tables from the aatams database into harvest - in order to perform reporting?

That's the purpose of an entire subjob, yes. The other subjobs generate the views for the reporting and Geoserver.

@ghost

ghost commented Nov 25, 2016

Would be interesting to see if the views could be modified to use a linked database under the hood, e.g.

http://stackoverflow.com/questions/13993302/postgresql-slow-query-dblink-and-inner-join
http://stackoverflow.com/questions/9493332/postgres-how-to-create-a-select-query-that-would-join-two-databases

SELECT *
FROM   table1 tb1
LEFT   JOIN dblink('dbname=db2', 'SELECT id, code FROM table2')
       AS tb2(id int, code text) USING (code);

@julian1
Contributor

julian1 commented Nov 25, 2016

That's very neat.

@julian1
Contributor

julian1 commented Dec 18, 2016

[image attachment]

@julian1
Contributor

julian1 commented Dec 18, 2016

@xhoenner

[image attachment]

@xhoenner
Contributor Author

haha that's exactly it, the much dreaded snail! @pblain, sounds like it's time for Talend harvesters to have their own icons, similar to Jenkins for instance. Any chance this can be added to the backlog?

[screenshot: shot-2014 04 01-18 31 10 copy]

@anguss00
Contributor

Morning made @xhoenner

@jonescc
Contributor

jonescc commented Dec 19, 2016

So, talking to @xhoenner this morning, the harvester currently:

  • copies required data from the aatams database to the harvest database
    • includes valid_detections (which is huge) and associated tables.
    • each table is copied one after another with a truncate performed on the target table before each copy
  • copies data from views on the copied tables to data and map tables
    • again target tables are truncated before copying the data in
  • copies data from views on the copied tables to reporting tables
    • again truncating target tables

Truncates are performed in the same transaction as the copying of data. In Postgres, TRUNCATE takes an exclusive lock on the table, so nothing outside the transaction can access the table until the transaction is committed; conversely, the truncate itself cannot proceed until that exclusive lock can be granted. Many of the copies take a very long time when using the valid_detections data as their source, so exclusive locks are held for long periods, preventing any other access to the data (e.g. for data downloads, backups etc.). Other harvesters have moved away from truncating tables in favor of updating file-related data only, so exclusive locks of this type aren't required.
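The lock queue described above can be watched while the harvester runs. A diagnostic sketch against the dbprod instance (schema name taken from this thread; requires sufficient privileges to see other sessions' locks):

```sql
-- Who holds and who waits on locks over the harvest schema's tables
SELECT l.pid, l.mode, l.granted, n.nspname, c.relname
FROM   pg_locks l
JOIN   pg_class c      ON c.oid = l.relation
JOIN   pg_namespace n  ON n.oid = c.relnamespace
WHERE  n.nspname = 'dw_aatams_acoustic'
ORDER  BY c.relname, l.granted DESC;
-- Rows with granted = false are waiters; AccessExclusiveLock rows with
-- granted = true belong to the harvester's still-open transaction.
```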

@jonescc
Contributor

jonescc commented Dec 20, 2016

All the above takes place in the aatams_acoustic_reporting schema.

Once a month data from reporting views are copied to the reporting schema for reporting (this is summary data so it is not a long running process).

@xhoenner
Contributor Author

All the above takes place in the aatams_acoustic_reporting schema.

It actually takes place in the dw_aatams_acoustic schema. Then you're right, summary data for reporting is copied from the dw_aatams_acoustic to the reporting schema.

@jonescc
Contributor

jonescc commented Dec 20, 2016

Thanks for the clarification @xhoenner.

Some possible ways we could look at resolving this:

  • put the truncate into a separate transaction
    • data would be deleted for the duration of the copies/the transaction.
  • use 'clear' instead of 'truncate' - 'clear' deletes rows from the table and doesn't take an exclusive lock.
    • previous data will remain visible outside the transaction until the transaction is committed
    • will probably take longer and may use more disk space (twice as much?)
    • doesn't reduce the amount of time required to run this harvester
  • move dw_acoustic_harvester_schema to the aatams database
    • would remove the initial copy-all-data step
    • data/map would need to be served from the aatams database
    • still need to use 'clear'
  • attempt to deliver map/data/reporting data using views on aatams tables
  • make aatams a schema on the harvest database
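The 'clear' option relies on MVCC: a DELETE takes only a RowExclusiveLock, so concurrent readers keep seeing the old rows until the reload commits. A sketch, reusing a table name from this thread (the source view name is illustrative only):

```sql
BEGIN;
DELETE FROM dw_aatams_acoustic.installation_summary;
    -- RowExclusiveLock only; concurrent SELECTs still see the old rows
INSERT INTO dw_aatams_acoustic.installation_summary
SELECT * FROM dw_aatams_acoustic.installation_summary_view;  -- hypothetical source view
COMMIT;
-- Trade-off vs TRUNCATE: per-row WAL, dead tuples left for VACUUM to reclaim,
-- and roughly double the disk usage while old and new rows coexist.
```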

@jonescc
Contributor

jonescc commented Dec 20, 2016

The dblink functionality looks pretty limited - all the data that you want to work with needs to be fetched locally anyway.

@julian1
Contributor

julian1 commented Dec 20, 2016

The dblink functionality looks pretty limited - all the data that you want to work with needs to be fetched locally anyway.

What about using dblink to link the data, and then using the projection as the source for the copy action to co-locate it in one place so that it can be accessed efficiently?

That might eliminate the need to maintain a complicated harvester as well, since the required SQL to perform the copy could just be ordinary reporting SQL.
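This idea could look something like the following: pull the remote projection through dblink and land it locally in one statement (the connection parameters, column list, and remote view name are all illustrative, not taken from the actual jobs):

```sql
BEGIN;
DELETE FROM dw_aatams_acoustic.installation_summary;
    -- 'clear' rather than TRUNCATE, to avoid the AccessExclusiveLock
INSERT INTO dw_aatams_acoustic.installation_summary (id, name, deployment_count)
SELECT id, name, deployment_count
FROM   dblink('host=aatams-db dbname=aatams',
              'SELECT id, name, deployment_count FROM installation_summary_view')
       AS t(id int, name text, deployment_count int);
COMMIT;
```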

@jonescc
Contributor

jonescc commented Dec 20, 2016

Good suggestion - no need to stream the data via Talend.

@jonescc
Contributor

jonescc commented Dec 20, 2016

But not required if we are working in the aatams database.

@xhoenner
Contributor Author

Seems to be fixed - the harvester ran successfully on 10-aws.
