
AATAMS_ACOUSTIC_REPORTING: deadlock #311

Closed
xhoenner opened this issue Oct 19, 2015 · 30 comments

@xhoenner
Contributor

Harvester failed on 10-nsp with the following deadlock error returned:

Exception in component iPostgresqlOutput_4
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 1214 waits for AccessExclusiveLock on relation 12888578 of database 382180; blocked by process 9072.
Process 9072 waits for AccessShareLock on relation 27608909 of database 382180; blocked by process 1214.
  Hint: See server log for query details.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:403)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:331)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_5Process(Populate_reporting_views.java:5758)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_4Process(Populate_reporting_views.java:3706)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_3Process(Populate_reporting_views.java:2835)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlRow_1Process(Populate_reporting_views.java:1791)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlInput_1Process(Populate_reporting_views.java:1662)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.tPostgresqlConnection_1Process(Populate_reporting_views.java:631)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.runJobInTOS(Populate_reporting_views.java:9836)
        at aatams_acoustic_reporting.populate_reporting_views_0_1.Populate_reporting_views.runJob(Populate_reporting_views.java:9711)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_2Process(aatams_acoustic_harvester.java:2282)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_4Process(aatams_acoustic_harvester.java:2162)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:1990)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1818)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4259)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4051)
Exception in component tRunJob_2

This error occurred while populating the reporting views (see the 'Populate_reporting_views' subjob), specifically the "aatams_acoustic_project_all_deployments_view" view. In this subjob, all the harvester does is run queries against dbprod (the destination database), so I don't think this is related to the AATAMS database or the web app.

Importantly, the same harvester ran successfully on 14-nsp at the same time... any idea @danfruehauf?

@danfruehauf
Contributor

@xhoenner No idea. But maybe the answer lies in the question - "at the same time..."

@xhoenner
Contributor Author

So what's the plan? Re-run the harvester tonight on 10-nsp and see if that works?

@danfruehauf
Contributor

I don't know whether those two harvesters running together can lock the same tables, which is why I mentioned it.

Yes, run it on 10-nsp-mel and see what happens. We can also schedule them to run at different times.

@xhoenner
Contributor Author

Alright, we don't have the deadlock issue anymore, but now have a weird connection issue. Have the data bags been changed @danfruehauf?

In finally block
Exception in component tPostgresqlConnection_1
org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:207)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:64)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:136)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:29)
        at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:21)
        at org.postgresql.jdbc3g.Jdbc3gConnection.<init>(Jdbc3gConnection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:393)
        at org.postgresql.Driver.connect(Driver.java:267)
        at java.sql.DriverManager.getConnection(DriverManager.java:571)
        at java.sql.DriverManager.getConnection(DriverManager.java:215)
        at aatams_acoustic_reporting.copy_tables_0_1.Copy_Tables.tPostgresqlConnection_1Process(Copy_Tables.java:771)
        at aatams_acoustic_reporting.copy_tables_0_1.Copy_Tables.runJobInTOS(Copy_Tables.java:11542)
        at aatams_acoustic_reporting.copy_tables_0_1.Copy_Tables.runJob(Copy_Tables.java:11393)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:1938)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1818)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4259)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4051)
Caused by: java.net.ConnectException: Connection timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at java.net.Socket.connect(Socket.java:528)
        at org.postgresql.core.PGStream.<init>(PGStream.java:60)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:101)
        ... 16 more
Exception in component tRunJob_1

@danfruehauf
Contributor

@xhoenner The credentials look OK. No idea...

@xhoenner
Contributor Author

xhoenner commented Dec 9, 2015

@danfruehauf please make changes in chef to run this harvester on another day (Thursday?). I ran it on Monday and Tuesday this week and it didn't throw this error.

@danfruehauf
Contributor

Changed to Thursday.

@xhoenner
Contributor Author

xhoenner commented Jan 3, 2016

This seems to have solved this issue.

@xhoenner xhoenner closed this as completed Jan 3, 2016
@xhoenner xhoenner reopened this Nov 24, 2016
@xhoenner
Contributor Author

Occurred yesterday on 10aws @lwgordonimos

In finally block
Exception in component iPostgresqlOutput_4
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 27575 waits for AccessExclusiveLock on relation 31181122 of database 16415; blocked by process 25913.
Process 25913 waits for AccessShareLock on relation 48016904 of database 16415; blocked by process 27575.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:336)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:282)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlInput_3Process(Populate_layers.java:3660)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlInput_2Process(Populate_layers.java:3180)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlInput_1Process(Populate_layers.java:2027)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.tPostgresqlConnection_1Process(Populate_layers.java:841)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.iIncludeSdiLibraries_1Process(Populate_layers.java:718)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.runJobInTOS(Populate_layers.java:4435)
        at aatams_acoustic_reporting.populate_layers_0_1.Populate_layers.runJob(Populate_layers.java:4302)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_4Process(aatams_acoustic_harvester.java:2273)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:2153)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1981)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4422)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4214)
Exception in component tRunJob_4
java.lang.RuntimeException: Child job running failed
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_4Process(aatams_acoustic_harvester.java:2284)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.tRunJob_1Process(aatams_acoustic_harvester.java:2153)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.iPostgresqlDbUpdate_1Process(aatams_acoustic_harvester.java:1981)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.runJobInTOS(aatams_acoustic_harvester.java:4422)
        at aatams_acoustic_reporting.aatams_acoustic_harvester_0_1.aatams_acoustic_harvester.main(aatams_acoustic_harvester.java:4214)
finish;2016-11-25 03:00:19+11:00 538 minutes

@julian1
Contributor

julian1 commented Nov 24, 2016

neat!

@ghost

ghost commented Nov 25, 2016

2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01ERROR:  deadlock detected
2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01DETAIL:  Process 27575 waits for AccessExclusiveLock on relation 31181122 of database 16415; blocked by process 25913.
        Process 25913 waits for AccessShareLock on relation 48016904 of database 16415; blocked by process 27575.
        Process 27575: TRUNCATE TABLE "dw_aatams_acoustic"."installation_summary"
        Process 25913: LOCK TABLE dw_aatams_acoustic.aatams_acoustic_detections_map IN ACCESS SHARE MODE
2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01HINT:  See server log for query details.
2016-11-24 16:00:19 GMT 52.64.146.138 dw_aatams_acoustic harvest 27575 40P01STATEMENT:  TRUNCATE TABLE "dw_aatams_acoustic"."installation_summary"

Unfortunately I can't tell exactly what those PIDs belonged to, but the TRUNCATE was blocked by a process that was trying to lock aatams_acoustic_detections_map.
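A minimal two-session sketch consistent with the log above (which table each session touched first is an interpretation, not something the log shows; session B's first statement is an assumption):

```sql
-- Session A: the harvester (pid 27575), truncating tables inside one long transaction
BEGIN;
TRUNCATE TABLE dw_aatams_acoustic.aatams_acoustic_detections_map;
    -- AccessExclusiveLock on detections_map, held until COMMIT
-- ... long-running copy ...
TRUNCATE TABLE dw_aatams_acoustic.installation_summary;
    -- blocks: session B already holds a lock on installation_summary

-- Session B: a concurrent reader (pid 25913), e.g. a download or backup
BEGIN;
SELECT count(*) FROM dw_aatams_acoustic.installation_summary;
    -- AccessShareLock on installation_summary, held until COMMIT
LOCK TABLE dw_aatams_acoustic.aatams_acoustic_detections_map IN ACCESS SHARE MODE;
    -- blocks: session A holds AccessExclusiveLock on detections_map

-- A waits for B and B waits for A; Postgres detects the cycle and
-- aborts one transaction with SQLSTATE 40P01 (deadlock detected).
```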

@julian1
Contributor

julian1 commented Nov 25, 2016

Is the harvester just copying tables from the aatams database into harvest - in order to perform reporting?

I wonder if it would be possible to move the aatams db under a harvest schema. Then the harvester could be eliminated altogether. If the data is also being used to serve geoserver layers, then presumably that could be handled with some supporting view code?

@ghost

ghost commented Nov 25, 2016

The other thing I've done in the past (not with Postgres, but Oracle/SQL Server with good success) is to use a linked database to transparently access a remote database in a "local" namespace. Could also be an option if combining databases is unsuitable due to size or any other concern which keeps them separate...

Have you used this in Postgres?
https://www.postgresql.org/docs/9.4/static/dblink.html

@julian1
Contributor

julian1 commented Nov 25, 2016

I wasn't aware of dblink - looks interesting. We did use Postgres read replication, which was good in that it separated out the geoserver/WMS/WFS load from the reporting/backup load.

@xhoenner
Contributor Author

Is the harvester just copying tables from the aatams database into harvest - in order to perform reporting?

That's the purpose of an entire subjob, yes. The other subjobs generate the views for the reporting and Geoserver.

@ghost

ghost commented Nov 25, 2016

Would be interesting to see if the views could be modified to use a linked database under the hood, e.g.

http://stackoverflow.com/questions/13993302/postgresql-slow-query-dblink-and-inner-join
http://stackoverflow.com/questions/9493332/postgres-how-to-create-a-select-query-that-would-join-two-databases

SELECT *
FROM   table1 tb1
LEFT   JOIN dblink('dbname=db2', 'SELECT id, code FROM table2')
       AS tb2(id int, code text) USING (code);

@julian1
Contributor

julian1 commented Nov 25, 2016

That's very neat.

@julian1
Contributor

julian1 commented Dec 18, 2016

[image attachment]

@julian1
Contributor

julian1 commented Dec 18, 2016

@xhoenner

[image attachment]

@xhoenner
Contributor Author

haha that's exactly it, the much dreaded snail! @pblain, sounds like it's time for Talend harvesters to have their own icons, similar to Jenkins for instance. Any chance this can be added to the backlog?

[screenshot: shot-2014 04 01-18 31 10 copy]

@anguss00
Contributor

Morning made @xhoenner

@jonescc
Contributor

jonescc commented Dec 19, 2016

So, talking to @xhoenner this morning, the harvester currently:

  • copies required data from the aatams database to the harvest database
    • includes valid_detections (which is huge) and associated tables.
    • each table is copied one after another with a truncate performed on the target table before each copy
  • copies data from views on the copied tables to data and map tables
    • again target tables are truncated before copying the data in
  • copies data from views on the copied tables to reporting tables
    • again truncating target tables

Truncates are performed in the same transaction as the copying of data. In Postgres, TRUNCATE takes an exclusive lock on the table, so nothing outside the transaction can access the table until the transaction is committed; conversely, the truncate itself cannot proceed until that exclusive lock can be granted. Many of the copies take a very long time when using the valid_detections data as their source, so exclusive locks are held for long periods, preventing any other access to the data (e.g. for data downloads, backups etc.). Other harvesters have moved away from truncating tables in favor of updating file-related data only, so exclusive locks of this type aren't required.
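The lock queue described above can be watched while the harvester runs. A diagnostic sketch against the dbprod instance (schema name taken from this thread; requires sufficient privileges to see other sessions' locks):

```sql
-- Who holds and who waits on locks over the harvest schema's tables
SELECT l.pid, l.mode, l.granted, n.nspname, c.relname
FROM   pg_locks l
JOIN   pg_class c      ON c.oid = l.relation
JOIN   pg_namespace n  ON n.oid = c.relnamespace
WHERE  n.nspname = 'dw_aatams_acoustic'
ORDER  BY c.relname, l.granted DESC;
-- Rows with granted = false are waiters; AccessExclusiveLock rows with
-- granted = true belong to the harvester's still-open transaction.
```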

@jonescc
Contributor

jonescc commented Dec 20, 2016

All the above takes place in the aatams_acoustic_reporting schema.

Once a month data from reporting views are copied to the reporting schema for reporting (this is summary data so it is not a long running process).

@xhoenner
Contributor Author

All the above takes place in the aatams_acoustic_reporting schema.

It actually takes place in the dw_aatams_acoustic schema. Then you're right, summary data for reporting is copied from the dw_aatams_acoustic to the reporting schema.

@jonescc
Contributor

jonescc commented Dec 20, 2016

Thanks for the clarification @xhoenner.

Some possible ways we could look at resolving this:

  • put the truncate into a separate transaction
    • data would be deleted for the duration of the copies/the transaction.
  • use 'clear' instead of 'truncate' - 'clear' deletes rows from the table and doesn't take an exclusive lock.
    • previous data will remain visible outside the transaction until the transaction is committed
    • will probably take longer and may use more disk space (twice as much?)
    • doesn't reduce the amount of time required to run this harvester
  • move dw_acoustic_harvester_schema to the aatams database
    • would remove the initial copy-all-data step
    • data/map would need to be served from the aatams database
    • still need to use 'clear'
  • attempt to deliver map/data/reporting data using views on aatams tables
  • make aatams a schema on the harvest database
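The 'clear' option relies on MVCC: a DELETE takes only a RowExclusiveLock, so concurrent readers keep seeing the old rows until the reload commits. A sketch, reusing a table name from this thread (the source view name is illustrative only):

```sql
BEGIN;
DELETE FROM dw_aatams_acoustic.installation_summary;
    -- RowExclusiveLock only; concurrent SELECTs still see the old rows
INSERT INTO dw_aatams_acoustic.installation_summary
SELECT * FROM dw_aatams_acoustic.installation_summary_view;  -- hypothetical source view
COMMIT;
-- Trade-off vs TRUNCATE: per-row WAL, dead tuples left for VACUUM to reclaim,
-- and roughly double the disk usage while old and new rows coexist.
```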

@jonescc
Contributor

jonescc commented Dec 20, 2016

The dblink functionality looks pretty limited - all the data that you want to work with needs to be fetched locally anyway.

@julian1
Contributor

julian1 commented Dec 20, 2016

The dblink functionality looks pretty limited - all the data that you want to work with needs to be fetched locally anyway.

What about using dblink to link the data, and then using the projection as the source for the copy action to co-locate it in one place so that it can be accessed efficiently?

That might eliminate the need to maintain a complicated harvester as well, since the required SQL to perform the copy could just be ordinary reporting SQL.
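This idea could look something like the following: pull the remote projection through dblink and land it locally in one statement (the connection parameters, column list, and remote view name are all illustrative, not taken from the actual jobs):

```sql
BEGIN;
DELETE FROM dw_aatams_acoustic.installation_summary;
    -- 'clear' rather than TRUNCATE, to avoid the AccessExclusiveLock
INSERT INTO dw_aatams_acoustic.installation_summary (id, name, deployment_count)
SELECT id, name, deployment_count
FROM   dblink('host=aatams-db dbname=aatams',
              'SELECT id, name, deployment_count FROM installation_summary_view')
       AS t(id int, name text, deployment_count int);
COMMIT;
```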

@jonescc
Contributor

jonescc commented Dec 20, 2016

Good suggestion - no need to stream the data via Talend.

@jonescc
Contributor

jonescc commented Dec 20, 2016

But not required if we are working in the aatams database.

@xhoenner
Contributor Author

Seems to be fixed - the harvester ran successfully on 10-aws.
