
How to select Postgres DB database? #16

Closed

igorbarinov opened this issue Jun 15, 2015 · 21 comments

@igorbarinov

Hi!
I am trying to set up bottledwater-pg with my existing Docker containers for data ingestion from Postgres to Kafka.
How do I specify which database to use when I run:

docker run -d --name bottledwater --hostname bottledwater --link postgres:postgres \
    --link kafka:kafka --link schema-registry:schema-registry confluent/bottledwater:0.1

Currently it uses the postgres database, and I need to use a different one.

Thanks!

@ept
Contributor

ept commented Jun 15, 2015

At the moment the database name is hard-coded in the CMD line of the Dockerfile (see build/Dockerfile.client). You can override it by providing the full bottledwater command line as an argument to docker run. It would look something like this:

docker run -d --name bottledwater --hostname bottledwater --link postgres:postgres \
    --link kafka:kafka --link schema-registry:schema-registry confluent/bottledwater:0.1 \
    sh -c '/usr/local/bin/bottledwater \
    --postgres "hostaddr=${POSTGRES_PORT_5432_TCP_ADDR} port=${POSTGRES_PORT_5432_TCP_PORT} dbname=postgres user=postgres" \
    --broker ${KAFKA_PORT_9092_TCP_ADDR}:${KAFKA_PORT_9092_TCP_PORT} \
    --schema-registry http://${SCHEMA_REGISTRY_PORT_8081_TCP_ADDR}:${SCHEMA_REGISTRY_PORT_8081_TCP_PORT}'

Set the dbname=postgres parameter to your own database name. Does that work?

@igorbarinov
Author

@ept Martin, thanks! Now I've got it!

But when I try to connect bottledwater to a different database I see this error:

ubuntu@toshi500:~$ docker logs -f bottledwater
Replication slot "bottledwater" exists, streaming changes from E/FE8D9010.
/usr/local/bin/bottledwater: Replication stream was unexpectedly terminated: ERROR: replication slot "bottledwater" was not created in this database

Here is the modified command with the changed DB link and dbname:

docker run -d --name bottledwater --hostname bottledwater --link toshi_db_1:postgres \
    --link kafka:kafka --link schema-registry:schema-registry confluent/bottledwater:0.1 \
    sh -c '/usr/local/bin/bottledwater \
    --postgres "hostaddr=${POSTGRES_PORT_5432_TCP_ADDR} port=${POSTGRES_PORT_5432_TCP_PORT} dbname=toshi_development user=postgres" \
    --broker ${KAFKA_PORT_9092_TCP_ADDR}:${KAFKA_PORT_9092_TCP_PORT} \
    --schema-registry http://${SCHEMA_REGISTRY_PORT_8081_TCP_ADDR}:${SCHEMA_REGISTRY_PORT_8081_TCP_PORT}'

I checked if I have extension in DB:

toshi_development=# \df
                                                             List of functions
 Schema |           Name            | Result data type |                            Argument data types                            |  Type
--------+---------------------------+------------------+---------------------------------------------------------------------------+--------
 public | bottledwater_export       | SETOF bytea      | table_pattern text DEFAULT '%'::text, allow_unkeyed boolean DEFAULT false | normal
 public | bottledwater_frame_schema | text             |                                                                           | normal
 public | bottledwater_key_schema   | text             | name                                                                      | normal
 public | bottledwater_row_schema   | text             | name                                                                      | normal
(4 rows)

toshi_development=#

I appreciate any help. Thanks!

@igorbarinov
Author

I checked for the replication slot and it exists in the DB. Still the same error:

postgres=# SELECT * FROM pg_replication_slots;
  slot_name   |    plugin    | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
--------------+--------------+-----------+--------+----------+--------+------+--------------+-------------
 bottledwater | bottledwater | logical   |  12141 | postgres | f      |      |       374036 | E/FE8D9010
(1 row)

postgres=#

@igorbarinov
Author

Looks like I have a replication slot with the same name in both DBs: postgres and toshi_development. How can I drop one?

@igorbarinov
Author

OK, I dropped the replication slot

toshi_development=# select * from pg_drop_replication_slot('bottledwater');
 pg_drop_replication_slot
--------------------------

(1 row)

and now I get:

INFO:  bottledwater_export: Table public.blocks_transactions is keyed by index blocks_transactions_pkey
INFO:  bottledwater_export: Table public.unconfirmed_addresses_outputs is keyed by index unconfirmed_addresses_outputs_pkey
INFO:  bottledwater_export: Table public.unconfirmed_ledger_entries is keyed by index unconfirmed_ledger_entries_pkey
INFO:  bottledwater_export: Table public.unspent_outputs is keyed by index unspent_outputs_pkey
INFO:  bottledwater_export: Table public.addresses is keyed by index addresses_pkey
/usr/local/bin/bottledwater: While reading snapshot: PGRES_FATAL_ERROR: ERROR:  bottledwater_export: The following tables do not have a replica identity key:
        public.schema_info does not have a primary key.
        Please give them a primary key or set REPLICA IDENTITY USING INDEX.
        To ignore this issue, and export them anyway, use --allow-unkeyed
        (note that export of updates and deletes will then be incomplete).
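
The error message itself points at the fix: give the table a primary key, or point REPLICA IDENTITY at an existing unique index. A sketch of what that looks like (the "version" column name is an assumption based on the pk_version index that shows up in the next log, and the index name in the second statement is hypothetical; adjust both to the actual schema_info definition):

    -- give the unkeyed table a primary key (column name assumed)
    ALTER TABLE public.schema_info ADD PRIMARY KEY (version);

    -- or, if a suitable unique index on NOT NULL columns already exists (hypothetical name)
    ALTER TABLE public.schema_info REPLICA IDENTITY USING INDEX schema_info_version_idx;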

@igorbarinov
Author

Oh... I've fixed everything on the Postgres side, but now Kafka doesn't accept it:

INFO:  bottledwater_export: Table public.outputs is keyed by index outputs_pkey
INFO:  bottledwater_export: Table public.peers is keyed by index peers_pkey
INFO:  bottledwater_export: Table public.raw_blocks is keyed by index raw_blocks_pkey
INFO:  bottledwater_export: Table public.raw_transactions is keyed by index raw_transactions_pkey
INFO:  bottledwater_export: Table public.transactions is keyed by index transactions_pkey
INFO:  bottledwater_export: Table public.unconfirmed_addresses is keyed by index unconfirmed_addresses_pkey
INFO:  bottledwater_export: Table public.unconfirmed_inputs is keyed by index unconfirmed_inputs_pkey
INFO:  bottledwater_export: Table public.unconfirmed_outputs is keyed by index unconfirmed_outputs_pkey
INFO:  bottledwater_export: Table public.unconfirmed_raw_transactions is keyed by index unconfirmed_raw_transactions_pkey
INFO:  bottledwater_export: Table public.unconfirmed_transactions is keyed by index unconfirmed_transactions_pkey
INFO:  bottledwater_export: Table public.address_ledger_entries is keyed by index address_ledger_entries_pkey
INFO:  bottledwater_export: Table public.addresses_outputs is keyed by index addresses_outputs_pkey
INFO:  bottledwater_export: Table public.blocks_transactions is keyed by index blocks_transactions_pkey
INFO:  bottledwater_export: Table public.unconfirmed_addresses_outputs is keyed by index unconfirmed_addresses_outputs_pkey
INFO:  bottledwater_export: Table public.unconfirmed_ledger_entries is keyed by index unconfirmed_ledger_entries_pkey
INFO:  bottledwater_export: Table public.unspent_outputs is keyed by index unspent_outputs_pkey
INFO:  bottledwater_export: Table public.addresses is keyed by index addresses_pkey
INFO:  bottledwater_export: Table public.schema_info is keyed by index pk_version
/usr/local/bin/bottledwater: Schema registry returned HTTP status 500: Register schema operation failed while writing to the Kafka store
Dropping replication slot since the snapshot did not complete successfully.

@igorbarinov
Author

After I removed and restarted the ZooKeeper, Avro schema registry, and Kafka containers, the previous error was gone.

Registered key schema for topic "inputs" with ID 3
Registered value schema for topic "inputs" with ID 4
/usr/local/bin/bottledwater: Could not receive snapshot data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Dropping replication slot since the snapshot did not complete successfully.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
/usr/local/bin/bottledwater: Command failed: DROP_REPLICATION_SLOT "bottledwater": server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Looks like I have trouble with RAM and swap.
I run on a 4 GB EC2 instance with 8 GB of swap, and it crashes somewhere in the middle while bottledwater is streaming to Kafka.

[screenshot]

How do I calculate how much RAM I need?

DB size

toshi_development=# SELECT pg_size_pretty(pg_database_size('toshi_development'));
 pg_size_pretty
----------------
 15 GB
(1 row)
toshi_development=# \d+
                                       List of relations
 Schema |                Name                 |   Type   |  Owner   |    Size    | Description
--------+-------------------------------------+----------+----------+------------+-------------
 public | address_ledger_entries              | table    | postgres | 843 MB     |
 public | address_ledger_entries_id_seq       | sequence | postgres | 8192 bytes |
 public | addresses                           | table    | postgres | 582 MB     |
 public | addresses_id_seq                    | sequence | postgres | 8192 bytes |
 public | addresses_outputs                   | table    | postgres | 317 MB     |
 public | addresses_outputs_id_seq            | sequence | postgres | 8192 bytes |
 public | blocks                              | table    | postgres | 87 MB      |
 public | blocks_id_seq                       | sequence | postgres | 8192 bytes |
 public | blocks_transactions                 | table    | postgres | 81 MB      |
 public | inputs                              | table    | postgres | 1531 MB    |
 public | inputs_id_seq                       | sequence | postgres | 8192 bytes |
 public | outputs                             | table    | postgres | 1356 MB    |
 public | outputs_id_seq                      | sequence | postgres | 8192 bytes |
 public | peers                               | table    | postgres | 48 kB      |
 public | peers_id_seq                        | sequence | postgres | 8192 bytes |
 public | raw_blocks                          | table    | postgres | 1071 MB    |
 public | raw_blocks_id_seq                   | sequence | postgres | 8192 bytes |
 public | raw_transactions                    | table    | postgres | 1316 MB    |
 public | raw_transactions_id_seq             | sequence | postgres | 8192 bytes |
 public | schema_info                         | table    | postgres | 8192 bytes |
 public | transactions                        | table    | postgres | 346 MB     |
 public | transactions_id_seq                 | sequence | postgres | 8192 bytes |
 public | unconfirmed_addresses               | table    | postgres | 600 kB     |
 public | unconfirmed_addresses_id_seq        | sequence | postgres | 8192 bytes |
 public | unconfirmed_addresses_outputs       | table    | postgres | 320 kB     |
 public | unconfirmed_inputs                  | table    | postgres | 1512 kB    |
 public | unconfirmed_inputs_id_seq           | sequence | postgres | 8192 bytes |
 public | unconfirmed_ledger_entries          | table    | postgres | 608 kB     |
 public | unconfirmed_ledger_entries_id_seq   | sequence | postgres | 8192 bytes |
 public | unconfirmed_outputs                 | table    | postgres | 1496 kB    |
 public | unconfirmed_outputs_id_seq          | sequence | postgres | 8192 bytes |
 public | unconfirmed_raw_transactions        | table    | postgres | 2280 kB    |
 public | unconfirmed_raw_transactions_id_seq | sequence | postgres | 8192 bytes |
 public | unconfirmed_transactions            | table    | postgres | 728 kB     |
 public | unconfirmed_transactions_id_seq     | sequence | postgres | 8192 bytes |
 public | unspent_outputs                     | table    | postgres | 166 MB     |
 public | unspent_outputs_id_seq              | sequence | postgres | 8192 bytes |
(37 rows)

@igorbarinov
Author

I increased swap to 32 GB and got past this valley of death, but an estimate of how much RAM I actually need would still be useful.

The next problem is with Avro. Any ideas on how to debug it?

/usr/local/bin/bottledwater: Replication stream was unexpectedly terminated: ERROR:  output_avro_change: row conversion failed: Field index 2 out of range
CONTEXT:  slot "bottledwater", output plugin "bottledwater", in the change callback, associated LSN 3F/6BAF11A0

@igorbarinov
Author

Sorry for the spam. I just like your product and would like to use it in my project 😃

@igorbarinov
Author

I can't figure out how to debug that "row conversion failed" bug. Any ideas?

[screenshot]

@ept
Contributor

ept commented Jun 16, 2015

Hi @igorbarinov, just catching up on this. Good work debugging your way up to this point :)

For the HTTP 500 response from the schema registry, there should be something in the schema registry log (perhaps that's a bug in the schema registry). But it looks like it was transient.

The memory use is a bit worrying — Bottled Water is supposed to only use a small amount of memory, even on a large database. So perhaps there's a memory leak somewhere. If you have time to look into it, I'd appreciate your contribution, otherwise I'll look when I get the time.

Finally, the "row conversion failed: Field index 2 out of range" error. It's interesting that the error occurs in the logical decoding plugin, but not while taking the snapshot. Do you know in which table you modified data after taking the snapshot, i.e. the table whose row it's trying to decode here? Could you give me the schema of that table? My guess is that your database has some edge case in the table's tupledescriptor, which BW isn't handling correctly.
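
(For reference, a query along these lines, run in the affected database, would dump the relevant parts of the tuple descriptor, including dropped columns; the table name public.blocks is only a placeholder for whichever table was being modified:)

    SELECT attnum, attname, atttypid::regtype AS type,
           attnotnull, atthasdef, attisdropped
    FROM pg_attribute
    WHERE attrelid = 'public.blocks'::regclass
      AND attnum > 0
    ORDER BY attnum;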

@igorbarinov
Author

Martin, thanks!
The database belongs to Toshi.io, a Bitcoin blockchain node. Here is the DB schema: https://github.com/coinbase/toshi/blob/master/db/migrations/001_start.rb.

In standby mode, when I disconnect it from the network, I don't get the "row conversion failed: Field index 2 out of range" error. But after about ten minutes I got a new error (top: schema-registry log, bottom left: bottledwater log, bottom right: kafka log):

[screenshot]

Is it possible to consume only a specified table rather than the complete schema of the database?

@ept
Contributor

ept commented Jun 17, 2015

Hm, not obvious what's going wrong here. I made a patch to log detailed debugging info on row conversion failure: see #17. Would you mind building with that patch, re-running and giving me the output?

@ept
Contributor

ept commented Jun 17, 2015

To answer your question, it's currently not possible to filter by table. That feature could be added in principle, but hasn't been high priority thus far.

@ept
Contributor

ept commented Jun 23, 2015

@igorbarinov I've rebuilt the docker image with the debugging output, so could you try again with the latest image? It also fixes a memory leak (#20), so hopefully you should now be able to take a snapshot without using excessive memory.
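
(If it helps: since the tag itself hasn't changed, the updated image won't be pulled automatically, so something like the following, using the tag from earlier in this thread, should fetch it before repeating the docker run command:)

    docker pull confluent/bottledwater:0.1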

@igorbarinov
Author

Nice, I will try it today!


@scbrady

scbrady commented Jul 15, 2015

I am having the same problem with the avro row conversion failure. Here is the output:

Replication slot "bottledwater" exists, streaming changes from 0/17D9E68.
Registered key schema for topic "_library_patron_request" with ID 35
Registered value schema for topic "_library_patron_request" with ID 36
INFO:  Row conversion failed: relation oid=32770 name=_library_patron_request ns=public relkind=r
           0. attrelid = 32770, attname = record_id, atttypid = 23, attlen = 4, attnum = 1, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 1, attisdropped = 0, attcollation = 0
           1. attrelid = 32770, attname = lpra_activity_id, atttypid = 23, attlen = 4, attnum = 2, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 0, attisdropped = 0, attcollation = 0
           2. attrelid = 32770, attname = lpra_activity_flag, atttypid = 21, attlen = 2, attnum = 3, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 1, attisdropped = 0, attcollation = 0
           3. attrelid = 32770, attname = lpra_activity_date, atttypid = 1184, attlen = 8, attnum = 4, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 1, attisdropped = 0, attcollation = 0
           4. attrelid = 32770, attname = lpat_patron_id, atttypid = 23, attlen = 4, attnum = 5, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 0, attisdropped = 0, attcollation = 0
           5. attrelid = 32770, attname = coin_reserve_id, atttypid = 2950, attlen = 16, attnum = 6, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 0, attisdropped = 0, attcollation = 0
           6. attrelid = 32770, attname = lfct_type_id, atttypid = 23, attlen = 4, attnum = 7, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 0, attisdropped = 0, attcollation = 0
           7. attrelid = 32770, attname = webs_website_id, atttypid = 23, attlen = 4, attnum = 8, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 0, attisdropped = 0, attcollation = 0
           8. attrelid = 32770, attname = lpre_email, atttypid = 1043, attlen = -1, attnum = 9, attndims = 0, atttypmod = 104, attnotnull = 0, atthasdef = 0, attisdropped = 0, attcollation = 100
           9. attrelid = 32770, attname = wlil_ilsname, atttypid = 1043, attlen = -1, attnum = 10, attndims = 0, atttypmod = 24, attnotnull = 0, atthasdef = 0, attisdropped = 0, attcollation = 100
          10. attrelid = 32770, attname = lpat_library_id, atttypid = 1043, attlen = -1, attnum = 11, attndims = 0, atttypmod = 54, attnotnull = 0, atthasdef = 0, attisdropped = 0, attcollation = 100
          11. attrelid = 32770, attname = lpre_add_hold, atttypid = 23, attlen = 4, attnum = 12, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 0, attisdropped = 0, attcollation = 0
          12. attrelid = 32770, attname = lpre_send_notification, atttypid = 23, attlen = 4, attnum = 13, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 0, attisdropped = 0, attcollation = 0
          13. attrelid = 32770, attname = lpre_request_date, atttypid = 1184, attlen = 8, attnum = 14, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 1, attisdropped = 0, attcollation = 0
          14. attrelid = 32770, attname = lpre_hold_added_date, atttypid = 1184, attlen = 8, attnum = 15, attndims = 0, atttypmod = -1, attnotnull = 0, atthasdef = 0, attisdropped = 0, attcollation = 0
          15. attrelid = 32770, attname = lpre_notification_date, atttypid = 1184, attlen = 8, attnum = 16, attndims = 0, atttypmod = -1, attnotnull = 0, atthasdef = 0, attisdropped = 0, attcollation = 0
          16. attrelid = 32770, attname = lpre_rtl_processed, atttypid = 23, attlen = 4, attnum = 17, attndims = 0, atttypmod = -1, attnotnull = 1, atthasdef = 1, attisdropped = 0, attcollation = 0
replica identity index: oid=32778 name=pk__library_patron_request_activity ns=public
           0. attrelid = 32778, attname = record_id, atttypid = 23, attlen = 4, attnum = 1, attndims = 0, atttypmod = -1, attnotnull = 0, atthasdef = 0, attisdropped = 0, attcollation = 0
CONTEXT:  slot "bottledwater", output plugin "bottledwater", in the change callback, associated LSN 0/1842FD0
/usr/local/bin/bottledwater: Replication stream was unexpectedly terminated: ERROR:  output_avro_change: row conversion failed: Field index 1 out of range
CONTEXT:  slot "bottledwater", output plugin "bottledwater", in the change callback, associated LSN 0/1842FD0

After posting this, I see that it is reporting the primary key (record_id) as nullable and without a default. I don't know why it would do this, because when I select all the rows, record_id has a value for every row.

This only seems to happen when I import a lot of data into the table (I am using an SSIS task to import data from SQL Server into this Postgres database). I tried manually inserting data into the same table schema on a different VM, and this error does not come up.

Also, I get the same memory error when running the SSIS task and bottled water at the same time:

/usr/local/bin/bottledwater: Could not receive replication data: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

But if I wait until the SSIS task is finished and then start bottledwater, I get the avro row conversion failure instead. Let me know if you have any thoughts on this.

@scbrady

scbrady commented Jul 15, 2015

I decided to reset everything (I removed the bottledwater container, removed the replication slot from the DB, and dropped and recreated the table), and the avro row conversion failure seems to have gone away. I think it happened because I had dropped and recreated the table without dropping the replication slot, which meant bottled water was still trying to track the table I had dropped. So that error seems to be my fault. I believe it was also cascading and causing the other one, so this may all have been because bottledwater does not like tracking tables that had data and were then either dropped or had all of their data removed.

@scbrady

scbrady commented Jul 15, 2015

On further inspection, the avro row conversion error still comes up when I am running the SSIS transfer task and bottledwater at the same time. But if I run SSIS to completion, then start bottledwater, it syncs the changes perfectly. So it seems like bottledwater has a hard time syncing changes while data is simultaneously being added to the table quickly.

@wanghaisheng

How can I select or exclude one schema of a Postgres database? It seems Bottled Water scans all schemas and checks whether there is a primary key for each table.

@ept
Contributor

ept commented Dec 15, 2015

I am going to close this issue, since several different problems have become jumbled together in the description, and several of them are now fixed.

ept closed this as completed Dec 15, 2015