Do we need Metacat replication in 3.0.0? #1620

Closed
taojing2002 opened this issue Apr 26, 2023 · 7 comments

taojing2002 commented Apr 26, 2023

Metacat replication was designed to replicate objects among different Metacat instances. We now promote the DataONE replication mechanism for replicating objects among member nodes. From this perspective, Metacat replication is obsolete. There are also some other issues we need to think about:

  • Metacat replication is based on the old Metacat docid, which will be obsolete in 3.0.0.
  • Metacat replication uses some old Metacat API calls, which will be obsolete in 3.0.0.
  • Replicating system metadata: Metacat replication originally did not support replicating system metadata. It seems Ben added some code for this feature. However, we don't really rely on it, since Hazelcast is used to sync system metadata among CNs. After we remove Hazelcast, we will find out whether this code really works.
  • Configuration of Metacat replication is complicated.

However, CNs still use this Metacat replication mechanism to sync objects. We may use other mechanisms to achieve the backup feature:

  • Postgres replication to sync database data
  • Rsync to sync object and system metadata files

artntek added this to the 3.0.0 milestone Jul 24, 2023
taojing2002 self-assigned this Nov 17, 2023
@taojing2002
Contributor Author

Here are links that discuss three PostgreSQL replication approaches:

  • PostgreSQL Asynchronous Replication
  • PostgreSQL Synchronous Replication
  • PostgreSQL Sync Replication Using Hevo Data

Considering write performance, read consistency, the risk of data loss, and the distance between servers, asynchronous replication seems to be a good choice.

@taojing2002
Contributor Author

Here are the steps to set up file sync from mn-sandbox-ucsb-1 to mn-sandbox-ucsb-1-clone:

1. Generate an SSH key for the root user on mn-sandbox-ucsb-1.test.dataone.org:

ssh-keygen

(the file name is /root/.ssh/id_ecdsa)
(no passphrase)

2. Copy the public key to mn-sandbox-ucsb-1-clone.
Copy the key on mn-sandbox-ucsb-1:

root@mn-sandbox-ucsb-1:~# vim /root/.ssh/id_ecdsa.pub

Paste the key as the last line of this file on mn-sandbox-ucsb-1-clone:

tao@mn-sandbox-ucsb-1-clone:~$ sudo vim /root/.ssh/authorized_keys

3. Create the rsync.sh file in the /var/metacat directory like this:

#!/bin/bash
rsync -aAXH --delete --stats --human-readable /var/metacat/documents/ mn-sandbox-ucsb-1-clone.test.dataone.org:/var/metacat/documents/

4. Make it executable (mode 774):

chmod 774 /var/metacat/rsync.sh

5. Create a cron job that runs every minute:

crontab -e

Paste the line:

* * * * * /var/metacat/rsync.sh
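
Because the cron job fires every minute, a slow transfer could still be running when the next one starts. One possible guard, as a minimal sketch only: wrap the rsync call in `flock` so an overlapping run is skipped. The `run_locked` helper and the lock file path below are hypothetical names, not part of the setup described above.

```shell
#!/bin/bash
# Hypothetical lock file location (an assumption, not from the original setup).
LOCK_FILE="${LOCK_FILE:-/tmp/metacat-rsync.lock}"

# Run the given command only if no other run currently holds the lock.
run_locked() {
  (
    # -n: fail immediately instead of waiting when the lock is already held.
    flock -n 9 || { echo "previous sync still running, skipping" >&2; exit 0; }
    "$@"
  ) 9>"$LOCK_FILE"
}

# Example use inside rsync.sh, with the same rsync command as in step 3:
# run_locked rsync -aAXH --delete --stats --human-readable \
#   /var/metacat/documents/ mn-sandbox-ucsb-1-clone.test.dataone.org:/var/metacat/documents/
```

With this wrapper the crontab line stays unchanged; a run that cannot take the lock exits quietly and the next minute's run tries again.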

taojing2002 commented Dec 14, 2023

Here are the steps to set up PostgreSQL replication between mn-sandbox-ucsb-1 (primary, 128.111.85.184) and mn-sandbox-ucsb-1-clone (secondary, 128.111.85.191):
Note: This is one-way replication (primary -> secondary), and the secondary PostgreSQL server is read-only.

Set up the firewall on both servers:

root@mn-sandbox-ucsb-1:/home/dev/tao# sudo ufw allow from 128.111.85.191 to any port 5432
root@mn-sandbox-ucsb-1-clone:/home/dev/tao# sudo ufw allow from 128.111.85.184 to any port 5432

Set up the primary server mn-sandbox-ucsb-1:

  1. Create a user with the replication privilege (repuser):
postgres=# CREATE USER repuser REPLICATION LOGIN CONNECTION LIMIT 1 PASSWORD 'password';
  2. Edit pg_hba.conf as user postgres:
vim /etc/postgresql/14/main/pg_hba.conf
# add the line:
hostssl replication  repuser  128.111.85.191/32  scram-sha-256
  3. Edit postgresql.conf as user postgres:
vim /etc/postgresql/14/main/postgresql.conf
# modify or add the lines:
listen_addresses = 'localhost,128.111.85.184'
wal_level = hot_standby    # legacy alias; PostgreSQL 14 maps it to 'replica'
wal_keep_size = 64         # interpreted as megabytes when no unit is given
max_wal_senders = 10
ssl = true
ssl_cert_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/cert.pem'
ssl_key_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1.test.dataone.org/privkey.pem'
  4. Make sure the postgres user can read the cert.pem and privkey.pem files:
usermod -a -G ssl-cert postgres
cd /etc/letsencrypt
chown root:ssl-cert live
chown root:ssl-cert archive
chmod 750 live
chmod 750 archive
cd live
chown root:ssl-cert mn-sandbox-ucsb-1.test.dataone.org
cd ../archive
chown root:ssl-cert -R  *
cd mn-sandbox-ucsb-1.test.dataone.org
chmod 640 privkey*
  5. Restart PostgreSQL on the primary:
sudo /etc/init.d/postgresql restart

Set up the secondary server mn-sandbox-ucsb-1-clone:

  1. Stop PostgreSQL:
sudo /etc/init.d/postgresql stop
  2. Edit pg_hba.conf as user postgres:
vim /etc/postgresql/14/main/pg_hba.conf
# add the line:
hostssl replication  repuser  128.111.85.184/32  scram-sha-256
  3. Edit postgresql.conf as user postgres:
vim /etc/postgresql/14/main/postgresql.conf
# modify or add the lines:
listen_addresses = 'localhost,128.111.85.191'
wal_level = hot_standby    # legacy alias; PostgreSQL 14 maps it to 'replica'
max_wal_senders = 10
wal_keep_size = 64         # interpreted as megabytes when no unit is given
hot_standby = on
ssl = true
ssl_cert_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1-clone.test.dataone.org/cert.pem'
ssl_key_file = '/etc/letsencrypt/live/mn-sandbox-ucsb-1-clone.test.dataone.org/privkey.pem'
  4. Make sure the postgres user can read the cert.pem and privkey.pem files:
usermod -a -G ssl-cert postgres
cd /etc/letsencrypt
chown root:ssl-cert live
chown root:ssl-cert archive
chmod 750 live
chmod 750 archive
cd live
chown root:ssl-cert mn-sandbox-ucsb-1-clone.test.dataone.org
cd ../archive
chown root:ssl-cert -R  *
cd mn-sandbox-ucsb-1-clone.test.dataone.org
chmod 640 privkey*
  5. Go to the PostgreSQL data directory on the secondary server and remove everything in it:
cd /var/lib/postgresql/14/main
sudo rm -rfv *
  6. Copy the primary server's PostgreSQL data directory to the secondary server's data directory as user postgres:
postgres@mn-sandbox-ucsb-1-clone:~$ pg_basebackup -h 128.111.85.184 -D /var/lib/postgresql/14/main/ -P -U repuser --wal-method=fetch

Note: it took 70 minutes to transfer 185 GB of data.

  7. Add the following setting to the postgresql.conf file as user postgres:
vim /etc/postgresql/14/main/postgresql.conf
primary_conninfo = 'host=128.111.85.184 port=5432 sslmode=require user=repuser password=password'
  8. In /var/lib/postgresql/14/main/, create an empty file as user postgres to signal that this is a standby server:
postgres@mn-sandbox-ucsb-1-clone:~/14/main$ cd /var/lib/postgresql/14/main
postgres@mn-sandbox-ucsb-1-clone:~/14/main$ touch standby.signal
  9. Start PostgreSQL:
root@mn-sandbox-ucsb-1-clone:/var/lib/postgresql/14/main# sudo /etc/init.d/postgresql start
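
The primary_conninfo line above keeps the replication password in postgresql.conf. A possible alternative, shown here only as a sketch, is to store it in the postgres user's ~/.pgpass file on the secondary and drop `password=...` from primary_conninfo. PostgreSQL matches replication connections against a .pgpass entry whose database field is `replication`. The `write_pgpass_entry` helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: append a .pgpass entry for a replication user.
# Arguments: host port user password (all supplied by the caller).
write_pgpass_entry() {
  local host=$1 port=$2 user=$3 password=$4
  # PGPASSFILE overrides the default location, as libpq itself does.
  local file="${PGPASSFILE:-$HOME/.pgpass}"
  # Field order: hostname:port:database:username:password
  # "replication" in the database field matches replication connections.
  printf '%s:%s:replication:%s:%s\n' "$host" "$port" "$user" "$password" >> "$file"
  # libpq ignores the file if group or others can read it.
  chmod 600 "$file"
}

# Example, run as user postgres on the secondary:
# write_pgpass_entry 128.111.85.184 5432 repuser 'password'
```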

Check the status of replication:

  1. Check the primary server:
postgres=# select * from pg_stat_replication;
  2. Check the secondary server:
postgres=# select * from pg_stat_wal_receiver;
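
The LSN columns in these views (for example sent_lsn and replay_lsn in pg_stat_replication) are printed as two hexadecimal numbers separated by a slash. To judge how far the secondary is behind from copied output, the values can be converted to byte positions and subtracted. The helpers below are a small sketch with hypothetical names; on the server itself, pg_wal_lsn_diff() gives the same difference.

```shell
#!/bin/bash
# Convert an LSN such as 0/3000060 to an absolute byte position.
# The part before the slash is the high 32 bits, the part after is the low 32 bits.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( (16#$hi << 32) + 16#$lo ))
}

# Bytes by which the second LSN trails the first
# (for example: sent_lsn minus replay_lsn).
lsn_lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example with made-up LSN values:
# lsn_lag_bytes 0/3000060 0/3000000
```

A lag that stays near zero suggests the secondary is keeping up; a steadily growing value suggests it is falling behind.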

taojing2002 commented Dec 14, 2023

After setting up both the file and Postgres replication on the two servers, it works well: when I uploaded an object to mn-sandbox-ucsb-1, I could read the system metadata and the object from the secondary server mn-sandbox-ucsb-1-clone as well. Solr search doesn't work, because I didn't set up Zookeeper for Solr replication.

Some issues:

  1. Reading an object or its sysmeta from the secondary server works, but the associated read events can't be saved to the event log database table because the database is read-only. (In our dev meeting we decided that we can skip logging events that happen on the secondary CN.) However, we still see error messages in the Tomcat log file about the failed database writes. Do we need a new feature that can disable event logging in Metacat? With it, Metacat wouldn't try to save the events, so the error messages would be eliminated.

@taojing2002
Contributor Author

We have figured out how to replicate files and the database between Metacat instances. In addition, we have a way to replicate Solr between CNs using Zookeeper. Together these can replace Metacat replication between CNs, which means the old Metacat replication feature can be dropped. @mbjones @artntek @doulikecookiedough What do you think?

@taojing2002
Contributor Author

In today's dev meeting, we decided that we could remove the old metacat replication.

@taojing2002
Contributor Author

@mbjones @artntek @doulikecookiedough I am going to drop the xml_replication table and the server_location column in the xml_documents and xml_revision tables, which is a foreign key to the xml_replication table.
