
Migrating from mysql-only to mysql-influxdb CHORDS #171

Closed
MisterMartin opened this issue Feb 8, 2017 · 4 comments
Assignees
Labels
feature feature request (internal and external) refactor code refactoring

Comments

@MisterMartin
Member

This mysql backup script will let you back up a MySQL database while excluding the data from selected tables. We might be able to use it to transfer the portal metadata.

The mysql query from #170 can be used to capture the measurement data to be transferred to influxdb.

Taken together, these should let us completely transfer an old portal to the new InfluxDB version.

@MisterMartin MisterMartin added feature feature request (internal and external) portal maintenance labels Feb 8, 2017
@MisterMartin
Member Author

MisterMartin commented Feb 16, 2017

Transferred the 3d CHORDS site to a new portal, preserving the configuration and the time series data.

The process follows these steps:

  • Use mysqldump to dump the complete 3d mysql database.
  • Bring up a fresh CHORDS instance under docker.
  • Use the mysql command to load the dump into the new MySQL database.
  • Extract the time series data from the new database, in InfluxDB line format.
  • Use the influx -import command to load the line data into the InfluxDB database.
  1. Dump the full mysql database from the existing AWS portal (running non-docker CHORDS system):
mysqldump -u chords_demo_user -p -h 127.0.0.1 chords_demo_development > 3d-2017-02-15.sql
gzip 3d-2017-02-15.sql
  2. Bring up a new docker based CHORDS system: docker-compose -p chords up -d etc.

  3. Copy the dump to the mysql container, and then load it into MySQL:

docker cp 3d-2017-02-15.sql chords_mysql:/3d-2017-02-15.sql
docker exec -it chords_mysql /bin/bash
mysql -u chords_demo_user -p chords_demo_production < 3d-2017-02-15.sql
  4. Use the mysql_to_influxdb.py script to extract the time series data from the mysql container. This is done from the docker host; the script uses docker exec to run queries against the MySQL database and saves the output to a local file. Do this in time chunks, to break up the processing:
bin/mysql_to_influxdb/mysql_to_influxdb.py --prefix 3d --start "2016-01" --end "2016-06"
2016-01 2016-06  Elapsed time: 0:42:08.364467 

bin/mysql_to_influxdb/mysql_to_influxdb.py --prefix 3d --start "2016-06" --end "2017-03"
2016-06 2017-03  Elapsed time: 0:47:14.459913

ls -lh *.txt
-rw-r--r--  1 martinc  staff   373M Feb 18 08:19 3d-2016-01-2016-06.txt
-rw-r--r--  1 martinc  staff   1.3G Feb 18 09:41 3d-2016-06-2017-03.txt
  5. Populate the InfluxDB database, also from the host. If you are working on a cloud-based system, you will probably have to install influx on the host. Alternatively, you can copy the data files to the chords_influxdb container, run a shell in that container, and do the import from there.
influx -username admin -password you_know_it -database chords_ts_production -import -path=3d-2016-01-2016-06.txt -precision=ns
influx -username admin -password you_know_it -database chords_ts_production -import -path=3d-2016-06-2017-03.txt -precision=ns
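For reference, each record in the import files follows the InfluxDB line protocol. A minimal sketch of the formatting (the `to_line` helper is hypothetical, not the actual mysql_to_influxdb.py code; the tag keys mirror the SQL CONCAT used for the export in this thread):

```python
# Hypothetical sketch of the line-protocol record being generated: tags
# inst, site, var, test, a value field, and a nanosecond timestamp
# (which is why -precision=ns is passed to influx -import).
def to_line(inst_id, site_id, var_id, value, unix_seconds, test="false"):
    # Pad the epoch-seconds timestamp with nine zeros to get nanoseconds.
    return (f"tsdata,inst={inst_id},site={site_id},var={var_id},"
            f"test={test} value={value} {unix_seconds}000000000")

print(to_line(4, 2, 17, 23.5, 1455556800))
# tsdata,inst=4,site=2,var=17,test=false value=23.5 1455556800000000000
```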

Note: larger databases can take an inordinate amount of time to run the extraction queries (roughly 40 minutes per chunk, as above). It seems that MySQL query optimization is not working effectively here. This is tolerable, but could be looked into.
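The time-chunking used above can be sketched generically. The `month_chunks` helper below is a made-up illustration, not part of CHORDS; it splits a `[start, end)` span of `YYYY-MM` months into smaller ranges, each suitable for one mysql_to_influxdb.py run:

```python
# Hedged sketch of the chunking idea: split [start, end) into spans of
# at most `size` months, so each extraction query stays manageable.
def month_chunks(start, end, size):
    def to_index(ym):               # "2016-06" -> absolute month number
        year, month = map(int, ym.split("-"))
        return year * 12 + month - 1

    def to_ym(index):               # absolute month number -> "2016-06"
        return f"{index // 12}-{index % 12 + 1:02d}"

    lo, hi = to_index(start), to_index(end)
    chunks = []
    while lo < hi:
        nxt = min(lo + size, hi)
        chunks.append((to_ym(lo), to_ym(nxt)))
        lo = nxt
    return chunks

print(month_chunks("2016-01", "2017-03", 5))
# [('2016-01', '2016-06'), ('2016-06', '2016-11'), ('2016-11', '2017-03')]
```

Each resulting pair maps onto one `--start`/`--end` invocation of the script.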

@erikj

erikj commented Feb 16, 2017

Add the following lines to the head of the 3d-influxdb-line.txt:

It looks like you can save yourself some effort, and avoid the consequences of opening a potentially large file in a text editor, by creating the file with the header and then appending to it:

cat > 3d-influxdb-line.txt <<-EOF
# DML
# CONTEXT-DATABASE: chords_ts_production
EOF

echo "SELECT CONCAT('tsdata,inst=', m.instrument_id, ',site=', instruments.site_id, ',var=', vars.id , ',test=false', ' value=', m.value , ' ', UNIX_TIMESTAMP(m.measured_at), '000000000') from measurements as m left outer join instruments on instruments.id = m.instrument_id inner join vars ON vars.shortname LIKE m.parameter AND vars.instrument_id=instruments.id ;" > query.sql
mysql -s -r chords_demo_production < query.sql  >> 3d-influxdb-line.txt
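As a quick sanity check before running influx -import, you could confirm the assembled file really begins with that header. The `has_dml_header` helper below is a hypothetical sketch, not part of the CHORDS tooling:

```python
# Hypothetical check (not part of CHORDS): influx -import reads the
# "# DML" / "# CONTEXT-DATABASE:" comment lines to pick the target
# database, so they must be the first two lines of the file.
def has_dml_header(lines, database="chords_ts_production"):
    return (len(lines) >= 2
            and lines[0].strip() == "# DML"
            and lines[1].strip() == f"# CONTEXT-DATABASE: {database}")

with_header = ["# DML",
               "# CONTEXT-DATABASE: chords_ts_production",
               "tsdata,inst=4,site=2,var=17,test=false value=23.5 1455556800000000000"]
print(has_dml_header(with_header))
# True
```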

@MisterMartin
Member Author

Great suggestion.

@MisterMartin
Member Author

It's working, more or less. Will have to solve the slow query problem when it becomes critical.

@MisterMartin MisterMartin self-assigned this Feb 18, 2017
@zastruga zastruga added refactor code refactoring and removed portal maintenance labels Feb 28, 2018