From 5bdcd7cf54b24527356a2db65ec4542aab519343 Mon Sep 17 00:00:00 2001
From: Pete Meyer
Date: Mon, 11 Mar 2019 13:04:22 -0400
Subject: [PATCH 1/2] doc cleanup

---
 .../source/developers/big-data-support.rst | 86 ++++---------------
 1 file changed, 18 insertions(+), 68 deletions(-)

diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst
index 06917073b32..ebe4871a490 100644
--- a/doc/sphinx-guides/source/developers/big-data-support.rst
+++ b/doc/sphinx-guides/source/developers/big-data-support.rst
@@ -16,7 +16,7 @@ Data Capture Module (DCM) is an experimental component that allows users to uplo
 Install a DCM
 ~~~~~~~~~~~~~
 
-Installation instructions can be found at https://github.com/sbgrid/data-capture-module . Note that a shared filesystem (posix or AWS S3) between Dataverse and your DCM is required. You cannot use a DCM with Swift at this point in time.
+Installation instructions can be found at https://github.com/sbgrid/data-capture-module/blob/master/doc/installation.md. Note that shared storage (posix or AWS S3) between Dataverse and your DCM is required. You cannot use a DCM with Swift at this point in time.
 
 .. FIXME: Explain what ``dataverse.files.dcm-s3-bucket-name`` is for and what it has to do with ``dataverse.files.s3-bucket-name``.
 
@@ -57,31 +57,8 @@ Here's the syntax for sending the JSON.
 Steps to set up a DCM mock for Development
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Install Flask.
+See instructions at https://github.com/sbgrid/data-capture-module/blob/master/doc/mock.md
 
-Download and run the mock. You will be cloning the https://github.com/sbgrid/data-capture-module repo.
-
-- ``git clone git://github.com/sbgrid/data-capture-module.git``
-- ``cd data-capture-module/api``
-- ``./dev_mock.sh``
-
-If you see an error about not having Flask installed, install it as explained below.
-
-On Mac, you can install Flask with:
-
-- ``mkvirtualenv mockdcm``
-- ``pip install -r requirements-mock.txt``
-
-On Ubuntu/Debian, you can install Flask with:
-
-- ``sudo apt install python-pip`` (will install python as well)
-- ``pip install flask``
-
-Once you have Flask installed, try running the dev mock script again:
-
-- ``./dev_mock.sh``
-
-This should spin up the DCM mock on port 5000.
 
 Add Dataverse settings to use mock (same as using DCM, noted above):
 
@@ -96,6 +73,8 @@ At this point you should be able to download a placeholder rsync script. Dataver
 
 - ``echo "hello" > file1.txt``
 - ``shasum file1.txt > files.sha``
+
+
 Now the files are in place and you need to send JSON to Dataverse with a success or failure message as described above. Make a copy of ``doc/sphinx-guides/source/_static/installation/files/root/big-data-support/checksumValidationSuccess.json`` and put the identifier in place (such as "X1METO") under "uploadFolder". Then use curl as described above to send the JSON.
 
 Troubleshooting
 ~~~~~~~~~~~~~~~
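
The hunk above leaves the actual mock round trip to the unchanged part of the guide ("Here's the syntax for sending the JSON"). For convenience, here is that round trip as one shell sketch. It is only a sketch: ``$API_TOKEN`` and ``$DATASET_DB_ID`` are placeholders you must set yourself, ``X1METO`` is the example upload folder name from the text above, and the endpoint is the checksum validation call documented earlier in this guide::

    # stage placeholder files in the upload folder for the dataset
    mkdir X1METO && cd X1METO
    echo "hello" > file1.txt
    shasum file1.txt > files.sha

    # copy the sample success message, then edit it so "uploadFolder" is "X1METO"
    cp doc/sphinx-guides/source/_static/installation/files/root/big-data-support/checksumValidationSuccess.json .

    # report checksum validation success to Dataverse (placeholder id and token)
    curl -H "X-Dataverse-key: $API_TOKEN" -X POST -H 'Content-type: application/json' \
         --upload-file checksumValidationSuccess.json \
         "http://localhost:8080/api/datasets/$DATASET_DB_ID/dataCaptureModule/checksumValidation"
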
@@ -113,36 +92,17 @@ If you need a fully operating DCM client for development purposes, these steps w
 Docker Image Set-up
 ^^^^^^^^^^^^^^^^^^^
 
+See https://github.com/IQSS/dataverse/blob/develop/conf/docker-dcm/readme.txt
+
 - Install docker if you do not have it
-- Follow these steps (extracted from ``docker-aio/readme.md`` & ``docker-dcm/readme.txt``) :
-
-  - ``cd conf/docker-aio`` and run ``./0prep_deps.sh`` to create Glassfish and Solr tarballs in conf/docker-aio/dv/deps.
-  - Run ``./1prep.sh``
-  - Build the docker image: ``docker build -t dv0 -f c7.dockerfile .``
-  - ``cd ../docker-dcm`` and run ``./0prep.sh``
-  - Build dcm/dv0dcm images with docker-compose: ``docker-compose -f docker-compose.yml build``
-  - Start containers: ``docker-compose -f docker-compose.yml up -d``
-  - Wait for container to show "healthy" (aka - ``docker ps``), then wait another 5 minutes (even though it shows healthy, glassfish is still standing itself up). Then run Dataverse app installation: ``docker exec -it dvsrv /opt/dv/install.bash``
-  - Configure Dataverse application to use DCM (run from outside the container): ``docker exec -it dvsrv /opt/dv/configure_dcm.sh``
-  - The Dataverse installation is accessible at ``http://localhost:8084``.
-  - You may need to change the DoiProvider inside dvsrv (ezid does not work):
-
-    - ``curl -X DELETE -d EZID "localhost:8080/api/admin/settings/:DoiProvider"``
-    - ``curl -X PUT -d DataCite "localhost:8080/api/admin/settings/:DoiProvider"``
-    - Also change the doi.baseUrlString, doi.username, doi.password
 
 Optional steps for setting up the S3 Docker DCM Variant
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 - Before: the default bucket for DCM to hold files in S3 is named test-dcm. It is coded into ``post_upload_s3.bash`` (line 30). Change to a different bucket if needed.
-- Add AWS bucket info to dcmsrv
-
-  - You need a credentials files in ~/.aws
-
-    - ``mkdir ~/.aws``
-    - ``yum install nano`` (or use a different editor below)
-    - ``nano ~/.aws/credentials`` and add these contents with your keys:
+- Add AWS bucket info to dcmsrv
+
+  - Add AWS credentials to ``~/.aws/credentials``
 
     - ``[default]``
     - ``aws_access_key_id =``
     - ``aws_secret_access_key =``
 
 - ``./asadmin create-jvm-options "\-Ddataverse.files.storage-driver-id=s3"``
 
 - Add AWS bucket info to Dataverse
-
-  - ``mkdir ~/.aws``
-  - ``yum install nano`` (or use a different editor below)
-  - ``nano ~/.aws/credentials`` and add these contents with your keys:
+
+  - Add AWS credentials to ``~/.aws/credentials``
 
     - ``[default]``
     - ``aws_access_key_id =``
     - ``aws_secret_access_key =``
 
+  - Also: create a region file at ``~/.aws/config`` with these contents:
 
     - ``[default]``
    - ``region = us-east-1``
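
Both AWS files above can also be written non-interactively, which is handy inside containers that have no editor installed. A minimal sketch using the same ``[default]`` profile and ``us-east-1`` region shown above; the key values are placeholders to replace with your own::

    mkdir -p ~/.aws

    # ~/.aws/credentials: S3 keys used by dcmsrv and by Dataverse
    cat > ~/.aws/credentials <<'EOF'
    [default]
    aws_access_key_id = YOUR_ACCESS_KEY
    aws_secret_access_key = YOUR_SECRET_KEY
    EOF

    # ~/.aws/config: the region file described above
    cat > ~/.aws/config <<'EOF'
    [default]
    region = us-east-1
    EOF
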
 Using the DCM Docker Containers
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 For using these commands, you will need to connect to the shell prompt inside various containers (e.g. ``docker exec -it dvsrv /bin/bash``)
 
 - Create a dataset and download rsync upload script
-- Upload script to dcm_client (if needed, you can probably do all the actions for create/download inside dcm_client)
+  - connect to client container: ``docker exec -it dcm_client bash``
+  - create dataset: ``cd /mnt ; ./create.bash``; this will echo the database ID to stdout
+  - download transfer script: ``./get_transfer.bash $database_id_from_create_script``
+  - execute the transfer script: ``bash ./upload-${database_id_from_create_script}.bash``, and follow the instructions from the script.
-
--  - ``docker cp ~/Downloads/upload-FK2_NN49YM.bash dcm_client:/tmp``
-
-- Create a folder of files to upload (files can be empty)
 - Run script
 
-  - e.g. ``bash ./upload-FK2_NN49YM.bash``
+  - e.g. ``bash ./upload-3.bash`` (``3`` being the database id from earlier commands in this example).
 - Manually run post upload script on dcmsrv
 
-  - for posix implementation: ``bash ./opt/dcm/scn/post_upload.bash``
-  - for S3 implementation: ``bash ./opt/dcm/scn/post_upload_s3.bash``
+  - for posix implementation: ``docker exec -it dcmsrv /opt/dcm/scn/post_upload.bash``
+  - for S3 implementation: ``docker exec -it dcmsrv /opt/dcm/scn/post_upload_s3.bash``
 
 Additional DCM docker development tips
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 - You can completely blow away all the docker images with these commands (including non DCM ones!)
-
-  - ``docker stop dvsrv``
-  - ``docker stop dcm_client``
-  - ``docker stop dcmsrv``
-  - ``docker rm $(docker ps -a -q)``
-  - ``docker rmi $(docker images -q)``
+
+  - ``docker-compose -f docker-compose.yml down -v``
 
 - There are a few logs to tail
 
   - dcmsrv : ``tail -n 2000 -f /var/log/lighttpd/breakage.log``
   - dcmsrv : ``tail -n 2000 -f /var/log/lighttpd/access.log``
 
-- Note that by default the docker container will stop running if the process it is following is turned off. For example flask with dcmsrv. You can get around this by having the script being followed never close (e.g. sleep infinity) https://stackoverflow.com/questions/31870222/how-can-i-keep-container-running-on-kubernetes
 - You may have to restart the glassfish domain occasionally to deal with memory filling up. If deployment is getting really slow, it's a good time.
 
 Repository Storage Abstraction Layer (RSAL)

From 2f14c4438c39e7bf887ff29aacd6059f970cb7fb Mon Sep 17 00:00:00 2001
From: Pete Meyer
Date: Mon, 11 Mar 2019 13:10:43 -0400
Subject: [PATCH 2/2] rst fixes

---
 doc/sphinx-guides/source/developers/big-data-support.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst
index ebe4871a490..b9159bf0f45 100644
--- a/doc/sphinx-guides/source/developers/big-data-support.rst
+++ b/doc/sphinx-guides/source/developers/big-data-support.rst
@@ -144,6 +144,7 @@ Using the DCM Docker Containers
 For using these commands, you will need to connect to the shell prompt inside various containers (e.g. ``docker exec -it dvsrv /bin/bash``)
 
 - Create a dataset and download rsync upload script
+
   - connect to client container: ``docker exec -it dcm_client bash``
   - create dataset: ``cd /mnt ; ./create.bash``; this will echo the database ID to stdout
   - download transfer script: ``./get_transfer.bash $database_id_from_create_script``
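
Taken together, the container workflow that these two patches document can be driven from the host in one pass. This is a sketch assuming the containers are already up and healthy; ``3`` stands in for whatever database id ``create.bash`` echoes::

    # create a dataset; note the database id echoed to stdout
    docker exec -it dcm_client bash -c 'cd /mnt && ./create.bash'

    # download the rsync transfer script for that dataset (id 3 in this example)
    docker exec -it dcm_client bash -c 'cd /mnt && ./get_transfer.bash 3'

    # run the upload script and follow its instructions
    docker exec -it dcm_client bash -c 'cd /mnt && bash ./upload-3.bash'

    # manually trigger the post-upload step on the DCM server (posix variant)
    docker exec -it dcmsrv /opt/dcm/scn/post_upload.bash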