Releases: MI-DPLA/combine

v0.11

15 May 18:25
  • localsettings template for docker changed to reflect use of nginx instead of internal static IP addresses
  • mysql port for docker changed to 3307 to facilitate local integration tests run from outside Docker
  • added localsettings template for testing
  • added new settings LIVY_UI_HOME, SPARK_HOST, ES_UI_HOME, ENABLE_PYTHON, CELERY_RPC_SERVER; removed setting APP_HOST (see the sketch after this list)
  • set the admin site site_url to '/combine'
  • added line to log traceback with errors under DEBUG
  • altered Validation Scenario, Transformation Scenario, Field Mapper, and Record Identifier Transformation Scenario to prohibit adding python code by default (can be enabled with server setting ENABLE_PYTHON; existing scenarios will still work but can't be modified without the setting)
  • added createsuperuser django management command to facilitate docker build script
  • allowed Record Groups to be sorted by the run time of their most recently run Job
  • fixed local transformation includes on import
  • protected all the endpoints that should require a login
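
For reference, a hedged sketch of how these new settings might look in localsettings.py; the values below are illustrative assumptions rather than documented defaults, so consult localsettings.py.template for the actual expected values:

# Illustrative values only (assumptions); see localsettings.py.template for real defaults
LIVY_UI_HOME = 'http://localhost:8998'   # assumed: URL of the Livy web UI
SPARK_HOST = 'localhost'                 # assumed: host where Spark runs
ES_UI_HOME = 'http://localhost:9200'     # assumed: URL of the ElasticSearch HTTP endpoint
ENABLE_PYTHON = 'false'                  # set to 'true' to allow python code in scenarios
CELERY_RPC_SERVER = 'redis'              # assumed: backend Celery uses for RPC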

v0.10

25 Sep 16:23

This is a really big release! In addition to all the changes called out here, the codebase has been refactored and cleaned up quite a bit. Some of the dependencies have also been updated, including ElasticSearch. You may need to re-index your Jobs if upgrading in place.

Added

  • Add Configuration page to allow editing Validation Scenarios, Transformations, OAI Endpoints, etc. inside the Combine user interface #87
  • Allow changing the Publish Set ID on a Job without unpublishing/republishing #407
  • "Re-run all jobs" button on Organizations and Record Groups #410
  • Global error recording in admin panel #430
  • Add logout link #194
  • Add 'include upstream Jobs' toggle to Job re-run options #358
  • Include OAI harvest details in Job details #374

Changed

  • FIXED: error when viewing the Test Validation Scenario and related pages while a Record exists with an invalid Job ID #426
  • FIXED: Malformed validation scenarios fail silently when running in a Job #431
  • Give background tasks the same status display as Jobs and Exports #438
  • Improve stateio status indicators #382
  • Clarify wording on configuration 'payloads' #441
  • FIXED: timestamp sorts #199
  • FIXED: Job on rerun with invalid records still marked Valid #379

v0.9

21 May 12:48

Release Notes - v0.9

For changes see CHANGELOG

Upgrading to v0.9 (Ansible/Vagrant Server)

Version v0.9 introduces some changes at the server level that the normal update utility cannot address. Steps to make these changes manually are outlined below for the Ansible/Vagrant server build and the Docker deployment.

Switch from standalone Spark cluster to running in local mode

To align more closely with the Docker deployment, and to reduce complexity without much, if any, noticeable change in performance, this release switches the Spark application created by Livy from running against a standalone Spark cluster to running in what is called "local" mode, using N threads.

This is optional, but recommended, as future updates and releases will likely assume running in local mode.

  1. First, stop the Spark cluster, if it is running. The following can be run from anywhere:
# note the trailing colon, which is required
sudo supervisorctl stop spark:
  2. Second, prevent the Spark cluster from autostarting on reboot. Modify the file /etc/supervisor/supervisord.conf, and under the sections [program:spark_driver] and [program:spark_worker] change autostart and autorestart to false. They should then look something like the following:
[program:spark_driver]
environment =
    SPARK_MASTER_HOST=0.0.0.0
command=/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
directory=/opt/spark/
autostart = false
autorestart = false
stdout_logfile = /var/log/spark/spark_driver.stdout
stderr_logfile = /var/log/spark/spark_driver.stderr
user = combine

[program:spark_worker]
command=/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077
directory=/opt/spark
autostart = false
autorestart = false
stdout_logfile = /var/log/spark/spark_worker.stdout
stderr_logfile = /var/log/spark/spark_worker.stderr
user = combine

To apply these changes, run the following:

sudo supervisorctl reread
sudo supervisorctl update

For reference, the configuration and binaries for running the standalone Spark cluster remain in the build, in case they prove helpful or assist in configuring Combine against another cluster.

  3. Finally, update the parameter livy.spark.master in /opt/livy/conf/livy.conf to the following (local[*] runs Spark locally with as many worker threads as there are available cores):
livy.spark.master = local[*]

To apply, restart Livy:

sudo supervisorctl restart livy

Update DPLA's Ingestion3 build

As outlined in this issue, the build moves from a forked version of Ingestion3 to pinned commits in the DPLA repository.

The Docker deployment ships with a .jar file already compiled, which can be used for our purposes here. Ansible/Vagrant builds as of v0.9 will build this newer, updated version of Ingestion3.

To upgrade in place:

# jump to directory where Ingestion3 jar is located
cd /opt/ingestion3/target/scala-2.11

# backup previous Ingestion3 jar file
mv ingestion3_2.11-0.0.1.jar ingestion3_2.11-0.0.1.jar.BAK

# download pre-built .jar file
wget https://github.com/WSULib/combine-docker/raw/15938f053ccdfad08e41d60e6385588a064dc062/combinelib/ingestion3_2.11-0.0.1.jar

Then, restart Livy:

sudo supervisorctl restart livy

Finally, run the update script as normal:

cd /opt/combine
source activate combine
git checkout master
git pull
pip install -r requirements.txt
./manage.py update --release v0.9

Upgrading to v0.9 (Docker)

From the Combine-Docker git repository directory on your host machine, pull changes:

git pull

Checkout tagged release:

git checkout v0.9

Run update script:

./update_build.sh 

v0.8

19 Apr 17:03

Release Notes - v0.8

Added

  • Global search of Record's mapped fields
  • Ability to add Organizations, Record Groups, and/or Jobs to Published Subsets #395
  • Remove temporary payloads of static harvests on Job delete #394
  • Added CHANGELOG.md

Changed

  • Fixed precounts for Published Subsets when included Jobs mutate #396

Upgrading to v0.8

Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:

cd /opt/combine
source activate combine
git checkout master
git pull
./manage.py update --release v0.8

v0.7.1

08 Apr 20:07

Release Notes - v0.7.1

  • bug fix and improvement of redis and celery python version pinning. Thanks @bibliotechy for finding this.
  • bug fix for exporting published subset to S3

Upgrading to v0.7.1

Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:

cd /opt/combine
source activate combine
git checkout master
git pull
./manage.py update --release v0.7.1

v0.7

03 Apr 12:29

Release Notes - v0.7

  • Introduction of Published Subsets (documentation)
    • ability to create subsets of all published records based on Published Set Identifiers
    • creates a unique OAI endpoint for this Published Subset
    • when viewing a Published Subset, all metrics, exports, and analysis jobs are filtered for this subset
    • Published Subsets are included in State Import/Export exports when any Jobs (upstream or downstream) are associated with a Published Subset
  • introduce small delay in firing background tasks, avoiding some potential race conditions for Job statuses
  • bug fixes for State Import/Export
  • pinning python redis client to 2.10.6 (issue)

Upgrading to v0.7

Depending on what version of Combine you're upgrading from, it may be necessary to add the configuration MONGO_HOST to your localsettings.py configuration file. You can see an example in the localsettings.py.template file.
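
A minimal sketch of that addition to localsettings.py, assuming MongoDB runs on the same host (adjust the address for your deployment):

# Mongo server
MONGO_HOST = '127.0.0.1'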

Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:

cd /opt/combine
source activate combine
git checkout master
git pull
./manage.py update --release v0.7

v0.6.3

22 Mar 13:15

Release Notes - v0.6.3

  • Hot fix for bug in v0.6.2 where records harvested via OAI-PMH that were outside of any OAI sets were missing the required oai_set column.
  • fix for pyspark_shell.sh so that it runs the Spark environment on local[*], accommodating use in the Docker environment and no longer requiring the Livy session to be stopped

Upgrading to v0.6.3

Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:

./manage.py update --release v0.6.3

v0.6.2

19 Mar 12:12

Release Notes - v0.6.2

Includes a couple fixes / improvements:

  • Closes issue #383: draggable Transformation Scenarios now work cross-browser
  • Allows for OAI-PMH harvesting of records not part of an OAI set with new "Harvest All Records" option


Upgrading to v0.6.2

Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:

./manage.py update --release v0.6.2

Note: While not mandatory, it has been observed that adding the following Spark configuration may help when data in OAI harvests is highly "skewed", meaning some sets are very large or all records exist outside of OAI sets.

Add the following configuration to the file /opt/spark/conf/spark-defaults.conf, allowing RPC messages up to 1024 MB:

spark.rpc.message.maxSize 1024
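
Spark reads spark-defaults.conf when a new application is launched, so this setting likely only takes effect for Spark sessions started after the change; restarting Livy, as in the other upgrade steps here, is a safe way to pick it up:

sudo supervisorctl restart livy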

v0.6.1

12 Mar 16:47

Release Notes - v0.6.1

Thanks to @dcmcand for catching this, bumps Redis client in requirements.txt.

v0.6

01 Mar 17:47

Release Notes - v0.6

v0.6 includes the following two major additions:

  • a Docker deployment option for Combine
  • publishing of Records to S3

The route of building a server dedicated to Combine via Ansible will continue to be supported for the foreseeable future, but increased attention will likely go to the Docker deployment that begins with this version v0.6.

Upgrading to v0.6

The addition of S3 publishing, and some additional configurations needed to support Dockerization, requires a couple of specific changes to files.

  1. Update /opt/spark/conf/spark-defaults.conf. Add the following package to the setting spark.jars.packages, which allows Spark to communicate with S3 (see the example line after this list):
org.apache.hadoop:hadoop-aws:2.7.3
  2. Add the following variables to the /opt/combine/localsettings.py file if your installation is Ansible server based (if you are deploying via Docker, these settings should be included automatically via the localsettings.py.docker file):
# Deployment type (suggested as first variable, for clarity's sake)
COMBINE_DEPLOYMENT = 'server'

# (suggested as part of "Spark Tuning" section)
TARGET_RECORDS_PER_PARTITION = 5000

# Mongo server
MONGO_HOST = '127.0.0.1'

As always, you can see examples of these settings in /opt/combine/localsettings.py.template.
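
For step 1, note that spark.jars.packages is a single comma-separated list, so the hadoop-aws package should be appended to whatever packages are already listed rather than replacing them. A sketch of what the edited line might look like, where <existing packages> is a placeholder for the packages your file already contains:

spark.jars.packages <existing packages>,org.apache.hadoop:hadoop-aws:2.7.3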

Once these changes are made, it is recommended to run the update management command to install any required dependencies, pull in GUI changes, and restart everything:

# from /opt/combine
./manage.py update