Metadata migration Closes #789 #1277

Merged
merged 1 commit into from Sep 9, 2016

Conversation

Projects
None yet
5 participants
Contributor

Horneth commented Aug 9, 2016 edited

Uses liquibase custom change to write classes that transform pre-0.20 existing data into 0.20 metadata.

  • Test at scale

Horneth added the in review label Aug 9, 2016

@geoffjentry geoffjentry commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataStatement.scala
@@ -0,0 +1,114 @@
+package cromwell.database.migration.metadata
+
+import java.sql.{Types, PreparedStatement, Timestamp}
+import java.time.{ZoneOffset, OffsetDateTime}
+import java.time.format.DateTimeFormatter
+
+import cromwell.database.SqlConverters._
+import liquibase.database.jvm.JdbcConnection
+import org.slf4j.LoggerFactory
+import wdl4s.values.{WdlBoolean, WdlFloat, WdlInteger, WdlValue}
+
+object MetadataStatement {
+ val workflowIdIdx = 1
@geoffjentry

geoffjentry Aug 10, 2016

Member

workflowIdIdx.capitalize (and for the rest)

mcovarr self-assigned this Aug 10, 2016

@mcovarr mcovarr commented on an outdated diff Aug 10, 2016

.../src/main/resources/changesets/metadata_migration.xml
@@ -0,0 +1,22 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.3.xsd">
+
+ <property name="clob.type" value="LONGTEXT" dbms="mysql"/>
+ <property name="clob.type" value="LONGVARCHAR" dbms="hsqldb"/>
+
+ <changeSet author="tjeandet" id="METADATA_MIGRATION">
+ <comment>
+ Metadata migration from 0.19 to 0.20.
@mcovarr

mcovarr Aug 10, 2016

Contributor

.21?

@mcovarr mcovarr and 1 other commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataMigration.scala
+ val executionDataResultSet = connection.createStatement().executeQuery(selectQuery)
+ val metadataInsertStatement = MetadataStatement.makeStatement(connection)
+
+ val executionIterator = new ResultSetIterator(executionDataResultSet)
+
+ executionIterator.zipWithIndex foreach {
+ case (row, idx) =>
+ migrateRow(connection, collectors, metadataInsertStatement, row, idx)
+ if (idx % 100 == 0) {
+ metadataInsertStatement.executeBatch()
+ connection.commit()
+ }
+ }
+
+ metadataInsertStatement.executeBatch()
+ connection.commit()
@mcovarr

mcovarr Aug 10, 2016

Contributor

is this going to be cool if the number of executions % 100 == 0?

@Horneth

Horneth Aug 10, 2016

Contributor

This was a problem on Hsql until 2.3.4 because they wouldn't execute empty batches, but they added a flag to allow so it should be fine now.

@mcovarr mcovarr commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataMigration.scala
+
+ private def findCollectorIds(connection: JdbcConnection) = {
+ val collectorsIdQuery =
+ """
+ |SELECT EXECUTION_ID
+ |FROM EXECUTION
+ |GROUP BY CALL_FQN, ATTEMPT, WORKFLOW_EXECUTION_ID
+ |HAVING COUNT(*) > 1
+ """.stripMargin
+
+ val collectorsRS = new ResultSetIterator(connection.createStatement().executeQuery(collectorsIdQuery))
+ collectorsRS map { _.getInt("EXECUTION_ID") } toSet
+ }
+
+ override def execute(database: Database): Unit = {
+ val dbConn = database.getConnection.asInstanceOf[JdbcConnection]
@mcovarr

mcovarr Aug 10, 2016

Contributor

move this inside the try?

@mcovarr mcovarr commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataStatement.scala
+ }
+
+ override def addEmptyValue(key: String): Unit = {
+ preparedStatement.setTimestamp(MetadataStatement.timestampIdx, dawn)
+ add(key, null, s"Failed to add empty value with key $key for workflow $workflowId")
+ }
+}
+
+class MetadataStatementForCall(preparedStatement: PreparedStatement, workflowId: String, callFqn: String, index: Int, attempt: Int) extends MetadataStatementForWorkflow(preparedStatement, workflowId) {
+ override def setStatement() = {
+ preparedStatement.setString(MetadataStatement.workflowIdIdx, workflowId)
+ preparedStatement.setString(MetadataStatement.callFqnIdx, callFqn)
+ preparedStatement.setInt(MetadataStatement.callIndexIdx, index)
+ preparedStatement.setInt(MetadataStatement.callAttemptIdx, attempt)
+ }
+}
@mcovarr

mcovarr Aug 10, 2016

Contributor

\n

@mcovarr mcovarr and 1 other commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataMigration.scala
+ val collectorsRS = new ResultSetIterator(connection.createStatement().executeQuery(collectorsIdQuery))
+ collectorsRS map { _.getInt("EXECUTION_ID") } toSet
+ }
+
+ override def execute(database: Database): Unit = {
+ val dbConn = database.getConnection.asInstanceOf[JdbcConnection]
+ try {
+ dbConn.setAutoCommit(false)
+ migrate(dbConn, findCollectorIds(dbConn))
+ } catch {
+ case t: CustomChangeException => throw t
+ case t: Throwable => throw new CustomChangeException("Could not apply migration script for metadata", t)
+ }
+ }
+
+ override def setUp(): Unit = { }
@mcovarr

mcovarr Aug 10, 2016

Contributor

= ()

@geoffjentry

geoffjentry Aug 10, 2016

Member

TOL (i.e. it just reminded me of something I was thinking about this morning) - we might want to consider turning on -Ywarn-value-discard compiler feature. I keep reading about people having bugs due to unit return types and other tomfoolery

@mcovarr

mcovarr Aug 10, 2016

Contributor

Ooh that would make a nice Developer's Choice ticket

@geoffjentry

geoffjentry Aug 10, 2016

Member

If we start talking that route there are all kinds of compiler flags I'd like to turn on, which unfortunately would probably break a bunch of stuff at the moment

@mcovarr

mcovarr Aug 10, 2016

Contributor

I suspect IntelliJ probably does an adequate job warning of these cases, but I find it a weird default behavior for everything to be silently coerced to the Unit value.

@geoffjentry

geoffjentry Aug 10, 2016

Member

I'm sure @Horneth loves this side conversation ;)

Recently I came across a blog post describing people running into this, which caused them much pain (can only find the SIP, not the blog): https://issues.scala-lang.org/browse/SI-9822

@mcovarr mcovarr commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataStatement.scala
+ protected def setStatement() = {
+ preparedStatement.setString(MetadataStatement.workflowIdIdx, workflowId)
+ preparedStatement.setNull(MetadataStatement.callFqnIdx, Types.VARCHAR)
+ preparedStatement.setNull(MetadataStatement.callIndexIdx, Types.INTEGER)
+ preparedStatement.setNull(MetadataStatement.callAttemptIdx, Types.INTEGER)
+ }
+
+ protected def addDataAndBatch(key: String, value: Any) = {
+ preparedStatement.setString(MetadataStatement.keyIdx, key)
+
+ // Set the value and type
+ value match {
+ case null =>
+ preparedStatement.setNull(MetadataStatement.valueIdx, Types.VARCHAR)
+ preparedStatement.setNull(MetadataStatement.valueTypeIdx, Types.VARCHAR) // Null values have null type
+ case v =>

@mcovarr mcovarr commented on an outdated diff Aug 10, 2016

...ion/metadata/table/ExecutionEventTableMigration.scala
+ |SELECT DESCRIPTION, EXECUTION.EXECUTION_ID, EXECUTION_EVENT.START_DT, EXECUTION_EVENT.END_DT, EXECUTION.CALL_FQN,
+ | EXECUTION.IDX, EXECUTION.ATTEMPT, EXECUTION.EXECUTION_ID, WORKFLOW_EXECUTION.WORKFLOW_EXECUTION_UUID
+ |FROM EXECUTION_EVENT
+ | LEFT JOIN EXECUTION ON EXECUTION.EXECUTION_ID = EXECUTION_EVENT.EXECUTION_ID
+ | LEFT JOIN WORKFLOW_EXECUTION ON WORKFLOW_EXECUTION.WORKFLOW_EXECUTION_ID = EXECUTION.WORKFLOW_EXECUTION_ID;
+ """.stripMargin
+
+ override protected def migrateRow(connection: JdbcConnection, collectors: Set[Int],
+ statement: PreparedStatement, row: ResultSet, idx: Int): Unit = {
+ if (!collectors.contains(row.getInt("EXECUTION_ID"))) {
+ val metadataStatement = new MetadataStatementForCall(statement,
+ row.getString("WORKFLOW_EXECUTION_UUID"),
+ row.getString("CALL_FQN"),
+ row.getInt("IDX"),
+ row.getInt("ATTEMPT")
+ )
@mcovarr

mcovarr Aug 10, 2016

Contributor

The if and the MetadataStatementForCall construction seems to occur in a lot of these migrations, can it be refactored to the trait?

@mcovarr mcovarr commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataMigration.scala
+ val executionIterator = new ResultSetIterator(executionDataResultSet)
+
+ executionIterator.zipWithIndex foreach {
+ case (row, idx) =>
+ migrateRow(connection, collectors, metadataInsertStatement, row, idx)
+ if (idx % 100 == 0) {
+ metadataInsertStatement.executeBatch()
+ connection.commit()
+ }
+ }
+
+ metadataInsertStatement.executeBatch()
+ connection.commit()
+ }
+
+ private var resourceAccessor: ResourceAccessor = null
@mcovarr

mcovarr Aug 10, 2016

Contributor

IntelliJ requests = _

@mcovarr mcovarr commented on an outdated diff Aug 10, 2016

...l/database/migration/metadata/MetadataMigration.scala
+ """
+ |SELECT EXECUTION_ID
+ |FROM EXECUTION
+ |GROUP BY CALL_FQN, ATTEMPT, WORKFLOW_EXECUTION_ID
+ |HAVING COUNT(*) > 1
+ """.stripMargin
+
+ val collectorsRS = new ResultSetIterator(connection.createStatement().executeQuery(collectorsIdQuery))
+ collectorsRS map { _.getInt("EXECUTION_ID") } toSet
+ }
+
+ override def execute(database: Database): Unit = {
+ val dbConn = database.getConnection.asInstanceOf[JdbcConnection]
+ try {
+ dbConn.setAutoCommit(false)
+ migrate(dbConn, findCollectorIds(dbConn))
@mcovarr

mcovarr Aug 10, 2016

Contributor

Not sure what the execution time is like for running all these migrations on a "mature" database, but might it be worth caching the collector IDs since it doesn't look like the results should change?

Contributor

mcovarr commented Aug 10, 2016 edited

👍 delta your "test at scale" item. 😄 Very cool!

Approved with PullApprove

@mcovarr mcovarr assigned geoffjentry and unassigned mcovarr Aug 10, 2016

Member

geoffjentry commented Aug 11, 2016 edited by cjllanwarne

💯 / 💯 ✔️

👍

Approved with PullApprove

Contributor

ruchim commented Aug 11, 2016

This is such a cool branch Thibault, I didn't realize Liquibase could do so much!

Contributor

Horneth commented Aug 11, 2016

Yeah I didn't know either but it's pretty useful in this case !

@mcovarr mcovarr assigned Horneth and unassigned geoffjentry Aug 12, 2016

@mcovarr mcovarr commented on an outdated diff Aug 22, 2016

...e/migration/metadata/table/SymbolTableMigration.scala
+import wdl4s.types.{WdlPrimitiveType, WdlType}
+import wdl4s.values._
+
+import scala.util.{Failure, Success, Try}
+
+abstract class SymbolTableMigration extends MetadataMigration {
+ override protected def migrateRow(connection: JdbcConnection, statement: PreparedStatement, row: ResultSet, idx: Int): Unit = {
+ // Try to coerce the value to a WdlValue
+ val value = for {
+ wdlTypeValue <- Try(row.getString("WDL_TYPE"))
+ wdlType <- Try(WdlType.fromWdlString(wdlTypeValue))
+ wdlValueValue <- Try(row.getString("WDL_VALUE"))
+ inflated <- inflate(wdlValueValue)
+
+ wdlValue <- wdlType match {
+ case p: WdlPrimitiveType => p.coerceRawValue(inflated)
@mcovarr

mcovarr Aug 22, 2016 edited

Contributor

Unfortunate that this distinction in behavior needs to be in the caller; it would be nice if WdlPrimitives overrode fromWdlString to do coerceRawValue.

@mcovarr mcovarr and 1 other commented on an outdated diff Aug 23, 2016

...n/metadata/table/CallOutputSymbolTableMigration.scala
+ SELECT SYMBOL.SCOPE, SYMBOL.NAME, MaxExecution.IDX, IO, WDL_TYPE,
+ WDL_VALUE, WORKFLOW_EXECUTION_UUID, ExecutionId.EXECUTION_ID, REPORTABLE_RESULT,
+ MaxExecution.MaxAttempt
+ FROM SYMBOL
+ LEFT JOIN WORKFLOW_EXECUTION ON SYMBOL.WORKFLOW_EXECUTION_ID = WORKFLOW_EXECUTION.WORKFLOW_EXECUTION_ID
+ LEFT JOIN(
+ SELECT CALL_FQN, IDX, WORKFLOW_EXECUTION_ID, MAX(EXECUTION.ATTEMPT) AS MaxAttempt
+ FROM EXECUTION
+ GROUP BY EXECUTION.CALL_FQN, EXECUTION.IDX, EXECUTION.WORKFLOW_EXECUTION_ID
+ ) MaxExecution
+ ON SYMBOL.SCOPE = MaxExecution.CALL_FQN
+ AND SYMBOL.WORKFLOW_EXECUTION_ID = MaxExecution.WORKFLOW_EXECUTION_ID
+ AND SYMBOL.IDX = MaxExecution.IDX
+ LEFT JOIN(SELECT EXECUTION_ID, CALL_FQN, IDX, WORKFLOW_EXECUTION_ID, ATTEMPT
+ FROM EXECUTION
+ ) ExecutionId
@mcovarr

mcovarr Aug 23, 2016

Contributor

Why is this a subquery, couldn't it just LEFT JOIN EXECUTION?

@Horneth

Horneth Aug 23, 2016

Contributor

Yes and it should

@mcovarr mcovarr commented on an outdated diff Aug 23, 2016

...l/database/migration/metadata/MetadataMigration.scala
+ } else executionIterator
+
+ filtered.zipWithIndex foreach {
+ case (row, idx) =>
+ migrateRow(connection, metadataInsertStatement, row, idx)
+ if (idx % 100 == 0) {
+ metadataInsertStatement.executeBatch()
+ connection.commit()
+ }
+ }
+
+ metadataInsertStatement.executeBatch()
+ connection.commit()
+ }
+
+ private var resourceAccessor: ResourceAccessor = _
@mcovarr

mcovarr Aug 23, 2016

Contributor

is this thing actually used?

@mcovarr mcovarr and 1 other commented on an outdated diff Aug 23, 2016

...n/metadata/table/CallOutputSymbolTableMigration.scala
@@ -0,0 +1,49 @@
+package cromwell.database.migration.metadata.table
+
+import java.sql.{PreparedStatement, ResultSet}
+
+import cromwell.database.migration.metadata.MetadataStatementForCall
+import wdl4s.values._
+
+class CallOutputSymbolTableMigration extends SymbolTableMigration {
+ override protected def selectQuery: String = """
+ SELECT SYMBOL.SCOPE, SYMBOL.NAME, MaxExecution.IDX, IO, WDL_TYPE,
+ WDL_VALUE, WORKFLOW_EXECUTION_UUID, ExecutionId.EXECUTION_ID, REPORTABLE_RESULT,
+ MaxExecution.MaxAttempt
+ FROM SYMBOL
+ LEFT JOIN WORKFLOW_EXECUTION ON SYMBOL.WORKFLOW_EXECUTION_ID = WORKFLOW_EXECUTION.WORKFLOW_EXECUTION_ID
@mcovarr

mcovarr Aug 23, 2016

Contributor

seems like the LEFT part of the LEFT JOIN should be unnecessary?

@Horneth

Horneth Aug 23, 2016

Contributor

Yep - I have cleaned this up in all the queries just haven't pushed it yet xD

@mcovarr mcovarr commented on an outdated diff Aug 23, 2016

...n/metadata/table/CallOutputSymbolTableMigration.scala
+
+import cromwell.database.migration.metadata.MetadataStatementForCall
+import wdl4s.values._
+
+class CallOutputSymbolTableMigration extends SymbolTableMigration {
+ override protected def selectQuery: String = """
+ SELECT SYMBOL.SCOPE, SYMBOL.NAME, MaxExecution.IDX, IO, WDL_TYPE,
+ WDL_VALUE, WORKFLOW_EXECUTION_UUID, ExecutionId.EXECUTION_ID, REPORTABLE_RESULT,
+ MaxExecution.MaxAttempt
+ FROM SYMBOL
+ LEFT JOIN WORKFLOW_EXECUTION ON SYMBOL.WORKFLOW_EXECUTION_ID = WORKFLOW_EXECUTION.WORKFLOW_EXECUTION_ID
+ LEFT JOIN(
+ SELECT CALL_FQN, IDX, WORKFLOW_EXECUTION_ID, MAX(EXECUTION.ATTEMPT) AS MaxAttempt
+ FROM EXECUTION
+ GROUP BY EXECUTION.CALL_FQN, EXECUTION.IDX, EXECUTION.WORKFLOW_EXECUTION_ID
+ ) MaxExecution
@mcovarr

mcovarr Aug 23, 2016

Contributor

I wonder if it would be faster to MAX on the execution ID rather than the attempt?

@kshakir kshakir commented on an outdated diff Aug 25, 2016

.../main/scala/cromwell/database/migration/package.scala
@@ -0,0 +1,10 @@
+package cromwell.database
+
+import java.sql.ResultSet
+
+package object migration {
+ class ResultSetIterator(rs: ResultSet) extends Iterator[ResultSet] {
@kshakir

kshakir Aug 25, 2016

Contributor

Move this to ResultSetIterator.scala

Contributor

kshakir commented Aug 31, 2016

Re: the hsqldb compliant commit: There should be multiple ways to check at runtime the database type, and then customize the code/SQL within the CustomTaskChange. For example, matching on the liquibase.database.Database, or go further and use the Database.getDatabaseProductName. The latter is a wrapper to java.sql.Connection.getMetadata.getDatabaseProductName. For HSQL it should return HsqlDatabaseProperties.PRODUCT_NAME, or the hard coded string "MySQL".

Contributor

Horneth commented Aug 31, 2016 edited

@kshakir I tagged everything with dbms="mysql" after the standup discussion and removed the hsql specific code...

@mcovarr mcovarr and 1 other commented on an outdated diff Sep 2, 2016

...scala/cromwell/core/simpleton/WdlValueSimpleton.scala
@@ -26,6 +27,7 @@ object WdlValueSimpleton {
implicit class WdlValueSimplifier(wdlValue: WdlValue) {
def simplify(name: String): Iterable[WdlValueSimpleton] = wdlValue match {
case prim: WdlPrimitive => List(WdlValueSimpleton(name, prim))
+ case expr: WdlExpression => List(WdlValueSimpleton(name, WdlString(expr.valueString)))
@mcovarr

mcovarr Sep 2, 2016

Contributor

How is this currently being used? I had thought that WdlValueSimpletons were used only for expressions that had been fully evaluated to WdlValues, in particular the cache and the job store. It doesn't look like WdlValueBuilder could currently handle reversing this simplification.

@Horneth

Horneth Sep 2, 2016

Contributor

Some symbols are stored as WdlExpression so I got a match error when trying to simpletonize them. I can filter them out earlier before calling simplify if that's a problem, but simplify should at least handle this case even if it's just throwing an exception

@mcovarr mcovarr commented on an outdated diff Sep 2, 2016

...sets/migration/metadata/CreateAndLoadTmpCollector.sql
@@ -0,0 +1,19 @@
+CREATE TABLE TMP_COLLECTOR
+(
+ TMP_COLLECTOR_ID INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
+ EXECUTION_ID INT NOT NULL
+);
+
+INSERT INTO TMP_COLLECTOR (
+ EXECUTION_ID
+)
+ SELECT
+ EXECUTION_ID
+ FROM EXECUTION e
+ WHERE e.CALL_FQN LIKE '%$%' OR -- filter out scatters
@mcovarr

mcovarr Sep 2, 2016

Contributor

I guess the temp table is slightly misnamed if it's getting scatters too

@mcovarr mcovarr commented on an outdated diff Sep 2, 2016

...igration/metadata/table/ExecutionTableMigration.scala
+ METADATA_TIMESTAMP
+ )
+ SELECT
+ WORKFLOW_EXECUTION_UUID,
+ 'start',
+ CALL_FQN,
+ IDX,
+ ATTEMPT,
+ DATE_FORMAT(e.START_DT, '%Y-%m-%dT%T.%f$Offset'),
+ 'string',
+ NOW()
+ FROM EXECUTION e
+ JOIN WORKFLOW_EXECUTION we ON we.WORKFLOW_EXECUTION_ID = e.WORKFLOW_EXECUTION_ID
+ WHERE
+ e.START_DT IS NOT NULL AND
+ e.EXECUTION_ID NOT IN (SELECT EXECUTION_ID FROM TMP_COLLECTOR);""".stripMargin
@mcovarr

mcovarr Sep 2, 2016

Contributor

no margins in need of stripping here 😄

@mcovarr

mcovarr Sep 2, 2016 edited

Contributor

ToL these queries seem more interested in finding out what executions are not scatters/collectors than which are scatters / collectors. Perhaps there should be a table that contains the non-scatter/collector executions and then these queries could just join to that.

@mcovarr mcovarr and 1 other commented on an outdated diff Sep 2, 2016

...es/changesets/migration/metadata/LoadInputSymbols.sql
+ s.SCOPE,
+ e.IDX,
+ e.ATTEMPT,
+ s.WDL_VALUE,
+ s.WDL_TYPE
+ FROM SYMBOL s
+ JOIN WORKFLOW_EXECUTION we ON
+ we.WORKFLOW_EXECUTION_ID = s.WORKFLOW_EXECUTION_ID
+ LEFT JOIN EXECUTION e ON
+ e.CALL_FQN = s.SCOPE AND
+ e.WORKFLOW_EXECUTION_ID = s.WORKFLOW_EXECUTION_ID
+ -- Don't join on index here because inputs have index null (-1), but we want to duplicate them as many times as there are indices for a given execution
+ WHERE
+ s.IO = 'INPUT' AND
+ (
+ e.EXECUTION_ID IS NULL OR -- If it's a workflow output the execution ID will be null but we want to keep it
@mcovarr

mcovarr Sep 2, 2016

Contributor

but this is looking at INPUTs?

@Horneth

Horneth Sep 2, 2016

Contributor

the comment is wrong - it's workflow input

@mcovarr mcovarr and 1 other commented on an outdated diff Sep 2, 2016

...onevent/ExecutionEventTableDescriptionMigration.scala
+ override def queries: Array[String] = {
+ Array(
+ """
+ |INSERT INTO METADATA_JOURNAL (
+ | WORKFLOW_EXECUTION_UUID,
+ | METADATA_KEY,
+ | CALL_FQN,
+ | JOB_SCATTER_INDEX,
+ | JOB_RETRY_ATTEMPT,
+ | METADATA_VALUE,
+ | METADATA_VALUE_TYPE,
+ | METADATA_TIMESTAMP
+ |)
+ | SELECT
+ | WORKFLOW_EXECUTION_UUID,
+ | CONCAT('executionEvents[', ev.EVENT_ID ,']:description'),
@mcovarr

mcovarr Sep 2, 2016

Contributor

The index here is a synthetic PK in the EXECUTION_EVENTS table, does that make sense here?

@Horneth

Horneth Sep 2, 2016

Contributor

Hmm I'm not sure I understand what you're asking

@Horneth

Horneth Sep 6, 2016

Contributor

Oh I get it now -_- I was just looking for something unique so they wouldn't override each other. The order in the list doesn't matter as much I think as the elements have a startTime and endTime field.

@mcovarr mcovarr commented on an outdated diff Sep 2, 2016

...ation/metadata/table/FailureEventTableMigration.scala
+ override def queries: Array[String] = {
+ Array(
+ """
+ |INSERT INTO METADATA_JOURNAL (
+ | WORKFLOW_EXECUTION_UUID,
+ | METADATA_KEY,
+ | CALL_FQN,
+ | JOB_SCATTER_INDEX,
+ | JOB_RETRY_ATTEMPT,
+ | METADATA_VALUE,
+ | METADATA_VALUE_TYPE,
+ | METADATA_TIMESTAMP
+ |)
+ |SELECT
+ | WORKFLOW_EXECUTION_UUID,
+ | CONCAT('failures[', fe.FAILURE_EVENT_ID ,']:failure'),
@mcovarr

mcovarr Sep 2, 2016

Contributor

same question as below about using a synthetic PK as a subscript for this metadata

@mcovarr mcovarr and 1 other commented on an outdated diff Sep 2, 2016

...igration/metadata/table/ExecutionTableMigration.scala
+ WORKFLOW_EXECUTION_UUID,
+ METADATA_KEY,
+ CALL_FQN,
+ JOB_SCATTER_INDEX,
+ JOB_RETRY_ATTEMPT,
+ METADATA_VALUE,
+ METADATA_VALUE_TYPE,
+ METADATA_TIMESTAMP
+ )
+ SELECT
+ WORKFLOW_EXECUTION_UUID,
+ 'start',
+ CALL_FQN,
+ IDX,
+ ATTEMPT,
+ DATE_FORMAT(e.START_DT, '%Y-%m-%dT%T.%f$Offset'),
@mcovarr

mcovarr Sep 2, 2016

Contributor

Is it correct to apply the $Offset here? I thought the original times were UTC and should be kept that way?

@Horneth

Horneth Sep 2, 2016

Contributor

I double checked in the code, but from what I see we inject
new Timestamp(new Date().getTime) (here + here) as start and end times, which on my machine returns local time:

scala> import java.sql.Timestamp
import java.sql.Timestamp
scala> import java.util.Date
import java.util.Date
scala> new Timestamp(new Date().getTime)
res0: java.sql.Timestamp = 2016-09-02 18:08:27.173

@mcovarr mcovarr and 1 other commented on an outdated diff Sep 2, 2016

...onevent/ExecutionEventTableDescriptionMigration.scala
@@ -0,0 +1,37 @@
+package cromwell.database.migration.metadata.table.executionevent
+
+import cromwell.database.migration.metadata.MetadataCustomSql
+
+class ExecutionEventTableDescriptionMigration extends MetadataCustomSql {
@mcovarr

mcovarr Sep 2, 2016

Contributor

not sure why this one is a custom sql migration?

@Horneth

Horneth Sep 2, 2016

Contributor

It doesn't have to - I just thought migration queries are already all over the place so I at least wanted to keep tables together in the same "logic area"

@kshakir kshakir and 1 other commented on an outdated diff Sep 6, 2016

core/src/main/resources/application.conf
@@ -362,7 +362,7 @@ database {
driver = "slick.driver.HsqldbDriver$"
db {
driver = "org.hsqldb.jdbcDriver"
- url = "jdbc:hsqldb:mem:${uniqueSchema};shutdown=false;hsqldb.tx=mvcc"
+ url = "jdbc:hsqldb:mem:${uniqueSchema};shutdown=false;hsqldb.tx=mvcc;allow_empty_batch=true"
@kshakir

kshakir Sep 6, 2016

Contributor

ToL: Is this still required for all HSQLDB urls? If so, we may want to start documenting these requirements.

@Horneth

Horneth Sep 6, 2016

Contributor

I guess not since the migration only applies to mysql. However there will be some flag required for mysql which I thought could be documented in the Changelog

@geoffjentry geoffjentry commented on an outdated diff Sep 6, 2016

...ation/metadata/CreateAndLoadTmpExecutionMigration.sql
+ START_DT,
+ END_DT,
+ ALLOWS_RESULT_REUSE,
+ DOCKER_IMAGE_HASH,
+ EXECUTION_HASH,
+ ATTEMPT,
+ BACKEND_TYPE
+ FROM EXECUTION e
+ WHERE e.CALL_FQN NOT LIKE '%$%' AND -- filter out scatters
+ NOT (e.IDX = -1 AND EXISTS ( -- filter out collectors
+ SELECT 1 FROM EXECUTION e2 WHERE
+ e2.WORKFLOW_EXECUTION_ID = e.WORKFLOW_EXECUTION_ID AND
+ e2.CALL_FQN = e.CALL_FQN AND
+ e2.IDX != -1)
+ );
+

@kshakir kshakir commented on the diff Sep 7, 2016

...ain/resources/changesets/standardize_column_names.xml
+ </comment>
+ <renameColumn columnDataType="VARCHAR(255)"
+ newColumnName="CALL_FQN"
+ oldColumnName="METADATA_CALL_FQN"
+ tableName="METADATA_JOURNAL"/>
+ <renameColumn columnDataType="INT"
+ newColumnName="JOB_SCATTER_INDEX"
+ oldColumnName="METADATA_CALL_INDEX"
+ tableName="METADATA_JOURNAL"/>
+ <renameColumn columnDataType="INT"
+ newColumnName="JOB_RETRY_ATTEMPT"
+ oldColumnName="METADATA_CALL_ATTEMPT"
+ tableName="METADATA_JOURNAL"/>
+ </changeSet>
+
+</databaseChangeLog>
@kshakir

kshakir Sep 7, 2016

Contributor

\n, mainly because after this gets merged, the md5 is saved into the database and I can't change it in another PR later.

@geoffjentry

geoffjentry Sep 7, 2016

Member

Oh now I think that @Horneth should leave it like this just to make @kshakir forever mournful

@Horneth Horneth Metadata Migration
e2850b0

@Horneth Horneth merged commit 74c058d into develop Sep 9, 2016

4 checks passed

code-review/pullapprove Approved by geoffjentry, mcovarr
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 66.853%
Details
Member

geoffjentry commented Sep 9, 2016

@Horneth woohoo!

mcovarr deleted the cromwell-789 branch Nov 30, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment