[Bug] Missing cases migrating from TH3 to TH4 #1682

Closed
mamoedo opened this issue Nov 24, 2020 · 9 comments

mamoedo commented Nov 24, 2020

Request Type

Bug

Work Environment

| Question | Answer |
|---|---|
| OS version (server) | Debian |
| TheHive version / git hash | 4.0.2 |
| Package Type | Binary |

Problem Description

Some cases are missing after migrating multiple indices from TH3 into different TH4 organizations.

Steps to Reproduce

  1. Migrate the first index (about 100 cases):
    ./migrate --output /etc/thehive/application.conf --main-organisation Small-Org --es-index the_hive_small --es-uri http://elasticsearch:9200 --exclude-audit-actions Update,Creation,Delete
  2. Wait for the migration to finish. Migrate the second index (about 700 cases):
    ./migrate --output /etc/thehive/application.conf --main-organisation Medium-Org --es-index the_hive_medium --es-uri http://elasticsearch:9200 --exclude-audit-actions Update,Creation,Delete
  3. Wait for the migration to finish. Migrate the third index (about 1500 cases):
    ./migrate --output /etc/thehive/application.conf --main-organisation Main-Org --es-index the_hive --es-uri http://elasticsearch:9200 --exclude-audit-actions Update,Creation,Delete
  4. Compare the number of migrated cases with the TH3 number of cases for each organization.

Possible Solutions

I think it could be a problem with case numbering (see item 3 under Complementary information). I searched for a few cases by number, and each case number exists in only one organization. For example, if you find case 50 in org A, you won't find it in org B or C; if you find case 250 in org B, it won't exist in org A or C, and so on.
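
The suspicion above can be illustrated with a toy model. This is a minimal sketch, not TheHive's migration code: it assumes that case numbers must be unique across the whole TH4 database and that every TH3 index numbers its cases contiguously from 1. The function name `migrate_index` and the index sizes are hypothetical, chosen only to resemble the report.

```python
def migrate_index(existing_numbers, case_numbers):
    """Toy model: import cases, skipping any number already in the database.

    existing_numbers: set of case numbers already present (mutated in place).
    case_numbers: iterable of case numbers coming from one TH3 index.
    Returns (migrated, skipped) counts.
    """
    migrated = skipped = 0
    for number in case_numbers:
        if number in existing_numbers:
            skipped += 1  # treated as "already exists" in this model
        else:
            existing_numbers.add(number)
            migrated += 1
    return migrated, skipped


# Hypothetical sizes loosely based on the report; each index numbers from 1.
database = set()
results = {
    org: migrate_index(database, range(1, size + 1))
    for org, size in [("Small-Org", 100), ("Medium-Org", 700), ("Main-Org", 1500)]
}
for org, (migrated, skipped) in results.items():
    print(f"{org}: migrated={migrated} skipped={skipped}")
```

In this model every case number ends up in exactly one organization, which is consistent with the behavior described above. The real indices probably do not use contiguous numbering, so the exact counts will differ.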

Complementary information

  1. Note that when using the --es-index option, _15 is always appended to the index name.

  2. Migration log seems to skip some cases:

[info] [Migrate cases and alerts] CaseTemplate/Task:55 Action:82/564 Case/Task:341/1626 Case:200/1303 Job:15026/304192 ObservableType:23/63 Alert:10692/48043(5ms) CaseTemplate:15/20 Alert/Observable:99725(13ms) Case/Observable:16026/227354 User:14/19 CustomField:28/29 Case/Task/Log:143/606
[info] [Migrate cases and alerts] CaseTemplate/Task:55 Action:82/564 Case/Task:345/1626(6ms) Case:203/1303(6ms) Job:15070/304192(7ms) ObservableType:23/63 Alert:10711/48043(6ms) CaseTemplate:15/20 Alert/Observable:100122(7ms) Case/Observable:16146/227354(33ms) User:14/19 CustomField:28/29 Case/Task/Log:143/606
  3. At the end of the main organization's migration, it shows:

Case: 338/1303 (965 exists) avg:10ms

As if 965 cases had already been migrated, but they were not migrated for this organization or from this index (although the number is very close to Small-Org + Medium-Org).

  4. When the same user exists in more than one index, it is only migrated the first time. So if user Bob exists in orgs A, B and C, he will only be migrated to org A.
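
The final summary line from item 3 can be checked mechanically. The following is a minimal sketch that assumes the summary format is exactly what appears in this issue, with an optional `(N exists)` or `(N failures)` group; the names `parse_summary` and `unaccounted` are hypothetical helpers, not part of the migration tool.

```python
import re

# Matches final summary lines such as "Case: 338/1303 (965 exists) avg:10ms".
# Assumption: at most one parenthesised counter, either "exists" or "failures".
SUMMARY = re.compile(
    r"^(?P<entity>[\w/]+):\s*(?P<done>\d+)/(?P<total>\d+)"
    r"(?:\s*\((?P<count>\d+)\s+(?P<kind>exists|failures)\))?"
)


def parse_summary(line):
    """Return a dict of the counters in one summary line, or None if no match."""
    match = SUMMARY.match(line.strip())
    if match is None:
        return None
    info = {
        "entity": match["entity"],
        "done": int(match["done"]),
        "total": int(match["total"]),
        "exists": 0,
        "failures": 0,
    }
    if match["kind"]:
        info[match["kind"]] = int(match["count"])
    return info


def unaccounted(info):
    """Entities that were neither migrated, skipped as existing, nor failed."""
    return info["total"] - info["done"] - info["exists"] - info["failures"]


case = parse_summary("Case: 338/1303 (965 exists) avg:10ms")
print(case, unaccounted(case))
```

For the line above the remainder is 0, since 338 + 965 = 1303: every case in the index is reported as either migrated or already existing.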

mamoedo commented Nov 27, 2020

I tried migrating first the Main Org with cases from 1 to 500 and then migrating the Small Org:

At the beginning of the migration, these errors are shown multiple times while preparing the database:

[info] [Prepare database]
[info] [Prepare database]
[error] uncaught error, not retrying
java.lang.IllegalArgumentException: Update action [REMOVE_INDEX] cannot be invoked for index with status [INSTALLED]
        at org.janusgraph.core.schema.SchemaAction.isApplicableStatus(SchemaAction.java:79)
        at org.janusgraph.graphdb.database.management.ManagementSystem.updateIndex(ManagementSystem.java:798)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$removeIndex$10(JanusDatabase.scala:451)
        at scala.Option.foreach(Option.scala:407)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$removeIndex$9(JanusDatabase.scala:450)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$managementTransaction$5(JanusDatabase.scala:257)
        at scala.util.Try$.apply(Try.scala:213)
        at org.thp.scalligraph.janus.JanusDatabase.$anonfun$managementTransaction$4(JanusDatabase.scala:257)
        at scala.util.Try$.apply(Try.scala:213)
        at org.thp.scalligraph.utils.Retry.org$thp$scalligraph$utils$Retry$$runTry(Retry.scala:58)
[info] [Prepare database]
[info] [Prepare database]

It's been a day and the same log line is shown every 5 seconds or so. Alert/Observable is the only counter increasing, and it started at 1.

[info] [Migrate cases and alerts] Organisation:1/1 Alert:1/231 Alert/Observable:8 User:13/17 CustomField:2/3

Sometimes this warning is shown:

[warn] An error occurs (Adding this property for key [_label] and value [Tag] violates a uniqueness constraint [TagNamespacePredicateValue_1]), retrying (3)

mamoedo commented Nov 30, 2020

This log is also shown when migrating Small-Org. It's very slow, something seems wrong:

[info] [Migrate cases and alerts] Alert:1/253
[error] Alert/Observable creation failure: org.thp.scalligraph.models.DatabaseException: Violation of database schema
[info] [Migrate cases and alerts] Alert:1/253 Alert/Observable:1(8535536ms)

mamoedo commented Dec 10, 2020

Also, when this exception shows up

java.lang.NullPointerException: null
[error] Exception raised, rollback (null)
[error] Job creation failure: java.lang.NullPointerException

All jobs failed to migrate. I don't know if it's just a coincidence:

Job: 0/310105 (310102 failures) avg:2ms

mamoedo commented Dec 10, 2020

In this state, when I try to import an alert in TH, it fails and this response is shown:

{"type":"NotFoundError","message":"Alert not found"}

And when I click on a case, this log shows:


[error] o.t.s.ErrorHandler [00000050|] Internal error
java.lang.NullPointerException: Could not find type for id: 166933
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:923)
        at org.janusgraph.graphdb.types.vertices.JanusGraphSchemaVertex.name(JanusGraphSchemaVertex.java:57)
        at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.constructQueryWithoutProfile(BasicVertexCentricQueryBuilder.java:502)
        at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.constructQuery(BasicVertexCentricQueryBuilder.java:416)
        at org.janusgraph.graphdb.query.vertex.VertexCentricQueryBuilder.execute(VertexCentricQueryBuilder.java:68)
        at org.janusgraph.graphdb.query.vertex.VertexCentricQueryBuilder.vertices(VertexCentricQueryBuilder.java:114)
        at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.flatMap(JanusGraphVertexStep.java:192)
        at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:49)
        at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.processNextStart(JanusGraphVertexStep.java:177)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)
[warn] o.t.s.ErrorHandler [00000050|] POST /api/v1/query?name=case-actions returned 500
java.lang.NullPointerException: Could not find type for id: 166933
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:923)
        at org.janusgraph.graphdb.types.vertices.JanusGraphSchemaVertex.name(JanusGraphSchemaVertex.java:57)
        at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.constructQueryWithoutProfile(BasicVertexCentricQueryBuilder.java:502)
        at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.constructQuery(BasicVertexCentricQueryBuilder.java:416)
        at org.janusgraph.graphdb.query.vertex.VertexCentricQueryBuilder.execute(VertexCentricQueryBuilder.java:68)
        at org.janusgraph.graphdb.query.vertex.VertexCentricQueryBuilder.vertices(VertexCentricQueryBuilder.java:114)
        at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.flatMap(JanusGraphVertexStep.java:192)
        at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:49)
        at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.processNextStart(JanusGraphVertexStep.java:177)
        at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:143)

mamoedo commented Jan 20, 2021

I also tried this on 4.0.4 and it's not working

mamoedo commented Jan 26, 2021

Another log that might be useful:

[warn] c.d.d.c.RequestHandler [|] Query '[4 bound values] SELECT column1,value,writetime(value) AS writetime,ttl(value) AS ttl FROM thehive.graphindex WHERE key=:key AND column1>=:sliceStart AND column1<:sliceEnd LIMIT :maxRows;' generated server side warning(s): Read 5000 live rows and 2634 tombstone cells for query SELECT * FROM thehive.graphindex WHERE key = 053e89a0446174e1 AND column1 > 0006446421c8 AND column1 < ff LIMIT 5000 (see tombstone_warn_threshold)

andreacardaropoli commented Feb 11, 2021

I have experienced pretty much the same behavior.
I tried migrating cases, alerts and audits up to a certain date, and then from that date onward. I saw exactly the same errors @mamoedo experienced, and afterwards I could not log in:

[warn] o.j.g.t.StandardJanusGraphTx - Query requires iterating over all vertices [(_label = User AND ~label = User AND login = xxx)]. For better performance, use indexes

mamoedo commented Mar 5, 2021

@nadouani @To-om will these scope:thehive4-migration issues be looked at before the TH 4.1.0 release?

To-om (Contributor) commented Mar 8, 2021

Currently, it is not possible to migrate several TH3 indices into the same TH4 database because case numbers must be unique. This explains why you get Case: 338/1303 (965 exists). I think I could add a parameter to shift case numbers and prevent collisions.
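
The arithmetic behind such a shift could look roughly like this. It is only a sketch of the idea, not an existing migration option: the function name `compute_shifts` and the highest case numbers used below are hypothetical.

```python
def compute_shifts(max_case_numbers):
    """Give each index an offset so that shifted case-number ranges are disjoint.

    max_case_numbers: list of (index_name, highest_case_number) in migration order.
    Returns a dict mapping index_name -> offset to add to every case number.
    """
    shifts = {}
    next_free = 0  # highest case number already reserved by earlier indices
    for index_name, highest in max_case_numbers:
        shifts[index_name] = next_free
        next_free += highest
    return shifts


# Hypothetical highest case numbers for the three indices in this issue.
indices = [("the_hive_small", 100), ("the_hive_medium", 700), ("the_hive", 1500)]
shifts = compute_shifts(indices)
for name, highest in indices:
    print(f"{name}: cases 1..{highest} -> {1 + shifts[name]}..{highest + shifts[name]}")
```

A shift like this avoids collisions at the cost of changing the case numbers users already know from TH3, so the first index migrated (offset 0) is the only one that keeps its original numbering.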
