
[Bug] (Still) slow loading of list-tags endpoint with 4.1.2 #1914

Closed
crackytsi opened this issue Mar 30, 2021 · 20 comments

@crackytsi

Request Type

Bug

Work Environment

Question Answer
OS version (server) Debian
OS version (client) 10
TheHive version / git hash 4.1.2
Package Type DEB

Problem Description

Sorry for bringing this topic up once again :-(

Opening the TheHive page or jumping directly to a case via a link (still) takes a long time.
Analyzing this, the root cause seems to be related to the URL:

/api/v1/query?name=list-tags

In my browser it takes about 16 seconds (seconds, not milliseconds!) to download the response, even though the file size is only 993 bytes according to the browser developer console.

Meanwhile TheHive application.log shows:
2021-03-30 06:03:17,902 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [00000520|355b8639] 10.32.30.10 POST /api/v1/query?name=list-tags took 29ms and returned 200
A screenshot of the developer console can be seen in the old issue: #1869.
I can also see that the load on Cassandra is very high. My system has 8 cores and is much more powerful than the system that handled the same cases on TheHive 3.4.
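
For reference, a minimal sketch (not part of the original report) of how the client-side timing could be measured and compared against the 29 ms shown in application.log. The hostname, API key, and query body are assumptions; the real request body was only shared as screenshots, and "listTag" simply mirrors the "listCase" pattern shown later in this thread.

# Minimal timing sketch -- hostname, API key and query body are placeholders/assumptions.
import time
import requests

THEHIVE_URL = "http://thehive.example.com:9000"   # hypothetical host
API_KEY = "***"                                   # hypothetical API key
body = {"query": [{"_name": "listTag"}]}          # assumed query shape

start = time.monotonic()
resp = requests.post(
    f"{THEHIVE_URL}/api/v1/query",
    params={"name": "list-tags"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=body,
    timeout=120,
)
elapsed = time.monotonic() - start

# If this prints tens of seconds while application.log reports ~30 ms,
# the time is spent somewhere the access log does not account for
# (network, response streaming, or other requests blocking the page),
# not in the logged query execution itself.
print(f"HTTP {resp.status_code}, {len(resp.content)} bytes in {elapsed:.1f}s")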

Possible Solutions

Caching? Indexing of Tags? I don't know :-(

@crackytsi crackytsi added the TheHive4 and bug labels Mar 30, 2021
@nadouani
Contributor

@crackytsi can you please share the POST request body for the /api/v1/query?name=list-tags call that takes 12s?

@nadouani nadouani self-assigned this Mar 31, 2021
@crackytsi
Author

[screenshot]

[screenshot]

@crackytsi
Author

[screenshot]

@crackytsi
Author

[screenshot]

@crackytsi
Author

Sure :) Attached some data, if you need more, let me know ;)

@mamoedo

mamoedo commented Mar 31, 2021

Same problem here!

[screenshot: Slow]

@mamoedo

mamoedo commented Mar 31, 2021

I hope this helps, my instance is not usable right now:

[screenshot]

The same queries in Kibana take milliseconds.

@nadouani
Contributor

nadouani commented Apr 1, 2021

@mamoedo

  • What type of index are you using (Elasticsearch or Lucene)?
  • What's the status of your index (just asking)?
  • Are you using a clustered installation of TheHive?
  • What filters are you applying in this screenshot?
  • What's the behavior of the alert list without filters?

@mamoedo

mamoedo commented Apr 1, 2021

@mamoedo

* What type of index are you using (Elasticsearch or Lucene)?
* What's the status of your index (just asking)?
* Are you using a clustered installation of TheHive?
* What filters are you applying in this screenshot?
* What's the behavior of the alert list without filters?
  • I'm using ES
  • Yellow status. I don't know why it's yellow, because it was created by TheHive, and other indices like Kibana's are green.
  • I'm not using TheHive clustered
  • In the alerts screenshot I was using only the "imported" filter
  • When I click Alerts, it takes a while to load the total number (YYY) of alerts in "List of alerts (XX of YYY)", then it loads the alert list at the same time as the number (XX) of filtered alerts.

If I go as admin to the Platform Status, the section "Data index status" is stuck in "Loading index status..." forever.

I'm trying to give as many resources as I can to the Cassandra and ES machines but I'm afraid that it's an implementation issue, because from Kibana the queries are super fast, faster than in TH3.

@mamoedo

mamoedo commented Apr 1, 2021

Also, uploading task logs takes minutes

@To-om
Contributor

To-om commented Apr 1, 2021

@mamoedo Your index doesn't seem to work. Can you log in to TheHive with a global admin user and go to http://thehive:9000/api/v1/admin/index/Alert/reindex, then check the logs and wait until you see the message Reindex job is finished or Reindex job has been cancelled.
After that, tell me if the alert list is faster.
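
For reference, a small sketch of triggering the same reindex call from the command line instead of the browser, assuming API-key (Bearer) authentication is configured; the hostname and key are placeholders.

# Trigger the Alert reindex job -- host and API key are placeholders;
# Bearer-token authentication is an assumption here.
import requests

THEHIVE_URL = "http://thehive:9000"
API_KEY = "***"  # hypothetical global-admin API key

resp = requests.get(
    f"{THEHIVE_URL}/api/v1/admin/index/Alert/reindex",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
print(resp.status_code, resp.text)

# Then watch /var/log/thehive/application.log until it contains
# "Reindex job is finished" or "Reindex job has been cancelled".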

@nadouani
Contributor

nadouani commented Apr 1, 2021

Or export the platform status report and share it with us. (It doesn't contain any sensitive data)

@mamoedo

mamoedo commented Apr 1, 2021

Or export the platform status report and share it with us. (It doesn't contain any sensitive data)

The problem is that I can't see anything, it's not loading

{
  "indexStatus": {},
  "checkStatus": {},
  "schemaStatus": [
    {
      "name": "thehive-cortex",
      "currentVersion": 2,
      "expectedVersion": 2,
      "error": null,
      "$$hashKey": "object:1487"
    },
    {
      "name": "thehive",
      "currentVersion": 69,
      "expectedVersion": 69,
      "error": null,
      "$$hashKey": "object:1488"
    }
  ]
}

@mamoedo

mamoedo commented Apr 1, 2021

@mamoedo Your index doesn't seem to work. Can you log in to TheHive with a global admin user and go to http://thehive:9000/api/v1/admin/index/Alert/reindex, then check the logs and wait until you see the message Reindex job is finished or Reindex job has been cancelled.
After that, tell me if the alert list is faster.

Done. It finished and it isn't any faster; in fact, I went to the alerts list and 7 tasks were pending without loading anything for minutes :(

@To-om
Contributor

To-om commented Apr 2, 2021

@mamoedo can you increase the log level by updating /etc/thehive/logback.xml (<logger name="org.thp" level="INFO"/> => <logger name="org.thp" level="TRACE"/>) and share your logs, in particular the section where you request the list of alerts?
Then revert the modification of the logback file (debug logging has a negative effect on performance).
I'm trying to understand why the index is not used.

@nadouani
Contributor

nadouani commented Apr 2, 2021

Hello @mamoedo. We would like to have a call with you to troubleshoot this problem. We need to run the queries directly on the database and profile them. It looks like the index is not used, and we need to verify that since we can't reproduce the issue. Are you on Discord? chat.thehive-project.org

@nadouani
Contributor

nadouani commented Apr 2, 2021

@crackytsi do you have the same behavior as @mamoedo with requests other than the list of tags?

@mamoedo

mamoedo commented Apr 5, 2021

@mamoedo can you increase the log level by updating /etc/thehive/logback.xml (<logger name="org.thp" level="INFO"/> => <logger name="org.thp" level="TRACE"/>) and share your logs, in particular the section where you request the list of alerts?
Then revert the modification of the logback file (debug logging has a negative effect on performance).
I'm trying to understand why the index is not used.

I made the modification and restarted the application, and I think the logs are the same:

/etc/thehive/logback.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration debug="false">

    <conversionRule conversionWord="coloredLevel"
                    converterClass="play.api.libs.logback.ColoredLevel"/>

    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>/var/log/thehive/application.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
            <fileNamePattern>/var/log/thehive/application.%i.log.zip</fileNamePattern>
            <minIndex>1</minIndex>
            <maxIndex>10</maxIndex>
        </rollingPolicy>
        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
            <maxFileSize>10MB</maxFileSize>
        </triggeringPolicy>
        <encoder>
            <pattern>%date [%level] from %logger in %thread [%X{request}|%X{tx}] %message%n%xException</pattern>
        </encoder>
    </appender>

    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%coloredLevel %logger{15} [%X{request}|%X{tx}] %message%n%xException{10}
            </pattern>
        </encoder>
    </appender>

    <appender name="ASYNCFILE" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="FILE"/>
    </appender>

    <appender name="ASYNCSTDOUT" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="STDOUT"/>
    </appender>

    <logger name="org.thp" level="TRACE"/>

    <root level="INFO">
        <appender-ref ref="ASYNCFILE"/>
        <appender-ref ref="ASYNCSTDOUT"/>
    </root>

</configuration>

2021-04-05 08:17:23,799 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-5 [000000e7|] XXX.XXX.XXX.1 POST /api/v1/query?name=alerts took 20ms and returned 200
2021-04-05 08:17:28,381 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [000000ea|] XXX.XXX.XXX.1 GET / took 1ms and returned 308 0 bytes
2021-04-05 08:17:28,397 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [000000eb|] XXX.XXX.XXX.1 GET / took 0ms and returned 308 0 bytes
2021-04-05 08:17:28,739 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [000000ec|] XXX.XXX.XXX.1 GET / took 1ms and returned 308 0 bytes
2021-04-05 08:17:28,748 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [000000ed|] XXX.XXX.XXX.1 GET / took 0ms and returned 308 0 bytes
2021-04-05 08:17:39,563 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-19 [000000ee|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:17:39,646 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-19 [000000ef|] XXX.XXX.XXX.1 GET /api/status took 0ms and returned 200 612 bytes
2021-04-05 08:17:39,719 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [000000f0|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:18:05,038 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-19 [000000e0|] XXX.XXX.XXX.1 GET /api/stream/PoAQC6xbrELP1SHPAhez took 60019ms and returned 200 2 bytes
2021-04-05 08:18:06,886 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-5 [000000f2|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:18:10,400 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [000000f3|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:18:12,189 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-5 [000000f4|] XXX.XXX.XXX.1 GET /api/status took 0ms and returned 200 612 bytes
2021-04-05 08:18:22,729 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [000000f5|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:18:22,809 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [000000e5|] XXX.XXX.XXX.1 GET /api/stream/bKImkO8AbWyYJZMsi4Ez took 60021ms and returned 200 2 bytes
2021-04-05 08:18:23,774 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-19 [000000f7|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:18:31,572 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [000000f8|] XXX.XXX.XXX.1 GET / took 0ms and returned 308 0 bytes
2021-04-05 08:18:39,620 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [000000f9|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:18:39,694 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [000000fa|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:18:39,761 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [000000fb|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:19:01,208 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-20 [000000e6|] XXX.XXX.XXX.1 POST /api/v1/query?name=alert-count took 97590ms and returned 200 5 bytes
2021-04-05 08:19:06,129 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [000000f1|] XXX.XXX.XXX.1 GET /api/stream/PoAQC6xbrELP1SHPAhez took 60017ms and returned 200 2 bytes
2021-04-05 08:19:06,789 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [000000fc|49a149cb] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:19:10,179 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [000000fe|49a149cb] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:19:12,184 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-20 [000000ff|49a149cb] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:19:22,734 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-18 [00000100|] XXX.XXX.XXX.1 GET /api/status took 0ms and returned 200 612 bytes
2021-04-05 08:19:22,909 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [000000f6|] XXX.XXX.XXX.1 GET /api/stream/bKImkO8AbWyYJZMsi4Ez took 60019ms and returned 200 2 bytes
2021-04-05 08:19:23,769 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [00000102|49a149cb] XXX.XXX.XXX.1 GET /api/status took 0ms and returned 200 612 bytes
2021-04-05 08:19:28,377 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [00000103|49a149cb] XXX.XXX.XXX.1 GET / took 0ms and returned 308 0 bytes
2021-04-05 08:19:28,384 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-20 [00000104|49a149cb] XXX.XXX.XXX.1 GET / took 1ms and returned 308 0 bytes
2021-04-05 08:19:28,746 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [00000105|49a149cb] XXX.XXX.XXX.1 GET / took 1ms and returned 308 0 bytes
2021-04-05 08:19:28,748 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [00000106|49a149cb] XXX.XXX.XXX.1 GET / took 0ms and returned 308 0 bytes
2021-04-05 08:19:39,679 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-5 [00000107|] XXX.XXX.XXX.1 GET /api/status took 0ms and returned 200 612 bytes
2021-04-05 08:19:39,761 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [00000108|49a149cb] XXX.XXX.XXX.1 GET /api/status took 0ms and returned 200 612 bytes
2021-04-05 08:19:39,834 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [00000109|49a149cb] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:20:06,771 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [0000010a|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:20:07,219 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [000000fd|49a149cb] XXX.XXX.XXX.1 GET /api/stream/PoAQC6xbrELP1SHPAhez took 60016ms and returned 200 2 bytes
2021-04-05 08:20:10,299 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [0000010c|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:20:12,310 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-21 [0000010d|49a149cb] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:20:22,753 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-13 [0000010e|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:20:23,009 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-5 [00000101|49a149cb] XXX.XXX.XXX.1 GET /api/stream/bKImkO8AbWyYJZMsi4Ez took 60014ms and returned 200 2 bytes
2021-04-05 08:20:23,780 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-5 [00000110|] XXX.XXX.XXX.1 GET /api/status took 1ms and returned 200 612 bytes
2021-04-05 08:20:30,647 [INFO] from org.thp.scalligraph.AccessLogFilter in application-akka.actor.default-dispatcher-22 [000000e8|49a149cb] XXX.XXX.XXX.1 POST /api/v1/query?name=alerts.count took 186867ms and returned 200 5 bytes

@mamoedo

mamoedo commented Apr 5, 2021

By the way, I set the ES index status to green and the performance seems to be the same.

@crackytsi
Author

@crackytsi do you have the same behavior as @mamoedo with requests other than the list of tags?

Unfortunately yes.
e.g.
http://X.X.X.X:9080/api/v1/query?name=cases
with data:
{"query":[{"_name":"listCase"},{"_name":"sort","_fields":[{"flag":"desc"},{"startDate":"desc"}]},{"_name":"page","from":120,"to":135,"extraData":["observableStats","taskStats","procedureCount","isOwner","shareCount","permissions","actionRequired"]}]}

The problem still seems to be related to the number of elements.

[screenshot]

On my system, list-users takes around 800 ms with about 30 users.

But e.g. 4,000 alerts or 15,000 cases should not be a range where TheHive stops working, especially compared to TH3 :-(
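
For reference, a minimal sketch (host and API key are placeholders) that times the paged case query from this comment twice, once with and once without the extraData fields, to check whether the per-case statistics (observableStats, taskStats, ...) are what scales badly with the number of elements:

# Compare timings with and without extraData -- host and API key are placeholders.
import copy
import time
import requests

THEHIVE_URL = "http://X.X.X.X:9080"
API_KEY = "***"  # hypothetical API key

query_with_extra = {
    "query": [
        {"_name": "listCase"},
        {"_name": "sort", "_fields": [{"flag": "desc"}, {"startDate": "desc"}]},
        {"_name": "page", "from": 120, "to": 135,
         "extraData": ["observableStats", "taskStats", "procedureCount",
                       "isOwner", "shareCount", "permissions", "actionRequired"]},
    ]
}
query_without_extra = copy.deepcopy(query_with_extra)
query_without_extra["query"][2].pop("extraData")  # drop the per-case statistics

for label, body in [("with extraData", query_with_extra),
                    ("without extraData", query_without_extra)]:
    start = time.monotonic()
    resp = requests.post(
        f"{THEHIVE_URL}/api/v1/query",
        params={"name": "cases"},
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=body,
        timeout=300,
    )
    print(f"{label}: HTTP {resp.status_code} in {time.monotonic() - start:.1f}s")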

@To-om To-om added this to the 4.1.3 milestone Apr 8, 2021
To-om added a commit that referenced this issue Apr 8, 2021
@nadouani nadouani closed this as completed Apr 9, 2021