
Huge long term disk usage (performance improvement) #152

Closed
slrslr opened this issue Aug 1, 2021 · 48 comments
Labels
feature New feature or request

Comments

slrslr commented Aug 1, 2021

Hello,
on Linux, the iotop utility reports more than 100 MB/s of disk usage for the process:
searchd --config /home/me/rats-search/sphinx.conf --nodetach

(searchd = rats-search/imports/linux/x64/searchd)
When I killed rats (there is no restart/shutdown button) and started it again, I saw the issue come back after a few minutes.
Any idea on commands that can shed some light on this, please?
Here are some more details (password: r).
I am at 1.7 million torrents and the database folder is already 7.5 GB; the program shows that something under 20 million torrents is possible.
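
For reference, two generic Linux commands that can help confirm which process is doing the reads (iotop is already mentioned above; pidstat is part of the sysstat package, so this is only a suggestion, not something rats ships):

  # show only processes that are actually doing I/O, aggregated per process
  sudo iotop -o -P

  # sample the read/write rate of the searchd process every 5 seconds
  pidstat -d -p "$(pgrep -d, -f searchd)" 5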

DEgITx (Owner) commented Aug 1, 2021

Please post this issue to https://github.com/manticoresoftware/manticoresearch.
This is not related to rats itself.

slrslr (Author) commented Aug 3, 2021

@DEgITx Hello, I have submitted an issue at Manticore, as seen/linked above. The developer suggested 1) upgrading Manticore, since you are using a very old version in Rats search, and 2) mlocking attributes / attributes + doclists + hitlists, but I do not know how. Do you know what exactly to do, or could you even provide a modified file so I can try it? Or is this beyond your scope?

slrslr (Author) commented Aug 8, 2021

No reply from @DEgITx. Can you try to build with the latest manticoresearch, please? Its developer is not willing to debug this serious issue on the roughly 3.5-year-old Manticore build that you seem to be using in rats-search.
Or, if you can, give me instructions.

If you do not reply or are not willing to update the software, I may have to stop using rats-search, since it is causing disk overload for me as described.

DEgITx (Owner) commented Aug 8, 2021

@slrslr are you sure you are using the latest rats build? There is a 3.6.0 Manticore version for 64-bit hosts.

DEgITx (Owner) commented Aug 8, 2021

You will see the version in rats.log; it must be 3.6.0.

DEgITx added the waiting for reply (Awaiting feedback from user/reporter) label Aug 8, 2021
DEgITx (Owner) commented Aug 8, 2021

(screenshot)
Ensure your rats is using this version.

DEgITx (Owner) commented Aug 8, 2021

Another option to decrease disk usage and Sphinx load overall is this options section:

https://github.com/DEgITx/rats-search/blob/master/docs/USAGE.md#torrent-scanner-settings

It will decrease Sphinx usage, but it will also decrease the speed of collecting torrents. The disk usage of the searchd process itself is related to Manticore issues.

DEgITx (Owner) commented Aug 8, 2021

@slrslr
About Manticore debugging according to the docs: you can change the Manticore settings in this file

let generateConfig = () => (`

You can add sphinx/manticore options according to their docs,

and then run
npm start
or
npm run server
(depending on whether you are using the desktop or the server version of rats).
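
For illustration only, a minimal sketch of what the mlock suggestion from the Manticore developer could look like if added to the index sections that generateConfig() emits; the access_* option names come from the Manticore 3.x "Accessing index files" documentation linked later in this thread and should be checked against the exact daemon version bundled with rats:

  index torrents
  {
      type = rt
      # ... keep the generated rt_field / rt_attr_* lines as they are ...

      # keep attributes locked in RAM instead of letting the OS page them out
      access_plain_attrs = mlock
      access_blob_attrs  = mlock

      # optionally also lock doclists/hitlists, at the cost of extra RAM
      access_doclists = mlock
      access_hitlists = mlock
  }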

slrslr (Author) commented Aug 8, 2021

I do not know where that log is or its file name; I was unable to find it in the git-cloned rats-search/ dir, and no "daemon version" string was found in its files. But I have found "Manticore 2.6.1 a7fa71e@180126 release" in searchd.log.

rats-search/src/background/sphinx.js ; let generateConfig

I do not know what this is or how I should set it; I am not a developer. The developer in the issue I linked earlier says:
"you can also try to mlock attributes / attributes + doclists + hitlists according to this docs https://manual.manticoresearch.com/Creating_an_index/Local_indexes/Plain_and_real-time_index_settings#Accessing-index-files and check again."
but that is beyond my understanding.

npm run server

outputs:
[system] Rats v1.6.0
some deprecation warning
Manticore 2.6.1 a7fa71e@180126 release

I have tried to update from https://github.com/DEgITx/rats-search.git; it said: Updating 69df9ec..750dbfd
but it is still the same Manticore version release.

After some time I found https://github.com/DEgITx/rats-search/tree/manticore3 but do not know what its .git is or if/how it is compatible with my current Manticore.

DEgITx (Owner) commented Aug 8, 2021

For the server version you must use the latest master version of rats:
750dbfd

rats.log is located in the same directory where you started npm run server.
You can see the log in the console and in this log file (they are the same), so you can check it in the console or in rats.log.

Ensure that you see these messages:
[16:30:39] NodeJS: v16.2.0
[16:30:39] Web server
[16:30:39] Sphinx Path: e:\Projects\rats\imports\win\x64\searchd.exe
[16:30:39] listening udp tracker respose on 0.0.0.0:4446

(in the case of Linux, your path must be imports/linux/x64/searchd)

Also ensure that you are seeing:
[16:30:39] sphinx: starting daemon version '3.6.0 96d61d8bf@210504 release' ...

Remove searchd.log before starting.

If not, post the info that you do see.

slrslr (Author) commented Aug 8, 2021

for the server version you must use the latest master version

I am sorry, but unfortunately this information does not tell me how to proceed; please kindly let me know the steps. My commands were:
cd path/to/rats-search/;git remote add origin https://github.com/DEgITx/rats-search.git 2>/dev/null;git submodule update --init --recursive;git pull origin master 2>/dev/null||git pull origin main 2>/dev/null
result:
Updating 69df9ec..750dbfd
then:
npm install;npm run buildweb
yet I am still seeing the old Manticore 2.6.1 a7fa71e@180126 release.

rats.log

I have been searching for this file in my rats-search directory clone (from git) with
find . -name "rats.log"; locate rats.log; etc., but it was not found. On my Linux there are only query.log and searchd.log.

DEgITx (Owner) commented Aug 8, 2021

The messages in the console log are the same as in rats.log,

so you can find both of the messages there.

(screenshot)

I'm interested in these two messages in the console.

slrslr (Author) commented Aug 8, 2021

@DEgITx

I'm interested in these two messages in the console

Sphinx path: */rats-search/imports/linux/x64/searchd
sphinx: ... using config file */rats-search/sphinx.conf'...
listening on 127.0.0.1:9312
listening on 127.0.0.1:9306
Manticore 2.6.1 a7fa71e@180126 release

DEgITx (Owner) commented Aug 8, 2021

$ md5sum imports/linux/x64/searchd
e6e5754d68c2b70e7f668f029ed81905 imports/linux/x64/searchd

Verify that this is the correct line; if not, check your git repository: something was not updated in your repo and it is old.

slrslr (Author) commented Aug 8, 2021

My md5sum is different.

if not, check your git repository: something was not updated

I do not understand git. Do you have any idea what is wrong with the mentioned command:
cd path/to/rats-search/;git remote add origin https://github.com/DEgITx/rats-search.git 2>/dev/null;git submodule update --init --recursive;git pull origin master 2>/dev/null||git pull origin main 2>/dev/null
?
Using that command, my aim was to keep the local repository up to date with the remote.

I know that one can "reset" a local repository: cd rats-search;git reset --hard
but I am worried about deleting the database/ contents.

DEgITx (Owner) commented Aug 9, 2021

git reset --hard doesn't touch files that are not tracked by git.
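
For example, a sketch of an update sequence that leaves the untracked database/ directory alone (assuming the remote is named origin and the default branch is master, as in the commands above):

  cd path/to/rats-search
  git fetch origin
  git reset --hard origin/master     # resets tracked files only; database/ is untracked
  git submodule update --init --recursive
  npm install && npm run buildweb    # rebuild the server version after updating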

slrslr (Author) commented Aug 9, 2021

Thanks, yes, it updated the files to:
Rats v1.7.1
sphinx: Manticore 3.6.0 96d61d8bf@210504 release
though the program (npm run server) does not continue: "indexes with meta prior to v.14 are no longer supported (use index_converter tool)", detail
How do I proceed, please? I think your software should return a user-friendly message so the user knows exactly what to run and how.

DEgITx (Owner) commented Aug 9, 2021

It should start the migration process. What messages do you see below that?

slrslr (Author) commented Aug 9, 2021

I think it output some issue on the first run, but that run was unfortunately in the background and I do not know more details (no .log file in rats-search/ contains any more detail). On every run now it shows only the things I posted under the link in my previous comment; nothing more, it just stops like that. I do not see any significant CPU or disk activity for the server.js processes or the "npm run server" one.

When I interrupt it with the Ctrl+C shortcut, it says:

[system] Exception: ReferenceError: spider is not defined
    at process.<anonymous> (/home/me/apps/rats-search/src/background/server.js:80:2)

more complete output

so it mentions rats-search/imports/linux/x64/index_converter
Any idea what to do, please?
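
For reference, running the bundled converter by hand would presumably look something like the line below; the --config/--all flags are taken from the Manticore index_converter documentation and are not verified against this exact build, so treat it as a sketch (and stop rats first):

  ./imports/linux/x64/index_converter --config ./sphinx.conf --all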

DEgITx (Owner) commented Aug 9, 2021

update from master and check again

slrslr (Author) commented Aug 9, 2021

update from master and check again

I just did, and this is the output of "npm run".
searchd.log contains some backtrace attempt.
I let it run for about 10 minutes: near-zero CPU and disk activity for searchd.v2, server.js and "npm run server", and the web UI page is blank.

DEgITx (Owner) commented Aug 9, 2021

Try to restart it; if that does not help, can you pack your database and send it to me?

The end of the log is pretty strange.

slrslr (Author) commented Aug 9, 2021

I do not think I can restart it (interrupting it on the command line does not seem to end it either), so far I am killing it. After killing and starting it, I think the result is the same as described earlier. Please kindly download the database using this magnet and extract the file into the database/ folder.

DEgITx (Owner) commented Aug 9, 2021

It is not downloading via this magnet.

slrslr (Author) commented Aug 9, 2021

Did you also embed the trackers? It is a private torrent with several trackers, working on my side, seeded non-stop with an active peer, so I am not sure why it did not work. I am uploading it to a server now; it can take a longer time.

slrslr (Author) commented Aug 9, 2021

@DEgITx the upload is finished; please download the database here and try to reproduce the issue #152 (comment)

DEgITx (Owner) commented Aug 30, 2021

but if I'm not, it may be beneficial for your app's users if you store the above strings in "stored only fields"

@sanikolaev, thanks for the advice, I will enable them as stored-only.

slrslr (Author) commented Aug 30, 2021

@DEgITx
the line
const options = ['--config', config, '--logdebug', 'cli']
and the command "cd ~/apps/rats-search;npm --trace-deprecation run server" result in:

(node:981456) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
[udp-tracker] listening udp tracker respose on 0.0.0.0:porthere
[sphinx] writed sphinx config to /home/user/apps/rats-search
[sphinx] db path: /home/user/apps/rats-search
[sphinx] sphinx: [Mon Aug 30 11:14:35.160 2021] [981485] FATAL: malformed or unknown option near 'cli'; use '-h' or '--help' to see available options.

[sphinx] sphinx: Manticore 3.6.0 96d61d8bf@210504 release
...
[sphinx] sphinx closed with code 1 and signal null

consider reopening this issue?

DEgITx (Owner) commented Aug 30, 2021


Sorry, I edited the answer above; you probably need only '--logdebug'.
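
In other words, the debug line in sphinx.js would presumably become:

  const options = ['--config', config, '--logdebug']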

slrslr (Author) commented Aug 30, 2021

Thanks, that worked. I have been running rats with sphinx --logdebug for a few hours and the issue reappeared (hundreds of MB/s read on the system drive).
rats.log
query.log
searchd.log
sphinx.conf (still the same)

Hopefully this will help you debug the issue (maybe it should be re-opened).

thanks for the advice, I will enable them as stored-only

Can you let me know when, so I can try to apply your modification and test it on my end?

DEgITx (Owner) commented Aug 30, 2021

(Quoting @sanikolaev:) Hi guys. Manticore team member here.

Regarding "maybe now they can resolve something or give advice": as it turned out the main problem may be that string attributes are used instead of stored fields. E.g. here https://github.com/DEgITx/rats-search/blob/master/src/background/sphinx.js#L44 and below that line all these are string attributes:

  index torrents
    rt_attr_string = hash
    rt_attr_string = name
    rt_attr_string = ipv4
    rt_attr_string = contentType
    rt_attr_string = contentCategory

  index files
      rt_attr_string = path
    rt_attr_string = hash
    rt_attr_string = size

and:

  • they have to be in memory for good performance
  • they take much more memory than numbers

Here and here I don't find any signs that you sort or group by these attributes. I may be wrong, but if I'm not it may be beneficial for your app's users if you store the above strings in "stored only fields" - http://mnt.cr/stored_only_fields

@sanikolaev, I noticed a problem executing such a query with stored-only fields:

            SELECT 
                MAX(id) as maxid
            FROM files

with any of them it will fail the request (as an example, from files)

@sanikolaev

will fail the request (as example from files)

Please provide more details about the failure. I can't reproduce it:

mysql> drop table if exists t; create table t(f text, path text stored); desc t; insert into t values(0,'abc','path'),(0,'def','another path'); select * from t; select max(id) as maxid from t;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.05 sec)

--------------
create table t(f text, path text stored)
--------------

Query OK, 0 rows affected (0.03 sec)

--------------
desc t
--------------

+-------+--------+----------------+
| Field | Type   | Properties     |
+-------+--------+----------------+
| id    | bigint |                |
| f     | text   | indexed stored |
| path  | text   | stored         |
+-------+--------+----------------+
3 rows in set (0.00 sec)

--------------
insert into t values(0,'abc','path'),(0,'def','another path')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from t
--------------

+---------------------+------+--------------+
| id                  | f    | path         |
+---------------------+------+--------------+
| 1514445464932450316 | abc  | path         |
| 1514445464932450317 | def  | another path |
+---------------------+------+--------------+
2 rows in set (0.00 sec)

--------------
select max(id) as maxid from t
--------------

+---------------------+
| maxid               |
+---------------------+
| 1514445464932450317 |
+---------------------+
1 row in set (0.00 sec)

DEgITx (Owner) commented Aug 31, 2021

@sanikolaev

I just created the DB from scratch with this config:

  index files
  {
      type = rt
      path = ${dbPath}/database/files
      
      rt_attr_string = path
      rt_field = pathIndex
      rt_attr_string = hash
      rt_attr_string = size

      stored_only_fields = size
  }
sphinx: WARNING: ERROR: index 'files': existing attribute specified in stored_fields: 'size'
 - NOT SERVING
WARNING: index 'files': existing attribute specified in stored_fields: 'size'
 - NOT SERVING
SELECT 
                MAX(id) as maxid
            FROM files

gives

unknown local index(es) 'files' in search request

@sanikolaev

I've corrected it:

  index files
  {
      type = rt
      path = ${dbPath}/database/files
      
      rt_attr_string = path
      rt_field = pathIndex
      rt_attr_string = hash
      rt_field = size

      stored_only_fields = size
  }

DEgITx (Owner) commented Sep 1, 2021

@sanikolaev some questions about stored_only_fields.

First of all, will switching rt_attr_string to stored fields affect the performance and memory usage of the table in memory?
I can switch some fields to them, but I am still not sure whether it will have any good effect on performance and memory usage with big tables.
Some fields cannot be converted; for example, with

  index files
  {
      type = rt
      path = ${dbPath}/database/files
      
      rt_field = path
      rt_field = pathIndex
      rt_attr_string = hash
      rt_field = size

      stored_only_fields = path, size
  }

the hash attribute cannot be converted, which causes an error on this request:

sqlMessage: "index files: unsupported column 'hash' (stored field, NOT attribute)",
sqlState: '42000',
index: 0,
sql: "SELECT * FROM `files` WHERE `hash` = '5472a94ccff4f84e991224b206e75d3977dadc9f'"

So before using them, I am interested in whether there is any point in switching, because it will introduce some limitations on the requests I can use in the future.

And, for example, do you think it will help in @slrslr's situation? His table has over a million records, so maybe the effect will not be that significant.

I can commit to a separate branch to let him test whether it is good or not. I would also recommend updating the Manticore docs about stored_only_fields: there was no info about the difference and limitations between stored_only_fields and attributes, so it is hard to compare the pros and cons of their usage.

DEgITx (Owner) commented Sep 1, 2021

I am reopening this issue because of a potential change in rats to increase performance.

DEgITx reopened this Sep 1, 2021
DEgITx added the feature (New feature or request) label and removed the waiting for reply label Sep 1, 2021
DEgITx changed the title from "Huge long term disk usage (100s of MB/s) - searchd sphinx.conf" to "Huge long term disk usage (incese performance)" Sep 1, 2021
DEgITx changed the title to "Huge long term disk usage (performance improvement)" Sep 1, 2021
@sanikolaev

will switching rt_attr_string to stored fields affect the performance and memory usage of the table in memory?

Yes, that's correct. rt_attr_string requires RAM; stored fields do not.

Some fields cannot be converted

It's all simple:

  • stored fields cannot be used in WHERE, ORDER BY, GROUP BY, you can only fetch them in SELECT. But they don't require RAM.
  • string attributes can be used in all kinds of queries, but require RAM.

I don't know what queries you have, so can't say what can be converted to stored fields. Obviously if you have to do WHERE hash = '5472a94ccff4f84e991224b206e75d3977dadc9f' it can't be a stored field.
Off-topic: I would suggest trying a shorter hash and packing it into a bigint, so it takes significantly less RAM, but that's another story.

And, for example, do you think it will help in @slrslr's situation? His table has over a million records, so maybe the effect will not be that significant.

Yes, it will; please read manticoresoftware/manticoresearch#602 (comment). The string attributes take almost 7 GB of RAM, which on a 4 GB server makes Manticore read them from disk on probably every query.
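
As an illustration of the shorter-hash idea above (a sketch only, not something rats currently does): the 40-character hex infohash could be truncated and packed into a 64-bit integer attribute, trading a tiny collision risk for far lower RAM usage. The packHash helper and the hash_key column name below are hypothetical:

  // pack the first 15 hex chars (60 bits) of an infohash into the signed
  // 64-bit range, so it could be stored as rt_attr_bigint instead of a string
  const packHash = (hexHash) => BigInt('0x' + hexHash.slice(0, 15))

  // both the stored value and the lookup would use the packed key, e.g.
  const key = packHash('5472a94ccff4f84e991224b206e75d3977dadc9f')
  // SELECT * FROM files WHERE hash_key = <key>  -- instead of comparing the full string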

DEgITx (Owner) commented Sep 1, 2021

@sanikolaev, another interesting question: what about converting between those two after a config change? Is it necessary to convert an already-created DB, and how can it be done in both directions, i.e. from fields to string attributes and from string attributes to fields?

@sanikolaev

You can't switch from a string attribute to a stored field or vice versa online; you need to re-create the index and repopulate it with data.

DEgITx added a commit that referenced this issue Sep 1, 2021
DEgITx added a commit that referenced this issue Sep 11, 2021
github-actions bot pushed a commit that referenced this issue Sep 12, 2021
# [1.8.0](v1.7.1...v1.8.0) (2021-09-12)

### Bug Fixes

* **db:** converting db to version 8 ([c19a95d](c19a95d))
* **db:** moving content type to uint values ([f4b7a8d](f4b7a8d))
* **docker:** moved to 16 version ([1089fa3](1089fa3))
* **linux:** add execute right to searchd.v2 [#152](#152) ([0bc35c5](0bc35c5))
* **linux:** fix convertation of db under linux system [#152](#152) ([ea01858](ea01858))

### Features

* **log:** using tagslog ([750dbfd](750dbfd))
* **server:** missing rats.log functionality restored [#145](#145) ([d5243ff](d5243ff))

### Performance Improvements

* **db:** optimize some tables to stored_only_fields to recrudesce memory usage of big databases [#152](#152) ([762b0d1](762b0d1))
DEgITx closed this as completed Sep 12, 2021
slrslr (Author) commented Sep 12, 2021

I git-cloned and ran "npm start server" 3 hours ago, and the issue still exists; I am unsure why this issue is closed.
Btw. the excessive system disk read activity also happens when I pause things on the Activity tab (in the rats UI). @DEgITx

DEgITx (Owner) commented Sep 12, 2021

@slrslr, this issue is closed because it was reopened only to apply some optimization suggestions from the Manticore team members; by itself that does not guarantee that your problem is resolved on such a big database, and as I said before, the problem is related to the Manticore engine and not to rats search (that is why this issue is closed).

You need to ask @sanikolaev and the other Manticore team members whether any further optimization is possible in your case to make such a big database work with satisfactory speed and performance, because the decrease in performance is related to Manticore. If they say there is no possible optimization in the database configuration and its structure, your only choice will be to delete some data from the tables to make your work comfortable.
That is possible to do with the filter tab in rats search.

DEgITx added a commit that referenced this issue Jun 1, 2023
DEgITx pushed a commit that referenced this issue Jun 1, 2023
DEgITx added a commit that referenced this issue Jun 1, 2023