This repository has been archived by the owner on Jan 21, 2022. It is now read-only.

Community Database Dump #218

Closed
anindyamaiti opened this issue Oct 14, 2019 · 64 comments

Comments

@anindyamaiti

I started fresh last month, and have ~1.7M torrents in my database after about 3 weeks. And, I plan to keep my magneticod running for the foreseeable future, expecting to add ~50K torrents per day once the initial spike converges.

There have been requests for database dumps before, but to my knowledge no one has shared theirs. So, I thought of taking the initiative. Here is my website where I will share (via .torrent) my database dump 1-2 times a month: https://tnt.maiti.info/dhtd/

You can use it as-is or to get a head start with magneticod. And, don't forget to seed!

Huge thanks to @boramalper for making this project happen.

@kescherCode

I will soon create a page under https://kescher.at/magneticod-db for sharing my own SQLite database backups.

@anindyamaiti
Author

Thanks @kescherCode. I suggest you take the same approach of sharing via .torrent.

As far as GitHub is concerned, I am confident that sharing an external webpage that contains .torrent files of self-created databases is not in violation of any rules. Neither the webpage posted here nor the torrent/database contains any copyrighted material.

@Glandos

Glandos commented Oct 14, 2019

That's good news.

Is anyone brave enough to write a small script to merge two databases?

@kescherCode

kescherCode commented Oct 14, 2019

That's good news.

Is anyone brave enough to write a small script to merge two databases?

https://github.com/kd8bny/sqlMerge

Alternatively:
https://stackoverflow.com/a/37138506/11519866

@kescherCode

I have put up a page at https://kescher.at/magneticod-db now :)

@Glandos

Glandos commented Oct 14, 2019

That's good news.
Is anyone brave enough to write a small script to merge two databases?

https://github.com/kd8bny/sqlMerge

Alternatively:
https://stackoverflow.com/a/37138506/11519866

There is a real issue here.
First, the sqlMerge repository doesn't work even after merging kd8bny/sqlMerge#3 and kd8bny/sqlMerge#4, since it relies on str(row), which doesn't work on BLOB columns.

Then, I explored manual merging. Obviously, torrents.id needs to be modified, but the issue is that there is a foreign key from files.torrent_id to torrents.id, so the script must:

  • Read a record from merged_db
  • Insert it into original_db and record the new insertion id
  • Modify merged_db.files.torrent_id according to the new id, and merge the merged_db.files rows.

This is not so difficult, but I don't think it can be done using a generic script.
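Those three steps can be sketched in Python. This is a hypothetical sketch, not an existing script: it assumes the magneticod SQLite schema has a torrents table with a UNIQUE info_hash and a files table whose torrent_id references torrents.id; the exact column lists are illustrative and may need adjusting.

```python
import sqlite3

def merge(original_path, merged_path):
    """Merge torrents and files from merged_db into original_db,
    remapping torrents.id so files.torrent_id stays consistent."""
    dest = sqlite3.connect(original_path)
    src = sqlite3.connect(merged_path)

    for row in src.execute(
            "SELECT id, info_hash, name, total_size, discovered_on FROM torrents"):
        old_id = row[0]
        try:
            cur = dest.execute(
                "INSERT INTO torrents (info_hash, name, total_size, discovered_on) "
                "VALUES (?, ?, ?, ?)", row[1:])
        except sqlite3.IntegrityError:
            continue  # same info_hash already present: skip torrent and its files
        new_id = cur.lastrowid  # the id assigned by original_db
        dest.executemany(
            "INSERT INTO files (torrent_id, size, path) VALUES (?, ?, ?)",
            ((new_id, size, path) for size, path in src.execute(
                "SELECT size, path FROM files WHERE torrent_id = ?", (old_id,))))

    dest.commit()
    dest.close()
    src.close()
```

The lastrowid after each insert is exactly the "record the insertion id" step; duplicate torrents are dropped wholesale rather than merged.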

@AlphaDelta

It may be ideal to compress the database before sharing.

Compressing with LZMA2 (7zip) on the 'fastest' preset (64K dictionary, 32 word size) yields a file 23.3% of the size of the uncompressed database.

Compressing on the 'normal' preset (16M dictionary, 32 word size) yields a file 15-18.2% of the size of the uncompressed database.

I'd suggest using the following command on Linux systems with xz-utils:

xz -7vk --threads=12 database.sqlite3

7 is the compression level (out of 9)
v is for 'verbose', shows you the progress of the compression in your TTY
k is for 'keep', meaning it won't delete your database after the compression is done >_>
--threads=12 specifies to use 12 threads, you can use --threads=0 to tell xz to use as many threads as there are CPUs on your server.

This will produce a file named database.sqlite3.xz

It can then be decompressed after being downloaded using unxz -v database.sqlite3.xz (or -vk if you want to keep the archive).

I've taken the liberty of compressing both of the shared databases so far: https://public.tool.nz/DHT/

@kescherCode

@AlphaDelta I agree, and I will compress my future torrents containing a database.sqlite3 with xz, however probably with the most aggressive preset you've ever seen:

xz --lzma=dict=1536Mi,nice=273

@AlphaDelta

@kescherCode That is indeed the most aggressive preset I've ever seen 😂

Just keep in mind the entire dictionary is loaded into memory when decompressing, so it would require allocating 1536MiB of memory just to decompress the database.

Probably not worth riding the exponential-cost train that far.

@anindyamaiti
Author

That's a very significant size reduction using xz!

I will share both compressed and uncompressed versions from next time. Those who host magneticow on a low-memory VPS may want to download the uncompressed database directly.

@skobkin
Contributor

skobkin commented Oct 16, 2019

Someone is brave enough to write a small script to merge 2 databases?

BTW I was writing a simple and dirty tool to migrate old magneticod (Python) database data to the new version (including PostgreSQL and any other supported engine). It's not optimized yet and uses too much memory (it loads all torrents from the database at once).

But if you're not scared of bad and not optimized code you can try to use it.

UPD: just pushed a small README.md update and added the ability to use the not-yet-merged postgres and beanstalk engines, as well as upstream's sqlite3 and stdout.
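As a generic illustration of how the all-at-once memory problem can be avoided (this is not skobkin's actual tool; the column names are assumed), rows can be streamed from SQLite in fixed-size batches:

```python
import sqlite3

def iter_torrents(db_path, batch_size=10_000):
    """Yield torrent rows lazily, batch_size rows at a time.

    Memory use stays flat no matter how many torrents the
    source database holds.
    """
    db = sqlite3.connect(db_path)
    try:
        cur = db.execute("SELECT id, info_hash, name FROM torrents")
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield from rows
    finally:
        db.close()
```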

@Glandos

Glandos commented Oct 26, 2019

Here it is : https://framagit.org/Glandos/magnetico_merge
It's hosted on another GitLab instance, but you can log in with your GitHub account if you want to contribute. If you want to fork it on GitHub, please let me know so I can follow your improvements :)

@skobkin
Contributor

skobkin commented Oct 26, 2019

@Glandos It's worth mentioning that it only works with SQLite databases.

@Glandos

Glandos commented Oct 26, 2019

Yes indeed, but it is the only database engine currently supported :) And the only kind of database that has been shared. Sharing a PostgreSQL database for merging is not complex, just different.

@Glandos

Glandos commented Oct 27, 2019

Merging Magneticod bootstrap 2019-10-14/database.sqlite3 into database.sqlite3
Gathering database statistics: 4835749 torrents to merge.
  [######################################################]  4836000/4835749
Comitting… OK. 4835749 torrents processed. 2820832 torrents were not merged due to errors.

Here it is. Now I have a big merged database combining both of your databases: more than 7 million torrents with more than 216 million file entries.
I'll share this database when I have time.

@anindyamaiti
Author

7 million torrents

Did you not remove duplicates? I would guess that there would be significant overlap between the databases.

@Glandos

Glandos commented Oct 28, 2019

There is a lot of overlap, as you can see in the merge report: 2820832 torrents were not merged due to errors. The message is not clear, but it usually means that a constraint was violated and the insert was skipped.

@skobkin
Contributor

skobkin commented Oct 31, 2019

Did you not remove duplicates?

What do you call a duplicate? Torrents with the same infohash can't be inserted again.
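That is the UNIQUE constraint on the infohash doing the deduplication. A minimal demonstration (schema cut down to the relevant column; in the magnetico schema the column is info_hash):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE torrents "
           "(id INTEGER PRIMARY KEY, info_hash BLOB UNIQUE, name TEXT)")

db.execute("INSERT INTO torrents (info_hash, name) VALUES (?, ?)",
           (b"\x01" * 20, "first"))
# Re-inserting the same 20-byte infohash is silently dropped:
db.execute("INSERT OR IGNORE INTO torrents (info_hash, name) VALUES (?, ?)",
           (b"\x01" * 20, "dupe"))

count, = db.execute("SELECT COUNT(*) FROM torrents").fetchone()
print(count)  # 1
```

A plain INSERT would instead raise sqlite3.IntegrityError, which is what "not merged due to errors" means in practice.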

@Glandos

Glandos commented Nov 2, 2019

Here is mine: https://antipoul.fr/dhtd/ This is very basic.
It includes databases from https://tnt.maiti.info/dhtd/ and https://kescher.at/magneticod-db at the time of writing.
It is huge (21GB after decompression), but it works on my Atom D2550, so it should work anywhere.

@Glandos

Glandos commented Nov 9, 2019

@anindyamaiti Thanks for your regular updates. Your page is very nice. Do you think you can add an RSS feed? I know it's another thing to do :)

@boramalper boramalper pinned this issue Nov 9, 2019
@boramalper
Owner

boramalper commented Nov 9, 2019

Pinned! I think once we implement import & export functionality, it'd be even easier (and portable across different databases). =)

Closing because it's not an issue but feel free to keep the discussion & sharing going.

@boramalper boramalper changed the title [Not Issue] Community Database Dump Community Database Dump Nov 9, 2019
@anindyamaiti
Author

anindyamaiti commented Nov 9, 2019

Do you think you can add an RSS feed?

@Glandos I was thinking of the same. Here is a basic automated RSS feed of the ten most recently added files: https://tnt.maiti.info/dhtd/rss.php

Nothing fancy, just the filenames, but it should be good enough for a notification.

If anyone else is interested in incorporating RSS for their shares, here is my (dirty) PHP code: https://tnt.maiti.info/dhtd/rss.php.txt

Pinned!

@boramalper thanks for the pin! 😊
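For anyone who prefers not to run PHP, the same idea (serve the ten most recently modified files as an RSS feed) can be sketched in Python; the directory path and base URL below are placeholders, not the real site's:

```python
import os
from email.utils import formatdate
from xml.sax.saxutils import escape

DUMP_DIR = "/var/www/dhtd"             # placeholder: directory holding the dumps
BASE_URL = "https://example.org/dhtd"  # placeholder: public URL of that directory

def build_feed(dump_dir=DUMP_DIR, base_url=BASE_URL, limit=10):
    """Return an RSS 2.0 document listing the newest files in dump_dir."""
    names = sorted(os.listdir(dump_dir),
                   key=lambda n: os.path.getmtime(os.path.join(dump_dir, n)),
                   reverse=True)[:limit]
    items = "".join(
        "<item><title>{t}</title><link>{l}</link><pubDate>{d}</pubDate></item>".format(
            t=escape(n),
            l=escape("{}/{}".format(base_url, n)),
            d=formatdate(os.path.getmtime(os.path.join(dump_dir, n))))
        for n in names)
    return ('<?xml version="1.0"?><rss version="2.0"><channel>'
            '<title>Database dumps</title><link>{}</link>'
            '<description>Latest dumps</description>{}</channel></rss>'
            .format(escape(base_url), items))
```

Like the PHP version, it is nothing fancy: just filenames and dates, enough for a notification.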

@dessalines

@boramalper suggested I point to torrents.csv, an open repository of torrents / global search engine. Here's the issue for potentially adding people's data to this.

@Glandos

Glandos commented Feb 3, 2020

I have a new dump of my own: https://antipoul.fr/dhtd/20200203_9.2M_magnetico-merge.torrent

Since my server is really low on CPU, I didn't use XZ and switched to zstandard. The output is larger, but much faster to compress / decompress.

No tracker inside, so I will be the first seed in the swarm.

@kescherCode

kescherCode commented Feb 3, 2020

I, too, have released a new dump on https://kescher.at/magneticod-db.
It has around 6.4 million torrents in it.
It uses zstandard compression from now on as well.

Obviously not relying on trackers either, just the DHT.

In case your client allows manual adding of peers and your client doesn't seem to find a connection, feel free to add 185.21.216.171:55451 as peer.

Also, I may seed other dumps here in order to increase availability for people who want to bootstrap their db, which is why I call my torrents "Magneticod bootstrap".

@kescherCode

I have released a new dump, having roughly 10.8 million torrents.

You can get it here.

If you can't find any peers through DHT, add 185.21.216.171:55451 as peer if your client allows it.

@19h

19h commented Jul 5, 2020

@kescherCode could you update your dump? I'd offer to host it as a direct download on one of my servers.

I considered sharing my version but figured your public magnetico instance has over 11.3 million torrents now which makes my 8 million look rather pale in comparison.

@kescherCode

kescherCode commented Jul 7, 2020

@19h Small hint for the future: if you want to share your sqlite3 file, be sure to open it manually with sqlite3 and execute PRAGMA wal_checkpoint(TRUNCATE);. That way, the -shm and -wal files are written back into the database and deleted afterwards.
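The same checkpoint can be scripted rather than typed into the shell; a small sketch using Python's sqlite3 module (the (busy, log, checkpointed) result row is SQLite's documented return value for this pragma):

```python
import sqlite3

def checkpoint(db_path):
    """Fold the -wal contents back into the main database file.

    TRUNCATE mode writes every WAL frame into the database and
    truncates the WAL to zero bytes, so the .sqlite3 file alone
    is complete and safe to share.
    """
    db = sqlite3.connect(db_path)
    busy, log_frames, checkpointed = db.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    db.close()
    return busy == 0  # 0 means no writer blocked the checkpoint
```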

@Glandos

Glandos commented Jul 7, 2020

@19h and @kescherCode I recreated the torrent with announce URIs in it. My client announced it, so now you should be able to find me. But you need to reimport the torrent (from https://antipoul.fr/dhtd/20200706_13.6M_magnetico-merge.torrent) as I've updated it.

@skobkin I can't download from MEGA because the file is too big, and it requires me to install extra software. I won't do this, sorry :)

@skobkin
Contributor

skobkin commented Jul 7, 2020

@Glandos I'm removing it then 🤷

@kescherCode

kescherCode commented Jul 8, 2020

My latest dump, containing 13.8 million torrents.
Magnet link available separately here

I will make sure this file is well-seeded by a fast connection as well as my home connection.

@kescherCode

@19h see updated dump above.

@19h

19h commented Jul 8, 2020

@Glandos @kescherCode that's amazing, thanks both of you!

@19h

19h commented Jul 8, 2020

@Glandos @kescherCode jfyi I fetched both dumps and my seedbox is seeding them.

@19h

19h commented Jul 8, 2020

I'm seeding your torrents.

Also ... I'm currently writing a merging tool in Rust so that it's a bit faster, but my ideal future for this would be migrating off SQLite to LevelDB (or the Facebook fork, RocksDB). I'm also playing with the idea of building a frontend that searches the database using tantivy, but that's a bit of a stretch goal ..

It would be cool if we could have a semi-DHT where we interconnect our instances so that they act as isolated satellites for each other..

@sunnymme

sunnymme commented Jul 9, 2020

How do I install this project on a VPS?

@19h

19h commented Jul 11, 2020

@sunnymme this isn't the right place for this question. Check the README, check other issues, or create one ..

@sunnymme

sunnymme commented Jul 11, 2020 via email

@DyonR

DyonR commented Jul 18, 2020

Here is the dump of my database. 2.64M torrents. My database is not merged with any other database.
The compressed .zst file is 2.5GB; the sqlite3 file is 8.3GB.
You can find my database at https://dyonr.nl/magnetico/
Preferably, use the .torrent to download it instead of downloading the .zst file directly.
The torrent is loaded on my seedbox (1Gbit/s); the .zst is on my server, which is limited to 200Mbit/s.

@sunnymme

sunnymme commented Jul 18, 2020 via email

@kescherCode

@DyonR I'm seeding your database now; I won't merge it in until the next time I dump.

urbanguacamole added a commit to urbanguacamole/torrent-paradise that referenced this issue Jul 20, 2020
@Glandos

Glandos commented Nov 12, 2020

Here is a fresher dump of mine: https://antipoul.fr/dhtd/20201112_14.1_magnetico-merge.torrent

Unfortunately, discovery seems to be stalling a bit. Sometimes I get 0.1 torrents per second, but it is usually 10 times less…

@ngosang
Contributor

ngosang commented Nov 15, 2020

Since some of you have millions of torrents, maybe you are interested in adding support for other databases that scale better than SQLite. Some users are getting request timeouts in magneticow due to poor SQLite performance.

I don't have time to work on this issue, but maybe some of you do. Jackett/Jackett#10174 (comment)

UPDATE: Of course, having a faster backend will increase the discovery/indexing speed too. There is an attempt to add Postgres support, but I think it's abandoned: #214

@skobkin
Contributor

skobkin commented Nov 15, 2020

@ngosang It's not abandoned, it's been working for me for more than a year now 😄

I just forgot about it because Bora didn't answer my question. I think I can make the last change he asked for soon, but I'm not sure he'll merge it because he hasn't been maintaining magnetico for a long time.

UPD: You can test it using this Docker image: https://hub.docker.com/r/skobkin/magneticod

@Glandos

Glandos commented Nov 15, 2020

@ngosang commented on 15 Nov 2020 at 13:01 UTC+1:

UPDATE: Of course, having a faster backend will increase the discovery/indexing speed too.

Since magneticod is not using 100% of a CPU, I don't think this is the current bottleneck.

@skobkin
Contributor

skobkin commented Nov 15, 2020

@Glandos SQLite really is a bottleneck sometimes. It won't use 100% of the CPU because it's most likely saturating the disk.
You can probably tweak SQLite when initializing the client to use very big caches and so on, but I'm not sure it would outperform MySQL or PostgreSQL.

I don't have time to check it, so I could be wrong. If someone can check the disk usage (IOPS, throughput, latency) when searching torrents in a VERY LARGE database, let us know.
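For reference, the kind of cache tweak meant here looks like this; the sizes are illustrative, not recommendations:

```python
import sqlite3

def tune(db):
    """Apply aggressive (illustrative) cache settings to a SQLite connection."""
    # A negative cache_size is interpreted as KiB: ask for a ~1 GiB page cache.
    db.execute("PRAGMA cache_size = -1048576")
    # Let SQLite memory-map up to 1 GiB of the database file.
    db.execute("PRAGMA mmap_size = 1073741824")
    return db.execute("PRAGMA cache_size").fetchone()[0]
```

Both pragmas are per-connection, so they would need to be issued wherever magneticod opens its database handle.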

@ngosang
Contributor

ngosang commented Nov 15, 2020

In my experience as a software architect, with a 10GB database SQLite read performance is between 100 and 1000 times slower than other relational databases like MySQL, Postgres, or Oracle.
If the entire database does not fit in memory, then every database has to read from disk at some point. The difference is that SQLite does not keep several levels of in-memory cache (table indexes, the most common queries, etc.), so each query reads much more data from disk than it would with other databases. I saw 32GB exports in this thread. You should notice an amazing improvement in both indexing and search.

@skobkin
Contributor

skobkin commented Nov 23, 2020

BTW, I've just updated the PR with PostgreSQL support, eliminating the last "problem" that was pointed out a year ago.

@skobkin
Contributor

skobkin commented Nov 27, 2020

It was merged!

@kescherCode

@skobkin Now, how do I migrate my data from SQLite to Postgres? lol

@skobkin
Contributor

skobkin commented Nov 28, 2020

@kescherCode See this comment.

Be aware that magneticow does not work with PostgreSQL as of now.
