Multiple Crawlers #106

schemen · 2017-06-20T14:15:20Z

Hi!

Quick question. Is it possible to run multiple crawlers with a shared network file system extending the same database?

Scenario: I have it running via Docker on a Synology system, which is slow. I also have another server; could the two extend each other?

Cheers!

ad-m · 2017-06-20T14:35:48Z

SQLite, as used in magnetico*, is not optimized for network applications. It would be valuable to add support for a client-server database. See https://sqlite.org/whentouse.html .

Glandos · 2017-06-20T15:06:15Z

Maybe magneticod should stay as-is, using one database file per-instance.
However, magneticow can be modified to query multiple databases. This would lead to some duplicates, but code for magneticod persistence would stay simple.
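A minimal sketch of what querying multiple databases could look like, assuming each magneticod instance writes its own SQLite file. The `torrents` table with `info_hash` and `name` columns and the `search_all` helper are hypothetical names for illustration, not magnetico's actual schema or API. Duplicates are collapsed by info hash when results are merged:

```python
import sqlite3


def search_all(db_paths, term):
    """Search every database file and merge the results.

    Assumes a hypothetical table `torrents(info_hash, name)` in each file.
    A torrent found by several crawlers is returned only once.
    """
    merged = {}
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(
                "SELECT info_hash, name FROM torrents WHERE name LIKE ?",
                (f"%{term}%",),
            )
            for info_hash, name in rows:
                # Keep the first name seen for a given info hash.
                merged.setdefault(info_hash, name)
        finally:
            conn.close()
    return sorted(merged.items())
```

Each crawler keeps its own fast local database, and only the read side pays the cost of opening several files and merging.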

ad-m · 2017-06-20T15:15:03Z

I think that an abstraction layer and two storage implementations would not complicate the code too much.
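As a rough illustration of that idea, here is a sketch of one storage interface with two implementations. All names (`TorrentStore`, `SQLiteStore`, `MemoryStore`, the `torrents` table) are hypothetical, and the in-memory class only stands in for a client-server backend, which would need a real driver:

```python
import sqlite3
from abc import ABC, abstractmethod


class TorrentStore(ABC):
    """Hypothetical storage interface the crawler would code against."""

    @abstractmethod
    def add(self, info_hash: bytes, name: str) -> bool:
        """Store a torrent; return False if it was already known."""

    @abstractmethod
    def count(self) -> int:
        """Return the number of stored torrents."""


class SQLiteStore(TorrentStore):
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS torrents ("
            "info_hash BLOB PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, info_hash: bytes, name: str) -> bool:
        with self._conn:  # commits on success, rolls back on error
            cur = self._conn.execute(
                "INSERT OR IGNORE INTO torrents VALUES (?, ?)",
                (info_hash, name),
            )
        return cur.rowcount == 1

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM torrents").fetchone()[0]


class MemoryStore(TorrentStore):
    """Stand-in for a second backend (e.g. a client-server database)."""

    def __init__(self) -> None:
        self._torrents: dict[bytes, str] = {}

    def add(self, info_hash: bytes, name: str) -> bool:
        if info_hash in self._torrents:
            return False
        self._torrents[info_hash] = name
        return True

    def count(self) -> int:
        return len(self._torrents)
```

The crawler would only ever see `TorrentStore`, so swapping the backend becomes a configuration choice rather than a code change.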

Glandos · 2017-06-20T15:21:04Z

Yes, of course, an abstraction layer for accessing SQL databases is quite easy nowadays.
But I was also thinking that it could be a good feature for magneticow to have multiple sources.

ad-m · 2017-06-20T15:25:15Z

Why do you want to have multiple data sources? In magneticow or in magneticod? What are the obstacles to centralizing the database to keep things simple?

Glandos · 2017-06-20T15:30:00Z

I'm talking about multiple data sources in magneticow. The use case is when you have multiple magneticod instances running on different hosts far away from each other. magneticod needs a fast connection to its database to be able to find duplicates quickly (as of today), whereas magneticow can take more time to answer a user request.

ad-m · 2017-06-20T15:34:13Z

@Glandos , replication is a solution.

Glandos · 2017-06-20T15:43:09Z

Fortunately, there are often multiple solutions to a single problem in software engineering ;)

ad-m · 2017-06-20T15:47:51Z

Yes, but I think it is worth choosing solutions that keep magnetico* simple and usable without the title of IT professor. Delegating such issues to optional external components helps with this.

I wonder if relational databases are optimal for us anyway.

blimeybloke · 2017-06-21T02:34:43Z

mysql support would be awesome :)

skobkin · 2018-03-03T21:29:59Z

Glandos wrote: "Maybe magneticod should stay as-is, using one database file per-instance.
However, magneticow can be modified to query multiple databases"

So you'll get data duplication across all databases. What are your initial goals? Speeding up the crawling process or data replication?

boramalper · 2018-12-30T08:06:19Z

The Go version supports multiple trawlers/crawlers, so this is no longer an issue. Also, database access is abstracted in the pkg/persistence module, so in the future we can have different database engines (such as MySQL, Postgres, etc.) which have better concurrency support. =)

Glandos mentioned this issue Jun 21, 2017

Database abstraction #108

Closed

boramalper closed this as completed Dec 30, 2018

milahu mentioned this issue Aug 8, 2023

alternatives to dhtcrawler2 btdig/dhtcrawler2#5

Open
