This repository has been archived by the owner on Jan 21, 2022. It is now read-only.

Multiple Crawlers #106

Closed
schemen opened this issue Jun 20, 2017 · 12 comments

Comments

@schemen

schemen commented Jun 20, 2017

Hi!

Quick question. Is it possible to run multiple crawlers with a shared network file system extending the same database?

Scenario: I have it running via Docker on a Synology system, which is slow. I also have another server; could the two extend each other?

Cheers!

@ad-m
Contributor

ad-m commented Jun 20, 2017

SQLite, as used in magnetico*, is not optimized for network applications. It would be valuable to add support for a database with a client-server architecture. See https://sqlite.org/whentouse.html.

@Glandos

Glandos commented Jun 20, 2017

Maybe magneticod should stay as-is, using one database file per-instance.
However, magneticow can be modified to query multiple databases. This would lead to some duplicates, but code for magneticod persistence would stay simple.
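
As a rough illustration of the idea above (not magneticow's actual code), querying several per-instance SQLite files and merging the results could be sketched like this. The function name `search_all` and the `torrents(info_hash, name)` schema are assumptions made for the example; duplicates across databases are collapsed by keying on `info_hash`.

```python
import sqlite3


def search_all(db_paths, keyword):
    """Search several SQLite files and merge the hits, one entry per info_hash.

    Assumes each file has a table torrents(info_hash, name); this schema is
    hypothetical and chosen only for illustration.
    """
    merged = {}
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(
                "SELECT info_hash, name FROM torrents WHERE name LIKE ?",
                (f"%{keyword}%",),
            ).fetchall()
        finally:
            conn.close()
        for info_hash, name in rows:
            # Keep the first copy seen; later databases may repeat it.
            merged.setdefault(info_hash, name)
    return merged
```

The duplicates still occupy disk space in every database; this sketch only hides them at query time.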

@ad-m
Contributor

ad-m commented Jun 20, 2017

I think that an abstraction and two storage implementations would not complicate the code too much.

@Glandos

Glandos commented Jun 20, 2017

Yes, of course, an abstraction layer for accessing an SQL database is quite easy nowadays.
But I was also thinking that it could be a good feature for magneticow to have multiple sources.
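
To make the "abstraction and two storage implementations" idea concrete, here is a minimal sketch. It assumes a hypothetical `TorrentStore` interface with an SQLite backend and an in-memory backend standing in for a second engine. None of these class or method names come from magnetico; a client-server backend (MySQL, Postgres) would implement the same two methods.

```python
import sqlite3
from abc import ABC, abstractmethod


class TorrentStore(ABC):
    """Hypothetical storage interface; not magnetico's actual API."""

    @abstractmethod
    def add(self, info_hash: bytes, name: str) -> None:
        """Store a torrent; adding a known info_hash again is a no-op."""

    @abstractmethod
    def search(self, keyword: str) -> list:
        """Return the sorted names containing keyword (case-insensitive)."""


class SQLiteStore(TorrentStore):
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS torrents ("
            "info_hash BLOB PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, info_hash: bytes, name: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR IGNORE INTO torrents (info_hash, name) VALUES (?, ?)",
                (info_hash, name),
            )

    def search(self, keyword: str) -> list:
        rows = self._conn.execute(
            "SELECT name FROM torrents WHERE name LIKE ? ORDER BY name",
            (f"%{keyword}%",),
        )
        return [name for (name,) in rows]


class MemoryStore(TorrentStore):
    """Stand-in for a second backend, useful mainly for tests."""

    def __init__(self) -> None:
        self._torrents = {}

    def add(self, info_hash: bytes, name: str) -> None:
        self._torrents.setdefault(info_hash, name)

    def search(self, keyword: str) -> list:
        needle = keyword.lower()
        return sorted(n for n in self._torrents.values() if needle in n.lower())
```

Code written against `TorrentStore` does not need to know which backend is in use, which is the property both commenters seem to be after.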

@ad-m
Contributor

ad-m commented Jun 20, 2017

Why do you want to have multiple data sources? In magneticow or in magneticod? What are the obstacles to centralizing the database to keep things simple?

@Glandos

Glandos commented Jun 20, 2017

I'm talking about multiple data sources in magneticow. The use case is when you have multiple magneticod instances running on different hosts far away from each other. magneticod needs a fast connection to its database to be able to find duplicates quickly (as of today), whereas magneticow can take more time to answer a user request.
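
One plausible reading of the duplicate check mentioned above is a lookup per discovered info hash, which would explain why database latency matters to the crawler. The sketch below assumes a hypothetical `torrents(info_hash, name)` table and an invented helper `is_known`; it is not taken from magneticod.

```python
import sqlite3


def is_known(conn: sqlite3.Connection, info_hash: bytes) -> bool:
    """Return True if info_hash is already stored.

    Each call is one query, so every newly discovered hash costs one
    round trip to wherever the database lives.
    """
    row = conn.execute(
        "SELECT 1 FROM torrents WHERE info_hash = ? LIMIT 1",
        (info_hash,),
    ).fetchone()
    return row is not None
```

With a local SQLite file that round trip is cheap. With a remote database, each check pays the network latency, whereas a slower answer in magneticow only delays a single user request.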

@ad-m
Contributor

ad-m commented Jun 20, 2017

@Glandos , replication is a solution.

@Glandos

Glandos commented Jun 20, 2017

Fortunately, there are often multiple solutions for a single problem in software engineering ;)

@ad-m
Contributor

ad-m commented Jun 20, 2017

Yes, but I think it is worth choosing approaches that keep magnetico* simple and usable without the title of IT professor. Delegating such issues to optional external components helps with this.

I wonder if relational databases are optimal for us anyway.

@blimeybloke

MySQL support would be awesome :)

@skobkin
Contributor

skobkin commented Mar 3, 2018

> Maybe magneticod should stay as-is, using one database file per-instance.
> However, magneticow can be modified to query multiple databases

So you'll get data duplication across all databases. What are your initial goals? Speeding up the crawling process or data replication?

@boramalper
Owner

The Go version supports multiple trawlers/crawlers, so this is no longer an issue. Also, database access is abstracted in the pkg/persistence module so that, in the future, we can have different database engines (such as MySQL, Postgres, etc.) which have better concurrency support. =)
