This repository has been archived by the owner. It is now read-only.

Replication may be pushing too many feeds into the connection #39

Open
pfrazee opened this Issue Jan 4, 2017 · 7 comments

Member

pfrazee commented Jan 4, 2017

If you look in hypercore-archiver, the replication code adds all stored feeds to the connection. (Its current usage, in archiver-server, does not set passive to false.)

I'm guessing this means that hypercloud will, at minimum, announce all currently stored archives at the time of connect. That can't scale. Shouldn't the hypercloud sit and wait for requests, passively?

@pfrazee pfrazee added the dicussion label Jan 4, 2017

Member

pfrazee commented Jan 4, 2017

@maxogden I think this might relate to your remarks earlier about the archiver-server being passive. The current archiver-bot does set passive to true when it replicates. There's definitely a scaling issue there.

But if passive is true, then the public peer won't ask other public peers for anything. If I'm understanding this correctly, we'll need some kind of middle ground: an algorithm for asking for updates with proper throttling.

maxogden commented Jan 4, 2017

Clarifying question on that code (hard for me to understand due to vague method/variable names), is this the line that 'adds' a feed to a connection? https://github.com/mafintosh/hypercore-archiver/blob/dd34d62253d56604c94d8785e5e39b83816fb30f/index.js#L194 So the issue is the archiver will call .replicate many times over one connection?

Why is it doing that in the first place? Can't we just only call .replicate() for the hypercore that the connection is asking for?

Member

pfrazee commented Jan 4, 2017

> Why is it doing that in the first place? Can't we just only call .replicate() for the hypercore that the connection is asking for?

As I understand it, you need to call feed.replicate() for every feed you want to sync.

I believe the issue is that we only have two modes: 1) ask to sync every feed we have stored locally, or 2) don't ask to sync anything and let the peer make the feed.replicate() calls.

The latter is passive mode. If two passive-mode peers connect, no transfer will occur. That's the problem you remarked on earlier.

However, non-passive-mode will have a scaling problem at some point. You'll ask to sync too many feeds for the connection.
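
To make the two modes concrete, here is a minimal sketch using mock peers. It does not touch the real hypercore-archiver or hypercore API: `createPeer` and `connect` are hypothetical names, and the rule that a feed only syncs when one side asks for it and both sides hold it is a simplifying assumption for illustration.

```javascript
// Mock model of the two modes described above. NOT the hypercore API;
// createPeer and connect are hypothetical helpers for illustration only.
function createPeer (storedKeys, passive) {
  return { storedKeys: new Set(storedKeys), passive }
}

// Simulates one connection. Returns how many feed requests were made
// and which feeds end up syncing.
function connect (a, b) {
  const requested = new Set()
  let asked = 0
  for (const peer of [a, b]) {
    if (peer.passive) continue // passive: never asks, only responds
    for (const key of peer.storedKeys) {
      requested.add(key) // non-passive: asks about every stored feed
      asked++
    }
  }
  // Assumption: a feed syncs only if someone asked and both sides hold it.
  const synced = [...requested]
    .filter(key => a.storedKeys.has(key) && b.storedKeys.has(key))
    .sort()
  return { asked, synced }
}

const cloudA = createPeer(['f1', 'f2', 'f3'], true)
const cloudB = createPeer(['f2', 'f3', 'f4'], true)
const client = createPeer(['f2'], false)
const eagerCloud = createPeer(['f1', 'f2', 'f3'], false)

const passivePair = connect(cloudA, cloudB) // asked: 0, synced: []
const withClient = connect(cloudA, client)  // asked: 1, synced: ['f2']
const eager = connect(eagerCloud, client)   // asked: 4, synced: ['f2']
```

In this model, two passive peers never transfer anything even though they share `f2` and `f3`, while a non-passive peer's request count grows with the number of feeds it stores, regardless of what the other side wants.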

maxogden commented Jan 4, 2017

What if we just used 1 connection per .replicate()?

Member

pfrazee commented Jan 4, 2017

No, that wouldn't solve the problem. Basically the problem is that hyperclouds are interested in too many hypercores. A peer will show up and the hypercloud will ask, "you have anything new for 10mm cores?" Too thirsty.

We do want the hypercloud to ask about some of their cores. Just not all of them, every time.
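
One possible shape for "some, but not all, every time" is a capped, rotating batch per connection. The sketch below is only an illustration of that idea, not anything hypercloud or hypercore-archiver implements; `createFeedRotation`, `nextBatch`, and the `limit` parameter are hypothetical.

```javascript
// Hypothetical helper, not part of hypercore-archiver: pick at most `limit`
// feed keys to ask about on each new connection, rotating through the
// stored keys so that every feed is eventually asked about.
function createFeedRotation (storedKeys, limit) {
  const keys = [...storedKeys]
  let cursor = 0
  return function nextBatch () {
    const size = Math.min(limit, keys.length)
    const batch = []
    for (let i = 0; i < size; i++) {
      batch.push(keys[(cursor + i) % keys.length])
    }
    // Advance the cursor so the next connection gets the following keys.
    if (keys.length > 0) cursor = (cursor + size) % keys.length
    return batch
  }
}

const nextBatch = createFeedRotation(['a', 'b', 'c', 'd', 'e'], 2)
const first = nextBatch()  // ['a', 'b']
const second = nextBatch() // ['c', 'd']
const third = nextBatch()  // ['e', 'a'] (wraps around)
```

Each connection would then call feed.replicate() only for the keys in its batch. A real policy would probably want to weight the choice (recently updated feeds first, for example); plain round-robin is used here only because it is the simplest bounded scheme.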

Collaborator

joehand commented Jan 4, 2017

> I'm guessing this means that hypercloud will, at minimum, announce all currently stored archives at the time of connect. That can't scale. Shouldn't the hypercloud sit and wait for requests, passively?

Important to note that announcing is separate from opening the feed. In archiver-server, there is a random timeout to avoid flooding all those announcements, but it's still likely a problem.

But both are issues: 1) having many feeds open and 2) announcing too many things at once.

> pfrazee: jhand: to clarify, there's two places where a flood could happen. The one you linked to is announcing on the discovery network. The other one, which max and I are discussing, is announcing feeds once a connection is established between peers

Ah!
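
For reference, the "random timeout" idea for discovery announcements can be sketched as below. This is not the archiver-server code, just a minimal illustration of jittering announcements across a time window; `scheduleAnnouncements`, `startAnnouncing`, `windowMs`, and the `announce` callback are all hypothetical names.

```javascript
// Hypothetical sketch: give each feed key a random delay within `windowMs`
// so announcements are spread out instead of all being sent at once.
// `random` is injectable so the schedule can be made deterministic in tests.
function scheduleAnnouncements (keys, windowMs, random = Math.random) {
  return keys
    .map(key => ({ key, delayMs: Math.floor(random() * windowMs) }))
    .sort((x, y) => x.delayMs - y.delayMs)
}

// Starts one timer per entry; `announce` is a callback supplied by the caller.
// Returns the timer handles so they can be cancelled with clearTimeout.
function startAnnouncing (schedule, announce) {
  return schedule.map(({ key, delayMs }) =>
    setTimeout(() => announce(key), delayMs))
}

// Spread three announcements over one minute.
const schedule = scheduleAnnouncements(['a', 'b', 'c'], 60000)
```

Jitter like this smooths the burst but does not reduce the total number of announcements, which is why it is "still likely a problem" once the number of stored feeds gets large.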

Member

pfrazee commented Jan 4, 2017

(Max and I clarified our points in IRC)

garbados pushed a commit to garbados/hypercloud that referenced this issue Aug 14, 2017
