The slowness is caused primarily by the fact that every page Distru indexes requires its own HTTP request. These requests currently run serially: each page must wait for the previous one to finish, and each site must wait for the previous site to finish. Most of the delay is network latency; Distru itself can do the indexing very fast, but the target sites are slow to respond, and that slows down our indexing.
This can be fixed by allowing Distru to send many HTTP requests simultaneously, mainly to separate sites, so that those sites can be indexed in parallel. A more complicated option is to also handle requests to different pages simultaneously. This would add some overhead to keep track of pages already being requested, but it is definitely doable.
Distru used to invoke NewIndex() on startup to initialize the variable
Idx. This meant that the function had to finish before any other piece
of the software could initialize. I just moved this initialization to
a new goroutine, which is invoked inside of Serve(). This allows the
server to start first, and the index to build while it is running.
The server will serve only an empty index until the constructor finishes.
ref #12
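
A minimal sketch of the pattern this commit describes, under stated assumptions: the `Index` type, `buildIndex`, `startIndexing`, and `currentIndex` below are hypothetical stand-ins, not Distru's real definitions. The point is only that the slow constructor runs in its own goroutine while an empty index is available immediately.

```go
package main

import (
	"fmt"
	"sync"
)

// Index is a hypothetical, simplified stand-in for Distru's index type.
type Index struct {
	Sites map[string]string
}

var (
	idxMu sync.RWMutex
	// Idx starts out empty and is replaced once the build finishes.
	Idx = &Index{Sites: map[string]string{}}
)

// buildIndex stands in for the slow NewIndex() constructor.
func buildIndex() *Index {
	return &Index{Sites: map[string]string{"example.org": "indexed"}}
}

// startIndexing builds the index in a goroutine and swaps it in when done.
// The returned channel is closed after the swap has happened.
func startIndexing() <-chan struct{} {
	done := make(chan struct{})
	go func() {
		built := buildIndex()
		idxMu.Lock()
		Idx = built
		idxMu.Unlock()
		close(done)
	}()
	return done
}

// currentIndex returns whichever index is being served right now.
func currentIndex() *Index {
	idxMu.RLock()
	defer idxMu.RUnlock()
	return Idx
}

func main() {
	done := startIndexing()
	// A real Serve() would start accepting connections here, answering
	// from the empty index until the build completes.
	<-done
	fmt.Println(len(currentIndex().Sites), "site(s) indexed")
}
```

The read-write mutex is one possible way to make the swap safe for concurrent readers; the original commit does not say how Idx is protected.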
Previously, Indexes were very static structures with a single
constructor, which would index a list of sites in series (a very slow
process) using only one thread.
This commit changes that process entirely, so that indexes are instead
maintained by a number of constantly running processes, Indexers. These
read from a common channel, owned by Serve(), that controls which sites
should be indexed. Once a site is put into the channel, it is picked up
by the first free Indexer, which immediately removes it from the channel
and begins to index it. When the Indexer finishes, it looks for any new
items in the channel, and so on. If the channel is closed with close(c),
the Indexers tied to that channel are terminated.
The number of Indexers to run is passed as an argument to MaintainIndex(),
so the number of sites that can be indexed concurrently is configurable.
Furthermore, target URLs may be added at any time, by any process which
knows about the queue channel. This may in the future be tied to a user
interface.
ref #12
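
The Indexer design described above can be sketched roughly as follows. This is an illustration under assumptions, not the commit's code: the signature of `maintainIndex`, the `Index` type, and `indexSite` are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Index is a hypothetical, simplified index shared by all Indexers.
type Index struct {
	mu    sync.Mutex
	Sites map[string]string
}

// indexSite is a placeholder for the real per-site indexing work.
func indexSite(url string) string { return "indexed " + url }

// maintainIndex starts n Indexers that read target URLs from queue.
// Each Indexer takes the next URL as soon as it is free and exits once
// the queue has been closed and drained. The returned WaitGroup lets
// the caller wait for all Indexers to terminate.
func maintainIndex(idx *Index, queue <-chan string, n int) *sync.WaitGroup {
	wg := new(sync.WaitGroup)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range queue { // ends when queue is closed
				result := indexSite(url)
				idx.mu.Lock()
				idx.Sites[url] = result
				idx.mu.Unlock()
			}
		}()
	}
	return wg
}

func main() {
	idx := &Index{Sites: map[string]string{}}
	queue := make(chan string)
	wg := maintainIndex(idx, queue, 2) // two sites indexed at a time

	// Any goroutine that knows about queue may add targets at any time.
	for _, u := range []string{"http://a.example", "http://b.example", "http://c.example"} {
		queue <- u
	}
	close(queue) // terminates the Indexers tied to this channel
	wg.Wait()
	fmt.Println(len(idx.Sites), "sites indexed")
}
```

Because receiving from a channel removes the item, no two Indexers ever pick up the same queued site, so no extra bookkeeping is needed for work distribution.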