Below is the English FAQ.
Seeks is a p2p pattern matching overlay network on top of existing search engines. It provides collaborative websearch capabilities by automatically regrouping users based on the similarity of their queries, and letting them reorganize and evaluate the search results together. Seeks implements a websearch proxy and a distributed hashtable for this purpose.
Seeks proposes that users share the queries they send to the main websearch engines. By doing so, users who perform similar queries can be automatically connected through a p2p network. Such a grouping of users is called a search group. Within a search group, users and their machines interact to evaluate, organize and monitor the results of their queries. Users will also be able to connect to a search group and publish their own work (e.g. website, comments, tweets) directly to the group.
Seeks implements two main elements:
- a websearch proxy (a little piece of software between the Internet and your browser) that provides a meta-search engine. More precisely, it intercepts your queries to main search engines, captures and reorganizes the results based on consensus among several search engines (for now Google & Bing). It does so quickly, and allows you to benefit from more accurate search results.
- a p2p client (also known as a distributed hashtable) that automatically regroups users who perform similar queries. This regrouping is done in realtime.
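The consensus step performed by the proxy can be pictured with a small sketch. This is not the Seeks ranking code: the function name `consensus_rank` and the reciprocal-rank scoring rule are assumptions chosen for illustration, and the URLs are placeholders.

```python
from collections import defaultdict

def consensus_rank(result_lists):
    """Merge ranked URL lists from several engines into one ranking.

    Assumption (not taken from Seeks): a URL scores higher when more
    engines return it and when it sits nearer the top of each list.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] += 1.0 / position  # reciprocal-rank contribution
    # Highest score first; ties broken by URL for a deterministic order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Placeholder result lists standing in for two search engines.
engine_a = ["http://a.example", "http://b.example", "http://c.example"]
engine_b = ["http://b.example", "http://d.example", "http://a.example"]
merged = consensus_rank([engine_a, engine_b])
```

With these inputs, the URL returned near the top by both engines comes first, and URLs seen by a single engine fall to the end.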
We're not sure. The idea of searching together on top of the most connected networks ever built in human history is simple, realistic, and technically feasible. Our tentative answer is, first, that there may not be much to gain monetarily (see below). Second, ingrained habits may not be ready for collaborative websearch.
We all understand that the architecture provided by the Seeks project can be used in many ways. However, we believe that our original model for free and transparent collaborative websearch would be hampered by a standard business model, such as support through advertising. The challenge is to devise new business models to sustain Seeks. In the current design, the Seeks platform and websearch are to be protected through a non-profit structure. This would allow us to keep the code available, open and transparent in its development. Update: there is now a company servicing the Seeks platform for enterprise search and beyond, www.seeks.pro.
We foresee two main uses. First, Seeks could enable any application that lives on top of groups of automatically regrouped users. Second, the Seeks proxy, which controls the input and output on the HTTP port and can be used locally on a personal machine or on a local network, allows the development of a whole new set of innovative applications. These include personal dashboards for displaying information flows from the Web, the remashing of webpages, the development of personal machine learning assistants, either for crawling the Web or for helping with many information clustering and decision-making tasks, as well as the local clustering and analysis of images and other media.
We believe this goes against their current business model. These companies live off revenue from advertising that they target at websearch users on the basis of their search queries. It is therefore unlikely that they will let users share their queries; otherwise anybody could advertise anything to the users directly, breaking the mainstream advertising-based business model. Seeks aims at powering up websearch, nothing more. We believe technical innovation should not be hampered by business goals.
We have four steps in mind; take a look at the Roadmap.
The point is that your website, or any website by others, does not need to be discovered, crawled and ranked by a third-party search engine. Now you are in charge: decide on a set of queries that fit or describe your website, or the site you want to register. This website will be recommended to users who perform queries similar to those you used to describe the registered content. Eventually, there is a plan to provide algorithms for automatically generating queries from content.
Is there a way to bypass regular server-like search engines? I thought it was theoretically slower?
Decentralized search is several orders of magnitude slower than centralized search. However, Seeks is not a distributed search engine. Seeks is a distributed hashtable of queries, users and contents (mostly URLs). Search can be done by using regular (centralized) search engines while reworking their results collaboratively afterwards, and/or by registering content directly into Seeks' hashtable.
Seeks is not just a piece of software; it is a working and active network of nodes. Without running instances, there is no point to it. Seeks and its network are a community-fueled project. You can use it without contributing anything back, but that, even without considering the moral aspect, has some limitations.
Some of the benefits from having your own node are:
- make the proxy look and behave exactly like you want it to, e.g. use Sonka
- know and control exactly what is going on, what is logged, what is not, cf Seeks Configuration and Security
- hand pick the sources with additional parsers, e.g. your own wikis, cf How to build and test a search engine parser
- craft the ranking, you decide what comes first
- have a way to test your own Seeks plugins, cf How Seeks works and how to extend it
- contribute back to the code base and send bug reports, cf repository and bug tracker
For now your queries are stored by private companies, running websearch services as a business. There is no reason to believe that sharing queries by making them public would be worse. But most importantly, Seeks regroups users who perform similar queries, in real time. Other people seeing your queries have performed queries that are similar to yours. In other words, why hide among your own crowd? Second, sharing leads to collaboration, which leads to an improved, more subtle and precise treatment of information. So you share for a benefit, and you accept the trade-off. Don't share what you want to keep for yourself (for the search engine databases actually). Third, when querying the Web, you are most likely looking for some human generated information. You are not alone out there, and what you are asking, others have asked it, and sometimes solved it before you. This means that most of the time, your query is a well known drop in an ocean of bits. We believe sharing is a reasonable option, backed up by serious rules and technical protection of the information you may not want to divulge.
Queries are hashed before being passed on the network. That is, your query never travels in the clear, but as a bunch of numbers instead. This hides the text of the query in much the same way encryption would. When it is to your benefit that the query reaches your peers in the clear, you will have the choice to send it that way. For this case, we have a plan to provide dedicated encryption. But since you will be making this query (and not just the bunch of numbers) public, encryption should not be required.
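As a rough illustration of what "a bunch of numbers" means here, a query can be turned into a fixed-size digest before it leaves the machine. The sketch below is an assumption, not the Seeks implementation: the choice of SHA-1, the normalization step and the name `hash_query` are all illustrative.

```python
import hashlib

def hash_query(query):
    """Return a 160-bit digest of a query as 40 hexadecimal characters.

    Assumptions (not taken from Seeks): SHA-1 as the hash function and
    lowercasing plus whitespace collapsing as normalization, so that
    trivially different spellings of a query map to the same digest.
    """
    normalized = " ".join(query.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

digest = hash_query("Seeks  Project")
```

A digest cannot be decoded back into the query; a peer can only recognize it by hashing the same query itself.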
Other users cannot identify you except by your IP address. If you wish to hide it, you must use Tor or a similar anonymous routing system. Your Seeks ID on the system is a 160-bit randomly generated key.
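For a sense of scale, a 160-bit random key is just 20 random bytes. The following is a minimal sketch using Python's standard `secrets` module; the function name `new_node_id` is hypothetical and Seeks may generate its keys differently.

```python
import secrets

def new_node_id():
    """Return a random 160-bit identifier as 40 hexadecimal characters."""
    return secrets.token_bytes(20).hex()  # 20 bytes * 8 = 160 bits

node_id = new_node_id()
```

Because the key is random, it carries no information about the user or the machine that generated it.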
No. However, this is a little bit complicated. In collaborative mode, Seeks will generate personalized rankings of websearch results. The ranking uses information from other users, mostly automatically computed scores on URLs visited out of a websearch. So, someone theoretically could associate URLs to IP addresses that might have visited them. However, the sharing of scores will be put under every user's control, so privacy will be preserved whenever desired by the user.
Locality sensitive hashing is a method for regrouping similar elements. The general idea behind the theory is to control the collisions in hashing so that similar contents end up with colliding keys. If you are interested in the theory, please start with the LSH page on Wikipedia.
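The collision idea can be tried out with MinHash, one well-known locality sensitive hashing scheme for sets. This is only a sketch: it is not necessarily the scheme Seeks uses, and the signature length, the seeded SHA-1 hash functions and the function names are assumptions made for illustration.

```python
import hashlib

NUM_HASHES = 64  # assumption: signature length picked for illustration

def _seeded_hash(seed, word):
    """One member of a family of hash functions, selected by `seed`."""
    data = f"{seed}:{word}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

def minhash_signature(query):
    """Return the MinHash signature of the set of words in a query.

    The query must contain at least one word.
    """
    words = set(query.lower().split())
    return [min(_seeded_hash(seed, w) for w in words)
            for seed in range(NUM_HASHES)]

def estimated_similarity(sig_a, sig_b):
    """Fraction of colliding components, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

same = estimated_similarity(minhash_signature("p2p web search"),
                            minhash_signature("search web p2p"))
close = estimated_similarity(minhash_signature("seeks p2p search"),
                             minhash_signature("seeks p2p websearch"))
```

Queries with the same words collide on every component, while queries that share only some words collide on a fraction of them, so colliding components can serve as keys under which similar queries meet.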
Normally, not much. Try it on your laptop, you will see that your CPU is not strained by it. Memory cache should be in the xxMb every now and then for a single user, a few times more for a public node (as an example, ~16Mb on our public node, and ~40Mb on my laptop). The required space on your hard-drive should remain in the few Mb if you are using the proxy SOLO version.
We are writing a DHT (p2p) from scratch, based on Chord. The reason behind this choice is that Chord is a minimal, well-studied DHT setting. The reason we are starting from scratch is that we need full and precise tweaking control over the software. First, the Seeks protocol requires very fast transfer of information, in tiny amounts, among peers. Second, the Seeks DHT defines several communication layers, from low-level stabilization of the p2p overlay network to load balancing and user-defined, plugin-based decentralized exchanges. To achieve this, writing yet another DHT was, we believe, the right decision.
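The core rule of Chord is that each key is stored on its successor, the first node whose identifier is equal to or follows the key on a circular identifier space. Here is a minimal sketch of that rule alone, assuming a global sorted list of node identifiers; a real Chord node only knows a few peers and routes through finger tables. The function name and the small integer identifiers are illustrative.

```python
import bisect

def successor(sorted_node_ids, key):
    """Return the node responsible for `key` on a Chord-style ring.

    `sorted_node_ids` is a non-empty, ascending list of node identifiers.
    The responsible node is the first one with an identifier >= key;
    past the largest identifier, the ring wraps around to the smallest.
    """
    index = bisect.bisect_left(sorted_node_ids, key)
    return sorted_node_ids[index % len(sorted_node_ids)]

# Small integers stand in for 160-bit identifiers.
nodes = [10, 40, 90]
owner_of_25 = successor(nodes, 25)   # falls between 10 and 40
owner_of_95 = successor(nodes, 95)   # past the last node, wraps around
```

Here key 25 lands on node 40 and key 95 wraps around to node 10.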
A proxy was not required by the architecture but is a flexible solution that offers many advantages and almost no drawbacks. Among the advantages are:
- A proxy is transparent and allows one to redirect traffic from several domains to the same node. For example, some of our main nodes on www.seeks-project.info are hosted on other remote machines;
- A proxy allows one to intercept queries to other search engines, with no plugin added to the browser;
- A proxy allows one to capture/intercept user feedback (useful for collaborative filtering) in a passive manner;
- A proxy allows one to contact the DHT and to integrate additional information into the webpages (e.g. could intercept calls to URLs and ask the DHT for information related to these URLs, such as ratings, comments, etc.).
- A proxy helps protect user data and controls interaction with servers on the network.
- A proxy does not rule out other solutions, such as using a web server, including the bundled HTTP server plugin.
There is currently no API to provide direct or remote access to the data generated by a remote instance. For a local instance you can explore the logs.
The udb plugin, or 'user db service', can be activated on your node, which gives you access by hash to a protobuffer.
e.g. "curl http://MySeekNode/find_dbr?urkey=URKEY&pn=query-capture"
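The same request URL can be assembled from a script. The sketch below only builds the URL from the curl example above; the function name `build_find_dbr_url` is hypothetical, `MySeekNode` and `URKEY` are the placeholders from that example, and the protobuffer returned by the node is not decoded here because its schema is not described in this FAQ.

```python
from urllib.parse import urlencode

def build_find_dbr_url(node, urkey, plugin_name="query-capture"):
    """Build the find_dbr URL shown in the curl example.

    `node` is the host name of your Seeks node, `urkey` the hash of the
    record to fetch, and `plugin_name` the value of the `pn` parameter.
    """
    params = urlencode({"urkey": urkey, "pn": plugin_name})
    return f"http://{node}/find_dbr?{params}"

url = build_find_dbr_url("MySeekNode", "URKEY")
```

The resulting string can be passed to curl or to any HTTP client, provided the udb plugin is activated on the node.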
Overall, if you want to develop an extension, generate a visualization or just have some idea in mind, you are invited to come to the IRC channel so that the API can be adjusted to your needs. Of course, the source itself always remains available for inspection by the most courageous.
If you develop a cool extension, please consider letting us know so that we can publish it and so that the whole community can benefit from it.
Technically, we could find no glitch, and we have been ruminating on the whole project for several years now. Both the theoretical and technical sides have been thoroughly analyzed. So technically it is feasible, no doubt. However, we are aware that public habits, demand, and usage may (and very probably will) not meet our vision. We are fine with this: Seeks being an open architecture, we are confident that it will fill a gap in collaborative websearch and communication, with full respect for privacy in a decentralized architecture. How, when and what it will truly look like, that's what the adventure is about, and that's for you to decide!
If you are keen to give us a hand, take a look at the list of tasks that need help.
For a start, you can help either by debugging or by coding up plugins. Take a look at the list of tasks that need help.
Definitely. The Seeks user interface is open to UI and web designers. You can start by setting up your own user interface and then reporting to us, or by picking an open task. Depending on user interest, we should provide a system of skins, at least for the Seeks websearch plugin.
Yes. The easiest way is by simply using the software. Other than that, Seeks draws its force from a philosophy of openness, sharing and collaboration in the information age. So you can help with new high level ideas and criticism that would strengthen our skills and free access to good information.
Security of software is always a concern. In a Free Software community, bugs tend to be detected quickly. However, on a network of interacting machines, tampering with a communication protocol can partly disrupt the communication system. Here, it is enough to fall back on the good mass vs. the bad mass argument: the more of us use the software as it is, the more robust the network, and the more difficult it is to disrupt.
Yes. You can find it here. It is to be signed by contributors. Its purpose is to better protect the Seeks code by regrouping the code under a single banner (authorship). You can, however, contribute without signing the CLA.
This error is commonly triggered by a failed or timed-out connection to a remote peer. Your Seeks instance then receives a partial message and fails to decode it. To mitigate this error, you can increase the default timeout for P2P calls. Edit