reCAPTCHA proxy #103
It will first display that to the user, because each user needs their own session, to prevent more than one user from doing the same captcha at once.
I think a good solution could be having a backup search engine.
I already wrote most of the code for the captcha proxy.
Also, look at LibreY and its fallback system; it's not necessarily clean or well done.
I will likely add support for other engines at some point, but the user should be able to use Google if they want, rate limited or not.
That's why I'm working on the proxy.
@amogusussy I found a better way to proxy the captcha, using sessions and an iframe, sending data to the iframe from the server, like the sitekey and s-data.
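A minimal sketch of what that could look like in Flask, assuming a hypothetical route and template; the sitekey and s-data placeholders stand in for values the server would scrape from Google's rate-limit page:

```python
# Hypothetical sketch: serve the captcha in an iframe page, with the
# sitekey and s-data supplied by the server, one session per user.
# Route name and template are illustrative, not Araa's actual code.
from flask import Flask, session, render_template_string
import secrets

app = Flask(__name__)
app.secret_key = secrets.token_hex(32)

CAPTCHA_PAGE = """
<form action="/captcha/submit" method="POST">
  <div class="g-recaptcha"
       data-sitekey="{{ sitekey }}"
       data-s="{{ s_data }}"></div>
  <input type="submit" value="Submit">
</form>
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
"""

@app.route("/captcha")
def captcha_iframe():
    # Tie this captcha attempt to the user's own session, so two
    # users never share the same challenge.
    session["captcha_id"] = secrets.token_hex(16)
    # Placeholders: the real values would be scraped server-side.
    return render_template_string(CAPTCHA_PAGE,
                                  sitekey="SITEKEY_FROM_SORRY_PAGE",
                                  s_data="S_DATA_FROM_SORRY_PAGE")
```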
Have you tried that with a different device? If you send an iframe to the user's device, whatever site it's loading will just think it's a request from the user, so it won't be rate limited.
Yes, I'm aware of the normal page thing; I've been testing for hours. It'll be fine once it's out/done.
It's going to need an entire local proxy server for this to work. I found https://mitmproxy.org/, but do you know any better HTTP proxies?
reCAPTCHA needs to be completed using the server's IPs, so I need to proxy everything to the end user.
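A minimal sketch of the idea, assuming a plain Flask reverse-proxy route (path prefix and upstream host are illustrative assumptions, not the actual implementation), so that all captcha traffic originates from the server's IP rather than the user's:

```python
# Hypothetical sketch: relay the user's captcha requests upstream from
# the server's own IP and pass the response back unchanged.
from flask import Flask, request, Response
import requests

app = Flask(__name__)
UPSTREAM = "https://www.google.com"

@app.route("/captcha-proxy/<path:path>", methods=["GET", "POST"])
def captcha_proxy(path):
    # Forward the request upstream; Google only ever sees the server.
    upstream = requests.request(
        method=request.method,
        url=f"{UPSTREAM}/{path}",
        params=request.args,
        data=request.get_data(),
        headers={"User-Agent": request.headers.get("User-Agent", "")},
        allow_redirects=False,
    )
    # Relay body and status to the user, dropping hop-by-hop headers.
    excluded = {"content-encoding", "transfer-encoding", "connection"}
    headers = [(k, v) for k, v in upstream.headers.items()
               if k.lower() not in excluded]
    return Response(upstream.content, upstream.status_code, headers)
```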
I've found this list of alternatives for Linux, but I don't really know what makes a proxy better/worse.
I think I should use a paid captcha solver service,
because it's not necessarily practical, or even possible, to proxy it to my users.
I'm going to drop this for now and add support for a different engine as a backup.
Any ideas for what engine I should use for the backup?
Also, I will be implementing the backup engine, so there is a template to build off of.
I want everything to look like it belongs, unlike LibreY and its broken system.
A captcha proxy costs less than the Google Search API.
Qwant has a free API. The only problem is that it doesn't show the Wikipedia results in the API, so you'll have to scrape those yourself. There's also DuckDuckGo, Startpage, Yahoo, and Brave. I think we should standardize the results into one dict/JSON object, like what's been done with the torrent results. If we do that, it'll be 10x easier to add new engines, and maybe even give the user the ability to choose which engines they want to use.
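A sketch of what that standardized shape could look like; the field names and the `normalize_qwant` adapter are illustrative assumptions, not Araa's actual schema:

```python
# Hypothetical sketch: one shared result object that every engine
# adapter maps its raw API response onto.
from dataclasses import dataclass, asdict

@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str
    engine: str  # which backend produced this result

def normalize_qwant(item: dict) -> SearchResult:
    # Map Qwant's raw fields onto the shared shape.
    return SearchResult(
        title=item.get("title", ""),
        url=item.get("url", ""),
        snippet=item.get("desc", ""),
        engine="qwant",
    )

# Rendering code only ever sees the shared shape, so adding a new
# engine means writing one normalize_* function.
raw_qwant_items = [{"title": "Example", "url": "https://example.com",
                    "desc": "An example result."}]
results = [asdict(normalize_qwant(r)) for r in raw_qwant_items]
```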
Question: what do you think of anonymous data submission of search results as an opt-in feature? It would only collect the subdomain and domain for each result for the query, but it wouldn't record the query itself, so, for example, www.youtube.com or github.com, nothing after the /.
I would use that data to index and improve aspects of the search results and make results more visual,
such as favicon indexing, etc. I might collect YouTube channel URLs too, so the path after the / for those, but that's so I can index all channels over 10k subscribers.
The data collection code would be open source and anonymous.
And if the user wants to opt out of the setting (turned on by default in settings), they can.
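A minimal sketch of what that anonymization step could look like, assuming a hypothetical helper name; only the host part (subdomain + domain) of each result URL is ever kept, never the query or path:

```python
# Hypothetical sketch: strip a result URL down to its host before
# anything is submitted, so neither the query nor the path survives.
from urllib.parse import urlsplit

def anonymize_result_url(url: str) -> str:
    # "https://www.youtube.com/watch?v=abc" -> "www.youtube.com"
    return urlsplit(url).netloc

assert anonymize_result_url("https://www.youtube.com/watch?v=abc") == "www.youtube.com"
assert anonymize_result_url("https://github.com/user/repo") == "github.com"
```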
I want to index some stuff that each engine can use, like Qwant, Google, etc., in Araa.
I want to make results look more visual and modern.
I want it to be on par with closed-source meta search engines, and for that to work, some data collection may be required.
It's only an idea and doesn't mean it will happen.
It's something I want to do, but if I do decide to develop it, something might change, resulting in it getting dropped.
Something like that seems too far out from the reach of this project.
It's not necessarily far out of reach. It's common to come across the same websites in the search results for different queries, and many people will search for/request the same websites from time to time, so after the first request it will index the favicon, etc., and pair it with that subdomain and domain.
Medium and other article sites are quite common, for example, or even Stack Overflow for a coding-related query.
Most people only really go to the top 1000 or so sites, and it will naturally index information for those sites over time, along with many other sites.
It may seem far out of reach, but when you really think about it and about user habits, it isn't impossible.
The indexer application would have to be a separate project from this repo, and this repo would only use the data it produces from the data collected by this repository.
Also, due to speed, it will only show that data after it has been indexed by the other application.
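A rough sketch of that first-request indexing idea, with hypothetical names and a plain dict for brevity; a real version would persist the cache instead:

```python
# Hypothetical sketch: the first time a result domain is seen, fetch
# and remember its favicon; later queries for any URL on that domain
# reuse the cached copy.
from urllib.parse import urlsplit
import requests

favicon_cache: dict[str, bytes] = {}

def favicon_for(result_url: str) -> bytes:
    domain = urlsplit(result_url).netloc
    if domain not in favicon_cache:
        # First sighting of this domain: index its favicon now.
        resp = requests.get(f"https://{domain}/favicon.ico", timeout=5)
        favicon_cache[domain] = resp.content if resp.ok else b""
    return favicon_cache[domain]
```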
The indexer might be MIT-licensed or something, I'm not sure, but any data it produces likely won't be subject to the GPL.
Yes, I want to add things like weather, news, and sports; those are also topics I want to index.
I wouldn't index text results, because they're more compact than news or other topics/subjects.
I would only get data to associate with text results.
This is a local autocomplete demo with some data collection. It could improve a ton, and then there would be no need to rely on DuckDuckGo, making it faster with some optimization.
Read how many lines it has as of right now.
I'm more talking about how we'd need to use things like databases for all the favicons. If you want good speeds for it, you'll need to use a dedicated database, like SQLite, rather than using Python dicts. It probably could be done, but it might take a bit of time to do it right. Do the search suggestions deal with misspelled words? If I go to DuckDuckGo and type 'liux', it gives a suggestion of 'linux', because it can guess what I was probably going for. Does this have anything similar yet?
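A small sketch covering both points, assuming SQLite for the favicon store and a plain word list with difflib for misspelling-tolerant suggestions; all names are illustrative, not Araa's actual code:

```python
# Hypothetical sketch: a SQLite-backed favicon cache keyed by domain,
# plus fuzzy suggestion matching so "liux" still surfaces "linux".
import sqlite3
from difflib import get_close_matches

conn = sqlite3.connect("favicons.db")
conn.execute("""CREATE TABLE IF NOT EXISTS favicons (
                  domain     TEXT PRIMARY KEY,
                  icon       BLOB,
                  fetched_at INTEGER)""")

def get_favicon(domain: str) -> bytes | None:
    row = conn.execute("SELECT icon FROM favicons WHERE domain = ?",
                       (domain,)).fetchone()
    return row[0] if row else None

def store_favicon(domain: str, icon: bytes) -> None:
    conn.execute("INSERT OR REPLACE INTO favicons "
                 "VALUES (?, ?, strftime('%s','now'))", (domain, icon))
    conn.commit()

# Suggestions from a local word list instead of a remote API.
WORDS = ["linux", "license", "lighthouse", "literature"]

def suggest(prefix: str, limit: int = 5) -> list[str]:
    exact = [w for w in WORDS if w.startswith(prefix)]
    # Fall back to close matches for misspellings like "liux".
    return exact or get_close_matches(prefix, WORDS, n=limit, cutoff=0.6)

print(suggest("liux"))  # ['linux']
```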
That looks good then. I think it should still keep DuckDuckGo by default, though, unless you make a way for it to actually guess what the user's going to type, besides using a list to look it up.
I'm working on a system that allows users to interact with reCAPTCHA. Whenever Araa gets rate-limited, it will load a web driver to proxy the captcha, allowing users to interact with it. If the user successfully completes the captcha, the web driver will capture the "GOOGLE_ABUSE_EXEMPTION=ID" cookie and send it in the request header to Google using makeHTMLRequest. Neither SearXNG nor other projects do this, so this will be the first.
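A rough sketch of that cookie-capture step, assuming Selenium as the web driver; everything except the GOOGLE_ABUSE_EXEMPTION cookie name and makeHTMLRequest (Araa's existing helper, whose exact signature is assumed) is a hypothetical placeholder:

```python
# Hypothetical sketch: after the user solves the proxied captcha, poll
# the web driver for Google's exemption cookie, then attach it to
# subsequent search requests.
import time
from selenium import webdriver

def wait_for_exemption_cookie(driver, timeout: int = 120) -> str | None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        cookie = driver.get_cookie("GOOGLE_ABUSE_EXEMPTION")
        if cookie:
            return cookie["value"]
        time.sleep(1)
    return None

driver = webdriver.Firefox()
driver.get("https://www.google.com/sorry/index")  # rate-limit page (assumed URL)
value = wait_for_exemption_cookie(driver)
driver.quit()
if value:
    # Pass this along when making search requests, e.g. as a header
    # for makeHTMLRequest.
    headers = {"Cookie": f"GOOGLE_ABUSE_EXEMPTION={value}"}
```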