Bot that automatically downloads image sets from EH/EX/NH and uploads them to Telegraph.
This code is only guaranteed to work correctly on macOS (partial functionality) and Linux.
- Install Docker and docker-compose.
- Create a new folder `ehbot`.
- Copy `config_example.yaml` from the project to `ehbot` and rename it to `config.yaml`, then change the configuration details (see the next section).
- Copy `docker-compose.yml` to `ehbot`.
- Start and Shutdown.
  - Start: Run `docker-compose up -d` in this folder.
  - Shutdown: Run `docker-compose down` in this folder.
  - View logs: Run `docker-compose logs` in this folder.
  - Update the image: Run `docker-compose pull` in this folder.
- Basic Configuration
  1. Bot Token: Find @BotFather in Telegram to apply.
  2. Admin (can be empty): your Telegram ID, which you can get from any relevant bot (you can also get it from this bot with `/id`).
  3. Telegraph: Use your browser to create a Telegraph Token via this link and fill it in. You can also change the author name and URL.
- Proxy Configuration
  - Deploy `worker/web_proxy.js` of this repository to Cloudflare Workers and set the `KEY` environment variable to a random string (the purpose of the `KEY` is to prevent unauthorized requests to the proxy).
  - Fill in the URL and Key in the yaml.
  - The proxy is used to request some services that have frequency limits, so do not abuse it.
- IPv6 configuration
  - You can specify an IPv6 segment. If you do not have a larger (meaning larger than `/64`) IPv6 segment, please leave it blank.
  - Configuring IPv6 somewhat alleviates the rate limit applied to a single IP.
- Configure cookies for some Collectors.
  - Currently, only exhentai is required.
- KV configuration
  - This project uses a built-in caching service to avoid repeated synchronization of an image set.
  - Please refer to cloudflare-kv-proxy for deployment and fill in the yaml file.
  - If you don't want to use remote caching, you can also use pure memory caching (it is lost after a restart). To do so, you need to modify the code and recompile it yourself.
Requires the latest Nightly version of Rust. VSCode or CLion is recommended for development.
RsProxy is recommended as the crates.io source and toolchain installation source for users in mainland China.
A Docker build is triggered by pushing a tag starting with `v`. You can create the tag directly in git and push it; however, it is easier to publish a release on GitHub and give it a `v`-prefixed tag.
Although this project is a simple crawler, there are still some considerations that need to be explained.
GitHub Actions can be used to automatically build Docker images, and this project supports automatic builds for the `x86_64` platform.
It can also build `arm64` versions, but this is not enabled because it uses qemu to emulate the arm environment on x86_64, which is extremely slow (more than 1h for a single build).
Some sites have IP-specific access frequency limits, which can be mitigated by using multiple IPs. The most common approach in practice is proxy pooling, but proxy pools are often extremely unstable and require maintenance and possibly some cost.
Looking at the target sites of this project, many use Cloudflare. Cloudflare supports IPv6, and its rate limiting is applied at `/64` granularity. If we bind a larger IPv6 segment to the local machine and randomly select IPs from it as client exit addresses, we can steadily make more frequent requests.
Since the NIC will only bind a single IPv6 address, we need to enable `net.ipv6.ip_nonlocal_bind`.
After IPv6 is configured, this project sends requests from random IPs in the IPv6 segment to target sites that can use IPv6.
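The address selection described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `addr_in_prefix` and `rate_limit_bucket` are hypothetical, and the random host bits are passed in by the caller (in practice they might come from a random number generator such as the `rand` crate, which is an assumption here).

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper: build an address inside `prefix/prefix_len` by
/// keeping the network bits of `prefix` and taking the host bits from
/// `random_bits`, which the caller is assumed to draw from an RNG.
fn addr_in_prefix(prefix: Ipv6Addr, prefix_len: u8, random_bits: u128) -> Ipv6Addr {
    assert!(prefix_len <= 128, "prefix length must be at most 128");
    // Mask with the top `prefix_len` bits set. The zero case is handled
    // separately because shifting a u128 by 128 bits would overflow.
    let net_mask: u128 = if prefix_len == 0 {
        0
    } else {
        u128::MAX << (128 - u32::from(prefix_len))
    };
    let network = u128::from(prefix) & net_mask;
    let host = random_bits & !net_mask;
    Ipv6Addr::from(network | host)
}

/// Hypothetical helper: the /64 network an address belongs to, which is
/// the granularity the text above gives for rate limiting.
fn rate_limit_bucket(addr: Ipv6Addr) -> u64 {
    (u128::from(addr) >> 64) as u64
}
```

With a `/48` segment there are 16 bits between the prefix and the `/64` boundary, so addresses drawn this way can fall into up to 65536 distinct `/64` buckets. The sketch only computes addresses; actually sending traffic from them still requires the system configuration described in this section.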
Configuration (configuration for the NIC can be written in `if-up` for persistence):
- `sudo ip addr add local 2001:x:x::/48 dev lo`
- `sudo ip route add local 2001:x:x::/48 dev your-interface`
- Configure `net.ipv6.ip_nonlocal_bind=1` in sysctl. This step varies by distribution (for example, the common `/etc/sysctl.conf` does not exist in Arch Linux).
Where to get IPv6? he.net offers a free service for this, and buying an IPv6 segment yourself is not expensive either.
You can test the configuration with `curl --interface 2001:***** ifconfig.co` to see if it is correct.
The target sites mentioned in the previous subsection use Cloudflare, but in fact do not really enable IPv6. When you force an IPv6 request with curl, you will find that they have no AAAA records at all. However, because the CF infrastructure is Anycast, a target site that does not explicitly deny IPv6 visitors in its code can still be accessed through IPv6.
- telegra.ph: No AAAA records. Forcing resolution to Telegram's entry IP makes it accessible, but the certificate presented is for `*.telegram.org`. This project wrote a TLS validator that checks the certificate against a given domain, to tolerate this certificate misconfiguration while maintaining security. However, Telegraph fixed the problem very quickly, so the TLS verifier is currently disabled.
- EH/NH: IPv6 works when forced.
- EX: CF is not used and no IPv6 service is available.
This project uses Cloudflare Workers as a partial API proxy to alleviate the flow limitation problem when IPv6 is not available. See `src/http_proxy.rs` and `worker/web_proxy.js`.
To minimize duplicate pulls, this project uses in-memory caching and remote persistent caching. The remote persistent cache is built with a Cloudflare Worker and Cloudflare KV. For the main code, refer to cloudflare-kv-proxy.
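One common way to layer an in-memory cache over a remote persistent one is sketched below. It is an illustration under assumptions, not this project's implementation: the `RemoteStore` trait, `TwoTierCache`, and `FakeRemote` are hypothetical names, and the remote side is replaced by an in-process stand-in rather than a real Cloudflare KV client.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical interface for the remote persistent store.
trait RemoteStore {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: &str);
}

/// Memory cache in front of a remote store.
struct TwoTierCache<R: RemoteStore> {
    memory: HashMap<String, String>,
    remote: R,
}

impl<R: RemoteStore> TwoTierCache<R> {
    fn new(remote: R) -> Self {
        Self { memory: HashMap::new(), remote }
    }

    /// Look in memory first, then in the remote store. A remote hit is
    /// copied into memory so the next lookup stays local.
    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.memory.get(key) {
            return Some(v.clone());
        }
        let v = self.remote.get(key)?;
        self.memory.insert(key.to_string(), v.clone());
        Some(v)
    }

    /// Write to both tiers so the value survives a restart.
    fn put(&mut self, key: &str, value: &str) {
        self.memory.insert(key.to_string(), value.to_string());
        self.remote.put(key, value);
    }
}

/// In-process stand-in for the remote store; it counts reads so the
/// effect of the memory tier can be observed.
#[derive(Default)]
struct FakeRemote {
    data: HashMap<String, String>,
    reads: Cell<usize>,
}

impl RemoteStore for FakeRemote {
    fn get(&self, key: &str) -> Option<String> {
        self.reads.set(self.reads.get() + 1);
        self.data.get(key).cloned()
    }

    fn put(&mut self, key: &str, value: &str) {
        self.data.insert(key.to_string(), value.to_string());
    }
}
```

In this sketch a key would be something that identifies an image set and the value the resulting page URL; both are placeholders chosen for illustration.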
Since synchronizing an image set takes some time, this project uses singleflight-async so that concurrent requests for the same image set do not repeat the synchronization.
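The idea behind single-flight deduplication can be sketched with the standard library alone. The block below uses blocking threads and `OnceLock`; it does not reproduce the API of the singleflight-async crate, and the `SingleFlight` type and its `work` method are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Deduplicates concurrent work per key: while one caller is computing
/// the value for a key, other callers for the same key wait and then
/// receive a clone of that result instead of repeating the work.
struct SingleFlight<V> {
    inflight: Mutex<HashMap<String, Arc<OnceLock<V>>>>,
}

impl<V: Clone> SingleFlight<V> {
    fn new() -> Self {
        Self { inflight: Mutex::new(HashMap::new()) }
    }

    fn work<F: FnOnce() -> V>(&self, key: &str, f: F) -> V {
        // Get or create the shared cell for this key. The map lock is
        // released before the (possibly slow) work starts.
        let cell = {
            let mut map = self.inflight.lock().unwrap();
            map.entry(key.to_string()).or_default().clone()
        };
        // Only one caller runs `f`; the others block until it finishes.
        let value = cell.get_or_init(f).clone();
        // Forget the key so a later call does the work again. Remove the
        // entry only if it is still the cell this call used.
        let mut map = self.inflight.lock().unwrap();
        if map.get(key).is_some_and(|c| Arc::ptr_eq(c, &cell)) {
            map.remove(key);
        }
        value
    }
}
```

If several threads call `work` with the same key while the first call is still running, the closure runs once and every caller gets the same value. Calls made after the work has completed start a fresh run, which is why a cache is still useful alongside it.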
You are welcome to contribute code to this project (no matter how small the commit is)!