[Bug] BOT detected when headless mode #614

Open
adiwirak opened this issue Jan 25, 2022 · 8 comments
Labels
issue: bug report A bug has been reported needs triage

Comments

@adiwirak

Describe the bug
When I run in headless mode, the website I scrape detects me as a bot.
I know this is caused by the WAF: the request is redirected to /_Incapsula_Resource?...

Any ideas to bypass this problem?

If I force headless=false instead, I hit a different problem: my OS is Linux without a GUI.
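
One way to confirm the block from code is to check whether a URL points at the Incapsula challenge resource. This is only a minimal sketch: `isIncapsulaChallenge` is a hypothetical helper name, and matching on the `/_Incapsula_Resource` path is an assumption based on the redirect described above, not a documented detection method.

```javascript
// Hypothetical helper: returns true when a URL points at the Incapsula
// challenge resource mentioned above. Only the path prefix is checked.
function isIncapsulaChallenge(url) {
  try {
    return new URL(url).pathname.startsWith('/_Incapsula_Resource');
  } catch (err) {
    return false; // not a parseable absolute URL
  }
}
```

With puppeteer this could be called on `page.url()` after a navigation, or on response URLs, assuming the block is visible there.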

Versions

"dependencies": {
  "cheerio": "*",
  "express": "^4.17.1",
  "moment": "^2.29.1",
  "mongodb": "^4.2.0",
  "mysql": "^2.18.1",
  "puppeteer-extra": "^3.2.3",
  "puppeteer-extra-plugin-adblocker": "^2.12.0",
  "puppeteer-extra-plugin-stealth": "^2.9.0",
  "request-promise": "^4.2.6",
  "shelljs": "^0.8.4",
  "socket.io": "^4.4.0",
  "socket.io-client": "^4.4.0",
  "sprintf-js": "^1.1.2",
  "telegraf": "^4.5.2",
  "util": "^0.12.4"
}

@adiwirak adiwirak added issue: bug report A bug has been reported needs triage labels Jan 25, 2022
@sk91

sk91 commented Jan 27, 2022

In my case, I needed the browser to run in headful mode inside a Docker container with VNC.
Maybe my solution will help you find a workaround.
I solved it by using fluxbox (http://fluxbox.org/) as the window manager.

# worker-base image
FROM node:14.18-slim

## install base deps
RUN apt-get update \
  && apt-get install -yq --no-install-recommends \
  gnupg  \
  curl \
  gconf-service \
  libasound2 \
  libatk1.0-0 \
  libc6 \
  libcairo2 \
  libcups2 \
  libdbus-1-3 \
  libexpat1 \
  libfontconfig1 \
  libgcc1 \
  libgconf-2-4 \
  libgdk-pixbuf2.0-0 \
  libglib2.0-0 \
  libgtk-3-0 \
  libnspr4 \
  libpango-1.0-0 \
  libpangocairo-1.0-0 \
  libstdc++6 \
  libx11-6 \
  libx11-xcb1 \
  libxcb1 \
  libxcomposite1 \
  libxcursor1 \
  libxdamage1 \
  libxext6 \
  libxfixes3 \
  libxi6 \
  libxrandr2 \
  libxrender1 \
  libxss1 \
  libxtst6 \
  ca-certificates \
  fonts-liberation \
  libappindicator1 \
  libnss3 \
  lsb-release \
  xdg-utils \
  wget \
  x11vnc \
  x11-xkb-utils \
  xfonts-100dpi \
  xfonts-75dpi \
  xfonts-scalable \
  xfonts-cyrillic \
  x11-apps xvfb \
  fonts-ipafont-gothic \
  fonts-wqy-zenhei \
  fonts-thai-tlwg \
  fonts-kacst \
  ttf-freefont \
  fluxbox \
  procps \
  x11-utils \
  eterm \
  xterm \
  netcat

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install -y --no-install-recommends \
  google-chrome-stable \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /src/*.deb

ENV DISPLAY=:99
ENV X11VNC_PASSWORD=password
ENV XVFB_SCREEN_SIZE=1024x768x24
WORKDIR /usr/src/app
COPY ./scripts/worker-entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

entrypoint.sh

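#!/bin/sh
export DISPLAY=${DISPLAY:-:0} # Use display :0 by default.
export XVFB_SCREEN_SIZE=${XVFB_SCREEN_SIZE:-1024x768x24}
export X11VNC_PASSWORD=${X11VNC_PASSWORD:-password}
# Remove stale X lock files from a previous run (display :99, as set in the Dockerfile).
rm -rf /tmp/.X99-lock
rm -rf /tmp/.X11-unix
sleep 1
# Start Xvfb unless it is already running.
! pgrep -a Xvfb && Xvfb "$DISPLAY" -screen 0 "$XVFB_SCREEN_SIZE" -ac &
sleep 1
# Print display information as a quick check that the X server is up.
xdpyinfo
# Start the VNC server if x11vnc is installed (POSIX-compatible check).
if command -v x11vnc >/dev/null 2>&1; then
  # ! pgrep -a x11vnc && x11vnc -bg -forever -passwd ${X11VNC_PASSWORD} -ncache 10 -ncache_cr -quiet -display WAIT$DISPLAY &
  ! pgrep -a x11vnc && x11vnc -bg --shared -forever -passwd "$X11VNC_PASSWORD" -quiet -display "WAIT$DISPLAY" &
fi
# Start the window manager if fluxbox is installed.
if command -v fluxbox >/dev/null 2>&1; then
  ! pgrep -a fluxbox && fluxbox 2>/dev/null &
fi
echo "IP: $(hostname -I) ($(hostname))"

# Hand control over to the container command.
exec "$@"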
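
The script above relies on fixed `sleep 1` pauses to give Xvfb time to start. On the Node side, a sketch like the following could wait for the display explicitly before launching the browser. `displaySocketPath` and `waitForDisplay` are hypothetical helper names; the sketch assumes the X server creates its Unix socket under `/tmp/.X11-unix/`, which is the usual location on Linux.

```javascript
const fs = require('fs');

// Path of the Unix socket an X server creates for a display such as ":99".
function displaySocketPath(display) {
  const match = /^:(\d+)(?:\.\d+)?$/.exec(display);
  if (!match) throw new Error(`Unsupported DISPLAY value: ${display}`);
  return `/tmp/.X11-unix/X${match[1]}`;
}

// Hypothetical helper: poll until the socket exists or the timeout expires.
// `exists` is injectable so the logic can be exercised without an X server.
async function waitForDisplay(display, options = {}) {
  const { timeoutMs = 5000, intervalMs = 100, exists = fs.existsSync } = options;
  const socket = displaySocketPath(display);
  const deadline = Date.now() + timeoutMs;
  while (!exists(socket)) {
    if (Date.now() >= deadline) return false; // display never came up
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return true;
}
```

A caller might run `await waitForDisplay(process.env.DISPLAY)` and only launch puppeteer when it resolves to `true`.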

@StackedQueries

Using just Xvfb solved most of these problems for me. There is an older xvfb package for Node that can manage the virtual displays. Essentially, start Xvfb via the package (or a homemade script) and point the browser at that display through the puppeteer launch args. IIRC it would be something like --display=:${displayId}. It's worth mentioning that working with sessions, multiple screens, etc. will require some manipulation of the lock files, as in @sk91's entrypoint.sh:

rm -rf /tmp/.X99-lock
rm -rf .X11-unix
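
The lock-file cleanup and the launch argument described above could be sketched in Node as follows. The helper names (`xLockPaths`, `removeStaleLocks`, `displayArg`) are hypothetical, and the sketch assumes the conventional X server file locations under `/tmp`; removing these files is only safe when no X server is still using that display.

```javascript
const fs = require('fs');

// Files an X server typically leaves behind for display number `displayNum`.
function xLockPaths(displayNum) {
  return [`/tmp/.X${displayNum}-lock`, `/tmp/.X11-unix/X${displayNum}`];
}

// Remove stale lock files. `force: true` ignores files that do not exist.
function removeStaleLocks(displayNum) {
  for (const path of xLockPaths(displayNum)) {
    fs.rmSync(path, { force: true });
  }
}

// Chrome launch argument that targets the given display number.
function displayArg(displayNum) {
  return `--display=:${displayNum}`;
}
```

The result of `displayArg(99)` would go into the `args` array passed to `puppeteer.launch`.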

@adiwirak
Author

adiwirak commented Jan 28, 2022

I've tried using xvfb, but I always get an error at xvfb.startSync():

    const Xvfb = require('xvfb');
    const xvfb = new Xvfb({
        silent: true,
        xvfb_args: ['-screen', '0', '1280x720x24', '-ac'],
    });
    xvfb.startSync();
    // Pass the virtual display to the browser launch args.
    this.config.args.push('--display=' + xvfb._display);
    this.VirtLayar = xvfb;

I don't understand whether this is because my PC doesn't support it or because I installed something incorrectly.
Can you explain step by step how to install and use it?

@StackedQueries

Can you provide the error you are getting? silent: false should give you some information regarding xvfb errors as well. This is essentially the logic I would follow.

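const Xvfb = require('xvfb')
const puppeteer = require('puppeteer-extra')

const displayNum = 1

// Start a virtual X display on :1.
const display = new Xvfb({
  displayNum,
  reuse: false,
  silent: true,
  xvfb_args: ['-screen', '0', '1280x720x24', '-ac', '-noreset']
})

display.startSync()

// Launch a headful browser on that display (call from an async function).
await puppeteer.launch({
  headless: false,
  args: [`--display=:${displayNum}`]
})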
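
To make sure the virtual display is stopped again even when the browser code throws, the same logic could be wrapped as in the sketch below. `withVirtualDisplay` is a hypothetical helper, not part of the xvfb package; the constructor is passed in as a parameter, and the sketch assumes it offers `startSync()` and `stopSync()`. `silent: false` is used so that Xvfb errors are printed, as suggested above.

```javascript
// Hypothetical wrapper: start a virtual display, run `task` with the display
// string (for example ":1"), and always stop the display afterwards.
async function withVirtualDisplay(Xvfb, displayNum, task) {
  const xvfb = new Xvfb({
    displayNum,
    reuse: false,
    silent: false, // print Xvfb errors while debugging
    xvfb_args: ['-screen', '0', '1280x720x24', '-ac', '-noreset'],
  });
  xvfb.startSync();
  try {
    return await task(`:${displayNum}`);
  } finally {
    xvfb.stopSync(); // runs on success and on error
  }
}

// Usage sketch (requires the xvfb and puppeteer-extra packages):
//
//   const Xvfb = require('xvfb');
//   const puppeteer = require('puppeteer-extra');
//   withVirtualDisplay(Xvfb, 1, async (display) => {
//     const browser = await puppeteer.launch({
//       headless: false,
//       args: [`--display=${display}`],
//     });
//     await browser.close();
//   });
```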

@adiwirak
Author

Thank you very much. My problem is finally solved with the Xvfb module.

But I'm still curious about the constructor options of the module, as in the following example:

    const xvfb = new Xvfb({
        silent: true, reuse: true,
        xvfb_args: ["-screen", "0", '1280x720x24', "-ac"],
    });
    xvfb.startSync()

If I don't set reuse: true, I get an error. Can you explain why this happens?

This code (module) is in fact executed several times with different parameters, so I suspect the instances clash over the same screen. Is my guess correct?

In your opinion, should each module create its own virtual monitor, or should they all share a single virtual monitor?

I don't really understand how a virtual monitor is created, how many can be created, and so on.
Please explain. I would really appreciate it.

@StackedQueries

StackedQueries commented Jan 31, 2022

Sure :) Whether to set the reuse option really depends on the use case. From the docs: "reuse - whether to reuse an existing Xvfb instance if it already exists on the X display referenced by displayNum". If I understand this correctly, it just means that you can reinitialize the display with different parameters after the fact. I just keep using the same display.

It's also important to understand the difference between displays and screens. A display is what you reference when starting puppeteer (e.g. :0), and screens are contained in displays. Screens shouldn't really matter in your use case, though. Multiple puppeteer instances can share the same display. I would also recommend checking the man page for Xvfb.
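
The display/screen distinction can be made concrete with a small sketch. An X display string has the form `[host]:display[.screen]`, so `:99.0` means screen 0 on display 99. `parseDisplay` and `launchOptionsFor` are hypothetical helpers written for illustration; the second one only shows that several browser instances can be given the same `--display` argument.

```javascript
// Parse an X display string such as ":99" or ":99.0" into its parts.
// The screen number defaults to 0 when it is omitted.
function parseDisplay(value) {
  const match = /^([^:]*):(\d+)(?:\.(\d+))?$/.exec(value);
  if (!match) throw new Error(`Not a display string: ${value}`);
  return {
    host: match[1],
    display: Number(match[2]),
    screen: Number(match[3] || 0),
  };
}

// Build launch options for `count` browser instances sharing one display.
function launchOptionsFor(count, displayString) {
  const { display } = parseDisplay(displayString);
  return Array.from({ length: count }, () => ({
    headless: false,
    args: [`--display=:${display}`],
  }));
}
```

Each element returned by `launchOptionsFor(3, ':1')` could be passed to a separate `puppeteer.launch` call.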

@soshimee

I have the exact same issue... except I'm on Windows. The protection service on the website I'm trying to scrape is "StackPath." It passes after a few seconds without headless mode, but it gets blocked instantly with headless mode.

@Postur

Postur commented Feb 21, 2022

For me, Google detects that I'm headless.
It refuses to log me in because the "browser may not be secure" or whatever.

Does anyone have a fix for this?

I don't want to add a display to my environment; I need headless.
