OCR error #5

Closed
luckycat0426 opened this issue Oct 25, 2021 · 16 comments · Fixed by #15

Comments

luckycat0426 commented Oct 25, 2021

Running cea sign hit a captcha, which triggered OCR, and the run then failed with an error.

root@debian:~/config# cea sign
(node:27809) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time
(Use node --trace-warnings ... to show where the warning was created)
⚠ 警示 登录需要验证码,正在用 OCR 识别 @two
Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
AdaptedTemplates != nullptr:Error:Assert failed:in file /workspace/tesseract/src/classify/adaptmatch.cpp, line 196
undefined
undefined
/usr/lib/node_modules/cea/node_modules/tesseract.js/src/createWorker.js:173
throw Error(data);

Node version:

root@debian:~/config# node -v
v16.12.0

The downloaded OCR model files are present in the /tmp directory:

root@debian:~/config# ls -alh /tmp
total 72K
drwxrwxrwt 13 root root 4.0K Oct 25 10:01 .
drwxr-xr-x 20 root root 4.0K Oct 25 09:01 ..
drwxr-xr-x 2 root root 4.0K Oct 23 18:39 conf
-rw-r--r-- 1 root root 166 Oct 25 10:01 eng.traineddata
-rw-r--r-- 1 root root 166 Oct 25 10:00 eng.traineddata.gz

luckycat0426 commented Oct 25, 2021

I may have found the problem:
https://beetcb.gitee.io/filetransfer/tmp/eng.traineddata.gz

async function downloadTessdata() {
    process.env.TESSDATA_PREFIX = '/tmp';
    if (!fs.existsSync('/tmp')) {
        fs.mkdirSync('/tmp');
    }
    else {
        if (fs.existsSync(tessdataPath)) {
            return;
        }
    }
    download('https://beetcb.gitee.io/filetransfer/tmp/eng.traineddata.gz', tessdataPath);
}

The file it tries to download no longer exists at that URL.
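The 166-byte files in the /tmp listing above are consistent with an error page having been saved in place of the model. Since downloadTessdata returns early whenever tessdataPath already exists, such a file would not be replaced on later runs. Below is a minimal sketch of a sanity check that could run before trusting a cached download, assuming the file is expected to be gzip-compressed (as the .gz URL suggests). The names looksLikeGzip and MIN_MODEL_BYTES are hypothetical, and the size threshold is an illustrative guess rather than a value taken from cea.

```javascript
// Hypothetical helper, not part of cea: decide whether a cached download
// is worth keeping. gzip streams begin with the magic bytes 0x1f 0x8b (RFC 1952).
const MIN_MODEL_BYTES = 1024 // assumption: a real model is far larger than this

function looksLikeGzip(bytes) {
  return (
    bytes.length >= MIN_MODEL_BYTES && bytes[0] === 0x1f && bytes[1] === 0x8b
  )
}

// Usage sketch (left as a comment so the snippet stays free of imports):
//   if (fs.existsSync(tessdataPath) &&
//       !looksLikeGzip(fs.readFileSync(tessdataPath))) {
//     fs.unlinkSync(tessdataPath) // discard the bad file so it is fetched again
//   }
```

A check like this only catches obviously wrong files; it does not verify that the archive decompresses to a usable model.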

beetcb commented Oct 25, 2021

Ah, right! I accidentally deleted that repository on Gitee a while back. I'll restore it shortly.

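luckycat0426 commented

> Ah, right! I accidentally deleted that repository on Gitee a while back. I'll restore it shortly.

I tested with the latest model available online, and the captcha pass rate seems rather low. Was your model specially trained on captchas? Since a single successful login is enough to obtain the cookie, would you consider adding an option to enter the captcha manually?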
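One way to combine OCR with manual entry is to accept an OCR result only when it looks plausible and otherwise ask the user. The sketch below assumes captchas are four ASCII letters or digits, which matches the two samples that appear later in this thread but may not hold for every school. The names cleanOcrText and pickCaptcha are hypothetical, and askUser is a synchronous callback purely for brevity; a real CLI prompt would be asynchronous.

```javascript
// Hypothetical helpers; neither exists in cea.
// Assumption: a captcha is exactly 4 ASCII letters or digits.
const CAPTCHA_PATTERN = /^[A-Za-z0-9]{4}$/

// Remove the whitespace OCR output often contains; return null if the
// remaining text does not look like a captcha.
function cleanOcrText(raw) {
  const text = String(raw ?? '').replace(/\s+/g, '')
  return CAPTCHA_PATTERN.test(text) ? text : null
}

// Prefer a plausible OCR guess; otherwise fall back to the user's answer.
function pickCaptcha(ocrRaw, askUser) {
  return cleanOcrText(ocrRaw) ?? askUser()
}
```

The plausibility filter cannot tell whether a well-formed guess is actually correct, so a retry path after a rejected login would still be needed.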

beetcb commented Oct 25, 2021

  1. The model is the original one and has not been trained on captchas (the repository has now been restored).
  2. That's a great idea, and a PR is welcome. The relevant code is here:
    // check captcha is needed
    const addtionalParams =
      `?username=${user.username}&ltId=${hiddenInputNameValueMap.lt || ''}`
    const needCaptcha =
      hiddenInputNameValueMap.needCaptcha === undefined
        ? (
            await (
              await fetch.get(
                `${school.auth}${schoolEdgeCases.checkCaptchaPath}${addtionalParams}`,
              )
            ).text()
          ).includes('true')
        : hiddenInputNameValueMap.needCaptcha

luckycat0426 commented Oct 25, 2021

luckycat0426@9183da0
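I haven't learned TypeScript, so the code may not be very idiomatic. I've run into a problem testing it and don't know how to run it. I tried the script from package.json: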

"build:debug": "esbuild ./src/*.ts ./src/**/*.ts --outdir=lib/src/ --format=cjs --sourcemap"

After compiling, I copied the output over the system-installed cea code to try a test run:

cp -rf /root/cea/core/lib/* /usr/lib/node_modules/cea/node_modules/cea-core/lib/

and got this error:

root@debian:~/cea# cea sign
file:///usr/lib/node_modules/cea/node_modules/cea-check-in/lib/src/index.js:1
import { CampusphereEndpoint } from 'cea-core';
         ^^^^^^^^^^^^^^^^^^^
SyntaxError: The requested module 'cea-core' does not provide an export named 'CampusphereEndpoint' 

Could you explain the build process? Are there any tutorials you would recommend for a beginner?

beetcb commented Oct 25, 2021

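I really should write a Contribution Guide.
This is a monorepo, so the packages have to be built together before they can depend on one another. The full process should be:

1. git clone && cd cea
2. npm run bootstrap
3. npm run build (build:debug only compiles the TS code for debugging in VS Code)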

beetcb commented Oct 25, 2021

If it's convenient, it would be best to open a separate PR to track the captcha issue: one PR should do one thing (●'◡'●)

luckycat0426 commented

Sorry, I forgot to create a new branch for the captcha work. I'll reset to the Docker commit, submit that as a PR, and then open a separate branch.

luckycat0426 commented Oct 25, 2021

After running

npm run bootstrap
npm run build

I tried the VS Code debug configuration:

"configurations": [
    {
      "type": "pwa-node",
      "request": "launch",
      "name": "Check In Test",
      "skipFiles": ["<node_internals>/**"],
      "preLaunchTask": "npm: build:debug",
      "program": "${workspaceFolder}/internal/src/cli.ts",
      "args": ["sign"],
      "outFiles": ["${workspaceFolder}/**/*.js"]
    }
]

Output:

> Executing task: npm run build:debug <


> build:debug
> lerna run build:debug

lerna notice cli v4.0.0
lerna info Executing command in 3 packages: "npm run build:debug"
lerna info run Ran npm script 'build:debug' in 'cea-core' in 0.5s:

> cea-core@2.0.1 build:debug
> esbuild ./src/*.ts ./src/**/*.ts --outdir=lib/src/ --format=cjs --sourcemap

lerna info run Ran npm script 'build:debug' in 'cea-check-in' in 0.4s:

> cea-check-in@2.0.1 build:debug
> esbuild ./src/*.ts --outdir=lib/src/ --format=cjs --sourcemap

lerna info run Ran npm script 'build:debug' in 'cea' in 0.4s:

> cea@2.0.1 build:debug
> esbuild ./src/*.ts --outdir=lib/src/ --format=cjs --sourcemap

lerna success run Ran npm script 'build:debug' in 3 packages in 1.3s:
lerna success - cea-core
lerna success - cea
lerna success - cea-check-in

终端将被任务重用,按任意键关闭。

and hit this error:

Process exited with code 1
/usr/bin/node ./internal/lib/src/cli.js sign
Uncaught ReferenceError: require is not defined in ES module scope, you can use import instead

The Node version is:

root@debian:~/cea# /usr/bin/node
Welcome to Node.js v16.12.0.
Type ".help" for more information.

beetcb commented Oct 26, 2021

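The --format=cjs argument to esbuild in each package's build:debug script should be removed. The repository recently switched to ESM, and the leftover scripts haven't been updated yet.

For now, just compile with npm run build; it only makes debugging less convenient.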

luckycat0426 commented Oct 30, 2021

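I've finished the manual captcha entry, but login doesn't seem to succeed. Capturing the traffic shows the following.

Request sent by cea (captcha field name: captcha):

username=<student ID>&password=<encrypted password>&lt=LT-717239-AQV9Dz1Khom*****************50271-pizL-cas&dllt=userNamePasswordLogin&execution=e1s1&_eventId=submit&rmShown=1&rememberMe=on&captcha=ppzc

Request sent when I log in through the web page (captcha field name: captchaResponse):

username=<student ID>&password=<encrypted password>&captchaResponse=5Aea&lt=LT-3031222-IcQNzv*****************SJohVA1635560643698-1cSz-cas&dllt=userNamePasswordLogin&execution=e1s2&_eventId=submit&rmShown=1

It looks like the edge cases may also need to account for the name of the captcha form field.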
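The difference between the two requests comes down to the name of the captcha form field. Here is a minimal sketch of making that name configurable per school, using the standard URLSearchParams class. The function buildLoginForm and the captchaField option are hypothetical names for illustration, not cea's actual edge-case schema.

```javascript
// Hypothetical helper, not part of cea: build the login form body with a
// configurable captcha field name. The default 'captcha' mirrors the field
// seen in the cea request above.
function buildLoginForm(fields, captchaText, edgeCases = {}) {
  const form = new URLSearchParams(fields)
  form.set(edgeCases.captchaField ?? 'captcha', captchaText)
  return form.toString()
}

// Default field name.
const defaultBody = buildLoginForm({ username: 'u', execution: 'e1s1' }, 'ppzc')

// A school whose login page expects captchaResponse instead.
const customBody = buildLoginForm(
  { username: 'u', execution: 'e1s2' },
  '5Aea',
  { captchaField: 'captchaResponse' },
)
```

Keeping the field name in per-school data means a new school only needs a configuration entry, not a code change.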

luckycat0426 commented

Changing the field name fixed it and login succeeded. I'll tidy up the code and submit a PR shortly.

beetcb commented Oct 30, 2021

Right, every school implements this differently and uses different fields, so the edge-case file will just have to keep growing 😂

luckycat0426 commented

#15
Later on we could also consider adding a captcha-solving platform for captcha recognition.

beetcb linked a pull request Oct 30, 2021 that will close this issue

beetcb commented Oct 30, 2021

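> #15 Later on we could also consider adding a captcha-solving platform for captcha recognition.

What does that mean? I don't follow. Is it recognition through an API?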

luckycat0426 commented

http://www.fateadm.com/
"Artificial" intelligence (in the literal sense: human labor)
