Cannot download novel with an apostrophe #214

Closed
KiraYamatoSD opened this issue Oct 22, 2019 · 8 comments
Labels
bug Something isn't working

Comments

@KiraYamatoSD

KiraYamatoSD commented Oct 22, 2019

Hi, I have encountered an issue where the downloader stops working with titles containing an apostrophe.

Example 1: https://babelnovel.com/books/the-school-s-omnipotent-useless-garbage
Example 2: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city

Attached is the log.

Z:\SharedFolder\python_apps\lightnovel-crawler>lightnovel-crawler -lll --multi --all --ignore --add-source-url --suppress --format "epub" --source "https://babelnovel.com/books/the-school-s-omnipotent-useless-garbage"
================================================================================
                            Lightnovel Crawler #2.16.0
                  https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------
                          << LOG LEVEL: DEBUG
--------------------------------------------------------------------------------
 ! Input is suppressed
--------------------------------------------------------------------------------
Namespace(add_source_url=True, all=True, bot=None, chapters=None, extra={}, first=None, force=False,
 ignore=True, last=None, list_sources=False, log=3, login=None, multi=True, novel_page='https://babe
lnovel.com/books/the-school-s-omnipotent-useless-garbage', output_formats=['epub'], output_path=None
, page=None, query=None, range=None, single=False, sources=False, suppress=True, volumes=None)
2019-10-22 18:45:24,560 [DEBUG] (urllib3.connectionpool)
Starting new HTTP connection (1): bit.ly:80
2019-10-22 18:45:25,635 [DEBUG] (urllib3.connectionpool)
http://bit.ly:80 "GET /2yYyFGd HTTP/1.1" 301 132
2019-10-22 18:45:25,638 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): pypi.org:443
2019-10-22 18:45:27,851 [DEBUG] (urllib3.connectionpool)
https://pypi.org:443 "GET /pypi/lightnovel-crawler/json HTTP/1.1" 200 14268

-> Press  Ctrl + C  to exit

2019-10-22 18:45:28,716 [WARNING] (DOWNLOADER)
CairoSVG was not loaded properly. SVG to PNG conversion will fail.
2019-10-22 18:45:28,719 [INFO] (APP)
Initialized App
2019-10-22 18:45:28,721 [INFO] (APP)
Detected URL input
2019-10-22 18:45:28,723 [INFO] (APP)
Initializing crawler for: https://babelnovel.com/
Retrieving novel info...
https://babelnovel.com/books/the-school-s-omnipotent-useless-garbage
2019-10-22 18:45:28,744 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): babelnovel.com:443
2019-10-22 18:45:32,364 [DEBUG] (urllib3.connectionpool)
https://babelnovel.com:443 "GET / HTTP/1.1" 200 None
2019-10-22 18:45:39,459 [INFO] (BABELNOVEL)
Getting https://babelnovel.com/content-css?hash=a2d57dcc8e2040f0e577739a3c210406
2019-10-22 18:45:40,987 [DEBUG] (urllib3.connectionpool)
https://babelnovel.com:443 "GET /content-css?hash=a2d57dcc8e2040f0e577739a3c210406 HTTP/1.1" 200 Non
e
2019-10-22 18:45:40,987 [INFO] (BABELNOVEL)
Bad selectors: #PWUHVHPE, .ATOYDHBM, #XXNHGWPR, #HUYWZSND, #ZDFENBFN, .HPULPMNR, .WSMETHAH, #YMMPZIE
T, .SOGOCAKM, .LAHOTOGB, .NEXKCSUV, .GWNKTBBC, .FSSBCGNM, .BDFQLKSJ, .NECPSWPE, #VYNJGYTY, .THBTZSRT
, .SIRVIKXO, #EMVXBWRY, #DIKWQORG, .TCCHZEPN, #SIHGGFEZ, .NVGHFOHA, .GWBNRNBP, #HEOZHUZQ, .KZOSKRUS,
 .BXYKSBAY, .XJOCEALK, .VESBQMHL, .VTDWQDMV, #XINFRKMG, .DOUSNMTR, .IZBNVAMB, .XQLZGAWK, #WNYNDLIM,
.CQOYNJOA, #XDHTNZAY, #KWJWJYBA, .SYVXHZCI, .DPYRQBMM, #UKEEEXUP, .ICFBHAKD, #NLBNUZED, #ELGQPNAX, #
QFTGNLAR, #KHQMBQRS, #ODJGHVCX, .SOMGGJDN, #HGMSPNZM, #FOZBTZON
2019-10-22 18:45:40,987 [INFO] (BABELNOVEL)
Canonical name: the-school-s-omnipotent-useless-garbage
2019-10-22 18:45:40,987 [DEBUG] (BABELNOVEL)
Visiting https://babelnovel.com/api/books/the-school-s-omnipotent-useless-garbage
2019-10-22 18:45:41,766 [DEBUG] (urllib3.connectionpool)
https://babelnovel.com:443 "GET /api/books/the-school-s-omnipotent-useless-garbage HTTP/1.1" 200 Non
e
2019-10-22 18:45:41,766 [INFO] (BABELNOVEL)
Novel ID: 5c233f27-c124-455a-bd0c-538f4c06cbae
2019-10-22 18:45:41,766 [INFO] (BABELNOVEL)
Novel title: The School\u2019s Omnipotent Useless Garbage
2019-10-22 18:45:41,766 [INFO] (BABELNOVEL)
Novel cover: https://img.babelchain.org/book_images/The School\u2019s Omnipotent Useless Garbage.jpg

2019-10-22 18:45:41,766 [DEBUG] (BABELNOVEL)
Visiting https://babelnovel.com/api/books/5c233f27-c124-455a-bd0c-538f4c06cbae/chapters?bookId=5c233
f27-c124-455a-bd0c-538f4c06cbae&page=0&pageSize=100&fields=id,name,canonicalName,hasContent
2019-10-22 18:45:41,782 [DEBUG] (BABELNOVEL)
Visiting https://babelnovel.com/api/books/5c233f27-c124-455a-bd0c-538f4c06cbae/chapters?bookId=5c233
f27-c124-455a-bd0c-538f4c06cbae&page=1&pageSize=100&fields=id,name,canonicalName,hasContent
2019-10-22 18:45:41,782 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (2): babelnovel.com:443
2019-10-22 18:45:42,561 [DEBUG] (urllib3.connectionpool)
https://babelnovel.com:443 "GET /api/books/5c233f27-c124-455a-bd0c-538f4c06cbae/chapters?bookId=5c23
3f27-c124-455a-bd0c-538f4c06cbae&page=0&pageSize=100&fields=id,name,canonicalName,hasContent HTTP/1.
1" 200 None
2019-10-22 18:45:42,561 [DEBUG] (BABELNOVEL)
Visiting https://babelnovel.com/api/books/5c233f27-c124-455a-bd0c-538f4c06cbae/chapters?bookId=5c233
f27-c124-455a-bd0c-538f4c06cbae&page=2&pageSize=100&fields=id,name,canonicalName,hasContent
2019-10-22 18:45:43,200 [DEBUG] (urllib3.connectionpool)
https://babelnovel.com:443 "GET /api/books/5c233f27-c124-455a-bd0c-538f4c06cbae/chapters?bookId=5c23
3f27-c124-455a-bd0c-538f4c06cbae&page=2&pageSize=100&fields=id,name,canonicalName,hasContent HTTP/1.
1" 200 None
2019-10-22 18:45:44,244 [DEBUG] (urllib3.connectionpool)
https://babelnovel.com:443 "GET /api/books/5c233f27-c124-455a-bd0c-538f4c06cbae/chapters?bookId=5c23
3f27-c124-455a-bd0c-538f4c06cbae&page=1&pageSize=100&fields=id,name,canonicalName,hasContent HTTP/1.
1" 200 None
2019-10-22 18:45:44,244 [INFO] (BABELNOVEL)
3 volumes and 262 chapters found
Traceback (most recent call last):
  File "c:\python35\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\python35\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Python35\Scripts\lightnovel-crawler.exe\__main__.py", line 9, in <module>
  File "c:\python35\lib\site-packages\lncrawl\__init__.py", line 13, in main
    start_app()
  File "c:\python35\lib\site-packages\lncrawl\core\__init__.py", line 78, in start_app
    raise err
  File "c:\python35\lib\site-packages\lncrawl\core\__init__.py", line 75, in start_app
    run_bot(bot)
  File "c:\python35\lib\site-packages\lncrawl\bots\__init__.py", line 18, in run_bot
    ConsoleBot().start()
  File "c:\python35\lib\site-packages\lncrawl\bots\console.py", line 62, in start
    self.app.get_novel_info()
  File "c:\python35\lib\site-packages\lncrawl\core\app.py", line 130, in get_novel_info
    print('NOVEL: %s' % self.crawler.novel_title)
  File "c:\python35\lib\site-packages\colorama\ansitowin32.py", line 41, in write
    self.__convertor.write(text)
  File "c:\python35\lib\site-packages\colorama\ansitowin32.py", line 162, in write
    self.write_and_convert(text)
  File "c:\python35\lib\site-packages\colorama\ansitowin32.py", line 190, in write_and_convert
    self.write_plain_text(text, cursor, len(text))
  File "c:\python35\lib\site-packages\colorama\ansitowin32.py", line 195, in write_plain_text
    self.wrapped.write(text[start:end])
  File "c:\python35\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 17: character maps to <undefined>
@KiraYamatoSD KiraYamatoSD added the bug Something isn't working label Oct 22, 2019
@yudilee
Contributor

yudilee commented Oct 23, 2019 via email

@dipu-bd
Owner

dipu-bd commented Oct 23, 2019

I had a similar guess. In the next version I am reviewing all prints and logs to remove any Unicode characters when running on Windows.
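One way to make a print call safe on a console with a limited code page is to replace characters the console cannot encode before printing. The following is only a minimal sketch of that idea, not the change that shipped in the crawler; the helper name `console_safe` is hypothetical, and the `cp437` encoding is taken from the traceback above.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'.

    Hypothetical helper for illustration; it is not part of lncrawl.
    """
    # Fall back to the console's own encoding, then to UTF-8 if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


title = "The School\u2019s Omnipotent Useless Garbage"
# cp437 is the code page reported in the traceback above.
print("NOVEL: %s" % console_safe(title, "cp437"))
```

With `cp437` the right single quotation mark (U+2019) becomes `?`, so the title prints instead of raising `UnicodeEncodeError`. The trade-off is that the displayed title is no longer exact.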

@dipu-bd dipu-bd self-assigned this Oct 26, 2019
@3dycosmo

It works fine on macOS; I upgraded Python and pip3.
Screen Shot 2019-10-28 at 2 10 41 PM

@dipu-bd
Owner

dipu-bd commented Oct 28, 2019

macOS and Linux have UTF-8 support in the terminal. Only the damn command prompt is different.
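The difference can be reproduced on any platform by encoding the title by hand. This is an illustrative sketch only: `can_encode` is a hypothetical helper, and `cp437` is the code page named in the original traceback.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The string that app.py tried to print, per the traceback in this issue.
line = "NOVEL: The School\u2019s Omnipotent Useless Garbage"

for encoding in ("utf-8", "cp437"):
    print(encoding, can_encode(line, encoding))

# Show where the cp437 encoder gives up.
try:
    line.encode("cp437")
except UnicodeEncodeError as err:
    print("fails at position", err.start)
```

UTF-8 can encode the string and `cp437` cannot. The failing index is 17, the same position reported in the log.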

@M4n1us

M4n1us commented Nov 2, 2019

It works in PowerShell; you just have to specify lncrawler.exe once it has been added to PATH:
lncrawler.exe -lll --multi --all --ignore --add-source-url --suppress --format "epub" --source "https://babelnovel.com/books/the-school-s-omnipotent-useless-garbage"


@dipu-bd
Owner

dipu-bd commented Nov 10, 2019

A similar error was reported here: tartley/colorama#219

I have used win_unicode_console.enable() before colorama.init() as suggested by @yudilee

@KiraYamatoSD can you check whether this issue is still present in version 2.16.2?
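For readers who want to see the ordering described in this comment, here is a rough sketch. It assumes the third-party packages `win_unicode_console` and `colorama` and guards both imports so it also runs where they are missing. The wrapper function `init_console` is hypothetical and is not the crawler's actual startup code.

```python
import sys


def init_console():
    """Enable Unicode console output first, then colorama.

    Returns True if win_unicode_console was enabled, False otherwise.
    Hypothetical wrapper for illustration only.
    """
    unicode_enabled = False
    if sys.platform == "win32":
        try:
            import win_unicode_console  # third-party, Windows only
            win_unicode_console.enable()
            unicode_enabled = True
        except ImportError:
            pass  # package not installed; continue without it

    try:
        import colorama  # third-party
        colorama.init()  # called after the Unicode shim, as in the comment above
    except ImportError:
        pass

    return unicode_enabled
```

Note that from Python 3.6 onward the Windows console uses UTF-8 through PEP 528, so a shim like this mainly matters for older interpreters such as the Python 3.5 install seen in the traceback.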

@dipu-bd dipu-bd reopened this Nov 10, 2019
@KiraYamatoSD
Author

@dipu-bd Thanks for the fix.

However, there is a new error after testing the update. Some chapters cannot be crawled even though they are available on the source site.

Z:\SharedFolder\python_apps\lightnovel-crawler>lightnovel-crawler --multi --all --ignore --add-source-url --suppress --format "epub" --source "https://babelnovel.com/books/soldier-king-s-love-story-in-the-city"
================================================================================
                            Lightnovel Crawler #2.16.2
                  https://github.com/dipu-bd/lightnovel-crawler
--------------------------------------------------------------------------------
 ! Input is suppressed
--------------------------------------------------------------------------------
Namespace(add_source_url=True, all=True, bot=None, chapters=None, extra={}, first=None, force=False,
 ignore=True, last=None, list_sources=False, log=None, login=None, multi=True, novel_page='https://b
abelnovel.com/books/soldier-king-s-love-story-in-the-city', output_formats=['epub'], output_path=Non
e, page=None, query=None, range=None, single=False, sources=False, suppress=True, volumes=None)

-> Press  Ctrl + C  to exit

CairoSVG was not loaded properly. SVG to PNG conversion will fail.
Retrieving novel info...
https://babelnovel.com/books/soldier-king-s-love-story-in-the-city
NOVEL: Soldier King's Love Story in the City
Downloading chapters |███████████████                 | 142/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c144
Downloading chapters |███████████████▌                | 144/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c146
Downloading chapters |███████████████▌                | 145/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c145
Downloading chapters |███████████████▌                | 146/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c147
Downloading chapters |████████████████                | 147/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c149
Downloading chapters |████████████████                | 148/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c148
Downloading chapters |████████████████                | 149/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c150
Downloading chapters |████████████████                | 150/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c151
Downloading chapters |████████████████                | 151/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c152
Downloading chapters |████████████████                | 152/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c153
Downloading chapters |████████████████                | 153/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c154
Downloading chapters |████████████████▌               | 154/300Failed to download chapter body
Traceback (most recent call last):
  File "c:\python35\lib\site-packages\lncrawl\core\downloader.py", line 115, in download_chapter_bod
y
    body = app.crawler.download_chapter_body(chapter)
  File "c:\python35\lib\site-packages\lncrawl\spiders\babelnovel.py", line 143, in download_chapter_
body
    data = self.get_json(chapter['json_url'])
  File "c:\python35\lib\site-packages\lncrawl\utils\crawler.py", line 191, in get_json
    return response.json()
  File "c:\python35\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "c:\python35\lib\site-packages\simplejson\__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "c:\python35\lib\site-packages\simplejson\decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Body is empty: https://babelnovel.com/books/soldier-king-s-love-story-in-the-city/chapters/c155
Downloading chapters |████████████████▌               | 155/300

@dipu-bd dipu-bd removed their assignment Nov 14, 2019
@dipu-bd dipu-bd added the testing The issue is under testing label Nov 14, 2019
@dipu-bd
Owner

dipu-bd commented Dec 5, 2019

However, there is a new error after testing the update. Some chapters cannot be crawled even though they are available on the source site.

This is a babelnovel.com-specific issue. We are working on a fix. Closing this since the main issue is solved.
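The `JSONDecodeError: Expecting value: line 1 column 1 (char 0)` in the log above means the response body did not start with a JSON value, which is what happens when it is empty or not JSON at all. As a hedged sketch only (not the fix the project adopted), a crawler could treat such a body as "no content" instead of raising. The helper name `parse_json_or_none` is hypothetical.

```python
import json


def parse_json_or_none(text):
    """Return the decoded JSON value, or None if the body is empty or not valid JSON.

    Hypothetical helper for illustration; it is not part of lncrawl.
    """
    if not text or not text.strip():
        return None
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_json_or_none(""))
print(parse_json_or_none('{"data": {"content": "chapter text"}}'))
```

A caller could then skip or retry a chapter when the result is `None`, rather than logging a full traceback for each one.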

@dipu-bd dipu-bd closed this as completed Dec 5, 2019
@dipu-bd dipu-bd removed the testing The issue is under testing label Jul 26, 2021