Server deployment error #25

Closed
Nesxc opened this issue Mar 18, 2022 · 4 comments

Comments


Nesxc commented Mar 18, 2022

ubuntu 20.04.3 LTS x64
Python 3.8.10
MySQL 5.7.34

Error output:

[parameters: {'title': '2022高考加油💪', 'created': '2022-02-26', 'updated': '2022-02-26', 'link': 'https://www.jipa.work/2022gk/', 'author': 'JIPA233の小窝', 'avatar': 'https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp', 'rule': 'rss20', 'createAt': datetime.datetime(2022, 3, 19, 9, 58, 14, 965015)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h) (Background on this error at: https://sqlalche.me/e/14/7s2a)
2022-03-19 02:06:14 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://hesifan.top/atom.xml> (failed 3 times): User timeout caused connection failure: Getting https://hesifan.top/atom.xml took longer than 15.0 seconds..
2022-03-19 02:06:18 [scrapy.core.scraper] ERROR: Error processing {'author': "Haobo's Blog", 'avatar': 'https://img.cdn.nesxc.com/2022/02/202202052207248webp', 'rule': 'atom10', 'title': '【数学】到底什么是信息论 施工中~', 'created': '2022-02-25', 'updated': '2022-02-25', 'link': 'https://discover304.top/2022/02/25/2022q1/144-information-theory/'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 150, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 73, in process_item
    self.friendpoor_push(item)
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 153, in friendpoor_push
    self.session.commit()
  File "<string>", line 2, in commit
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1431, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 827, in commit
    self._assert_active(prepared_ok=True)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
    raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.DataError) (1366, "Incorrect string value: '\\xF0\\x9F\\x92\\xAA' for column 'title' at row 1")
[SQL: INSERT INTO posts (title, created, updated, link, author, avatar, rule, `createAt`) VALUES (%(title)s, %(created)s, %(updated)s, %(link)s, %(author)s, %(avatar)s, %(rule)s, %(createAt)s)]
[parameters: {'title': '2022高考加油💪', 'created': '2022-02-26', 'updated': '2022-02-26', 'link': 'https://www.jipa.work/2022gk/', 'author': 'JIPA233の小窝', 'avatar': 'https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp', 'rule': 'rss20', 'createAt': datetime.datetime(2022, 3, 19, 9, 58, 14, 965015)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h) (Background on this error at: https://sqlalche.me/e/14/7s2a)
2022-03-19 02:06:18 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 81, in close_spider
    self.friendlist_push()
  File "/home/nserver/circle-of-friends/hexo-circle-of-friends/hexo_circle_of_friends/pipelines/sql_pipe.py", line 140, in friendlist_push
    self.session.commit()
  File "<string>", line 2, in commit
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1431, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 827, in commit
    self._assert_active(prepared_ok=True)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
    raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.DataError) (1366, "Incorrect string value: '\\xF0\\x9F\\x92\\xAA' for column 'title' at row 1")
[SQL: INSERT INTO posts (title, created, updated, link, author, avatar, rule, `createAt`) VALUES (%(title)s, %(created)s, %(updated)s, %(link)s, %(author)s, %(avatar)s, %(rule)s, %(createAt)s)]
[parameters: {'title': '2022高考加油💪', 'created': '2022-02-26', 'updated': '2022-02-26', 'link': 'https://www.jipa.work/2022gk/', 'author': 'JIPA233の小窝', 'avatar': 'https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp', 'rule': 'rss20', 'createAt': datetime.datetime(2022, 3, 19, 9, 58, 14, 965015)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h) (Background on this error at: https://sqlalche.me/e/14/7s2a)

Nesxc commented Mar 18, 2022

settings.py:

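################################ Please modify the following ################################
# Friend-link page address
# Parameters:
# link: required; the address of your friend-link page
# theme: required; the fetch strategy for the friend-link page. Specify the page's theme; the currently supported options are:
#   - common: generic theme, see: https://fcircle-doc.js.cool/#/developmentdoc?id=友链页适配
#   - butterfly: butterfly theme
#   - fluid: fluid theme
#   - matery: matery theme
#   - nexmoe: nexmoe theme
#   - stun: stun theme
#   - sakura: sakura theme
#   - volantis: volantis theme
#   - Yun: Yun theme
#   - stellar: stellar theme
# Multiple friend-link pages with different theme strategies can be configured, each in its own {}; they are crawled together and the data is stored together. ***Configure at least one***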
LINK = [
    {
        "link": "https://www.nesxc.com/link/",  # friend-link page 1; change this to your friend-link page address
        "theme": "common"
    },
    #     {
    #     "link": "https://noionion.top/link/",  # friend-link page 2
    #     "theme": "butterfly",  # fetch strategy for the friend-link page
    # },
    #     {
    #     "link": "https://immmmm.com/about/",  # friend-link page 3
    #     "theme": "common",  # fetch strategy for the friend-link page
    # }

]

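# Friend links from configuration
# enable: whether to enable configured friend links, True/False (for users whose theme is not yet adapted or who need customization)
# json_api: fetch configured friend links through an API; the response format must be {"friends":[[link1],[link2],[link3],[link4]....]}, with each entry in the same format as the list field
# list field format: ["name", "link", "avatar","suffix"], where:
#       name: required; the friend's name
#       link: required; the friend's homepage address
#       avatar: required; the avatar address
#       suffix: optional; a custom feed suffix, mainly for sites with non-standard feed suffixes, see example 2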
SETTINGS_FRIENDS_LINKS = {
    "enable": True,
    "json_api": "",
    "list": [
        ["小N同学", "https://www.nesxc.com", "https://img.cdn.nesxc.com/upload/wordpress/f3ccdd27d200-1.jpg"],
        ["2BROEAR", "https://blog.2broear.com", "https://img.cdn.nesxc.com/2022/01/202201302330071.png"],
        ["Adil", "https://blog.adil.com.cn", "https://img.cdn.nesxc.com/2022/02/202202052139016webp"],
        ["Akilarの糖果屋", "https://akilar.top", "https://img.cdn.nesxc.com/2022/01/202201302317352.png"],
        ["Android", "https://android99.me", "https://img.cdn.nesxc.com/2022/02/202202052128963webp"],
        ["CC的部落格", "https://blog.ccknbc.cc", "https://img.cdn.nesxc.com/2022/01/202201302337383.png"],
        ["Codeanime", "https://codeanime.cc", "https://img.cdn.nesxc.com/2022/02/202202052157089webp"],
        ["Dragon犬’s blog", "https://blog.furrysp.top", "https://img.cdn.nesxc.com/2022/02/202202052148062webp"],
        ["Dreamy.Xiam'say Blog", "https://blog.dreamyxiam.xyz", "https://img.cdn.nesxc.com/2022/02/202202060712146webp"],
        ["Ethan.Tzy", "https://tzy1997.com", "https://img.cdn.nesxc.com/2022/02/202202232028515webp"],
        ["ethanyi", "https://ethanyi9.gitee.io", "https://img.cdn.nesxc.com/2022/02/202202052140634webp"],
        ["Haobo's Blog", "https://discover304.top", "https://img.cdn.nesxc.com/2022/02/202202052207248webp"],
        ["Heo", "https://blog.zhheo.com", "https://img.cdn.nesxc.com/2022/03/1646950385285-20220311061303.webp"],
        ["Heyiki’Bolg", "https://heyiki.top", "https://img.cdn.nesxc.com/2022/02/202202052142686webp"],
        ["iMaeGoo’s Blog", "https://imaegoo.com", "https://img.cdn.nesxc.com/2022/02/202202052052603.png"],
        ["Internet Bug's blog", "https://myhosts.site", "https://img.cdn.nesxc.com/2022/02/202202141030024webp"],
        ["itsNekoDeng", "https://dyfa.top", "https://img.cdn.nesxc.com/2022/01/202201302333212.png"],
        ["Jasonの小窝", "https://blog.catrol.cn", "https://img.cdn.nesxc.com/2022/02/202202052150175webp"],
        ["Lete乐特", "https://blog.lete114.top/", "https://img.cdn.nesxc.com/2022/01/202201302314242.png"],
        ["OY", "https://oy6090.top", "https://img.cdn.nesxc.com/2022/02/202202052135983webp"],
        ["PT的小破站", "https://sqdpt.top", "https://img.cdn.nesxc.com/2022/02/202202052154989webp"],
        ["Qingxu", "https://blog.linioi.com", "https://img.cdn.nesxc.com/2022/02/202202052132797webp"],
        ["Revincx", "https://blog.revincx.icu", "https://img.cdn.nesxc.com/2022/02/202202061028615webp"],
        ["Sady'Blog", "https://sady0.com", "https://img.cdn.nesxc.com/2022/03/1646832464083-20220309212742.webp"],
        ["Seeker", "https://snow.js.org", "https://img.cdn.nesxc.com/2022/02/202202052117350webp"],
        ["starsのblog", "https://blog.cnortles.top", "https://img.cdn.nesxc.com/2022/02/202202231433081webp"],
        ["Throwable", "https://throwx.cn", "https://img.cdn.nesxc.com/2022/02/202202052201827webp"],
        ["WUMOER", "https://wumoer.com", "https://img.cdn.nesxc.com/2022/02/202202052127764webp"],
        ["wxydejoy", "https://c.undf.top", "https://img.cdn.nesxc.com/2022/01/202201302337548.png"],
        ["Xc's Blog", "https://6ing.xyz", "https://img.cdn.nesxc.com/2022/02/202202052203103webp"],
        ["Zane Liu", "https://	lza59.com", "https://img.cdn.nesxc.com/2022/02/202202052208104webp"],
        ["ZHIHUIのBLONG", "https://hinuohui.com", "https://img.cdn.nesxc.com/2022/02/202202120506100webp"],
        ["凡尘纪", "https://hesifan.top", "https://img.cdn.nesxc.com/2022/02/202202052053092.png"],
        ["十玖八柒", "https://ahzoo.cn", "https://img.cdn.nesxc.com/2022/02/202202231434355webp"],
        ["卓越科技的Blog", "https://zykj.js.org", "https://img.cdn.nesxc.com/2022/02/202202052133081webp"],
        ["呆逼の博客", "https://blog.keepdai.cn", "https://img.cdn.nesxc.com/2022/02/202202052155526webp"],
        ["哀殿first", "https://aidianfirst.top", "https://img.cdn.nesxc.com/2022/02/202202052145343webp"],
        ["墨初博客", "https://mochu.co", "https://img.cdn.nesxc.com/2022/02/202202140355129webp"],
        ["天昕", "https://sutianxin.top", "https://img.cdn.nesxc.com/2022/02/202202052136493webp"],
        ["小冰博客", "https://zfe.space", "https://img.cdn.nesxc.com/2022/01/202201302318766.png"],
        ["小嘉的部落格", "https://blog.imzjw.cn", "https://img.cdn.nesxc.com/2022/02/202202052131354webp"],
        ["小孙同学", "https://sunguoqi.com", "https://img.cdn.nesxc.com/2022/02/202202052201529webp"],
        ["小康博客", "https://antmoe.com", "https://img.cdn.nesxc.com/2022/01/202201311858656.png"],
        ["小胖墩er", "https://chubbyduner.top", "https://img.cdn.nesxc.com/2022/02/202202052202482webp"],
        ["小飞博客", "https://xffjs.com", "https://img.cdn.nesxc.com/2022/02/202202251701039webp"],
        ["常青园晚", "https://blog.catrol.cn", "https://img.cdn.nesxc.com/2022/02/202202052150596webp"],
        ["御网尚书", "https://hack-gov.com.cn", "https://img.cdn.nesxc.com/2022/01/202201302324538.png"],
        ["忽然笔记", "https://blog.huran.xyz", "https://img.cdn.nesxc.com/2022/02/202202241406955webp"],
        ["林木木", "https://immmmm.com", "https://img.cdn.nesxc.com/2022/02/202202061022473webp"],
        ["流浪银河", "https://zero-pointer.com", "https://img.cdn.nesxc.com/2022/02/202202052203328webp"],
        ["灰鸿的空间", "https://space.greyh.cn", "https://img.cdn.nesxc.com/2022/02/202202052149194webp"],
        ["皮皮凛の小窝", "https://owomoe.net", "https://img.cdn.nesxc.com/2022/02/202202052125775webp"],
        ["笑笑的博客", "https://xiaoxiao-love.gitee.io", "https://img.cdn.nesxc.com/2022/02/202202052144600webp"],
        ["花猪のBlog", "https://cnhuazhu.top", "https://img.cdn.nesxc.com/2022/02/202202052136912webp"],
        ["葱苓的小窝", "https://www.itciraos.cn", "https://img.cdn.nesxc.com/2022/02/202202231617872webp"],
        ["虫不知喔", "https://blog.ssykawa.com", "https://img.cdn.nesxc.com/2022/03/202203071216447webp"],
        ["超逸の技术博客", "https://yangchaoyi.vip", "https://img.cdn.nesxc.com/2022/01/202201302335743.png"],
        ["陈YF的博客", "https://blog.cyfan.top", "https://img.cdn.nesxc.com/2022/02/202202061014976webp"],
        ["飞鸟", "https://lzxjack.top", "https://img.cdn.nesxc.com/2022/02/202202052142630web"],
        ["FiveFireX的博客", "https://fivefirex.github.io/", "https://img.cdn.nesxc.com/2022/03/1647358146995-20220315232905.webp"],
        ["JIPA233の小窝", "https://www.jipa.work", "https://img.cdn.nesxc.com/2022/03/1647358231690-20220315233030.webp"],
        ["赤蓝紫", "https://clz.vercel.app/", "https://img.cdn.nesxc.com/2022/03/1647358340804-20220315233219.webp"],
        ["LanYunのBlog", "https://lanyundev.vercel.app/", "https://img.cdn.nesxc.com/2022/03/1647358382860-20220315233301.webp"],
    ]
}

# get links from gitee
# fetch friend links from gitee issues
GITEE_FRIENDS_LINKS = {
    "enable": False,  # True enables gitee issue compatibility
    "type": "normal",  # volantis/stellar users should enter volantis here
    "owner": "ccknbc",  # your gitee username
    "repo": "blogroll",  # your gitee repository name
    "state": "open"  # state of the issues to fetch (open/closed)
}

# get links from github
# fetch friend links from github issues
GITHUB_FRIENDS_LINKS = {
    "enable": False,  # True enables github issue compatibility
    "type": "normal",  # volantis/stellar users should enter volantis here
    "owner": "ccknbc",  # your github username
    "repo": "ccknbc-actions",  # your github repository name
    "state": "open"  # state of the issues to fetch (open/closed)
}

# block site list
# add sites to block
BLOCK_SITE = [
    # "https://example.com/",
    # "https://example.com/",
]

# Enable an HTTP proxy: set this to True and add an environment variable named PROXY whose value is [IP]:[port], e.g. 192.168.1.106:8080
HTTP_PROXY = False

# Purge expired posts (days)
OUTDATE_CLEAN = 60

# Storage backend, options: leancloud, mysql, sqlite, mongodb; default is leancloud
DATABASE = "mysql"

# Deployment type, options: github, server, docker; default is github
DEPLOY_TYPE = "server"

################################ Please modify the above ################################


############################## Do not modify anything below unless you understand this project ################################

VERSION = "4.3.1"

# debug
# debug mode
DEBUG = False

# lc
# used in debug mode

#LC_APPID = "MTXYmy79JiLLO9VafgeAn8A-MdYXbMMI"
#LC_APPKEY = "08N7lfcelf7Lkpy7Wp9amsiA"

# proxy
# HTTP_PROXY_URL = "192.168.1.106:10809"
HTTP_PROXY_URL = ""

# debug blog link url
# used in debug mode

# https://yun.yunyoujun.cn/demo/ , Yun
# FRIENDPAGE_LINK = [
#     "https://www.yyyzyyyz.cn/link/",  # butterfly
#     "https://akilar.top/link/",  # butterfly
#     "https://www.zyoushuo.cn/friends/",  # volantis
# ]
#FRIENDPAGE_LINK = ["https://www.yyyzyyyz.cn/link/"]

BOT_NAME = 'hexo_circle_of_friends'
LOG_LEVEL = "ERROR"
SPIDER_MODULES = ['hexo_circle_of_friends.spiders']
NEWSPIDER_MODULE = 'hexo_circle_of_friends.spiders'
USER_AGENT_LIST = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
]
ROBOTSTXT_OBEY = False
CONCURRENT_REQUESTS = 128
DOWNLOAD_TIMEOUT = 15
COOKIES_ENABLED = False
DOWNLOADER_MIDDLEWARES = {
    # 'hexo_circle_of_friends.middlewares.HexoCircleOfFriendsDownloaderMiddleware': 543,
    'hexo_circle_of_friends.middlewares.RandomUserAgentMiddleware': 400,
    'hexo_circle_of_friends.middlewares.BlockSiteMiddleware': 300,
    'hexo_circle_of_friends.middlewares.ProxyMiddleware': 299,

}

ITEM_PIPELINES = {
    'hexo_circle_of_friends.pipelines.pipelines.DuplicatesPipeline': 200,
}

RETRY_ENABLED = True


Nesxc commented Mar 18, 2022

Full crawler.log:
https://pan.nesxc.com/s/GLua
Project files used:
https://pan.nesxc.com/s/e0tX


hiltay commented Mar 19, 2022

This is a database encoding problem. The title contains the emoji '💪', so switch the database to the utf8mb4 character set and run the crawler again.
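
For anyone hitting the same error, here is a minimal sketch of the diagnosis and one possible fix. It is not taken from the project. The bytes in the error message (`\xF0\x9F\x92\xAA`) are the 4-byte UTF-8 encoding of '💪'. MySQL's legacy `utf8` (utf8mb3) charset stores at most 3 bytes per character, which would explain the rejection if the `title` column uses it. The helper names `needs_utf8mb4` and `utf8mb4_statements`, the database name `fcircle`, and the collation `utf8mb4_unicode_ci` are assumptions for illustration; only the `posts` table and the title string come from the log above.

```python
from typing import List


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies outside the BMP, i.e. takes 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


def utf8mb4_statements(database: str, tables: List[str]) -> List[str]:
    """Build MySQL statements that switch a database and its tables to utf8mb4.

    `database` and `tables` are supplied by the caller; the collation
    utf8mb4_unicode_ci is an assumption, not something the project prescribes.
    """
    stmts = [
        f"ALTER DATABASE `{database}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    ]
    for table in tables:
        stmts.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8mb4 "
            "COLLATE utf8mb4_unicode_ci;"
        )
    return stmts


title = "2022高考加油💪"             # title from the failing INSERT in the log
emoji_bytes = "💪".encode("utf-8")   # the byte sequence MySQL rejected

if __name__ == "__main__":
    print(needs_utf8mb4(title), len(emoji_bytes))
    # "fcircle" is a hypothetical database name; "posts" is the table in the log.
    for stmt in utf8mb4_statements("fcircle", ["posts"]):
        print(stmt)
```

The generated statements would be run in a MySQL client against your own database, after a backup. The connection itself may also need `charset=utf8mb4` (for example in the SQLAlchemy/pymysql URL); whether the project already sets that is not shown in this thread.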


hiltay commented Mar 19, 2022

This is now explained in the documentation.

hiltay closed this as completed Mar 22, 2022