
Planning to submit a PR: add Mikan Project (蜜柑计划) as a data source #74

Closed

trim21 opened this issue Jul 6, 2017 · 14 comments

Comments

@trim21
Contributor

trim21 commented Jul 6, 2017

Yesterday, when I added 恋爱禁止的世界, what actually got fetched was 捏造陷阱NTR.
Worst of all, there is still no New Game. The accuracy of bangumi.moe's data seems rather low; tags appear to be added by automatic recognition.

Mikan Project (http://mikanani.me/) has noticeably more accurate data, and it already separates shows from subtitle groups.
I'm planning to submit a PR that adds it as another data source, so data can be fetched from there as well.

I'm dying without New Game to watch.

@trim21 trim21 changed the title from "Considering submitting a PR: add Mikan Project as a data source" to "Planning to submit a PR: add Mikan Project as a data source" Jul 6, 2017
@RicterZ
Owner

RicterZ commented Jul 6, 2017 via email

@trim21
Contributor Author

trim21 commented Jul 6, 2017

I originally meant to file an issue yesterday, but then realized it was actually a problem with the upstream data, so I decided to add a data source myself. This issue is mainly to ask whether you mind, and whether you'd be willing to merge it once it's done. And if you are, whether there's anything about the implementation you'd object to, such as adding new dependencies.

@RicterZ
Owner

RicterZ commented Jul 6, 2017 via email

@w3eee

w3eee commented Jul 8, 2017

Piggybacking with a question: how does a subscription match a show to its corresponding torrent files? Is it just a comparison of names?
And what about torrents whose names don't follow any convention?
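As a rough illustration of the kind of title matching involved, here is a minimal sketch; the patterns and the fallback are hypothetical and are not the project's actual parse_episode implementation:

import re

# Hypothetical sketch: pull an episode number out of a torrent title by trying
# a few common naming patterns. Irregular titles that match nothing fall through
# and are reported as episode 0 (unknown).
EPISODE_PATTERNS = [
    re.compile(r'\[(\d{1,3})\]'),           # e.g. ...[07][GB][720P]
    re.compile(r'第(\d{1,3})[话話集]'),      # e.g. 第12话
    re.compile(r'\bEP?(\d{1,3})\b', re.I),  # e.g. E12 / EP12
]

def parse_episode(title):
    for pattern in EPISODE_PATTERNS:
        match = pattern.search(title)
        if match:
            return int(match.group(1))
    return 0  # unknown episode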

@RicterZ
Owner

RicterZ commented Jul 8, 2017 via email

@trim21 trim21 closed this as completed Aug 19, 2017
@trim21
Contributor Author

trim21 commented Aug 23, 2017

I rewrote fetch.py and abstracted fetching from a data source into three methods:

from bgmi.config import MAX_PAGE  # assumed import; MAX_PAGE is referenced below
from bgmi.website.base import BaseWebsite


class BangumiMoe(BaseWebsite):
    cover_url = ''

    def search_by_keyword(self, keyword, count):
        return []

    def fetch_bangumi_calendar_and_subtitle_group(self):
        return [], []

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        return []

To swap in a different data source, you only need to override these three methods.. (the shared half of the contract is sketched below)
The data source cannot be switched while the program is in use.

The change is fairly large, and it feels like it partly overlaps with what script.py does....
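A minimal sketch of that shared half, assuming hypothetical helper names (update_calendar and save_data are illustrative, not the actual BaseWebsite API):

class BaseWebsite(object):
    # Hypothetical template-method sketch: shared logic lives in the base class
    # and calls the three methods that every concrete data source overrides.
    def update_calendar(self):
        bangumi_list, subtitle_groups = self.fetch_bangumi_calendar_and_subtitle_group()
        self.save_data(bangumi_list, subtitle_groups)

    def save_data(self, bangumi_list, subtitle_groups):
        # persistence is omitted in this sketch
        pass

    def fetch_bangumi_calendar_and_subtitle_group(self):
        raise NotImplementedError  # each data source implements this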

@trim21 trim21 reopened this Aug 23, 2017
@RicterZ
Owner

RicterZ commented Aug 25, 2017

Emm, for bgmi script I'm planning to add a custom model feature; still thinking it through.
The current idea is that your Mikan code could act as an API that a script can call with parameters to get results back; that would be very convenient..

from xx import get_bangumi

class Script(xx):
    ...
    def get_bangumi_data(self, x):
        return get_bangumi(x)

Something along those lines..

@trim21
Contributor Author

trim21 commented Aug 25, 2017

After round after round of changes, fetch.py ended up like this.. which seems pretty close to your idea?
I added a WEBSITE_NAME config option, defaulting to bangumi_moe:

# coding=utf-8
from __future__ import print_function, unicode_literals

from bgmi.config import WEBSITE_NAME

from bgmi.website.bangumimoe import BangumiMoe
from bgmi.website.mikan import Mikanani

if WEBSITE_NAME == 'mikan_project':
    website = Mikanani()
else:
    website = BangumiMoe()
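With this, the rest of the code can stay source-agnostic. A sketch of the intended usage, assuming the rewritten module is importable as bgmi.fetch:

# Hypothetical caller: everything goes through the shared `website` object,
# so the rest of BGMI never needs to know which site is configured.
from bgmi.fetch import website

bangumi_list, subtitle_groups = website.fetch_bangumi_calendar_and_subtitle_group()
results = website.search_by_keyword('来自深渊', count=1)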

@trim21
Contributor Author

trim21 commented Aug 25, 2017

bangumimoe.py
To add a data source now, you only need to import BaseWebsite from bgmi.website.base and then implement three methods.
Filtering, data storage and the like all live in BaseWebsite.
I also added a few lines to main.py to pick a data source on first launch (sketched after the skeleton below)...

from bgmi.config import MAX_PAGE  # assumed import; MAX_PAGE is referenced below
from bgmi.website.base import BaseWebsite

COVER_URL = ''  # placeholder; the real constant is defined in the actual module


class BangumiMoe(BaseWebsite):
    cover_url = COVER_URL

    def search_by_keyword(self, keyword, count):
        """
        return a list of dicts, each with at least 4 keys: download, name, title, episode
        example:
        ```
            [
                {
                    'name':"路人女主的养成方法",
                    'download': 'magnet:?xt=urn:btih:what ever',
                    'title': "[澄空学园] 路人女主的养成方法 第12话 MP4 720p  完",
                    'episode': 12
                },
            ]
        ```
        :param keyword: search keyword
        :type keyword: str
        :param count: how many pages to fetch from the website
        :type count: int
        :return: list of episode search results
        :rtype: list[dict]
        """
        return []

    def fetch_episode_of_bangumi(self, bangumi_id, subtitle_list=None, max_page=MAX_PAGE):
        """
        get all episodes of a bangumi by its id
        example:
        ```
            [
                {
                    "download": "magnet:?xt=urn:btih:e43b3b6b53dd9fd6af1199e112d3c7ff15cab82c",
                    "name": "来自深渊",
                    "subtitle_group": "58a9c1c9f5dc363606ab42ec",
                    "title": "【喵萌奶茶屋】★七月新番★[来自深渊/Made in Abyss][07][GB][720P]",
                    "episode": 0,
                    "time": 1503301292
                },
            ]
        ```
        :param bangumi_id: id of the bangumi
        :param subtitle_list: list of subtitle group ids
        :type subtitle_list: list
        :param max_page: how many pages to crawl if there is no subtitle list
        :type max_page: int
        :return: list of episodes
        :rtype: list[dict]
        """
        return []

    def fetch_bangumi_calendar_and_subtitle_group(self):
        """
        return a list of all bangumi and a list of all subtitle groups

        bangumi dict:
        update time should be one of ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
        example:
        ```
            [
                {
                    "status": 0,
                    "subtitle_group": [
                        "123",
                        "456"
                    ],
                    "name": "名侦探柯南",
                    "keyword": "1234", #bangumi id
                    "update_time": "Sat",
                    "cover": "data/images/cover1.jpg"
                },
            ]
        ```

        subtitle group dict:
        example:
        ```
            [
                {
                    'id': '233',
                    'name': 'bgmi字幕组'
                }
            ]
        ```


        :return: list of bangumi, list of subtitle groups
        :rtype: (list[dict], list[dict])
        """

        return [], []
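A minimal sketch of the first-launch selection mentioned above; the prompt and function name are illustrative, and write_config is assumed to be the project's config-persisting helper:

from bgmi.config import write_config  # assumed location of a config-persisting helper

SUPPORTED_WEBSITES = ['bangumi_moe', 'mikan_project']

def choose_data_source():
    # Hypothetical first-launch prompt added to main.py.
    print('Choose a data source:')
    for index, name in enumerate(SUPPORTED_WEBSITES):
        print('{}) {}'.format(index + 1, name))
    choice = int(input('> ')) - 1
    write_config('DATA_SOURCE', SUPPORTED_WEBSITES[choice])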

@RicterZ
Owner

RicterZ commented Aug 26, 2017 via email

@RicterZ
Owner

RicterZ commented Aug 28, 2017

Could you add the datasource configuration to the README?

@trim21
Contributor Author

trim21 commented Aug 28, 2017

I've already added it to the README...

Additional config

DATA_SOURCE: data source, currently supports bangumi_moe (default) and mikan_project
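Assuming the config field was renamed from WEBSITE_NAME to DATA_SOURCE, the dispatch in fetch.py from the earlier comment would then read:

# coding=utf-8
from __future__ import print_function, unicode_literals

from bgmi.config import DATA_SOURCE

from bgmi.website.bangumimoe import BangumiMoe
from bgmi.website.mikan import Mikanani

if DATA_SOURCE == 'mikan_project':
    website = Mikanani()
else:
    website = BangumiMoe()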

@trim21
Contributor Author

trim21 commented Aug 28, 2017

Just found a bug in parse_episode.. fixing it now..

@RicterZ
Owner

RicterZ commented Aug 28, 2017 via email

@trim21 trim21 closed this as completed Aug 28, 2017
RicterZ pushed a commit that referenced this issue Jan 29, 2018