[Cardigann] Add infohash feature for download block #12258

Merged
merged 3 commits into from
Sep 6, 2021


mikeoscar2006
Collaborator

This PR intends to provide a way to fix #11585 and #11389.
It adds an infohash block to the download block. The infohash block uses two selectors, one to get the hash and one to get the title of the torrent, both of which are needed to generate the magnet URL.
As mentioned in the issues, I was unable to test directly on the aforementioned sites, so I modified the torrentv.yml definition to demonstrate how the infohash block works and published it as a gist.

The infohash block looks like this (an excerpt taken directly from the modified definition):

download:
  infohash:
    hash:
      # Since the magnet link was already present, only the infohash has been extracted for demonstration
      selector: a[href^="magnet:?xt="]
      attribute: href
      filters:
        - name: querystring
          args: xt
        - name: replace
          args: ["urn:btih:", ""]
    title:
      selector: meta[property="og:title"]
      attribute: content
      filters:
        - name: trim
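
As a rough illustration of what the hash filter chain above does, here is a minimal Python sketch. It is not Jackett's implementation (Jackett is written in C#); the function names `querystring_filter` and `replace_filter` and the sample magnet link are made up for this example.

```python
from urllib.parse import parse_qs


def querystring_filter(url: str, param: str) -> str:
    """Mimic the `querystring` filter: return the first value of `param`.

    Hypothetical helper for illustration only.
    """
    # Everything after the first "?" is treated as the query string.
    _, _, query = url.partition("?")
    values = parse_qs(query).get(param, [])
    return values[0] if values else ""


def replace_filter(value: str, old: str, new: str) -> str:
    """Mimic the `replace` filter: plain substring replacement."""
    return value.replace(old, new)


# Sample href as the selector a[href^="magnet:?xt="] might return it (made up).
href = "magnet:?xt=urn:btih:0123456789ABCDEF0123456789ABCDEF01234567&dn=Example"

xt = querystring_filter(href, "xt")               # "urn:btih:0123..."
info_hash = replace_filter(xt, "urn:btih:", "")   # bare 40-character hash
```

The two steps mirror the order of the filters in the YAML: first isolate the `xt` parameter, then strip the `urn:btih:` prefix.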

My only concern is with the code, where a lot of duplication has arisen, and I understand if it doesn't meet the quality bar of the project. However, if it shows good results with other trackers, we can definitely improve it and add documentation.

@garfield69
Contributor

garfield69 commented Sep 4, 2021

question: is the infohash routine in the download block aware of the before block?
for kinozal I first need to fetch an alternate page (see #11585 (comment)) using get_srv_details.php
can I do that and have infohash process that page to fetch the hash and title?
WIP for kinozal:

---
id: kinozal
name: Kinozal
description: "Kinozal is a RUSSIAN Semi-Private Torrent Tracker for MOVIES / TV / MUSIC"
language: ru-ru
type: semi-private
encoding: windows-1251
links:
  - http://kinozal.tv/ # site forces http, https is not supported

caps:
  categorymappings:
    # TV
    - {id: 1001, cat: TV, desc: "All TV Shows"}
    - {id: 45, cat: TV, desc: "Russian TV Series"}
    - {id: 46, cat: TV, desc: "TV Series"}
    # Movies
    - {id: 1002, cat: Movies, desc: "All Movies"}
    - {id: 8, cat: Movies, desc: "Movies - Comedy"}
    - {id: 6, cat: Movies, desc: "Movies - Action / War"}
    - {id: 15, cat: Movies, desc: "Movies - Thriller / Detective"}
    - {id: 17, cat: Movies, desc: "Movies - Drama"}
    - {id: 35, cat: Movies, desc: "Movies - Melodrama"}
    - {id: 39, cat: Movies, desc: "Movies - Indian"}
    - {id: 13, cat: Movies, desc: "Movies - Science Fiction"}
    - {id: 14, cat: Movies, desc: "Movies - Fantasy"}
    - {id: 24, cat: Movies, desc: "Movies - Horror / Mystery"}
    - {id: 11, cat: Movies, desc: "Movies - Adventure"}
    - {id: 10, cat: Movies, desc: "Movies - Russian Movies"}
    - {id: 9, cat: Movies, desc: "Movies - Historical"}
    - {id: 47, cat: Movies, desc: "Movies - Asian"}
    - {id: 18, cat: Movies, desc: "Movies - Documentaries"}
    - {id: 37, cat: Movies, desc: "Movies - Sport"}
    - {id: 12, cat: Movies, desc: "Movies - Kids / Family"}
    - {id: 7, cat: Movies, desc: "Movies - Classic"}
    - {id: 48, cat: Movies, desc: "Movies - Concerts"}
    - {id: 49, cat: Movies, desc: "Movies - Shows / TV Shows"}
    - {id: 50, cat: Movies, desc: "Movies - TV Show Mir"}
    - {id: 38, cat: Movies, desc: "Movies - Theatre, Opera, Ballet"}
    - {id: 16, cat: Movies, desc: "Movies - Erotica"}
    # Cartoons
    - {id: 1003, cat: TV/Anime, desc: "All Cartoons/Anime"}
    - {id: 21, cat: TV/Anime, desc: "Cartoons"}
    - {id: 22, cat: TV/Anime, desc: "Cartoons - Russian"}
    - {id: 20, cat: TV/Anime, desc: "Cartoons - Anime"}
    # Music
    - {id: 1004, cat: Audio, desc: "All Music"}
    - {id: 3, cat: Audio, desc: "Music"}
    - {id: 4, cat: Audio, desc: "Music - Russian"}
    - {id: 5, cat: Audio, desc: "Music - Collections"}
    - {id: 42, cat: Audio, desc: "Music - Classical"}
    # Other
    - {id: 1006, cat: TV/Other, desc: "Shows, Concerts, Sports"}
    - {id: 2, cat: Audio/Audiobook, desc: "Other - AudioBooks"}
    - {id: 1, cat: Audio/Video, desc: "Other - Music Video's"}
    - {id: 23, cat: Console, desc: "Other - Games"}
    - {id: 32, cat: PC, desc: "Other - Programs"}
    - {id: 40, cat: Other, desc: "Other - Design / Graphics"}
    - {id: 41, cat: Books, desc: "Other - Library"}

  modes:
    search: [q]
    tv-search: [q, season, ep]
    movie-search: [q]
    music-search: [q]
    book-search: [q]

settings:
  - name: username
    type: text
    label: Username
  - name: password
    type: password
    label: Password
  - name: freeleech
    type: checkbox
    label: Search freeleech only
    default: false
  - name: striprussian
    type: checkbox
    label: Strip Russian Letters
    default: true
  - name: sort
    type: select
    label: Sort requested from site
    default: 0
    options:
      0: created
      1: seeders
      3: size
  - name: type
    type: select
    label: Order requested from site
    default: 0
    options:
      0: desc
      1: asc

login:
  path: takelogin.php
  method: post
  inputs:
    username: "{{ .Config.username }}"
    password: "{{ .Config.password }}"
  error:
    - selector: div.bx1:has(div.red)
      message:
        selector: div.bx1 div.red
  test:
    path: userdetails.php

download:
  before:
    path: get_srv_details.php
    inputs:
      action: 2
      id: "{{ .DownloadUri.Query.id }}"

  infohash:
    hash:
      selector: li:first-child
      filters:
        - name: regexp
          args: ([A-F|0-9]{40})
        - name: strdump
          args: hash
    title:
      selector: div.b
      filters:
        - name: trim
        - name: strdump
          args: title

search:
  paths:
    # http://kinozal.tv/browse.php?s=lucifer+2017&g=0&c=0&v=0&d=0&w=0&t=0&f=0
    - path: browse.php
  keywordsfilters:
#    - name: diacritics # 8686
#      args: replace
    - name: re_replace # S01 to 1
      args: ["(?i)\\bS0*(\\d+)\\b", "$1"]
    - name: re_replace # S01E01 to 1 1
      args: ["(?i)\\bS0*(\\d+)E0*(\\d+)\\b", "$1 $2"]
  inputs:
    # multi cat is not supported. so defaulting to ALL
    c: 0
    s: "{{ .Keywords }}"
    # where 0 title, 1 person, 2 genres, 3 regular expression
    g: 0
    # format 0 all
    v: 0
    # released 0 all
    d: 0
    # filter 0 all, 1 today, 2 yesterday, 3 in 3 days, 4 this week, 5 per month, 6-10 size ranges, 11 gold, 12 silver
    w: "{{ if .Config.freeleech }}11{{ else }}0{{ end }}"
    t: "{{ .Config.sort }}"
    f: "{{ .Config.type }}"

  rows:
    selector: table > tbody > tr:has(td.bt)

  fields:
    category:
      selector: td.bt img
      attribute: onclick
      filters:
        - name: re_replace
          args: ["[^\\d+]", ""]
    title:
      selector: td.nam a[href^="/details.php?id="]
      filters:
        # normalize to SXXEYY format
        - name: replace
          args: [" / ", " "]
        - name: replace
          args: ["Кураж-Бамбей", "kurazh"]
        - name: replace
          args: ["Кубик в Кубе", "Kubik"]
        - name: replace
          args: ["Кравец", "Kravec"]
        - name: re_replace
          args: ["\\((\\d+)\\s+[Сс]езон:\\s+(?:(\\d+-*\\d*)\\s+[Сс]ери[ия]\\s+.*\\d+)\\)(.*)\\s([12][0-9]{3})\\s(.*)", "$3 - S$1E$2 - rus $5"]
        - name: re_replace
          args: ["(\\([А-Яа-яЁё\\W]+\\))|(^[А-Яа-яЁё\\W\\d]+\\/ )|([а-яА-ЯЁё \\-]+,+)|([а-яА-ЯЁё]+)", "{{ if .Config.striprussian }}{{ else }}$1$2$3$4{{ end }}"]
        - name: re_replace
          args: ["\\((\\d+p)\\)", "$1"]
        - name: replace
          args: ["-Rip", "Rip"]
        - name: replace
          args: ["WEB-DL", "WEBDL"]
        - name: replace
          args: ["WEBDLRip", "WEBDL"]
        - name: replace
          args: ["HDTVRip", "HDTV"]
    details:
      selector: td.nam a[href^="/details.php?id="]
      attribute: href
    download:
      selector: td.nam a[href^="/details.php?id="]
      attribute: href
    size:
      selector: td:nth-child(4)
      filters:
        - name: replace
          args: ["ТБ", "TB"]
        - name: replace
          args: ["ГБ", "GB"]
        - name: replace
          args: ["МБ", "MB"]
        - name: replace
          args: ["КБ", "KB"]
    seeders:
      selector: td:nth-child(5)
    leechers:
      selector: td:nth-child(6)
    # dates come in four flavours:
    date:
      # now
      # Today 09:10
      # Yesterday 13:04
      selector: td:nth-child(7):not(:contains("."))
      optional: true
      filters:
        - name: replace
          args: [" в", ""]
        - name: replace
          args: ["сейчас", "now"]
        - name: replace
          args: ["сегодня", "Today"]
        - name: replace
          args: ["вчера", "Yesterday"]
    date:
      # 24.10.2017 at 23:44
      selector: td:nth-child(7):contains(".")
      optional: true
      filters:
        - name: replace
          args: [" в", ""]
        - name: append
          args: " +00:00" # auto adjusted by site account profile
        - name: dateparse
          args: "02.01.2006 15:04 -07:00"
    downloadvolumefactor:
      case:
        a.r1: 0 # gold
        a.r2: 0.5 # silver
        "*": 1
    uploadvolumefactor:
      text: 1
    minimumratio:
      text: 1.0
# engine n/a
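
For readers unfamiliar with the two `re_replace` keyword filters in the search block above, here is a small Python sketch of their effect. It only assumes that the filters are applied in the order listed; the regular expressions are copied from the definition, with the `$1` replacement syntax translated to Python's `\1`. The function name `apply_keyword_filters` is made up for this example.

```python
import re

# (pattern, replacement) pairs in the same order as the keywordsfilters list.
KEYWORD_FILTERS = [
    (r"(?i)\bS0*(\d+)\b", r"\1"),                # S01 to 1
    (r"(?i)\bS0*(\d+)E0*(\d+)\b", r"\1 \2"),     # S01E01 to 1 1
]


def apply_keyword_filters(keywords: str) -> str:
    """Apply each regex replacement in turn (illustrative helper)."""
    for pattern, replacement in KEYWORD_FILTERS:
        keywords = re.sub(pattern, replacement, keywords)
    return keywords


season_only = apply_keyword_filters("lucifer S01")
season_and_episode = apply_keyword_filters("lucifer S01E05")
```

The first pattern leaves `S01E05` alone because there is no word boundary between the season digits and the `E`, so the second pattern still sees the full token.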

I can send you creds for my kinozal a/c if you want to do your own debugging

@mikeoscar2006
Collaborator Author

Yes. The infohash block doesn't affect any block other than selectors. The only change in behavior is that if an infohash block is present, the selectors block is not processed, even if one is defined.
If you don't mind giving an a/c, then yes please mikeoscar2006 -at- gmail.com

@garfield69
Contributor

just spotted an error in the regexp for the hash
should be

           args: ([A-F|0-9]{40})
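
A side note on this pattern, sketched in Python: inside a character class the `|` is a literal pipe rather than an alternation, so `[A-F|0-9]` accepts the same characters as `[A-F0-9]` plus `|`. That is harmless for real hashes, which never contain a pipe. The sample text below is made up, not taken from the kinozal page.

```python
import re

# Pattern as written in the definition, and a variant without the pipe.
AS_WRITTEN = re.compile(r"([A-F|0-9]{40})")
HEX_ONLY = re.compile(r"([A-F0-9]{40})")


def extract_hash(text: str, pattern: re.Pattern = HEX_ONLY):
    """Return the first 40-character uppercase hex run, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical text of the selected element.
sample = "Info hash: 0123456789ABCDEF0123456789ABCDEF01234567"

found = extract_hash(sample)
found_as_written = extract_hash(sample, AS_WRITTEN)

# The only difference: the pattern as written also accepts literal pipes.
pipes = "|" * 40
```

Both patterns return the same hash for the sample; they differ only on input such as `pipes`, which the hex-only variant rejects.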

@garfield69
Contributor

shit, I'm too tired, I may have been testing kinorun by mistake. I need to go to bed.
please test and see if kinozal works for you. thanks ;-)

@mikeoscar2006
Collaborator Author

Don't worry, you got it. Hopefully, will issue a new pull request.

@mikeoscar2006
Collaborator Author

mikeoscar2006 commented Sep 5, 2021

OK, so two main things changed.

  1. I noticed that you were asking not only for the before block to be processed but also for the hash and title to be extracted from that page. So I introduced a before field in the infohash block. If it is set to true, the response from the before block is used. If it is omitted or set to false, the before block is still processed, but the infohash and title are extracted from the details page.
  2. With the above change, setting before: true in your WIP download block should have worked, but it didn't until I noticed the problem. In the before block you set the path and the inputs, so your request should be something like get_srv_details.php?action=2&id=1862863. However, there was a serious mistake in the code that handles the before block, which turned it into get_srv_details.php?ction=2&id=1862863; that is, the first letter of the first input was removed. I don't know how the before block was working with inputs earlier, but it was definitely a flaw.
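
To make the second point concrete, here is a small Python sketch of this class of off-by-one bug. It is a hypothetical reconstruction, not the actual Jackett code (which is C#): `build_before_url_buggy` simply assumes the query string starts with a separator that is not there and drops one character too many.

```python
from urllib.parse import urlencode


def build_before_url(path: str, inputs: dict) -> str:
    """Join the before path and its inputs into a GET request URL."""
    return f"{path}?{urlencode(inputs)}"


def build_before_url_buggy(path: str, inputs: dict) -> str:
    """Hypothetical bug: strip a leading separator that does not exist."""
    query = urlencode(inputs)
    return f"{path}?{query[1:]}"  # drops the first letter of the first input


inputs = {"action": 2, "id": 1862863}

correct = build_before_url("get_srv_details.php", inputs)
broken = build_before_url_buggy("get_srv_details.php", inputs)

# Prefixing the first input with a throwaway character masks the bug.
masked = build_before_url_buggy(
    "get_srv_details.php", {"_action": 2, "id": 1862863}
)
```

The `masked` case shows why a definition whose first input carries an extra leading character would appear to work even with the bug in place.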

I've pushed a commit that fixes these two problems. With it, I have tested the kinozal indexer and it works great; I'm opening a separate pull request for it.

EDIT: If possible, please test some indexers that use the download before block to check that this doesn't break anything. If it does, we would probably want to fix that before merging this.

@garfield69
Contributor

garfield69 commented Sep 5, 2021

there was a serious mistake in the code for handling the before block

awesome find!
I've been dealing with this for years, adding an _ in front of the first var in my yaml to compensate ;-D
I'll be glad to drop that hack and test the cleaned-up before code wherever I've got that patch.
[edit] siambit is an example.

@mikeoscar2006
Collaborator Author

Sure, go ahead with that and let me know if I can help in any way or if more changes are required.

@mikeoscar2006
Collaborator Author

I just noticed it in siambit. If you had used the same _action workaround in kinozal, I wouldn't have found the bug. It took me an hour of placing debugging lines everywhere lol.

@garfield69
Contributor

I was confused for a bit, as during the testing of siambit indexer with your code, it failed to download when I tried the yaml with action and again when I reset it back to _action, so I was starting to think your code was broken.
But then I restored to the latest Jackett and tested again, and the siambit indexer still failed, so now I know that there has been a change to the siambit website and the indexer needs updating. I guess no one is using siambit much on Jackett because I've not seen a ticket by anyone for this.
I hate it when stuff like this happens, it throws off all the careful testing one sets up :-(
Anyway, now I've got to figure out what's changed on siambit before continuing to peer review this PR.

@garfield69
Contributor

LGTM, tested kinozal, siambit, and EbookParadijs plus assorted public indexers
Thank you.

@garfield69 garfield69 merged commit 2ea2b0b into Jackett:master Sep 6, 2021
@garfield69
Contributor

garfield69 commented Sep 6, 2021

this is the doc I added to the wiki https://github.com/Jackett/Jackett/wiki/Definition-format#download

example of a download block using the infohash method

download:
  # [OPTIONAL] HTTP request which needs to be done before downloading the file
  before:
    path: get_srv_details.php
    inputs:
      action: 2
      id: "{{ .DownloadUri.Query.id }}"
  # [OPTIONAL] If you only have a magnet hash to work with, this method will allow you to automatically generate a magnet URI
  infohash:
    # [OPTIONAL] if you want the infohash and title to come from the page generated by the previous BEFORE block then include this clause.
    #            The default is false, which causes the infohash and title to come from the page you provided the link for in the search download block.
    before: true
    # [REQUIRED] Use this selector to provide the file hash for the &xt parameter of the magnet URI
    hash:
      # [REQUIRED] the selector to use to find the file hash
      selector: a[href^="magnet:?xt="]
      attribute: href
      # [OPTIONAL] a list of filters which should be applied to the result of this selector
      filters:
        - name: querystring
          args: xt
        - name: replace
          args: ["urn:btih:", ""]
    # [REQUIRED] Use this selector to provide the title for the &dn parameter of the magnet URI 
    title:
      # [REQUIRED] The selector used to find the title
      selector: meta[property="og:title"]
      attribute: content
      # [OPTIONAL] a list of filters which should be applied to the result of this selector
      filters:
        - name: trim
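
The following Python sketch illustrates how the pieces documented above could fit together: picking the page to read from based on the `before` flag, then building a magnet URI from the hash (`xt`) and title (`dn`). It is a simplified assumption, not Jackett's implementation; the real code may encode the title differently or append tracker parameters. The names `pick_source_page` and `build_magnet` are made up for this example.

```python
from urllib.parse import quote


def pick_source_page(infohash_block: dict, details_page: str, before_page: str) -> str:
    """Choose the page the hash and title selectors run against.

    `before` defaults to false, in which case the details page is used.
    """
    return before_page if infohash_block.get("before", False) else details_page


def build_magnet(info_hash: str, title: str) -> str:
    """Build a bare magnet URI with xt (hash) and dn (display name)."""
    return f"magnet:?xt=urn:btih:{info_hash}&dn={quote(title)}"


magnet = build_magnet("0123456789ABCDEF0123456789ABCDEF01234567", "Example Title")
with_before = pick_source_page({"before": True}, "details html", "before html")
without_before = pick_source_page({}, "details html", "before html")
```

Here `quote` percent-encodes the title so that spaces and other reserved characters survive inside the `dn` parameter.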

@mikeoscar2006 mikeoscar2006 deleted the cardigann_infohash branch September 6, 2021 08:01
@mikeoscar2006
Collaborator Author

Happy to help :)
And the doc looks amazing!

@mikeoscar2006
Collaborator Author

I was confused for a bit, as during the testing of siambit indexer with your code, it failed to download when I tried the yaml with action and again when I reset it back to _action, so I was starting to think your code was broken.
But then I restored to the latest Jackett and tested again, and the siambit indexer still failed, so now I know that there has been a change to the siambit website and the indexer needs updating. I guess no one is using siambit much on Jackett because I've not seen a ticket by anyone for this.
I hate it when stuff like this happens, it throws off all the careful testing one sets up :-(
Anyway, now I've got to figure out what's changed on siambit before continuing to peer review this PR.

True, the sad part of testing.

Successfully merging this pull request may close these issues.

kinozal: torrent download link 404