Core: Add Basic JSON Parsing from API #12424

Merged: 27 commits, Oct 20, 2021


mikeoscar2006
Collaborator

@mikeoscar2006 mikeoscar2006 commented Oct 15, 2021

Resolves #12334.

  • Basic API parsing in Cardigann
  • A couple of sample YML definitions for testing (will not be included in the final merge)
  • Introduce filters
  • Allow fields without selectors (such as needed for downloadvolumefactor)
  • Add a requestdelay parameter, since most APIs seem to rate-limit their intake and we may need to regulate this in the YML
  • Case block
  • Overhaul
Note: It is still in a PoC state. After adding more features, a final overhaul will be done to improve the code quality.

Contributor

@garfield69 garfield69 left a comment


Looking good. I like what I see so far.

I've updated the yts.yml to see how close I could get the results matching the yts.cs

next step would be to:

  • introduce filters
  • allow fields without selectors (such as needed for downloadvolumefactor)
  • add a requestdelay parameter, since most APIs seem to rate-limit their intake and we may need to regulate this in the yml
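
For illustration, the requestdelay idea amounts to enforcing a minimum gap between consecutive API calls. A minimal sketch in Python (for brevity; this is not Jackett's C# code), where the class name `RequestDelay` and its `clock` and `sleep` parameters are hypothetical:

```python
import time


class RequestDelay:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before each request would be enough for a single-threaded indexer; concurrent searches would need a lock around it.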

Later I could start adding more test yml and that may help see what other features may be required.

@mikeoscar2006
Collaborator Author

Thanks a lot for the testing @garfield69. Thanks to you, some quality features are being added :)

@garfield69
Contributor

Nice!

Little niggle found.

Searching yts-yml for abcd1234 yields an error

Jackett.Common.IndexerException: Exception (yts-yml): Error Parsing Rows Selector
 ---> System.Exception: Error Parsing Rows Selector
   at Jackett.Common.Indexers.CardigannIndexer.PerformQuery(TorznabQuery query) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\CardigannIndexer.cs:line 1372
   at Jackett.Common.Indexers.BaseIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 402
   --- End of inner exception stack trace ---
   at Jackett.Common.Indexers.BaseIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 414
   at Jackett.Common.Indexers.BaseWebIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 756
   at Jackett.Server.Controllers.ResultsController.Results(ApiSearch requestt) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Server\Controllers\ResultsController.cs:line 228
{{"status":"ok","status_message":"Query was successful","data":{"movie_count":0,"limit":50,"page_number":1},"@meta":{"server_time":1634408056,"server_timezone":"CET","api_version":2,"execution_time":"0 ms"}}

apparently YTS do not provide the expected empty set movies:[], but just movie_count:0 for their no-results response.

bit-titan in comparison returns

{"siteUrl":"https:\/\/bit-titan.net\/","siteName":"BiT-TiTAN","filesFound":"0","pages":0,"viewResults":"0 to 100 from 0","results":[]}

nice of them to provide the empty set results:[] as well as a filesFound:0 :-)

TPB returns

[{"id":"0","name":"No results returned","info_hash":"0000000000000000000000000000000000000000","leechers":"0","seeders":"0","num_files":"0","size":"0","username":"","added":"0","status":"member","category":"0","imdb":"","total_found":"1"}]

which is different again, total_found:1 ?!? I guess we are meant to use id:0 as their no-result response.

Clearly we cannot rely on sites' APIs providing empty sets (as bit-titan does).

Maybe we can trap the case where the row selector is not found but some other field indicates that no results were returned. That would distinguish an empty result from a genuine row-selector-not-found error, where the yml was coded incorrectly or the API changed after the indexer was written.
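
A rough sketch of that idea in Python (not Jackett's C# code): consult a count field first, and only treat a missing rows key as an error when the count does not report zero. The names `extract_rows`, `rows_key` and `count_key` are hypothetical, the path `data.movies` is an assumption about where YTS puts its rows, and the sample payloads are trimmed from the responses quoted above.

```python
import json


class RowsSelectorError(Exception):
    """Raised when the rows key is missing and nothing signals 'no results'."""


def extract_rows(payload, rows_key, count_key=None):
    """Return the list of rows, or [] when the count field reports zero.

    rows_key and count_key are dot-separated paths from the root object
    (a simplified notation used only in this sketch).
    """
    def lookup(obj, path):
        for part in path.split("."):
            if not isinstance(obj, dict) or part not in obj:
                return None
            obj = obj[part]
        return obj

    if count_key is not None:
        count = lookup(payload, count_key)
        # Counts arrive as numbers (YTS) or strings (bit-titan), so normalise.
        if count is not None and int(count) == 0:
            return []

    rows = lookup(payload, rows_key)
    if rows is None:
        raise RowsSelectorError(f"rows key {rows_key!r} not found")
    return rows


yts_empty = json.loads('{"status":"ok","data":{"movie_count":0,"limit":50,"page_number":1}}')
titan_empty = json.loads('{"filesFound":"0","pages":0,"results":[]}')
```

With a count key, both empty responses yield an empty list; without one, the YTS response still raises, which is the behaviour you would want for a miscoded yml. TPB's placeholder row with id 0 is not covered here and would need its own check.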

In the meantime I shall work on a couple more conversions that we can test with ;-)

@mikeoscar2006
Collaborator Author

Great find! I'll add a countSelector field to the block and apply it before the row parser.

@mikeoscar2006
Collaborator Author

@garfield69 Please take care with the dot notation, as per the latest commit. The arrow notation was causing inconsistencies, so I removed it.

@garfield69
Contributor

Been working on trying to get TPB-yml working for testing ...
So in the case of TPB, when there are results the api returns a set without a name

[{"id":53167695,"info_hash":"EE5F3CB04A68EF27A72825D77561FBCB3742C3DF","category":505,"name":"NubileFilms 21 10 16 Lilly Bella Take A Bite  720p mp4","status":"trusted","num_files":1,"size":485878644,"seeders":0,"leechers":0,"username":"Mesoglea","added":1634416932,"anon":0,"imdb":null},{"id":53167694,"info_hash":"906B1091590B494DDAF61B455B9EF2E7A81FC6BB","category":101,"name":"Time Life Music - The Complete 80's (+Bonus) 320k musicfromrizzo","status":"trusted","num_files":250,"size":2494136715,"seeders":0,"leechers":0,"username":"moviesbyrizzo","added":1634416698,"anon":0,"imdb":null},{"id":53167693,"info_hash":"AC3862636D51D761608CE87067AB8AEB2BA017CD","category":205,"name":"Mordkommissionen S02E01-07 svensk text","status":"trusted","num_files":7,"size":4184299787,"seeders":0,"leechers":0,"username":"Anonymous","added":1634416556,"imdb":null},{"id":53167692,"info_hash":"D3CC939FE6B84762EAC709D5410ADB4FB9E28052","category":505,"name":"AlettaOceanLive 21 10 15 Delivery Guy  720p mp4","status":"trusted","num_files":1,"size":315586774,"seeders":0,"leechers":0,"username":"Mesoglea","added":1634416506,"anon":0,"imdb":null},{"id":53167691,"info_hash":"CAA7062E573CBFC693C59093C8A5C4657BDCF910","category":205,"name":"Young Justice S04E02 XviD-AFG","status":"vip","num_files":6,"size":220151170,"seeders":0,"leechers":0,"username":"TvTeam","added":1634416290,"anon":0,"imdb":null},{"id":53167690,"info_hash":"34D527B41C914F1EC0F754A95AF9B458F110CA60","category":205,"name":"Young Justice S04E02 480p x264-mSD","status":"vip","num_files":6,"size":128636917,"seeders":0,"leechers":0,"username":"TvTeam","added":1634416282,"anon":0,"imdb":null},{"id":53167689,"info_hash":"6F03FDC69CBEFD910A37142D81E50A9CA8C90F61","category":205,"name":"Young Justice S04E01 
XviD-AFG","status":"vip","num_files":6,"size":241661904,"seeders":0,"leechers":0,"username":"TvTeam","added":1634416275,"anon":0,"imdb":null},{"id":53167688,"info_hash":"719F9F79442CE5746AF393767D7458A1121B29FD","category":205,"name":"Young Justice S04E01 480p x264-mSD","status":"vip","num_files":6,"size":132407509,"seeders":0,"leechers":0,"username":"TvTeam","added":1634416268,"anon":0,"imdb":null},{"id":53167687,"info_hash":"E6F3B0548D28BC31C0C7AB73E20FC0ABF15C2001","category":205,"name":"Young.Justice.S04E01.XviD-AFG[TGx]","status":"vip","num_files":3,"size":240926475,"seeders":0,"leechers":0,"username":"Anonymous","added":1634416252,"imdb":null},{"id":53167686,"info_hash":"DA732064FB91D2C20702C4697DB1ECBEFC9D69D6","category":205,"name":"Young.Justice.S04E02.XviD-AFG[TGx]","status":"vip","num_files":3,"size":219299499,"seeders":0,"leechers":0,"username":"Anonymous","added":1634416039,"imdb":null},{"id":53167685,"info_hash":"4759BCED8A093816F10F4F9EF9D8C3F9024388A4","category":208,"name":"Magnum.P.I.2018.S04E03.Texas.Wedge.720p.AMZN.WEBRip.DDP5.1.x264-","status":"vip","num_files":2,"size":2027626373,"seeders":0,"leechers":0,"username":"Anonymous","added":1634416027,"imdb":null},
etc etc etc

so for the row what do I specify as a selector in this case?
Perhaps a special keyword root or just a dot . to identify that the set starts without a groupset name?

also, regarding the count, if I need to access a field from the root, that is to say, a field that is before the set referenced by the row selector, how can I reference it?

@mikeoscar2006
Collaborator Author

I think a simple . will work without any change needed in the code.

The count selector refers to the root object only, not to the rows, unless it is specified as a dot-notation path.

@garfield69
Contributor

I've loaded thepiratebay.yml so you can take a looksee. It's WIP.
using

  rows:
    selector: .

caused error

Jackett.Common.IndexerException: Exception (thepiratebay-yml): Unexpected end while parsing path.
 ---> Newtonsoft.Json.JsonException: Unexpected end while parsing path.
   at Newtonsoft.Json.Linq.JsonPath.JPath.ParsePath(List`1 filters, Int32 currentPartStartIndex, Boolean query)
   at Newtonsoft.Json.Linq.JsonPath.JPath.ParseMain()
   at Newtonsoft.Json.Linq.JsonPath.JPath..ctor(String expression)
   at Newtonsoft.Json.Linq.JToken.SelectToken(String path, Boolean errorWhenNoMatch)
   at Newtonsoft.Json.Linq.JToken.SelectToken(String path)
   at Jackett.Common.Indexers.CardigannIndexer.PerformQuery(TorznabQuery query) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\CardigannIndexer.cs:line 1377
   at Jackett.Common.Indexers.BaseIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 402
   --- End of inner exception stack trace ---
   at Jackett.Common.Indexers.BaseIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 414
   at Jackett.Common.Indexers.BaseWebIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 756
   at Jackett.Server.Controllers.ResultsController.Results(ApiSearch requestt) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Server\Controllers\ResultsController.cs:line 228
--

@garfield69
Contributor

\o/ got a fully working version of bit-titan.yml

@garfield69
Contributor

\o/ full working version of milkie.yml
I'm loving this.
Waiting for the bubble to burst ;-b

@garfield69 garfield69 mentioned this pull request Oct 17, 2021
@garfield69
Contributor

additional feature request:
please allow case block for selectors so I can support fields that require transforming.

    standard:
      selector: ..standard
      case:
        0: ""
        1: "8K"
        2: "2160p"
        3: "1080p"
        4: "1080i"
        5: "720p"
        6: "SD"
    downloadvolumefactor:
      selector: ..sp_state
      case:
        2: 0   # free
        4: 0   # 2x free
        5: 0.5 # 50% free
        6: 0.5 # 2x 50% free
        7: 0.3 # 30% free
        "*": 1
    uploadvolumefactor:
      selector: ..sp_state
      case:
        3: 2 # 2x
        4: 2 # 2x free
        6: 2 # 2x 50% free
        "*": 1

Thanks.
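
To make the request concrete, here is a small Python sketch of how such a case block could be evaluated, assuming exact-match semantics on the selected value with "*" as the fallback. The helper name `apply_case` is hypothetical and this is not how Jackett implements it.

```python
def apply_case(value, case):
    """Map a raw field value through a case table, with '*' as the fallback."""
    key = str(value)
    # YAML keys such as 2 or "*" may load as ints or strings; compare as text.
    table = {str(k): v for k, v in case.items()}
    if key in table:
        return table[key]
    if "*" in table:
        return table["*"]
    return value  # assumption: with no match and no default, keep the raw value


# The two tables from the request above, written as Python dicts.
downloadvolumefactor = {2: 0, 4: 0, 5: 0.5, 6: 0.5, 7: 0.3, "*": 1}
uploadvolumefactor = {3: 2, 4: 2, 6: 2, "*": 1}
```

For an sp_state of 4 ("2x free") this gives a download factor of 0 and an upload factor of 2, while any unlisted state falls through to 1.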

@mikeoscar2006
Collaborator Author

> I've loaded thepiratebay.yml so you can take a looksee. It's WIP. using
>
>   rows:
>     selector: .
>
> caused error
>
> Jackett.Common.IndexerException: Exception (thepiratebay-yml): Unexpected end while parsing path.
>  ---> Newtonsoft.Json.JsonException: Unexpected end while parsing path.
>    at Newtonsoft.Json.Linq.JsonPath.JPath.ParsePath(List`1 filters, Int32 currentPartStartIndex, Boolean query)
>    at Newtonsoft.Json.Linq.JsonPath.JPath.ParseMain()
>    at Newtonsoft.Json.Linq.JsonPath.JPath..ctor(String expression)
>    at Newtonsoft.Json.Linq.JToken.SelectToken(String path, Boolean errorWhenNoMatch)
>    at Newtonsoft.Json.Linq.JToken.SelectToken(String path)
>    at Jackett.Common.Indexers.CardigannIndexer.PerformQuery(TorznabQuery query) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\CardigannIndexer.cs:line 1377
>    at Jackett.Common.Indexers.BaseIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 402
>    --- End of inner exception stack trace ---
>    at Jackett.Common.Indexers.BaseIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 414
>    at Jackett.Common.Indexers.BaseWebIndexer.ResultsForQuery(TorznabQuery query, Boolean isMetaIndexer) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Common\Indexers\BaseIndexer.cs:line 756
>    at Jackett.Server.Controllers.ResultsController.Results(ApiSearch requestt) in C:\Users\Garfield69\Documents\GitHub\Jackett\src\Jackett.Server\Controllers\ResultsController.cs:line 228
> --

Sorry, $ should be used instead. Updated the tpb yml and tested!
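
For context: the stack trace shows the selector being handed to Newtonsoft's JToken.SelectToken, which parses JSONPath, where $ names the root and a lone . is an incomplete path. A simplified Python sketch of the same idea follows; the `select` helper is hypothetical and handles only `$` and dotted keys, not full JSONPath.

```python
def select(root, path):
    """Resolve a tiny subset of JSONPath: '$' for the root, then dotted keys."""
    if path == "$":
        return root
    if path.startswith("$."):
        path = path[2:]
    if not path or path.startswith(".") or path.endswith("."):
        raise ValueError(f"incomplete path: {path!r}")
    node = root
    for key in path.split("."):
        node = node[key]
    return node


# TPB-style response: the rows are the root array itself.
tpb = [{"id": 53167695, "seeders": 0}, {"id": 53167694, "seeders": 0}]
# bit-titan-style response: the rows live under a named key.
titan = {"filesFound": "0", "results": []}
```

Here `select(tpb, "$")` returns the whole array as the row set, `select(titan, "results")` returns the named list, and `select(tpb, ".")` is rejected as incomplete, mirroring the error above.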

@mikeoscar2006
Collaborator Author

Honestly, I don't have a clear idea of what the case block should do. I'm treating it as a set of match-and-replace rules and have pushed a commit accordingly. Please test it; if I've got it wrong, could you post some examples so I can see what the case block is meant to achieve?

add default for case block
@ilike2burnthing
Contributor

Re: #1774 their API will always add a blank result at the end which results in the error Error while parsing field=id, selector=ID, value=<null>: Selector "ID" didn't match.

Adding :has(MAGNET) to the row selector results in the error Unexpected character while parsing path indexer: M.

Similarly, adding :contains("magnet:") instead results in the error Unexpected character while parsing path indexer: ".

When there are no results (e.g. searching abcde12345) there is nothing to use for the count selector.

@garfield69
Contributor

@ilike2burnthing can I see the indexer you've made for this?

@garfield69 garfield69 mentioned this pull request Oct 19, 2021
@ilike2burnthing
Contributor

ilike2burnthing commented Oct 19, 2021

---
id: xbit
name: xBiT
description: "xBiT is a Public torrent index"
language: en-US
type: public
encoding: UTF-8
links:
  - https://xbit.pw/

caps:
  categories:
    Other: Other

  modes:
    search: [q]
    tv-search: [q, season, ep]
    movie-search: [q]
    music-search: [q]
    book-search: [q]

settings:
  - name: info_8000
    type: info
    label: About xBiT Categories
    default: xBiT does not return categories in its search results.</br>To add to your Apps' Torznab indexer, replace all categories with 8000(Other).
  - name: rows_selector
    type: text
    label: enter rows selector string
    default: "dht_results:has(MAGNET)"

search:
  paths:
    - path: api
      response:
        type: json
  inputs:
    search: "{{ .Keywords }}"
    limit: "{{ if .Keywords }}100{{ else }}49{{ end }}"

  rows:
    selector: "{{ .Config.rows_selector }}"
#    selector: dht_results:has(MAGNET) # pass
#    selector: dht_results:has(MAGNET):not(TORRENT) # pass
#    selector: dht_results:has(MAGNET):has(NAME) # pass
#    selector: dht_results:has(MAGNET):not(NAME) # pass
#    selector: dht_results:has(MAGNET):not(NAME:contains(720p)) # pass
#    selector: dht_results:has(MAGNET):has(NAME:contains(720p)) # pass
#    selector: dht_results:has(MAGNET):not(TORRENT:contains(720p)) # fail
#    selector: dht_results:has(MAGNET:contains(xt=urn)) # pass
#    selector: dht_results:has(MAGNET:contains(xt=urn)):has(NAME:contains(720p)) # pass

  fields:
    id:
      selector: ID
    category:
      text: Other
    title:
      selector: NAME
    details:
      text: "{{ .Config.sitelink }}{{ if .Result.id }}?id={{ .Result.id }}{{ else }}{{ end }}"
    download:
      selector: TORRENT
      optional: true
    magnet:
      selector: MAGNET
    date:
      # 2021-10-19 10:27:01
      selector: DISCOVERED
      filters:
        - name: append
          args: " -07:00" # PDT
        - name: dateparse
          args: "2006-01-02 15:04:05 -07:00"
    size:
      selector: SIZE
    seeders:
      text: 1
    leechers:
      text: 1
    downloadvolumefactor:
      text: 0
    uploadvolumefactor:
      text: 1
# json engine n/a
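
As an aside on the date field in the definition above: its two filters append a fixed offset and then parse with the Go-style layout "2006-01-02 15:04:05 -07:00". A Python sketch of the equivalent steps is shown below; the function name `parse_discovered` is hypothetical and the strptime format is my translation of the layout, not something taken from Jackett.

```python
from datetime import datetime, timedelta, timezone


def parse_discovered(raw, offset=" -07:00"):
    """Append a fixed UTC offset, then parse into a timezone-aware datetime."""
    # "%Y-%m-%d %H:%M:%S %z" mirrors the Go layout "2006-01-02 15:04:05 -07:00";
    # %z accepts the colon form of the offset on Python 3.7 and later.
    return datetime.strptime(raw + offset, "%Y-%m-%d %H:%M:%S %z")


# Sample value taken from the comment in the definition.
parsed = parse_discovered("2021-10-19 10:27:01")
as_utc = parsed.astimezone(timezone.utc)
```

The hard-coded offset only holds while the site's clock stays on PDT; a site that follows daylight saving would drift by an hour for part of the year.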

@garfield69
Contributor

@ilike2burnthing I understand now what you are saying.
You are trying to use :has(), :not(), :contains() but these are CSS selectors valid only on HTML.

@mikeoscar2006 This would be quite a powerful feature to add.
it would apply to the rows selector and field selectors

but this might be quite a large undertaking to develop. Does the JToken from Newtonsoft.Json.Linq have comparators and query capabilities?

so xbit generates a response like this:

{
	"dht_results": [{"ID":"16273799",
"NAME":"www.ds1024.xyz 极品无毛一线天白虎萝莉从蛋蛋后直播大秀 穿着可爱睡衣",
"MAGNET":"magnet:?xt=urn:btih:8904e34eb8331463927a82aac181a043504656cc&dn=www.ds1024.xyz+%E6%9E%81%E5%93%81%E6%97%A0%E6%AF%9B%E4%B8%80%E7%BA%BF%E5%A4%A9%E7%99%BD%E8%99%8E%E8%90%9D%E8%8E%89%E4%BB%8E%E8%9B%8B%E8%9B%8B%E5%90%8E%E7%9B%B4%E6%92%AD%E5%A4%A7%E7%A7%80+%E7%A9%BF%E7%9D%80%E5%8F%AF%E7%88%B1%E7%9D%A1%E8%A1%A3",
"SIZE":"1.15GB",
"DISCOVERED":"2021-10-19 10:37:29" },
 {}]}

where they include an empty set at the end of the results.

So perhaps an alternative might be to just ignore an empty set if we have already processed one other set previously?
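
A minimal Python sketch of that alternative, using a trimmed copy of the xbit response. The helper name `drop_empty_rows` is hypothetical, and it drops every empty object regardless of position, which is slightly broader than only ignoring one after a real row has been processed.

```python
def drop_empty_rows(rows):
    """Keep only rows that carry at least one field."""
    return [row for row in rows if row]


# Trimmed xbit-style result set with the trailing empty object.
dht_results = [
    {"ID": "16273799", "SIZE": "1.15GB", "DISCOVERED": "2021-10-19 10:37:29"},
    {},
]
cleaned = drop_empty_rows(dht_results)
```

A response consisting of nothing but the empty object then becomes an empty row list, which would also cover the no-results case without a count selector.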

@mikeoscar2006
Collaborator Author

@garfield69 Can you give a few examples so I can understand exactly what we can achieve with :has() and the other queries?

PS: Loving the response. Didn't know so many things would be solved with this addition 🤩

@garfield69
Contributor

garfield69 commented Oct 19, 2021

examples:
let's assume the response set looks like this:

{
  "data": [
    {
      "name": "a regular title with a magnet and freeleech",
      "magnet": "magnet:?xt=urn:btih:8904e34eb8331463927a82aac181a043504656cc",
      "free": "1"
    },
    {
      "name": "a VIP title with a magnet and not free",
      "magnet": "magnet:?xt=urn:btih:8004e34eb8331463927a82aac181a043504656cc",
      "free": "0",
      "vip": "true"
    },
    {
      "name": "a title without a magnet",
      "free": "0"
    }
  ]
}

you could code

  rows:
    selector: data:has(magnet):not(vip):has(free:contains("1"))

  fields:
    title:
      selector: name
    download:
      selector: magnet
    downloadvolumefactor:
      selector: free
      case:
        0: 1 # not freeleech
        1: 0 # freeleech

and this would drop any row that does not include a field named magnet,
drop any row that includes a field named vip,
and drop any row that does not include a field named free whose value contains 1.

I hope I have not made any mistakes in the presentation and that it makes sense.
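
To show the intended row filtering end to end, here is a rough Python sketch that reads :has(x) as "keep rows that have x" and :not(x) as "keep rows that lack x", with an optional :contains() on the named field. The function names and the regex-based parser are hypothetical, and the sketch does not handle the nested :not(:contains()) form.

```python
import re

# Matches :has(...) or :not(...), allowing one nested pair of parentheses.
_PSEUDO = re.compile(r":(has|not)\(((?:[^()]|\([^()]*\))*)\)")
# Matches "field" or "field:contains(text)" inside the parentheses.
_ARG = re.compile(r"^(\w+)(?::contains\((.*)\))?$")


def parse_selector(selector):
    """Split 'key:has(a):not(b)' into the key and a list of row tests."""
    key = selector.split(":", 1)[0]
    tests = []
    for kind, arg in _PSEUDO.findall(selector):
        match = _ARG.match(arg)
        if match is None:
            raise ValueError(f"unsupported argument: {arg!r}")
        field, needle = match.group(1), match.group(2)
        if needle is not None:
            needle = needle.strip("\"'")
        tests.append((kind == "has", field, needle))
    return key, tests


def row_matches(row, tests):
    """A row passes when every :has holds and every :not fails."""
    for want, field, needle in tests:
        present = field in row and (needle is None or needle in str(row[field]))
        if present != want:
            return False
    return True


def select_rows(payload, selector):
    key, tests = parse_selector(selector)
    return [row for row in payload[key] if row_matches(row, tests)]


response = {"data": [
    {"name": "a regular title with a magnet and freeleech", "magnet": "magnet:?xt=urn:btih:8904", "free": "1"},
    {"name": "a VIP title with a magnet and not free", "magnet": "magnet:?xt=urn:btih:8004", "free": "0", "vip": "true"},
    {"name": "a title without a magnet", "free": "0"},
]}
kept = select_rows(response, 'data:has(magnet):not(vip):has(free:contains("1"))')
```

Under this reading only the first row survives: the second is dropped for having vip and the third for lacking magnet. The magnet hashes in the sample are shortened for the sketch.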

the :has(), :not(), :contains() would also operate at the field selector level to allow fields with the same name to be processed or skipped (in combination with the optional: true statement)
for example

rows:
  selector: data

fields:
  date:
    selector: published:contains("-")
    optional: true
    filters:
      - name: dateparse
        args: "2006-01-02"
  date:
    selector: published:not(:contains("-"))
    optional: true
    filters:
      - name: timeago

while the latter is strictly a violation of yaml guidelines (duplicate date keys), it's in line with existing yaml indexers that have this type of logic.

@mikeoscar2006
Collaborator Author

Got it. AFAIK, there are filter expressions, but they are not as simple as :has(). There is also programmatic support for such filters, but they need to be queried explicitly: either by extracting the has, not, and contains filters from the selector text and applying them accordingly, or by using a filter block that names the type of filter and its parameters, as above.

Can only think of these two options rn. lmk which one do you think is better and should we implement it in this PR or a separate one after merging this.

@garfield69
Contributor

> should we implement it in this PR

I think the current PR is ready for general use, and will allow us to publish the bulk of the json indexers that work within the current PR framework.
IMO the :has(), :not(), :contains(), :not(:contains()) feature would best be developed/tested separately, and if implemented would allow a few more json indexers to be published.

> lmk which one do you think is better

the former has the advantage of looking similar to the existing HTML based indexers so that would be my preferred option, but I can see how a filter block might be easier to implement and TBH as long as either option works I am not too fussed if the latter method is implemented instead.
@ilike2burnthing have you a preference for one or the other style of coding,
:has(), :not(), :contains(), :not(:contains())
vs

selector: name
filters:
  - name: contains
    args: "string"

and similarly for has and not?

albeit, I don't see how the :not(:contains()) might look as a filter except perhaps as
- name: notcontains
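
For comparison, the filter-block style could be evaluated along these lines. This is a Python sketch only: `passes` is a hypothetical helper, and `contains` and `notcontains` are the only filter names it knows, taken from the proposal above.

```python
def passes(value, filters):
    """Return True when the field value satisfies every filter in the block."""
    for entry in filters:
        name = entry["name"]
        needle = str(entry.get("args", ""))
        found = needle in str(value)
        if name == "contains":
            ok = found
        elif name == "notcontains":
            ok = not found
        else:
            raise ValueError(f"unknown filter: {name!r}")
        if not ok:
            return False
    return True


# Equivalent of published:contains("-") and published:not(:contains("-")).
is_dated = [{"name": "contains", "args": "-"}]
is_relative = [{"name": "notcontains", "args": "-"}]
```

A value such as "2021-10-19" passes `is_dated` and fails `is_relative`, while "3 days ago" does the opposite, so the two date variants from the earlier example can be told apart without CSS-style pseudo-selectors.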

@ilike2burnthing
Contributor

ilike2burnthing commented Oct 19, 2021

Same as you, if they're equally easy to implement then :has(), :not(), :contains(), :not(:contains()) for consistency, but otherwise whichever is easier to implement.

@mikeoscar2006
Collaborator Author

Well if the former approach is preferable then I'll definitely start with that and if not implementable, I'll fall back to the latter approach.
But yeah, first let's finish the overhaul and merge this pull request so that the starter ymls you made are available to all :)

@garfield69
Contributor

@mikeoscar2006
Collaborator Author

Looks great. I'm almost done with the overhaul. I'll be publishing it today for your final review and then we can merge this.

@ilike2burnthing
Contributor

Looks good.

Should we bump to v0.19 for this?

Contributor

@garfield69 garfield69 left a comment


Fantastic work!
I tested my set of 7 json indexers and all continue to work after your last commit.
Thank you.

Successfully merging this pull request may close these issues.

[req]: C# Indexer Implementation based on yml File definition