Improve 'xml-prune' — support xpath for text content matching #325

Closed
Alex-302 opened this issue Jun 8, 2023 · 1 comment

Alex-302 commented Jun 8, 2023

Problem

Currently, 'xml-prune' cannot select an element based on the text it contains.

Example:
Ads on https://pluto.tv/en/on-demand/movies/6111bbe29b91f300138dc5d9
Ad config is stored here in the .MPD file:
https://gist.githubusercontent.com/Alex-302/b8c23ece5fbe1c80deb4efa8af96c51d/raw/8592abecef9398c28033d093f8999c987e6d3499/pluto.mdp

In this case, the following element contains information about the video ad:

  <Period id="6481524028df58ef1148b225-1686214254109" duration="PT30.000S" start="PT5007.463S">
    <BaseURL>https://siloh-cf.pluto.tv/701_02_ad/creative/6481524028df58ef1148b225_ad/720p/20230608_040000/dash/</BaseURL>
    <AdaptationSet id="0" frameRate="15360/512" segmentAlignment="true" par="16:9" contentType="video" maxWidth="854" maxHeight="480">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc" cenc:default_KID="00000000-621f-2afe-7ab2-c868d5fd2e2f" />
      <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95" value="MSPR 2.0" xmlns:dashif="https://dashif.org/">
        <cenc:pssh>AAACJnBzc2gAAAAAmgTweZhAQoarkuZb4IhflQAAAgYGAgAAAQABAPwBPABXAFIATQBIAEUAQQBEAEUAUgAgAHgAbQBsAG4AcwA9ACIAaAB0AHQAcAA6AC8ALwBzAGMAaABlAG0AYQBzAC4AbQBpAGMAcgBvAHMAbwBmAHQALgBjAG8AbQAvAEQAUgBNAC8AMgAwADAANwAvADAAMwAvAFAAbABhAHkAUgBlAGEAZAB5AEgAZQBhAGQAZQByACIAIAB2AGUAcgBzAGkAbwBuAD0AIgA0AC4AMAAuADAALgAwACIAPgA8AEQAQQBUAEEAPgA8AFAAUgBPAFQARQBDAFQASQBOAEYATwA+ADwASwBFAFkATABFAE4APgAxADYAPAAvAEsARQBZAEwARQBOAD4APABBAEwARwBJAEQAPgBBAEUAUwBDAFQAUgA8AC8AQQBMAEcASQBEAD4APAAvAFAAUgBPAFQARQBDAFQASQBOAEYATwA+ADwASwBJAEQAPgBBAEEAQQBBAEEAQgA5AGkALwBpAHAANgBzAHMAaABvADEAZgAwAHUATAB3AD0APQA8AC8ASwBJAEQAPgA8AEMASABFAEMASwBTAFUATQA+AGIAcgB6AGUAcgBXAEQAbgBpAEIAVQA9ADwALwBDAEgARQBDAEsAUwBVAE0APgA8AC8ARABBAFQAQQA+ADwALwBXAFIATQBIAEUAQQBEAEUAUgA+AA==</cenc:pssh>
        <mspr:pro>BgIAAAEAAQD8ATwAVwBSAE0ASABFAEEARABFAFIAIAB4AG0AbABuAHMAPQAiAGgAdAB0AHAAOgAvAC8AcwBjAGgAZQBtAGEAcwAuAG0AaQBjAHIAbwBzAG8AZgB0AC4AYwBvAG0ALwBEAFIATQAvADIAMAAwADcALwAwADMALwBQAGwAYQB5AFIAZQBhAGQAeQBIAGUAYQBkAGUAcgAiACAAdgBlAHIAcwBpAG8AbgA9ACIANAAuADAALgAwAC4AMAAiAD4APABEAEEAVABBAD4APABQAFIATwBUAEUAQwBUAEkATgBGAE8APgA8AEsARQBZAEwARQBOAD4AMQA2ADwALwBLAEUAWQBMAEUATgA+ADwAQQBMAEcASQBEAD4AQQBFAFMAQwBUAFIAPAAvAEEATABHAEkARAA+ADwALwBQAFIATwBUAEUAQwBUAEkATgBGAE8APgA8AEsASQBEAD4AQQBBAEEAQQBBAEIAOQBpAC8AaQBwADYAcwBzAGgAbwAxAGYAMAB1AEwAdwA9AD0APAAvAEsASQBEAD4APABDAEgARQBDAEsAUwBVAE0APgBiAHIAegBlAHIAVwBEAG4AaQBCAFUAPQA8AC8AQwBIAEUAQwBLAFMAVQBNAD4APAAvAEQAQQBUAEEAPgA8AC8AVwBSAE0ASABFAEEARABFAFIAPgA=</mspr:pro>
        <dashif:laurl>service-concierge.clusters.pluto.tv/v1/pr/alt</dashif:laurl>
      </ContentProtection>
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed" xmlns:dashif="https://dashif.org/">
        <cenc:pssh>AAAASnBzc2gAAAAA7e+LqXnWSs6jyCfc1R0h7QAAACoSEAAAAABiHyr+erLIaNX9Li4SEAAAAABiHyr+erLIaNX9Li9I49yVmwY=</cenc:pssh>
        <dashif:laurl>service-concierge.clusters.pluto.tv/v1/wv/alt</dashif:laurl>
      </ContentProtection>
      <InbandEventStream schemeIdUri="www.pluto.tv" value="999" />
      <SupplementalProperty schemeIdUri="urn:mpeg:dash:adaptation-set-switching:2016" value="1" />
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main" />
      <Representation id="0" width="640" height="360" sar="1:1" mimeType="video/mp4" codecs="avc1.64001f" bandwidth="687943">
        <SegmentTemplate timescale="15360" startNumber="1" media="video/360p-600/$Number%05d$.m4s" initialization="video/360p-600/init.mp4" presentationTimeOffset="0">
          <SegmentTimeline>
            <S d="76800" r="5" t="0" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
      <Representation id="1" width="854" height="480" sar="1:1" mimeType="video/mp4" codecs="avc1.64001f" bandwidth="1701984">
        <SegmentTemplate timescale="15360" startNumber="1" media="video/480p-1600/$Number%05d$.m4s" initialization="video/480p-1600/init.mp4" presentationTimeOffset="0">
          <SegmentTimeline>
            <S d="76800" r="5" t="0" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
      <Representation id="2" width="426" height="240" sar="1:1" mimeType="video/mp4" codecs="avc1.64001f" bandwidth="351528">
        <SegmentTemplate timescale="15360" startNumber="1" media="video/240p-300/$Number%05d$.m4s" initialization="video/240p-300/init.mp4" presentationTimeOffset="0">
          <SegmentTimeline>
            <S d="76800" r="5" t="0" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
      <Representation id="3" width="854" height="480" sar="1:1" mimeType="video/mp4" codecs="avc1.64001f" bandwidth="1095476">
        <SegmentTemplate timescale="15360" startNumber="1" media="video/480p-1000/$Number%05d$.m4s" initialization="video/480p-1000/init.mp4" presentationTimeOffset="0">
          <SegmentTimeline>
            <S d="76800" r="5" t="0" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
    <AdaptationSet id="2" segmentAlignment="true" label="Original" contentType="audio">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc" cenc:default_KID="00000000-621f-2afe-7ab2-c868d5fd2e2f" />
      <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95" value="MSPR 2.0" xmlns:dashif="https://dashif.org/">
        <cenc:pssh>AAACJnBzc2gAAAAAmgTweZhAQoarkuZb4IhflQAAAgYGAgAAAQABAPwBPABXAFIATQBIAEUAQQBEAEUAUgAgAHgAbQBsAG4AcwA9ACIAaAB0AHQAcAA6AC8ALwBzAGMAaABlAG0AYQBzAC4AbQBpAGMAcgBvAHMAbwBmAHQALgBjAG8AbQAvAEQAUgBNAC8AMgAwADAANwAvADAAMwAvAFAAbABhAHkAUgBlAGEAZAB5AEgAZQBhAGQAZQByACIAIAB2AGUAcgBzAGkAbwBuAD0AIgA0AC4AMAAuADAALgAwACIAPgA8AEQAQQBUAEEAPgA8AFAAUgBPAFQARQBDAFQASQBOAEYATwA+ADwASwBFAFkATABFAE4APgAxADYAPAAvAEsARQBZAEwARQBOAD4APABBAEwARwBJAEQAPgBBAEUAUwBDAFQAUgA8AC8AQQBMAEcASQBEAD4APAAvAFAAUgBPAFQARQBDAFQASQBOAEYATwA+ADwASwBJAEQAPgBBAEEAQQBBAEEAQgA5AGkALwBpAHAANgBzAHMAaABvADEAZgAwAHUATAB3AD0APQA8AC8ASwBJAEQAPgA8AEMASABFAEMASwBTAFUATQA+AGIAcgB6AGUAcgBXAEQAbgBpAEIAVQA9ADwALwBDAEgARQBDAEsAUwBVAE0APgA8AC8ARABBAFQAQQA+ADwALwBXAFIATQBIAEUAQQBEAEUAUgA+AA==</cenc:pssh>
        <mspr:pro>BgIAAAEAAQD8ATwAVwBSAE0ASABFAEEARABFAFIAIAB4AG0AbABuAHMAPQAiAGgAdAB0AHAAOgAvAC8AcwBjAGgAZQBtAGEAcwAuAG0AaQBjAHIAbwBzAG8AZgB0AC4AYwBvAG0ALwBEAFIATQAvADIAMAAwADcALwAwADMALwBQAGwAYQB5AFIAZQBhAGQAeQBIAGUAYQBkAGUAcgAiACAAdgBlAHIAcwBpAG8AbgA9ACIANAAuADAALgAwAC4AMAAiAD4APABEAEEAVABBAD4APABQAFIATwBUAEUAQwBUAEkATgBGAE8APgA8AEsARQBZAEwARQBOAD4AMQA2ADwALwBLAEUAWQBMAEUATgA+ADwAQQBMAEcASQBEAD4AQQBFAFMAQwBUAFIAPAAvAEEATABHAEkARAA+ADwALwBQAFIATwBUAEUAQwBUAEkATgBGAE8APgA8AEsASQBEAD4AQQBBAEEAQQBBAEIAOQBpAC8AaQBwADYAcwBzAGgAbwAxAGYAMAB1AEwAdwA9AD0APAAvAEsASQBEAD4APABDAEgARQBDAEsAUwBVAE0APgBiAHIAegBlAHIAVwBEAG4AaQBCAFUAPQA8AC8AQwBIAEUAQwBLAFMAVQBNAD4APAAvAEQAQQBUAEEAPgA8AC8AVwBSAE0ASABFAEEARABFAFIAPgA=</mspr:pro>
        <dashif:laurl>service-concierge.clusters.pluto.tv/v1/pr/alt</dashif:laurl>
      </ContentProtection>
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed" xmlns:dashif="https://dashif.org/">
        <cenc:pssh>AAAASnBzc2gAAAAA7e+LqXnWSs6jyCfc1R0h7QAAACoSEAAAAABiHyr+erLIaNX9Li4SEAAAAABiHyr+erLIaNX9Li9I49yVmwY=</cenc:pssh>
        <dashif:laurl>service-concierge.clusters.pluto.tv/v1/wv/alt</dashif:laurl>
      </ContentProtection>
      <Representation id="5" audioSamplingRate="48000" mimeType="audio/mp4" codecs="mp4a.40.2" bandwidth="105819">
        <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2" />
        <SegmentTemplate timescale="48000" startNumber="1" media="audio/default/$Number%05d$.m4s" initialization="audio/default/init.mp4" presentationTimeOffset="0">
          <SegmentTimeline>
            <S d="240640" t="0" />
            <S d="239616" t="240640" />
            <S d="240640" t="480256" />
            <S d="239616" r="1" t="720896" />
            <S d="240640" t="1200128" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>

This element can be matched by the _ad/creative/ substring in its BaseURL.

Proposed solution

Add the ability to match an element that contains given text, including by regular expression (like :contains() in Extended CSS).

It would be preferable to use a syntax simpler than XPath, e.g. //Period[contains(BaseURL, '_ad/creative/')]: something like ExtCSS or, better, similar to $jsonprune of CoreLibs.
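To make the requested behaviour concrete, here is a minimal Python sketch of pruning by text content. It is not the scriptlet's implementation: the function name prune_by_text, its parameters, and the trimmed manifest are all hypothetical. Python's ElementTree XPath subset has no contains(), so the text check is done with a regular expression in plain code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed manifest; not the real Pluto TV MPD.
MPD = """<MPD>
  <Period id="content-1">
    <BaseURL>https://example.org/content/dash/</BaseURL>
  </Period>
  <Period id="ad-1">
    <BaseURL>https://example.org/701_02_ad/creative/abc_ad/dash/</BaseURL>
  </Period>
</MPD>"""


def prune_by_text(xml_text, tag, child_tag, pattern):
    """Remove every <tag> whose direct <child_tag> text matches `pattern`.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    regex = re.compile(pattern)
    # ElementTree keeps no parent pointers, so visit each potential parent
    # and inspect its direct children. Snapshots (list(...)) make it safe
    # to remove elements while looping.
    for parent in list(root.iter()):
        for element in list(parent.findall(tag)):
            text = element.findtext(child_tag, default="")
            if regex.search(text):
                parent.remove(element)
    return root


pruned = prune_by_text(MPD, "Period", "BaseURL", r"_ad/creative/")
remaining_ids = [period.get("id") for period in pruned.iter("Period")]
print(remaining_ids)
```

Because the pattern is a regular expression, the same call covers both the plain substring case and the regexp case mentioned above.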

@Alex-302 Alex-302 added the Feature request Adding new feature label Jun 8, 2023
@slavaleleka slavaleleka added enhancement Improvement of existent feature and removed Feature request Adding new feature labels Jun 8, 2023
@adguard-bot adguard-bot added the Feature request Adding new feature label Jun 8, 2023
@adguard-bot adguard-bot changed the title Improve 'xml-prune' scriptlet Improve 'xml-prune' — match text content Jun 8, 2023
@Yuki2718

uBO supports XPath, and it turned out to be useful for addressing Prime Video ads, where the start attribute, but not the node itself, should be removed for correct timing.
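As a rough illustration of removing an attribute while keeping the node, here is a small Python sketch. The manifest, the Period ids, and the helper name remove_attribute are made up for the example and do not reflect the real Prime Video manifest or uBO's implementation. Which Period actually needs its start attribute dropped is not stated here, so the selection below is arbitrary.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest; ids and timings are invented for the example.
MANIFEST = """<MPD>
  <Period id="main" start="PT0S"/>
  <Period id="ad-break" start="PT30.000S" duration="PT15.000S"/>
  <Period id="main-2" start="PT45.000S"/>
</MPD>"""


def remove_attribute(xml_text, path, attribute):
    """Drop `attribute` from every element matched by `path`.

    The elements themselves stay in the tree. `path` uses ElementTree's
    limited XPath subset, which does support [@attr='value'] predicates.
    """
    root = ET.fromstring(xml_text)
    for element in root.findall(path):
        # pop() with a default does nothing when the attribute is absent.
        element.attrib.pop(attribute, None)
    return root


root = remove_attribute(MANIFEST, ".//Period[@id='main-2']", "start")
starts = {period.get("id"): period.get("start") for period in root.iter("Period")}
print(starts)
```

All three Period nodes remain in the tree; only the selected one loses its start attribute.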

@adguard-bot adguard-bot changed the title Improve 'xml-prune' — match text content Improve 'xml-prune' — support xpath for text content matching Jul 24, 2023
@adguard-bot adguard-bot assigned slavaleleka and unassigned maximtop Aug 10, 2023
adguard pushed a commit that referenced this issue Aug 18, 2023
#325

Squashed commit of the following:

commit 7bc2b75
Author: Adam Wróblewski <adam@adguard.com>
Date:   Thu Aug 17 12:52:55 2023 +0200

    Change variable name
    Avoid unnecessary nesting

commit bf846a7
Author: Adam Wróblewski <adam@adguard.com>
Date:   Wed Aug 16 12:49:29 2023 +0200

    Add JSDoc
    Rename xPathElements to getXPathElements

commit 0f4588e
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Wed Aug 16 13:18:06 2023 +0300

    Update description

commit 70805ad
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Wed Aug 16 13:17:55 2023 +0300

    Update description

commit 821003e
Merge: b3d8355 9b71be1
Author: Adam Wróblewski <adam@adguard.com>
Date:   Wed Aug 16 09:03:11 2023 +0200

    Merge branch 'master' into feature/AG-22975

commit b3d8355
Author: Adam Wróblewski <adam@adguard.com>
Date:   Wed Aug 16 08:56:42 2023 +0200

    Add support for XPath in xml-prune