
Add functionality to extract JS strings as links in a javascript blob #1121

Open
amiremami opened this issue Feb 23, 2024 · 12 comments
Labels: enhancement (New feature or request)

@amiremami
Contributor

Couldn't get JS strings as links to be able to grep them.

My command:
bbot -t trickest.com -m httpx -c web_spider_distance=2 web_spider_depth=3 web_spider_links_per_page=1000 omit_event_types=[] url_extension_httpx_only=[]

[screenshot]

[screenshot]

🙏

@amiremami amiremami added the bug Something isn't working label Feb 23, 2024
@TheTechromancer
Collaborator

TheTechromancer commented Feb 24, 2024

@liquidsec what do you think about this? We would essentially be implementing a JS link extractor.
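
For reference, the core of such an extractor could be quite small: pull quoted string literals out of the script body, keep the ones that look like URLs or root-relative paths, and resolve relative ones against the page URL. A rough sketch (the regexes and function name here are illustrative, not bbot's actual excavate code):

import re
from urllib.parse import urljoin

# Quoted string literals inside a JS blob,
# e.g. "/_next/static/chunks/webpack-8af07453075e2970.js"
JS_STRING = re.compile(r"""["']([^"'\s]{4,512})["']""")
# Heuristic: keep only strings that look like absolute URLs or root-relative paths
LOOKS_LIKE_LINK = re.compile(r"^(?:https?://|/)[\w./%-]+$")

def extract_js_links(js_blob: str, base_url: str) -> set[str]:
    """Return absolute URLs found as string literals in a JS blob."""
    links = set()
    for match in JS_STRING.finditer(js_blob):
        candidate = match.group(1)
        if LOOKS_LIKE_LINK.match(candidate):
            # urljoin() resolves relative paths against the page URL
            links.add(urljoin(base_url, candidate))
    return links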

@TheTechromancer TheTechromancer added enhancement New feature or request and removed bug Something isn't working labels Feb 24, 2024
@amiremami
Contributor Author

This is my command:

bbot -t react.dev -m httpx -c web_spider_distance=3 web_spider_depth=3 web_spider_links_per_page=500 omit_event_types=[]

And bbot can't detect any of these JS files as links

[screenshot]

For example, this link does not exist in the output file:
https://react.dev/_next/static/chunks/webpack-8af07453075e2970.js

@TheTechromancer
Collaborator

Added support for extracting URLs from <link> elements: #1132.

@amiremami
Contributor Author

amiremami commented Feb 27, 2024

I'm adding some more examples here for future testing; I guess all of them are related to JS blobs.

openai.com [screenshot]
shopify.com [screenshot]
atlassian.com [screenshot]
whatsapp.com [screenshot]
ahrefs.com [screenshot]
clickup.com [screenshot]

@TheTechromancer
Collaborator

@amiremami thanks for testing. Did bbot fail to extract these? It always finds full URLs regardless of whether they're embedded in js blobs, so it definitely should have gotten the atlassian one.
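
A fully-qualified URL can be matched anywhere in a response body with a simple pattern, which is why absolute URLs embedded in JS blobs should surface even without JS-aware parsing. A rough illustration of the idea (the regex is a simplification, not bbot's actual pattern):

import re

# Absolute http(s) URLs anywhere in a response body, including inside JS blobs
FULL_URL = re.compile(r"https?://[\w.-]+(?::\d+)?(?:/[\w./%?#&=-]*)?")

body = 'e.src="https://atl-global.atlassian.com/js/atl-global.min.js";'
print(FULL_URL.findall(body))
# ['https://atl-global.atlassian.com/js/atl-global.min.js']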

@amiremami
Contributor Author

> @amiremami thanks for testing. Did bbot fail to extract these? It always finds full URLs regardless of whether they're embedded in js blobs, so it definitely should have gotten the atlassian one.

bbot -t https://www.atlassian.com/software -m httpx -c web_spider_distance=2 web_spider_depth=2 web_spider_links_per_page=500 omit_event_types=[]

[screenshot]

I have it like this tens of times in the output file, but not as "url": "https://atl-global.atlassian.com/js/atl-global.min.js"

@TheTechromancer
Collaborator

TheTechromancer commented Feb 27, 2024

> bbot -t https://www.atlassian.com/software -m httpx -c web_spider_distance=2 web_spider_depth=2 web_spider_links_per_page=500 omit_event_types=[]

I think you're forgetting a config option ;)

[screenshot]

(The reason this config option exists is that almost everyone wants to search JavaScript files for secrets etc., but if a file doesn't contain anything interesting, they usually don't want to see it in the output.)
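
In pseudocode, the tradeoff looks something like this (purely a conceptual sketch of the behavior described above; the function and pattern are made up, not bbot's actual code):

import re

# Toy "interesting content" check; bbot's real scanning is far more thorough
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]")

def handle_js_response(url: str, body: str) -> list[str]:
    """The JS file is always downloaded and scanned, but the bare .js URL
    is only worth surfacing in output when the scan found something."""
    events = []
    if SECRET_PATTERN.search(body):
        events.append(f"FINDING {url}")
    # otherwise: emit nothing, so uninteresting .js URLs stay out of the output
    return events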

@amiremami
Contributor Author

Thanks 🙏 I also used that config, but still the same :(

@amiremami
Contributor Author

amiremami commented Feb 27, 2024

> This is my command:
>
> bbot -t react.dev -m httpx -c web_spider_distance=3 web_spider_depth=3 web_spider_links_per_page=500 omit_event_types=[]
>
> And bbot can't detect any of these JS files as links
>
> [screenshot]
>
> For example, this link does not exist in the output file: https://react.dev/_next/static/chunks/webpack-8af07453075e2970.js

For this one, I just upgraded bbot to v1.1.7.2998rc, and this JS only exists as URL_UNVERIFIED, but shouldn't it exist as URL too?

https://react.dev/_next/static/chunks/webpack-a1ff329830897a9a.js

My command:
bbot -t react.dev -m httpx -c web_spider_distance=2 web_spider_depth=2 web_spider_links_per_page=500 omit_event_types=[] url_extension_httpx_only=[]

[screenshot]

@TheTechromancer
Collaborator

@amiremami that specific file is 4 levels deep. The reason it's not showing up is because the spider is set to a depth of 2 (web_spider_depth=2).

If you enable --debug, it will tell you the reason:

2024-02-27 17:00:10,924 [DEBUG] bbot.modules.internal.excavate base.py:1175 Tagging URL_UNVERIFIED("https://react.dev/_next/static/chunks/webpack-ccf89d5e32b01f59.js", module=excavate, tags={'in-scope', 'extension-js', 'endpoint'}) as spider-danger because its spider depth or distance exceeds the scan's limits
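
For reference, "4 levels deep" refers to the number of path segments in the URL. A quick way to check a URL against web_spider_depth (an illustrative helper, not bbot's internals):

from urllib.parse import urlparse

def url_depth(url: str) -> int:
    """Count non-empty path segments."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

url = "https://react.dev/_next/static/chunks/webpack-8af07453075e2970.js"
print(url_depth(url))  # 4 -> exceeds web_spider_depth=2, so it gets tagged spider-danger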

@amiremami
Contributor Author

> @amiremami thanks for testing. Did bbot fail to extract these? It always finds full URLs regardless of whether they're embedded in js blobs, so it definitely should have gotten the atlassian one.

Still couldn't get the atlassian one, neither as URL nor as URL_UNVERIFIED. If this problem is different from the JS blob one, please check, thanks a lot 🙏

Got this today
[screenshot]

@TheTechromancer
Collaborator

TheTechromancer commented Feb 29, 2024

@amiremami keep in mind that https://atl-global.atlassian.com/js/atl-global.min.js is on a different subdomain than www.atlassian.com, so it's not in scope. If you want to see it you will need to either:

  1. increase your scope report distance to see the URL_UNVERIFIED (-c scope_report_distance=1)
  2. whitelist all of atlassian.com to also produce a URL (-w atlassian.com)
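
For example, adapting the earlier command (illustrative; the other options are unchanged):

  bbot -t https://www.atlassian.com/software -m httpx -c web_spider_distance=2 web_spider_depth=2 scope_report_distance=1
  bbot -t https://www.atlassian.com/software -w atlassian.com -m httpx -c web_spider_distance=2 web_spider_depth=2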

[screenshot]
