Add functionality to scrape url with Jina AI #594

Closed
wants to merge 2 commits into from

noamsiegel
Contributor

@noamsiegel noamsiegel commented Jun 13, 2024

What this Pull Request (PR) does

  • Adds a --scrape_url (-su) CLI option that fetches the content of a webpage as markdown
  • Uses Jina AI to do the scraping
  • No API key is needed
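
As a rough illustration of the approach in the list above, here is a minimal Python sketch of fetching a page as markdown through Jina AI. It is not the code from this PR. It assumes the request goes to Jina's public Reader endpoint (`https://r.jina.ai/` prefixed to the target URL), which the PR description does not spell out, and the names `build_reader_url` and `scrape_url` are hypothetical.

```python
import urllib.request

# Assumption: Jina AI's public Reader endpoint. The PR description does not
# name the exact URL it calls, so treat this prefix as illustrative.
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_url(url: str) -> str:
    """Return the Reader URL for a target page (hypothetical helper)."""
    return JINA_READER_PREFIX + url.strip()


def scrape_url(url: str, opener=urllib.request.urlopen, timeout: float = 30.0) -> str:
    """Fetch a page through the Reader endpoint and return its body as text.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib's standard urlopen.
    """
    request = urllib.request.Request(
        build_reader_url(url),
        # No API key header is sent, matching the "no API key" note above.
        headers={"User-Agent": "scrape-url-sketch"},
    )
    with opener(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `scrape_url("https://example.com")` would then return whatever the endpoint responds with, which could be printed to stdout and piped onward in the same way as the -su option.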

How to use it

The easiest way to use it is with this format: fabric -su {URL} | fabric -sp {fabric}. This scrapes {URL}, converts the page to markdown, and pipes the result into fabric. An example is provided in the screenshot below.

Screenshots

image

@silverstreak
Contributor

Great addition!

@noamsiegel
Contributor Author

Since Fabric is transitioning to Go, will all PRs to the original repo be held in limbo?

@timothyjoh

I really want this added, this would be great tooling to have

@timothyjoh

Looks like there is a ton of noise in this PR, possibly due to some formatter that changed it. While I approve of the formatting changes, it would be better to separate that into another PR so that this Jina addition is easier to approve.

@noamsiegel
Contributor Author

> Looks like there is a ton of noise in this PR, possibly due to some formatter that changed it. While I approve of the formatting changes, it would be better to separate that into another PR so that this Jina addition is easier to approve.

I just removed all the formatting. I'm sorry about that. Let me know what else should be changed for this to be added!

@danielmiessler
Owner

This is great! We're working on some more robust scraping options though so I think we wait and address this after it moves to Go.

@noamsiegel noamsiegel deleted the scrape_url branch August 21, 2024 17:47
4 participants