Merged
Binary file added docs/assets/searchgraph.png
Binary file added docs/assets/smartscrapergraph.png
Binary file added docs/assets/speechgraph.png
1 change: 0 additions & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
@@ -30,4 +30,3 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']
7 changes: 5 additions & 2 deletions docs/source/getting_started/examples.rst
@@ -1,7 +1,9 @@
Examples
========

Here some example of the different ways to scrape with ScrapegraphAI
Let's suppose you want to scrape a website to get a list of projects with their descriptions.
You can use the `SmartScraperGraph` class to do that.
The following examples show how to use the `SmartScraperGraph` class with OpenAI models and local models.

OpenAI models
^^^^^^^^^^^^^
@@ -78,7 +80,7 @@ After that, you can run the following code, using only your machine resources br
# ************************************************

smart_scraper_graph = SmartScraperGraph(
prompt="List me all the news with their description.",
prompt="List me all the projects with their description.",
# also accepts a string with the already downloaded HTML code
source="https://perinim.github.io/projects",
config=graph_config
@@ -87,3 +89,4 @@ After that, you can run the following code, using only your machine resources br
result = smart_scraper_graph.run()
print(result)

To learn how to customize the `graph_config` dictionary, for example by using a different LLM or adding new parameters, see the `Scrapers` section.
21 changes: 15 additions & 6 deletions docs/source/getting_started/installation.rst
@@ -7,26 +7,35 @@ for this project.
Prerequisites
^^^^^^^^^^^^^

- `Python 3.8+ <https://www.python.org/downloads/>`_
- `pip <https://pip.pypa.io/en/stable/getting-started/>`
- `ollama <https://ollama.com/>` *optional for local models
- `Python >=3.9,<3.12 <https://www.python.org/downloads/>`_
- `pip <https://pip.pypa.io/en/stable/getting-started/>`_
- `Ollama <https://ollama.com/>`_ (optional for local models)
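
The Python constraint listed above (``>=3.9,<3.12``) can be checked before installing. The following is a minimal sketch that assumes only the range stated in the prerequisites; the helper name ``is_supported_python`` is hypothetical and is not part of ScrapegraphAI.

```python
import sys

# Supported range taken from the prerequisites list: >=3.9,<3.12
MIN_VERSION = (3, 9)   # inclusive lower bound
MAX_VERSION = (3, 12)  # exclusive upper bound


def is_supported_python(version=None):
    """Return True if (major, minor) lies inside the supported range.

    Hypothetical helper for illustration only; defaults to the
    interpreter that is running this script.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Running the script with the interpreter you plan to use for the installation reports whether that interpreter falls inside the range.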


Install the library
^^^^^^^^^^^^^^^^^^^^

The library is available on PyPI, so it can be installed using the following command:

.. code-block:: bash

pip install scrapegraphai

**Note:** It is highly recommended to install the library in a virtual environment (conda, venv, etc.)

If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:

.. code-block:: bash

poetry install

Additionally on Windows when using WSL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are using Windows Subsystem for Linux (WSL) and you are facing issues with the installation of the library, you might need to install the following packages:

.. code-block:: bash

sudo apt-get -y install libnss3 libnspr4 libgbm1 libasound2

As simple as that! You are now ready to scrape gnamgnamgnam 👿👿👿



19 changes: 13 additions & 6 deletions docs/source/index.rst
@@ -3,12 +3,6 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to scrapegraphai-ai's documentation!
=======================================

Here you will find all the information you need to get started.
The following sections will guide you through the installation process and the usage of the library.

.. toctree::
:maxdepth: 2
:caption: Introduction
@@ -22,6 +16,19 @@ The following sections will guide you through the installation process and the u

getting_started/installation
getting_started/examples

.. toctree::
:maxdepth: 2
:caption: Scrapers

scrapers/graphs
scrapers/llm
scrapers/graph_config

.. toctree::
:maxdepth: 2
:caption: Modules

modules/modules

Indices and tables
2 changes: 1 addition & 1 deletion docs/source/introduction/contributing.rst
@@ -2,7 +2,7 @@ Contributing
============

Hey, you want to contribute? Awesome!
Just fork the repo, make your changes, and send me a pull request.
Just fork the repo, make your changes, and send a pull request.
If you're not sure if it's a good idea, open an issue and we'll discuss it.

Go and check out the `contributing guidelines <https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md>`__ for more information.
32 changes: 18 additions & 14 deletions docs/source/introduction/overview.rst
@@ -1,31 +1,35 @@
.. image:: ../../assets/scrapegraphai_logo.png
:align: center
:width: 50%
:alt: ScrapegraphAI

Overview
========

In a world where web pages are constantly changing and in a data-hungry world there is a need for a new generation of scrapers, and this is where ScrapegraphAI was born.
An opensource library with the aim of starting a new era of scraping tools that are more flexible and require less maintenance by developers, with the use of LLMs.
ScrapeGraphAI is an open-source web scraping Python library designed to usher in a new era of scraping tools.
In today's rapidly evolving and data-intensive digital landscape, this library stands out by integrating LLM and
direct graph logic to automate the creation of scraping pipelines for websites and various local documents, including XML,
HTML, JSON, and more.

.. image:: ../../assets/scrapegraphai_logo.png
:align: center
:width: 100px
:alt: ScrapegraphAI
Simply specify the information you need to extract, and ScrapeGraphAI handles the rest,
providing a more flexible and low-maintenance solution compared to traditional scraping tools.

Why ScrapegraphAI?
==================

ScrapegraphAI in our vision represents a significant step forward in the field of web scraping, offering an open-source solution designed to meet the needs of a constantly evolving web landscape. Here's why ScrapegraphAI stands out:

Flexibility and Adaptability
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages. ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages.
ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
This flexibility ensures that scrapers remain functional even when website layouts change.

We support many Large Language Models (LLMs), including GPT, Gemini, Groq, Azure, and Hugging Face,
as well as local models that can run on your machine using Ollama.

Overview
========
Diagram
=======
With ScrapegraphAI, you first construct a pipeline of the steps you want to execute by combining nodes into a graph.
Executing the graph takes care of the steps that are usually part of scraping, such as fetching and parsing.
Finally, the scraped and processed data is fed to an LLM, which generates a response.
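
The idea described above can be sketched as a toy pipeline. This is not ScrapegraphAI's API: the node functions, the ``run_graph`` helper, and the stubbed fetch and LLM steps are all hypothetical, chosen only to show how a shared state moves through a sequence of nodes. A linear chain is used here as the simplest possible graph.

```python
import re
from typing import Callable, Dict, List

State = Dict[str, str]
Node = Callable[[State], State]


def fetch_node(state: State) -> State:
    # Stub: a real fetch node would download state["source"]; here the
    # source is treated as already-downloaded HTML.
    return {**state, "html": state["source"]}


def parse_node(state: State) -> State:
    # Replace tags with spaces, then collapse whitespace into plain text.
    text = re.sub(r"<[^>]+>", " ", state["html"])
    return {**state, "text": " ".join(text.split())}


def generate_answer_node(state: State) -> State:
    # Stub standing in for the LLM call: combines prompt and parsed text.
    answer = f"{state['prompt']} -> {state['text']}"
    return {**state, "answer": answer}


def run_graph(nodes: List[Node], state: State) -> State:
    # Execute the nodes in order; each receives the previous node's state.
    for node in nodes:
        state = node(state)
    return state


result = run_graph(
    [fetch_node, parse_node, generate_answer_node],
    {
        "prompt": "List the projects",
        "source": "<ul><li>Project A</li><li>Project B</li></ul>",
    },
)
print(result["answer"])
```

Each node returns a new state dictionary instead of mutating its input, which keeps individual steps easy to test and reorder.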

.. image:: ../../assets/project_overview_diagram.png
:align: center
:alt: ScrapegraphAI Overview
:alt: ScrapegraphAI Overview
3 changes: 0 additions & 3 deletions docs/source/modules/modules.rst
@@ -1,6 +1,3 @@
scrapegraphai
=============

.. toctree::
:maxdepth: 4

29 changes: 0 additions & 29 deletions docs/source/modules/yosoai.graphs.rst

This file was deleted.

61 changes: 0 additions & 61 deletions docs/source/modules/yosoai.nodes.rst

This file was deleted.

110 changes: 0 additions & 110 deletions docs/source/modules/yosoai.rst

This file was deleted.
