AI Web Scraper 🤖

An AI web scraper built with LangChain, HuggingFace, Selenium, and other tools.

Usage

Install the required packages: pip install -r requirements.txt.
Set the environment variables as explained below.
Run the Streamlit app: streamlit run streamlit_main.py.
Enter a URL and a description of what you want to parse from the website.
The app will scrape the website, extract the relevant text, and use the HuggingFace model to parse the text.
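
The "extract the relevant text" and "parse" steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in scrape.py or parse_LLM.py, which may use different libraries. The names extract_text, split_into_chunks, and build_prompt, the 6000-character chunk size, and the prompt wording are all hypothetical. Fetching the page with Selenium and calling the HuggingFace model are left out.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def split_into_chunks(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into pieces small enough to send to a model one at a time.

    The 6000-character default is an assumption, not a value from the project.
    """
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def build_prompt(chunk: str, description: str) -> str:
    """Combine one text chunk with the user's description (hypothetical wording)."""
    return (
        "Extract only the information matching this description: "
        f"{description}\n\nText:\n{chunk}"
    )
```

Each prompt returned by build_prompt would then be sent to the model named by HUGGINGFACE_MODEL_ID, and the per-chunk answers joined into the final result.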

The AI Web Scraper uses the following environment variables:

HUGGINGFACE_MODEL_ID: The ID of the HuggingFace model to use for parsing the text.
HUGGINGFACEHUB_API_TOKEN: HuggingFace Hub API token.
SBR_WEBDRIVER (Optional for captcha support): The URL of the Bright Data Webdriver to use for solving captchas.
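
As a rough sketch of how an app could read and validate these variables at startup, the snippet below uses only the standard library. It assumes the token variable is spelled HUGGINGFACEHUB_API_TOKEN, the name LangChain's HuggingFace Hub integration reads. The load_settings helper is hypothetical and is not taken from the project's code.

```python
import os
from typing import Dict, Mapping, Optional

# Variables the app cannot run without.
REQUIRED_VARS = ("HUGGINGFACE_MODEL_ID", "HUGGINGFACEHUB_API_TOKEN")
# Variables that only enable extra features (captcha solving).
OPTIONAL_VARS = ("SBR_WEBDRIVER",)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Read the scraper's settings from a mapping (defaults to os.environ).

    Raises RuntimeError listing every required variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings: Dict[str, Optional[str]] = {name: env[name] for name in REQUIRED_VARS}
    for name in OPTIONAL_VARS:
        # Unset or empty optional variables become None.
        settings[name] = env.get(name) or None
    return settings
```

Calling load_settings() with no arguments reads the real process environment. Passing a plain dict instead makes the check easy to test without changing the environment.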

The AI Web Scraper is built with LangChain, HuggingFace, Selenium, and Streamlit, among other tools.
