Dynamic_Scraper_with_AgentQL-ScrapeGraph

What is this?

This is an AI-powered web scraping tool that extracts information dynamically from web pages. It detects pagination automatically and should be able to scrape most multi-page websites by itself.

Set up the .env file:

If you don't have a .env file, create one by copying the .env.example file and filling in the details.
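The copy step above can also be scripted. The following is a minimal sketch, not part of this repository: the helper name `ensure_env_file` is hypothetical, and it assumes the example file is named .env.example as described above. It only copies the file; you still need to fill in the values by hand.

```python
import shutil
from pathlib import Path


def ensure_env_file(project_dir=".", example_name=".env.example", env_name=".env"):
    """Create the .env file from the example file if it does not exist yet.

    Returns True if a new .env file was created, False if one was already there.
    (Hypothetical helper, not part of the repository.)
    """
    root = Path(project_dir)
    env_path = root / env_name
    example_path = root / example_name

    if env_path.exists():
        # Never overwrite an existing .env file: it may already hold real keys.
        return False
    if not example_path.exists():
        raise FileNotFoundError(f"Example file not found: {example_path}")

    shutil.copyfile(example_path, env_path)
    return True
```

Calling `ensure_env_file()` from the project folder leaves an existing .env file untouched, so it is safe to run more than once.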

Installation

On Windows, create a Python virtual environment from Command Prompt:

python -m venv env

Activate the virtual environment:

.\env\Scripts\activate

Install the required dependencies using pip install -r requirements.txt

Usage

Create a configuration file as described in AnExample_CONFIG_FILE.yml.
Run the tool using python main.py. Make sure the config file is set up correctly first.
The tool scrapes the specified URLs, saves the results in a JSON file, and prints the results in the terminal.
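To illustrate the last step (saving results to a JSON file and showing them in the terminal), here is a minimal sketch using only the standard library. It is not the code in main.py: the function name `save_and_print_results`, the default file name `results.json`, and the shape of the results are all assumptions made for illustration.

```python
import json
from pathlib import Path


def save_and_print_results(results, output_path="results.json"):
    """Write results to a JSON file and print the same JSON to the terminal.

    `results` can be any JSON-serializable object, for example a list of dicts.
    (Hypothetical helper; the real tool may structure its output differently.)
    """
    text = json.dumps(results, indent=2, ensure_ascii=False)
    path = Path(output_path)
    path.write_text(text, encoding="utf-8")
    print(text)
    return path
```

For example, `save_and_print_results([{"url": "https://example.com", "title": "Example"}])` would write that one made-up record to results.json and print it.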

If you run into an error during level1 scraping:

You can watch what the browser is doing by changing this line:

# To see the browser while it scrapes, set headless=False.
# This line is in level1_scraper.py, line 52 (Ctrl + Shift + F for this comment to find it).
browser = p.chromium.launch(headless=True)
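If you switch between headless and visible mode often, one option is to read the setting from an environment variable instead of editing the file each time. This is only a sketch, assuming the scraper uses Playwright's sync API (as `p.chromium.launch` suggests). The variable name `SCRAPER_HEADLESS` and both function names are hypothetical and do not exist in level1_scraper.py.

```python
import os


def headless_from_env(environ=None, var_name="SCRAPER_HEADLESS"):
    """Return False only when the variable is set to a 'falsy' string.

    SCRAPER_HEADLESS is a made-up variable name used for this sketch.
    If the variable is missing, the browser stays headless (the current default).
    """
    environ = os.environ if environ is None else environ
    value = environ.get(var_name, "true").strip().lower()
    return value not in {"0", "false", "no", "off"}


def open_page_title(url):
    """Open a URL in Chromium and return the page title."""
    # Imported here so headless_from_env() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless_from_env())
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

With this in place, running `set SCRAPER_HEADLESS=false` in Command Prompt before starting the script would open a visible browser window.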

For future Ali, and for anyone who is getting errors:

For some reason it works fine with Python 3.9.13, and it is not clear why. If it does not work for you, make sure your Python version is 3.11 or higher, but do not use 3.13: the ScrapeGraph API is not compatible with 3.13. If you need to downgrade your Python version, you can do it here: https://www.python.org/downloads/

Check your Python version by running this in Command Prompt, with your virtual environment activated:

python --version
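The same check can be done from inside Python, for example at the top of a script. This is a small sketch with a hypothetical function name, not something main.py currently does. It encodes only the recommended range from the note above (3.11 or higher, but below 3.13), so it would also flag 3.9.13 even though that version has worked here.

```python
import sys


def python_version_supported(version=None):
    """Return True if the version is in the recommended range: 3.11 <= version < 3.13."""
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    return (3, 11) <= (major, minor) < (3, 13)


if __name__ == "__main__":
    if not python_version_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: Python {found} is outside the recommended range (3.11 or 3.12).")
```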

"# Hackathon_Insta_Scraper"
