Parser (Playwright + BeautifulSoup4)

This is a command-line website parser written in Python. It uses Playwright to search for a product from the site's main page, then parses the resulting HTML with BeautifulSoup to collect product data (name, price, specifications, color, memory, etc.). It handles errors and saves the results to JSON/CSV.

Features

Product data collection
- Full product name
- Color
- Storage capacity
- Manufacturer
- Regular price
- Promotional price (if any)
- All product photos. The photos and their links are collected and saved in a list.
- Product code
- Number of reviews
- Screen size
- Display resolution
- Product specifications. All specifications shown on the product's specifications tab are collected as a dictionary.
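
The fields above could be held in a single record and exported to JSON or CSV. The following is a minimal sketch using only the standard library. The `Product` class, its field names, and the `to_json` / `to_csv` helpers are illustrative assumptions, not the project's actual models or export code.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Optional


@dataclass
class Product:
    """One scraped product (hypothetical field names)."""
    full_name: str
    color: str = ""
    storage: str = ""
    manufacturer: str = ""
    regular_price: Optional[float] = None
    promo_price: Optional[float] = None  # None when there is no promotion
    photos: list = field(default_factory=list)           # list of photo URLs
    product_code: str = ""
    review_count: int = 0
    screen_size: str = ""
    resolution: str = ""
    specifications: dict = field(default_factory=dict)   # spec name -> value


def to_json(products):
    """Serialize products to a JSON array, keeping lists and dicts nested."""
    return json.dumps([asdict(p) for p in products], ensure_ascii=False, indent=2)


def to_csv(products):
    """Serialize products to CSV text, one row per product."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Product)])
    writer.writeheader()
    for product in products:
        row = asdict(product)
        # CSV cells are flat, so nested values are stored as JSON strings.
        row["photos"] = json.dumps(row["photos"])
        row["specifications"] = json.dumps(row["specifications"], ensure_ascii=False)
        writer.writerow(row)
    return buffer.getvalue()
```

In this sketch a missing promotional price is written as `null` in JSON and as an empty cell in CSV.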

Technology Stack

Backend:
- Python programming language;
- Django framework;
- PostgreSQL database (Django ORM);
- Playwright + BeautifulSoup4 + lxml.

Environment Variables

To run this project, you will need to add the following environment variables:

SECRET_KEY=
ALLOWED_HOSTS=
DEBUG=
MEDIA_ROOT=
STATIC_ROOT=
POSTGRES_DB=
POSTGRES_USER=
POSTGRES_PASSWORD=
POSTGRES_HOST=
POSTGRES_PORT=

See .env.example for a template.
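
As an illustration of how these variables are typically consumed, here is a minimal sketch of Django-style settings helpers. The helper names, the default host and port, and the comma-separated format for ALLOWED_HOSTS are assumptions. The project's own settings module may read the variables differently.

```python
import os


def parse_debug(value):
    """Interpret DEBUG as a boolean; accepted spellings are an assumption."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def parse_hosts(value):
    """Split ALLOWED_HOSTS on commas (assumed format), dropping empty items."""
    return [host.strip() for host in value.split(",") if host.strip()]


def database_settings(env=os.environ):
    """Build a Django DATABASES dict from the POSTGRES_* variables."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["POSTGRES_DB"],
            "USER": env["POSTGRES_USER"],
            "PASSWORD": env["POSTGRES_PASSWORD"],
            # Fallback values below are assumptions for local development.
            "HOST": env.get("POSTGRES_HOST", "localhost"),
            "PORT": env.get("POSTGRES_PORT", "5432"),
        }
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching the real environment.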

Set the following in PARSER_REQ_BS / my_apps / parser / management / commands / run_purser.py :

url = " website url "
search_input.type(text="write product full name", delay=0.3)

Important: the product name must be a phone name.
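
For context, the search step these two settings feed into might look roughly like the sketch below, which uses Playwright's synchronous API. The function name, the `input_selector` default, and the wait strategy are assumptions; the real command may locate the search field and wait for results differently. Note that Playwright's `delay` is measured in milliseconds, so the sketch uses a hypothetical 300 ms default.

```python
def fetch_search_html(url, product_name,
                      input_selector="input[type='search']", delay_ms=300):
    """Type a product name into the site's search box and return the result HTML.

    input_selector is a placeholder; adjust it to the target site's markup.
    """
    # Imported here so the module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            search_input = page.locator(input_selector)
            # delay is the pause between key presses, in milliseconds.
            search_input.type(product_name, delay=delay_ms)
            search_input.press("Enter")
            # Wait until network activity settles before reading the page.
            page.wait_for_load_state("networkidle")
            return page.content()
        finally:
            browser.close()
```

The returned HTML string is what would then be handed to BeautifulSoup for parsing.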

Getting Started

To get started with the project, follow these steps:

Note: Don't forget to set the environment variables.

Clone the repository:

git clone https://github.com/dalv-oio/playwright_parser.git

Go to the project directory:
```
cd playwright_parser
```
Install the required dependencies:
```
pip install -r requirements.txt
```
Set up the database connection and configuration for the selected database engine, then apply the migrations:
```
python manage.py makemigrations
python manage.py migrate
```
Run the parser:
```
python manage.py run_parser
```

SOCIAL
