
dalv-oio/playwright_parser

Parser (Playwright + BeautifulSoup4)

This is a command-line website parser written in Python. It uses Playwright to search for a product starting from the site's main page, then parses the resulting HTML with BeautifulSoup to collect product data (name, price, specifications, color, memory, etc.). It includes error handling and saves the results to JSON/CSV.
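The JSON/CSV export step can be sketched as follows. This is a minimal, hypothetical example: the `Product` dataclass, its field names, and the `save_json`/`save_csv` helpers are illustrative and are not taken from the repository. Because CSV is flat, the sketch stores the specifications dictionary as a JSON string in a single column.

```python
import csv
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Product:
    # Hypothetical record; field names are illustrative, not the project's model.
    name: str
    price: str
    promo_price: Optional[str] = None
    specifications: dict[str, str] = field(default_factory=dict)


def save_json(products: list[Product], path: Path) -> None:
    # Nested fields such as the specifications dict serialize directly to JSON.
    data = [asdict(p) for p in products]
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def save_csv(products: list[Product], path: Path) -> None:
    # CSV rows are flat, so the specifications dict is stored as a JSON string.
    fieldnames = ["name", "price", "promo_price", "specifications"]
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for p in products:
            row = asdict(p)
            row["specifications"] = json.dumps(row["specifications"], ensure_ascii=False)
            writer.writerow(row)  # None (no promo price) is written as an empty cell
```

Keeping the specifications as a real dictionary until the final write means the JSON output stays nested while the CSV output remains one row per product.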

Features

  • Product data collection
    • Full product name
    • Color
    • Storage capacity
    • Manufacturer
    • Regular price
    • Promotional price (if any)
    • All product photos: the photos and the links to them are collected in a list.
    • Product code
    • Number of reviews
    • Screen size
    • Display resolution
    • Product specifications: every specification from the specifications tab, collected as a dictionary.
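Collecting specifications into a dictionary with BeautifulSoup might look like the sketch below. It assumes, hypothetically, that the specifications tab renders a `<table class="specifications">` with one `<th>`/`<td>` pair per row; the real site's markup and selectors will differ, and `parse_specifications` is an illustrative name, not a function from this project.

```python
from bs4 import BeautifulSoup


def parse_specifications(html: str) -> dict[str, str]:
    """Collect name/value pairs from a specifications table into a dict.

    Hypothetical markup: <tr><th>name</th><td>value</td></tr> rows inside
    <table class="specifications">. Adjust the selectors to the real site.
    """
    soup = BeautifulSoup(html, "html.parser")
    specs: dict[str, str] = {}
    for row in soup.select("table.specifications tr"):
        name = row.find("th")
        value = row.find("td")
        if name is None or value is None:
            continue  # skip header or malformed rows instead of failing
        specs[name.get_text(strip=True)] = value.get_text(" ", strip=True)
    return specs
```

Passing `"lxml"` instead of `"html.parser"` switches to the lxml parser listed in the technology stack, provided lxml is installed.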

Technology Stack

  • Backend:
    • Python
    • Django framework
    • PostgreSQL database (via the Django ORM)
    • Playwright + BeautifulSoup4 + lxml

Environment Variables

To run this project, you will need to add the following environment variables:

  • SECRET_KEY=

  • ALLOWED_HOSTS=

  • DEBUG=

  • MEDIA_ROOT=

  • STATIC_ROOT=

  • POSTGRES_DB=

  • POSTGRES_USER=

  • POSTGRES_PASSWORD=

  • POSTGRES_HOST=

  • POSTGRES_PORT=

See .env.example for a template.
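One way settings.py could read these variables is sketched below. It is a hypothetical, stdlib-only example: the helper names, the `localhost`/`5432` defaults, the accepted `DEBUG` spellings and the comma-separated `ALLOWED_HOSTS` format are assumptions, and the project itself may load its .env file differently.

```python
import os
from collections.abc import Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Django DATABASES dict from the POSTGRES_* variables.

    Hypothetical helper; the localhost/5432 defaults are assumptions.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["POSTGRES_DB"],
            "USER": env["POSTGRES_USER"],
            "PASSWORD": env["POSTGRES_PASSWORD"],
            "HOST": env.get("POSTGRES_HOST", "localhost"),
            "PORT": env.get("POSTGRES_PORT", "5432"),
        }
    }


def parse_debug(value: str) -> bool:
    # Environment variables are strings, so "False" must not be treated as truthy.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def parse_allowed_hosts(value: str) -> list[str]:
    # Assumes a comma-separated list, e.g. "localhost,127.0.0.1".
    return [h.strip() for h in value.split(",") if h.strip()]
```

In settings.py this might be used as `DATABASES = database_settings()` and `DEBUG = parse_debug(os.environ.get("DEBUG", ""))`.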

Add the following to PARSER_REQ_BS/my_apps/parser/management/commands/run_parser.py:

  • url = "website url": the URL of the target site's main page.
  • search_input.type(text="full product name", delay=0.3): the full name of the product to search for. Important: the product must be a phone.

Getting Started

To get started with the project, follow these steps:

Note: set the environment variables described above before running the project.

  1. Clone the repository:

    git clone https://github.com/dalv-oio/playwright_parser.git
    
  2. Go to the project directory:

    cd playwright_parser
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Set up the database connection and configuration for the selected database engine, then apply migrations:

    python manage.py makemigrations
    python manage.py migrate
    
  5. Run the parser management command:

    python manage.py run_parser
    

