This parser can:
- Run a vacancy search on Headhunter
- Walk through every page of the search results and find each vacancy
- Collect data from each vacancy
- Export the data to an Excel file
- Incredible!
All this comes with a web interface for your users to work with!
There is also an extra instruction on how to pack it into an exe file!
Tested on Windows!
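To give an idea of what "get all the pagination" means, here is a minimal sketch of a pagination loop. It is not the code from this repository: collect_all_pages, fake_fetch_page and the sample data are hypothetical names made up for illustration, and the real parser may stop on a different condition.

```python
from typing import Callable, Dict, List

Vacancy = Dict[str, str]


def collect_all_pages(fetch_page: Callable[[int], List[Vacancy]],
                      max_pages: int = 100) -> List[Vacancy]:
    """Call fetch_page(0), fetch_page(1), ... until a page comes back empty."""
    collected: List[Vacancy] = []
    for page_number in range(max_pages):
        vacancies = fetch_page(page_number)
        if not vacancies:  # an empty page means we ran past the last one
            break
        collected.extend(vacancies)
    return collected


# Hypothetical stand-in for a real page download: two pages of fake data.
def fake_fetch_page(page_number: int) -> List[Vacancy]:
    fake_pages = [
        [{'title': 'Python developer'}, {'title': 'QA engineer'}],
        [{'title': 'Data analyst'}],
    ]
    return fake_pages[page_number] if page_number < len(fake_pages) else []


all_vacancies = collect_all_pages(fake_fetch_page)
print(len(all_vacancies))
```

The max_pages limit is only a safety net, so that a page that never comes back empty cannot keep the loop running forever.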
Download this shiny repository to wherever you like using git:
- Press Win + R, type cmd or powershell, and press Enter
- Use the cd command to change the current directory to the location where you want to clone the repository
cd path\to\desired\directory
- Use the git clone command to clone the repository:
git clone https://github.com/aaskorohodov/hh_parser.git
Now it's time to install Python!
- Visit the official Python website: https://www.python.org/downloads/.
- Click on the "Downloads" tab.
- Choose the latest version for your system (tested with Python 3.10.xx).
- Scroll down to the Files section and download the installer for Windows (usually a .exe file)
- Double-click the downloaded installer.
- Check the box that says "Add Python to PATH" during installation.
- Click "Install Now" to start the installation.
- Open Command Prompt or PowerShell.
- Type the following command to check the installed Python version:
python --version
You should see the Python version number.
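If you would rather check the version from Python itself, a small sketch like the one below can help. The (3, 10) minimum is an assumption taken from the "tested with Python 3.10.xx" note above, and is_supported is just an illustrative helper name, not something from this repository.

```python
import sys

# Assumption: the project was tested with Python 3.10, so treat that as the minimum.
MIN_VERSION = (3, 10)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Compare only the major and minor numbers against the minimum."""
    return tuple(version_info[:2]) >= minimum


status = 'OK' if is_supported() else 'older than the tested version'
print(sys.version.split()[0], status)
```

Save it to any file and run it with python, or paste it into the interactive interpreter.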
Now you will need a venv!
- Open Command Prompt or PowerShell. If you run into problems with one, try the other!
- To open CMD:
- Press WIN
- Type CMD
- Open this thing!
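- Type the following command to install virtualenv using pip (this step is optional: the python -m venv command used later relies on the venv module that ships with Python):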
pip install virtualenv
- Open Command Prompt or PowerShell and navigate to the downloaded repository (most likely hh_parser).
cd path\to\desired\directory
- Create a virtual environment by running:
python -m venv venv
- In the same Command Prompt or PowerShell window, activate the virtual environment:
.\venv\Scripts\activate
You should see the virtual environment's name in your command prompt.
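If the prompt does not make it obvious, you can also ask Python whether it is running inside a virtual environment. This is a small sketch; in_virtualenv is an illustrative helper name, and the check relies on sys.prefix and sys.base_prefix, which differ when a venv is active.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print('venv active:', in_virtualenv())
print('interpreter:', sys.executable)
```

When the environment is active, the interpreter path should point inside the venv folder of the repository.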
- Install all the required libraries! In the same Command Prompt or PowerShell window:
pip install -r requirements.txt
Make sure that your terminal is in the correct directory! You need the root folder of the repository, where there should be a file named 'requirements.txt'!
Now you can launch this in development mode! In the same Command Prompt or PowerShell window:
python main.py
Make sure that your terminal is in the correct directory! You need the root folder of the repository, where there should be a file named 'main.py'. This is the one you are launching!
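If you are not sure whether the terminal is in the right place, a quick check like this sketch can tell you. It only looks for the two files named above; missing_files and REQUIRED_FILES are illustrative names, not part of the repository.

```python
from pathlib import Path
from typing import List

# The two files this README says must be in the repository root.
REQUIRED_FILES = ('requirements.txt', 'main.py')


def missing_files(folder: Path, required=REQUIRED_FILES) -> List[str]:
    """Return the names of the required files that are not in the folder."""
    return [name for name in required if not (folder / name).is_file()]


missing = missing_files(Path.cwd())
if missing:
    print('Wrong directory? Missing:', ', '.join(missing))
else:
    print('Looks like the repository root.')
```

Run it from the same Command Prompt or PowerShell window, since it checks the current working directory.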
Now, you should see something like this:
Copy this address and open it in your browser! Now you should see this:
Now use it!
Type the vacancy you need, for example:
Wait a bit:
Great! Now you can export this to Excel:
The Excel file will appear in the 'results' folder, which will be created inside the downloaded repository.
Most likely, sooner or later the parser code will become outdated, because websites change their markup from time to time. To troubleshoot this, you will need some Python and HTML skills!
- Make sure the URL is correct! You can find it in Parser._base_url.
- Open the actual page in your browser and check that this URL still works
- Check whether some elements on the page have changed, for example:
- company = vacancy.find('div', {'class': 'vacancy-serp-item__meta-info-company'}).text
- Is it still a div?
- Is the class still 'vacancy-serp-item__meta-info-company'?
That should be it!
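When the markup changes, a line like the company lookup above typically fails with an AttributeError, because find returns None and None has no .text. The sketch below shows one way to make such lookups report what broke instead of crashing. It is not the repository's code: safe_text, broken_selectors and SELECTORS are hypothetical names, and the only selector listed is the one quoted above.

```python
from typing import Dict, List, Optional, Tuple

# Field name -> (tag, CSS class). Only the selector quoted in this README is listed;
# any others would have to be copied from the parser code.
SELECTORS: Dict[str, Tuple[str, str]] = {
    'company': ('div', 'vacancy-serp-item__meta-info-company'),
}


def safe_text(node, tag: str, css_class: str) -> Optional[str]:
    """Return the stripped text of the first match, or None if nothing matches."""
    found = node.find(tag, {'class': css_class})
    if found is None:
        return None
    return found.text.strip()


def broken_selectors(node, selectors=SELECTORS) -> List[str]:
    """List the fields whose selectors no longer match anything in node."""
    return [field for field, (tag, css_class) in selectors.items()
            if node.find(tag, {'class': css_class}) is None]


# Usage, assuming `vacancy` is a BeautifulSoup element as in the parser:
# company = safe_text(vacancy, 'div', 'vacancy-serp-item__meta-info-company')
# print('Selectors to fix:', broken_selectors(vacancy))
```

Both helpers only call find(tag, {'class': ...}) and read .text, the same calls the quoted line already uses, so they should work with any element that line works with.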
To simplify usage, you may want to pack this script into an exe. This way you will be able to send it to a user without that user having to install any additional software such as Python.
Install this:
pip install auto-py-to-exe
Launch a beautiful GUI:
auto-py-to-exe