IgorGakhov/PageLoader

PageLoader


The training project "PageLoader" on the Python Development course on Hexlet.io.

Built With

Languages, frameworks and libraries used in the implementation of the project:

Dependencies

List of dependencies, without which the project code will not work correctly:

  • python = "^3.8"
  • requests = "^2.28.1"
  • beautifulsoup4 = "^4.11.1"
  • progress = "^1.6"

Description

PageLoader is a command-line utility that downloads pages from the Internet and saves them to your computer. Along with the page, it downloads all of its resources (images, stylesheets, and scripts), so the page can be opened without an Internet connection.

Browsers save pages on the same principle.

The utility downloads resources in multiple threads and shows the progress for each resource in the terminal.
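
The multi-threaded download described above could look roughly like the following. This is a minimal sketch, not the project's actual implementation: `download_all` and the injected `fetch` callable are hypothetical names, and a plain counter line stands in for the progress bar.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Fetch every URL in a thread pool and report progress as each finishes.

    `fetch` is a hypothetical callable that takes a URL and returns its
    content; it is injected so the sketch does not depend on the network.
    """
    results = {}
    total = len(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is downloading.
        futures = {pool.submit(fetch, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            results[url] = future.result()  # re-raises any error from fetch
            print(f'Resources Loading [{done}/{total}] {url}')
    return results
```

In real use, `fetch` might be a small wrapper around `requests.get(url).content`; passing it in as a parameter also makes the function easy to test with a stub.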


Installation

Python

Before installing the package, you need to make sure that you have Python version 3.8 or higher installed:

# Windows, Ubuntu, MacOS:
>> python --version # or python -V
Python 3.8.0+

⚠️ If a command without a version does not work, specify the Python version explicitly: python3 --version.
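
The same check can be done from Python itself. The snippet below is a small sketch; `meets_minimum` and `MINIMUM_VERSION` are illustrative names, not part of the package.

```python
import sys

MINIMUM_VERSION = (3, 8)  # taken from the python = "^3.8" dependency


def meets_minimum(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the given (or running) interpreter is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) pair.
    return tuple(version_info[:2]) >= minimum


if __name__ == '__main__':
    print('OK' if meets_minimum() else 'Python 3.8 or higher is required')
```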

If you have an older version installed, update with the following commands:

# Windows: pip cannot upgrade the interpreter itself, so download and run the latest installer from the official Python website
# Ubuntu:
>> sudo apt-get upgrade python3.X
# MacOS:
>> brew update && brew upgrade python
# * X - version number to be installed

If you don't have Python installed, you can download and install it from the official Python website. On Ubuntu or MacOS, it is better to install it through a package manager. Open a terminal and run the command for your operating system:

# Ubuntu:
>> sudo apt update
>> sudo apt install python3.X
# MacOS:
# https://brew.sh/index_ru.html
>> brew install python@3.X
# * X - version number to be installed

❗ Operating system builds and versions differ considerably, so a single universal instruction is not possible. If you are running an OS other than those listed above, or you get errors after the suggested commands, search Stack Overflow for answers: someone may have run into the same problem before you. Setting up the environment is not easy! 🙂

Poetry

The project uses the Poetry manager. Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. You can read more about this tool on the official Poetry website.

Poetry provides a custom installer that will install poetry isolated from the rest of your system by vendorizing its dependencies. This is the recommended way of installing poetry.

# Windows (WSL), Linux, MacOS:
>> curl -sSL https://install.python-poetry.org | python3 -
# Windows (Powershell):
>> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
# If you have installed Python through the Microsoft Store, replace "py" with "python" in the command above.

⚠️ On some systems, python may still refer to Python 2 instead of Python 3. The Poetry team suggests using the python3 binary to avoid ambiguity.

⚠️ By default, Poetry is installed into a platform and user-specific directory:

  • ~/Library/Application Support/pypoetry on MacOS.
  • ~/.local/share/pypoetry on Linux/Unix.
  • %APPDATA%\pypoetry on Windows.

If you wish to change this, you may define the $POETRY_HOME environment variable:

>> curl -sSL https://install.python-poetry.org | POETRY_HOME=/etc/poetry python3 -

Add Poetry to your PATH.

Once Poetry is installed and in your $PATH, you can execute the following:

>> poetry --version

Project package

To work with the package, clone the repository to your computer with the git clone command:

# clone via HTTPS:
>> git clone https://github.com/IgorGakhov/python-project-51.git
# clone via SSH:
>> git clone git@github.com:IgorGakhov/python-project-51.git

Then change into the project directory, build the package, and install it:

>> cd python-project-51
>> poetry build
>> python3 -m pip install --user dist/*.whl
# If you have previously installed a package and want to update it, use the following command:
# >> python3 -m pip install --user --force-reinstall dist/*.whl

Finally, we can move on to using the project functionality!


Usage

As external library

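from page_loader import download
url_address = 'https://page-loader.hexlet.repl.co/'  # page to download
destination = '/home/user/page_storage'  # existing output directory
file_path = download(url_address, destination)  # path to the saved page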

As CLI tool

Help

If you need a reminder of the available options, run the help command:

>> page-loader --help
usage: page-loader [-h] [--output DESTINATION] url_address

Downloads the page from the network and puts it in the specified existing directory (default: working directory).

positional arguments:
  url_address           page being downloaded

options:
  -h, --help            show this help message and exit
  --output DESTINATION  output directory (default: current dir)
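
An interface like the one in the help text above can be declared with `argparse`. The following is a sketch under that assumption; the project's own `cli.py` may be organized differently, and `build_parser` is a hypothetical name.

```python
import argparse
import os


def build_parser():
    """Declare one positional URL and an optional output directory."""
    parser = argparse.ArgumentParser(
        prog='page-loader',
        description='Downloads the page from the network and puts it in '
                    'the specified existing directory '
                    '(default: working directory).',
    )
    parser.add_argument('url_address', help='page being downloaded')
    parser.add_argument(
        '--output',
        dest='destination',  # shown as DESTINATION in the usage line
        default=os.getcwd(),
        help='output directory (default: current dir)',
    )
    return parser
```

Calling `build_parser().parse_args()` returns a namespace with `url_address` and `destination` attributes, which can be passed straight to `download`.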

Demo

⚡ Only absolute file paths are supported.

📌 Page loading

The utility downloads resources and shows the progress of each resource in the terminal.

Example:

>> page-loader --output /home/user/page_storage https://page-loader.hexlet.repl.co/
12:41:24 INFO: Initiated download of page https://page-loader.hexlet.repl.co/ to local directory «/home/user/page_storage» ...
12:41:25 INFO: Response from page https://page-loader.hexlet.repl.co/ received.
Page available for download!
Resources Loading |████████                        | 25%   [1/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/script.js saved successfully!
Resources Loading |████████████████                | 50%   [2/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/assets/professions/nodejs.png saved successfully!
Resources Loading |████████████████████████        | 75%   [3/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/assets/application.css saved successfully!
Resources Loading |████████████████████████████████| 100%   [4/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/courses saved successfully!

12:41:26 INFO: FINISHED! Loading is complete successfully!
The downloaded page is located in the «/home/user/page_storage/page-loader-hexlet-repl-co.html» file.

/home/user/page_storage/page-loader-hexlet-repl-co.html
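
In the example, the page `https://page-loader.hexlet.repl.co/` is saved as `page-loader-hexlet-repl-co.html`. One way to derive such a name is sketched below. The rule is inferred from this single example, and `url_to_filename` is a hypothetical helper, not necessarily what `name_converter.py` does.

```python
import re
from urllib.parse import urlparse


def url_to_filename(url, extension='.html'):
    """Turn a page URL into a flat file name (assumed naming rule)."""
    parsed = urlparse(url)
    # Drop the scheme; keep host and path.
    raw = parsed.netloc + parsed.path
    # Replace every run of non-alphanumeric characters with a hyphen.
    slug = re.sub(r'[^A-Za-z0-9]+', '-', raw).strip('-')
    return slug + extension


print(url_to_filename('https://page-loader.hexlet.repl.co/'))
# prints: page-loader-hexlet-repl-co.html
```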


Development

Dev Dependencies

List of dev-dependencies:

  • flake8 = "^4.0.1"
  • pytest = "^7.1.3"
  • pytest-cov = "^3.0.0"
  • requests-mock = "^1.10.0"

Project Organization

>> tree .
.
├── page_loader
│   ├── __init__.py
│   ├── load_processor
│   │   ├── __init__.py
│   │   ├── downloader.py
│   │   ├── file_system_guide.py
│   │   ├── html_parser.py
│   │   ├── name_converter.py
│   │   ├── data_loader.py
│   │   └── saver.py
│   ├── cli.py
│   ├── logger.py
│   ├── progress.py
│   └── scripts
│       ├── __init__.py
│       └── run.py
├── tests
│   ├── auxiliary.py
│   ├── fixtures
│   │   ├── downloaded_nodejs_course.html
│   │   └── mocks
│   │       ├── assets-application.css
│   │       ├── assets-professions-nodejs.png
│   │       ├── courses.html
│   │       ├── packs-js-runtime.js
│   │       └── source_nodejs_course.html
│   ├── test_cli.py
│   ├── test_downloader.py
│   ├── test_file_system_guide.py
│   └── test_html_parser.py
├── journal.log
├── Makefile
├── poetry.lock
├── pyproject.toml
├── README.md
└── setup.cfg

Useful commands

The commands most used in development are listed in the Makefile:

make package-install
Installing a package in the user environment.
make build
Building the distribution of the Poetry package.
make package-force-reinstall
Reinstalling the package in the user environment.
make lint
Checking code with linter.
make test
Running the tests.
make fast-check
Building the distribution, reinstalling it in the user environment, and checking the code with tests and the linter.

Thank you for your attention!

👨‍💻 Author: @IgorGakhov