Merged
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,10 @@
# Release notes

## v1.1.0

- Renamed package from `djc-core-html-parser` to `djc-core`
Contributor Author

This also means that the package that django-components will depend on will be renamed to djc-core. But the public API of djc-core remains the same.

- Refactored project into a monorepo

## v1.0.3

- Update to Python 3.14
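
The v1.1.0 entry renames the package, and the review comment notes that the public API stays the same, so downstream code mainly needs to change its import name. As a minimal sketch (not part of this PR), a consumer that must support both names during a transition could try the new module first and fall back to the old one. The helper `import_first_available` is hypothetical and introduced only for illustration:

```python
import importlib
from types import ModuleType


def import_first_available(*module_names: str) -> ModuleType:
    """Return the first module in `module_names` that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # Try the next candidate name.
    raise ImportError(f"None of these modules could be imported: {module_names}")


# Hypothetical downstream usage: prefer the new name, fall back to the old one.
# _core = import_first_available("djc_core", "djc_core_html_parser")
# set_html_attributes = _core.set_html_attributes
```

Code that only targets v1.1.0 or later can skip the helper and import from `djc_core` directly.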
11 changes: 10 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

17 changes: 7 additions & 10 deletions Cargo.toml
@@ -1,14 +1,11 @@
[package]
name = "djc_core_html_parser"
version = "1.0.3"
edition = "2021"
[workspace]
members = [
"crates/djc-core",
Contributor Author

djc-core is the public API of this project. It's what will get exposed to Python. Inside it, it just re-exports other Rust crates.

"crates/djc-html-transformer",
]
resolver = "2"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "djc_core_html_parser"
crate-type = ["cdylib"]

[dependencies]
[workspace.dependencies]
pyo3 = { version = "0.27.0", features = ["extension-module"] }
quick-xml = "0.38.3"

26 changes: 16 additions & 10 deletions README.md
@@ -1,21 +1,27 @@
# djc-core-html-parser
# djc-core

[![PyPI - Version](https://img.shields.io/pypi/v/djc-core-html-parser)](https://pypi.org/project/djc-core-html-parser/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/djc-core-html-parser)](https://pypi.org/project/djc-core-html-parser/) [![PyPI - License](https://img.shields.io/pypi/l/djc-core-html-parser)](https://github.com/django-components/djc-core-html-parser/blob/master/LICENSE/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/djc-core-html-parser)](https://pypistats.org/packages/djc-core-html-parser) [![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/django-components/djc-core-html-parser/tests.yml)](https://github.com/django-components/djc-core-html-parser/actions/workflows/tests.yml)
[![PyPI - Version](https://img.shields.io/pypi/v/djc-core)](https://pypi.org/project/djc-core/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/djc-core)](https://pypi.org/project/djc-core/) [![PyPI - License](https://img.shields.io/pypi/l/djc-core)](https://github.com/django-components/djc-core/blob/master/LICENSE/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/djc-core)](https://pypistats.org/packages/djc-core) [![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/django-components/djc-core/tests.yml)](https://github.com/django-components/djc-core/actions/workflows/tests.yml)

HTML parser used by [django-components](https://github.com/django-components/django-components). Written in Rust, exposed as a Python package with [maturin](https://www.maturin.rs/).

This implementation was found to be 40-50x faster than our Python implementation, taking ~90ms to parse 5 MB of HTML.
Rust-based parsers and tooling used by [django-components](https://github.com/django-components/django-components). Exposed as a Python package with [maturin](https://www.maturin.rs/).

## Installation

```sh
pip install djc-core-html-parser
pip install djc-core
```

## Usage
## Packages

### HTML transformer

Transform HTML in a single pass. This is a simple implementation.

This implementation was found to be 40-50x faster than our Python implementation, taking ~90ms to parse 5 MB of HTML.

**Usage**

```python
from djc_core_html_parser import set_html_attributes
from djc_core import set_html_attributes

html = '<div><p>Hello</p></div>'
result, _ = set_html_attributes(
@@ -39,7 +45,7 @@ Then, during the HTML transformation, we check each element for this attribute.
2. Record the attributes that were added to the element, using the value of the watched attribute as the key.

```python
from djc_core_html_parser import set_html_attributes
from djc_core import set_html_attributes

html = """
<div data-watch-id="123">
@@ -117,4 +123,4 @@ To publish a new version of the package, you need to:

1. Bump the version in `pyproject.toml` and `Cargo.toml`
2. Open a PR and merge it to `main`.
3. Create a new tag on the `main` branch with the new version number (e.g. `v1.0.0`), or create a new release in the GitHub UI.
3. Create a new tag on the `main` branch with the new version number (e.g. `1.0.0`), or create a new release in the GitHub UI.
14 changes: 14 additions & 0 deletions crates/djc-core/Cargo.toml
@@ -0,0 +1,14 @@
[package]
name = "djc-core"
version = "1.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "djc_core"
crate-type = ["cdylib"]

[dependencies]
djc-html-transformer = { path = "../djc-html-transformer" }
pyo3 = { workspace = true }
quick-xml = { workspace = true }
9 changes: 9 additions & 0 deletions crates/djc-core/src/lib.rs
@@ -0,0 +1,9 @@
use djc_html_transformer::set_html_attributes;
use pyo3::prelude::*;

/// A Python module implemented in Rust for high-performance transformations.
#[pymodule]
fn djc_core(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_function(wrap_pyfunction!(set_html_attributes, m)?)?;
Ok(())
}
8 changes: 8 additions & 0 deletions crates/djc-html-transformer/Cargo.toml
@@ -0,0 +1,8 @@
[package]
name = "djc-html-transformer"
version = "1.0.3"
edition = "2021"

[dependencies]
pyo3 = { workspace = true }
quick-xml = { workspace = true }
File renamed without changes.
8 changes: 4 additions & 4 deletions djc_core_html_parser/__init__.py → djc_core/__init__.py
@@ -2,8 +2,8 @@
# This file is what maturin auto-generates. But it seems maturin omits it when we have a __init__.pyi file.
# So we have to manually include it here.

from .djc_core_html_parser import *
from .djc_core import *

__doc__ = djc_core_html_parser.__doc__
if hasattr(djc_core_html_parser, "__all__"):
__all__ = djc_core_html_parser.__all__
__doc__ = djc_core.__doc__
if hasattr(djc_core, "__all__"):
__all__ = djc_core.__all__
File renamed without changes.
File renamed without changes.
22 changes: 12 additions & 10 deletions pyproject.toml
@@ -3,8 +3,8 @@ requires = ["maturin>=1.8,<2.0"]
build-backend = "maturin"

[project]
name = "djc_core_html_parser"
version = "1.0.3"
name = "djc_core"
version = "1.1.0"
requires-python = ">=3.8, <4.0"
description = "HTML parser used by django-components written in Rust."
keywords = ["django", "components", "html"]
@@ -31,17 +31,19 @@ license = {text = "MIT"}

# See https://docs.pypi.org/project_metadata/#icons
[project.urls]
Homepage = "https://github.com/django-components/djc-core-html-parser/"
Changelog = "https://github.com/django-components/djc-core-html-parser/blob/main/CHANGELOG.md"
Issues = "https://github.com/django-components/djc-core-html-parser/issues"
Homepage = "https://github.com/django-components/djc-core/"
Changelog = "https://github.com/django-components/djc-core/blob/main/CHANGELOG.md"
Issues = "https://github.com/django-components/djc-core/issues"
Donate = "https://github.com/sponsors/EmilStenstrom"

[tool.maturin]
# This is the crate that will be exposed to Python
manifest-path = "crates/djc-core/Cargo.toml"
features = ["pyo3/extension-module"]
include = [
"djc_core_html_parser/__init__.py",
"djc_core_html_parser/__init__.pyi",
"djc_core_html_parser/py.typed",
"djc_core/__init__.py",
"djc_core/__init__.pyi",
"djc_core/py.typed",
]

[tool.black]
@@ -67,7 +69,7 @@ profile = "black"
line_length = 119
multi_line_output = 3
include_trailing_comma = "True"
known_first_party = "djc_core_html_parser"
known_first_party = "djc_core"

[tool.flake8]
ignore = ['E302', 'W503']
@@ -92,7 +94,7 @@ exclude = [
]

[[tool.mypy.overrides]]
module = "djc_core_html_parser.*"
module = "djc_core.*"
disallow_untyped_defs = true


10 changes: 0 additions & 10 deletions src/lib.rs

This file was deleted.

2 changes: 1 addition & 1 deletion tests/benchmark.py
@@ -1,7 +1,7 @@
from statistics import mean, stdev
import time

from djc_core_html_parser import set_html_attributes
from djc_core import set_html_attributes


def generate_large_html(num_elements: int = 1000) -> str:
@@ -1,7 +1,7 @@
# This same set of tests is also found in django-components, to ensure that
# this implementation can be replaced with the django-components' pure-python implementation

from djc_core_html_parser import set_html_attributes
from djc_core import set_html_attributes
from typing import Dict, List

