parsel_text

parsel_text is a Python library that simplifies extracting text from HTML or XML documents by running XPath queries on parsel Selector objects. It provides a straightforward interface to obtain the text and, optionally, to fix mojibake (garbled text caused by encoding issues).

Installation

To install parsel_text, use pip:

pip install parsel_text

Usage

Function: `parsel_sel_get_text`

This is the main function of the library. It runs an XPath query against a parsel Selector object and returns all of the matching text results.

Parameters

parsel_sel (parsel.Selector): The parsel Selector object from which to extract text.
xpath (str): The XPath query string to specify the text extraction path.
fix_mojibake (bool, optional): A flag to indicate whether to fix mojibake issues in the extracted text. Default is True.

Returns

str: A string containing the concatenated text results from the specified XPath query.
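
To make the fix_mojibake flag more concrete, the sketch below shows what mojibake looks like and how a simple round trip can sometimes undo it. It uses only the Python standard library. The helper names make_mojibake and naive_fix are hypothetical and are not part of parsel_text; this README does not describe how the library repairs text internally, so treat this as an illustration of the problem rather than of the library's method.

```python
# Illustration only: how mojibake arises and how a simple round trip can undo it.
# Assumption: these helpers are hypothetical and are NOT parsel_text's implementation.


def make_mojibake(text: str) -> str:
    """Simulate the bug: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def naive_fix(text: str) -> str:
    """Reverse the wrong decoding; return the input unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not UTF-8 misread as Latin-1, so leave it alone.
        return text


original = "café"
garbled = make_mojibake(original)

print(garbled)             # cafÃ©
print(naive_fix(garbled))  # café
```

A round trip like this only handles one specific mix-up (UTF-8 read as Latin-1). Real-world mojibake can involve other encodings or several layers of damage, which is why a dedicated fix is offered as an option.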

Example

Here's a simple example of how to use the parsel_sel_get_text function:

from parsel import Selector
from parsel_text import parsel_sel_get_text

html_content = """
<html>
  <body>
    <div id="content">
      <p>Hello, world!</p>
      <p>Welcome to the parsel_text library.</p>
    </div>
  </body>
</html>
"""

# Create a parsel Selector object
selector = Selector(text=html_content)

# Define the XPath query
xpath_query = "//div[@id='content']/p//text()"

# Extract text using the parsel_sel_get_text function
extracted_text = parsel_sel_get_text(parsel_sel=selector, xpath=xpath_query)

print(extracted_text)

Output

Hello, world!
Welcome to the parsel_text library.

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository (carlosplanchon/parsel_text).

License

This project is licensed under the MIT License. See the LICENSE file for details.
