Custom string data parser written in python


Custom String Parser

Use this component when you need to extract information from a string whose content follows a regular syntax.

The easiest way to parse data from a string in Python.

Overview

CustomStringParser: the missing simple string parser for Python developers.

Usage

Parsing HTML

Note: for HTML-based parsing, you should consider using XPath.

Imagine your string_data contains the following:
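As a rough illustration of the XPath alternative mentioned in the note, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The `fragment` string and the `<root>` wrapper are assumptions made for this example; real-world HTML is often not well-formed XML, in which case a dedicated HTML parser such as lxml would be needed instead.

```python
import xml.etree.ElementTree as ET

# A small, made-up fragment shaped like the HTML used in this README.
# It is wrapped in a single <root> element because ElementTree requires
# well-formed XML with exactly one top-level element.
fragment = """
<root>
<div class="section-item">
    <div class="section-title">title1</div>
    <div class="section-comments">15</div>
</div>
<div class="section-item">
    <div class="section-title">title2</div>
    <div class="section-comments">16</div>
</div>
</root>
""".strip()

root = ET.fromstring(fragment)

items = []
# Select every item block, then read its two child fields.
for item in root.findall(".//div[@class='section-item']"):
    title = item.find("div[@class='section-title']").text.strip()
    comments = int(item.find("div[@class='section-comments']").text.strip())
    items.append((title, comments))

print(items)  # [('title1', 15), ('title2', 16)]
```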

<div class="section-item">
    <div class="section-title">
        title1
    </div> <!-- end section-title -->
    <div class="section-comments">
        15
    </div> <!-- end section-comments -->
</div> <!--end section-item-->
<div class="section-item">
    <div class="section-title">
        title2
    </div> <!-- end section-title -->
    <div class="section-comments">
        16
    </div> <!-- end section-comments -->
</div> <!--end section-item-->
<div class="section-item">
    <div class="section-title">
            title3
    </div> <!-- end section-title -->
    <div class="section-comments">
        17
    </div> <!-- end section-comments -->
</div> <!--end section-item-->

We need to parse these items:

  • title
  • comments count

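The code to parse this looks like:

# assumption: both classes live in custom_string_parser.py, the repository's only module
from custom_string_parser import CustomStringParserCore, ParsingNode

parser = CustomStringParserCore(string_data)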
item_parser = ParsingNode('item', '<div class="section-item">', '</div> <!--end section-item-->')

# the end marker must match the HTML comment exactly ("section-title", with a hyphen)
title_parser = ParsingNode('title', '<div class="section-title">', '</div> <!-- end section-title -->')
comments_parser = ParsingNode('comments', '<div class="section-comments">', '</div> <!-- end section-comments -->')
# note: our item result will have title and comments inside of it, so we can do this:
item_parser.add_parser(title_parser)
item_parser.add_parser(comments_parser)

# add main parser to the parsing core
parser.add_parser(item_parser)

# call the parse
parser.parse()

<..>

Output (print_results(item_parser.results)):

item:

<div class="section-title">
        title1
    </div> <!-- end section-title -->
    <div class="section-comments">
        15
    </div> <!-- end section-comments -->

title:

title1

comments:

15

item:

<div class="section-title">
        title2
    </div> <!-- end section-title -->
    <div class="section-comments">
        16
    </div> <!-- end section-comments -->

title:

title2

comments:

16

item:

<div class="section-title">
            title3
    </div> <!-- end section-title -->
    <div class="section-comments">
        17
    </div> <!-- end section-comments -->

title:

title3

comments:

17

This is very generic, so you can parse practically any structure.
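To illustrate why start and end markers generalize beyond HTML, here is a minimal, self-contained sketch of the same idea in plain Python. It is not this library's implementation: `extract_between` is a hypothetical helper written for this example, and the `BEGIN`/`END` record format is made up.

```python
def extract_between(text, start, end):
    """Return every substring of `text` found between `start` and `end`."""
    results = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break
        begin += len(start)
        finish = text.find(end, begin)
        if finish == -1:
            break
        results.append(text[begin:finish].strip())
        pos = finish + len(end)
    return results


# A made-up, non-HTML input with a regular structure.
data = "BEGIN name=alpha; port=80; END BEGIN name=beta; port=8080; END"

records = []
# Outer pass finds each record; inner passes pull fields out of that record,
# mirroring how a child parser works on its parent's result.
for record in extract_between(data, "BEGIN", "END"):
    name = extract_between(record, "name=", ";")[0]
    port = extract_between(record, "port=", ";")[0]
    records.append((name, port))

print(records)  # [('alpha', '80'), ('beta', '8080')]
```

The nesting here plays the same role as adding the title and comments parsers to the item parser in the example above: inner markers are only searched for inside the text matched by the outer markers.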

Unit tests

This library is supposed to be fully unit tested, so keep that in mind if you want to contribute.

Feature ideas (not yet implemented)

  • Regex-based parsers.
  • Grouped regex-based parsers.
  • XPath-based parsers.
  • Filtering results by parser name.
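Since none of these features exist yet, the following is only a sketch of what grouped regex-based extraction could look like, written with the standard `re` module rather than any API of this library. The pattern, the `item_re` name and the sample `string_data` are assumptions made for illustration.

```python
import re

# Made-up sample in the same shape as the HTML from the usage example.
string_data = """
<div class="section-title">title1</div>
<div class="section-comments">15</div>
<div class="section-title">title2</div>
<div class="section-comments">16</div>
"""

# One pattern with two named groups: each match yields a title and its
# comment count together.
item_re = re.compile(
    r'<div class="section-title">\s*(?P<title>.*?)\s*</div>\s*'
    r'<div class="section-comments">\s*(?P<comments>\d+)\s*</div>',
    re.DOTALL,
)

results = [m.groupdict() for m in item_re.finditer(string_data)]
print(results)
# [{'title': 'title1', 'comments': '15'}, {'title': 'title2', 'comments': '16'}]
```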