Parsinator

Parsinator turns structured and unstructured text into a header-detail representation. For example, you could use Parsinator to create an XML file from a PDF file or a C# object from a printer spool file. In general, you can use Parsinator to extract relevant data from any text.

Why

Parsinator extracts relevant data from text files based on defined rules. It doesn't use any OCR technology.

You parse a text file by composing small functions that read or ignore text at the page or line level. Parsinator was heavily inspired by functional parser combinators.

Read more about the motivation behind Parsinator in "Parsinator, a tale of a pdf parser".

Usage

Parsinator uses three types of entities:

Skipper: It removes chunks of text from the text to parse
Parser: It captures text based on a pattern
Transformation: It flattens lines spanning multiple pages

Parsinator provides a set of basic skippers, parsers, and transformations, but you can also write your own. The Wiki lists all available entities.
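To make the division of labor between the three entity types concrete, here is a minimal plain-C# sketch of how a skipper, a transformation, and a parser could compose over the list-of-pages input used in the examples below. It does not call Parsinator: SkipFirstLineOfEachPage, FlattenPages, and ParseFirstMatch are hypothetical stand-ins, and returning the first capture group of a match is an assumption inferred from the examples, not taken from the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Input shape used throughout this README: a list of pages, each page a list of lines.
var pages = new List<List<string>>
{
    new List<string> { "ACME Corp. Page 1", "Name: Alice" },
    new List<string> { "ACME Corp. Page 2", "Any text Address: Wonderland" }
};

// Hypothetical skipper: drops the first line (e.g. a repeated page header) of every page.
List<List<string>> SkipFirstLineOfEachPage(List<List<string>> input) =>
    input.Select(page => page.Skip(1).ToList()).ToList();

// Hypothetical transformation: merges all pages into a single page, so data
// that continues across a page break can be read as contiguous lines.
List<List<string>> FlattenPages(List<List<string>> input) =>
    new List<List<string>> { input.SelectMany(page => page).ToList() };

// Hypothetical parser: returns the first capture group of the first matching line, or null.
string? ParseFirstMatch(List<List<string>> input, Regex pattern) =>
    input.SelectMany(page => page)
         .Select(line => pattern.Match(line))
         .Where(match => match.Success)
         .Select(match => match.Groups[1].Value)
         .FirstOrDefault();

// Compose the steps: skip, then transform, then parse.
var prepared = FlattenPages(SkipFirstLineOfEachPage(pages));
var name = ParseFirstMatch(prepared, new Regex(@"^Name: (\w+)$"));
var address = ParseFirstMatch(prepared, new Regex(@"Address: (\w+)$"));

Console.WriteLine($"{name} / {address}");
```

Keeping each step a plain function from pages to pages is what lets them be chained in any order; parsers run last, on whatever text the skippers and transformations leave behind.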

Parse patterns

Parsinator finds text matching a regular expression in a given page or line.

using System.Collections.Generic;
using System.Text.RegularExpressions;
using Parsinator;

var lines = new List<List<string>>
{
    new List<string>
    {
        "Any text",
        "Name: Alice",
        "Any text",
        "Any text Address: Wonderland"
    }
};

var parsers = new Dictionary<string, IEnumerable<IParser>>
{
    {
        "PersonalData",
        new List<IParser>
        {
            new ParseFromLineWithRegex(key: "FullName", lineNumber: 2, pattern: new Regex(@"^Name: (\w+)$")),
            new ParseFromRegex(key: "Address", pattern: new Regex(@"Address: (\w+)$"))
        }
    }
};
var parsinator = new Parser(parsers);
Dictionary<string, Dictionary<string, string>> parsed = parsinator.Parse(lines);

Assert.IsTrue(parsed.ContainsKey("PersonalData"));
Assert.AreEqual("Alice", parsed["PersonalData"]["FullName"]);
Assert.AreEqual("Wonderland", parsed["PersonalData"]["Address"]);

Use a Fluent Interface

Alternatively, Parsinator has a fluent API to create skippers and parsers. Refer to the Parsinator.Fluent namespace for all supported skippers and parsers.

using System.Collections.Generic;
using System.Text.RegularExpressions;
using Parsinator;
using Parsinator.Fluent;

var parsers = new Dictionary<string, IEnumerable<IParser>>
{
    {
        "PersonalData",
        new List<IParser>
        {
            Parse.Key("FullName").FromLine(2).Regex(new Regex(@"^Name: (\w+)$")),
            Parse.Key("Address").Regex(new Regex(@"Address: (\w+)$"))
        }
    }
};

Create Xml

Parsinator relies on DataSet and DataTable to build XML files from parsed values. It provides extension methods to build tables and columns.
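For readers who want to see what the DataSet side involves, the following sketch builds the same Author/PersonalInfo structure with plain System.Data and copies a parsed dictionary into it. It does not call Parsinator's WithTable, WithColumn, or ToDataSet; the row-copying loop is a hypothetical approximation of such a mapping, and mapping the column as an XML attribute is an assumption chosen to match the attribute form of the expected XML in the example.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Parsed values in the shape returned by Parse: group name -> (key -> value).
var parsed = new Dictionary<string, Dictionary<string, string>>
{
    ["PersonalInfo"] = new Dictionary<string, string> { ["Name"] = "Alice" }
};

// The DataSet name becomes the XML root element; each DataTable row becomes a child element.
var dataSet = new DataSet("Author");
var table = new DataTable("PersonalInfo");
// Assumption: write the column as an XML attribute rather than a nested element.
table.Columns.Add("Name").ColumnMapping = MappingType.Attribute;
dataSet.Tables.Add(table);

// Hypothetical mapping: copy each parsed group into the table with the same name,
// filling only the columns that exist, one row per group.
foreach (var (group, values) in parsed)
{
    var target = dataSet.Tables[group];
    if (target is null) continue;

    var row = target.NewRow();
    foreach (var (key, value) in values)
    {
        if (target.Columns.Contains(key)) row[key] = value;
    }
    target.Rows.Add(row);
}

var xml = dataSet.GetXml();
Console.WriteLine(xml);
```

Without the MappingType.Attribute line, System.Data writes Name as a nested element (<Name>Alice</Name>) instead of an attribute.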

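using System.Collections.Generic;
using System.Data;
using System.Text.RegularExpressions;
using Parsinator;
using Parsinator.Fluent;

var dataSet = new DataSet("Author")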
    .WithTable(new DataTable("PersonalInfo")
        .WithColumn("Name"));

var lines = new List<List<string>>
{
    new List<string>
    {
        "Any text",
        "Name: Alice"
    }
};

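// The group name and key match the table and column defined above
var parsers = new Dictionary<string, IEnumerable<IParser>>
{
    {
        "PersonalInfo",
        new List<IParser>
        {
            Parse.Key("Name").FromLine(2).Regex(new Regex(@"^Name: (\w+)$"))
        }
    }
};
var parsinator = new Parser(parsers);
Dictionary<string, Dictionary<string, string>> parsed = parsinator.Parse(lines);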

var xml = parsed.ToDataSet(dataSet).GetXml();

Assert.AreEqual("<Author><PersonalInfo Name=\"Alice\" /></Author>", xml);

Take a look at the Sample project to see how to parse a plain-text invoice, a GPS frame, and an ebook table of contents.

Installation

Grab your own copy

Contributing

Feel free to report bugs, request new features, or send a pull request. All contributions are welcome.

License

MIT

canro91/Parsinator

Folders and files

Latest commit

History

Repository files navigation

Parsinator

Why

Usage

Parse patterns

Use a Fluent Interface

Create Xml

Installation

Contributing

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages