# Advanced Section Breaks

*Temporary Style Settings here*
<style>
li {
    list-style: disc;
    margin-left: 2em;
}
li p {
    list-style: disc;
    line-height: normal;
    margin-bottom: 0;
}
table p {
    list-style: disc;
    line-height: normal;
    margin-bottom: 0;
    text-align: left;
}
</style>

Test definition for evaluating a source item.

A trigger is formed from a conditional definition to be applied to source
items.  The conditional definition is generated from one of the following
sentinel types:

    None:   A placeholder conditional that will never pass.

    bool:   A conditional that will either always pass or always fail.

    int:    A conditional that will pass after being called the specified
            number of times. -- Not Yet Implemented.

    str or List[str]:
            A conditional that will pass if the item being tested matches
            with the string (or with any of the strings in the list).  The
            location attribute dictates the type of match required.

    re.Pattern or List[re.Pattern]:  Compiled regular expression pattern(s)
        A conditional that will pass if the pattern (or one of the patterns
        in the list) successfully matches in the item being tested. The
        location attribute dictates the type of regular expression match
        required. Regular Expression patterns must be compiled with
        re.compile(string) to distinguish them from plain text sentinels.

    Callable or List[Callable]:
            A conditional that will pass if the sentinel function (or one
            of the functions in the list) returns a non-blank value
            (anything other than None, '', or []) when applied to the item
            being tested.

The location argument is a sentinel modifier that applies to str or
    re.Pattern types of sentinels. location can be one of:
        location    str test                    re.Pattern test
          IN      sentinel in item            sentinel.search(item)
          START   item.startswith(sentinel)   sentinel.match(item)
          END     item.endswith(sentinel)     NotImplementedError
          FULL    sentinel == item            sentinel.fullmatch(item)

When a test is applied, the event property is updated based on whether the
    test passes and the type of test.

    If the test fails:
        event -> None.

    If the test passes:
        sentinel Type                   event Type

        bool (True)                     bool (True)

        int:                            int: the integer value of the
                                            sentinel.

        str or List[str]                str: the specific string in the
                                            list that caused the pass.

        re.Pattern or List[re.Pattern]  re.Match: the match object
                                            generated by applying the
                                            pattern to the item.

        Callable or List[Callable]      Any: The return value of the
                                            successful function.

If the supplied sentinel is a list of strings, compiled regular expressions
or functions, the trigger will step through each sentinel element in the
list, evaluating them against the supplied item to test.  When a test
passes, no additional items in the list will be tested.
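
The stepping behaviour described above can be sketched in a few lines. This is an illustration only: `first_passing` is a hypothetical helper, not part of the `sections` package, and it assumes the `'IN'` style of match for strings and patterns. It returns the value that would become the event, or `None` if nothing passes.

```python
import re


def first_passing(sentinels, item):
    """Return the event for the first sentinel that passes, else None.

    Hypothetical helper used only to illustrate the list behaviour.
    """
    for sentinel in sentinels:
        if isinstance(sentinel, str):
            if sentinel in item:          # 'IN' style string test
                return sentinel           # event is the string that passed
        elif isinstance(sentinel, re.Pattern):
            match = sentinel.search(item)  # 'IN' style regex test
            if match:
                return match              # event is the match object
        elif callable(sentinel):
            result = sentinel(item)
            if result not in (None, '', []):  # the blank values named above
                return result             # event is the function's return value
    return None                           # no sentinel passed


# Once 'Start' passes, 'Section' is never tested.
print(first_passing(['End', 'Start', 'Section'], 'StartSection A'))
```

Returning as soon as one element passes is what makes the order of the list significant: earlier sentinels take priority over later ones.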

Attributes:
        sentinel (None, bool, int, str, re.Pattern, Callable, or
                  List[str], List[re.Pattern], List[Callable]): The
            object(s) used to generate the conditional definition.

            Note: int type sentinel is not yet implemented.

        name (str, optional): A reference label for the Trigger.

        event (TriggerEvent): Information resulting from applying the test.

Define test(s) that signal a trigger event.

Arguments:
    sentinel (TriggerOptions): Object(s) used to generate the
        conditional definition.
        Note: int type sentinel is not yet implemented.
    location (str, optional):  A sentinel modifier that applies to str
        or re.Pattern types of sentinels. For other sentinel types it
        is ignored. One of  ['IN', 'START', 'END', 'FULL', None].
        Default is None, which is treated as 'IN'.
        if sentinel is a string type:
            location == 'IN' -> sentinel in line,
            location == 'START' -> line.startswith(sentinel),
            location == 'END' -> line.endswith(sentinel),
            location == 'FULL' -> sentinel == line.
        if sentinel is a Regular Expression type:
            location == 'IN' -> sentinel.search(line),
            location == 'START' -> sentinel.match(line),
            location == 'FULL' -> sentinel.fullmatch(line),
            location == 'END' -> raise NotImplementedError.
    name (str, optional): A reference label for the Trigger. Default is
        'Trigger'.



Signature: sections.Trigger.set_sentinel_type(self) -> 'str'
Docstring:
Identify the type of sentinel supplied.

The sentinel type returned can be one of:
    type(sentinel)  sentinel_type string
      None                'None'
      bool                'Boolean'
      int                 'Count'
      str                 'String'
      List[str]           'String'
      re.Pattern          'RE'
      List[re.Pattern]    'RE'
      Callable            'Function'
      List[Callable]      'Function'
If the sentinel is a list of strings, re patterns or functions,
self._is_multi_test is set to True.

Raises:
    NotImplementedError: If self.sentinel is not one of the above types.

Returns:
    str: The string matching the self.sentinel type.
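
The mapping in the table above could be implemented along the following lines. This is a sketch under stated assumptions, not the package's code: `classify_sentinel` and `_single_type` are hypothetical names, and the sketch assumes that a list holds sentinels of a single kind.

```python
import re


def _single_type(sentinel):
    """Map one sentinel object to its sentinel_type string (hypothetical)."""
    if sentinel is None:
        return 'None'
    if isinstance(sentinel, bool):   # must come before int: bool subclasses int
        return 'Boolean'
    if isinstance(sentinel, int):
        return 'Count'
    if isinstance(sentinel, str):
        return 'String'
    if isinstance(sentinel, re.Pattern):
        return 'RE'
    if callable(sentinel):
        return 'Function'
    raise NotImplementedError(f'Unsupported sentinel: {sentinel!r}')


def classify_sentinel(sentinel):
    """Return (sentinel_type, is_multi_test) for a sentinel or list of them."""
    if isinstance(sentinel, list):
        kinds = {_single_type(element) for element in sentinel}
        # Assumption: only same-kind lists of str, re.Pattern or callables.
        if len(kinds) == 1 and kinds <= {'String', 'RE', 'Function'}:
            return kinds.pop(), True
        raise NotImplementedError('Unsupported list of sentinels')
    return _single_type(sentinel), False


print(classify_sentinel(['StartSection', 'EndSection']))
```

The `bool` check is placed before the `int` check because `isinstance(True, int)` is true in Python; reversing the order would classify every boolean sentinel as a count.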


set_test_func(self, location: str)->TestType:
Determine the appropriate test function for the sentinel type.

The test function is set based on the sentinel type and
location value.
if sentinel is a string type:
    location == 'IN' -> sentinel in line,
    location == 'START' -> line.startswith(sentinel),
    location == 'END' -> line.endswith(sentinel),
    location == 'FULL' -> sentinel == line.
if sentinel is a Regular Expression type:
    location == 'IN' -> sentinel.search(line),
    location == 'START' -> sentinel.match(line),
    location == 'FULL' -> sentinel.fullmatch(line),
    location == 'END' -> raise NotImplementedError.
if sentinel is a Boolean type:
    sentinel
if sentinel is a Function type:
    sentinel(line, context)
if sentinel is None:
    False
Args:
    location (str): Indicates how string and regular expression
        sentinels will be applied as a test. One of:
            'IN', 'START', 'END', 'FULL'
        location is only relevant for 'String' and 'RE' sentinel types.
        For all other types it will be ignored.
Returns:
    (Callable[[TriggerTypes, SourceItem, ContextType], TestResult]):
    The test function to apply.



def evaluate(self, item: SourceItem, context: ContextType = None)->bool:
Call the appropriate test(s) on the supplied item.

The designated test(s) are applied to the item.  No testing is done to
ensure that item has an appropriate data type.  If the test passes,
event and event_name are appropriately updated and test_result=True.
If the test does not pass, event and event_name are reset to default
values and test_result=False.

If sentinel is a list, each sentinel element is used to test item.
When one of these tests passes, the particular sentinel element that
passed the test is used to update event and event_name.

Arguments:
    item (SourceItem): The item to apply the trigger test to.
    context (Dict[str, Any], optional): Additional information to be
        passed to the trigger object.  Defaults to an empty dictionary.
Returns:
    (bool): True if the supplied item passed a test, False otherwise.



   SectionBreak(Trigger):
Defines the method of identifying the start or end of a section.

A SectionBreak is a subclass of Trigger, with an additional offset
attribute and related methods. The trigger is used to identify a location
in the Source sequence, and the offset specifies the distance (in number of
Source items) between the identified location and the break point.

Offset is an integer indicating the number of additional Source items to
include in the section.  The two most common offset options have text
equivalents:
    'After'  -> offset =  0  -> The SectionBreak location is between the
                                current item and the next.
    'Before' -> offset = -1 -> The SectionBreak location is just before the
                               current item (Step back 1 item).
Attributes:
    sentinel (None, bool, int, str or List[str],
              re.Pattern or List[re.Pattern],
              Callable or List[Callable]):
        the object(s) used to generate the conditional definition.
    event (TriggerEvent): Information resulting from applying the test.
        See Trigger class for more information on the sentinel and event
        attributes.
    offset (int): Specifies the distance (in number of Source items)
        between the location identified by trigger and the boundary.
    name (str): A text label for the boundary.


Defines trigger and offset for a Boundary point.

Arguments:
    sentinel (TriggerOptions): Object(s) used to generate the
        conditional definition.
    location (str, optional):  A sentinel modifier that applies to str
        or re.Pattern types of sentinels. For other sentinel types it
        is ignored. One of  ['IN', 'START', 'END', 'FULL', None].
        Default is None, which is treated as 'IN'

    See Trigger class for more information on the sentinel and event
    arguments.

    break_offset (int, str, optional): The number of Source items
        between the item that causes a trigger event and the boundary.
        Negative values place the boundary before that item, positive
        values after it.  break_offset can also be one of
            'After' =  0, or
            'Before' = -1
        Defaults to 'Before'.
    name (str, optional): A reference label for the Boundary.


In [None]:
import sections
?sections.SectionBreak


### Sentinels

A variety of sentinel types can be used. Some sentinel types can also be
provided as a list, in which case the break is triggered if any one of the
sentinels in the list passes. The table below lists the possible sentinel types:

<table>
<thead>
<tr><th>Sentinel Type</th><th>Description</th></tr>
</thead><tbody>
<tr><td>A boolean</td>
<td>A *SectionBreak* that will either always pass or always fail.</td></tr>

<tr><td>A string or list of strings</td>
<td>A *SectionBreak* that will pass if the item being tested matches with the 
string (or with any of the strings in the list). The location attribute 
dictates the type of match required.</td></tr>

<tr><td>A compiled regular expression pattern or list of compiled regular 
expression patterns</td>
<td>A *SectionBreak* that will pass if the pattern (or one of the patterns in 
the list) successfully matches in the item being tested. The location attribute 
dictates the type of regular expression match required. Regular Expression 
patterns must be compiled with re.compile(string) to distinguish them from 
plain text sentinels.</td></tr>

<tr><td>A function or list of functions</td>
<td>A *SectionBreak* that will pass if the sentinel function (or one of the 
functions in the list) returns a non-blank value (anything other than None, '', 
or []) when applied to the item being tested.  Sentinel functions act on an item 
from the sequence and have one of the following argument formats:
<ul>
  <li>func(item)</li>
  <li>func(item, context)</li>
  <li>func(item, **kwargs)</li>
</ul>
where:
<ul>
  <li><i>item</i> is an item from the supplied sequence</li>
  <li><i>context</i> is the Section.context attribute.  
      See <b>Working with context</b> for more information.</li>
  <li><i>**kwargs</i> represents keyword arguments which must be contained in 
      the Section.context dictionary.</li>
</ul>
</td></tr>
</tbody>
<caption>The available sentinel types which can be used to evaluate items in a 
sequence for the purpose of identifying a section's boundaries.</caption>
</table>
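
The three argument formats for sentinel functions can be told apart by inspecting the function signature. The sketch below shows one way a caller might do this; `call_sentinel` is a hypothetical helper, and the actual dispatch used by the `sections` package may differ.

```python
import inspect


def call_sentinel(func, item, context):
    """Call func using whichever of the three argument formats it declares.

    Hypothetical helper: func(item), func(item, context) or
    func(item, **kwargs), where kwargs are taken from the context dictionary.
    """
    params = inspect.signature(func).parameters.values()
    accepts_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params)
    positional = [
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)]
    if accepts_kwargs:
        return func(item, **context)   # func(item, **kwargs)
    if len(positional) >= 2:
        return func(item, context)     # func(item, context)
    return func(item)                  # func(item)


def has_marker(item, context):
    """Example sentinel: pass when the context's marker is in the item."""
    return context['marker'] in item


print(call_sentinel(has_marker, 'StartSection A', {'marker': 'Start'}))
```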
                       

### SectionBreak Modifiers

location (str, optional):  A sentinel modifier that applies to str
    or re.Pattern types of sentinels. For other sentinel types it
    is ignored. One of  ['IN', 'START', 'END', 'FULL', None].
    Default is None, which is treated as 'IN'

The location argument is a sentinel modifier that applies to str or
    re.Pattern types of sentinels. location can be one of:
        location    str test                    re.Pattern test
          IN      sentinel in item            sentinel.search(item)
          START   item.startswith(sentinel)   sentinel.match(item)
          END     item.endswith(sentinel)     NotImplementedError
          FULL    sentinel == item            sentinel.fullmatch(item)
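
The table can be read as a small dispatch on sentinel type and location. The following sketch is illustrative only: `apply_location_test` is a hypothetical helper, not part of the `sections` package, and it assumes that `None` is treated as `'IN'` as stated above.

```python
import re


def apply_location_test(sentinel, item, location=None):
    """Apply a str or re.Pattern sentinel to item (hypothetical helper)."""
    location = location or 'IN'  # None is treated as 'IN'
    if location not in ('IN', 'START', 'END', 'FULL'):
        raise ValueError(f'Unknown location: {location!r}')
    if isinstance(sentinel, re.Pattern):
        if location == 'IN':
            return sentinel.search(item)
        if location == 'START':
            return sentinel.match(item)
        if location == 'FULL':
            return sentinel.fullmatch(item)
        # 'END' has no regular expression equivalent in the table.
        raise NotImplementedError("'END' is not available for re.Pattern")
    if location == 'IN':
        return sentinel in item
    if location == 'START':
        return item.startswith(sentinel)
    if location == 'END':
        return item.endswith(sentinel)
    return sentinel == item      # 'FULL'


pattern = re.compile(r'Section\s+(\w)')
print(apply_location_test('Start', 'StartSection A', 'START'))
print(apply_location_test(pattern, 'StartSection A', 'IN'))
```

String tests return a boolean, while regular expression tests return either a match object or `None`, which is why the event type differs between the two kinds of sentinel.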



break_offset (int, str, optional): The number of Source items
    between the item that causes a trigger event and the boundary.
    Negative values place the boundary before that item, positive
    values after it.  break_offset can also be
        'After' =  0, or
        'Before' = -1
    Defaults to 'Before'.

The trigger is used to identify a location in the
Source sequence, and the offset specifies the distance (in number of
Source items) between the identified location and the break point.

Offset is an integer indicating the number of additional Source items to
include in the section.  The two most common offset options have text
equivalents:
    'After'  -> offset =  0  -> The SectionBreak location is between the
                                current item and the next.
    'Before' -> offset = -1 -> The SectionBreak location is just before the
                               current item (Step back 1 item).
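
One way to read these offsets is as an index calculation on the Source sequence. The sketch below is an interpretation, not the package's implementation: `boundary_index` is a hypothetical helper, and the formula `trigger_index + offset + 1` is inferred from the two text equivalents above.

```python
OFFSET_NAMES = {'After': 0, 'Before': -1}


def boundary_index(trigger_index, break_offset='Before'):
    """Return the slice index of the boundary (hypothetical helper).

    The boundary sits between items[index - 1] and items[index].
    """
    if isinstance(break_offset, str):
        offset = OFFSET_NAMES[break_offset]
    else:
        offset = break_offset
    return trigger_index + offset + 1


items = ['Text to be ignored', 'StartSection A', 'EndSection A', 'More text']

start = boundary_index(items.index('StartSection A'), 'Before')
end = boundary_index(items.index('EndSection A'), 'After')
print(items[start:end])
```

With `'Before'` the triggering item is the first item after the boundary, and with `'After'` it is the last item before the boundary.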

# Triggers and Rules

## Basic Definitions

### Test

> A dynamically defined conditional statement which takes a single argument and
> returns a Boolean.

### Trigger

> One or more Tests which cause an action to be performed or stopped.
> Triggers have access to the Context in addition to whatever arguments are
> explicitly passed to them.
>
> - Returns a Boolean.

### Rule

> A Trigger-method pair, where the method is applied if the Trigger returns
> True.
>
> - The output of the method will depend on the type of Rule.

## Trigger Types

<table><thead>
<tr><th>Trigger</th><th>Description</th></tr>
</thead><tbody>
<tr><td>Simple Trigger</td>
  <td>A single Test and method that takes a single argument and returns a
      Boolean.<br>
      <ul><li>Used by Rules to cause an action to be performed or stopped.
      </ul></td></tr>
<tr><td>Complex Trigger</td>
  <td>Multiple conditional statements which are compounded
      e.g. A and (B or C).</td></tr>
<tr><td>Contextual Trigger</td>
  <td>A Complex Trigger that sets or updates Context attributes when one of
      its conditional statements is evaluated.<br>
      Examples are:
      <ul>
        <li>A re.Match object from a Regex application
        <li>A Counter being initialized, incremented or reset.
        </ul></td></tr>
<tr><td>Break Trigger</td>
  <td> A Trigger used to identify a Section Break.<br>
    Conditions include:<br>
    <ul>
      <li>Testing the Line for the presence of any of the strings in a list of
          strings.
      <li>A compiled Regex, which is true if the Regex achieves a match in the
          Line.
      <li>A specified number of lines after another condition passes.
      <li>A custom counter reaches a certain value e.g. number of lines or
          number of repetitions of some other condition passing.
    </ul>
    Break Triggers can also be instructed to add or update a value in the
    Context.</td></tr>
</tbody></table>

## Rule Types

<table><thead>
<tr><th>Type</th><th>Description</th></tr>
</thead><tbody>
<tr><td>Simple Rule</td>
  <td>A Simple Trigger-method pair, where the output type of the Rule is the
      same as the input argument type.<br>
    <ul>
      <li>If the Trigger returns True, the method is applied.
      <li>If the Trigger returns False, the output of the Rule will be the input
          argument.
      <li>Simple Rules can be chained, since the output type matches the input type.
      </ul></td></tr>
<tr><td>Complex Rule</td>
  <td>A Trigger-method pair, where the output of the Rule depends on the result
     of the Trigger Test(s).</td></tr>
<tr><td>Cleaning Rule</td>
  <td>A Simple Rule taking a string argument.<br>
    <ul>
      <li>The output of the method will be a Cleaned Line.
    </ul></td></tr>
<tr><td>Parsing Line Rule</td>
  <td>A Complex Rule taking a string argument.<br>
    <ul>
      <li>The output of the method will be <u>zero or more</u> Parsed Lines.
    </ul></td></tr>
<tr><td>Line Processing Rule</td>
  <td>A Trigger-method pair, both taking a single Parsed Line argument<br>
    <ul>
      <li>The output of the method will be <u>zero or more</u> Parsed Lines.
  </ul></td></tr>
</tbody></table>
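
The Simple Rule behaviour in the table (apply the method when the Trigger passes, otherwise return the input unchanged) is what allows chaining. The sketch below models it with plain functions; `make_simple_rule` and `apply_chain` are hypothetical names and do not reflect the `Rule` class interface in the `sections` package.

```python
def make_simple_rule(trigger, method):
    """Build a Simple Rule from a trigger and a method (hypothetical)."""
    def rule(item):
        # Apply the method only when the trigger passes; otherwise the
        # output is the unchanged input, so the types always match.
        return method(item) if trigger(item) else item
    return rule


def apply_chain(item, rules):
    """Feed the output of each rule into the next one."""
    for rule in rules:
        item = rule(item)
    return item


# Two Cleaning-style rules: both take a string and return a string.
strip_rule = make_simple_rule(lambda text: text != text.strip(), str.strip)
upper_rule = make_simple_rule(lambda text: text.startswith('start'), str.upper)

print(apply_chain('  start here ', [strip_rule, upper_rule]))
```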

## Object Types

<table><thead>
<tr><th>Type</th><th>Description</th></tr>
</thead><tbody>
<tr><td>Trigger</td>
  <td>One or more Tests which cause an action to be performed or stopped.
      Triggers have access to the Context in addition to whatever arguments are
      explicitly passed to them.<br>
    <ul>
      <li>Returns a Boolean.<br>
    </ul></td></tr>
<tr><td>TriggerEvent</td>
  <td>Stores information regarding the result of applying a Trigger.<br>
      Contains:<br>
    <ul>
      <li>The name of the Trigger.
      <li>The Trigger results (<i>True</i> or <i>False</i>)
      <li>The name of the Trigger Test. (Useful when the Trigger contains
          multiple Tests.)
      <li>The relevant value returned by the test.
    </ul></td></tr>
<tr><td>Rule</td>
  <td>A Trigger-method pair, both taking a single argument, where the method is
      applied if the Trigger returns True.<br>
    <ul>
      <li>The output of the method will depend on the type of Rule.
    </ul></td></tr>
<tr><td>Rule Set</td>
  <td>A sequence of Rules and a default method.<br>
    <ul>
      <li>Each Rule in the sequence will be applied to the input until one of
          the Rules triggers.
      <li>If no Rule triggers then the default method is applied.
      <li>Each of the Rules (and the default method) take the same input type
          and return the same output type.
    </ul></td></tr>
<tr><td>Section Break</td>
  <td>A Trigger used to identify the point in the Source at which a Section
      begins or ends.</td></tr>
</tbody></table>
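
The Rule Set description in the table can be sketched as a loop over Trigger-method pairs with a fallback. This is a simplified model, not the `RuleSet` class from the `sections` package: `apply_rule_set` and the example rules are hypothetical.

```python
def apply_rule_set(item, rules, default):
    """Apply the first rule whose trigger passes, else the default method.

    rules is a sequence of (trigger, method) pairs (hypothetical model).
    """
    for trigger, method in rules:
        if trigger(item):
            return method(item)   # stop at the first rule that triggers
    return default(item)          # no rule triggered


# Every method, including the default, takes a str and returns a list.
rules = [
    (lambda text: text.startswith('#'),
     lambda text: ['comment', text[1:].strip()]),
    (lambda text: '=' in text,
     lambda text: text.split('=', 1)),
]


def default(text):
    return [text]


for line in ['# note', 'a=1', 'plain text']:
    print(apply_rule_set(line, rules, default))
```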


### Add `end_on_first_item=True` to Section

Add end to Section Definition
- Section start **Before** *StartSection*
- Section end **Before** *StartSection*
- Simple subsection

```python
sub_section = Section(section_name='SubSection')

full_section = Section(section_name='Full',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('StartSection', break_offset='Before'),
    end_on_first_item=True,
    processor=sub_section 
    )
```

In [None]:
sub_section = Section(
    section_name='SubSection',
    #start_section=SectionBreak('StartSection', break_offset='Before'),
    #end_section=SectionBreak('EndSection', break_offset='After'),
    #end_section=SectionBreak(True)
    #end_on_first_item=True
    )
full_section = Section(
    section_name='Full',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('StartSection', break_offset='Before'),
    end_on_first_item=True,
    processor=sub_section
    )

test_iter = BufferedIterator(GENERIC_TEST_TEXT2)
pprint(full_section.read(test_iter))
pprint(full_section.read(test_iter))

test_iter = iter(GENERIC_TEST_TEXT2)
pprint(full_section.read(test_iter))
pprint(full_section.read(test_iter))

[]
[]
[]
[]


- Results in an empty list because the section both starts 
  and ends on the first *'StartSection'*.

|Expected|Actual|
|-|-|
|[]|[]|

## `end_on_first_item` Tests

The `end_on_first_item` parameter in a section definition determines whether the 
`end_section` break tests are applied to the first line in a section.
`end_on_first_item=True` does not *force* the section to complete after a single 
line.  It only makes it *possible* to stop after the first line.

The value of this parameter is that repeat sections do not require distinct
starting and ending sentinels. With `end_on_first_item=False` (the default) the 
same SectionBreak can be applied to both `start_section` and `end_section`.  
In this case, the second occurrence of the section begins immediately at the 
end of the previous section.
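
The effect of `end_on_first_item` can be modelled without the `sections` package. The sketch below is a simplified stand-in for `Section.read`, written only to illustrate the rule: `read_section` and `SAMPLE` are hypothetical, string sentinels are matched with `in`, and the start break is assumed to use `break_offset='Before'`.

```python
def read_section(items, start, end, end_after=False, end_on_first_item=False):
    """Return the first section found in items (simplified model)."""
    section = []
    in_section = False
    for item in items:
        is_first = False
        if not in_section:
            if start not in item:
                continue           # still looking for the start of the section
            in_section = True      # start break is 'Before': keep this item
            is_first = True
        # The end test is skipped on the first item unless explicitly allowed.
        if end in item and (end_on_first_item or not is_first):
            if end_after:
                section.append(item)   # 'After': include the triggering item
            break
        section.append(item)
    return section


SAMPLE = [
    'Text to be ignored',
    'StartSection A',
    'EndSection A',
    'StartSection B',
    'EndSection B',
    'More text to be ignored',
]

# Same sentinel for start and end; the first item is not tested for the end.
print(read_section(SAMPLE, 'StartSection', 'StartSection'))
# Allowing the end test on the first item gives an empty section.
print(read_section(SAMPLE, 'StartSection', 'StartSection',
                   end_on_first_item=True))
```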

In [None]:
GENERIC_TEST_TEXT = [
    'Text to be ignored',
    'StartSection A',
    'EndSection A',
    'StartSection B',
    'EndSection B',
    'More text to be ignored',
    ]

#### Setting `end_on_first_item=False` (the default)
- Using identical `start_section` and `end_section`:
    > `start_section=SectionBreak('StartSection', break_offset='Before')`<br>
    > `end_section=SectionBreak('StartSection', break_offset='Before')`<br>

- Do not test first line of section (the default).
    > `end_on_first_item=False` 

In [None]:
start_sub_section = Section(
    section_name='StartSubSection',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('StartSection', break_offset='Before'),
    end_on_first_item=False
    )

pprint(start_sub_section.read(GENERIC_TEST_TEXT))

['StartSection A', 'EndSection A']


The first section is returned as a list.

<table>
    <thead><th>Expected</th><th>Actual</th></thead>
    <tr>
        <td><code>
          ['StartSection A', 'EndSection A']
        </code></td>
        <td><code>
          ['StartSection A', 'EndSection A']
        </code></td></tr>
</table>

#### Defining a top section with a repeating subsection
- Using distinct `start_section` and `end_section` in the subsection:
    > `start_section=SectionBreak('StartSection', break_offset='Before')`<br>
    > `end_section=SectionBreak('EndSection', break_offset='After')`<br>

- Do not test first line of section (the default).
    > `end_on_first_item=False` 

In [None]:
start_sub_section = Section(
    section_name='StartSubSection',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('EndSection', break_offset='After'),
    end_on_first_item=False
    )

repeating_section = Section(
    section_name='Top Section',
    end_section=SectionBreak('More text to be ignored', break_offset='Before'),
    processor=start_sub_section
    )
pprint(repeating_section.read(GENERIC_TEST_TEXT))

[['StartSection A', 'EndSection A'], ['StartSection B', 'EndSection B']]


Both subsections are returned as a list of lists.

<table>
    <thead><th>Expected</th><th>Actual</th></thead>
    <tr>
        <td><code>
          [<br>
          ['StartSection A', 'EndSection A'],<br>
           ['StartSection B', 'EndSection B']<br>
          ]
        </code></td>
        <td><code>
          [<br>
          ['StartSection A', 'EndSection A'],<br>
           ['StartSection B', 'EndSection B']<br>
          ]
        </code></td></tr>
</table>

- Using identical `start_section` and `end_section` 
  (`SectionBreak('StartSection', break_offset='Before')`), but allowing the 
  end test on the first line of the section.
    > `end_on_first_item=True` 

In [None]:
start_sub_section = Section(
    section_name='StartSubSection',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('StartSection', break_offset='Before'),
    end_on_first_item=True
    )

pprint(start_sub_section.read(GENERIC_TEST_TEXT))

[]


The ending SectionBreak triggers on the same 
item that triggers the start of the section.  This will always result in an 
empty section.

|Expected|Actual|
|-|-|
|[]|[]|

#### Single Line Section.
- Using the same sentinel *('EndSection')*, but different `break_offset`.
    > `start_section=SectionBreak('EndSection', break_offset='Before')`<br>
    > `end_section=SectionBreak('EndSection', break_offset='After')`<br>

- Allow testing of the first line of section.
    > `end_on_first_item=True` 

In [None]:
end_sub_section = Section(
    section_name='EndSubSection',
    start_section=SectionBreak('EndSection', break_offset='Before'),
    end_section=SectionBreak('EndSection', break_offset='After'),
    end_on_first_item=True
    )

pprint(end_sub_section.read(GENERIC_TEST_TEXT))

['EndSection A']


Single line section.
- Starts *Before* **EndSection** 
- Ends *After* **EndSection** (the same line)

|Expected|Actual|
|-|-|
|['EndSection A']|['EndSection A']|

#### Same section definition as above except with `end_on_first_item=False` 

In [None]:
end_sub_section = Section(
    section_name='EndSubSection',
    start_section=SectionBreak('EndSection', break_offset='Before'),
    end_section=SectionBreak('EndSection', break_offset='After'),
    end_on_first_item=False
    )

pprint(end_sub_section.read(GENERIC_TEST_TEXT))

['EndSection A', 'StartSection B', 'EndSection B']


Section continues until **After** next 
*EndSection* is found

<table>
    <thead><th>Expected</th><th>Actual</th></thead>
    <tr>
        <td><code>
          ['EndSection A',<br>
          'StartSection B',<br>
          'EndSection B']
        </code></td>
        <td><code>
          ['EndSection A',<br>
          'StartSection B',<br>
          'EndSection B']
        </code></td></tr>
</table>

# Relevant Type definitions for Trigger Class and SubClasses.

## Sentinels

Trigger sentinels define tests to be applied to a SourceItem.
Sentinel types that are independent of the SourceItem are `bool` and `int`.
`sentinel=None` becomes boolean `True` (Trigger always passes)<br>
`TriggerSingleTypes = Union[None, bool, int]`

Sentinel types that apply to string type SourceItems are `str` and `re.Pattern`.<br>
`TriggerStringOptions = Union[str, re.Pattern]`

A sentinel can also be any valid Process Function.
String and Callable sentinel types can also be provided as a list, in which
case the trigger passes if any one of the sentinels in the list passes.<br>
`TriggerListOptions = Union[TriggerStringOptions, ProcessCallableOptions]`

### All possible sentinel types

`TriggerTypes = Union[TriggerSingleTypes, TriggerListOptions]`

### All possible sentinel types and valid sentinel list types

`TriggerOptions = Union[TriggerTypes, List[TriggerListOptions]]`

Applying a trigger gives a TestResult, which can be a boolean, a regular
expression match object, or the return value of a Trigger Sentinel Function (ProcessedItem).

```
EventType = Union[bool, int, str, re.Match, ProcessedItem, None]
TestResult = Union[bool, re.Match, ProcessedItem]
TestType = Callable[[TriggerTypes, SourceItem, ContextType], TestResult]
```

# Relevant Type definitions for SectionBreak Class

`OffsetTypes = Union[int, str]`

Trigger
     One or more conditional statements (tests) which are used by Rules to
     cause an action to be performed or stopped.
     
     Triggers have access to the Context in addition to whatever arguments are
     explicitly passed to them.

Section Breaking
    Starting or stopping a Section.


### Imports

In [None]:
from typing import List
from pathlib import Path
from pprint import pprint
import re
import sys

import pandas as pd
import xlwings as xw

from buffered_iterator import BufferedIterator
import text_reader as tp
from sections import Rule, RuleSet, SectionBreak, ProcessingMethods, Section

In [None]:
from sections import Section
from pprint import pprint

GENERIC_TEST_TEXT2 = [
    'Text to be ignored',
    'StartSection A',
    'MiddleSection A',
    'EndSection A',
    'Unwanted text between sections',
    'StartSection B',
    'MiddleSection B',
    'EndSection B',
    'StartSection C',
    'MiddleSection C',
    'EndSection C',
    'Even more text to be ignored', 
    ]
full_section = Section(start_section=('StartSection', 'IN', 'Before'),
                       end_section=('EndSection', 'IN', 'After'),
                       section_name='TupleTest')
pprint(full_section.read(GENERIC_TEST_TEXT2))

['StartSection A', 'MiddleSection A', 'EndSection A']


In [None]:
repr(full_section.start_section)

'[SectionBreak(sentinel=StartSection, location=IN, offset=-1, name=SectionBreak)]'

**********

# Section Break options

## Three line sections

In [None]:
GENERIC_TEST_TEXT2 = [
    'Text to be ignored',
    'StartSection A',
    'MiddleSection A',
    'EndSection A',
    'Unwanted text between sections',
    'StartSection B',
    'MiddleSection B',
    'EndSection B',
    'StartSection C',
    'MiddleSection C',
    'EndSection C',
    'Even more text to be ignored', 
    ]

**********

### Initial Section and Sub-Section Definitions

- Only definition line is:<br>
`processor=sub_section`

In [None]:
sub_section = Section(
    section_name='SubSection',
    #start_section=SectionBreak('StartSection', break_offset='Before'),
    #end_section=SectionBreak('EndSection', break_offset='After'),
    #end_section=SectionBreak(True)
    #end_on_first_item=True
    )

full_section = Section(
    section_name='Full',
    #start_section=SectionBreak('StartSection', break_offset='Before'),
    #end_section=SectionBreak('EndSection', break_offset='After'),
    #end_on_first_item=True
    processor=sub_section  
    )
pprint(full_section.read(GENERIC_TEST_TEXT2))

[['Text to be ignored',
  'StartSection A',
  'MiddleSection A',
  'EndSection A',
  'Unwanted text between sections',
  'StartSection B',
  'MiddleSection B',
  'EndSection B',
  'StartSection C',
  'MiddleSection C',
  'EndSection C',
  'Even more text to be ignored']]



<table><thead><th>Expected</th><th>Actual</th></thead>
<tr>
  <td><code>
    [<br>
      ['Text to be ignored',<br>
       'StartSection A',<br>
       'MiddleSection A',<br>
       'EndSection A',<br>
       'Unwanted text between sections',<br>
       'StartSection B',<br>
       'MiddleSection B',<br>
       'EndSection B',<br>
       'StartSection C',<br>
       'MiddleSection C',<br>
       'EndSection C',<br>
       'Even more text to be ignored']<br>
      ]</code></td>
  <td><code>
    [<br>
      ['Text to be ignored',<br>
       'StartSection A',<br>
       'MiddleSection A',<br>
       'EndSection A',<br>
       'Unwanted text between sections',<br>
       'StartSection B',<br>
       'MiddleSection B',<br>
       'EndSection B',<br>
       'StartSection C',<br>
       'MiddleSection C',<br>
       'EndSection C',<br>
       'Even more text to be ignored']<br>
      ]</code></td>
</table>



### Add start to Section Definition

- Section start **Before** *StartSection*

`start_section=SectionBreak('StartSection', break_offset='Before'),`

In [None]:
sub_section = Section(
    section_name='SubSection',
    #start_section=SectionBreak('StartSection', break_offset='Before'),
    #end_section=SectionBreak('EndSection', break_offset='After'),
    #end_section=SectionBreak(True)
    #end_on_first_item=True
    )

full_section = Section(
    section_name='Full',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    #end_section=SectionBreak('EndSection', break_offset='After'),
    #end_on_first_item=True,
    processor=sub_section
    )
pprint(full_section.read(GENERIC_TEST_TEXT2))

[['StartSection A',
  'MiddleSection A',
  'EndSection A',
  'Unwanted text between sections',
  'StartSection B',
  'MiddleSection B',
  'EndSection B',
  'StartSection C',
  'MiddleSection C',
  'EndSection C',
  'Even more text to be ignored']]



<table><thead><th>Expected</th><th>Actual</th></thead>
<tr>
  <td><code>
    [<br>
      ['StartSection A',<br>
       'MiddleSection A',<br>
       'EndSection A',<br>
       'Unwanted text between sections',<br>
       'StartSection B',<br>
       'MiddleSection B',<br>
       'EndSection B',<br>
       'StartSection C',<br>
       'MiddleSection C',<br>
       'EndSection C',<br>
       'Even more text to be ignored']<br>
    ]</code></td>
  <td><code>
    [<br>
      ['StartSection A',<br>
       'MiddleSection A',<br>
       'EndSection A',<br>
       'Unwanted text between sections',<br>
       'StartSection B',<br>
       'MiddleSection B',<br>
       'EndSection B',<br>
       'StartSection C',<br>
       'MiddleSection C',<br>
       'EndSection C',<br>
       'Even more text to be ignored']<br>
    ]</code></td>
</tr>
</table>
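
The effect of a start break placed **Before** *StartSection* can be pictured with plain Python, independent of the `sections` package. The sketch below is only an illustration, not the library's implementation: `sample_lines` and `from_first_match` are hypothetical names, the list mirrors the lines shown in the output above, and a plain substring test is assumed for the match.

```python
from itertools import dropwhile

# Hypothetical stand-in for the test text, copied from the output above.
sample_lines = [
    'Text to be ignored',
    'StartSection A',
    'MiddleSection A',
    'EndSection A',
    'Unwanted text between sections',
    'StartSection B',
    'MiddleSection B',
    'EndSection B',
    'StartSection C',
    'MiddleSection C',
    'EndSection C',
    'Even more text to be ignored',
]


def from_first_match(items, sentinel):
    """Drop items until one contains `sentinel`; keep that item and the rest."""
    return list(dropwhile(lambda item: sentinel not in item, items))


# Only the leading 'Text to be ignored' line is dropped; with no end break,
# everything from the first 'StartSection' line onward is kept.
started = from_first_match(sample_lines, 'StartSection')
```

Because the break sits *before* the matching line, the matching line itself is the first item kept.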



### Add end to Section Definition

- Section start **Before** *StartSection*
- Section end **After** *EndSection*

```python
full_section = Section(section_name='Full',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('EndSection', break_offset='After'),
    processor=sub_section 
    )
```

In [None]:
sub_section = Section(
    section_name='SubSection',
    #start_section=SectionBreak('StartSection', break_offset='Before'),
    #end_section=SectionBreak('EndSection', break_offset='After'),
    #end_section=SectionBreak(True)
    #end_on_first_item=True
    )

full_section = Section(
    section_name='Full',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('EndSection', break_offset='After'),
    #end_on_first_item=True,    
    processor=sub_section
    )

test_iter = BufferedIterator(GENERIC_TEST_TEXT2)
pprint(full_section.read(test_iter))
pprint(full_section.read(test_iter))
pprint(full_section.read(test_iter))
pprint(full_section.read(test_iter))

[['StartSection A', 'MiddleSection A', 'EndSection A']]
[['StartSection B', 'MiddleSection B', 'EndSection B']]
[['StartSection C', 'MiddleSection C', 'EndSection C']]
[]


- Includes all three lines of the first section in a single sub-list.
- Skips the *'Unwanted text between sections'* line.

<table><thead><th>Expected</th><th>Actual</th></thead>
<tr>
  <td><code>
    [<br>
      ['StartSection A',<br>
       'MiddleSection A',<br>
       'EndSection A']<br>
    ]</code></td>
  <td><code>
    [<br>
      ['StartSection A',<br>
       'MiddleSection A',<br>
       'EndSection A']<br>
    ]</code></td>
</tr>
<tr>
  <td><code>
    [<br>
      ['StartSection B',<br>
       'MiddleSection B',<br>
       'EndSection B']<br>
    ]</code></td>
  <td><code>
    [<br>
      ['StartSection B',<br>
       'MiddleSection B',<br>
       'EndSection B']<br>
    ]</code></td>
</tr>
<tr>
  <td><code>
    [<br>
      ['StartSection C',<br>
       'MiddleSection C',<br>
       'EndSection C']<br>
    ]</code></td>
  <td><code>
    [<br>
      ['StartSection C',<br>
       'MiddleSection C',<br>
       'EndSection C']<br>
    ]</code></td>
</tr>
<tr><td><code>[]</code></td><td><code>[]</code></td></tr>
</table>
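
The repeated reads above work because every call continues from where the previous one stopped on the shared iterator. The following is a minimal sketch of that idea in plain Python, assuming substring matches for both breaks. It does not use `Section` or `BufferedIterator`; `read_one_section` and `sample_lines` are hypothetical names, and the sketch returns flat lists rather than the nested lists produced by the sub-section processor.

```python
# Hypothetical stand-in for the test text, copied from the outputs above.
sample_lines = [
    'Text to be ignored',
    'StartSection A', 'MiddleSection A', 'EndSection A',
    'Unwanted text between sections',
    'StartSection B', 'MiddleSection B', 'EndSection B',
    'StartSection C', 'MiddleSection C', 'EndSection C',
    'Even more text to be ignored',
]


def read_one_section(source, start, end):
    """Read one section from a shared iterator.

    Items are skipped until one contains `start` (break *before* it),
    then collected up to and including the item containing `end`
    (break *after* it).
    """
    section = []
    for item in source:
        if not section and start not in item:
            continue  # still looking for the start of a section
        section.append(item)
        if end in item:
            break  # end break is 'After', so the end line is kept
    return section


source = iter(sample_lines)  # one iterator shared by all four reads
results = [read_one_section(source, 'StartSection', 'EndSection')
           for _ in range(4)]
```

Here the fourth read finds no further *StartSection* line, consumes the trailing text, and returns an empty list.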



### Set Same Start and End Breaks for Section

- Section start **Before** *StartSection*
- Section end **Before** *StartSection*
- Simple subsection with no start or end break
- The *Multi* section uses the *Full* section as its processor and has no start or end break of its own, so it reads all lines

```python
sub_section = Section(section_name='SubSection')

full_section = Section(section_name='Full',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('StartSection', break_offset='Before'),
    processor=sub_section 
    )

multi_section = Section(
    section_name='Multi',
    processor=full_section 
    )
```

In [None]:
sub_section = Section(
    section_name='SubSection',
    #start_section=SectionBreak('StartSection', break_offset='Before'),
    #end_section=SectionBreak('EndSection', break_offset='After'),
    #end_section=SectionBreak(True)
    #end_on_first_item=True
    )
full_section = Section(
    section_name='Full',
    start_section=SectionBreak('StartSection', break_offset='Before'),
    end_section=SectionBreak('StartSection', break_offset='Before'),
    processor=sub_section,
    #end_on_first_item=True
    )

multi_section = Section(section_name='Multi',
    processor=full_section,
    #end_on_first_item=True
    )

pprint(multi_section.read(GENERIC_TEST_TEXT2))

[[['StartSection A',
   'MiddleSection A',
   'EndSection A',
   'Unwanted text between sections']],
 [['StartSection B', 'MiddleSection B', 'EndSection B']],
 [['StartSection C',
   'MiddleSection C',
   'EndSection C',
   'Even more text to be ignored']]]


- Includes *'Unwanted text between sections'* and
  *'Even more text to be ignored'*.
- Includes *'Unwanted text between sections'* because the end of section A is 
  triggered by *'StartSection B'*. 
- Includes *'Even more text to be ignored'* because there are no more 
  *'StartSection'* lines to trigger an `end_section` break.

<table><thead><th>Expected</th><th>Actual</th></thead>
<tr>
  <td><code>
    [<br>
      [['StartSection A',<br>
        'MiddleSection A',<br>
        'EndSection A',<br>
        'Unwanted text between sections']],<br>
      [['StartSection B',<br>
        'MiddleSection B',<br>
        'EndSection B']],<br>
      [['StartSection C',<br>
        'MiddleSection C',<br>
        'EndSection C',<br>
        'Even more text to be ignored']]<br>
    ]</code></td>
  <td><code>
    [<br>
      [['StartSection A',<br>
        'MiddleSection A',<br>
        'EndSection A',<br>
        'Unwanted text between sections']],<br>
      [['StartSection B',<br>
        'MiddleSection B',<br>
        'EndSection B']],<br>
      [['StartSection C',<br>
        'MiddleSection C',<br>
        'EndSection C',<br>
        'Even more text to be ignored']]<br>
    ]</code></td>
  </tr>
</table>
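
When the same sentinel is used for both the start and the end break, each matching line closes the current section and opens the next one. A rough plain-Python sketch of that grouping follows, assuming a substring match. `split_before` and `sample_lines` are hypothetical names, and the result has one less level of nesting than the `Section` output above.

```python
# Hypothetical stand-in for the test text, copied from the outputs above.
sample_lines = [
    'Text to be ignored',
    'StartSection A', 'MiddleSection A', 'EndSection A',
    'Unwanted text between sections',
    'StartSection B', 'MiddleSection B', 'EndSection B',
    'StartSection C', 'MiddleSection C', 'EndSection C',
    'Even more text to be ignored',
]


def split_before(items, sentinel):
    """Start a new group at every item containing `sentinel`.

    Items seen before the first match are discarded. Everything after a
    match stays in that group until the next match or the end of input.
    """
    groups = []
    current = None
    for item in items:
        if sentinel in item:
            current = [item]  # the matching line opens a new group
            groups.append(current)
        elif current is not None:
            current.append(item)
    return groups


groups = split_before(sample_lines, 'StartSection')
```

As in the output above, the first group picks up *'Unwanted text between sections'* and the last group picks up *'Even more text to be ignored'*, since nothing ends a group except the next *StartSection* line.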

### Imports

In [None]:
from typing import List
from pathlib import Path
from pprint import pprint
import re
import sys

import pandas as pd
import xlwings as xw

from buffered_iterator import BufferedIterator
import text_reader as tp
from sections import Rule, RuleSet, SectionBreak, ProcessingMethods, Section

# Simple sections experimenting with start and end settings

#### 2-line Section Source

In [None]:
GENERIC_TEST_TEXT = [
    'Text to be ignored',
    'StartSection Name: A',
    'EndSection Name: A',
    'StartSection Name: B',
    'EndSection Name: B',
    'More text to be ignored',
    ]