# The Basics


## Section Components

A section defines a continuous portion of a text stream or other iterable.
A Section instance is the set of definitions for managing a sequence. The same
instance can be used repeatedly on different sequence inputs or even on
different portions of the same sequence.

The section definition may include:
* Boundary definitions for identifying the section's start and end.
* Formatting or processing instructions for manipulating section items.
* A merge method, which combines all of the processed items into a 
  single item such as a list.


The code in each example below is independent of the earlier examples, so you
can copy any one of them without referring back to previous examples.

## Boundary Definitions

Every section has a *start_section* and an *end_section* which define the
boundaries of the section.  The *start_section* and *end_section* each contain
one or more *SectionBreak* objects.

A *SectionBreak* is built from a *sentinel* and two optional modifiers:
* location
* offset

The *SectionBreak* can also be given a name that identifies which
*SectionBreak* was triggered when a section boundary definition contains
multiple *SectionBreaks*.  See **Advanced Section Breaks** for more details.

### Boundary Defaults

If *start_section* is not explicitly defined, it defaults to `True`
(*AlwaysBreak*), indicating the section begins with the first item in the
supplied sequence.

If *end_section* is not explicitly defined, it defaults to `False`
(*NeverBreak*), indicating the section continues through the last item in the
supplied sequence.


In [1]:
from sections import Section
from pprint import pprint

example_sequence = [
    'First line',
    'Second line',
    'Third line',
    'Fourth line'
    ]

default_section = Section()
pprint(default_section.read(example_sequence))

['First line', 'Second line', 'Third line', 'Fourth line']


The section above uses the default boundary definitions (and default everything else for that matter).
With the default boundary definitions, every item in *example_sequence* is included in the section.

### Simple Text Boundary Definitions
The simplest boundary definition is a text string contained in the start or end item.

In [2]:
from sections import Section
from pprint import pprint

example_sequence = [
    'First line',
    'Second line',
    'Third line',
    'Fourth line'
    ]
text_boundary_section = Section(start_section='Second', 
                                end_section='Fourth')

pprint(text_boundary_section.read(example_sequence))

['Second line', 'Third line']


* `start_section='Second'` causes the section to begin with the line containing *'Second'*.
* `end_section='Fourth'` causes the section to end before the line containing *'Fourth'*.

### Adding a *location* Modifier to a String Boundary Definition 
By default, when a string is used to define a boundary, any occurrence of that 
string within a sequence item will trigger the boundary.
For example:

In [3]:
from sections import Section
from pprint import pprint


example_sequence = [
    'First line',
    'Second line',
    'Third line',
    'Fourth line'
   ]
text_boundary_section = Section(start_section='S', 
                                end_section='Fo')

pprint(text_boundary_section.read(example_sequence))

['Second line', 'Third line']


<b>S</b> was found in '<b>S</b>econd line' and
<b>Fo</b> was found in '<b>Fo</b>urth line'.

But what if an <b>S</b> occurs in an earlier line?  For example:

In [4]:
from sections import Section, SectionBreak
from pprint import pprint


example_sequence = [
    'Text String to be ignored',
    'First line',
    'Second line',
    'Third line',
    'Fourth line',
    'Even more text to be ignored', 
   ]

text_boundary_section = Section(start_section='S', 
                                end_section='Fo')

pprint(text_boundary_section.read(example_sequence))

['Text String to be ignored', 'First line', 'Second line', 'Third line']


The <b>S</b> in '<b>S</b>tring' triggered the boundary.

The optional `location` modifier allows you to specify where in the item string 
to search for the specified text.

The location argument can be one of:
<table>
<thead><tr><th><code>location</code> Value</th><th>Search Method</th></tr></thead>
<tbody>
<tr><td>'IN'</td><td><code>text in item</code></td></tr>
<tr><td>'START'</td><td><code>item.startswith(text)</code></td></tr>
<tr><td>'END'</td><td><code>item.endswith(text)</code></td></tr>
<tr><td>'FULL'</td><td><code>item == text</code></td></tr>
</tbody></table>
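
Each `location` value in the table corresponds to an ordinary Python string test. The sketch below spells out that mapping using only built-in string operations. The names `LOCATION_TESTS` and `matches` are made up for illustration and are not part of the `sections` package.

```python
# Illustrative only: `LOCATION_TESTS` and `matches` are hypothetical names,
# not part of the sections package.
LOCATION_TESTS = {
    'IN': lambda text, item: text in item,
    'START': lambda text, item: item.startswith(text),
    'END': lambda text, item: item.endswith(text),
    'FULL': lambda text, item: item == text,
}


def matches(text, item, location='IN'):
    '''Return True if `text` is found in `item` using the given search method.'''
    return LOCATION_TESTS[location](text, item)


example_sequence = [
    'Text String to be ignored',
    'First line',
    'Second line',
]

# Show which items each search method would match for the text 'S'.
for location in LOCATION_TESTS:
    hits = [item for item in example_sequence if matches('S', item, location)]
    print(location, hits)
```

With `'IN'`, both *'Text String to be ignored'* and *'Second line'* match; with `'START'`, only *'Second line'* does.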

The section definition can then be given as:

In [5]:
from sections import Section, SectionBreak
from pprint import pprint


example_sequence = [
    'Text String to be ignored',
    'First line',
    'Second line',
    'Third line',
    'Fourth line',
    'Even more text to be ignored', 
   ]

text_boundary_section = Section(start_section=('S', 'START'),
                                end_section='Fo')

pprint(text_boundary_section.read(example_sequence))

['Second line', 'Third line']


Notice that `start_section=('S', 'START')` is now being set as a tuple.

An alternative, and perhaps clearer, way to write the same thing is to explicitly
pass a `SectionBreak` object to `start_section` like this:

In [6]:
from sections import Section, SectionBreak
from pprint import pprint


example_sequence = [
    'Text String to be ignored',
    'First line',
    'Second line',
    'Third line',
    'Fourth line',
    'Even more text to be ignored', 
   ]

text_boundary_section = Section(
    start_section=SectionBreak(sentinel='S', location='START'),
    end_section='Fo'
    )

pprint(text_boundary_section.read(example_sequence))

['Second line', 'Third line']


For more information on the `location` modifier, see *Advanced Section Breaks*.

### Adding an *Offset* to a Boundary Definition
By default, section boundaries occur *before* the item that triggers the
boundary condition.
This can be changed using the optional `break_offset` argument.

The two most common `break_offset` options are:
<table>
<thead><tr><th><code>break_offset</code> Value</th><th>Effect</th></tr></thead>
<tbody>
<tr><td>'After'</td>
<td>The SectionBreak occurs between the item that triggered the boundary and the 
next item.</td></tr>
<tr><td>'Before' <i>(the default)</i></td>
<td>The SectionBreak occurs just before the item that triggered the
boundary.</td></tr>
</tbody></table>

For example, to include *'Fourth line'* in the section without knowing what 
comes next:

In [8]:
from sections import Section, SectionBreak
from pprint import pprint


example_sequence = [
    'Text String to be ignored',
    'First line',
    'Second line',
    'Third line',
    'Fourth line',
    'Even more text to be ignored', 
   ]

text_boundary_section = Section(
    start_section=SectionBreak(sentinel='S', location='START'),
    end_section=SectionBreak(sentinel='Fo', break_offset='After')
    )

pprint(text_boundary_section.read(example_sequence))

['Second line', 'Third line', 'Fourth line']


For more information on the `break_offset` modifier, see *Advanced Section Breaks*.
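
One way to picture the two offsets is as index arithmetic on the item that triggers the break. The sketch below is plain Python written under that assumption. `break_index` is a hypothetical helper, not a `sections` function, and it models only the `'Before'` and `'After'` options.

```python
# Illustrative only: `break_index` is a hypothetical helper, not part of the
# sections package.  It models only the 'Before' and 'After' offsets.
def break_index(items, text, break_offset='Before'):
    '''Return the list index at which a break triggered by `text` falls.'''
    for index, item in enumerate(items):
        if text in item:
            # 'Before': the break sits in front of the triggering item.
            # 'After': the break sits between that item and the next one.
            return index if break_offset == 'Before' else index + 1
    return len(items)  # No trigger found: the break falls after the last item.


example_sequence = [
    'Text String to be ignored',
    'First line',
    'Second line',
    'Third line',
    'Fourth line',
    'Even more text to be ignored',
]

start = break_index(example_sequence, 'Second')
end_before = break_index(example_sequence, 'Fo', 'Before')
end_after = break_index(example_sequence, 'Fo', 'After')

print(example_sequence[start:end_before])  # ends before 'Fourth line'
print(example_sequence[start:end_after])   # includes 'Fourth line'
```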

### Defining a Section Boundary with a Function
A function can also be used to define a boundary.

In its simplest form, the function should accept a single argument of the
same type as the input sequence's items and return a boolean.  When the function
returns `True`, a boundary is triggered.

For example:

In [9]:
from sections import Section, SectionBreak


numeric_sequence = list(range(1, 10))
print('The sequence is:\t', numeric_sequence)


def multiple_of_three(num):
    return num % 3 == 0  # True if num is a multiple of 3


function_boundary_section = Section(start_section=multiple_of_three,
                                    end_section=multiple_of_three)

print('The section is: \t', function_boundary_section.read(numeric_sequence))

The sequence is:	 [1, 2, 3, 4, 5, 6, 7, 8, 9]
The section is: 	 [3, 4, 5]


In the example above:
* The sequence consists of a list of integers from 1 to 9.
* The function `multiple_of_three` returns `True` when its input is a multiple
of 3 and returns `False` otherwise.
* Both `start_section` and `end_section` are set as the `multiple_of_three` 
function.
* The section starts with the first multiple of three (3) and ends before the 
second multiple of three (6).

For more information on section boundaries, see *Advanced Section Breaks* and
*Using Context*.
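
Because a boundary function is just a callable that returns a boolean, it can be checked on its own before it is handed to a `Section`. The sketch below uses plain Python to list which items would trigger a boundary. The factory `multiple_of` is a hypothetical helper introduced here for illustration.

```python
# Illustrative only: `multiple_of` is a hypothetical helper, not part of the
# sections package.
def multiple_of(divisor):
    '''Build a test function that is True for multiples of `divisor`.'''
    def test(num):
        return num % divisor == 0
    return test


numeric_sequence = list(range(1, 10))
multiple_of_three = multiple_of(3)

# Items for which the test function returns True.
triggers = [num for num in numeric_sequence if multiple_of_three(num)]
print('Trigger items:', triggers)
```

The trigger items are 3, 6 and 9, which is consistent with the section above starting at 3 and ending before 6.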

## Basic Processing
Identifying a section of a larger sequence is only the first step in reading a 
section.  To be useful, a section usually needs to apply some custom processing 
to extract and format the desired information from the section items.

Here we will illustrate some basic processing methods.

### Processing Functions That Act on a Single Item

To illustrate the use of processing functions we will use the refurbished PC 
price list below.

The price list is in a comma-separated values (CSV) style with the columns:

1. MODEL NAME
2. CPU
3. RAM
4. PRICE


Each line of text will be one source item. 
Our processing goal will be to convert each row into a two-item tuple containing
the model and the price.  

To focus on the processing part, the first line with the column names will be 
excluded from the sequence.

**Refurbished PC Price List**

1. To begin with, we convert each text line into a list of strings by splitting the line
   at every occurrence of a comma using the following command:<br>
    <code>text_list = text.split(',')</code>

2. We then remove the '\$' with:<br>
    <code>text_list = [txt.replace('$', '') for txt in text_list]</code>

3. Next we remove space from the start and end of each substring using:<br>
    <code>text_list = [txt.strip() for txt in text_list]</code>

4. Finally, we keep only the first and last columns as a two-item tuple:<br>
    <code>selected_output = (text_list[0], text_list[-1])</code>
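
Applied to the first row of the price list, the four steps produce the intermediate values printed by this short trace. It is plain Python with no `sections` imports.

```python
# Trace the four steps on a single row of the price list.
text = 'THINKCENTRE X1, Core i5/6200, 8, $260'

# 1. Split at every comma.
text_list = text.split(',')
print(text_list)

# 2. Remove the '$'.
text_list = [txt.replace('$', '') for txt in text_list]
print(text_list)

# 3. Remove space from the start and end of each substring.
text_list = [txt.strip() for txt in text_list]
print(text_list)

# 4. Keep only the first and last columns.
selected_output = (text_list[0], text_list[-1])
print(selected_output)
```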


In [10]:
# pprint is used to produce nicely formatted output.
from pprint import pprint

# Import Section
from sections import Section

# This is the demo input we will use.
price_list = [
    'THINKCENTRE X1, Core i5/6200, 8, $260',
    'THINKCENTRE M78, AMD A8-6500, 8, $30',
    'THINKCENTRE M53, Celeron, 8, $60',
    'THINKCENTRE M710Q, Intel Pentium, 8, $40',
    'DELL OPTIPLEX 7060, Core i7-8700, 8, $385',
    'DELL OPTIPLEX 790, Core i5/2500, 4, $20'
    ]


def price_list_process(text):
    '''Convert the first and last item from a csv string into a two-item
    tuple.

    Split the supplied string at every occurrence of a comma.
    Remove space from the start and end of each substring.
    Remove every occurrence of '$' from each substring.
    Select the first and last column.

    Args:
        text (str): The csv string to be parsed.

    Returns:
        Tuple[str, str]: The first and last columns from the supplied csv after
            cleaning.
    '''
    # Split text at every occurrence of ','.
    text_list = text.split(',')
    # Remove space from the start and end of each substring.
    text_list = [txt.strip() for txt in text_list]
    # Remove every occurrence of '$' from each substring.
    text_list = [txt.replace('$', '') for txt in text_list]
    # Keep only the first and last columns
    selected_output = (text_list[0], text_list[-1])
    return selected_output

# Define the section `price_list_section` 
price_list_section = Section(processor=price_list_process)

# Read the `price_list` text using the `price_list_section` 
pprint(price_list_section.read(price_list))

[('THINKCENTRE X1', '260'),
 ('THINKCENTRE M78', '30'),
 ('THINKCENTRE M53', '60'),
 ('THINKCENTRE M710Q', '40'),
 ('DELL OPTIPLEX 7060', '385'),
 ('DELL OPTIPLEX 790', '20')]


### Sequential Processing (Multiple Processing Functions)

In the above example a single function was used to perform all of the
processing.  An alternative approach is to provide a list of functions
as the `processor`.  The functions are applied in list order, with the output
of each function becoming the input to the next.  This
approach allows functions to be reused in different
sections and in some cases improves the clarity of the section definitions.

**Note:** The output type of each function must match the expected input type of 
the next function in the series.  No validation tests are done on this.

The example below performs the same processing as the previous example, except 
that it uses a separate function for each step. 

In [11]:
# pprint is used to produce nicely formatted output.
from pprint import pprint

# Import Section
from sections import Section

# This is the demo input we will use.
price_list = [
    'THINKCENTRE X1, Core i5/6200, 8, $260',
    'THINKCENTRE M78, AMD A8-6500, 8, $30',
    'THINKCENTRE M53, Celeron, 8, $60',
    'THINKCENTRE M710Q, Intel Pentium, 8, $40',
    'DELL OPTIPLEX 7060, Core i7-8700, 8, $385',
    'DELL OPTIPLEX 790, Core i5/2500, 4, $20'
    ]

# Mini-functions for each processing action
def csv_parse(text):
    '''Split the supplied string at every occurrence of a comma.'''
    return text.split(',')


def drop_d(text_list):
    '''Remove every occurrence of '$' from each substring.'''
    return [txt.replace('$', '') for txt in text_list]


def drop_space(text_list):
    '''Remove space from the start and end of each substring.'''
    return [txt.strip() for txt in text_list]


def select_columns(text_list):
    '''Select the first and last columns.'''
    return (text_list[0], text_list[-1])


# Define the section `price_list_section` 
price_list_section = Section(processor=[csv_parse, drop_d, drop_space, 
                                        select_columns])


# Read the `price_list` text using the `price_list_section` 
pprint(price_list_section.read(price_list))

[('THINKCENTRE X1', '260'),
 ('THINKCENTRE M78', '30'),
 ('THINKCENTRE M53', '60'),
 ('THINKCENTRE M710Q', '40'),
 ('DELL OPTIPLEX 7060', '385'),
 ('DELL OPTIPLEX 790', '20')]
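
The "applied in list order" behaviour can be pictured as folding one item through the functions, one at a time. The sketch below shows that idea with `functools.reduce`. `apply_in_order` is a hypothetical helper, and this is not necessarily how `Section` implements it.

```python
from functools import reduce


def csv_parse(text):
    '''Split the supplied string at every occurrence of a comma.'''
    return text.split(',')


def drop_d(text_list):
    '''Remove every occurrence of '$' from each substring.'''
    return [txt.replace('$', '') for txt in text_list]


def drop_space(text_list):
    '''Remove space from the start and end of each substring.'''
    return [txt.strip() for txt in text_list]


def select_columns(text_list):
    '''Select the first and last columns.'''
    return (text_list[0], text_list[-1])


# Illustrative only: `apply_in_order` is a hypothetical helper, not part of
# the sections package.
def apply_in_order(functions, item):
    '''Feed `item` through each function, passing each output to the next.'''
    return reduce(lambda value, func: func(value), functions, item)


steps = [csv_parse, drop_d, drop_space, select_columns]
row = 'DELL OPTIPLEX 790, Core i5/2500, 4, $20'
print(apply_in_order(steps, row))

# A type mismatch is only discovered when the functions run: here csv_parse
# receives a list (from drop_space) instead of a string.
try:
    apply_in_order([drop_space, csv_parse], row)
except AttributeError as error:
    print('Mismatch:', error)
```

Because nothing checks the types in advance, a wrongly ordered list fails only when an item is actually processed.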


For more examples of processing methods refer to the **Advanced Processing** 
tutorial, the **Subsections** tutorial, the **Context** tutorial and the 
**Text Functions** tutorial.

## Basic Item Assembling

Assembling, the final section component, involves combining all of the section
items into a single object.  The default assembly function is `list()`, so by
default the items in the section are returned as a list.

In this example we will supply an assembly function that converts the
two-item tuples generated by the processing example into a dictionary with the
model as the key and the price as a float value.

To focus on the assembly part, the supplied sequence will be the output from 
the previous processing example.

In [12]:
# pprint is used to produce nicely formatted output.
from pprint import pprint

# Import Section
from sections import Section

# This is the demo input we will use.
price_tuple = [
    ('THINKCENTRE X1', '260'),
    ('THINKCENTRE M78', '30'),
    ('THINKCENTRE M53', '60'),
    ('THINKCENTRE M710Q', '40'),
    ('DELL OPTIPLEX 7060', '385'),
    ('DELL OPTIPLEX 790', '20')
    ]

def tuples_to_dict(text_tuples):
    '''Convert a sequence of 2-item tuples into a dictionary
    with the first tuple element as the key and the second as a float value.
    '''
    combined_dict = {row[0]: float(row[1]) for row in text_tuples}
    return combined_dict


# Define the section `price_dict_section` 
price_dict_section = Section(aggregate=tuples_to_dict)


# Read the `price_tuple` text using the `price_dict_section` 
pprint(price_dict_section.read(price_tuple))

{'DELL OPTIPLEX 7060': 385.0,
 'DELL OPTIPLEX 790': 20.0,
 'THINKCENTRE M53': 60.0,
 'THINKCENTRE M710Q': 40.0,
 'THINKCENTRE M78': 30.0,
 'THINKCENTRE X1': 260.0}
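
Any callable that accepts an iterable of processed items and returns one object fits the description of an assembly function given above. A few plain-Python candidates are compared below on the same input, without using `Section`. Whether `Section` hands the function a list or a lazy iterator is not stated in this tutorial, so the sketch passes an iterator to stay on the safe side. `total_price` is a hypothetical alternative added for illustration.

```python
price_tuple = [
    ('THINKCENTRE X1', '260'),
    ('THINKCENTRE M78', '30'),
    ('DELL OPTIPLEX 790', '20'),
]


def tuples_to_dict(text_tuples):
    '''Convert 2-item tuples into a dictionary with float values.'''
    return {row[0]: float(row[1]) for row in text_tuples}


def total_price(text_tuples):
    '''Hypothetical alternative: reduce the items to a single number.'''
    return sum(float(price) for _, price in text_tuples)


# iter() stands in for a stream of processed items.
print(list(iter(price_tuple)))            # a plain list of the items
print(dict(iter(price_tuple)))            # built-in dict keeps prices as str
print(tuples_to_dict(iter(price_tuple)))  # custom: prices become floats
print(total_price(iter(price_tuple)))     # custom: a single float
```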


## The Basics All in One

Here we will conclude the basics introduction with all three components in one 
section.  The three components will:

1. Skip the first (header) line.
2. Convert each row into a two-item tuple containing the model and the price.  
3. Convert the 2-item tuple into a dictionary with the model as the key and the 
   price as a float value.  

In [13]:
# pprint is used to produce nicely formatted output.
from pprint import pprint

# Import Section
from sections import Section

# This is the demo input we will use.
price_list = [
    'MODEL NAME, CPU, RAM, PRICE',
    'THINKCENTRE X1, Core i5/6200, 8, $260',
    'THINKCENTRE M78, AMD A8-6500, 8, $30',
    'THINKCENTRE M53, Celeron, 8, $60',
    'THINKCENTRE M710Q, Intel Pentium, 8, $40',
    'DELL OPTIPLEX 7060, Core i7-8700, 8, $385',
    'DELL OPTIPLEX 790, Core i5/2500, 4, $20'
    ]

# Mini-functions for each processing action
def csv_parse(text):
    '''Split the supplied string at every occurrence of a comma.'''
    return text.split(',')


def drop_d(text_list):
    '''Remove every occurrence of '$' from each substring.'''
    return [txt.replace('$', '') for txt in text_list]


def drop_space(text_list):
    '''Remove space from the start and end of each substring.'''
    return [txt.strip() for txt in text_list]


def select_columns(text_list):
    '''Select the first and last columns.'''
    return (text_list[0], text_list[-1])


# Assembly function
def tuples_to_dict(text_tuples):
    '''Convert a sequence of 2-item tuples into a dictionary
    with the first tuple element as the key and the second as a float value.
    '''
    # Use a dictionary comprehension to take each two-element tuple, set the
    # first as the dictionary key and convert the second into a float value.
    combined_dict = {row[0]: float(row[1]) for row in text_tuples}
    return combined_dict


# Define the section `price_all_in_one_section` 
# Define the starting boundary: After the line that starts with 'MODEL'
# Define the processor: split the text at each ',', remove the '$' and spaces,
#   convert the first and last columns to a two-item tuple.
# Define the Assembler: Convert the 2-item tuples into a dictionary.
price_all_in_one_section = Section(start_section=('MODEL', 'START', 'After'),
                                   processor=[csv_parse, drop_d, drop_space, 
                                              select_columns],
                                   aggregate=tuples_to_dict)


# Read the `price_list` text using the `price_all_in_one_section` 
pprint(price_all_in_one_section.read(price_list))

{'DELL OPTIPLEX 7060': 385.0,
 'DELL OPTIPLEX 790': 20.0,
 'THINKCENTRE M53': 60.0,
 'THINKCENTRE M710Q': 40.0,
 'THINKCENTRE M78': 30.0,
 'THINKCENTRE X1': 260.0}
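
For comparison, the same kind of result can be reached in plain Python without `Section`. The sketch below hard-codes each of the three components (boundary, processing, assembly) and is meant only to show what the section definition is doing on your behalf. `read_price_list` is a hypothetical name, and the input is a shortened copy of the price list.

```python
from itertools import dropwhile, islice

price_list = [
    'MODEL NAME, CPU, RAM, PRICE',
    'THINKCENTRE X1, Core i5/6200, 8, $260',
    'THINKCENTRE M78, AMD A8-6500, 8, $30',
    'DELL OPTIPLEX 790, Core i5/2500, 4, $20',
]


# Illustrative only: `read_price_list` is a hypothetical function, not part of
# the sections package.
def read_price_list(lines):
    '''Return a {model: price} dictionary for the rows after the header.'''
    # Boundary: skip items until one starts with 'MODEL', then skip that
    # item too (the equivalent of an 'After' offset).
    from_header = dropwhile(lambda line: not line.startswith('MODEL'), lines)
    rows = islice(from_header, 1, None)
    # Processing: split at ',', then remove '$' and spaces from each field.
    # Assembly: build a dictionary with float prices.
    prices = {}
    for row in rows:
        fields = [txt.replace('$', '').strip() for txt in row.split(',')]
        prices[fields[0]] = float(fields[-1])
    return prices


print(read_price_list(price_list))
```

The hand-written version ties the boundary, processing and assembly logic together in one function, whereas the section definition keeps them as separate, reusable pieces.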


These are simple introductory examples. To fully unleash the power of the section
module, see the tutorials in the Users Guide:
* **Advanced Section Breaks**
* **Advanced Processing** 
* **Subsections**
* **Context**
* **Text Functions**