# Section Introductory Tutorial

## Introduction

There are many good text readers and parsers available for Python (such as *csv*), 
but they generally assume that the source they are reading from has uniform 
formatting throughout.  Often this is not the case. Different parts 
of a text file may contain different types of information, each of which requires a different approach to reading the data. 

 The Sections module is used to define, read and process distinct groups of items
 -- usually lines of text -- from an iterable source.  

The principal class is:

    Section(section_name: str = 'Section',
            start_section: (SectionBreak, List[SectionBreak], str, Optional),
            end_section: (SectionBreak, List[SectionBreak], str, Optional),
            processor: (ProcessingMethods, Section, List[Section], Optional),
            aggregate: (Callable, Optional),
            keep_partial: bool = False)

- Section defines a continuous portion of a text stream or other iterable.

- A section definition may include:

    - Starting and ending break points.
    - Processing instructions.
    - An aggregation method.

- A Section instance is created by defining one or more of these attributes. Once a section has been defined, it can be applied to an iterable source using:

`read(source)`
> Where
> *source* is any iterable supplying the text lines to be parsed.

Supporting classes:

`Trigger(sentinel, location=None, name)`: 
>  Define a test for evaluating a source item.

`SectionBreak(sentinel, location, break_offset, name)`: 
>  Identify the start or end of a section.

`Rule(sentinel, location, pass_method, fail_method, name)`: 
>  Apply a method based on trigger test result.

`RuleSet(rule_list, default, name)`:  
>  Apply a sequence of Rules, stopping with the first Rule to pass.
        
`ProcessingMethods(processing_methods, name)`: 
>  Apply a series of functions to a supplied sequence of items.
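
As a rough illustration of the idea behind `ProcessingMethods` (this is not the package's implementation), a series of functions can be chained over a sequence in plain Python. The helper `apply_in_series`, the two sample functions and the sample lines are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def apply_in_series(functions: List[Callable], items: Iterable) -> Iterator:
    """Pass every item through each function in turn (hypothetical helper)."""
    for item in items:
        for func in functions:
            item = func(item)
        yield item


def trim(line: str) -> str:
    """Remove leading and trailing whitespace."""
    return line.strip()


def split_on_colon(line: str) -> List[str]:
    """Split a 'label : value' line into two trimmed parts."""
    return [part.strip() for part in line.split(':', 1)]


# Made-up sample lines, not taken from the DIR output.
lines = ['  Volume in drive C : Windows ', ' Serial Number : DAE7-D5BA']
processed = list(apply_in_series([trim, split_on_colon], lines))
print(processed)
```

Because `apply_in_series` is a generator, items are processed one at a time, which suits stream-like sources.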

**Note:** Although the examples given here focus on text, the Sectionary package works with any type of sequence.

## Imports

#### Standard Python Modules

In [25]:
from pathlib import Path
from pprint import pprint
import re
from typing import List

#### Useful Third Party Packages

In [45]:
import pandas as pd
#import xlwings as xw

#### Sectionary Imports

In [26]:
import sections
#import text_reader as tp
from sections import Rule, RuleSet, SectionBreak, ProcessingMethods, Section

## The Text to be Processed

This tutorial uses the output from the Windows `dir` command:
> `DIR "Test Dir Structure" /S /N /-C /T:W >  "test_DIR_Data.txt"`

More information on this command syntax and resulting output can be found 
[here](MS_Dir_Output.html)

The output from the `Dir` command is read in as a list of text lines by the command:
`dir_text = Path('test_DIR_Data.txt').read_text().splitlines()`

In [27]:
dir_text = Path('test_DIR_Data.txt').read_text().splitlines()

`dir_text` can also be obtained directly using an _IPython_ command:
> `dir_text = !DIR "Test Dir Structure" /S /N /-C /T:W`

### `Dir` Output Structure

The first 20 lines of `dir_text` are:


In [28]:
for line in dir_text[0:20]:
    print('\t', line)

	  Volume in drive C is Windows
	  Volume Serial Number is DAE7-D5BA
	 
	  Directory of c:\users\...\Test Dir Structure
	 
	 2021-12-27  03:33 PM    <DIR>          .
	 2021-12-27  03:33 PM    <DIR>          ..
	 2021-12-27  04:03 PM    <DIR>          Dir1
	 2021-12-27  05:27 PM    <DIR>          Dir2
	 2016-02-25  09:59 PM                 3 TestFile1.txt
	 2016-02-15  06:46 PM                 7 TestFile2.rtf
	 2016-02-15  06:47 PM                 0 TestFile3.docx
	 2016-04-21  01:06 PM              3491 xcopy.txt
	                4 File(s)           3501 bytes
	 
	  Directory of c:\users\...\Test Dir Structure\Dir1
	 
	 2021-12-27  04:03 PM    <DIR>          .
	 2021-12-27  04:03 PM    <DIR>          ..
	 2016-02-15  06:48 PM                 0 File in Dir One.txt


We will ignore the first two lines (the *header section*):

In [29]:
for line in dir_text[0:2]:
    print('\t', line)

	  Volume in drive C is Windows
	  Volume Serial Number is DAE7-D5BA


After the header come multiple folder sections, each looking something like this:

In [30]:
print(dir_text[3])
print()
for line in dir_text[5:9]:
    print(line)
print(dir_text[13])

 Directory of c:\users\...\Test Dir Structure

2021-12-27  03:33 PM    <DIR>          .
2021-12-27  03:33 PM    <DIR>          ..
2021-12-27  04:03 PM    <DIR>          Dir1
2021-12-27  05:27 PM    <DIR>          Dir2
               4 File(s)           3501 bytes


## Defining a Section

The start and end of a folder listing can be identified by key phrases:
- The section start is identified by the text '*Directory of*'
- The section end is identified by the text '*File(s)*'
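
Before turning to the `Section` class, the idea of start and end markers can be sketched in plain Python. This is only an illustration of the concept, assuming a simple substring test; `extract_section` is a hypothetical helper and the sample list is an abbreviated version of the DIR output.

```python
from typing import Iterable, List


def extract_section(lines: Iterable[str], start: str, end: str) -> List[str]:
    """Collect lines from the first one containing `start` up to, but not
    including, the next line containing `end` (hypothetical helper)."""
    section = []
    inside = False
    for line in lines:
        if not inside:
            if start in line:
                inside = True
                section.append(line)
        elif end in line:
            break
        else:
            section.append(line)
    return section


# Abbreviated sample of the DIR output.
sample = [
    ' Volume in drive C is Windows',
    '',
    ' Directory of c:\\users\\...\\Test Dir Structure',
    '',
    '2021-12-27  04:03 PM    <DIR>          Dir1',
    '2016-04-21  01:06 PM              3491 xcopy.txt',
    '               4 File(s)           3501 bytes',
]
folder_lines = extract_section(sample, 'Directory of', 'File(s)')
print(folder_lines)
```

In this sketch the line containing the end marker is left out of the result.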

### Define a Section Based on these start and end identifiers

In [31]:
dir_section = sections.Section(
    start_section='Directory of', 
    end_section='File(s)'
    )
dir_section.read(dir_text)

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt']

`dir_section.read(dir_text)` returned the first folder listing in *dir_text*.
However, it is missing the final line:

In [32]:
print(dir_text[13])

               4 File(s)           3501 bytes


To include this line, we need to define the `end_section` to end *After* the specified text.  We include this information by creating a `SectionBreak` object and explicitly including the last line using the `break_offset` argument:

In [33]:
dir_section = sections.Section(
    start_section='Directory of',
    end_section=sections.SectionBreak('File(s)', break_offset='After'))

dir_section.read(dir_text)

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']

`dir_text` is a list, so `dir_section.read(dir_text)` starts over at the beginning each time it is called.

In [34]:
pprint(dir_section.read(dir_text))
pprint(dir_section.read(dir_text))

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']
[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']

By creating an iterator from *dir_text* with `dir_text_iter = iter(dir_text)` 
(representing a text stream source), 
successive calls to `dir_section.read(dir_text_iter)` 
each return the next directory group.

In [35]:
dir_text_iter = iter(dir_text)
pprint(dir_section.read(dir_text_iter))
pprint(dir_section.read(dir_text_iter))

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']
[' Directory of c:\\users\\...\\Test Dir Structure\\Dir1',
 '',
 '2021-12-27  04:03 PM    <DIR>          .',
 '2021-12-27  04:03 PM    <DIR>          ..',
 '2016-02-15  06:48 PM                 0 File in Dir One.txt',
 '2021-12-27  03:45 PM    <DIR>          SubFolder1',
 '2021-12-27  03:45 PM    <DIR>          SubFolder2',
 '               1 File(s)              0 bytes']
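
The difference between passing a list and passing an iterator is ordinary Python behaviour and can be seen without the Sections module. The sketch below uses a hypothetical helper, `take_until`, and made-up data.

```python
from typing import Iterable, List


def take_until(source: Iterable[str], marker: str) -> List[str]:
    """Consume items from `source` up to and including the first `marker`."""
    taken = []
    for item in source:
        taken.append(item)
        if item == marker:
            break
    return taken


data = ['a1', 'a2', 'END', 'b1', 'END']

# A list is iterated from the start on every call.
from_list_1 = take_until(data, 'END')
from_list_2 = take_until(data, 'END')

# An iterator remembers its position, so the second call continues.
data_iter = iter(data)
from_iter_1 = take_until(data_iter, 'END')
from_iter_2 = take_until(data_iter, 'END')

print(from_list_1, from_list_2)
print(from_iter_1, from_iter_2)
```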


## Section Processing
Identifying sections is only the first step.
Next, let's do something with the section text.

### Section Aggregate functions
Summarize a section's content by supplying the section definition with 
an `aggregate` method.

`aggregate` (AggregateFunction, optional): 
A function used to collect and format the section items into a single object.
Defaults to None, which returns a list.

An `AggregateFunction` is a function that accepts a sequence of items as its first argument.  It may also accept a *Context* dictionary as its second argument, which supplies attributes generated by the Section class.

#### Important Note:
The sequence object passed to the `AggregateFunction` is actually a generator.  This is done so that stream-type sequences can be processed lazily, one item at a time.

When used with a `for` loop, a generator behaves just like a list, but a generator cannot be *sliced*.  If slicing is needed, simply convert the generator to a list with:
> `section_list = list(section_gen)`.
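
The following sketch shows the slicing limitation using only standard Python. `line_source` is a hypothetical stand-in for the generator that a section passes to its aggregate function, and `itertools.islice` is shown as an alternative when only a leading portion is needed.

```python
from itertools import islice


def line_source():
    """Stand-in generator for the items passed to an aggregate function."""
    yield from ['header', 'row 1', 'row 2', 'footer']


# Slicing a generator directly raises TypeError.
try:
    line_source()[1:-1]
    sliced_directly = True
except TypeError:
    sliced_directly = False

# Option 1: convert to a list first, then slice freely.
section_list = list(line_source())
body = section_list[1:-1]

# Option 2: islice takes leading items without building the full list.
first_two = list(islice(line_source(), 2))

print(sliced_directly, body, first_two)
```

Converting to a list is the simpler choice when negative indices such as `[-1]` are needed, since `islice` only accepts non-negative positions.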

### Format of a line from a dir listing
The DIR output is formatted into columns with spaces for padding.
The information can be extracted by identifying start and end columns.
(The numbers at the top and bottom are provided to aid with counting the characters.)
```
00000000001111111111222222222233333333334444444444555555555566666666667777777777
01234567890123456789012345678901234567890123456789012345678901234567890123456789
2021-12-27  03:33 PM    <DIR>          .
2021-12-27  03:33 PM    <DIR>          ..
2021-12-27  04:03 PM    <DIR>          Dir1
2021-12-27  05:27 PM    <DIR>          Dir2
2016-02-25  09:59 PM                 3 TestFile1.txt
2016-02-15  06:46 PM                 7 TestFile2.rtf
2016-02-15  06:47 PM                 0 TestFile3.docx
2016-04-21  01:06 PM              3491 xcopy.txt
00000000001111111111222222222233333333334444444444555555555566666666667777777777
01234567890123456789012345678901234567890123456789012345678901234567890123456789
```
- The first **20** characters contain the date and time.
- The ending characters, starting at character number **39**, 
contain the name of the file or directory.
- The file size is in characters **29** to **38**.
- Sub-directory names can be identified by the text _`<DIR>`_
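
The column positions listed above can be tried on a single line with a minimal sketch. `parse_dir_line` is a hypothetical helper, and the two sample lines are rebuilt programmatically so that their padding matches the layout shown in the ruler.

```python
from typing import Dict, Union


def parse_dir_line(dir_line: str) -> Dict[str, Union[str, bool]]:
    """Split one DIR listing line into fields using fixed column positions."""
    is_dir = '<DIR>' in dir_line
    return {
        'DateModified': dir_line[:20].strip(),      # first 20 characters
        'FileSize': '' if is_dir else dir_line[29:38].strip(),
        'Name': dir_line[39:].strip(),              # character 39 onward
        'IsDir': is_dir,
    }


# Rebuild two sample lines with the column layout shown above:
# the size is right-aligned so that it ends at character 37.
file_line = '2016-04-21  01:06 PM' + '3491'.rjust(18) + ' xcopy.txt'
dir_line = '2021-12-27  04:03 PM' + '    <DIR>' + ' ' * 10 + 'Dir1'

file_info = parse_dir_line(file_line)
dir_info = parse_dir_line(dir_line)
print(file_info)
print(dir_info)
```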

The `summarize_directory` function converts a folder listing into a list of dictionaries and from there into a *pandas* DataFrame.

In [54]:
def summarize_directory(dir_src):
    dir_list = list(dir_src)  # Convert generator into a list
    
    # The first line contains the folder name: 
    #         Directory of c:\\users\\...\\Test Dir Structure
    folder_name = dir_list[0].rsplit('\\', 1)[1]
    
    # The last line contains the number of files
    file_count = dir_list[-1].strip().split(' ', 1)[0]
    
    # Process the rest of the lines 
    # (There is a blank line between the folder name and the first listing)
    folder_list = list()
    for dir_line in dir_list[2:-1]:
        # Include the folder info for each listing
        folder_dict = {
            'Folder': folder_name,
            'NumFiles': file_count
            }
        
        # First 20 characters contain the date
        folder_dict['DateModified'] = dir_line[:20].strip()
        
        # Ending Characters, starting at #39 contain the name of 
        # the file or directory
        folder_dict['Name'] = dir_line[39:].strip()
        
        # Check if the listing is a file or directory
        # Directories will contain the text <DIR>
        if '<DIR>' in dir_line:
            folder_dict['IsDir'] = True
            # No FileSize for directories
            folder_dict['FileSize'] = ''
        else:
            folder_dict['IsDir'] = False
            # The File Size is given in characters 29 to 38
            folder_dict['FileSize'] = dir_line[29:38].strip()
        folder_list.append(folder_dict)
    # Convert the list to a Pandas Dataframe for easy viewing
    folder_data = pd.DataFrame(folder_list)
    return  folder_data

With `aggregate=summarize_directory` in the *dir_section* definition,
the command `dir_section.read(dir_text)` results in a DataFrame object representing the folder listing.

In [55]:
dir_section = sections.Section(
    start_section='Directory of',
    end_section=sections.SectionBreak('File(s)', break_offset='After'),
    aggregate=summarize_directory)

dir_section.read(dir_text)

|   | Folder | NumFiles | DateModified | Name | IsDir | FileSize |
|---|--------|----------|--------------|------|-------|----------|
| 0 | Test Dir Structure | 4 | 2021-12-27 03:33 PM | . | True | |
| 1 | Test Dir Structure | 4 | 2021-12-27 03:33 PM | .. | True | |
| 2 | Test Dir Structure | 4 | 2021-12-27 04:03 PM | Dir1 | True | |
| 3 | Test Dir Structure | 4 | 2021-12-27 05:27 PM | Dir2 | True | |
| 4 | Test Dir Structure | 4 | 2016-02-25 09:59 PM | TestFile1.txt | False | 3.0 |
| 5 | Test Dir Structure | 4 | 2016-02-15 06:46 PM | TestFile2.rtf | False | 7.0 |
| 6 | Test Dir Structure | 4 | 2016-02-15 06:47 PM | TestFile3.docx | False | 0.0 |
| 7 | Test Dir Structure | 4 | 2016-04-21 01:06 PM | xcopy.txt | False | 3491.0 |


# *DONE TO HERE*

In [None]:
ones = ''.join([str(i) for i in range(10)])
ones*8
tens = ''.join([str(i)*10 for i in range(8)])
tens
print(tens)
print(ones*8)

00000000001111111111222222222233333333334444444444555555555566666666667777777777
01234567890123456789012345678901234567890123456789012345678901234567890123456789



> For the directory line, extract the directory name from the full path:
`'Folder Name:\t' + dir_line.rsplit('\\', 1)[1]`

> Get the number of files in the directory:
`'Number of Files:\t' + dir_line.strip().split(' ', 1)[0]`

> Identify subdirectories (names start at character 39):
`'\tSubdirectory:\t' + dir_line[39:]`

> Identify files (names start at character 39):
`'\tFile:\t\t' + dir_line[39:]`

In [36]:
def summarize_directory(dir_list):
    output_list = list()
    for dir_line in dir_list:
        # Skip blank lines so they are not labelled as files
        if not dir_line.strip():
            continue
        # Get the directory name
        if 'Directory of' in dir_line:
            output_line = 'Folder Name:\t' + dir_line.rsplit('\\', 1)[1]
        # Label the subdirectories
        elif '<DIR>' in dir_line:
            output_line = '\tSubdirectory:\t' + dir_line[39:]
        # Label the file counts
        elif 'File(s)' in dir_line:
            output_line = 'Number of Files:\t' + dir_line.strip().split(' ', 1)[0]
        # Label the files
        else:
            output_line = '\tFile:\t\t' + dir_line[39:]
        output_list.append(output_line)
    return output_list 

In [37]:
dir_section = sections.Section(
    start_section='Directory of',
    end_section=sections.SectionBreak('File(s)', break_offset='After'),
    aggregate=summarize_directory)

for item in dir_section.read(dir_text):
    print(item)

Folder Name:	Test Dir Structure
	Subdirectory:	.
	Subdirectory:	..
	Subdirectory:	Dir1
	Subdirectory:	Dir2
	File:		TestFile1.txt
	File:		TestFile2.rtf
	File:		TestFile3.docx
	File:		xcopy.txt
Number of Files:	4


In [None]:
def process_directory(dir_line):
    # Get the directory name
    if 'Directory of' in dir_line:
        output_line = 'Folder Name:\t' + dir_line.rsplit('\\', 1)[1]
    # Label the subdirectories (names start at character 39)
    elif '<DIR>' in dir_line:
        output_line = '\tSubdirectory:\t' + dir_line[39:]
    # Label the file counts
    elif 'File(s)' in dir_line:
        output_line = 'Number of Files:\t' + dir_line.strip().split(' ', 1)[0]
    # Label the files (names start at character 39)
    else:
        output_line = '\tFile:\t\t' + dir_line[39:]
    return output_line 

In [None]:
dir_text_iter = iter(dir_text)
pprint(dir_section.read(dir_text_iter))
pprint(dir_section.read(dir_text_iter))

In [None]:
dir_section = Section(start_section='Directory of', 
                      end_section=SectionBreak('File(s)', break_offset='After'),
                      processor=ProcessingMethods([process_directory]))

output = dir_section.read(dir_text)
for line in output:
    print(line)

### Rule and RuleSets
Instead of having one function `process_directory()` that manages all possible 
text lines in the section, the function can be broken down into parts by 
defining *Rules*.

In [None]:
# Folder header: keep the last part of the path
def dir_name_split(line):
    return ['Folder Name:', line.rsplit('\\', 1)[1]]
dir_name_rule = Rule('Directory of', pass_method=dir_name_split)

# File count line: the first item is the number of files
def file_count_split(line):
    return ['Number of Files:', line.strip().split(' ', 1)[0]]
file_count_rule = Rule('File(s)', pass_method=file_count_split)

# Subdirectory listing: the name starts at character 39
def subfolder(line):
    return ['Subdirectory:', line[39:]]
subfolder_rule = Rule('<DIR>', pass_method=subfolder)

# Default: treat the line as a file listing
def file(line):
    return ['File:', line[39:]]

dir_process = RuleSet([dir_name_rule, file_count_rule, subfolder_rule], 
                      default=file)

In [None]:
dir_section = Section(start_section='Directory of', 
                      end_section=SectionBreak('File(s)', break_offset='After'),
                      processor=ProcessingMethods([dir_process]))

output = dir_section.read(dir_text)
for line in output:
    print(line)

In [None]:
dir_section = Section(start_section='Directory of', 
                      end_section=SectionBreak('File(s)', break_offset='After'),
                      processor=[dir_process])

output = dir_section.read(dir_text)
for line in output:
    print(line)
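
The "first Rule to pass wins, otherwise use the default" behaviour described for `RuleSet` can be sketched in plain Python. This is an illustration of the idea only, not the package's implementation: `apply_first_match`, `RuleSpec` and `as_file` are hypothetical names, and taking the last whitespace-separated word as the name is a simplification that would fail for names containing spaces.

```python
from typing import Callable, List, Tuple

# Each rule pairs a sentinel substring with the function applied on a match.
RuleSpec = Tuple[str, Callable[[str], List[str]]]


def apply_first_match(rules: List[RuleSpec],
                      default: Callable[[str], List[str]],
                      line: str) -> List[str]:
    """Apply the first rule whose sentinel is in `line`, else the default."""
    for sentinel, method in rules:
        if sentinel in line:
            return method(line)
    return default(line)


rules = [
    ('Directory of', lambda ln: ['Folder Name:', ln.rsplit('\\', 1)[1]]),
    ('File(s)', lambda ln: ['Number of Files:', ln.split()[0]]),
    ('<DIR>', lambda ln: ['Subdirectory:', ln.split()[-1]]),
]


def as_file(ln: str) -> List[str]:
    """Default handler: treat the line as a file listing."""
    return ['File:', ln.split()[-1]]


results = [apply_first_match(rules, as_file, ln) for ln in [
    ' Directory of c:\\users\\...\\Test Dir Structure',
    '2021-12-27  04:03 PM    <DIR>          Dir1',
    '2016-04-21  01:06 PM              3491 xcopy.txt',
    '               4 File(s)           3501 bytes',
]]
print(results)
```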

In [None]:
def dir_name_split(line):
    # Get the directory name
    if 'Directory of' in line:
        return ['Folder Name:', line.rsplit('\\', 1)[1]]
    return line

def subfolder(line):
    # Label the subdirectories (names start at character 39)
    if '<DIR>' in line:
        return ['Subdirectory:', line[39:]]
    return line

def file_count_split(line):
    # Label the file counts
    if 'File(s)' in line:
        return ['Number of Files:', line.strip().split(' ', 1)[0]]
    return line

def file(line):
    # Label the files (names start at character 39)
    return ['File:', line[39:]] 

In [None]:
print('column index')
print(''.join(str(i)*10 for i in range(10)))
print(''.join(str(i) for i in range(10))*10)
print(dir_text[9])
    

In [None]:


#%% Regex Parsing patterns
# File Count and summary:
     #          1 File(s)          59904 bytes
     #         23 Dir(s)     63927545856 bytes free
folder_summary_pt = re.compile(
    '(?P<files>'       # beginning of files string group
    '[0-9]+'           # Integer number of files
    ')'                # end of files string group
    '[ ]+'             # Arbitrary number of spaces
    '(?P<type>'        # beginning of type string group
    'File|Dir'         # "File" or "Dir" text
    ')'                # end of type string group
    '\\(s\\)'          # "(s)" text
    '[ ]+'             # Arbitrary number of spaces
    '(?P<size>'        # beginning of size string group
    '[0-9]+'           # Integer size of folder
    ')'                # end of size string group
    ' bytes'           # "bytes" text
    )
date_pattern = tp.build_date_re(compile_re=False)
file_listing_pt = re.compile(
    f'{date_pattern}'  # Insert date pattern
    '[ ]+'             # Arbitrary number of spaces
    '(?P<size>'        # beginning of size string group
    '[0-9]+'           # Integer size of folder
    ')'                # end of size string group
    ' '                # Single space
    '(?P<filename>'    # beginning of filename string group
    '.*'               # Remaining text is the file name
    ')'                # end of filename string group
    '$'                # end of string
    )


#%% Line Parsing Functions
# Directory Label Rule

def extract_directory(line: str, event, *args,
                    context=None, **kwargs) -> List[List[str]]:
    '''Extract Directory path from folder header.
    '''
    full_dir = line.replace('Directory of', '').strip()
    return [full_dir]


dir_header_rule = Rule(
    name='Dir Header Rule',
    sentinel='Directory of ',
    pass_method=extract_directory
    )


# skip <DIR>
def blank_line(*args, **kwargs) -> List[List[str]]:
    return [['']]


skip_dir_rule = Rule(
    name='Skip <DIR> Rule',
    sentinel=' <DIR> ',
    pass_method='Blank'
    )
skip_totals_rule = Rule(
    name='Skip Total Files Header Rule',
    sentinel='Total Files Listed:',
    pass_method='Blank'
    )


# Regular file listings
def file_parse(line: str, event, *args, **kwargs) -> List[List[str]]:
    '''Break file data into three columns containing Filename, Date, Size.

    Typical file is:
        2016-02-25  22:59     3 TestFile1.txt
    File line is parsed using a regular expression with 3 named groups.
    Output for the example above is:
        [[TestFile1.txt , 2016-02-25  22:59, 3]]

    Args:
        line (str): The text line to be parsed.
        event (re.match): The results of the trigger test on the line.
            Contains 3 named groups: ['date', 'size', 'filename'].
        *args & **kwargs: Catch unused extra parameters passed to file_parse.

    Returns:
        tp.ParseResults: A one-item list containing the parsed file
            information as a 3-item tuple:
                [(filename: str, date: str, file size: int)].
    '''
    file_line_parts = event.test_value.groupdict(default='')
    parsed_line = tuple([
        file_line_parts['filename'],
        tp.make_date_time_string(event),
        int(file_line_parts['size'])
        ])
    return parsed_line


# Regular File Parsing Rule
file_listing_rule = Rule(file_listing_pt, pass_method=file_parse,
                            name='Files_rule')


# File Count Parsing Rule
def file_count_parse(line: str, event, *args, **kwargs) -> List[List[str]]:
    '''Break file data into two rows containing:
           Number of files, & Directory size.

    Output has the following format:
        ['Number of files', file count value: int]
        ['Directory Size', directory size value: int]

    Typical line is:
        4 File(s)           3501 bytes
    File count is parsed using a regular expression with 3 named groups.

    Args:
        line (str): The text line to be parsed.
        event (re.match): The results of the trigger test on the line.
            Contains 3 named groups: ['files', 'type', 'size'].
        *args & **kwargs: Catch unused extra parameters passed to
            file_count_parse.

    Returns:
        tp.ParseResults: The parsed file information.
            The parsed file information consists of two lines with the
            following format:
                'Number of files', file count value: int
                'Directory Size', directory size value: int
    '''
    file_count_parts = event.groupdict(default='')
    # Manage case where bytes free is given:
    # 23 Dir(s)     63927545856 bytes free
    if line.strip().endswith('free'):
        file_count_parts['size_label'] = 'Free Space'
    else:
        file_count_parts['size_label'] = 'Size'
    parsed_line_template = ''.join([
        'Number of {type}s, {files}\n',
        'Directory {size_label}, {size}'
        ])
    parsed_line_str = parsed_line_template.format(**file_count_parts)
    parsed_line = [new_line.split(',')
                   for new_line in parsed_line_str.splitlines()]
    return parsed_line
file_count_rule = Rule(folder_summary_pt, pass_method=file_count_parse,
                          name='Files_rule')


skip_file_count_rule = Rule(
    name='Skip File(s) Rule',
    sentinel=folder_summary_pt,
    pass_method='Blank'
    )


# Files / DIRs Parse
def make_files_rule() -> Rule:
    '''If  File(s) or  Dir(s) extract # files & size
    '''
    def files_total_parse(line, event, *args, **kwargs) -> List[List[str]]:
        '''Break file counts into three columns containing:
           Type (File or Dir), Count, Size.

        The line:
               11 File(s)          72507 bytes
        Results in:
            [('File', 11, 72507)]
        The line:
           23 Dir(s)     63927545856 bytes free
        Results in:
            [('Dir', 23, 63927545856)]

        Args:
            line (str): The text line to be parsed.
            event (re.match): The results of the trigger test on the line.
                Contains 3 named groups: ['type', 'files', 'size'].
            *args & **kwargs: Catch unused extra parameters passed to
                files_total_parse.

        Returns:
            tp.ParseResults: A one-item list containing the parsed file count
                information as a 3-item tuple:
                    [(Type: str (File or Dir), Count: int, Size: int)].
        '''
        files_dict = event.test_value.groupdict(default='')
        # Convert count and size to integers, as the docstring describes
        parsed_line = tuple([
            files_dict["type"],
            int(files_dict["files"]),
            int(files_dict["size"])
            ])
        return [parsed_line]

    files_total_rule = Rule(folder_summary_pt,
                               pass_method=files_total_parse,
                               name='Files_Total_rule')
    return files_total_rule


default_csv = tp.define_csv_parser('dir_files', delimiter=':',
                                       skipinitialspace=True)


#%% Line Processing
def print_lines(parsed_list):
    output = list()
    for item in parsed_list:
        pprint(item)
        output.append(item)
    return output


def to_folder_dict(folder_list):
    '''Combine folder info into dictionary.
    '''
    # TODO separate directory info from file info
    #The first line in the folder list is the directory path
    directory = ''
    if folder_list:
        d_list = folder_list[0]
        if d_list:
            directory = d_list[0]
    folder_dict = {'Directory': directory}
    for folder_info in folder_list[1:]:
        filename, date, file_size = folder_info
        full_path = '\\'.join([directory, filename])
        file_parts = filename.rsplit('.', 1)
        if len(file_parts) > 1:
            extension = file_parts[1]
        else:
            extension = ''
        folder_dict = {
            'Path': full_path,
            'Directory': directory,
            'Filename': filename,
            'Extension': extension,
            'Date': date,
            'Size': file_size
            }
    return folder_dict


def make_files_table(dir_gen):
    '''Combine folder info dictionaries into Pandas DataFrame.
    '''
    list_of_folders = list(dir_gen)
    files_table = pd.DataFrame(list_of_folders)
    # set_index returns a new DataFrame, so keep the result
    files_table = files_table.set_index('Path')
    return files_table


#%% Reader definitions
default_parser = tp.define_csv_parser('dir_files', delimiter=':',
                                       skipinitialspace=True)
heading_reader = ProcessingMethods([
    default_parser,
    tp.trim_items
    ])
folder_reader = ProcessingMethods([
    RuleSet([skip_dir_rule, file_listing_rule, dir_header_rule,
             skip_file_count_rule], default=default_parser),
    tp.drop_blanks
    ])
summary_reader = ProcessingMethods([
    RuleSet([file_count_rule, skip_totals_rule], default=default_parser),
    tp.drop_blanks
    ])


#%% SectionBreak definitions
folder_start = SectionBreak(
    name='Start of Folder', sentinel='Directory of', break_offset='Before')
folder_end = SectionBreak(name='End of Folder',sentinel=folder_summary_pt,
                             break_offset='After')
summary_start = SectionBreak(sentinel='Total Files Listed:',
                                name='Start of DIR Summary', break_offset='Before')


#%% Section definitions
header_section = Section(
    section_name='Header',
    start_section=None,
    end_section=folder_start,
    processor=heading_reader,
    aggregate=print_lines
    )
folder_section = Section(
    section_name='Folder',
    start_section=folder_start,
    end_section=folder_end,
    processor=folder_reader,
    aggregate=to_folder_dict
    )
all_folder_section = Section(
    section_name='All Folders',
    start_section=folder_start,
    end_section=summary_start,
    processor=[folder_section],
    aggregate=make_files_table
    )
summary_section = Section(
    section_name='Summary',
    start_section=summary_start,
    end_section=None,
    processor=summary_reader,
    aggregate=tp.to_dict
    )


#%% Main Iteration
def main():
    # Test File
    base_path = Path.cwd() / 'examples'
    test_file = base_path / 'test_DIR_Data.txt'

    # Call Primary routine
    context = {
        'File Name': test_file.name,
        'File Path': test_file.parent,
        'top_dir': str(base_path),
        'tree_name': 'Test folder Tree'
        }

    source = tp.file_reader(test_file)
    file_info = all_folder_section.read(source, context)
    #summary = summary_section.read(source, **context)

    # Output  Data
    xw.view(file_info)
    print('done')

if __name__ == '__main__':
    main()
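
The named-group pattern `folder_summary_pt` defined in the script above can be tried on its own with the standard `re` module. The sketch below rewrites the same pattern as a verbose raw string (an equivalent form chosen here for readability, not the author's code) and applies it to three sample lines.

```python
import re

# Same structure as folder_summary_pt, written as a verbose raw-string pattern.
summary_pattern = re.compile(r'''
    (?P<files>[0-9]+)     # integer number of files or directories
    [ ]+                  # one or more spaces
    (?P<type>File|Dir)    # "File" or "Dir"
    \(s\)                 # literal "(s)"
    [ ]+                  # one or more spaces
    (?P<size>[0-9]+)      # integer size in bytes
    [ ]bytes              # literal " bytes"
    ''', re.VERBOSE)

matches = []
for line in ['               4 File(s)           3501 bytes',
             '              23 Dir(s)     63927545856 bytes free',
             '2016-04-21  01:06 PM              3491 xcopy.txt']:
    # search() is used because the lines begin with padding spaces
    found = summary_pattern.search(line)
    matches.append(found.groupdict() if found else None)
print(matches)
```

The captured values are strings; convert them with `int()` where numbers are needed.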


In [None]:
a =dir_text[3]
a.index('\\')
a.rsplit('\\', 1)
#'Folder Name:\t' + a.rsplit('\\', 1)[0]