# Example: Output from Windows Dir command
This tutorial demonstrates the main features of the Sectionary package with a simple example: parsing the output of the Windows `DIR` command.

### Imports

#### Standard Python Modules

In [32]:
from typing import List
from pathlib import Path
from pprint import pprint
import re
import sys

#### Useful Third Party Packages

In [33]:
import pandas as pd
import xlwings as xw

#### Sectionary Imports

In [34]:
#sys.path.append(r'../src/sectionary') 

import text_reader as tp
from sections import Rule, RuleSet, SectionBreak, ProcessingMethods, Section

## The Sample `Dir` Output

The Windows `dir` command displays a list of a directory's files and subdirectories.  
Its output will be used to showcase some of the features of the *sectionary* package.

Adding switches (options) to the `dir` command controls what it displays and the format of the output.
In these examples we will be using the command line:

`DIR "Test Dir Structure" /S /N /-C /T:W >  "test_DIR_Data.txt"`

| Switch | Description                                                                                              |
|--------|----------------------------------------------------------------------------------------------------------|
| /S     | Lists every occurrence of the specified file name within the specified directory and all subdirectories. |
| /N     | Displays a long list format with file names on the far right of the screen.                              |
| /-C    | Hides the thousand separator in file sizes.                                                              |
| /T:W   | Uses the "last written" time field for the date and time shown for each entry.                           |
| >      | Redirect the output to the specified file.                                                               |

For more information, see [DIR Command Syntax](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/dir)

In [35]:
test_file = Path.cwd() / 'examples' / 'test_DIR_Data.txt'
dir_text = test_file.read_text().splitlines()

### `Dir` Output Structure

The first 20 lines of the directory listing are:

In [36]:
for line in dir_text[0:20]:
    print('\t', line)

	  Volume in drive C is Windows
	  Volume Serial Number is DAE7-D5BA
	 
	  Directory of c:\users\...\Test Dir Structure
	 
	 2021-12-27  03:33 PM    <DIR>          .
	 2021-12-27  03:33 PM    <DIR>          ..
	 2021-12-27  04:03 PM    <DIR>          Dir1
	 2021-12-27  05:27 PM    <DIR>          Dir2
	 2016-02-25  09:59 PM                 3 TestFile1.txt
	 2016-02-15  06:46 PM                 7 TestFile2.rtf
	 2016-02-15  06:47 PM                 0 TestFile3.docx
	 2016-04-21  01:06 PM              3491 xcopy.txt
	                4 File(s)           3501 bytes
	 
	  Directory of c:\users\...\Test Dir Structure\Dir1
	 
	 2021-12-27  04:03 PM    <DIR>          .
	 2021-12-27  04:03 PM    <DIR>          ..
	 2016-02-15  06:48 PM                 0 File in Dir One.txt


We want to ignore the first two lines (the *header section*):

In [37]:
for line in dir_text[0:2]:
    print('\t', line)

	  Volume in drive C is Windows
	  Volume Serial Number is DAE7-D5BA


After the header come multiple folder sections, each of which looks something like this:

In [38]:
print(dir_text[3][0:23], '...', dir_text[3][-19:])
print()
for line in dir_text[5:9]:
    print(line)
print(dir_text[13])

 Directory of c:\users\ ... \Test Dir Structure

2021-12-27  03:33 PM    <DIR>          .
2021-12-27  03:33 PM    <DIR>          ..
2021-12-27  04:03 PM    <DIR>          Dir1
2021-12-27  05:27 PM    <DIR>          Dir2
               4 File(s)           3501 bytes


## Define a Section

### Define a Section Based on start and end identifiers:

The start and end of the folder listing can be identified by key phrases:
- The section start is identified by the text '*Directory of*'
- The section end is identified by the text '*File(s)*'

In [39]:
dir_section = Section(start_section='Directory of', end_section='File(s)')
dir_section.read(dir_text)

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt']

### SectionBreak objects

`dir_section.read(dir_text)` returned the first folder listing in *dir_text*.
However, it is missing the final line:

In [40]:
print(dir_text[13])

               4 File(s)           3501 bytes


To include this line, we need to define `end_section` so that the break occurs *After* the specified text.  We do this by explicitly creating a `SectionBreak` object:

In [41]:
dir_section = Section(start_section='Directory of', 
                      end_section=SectionBreak('File(s)', break_offset='After'))

dir_section.read(dir_text)

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']
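The effect of the start and end identifiers, and of the `break_offset` choice, can be pictured with a plain-Python sketch. This is only a conceptual analogue and not how sectionary is implemented; `read_between` and its `end_offset` parameter are hypothetical names used for illustration.

```python
from typing import Iterable, List


def read_between(lines: Iterable[str], start: str, end: str,
                 end_offset: str = 'Before') -> List[str]:
    """Collect lines from the first one containing `start` up to the
    first later line containing `end`.

    end_offset='Before' stops before the end line;
    end_offset='After' includes the end line.
    """
    section: List[str] = []
    in_section = False
    for line in lines:
        if not in_section:
            # Still looking for the start of the section.
            if start in line:
                in_section = True
                section.append(line)
            continue
        if end in line:
            if end_offset == 'After':
                section.append(line)
            break
        section.append(line)
    return section


sample = [
    ' Volume in drive C is Windows',
    ' Directory of c:\\demo',
    '',
    '2016-02-25  09:59 PM                 3 TestFile1.txt',
    '               1 File(s)              3 bytes',
    '',
]
without_end = read_between(sample, 'Directory of', 'File(s)')
with_end = read_between(sample, 'Directory of', 'File(s)', end_offset='After')
print(len(without_end), len(with_end))
```

With `end_offset='Before'` the *File(s)* line is left out; with `'After'` it becomes the last line of the section.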

## Iterating through multiple sections

`dir_text` is a list, so `dir_section.read(dir_text)` starts over at the beginning each time it is called.

In [42]:
dir_section.read(dir_text)

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']

In [43]:
dir_section.read(dir_text)

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']

By creating an iterator from *dir_text*, `dir_text_iter = iter(dir_text)` 
(representing a text stream source), 
successive calls to `dir_section.read(dir_text_iter)` 
will each return the next directory group.

In [44]:
dir_text_iter = iter(dir_text)
dir_section.read(dir_text_iter)

[' Directory of c:\\users\\...\\Test Dir Structure',
 '',
 '2021-12-27  03:33 PM    <DIR>          .',
 '2021-12-27  03:33 PM    <DIR>          ..',
 '2021-12-27  04:03 PM    <DIR>          Dir1',
 '2021-12-27  05:27 PM    <DIR>          Dir2',
 '2016-02-25  09:59 PM                 3 TestFile1.txt',
 '2016-02-15  06:46 PM                 7 TestFile2.rtf',
 '2016-02-15  06:47 PM                 0 TestFile3.docx',
 '2016-04-21  01:06 PM              3491 xcopy.txt',
 '               4 File(s)           3501 bytes']

In [45]:
dir_section.read(dir_text_iter)

[' Directory of c:\\users\\...\\Test Dir Structure\\Dir1',
 '',
 '2021-12-27  04:03 PM    <DIR>          .',
 '2021-12-27  04:03 PM    <DIR>          ..',
 '2016-02-15  06:48 PM                 0 File in Dir One.txt',
 '2021-12-27  03:45 PM    <DIR>          SubFolder1',
 '2021-12-27  03:45 PM    <DIR>          SubFolder2',
 '               1 File(s)              0 bytes']
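The difference between passing a list and passing an iterator is ordinary Python behaviour, and can be seen without sectionary at all. The sketch below uses a hypothetical helper, `take_through`, that consumes items up to and including a marker.

```python
from typing import Iterable, List


def take_through(source: Iterable[str], marker: str) -> List[str]:
    """Consume items from source up to and including the first item
    that contains marker."""
    taken = []
    for item in source:
        taken.append(item)
        if marker in item:
            break
    return taken


lines = ['a1', 'a2 END', 'b1', 'b2 END']

# A list is iterated from the top by every new for-loop.
first_from_list = take_through(lines, 'END')
second_from_list = take_through(lines, 'END')

# An iterator remembers its position between calls.
line_iter = iter(lines)
first_from_iter = take_through(line_iter, 'END')
second_from_iter = take_through(line_iter, 'END')

print(first_from_list, second_from_list)
print(first_from_iter, second_from_iter)
```

Both list calls return the first group, while the second iterator call continues with `['b1', 'b2 END']`.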

## Section Processing

Once identified, a section's content can be *processed* before being returned.
Automatic processing of the items in a section's content is specified with the 
*processor* argument in the *Section* definition. 

The *processor* argument takes a list of functions, *Rules*, or *RuleSets*. If 
the processor argument is not given or is `None` the items in the section are 
returned as-is.  *Rules* and *RuleSets* will be discussed in the next section.

Processor functions have one required positional argument, the item to be 
processed.  In addition, the function may contain a second positional argument,
a *context* dictionary.  The *context* dictionary will be discussed in more
detail in a later section.  Additional keyword arguments may also be included.  
If the keyword matches a key in the section's *context*, the corresponding 
*context* value will be supplied.  Otherwise the keyword argument will be 
ignored.

The functions will be applied in list order with the input of the function being 
the output from the previous function.  This means that the expected input type 
of a processor function should be able to handle all possible output types from 
the previous function in the list.

Processor functions may also be generator functions, in which case the required 
positional argument is the sequence to iterate over.  This can be useful if the 
processing involves skipping items or merging of multiple items.  Examples of 
this will be given in a separate tutorial.
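The chaining described above can be sketched in plain Python. This is an illustration of the idea only, not sectionary's code: `apply_processors` is a hypothetical helper, and the use of `inspect.isgeneratorfunction` to tell the two kinds of processor apart is an assumption made for the sketch.

```python
import inspect
from typing import Callable, Iterable, Iterator, List


def apply_processors(items: Iterable, processors: List[Callable]) -> Iterator:
    """Chain processors in list order.

    Plain functions are applied to each item in turn; generator
    functions receive the whole upstream sequence.
    """
    stream = iter(items)
    for func in processors:
        if inspect.isgeneratorfunction(func):
            stream = func(stream)
        else:
            stream = map(func, stream)
    return stream


def strip_item(item: str) -> str:
    """Item processor: remove surrounding white space."""
    return item.strip()


def drop_empty(sequence):
    """Generator processor: skip items that are empty."""
    for item in sequence:
        if item:
            yield item


def shout(item: str) -> str:
    """Item processor: convert to upper case."""
    return item.upper()


result = list(apply_processors(['  a ', '   ', ' b'],
                               [strip_item, drop_empty, shout]))
print(result)
```

Because `drop_empty` sits between the two item functions, `shout` only ever sees non-empty strings, which is the input-type compatibility the text refers to.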

### Processing Directory Listing Parts
There are 4 different text line types in a directory listing section as we have 
defined it.  
1. The directory path
2. Subdirectory listings
3. File listings
4. Number of files

Here we will write simple functions for each line type and a single processor 
function to handle all 4 types.

#### Directory Path
- The directory path line begins with the text *Directory of*:
> `Directory of c:\users\...\Test Dir Structure`
- Extract the directory name from the full path:
    1. Split the path at the last '\'. 
    2. Keep the right hand part after the split.<br>
    `text_line.rsplit('\\', 1)[1]`
- Return a tab delimited line with:
    - *Folder Name:* before the tab and 
    - The directory name after the tab
  
`output_line = 'Folder Name:\t' + dir_line.rsplit('\\', 1)[1]`

In [46]:
def dir_name_split(dir_line):
    output_line = 'Folder Name:\t' + dir_line.rsplit('\\', 1)[1]
    return output_line

#### Number of Files
- The last line in the listing gives the number of files in the directory.
- That line contains the text *File(s)*:
> `	                4 File(s)           3501 bytes`
- Extract the number of files from the beginning of the line:
    1. Strip off the initial white space.
    2. Split the remaining text at the first space.
    3. Keep the left hand part before the split.<br>
    `text_line.strip().split(' ', 1)[0]`    
- Return a tab delimited line with:
    - The text *Number of Files:* followed by a tab
    - The extracted number of files.

`output_line = 'Number of Files:\t' + dir_line.strip().split(' ', 1)[0]`

In [47]:
def file_count_split(dir_line):
    output_line = 'Number of Files:\t' + dir_line.strip().split(' ', 1)[0]
    return output_line

#### Subdirectories
- Lines containing a directory listing are indicated with the text *\<DIR\>*
> `2021-12-27  04:03 PM    <DIR>          Dir1`
- The name of the subdirectory begins at text column 36<br>
    `text_line[36:]`    
- Return a tab delimited line with:
    - An initial tab
    - The text *Subdirectory:* followed by another tab
    - The extracted name of the subdirectory.

`output_line = '\tSubdirectory:\t' + dir_line[36:]`

In [48]:
def get_subfolder_name(dir_line):
    output_line = '\tSubdirectory:\t' + dir_line[36:]
    return output_line

#### Files
- The remaining lines are assumed to contain file information.
> `2016-02-25  09:59 PM                 3 TestFile1.txt`
- The name of the file begins at text column 36<br>
    `text_line[36:]`    
- Return a tab delimited line with:
    - An initial tab
    - The text *File:* followed by two more tabs
    - The extracted name of the file.

`output_line = '\tFile:\t\t' + dir_line[36:]`

In [49]:
def get_file_name(dir_line):
    output_line = '\tFile:\t\t' + dir_line[36:]
    return output_line
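Fixed column offsets are sensitive to the exact layout of the listing, such as the width of the size field. As an alternative, the name can be taken as whatever follows the `<DIR>` marker or the file size. The sketch below assumes the date and time format shown in the sample output (which depends on regional settings); `LISTING_PT` and `name_from_listing` are hypothetical names and are not part of sectionary.

```python
import re

# Date, time and AM/PM marker, then either "<DIR>" or a file size,
# then the name as the remainder of the line.
LISTING_PT = re.compile(
    r'^\d{4}-\d{2}-\d{2}\s+'      # Date, e.g. 2016-02-25
    r'\d{2}:\d{2}\s+[AP]M\s+'     # Time and AM/PM marker
    r'(?:<DIR>|\d+)\s+'           # "<DIR>" or the file size
    r'(?P<name>.*)$'              # Name: the rest of the line
    )


def name_from_listing(dir_line: str) -> str:
    """Return the file or subdirectory name, or '' if the line is not
    a file or subdirectory listing."""
    match = LISTING_PT.match(dir_line)
    return match.group('name') if match else ''


print(name_from_listing('2021-12-27  04:03 PM    <DIR>          Dir1'))
print(name_from_listing(
    '2016-02-15  06:48 PM                 0 File in Dir One.txt'))
```

Names that contain spaces are kept whole because the final group captures the rest of the line.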

#### Process Directory Function
Combine the above functions into one function that checks the line type and calls the matching function.

In [50]:
def process_directory(dir_line):
    # Get the directory name
    if 'Directory of' in dir_line:
        output_line = dir_name_split(dir_line)
    # Label the subdirectories
    elif '<DIR>' in dir_line:
        output_line = get_subfolder_name(dir_line)
    # Label the file counts
    elif 'File(s)' in dir_line:
        output_line = file_count_split(dir_line)
    # Label the files
    else:
        output_line = get_file_name(dir_line)
    return output_line 

#### New Dir Section Definition

In [51]:
dir_section = Section(start_section='Directory of',
                      end_section=SectionBreak('File(s)', break_offset='After'),
                      processor=[process_directory])

output = dir_section.read(dir_text)
for line in output:
    print(line)

Folder Name:	Test Dir Structure
	File:		
	Subdirectory:	   .
	Subdirectory:	   ..
	Subdirectory:	   Dir1
	Subdirectory:	   Dir2
	File:		 3 TestFile1.txt
	File:		 7 TestFile2.rtf
	File:		 0 TestFile3.docx
	File:		91 xcopy.txt
Number of Files:	4


## Rule and RuleSets
Instead of having one function `process_directory()` that manages all possible 
text lines in the section, the function can be broken down into parts by 
defining *Rules*.

#### Rules
Rules define an action to take on an item depending on the result of a test.

A *Rule* definition has two parts:
1. Trigger:
   > Defines the test to be applied to the source item
   > Trigger related arguments:
   > - sentinel
   >   - For string items, sentinel can be a string or compiled regular expression.
   > - location
   >   - A sentinel modifier that applies to str or re.Pattern types of sentinels. One of  ['IN', 'START', 'END', 'FULL', None]. Default is None, which is treated as 'IN'

2. Action
   > Defines the actions to take depending on the Trigger outcome.
   > Action related arguments:
   > - pass_method
   > - fail_method
   >
   > Both take a function, or the name of a standard action, to be applied if the test passes or fails respectively.
   >
   > The pass_method and fail_method functions can be simple process functions, with one positional argument and additional keyword arguments. The functions can also contain a second positional argument *event*, which allows the function to access information about the test results.  This is particularly useful when the sentinel is a regular expression.
   >
   > pass_method and fail_method can also be a string with the name of one of the standard actions.  The most common are:
   > - 'Original': return the item unchanged.
   > - 'None': return None
   > - 'Blank': return ''  (an empty string)


#### RuleSets
RuleSets combine related Rules to provide multiple choices for actions.

- A RuleSet takes a sequence of Rules and a default method.
- Each Rule in the sequence will be applied to the input until one of the Rules triggers. At that point the sequence ends.  
- If no Rule triggers then the default method is applied.  
- Each of the Rules (and the default) should expect the same input type and should produce the same output type.  
- The default_method can be any valid process function or standard action.



*Triggers*, *TriggerEvent*, *Rules* and *RuleSets* will be covered in more detail in a separate tutorial.
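The trigger-then-action idea, and the "first Rule that triggers wins" behaviour of a RuleSet, can be sketched in plain Python. `MiniRule` and `apply_rule_set` are hypothetical stand-ins written for this illustration; they are not sectionary's classes, and the handling of compiled patterns (a simple search) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class MiniRule:
    """Hypothetical stand-in for a Rule: a trigger plus a pass action."""
    sentinel: Union[str, re.Pattern]
    pass_method: Callable[[str], object]
    location: Optional[str] = None   # 'IN', 'START', 'END', 'FULL' or None

    def triggers(self, item: str) -> bool:
        if isinstance(self.sentinel, re.Pattern):
            # Assumption: a compiled pattern is simply searched for.
            return self.sentinel.search(item) is not None
        location = self.location or 'IN'
        if location == 'START':
            return item.startswith(self.sentinel)
        if location == 'END':
            return item.endswith(self.sentinel)
        if location == 'FULL':
            return item == self.sentinel
        return self.sentinel in item


def apply_rule_set(item: str, rules: List[MiniRule],
                   default: Callable[[str], object]):
    """Apply the first rule that triggers; otherwise the default."""
    for rule in rules:
        if rule.triggers(item):
            return rule.pass_method(item)
    return default(item)


rules = [
    MiniRule('Directory of', lambda line: ('folder', line.rsplit('\\', 1)[1])),
    MiniRule('<DIR>', lambda line: ('subdir', line.split()[-1])),
    MiniRule(re.compile(r'\d+ File\(s\)'),
             lambda line: ('count', line.split()[0])),
]


def as_file(line: str):
    """Default action: treat the line as a file listing."""
    return ('file', line)


print(apply_rule_set(' Directory of c:\\users\\Test Dir Structure',
                     rules, as_file))
print(apply_rule_set('               4 File(s)           3501 bytes',
                     rules, as_file))
```

Every action returns the same kind of object (a two-item tuple), mirroring the advice that all Rules and the default should produce the same output type.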
        

### Convert the Process Directory Function into Rules
The `process_directory` function consists of a chain of `if` / `elif` statements, each of which calls a different function.  Each branch can be converted into its own Rule.

#### Get the directory name
```
if 'Directory of' in dir_line:
    output_line = dir_name_split(dir_line)
```
**Becomes the Rule:**

In [52]:
dir_name_rule = Rule('Directory of', pass_method=dir_name_split)

#### Label the subdirectories
```
elif '<DIR>' in dir_line:
    output_line = get_subfolder_name(dir_line)
```
**Becomes the Rule:**

In [53]:
subfolder_rule = Rule('<DIR>', pass_method=get_subfolder_name)

#### Label the file counts
```
elif 'File(s)' in dir_line:
    output_line = file_count_split(dir_line)
```
**Becomes the Rule:**

In [54]:
file_count_rule = Rule('File(s)', pass_method=file_count_split)

#### Label the files
```
else:
    output_line = get_file_name(dir_line)

```
This is not converted into a Rule because there is no conditional.  Instead it becomes the default method for a *RuleSet*:

In [55]:
dir_process = RuleSet([dir_name_rule, subfolder_rule, file_count_rule], 
                      default=get_file_name)

#### New Dir Section Definition

In [56]:
dir_section = Section(start_section='Directory of', 
                      end_section=SectionBreak('File(s)', break_offset='After'),
                      processor=[dir_process])

output = dir_section.read(dir_text)
for line in output:
    print(line)

Folder Name:	Test Dir Structure
	File:		
	Subdirectory:	   .
	Subdirectory:	   ..
	Subdirectory:	   Dir1
	Subdirectory:	   Dir2
	File:		 3 TestFile1.txt
	File:		 7 TestFile2.rtf
	File:		 0 TestFile3.docx
	File:		91 xcopy.txt
Number of Files:	4


In [60]:
print('column index')
print(''.join(str(i)*10 for i in range(8)))
print(''.join(str(i) for i in range(10))*8)
print(dir_text[8])
print(dir_text[12]) 

column index
00000000001111111111222222222233333333334444444444555555555566666666667777777777
01234567890123456789012345678901234567890123456789012345678901234567890123456789
2021-12-27  05:27 PM    <DIR>          Dir2
2016-04-21  01:06 PM              3491 xcopy.txt


In [70]:
a = tp.FixedWidthParser(locations=[20,30,39])
a.parse(dir_text[12])

['2016-04-21  01:06 PM', '          ', '    3491 ', 'xcopy.txt']

In [72]:
b = tp.define_fixed_width_parser(locations=[20,30,39])
b(dir_text[8])

<generator object FixedWidthParser.parser at 0x00000203D1DEE6D0>

In [73]:
list(b(dir_text[8]))

[['2021-12-27  05:27 PM', '    <DIR> ', '         ', 'Dir2']]
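Splitting a line at fixed column positions is just repeated slicing. The sketch below is a plain-Python illustration of the idea, not the `FixedWidthParser` implementation; `split_fixed_width` is a hypothetical name. The sample line is rebuilt from its parts so that the column positions are easy to check.

```python
from typing import List


def split_fixed_width(line: str, locations: List[int]) -> List[str]:
    """Cut line into pieces at the given column indices."""
    bounds = [0, *locations, len(line)]
    return [line[start:end] for start, end in zip(bounds, bounds[1:])]


# Date and time fill columns 0-19, the size is right-justified in
# columns 20-37, column 38 is a space and the name starts at column 39.
file_line = '2016-04-21  01:06 PM' + '3491'.rjust(18) + ' ' + 'xcopy.txt'
parts = split_fixed_width(file_line, [20, 30, 39])
print(parts)
```

For this line the pieces agree with the `FixedWidthParser` output shown above for `locations=[20,30,39]`.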

In [None]:
# Define Functions
def dir_name_split(dir_line):
    output_dict = {'Folder Name': dir_line.rsplit('\\', 1)[1]}
    return output_dict


def file_count_split(dir_line):
    output_dict = {'Number of Files': dir_line.strip().split(' ', 1)[0]}
    return output_dict


def get_subfolder_name(dir_line):
    output_dict = {'Subdirectory': dir_line[36:]}
    return output_dict


def get_file_name(dir_line):
    output_dict = {'File': dir_line[36:]}
    return output_dict


# Define Rules
dir_name_rule = Rule('Directory of', pass_method=dir_name_split)
subfolder_rule = Rule('<DIR>', pass_method=get_subfolder_name)
file_count_rule = Rule('File(s)', pass_method=file_count_split)

# Define Rule Set
dir_process = RuleSet([dir_name_rule, subfolder_rule, file_count_rule],
                      default=get_file_name)


## Documentation

In [31]:
#print(Section.__doc__)
#print(SectionBreak.__init__.__doc__)
#print(Section.__init__.__doc__)
#print(ProcessingMethods.__doc__)
#print(ProcessingMethods.__init__.__doc__)
#print(Rule.__doc__)
#print(Rule.__init__.__doc__)
#print(RuleSet.__doc__)
#print(RuleSet.__init__.__doc__)

#from sections import Trigger, TriggerEvent
#print(Trigger.__doc__)
#print(Trigger.__init__.__doc__)
#print(TriggerEvent.__doc__)
#print(TriggerEvent.__init__.__doc__)


In [32]:
dir_section = Section(start_section='Directory of', 
                      end_section=SectionBreak('File(s)', break_offset='After'),
                      processor=ProcessingMethods([dir_process]))

output = dir_section.read(dir_text)
for line in output:
    print(line)

['Folder Name:', 'Test Dir Structure']
['File:']
['Subdirectory:', '   .']
['Subdirectory:', '   ..']
['Subdirectory:', '   Dir1']
['Subdirectory:', '   Dir2']
['File: 3 TestFile1.txt']
['File: 7 TestFile2.rtf']
['File: 0 TestFile3.docx']
['File:91 xcopy.txt']
['Number of Files:', '4']


## Section Aggregates

A section's content can be summarized by supplying the section with an 
*aggregate* function.  The `aggregate` argument takes a function that combines 
the section's sequence of items into a single object.

#### Aggregate Functions
Aggregate functions act on a sequence and combine its items in 
some form.  The simplest aggregate function (and also the default) is the 
built-in `list` function.

The aggregate function has one required positional argument, the sequence to be 
aggregated.  In addition, the function may contain a second positional argument,
a *context* dictionary.  The *context* dictionary will be discussed in more
detail in a later section.  Additional keyword arguments may also be included.  
If the keyword matches a key in the section's *context*, the corresponding 
*context* value will be supplied.  Otherwise the keyword argument will be 
ignored.
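The keyword matching described above can be pictured with `inspect.signature`. This is a sketch of the idea under the assumption that matching is done by parameter name; `call_with_context`, `join_lines` and `count_items` are hypothetical names, and the sketch does not cover the optional second positional *context* argument.

```python
import inspect
from typing import Any, Callable, Dict, Iterable


def call_with_context(func: Callable, sequence: Iterable,
                      context: Dict[str, Any]):
    """Call func(sequence, **matching), where matching holds only the
    context entries whose keys are parameter names of func."""
    params = inspect.signature(func).parameters
    matching = {key: value for key, value in context.items()
                if key in params}
    return func(sequence, **matching)


def join_lines(sequence, separator='\n'):
    """Aggregate: combine the items into one string."""
    return separator.join(sequence)


def count_items(sequence):
    """Aggregate: count the items; takes no keyword arguments."""
    return sum(1 for _ in sequence)


context = {'separator': ' | ', 'unused_key': 42}
joined = call_with_context(join_lines, ['a', 'b', 'c'], context)
counted = call_with_context(count_items, ['a', 'b', 'c'], context)
print(joined)
print(counted)
```

`join_lines` receives `separator` from the context, while `count_items` is called with the sequence alone because none of the context keys match its parameters.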

In [35]:
print('column index')
print(''.join(str(i)*10 for i in range(10)))
print(''.join(str(i) for i in range(10))*10)
print(dir_text[9])
    

column index
0000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
2016-02-25  09:59 PM                 3 TestFile1.txt


In [42]:


#%% Regex Parsing patterns
# File Count and summary:
     #          1 File(s)          59904 bytes
     #         23 Dir(s)     63927545856 bytes free
folder_summary_pt = re.compile(
    '(?P<files>'       # beginning of files string group
    '[0-9]+'           # Integer number of files
    ')'                # end of files string group
    '[ ]+'             # Arbitrary number of spaces
    '(?P<type>'        # beginning of type string group
    'File|Dir'         # "File" or "Dir" text
    ')'                # end of type string group
    '\\(s\\)'          # "(s)" text
    '[ ]+'             # Arbitrary number of spaces
    '(?P<size>'        # beginning of size string group
    '[0-9]+'           # Integer size of folder
    ')'                # end of size string group
    ' bytes'           # "bytes" text
    )
date_pattern = tp.build_date_re(compile_re=False)
file_listing_pt = re.compile(
    f'{date_pattern}'  # Insert date pattern
    '[ ]+'             # Arbitrary number of spaces
    '(?P<size>'        # beginning of size string group
    '[0-9]+'           # Integer size of file
    ')'                # end of size string group
    ' '                # Single space
    '(?P<filename>'    # beginning of filename string group
    '.*'               # File name (any characters)
    ')'                # end of filename string group
    '$'                # end of string
    )
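To see what the named groups capture, the folder summary pattern can be tried on the two sample lines from the comments above. The sketch uses an equivalent pattern written with raw strings so that it runs on its own; `summary_pt` is a name introduced here for illustration.

```python
import re

summary_pt = re.compile(
    r'(?P<files>[0-9]+)'       # Number of files or directories
    r'[ ]+'                    # Arbitrary number of spaces
    r'(?P<type>File|Dir)'      # "File" or "Dir" text
    r'\(s\)'                   # Literal "(s)" text
    r'[ ]+'                    # Arbitrary number of spaces
    r'(?P<size>[0-9]+)'        # Integer size in bytes
    r' bytes'                  # " bytes" text
    )

sample_lines = [
    '               4 File(s)           3501 bytes',
    '              23 Dir(s)     63927545856 bytes free',
]
parsed = [summary_pt.search(line).groupdict() for line in sample_lines]
for groups in parsed:
    print(groups)
```

All captured values are strings, so any arithmetic on the counts or sizes needs an explicit `int()` conversion.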



In [None]:

#%% Line Parsing Functions
# Directory Label Rule

def extract_directory(line: str, event, *args,
                      context=None, **kwargs) -> List[str]:
    '''Extract the directory path from a folder header line.

    Returns a one-item list containing the full directory path.
    '''
    full_dir = line.replace('Directory of', '').strip()
    return [full_dir]


dir_header_rule = Rule(
    name='Dir Header Rule',
    sentinel='Directory of ',
    pass_method=extract_directory
    )


# skip <DIR>
def blank_line(*args, **kwargs) -> List[List[str]]:
    return [['']]


skip_dir_rule = Rule(
    name='Skip <DIR> Rule',
    sentinel=' <DIR> ',
    pass_method='Blank'
    )
skip_totals_rule = Rule(
    name='Skip Total Files Header Rule',
    sentinel='Total Files Listed:',
    pass_method='Blank'
    )


# Regular file listings
def file_parse(line: str, event, *args, **kwargs) -> tuple:
    '''Break file data into three items: Filename, Date, Size.

    Typical file line is:
        2016-02-25  22:59     3 TestFile1.txt
    File line is parsed using a regular expression with 3 named groups.
    Output for the example above is:
        ('TestFile1.txt', '2016-02-25  22:59', 3)

    Args:
        line (str): The text line to be parsed.
        event: The results of the trigger test on the line.
            event.test_value is the regular expression match, which
            contains 3 named groups: ['date', 'size', 'filename'].
        *args & **kwargs: Catch unused extra parameters passed to file_parse.

    Returns:
        tuple: The parsed file information as a 3-item tuple:
            (filename: str, date: str, file size: int).
    '''
    file_line_parts = event.test_value.groupdict(default='')
    parsed_line = (
        file_line_parts['filename'],
        tp.make_date_time_string(event),
        int(file_line_parts['size'])
        )
    return parsed_line


# Regular File Parsing Rule
file_listing_rule = Rule(file_listing_pt, pass_method=file_parse,
                            name='Files_rule')


# File Count Parsing Rule
def file_count_parse(line: str, event, *args, **kwargs) -> List[List[str]]:
    '''Break file count data into two rows containing:
           Number of files, & Directory size.

    Output has the following format:
        ['Number of Files', file count value: str]
        ['Directory Size', directory size value: str]

    Typical line is:
        4 File(s)           3501 bytes
    File count is parsed using a regular expression with 3 named groups.

    Args:
        line (str): The text line to be parsed.
        event: The results of the trigger test on the line.
            event.test_value is the regular expression match, which
            contains 3 named groups: ['files', 'type', 'size'].
        *args & **kwargs: Catch unused extra parameters passed to
            file_count_parse.

    Returns:
        List[List[str]]: The parsed file information.
            The parsed file information consists of two rows with the
            following format:
                'Number of Files', file count value
                'Directory Size', directory size value
    '''
    # Use the match stored on the event, as in file_parse.
    file_count_parts = event.test_value.groupdict(default='')
    # Manage case where bytes free is given:
    # 23 Dir(s)     63927545856 bytes free
    if line.strip().endswith('free'):
        file_count_parts['size_label'] = 'Free Space'
    else:
        file_count_parts['size_label'] = 'Size'
    parsed_line_template = ''.join([
        'Number of {type}s, {files}\n',
        'Directory {size_label}, {size}'
        ])
    parsed_line_str = parsed_line_template.format(**file_count_parts)
    parsed_line = [new_line.split(',')
                   for new_line in parsed_line_str.splitlines()]
    return parsed_line


file_count_rule = Rule(folder_summary_pt, pass_method=file_count_parse,
                       name='File_Count_rule')


skip_file_count_rule = Rule(
    name='Skip File(s) Rule',
    sentinel=folder_summary_pt,
    pass_method='Blank'
    )


# Files / DIRs Parse
def make_files_rule() -> Rule:
    '''If  File(s) or  Dir(s) extract # files & size
    '''
    def files_total_parse(line, event, *args, **kwargs) -> List[tuple]:
        '''Break file counts into three columns containing:
           Type (File or Dir), Count, Size.

        The line:
               11 File(s)          72507 bytes
        Results in:
            [('File', 11, 72507)]
        The line:
           23 Dir(s)     63927545856 bytes free
        Results in:
            [('Dir', 23, 63927545856)]

        Args:
            line (str): The text line to be parsed.
            event: The results of the trigger test on the line.
                event.test_value is the regular expression match, which
                contains 3 named groups: ['type', 'files', 'size'].
            *args & **kwargs: Catch unused extra parameters passed to
                files_total_parse.

        Returns:
            List[tuple]: A one-item list containing the parsed file count
                information as a 3-item tuple:
                    [(Type: str (File or Dir), Count: int, Size: int)].
        '''
        files_dict = event.test_value.groupdict(default='')
        # Convert the captured digit strings to integers.
        parsed_line = (
            files_dict["type"],
            int(files_dict["files"]),
            int(files_dict["size"])
            )
        return [parsed_line]

    files_total_rule = Rule(folder_summary_pt,
                            pass_method=files_total_parse,
                            name='Files_Total_rule')
    return files_total_rule


default_csv = tp.define_csv_parser('dir_files', delimiter=':',
                                       skipinitialspace=True)


#%% Line Processing
def print_lines(parsed_list):
    output = list()
    for item in parsed_list:
        pprint(item)
        output.append(item)
    return output


def to_folder_dict(folder_list):
    '''Combine folder info into dictionary.
    '''
    # TODO separate directory info from file info
    #The first line in the folder list is the directory path
    directory = ''
    if folder_list:
        d_list = folder_list[0]
        if d_list:
            directory = d_list[0]
    folder_dict = {'Directory': directory}
    for folder_info in folder_list[1:]:
        filename, date, file_size = folder_info
        full_path = '\\'.join([directory, filename])
        file_parts = filename.rsplit('.', 1)
        if len(file_parts) > 1:
            extension = file_parts[1]
        else:
            extension = ''
        folder_dict = {
            'Path': full_path,
            'Directory': directory,
            'Filename': filename,
            'Extension': extension,
            'Date': date,
            'Size': file_size
            }
    return folder_dict


def make_files_table(dir_gen):
    '''Combine folder info dictionaries into Pandas DataFrame.
    '''
    list_of_folders = list(dir_gen)
    files_table = pd.DataFrame(list_of_folders)
    # set_index returns a new DataFrame, so keep the result.
    files_table = files_table.set_index('Path')
    return files_table


#%% Reader definitions
default_parser = tp.define_csv_parser('dir_files', delimiter=':',
                                       skipinitialspace=True)
heading_reader = ProcessingMethods([
    default_parser,
    tp.trim_items
    ])
folder_reader = ProcessingMethods([
    RuleSet([skip_dir_rule, file_listing_rule, dir_header_rule,
             skip_file_count_rule], default=default_parser),
    tp.drop_blanks
    ])
summary_reader = ProcessingMethods([
    RuleSet([file_count_rule, skip_totals_rule], default=default_parser),
    tp.drop_blanks
    ])


#%% SectionBreak definitions
folder_start = SectionBreak(
    name='Start of Folder', sentinel='Directory of', break_offset='Before')
folder_end = SectionBreak(name='End of Folder',sentinel=folder_summary_pt,
                             break_offset='After')
summary_start = SectionBreak(sentinel='Total Files Listed:',
                                name='Start of DIR Summary', break_offset='Before')


#%% Section definitions
header_section = Section(
    section_name='Header',
    start_section=None,
    end_section=folder_start,
    processor=heading_reader,
    aggregate=print_lines
    )
folder_section = Section(
    section_name='Folder',
    start_section=folder_start,
    end_section=folder_end,
    processor=folder_reader,
    aggregate=to_folder_dict
    )
all_folder_section = Section(
    section_name='All Folders',
    start_section=folder_start,
    end_section=summary_start,
    processor=[folder_section],
    aggregate=make_files_table
    )
summary_section = Section(
    section_name='Summary',
    start_section=summary_start,
    end_section=None,
    processor=summary_reader,
    aggregate=tp.to_dict
    )


#%% Main Iteration
def main():
    # Test File
    base_path = Path.cwd() / 'examples'
    test_file = base_path / 'test_DIR_Data.txt'

    # Call Primary routine
    context = {
        'File Name': test_file.name,
        'File Path': test_file.parent,
        'top_dir': str(base_path),
        'tree_name': 'Test folder Tree'
        }

    source = tp.file_reader(test_file)
    file_info = all_folder_section.read(source, context)
    #summary = summary_section.read(source, **context)

    # Output  Data
    xw.view(file_info)
    print('done')

if __name__ == '__main__':
    main()
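In `to_folder_dict` above, `folder_dict` is rebound on every pass through the loop, so only the last file of each folder reaches the table. One possible variation collects one record per file instead. The sketch below assumes that the first item of the folder list is a one-item list holding the directory path and that the remaining items are `(filename, date, size)` tuples, as produced by `file_parse`; `to_file_records` is a hypothetical name, and the extension is taken from `pathlib`'s `suffix` rather than from `rsplit`.

```python
from pathlib import PureWindowsPath
from typing import Dict, List, Sequence


def to_file_records(folder_list: Sequence) -> List[Dict[str, object]]:
    """Return one dictionary per file in a parsed folder listing.

    Assumes folder_list[0] is a one-item list holding the directory
    path and the remaining items are (filename, date, size) tuples.
    """
    if not folder_list or not folder_list[0]:
        return []
    directory = folder_list[0][0]
    records = []
    for filename, date, file_size in folder_list[1:]:
        path = PureWindowsPath(directory, filename)
        records.append({
            'Path': str(path),
            'Directory': directory,
            'Filename': filename,
            'Extension': path.suffix.lstrip('.'),
            'Date': date,
            'Size': file_size,
            })
    return records


folder = [
    ['c:\\demo'],
    ('TestFile1.txt', '2016-02-25 21:59', 3),
    ('README', '2016-02-15 18:46', 7),
]
records = to_file_records(folder)
for record in records:
    print(record['Path'], record['Extension'], record['Size'])
```

A list of such records from all folders could then be flattened into a single list before building a table from it.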

In [None]:
a =dir_text[3]
a.index('\\')
a.rsplit('\\', 1)
#'Folder Name:\t' + a.rsplit('\\', 1)[0]