If you need to handle reasonably large outputs (around 100 MB), it is faster to extract only the block you are interested in than to create markup for the whole document.

In [1]:
import orcaparse as op

You can write your own regex or use one of the standard ones (see add_block_example).

In [2]:
regex = (
    op.RegexSettings(op.DEFAULT_ORCA_REGEX_FILE)
    .items["TypeKnownBlocks"]
    .items["BlockOrcaFinalSinglePointEnergy"]
)
regex

RegexRequest(p_type=Block, p_subtype=BlockOrcaFinalSinglePointEnergy, pattern=^(-{25}\s+-{20}\nFINAL SI..., flags=re.MULTILINE, comment=This pattern matches the ...)

In [3]:
with open("example.out", "r") as file:
    long_orca = file.read()

In [4]:
processed_text, new_blocks = regex.apply(long_orca, show_progress=True)

Processing BlockOrcaFinalSinglePointEnergy: 100%|██████████| 625/625 [00:00<00:00, 1806643.69it/s] 


In [5]:
new_blocks

{8077673562398: {'Element': <orcaparse.orca_elements.BlockOrcaFinalSinglePointEnergy at 0x758badd011e0>,
  'CharPosition': (354, 500),
  'LinePosition': (17, 19)}}

In [6]:
for v in new_blocks.values():
    block: op.elements.Block = v["Element"]
    print(block.data()["Energy"])

-440.508559636589 hartree


If you want to work with a file larger than 1 GB in which the same pattern repeats many times, you might benefit from skipping the creation of `Block` instances altogether and working with the plain regex:

In [9]:
compiled_pattern = regex.compile()
compiled_pattern

re.compile(r'^(-{25}\s+-{20}\nFINAL SINGLE POINT ENERGY\s+-?\d+\.\d+\n-{25}\s+-{20})$',
           re.MULTILINE|re.UNICODE)

In [10]:
for match in compiled_pattern.finditer(long_orca):
    print(match.group(1))

-------------------------   --------------------
FINAL SINGLE POINT ENERGY      -440.508559636589
-------------------------   --------------------
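
If you only need the numbers, the plain-regex approach can go one step further and capture the energy value directly. The sketch below uses only the standard `re` module. The pattern is a hand-written variant of the one printed above, with the capturing group moved onto the number; it is not taken from orcaparse. The names `ENERGY_RE` and `iter_energies` and the sample text are hypothetical.

```python
import re
from typing import Iterator

# Assumption: same layout as the pattern shown above, but the capturing
# group wraps only the numeric value instead of the whole block.
ENERGY_RE = re.compile(
    r"^-{25}\s+-{20}\nFINAL SINGLE POINT ENERGY\s+(-?\d+\.\d+)\n-{25}\s+-{20}$",
    re.MULTILINE,
)


def iter_energies(text: str) -> Iterator[float]:
    """Yield every final single point energy in `text` as a float (hartree)."""
    for match in ENERGY_RE.finditer(text):
        yield float(match.group(1))


# Hypothetical sample that mimics the block layout of an ORCA output.
rule = "-" * 25 + "   " + "-" * 20
sample = (
    "some header\n"
    f"{rule}\n"
    "FINAL SINGLE POINT ENERGY      -440.508559636589\n"
    f"{rule}\n"
    "some footer\n"
)

energies = list(iter_energies(sample))
print(energies)
```

Because `iter_energies` is a generator, the values are produced one at a time, so a file with many repeated blocks does not require building a list of match objects first.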
