Why is IndentedStringImporter gone? #105

regexgit · 2019-11-05T16:03:27Z

About one year ago I had IndentedStringImporter installed with anytree. Now after a reinstallation of the OS and all my tools I realize that it is no longer present.

I use it a lot: biological taxonomies, for example, and the hierarchical keywords that photo tools import and export ("controlled vocabularies") are indented text files.

Fortunately I kept a backup of the code, but I would prefer an official installation.

c0fec0de · 2020-01-10T00:14:48Z

It was never an official implementation; it only existed on a branch. I will double-check.

als0052 · 2021-01-14T12:02:10Z

Just to add to the convo here: I'm also interested in seeing the IndentedStringImporter (and perhaps also an IndentedStringExporter?) added. I read the initial feature request and tried looking for the source code, but my GitHub-fu is not very good.

LionKimbro · 2021-01-23T02:06:28Z

I was just thinking about implementing an indented string importer, something that would read:

Foo
  Bar
  Baz
    Boz
    Bitz
  Blah

...and construct a tree with just that.

If I implement such a thing in a branch, is there any chance that it would be accepted?
Is the project taking contributions?

LionKimbro · 2021-01-24T01:50:46Z

I've created a pull request with an implementation of the functionality, docstring documentation, and nose tests.

angely-dev · 2023-02-14T16:16:36Z

Any updates on this?

regexgit · 2023-02-15T09:17:25Z

It seems not.
If it helps while you wait for an official version: I still use the original version (file indentedstringimporter.py from 2019), which I carefully kept.
No warranty, of course, but it is enough for my needs.

# -*- coding: utf-8 -*-
from anytree import AnyNode

#---------------------------------------
def _get_indentation(line):
	# Strip the leading spaces and measure the indentation as the
	# difference in length. (Splitting the line on its content raises
	# ValueError for blank lines, because the separator is empty.)
	content = line.lstrip(' ')
	indentation_length = len(line) - len(content)
	return indentation_length, content

#*******************************************************************************
class IndentedStringImporter(object):

	def __init__(self, nodecls=AnyNode):
		u"""
		Import Tree from a single string (with all the lines) or list of strings
		(lines) with indentation.
		
		Every indented line is converted to an instance of `nodecls`. The string
		(without indentation) found on the lines are set as the respective node name.
		
		This importer does not require the indentation to be a multiple of any
		fixed number of spaces. A node is considered a child of a parent simply
		if its indentation is larger than the parent's.
		
		This means that the tree can have siblings with different indentations,
		as long as the siblings' indentations are larger than that of their parent
		(but not necessarily equal to each other).
		
		Keyword Args:
		    nodecls: class used for nodes.
		
		Example using a string list:
		>>> from anytree.importer import IndentedStringImporter
		>>> from anytree import RenderTree
		>>> importer = IndentedStringImporter()
		>>> lines = [
		...    'Node1',
		...    'Node2',
		...    '    Node3',
		...    'Node5',
		...    '    Node6',
		...    '        Node7',
		...    '    Node8',
		...    '        Node9',
		...    '      Node10',
		...    '    Node11',
		...    '  Node12',
		...    'Node13',
		... ]
		>>> root = importer.import_(lines)
		>>> print(RenderTree(root))
		AnyNode(name='root')
		├── AnyNode(name='Node1')
		├── AnyNode(name='Node2')
		│   └── AnyNode(name='Node3')
		├── AnyNode(name='Node5')
		│   ├── AnyNode(name='Node6')
		│   │   └── AnyNode(name='Node7')
		│   ├── AnyNode(name='Node8')
		│   │   ├── AnyNode(name='Node9')
		│   │   └── AnyNode(name='Node10')
		│   ├── AnyNode(name='Node11')
		│   └── AnyNode(name='Node12')
		└── AnyNode(name='Node13')
		
		Example using a string:
		>>> string = "Node1\n  Node2\n  Node3\n    Node4"
		>>> root = importer.import_(string)
		>>> print(RenderTree(root))
		AnyNode(name='root')
		└── AnyNode(name='Node1')
		    ├── AnyNode(name='Node2')
		    └── AnyNode(name='Node3')
		        └── AnyNode(name='Node4')
		"""
		
		self.nodecls = nodecls
	
	#------------------------------------
	def _tree_from_indented_str(self, data):
		if isinstance(data, str):
			lines = data.splitlines()
		else:
			lines = data
		root = self.nodecls(name="root")
		indentations = {}
		for line in lines:
			cur_indent, name = _get_indentation(line)

			if len(indentations) == 0:
				parent = root
			elif cur_indent not in indentations:
				# parent is the next lower indentation, or the root if
				# no earlier line is less indented than this one
				keys = [key for key in indentations if key < cur_indent]
				parent = indentations[max(keys)] if keys else root
			else:
				# current line uses the parent of the last line
				# with same indentation
				# and replaces it as the last line with this given indentation
				parent = indentations[cur_indent].parent

			indentations[cur_indent] = self.nodecls(name=name, parent=parent)

			# delete all higher indentations
			keys = [key for key in indentations.keys() if key > cur_indent]
			for key in keys:
				indentations.pop(key)
		return root
	
	#------------------------------------
	def import_(self, data):
		# data: single string or a list of lines
		return self._tree_from_indented_str(data)

angely-dev · 2023-03-01T09:24:48Z

Thanks @regexgit for pointing out the original version, but I ended up writing my own lightweight implementation. It converts an indented config (not text, strictly speaking, since I assume each line to be unique within its indented block) to an n-ary tree using raw nested dicts.

The goal was to compare (and merge) two config files while being aware of the scope of the indented blocks. Unlike anytree, it won't meet everyone's requirements, but if anyone is interested: text to tree conversion in 10 lines of code and an example. I also published a simple gist.
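
For readers who want to see what "indented text to nested dicts" can look like, here is a minimal sketch. It is not the code from the linked repository or gist; the function name `indented_text_to_dict` is hypothetical, and it assumes space-only indentation and unique lines within each block (duplicate lines in the same block are merged into one key).

```python
def indented_text_to_dict(text):
    """Convert indented text to nested dicts (hypothetical helper)."""
    root = {}
    # Stack of (indentation, children dict). The root uses -1 so that it
    # always sits below any real indentation level.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        content = line.strip()
        indent = len(line) - len(line.lstrip(" "))
        # Pop back to the nearest block that is less indented than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        # setdefault merges duplicate lines within the same block.
        children = stack[-1][1].setdefault(content, {})
        stack.append((indent, children))
    return root


tree = indented_text_to_dict("Foo\n  Bar\n  Baz\n    Boz\n    Bitz\n  Blah")
print(tree)
```

Because each block is a plain dict, two parsed configs can be compared key by key at every level, which is the kind of scope-aware comparison described above.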

lverweijen · 2023-07-04T19:57:03Z

I would also be interested in this.

I actually created my own version. It wasn't written for anytree (but can probably easily be changed) and it may not be very flexible or fault-tolerant, but it should be reasonably fast for correct input:

    import re

    def from_indented_file(file, indent='@'):  # Change to "    " if 4 spaces are desired
        # Each line consists of a prefix (zero or more repetitions of indent) and the code
        pattern = re.compile(rf"^(?P<prefix>({re.escape(indent)})*)(?P<code>.*)")

        root = Node()  # Node is my own tree class here, not anytree's
        stack = [root]  # stack[d] is the most recent node at depth d

        for line in file:
            match = pattern.match(line)
            prefix, code = match['prefix'], match['code']
            depth = len(prefix) // len(indent)
            parent_node = stack[depth]
            node = parent_node.add(code)  # Should probably change to node = Node(parent=parent_node)

            # Place node as last item on index depth + 1
            del stack[depth + 1:]
            stack.append(node)

        return root

If a pull request is accepted, maybe the best parts of all three implementations can be combined.
I would also like to have an export to an indented file with the same options.
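
As a rough idea of what such an export could look like, here is a minimal sketch. Nothing in it is anytree API: `SimpleNode` and `to_indented_lines` are hypothetical names, and the only assumption about a node is that it exposes `name` and `children`. The `indent` parameter mirrors the one in the importer above.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleNode:
    # Hypothetical stand-in for a tree node with a name and ordered children.
    name: str
    children: list = field(default_factory=list)


def to_indented_lines(node, indent="    ", depth=0):
    """Yield one line per node, prefixed by `indent` repeated `depth` times."""
    yield indent * depth + node.name
    for child in node.children:
        yield from to_indented_lines(child, indent, depth + 1)


tree = SimpleNode("Foo", [
    SimpleNode("Bar"),
    SimpleNode("Baz", [SimpleNode("Boz"), SimpleNode("Bitz")]),
    SimpleNode("Blah"),
])
print("\n".join(to_indented_lines(tree, indent="  ")))
```

Writing the lines to a file is then a matter of joining them with newlines; using the same `indent` string for import and export keeps the round trip consistent.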

als0052 mentioned this issue Jan 14, 2021

Feature Request - Node Tree from list of delimited strings #153

Open
