# Playing with Abstract Syntax Tree (AST)

Pyhton has a standard library called ast which can be used to conver code into ast. Lets take ast_example.py and convert it into AST.

In [1]:
import ast
file_path = 'ast_example.py'
ast_head = compile(open(file_path, 'rb').read(), file_path, 'exec', ast.PyCF_ONLY_AST)

**ast_head** is currently pointing to the head of AST. If we call body on it, it show its children. It will have two children. One representing statement **a=1** and one representing the function f.

In [2]:
ast_head.body

[<_ast.Assign at 0x1e77553bba8>, <_ast.FunctionDef at 0x1e77553bc50>]

Lets see what's inside the first children which is representing **a=1**. Whenever you don't know how to look inside a node, use **dir** method.

In [3]:
first_child = ast_head.body[0]
dir(first_child)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_attributes',
 '_fields',
 'col_offset',
 'lineno',
 'targets',
 'value']

Ohh so looks like it has a **targets** attribute and a **value** attribute which makes sense. If we call this method on the node then we will get the corresponding children nodes. Lets first check what is the value node which should be RHS of the assignment **a=1**.

In [4]:
first_child.value

<_ast.Num at 0x1e77553bc18>

As expected, it is a number. Lets look what's inside it.

In [5]:
dir(first_child.value)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_attributes',
 '_fields',
 'col_offset',
 'lineno',
 'n']

In [6]:
first_child.value.n

1

Woah!!! That's awesome. Lets check the LHS now.

In [7]:
first_child.targets

[<_ast.Name at 0x1e77553bbe0>]

It is a list because we can do **a=b=1**. Lets dig deeper.

In [8]:
dir(first_child.targets[0])

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_attributes',
 '_fields',
 'col_offset',
 'ctx',
 'id',
 'lineno']

In [9]:
first_child.targets[0].id

'a'

Now lets look into the second child of the head. We can also use **ast.dump** to see what's inside a node instead of **dir**.

In [10]:
second_child = ast_head.body[1]
ast.dump(second_child)

"FunctionDef(name='f', args=arguments(args=[arg(arg='b', annotation=None)], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='c', ctx=Store())], value=Str(s='AST is fun')), Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Name(id='c', ctx=Load())], keywords=[]))], decorator_list=[], returns=None)"

Looking at this we quickly know how to access all the nodes. Like **second_child.name** should return **f**.

In [11]:
second_child.name

'f'

In [12]:
second_child.args.args[0].arg

'b'

In [13]:
second_child.body[0].targets[0].id

'c'

In [14]:
second_child.body[0].value.s

'AST is fun'

In [15]:
second_child.body[1].value.func.id

'print'

In [16]:
second_child.body[1].value.args[0].id

'c'

Now lets try to change the first statement **a=1** to **a=2**. We saw that 1 is represented by a node of type **ast.Num** whose **n** attribute is set to 1. So we need to create a ast node with **n** attribute set to 2 and make it value of **first_child** node.

In [17]:
new_node = ast.Num(n=2)
first_child.value = new_node

Now we can use astunparse library to conver to ast back to code. This is not a standard library so you will need to install it.

In [330]:
file_path = "choices_test.py"

In [357]:
ast_head = compile(open(file_path, 'rb').read(), file_path, 'exec', ast.PyCF_ONLY_AST)

In [362]:
bfs_list = []
bfs_list = ast_head.body.copy()
parser_argument_list = []

i = 0

while True:
    if i >= len(bfs_list):
        break
        
    element = bfs_list[i]
    i += 1
    
    # Skip if there are any imports     
    if element.__class__ == ast.Import or element.__class__ == ast.ImportFrom:
        continue
    
    # Check for "if" class
    if element.__class__ == ast.If:
        bfs_list.extend(element.body)
    
    # Check for "functionDef" class     
    if isinstance(element, ast.FunctionDef):
        bfs_list.extend(element.body)
    
    # Check if it has the value attribute that we can extract
    if "value" not in element._fields:
        continue
        
    # Check if the element contains a function field
    if "func" not in element.value._fields and isinstance(element.value, ast.Call):
        continue

    # Check if "attr" field exists in this
    if "attr" not in element.value.func._fields:
        continue
    
    # Check if "add_argument" exists     
    if element.value.func.attr != "add_argument":
        continue
    
    # Find the parameter name
    if 0 < len(element.value.args) > 1:
        continue
    
    
    argument = element.value.args
    keywords = element.value.keywords
    parser_details = {"parameter": argument[0].s}
    
    for keyword in keywords:
        parser_key = keyword.arg
        parser_value_arg = keyword.value
        parsed_value = None
        
        parsed_value = extractor(parser_value_arg) 
        
        parser_details[parser_key] = parsed_value
        
    parser_argument_list.append(parser_details)
parser_argument_list

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_attributes', '_fields', 'args', 'col_offset', 'func', 'keywords', 'lineno']


[{'parameter': '--check-choices',
  'help': 'string help',
  'type': 'str',
  'required': True,
  'choices': ['rock', 'paper', 'scissors']},
 {'parameter': 'door', 'type': 'int', 'choices': 'Unknown Type'}]

In [361]:
def extractor(parser_value_arg):
    if isinstance(parser_value_arg, ast.Str):
        parsed_value = parser_value_arg.s
            
    elif isinstance(parser_value_arg, ast.Name):
        parsed_value = parser_value_arg.id

    elif isinstance(parser_value_arg, ast.NameConstant):
        parsed_value = parser_value_arg.value

    elif isinstance(parser_value_arg, ast.List):
        parsed_value = [extractor(x) for x in parser_value_arg.elts]
        
    elif isinstance(parser_value_arg, ast.Call):
        if parser_value_arg.value.func.id == "range":
            parsed_value = {"function_type": "range", ""}
        
    else:
        print(dir(parser_value_arg))
        parsed_value = "Unknown Type"
    return parsed_value

In [380]:
ast_head.body[1].body[2].value.keywords[1].value.func.id

'range'

In [385]:
ast_head.body[1].body[2].value.keywords[1].value.args[0].n

1