# Analysis Scope

This notebook gives an overview of how LineaPy traces code. By running through a number of examples, it aims to provide a mental model for understanding the scope of the code extraction.

## Simple Expressions

LineaPy translates Python expressions into function calls and understands the dependencies between them by tracing their arguments:

In [1]:
# NBVAL_IGNORE_OUTPUT
# (print may be different if the .linea folder needs to be created)
from lineapy import save

x = 10

y = x + 100
z = x + 200

print(save(y, "y").get_code())

x = 10
y = x + 100



This also works with user-defined functions:

In [2]:
def my_fn(x, y):
    return x + y + 1

z = my_fn(x, y)
print(save(z, "z").get_code())

x = 10
y = x + 100
def my_fn(x, y):
    return x + y + 1
z = my_fn(x, y)
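
The dependency tracing shown in these two examples can be pictured as a backward walk over the traced statements. The following is a minimal sketch of that idea in plain Python, not LineaPy's implementation: the `trace` list and the `slice_for` helper are hypothetical names introduced only for illustration, and the read/write sets are written out by hand rather than derived from real tracing.

```python
# Hypothetical, hand-written trace: (source line, variable defined, variables read).
trace = [
    ("x = 10", "x", set()),
    ("y = x + 100", "y", {"x"}),
    ("z = x + 200", "z", {"x"}),
]


def slice_for(trace, target):
    """Walk the trace backwards, keeping only lines the target depends on."""
    needed = {target}
    kept = []
    for source, defined, reads in reversed(trace):
        if defined in needed:
            kept.append(source)
            # This line satisfies the need for `defined`,
            # but creates a need for everything it reads.
            needed.discard(defined)
            needed |= reads
    return kept[::-1]


y_slice = slice_for(trace, "y")
print("\n".join(y_slice))
```

Under these assumptions, slicing on `y` keeps the definitions of `x` and `y` and drops the line that defines `z`, which mirrors the output of the first cell above.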



## Functions that mutate

Many functions in Python mutate their arguments, so we cannot rely on just traversing the call graph to see which nodes are needed.

To understand these functions properly, we have manually annotated which ones mutate their arguments. We have covered many built-in functions and some functions from external libraries.

For example, we know that if you call `append` on a list, this will mutate the list, so if we are saving it, we should include the append calls:

In [3]:
x = []
x.append(10)
print(save(x, "x").get_code())

x = []
x.append(10)



The mutation tracking also differentiates between calls made before and after a mutation. For example, a slice for a `len` call made before the append should not include the append call, but a slice for a `len` call made after it should:

In [4]:
x = []
early_len = len(x)
x.append(10)
later_len = len(x)

print("Early:")
print(save(early_len, "early_len").get_code())
print("Later:")
print(save(later_len, "later_len").get_code())

Early:
x = []
early_len = len(x)

Later:
x = []
x.append(10)
later_len = len(x)



This means that if you call a function which mutates one of its arguments and we haven't annotated that behavior, LineaPy won't know about the mutation, and it might be lost when slicing!
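
To make the role of these annotations concrete, here is a toy model of mutation-aware slicing in plain Python. It is an illustrative sketch under stated assumptions, not LineaPy's implementation: `Step`, `slice_for`, and `make_trace` are hypothetical names, and the `mutates` field stands in for the manual annotations described above.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    source: str             # the line of code
    defines: Optional[str]  # variable assigned by this line, if any
    reads: frozenset        # variables the line reads
    mutates: frozenset      # variables the line is annotated as mutating


def slice_for(trace, target):
    """Walk backwards, keeping lines that define or mutate a needed variable."""
    needed = {target}
    kept = []
    for step in reversed(trace):
        if step.defines in needed or step.mutates & needed:
            kept.append(step.source)
            needed.discard(step.defines)
            needed |= step.reads
    return kept[::-1]


def make_trace(append_is_annotated):
    # When the annotation is missing, the append looks like a pure call.
    mutated = frozenset({"x"}) if append_is_annotated else frozenset()
    return [
        Step("x = []", "x", frozenset(), frozenset()),
        Step("early_len = len(x)", "early_len", frozenset({"x"}), frozenset()),
        Step("x.append(10)", None, frozenset({"x"}), mutated),
        Step("later_len = len(x)", "later_len", frozenset({"x"}), frozenset()),
    ]


early = slice_for(make_trace(True), "early_len")
later = slice_for(make_trace(True), "later_len")
lossy = slice_for(make_trace(False), "later_len")
```

In this model, `early` omits the append, `later` includes it, and `lossy` (built from the trace without the annotation) silently drops it, which is the failure mode described above.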

## Tracking "views"

When an object is "mutated" we also consider all other objects which have references to it as having "mutated" as well. This is because their behavior might change after mutating this object.

Consider a dictionary with a list in it as a value:

In [5]:
l = []
d = {"a": l}

Suppose we want the sum of the lengths of all values in the dict:

In [6]:
early_len = sum(map(len, d.values()))
early_len

0

In [7]:
print(save(early_len, "early_len").get_code())

l = []
d = {"a": l}
early_len = sum(map(len, d.values()))



Now we add an item to that list.

If we get the length after this, the value will be different:

In [8]:
l.append(10)
later_len = sum(map(len, d.values()))
later_len

1

So if we slice on `later_len`, the slice needs to include the append call to be accurate:

In [9]:
print(save(later_len, "later_len").get_code())

l = []
d = {"a": l}
l.append(10)
later_len = sum(map(len, d.values()))



We are able to track this by also annotating functions that add any "views" between their arguments or results. We consider views to be bidirectional: if two objects are views of one another, a mutation of one will mutate the other.
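
The reason views have to be treated as bidirectional can be seen in plain Python, independently of LineaPy. The short sketch below reuses `l` and `d` from the cells above; the `before`, `after`, and `total_len` names are introduced only for this illustration.

```python
l = []
d = {"a": l}

# The dict does not copy the list; it holds a reference to the same object.
assert d["a"] is l


def total_len(mapping):
    """Sum of the lengths of all values in the mapping."""
    return sum(map(len, mapping.values()))


before = total_len(d)

# Mutating the list changes what is observed through the dict...
l.append(10)
after = total_len(d)

# ...and mutating through the dict changes what is observed through the list.
d["a"].append(20)
list_len = len(l)
```

Because either name can be used to change what the other observes, a slice on one of them has to keep mutations made through the other.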

## Control Flow

We currently have special cases for all Python statements which are not simple expressions, such as `for` loops, `while` loops, `if` statements and `with` statements. We don't trim any lines from inside these blocks, but we do analyze them to see which functions are called:

In [10]:
z = 5

# We will include all of this code, because we do not slice inside of control flow
if True:
    x = 10 + z
    y = 10

    
print(save(y, 'y').get_code())

z = 5
if True:
    x = 10 + z
    y = 10



We also analyze all function calls that happen in these blocks, to know if they mutate their arguments:

In [11]:
z = []

# We know that this code calls append on `z` so we know it mutates z
if True:
    z.append(10)

# We know that this code does not mutate `z`
if True:
    x = len(z)

print(save(z, 'z').get_code())

z = []
if True:
    z.append(10)



We only track the first call made by each instruction while tracing control flow, so if an instruction calls a different function on a later iteration, we won't pick it up:

In [12]:
z = []

# The first time in the loop, we call append, so we know about that call
for method in ("append", "__getitem__"):
    getattr(z, method)(0)

# In this loop, the first call is __getitem__, so that is the only call we know about.
# We never see the append call, so we don't know that this loop mutates z
for method in ("__getitem__", "append"):
    getattr(z, method)(0)

print(save(z, 'z').get_code())

z = []
for method in ("append", "__getitem__"):
    getattr(z, method)(0)
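
The practical consequence of this limitation is that the slice no longer reproduces the original value. The sketch below runs the original cell and the slice printed above as two plain Python functions; `run_original` and `run_slice` are hypothetical wrappers added only so the two results can be compared side by side.

```python
def run_original():
    """The full cell: both loops run, and both of them append to z."""
    z = []
    for method in ("append", "__getitem__"):
        getattr(z, method)(0)
    for method in ("__getitem__", "append"):
        getattr(z, method)(0)
    return z


def run_slice():
    """The extracted slice: the second loop was dropped as non-mutating."""
    z = []
    for method in ("append", "__getitem__"):
        getattr(z, method)(0)
    return z


original_z = run_original()
sliced_z = run_slice()
print(original_z, sliced_z)
```

The original code leaves `z` with two elements, while the slice leaves it with one, so code that hides the mutating call behind a later iteration can produce an inaccurate slice.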



## User-Defined Functions

We also support some limited analysis of user-defined functions.

We don't track which functions are called inside them, so we don't know whether they mutate anything. Instead, we assume that any global they access was mutated:

In [15]:
# we know that calling this function will depend on the `a` global
def use_global():
    return a + 10

a = 400

z = use_global()

print(save(z, "z").get_code())

def use_global():
    return a + 10
a = 400
z = use_global()



In [16]:
# we know this function uses `a`, but we don't know that it updates it. We assume it does
def append_global():
    return a.append(10)

a = []

append_global()

print(save(a, "a").get_code())

def append_global():
    return a.append(10)
a = []
append_global()



In [17]:
# we assume this mutates a, even if it does not!
def len_global():
    return len(a)

a = []

len_global()

print(save(a, "a").get_code())

def len_global():
    return len(a)
a = []
len_global()
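
One way to picture this conservative rule is to list the globals a function's bytecode loads and then treat every one of them as possibly mutated. The sketch below does this with the standard `dis` module. It is only an assumption about how such an analysis could look, not a description of how LineaPy does it; `globals_read_by` is a hypothetical helper, and it ignores nested functions and names reached through attribute access.

```python
import builtins
import dis


def globals_read_by(fn):
    """Names that fn loads from the global scope, excluding builtins."""
    loaded = {
        ins.argval
        for ins in dis.get_instructions(fn)
        if ins.opname == "LOAD_GLOBAL"
    }
    return {name for name in loaded if not hasattr(builtins, name)}


def len_global():
    return len(a)


def append_global():
    return a.append(10)


a = []

# Conservative rule: every accessed global counts as possibly mutated,
# even though only append_global actually changes `a`.
assumed_mutated_by_len = globals_read_by(len_global)
assumed_mutated_by_append = globals_read_by(append_global)
```

Both functions report `a` as an accessed global, so under this rule both calls would be kept in a slice on `a`, which matches the three examples above.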

