# Pointers In Python

In this lecture we are going to take a deeper dive into:

* scoping
* assignment
* references
* pointers

Let's start with understanding how a computer runs a program. At a high level, Python compiles each function into a sequence of bytecode instructions. The interpreter executes those instructions in order and uses a stack to hold the values it is working on. Let's see how the following program is executed by inspecting it with the `dis` disassembler:

In [1]:
import dis
def function(x, y):
    return x + y

def main():
    print("Hello there")
    greeting = "Hello there"
    print(greeting)
    function(5, 6)
    print("We're all done")
    
main()
dis.dis(main)

Hello there
Hello there
We're all done
  6           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello there')
              4 CALL_FUNCTION            1
              6 POP_TOP

  7           8 LOAD_CONST               1 ('Hello there')
             10 STORE_FAST               0 (greeting)

  8          12 LOAD_GLOBAL              0 (print)
             14 LOAD_FAST                0 (greeting)
             16 CALL_FUNCTION            1
             18 POP_TOP

  9          20 LOAD_GLOBAL              1 (function)
             22 LOAD_CONST               2 (5)
             24 LOAD_CONST               3 (6)
             26 CALL_FUNCTION            2
             28 POP_TOP

 10          30 LOAD_GLOBAL              0 (print)
             32 LOAD_CONST               4 ("We're all done")
             34 CALL_FUNCTION            1
             36 POP_TOP
             38 LOAD_CONST               0 (None)
             40 RETURN_VALUE


As you can see from the disassembly of `main`, the function is a series of `LOAD`, `CALL`, `POP`, `STORE` and `RETURN` instructions.  The kind of instruction is the first part of its name.

A `LOAD` instruction reads a value (a constant, a global or a local variable) and pushes it onto the stack.  A `STORE` instruction pops the value on top of the stack and stores it under a variable name.

So when we store the value "Hello there" in `greeting` we first load the string (`LOAD_CONST`) and then store it in the variable (`STORE_FAST`).  Understanding this bytecode at a high level will be helpful throughout our discussion of scoping rules and how they actually work.
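
As a small sketch of the load-then-store pattern, the snippet below collects the instruction names for a single assignment. The function `store_greeting` is a hypothetical helper written only for this illustration, and the exact opcode names differ between Python versions, so the code only checks name prefixes.

```python
import dis

def store_greeting():
    # Hypothetical helper used only for this illustration.
    greeting = "Hello there"   # load the constant, then store it in a local
    return greeting            # load the local again, then return it

# Collect just the instruction names; exact opcodes vary between Python versions.
opnames = [instr.opname for instr in dis.get_instructions(store_greeting)]
print(opnames)

# Whatever the version, the assignment needs at least one load and one store.
has_load = any(name.startswith("LOAD") for name in opnames)
has_store = any(name.startswith("STORE") for name in opnames)
print(has_load, has_store)
```

Using `dis.get_instructions` instead of `dis.dis` gives us a list we can inspect in code instead of text printed to the screen.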

We'll need one more tool in order to complete our analysis.  We'll need to understand how memory works, at a high level, in our computer.  

Memory is basically just a big list.  The addresses in memory are like the index numbers of a list.  In fact, in lower level languages memory access is done by simply adding a fixed offset to a start position to get the next discrete piece of data.  Because addition is very fast in computers, memory can be accessed very efficiently this way.
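
The idea can be sketched with an ordinary Python list standing in for memory. This is a toy model only: `BASE`, `ITEM_SIZE`, `address_of`, `write` and `read` are made-up names for this illustration, not how Python itself lays out objects.

```python
# A toy model of memory: one flat list of cells, indexed by "address".
memory = [0] * 16

BASE = 4        # hypothetical start address of our array
ITEM_SIZE = 2   # hypothetical number of cells reserved for each item

def address_of(index):
    # The address of item `index` is one multiplication and one addition away.
    return BASE + index * ITEM_SIZE

def write(index, value):
    # Store the value in the first cell reserved for this item.
    memory[address_of(index)] = value

def read(index):
    return memory[address_of(index)]

for i, value in enumerate([7, 8, 9]):
    write(i, value)

print(address_of(0), address_of(1), address_of(2))  # 4 6 8
print(read(1))                                      # 8
print(memory)
```

Finding any item costs the same small amount of arithmetic no matter how far into the array it sits.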

This of course has _many_ disadvantages from the programmer's perspective, making it unwieldy in practice.  We'll need this understanding at a high level though, so we can understand what we are accessing and how this affects the run of our programs.  For this we'll explicitly work with the addresses of our different variables and objects.

In [3]:
def get_memory_address(variable):
    return hex(id(variable))

x = 10
get_memory_address(x)

'0xa6dde0'

The `id` function returns a number that uniquely identifies an object for as long as that object exists.  In CPython, the standard interpreter, that number is the object's memory address.  The `hex` function just converts it to hexadecimal notation, which is how memory addresses are typically displayed.
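
A short sketch of how `id` relates to the `is` operator may help here. It assumes CPython and repeats the helper so that it runs on its own. The names `first`, `second` and `third` are just for illustration.

```python
def get_memory_address(variable):
    # Same helper as above, repeated so this snippet is self-contained.
    return hex(id(variable))

first = [1, 2, 3]
second = first          # a second name for the same object
third = [1, 2, 3]       # an equal but separate object

print(first is second)                              # True
print(id(first) == id(second))                      # True
print(first == third)                               # True
print(first is third)                               # False
print(int(get_memory_address(first), 16) == id(first))  # True
```

`==` compares values, while `is` compares identity, which for two live objects is the same as comparing their `id`s.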

We are now ready for our first example!

In [4]:
x = 10
y = x
print(get_memory_address(x))
print(get_memory_address(y))

0xa6dde0
0xa6dde0


As you can see, the memory addresses of the variable `x` and the variable `y` are the same.  This isn't a big deal because they both reference the same value, `10`.  But what if we update one of the variables?  We'd expect the memory addresses of `x` and `y` to differ, because they no longer reference the same thing.

In [6]:
x = 10
y = x
x += 5
print(x)
print(y)
print(get_memory_address(x))
print(get_memory_address(y))

15
10
0xa6de80
0xa6dde0


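This works as expected!  Integers are immutable, so `x += 5` cannot change the object `10`.  Instead it binds `x` to the object `15`, while `y` still references `10`, and the two names no longer share a memory address.  Let's see if this is always the case: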

In [8]:
x = [10]
y = x
print("Before Augmentation")
print(x)
print(y)
print(get_memory_address(x))
print(get_memory_address(y))
x[0] += 5
print()
print("After Augmentation")
print(x)
print(y)
print(get_memory_address(x))
print(get_memory_address(y))

Before Augmentation
[10]
[10]
0x7fd5a0743348
0x7fd5a0743348

After Augmentation
[15]
[15]
0x7fd5a0743348
0x7fd5a0743348


Uh oh!  What happened?!? It looks like we are doing the same thing as before.  But remember, `y = x` says something stronger than "x equals y": it says both names refer to the same object at the same memory address.  And unlike `x += 5` on an integer, `x[0] += 5` changes the list in place instead of binding `x` to a new object.  This means what happens to one happens to the other.  To borrow a concept from physics, we can think of this as spooky action at a distance: when we augment one thing, it affects the other, even though they aren't in physical contact with one another.
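
The difference between changing an object and pointing a name at a new object can be seen in a minimal sketch like the following (the value `99` is arbitrary):

```python
x = [10]
y = x                 # y is another name for the same list

x[0] += 5             # mutation: changes the shared list in place
print(x, y, x is y)   # [15] [15] True

x = [99]              # rebinding: x now names a brand new list
print(x, y, x is y)   # [99] [15] False
```

Only the mutation is visible through `y`. The rebinding leaves the old list, and therefore `y`, untouched.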

If we want to fix this, we'll need to take an extra step:

In [10]:
import copy

x = [10]
y = copy.deepcopy(x)
print("Before Augmentation")
print(x)
print(y)
print(get_memory_address(x))
print(get_memory_address(y))
x[0] += 5
print()
print("After Augmentation")
print(x)
print(y)
print(get_memory_address(x))
print(get_memory_address(y))


Before Augmentation
[10]
[10]
0x7fd5a07757c8
0x7fd5a06e8148

After Augmentation
[15]
[10]
0x7fd5a07757c8
0x7fd5a06e8148


The two lists end up with different contents because they were different objects from the start.  `copy.deepcopy` builds a new list, and recursively copies anything nested inside it, so `y` holds equal values at a different memory address.  We pay for more objects in memory _but_ we avoid the logical error we encountered above.
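
For a flat list like `[10]` a shallow copy would have been enough. The difference only shows up with nested objects, as in this small sketch (the list `nested` is made up for the illustration):

```python
import copy

nested = [[10], [20]]
shallow = copy.copy(nested)       # new outer list, same inner lists
deep = copy.deepcopy(nested)      # new outer list and new inner lists

nested[0][0] += 5

print(nested)    # [[15], [20]]
print(shallow)   # [[15], [20]]
print(deep)      # [[10], [20]]
print(shallow[0] is nested[0], deep[0] is nested[0])  # True False
```

The shallow copy still shares its inner lists with the original, so the update leaks through. The deep copy shares nothing.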

Let's see how else we can run into trouble with this.

In [13]:
def func_one():
    x = 10
    print(get_memory_address(x)) 
    
def func_two():
    y = 10
    print(get_memory_address(y))
    
func_one()
func_two()

0xa6dde0
0xa6dde0


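Both functions print the same address even though `x` and `y` are local to different functions.  Names are scoped to their function, but the objects they reference are not.  (CPython also caches small integers, which is why both `10`s are the same object.)  Sharing an immutable integer is harmless, but sharing a mutable object across function boundaries is where things get tricky.  Let's see how this can be used to really mess with scoping rules.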

In [15]:
def func_one(listing):
    print(listing)
    listing[0] += 10
    print(listing)
    print(get_memory_address(listing))
    print(get_memory_address(listing[0]))
    
def func_two(listing):
    print(listing)
    listing[0] += 10
    print(listing)
    print(get_memory_address(listing))
    print(get_memory_address(listing[0]))
    
listing = [0]
func_one(listing)
func_two(listing)

[0]
[10]
0x7fd59c5280c8
0xa6dde0
[10]
[20]
0x7fd59c5280c8
0xa6df20


Oh no!  Not again :(  We never returned our list, yet the caller's `listing` changed.  The parameter inside each function references the same list object as the caller's variable, so in-place updates are visible outside the function.  What are some ways we can deal with this?
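
Before looking at a fix, it may help to see exactly which operations reach the caller. In this sketch `mutate` and `rebind` are hypothetical helpers written for the illustration:

```python
def mutate(items):
    # Hypothetical helper: changes the caller's list in place.
    items[0] += 10

def rebind(items):
    # Hypothetical helper: points the local name at a new list,
    # so the caller's list is untouched.
    items = [items[0] + 10]
    return items

listing = [0]
mutate(listing)
print(listing)            # [10]

result = rebind(listing)
print(listing, result)    # [10] [20]
```

Assigning to the parameter name only affects the function's local scope, while changing the object it references affects everyone who holds that object. Copying the list before changing it, as in the next cell, is one way to get the second behavior.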

In [17]:
import copy

def func_one(listing):
    new_list = copy.deepcopy(listing)
    print(new_list)
    new_list[0] += 10
    print(new_list)
    print(get_memory_address(new_list))
    print(get_memory_address(new_list[0]))
    
def func_two(listing):
    new_list = copy.deepcopy(listing)
    print(new_list)
    new_list[0] += 10
    print(new_list)
    print(get_memory_address(new_list))
    print(get_memory_address(new_list[0]))
    
listing = [0]
func_one(listing)
func_two(listing)

[0]
[10]
0x7fd59c52ef08
0xa6dde0
[0]
[10]
0x7fd59ce575c8
0xa6dde0


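Now everything works as expected: each function changes only its own copy, so the caller's list stays `[0]`.  Another strategy is to update the list in place on purpose, but return it and reassign it at the call site so the update is explicit: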

In [20]:
def func_one(listing):
    listing[0] += 10
    print(get_memory_address(listing))
    print(get_memory_address(listing[0]))
    return listing
    
def func_two(listing):
    listing[0] += 10
    print(get_memory_address(listing))
    print(get_memory_address(listing[0]))
    return listing
    
listing = [0]
listing = func_one(listing)
print(listing)
listing = func_two(listing)
print(listing)

0x7fd59cda6288
0xa6dde0
[10]
0x7fd59cda6288
0xa6df20
[20]


As you can see from the above code, the list is updated in order, and its address never changes: each function returns the very same object it was given.  The assignment after each call does not change the behavior; it only makes the update visible to the reader.  Unfortunately we could also do the following:

In [21]:
def func_one(listing):
    listing[0] += 10
    print(get_memory_address(listing))
    print(get_memory_address(listing[0]))
    return listing
    
def func_two(listing):
    listing[0] += 10
    print(get_memory_address(listing))
    print(get_memory_address(listing[0]))
    return listing
    
listing = [0]
func_one(listing)
print(listing)
func_two(listing)
print(listing)

0x7fd59c525c88
0xa6dde0
[10]
0x7fd59c525c88
0xa6df20
[20]


While this certainly "works", a reader looking only at the call sites has no hint that `func_one` and `func_two` modify `listing`.  So even though it's not "necessary", it's a very good idea to include the assignment statements.
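
A third option, sketched here as an alternative and not something the cells above use, is to leave the argument alone and return a new list. The function `add_ten` is a hypothetical name for this illustration.

```python
def add_ten(listing):
    # Hypothetical non-mutating variant: builds and returns a new list
    # instead of changing the one it was given.
    return [listing[0] + 10] + listing[1:]

original = [0]
updated = add_ten(original)
print(original, updated, updated is original)   # [0] [10] False
```

With this style the assignment at the call site is no longer optional, because it is the only way to get the result.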

In [27]:
class Node:
    def __init__(self, data, next):
        self.data = data
        self.next = next
        
    def __str__(self):
        return repr(self.data)


class LinkedList:
    def __init__(self):
        self.head = None
        
    def append(self, data):
        if self.head is None:
            self.head = Node(data, None)
        else:
            cur = self.head
            while cur.next:
                cur = cur.next
            cur.next = Node(data, None)
    

linked_list = LinkedList()
linked_list.append([])
linked_list.append([])

linked_list.head.data.append(1)

cur = linked_list.head
while cur:
    print(cur.data)
    cur = cur.next

[1]
[]
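
In the linked list above, each call to `append([])` creates its own empty list, so appending `1` to the head's data leaves the second node alone. The sketch below contrasts that with storing one list in two nodes. It redefines a minimal `Node` with a default `next` argument so that it runs on its own, and the names `separate_head`, `shared` and `shared_head` are made up for the illustration.

```python
class Node:
    # Minimal node, repeated here so the snippet is self-contained.
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

separate_head = Node([], Node([]))        # two different empty lists
shared = []
shared_head = Node(shared, Node(shared))  # the same list stored twice

separate_head.data.append(1)
shared_head.data.append(1)

print(separate_head.data, separate_head.next.data)   # [1] []
print(shared_head.data, shared_head.next.data)       # [1] [1]
print(shared_head.data is shared_head.next.data)     # True
```

Each `next` attribute is itself just a reference to another `Node`, which is what plays the role of a pointer here.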
