# Python performance issues

# suppose we execute this code in C

```
long j, total;

total = 0;
j = 300;

for(; j<1000; j++)
    total += j;
```

the C compiler will reserve two 8-byte chunks of memory to hold the current values
of total and j (8 bytes per long is typical on 64-bit Linux and macOS).

```
after 1st loop

    j: 300
    total: 300

after 2nd loop

    j: 301
    total: 601
```

on each pass through the loop, the values of j and total are overwritten in place in their reserved memory chunks.
their previous values are lost.
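
The in-place behaviour can be mimicked from Python with the standard `ctypes` module. This is only a sketch of the idea of a fixed 8-byte slot, assuming `ctypes.c_int64` as a stand-in for the C `long` above. It does not show how the compiled C code actually runs.

```python
import ctypes

# c_int64 is a fixed 8-byte slot, used here as a stand-in for C's `long total`
total = ctypes.c_int64(0)
slot_size = ctypes.sizeof(total)        # size of the slot in bytes
addr_before = ctypes.addressof(total)   # where the slot lives in memory

for j in range(300, 1000):
    total.value += j                    # overwrite the same 8 bytes each time

addr_after = ctypes.addressof(total)

print(slot_size)                  # 8
print(addr_before == addr_after)  # True: the slot never moved
print(total.value)                # same result as sum(range(300, 1000))
```

The address stays the same for the whole loop; only the bytes stored there change.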



# look at Python version of loop

```
total = 0
for j in range(300, 1000):
    total += j
```

there is NO dedicated memory for storing the values of j and total. j and total
just point to objects in the heap.

```
after 1st loop (heap object on right):

    j -> 300
    total -> 300

after 2nd loop

    j -> 301
    total -> 601
      -> 300   # old j, nothing is pointing to it
      -> 300   # old total, nothing is pointing to it

after 3rd loop

    j -> 302
    total -> 903
      -> 301
      -> 601
```

each time around the loop, we create two new int objects (one for j, one for total), and
throw the two old ones away! way more work than C is doing!
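
A minimal sketch of the rebinding described above. The name `old_total` is introduced only for illustration: it holds a second reference so the old object stays alive and can be inspected.

```python
total = 300
old_total = total        # second reference, keeps the old object alive

total += 301             # builds a NEW int object and rebinds the name total

print(old_total)           # 300: the old object was not modified
print(total)               # 601
print(old_total is total)  # False: two separate objects
```

Without the extra reference, the object holding 300 would have nothing pointing to it and would be thrown away.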


# Boxed and Unboxed Data
- 'unboxed' refers to the data itself
- 'long x;' in C reserves 8 bytes for x (on a typical 64-bit Linux or macOS system)
    - no overhead
- 'boxed' refers to all the memory associated with the object
- an int in Python is an OBJECT.
- a '500' int object has other fields aside from the '500' value
    - reference count
    - type info (pointer to the int type)
    - size field (how many internal digits the value uses)
- very substantial memory overhead
    - int object uses 28 bytes on 64-bit CPython!

In [None]:
import sys

# tells you how many bytes an object is using
sys.getsizeof(500)
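
To put the boxed size next to the unboxed one, here is a small sketch. It assumes `struct.calcsize("<q")` as a stand-in for a raw 8-byte integer like the C `long` above. The exact boxed size depends on the Python build.

```python
import struct
import sys

raw_bytes = struct.calcsize("<q")   # a bare 8-byte integer, no object wrapper
boxed_bytes = sys.getsizeof(500)    # the whole Python int object

print(raw_bytes)                 # 8
print(boxed_bytes)               # typically 28 on 64-bit CPython
print(boxed_bytes - raw_bytes)   # extra bytes compared with the raw value
```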

# C code is compiled, Python interpreted
- C compilers generate highly optimized code
- Python interpreter is much slower
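
One way to see the work the interpreter does is to look at the bytecode for the loop. The sketch below uses the standard `dis` module. The function name `sum_range` is made up for illustration, and the instruction names printed depend on the CPython version.

```python
import dis

def sum_range():
    total = 0
    for j in range(300, 1000):
        total += j
    return total

# collect the bytecode instructions CPython steps through for this function
ops = [ins.opname for ins in dis.get_instructions(sum_range)]

print(len(ops))   # number of bytecode instructions in the function
print(ops)        # exact names vary between Python versions
```

Each pass through the loop executes several of these instructions one at a time, where compiled C would run a handful of machine instructions.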

# Another issue - arrays
- suppose we want to sum a large array
- in C, we might do

```
// this array will be a contiguous chunk of 8000 bytes
long data[1000];

// something happens here to load data 
// now want to sum it

long j, total;

total = 0; 
j = 0;
for(; j< 1000; j++)
    total += data[j];
``` 


# Python has no built-in C-style array, closest thing is a list

```
# make 1000 element list

data = 1000*[0]

# load data
# now sum it

total = 0
for e in data:
    total += e
    
```

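# the list stores pointers; the int objects they point to will NOT be contiguous, leading to poor cache performance
# lists take extra memory: one pointer per element, on top of the int objects themselves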
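
A rough sketch of the memory cost, under a few assumptions: the list is filled with 1000 distinct values (rather than `1000*[0]`, where every slot points to the same `0` object), and the C array is taken to be 8 bytes per element as in the C version above.

```python
import sys

data = list(range(1000, 2000))   # 1000 distinct int objects

pointer_bytes = sys.getsizeof(data)                  # list header + pointer array only
object_bytes = sum(sys.getsizeof(e) for e in data)   # the int objects themselves
c_array_bytes = 1000 * 8                             # long data[1000] from the C version

print(pointer_bytes)
print(object_bytes)
print(pointer_bytes + object_bytes, "vs", c_array_bytes)
```

`sys.getsizeof` on a list does not count the elements, which is why the two parts are added up separately.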


# Python performance problems
- Memory bloat
- interpreter is slow
- no contiguous arrays
    - poor utilization of cache
    - lists take extra memory

# Another problem - vector arithmetic
- unlike languages like MATLAB, Mathematica and C++, Python's built-in lists do not provide
vector 'arithmetic', which is extremely useful in:
    - machine learning
    - statistics 
    - big data
    - parallel processing
    - science and engineering in general


In [None]:
# first time I tried this I was surprised
# expected to get [3,6,9]
# instead the list is repeated 3 times

[1,2,3]*3

In [None]:
# this doesn't work at all (raises TypeError),
# I expected [4,10,18]

[1,2,3]*[4,5,6]

In [None]:
# this doesn't work either (raises TypeError)
# I expected to get back a
# list of sin evals

import math

math.sin([1.,2.,3.])

In [None]:
# concatenates
# expected [5,7,9]

[1,2,3]+[4,5,6]
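
With plain lists, the elementwise results have to be spelled out by hand. Here is a minimal sketch using list comprehensions and `zip`. The names `a`, `b`, `scaled`, `products`, `sums` and `sines` are just for illustration. Each line still loops in the interpreter, so this gives the expected values but none of the speed of true vector arithmetic.

```python
import math

a = [1, 2, 3]
b = [4, 5, 6]

scaled = [x * 3 for x in a]                    # [3, 6, 9]
products = [x * y for x, y in zip(a, b)]       # [4, 10, 18]
sums = [x + y for x, y in zip(a, b)]           # [5, 7, 9]
sines = [math.sin(x) for x in [1., 2., 3.]]    # list of sin evals

print(scaled, products, sums)
print(sines)
```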