# Numpy

**Prepared By:**
- Ashish Sharma
- Email: accssharma@gmail.com
- AI Developers, Boise
- AI Saturdays - Week 2

## References:
- [1] https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

## What is Numpy?

- efficient library for scientific computing in Python
- "NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays"


**About Cython**

Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.

Cython gives you the combined power of Python and C to let you:

- write Python code that calls back and forth from and to C or C++ code natively at any point.
- easily tune readable Python code into plain C performance by adding static type declarations.
- use combined source code level debugging to find bugs in your Python, Cython and C code.
- interact efficiently with large data sets, e.g. using multi-dimensional NumPy arrays.
- quickly build your applications within the large, mature and widely used CPython ecosystem.
- integrate natively with existing code and data from legacy, low-level or high-performance libraries and applications.

The Cython language is a superset of the Python language that additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code. The C code is generated once and then compiles with all major C/C++ compilers in CPython 2.6, 2.7 (2.4+ with Cython 0.20.x) as well as 3.3 and all later versions.

The Cython programming language works best for wrapping external C libraries, embedding CPython into existing applications, and writing fast C modules that speed up the execution of Python code.

## Numpy

- NumPy is the fundamental package for scientific computing in Python. 
- It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

At the core of the NumPy package is the **ndarray** object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.
- NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
- A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.
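
The differences listed above can be seen in a short sketch. The values used here are illustrative and are not taken from the original notebook.

```python
import numpy as np

# A Python list can grow in place and hold mixed types.
py_list = [1, 2, 3]
py_list.append("four")          # same list object, now 4 items

# A NumPy array has a fixed size; np.append returns a NEW array.
arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)
print(arr.shape, bigger.shape)  # (3,) (4,)

# All elements share one dtype, so mixed numeric input is upcast.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)              # float64

# Whole-array arithmetic needs no explicit loop.
doubled = arr * 2
print(doubled.tolist())         # [2, 4, 6]
```

Note that `np.append` copies the data each time, so growing an array element by element in a loop is usually much slower than building a list first and converting it once.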

## ndarray object
- an array object that represents arrays in numpy which are:
    - High Performance MULTI-DIMENSIONAL array object
    - HOMOGENEOUS
    - FIXED-SIZE items
    
  
Numpy Array:
- grid of values
- all values have the same type
- indexed by a tuple of nonnegative integers
- the number of dimensions is the rank of the array
- the shape of the array is a tuple of integers giving the size of the array along each dimension
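
As a minimal sketch of these properties, the snippet below builds a small 3-dimensional array (the name `grid` and its shape are chosen only for illustration) and inspects its rank, shape, and tuple indexing.

```python
import numpy as np

grid = np.arange(24).reshape(2, 3, 4)   # values 0..23 in a 2 x 3 x 4 grid

print(grid.ndim)       # number of dimensions (rank): 3
print(grid.shape)      # size along each dimension: (2, 3, 4)
print(grid.dtype)      # one integer type for every element (platform-dependent width)
print(grid[1, 2, 3])   # indexed by a tuple of nonnegative integers: 23
```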


List:
- Python's equivalent of an array (can also be used as a stack, queue, etc.)
- size changes dynamically (is RESIZABLE)
- can contain elements of different types (NOT necessarily HOMOGENEOUS)

Question: What's the real difference?
Answer: **Performance**

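- NumPy arrays take up less space: elements are stored as raw fixed-size values rather than as separate Python objects.
- Computations are faster because the loops run in compiled code.
- Optimized functions are available for operations such as linear algebra.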
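
A rough way to check these claims is sketched below. The size `n` and the sum-of-squares workload are assumptions chosen for illustration; the timings depend on the machine, so no expected values are shown for them.

```python
import sys
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Memory: the list stores references plus one int object per element;
# the array stores raw 8-byte values in a single buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("list :", list_bytes, "bytes")
print("array:", arr.nbytes, "bytes")      # 800000

# Speed: sum of squares with a Python-level loop versus compiled code.
t_list = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
t_arr = timeit.timeit(lambda: int(np.dot(arr, arr)), number=10)
print("list :", t_list, "s")
print("array:", t_arr, "s")
```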

Discussions:
- [How much of Numpy is in C?](https://stackoverflow.com/questions/1825857/how-much-of-numpy-and-scipy-is-in-c)

### MULTI-DIMENSIONAL

In [18]:
from random import random, sample
import numpy as np

Lists in Python can also be multi-dimensional (as nested lists).

In [90]:
# def create_random_array(n):
#     return sample(range(n), n)

# def create_nd_matrix(dim_n):
#     out = create_random_array(dim_n)
#     for i in range(dim_n-1):
#         out = [out, out]   
#     return out

#A = create_nd_matrix(5)

In [172]:
shape_arg = (20,20000,200,2)
num_type = np.int8

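def create_random_array(n):
    # Despite the name, this returns a list of n zeros of type num_type.
    # return sample(range(n), n)
    return [num_type(0) for _ in range(n)]

def create_nd_matrix(shape=(2,2)):
    # Build a nested list of the given shape, starting from the innermost dimension.
    # NOTE: each level repeats a reference to the SAME inner list (no copies),
    # so construction is cheap, but all sub-lists at one level are aliases of each other.
    # print ("Generating a matrix of shape: {} and dimension: {} ".format(shape, len(shape)))
    rev = reversed(shape)
    out = create_random_array(next(rev))
    for _ in range(len(shape)-1):
        n = next(rev)
        out = [out for _ in range(n)]
    assert len(out) == shape[0]
    assert len(out[0]) == shape[1]
    return out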

In [173]:
from functools import reduce
import sys

itemsize = num_type().itemsize
print ("Itemsize:", itemsize, "bytes")

total_array_size = reduce(lambda x, y: x*y, shape_arg)*itemsize
print ("Total array size:", total_array_size, "bytes")

Itemsize: 1 bytes
Total array size: 160000000 bytes


In [174]:
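%%time
# %%time (cell magic) times the whole cell; a bare `%time` line times only an
# empty statement, which is what the microsecond-scale timing recorded below reflects.
A_py = create_nd_matrix(shape=shape_arg)
# A Python list is an array of references. When items are added, the array of
# references is resized, but the referenced objects are not necessarily stored contiguously.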

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 6.68 µs


In [179]:
sys.getsizeof(A_py)  # shallow size: counts only the outer list of 20 references, not the nested lists

264
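
`sys.getsizeof` on a list reports only the list object itself. In addition, a comprehension of the form `[out for i in range(n)]` repeats a reference to one inner list rather than copying it. The sketch below illustrates both effects on a tiny example; `deep_sizeof` is a hypothetical helper written for this illustration, not a standard-library function.

```python
import sys

inner = [0, 0]
aliased = [inner for _ in range(3)]         # three references to ONE list
independent = [[0, 0] for _ in range(3)]    # three separate lists

aliased[0][0] = 99
independent[0][0] = 99
print(aliased)       # [[99, 0], [99, 0], [99, 0]]
print(independent)   # [[99, 0], [0, 0], [0, 0]]

def deep_sizeof(obj, seen=None):
    """Hypothetical helper: sum sys.getsizeof over nested lists, counting each object once."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, list):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

print(sys.getsizeof(independent) < deep_sizeof(independent))  # True
print(deep_sizeof(aliased) < deep_sizeof(independent))        # True
```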

In [180]:
A_py[0][113][199][1]

0

In [176]:
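%%time
# %%time (cell magic) times the whole cell; a bare `%time` line times only an
# empty statement, which is what the timing recorded below reflects.
A_np = np.zeros(shape_arg, num_type)
# An instance of class ndarray consists of a contiguous one-dimensional segment of computer memory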

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 4.05 µs
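
The contiguous layout can be inspected directly. The following is a small sketch on an illustrative 2 x 3 x 4 `int8` array (much smaller than the array in the cell above), using the `flags` and `strides` attributes of `ndarray`.

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=np.int8)

print(a.flags["C_CONTIGUOUS"])  # True: one unbroken block in row-major order
print(a.strides)                # (12, 4, 1): bytes to step along each axis

flat = a.reshape(-1)            # 1-D view of the same memory, no copy
flat[23] = 7                    # offset 1*12 + 2*4 + 3*1 = 23
print(a[1, 2, 3])               # 7
```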


In [177]:
print ("Total bytes consumed by numpy array:", A_np.nbytes)
sys.getsizeof(A_np) # ndarray object header plus the data buffer it owns

Total bytes consumed by numpy array: 160000000


160000144
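
The gap between `nbytes` and `sys.getsizeof` is the fixed overhead of the array object. As a hedged sketch on a smaller illustrative array: for an array that owns its data, `sys.getsizeof` includes the buffer, while for a view it does not.

```python
import sys
import numpy as np

a = np.zeros((100, 100), dtype=np.int8)
view = a[:50]                      # a view: shares a's buffer, owns no data

print(a.nbytes)                    # 10000: 100 * 100 elements * 1 byte
overhead = sys.getsizeof(a) - a.nbytes
print(overhead)                    # small fixed header; exact value varies by NumPy version
print(view.flags["OWNDATA"])       # False
print(sys.getsizeof(view) < view.nbytes)  # True: the buffer is not counted for a view
```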

In [33]:
type(A_np[0][1][2])

numpy.float64

4

In [None]:
import numpy as np

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

In [None]:
# Numpy inte