
In [2]:
# practice

# Computation on NumPy Arrays: Universal Functions

Up until now, we have been discussing some of the basic nuts and bolts of NumPy; in the next few sections, we will dive into the reasons that NumPy is so important in the Python data science world.
Namely, it provides an easy and flexible interface to optimized computation with arrays of data.

Computation on NumPy arrays can be very fast, or it can be very slow.
The key to making it fast is to use *vectorized* operations, generally implemented through NumPy's *universal functions* (ufuncs).
This section motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient.
It then introduces many of the most common and useful arithmetic ufuncs available in the NumPy package.

## The Slowness of Loops

Python's default implementation (known as CPython) does some operations very slowly.
This is in part due to the dynamic, interpreted nature of the language: the fact that types are flexible, so that sequences of operations cannot be compiled down to efficient machine code as in languages like C and Fortran.
Recently there have been various attempts to address this weakness: well-known examples are the [PyPy](http://pypy.org/) project, a just-in-time compiled implementation of Python; the [Cython](http://cython.org) project, which converts Python code to compilable C code; and the [Numba](http://numba.pydata.org/) project, which converts snippets of Python code to fast LLVM bytecode.
Each of these has its strengths and weaknesses, but it is safe to say that none of the three approaches has yet surpassed the reach and popularity of the standard CPython engine.

The relative sluggishness of Python generally manifests itself in situations where many small operations are being repeated – for instance looping over arrays to operate on each element.
For example, imagine we have an array of values and we'd like to compute the reciprocal of each.
A straightforward approach might look like this:

In [3]:
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

This implementation probably feels fairly natural to someone from, say, a C or Java background.
But if we measure the execution time of this code for a large input, we see that this operation is very slow, perhaps surprisingly so!
We'll benchmark this with IPython's ``%timeit`` magic (discussed in [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb)):

In [4]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

1.3 s ± 65.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


It takes more than a second to compute these million operations and to store the result!
When even cell phones have processing speeds measured in Giga-FLOPS (i.e., billions of numerical operations per second), this seems almost absurdly slow.
It turns out that the bottleneck here is not the operations themselves, but the type-checking and function dispatches that CPython must do at each cycle of the loop.
Each time the reciprocal is computed, Python first examines the object's type and does a dynamic lookup of the correct function to use for that type.
If we were working in compiled code instead, this type specification would be known before the code executes and the result could be computed much more efficiently.

## Introducing UFuncs

For many types of operations, NumPy provides a convenient interface into just this kind of statically typed, compiled routine. This is known as a *vectorized* operation.
This can be accomplished by simply performing an operation on the array, which will then be applied to each element.
This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.

Compare the results of the following two:

In [5]:
print(compute_reciprocals(values))
print(1.0/values)

[0.16666667 1.         0.25       0.25       0.125     ]
[0.16666667 1.         0.25       0.25       0.125     ]


In [6]:
%timeit (1.0 / big_array)

2.67 ms ± 280 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
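
The same comparison can be sketched outside IPython with the standard-library `timeit` module. This is only an illustration: the sample size, the seed, and the repeat counts are arbitrary assumptions, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np


def compute_reciprocals(values):
    # Explicit Python-level loop: one dynamic dispatch per element
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


rng = np.random.default_rng(0)  # assumed seed, used only for repeatability
sample = rng.integers(1, 100, size=10_000)  # smaller than big_array to keep the run short

loop_result = compute_reciprocals(sample)
vectorized_result = 1.0 / sample

# Best of three repeats, converted to seconds per call
loop_time = min(timeit.repeat(lambda: compute_reciprocals(sample), number=10, repeat=3)) / 10
vector_time = min(timeit.repeat(lambda: 1.0 / sample, number=10, repeat=3)) / 10

print(f"loop:       {loop_time:.6f} s per call")
print(f"vectorized: {vector_time:.6f} s per call")
```

Both approaches return the same values; only the place where the loop runs (the Python interpreter versus NumPy's compiled layer) changes.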


> From here on, the cells are left largely uncommented because this is a practice notebook. See the Class Lecture notes folder for detailed explanations.

In [7]:
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

In [8]:
# Ufunc operations are not limited to 1D arrays; they work on multi-dimensional arrays as well

In [9]:
x = np.arange(9).reshape((3,3))

In [10]:
x

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [11]:
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]], dtype=int32)

## Exploring NumPy's UFuncs

Ufuncs exist in two flavors:

1. Unary ufuncs, which operate on a single input
2. Binary ufuncs, which operate on two inputs

In [12]:
x = np.arange(4)


In [13]:
print("x = ", x)

x =  [0 1 2 3]


In [15]:
print("x + 5 = ", x + 5)
print("x - 5 = ", x - 5)
print("x * 2 = ", x * 2)
print("x / 2 = ", x / 2)
print("x // 2 = ", x // 2)  # floor division

x + 5 =  [5 6 7 8]
x - 5 =  [-5 -4 -3 -2]
x * 2 =  [0 2 4 6]
x / 2 =  [0.  0.5 1.  1.5]
x // 2 =  [0 0 1 1]


In [17]:
# negation
print("-x= ", -x)

-x=  [ 0 -1 -2 -3]


In [18]:
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)

x ** 2 =  [0 1 4 9]
x % 2 =  [0 1 0 1]


In [19]:
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])

In [20]:
np.add(x, 2)

array([2, 3, 4, 5])

The following table lists the arithmetic operators implemented in NumPy:

| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|

Additionally there are Boolean/bitwise operators; we will explore these in [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb).
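
As a quick check of the table above, the sketch below compares each operator with its ufunc on a small array. The dictionary name `pairs` and the operand `2` are illustrative choices, not part of the original notebook.

```python
import numpy as np

x = np.arange(4)

# Each entry pairs the operator form with the equivalent ufunc call
pairs = {
    "+": (x + 2, np.add(x, 2)),
    "-": (x - 2, np.subtract(x, 2)),
    "unary -": (-x, np.negative(x)),
    "*": (x * 2, np.multiply(x, 2)),
    "/": (x / 2, np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % 2, np.mod(x, 2)),
}

all_match = all(np.array_equal(a, b) for a, b in pairs.values())
print(all_match)
```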

In [21]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

In [22]:
np.absolute(x)

array([2, 1, 0, 1, 2])

In [23]:
np.abs(x)

array([2, 1, 0, 1, 2])

In [24]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])  # complex values: np.abs returns the magnitude
np.abs(x)

array([5., 5., 2., 1.])
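
For complex input, the absolute value is the magnitude, the square root of the sum of the squared real and imaginary parts. A minimal sketch that recomputes it by hand (the names `z` and `magnitude` are illustrative):

```python
import numpy as np

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

# Magnitude computed explicitly from the real and imaginary parts
magnitude = np.sqrt(z.real ** 2 + z.imag ** 2)

print(magnitude)
print(np.abs(z))
```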

## Trigonometry

In [25]:
theta = np.linspace(0, np.pi, 3)

In [26]:
theta

array([0.        , 1.57079633, 3.14159265])

In [27]:
np.sin(theta)

array([0.0000000e+00, 1.0000000e+00, 1.2246468e-16])

In [28]:
np.cos(theta)

array([ 1.000000e+00,  6.123234e-17, -1.000000e+00])

In [29]:
np.tan(theta)

array([ 0.00000000e+00,  1.63312394e+16, -1.22464680e-16])

In [30]:
x = [-1, 0, 1]
print(x)
print("arctan(x) = ", np.arctan(x))  # inverse tangent, not cotangent

[-1, 0, 1]
arctan(x) =  [-0.78539816  0.          0.78539816]


In [31]:
print("arccos(x) = ", np.arccos(x))  # inverse cosine, not secant

arccos(x) =  [3.14159265 1.57079633 0.        ]


In [32]:
print("arcsin(x) = ", np.arcsin(x))  # inverse sine, not cosecant

arcsin(x) =  [-1.57079633  0.          1.57079633]
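
The trigonometric results above are accurate only to machine precision, which is why values that should be exactly zero show up as numbers like `1.2246468e-16`. The sketch below, using illustrative variable names, checks this with `np.isclose` and confirms that each inverse function undoes its forward counterpart on `[-1, 0, 1]`:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
x = np.array([-1.0, 0.0, 1.0])

# sin(pi) is tiny but not exactly zero in floating point
sin_pi_is_zero = np.isclose(np.sin(theta[2]), 0.0)

# Applying the forward function to the inverse recovers x (up to rounding)
sin_round_trip = np.sin(np.arcsin(x))
cos_round_trip = np.cos(np.arccos(x))
tan_round_trip = np.tan(np.arctan(x))

print(sin_pi_is_zero)
print(sin_round_trip, cos_round_trip, tan_round_trip)
```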


## Exponents and Logarithms

In [33]:
x = [1, 2, 3]

In [34]:
x

[1, 2, 3]

In [35]:
print("e^x = ",  np.exp(x))

e^x =  [ 2.71828183  7.3890561  20.08553692]


In [36]:
print("2^x = ",  np.exp2(x))

2^x =  [2. 4. 8.]


In [37]:
print("3^x = ",  np.power(3, x))

3^x =  [ 3  9 27]


In [44]:
x = [1, 2, 4, 16]

In [46]:
np.log(x)


array([0.        , 0.69314718, 1.38629436, 2.77258872])

In [47]:
np.log2(x)

array([0., 1., 2., 4.])

In [48]:
np.log10(x)

array([0.        , 0.30103   , 0.60205999, 1.20411998])

> `np.expm1` and `np.log1p` are specialized versions that stay precise for very small inputs

In [49]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 = ", np.expm1(x))
print("log(1+x) = ", np.log1p(x))

exp(x) - 1 =  [0.         0.0010005  0.01005017 0.10517092]
log(1+x) =  [0.         0.0009995  0.00995033 0.09531018]
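
To see why these specialized versions matter, the sketch below compares them with the naive expressions for a much smaller input. The value `1e-10` and the two-term series used as a reference are assumptions chosen for illustration.

```python
import numpy as np

x = 1e-10

naive_exp = np.exp(x) - 1      # loses digits: exp(x) is rounded to a value near 1 first
precise_exp = np.expm1(x)
naive_log = np.log(1 + x)      # loses digits: 1 + x is rounded before the log
precise_log = np.log1p(x)

# Two-term Taylor series, accurate far beyond double precision for this x
reference_exp = x + x ** 2 / 2
reference_log = x - x ** 2 / 2

print("exp: naive error  ", abs(naive_exp - reference_exp) / reference_exp)
print("exp: expm1 error  ", abs(precise_exp - reference_exp) / reference_exp)
print("log: naive error  ", abs(naive_log - reference_log) / reference_log)
print("log: log1p error  ", abs(precise_log - reference_log) / reference_log)
```

The naive forms typically lose roughly half of the available significant digits here, while `np.expm1` and `np.log1p` stay close to full precision.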


## Specialized ufuncs

### **scipy.special** for statistics

In [51]:
from scipy import special

In [52]:
x = [1, 5, 10]

In [53]:
special.gamma(x)

array([1.0000e+00, 2.4000e+01, 3.6288e+05])

In [54]:
special.gammaln(x)

array([ 0.        ,  3.17805383, 12.80182748])

In [55]:
special.beta(x, 2)

array([0.5       , 0.03333333, 0.00909091])

In [56]:
x = np.array([0, 0.3, 0.7, 1.0])

In [57]:
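# Only the last expression in a cell is displayed, so the output below is erfinv(x)
special.erf(x)     # error function (integral of the Gaussian)
special.erfc(x)    # complementary error function, 1 - erf(x)
special.erfinv(x)  # inverse of the error function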

array([0.        , 0.27246271, 0.73286908,        inf])
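
The gamma-related outputs above can be cross-checked against standard identities: for a positive integer n, gamma(n) equals (n - 1)!, `gammaln` is the logarithm of `gamma`, and beta(a, b) equals gamma(a) * gamma(b) / gamma(a + b). A small sketch, assuming SciPy is installed and using illustrative variable names:

```python
import math

import numpy as np
from scipy import special

n = np.array([1, 5, 10])

# gamma(n) == (n - 1)! for positive integers
factorials = np.array([math.factorial(int(k) - 1) for k in n], dtype=float)
gamma_values = special.gamma(n)

# gammaln is the natural log of gamma, useful when gamma itself would overflow
log_gamma_values = special.gammaln(n)

# beta(n, 2) expressed through gamma functions
beta_from_gamma = special.gamma(n) * special.gamma(2) / special.gamma(n + 2)

print(gamma_values, factorials)
print(np.exp(log_gamma_values))
print(beta_from_gamma, special.beta(n, 2))
```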

## Advanced Ufunc Features

Many NumPy users make use of ufuncs without ever learning their full set of features.
We'll outline a few specialized features of ufuncs here.

### Specifying output

For large calculations, it is sometimes useful to be able to specify the array where the result of the calculation will be stored.
Rather than creating a temporary array, this can be used to write computation results directly to the memory location where you'd like them to be.
For all ufuncs, this can be done using the ``out`` argument of the function:

In [58]:
x = np.arange(5)
y = np.empty(len(x))
np.multiply(x, 10, out=y)
print(y)

[ 0. 10. 20. 30. 40.]


In [59]:
y = np.zeros(10)
np.power(2, x, out=y[::2])  # write the results into every other element of y
print(y)

[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]
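
To make the benefit of `out` concrete, the sketch below contrasts writing directly into a view with the more familiar assignment form, which first builds a temporary array and then copies it. The variable names are illustrative; for arrays this small the difference is negligible, but it can matter for very large ones.

```python
import numpy as np

x = np.arange(5)

# Direct: the ufunc writes straight into the memory behind y_direct[::2]
y_direct = np.zeros(10)
returned = np.power(2, x, out=y_direct[::2])

# Indirect: 2 ** x allocates a temporary array, which is then copied into y_temp
y_temp = np.zeros(10)
temporary = 2 ** x
y_temp[::2] = temporary

print(y_direct)
print(y_temp)
print(np.shares_memory(returned, y_direct))
```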


In [60]:
x = np.arange(1, 6)
np.add.reduce(x)  # reduce applies the ufunc repeatedly until a single result remains

15

In [61]:
np.multiply.reduce(x)

120

In [62]:
np.add.accumulate(x)  # accumulate keeps every intermediate result

array([ 1,  3,  6, 10, 15])

In [63]:
np.multiply.accumulate(x)

array([  1,   2,   6,  24, 120])

In [64]:
np.sum(x)  # dedicated NumPy function, same result as np.add.reduce(x)

15

In [65]:
np.prod(x)

120

In [66]:
np.cumsum(x)

array([ 1,  3,  6, 10, 15])

In [67]:
np.cumprod(x)

array([  1,   2,   6,  24, 120])
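
The cells above suggest that `np.sum`, `np.prod`, `np.cumsum`, and `np.cumprod` give the same results as the corresponding `reduce` and `accumulate` calls. The sketch below checks that and also shows two things not covered in the cells: the `axis` argument on a 2D array and `accumulate` on a different binary ufunc (`np.maximum`). The arrays `grid` and `samples` are made up for illustration.

```python
import numpy as np

x = np.arange(1, 6)

same_sum = np.add.reduce(x) == np.sum(x)
same_prod = np.multiply.reduce(x) == np.prod(x)
same_cumsum = np.array_equal(np.add.accumulate(x), np.cumsum(x))
same_cumprod = np.array_equal(np.multiply.accumulate(x), np.cumprod(x))

# reduce accepts an axis, just like np.sum
grid = np.arange(1, 7).reshape(2, 3)        # [[1, 2, 3], [4, 5, 6]]
column_sums = np.add.reduce(grid, axis=0)   # collapses the rows
row_sums = np.add.reduce(grid, axis=1)      # collapses the columns

# accumulate works for any binary ufunc, here a running maximum
samples = np.array([1, 3, 2, 5, 4])
running_max = np.maximum.accumulate(samples)

print(same_sum, same_prod, same_cumsum, same_cumprod)
print(column_sums, row_sums, running_max)
```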

In [70]:
x = np.arange(1, 6)
x

array([1, 2, 3, 4, 5])

In [71]:

np.multiply.outer(x, x)  # all pairwise products: a multiplication table

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])
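
The `outer` method computes the ufunc for every pair of elements from its two inputs. A minimal sketch showing that the same table can be produced with broadcasting, and that `outer` is available on other binary ufuncs such as `np.add` (variable names are illustrative):

```python
import numpy as np

x = np.arange(1, 6)

table_outer = np.multiply.outer(x, x)

# Same result via broadcasting: a column vector times a row vector
table_broadcast = x[:, np.newaxis] * x

# outer works for any binary ufunc, e.g. all pairwise sums
pairwise_sums = np.add.outer(x, x)

print(np.array_equal(table_outer, table_broadcast))
print(pairwise_sums)
```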