# NumPy

When we think of data manipulation in Python, the first library that comes to mind is `numpy`. Most data science packages, especially those for data manipulation, are built on top of NumPy.

NumPy is widely used for its array data type. You might ask: Python already has built-in data structures (lists and dictionaries), so why do we need arrays?

It is true that built-in Python lists can do many of the things NumPy arrays can do, but NumPy arrays are more efficient and consume less memory. Unlike Python lists, NumPy arrays are homogeneous: every element has the same data type.
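As a rough illustration of the memory and homogeneity points, the sketch below compares a list with an array holding the same 1000 integers. The variable names are made up for this example, and the exact list size depends on the Python implementation, so treat the printed numbers as indicative only.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# A list stores references to separate Python int objects, so its footprint
# is the list itself plus every object it refers to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An ndarray stores the raw values in one contiguous buffer.
array_bytes = arr.nbytes  # 1000 elements * 8 bytes each

print(list_bytes, array_bytes)

# Homogeneous: mixed inputs are converted to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```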

This is just an introductory notebook. The full list of functions provided by NumPy can be found [here](https://docs.scipy.org/doc/).

In [4]:
# importing packages
from IPython.display import Image
import numpy as np

NumPy arrays are called ndarrays (n-dimensional arrays). An n-dimensional array is generally used to represent a matrix or a tensor, mainly for mathematical computation. Compared to n-dimensional arrays built from nested Python lists, NumPy not only uses less memory but also provides a large number of additional features that make mathematical calculations easier.

In [11]:
np_array = np.array([1,2,3,4,5])


np_md_arr = np.array ( [[[
                        [1, 2, 3, 4, 5],
                        [6, 7, 8, 9, 10]
                    ] ]])
print("shape of  np_array is "+ str(np_array.shape))
print("Shape of the np_md_array is "+str(np_md_arr.shape))


shape of  np_array is (5,)
Shape of the np_md_array is (1, 1, 2, 5)


NumPy arrays have several attributes that we can use. One of them is `shape`, which gives the size of the array along each dimension. A two-dimensional array can be treated as a matrix.

In the above example, the shape of the first array is (5,), which means that it is a one-dimensional array. In many cases this creates a problem.
E.g., we might expect the transpose of the first array to be a column of shape (5,1), but transposing a one-dimensional array returns it unchanged, which can lead to mathematical errors in our programs.

To avoid this, we can **convert it into a 2-dimensional array.** For this, we have a method known as `reshape`.


In [6]:
np_array_reshaped = np_array.reshape(1,5)
print("Shape of the np_array_reshaped is "+str(np_array_reshaped.shape)+"\n")
print("Transpose of the np_array is "+str(np_array.T)+"\n")
print("Transpose of the np_array_reshaped is \n"+str(np_array_reshaped.T)+"\n")

Shape of the np_array_reshaped is (1, 5)

Transpose of the np_array is [1 2 3 4 5]

Transpose of the np_array_reshaped is 
[[1]
 [2]
 [3]
 [4]
 [5]]
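A couple of other common ways to get row and column shapes are sketched below. This is only an illustration with made-up variable names; passing `-1` asks NumPy to infer that dimension from the number of elements.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5])

row = np_array.reshape(1, -1)          # -1 lets NumPy infer the length: shape (1, 5)
column = np_array.reshape(-1, 1)       # shape (5, 1)
column_alt = np_array[:, np.newaxis]   # same column, built with np.newaxis

print(row.shape, column.shape, column_alt.shape)
```

Transposing `row` gives the same result as `column`.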



The other useful attributes are as follows (the functionality is explained in the comments).

In [7]:
# To know the data type of the array
np_array_reshaped.dtype

dtype('int64')

In [8]:
# To know the number of dimensions of the array
np_array_reshaped.ndim

2

In [9]:
# To know total number of elements
np_array_reshaped.size

5

### Note:
While creating an array, make sure to pass the data as a list. Each level of list nesting adds one dimension.
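The relationship between nesting and dimensions can be checked directly. The arrays below are made up for this sketch.

```python
import numpy as np

one_d = np.array([1, 2, 3])                 # one level of brackets
two_d = np.array([[1, 2, 3], [4, 5, 6]])    # two levels
three_d = np.array([[[1, 2], [3, 4]]])      # three levels

for arr in (one_d, two_d, three_d):
    print(arr.ndim, arr.shape)
```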

In [31]:
# To create a sequence of numbers
np.arange(1,10,2) # The inputs are given in this order - start,stop,step


array([1, 3, 5, 7, 9])

### ufunc
The functions which **operate element-wise** are known as universal functions (`ufunc`). For example, the arithmetic operations performed in NumPy are element-wise. The arithmetic ufuncs are used through the same arithmetic operators we use in Python, but on arrays the interpreter invokes NumPy's implementations.

The `+` operator is equivalent to the `np.add()` function (`np.sum()`, by contrast, adds up the elements of a single array).

In [12]:
a = np.arange(1,4)
b = np.arange(5,8)

print("First array is :"+str(a))
print("Second array is :"+str(b))

#Addition 
print("Addition Op: "+str(a+b))

#Subtraction
print("Subtraction of array b with array a: "+str(b-a))

# Multiplication
print("Multiplication: "+str(a*b))

#Division
print("Division: "+str(b/a))

First array is :[1 2 3]
Second array is :[5 6 7]
Addition Op: [ 6  8 10]
Subtraction of array b with array a: [4 4 4]
Multiplication: [ 5 12 21]
Division: [5.         3.         2.33333333]


The other ufuncs include `absolute`, trigonometric functions (`sin`, `cos`, etc.), exponents and logarithmic functions.
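A short sketch of a few of these ufuncs follows. The input values are arbitrary, and `np.add` is included only to show that arithmetic operators have a function form as well.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

print(np.absolute(x))        # element-wise absolute value
print(np.sin(x))             # element-wise sine (input in radians)
print(np.exp(x))             # element-wise e**x
print(np.log(np.exp(x)))     # natural log undoes exp, giving x back

# Arithmetic operators have function forms too, e.g. np.add for `+`.
a = np.arange(1, 4)
b = np.arange(5, 8)
print(np.add(a, b))          # same values as a + b
```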

### Broadcasting Variables

This is one of the properties of ufuncs. Let us see the below example.

In [13]:
a = np.array([1,2,3,4])
b = 10
print(a+b)

[11 12 13 14]


As we can see, the integer 10 is added to every element of the array a.

Behind the scenes, the variable b is treated as if it were an array of the same size as "a" in which every element is a copy of the value of b (NumPy does not actually need to create these copies in memory). After this step, the summation looks something like this:

[10,10,10,10] + [1,2,3,4] = [11,12,13,14]

This is the output we got in the above example.
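One way to picture this stretching is with `np.broadcast_to`, which returns a read-only view shaped like the target array. This is a sketch for intuition only; NumPy does not require you to call it when adding a scalar to an array.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = 10

# A read-only view that looks like b repeated to match a's shape.
stretched = np.broadcast_to(b, a.shape)
print(stretched)            # [10 10 10 10]
print(stretched.strides)    # (0,): every position refers to the same single value

print(a + stretched)        # [11 12 13 14], same as a + b
```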


#### Broadcasting Rules:
1. The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.

2. The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the “broadcast” array.

For an in-depth explanation of the rules, refer [here](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
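The two rules can be seen at work in a small sketch. The arrays `col` and `row` are made up for this example, and the flag `broadcast_failed` exists only to record that the incompatible case raised an error.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

# Rule 1: row's shape (4,) is treated as (1, 4).
# Rule 2: the size-1 axes stretch, so (3, 1) and (1, 4) both act like (3, 4).
total = col + row
print(total.shape)   # (3, 4)
print(total)

# Sizes that differ and are not 1 cannot be broadcast together.
broadcast_failed = False
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    broadcast_failed = True
    print("Broadcast failed:", err)
```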


### Array Indexing and Slicing

NumPy arrays use the same indexing and slicing syntax as Python lists.

#### One-dimensional Array

In [14]:
#creating a random numpy array
a = np.arange(10)
print("The whole array: "+str(a))
# Indexing
# The index of all the elements starts from 0 and ends at n-1 where n is total number of elements
print("The 3rd element of the array is: "+str(a[2]))

# Slicing 
print("Elements from index 2 to 5 is: "+str(a[2:5]))

The whole array: [0 1 2 3 4 5 6 7 8 9]
The 3rd element of the array is: 2
Elements from index 2 to 5 is: [2 3 4]


**Note** : The format for slicing is **array_name[start_index : stop_index : step_value]**, and the resulting array contains the elements from start_index to stop_index-1. In other words, the element at stop_index is excluded.
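The step value and one important difference from list slicing are sketched below: a basic NumPy slice is a view of the original data rather than a copy. The names `view` and `independent` are only illustrative.

```python
import numpy as np

a = np.arange(10)

print(a[1:8:2])   # start=1, stop=8 (excluded), step=2 -> [1 3 5 7]
print(a[::-1])    # a negative step walks backwards -> [9 8 7 6 5 4 3 2 1 0]

# Unlike a list slice, a basic NumPy slice is a view of the same data.
view = a[2:5]
view[0] = 99
print(a[2])       # 99: the original array changed too

# Use .copy() when an independent array is needed.
independent = a[2:5].copy()
independent[0] = -1
print(a[2])       # still 99
```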
#### Multi-dimensional Array

In [15]:
#creating a multi-dim array
a = np.arange(9).reshape(3,3)
print("The whole array:\n" + str(a))

# Indexing 
# These arrays have one index per axis/dimension. In our case, 2 indices.
print("The first element of first row is: "+str(a[0,0]))
print("The second element of second row is: "+str(a[1,1]))

#Slicing
print("The first column of the array is: "+str(a[:,0]))
print("The first row of the array is: "+str(a[0,:]))

The whole array:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
The first element of first row is: 0
The second element of second row is: 4
The first column of the array is: [0 3 6]
The first row of the array is: [0 1 2]


**Note** : Slicing is often a better alternative to a for loop, and it is usually faster than a plain Python for loop.
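To make the comparison concrete, here is a minimal sketch that extracts the first column both ways. No timings are shown, since they depend on the machine and the array size.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Loop version: visit each row and pick out its first element by hand.
first_col_loop = []
for row in a:
    first_col_loop.append(int(row[0]))

# Slice version: one expression, evaluated inside NumPy.
first_col_slice = a[:, 0]

print(first_col_loop)    # [0, 3, 6]
print(first_col_slice)   # [0 3 6]

# Slices can also be assigned to, which replaces a loop over elements.
b = a.copy()
b[:, 0] = 0
print(b)
```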

### Array Shape Manipulation

In [17]:
a = np.arange(12).reshape(3,4)
print("The whole array:\n" + str(a))

# Flatten the array
print("Flattened version of array is: "+str(a.ravel()))

# Reshaping the array (the new shape must hold the same 12 elements)
print("Reshaped array:\n"+str(a.reshape(6,2)))

# Transpose 
print("Transpose of the array is:\n"+str(a.T))

The whole array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Flattened version of array is: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped array:
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]]
Transpose of the array is:
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]

**Note** : 
1. Transpose is a matrix operation that interchanges rows and columns, so that all row elements become column elements and vice versa.
2. The `reshape` function returns an array with the desired shape (a view of the same data when possible) but doesn't alter the shape of the original array. To change the original array itself, use the `resize()` method.
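The difference between the two can be sketched as follows. Note that the result of `reshape` can share memory with the original array, so writing to it may change the original's values even though the original's shape stays the same. The variable names are made up for this example.

```python
import numpy as np

a = np.arange(12)

# reshape returns a new array object; a itself keeps its shape.
b = a.reshape(3, 4)
print(a.shape, b.shape)          # (12,) (3, 4)

# Here the new object shares memory with the original (it is a view).
print(np.shares_memory(a, b))    # True
b[0, 0] = 100
print(a[0])                      # 100

# resize changes the array it is called on and returns None.
c = np.arange(12)
c.resize((6, 2))
print(c.shape)                   # (6, 2)
```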

### Aggregation functions
SQL users will be familiar with aggregate functions. These functions take a whole array and reduce its values to a single summary value.
E.g., sum, min/max, mean, std (standard deviation), etc. These are mainly useful during data transformations.

In [19]:
a = np.random.random((3, 3))*10 # creates a (3,3) array filled with random values in [0, 10)
print("Array: "+str(a))

#Summation of all values 
print("Summation: "+str(a.sum()))

# Mean
print("Mean of all Values: "+str(a.mean()))

# Min and Max 
print("Min and max values in the array are: "+str(a.min())+","+str(a.max()))

# We can retrieve values from all rows/columns by mentioning the axis value
a.min(axis=0) # Minimum values in each column

Array: [[6.12140872 8.57578767 8.83554113]
 [3.69835838 6.49932328 0.88526444]
 [9.78602033 7.85461281 0.08315821]]
Summation: 52.339474971452105
Mean of all Values: 5.815497219050234
Min and max values in the array are: 0.08315820989519773,9.78602032866599


array([3.69835838, 6.49932328, 0.08315821])

**Note** : 

The way the axis is specified here can be confusing to users coming from other languages. The axis keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will be returned. So specifying axis=0 means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated.
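A small deterministic array makes the axis behaviour easier to verify by hand. The sketch below uses `sum`, but the same `axis` argument applies to the other aggregation methods such as `min`, `max` and `mean`.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

print(a.sum(axis=0))   # collapse the rows -> one value per column: [3 5 7]
print(a.sum(axis=1))   # collapse the columns -> one value per row: [ 3 12]
print(a.sum())         # no axis: collapse everything -> 15

# keepdims=True keeps the collapsed axis with size 1.
print(a.sum(axis=0, keepdims=True).shape)   # (1, 3)
```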

#### Some of the useful Agg. Functions
