## NumPy

NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.

### What is an array

An array is a data structure that stores values of the same data type. In Python, this is the main difference between arrays and lists: a Python list can contain values of different data types, whereas a NumPy array can only contain values of a single data type.
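The difference can be seen in a small sketch. The variable names here are illustrative and not part of the original notebook; the example assumes NumPy is already installed.

```python
import numpy as np

# A Python list can hold values of different types side by side.
mixed_list = [1, 2.5, "three"]
list_types = [type(v).__name__ for v in mixed_list]

# A NumPy array has a single dtype; mixed int/float input is upcast
# to one common type (float64 here).
upcast = np.array([1, 2.5, 3])

print(list_types)    # ['int', 'float', 'str']
print(upcast.dtype)  # float64
```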

- To install the NumPy library separately:

pip install numpy

- If you use an Anaconda environment and NumPy is not already installed:

conda install numpy

In [3]:
## First, import numpy

import numpy as np

In [4]:
my_lst=[1,2,3,4,5]

arr=np.array(my_lst)

In [5]:
type(arr)

numpy.ndarray

In [6]:
print(arr)

[1 2 3 4 5]


In [8]:
arr

array([1, 2, 3, 4, 5])

In [9]:
## .shape gives the size along each dimension (rows and columns for a 2D array)
## For a 1D array it just gives the number of elements present in the array

arr.shape

(5,)

In [10]:
type(arr)

numpy.ndarray

In [14]:
## Multidimensional array
my_lst1 = [1,2,3,4,5]
my_lst2 = [2,3,4,5,6]
my_lst3 = [3,4,5,6,7]

arr = np.array([my_lst1,my_lst2,my_lst3])

In [15]:
arr

array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7]])

In [16]:
arr.shape

(3, 5)
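Alongside `.shape`, the attributes `.ndim` and `.size` describe the same array from other angles. The following is a minimal sketch; the names `one_d` and `two_d` are illustrative only.

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5])
two_d = np.array([[1, 2, 3, 4, 5],
                  [2, 3, 4, 5, 6],
                  [3, 4, 5, 6, 7]])

# ndim = number of dimensions, shape = length of each dimension,
# size = total number of elements (product of the shape).
print(one_d.ndim, one_d.shape, one_d.size)  # 1 (5,) 5
print(two_d.ndim, two_d.shape, two_d.size)  # 2 (3, 5) 15
```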

In [19]:
arr.reshape(5,3) 

## reshape returns an array containing the same data with a new shape
## 5x3 = 15, the same element count as 3x5 - the total number of elements must always stay the same

array([[1, 2, 3],
       [4, 5, 2],
       [3, 4, 5],
       [6, 3, 4],
       [5, 6, 7]])

In [21]:
arr.reshape(1,15)

array([[1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7]])

In [22]:
## check the shape of the array
## arr still has its original shape: reshape returned new arrays that were not assigned back

arr.shape

(3, 5)
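The element-count rule for `reshape` can be checked with a short sketch. It uses `np.arange(15)` as stand-in data rather than the lists above, and `-1` to let NumPy infer one dimension.

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

# -1 asks NumPy to infer that dimension from the total element count.
inferred = arr.reshape(5, -1)
print(inferred.shape)  # (5, 3)

# reshape returns a new array object; the original shape is unchanged.
print(arr.shape)       # (3, 5)

# A target shape whose product differs from 15 raises ValueError.
try:
    arr.reshape(4, 4)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```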

### Indexing

In [23]:
## Accessing the array elements

arr=np.array([1,2,3,4,5,6,7,8,9])

In [24]:
arr[3]

4

In [26]:
my_lst1 = [1,2,3,4,5]
my_lst2 = [2,3,4,5,6]
my_lst3 = [3,4,5,6,7]

arr = np.array([my_lst1,my_lst2,my_lst3])

In [27]:
arr

array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7]])

In [29]:
arr[:,:]

## The : on the left of the comma selects all the row indices
## The : on the right of the comma selects all the column indices

array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7]])

In [31]:
## To get rows 0 and 1 together with columns 0 and 1 (the stop index 2 is excluded)

arr[0:2,0:2]

array([[1, 2],
       [2, 3]])

In [32]:
arr[1:3,3:5]

array([[5, 6],
       [6, 7]])

In [33]:
## OR, equivalently, omit the stop index to slice to the end

arr[1:,3:]

array([[5, 6],
       [6, 7]])
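Besides rectangular blocks, the same `[row, column]` syntax can pick a single element, a whole row, or a whole column. A brief sketch with illustrative variable names:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [2, 3, 4, 5, 6],
                [3, 4, 5, 6, 7]])

element = arr[1, 2]    # row 1, column 2
first_row = arr[0]     # a single index selects a whole row
last_col = arr[:, -1]  # all rows, last column (negative index counts from the end)

print(element)    # 4
print(first_row)  # [1 2 3 4 5]
print(last_col)   # [5 6 7]
```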

In [37]:
arr=np.arange(0,10,step=2)

In [38]:
arr

array([0, 2, 4, 6, 8])

In [46]:
arr=np.arange(0,10)

In [47]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [41]:
np.linspace(1,10,50)

## 50 is the number of equally spaced points between 1 and 10 (both endpoints included)

array([ 1.        ,  1.18367347,  1.36734694,  1.55102041,  1.73469388,
        1.91836735,  2.10204082,  2.28571429,  2.46938776,  2.65306122,
        2.83673469,  3.02040816,  3.20408163,  3.3877551 ,  3.57142857,
        3.75510204,  3.93877551,  4.12244898,  4.30612245,  4.48979592,
        4.67346939,  4.85714286,  5.04081633,  5.2244898 ,  5.40816327,
        5.59183673,  5.7755102 ,  5.95918367,  6.14285714,  6.32653061,
        6.51020408,  6.69387755,  6.87755102,  7.06122449,  7.24489796,
        7.42857143,  7.6122449 ,  7.79591837,  7.97959184,  8.16326531,
        8.34693878,  8.53061224,  8.71428571,  8.89795918,  9.08163265,
        9.26530612,  9.44897959,  9.63265306,  9.81632653, 10.        ])
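To contrast the two constructors: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a point count and includes the stop value by default. The sketch below also uses the `retstep=True` option to recover the spacing seen in the output above.

```python
import numpy as np

# arange: fixed step, stop value excluded.
stepped = np.arange(0, 10, step=2)

# linspace: fixed number of points, stop value included by default.
spaced = np.linspace(0, 10, 5)

# retstep=True also returns the spacing between consecutive points,
# which for 50 points on [1, 10] is (10 - 1) / (50 - 1).
points, step = np.linspace(1, 10, 50, retstep=True)

print(stepped.tolist())  # [0, 2, 4, 6, 8]
print(spaced.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
print(round(step, 8))    # 0.18367347
```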

In [49]:
## copy() function and broadcasting
## Assigning a single value to a slice broadcasts it to every element of that slice
arr[3:]=100

In [50]:
arr

array([  0,   1,   2, 100, 100, 100, 100, 100, 100, 100])

In [51]:
arr1=arr

In [52]:
arr[3:]=500
print(arr1)

[  0   1   2 500 500 500 500 500 500 500]


In [55]:
arr

## The assignment of 500 shows up in both arr and arr1 - arrays are reference types
## arr1 = arr does not copy the data: both names refer to the same memory
## so any operation done through one variable is also visible through the other
## To prevent this, use the copy() function

array([  0,   1,   2, 500, 500, 500, 500, 500, 500, 500])

In [56]:
## copy() creates an independent array with its own memory
arr1 = arr.copy()

In [57]:
print(arr)
arr1[3:]=1000
print(arr1)

[  0   1   2 500 500 500 500 500 500 500]
[   0    1    2 1000 1000 1000 1000 1000 1000 1000]


In [58]:
arr

array([  0,   1,   2, 500, 500, 500, 500, 500, 500, 500])
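Whether two arrays share memory can be checked directly with `np.shares_memory`. The sketch below is illustrative (the names `alias`, `view`, and `independent` are not from the notebook) and also shows that a basic slice is a view, not a copy.

```python
import numpy as np

arr = np.arange(10)

alias = arr               # same object, no data copied
view = arr[3:]            # a basic slice is a view onto the same buffer
independent = arr.copy()  # separate memory

arr[3:] = 500             # modify the original in place

print(alias is arr)                         # True
print(np.shares_memory(arr, view))          # True
print(np.shares_memory(arr, independent))   # False
print(view[0], independent[3])              # 500 3
```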

In [62]:
## Some conditional operations that are very useful in Exploratory Data Analysis

val=2 # threshold used in the comparison

arr<val

array([ True,  True, False, False, False, False, False, False, False,
       False])

In [63]:
arr*2

array([   0,    2,    4, 1000, 1000, 1000, 1000, 1000, 1000, 1000])

In [64]:
arr/2

array([  0. ,   0.5,   1. , 250. , 250. , 250. , 250. , 250. , 250. ,
       250. ])

In [65]:
arr%2

array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [67]:
## To get the actual values that satisfy a condition, index the array with the boolean mask

arr[arr<300]

array([0, 1, 2])
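Masks can also be combined, or used to replace values instead of filtering them. The following is a small sketch; `np.where` and the cap value of 300 are introduced here for illustration and do not appear in the original notebook.

```python
import numpy as np

arr = np.array([0, 1, 2, 500, 500, 500, 500, 500, 500, 500])

# Combine conditions with & (and) or | (or); each condition needs parentheses.
in_range = arr[(arr > 0) & (arr < 300)]

# np.where chooses element-wise between two values based on the mask.
capped = np.where(arr > 300, 300, arr)

print(in_range.tolist())  # [1, 2]
print(capped.tolist())    # [0, 1, 2, 300, 300, 300, 300, 300, 300, 300]
```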

In [68]:
## create arrays and reshape

np.arange(0,10).reshape(5,2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [69]:
arr1=np.arange(0,10).reshape(2,5)

In [70]:
arr2=np.arange(0,10).reshape(2,5)

In [71]:
arr1*arr2

array([[ 0,  1,  4,  9, 16],
       [25, 36, 49, 64, 81]])
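Note that `*` multiplies element by element; it is not matrix multiplication. As a sketch of the difference, the `@` operator computes a matrix product, which for two 2x5 arrays requires transposing one of them:

```python
import numpy as np

arr1 = np.arange(0, 10).reshape(2, 5)
arr2 = np.arange(0, 10).reshape(2, 5)

elementwise = arr1 * arr2  # shape (2, 5): each element multiplied by its counterpart
matrix = arr1 @ arr2.T     # shape (2, 2): rows of arr1 dotted with rows of arr2

print(elementwise.shape)  # (2, 5)
print(matrix.tolist())    # [[30, 80], [80, 255]]
```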

In [73]:
## np.ones creates an array of the given shape in which every element is one
## here it creates 4 ones; the trailing . appears because the default dtype is float
np.ones(4)

array([1., 1., 1., 1.])

In [74]:
np.ones((2,5),dtype=int)

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])
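NumPy has a few sibling constructors that follow the same pattern as `np.ones`. The sketch below uses `np.zeros`, `np.full`, and `np.ones_like`, none of which appear in the original notebook; the fill value 7 is arbitrary.

```python
import numpy as np

zeros = np.zeros((2, 3))               # default dtype is float64, like np.ones
sevens = np.full((2, 2), 7)            # dtype inferred from the fill value (integer here)
ones_like_zeros = np.ones_like(zeros)  # same shape and dtype as `zeros`

print(zeros.dtype)            # float64
print(sevens.tolist())        # [[7, 7], [7, 7]]
print(ones_like_zeros.shape)  # (2, 3)
```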

In [75]:
## random distribution
## np.random.rand draws samples of the given shape from a uniform distribution over [0, 1)
np.random.rand(3,3)

array([[0.92655485, 0.70628332, 0.23894954],
       [0.10854776, 0.89105365, 0.87470067],
       [0.38425072, 0.42429581, 0.94489932]])

In [76]:
## randn draws samples from the standard normal distribution (mean 0, variance 1)
arr_ex=np.random.randn(4,4)

In [77]:
arr_ex

array([[-1.07978791, -0.31659215,  0.92549796, -0.3953519 ],
       [-0.41341047, -0.26455165,  2.10446756, -0.16075586],
       [ 0.40191153,  0.53491881, -1.62347068, -1.18651519],
       [-0.77545273, -0.62052577, -0.08055282, -0.66692814]])
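The difference between the two samplers becomes visible with a larger sample. This is a rough sketch; the sample size of 100,000 is an arbitrary choice and the printed statistics vary slightly from run to run.

```python
import numpy as np

n = 100_000  # arbitrary sample size, large enough for stable statistics

uniform = np.random.rand(n)   # uniform over [0, 1)
normal = np.random.randn(n)   # standard normal: mean 0, standard deviation 1

# Uniform samples stay inside [0, 1) and average close to 0.5.
print(uniform.min() >= 0.0, uniform.max() < 1.0)
print(round(uniform.mean(), 2))

# Normal samples are unbounded, centred near 0 with spread near 1.
print(round(normal.mean(), 2), round(normal.std(), 2))
```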

In [78]:
import seaborn as sns
import pandas as pd

In [79]:
## distplot is deprecated in recent seaborn versions (histplot/displot replace it)
sns.distplot(pd.DataFrame(arr_ex.reshape(16,1)))

<matplotlib.axes._subplots.AxesSubplot at 0x230e31c3518>

In [82]:
np.random.randint(0,100,8) ## 8 random integers from 0 (inclusive) to 100 (exclusive); the values change on every run

array([ 5, 29, 98, 18, 21, 85, 75, 14])

In [84]:
np.random.randint(0,100,8).reshape(2,4)

array([[15, 21, 57, 74],
       [ 9, 78, 59,  5]])

In [83]:
np.random.random_sample((1,5))

array([[0.10492135, 0.24700291, 0.28127576, 0.8153099 , 0.5417457 ]])
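Because these values change on every run, results are not reproducible by default. One way to make them repeatable, sketched here with the legacy `np.random.seed` function and an arbitrary seed of 42, is to fix the seed before sampling:

```python
import numpy as np

np.random.seed(42)  # 42 is an arbitrary seed value
first = np.random.randint(0, 100, 8)

np.random.seed(42)  # resetting the same seed replays the same sequence
second = np.random.randint(0, 100, 8)

print(np.array_equal(first, second))        # True
print(first.min() >= 0, first.max() < 100)  # True True
```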