<div>
<a href="#" style="font-size:48px;font-weight:600;margin-bottom:20px;"> Numpy, Scipy and Pandas for Dummy Data Scientists </a>
<div style="padding-top: 20px;"><p style="font-size:24px;">Numpy, Scipy and pandas library basics with code examples and brief descriptions</p></div>
</div>

<div>
<a href="#" style="font-size:48px;font-weight:600;margin-bottom:20px;"> Numpy </a>
<div style="padding-top: 20px;"><p style="font-size:24px;">Numpy library basics with code examples</p></div>
</div>


## **N-Dimensional array**
> Arrays allow you to perform mathematical operations on whole blocks of data.

In [1]:
# The easiest way to create an array is with the np.array function
import numpy as np  # import NumPy under its conventional alias np

scores = [89, 56.34, 76, 89, 98]
first_arr = np.array(scores)
print(first_arr)
print(first_arr.dtype)  # .dtype returns the data type of the array; one float in the list makes the whole array float64

[89.   56.34 76.   89.   98.  ]
float64


In [2]:
# Nested lists of equal length are converted into a multidimensional array
scores_1 = [[34, 56, 23, 89], [11, 45, 76, 34]]
second_arr = np.array(scores_1)
print(second_arr)
print(second_arr.ndim)   # .ndim gives the number of dimensions of an array
print(second_arr.shape)  # (number of rows, number of columns)
print(second_arr.dtype)  # the default integer type is platform-dependent (int32 in this run)

[[34 56 23 89]
 [11 45 76 34]]
2
(2, 4)
int32


In [3]:
x = np.zeros(10)  # returns an array of ten zeros; np.ones(10) works the same way
x

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [4]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [5]:
np.zeros((4, 3))  # pass a tuple to get an array of that shape

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [6]:
np.arange(15)  # array version of Python's built-in range

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [7]:
np.eye(6)  # 6x6 identity matrix: ones on the diagonal, zeros elsewhere

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])

In [8]:
# Batch operations on data can be performed without writing for loops; this is called vectorization
scores = [89,56.34, 76,89, 98]
first_arr =np.array(scores)
print(first_arr)
print(first_arr * first_arr)
print(first_arr - first_arr)
print(1/(first_arr))
print(first_arr ** 0.5)

[89.   56.34 76.   89.   98.  ]
[7921.     3174.1956 5776.     7921.     9604.    ]
[0. 0. 0. 0. 0.]
[0.01123596 0.01774938 0.01315789 0.01123596 0.01020408]
[9.43398113 7.5059976  8.71779789 9.43398113 9.89949494]
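To make vectorization concrete, here is a small sketch that computes the same square roots twice, once with an explicit Python loop and once with a single array expression, and checks that the results agree. The helper names `sqrt_loop` and `sqrt_vectorized` are illustrative only, not part of NumPy.

```python
import numpy as np

scores = np.array([89, 56.34, 76, 89, 98])


def sqrt_loop(values):
    # Explicit Python loop: one interpreted step per element
    out = np.empty_like(values)
    for i, v in enumerate(values):
        out[i] = v ** 0.5
    return out


def sqrt_vectorized(values):
    # One array expression: the loop runs inside NumPy's compiled code
    return values ** 0.5


loop_result = sqrt_loop(scores)
vec_result = sqrt_vectorized(scores)
print(np.allclose(loop_result, vec_result))  # True
```

Both give the same numbers; the vectorized form is shorter and, on large arrays, typically much faster.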


## **Indexing and Slicing**

In [9]:
# you may want to select a subset of your data, for which Numpy array indexing is really useful
new_arr = np.arange(12)
print(new_arr)
print(new_arr[5])
print(new_arr[4:9])
new_arr[4:9] = 99  # assigns 99 to indices 4 through 8 (the stop index 9 is excluded)
print(new_arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
5
[4 5 6 7 8]
[ 0  1  2  3 99 99 99 99 99  9 10 11]


In [10]:
# A major difference between lists and arrays is that array slices are views on the original array.
# The data is not copied, so any modification to the view is reflected in the source array.
modi_arr = new_arr[4:9]
modi_arr[1] = 123456
print(new_arr)  # the change is reflected in the original array
modi_arr[:]     # the sliced view

[     0      1      2      3     99 123456     99     99     99      9
     10     11]


array([    99, 123456,     99,     99,     99])
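The view-versus-copy behaviour can be checked directly. This sketch (the variable names are illustrative) writes through a slice and through an explicit `.copy()`, then uses `np.shares_memory` to confirm which one still points at the original data.

```python
import numpy as np

base = np.arange(12)

view = base[4:9]                 # basic slice: a view sharing memory with base
independent = base[4:9].copy()   # explicit copy: its own memory

view[0] = -1          # writes through to base[4]
independent[1] = -2   # base[5] is not affected

print(base[4], base[5])   # -1 5
print(np.shares_memory(base, view), np.shares_memory(base, independent))  # True False
```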

In [11]:
# arrays can be treated like matrices
matrix_arr =np.array([[3,4,5],[6,7,8],[9,5,1]])
print(matrix_arr)
print(matrix_arr[1])
print(matrix_arr[0][2]) #first row and third column
print(matrix_arr[0,2]) # This is same as the above operation

[[3 4 5]
 [6 7 8]
 [9 5 1]]
[6 7 8]
5
5


In [12]:
# 3d arrays -> this is a 2x2x3 array
three_d_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(three_d_arr.shape)
print("returns the second list inside first list {}".format(three_d_arr[0,1]))

(2, 2, 3)
returns the second list inside first list [4 5 6]


In [13]:
three_d_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(three_d_arr[0])
# If you omit later indices, the result is a lower-dimensional
# ndarray containing all the data along the remaining dimensions


[[1 2 3]
 [4 5 6]]


In [14]:
copied_values = three_d_arr[0].copy() # copy arr[0] value to copied_values
three_d_arr[0]= 99  # change all values of arr[0] to 99 
print("New value of three_d_arr: {}".format(three_d_arr))  # check the new value of three_d_arr 
three_d_arr[0] = copied_values # assign copied values back to three_d_arr[0]
print(" three_d_arr again: {}".format(three_d_arr))

New value of three_d_arr: [[[99 99 99]
  [99 99 99]]

 [[ 7  8  9]
  [10 11 12]]]
 three_d_arr again: [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


In [15]:
matrix_arr =np.array([[3,4,5],[6,7,8],[9,5,1]])
print("The original matrix {}:".format(matrix_arr))
print("slices the first two rows:{}".format(matrix_arr[:2])) # similar to list slicing. returns first two rows of the array
print("Slices the first two rows and two columns:{}".format(matrix_arr[:2, 1:]))  # 1: takes columns 1 onward, i.e. the last two columns
print("returns 6 and 7: {}".format(matrix_arr[1,:2]))
print("Returns first column: {}".format(matrix_arr[:,:1])) #Note that a colon by itself means to take the entire axis

The original matrix [[3 4 5]
 [6 7 8]
 [9 5 1]]:
slices the first two rows:[[3 4 5]
 [6 7 8]]
Slices the first two rows and two columns:[[4 5]
 [7 8]]
returns 6 and 7: [6 7]
Returns first column: [[3]
 [6]
 [9]]


In [17]:
# Comparing an array with a scalar is vectorized and returns a boolean array
personals = np.array(['Manu', 'Jeevan', 'Prakash', 'Manu', 'Prakash', 'Jeevan', 'Prakash'])
print(personals == 'Manu')  # True where the element equals 'Manu', False elsewhere

[ True False False  True False False False]


In [18]:
from numpy import random 
random_no = random.randn(7,4)
print(random_no)
random_no[personals == 'Manu']  # selects the rows where the mask is True (rows 0 and 3); the mask length must match the number of rows

[[-0.47912398  0.78909359  1.05320659 -0.06970916]
 [-1.0555618  -0.04461679 -1.44916822 -1.0177723 ]
 [-0.55034265  0.24114568 -0.12483894  0.03190038]
 [-0.21461512  0.9498703  -0.83977764 -0.2866592 ]
 [ 0.45492905  0.23178137 -0.72280291  0.62478203]
 [ 0.73501049 -0.1962241   0.29679043 -0.11209306]
 [ 0.03654028  1.20224781 -0.52836146  1.48254676]]


array([[-0.47912398,  0.78909359,  1.05320659, -0.06970916],
       [-0.21461512,  0.9498703 , -0.83977764, -0.2866592 ]])

In [20]:
random_no[personals == 'Manu', 2:]  # the 'Manu' rows, columns 2 onward (the last two columns)

array([[ 1.05320659, -0.06970916],
       [-0.83977764, -0.2866592 ]])

In [21]:
# To select everything except 'Manu', use != or negate the condition with ~
print(personals != 'Manu')
random_no[~(personals == 'Manu')] #get everything except 1st and 4th rows

[False  True  True False  True  True  True]


array([[-1.0555618 , -0.04461679, -1.44916822, -1.0177723 ],
       [-0.55034265,  0.24114568, -0.12483894,  0.03190038],
       [ 0.45492905,  0.23178137, -0.72280291,  0.62478203],
       [ 0.73501049, -0.1962241 ,  0.29679043, -0.11209306],
       [ 0.03654028,  1.20224781, -0.52836146,  1.48254676]])

In [22]:
# Combine conditions with the boolean operators & (and) and | (or); the Python keywords and/or do not work on boolean arrays
new_variable = (personals == 'Manu') | (personals == 'Jeevan')
print(new_variable)
random_no[new_variable] 

[ True  True False  True False  True False]


array([[-0.47912398,  0.78909359,  1.05320659, -0.06970916],
       [-1.0555618 , -0.04461679, -1.44916822, -1.0177723 ],
       [-0.21461512,  0.9498703 , -0.83977764, -0.2866592 ],
       [ 0.73501049, -0.1962241 ,  0.29679043, -0.11209306]])

In [23]:
random_no[random_no < 0] = 0  # set all negative values to zero
random_no

array([[0.        , 0.78909359, 1.05320659, 0.        ],
       [0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.24114568, 0.        , 0.03190038],
       [0.        , 0.9498703 , 0.        , 0.        ],
       [0.45492905, 0.23178137, 0.        , 0.62478203],
       [0.73501049, 0.        , 0.29679043, 0.        ],
       [0.03654028, 1.20224781, 0.        , 1.48254676]])

In [24]:
random_no[personals != 'Manu'] = 9  # set every row except the 1st and 4th to 9
random_no

array([[0.        , 0.78909359, 1.05320659, 0.        ],
       [9.        , 9.        , 9.        , 9.        ],
       [9.        , 9.        , 9.        , 9.        ],
       [0.        , 0.9498703 , 0.        , 0.        ],
       [9.        , 9.        , 9.        , 9.        ],
       [9.        , 9.        , 9.        , 9.        ],
       [9.        , 9.        , 9.        , 9.        ]])
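The mask operations above can be combined and inverted in one expression. This sketch uses a deterministic `np.arange` array in place of the random numbers, so the selected rows are easy to check, and it shows that Python's `or` keyword fails on arrays where `|` works.

```python
import numpy as np

names = np.array(['Manu', 'Jeevan', 'Prakash', 'Manu', 'Prakash', 'Jeevan', 'Prakash'])
data = np.arange(28).reshape(7, 4)

# Parentheses are required because | binds more tightly than ==
mask = (names == 'Manu') | (names == 'Jeevan')
print(data[mask].shape)    # (4, 4): rows 0, 1, 3 and 5
print(data[~mask][:, 0])   # first column of the 'Prakash' rows: [ 8 16 24]

try:
    (names == 'Manu') or (names == 'Jeevan')   # `or` needs a single True/False value
except ValueError as err:
    print("ValueError:", err)
```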

## **Fancy Indexing (Indexing using integer arrays)**
> Unlike slicing, fancy indexing always copies the data into a new array.

In [25]:
algebra = np.empty((7, 4))  # np.empty allocates a 7x4 array without initializing its values
for j in range(7):
    algebra[j] = j          # fill row j with the value j
algebra

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.]])

In [26]:
# To select a subset of rows in particular order, you can simply pass a list.
algebra[[4,5,1]] #returns a subset of rows

array([[4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [1., 1., 1., 1.]])

In [27]:
fancy = np.arange(36).reshape(9, 4)  # reshape returns the same 36 values arranged as 9 rows and 4 columns
print(fancy)
fancy[[1, 4, 3, 2], [3, 2, 1, 0]]  # picks the elements at (1,3), (4,2), (3,1) and (2,0)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]
 [24 25 26 27]
 [28 29 30 31]
 [32 33 34 35]]


array([ 7, 18, 13,  8])

In [28]:
fancy[[1, 4, 8, 2]][:, [0, 3, 1, 2]]  # selects rows 1, 4, 8 and 2 in that order, then reorders their columns as 0, 3, 1, 2

array([[ 4,  7,  5,  6],
       [16, 19, 17, 18],
       [32, 35, 33, 34],
       [ 8, 11,  9, 10]])

In [29]:
# Another way to do the above operation is np.ix_, which turns the two index lists into a rectangular selection
fancy[np.ix_([1,4,8,2],[0,3,1,2])]

array([[ 4,  7,  5,  6],
       [16, 19, 17, 18],
       [32, 35, 33, 34],
       [ 8, 11,  9, 10]])
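A short sketch contrasting the integer-array selections above and confirming that the result of fancy indexing is a copy; the variable names are illustrative.

```python
import numpy as np

fancy = np.arange(36).reshape(9, 4)
rows, cols = [1, 4, 8, 2], [0, 3, 1, 2]

chained = fancy[rows][:, cols]          # pick the rows, then reorder their columns
open_mesh = fancy[np.ix_(rows, cols)]   # np.ix_ does both in one indexing step
print(np.array_equal(chained, open_mesh))  # True

# Passing both lists directly pairs them up and picks single elements instead
print(fancy[rows, cols])   # [ 4 19 33 10]

# Fancy indexing returns a copy: changing it leaves fancy untouched
picked = fancy[[1, 2]]
picked[:] = -1
print(fancy[1, 0], fancy[2, 0])   # 4 8
```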

## **Transposing Arrays**

In [30]:
transpose = np.arange(12).reshape(3, 4)
transpose.T  # .T swaps the axes: the shape changes from (3, 4) to (4, 3)

array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

In [31]:
# np.dot performs matrix multiplication; for example, X transpose times X (X^T X) is computed as follows:
np.dot(transpose.T, transpose)

array([[ 80,  92, 104, 116],
       [ 92, 107, 122, 137],
       [104, 122, 140, 158],
       [116, 137, 158, 179]])
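Since Python 3.5 the `@` operator performs the same matrix product as `np.dot`. This sketch, assuming a NumPy version that supports `@` on arrays (1.10 or later), compares the two and contrasts them with the element-wise `*`.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)

gram_dot = np.dot(X.T, X)   # function form, as in the cell above
gram_at = X.T @ X           # the @ operator computes the same matrix product
elementwise = X * X         # by contrast, * multiplies element by element

print(gram_at.shape, elementwise.shape)     # (4, 4) (3, 4)
print(np.array_equal(gram_dot, gram_at))    # True
print(np.array_equal(gram_at, gram_at.T))   # True: X^T X is symmetric
```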

## **Universal functions**
> They perform element-wise operations on data in arrays.

In [32]:
funky =np.arange(8)
print(np.sqrt(funky))
print(np.exp(funky))  # e raised to the power of each element
# sqrt and exp take a single array, so they are called unary functions

[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131]
[1.00000000e+00 2.71828183e+00 7.38905610e+00 2.00855369e+01
 5.45981500e+01 1.48413159e+02 4.03428793e+02 1.09663316e+03]


In [33]:
# Binary functions take two arrays, for example np.maximum and np.add
x = random.randn(10)
y = random.randn(10)
print(x)
print(y)
print(np.maximum(x, y))  # element-wise maximum of x and y
print(np.modf(x))  # np.modf (a unary function) returns the fractional and integral parts of a floating-point array

[ 1.07335428 -1.75662247 -0.41171222  0.05046567 -0.45591799 -0.58046005
 -0.34601982 -0.562953    0.33814007  0.8336363 ]
[ 0.01271147 -2.32596125 -0.89415833  0.27339431 -0.92478253  2.12100018
  0.85991929  2.12538965  2.25901439  1.19005328]
[ 1.07335428 -1.75662247 -0.41171222  0.27339431 -0.45591799  2.12100018
  0.85991929  2.12538965  2.25901439  1.19005328]
(array([ 0.07335428, -0.75662247, -0.41171222,  0.05046567, -0.45591799,
       -0.58046005, -0.34601982, -0.562953  ,  0.33814007,  0.8336363 ]), array([ 1., -1., -0.,  0., -0., -0., -0., -0.,  0.,  0.]))
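As a small, deterministic illustration of the unary/binary distinction (the input values are made up for the example), the sketch below applies one unary ufunc with two outputs and two binary ufuncs.

```python
import numpy as np

x = np.array([1.5, -2.25, 3.0])
y = np.array([1.0, -3.0, 4.0])

frac, whole = np.modf(x)    # unary ufunc with two output arrays: fractional and integral parts
biggest = np.maximum(x, y)  # binary ufunc applied pair by pair
total = np.add(x, y)        # np.add is the ufunc behind the + operator

print(frac, whole)
print(biggest)
print(np.array_equal(total, x + y))  # True
```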


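> Common unary functions (one input array):
    * abs, fabs -> Absolute value of each element
    * sqrt, square -> Square root and square of each element
    * exp -> e raised to the power of each element
    * log, log10, log2, log1p -> Natural logarithm, base-10 and base-2 logarithms, and log(1 + x)
    * sign -> Sign of each element: 1 (positive), 0 (zero) or -1 (negative)
    * ceil, floor, rint -> Round up, round down, or round to the nearest integer
    * modf -> Fractional and integral parts as two separate arrays
    * isnan, isfinite, isinf -> Boolean arrays marking NaN, finite and infinite values
    * cos, sin, tan, arccos, arcsin, arctan (plus hyperbolic variants such as cosh, sinh and tanh) -> Trigonometric functions
    * logical_not -> Element-wise logical not

> Common binary functions (two input arrays):
    * add, subtract, multiply, divide, floor_divide -> Element-wise arithmetic
    * power -> Raise each element of the first array to the power in the second
    * maximum, minimum -> Element-wise maximum and minimum (fmax and fmin do the same but ignore NaN)
    * mod -> Element-wise remainder of division
    * copysign -> Copy the sign of each element of the second array onto the first
    * greater, greater_equal, less, less_equal, equal, not_equal -> Element-wise comparisons, returning boolean arrays
    * logical_and, logical_or, logical_xor -> Element-wise logical operations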

## **Data processing using Arrays**

In [36]:
import numpy as np
matrices = np.arange(-5, 5, 1)
x, y = np.meshgrid(matrices, matrices)  # np.meshgrid takes two 1-D arrays and returns two 2-D coordinate matrices
print("Default matrices: ", matrices)
print("Matrix values of y: {}".format(y))
print("Matrix values of x: {}".format(x))

Default matrices:  [-5 -4 -3 -2 -1  0  1  2  3  4]
Matrix values of y: [[-5 -5 -5 -5 -5 -5 -5 -5 -5 -5]
 [-4 -4 -4 -4 -4 -4 -4 -4 -4 -4]
 [-3 -3 -3 -3 -3 -3 -3 -3 -3 -3]
 [-2 -2 -2 -2 -2 -2 -2 -2 -2 -2]
 [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
 [ 0  0  0  0  0  0  0  0  0  0]
 [ 1  1  1  1  1  1  1  1  1  1]
 [ 2  2  2  2  2  2  2  2  2  2]
 [ 3  3  3  3  3  3  3  3  3  3]
 [ 4  4  4  4  4  4  4  4  4  4]]
Matrix values of x: [[-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]
 [-5 -4 -3 -2 -1  0  1  2  3  4]]
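A common use of `np.meshgrid` is evaluating a function of two variables over a grid. This sketch evaluates the distance from the origin, sqrt(x^2 + y^2), at every point of the same grid, with no explicit loop; the function is chosen only for illustration.

```python
import numpy as np

points = np.arange(-5, 5, 1)
xs, ys = np.meshgrid(points, points)   # xs varies along columns, ys along rows

# Evaluate sqrt(x^2 + y^2) at every grid point in one expression
dist = np.sqrt(xs ** 2 + ys ** 2)

print(dist.shape)   # (10, 10)
print(dist[5, 5])   # 0.0: the grid point x = 0, y = 0
print(dist[0, 0])   # distance of (-5, -5) from the origin, about 7.07
```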


In [37]:
x1= np.array([1,2,3,4,5])
y1 = np.array([6,7,8,9,10])
cond =[True, False, True, True, False]
# Take the value from x1 wherever cond is True, otherwise take the value from y1
z1 = [(x, y, z) for x, y, z in zip(x1, y1, cond)]  # zip pairs up the corresponding elements to illustrate the concept
print(z1)
np.where(cond, x1, y1) 

[(1, 6, True), (2, 7, False), (3, 8, True), (4, 9, True), (5, 10, False)]


array([ 1,  7,  3,  4, 10])

In [38]:
ra = np.random.randn(5,5)
# To replace negative values in ra with -1 and positive values with 1, use np.where
print(ra)
print(np.where(ra > 0, 1, -1))  # where ra > 0 use 1, otherwise use -1
# To replace only the positive values and keep the rest, pass ra itself as the third argument
np.where(ra > 0, 1, ra)  # the same approach works for negative values

[[-0.84786653 -0.9967568  -1.2567254  -1.84470591 -0.17406084]
 [ 1.45126524 -0.32754644  0.41546344  0.62739536 -1.1638048 ]
 [ 0.42029263  0.9768389   0.59310259  0.21162371 -0.07436867]
 [-2.52824004 -1.38384255 -1.25961069 -0.07756106  0.43245795]
 [ 0.41330206 -1.01241854 -0.33988601  0.38381539 -1.05033975]]
[[-1 -1 -1 -1 -1]
 [ 1 -1  1  1 -1]
 [ 1  1  1  1 -1]
 [-1 -1 -1 -1  1]
 [ 1 -1 -1  1 -1]]


array([[-0.84786653, -0.9967568 , -1.2567254 , -1.84470591, -0.17406084],
       [ 1.        , -0.32754644,  1.        ,  1.        , -1.1638048 ],
       [ 1.        ,  1.        ,  1.        ,  1.        , -0.07436867],
       [-2.52824004, -1.38384255, -1.25961069, -0.07756106,  1.        ],
       [ 1.        , -1.01241854, -0.33988601,  1.        , -1.05033975]])
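The sketch below spells out the selection `np.where` performs as a plain list comprehension and checks that both agree. The nested call at the end is an illustrative extension for three-way labelling, not something from the cells above.

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([6, 7, 8, 9, 10])
cond = np.array([True, False, True, True, False])

# Pure-Python version of the selection np.where performs
picked_loop = [x if c else y for x, y, c in zip(x1, y1, cond)]

picked_where = np.where(cond, x1, y1)
print(picked_where)                        # [ 1  7  3  4 10]
print(list(picked_where) == picked_loop)   # True

# Nested np.where: label each value as -1, 0 or 1
values = np.array([-2.5, 0.0, 3.1])
print(np.where(values > 0, 1, np.where(values < 0, -1, 0)))   # [-1  0  1]
```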

## **Statistical methods**

In [39]:
thie = np.random.randn(5,5)
print(thie)
print(thie.mean()) # calculates the mean of thie
print(np.mean(thie)) # alternate method to calculate mean
print(thie.sum())

[[ 0.11088788  0.76718915  0.87810445  0.18165246  0.21529114]
 [-0.34830644 -0.91206523  0.14076785 -2.68904751  0.27059392]
 [ 0.51860932 -0.53160957 -0.67263039  0.46543742  0.25676532]
 [ 0.60305362  0.53747935  1.74980542  0.84835585  1.33774012]
 [-0.2176865  -0.95880012 -0.00518256 -0.54673704 -1.40125327]]
0.02393658614609004
0.02393658614609004
0.598414653652251


In [40]:
jp = np.arange(12).reshape(4, 3)
print("The arrays are: {}".format(jp))
print("The column sums are: {}".format(np.sum(jp, axis=0)))  # axis=0 sums down the rows, giving one total per column
# axis=1 sums across the columns, giving one total per row

The arrays are: [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
The column sums are: [18 22 26]


In [41]:
jp.sum(1)  # axis=1: the total of each row

array([ 3, 12, 21, 30])

In [42]:
jp.cumsum(0)  # cumulative sum down each column; try jp.cumprod(0) as well

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15],
       [18, 22, 26]], dtype=int32)

In [43]:
jp.cumsum(1)  # cumulative sum along each row

array([[ 0,  1,  3],
       [ 3,  7, 12],
       [ 6, 13, 21],
       [ 9, 19, 30]], dtype=int32)

In [44]:
xp =np.random.randn(100)
print((xp > 0).sum())  # number of positive values (True counts as 1)
print((xp < 0).sum())  # number of negative values
tandf = np.array([True, False, True, False, True, False])
print(tandf.any())  # True if at least one value is True
print(tandf.all())  # True only if every value is True
# These methods also work with non-boolean arrays, where non-zero elements evaluate to True.

43
57
True
False


> Other array functions are:
    * std, var -> standard deviation and variance
    * min, max -> Minimum and Maximum
    * argmin, argmax -> Indices of minimum and maximum elements
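The functions listed above all accept an `axis` argument, just like `sum`. A sketch on the same 4x3 `jp` array used earlier (the values in the comments follow from `np.arange(12)`):

```python
import numpy as np

jp = np.arange(12).reshape(4, 3)

print(jp.mean(axis=0))     # column means: [4.5 5.5 6.5]
print(jp.max(axis=1))      # row maxima: [ 2  5  8 11]
print(jp.argmax())         # 11: index of the largest value in the flattened array
print(jp.argmin(axis=0))   # row position of each column's minimum: [0 0 0]
print(jp.std(), jp.var())  # population standard deviation and variance (ddof=0 by default)
```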

## **Sorting**

In [45]:
lp = np.random.randn(8)
print(lp)
lp.sort()  # sorts the array in place
lp

[ 1.65872416 -1.52583408 -0.88560874 -0.18059008  0.78259141  0.15040551
  2.16625321 -0.11402223]


array([-1.52583408, -0.88560874, -0.18059008, -0.11402223,  0.15040551,
        0.78259141,  1.65872416,  2.16625321])

In [46]:
tp = np.random.randn(4,4)
tp

array([[ 0.33933736, -0.96784048, -2.68297329, -0.32691466],
       [ 3.17069102, -0.61514381,  1.30608162,  1.04829355],
       [ 0.29635307, -0.17757665, -0.0948743 ,  0.25181073],
       [ 0.97394346, -1.2962098 ,  0.27758876, -0.8861598 ]])

In [47]:
tp.sort(1)  # sorts each row in place (along axis 1)
tp

array([[-2.68297329, -0.96784048, -0.32691466,  0.33933736],
       [-0.61514381,  1.04829355,  1.30608162,  3.17069102],
       [-0.17757665, -0.0948743 ,  0.25181073,  0.29635307],
       [-1.2962098 , -0.8861598 ,  0.27758876,  0.97394346]])
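`np.sort` returns a sorted copy, while the `.sort()` method used above sorts in place. This sketch, with made-up values, shows both alongside `np.argsort`, which returns the indices that would sort the array.

```python
import numpy as np

values = np.array([3.2, -1.0, 0.5, 2.7])

sorted_copy = np.sort(values)   # new sorted array; values is unchanged
order = np.argsort(values)      # indices that would sort the array
print(sorted_copy)
print(order)                    # [1 2 3 0]

values.sort()                   # method form: sorts in place and returns None
print(np.array_equal(values, sorted_copy))   # True

grid = np.array([[3, 1, 2], [9, 7, 8]])
print(np.sort(grid, axis=1))    # sort within each row
```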

In [48]:
personals = np.array(['Manu', 'Jeevan', 'Prakash', 'Manu', 'Prakash', 'Jeevan', 'Prakash'])
np.unique(personals)# returns the unique elements in the array

array(['Jeevan', 'Manu', 'Prakash'], dtype='<U7')

In [49]:
set(personals)  # Python's built-in set gives the same unique values, but unordered and not as an array

{'Jeevan', 'Manu', 'Prakash'}

In [50]:
np.isin(personals, ['Manu'])  # True where the element is in ['Manu'], False otherwise (np.in1d is deprecated in newer NumPy)

array([ True, False, False,  True, False, False, False])

> Other Functions are :
    * intersect1d(x, y)-> Compute the sorted, common elements in x and y
    * union1d(x,y) -> compute the sorted union of elements
    * setdiff1d(x,y) -> set difference, elements in x that are not in y
    * setxor1d(x, y) -> Set symmetric differences; elements that are in either of the arrays, but not both
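A quick sketch of the set functions listed above on small string arrays; the name `'Ravi'` is an illustrative value added so that every function has something to report.

```python
import numpy as np

x = np.array(['Manu', 'Jeevan', 'Prakash', 'Manu'])
y = np.array(['Prakash', 'Ravi'])

print(np.intersect1d(x, y))   # ['Prakash']
print(np.union1d(x, y))       # ['Jeevan' 'Manu' 'Prakash' 'Ravi']
print(np.setdiff1d(x, y))     # ['Jeevan' 'Manu']
print(np.setxor1d(x, y))      # ['Jeevan' 'Manu' 'Ravi']
print(np.isin(x, y))          # [False False  True False]
```

All of the set functions return sorted, de-duplicated results, which is why `'Manu'` appears only once.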

## **Linear Algebra**

In [51]:
cp = np.array([[1,2,3],[4,5,6]])
dp = np.array([[7,8],[9,10],[11,12]])
print("CP array :{}".format(cp))
print("DP array :{}".format(dp))

CP array :[[1 2 3]
 [4 5 6]]
DP array :[[ 7  8]
 [ 9 10]
 [11 12]]


In [52]:
# matrix multiplication (not element-wise): a 2x3 array times a 3x2 array gives a 2x2 array
cp.dot(dp)  # equivalent to np.dot(cp, dp)

array([[ 58,  64],
       [139, 154]])

In [53]:
np.dot(cp, np.ones(3))  # matrix-vector product: the sum of each row of cp

array([ 6., 15.])

In [54]:
# numpy.linalg has standard matrix operations like determinants and inverse.
from numpy.linalg import inv, qr
cp = np.array([[1,2,3],[4,5,6]])
new_mat = cp.T.dot(cp)  # matrix product of the transpose of cp with cp (not element-wise)
print(new_mat)

[[17 22 27]
 [22 29 36]
 [27 36 45]]


In [55]:
sp = np.random.randn(5, 5)
rt = inv(sp)  # inverse of a square matrix
print(rt)

[[ 1.26812527 -0.82458732  0.40397289 -2.02878432 -2.61871151]
 [ 0.79264956 -0.4041058   0.13270575 -0.61926693 -0.8475755 ]
 [-0.29884088  0.48811845  0.13073602  0.37263884  0.85344308]
 [ 0.01040039 -0.75483236  0.46564806 -0.10481684 -0.81914129]
 [ 0.07161937  0.06259495  0.01642321 -0.17436547  0.6000147 ]]


In [56]:
# The product of a matrix and its inverse is the identity matrix, up to floating-point rounding error
sp.dot(rt)

array([[ 1.00000000e+00, -7.28583860e-17, -7.71951947e-17,
         1.66533454e-16,  3.33066907e-16],
       [ 0.00000000e+00,  1.00000000e+00, -4.51028104e-17,
         1.66533454e-16,  1.11022302e-16],
       [ 1.24900090e-16, -2.87964097e-16,  1.00000000e+00,
        -2.28983499e-16, -4.71844785e-16],
       [-2.77555756e-17,  2.77555756e-17, -1.73472348e-17,
         1.00000000e+00,  0.00000000e+00],
       [ 6.93889390e-17, -2.77555756e-17,  6.93889390e-18,
         2.77555756e-17,  1.00000000e+00]])

In [57]:
q, r = qr(sp)  # QR decomposition: q is orthogonal, r is upper triangular
print(q)
r

[[-0.53205667 -0.66728817  0.49595543  0.11330789  0.11327726]
 [ 0.43098656 -0.49174631 -0.05090574 -0.74836022  0.09900372]
 [ 0.24343874 -0.5412623  -0.59636004  0.53986356  0.02597588]
 [-0.66275624 -0.02098086 -0.59306725 -0.36404275 -0.2757863 ]
 [-0.18071535  0.13966745 -0.20991132 -0.05602237  0.94901723]]


array([[ 1.12857527, -2.18533377,  0.8308966 , -0.07567827,  0.25224078],
       [ 0.        , -1.97074344, -2.66246819,  0.0984475 ,  1.37033484],
       [ 0.        ,  0.        , -1.53569072, -0.87224941,  0.64368055],
       [ 0.        ,  0.        ,  0.        ,  1.10926304,  1.42099983],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  1.58165662]])

> Other Matrix Functions
    * diag: Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal
    * trace: Compute the sum of the diagonal elements
    * det: Compute the matrix determinant
    * eig: Compute the eigenvalues and eigenvectors of a square matrix
    * pinv: Compute the Moore-Penrose pseudo-inverse of a matrix
    * svd: Compute the singular value decomposition (SVD)
    * solve: Solve the linear system Ax = b for x, where A is a square matrix
    * lstsq: Compute the least-squares solution to y = Xb
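When the goal is to solve Ax = b, `np.linalg.solve` is usually preferred over computing `inv(A)` explicitly. A minimal sketch on a made-up 2x2 system whose solution is x = [2, 3], also touching `det`, `trace` and `diag` from the list above:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)         # solves A @ x = b without forming inv(A)
print(x)                          # [2. 3.]
print(np.allclose(A @ x, b))      # True
print(np.linalg.det(A))           # about 5.0 (3*2 - 1*1), up to rounding
print(np.trace(A), np.diag(A))    # 5.0 [3. 2.]
```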

<div>
<a href="#" style="font-size:48px;font-weight:600;margin-bottom:20px;"> Scipy </a>
<div style="padding-top: 20px;"><p style="font-size:24px;">Scipy library basics with code example</p></div>
</div>

<div>
<a href="#" style="font-size:48px;font-weight:600;margin-bottom:20px;"> Pandas </a>
<div style="padding-top: 20px;"><p style="font-size:24px;">Pandas library basics with code example</p></div>
</div>

Pandas contains high-level data structures and manipulation tools to make data analysis fast and easy in Python.

In [58]:
import pandas as pd
from pandas import Series, DataFrame  # Series and DataFrame are the two main data structures in pandas

## Series
A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.

In [59]:
mjp= Series([5,4,3,2,1])# a simple series
print(mjp)         # the index is shown on the left and the values on the right
print(mjp.values)  # .values returns the data as a NumPy array

0    5
1    4
2    3
3    2
4    1
dtype: int64
[5 4 3 2 1]


In [60]:
print(mjp.index.values) # returns the index values of the series

[0 1 2 3 4]


In [61]:
jeeva = Series([5,4,3,2,1,-7,-29], index =['a','b','c','d','e','f','h']) # The index is specified
print(jeeva) # try jeeva.index and jeeva.values
print(jeeva['a']) # selecting a particular value from a Series, by using index

a     5
b     4
c     3
d     2
e     1
f    -7
h   -29
dtype: int64
5


In [62]:
jeeva['d'] = 9 # change the value of a particular element in series
print(jeeva)
jeeva[['a','b','c']] # select a group of values

a     5
b     4
c     3
d     9
e     1
f    -7
h   -29
dtype: int64


a    5
b    4
c    3
dtype: int64

In [63]:
print(jeeva[jeeva>0]) # returns only the positive values
print(jeeva * 2)  # multiplies each element by 2

a    5
b    4
c    3
d    9
e    1
dtype: int64
a    10
b     8
c     6
d    18
e     2
f   -14
h   -58
dtype: int64


In [64]:
import numpy as np
np.mean(jeeva) # you can apply numpy functions to a Series

-2.0

In [65]:
print('b' in jeeva)  # checks whether 'b' is a label in the index, like a key in a dict
print('z' in jeeva)

True
False


In [66]:
player_salary ={'Rooney': 50000, 'Messi': 75000, 'Ronaldo': 85000, 'Fabregas':40000, 'Van persie': 67000} 
new_player = Series(player_salary)# converting a dictionary to a series
print(new_player)  # the dictionary keys become the index

Rooney        50000
Messi         75000
Ronaldo       85000
Fabregas      40000
Van persie    67000
dtype: int64


In [67]:
players =['Klose', 'Messi', 'Ronaldo', 'Van persie', 'Ballack'] 
player_1 =Series(player_salary, index= players)
print(player_1)  # the index now comes from players; no value was found for Klose and Ballack, so they appear as NaN

Klose             NaN
Messi         75000.0
Ronaldo       85000.0
Van persie    67000.0
Ballack           NaN
dtype: float64


In [68]:
pd.isnull(player_1)  # True where the value is missing; pd is the pandas module alias

Klose          True
Messi         False
Ronaldo       False
Van persie    False
Ballack        True
dtype: bool

In [69]:
pd.notnull(player_1)  # the opposite of isnull: True where a value is present

Klose         False
Messi          True
Ronaldo        True
Van persie     True
Ballack       False
dtype: bool
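Missing values can be counted or filtered with the same `isnull`/`notnull` masks. This sketch rebuilds `player_1` from the cells above and relies on `True` counting as 1 when summed.

```python
import pandas as pd

player_salary = {'Rooney': 50000, 'Messi': 75000, 'Ronaldo': 85000,
                 'Fabregas': 40000, 'Van persie': 67000}
players = ['Klose', 'Messi', 'Ronaldo', 'Van persie', 'Ballack']

player_1 = pd.Series(player_salary, index=players)  # Klose and Ballack have no salary -> NaN

missing = pd.isnull(player_1)                          # same as player_1.isnull()
print(missing.sum())                                   # 2
print(player_1[pd.notnull(player_1)].index.tolist())   # ['Messi', 'Ronaldo', 'Van persie']
print(player_1.dtype)                                  # float64: NaN forces a float dtype
```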

In [70]:
player_1.name ='Bundesliga players' # name for the Series
player_1.index.name='Player names' #name of the index
player_1

Player names
Klose             NaN
Messi         75000.0
Ronaldo       85000.0
Van persie    67000.0
Ballack           NaN
Name: Bundesliga players, dtype: float64

In [71]:
player_1.index = ['Neymar', 'Hulk', 'Pirlo', 'Buffon', 'Anderson']  # assigning a list of the same length replaces the index in place
player_1

Neymar          NaN
Hulk        75000.0
Pirlo       85000.0
Buffon      67000.0
Anderson        NaN
Name: Bundesliga players, dtype: float64

## Data Frame
A DataFrame is a spreadsheet-like structure containing an ordered collection of columns, each of which can have a different value type. A DataFrame has both a row index and a column index.

In [72]:
states = {'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
          'Population': [36, 44, 67, 89, 34],
          'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = DataFrame(states) # creating a data frame
india

        State  Population   Language
0     Gujarat          36   Gujarati
1  Tamil Nadu          44      Tamil
2      Andhra          67     Telugu
3   Karnataka          89    Kannada
4      Kerala          34  Malayalam


In [73]:
DataFrame(states, columns=['State', 'Language', 'Population']) # change the sequence of column index

        State   Language  Population
0     Gujarat   Gujarati          36
1  Tamil Nadu      Tamil          44
2      Andhra     Telugu          67
3   Karnataka    Kannada          89
4      Kerala  Malayalam          34


In [74]:
new_frame = DataFrame(states, columns=['State', 'Language', 'Population', 'Per Capita Income'], index =['a','b','c','d','e'])
new_frame
# If you pass a column that isn't in states, it is filled with NaN values

        State   Language  Population  Per Capita Income
a     Gujarat   Gujarati          36                NaN
b  Tamil Nadu      Tamil          44                NaN
c      Andhra     Telugu          67                NaN
d   Karnataka    Kannada          89                NaN
e      Kerala  Malayalam          34                NaN


In [75]:
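print(new_frame.columns)
print(new_frame['State'])  # retrieving a column like a dictionary value returns a Series

Index(['State', 'Language', 'Population', 'Per Capita Income'], dtype='object')
a       Gujarat
b    Tamil Nadu
c        Andhra
d     Karnataka
e        Kerala
Name: State, dtype: object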

In [76]:
new_frame.Population  # attribute access also returns the column as a Series

a    36
b    44
c    67
d    89
e    34
Name: Population, dtype: int64

In [77]:
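new_frame.loc['d']  # rows are retrieved by label with .loc; new_frame.iloc[3] gets the same (4th) row by position
# .ix, used in older tutorials, was removed in pandas 1.0

State                Karnataka
Language               Kannada
Population                  89
Per Capita Income          NaN
Name: d, dtype: object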

In [78]:
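new_frame

        State   Language  Population  Per Capita Income
a     Gujarat   Gujarati          36                NaN
b  Tamil Nadu      Tamil          44                NaN
c      Andhra     Telugu          67                NaN
d   Karnataka    Kannada          89                NaN
e      Kerala  Malayalam          34                NaN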

In [79]:
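new_frame['Per Capita Income'] = 99  # a scalar is broadcast to every row of the empty column
new_frame

        State   Language  Population  Per Capita Income
a     Gujarat   Gujarati          36                 99
b  Tamil Nadu      Tamil          44                 99
c      Andhra     Telugu          67                 99
d   Karnataka    Kannada          89                 99
e      Kerala  Malayalam          34                 99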

In [80]:
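new_frame['Per Capita Income'] = np.arange(5)  # assigning an array of matching length to the last column
new_frame

        State   Language  Population  Per Capita Income
a     Gujarat   Gujarati          36                  0
b  Tamil Nadu      Tamil          44                  1
c      Andhra     Telugu          67                  2
d   Karnataka    Kannada          89                  3
e      Kerala  Malayalam          34                  4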

In [81]:
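series = Series([44, 33, 22], index=['b', 'c', 'd'])
new_frame['Per Capita Income'] = series
# A Series is aligned on the index labels; a list or array must instead match the length of the DataFrame
new_frame  # labels missing from series ('a' and 'e') are displayed as NaN

        State   Language  Population  Per Capita Income
a     Gujarat   Gujarati          36                NaN
b  Tamil Nadu      Tamil          44               44.0
c      Andhra     Telugu          67               33.0
d   Karnataka    Kannada          89               22.0
e      Kerala  Malayalam          34                NaN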

In [82]:
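new_frame['Development'] = new_frame.State == 'Gujarat'  # assigning a new boolean column
print(new_frame)
del new_frame['Development']  # deletes the column 'Development'
new_frame

        State   Language  Population  Per Capita Income  Development
a     Gujarat   Gujarati          36                NaN         True
b  Tamil Nadu      Tamil          44               44.0        False
c      Andhra     Telugu          67               33.0        False
d   Karnataka    Kannada          89               22.0        False
e      Kerala  Malayalam          34                NaN        False


        State   Language  Population  Per Capita Income
a     Gujarat   Gujarati          36                NaN
b  Tamil Nadu      Tamil          44               44.0
c      Andhra     Telugu          67               33.0
d   Karnataka    Kannada          89               22.0
e      Kerala  Malayalam          34                NaN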
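To summarise the column-assignment rules above: a Series is aligned on index labels, while a list must have one value per row. This sketch rebuilds a reduced version of `new_frame` (keeping only two of its original columns is an assumption for brevity) and also contrasts `.loc` and `.iloc`.

```python
import pandas as pd

# Reduced, illustrative version of new_frame
frame = pd.DataFrame({'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
                      'Population': [36, 44, 67, 89, 34]},
                     index=['a', 'b', 'c', 'd', 'e'])

# A Series is aligned on the index: labels it does not contain become NaN
frame['Per Capita Income'] = pd.Series([44, 33, 22], index=['b', 'c', 'd'])
print(frame['Per Capita Income'].isnull().tolist())  # [True, False, False, False, True]

# A list has no labels, so its length must equal the number of rows
try:
    frame['Per Capita Income'] = [1, 2, 3]
except ValueError as err:
    print("ValueError:", err)

print(frame.loc['d', 'State'])   # Karnataka (row selected by label)
print(frame.iloc[3]['State'])    # Karnataka (row selected by position)
```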

In [83]:
new_data ={'Modi': {2010: 72, 2012: 78, 2014 : 98},'Rahul': {2010: 55, 2012: 34, 2014: 22}}
elections = DataFrame(new_data) 
print(elections) # the outer dict keys are columns and inner dict keys are rows
elections.T # transpose of a data frame

      Modi  Rahul
2010    72     55
2012    78     34
2014    98     22


       2010  2012  2014
Modi     72    78    98
Rahul    55    34    22


In [84]:
DataFrame(new_data, index =[2012, 2014, 2016]) # you can assign index for the data frame

      Modi  Rahul
2012  78.0   34.0
2014  98.0   22.0
2016   NaN    NaN


In [85]:
ex= {'Gujarat':elections['Modi'][:-1], 'India': elections['Rahul'][:2]}
px =DataFrame(ex)
px

      Gujarat  India
2010       72     55
2012       78     34


In [86]:
px.index.name = 'year'
px.columns.name = 'politicians'
px

politicians  Gujarat  India
year
2010              72     55
2012              78     34


In [87]:
px.values

array([[72, 55],
       [78, 34]], dtype=int64)

In [88]:
jeeva = Series([5,4,3,2,1,-7,-29], index =['a','b','c','d','e','f','h'])
index = jeeva.index
print(index)
print(index[1:])  # returns all the index elements except 'a'
index[1] = 'f'  # Index objects are immutable, so modifying an element raises a TypeError

Index(['a', 'b', 'c', 'd', 'e', 'f', 'h'], dtype='object')
Index(['b', 'c', 'd', 'e', 'f', 'h'], dtype='object')


TypeError: Index does not support mutable operations
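Because an `Index` cannot be changed in place, labels are changed by creating a new object. This sketch catches the `TypeError` explicitly and then uses `Series.rename` with a mapping, which returns a new Series and leaves the original untouched.

```python
import pandas as pd

jeeva = pd.Series([5, 4, 3, 2, 1, -7, -29], index=['a', 'b', 'c', 'd', 'e', 'f', 'h'])

try:
    jeeva.index[1] = 'f'   # Index objects are immutable
except TypeError as err:
    print("TypeError:", err)

# rename maps old labels to new ones and returns a new Series
renamed = jeeva.rename({'b': 'z'})
print(renamed.index.tolist())     # ['a', 'z', 'c', 'd', 'e', 'f', 'h']
print(jeeva.index.tolist()[:2])   # ['a', 'b']: the original Series is unchanged
```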

In [None]:
print(px)
2013 in px.index  # checks whether 2013 is a label in the index of px

## Reindex

In [89]:
var = Series(['Python', 'Java', 'c', 'c++', 'Php'], index =[5,4,3,2,1])
print(var)
var1 = var.reindex([1, 2, 3, 4, 5])  # reindex returns a new object with the data rearranged to match the new index
print(var1) 

5    Python
4      Java
3         c
2       c++
1       Php
dtype: object
1       Php
2       c++
3         c
4      Java
5    Python
dtype: object


In [90]:
var.reindex([1, 2, 3, 4, 5, 6, 7])  # labels not present in the original index get NaN

1       Php
2       c++
3         c
4      Java
5    Python
6       NaN
7       NaN
dtype: object

In [91]:
var.reindex([1, 2, 3, 4, 5, 6, 7], fill_value=1)  # fill_value is used for the missing labels; 1 here, but any value works

1       Php
2       c++
3         c
4      Java
5    Python
6         1
7         1
dtype: object

In [92]:
gh =Series(['Dhoni', 'Sachin', 'Kohli'], index =[0,2,4])
print(gh)
gh.reindex(range(6), method='ffill')  # ffill (forward fill) carries the last valid value forward

0     Dhoni
2    Sachin
4     Kohli
dtype: object


0     Dhoni
1     Dhoni
2    Sachin
3    Sachin
4     Kohli
5     Kohli
dtype: object

In [93]:
gh.reindex(range(6), method='bfill')  # bfill (backward fill) uses the next valid value; label 5 has none, so it stays NaN

0     Dhoni
1    Sachin
2    Sachin
3     Kohli
4     Kohli
5       NaN
dtype: object

In [94]:
import numpy as np
fp = DataFrame(np.arange(9).reshape((3,3)),index =['a','b','c'], columns =['Gujarat','Tamil Nadu', 'Kerala'])
fp

   Gujarat  Tamil Nadu  Kerala
a        0           1       2
b        3           4       5
c        6           7       8


In [95]:
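fp1 = fp.reindex(['a', 'b', 'c', 'd'], columns=['State', 'Population', 'Language'])  # reindex rows and columns together
fp1  # none of these columns exist in fp and 'd' is a new row, so every value is NaN

   State  Population  Language
a    NaN         NaN       NaN
b    NaN         NaN       NaN
c    NaN         NaN       NaN
d    NaN         NaN       NaN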


## Other Reindexing arguments
    * **limit**: When forward- or backfilling, the maximum size gap to fill
    * **level**: Match simple Index on level of MultiIndex, otherwise select subset of
    * **copy**: If False, do not copy the underlying data when the new index is equivalent to the old index. True by default (i.e. always copy data).
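As a sketch of the `limit` argument, the example below forward-fills a Series whose gaps are two labels wide but fills at most one label per gap; the index values are chosen for illustration.

```python
import pandas as pd

gh = pd.Series(['Dhoni', 'Sachin', 'Kohli'], index=[0, 3, 6])

# Forward-fill the new labels, but fill at most one consecutive label per gap
filled = gh.reindex(range(8), method='ffill', limit=1)
print(filled.tolist())
# ['Dhoni', 'Dhoni', nan, 'Sachin', 'Sachin', nan, 'Kohli', 'Kohli']
```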

## Dropping entries from an axis

In [96]:
er = Series(np.arange(5), index =['a','b','c','d','e'])
print(er)
er.drop(['a','b']) # drop returns a new object with the given labels removed; er itself is unchanged

a    0
b    1
c    2
d    3
e    4
dtype: int32


c    2
d    3
e    4
dtype: int32
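
By default `drop` raises a `KeyError` if any label is missing. `errors='ignore'` drops the labels that exist and skips the rest. A sketch on the `er` data; the label `'z'` is a hypothetical missing label:

```python
import numpy as np
import pandas as pd

er = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

try:
    er.drop(['a', 'z'])            # 'z' is not in the index
except KeyError as exc:
    print('KeyError:', exc)

trimmed = er.drop(['a', 'z'], errors='ignore')  # drops 'a', ignores 'z'
print(trimmed.index.tolist())      # ['b', 'c', 'd', 'e']
print(er.index.tolist())           # ['a', 'b', 'c', 'd', 'e']: er is unchanged
```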

In [97]:
states ={'State' :['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
                  'Population': [36, 44, 67,89,34],
                  'Language' :['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = DataFrame(states, columns =['State', 'Population', 'Language'])
print(india)
india.drop([0,1]) # drops the rows labelled 0 and 1

        State  Population   Language
0     Gujarat          36   Gujarati
1  Tamil Nadu          44      Tamil
2      Andhra          67     Telugu
3   Karnataka          89    Kannada
4      Kerala          34  Malayalam


       State  Population   Language
2     Andhra          67     Telugu
3  Karnataka          89    Kannada
4     Kerala          34  Malayalam


In [98]:
india.drop(['State', 'Population'], axis =1 ) # axis=1 drops columns (here State and Population); axis=0, the default, drops rows

    Language
0   Gujarati
1      Tamil
2     Telugu
3    Kannada
4  Malayalam
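
pandas also accepts `columns=` and `index=` keywords on `drop`, so you do not have to remember which axis number means what. A sketch showing that both spellings agree:

```python
import pandas as pd

states = {'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
          'Population': [36, 44, 67, 89, 34],
          'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = pd.DataFrame(states)

by_keyword = india.drop(columns=['State', 'Population'])
by_axis = india.drop(['State', 'Population'], axis=1)
print(by_keyword.equals(by_axis))   # True

# Rows are dropped the same way with index=
print(india.drop(index=[0, 1])['State'].tolist())  # ['Andhra', 'Karnataka', 'Kerala']
```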


## Selection, Indexing and Filtering

In [99]:
var = Series(['Python', 'Java', 'c', 'c++', 'Php'], index =[5,4,3,2,1])
var

5    Python
4      Java
3         c
2       c++
1       Php
dtype: object

In [100]:
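print (var.loc[5])     # label-based: the value whose index label is 5
print (var.iloc[2:4])  # position-based: positions 2 and 3 (the end position is excluded)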

Python
3      c
2    c++
dtype: object


In [101]:
var[[3,2,1]]

3      c
2    c++
1    Php
dtype: object

In [102]:
var[var == 'Php']

1    Php
dtype: object
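
To keep values that match any of several options, `isin` builds the boolean mask for you. The same mask can be written with `|`, as long as each comparison has its own parentheses:

```python
import pandas as pd

var = pd.Series(['Python', 'Java', 'c', 'c++', 'Php'], index=[5, 4, 3, 2, 1])

picked = var[var.isin(['Python', 'Php'])]
print(picked)

mask = (var == 'Python') | (var == 'Php')   # equivalent mask, built by hand
print(var[mask].equals(picked))            # True
```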

In [103]:
states ={'State' :['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
                  'Population': [36, 44, 67,89,34],
                  'Language' :['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = DataFrame(states, columns =['State', 'Population', 'Language'])
india

        State  Population   Language
0     Gujarat          36   Gujarati
1  Tamil Nadu          44      Tamil
2      Andhra          67     Telugu
3   Karnataka          89    Kannada
4      Kerala          34  Malayalam


In [104]:
india[['Population', 'Language']] # pass a list of column names to select several columns as a new data frame

   Population   Language
0          36   Gujarati
1          44      Tamil
2          67     Telugu
3          89    Kannada
4          34  Malayalam


In [105]:
india[india['Population'] > 50] # returns data for population greater than 50

       State  Population  Language
2     Andhra          67    Telugu
3  Karnataka          89   Kannada
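
Conditions can be combined with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` keywords do not work on Series, and each condition needs its own parentheses because `&` binds more tightly than `>`. A sketch on the same data:

```python
import pandas as pd

india = pd.DataFrame({'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
                      'Population': [36, 44, 67, 89, 34],
                      'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']})

mid_sized = india[(india['Population'] > 40) & (india['Population'] < 80)]
print(mid_sized['State'].tolist())   # ['Tamil Nadu', 'Andhra']

not_kerala = india[~(india['State'] == 'Kerala')]
print(len(not_kerala))               # 4
```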


In [106]:
india[:3] # first three rows

        State  Population  Language
0     Gujarat          36  Gujarati
1  Tamil Nadu          44     Tamil
2      Andhra          67    Telugu


In [107]:
# for selecting specific rows and columns by label, use .loc (use .iloc for integer positions)
import pandas as pd
states ={'State' :['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
                  'Population': [36, 44, 67,89,34],
                  'Language' :['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = DataFrame(states, columns =['State', 'Population', 'Language'], index =['a', 'b', 'c', 'd', 'e'])
india

        State  Population   Language
a     Gujarat          36   Gujarati
b  Tamil Nadu          44      Tamil
c      Andhra          67     Telugu
d   Karnataka          89    Kannada
e      Kerala          34  Malayalam


In [108]:
india.loc[['a','b'], ['State','Language']] # this is how you select a subset of rows and columns by label (the older .ix indexer was deprecated and removed in pandas 1.0)


        State  Language
a     Gujarat  Gujarati
b  Tamil Nadu     Tamil
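
The two indexers differ in what they count and in how slices end. `.loc` slices by label and includes both endpoints; `.iloc` slices by position and excludes the end, like Python lists. A sketch on the same data frame:

```python
import pandas as pd

india = pd.DataFrame({'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
                      'Population': [36, 44, 67, 89, 34],
                      'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']},
                     index=['a', 'b', 'c', 'd', 'e'])

print(india.loc['a':'c', 'State'].tolist())   # ['Gujarat', 'Tamil Nadu', 'Andhra']: 'c' included

by_position = india.iloc[0:2, [0, 2]]         # rows 0-1, columns 0 and 2
by_label = india.loc[['a', 'b'], ['State', 'Language']]
print(by_position.equals(by_label))           # True

# india[:3] (used earlier) selects the same rows as india.iloc[:3] and india.head(3)
print(india.iloc[:3].equals(india.head(3)))   # True
```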


## Numpy cheat sheet

import numpy as np

#### Importing/exporting
* np.loadtxt('file.txt') | From a text file
* np.genfromtxt('file.csv',delimiter=',') | From a CSV file
* np.savetxt('file.txt',arr,delimiter=' ') | Writes to a text file
* np.savetxt('file.csv',arr,delimiter=',') | Writes to a CSV file
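
A round trip through a CSV file shows how these calls fit together. This sketch writes to a temporary directory, so `file.csv` is only a placeholder name:

```python
import os
import tempfile

import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.csv')            # placeholder file name
    np.savetxt(path, arr, delimiter=',')            # write the values as text
    back = np.loadtxt(path, delimiter=',')          # read them back as float64
    back_gen = np.genfromtxt(path, delimiter=',')   # genfromtxt can also cope with missing values

print(np.array_equal(arr, back), np.array_equal(arr, back_gen))  # True True
```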

#### Creating Arrays
* np.array([1,2,3]) | One dimensional array
* np.array([(1,2,3),(4,5,6)]) | Two dimensional array
* np.zeros(3) | 1D array of length 3 all values 0
* np.ones((3,4)) | 3x4 array with all values 1
* np.eye(5) | 5x5 array of 0 with 1 on diagonal (Identity matrix)
* np.linspace(0,100,6) | Array of 6 evenly spaced values from 0 to 100, both endpoints included
* np.arange(0,10,3) | Array of values from 0 to less than 10 with step 3 (e.g. [0,3,6,9])
* np.full((2,3),8) | 2x3 array with all values 8
* np.random.rand(4,5) | 4x5 array of random floats between 0–1
* np.random.rand(6,7)*100 | 6x7 array of random floats between 0–100
* np.random.randint(5,size=(2,3)) | 2x3 array with random ints between 0–4
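
A few of these constructors side by side. For reproducible random numbers, NumPy also offers a seeded Generator (`np.random.default_rng`); it is a newer alternative to the `np.random.rand`/`randint` calls listed above, not part of the original cheat sheet:

```python
import numpy as np

print(np.linspace(0, 100, 6))   # 0, 20, 40, 60, 80, 100: both endpoints included
print(np.arange(0, 10, 3))      # 0, 3, 6, 9: the stop value 10 is excluded
print(np.full((2, 3), 8))       # 2x3 array of 8s

rng = np.random.default_rng(seed=0)   # seeded, so results repeat between runs
floats = rng.random((4, 5))           # floats in [0, 1)
ints = rng.integers(5, size=(2, 3))   # ints from 0 to 4, like np.random.randint(5, size=(2, 3))
print(floats.shape, ints.shape)       # (4, 5) (2, 3)
```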

#### Inspecting Properties
* arr.size | Returns number of elements in arr
* arr.shape | Returns dimensions of arr (rows,columns)
* arr.dtype | Returns type of elements in arr
* arr.astype(dtype) | Convert arr elements to type dtype
* arr.tolist() | Convert arr to a Python list
* np.info(np.eye) | View documentation for np.eye
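
A quick look at these properties on a small array. Note that `astype` returns a new array and leaves the original dtype alone:

```python
import numpy as np

arr = np.array([(1, 2, 3), (4, 5, 6)])
print(arr.size, arr.shape)              # 6 (2, 3)

as_float = arr.astype(float)            # new array; arr is still an integer array
print(arr.dtype.kind, as_float.dtype)   # i float64
print(as_float.tolist())                # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```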

#### Copying/sorting/reshaping
* np.copy(arr) | Copies arr to new memory
* arr.view(dtype) | Creates view of arr elements with type dtype
* arr.sort() | Sorts arr in place
* arr.sort(axis=0) | Sorts specific axis of arr
* two_d_arr.flatten() | Flattens 2D array two_d_arr to 1D
* arr.T | Transposes arr (rows become columns and vice versa)
* arr.reshape(3,4) | Reshapes arr to 3 rows, 4 columns without changing data
* arr.resize((5,6)) | Changes arr shape to 5x6 and fills new values with 0
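
The difference between a copy and a view matters as soon as you modify data. A basic slice is a view that shares memory with the original, while `np.copy` is independent. Similarly, `np.sort` returns a sorted copy but the `.sort()` method works in place:

```python
import numpy as np

arr = np.array([3, 1, 2])
view = arr[:]              # basic slicing returns a view of arr's memory
independent = np.copy(arr) # a separate copy
arr[0] = 99
print(view[0], independent[0])            # 99 3

print(np.sort(arr).tolist(), arr.tolist())  # [1, 2, 99] [99, 1, 2]: np.sort leaves arr alone
arr.sort()                                  # sorts in place and returns None
print(arr.tolist())                         # [1, 2, 99]

m = np.arange(12).reshape(3, 4)   # same 12 values, new shape
print(m.T.shape)                  # (4, 3)
print(m.flatten().shape)          # (12,): flatten always returns a copy
```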

#### Adding/removing Elements
* np.append(arr,values) | Appends values to end of arr
* np.insert(arr,2,values) | Inserts values into arr before index 2
* np.delete(arr,3,axis=0) | Deletes row on index 3 of arr
* np.delete(arr,4,axis=1) | Deletes column on index 4 of arr
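
All four functions return new arrays rather than changing `arr`. One common surprise is that `np.append` without `axis` flattens its inputs first:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

print(np.append(arr, [[5, 6]]).tolist())          # [1, 2, 3, 4, 5, 6]: flattened
print(np.append(arr, [[5, 6]], axis=0).tolist())  # [[1, 2], [3, 4], [5, 6]]

print(np.insert(np.array([10, 20, 30]), 2, 99).tolist())  # [10, 20, 99, 30]
print(np.delete(arr, 0, axis=0).tolist())         # [[3, 4]]: row 0 removed
print(np.delete(arr, 1, axis=1).tolist())         # [[1], [3]]: column 1 removed
print(arr.tolist())                               # [[1, 2], [3, 4]]: arr is unchanged
```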

#### Combining/splitting
* np.concatenate((arr1,arr2),axis=0) | Adds arr2 as rows to the end of arr1
* np.concatenate((arr1,arr2),axis=1) | Adds arr2 as columns to end of arr1
* np.split(arr,3) | Splits arr into 3 sub-arrays
* np.hsplit(arr,5) | Splits arr horizontally (along the columns) into 5 equal sub-arrays
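
The split functions take either an integer (that many equal pieces, which must divide the axis length) or a list of indices to split before. `np.concatenate` reverses a split:

```python
import numpy as np

arr = np.arange(12).reshape(2, 6)

parts = np.hsplit(arr, 3)              # three equal pieces along the columns
print([p.shape for p in parts])        # [(2, 2), (2, 2), (2, 2)]

left, right = np.hsplit(arr, [5])      # split before column index 5
print(left.shape, right.shape)         # (2, 5) (2, 1)

rows = np.split(arr, 2)                # np.split works along axis 0 by default
print(np.array_equal(np.concatenate(rows, axis=0), arr))  # True
```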

#### Indexing/slicing/subsetting
* arr[5] | Returns the element at index 5
* arr[2,5] | Returns the 2D array element on index [2][5]
* arr[1]=4 | Assigns array element on index 1 the value 4
* arr[1,3]=10 | Assigns array element on index [1][3] the value 10
* arr[0:3] | Returns the elements at indices 0,1,2 (On a 2D array: returns rows 0,1,2)
* arr[0:3,4] | Returns the elements on rows 0,1,2 at column 4
* arr[:2] | Returns the elements at indices 0,1 (On a 2D array: returns rows 0,1)
* arr[:,1] | Returns the elements at index 1 on all rows
* arr<5 | Returns an array with boolean values
* (arr1<3) & (arr2>5) | Returns an array with boolean values
* ~arr | Inverts a boolean array
* arr[arr<5] | Returns array elements smaller than 5
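
Boolean indexing in practice. Selecting with a mask returns a 1D copy in row-major order, while assigning through a mask changes the original array:

```python
import numpy as np

arr = np.array([[1, 6, 3], [8, 2, 7]])

mask = arr < 5
print(mask.tolist())          # [[True, False, True], [False, True, False]]
print(arr[mask].tolist())     # [1, 3, 2]
print(arr[~mask].tolist())    # [6, 8, 7]

print(arr[:, 1].tolist())     # [6, 2]: column 1 of every row
print(arr[0:2, 2].tolist())   # [3, 7]: rows 0 and 1, column 2

arr[arr < 5] = 0              # assignment through a mask modifies arr
print(arr.tolist())           # [[0, 6, 0], [8, 0, 7]]
```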

#### Scalar Math
* np.add(arr,1) | Add 1 to each array element
* np.subtract(arr,2) | Subtract 2 from each array element
* np.multiply(arr,3) | Multiply each array element by 3
* np.divide(arr,4) | Divide each array element by 4 (dividing by zero gives inf, or nan for 0/0, with a RuntimeWarning)
* np.power(arr,5) | Raise each array element to the 5th power
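
The arithmetic operators are shorthand for these functions (`arr + 1` is `np.add(arr, 1)`, and so on). Dividing by zero does not raise an exception; this sketch silences the warnings with `np.errstate` to show the resulting values:

```python
import numpy as np

arr = np.array([0.0, 1.0, 2.0])

print(np.array_equal(arr + 1, np.add(arr, 1)))   # True
print((arr ** 5).tolist())                       # [0.0, 1.0, 32.0]

with np.errstate(divide='ignore', invalid='ignore'):   # suppress RuntimeWarnings in this block
    result = np.divide(np.array([1.0, 0.0]), 0)
print(result.tolist())                           # [inf, nan]
```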

#### Vector Math
* np.add(arr1,arr2) | Elementwise add arr2 to arr1
* np.subtract(arr1,arr2) | Elementwise subtract arr2 from arr1
* np.multiply(arr1,arr2) | Elementwise multiply arr1 by arr2
* np.divide(arr1,arr2) | Elementwise divide arr1 by arr2
* np.power(arr1,arr2) | Elementwise raise arr1 to the power of arr2
* np.array_equal(arr1,arr2) | Returns True if the arrays have the same elements and shape
* np.sqrt(arr) | Square root of each element in the array
* np.sin(arr) | Sine of each element in the array
* np.log(arr) | Natural log of each element in the array
* np.abs(arr) | Absolute value of each element in the array
* np.ceil(arr) | Rounds up to the nearest int
* np.floor(arr) | Rounds down to the nearest int
* np.round(arr) | Rounds to the nearest int
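
Elementwise operations pair up values by position. The rounding functions return floats, and `np.round` rounds exact halves to the nearest even number:

```python
import numpy as np

arr1 = np.array([1.0, 2.0, 3.0])
arr2 = np.array([4.0, 5.0, 6.0])

print(np.add(arr1, arr2).tolist())      # [5.0, 7.0, 9.0]
print(np.power(arr1, arr2).tolist())    # [1.0, 32.0, 729.0]
print(np.array_equal(arr1, arr1.copy()))  # True

print(np.round([0.5, 1.5, 2.5]).tolist())  # [0.0, 2.0, 2.0]: halves go to the even neighbour
print(np.ceil([1.2, -1.2]).tolist())       # [2.0, -1.0]
print(np.floor([1.2, -1.2]).tolist())      # [1.0, -2.0]
```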

#### Statistics
* np.mean(arr,axis=0) | Returns mean along specific axis
* arr.sum() | Returns sum of arr
* arr.min() | Returns minimum value of arr
* arr.max(axis=0) | Returns maximum value of specific axis
* np.var(arr) | Returns the variance of array
* np.std(arr,axis=1) | Returns the standard deviation of specific axis
* np.corrcoef(arr) | Returns the correlation coefficient matrix of arr (each row is treated as a variable)
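
The `axis` argument decides the direction: `axis=0` collapses the rows and gives one value per column, and `axis=1` gives one value per row. In this sketch the second row is an exact linear function of the first, so their correlation is 1:

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 6.0, 8.0]])

print(np.mean(arr, axis=0).tolist())   # [2.5, 4.0, 5.5]: per column
print(np.mean(arr, axis=1).tolist())   # [2.0, 6.0]: per row
print(arr.max(axis=0).tolist())        # [4.0, 6.0, 8.0]

print(np.corrcoef(arr))                # 2x2 matrix; every entry is 1 here
```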

## Pandas Cheatsheet

import pandas as pd
import numpy as np

#### Importing Data
* pd.read_csv(filename) | From a CSV file
* pd.read_table(filename) | From a delimited text file (like TSV)
* pd.read_excel(filename) | From an Excel file
* pd.read_sql(query, connection_object) | Read from a SQL table/database
* pd.read_json(json_string) | Read from a JSON formatted string, URL or file.
* pd.read_html(url) | Parses an html URL, string or file and extracts tables to a list of dataframes
* pd.read_clipboard() | Takes the contents of your clipboard and passes it to read_table()
* pd.DataFrame(dict) | From a dict, keys for column names, values for data as lists

#### Exporting Data
* df.to_csv(filename) | Write to a CSV file
* df.to_excel(filename) | Write to an Excel file
* df.to_sql(table_name, connection_object) | Write to a SQL table
* df.to_json(filename) | Write to a file in JSON format
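
A write-then-read round trip with `to_csv` and `read_csv`. To stay self-contained, this sketch uses an in-memory `io.StringIO` buffer in place of a real file name:

```python
import io

import pandas as pd

df = pd.DataFrame({'State': ['Gujarat', 'Kerala'], 'Population': [36, 34]})

buffer = io.StringIO()            # in-memory stand-in for a file
df.to_csv(buffer, index=False)    # index=False: do not write the row index as a column
buffer.seek(0)                    # rewind before reading

back = pd.read_csv(buffer)
print(back.equals(df))            # True
```

Without `index=False`, the row index would come back as an extra unnamed column.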

#### Create Test Objects
Useful for testing code segments

* pd.DataFrame(np.random.rand(20,5)) | 5 columns and 20 rows of random floats
* pd.Series(my_list) | Create a series from an iterable my_list
* df.index = pd.date_range('1900/1/30', periods=df.shape[0]) | Add a date index
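
Putting the test-object lines together: a random frame with one calendar day per row, starting on 30 January 1900:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(20, 5))                      # 20 rows, 5 columns
df.index = pd.date_range('1900/1/30', periods=df.shape[0])    # daily dates, one per row

print(df.shape)                  # (20, 5)
print(df.index[0].date(), df.index[-1].date())  # 1900-01-30 1900-02-18
```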

#### Viewing/Inspecting Data
* df.head(n) | First n rows of the DataFrame
* df.tail(n) | Last n rows of the DataFrame
* df.shape | Number of rows and columns
* df.info() | Index, Datatype and Memory information
* df.describe() | Summary statistics for numerical columns
* s.value_counts(dropna=False) | View unique values and counts
* df.apply(pd.Series.value_counts) | Unique values and counts for all columns
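
`value_counts` skips missing values unless you pass `dropna=False`. A sketch with made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tamil', 'Telugu', 'Tamil', np.nan])

counts = s.value_counts()                   # NaN is not counted
counts_all = s.value_counts(dropna=False)   # NaN gets its own count
print(counts.sum(), counts_all.sum())       # 3 4
print(counts['Tamil'])                      # 2
```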

#### Selection
* df[col] | Returns column with label col as Series
* df[[col1, col2]] | Returns columns as a new DataFrame
* s.iloc[0] | Selection by position
* s.loc['index_one'] | Selection by index label
* df.iloc[0,:] | First row
* df.iloc[0,0] | First element of first column
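
The label versus position distinction is easiest to see with integer labels that do not match the positions:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s.loc[0])    # 30: the value whose LABEL is 0
print(s.iloc[0])   # 10: the value at POSITION 0

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df.iloc[0, :].tolist())    # [1, 3]: first row
print(df.iloc[0, 0])             # 1: first element of the first column
print(type(df['a']).__name__, type(df[['a', 'b']]).__name__)  # Series DataFrame
```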

#### Data Cleaning
* df.columns = ['a','b','c'] | Rename columns
* pd.isnull(obj) | Checks for null values, returns a Boolean array
* pd.notnull(obj) | Opposite of pd.isnull()
* df.dropna() | Drop all rows that contain null values
* df.dropna(axis=1) | Drop all columns that contain null values
* df.dropna(axis=1,thresh=n) | Drop all columns that have fewer than n non-null values
* df.fillna(x) | Replace all null values with x
* s.fillna(s.mean()) | Replace all null values with the mean (mean can be replaced with almost any function from the statistics module)
* s.astype(float) | Convert the datatype of the series to float
* s.replace(1,'one') | Replace all values equal to 1 with 'one'
* s.replace([1,3],['one','three']) | Replace all 1 with 'one' and 3 with 'three'
* df.rename(columns=lambda x: x + 1) | Mass renaming of columns (x + 1 assumes numeric column labels)
* df.rename(columns={'old_name': 'new_name'}) | Selective renaming
* df.set_index('column_one') | Change the index
* df.rename(index=lambda x: x + 1) | Mass renaming of index (again assuming numeric labels)
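
Several cleaning calls on one small frame with missing values. Pay attention to how `axis` and `thresh` change what gets dropped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, 6.0],
                   'c': [7.0, 8.0, 9.0]})

print(df.dropna().shape)                              # (1, 3): only the last row has no nulls
print(df.dropna(axis=1).columns.tolist())             # ['c']
print(df.dropna(axis=1, thresh=2).columns.tolist())   # ['a', 'c']: at least 2 non-nulls

filled = df['a'].fillna(df['a'].mean())   # mean of 1.0 and 3.0 is 2.0
print(filled.tolist())                    # [1.0, 2.0, 3.0]

print(df.rename(columns={'a': 'alpha'}).columns.tolist())  # ['alpha', 'b', 'c']
```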

#### Filter, Sort, and Groupby
* df[df[col] > 0.5] | Rows where the column col is greater than 0.5
* df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.7 > col > 0.5
* df.sort_values(col1) | Sort values by col1 in ascending order
* df.sort_values(col2,ascending=False) | Sort values by col2 in descending order
* df.sort_values([col1,col2],ascending=[True,False]) | Sort values by col1 in ascending order then col2 in descending order
* df.groupby(col) | Returns a groupby object for values from one column
* df.groupby([col1,col2]) | Returns groupby object for values from multiple columns
* df.groupby(col1)[col2].mean() | Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics module)
* df.pivot_table(index=col1,values=[col2,col3],aggfunc='mean') | Create a pivot table that groups by col1 and calculates the mean of col2 and col3
* df.groupby(col1).agg('mean') | Find the average across all columns for every unique col1 group
* df.apply(np.mean) | Apply the function np.mean() across each column
* df.apply(np.max,axis=1) | Apply the function np.max() across each row
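
Grouping and pivoting on a tiny made-up scorecard. The column names `team`, `runs` and `balls` are hypothetical. The row-wise example uses a lambda, which keeps `apply` calling exactly the function you pass:

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'runs': [10, 30, 20, 40],
                   'balls': [12, 20, 15, 30]})

print(df.groupby('team')['runs'].mean().to_dict())   # {'A': 20.0, 'B': 30.0}

pt = df.pivot_table(index='team', values=['runs', 'balls'], aggfunc='mean')
print(pt)   # one row per team, mean runs and mean balls

row_max = df[['runs', 'balls']].apply(lambda row: row.max(), axis=1)
print(row_max.tolist())   # [12, 30, 20, 40]
```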

#### Join/Combine
* pd.concat([df1, df2]) | Add the rows in df2 to the end of df1 (columns should be identical); the older df1.append(df2) was removed in pandas 2.0
* pd.concat([df1, df2],axis=1) | Add the columns in df2 to the end of df1 (rows should be identical)
* df1.join(df2,on=col1,how='inner') | SQL-style join: match the values in df1's col1 against df2's index and combine the columns of both. 'how' can be one of 'left', 'right', 'outer', 'inner'
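
Stacking rows with `concat` and looking up extra columns with `join`. All frame names and values here are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['x', 'y'], 'v1': [1, 2]})
df2 = pd.DataFrame({'key': ['z'], 'v1': [3]})

rows = pd.concat([df1, df2], ignore_index=True)   # df2's rows after df1's, index renumbered
print(rows['key'].tolist())                       # ['x', 'y', 'z']

lookup = pd.DataFrame({'v2': [10, 20]}, index=['x', 'z'])
joined = df1.join(lookup, on='key', how='inner')  # df1['key'] is matched against lookup's index
print(joined)                                     # only key 'x' is in both
```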

#### Statistics
These can all be applied to a series as well.
* df.describe() | Summary statistics for numerical columns
* df.mean() | Returns the mean of each column
* df.corr() | Returns the correlation between columns in a DataFrame
* df.count() | Returns the number of non-null values in each DataFrame column
* df.max() | Returns the highest value in each column
* df.min() | Returns the lowest value in each column
* df.median() | Returns the median of each column
* df.std() | Returns the standard deviation of each column
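
These statistics skip missing values by default, which is why `count` and `mean` are computed over different numbers of rows per column. `std` is the sample standard deviation (`ddof=1`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, np.nan], 'b': [4.0, 5.0, 6.0]})

print(df.count().to_dict())    # {'a': 2, 'b': 3}: non-null values per column
print(df.mean().to_dict())     # {'a': 1.5, 'b': 5.0}: NaN is skipped
print(df.median().to_dict())   # {'a': 1.5, 'b': 5.0}
print(df['b'].std())           # 1.0
```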

In [109]:
# Numpy
# import numpy as np
# a = np.arange(9) 
# a = a.reshape(3,3)
# size = a.size
# shape = a.shape
# for x in np.nditer(a, order='C'):
#     print(x, end=',')
# for i in range(a.shape[0]):
#     for j in range(a.shape[1]):
#         print(a[i][j], end=',')
# x = np.zeros((3, 5, 2), dtype=int)  # np.int was removed in NumPy 1.24; use the built-in int
# d3_shape = x.shape
# y = np.zeros((3,5), dtype=int)
# d2_shape = y.shape
# a = np.array([[0, 1, 4],[3, 4, 5],[6, 7, 8]])
# Diagonal difference problem (HackerRank)
# l_d = 0
# r_d = 0
# for i in range(a.shape[0]):
#     for j in range(a.shape[1]):
#         if i == j:
#             l_d += a[i][j]
#         if j == (a.shape[1] - i-1):
#             r_d += a[i][j]
# output = abs(l_d-r_d)
# print(output)
# List comprehension
# b = [20, 25, 30, 14]
# b_list = [i for i in b if (i % 2 == 0 and i % 5 == 0)]
# b_list
# c = [[0, 1, 4],[3, 4, 5],[6, 7, 8]]
# len(c[0])
# ans = 0
# for i in range(len(c)):
#     ans += c[i][i]
#     ans -= c[i][len(c)-1-i]
    
# ans

> Special thanks and acknowledgement to:
> * https://www.dataquest.io/blog/numpy-cheat-sheet/
> * https://www.dataquest.io/blog/pandas-cheat-sheet/