# Introduction
The pandas library has grown into a powerhouse for data manipulation in Python since its development began in 2008. Its intuitive syntax and flexible data structures make it easy to learn, and its operations, built on NumPy, run much faster than equivalent pure-Python loops. Together, NumPy and pandas have extended Python's general-purpose nature to machine learning problems as well, and Python's adoption in machine learning has been phenomenal since then.
This notebook walks through using the NumPy and pandas libraries for data manipulation from scratch.


## Table of Contents
1. Some important points about NumPy and Pandas
2. Beginning with NumPy
3. Beginning with Pandas
4. Exploring a Machine Learning Data Set
5. Building a Random Forest Model

## Some important points about NumPy and Pandas
1. The data manipulation capabilities of pandas are built on top of NumPy, so NumPy is a dependency of pandas.
2. Pandas is best at handling tabular data sets whose columns have different types (integers, floats, strings, dates, etc.). It is used for everyday tasks such as loading data as well as for more involved ones such as feature engineering on time series data.
3. NumPy is most commonly used for basic numerical and statistical computations such as mean, median, and range. It also supports the creation of multi-dimensional arrays.
4. NumPy can also be used to integrate C/C++ and Fortran code.
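Point 3 mentions basic statistics and multi-dimensional arrays. A minimal sketch of both, using a small made-up 2 x 3 matrix (the values are illustrative, not from a real data set):

```python
import numpy as np

# A small 2 x 3 matrix of integers (illustrative data only)
data = np.array([[4, 8, 15], [16, 23, 42]])

print(np.mean(data))          # mean of all six values: 18.0
print(np.median(data))        # median of all six values: 15.5
print(np.ptp(data))           # range (max - min): 38
print(np.mean(data, axis=0))  # column-wise means: 10.0, 15.5, 28.5
print(data.ndim, data.shape)  # 2 (2, 3)
```

Passing `axis=0` computes the statistic down each column instead of over the whole array.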


## Beginning with NumPy

In [1]:
import numpy as np    #importing the numpy module as np
np.__version__        #checking the version of numpy

'1.13.1'

In [2]:
L=list(range(10))  #creating a list with range 0-9
print(L)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [3]:
[str(c) for c in L]    #converting each integer to a string. This is called a list comprehension.

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [4]:
[type(item) for item in L]  #finding out the data-type of each element in list

[int, int, int, int, int, int, int, int, int, int]
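The comprehensions above loop over the list one element at a time in Python. For numeric work, NumPy instead applies an operation to every element of an array at once (vectorization). A minimal comparison using the same values 0-9; the variable names are illustrative:

```python
import numpy as np

L = list(range(10))

# Pure Python: build a new list element by element
doubled_list = [item * 2 for item in L]

# NumPy: convert once, then the multiplication applies to every element
arr = np.array(L)
doubled_arr = arr * 2

print(doubled_list)                      # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(doubled_arr.tolist() == doubled_list)  # True
print(arr.dtype.kind)                    # 'i': one integer dtype for the whole array
```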

#### Let's create some arrays
NumPy arrays are homogeneous: all of their elements must share a single data type (dtype), for example all int or all float.
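When you pass values of different types to `np.array()`, NumPy upcasts them to one common dtype rather than storing mixed types. A short sketch of this behaviour:

```python
import numpy as np

ints = np.array([1, 2, 3])              # all integers: integer dtype
mixed_numbers = np.array([1, 2.5, 3])   # the ints are upcast to float
mixed_types = np.array([1, "a", 2.5])   # everything is converted to a string

print(ints.dtype.kind)          # 'i' (integer)
print(mixed_numbers.dtype)      # float64
print(mixed_types)              # ['1' 'a' '2.5']
```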

In [5]:
#creating an array with all elements as 0
np.zeros(10, dtype='int')   #10 is the number of elements and dtype sets the data type to integer

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [6]:
#creating a 4 row x 3 column matrix with all elements as 1
np.ones((4,3), dtype='int')

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [7]:
#creating a 3x5 matrix with every element set to the same given value
np.full((3,5),215.4553)

array([[ 215.4553,  215.4553,  215.4553,  215.4553,  215.4553],
       [ 215.4553,  215.4553,  215.4553,  215.4553,  215.4553],
       [ 215.4553,  215.4553,  215.4553,  215.4553,  215.4553]])

In [8]:
# creating an array with a set sequence
np.arange(0,20,2)     
#argument explanation: start at 0, stop before 20 (the stop value is excluded), with a step of 2 between elements

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [9]:
#create an array of evenly spaced values over a given range
np.linspace(0,1,5)
#argument explanation: start at 0, end at 1 (inclusive), and return 5 evenly spaced values

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
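The key difference between the two calls above is the end point: `np.arange()` excludes its stop value, while `np.linspace()` includes it unless `endpoint=False` is passed. A short comparison:

```python
import numpy as np

a = np.arange(0, 20, 2)                    # stop value 20 is excluded
b = np.linspace(0, 1, 5)                   # stop value 1 is included
c = np.linspace(0, 1, 5, endpoint=False)   # exclude the stop value instead

print(a[-1], len(a))   # 18 10
print(b)               # 0, 0.25, 0.5, 0.75, 1.0
print(c)               # 0, 0.2, 0.4, 0.6, 0.8
```

For non-integer steps, `np.linspace()` is usually the safer choice: with a float step, floating-point rounding can make `np.arange()` return one element more or fewer than expected.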

In [10]:
#creating a 3x3 array drawn from a normal (Gaussian) distribution with mean=0 and standard deviation=1
np.random.normal(0,1,(3,3))

array([[ 0.21891376,  0.67658691,  0.20451169],
       [-1.09590176, -0.34033581, -0.45275445],
       [-0.62903498, -0.5439045 ,  1.77317032]])
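The values above are random, so they change on every run. Fixing a seed makes them reproducible. Besides the legacy `np.random.seed()` used later in this notebook, NumPy 1.17 and newer provide the `Generator` API through `np.random.default_rng()`. A minimal sketch, where the seed 42 is an arbitrary choice:

```python
import numpy as np

# Two generators created with the same seed produce identical draws
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.normal(loc=0, scale=1, size=(3, 3))
sample_b = rng_b.normal(loc=0, scale=1, size=(3, 3))

print(sample_a.shape)                      # (3, 3)
print(np.array_equal(sample_a, sample_b))  # True
```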

In [11]:
#creating an identity matrix
np.eye(3) #3 is to define that the matrix will be 3x3

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [12]:
#setting a random seed
np.random.seed(0)

x1 = np.random.randint(10, size=6) #one dimension
x2 = np.random.randint(10, size=(3,4)) #two dimension
x3 = np.random.randint(10, size=(3,4,5)) #three dimension

print("x1 ndim:", x1.ndim)
print("x1 shape:", x1.shape)
print("x1 size: ", x1.size)
print("x2 ndim:", x2.ndim)
print("x2 shape:", x2.shape)
print("x2 size: ", x2.size)
print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

('x1 ndim:', 1)
('x1 shape:', (6,))
('x1 size: ', 6)
('x2 ndim:', 2)
('x2 shape:', (3, 4))
('x2 size: ', 12)
('x3 ndim:', 3)
('x3 shape:', (3, 4, 5))
('x3 size: ', 60)
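The three attributes are related: `size` is the product of the entries in `shape`, and `ndim` is the length of `shape`. `reshape()` rearranges the same elements into a new shape as long as that product stays the same. A small sketch using a deterministic array instead of random values:

```python
import numpy as np

x3 = np.arange(60).reshape(3, 4, 5)   # 3 * 4 * 5 = 60 elements

print(x3.size == np.prod(x3.shape))   # True
print(x3.ndim == len(x3.shape))       # True

flat = x3.reshape(60)        # one dimension
grid = x3.reshape(6, -1)     # -1 lets NumPy infer the missing length (10)
print(flat.ndim, flat.shape)   # 1 (60,)
print(grid.ndim, grid.shape)   # 2 (6, 10)
```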


#### Array indexing


In [13]:
#creating a numpy array is simple
x1=np.array([74,83,45,22,12])
x1

array([74, 83, 45, 22, 12])

In [14]:
#we can access any element of a numpy array by index, just as with a Python list (indexing starts at 0)
x1[1]

83

In [15]:
x1[-1]  #gets the last element. Similarly, -2 gets the second from last, and so on

12

In [16]:
#creating a multidimensional array is also easy
x2=np.array([[1,2,3],[4,5,6],[7,8,9]])
x2

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [17]:
print(x2[2,0],x2[0,2])   #accessing the elements of a matrix with [row, column] indices

(7, 3)


In [18]:
#last value in the 3rd row (row index 2)
x2[2,-1]

9
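Indexing a 2D array with `x2[row, col]` gives the same result as chained indexing `x2[row][col]`, but the single-bracket form is the idiomatic one because it does the lookup in one step instead of first extracting the row. Indexing also works for assignment. A short sketch using the same matrix:

```python
import numpy as np

x2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(x2[2, 0] == x2[2][0])   # True: both select row 2, column 0
print(x2[-1, -1])             # 9: last row, last column
print(x2[1])                  # a single index returns a whole row: 4, 5, 6

# Assigning to an index modifies the array in place
x2[0, 0] = 100
print(x2[0, 0])               # 100
```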

#### Array Slicing
Now we will try accessing multiple elements, or a range of elements, from an array.


In [19]:
x=np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
#first five elements (indices 0 to 4); the stop index 5 is excluded
x[:5]

array([0, 1, 2, 3, 4])

In [21]:
#from index 4 to the end
x[4:]

array([4, 5, 6, 7, 8, 9])

In [22]:
#from index 4 to index 6 (the stop index 7 is excluded)
x[4:7]

array([4, 5, 6])

In [23]:
#every second element starting at index 0 (the even indices)
x[::2]

array([0, 2, 4, 6, 8])

In [24]:
#every second element starting at index 1 (the odd indices)
x[1::2]

array([1, 3, 5, 7, 9])

In [25]:
#reversing the array
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
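Unlike slicing a Python list, basic slicing of a NumPy array returns a view that shares memory with the original, so writing to the slice changes the original array. Call `.copy()` when you need an independent array. A minimal sketch:

```python
import numpy as np

x = np.arange(10)
view = x[:5]          # basic slicing returns a view of x
view[0] = 99          # writing through the view...
print(x[0])           # ...changes the original array: 99

y = np.arange(10)
independent = y[:5].copy()   # .copy() makes a separate array
independent[0] = 99
print(y[0])                  # the original is unchanged: 0

print(np.shares_memory(x, view))         # True
print(np.shares_memory(y, independent))  # False
```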

#### Array Concatenation
Combining several arrays into a single array. Note that np.concatenate() always returns a new array and leaves its inputs unchanged.

In [26]:
#we can concatenate more than 2 arrays at once
x=np.array([1,2,3])
y=np.array([4,5,6])
z=np.array([7,8,9])
np.concatenate([x,y,z])

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [27]:
#concatenating 2D arrays stacks them row-wise by default (axis=0)
multiArray=np.array([[23,24,25],[12,13,14]])
np.concatenate([multiArray,multiArray])

array([[23, 24, 25],
       [12, 13, 14],
       [23, 24, 25],
       [12, 13, 14]])

In [28]:
#using concatenate's axis parameter: axis=0 (the default) joins row-wise, axis=1 joins column-wise
np.concatenate([multiArray,multiArray],axis=1)

array([[23, 24, 25, 23, 24, 25],
       [12, 13, 14, 12, 13, 14]])

np.concatenate() works great for concatenating arrays with the same number of dimensions.
But what if we have to combine a 2D array and a 1D array?
This is where hstack() and vstack() come into play.
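To see the problem concretely: `np.concatenate()` raises a `ValueError` when asked to join a 2D array with a 1D array, because the inputs must have the same number of dimensions. One manual workaround is to reshape the 1D array into a 1 x N row first. A minimal sketch; the array names are illustrative:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([7, 8, 9])                  # shape (3,)

failed = False
try:
    np.concatenate([grid, row])
except ValueError as err:
    failed = True
    print("ValueError:", err)   # the inputs have different numbers of dimensions

# Reshaping the 1D array into a 1 x 3 row makes the dimensions match
combined = np.concatenate([grid, row.reshape(1, 3)])
print(combined.shape)   # (3, 3)
```

`np.vstack()` performs this promotion for you, treating a 1D array of length N as a 1 x N row.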
