<h2 style="text-align: center;">Basics of Data Science Using Python</h2>

In this session, we will learn the basics of data science using Python. In the previous sessions, we discussed the basics of data science and its applications. Today's session moves on to several Python modules and their applications.

The modules covered in this session are:
<ol>
    <li>NumPy</li>
    <li>Pandas</li>
    <li>Matplotlib</li>
</ol>

<h3 style="text-align: center;"> 1. Learning NumPy </h3>

#### What is NumPy?
<p>NumPy, short for Numerical Python, is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices.</p>

#### What are we going to learn?
<p>We are going to learn the following </p>
<ol>
    <li>NumPy Arrays</li>
    <li>Aggregation Functions</li>
    <li>Comparison, Mask and Boolean Logic</li>
    <li>Fancy Indexing</li>
    <li>Sorting Arrays</li>
    <li>Structured Data</li>
</ol>

### Import NumPy 

In [1]:
import numpy as np

### 1. NumPy Arrays

In [2]:
np.random.seed(42)

arrLen_1d = 6
x1 = np.random.randint(5, size=arrLen_1d) # Creating 1D array

arrShape_2d = (3, 4)
x2 = np.random.randint(8, size=arrShape_2d) # creating 2D array

arrShape_3d = (4, 3, 2)
x3 = np.random.randint(low=0, high=30, size=arrShape_3d) # creating 3D array

In [3]:
print('x1:\n', x1.tolist()) # printing array x1 as list
print('x2:\n', x2.tolist()) # printing array x2 as list
print('x3:\n', x3.tolist()) # printing array x3 as list

x1:
 [3, 4, 2, 4, 4, 1]
x2:
 [[2, 6, 2, 2], [7, 4, 3, 7], [7, 2, 5, 4]]
x3:
 [[[1, 23], [11, 29], [5, 1]], [[27, 20], [0, 11], [25, 21]], [[28, 11], [24, 16], [26, 26]], [[9, 27], [27, 15], [14, 29]]]


In [4]:
def print_array_property(arr):
    print('Number of Dimension:', arr.ndim) # number of dimensions
    print('Array Shape:', arr.shape) # size of each dimension
    print('Array Size:', arr.size) # Total size of the array
    print('Array Datatype:', arr.dtype) # Data type of the given array
    print('Array Item Size:', arr.itemsize, 'bytes') # size of each array element in bytes 
    print('Array total Size:', arr.nbytes, 'bytes') # total size of array in bytes

In [7]:
print_array_property(x2)

Number of Dimension: 2
Array Shape: (3, 4)
Array Size: 12
Array Datatype: int32
Array Item Size: 4 bytes
Array total Size: 48 bytes
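The relationship between these properties can be checked directly: the total size in bytes is the number of elements multiplied by the item size, and the item size depends on the data type. The following is a minimal sketch; the `sizes` dictionary is a hypothetical helper used only for this illustration.

```python
import numpy as np

# Same (3, 4) shape as x2, stored with three different element types.
sizes = {}
for dtype in (np.int8, np.int32, np.float64):
    arr = np.zeros((3, 4), dtype=dtype)
    # nbytes is always size * itemsize
    assert arr.nbytes == arr.size * arr.itemsize
    sizes[arr.dtype.name] = (arr.itemsize, arr.nbytes)

print(sizes)
# {'int8': (1, 12), 'int32': (4, 48), 'float64': (8, 96)}
```

The default integer type of `np.random.randint` depends on the platform, so the `int32` shown in the output above may appear as `int64` on other systems.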


### Array Indexing
Array indexing in NumPy is very similar to list indexing in Python. If you are familiar with Python lists, this section won't take much of your time.

In [8]:
print('Elements of array x1 at indices 2 and 3:', x1[2:4]) # the stop index 4 is excluded
print('Row of array x3 at index (2, 1):', x3[2, 1]) # a tuple index selects along several axes at once
print('Last row of array x2:', x2[-1])
print('First 3 elements of array x1:', x1[:3])
print('Last 3 elements of array x1:', x1[-3:])

Elements of array x1 at indices 2 and 3: [2 4]
Row of array x3 at index (2, 1): [24 16]
Last row of array x2: [7 2 5 4]
First 3 elements of array x1: [3 4 2]
Last 3 elements of array x1: [4 4 1]
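Two differences from Python lists are worth a quick sketch: a multidimensional array accepts one index per axis inside a single pair of brackets, and a slice of an array is a view on the original data rather than a copy. The array `grid` and the other names below are hypothetical and exist only for this illustration.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]

# One index per axis: row 2, column 1.
elem = grid[2, 1]                    # 9, same as grid[2][1]

# Slices work per axis: first two rows, last two columns.
block = grid[:2, -2:]                # [[2 3], [6 7]]

# A slice is a view, so writing to it changes the original array.
block[0, 0] = 99
print(grid[0, 2])                    # 99

# Use .copy() when an independent array is needed.
independent = grid[:2, -2:].copy()
independent[0, 0] = -1
print(grid[0, 2])                    # still 99
```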


### 2. Aggregation Functions
<p>Aggregation functions such as mean, median, and standard deviation play an important role in summarizing data statistically. Some of these operations are demonstrated below.</p>

In [9]:
arr = np.random.randint(low=1, high=10, size=(10))

print('Created array is:', arr.tolist()) 

print('Sum of all array elements in arr:', np.sum(arr))
print('Product of all array elements in arr:', np.prod(arr))

print('Mean of all array elements in arr:', np.mean(arr))
print('Median of all array elements in arr:', np.median(arr))
print('Average of all array elements in arr:', np.average(arr))

print('Maximum value of all array elements in arr:', np.max(arr))
print('Index of maximum value in arr:', np.argmax(arr))
print('Minimum value of all array elements in arr:', np.min(arr))
print('Index of minimum value in arr:', np.argmin(arr))

print('Standard Deviation of all array elements in arr:', np.std(arr))
print('Variance of all array elements in arr:', np.var(arr))

print('If any of the array elements are true in arr:', np.any(arr))
print('If all of the array elements are true in arr:', np.all(arr))

Created array is: [3, 7, 4, 9, 3, 5, 3, 7, 5, 9]
Sum of all array elements in arr: 55
Product of all array elements in arr: 10716300
Mean of all array elements in arr: 5.5
Median of all array elements in arr: 5.0
Average of all array elements in arr: 5.5
Maximum value of all array elements in arr: 9
Index of maximum value in arr: 3
Minimum value of all array elements in arr: 3
Index of minimum value in arr: 0
Standard Deviation of all array elements in arr: 2.247220505424423
Variance of all array elements in arr: 5.05
If any of the array elements are true in arr: True
If all of the array elements are true in arr: True
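The same functions also accept an axis argument on multidimensional arrays, which is often what a statistical summary needs (for example, one value per column). A minimal sketch, assuming a small hypothetical 2D array named `scores`:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

total = scores.sum()                 # 21, aggregates over every element
col_sums = scores.sum(axis=0)        # [5 7 9], collapses the rows
row_means = scores.mean(axis=1)      # [2. 5.], collapses the columns
row_argmax = scores.argmax(axis=1)   # [2 2], index of the maximum in each row

print(total, col_sums, row_means, row_argmax)
```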


### 3. Comparison, Mask and Boolean Logic

In [10]:
x = np.array([7, 2, 9, 8, 5, 10, 1, 3, 4, 6]) # Creating a numpy array

print('[MASKS] False where the condition is not satisfied, True otherwise\n')

print('Elements in x that are less than 4 OR greater than 8:', ((x<4) | (x>8))) # element-wise OR of two masks
print('Elements in x that are less than 4 AND greater than 8:', ((x<4) & (x>8))) # element-wise AND of two masks
print('Number of elements in x that are not equal to 4:', np.sum(x != 4)) # counts the True values of a "not equal" mask
print('Bitwise XOR of each element with 7:', (x ^ 7)) # bitwise XOR on the integers, not a mask
print('\n')

print('The elements in x are less than 4:', x<4)
print('The elements in x are less than or equal to 2:', x<=2)
print('The elements in x are greater than 7:', x>7)
print('The elements in x are greater than or equal to 5:', x>=5)
print('The elements in x are equal to 5:', x==5)
print('Elements of x where 2*x == x**2:', 2*x==x**2)
print('\n')

print('All elements in x are less than 5:', (x<5).all())
print('Any element in x is less than 5:', (x<5).any())

[MASKS] False where the condition is not satisfied, True otherwise

Elements in x that are less than 4 OR greater than 8: [False  True  True False False  True  True  True False False]
Elements in x that are less than 4 AND greater than 8: [False False False False False False False False False False]
Number of elements in x that are not equal to 4: 9
Bitwise XOR of each element with 7: [ 0  5 14 15  2 13  6  4  3  1]


The elements in x are less than 4: [False  True False False False False  True  True False False]
The elements in x are less than or equal to 2: [False  True False False False False  True False False False]
The elements in x are greater than 7: [False False  True  True False  True False False False False]
The elements in x are greater than or equal to 5: [ True False  True  True  True  True False False False  True]
The elements in x are equal to 5: [False False False False  True False False False False False]
Elements of x where 2*x == x**2: [False  True False False False False False False False False]


All elements in x are less than 5: False
Any element in x is less than 5: True


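A Boolean mask becomes most useful when it is passed back to the array as an index, which selects only the elements where the mask is True. The sketch below reuses the same values as `x`; the names `mask`, `selected`, `count`, and `capped` are hypothetical.

```python
import numpy as np

x = np.array([7, 2, 9, 8, 5, 10, 1, 3, 4, 6])

mask = (x < 4) | (x > 8)            # Boolean array, same shape as x
selected = x[mask]                  # keeps only the elements where mask is True
count = np.count_nonzero(mask)      # number of True entries

# np.where picks from two alternatives element by element:
# here every value above 8 is replaced by 8.
capped = np.where(x > 8, 8, x)

print(selected)   # [ 2  9 10  1  3]
print(count)      # 5
print(capped)     # [7 2 8 8 5 8 1 3 4 6]
```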
In [11]:
# DEFINING ARRAYS TO PERFORM BOOLEAN OPERATIONS
A = np.array([1, 0, 1, 1, 0])
B = np.array([0, 1, 1, 0, 1])
print('A OR B', (A|B))
print('A AND B', (A&B))
print('A XOR B', (A^B))

A OR B [1 1 1 1 1]
A AND B [0 0 1 0 0]
A XOR B [1 1 0 1 1]
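One point that often trips up newcomers is that the Python keywords `and` and `or` do not work element by element on arrays; the operators `&`, `|`, `^`, `~` or the `np.logical_*` functions are needed instead. A small sketch, assuming Boolean versions of the arrays above (the flag `keyword_works` is hypothetical):

```python
import numpy as np

A = np.array([1, 0, 1, 1, 0], dtype=bool)
B = np.array([0, 1, 1, 0, 1], dtype=bool)

both = A & B                        # element-wise AND
either = np.logical_or(A, B)        # same result as A | B
neither = ~(A | B)                  # element-wise NOT of the OR

# The keyword form asks for the truth value of a whole array,
# which NumPy refuses for arrays with more than one element.
try:
    A and B
    keyword_works = True
except ValueError:
    keyword_works = False

print(both, either, neither, keyword_works)
```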


### 4. Fancy Indexing
So far, we have seen how to access portions of arrays using simple indices. Now we'll look at another style of array indexing, known as fancy indexing. Fancy indexing is like the simple indexing we've already seen, but we pass arrays (or lists) of indices in place of single scalars. This allows us to quickly access and modify complicated subsets of an array's values.

In [12]:
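# Pass a list of indices inside the brackets instead of indexing one element at a time
print('Fancy Indexed Array from array X1:', x1[[1, 4, 0]].tolist())
print('Fancy Indexed Array from array X2:', x2[[0, 2]].tolist()) # rows 0 and 2
print('Fancy Indexed Array from array X3:', x3[[1, 2, 0], [0, 1, 0], [1, 0, 1]].tolist()) # elements (1,0,1), (2,1,0), (0,0,1)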

Fancy Indexed Array from array X1: [4, 4, 3]
Fancy Indexed Array from array X2: [[2, 6, 2, 2], [7, 2, 5, 4]]
Fancy Indexed Array from array X3: [20, 24, 23]
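Fancy indexing can also be used to modify values, and the result of a fancy index takes the shape of the index array rather than the shape of the source. The following is a minimal sketch with a hypothetical array `values`:

```python
import numpy as np

values = np.arange(10) * 10         # [0 10 20 ... 90]

idx = np.array([1, 4, 0])
picked = values[idx]                # [10 40  0], a copy, not a view

# The result has the shape of the index array.
idx2d = np.array([[3, 7],
                  [4, 5]])
picked2d = values[idx2d]            # [[30 70], [40 50]]

# Assigning through a fancy index changes several positions at once.
values[idx] = -1
print(values)                       # [-1 -1 20 30 -1 50 60 70 80 90]
print(picked)                       # unchanged: [10 40  0]
```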


### 5. Sorting Arrays

In [13]:
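# WE CAN SORT AN ARRAY IN TWO WAYS.

# sort method 1: np.sort returns a sorted copy and leaves its input unchanged
a = np.array([1, 2, 9, 6, 2, 4, 1, 0, 1])
a = np.sort(a)
print(a)

# sort method 2: the ndarray.sort method sorts the array in place and returns None
b = np.array([9, 2, 8, 6, 20, 4, 11, 2, 6])
b.sort()
print(b)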

[0 1 1 1 2 2 4 6 9]
[ 2  2  4  6  6  8  9 11 20]


In [15]:
# NUMPY ALSO ALLOWS US TO SORT A MULTIDIMENSIONAL ARRAY ALONG ONE OF ITS AXES (ROW WISE, COLUMN WISE...)

# sorting array along an axis
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print('Unsorted Array:\n', X)
X = np.sort(X, axis=0)
print('Sorted Array along axis 0:\n',X) # axis=0 sorts the values within each column

Unsorted Array:
 [[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]
Sorted Array along axis 0:
 [[2 1 4 0 1 5]
 [5 2 5 4 3 7]
 [6 3 7 4 6 7]
 [7 6 7 4 9 9]]
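For comparison, the sketch below sorts a small hypothetical array along both axes and also shows `np.argsort`, which returns the indices that would sort an array instead of the sorted values.

```python
import numpy as np

X = np.array([[6, 3, 7],
              [2, 6, 1]])

by_col = np.sort(X, axis=0)         # sorts within each column: [[2 3 1], [6 6 7]]
by_row = np.sort(X, axis=1)         # sorts within each row:    [[3 6 7], [1 2 6]]

# argsort gives the ordering itself, useful for reordering related arrays.
order = np.argsort(X[0])            # [1 0 2]
first_row_sorted = X[0][order]      # [3 6 7]

print(by_col, by_row, order, first_row_sorted, sep='\n')
```

Note that each row or column is sorted independently, so any relationship between the values in a given row or column of the original array is lost.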


### 6. Structured and Record Arrays

#### Structured Array
While our data can often be well represented by a homogeneous array of values, sometimes this is not the case. The code cells below demonstrate the use of NumPy's structured arrays and record arrays, which provide efficient storage for compound, heterogeneous data. While the patterns shown here are useful for simple operations, scenarios like this often lend themselves to the use of Pandas DataFrames. We will see Pandas in the next part.

In [17]:
name = ['Alex', 'Andrew', 'Casey', 'Halen'] # 4 PEOPLE ARE THERE
age = [25, 45, 37, 19]       # THEIR AGES ARE DEFINED BY age ARRAY
weight = [55.0, 85.5, 68.0, 61.5] # THEIR WEIGHTS ARE DEFINED USING weight ARRAY

data = np.zeros(4, dtype={
                        'names':('name', 'age', 'weight'), 
                        'formats':('U10', 'i4', 'f8')
                    }) 
# ASSIGNING VALUES
data['name'] = name 
data['age'] = age
data['weight'] = weight

print('Complete Structured Data:\n', data.tolist()) # PRINTED AS LIST

# ACCESSING USING THE KEYS
print('All names:', data['name'])
# USING SOME OPERATIONS
print('Name of people whose age is less than 30:',  data[data['age'] < 30]['name'])

Complete Structured Data:
 [('Alex', 25, 55.0), ('Andrew', 45, 85.5), ('Casey', 37, 68.0), ('Halen', 19, 61.5)]
All names: ['Alex' 'Andrew' 'Casey' 'Halen']
Name of people whose age is less than 30: ['Alex' 'Halen']
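Because every field of a structured array is itself an ordinary array, the tools from the earlier sections apply to it as well. The sketch below rebuilds the same four records in one call (using a list-of-tuples `dtype`, an alternative to the dictionary form above) and then sorts and aggregates by field. The name `people` and the derived variables are hypothetical.

```python
import numpy as np

people = np.array(
    [('Alex', 25, 55.0), ('Andrew', 45, 85.5),
     ('Casey', 37, 68.0), ('Halen', 19, 61.5)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')],
)

# Sort whole records by one field.
by_age = np.sort(people, order='age')
youngest_first = by_age['name'].tolist()

# Aggregate a single field, or use it to pick a record.
mean_weight = people['weight'].mean()
heaviest = people[np.argmax(people['weight'])]['name']

print(youngest_first)   # ['Halen', 'Alex', 'Casey', 'Andrew']
print(mean_weight)      # 67.5
print(heaviest)         # Andrew
```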


#### Record Arrays
NumPy also provides the np.recarray class, which is almost identical to the structured arrays just described, but with one additional feature: fields can be accessed as attributes rather than as dictionary keys.

In [18]:
data_rec = data.view(np.recarray)
print(data_rec.age)
print(data_rec.weight)
print(data_rec.name)

[25 45 37 19]
[55.  85.5 68.  61.5]
['Alex' 'Andrew' 'Casey' 'Halen']


The downside is that for record arrays, there is some extra overhead involved in accessing the fields, even when using the same syntax. We can see this here:

In [19]:
%timeit data['age']        
%timeit data_rec['age']        
%timeit data_rec.age

173 ns ± 4.14 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
5.15 µs ± 129 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.05 µs ± 304 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [None]:
# THIS CONCLUDES OUR INTRODUCTION TO NUMPY