!["title"](Python1.jpg)

# Introduction

After completing this workshop, you should be able to:
<br/><br/>
Understand and use NumPy arrays <br/>
Use basic Pandas tools<br/>
Apply NumPy and Pandas to solve data science problems<br/>


# NumPy

NumPy is the most fundamental, and one of the most powerful, packages for working with data in Python.

If you are going to work on data analysis or machine learning projects, a solid understanding of NumPy is nearly mandatory.

So what does NumPy provide? At its core, NumPy provides the `ndarray` object, short for n-dimensional array.

An `ndarray` object, or "array" for short, stores multiple items of the same data type. It is the facilities built around the array object that make NumPy so convenient for math and data manipulation.
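
To make the "same data type" rule concrete, here is a minimal sketch (the name `mixed` is illustrative and not part of the workshop material). When a list mixes integers and a float, NumPy converts every element to one common type that can hold them all.

```python
import numpy as np

# A list that mixes ints and a float: NumPy picks one dtype for all items
mixed = np.array([1, 2, 3.5])

print(mixed.dtype)  # float64
print(mixed[0])     # 1.0 (the int 1 was converted to a float)
```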

## NumPy Basics

### Installing and importing NumPy

In [72]:
#Install from Jupyter
!pip install numpy





In [None]:
import numpy as np

### Creating 1D array from a list


In [3]:
list1 = [0,1,2,3,4]
arr1d = np.array(list1)
print(type(arr1d))
arr1d

<class 'numpy.ndarray'>


array([0, 1, 2, 3, 4])


The key difference between an array and a list is that arrays are designed for vectorized operations, while a Python list is not.
This means that an operation applied to an array is performed on every item in it, with no explicit loop.

In [4]:
list1 + 2  # error

TypeError: can only concatenate list (not "int") to list

In [5]:
arr1d + 2

array([2, 3, 4, 5, 6])
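
As a small sketch of the same idea, the example below contrasts the two approaches: a list needs an explicit comprehension, while an array applies the arithmetic to every element at once. The names `values`, `doubled_list`, `doubled_arr`, and `squared_arr` are chosen here for illustration only.

```python
import numpy as np

values = [0, 1, 2, 3, 4]
arr = np.array(values)

# With a list, element-wise work needs an explicit comprehension
doubled_list = [v * 2 for v in values]

# With an array, the operation is applied to every element at once
doubled_arr = arr * 2
squared_arr = arr ** 2

print(doubled_list)  # [0, 2, 4, 6, 8]
print(doubled_arr)   # [0 2 4 6 8]
print(squared_arr)   # [ 0  1  4  9 16]
```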

### More ways of creating an array

In [55]:
np.arange(1, 10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [58]:
np.repeat(1, 10)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
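
A few other constructors are often useful alongside `np.arange` and `np.repeat`. The sketch below is not part of the original workshop cells, and the variable names are illustrative. It shows `np.zeros`, `np.ones`, a stepped `np.arange`, and `np.linspace`.

```python
import numpy as np

zeros = np.zeros(5)          # five 0.0 values (float by default)
ones = np.ones((2, 3))       # 2x3 array filled with 1.0
evens = np.arange(0, 10, 2)  # start, stop (exclusive), step
grid = np.linspace(0, 1, 5)  # 5 evenly spaced values, both ends included

print(zeros.shape)  # (5,)
print(ones.shape)   # (2, 3)
print(evens)        # [0 2 4 6 8]
print(grid)         # the values 0, 0.25, 0.5, 0.75 and 1
```

Note that `np.arange` excludes its stop value, while `np.linspace` includes it by default.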

### Creating a 2D array from a list of lists


In [6]:
list2 = [[0,1,2], [3,4,5], [6,7,8]]
arr2d = np.array(list2)
arr2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [8]:
# Convert to 'float' datatype
arr2d_f = arr2d.astype('float')
arr2d_f

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

In [9]:
# Convert arr2d_f to int then to str datatype



### Inspecting a numpy array

In [10]:
# shape
print('Shape: ', arr2d.shape)

# dtype
print('Datatype: ', arr2d.dtype)

# size
print('Size: ', arr2d.size)

# ndim
print('Num Dimensions: ', arr2d.ndim)

Shape:  (3, 3)
Datatype:  int32
Size:  9
Num Dimensions:  2
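
The default integer dtype depends on your platform and NumPy version, so you may see `int64` instead of `int32`. The sketch below (with illustrative names `flat` and `table`) shows how the same attributes change when an array is reshaped: the shape and number of dimensions differ, but the total size does not.

```python
import numpy as np

flat = np.arange(12)        # 1D array with 12 items
table = flat.reshape(3, 4)  # same 12 items arranged as 3 rows x 4 columns

print(flat.shape, flat.ndim)    # (12,) 1
print(table.shape, table.ndim)  # (3, 4) 2
print(flat.size == table.size)  # True
```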


### Indexing and slicing

In [13]:
my_array = np.arange(10) 

# Accessing first element
print(my_array[0])

# Accessing second element
print(my_array[1])

# Accessing last element


0
1


In [16]:
# slice items starting from index 2
print("From index 2 to end:", my_array[2:])

# slice items between indexes 2 and 5
print("From index 2 to 5 (excluding 5):", my_array[2:5])


From index 2 to end: [2 3 4 5 6 7 8 9]
From index 2 to 5 (excluding 5): [2 3 4]
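
Slices also accept a step, and a slice of a NumPy array is a view on the original data, not a copy. The following sketch uses a separate array named `data` (an illustrative name) to show both points.

```python
import numpy as np

data = np.arange(10)

# start:stop:step
print(data[::2])    # [0 2 4 6 8]
print(data[1:8:3])  # [1 4 7]

# A slice is a view: writing to it changes the original array
view = data[2:5]
view[0] = 99
print(data[2])      # 99

# Use .copy() when you need independent data
independent = data[2:5].copy()
independent[0] = -1
print(data[2])      # still 99
```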


**Now let's slice a 2D array.**

In [17]:
# array to begin with 
a = np.array([[1,2,3],[3,4,5],[4,5,6]]) 
a

array([[1, 2, 3],
       [3, 4, 5],
       [4, 5, 6]])

In [20]:
# this returns array of items in the second column 
print('The items in the second column are:', a[...,1])


The items in the second column are: [2 4 5]


In [22]:
# Now we will slice all items from the second row 
print('The items in the second row are:', a[1,...] ) 

The items in the second row are: [3 4 5]


In [24]:
# Now we will slice all items from column index 1 (the second column) onwards
print('The items column 1 onwards are:', a[...,1:])

The items column 1 onwards are: [[2 3]
 [4 5]
 [5 6]]
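
For a 2D array, the ellipsis (`...`) used above can be replaced by the more common colon (`:`), which means "everything along this axis". The sketch below checks that both notations select the same items, and also shows how to read a single element by row and column index.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

second_col = a[:, 1]  # all rows, column index 1
second_row = a[1, :]  # row index 1, all columns

print(np.array_equal(second_col, a[..., 1]))  # True
print(np.array_equal(second_row, a[1, ...]))  # True

# A single element: row index 1, column index 2
print(a[1, 2])  # 5
```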


!["title"](arr.png)

In [None]:
# Create a 2D array of 4 rows and 4 columns, containing the values displayed in the image above


# Using slicing, create a second 2D array from the first array, made of the 4 values highlighted in dark grey



## NumPy arithmetic

In [26]:
a = 4
b = 3

print('Applying power function' )
print("4^2 = ", np.power(a,2))
print("4^3 = ", np.power(a,b))

Applying power function
4^2 =  16
4^3 =  64
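
`np.power` is not limited to single numbers. As a brief sketch (the names `base` and `exponents` are illustrative), it also works element-wise on arrays, and the `**` operator gives the same result.

```python
import numpy as np

base = np.array([2, 3, 4])
exponents = np.array([3, 2, 1])

# Same exponent for every element
print(np.power(base, 2))          # [ 4  9 16]

# One exponent per element
print(np.power(base, exponents))  # [8 9 4]

# The ** operator is equivalent
print(base ** exponents)          # [8 9 4]
```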


In [27]:
a = np.array([10,20,30]) 
b = np.array([3,5,7]) 

print('First array:', a)
print('Second array:', b) 

First array: [10 20 30]
Second array: [3 5 7]


In [28]:
print('a mod() b: ', np.mod(a,b) )

print('a remainder() b:', np.remainder(a,b))

a mod() b:  [1 0 2]
a remainder() b: [1 0 2]
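
`np.mod` and `np.remainder` are the same function, and both match Python's `%` operator. The difference only shows up when compared with `np.fmod` on negative numbers, as in this small sketch.

```python
import numpy as np

a = np.array([10, 20, 30])
b = np.array([3, 5, 7])

# The % operator on arrays is equivalent to np.mod
print(a % b)  # [1 0 2]

# With negative values, np.mod takes the sign of the divisor...
print(np.mod(-7, 3))   # 2
# ...while np.fmod takes the sign of the dividend
print(np.fmod(-7, 3))  # -1
```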


In [34]:
# Create 20 random integers between 1 and 99 (high is exclusive)
array_1 = np.random.randint(low=1, high=100, size=20)
array_1

array([74, 26, 63, 73, 90, 99, 39, 41, 79,  3, 65, 79, 45, 10, 84, 48, 18,
       55, 32,  8])

In [36]:
#Check which values are greater than or equal to 30
array_index_30 = array_1 >= 30
array_index_30

array([ True, False,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True, False,  True,  True, False,  True,
        True, False])

In [37]:
array_greater_30 = array_1[array_index_30]
array_greater_30

array([74, 63, 73, 90, 99, 39, 41, 79, 65, 79, 45, 84, 48, 55, 32])
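
Boolean masks can be combined with `&` (and) and `|` (or). The sketch below uses a small fixed array named `scores` (an illustrative name) so that the results are reproducible. Each condition needs its own parentheses, because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

scores = np.array([74, 26, 63, 3, 90, 10])

# Values from 30 up to, but not including, 80
in_range = (scores >= 30) & (scores < 80)
print(scores[in_range])  # [74 63]

# Values below 10 or above 80
extremes = scores[(scores < 10) | (scores > 80)]
print(extremes)          # [ 3 90]
```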

In [70]:
# Create an array containing every odd value from array_1


In [46]:
# Create random 2D array
array = np.random.random((2,2))
print("Our array:", array)

# Sum of all elements in array
print("\nSum of all elements in array", array.sum())

# Min value in array
print("\nMinimum:", array.min())

# Max value in array
print("\nMaximum: ", "?")

Our array: [[0.6388356  0.11125357]
 [0.62330076 0.83699374]]

Sum of all elements in array 2.210383660352341

Minimum: 0.11125356554051902

Maximum:  ?


In [45]:
# Sum per column
print("Sum per column:", array.sum(axis=0))

Sum per column: [0.97714905 0.98472902]


In [47]:
# Min per row
array.min(axis=1)


array([0.11125357, 0.62330076])
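
The `axis` argument is easier to follow on an array with fixed values. In the sketch below (the array `m` is illustrative), `axis=0` collapses the rows and returns one result per column, while `axis=1` collapses the columns and returns one result per row.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum(axis=0))   # one sum per column: [5 7 9]
print(m.sum(axis=1))   # one sum per row:    [ 6 15]
print(m.mean(axis=0))  # one mean per column: [2.5 3.5 4.5]
```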

In [49]:
# Create 3x3 2D array
a = np.arange(9).reshape(3,3)
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [50]:
b = np.array([2,4,6])
np.add(a,b)

array([[ 2,  5,  8],
       [ 5,  8, 11],
       [ 8, 11, 14]])
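
The addition above works even though `a` has shape (3, 3) and `b` has shape (3,), because NumPy broadcasts `b` across every row of `a`. As a sketch of the same mechanism, reshaping `b` into a column makes NumPy broadcast it across every column instead. The names `row_wise` and `col_wise` are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.array([2, 4, 6])

# b is added to each row of a (same result as np.add(a, b))
row_wise = a + b

# As a 3x1 column, b is added to each column of a instead
col_wise = a + b.reshape(3, 1)
print(col_wise)
# [[ 2  3  4]
#  [ 7  8  9]
#  [12 13 14]]
```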

In [65]:
# Sorting a NumPy array
a = np.array([[12, 15], [10, 1]])
print("Our initial array:\n", a)

# axis=0: sort down the rows, so each column ends up sorted
arr1 = np.sort(a, axis=0)
print("\nSorted along axis 0 (each column):\n", arr1)

# axis=1: sort across the columns, so each row ends up sorted
arr2 = np.sort(a, axis=1)
print("\nSorted along axis 1 (each row):\n", arr2)

# axis=None: flatten the array, then sort all values
arr3 = np.sort(a, axis=None)
print("\nSorted with axis=None (flattened):\n", arr3)

Our initial array:
 [[12 15]
 [10  1]]

Sorted along axis 0 (each column):
 [[10  1]
 [12 15]]

Sorted along axis 1 (each row):
 [[12 15]
 [ 1 10]]

Sorted with axis=None (flattened):
 [ 1 10 12 15]


In [69]:
arr = np.random.random((4,4))
print("Original:\n", arr)

# Sort the rows by the values in column n
n = 0
arr = arr[arr[:,n].argsort()]
print("\nSorted:\n", arr)


Original:
 [[0.95715229 0.23377505 0.22982344 0.52766675]
 [0.23379942 0.81523898 0.33403353 0.19824422]
 [0.83757836 0.95434655 0.14518519 0.17846219]
 [0.4401715  0.58329204 0.96480703 0.85959889]]

Sorted:
 [[0.23379942 0.81523898 0.33403353 0.19824422]
 [0.4401715  0.58329204 0.96480703 0.85959889]
 [0.83757836 0.95434655 0.14518519 0.17846219]
 [0.95715229 0.23377505 0.22982344 0.52766675]]
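
The row sort above relies on `argsort`, which returns the indices that would sort an array, not the sorted values themselves. Here is a minimal sketch on a small 1D array (the names `scores` and `order` are illustrative), including one common way to get descending order by reversing the index array.

```python
import numpy as np

scores = np.array([30, 10, 20])

# Indices of the items in ascending order of value
order = scores.argsort()
print(order)                # [1 2 0]

# Indexing with those indices gives the sorted values
print(scores[order])        # [10 20 30]

# Reversing the indices gives descending order
print(scores[order[::-1]])  # [30 20 10]
```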
