# Python Frameworks for Machine Learning

In this tutorial, you will learn how to work with two of the most fundamental frameworks for Machine Learning: Pandas and NumPy.

## NumPy (https://github.com/numpy/numpy)

NumPy is a Python library for array operations. It is well suited to Machine Learning tasks, where speed and memory use matter: operations on NumPy arrays can be up to 50x faster than the equivalent operations on Python lists, depending on the workload.

Let's begin by installing and importing the library. You only need to run the next cells.

In [None]:
%pip install numpy

Note: Usually, you would create a Python environment and install all the required packages there. However, since this tutorial is a Jupyter notebook, we can install the package directly from a cell.

In [2]:
import numpy as np

Now, let's do some exercises that put NumPy's capabilities to use.

#### Exercise 1 - Create a NumPy array

It is quite simple to create a NumPy array. Study the example below, then reproduce the method using a different sequence of numbers.

In [None]:
example_array = np.array([1, 2, 3, 4, 5])
print("Here's an example: " + str(example_array) + "\n")
print("Now it's your turn.\n")

your_array = #insert code here

print(your_array)

As you can see, we used a list to create a NumPy array, but we can also use tuples.

In [15]:
array_from_tuple = np.array((1, 2, 3, 4, 5))

print(array_from_tuple)

[1 2 3 4 5]
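
To make the list/tuple equivalence concrete, here is a minimal sketch. The variable names (`from_list`, `from_tuple`) are illustrative and not part of the exercises; the point is only that both inputs yield the same kind of object.

```python
import numpy as np

# Illustrative names; any Python sequence of numbers works the same way.
from_list = np.array([10, 20, 30])
from_tuple = np.array((10, 20, 30))

# Both calls produce a NumPy ndarray with identical contents.
print(type(from_list).__name__)               # ndarray
print(np.array_equal(from_list, from_tuple))  # True
```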


#### Exercise 2 - Multi-dimensional arrays

So far, we've introduced one-dimensional arrays (which are basically vectors), but NumPy can create arrays with n dimensions. Here is an example of a two-dimensional array (a matrix):

In [17]:
array_2d = np.array([[2, 3, 4], [5, 6, 7], [2, 3, 7]])

print(array_2d)

[[2 3 4]
 [5 6 7]
 [2 3 7]]
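
The number of dimensions of an array can be read from its `ndim` attribute. The sketch below uses made-up arrays (`vector`, `matrix`, `cube`, `padded`) to show that each extra level of list nesting adds one dimension, and that the `ndmin` argument of `np.array` is one possible way to request a minimum number of dimensions without typing all the brackets by hand.

```python
import numpy as np

# Each extra level of nesting adds one dimension.
vector = np.array([1, 2, 3])                            # 1 dimension
matrix = np.array([[1, 2, 3], [4, 5, 6]])               # 2 dimensions
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # 3 dimensions

print(vector.ndim, matrix.ndim, cube.ndim)  # 1 2 3

# ndmin pads the array with leading axes of length 1 until it has
# at least the requested number of dimensions.
padded = np.array([1, 2, 3], ndmin=4)
print(padded.ndim)   # 4
print(padded.shape)  # (1, 1, 1, 3)
```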


Now, create an array with 7 dimensions.

In [None]:
array_7d = #insert code here

print(array_7d)

You can check the shape of your array using its shape attribute:

In [None]:
print(array_7d.shape)
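
As a small illustration on a hypothetical 2 x 3 array named `grid`: `shape` is an attribute (read without parentheses) that holds a tuple with one length per dimension, so it can be unpacked like any other tuple.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # hypothetical example data

# shape is a tuple: (number of rows, number of columns) for a 2-D array.
rows, cols = grid.shape
print(grid.shape)   # (2, 3)
print(rows, cols)   # 2 3

# The length of the shape tuple equals the number of dimensions,
# and size is the total number of elements.
print(len(grid.shape) == grid.ndim)  # True
print(grid.size)                     # 6
```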

#### Exercise 4 - Indexing

In NumPy, you access an element of an array using square brackets - [] - just as you do with Python lists. See the following example.

In [19]:
array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(array[0])
print(array[1])
print(array[7])


1
2
8


Insert the code necessary to access and print the 9th element of the previous array.

In [None]:
ninth = #insert code here

print(ninth)

What is the length of the array? Insert the code needed to answer this question.

In [22]:
arr_len = #insert code here

print(arr_len)

9


Since the length of the array is 9, the 9th element of the array is the last one. You can access the last element of an array in a different manner:

In [24]:
last_element = array[-1]

print(last_element)

9


For multi-dimensional arrays, you need to use multiple indexes. Consider the following two-dimensional array.

In [25]:
bidim_array = np.array([[1, 2, 3], [4, 5, 6]])

print(bidim_array)

[[1 2 3]
 [4 5 6]]


You want to access the element with value "4" in this array, which is located in the second row, first column. How can you do that?

In [27]:
required_element = #insert code here

print(required_element)

4
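
For reference, here is a sketch of two-dimensional indexing on a different, made-up array (`table`). It assumes the usual convention that the first index selects the row and the second selects the column, both counted from 0.

```python
import numpy as np

table = np.array([[10, 20, 30],
                  [40, 50, 60]])  # hypothetical data

# Comma-separated indexes: row first, then column.
print(table[0, 2])   # 30 (first row, third column)
print(table[1, -1])  # 60 (second row, last column)

# Chained brackets reach the same element through an intermediate row.
print(table[0][2])   # 30
```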


#### Exercise 5 - Array Slicing

Now that you know all about indexing, let's talk slicing. You can slice sub-parts of arrays using the following notation:

In [29]:
array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

sliced_array = array[1:6]

print(sliced_array)

[2 3 4 5 6]
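
The output above contains five elements because a slice `start:stop` includes the element at `start` and excludes the element at `stop`. The sketch below illustrates that rule on a made-up array named `values`.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60])  # hypothetical data

start, stop = 1, 4
piece = values[start:stop]

print(piece)       # [20 30 40]
print(len(piece))  # 3, i.e. stop - start

# Omitting a bound means "from the beginning" or "to the end".
print(values[:2])  # [10 20]
print(values[4:])  # [50 60]
```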


Your turn. Access and print the 3 middle elements of the array ("4", "5", and "6") using a slicing technique.

In [None]:
middle_slice = #insert code here

print(middle_slice)

You can also use negative indexes. See the example and try it yourself.

In [31]:
print(array[-6:-2])


[4 5 6 7]


In [None]:
your_slice = #insert code here

print(your_slice)

Now let's get a sequence of elements through steps.

In [32]:
step_slice = array[1:9:2]

print(step_slice)

[2 4 6 8]


Your move. Try different step and limit combinations (including negatives).

In [None]:
my_step_slice = #insert code here

print(my_step_slice)
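
To compare with your own experiments, here are a few combinations on a made-up array (`values`). The general form is `array[start:stop:step]`; with a negative step the slice walks backwards, so `start` should be larger than `stop`.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60, 70])  # hypothetical data

print(values[::3])     # [10 40 70]  every third element
print(values[1::2])    # [20 40 60]  every second element from index 1
print(values[::-1])    # [70 60 50 40 30 20 10]  reversed
print(values[5:1:-2])  # [60 40]  from index 5 down to, not including, index 1
```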

#### Exercise 6 - Data Types

We've only worked with integers so far, but ndarrays are compatible with other types of data. Try to create an array out of a list of strings.

In [None]:
str_array = #insert code here

print(str_array)

Did it work? Now try a list of floats.

In [None]:
flt_array = #insert code here

print(flt_array)

You can also convert arrays to a different data type (as long as the conversion is possible). Check the following conversion of an integer array to a boolean array.

In [36]:
int_array = np.array([0, 1, 0, 0])

bool_array = int_array.astype('bool')

print(bool_array)

[False  True False False]


Now convert the float array you created into an integer array.

In [None]:
my_int_array = #insert code here

print(my_int_array)
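
One detail worth knowing about float-to-integer conversion: `astype(int)` discards the fractional part instead of rounding. The sketch below uses made-up values (`prices`) and shows `np.rint` as one possible way to round to the nearest integer before converting.

```python
import numpy as np

prices = np.array([1.9, 2.4, -1.9])  # hypothetical data

# astype(int) truncates toward zero.
truncated = prices.astype(int)
print(truncated.tolist())  # [1, 2, -1]

# Rounding first gives the nearest integers instead.
rounded = np.rint(prices).astype(int)
print(rounded.tolist())    # [2, 2, -2]
```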

#### Exercise 7 - Join and Split Arrays

If we wish to join the contents of two or more arrays into a single array, we can use the np.concatenate function.

In [5]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))
print(arr)

[1 2 3 4 5 6]
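
For arrays with more than one dimension, `np.concatenate` also accepts an `axis` argument that selects the direction of the join. This goes slightly beyond the one-dimensional example above; the arrays `top` and `bottom` are made up for illustration.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])      # hypothetical data
bottom = np.array([[5, 6], [7, 8]])   # hypothetical data

# axis=0 (the default) stacks the rows of one array under the other.
stacked_rows = np.concatenate((top, bottom), axis=0)
print(stacked_rows.shape)  # (4, 2)

# axis=1 places the arrays side by side, extending each row.
side_by_side = np.concatenate((top, bottom), axis=1)
print(side_by_side.shape)     # (2, 4)
print(side_by_side.tolist())  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```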


Join the contents of the following 4 arrays.

In [None]:
arr1 = np.array([5, 23, 4, 99])
arr2 = np.array([98, 3, 15])
arr3 = np.array([70, 20, 12])
arr4 = np.array([1])

final_arr = # insert code here
print(final_arr)

Now let's split the array you created.

In [None]:
split_results = np.array_split(final_arr, 2)

print(split_results)
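
`np.array_split` returns a list of arrays and tolerates lengths that do not divide evenly: the first sub-arrays simply get one extra element. The sketch below shows this on a made-up array (`numbers`) and contrasts it with `np.split`, which requires an exact division.

```python
import numpy as np

numbers = np.arange(7)  # [0 1 2 3 4 5 6], hypothetical data

# 7 elements into 3 parts: sizes 3, 2 and 2.
parts = np.array_split(numbers, 3)
print([part.tolist() for part in parts])  # [[0, 1, 2], [3, 4], [5, 6]]

# np.split raises an error when the division is not exact.
try:
    np.split(numbers, 3)
except ValueError as error:
    print("np.split refused:", error)
```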

Your turn. Split final_arr into 5 arrays.

In [None]:
split_arrays = # insert code here

print(split_arrays)

#### Exercise 8 - Search Arrays

In data analysis tasks, it is often useful to find the elements of an array that satisfy a certain condition. For this purpose, NumPy offers the function np.where, which does exactly that. See the next example.

In [16]:
arr = np.array([4, 93, 83, 94, 12])

print(np.where(arr==12))

(array([4]),)


As you can see, np.where correctly located the value "12" at index 4, which is the fifth position of the array.
You can use any type of condition for your search. For certain tasks, it might be useful to locate all the elements that satisfy a condition. For instance, given an array containing the ages of all the patients in a clinical facility, we might want to identify only the adults (18 years or older). How would you do that?

In [None]:
ages = np.array([8, 19, 18, 29, 88, 82, 3, 45, 51, 54, 74, 54, 66, 23, 3, 7, 92, 65, 64])

adults_idx = #insert code here

print(adults_idx)
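
A note on the return value: the earlier output `(array([4]),)` is a tuple with one index array per dimension, so for a one-dimensional array the indexes are in its first element. The sketch below uses made-up temperature readings and also shows the three-argument form of `np.where`, which picks a value per element instead of returning indexes.

```python
import numpy as np

temperatures = np.array([36.5, 38.2, 37.0, 39.1])  # hypothetical readings

# One-argument form: a tuple holding the indexes where the condition is True.
result = np.where(temperatures > 37.5)
print(type(result).__name__)  # tuple
indexes = result[0]
print(indexes.tolist())       # [1, 3]

# Three-argument form: choose a value for each element.
labels = np.where(temperatures > 37.5, "fever", "normal")
print(labels.tolist())        # ['normal', 'fever', 'normal', 'fever']
```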

Now keep the previous example in mind. What if you wanted to sort the patients by age? NumPy also has a function for that: np.sort. How do you think you can use it? Complete the following cell to obtain a sorted ages array.

In [None]:
ordered_ages = #insert code here

print(ordered_ages)

Sometimes we do not want the sorted values themselves but the indexes that would sort the array, so that we can use them to access the original array. For that, we use np.argsort. Give it a try!

In [None]:
sorted_indexes = np.argsort(ages)

print(sorted_indexes)
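
One reason the sorted indexes are useful is that they can reorder a second, related array in the same way. The sketch below assumes two made-up arrays, `scores` and `names`, whose elements correspond position by position.

```python
import numpy as np

scores = np.array([70, 95, 82])          # hypothetical data
names = np.array(["Ana", "Ben", "Cal"])  # hypothetical labels, same order

order = np.argsort(scores)
print(order.tolist())           # [0, 2, 1]

# Indexing with the order array gives the same result as np.sort(scores).
print(scores[order].tolist())   # [70, 82, 95]

# The same order rearranges the related array consistently.
print(names[order].tolist())    # ['Ana', 'Cal', 'Ben']

# Reversing the order gives a descending ranking.
print(names[order[::-1]].tolist())  # ['Ben', 'Cal', 'Ana']
```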

What do you think happens when you apply np.sort to a string array? Check your guess in the next cell.

In [None]:
string_arr = np.array(["what", "is", "going", "on", "here"])

sorted_str_arr = #insert code here

print(sorted_str_arr)

#### Exercise 9 - Filtering

NumPy is also a powerful tool to filter information. Check the next example.

In [29]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# We want only the odd numbers in the array
odd_numbers = []
for number in arr:
    if number % 2 != 0:
        odd_numbers.append(True)
    else:
        odd_numbers.append(False)

filtered_array = arr[odd_numbers]

print(filtered_array)

[1 3 5 7 9]
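
The loop above builds the list of True/False values by hand. As an alternative sketch, the same mask can be produced by applying the comparison to the whole array at once, which yields a boolean array; the name `is_odd` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# The comparison is evaluated element by element and returns a boolean array.
is_odd = arr % 2 != 0
print(is_odd.tolist()[:4])  # [True, False, True, False]

# Indexing with the boolean array keeps only the elements marked True.
print(arr[is_odd])          # [1 3 5 7 9]

# The condition can also be written directly inside the brackets.
print(arr[arr % 2 == 0].tolist())  # [2, 4, 6, 8, 10]
```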


You can also use the indexes of the elements to filter arrays. Consider Exercise 8, where you had to search for the adult patients within an array of patients' ages. How can you obtain an array composed exclusively of the ages of adult patients using filtering techniques?

In [None]:
adults_idxs = #insert code here

filtered_ages = #insert code here

print(filtered_ages)

You have completed your introductory class on NumPy! Congratulations!

## Pandas (https://pandas.pydata.org/)

In [None]:
%pip install pandas

In [None]:
import pandas as pd