# Masterclass Programming for Data science

## Masterclass 4

### Tuesday, 23 May 2023

---

## Agenda for today

**4.0 Recap previous class and homework** 

**4.1 Object-oriented programming & Classes**

**4.2 NumPy & Vectorization**

**4.3 Your own project**

---

# 4.0 Recap

## Topics previous class
**3.1 Data-wrangling I: data acquisition**

**3.2 Data-wrangling II: data transformation**


# 4.1 Object-oriented programming & Classes

Primitive data structures—like numbers, strings, and lists—are designed to represent simple pieces of information, such as the cost of an apple, the name of a poem, or your favorite colors, respectively. What if you want to represent something more complex?

For example, let’s say you want to track employees in an organization. You need to store some basic information about each employee, such as their name, age, position, and the year they started working.

You can store them all in lists, dictionaries or whatever way you can think of, but here is where Classes come in.

### Classes versus instances

Classes are used to create user-defined data structures. Classes define functions called methods, which identify the behaviors and actions that an object created from the class can perform with its data.

In this tutorial, you’ll create a Dog class that stores some information about the characteristics and behaviors that an individual dog can have.

A class is a blueprint for how something should be defined. It doesn’t actually contain any data. The Dog class specifies that a name and an age are necessary for defining a dog, but it doesn’t contain the name or age of any specific dog.

While the class is the blueprint, an instance is an object that is built from a class and contains real data. An instance of the Dog class is not a blueprint anymore. It’s an actual dog with a name, like Miles, who’s four years old.

Put another way, a class is like a form or questionnaire. An instance is like a form that has been filled out with information. Just like many people can fill out the same form with their own unique information, many instances can be created from a single class.

## Defining a Class

Let's start by defining a class directly. A class definition always starts with the class keyword, followed by the name of the class and a colon.

In [1]:
class Dog:
    pass

The Dog-class above isn't very spectacular though. Let's give it some more body.

The properties that all Dog objects must have are defined in a method called .__init__(). Every time a new Dog object is created, .__init__() sets the initial state of the object by assigning the values of the object’s properties. That is, .__init__() initializes each new instance of the class.

You can give .__init__() any number of parameters, but the first parameter will always be a variable called self. When a new class instance is created, the instance is automatically passed to the self parameter in .__init__() so that new attributes can be defined on the object.

Let’s update the Dog class with an .__init__() method that creates .name and .age attributes:

In [2]:
class Dog:
    def __init__(self, name, age):
        # instance attributes
        self.name = name
        self.age = age

In the body of .__init__(), there are two statements using the self variable:

1. self.name = name creates an attribute called name and assigns to it the value of the name parameter.
2. self.age = age creates an attribute called age and assigns to it the value of the age parameter.

Attributes created in .__init__() are called instance attributes. An instance attribute’s value is specific to a particular instance of the class. All Dog objects have a name and an age, but the values for the name and age attributes will vary depending on the Dog instance.

On the other hand, class attributes are attributes that have the same value for all class instances. You can define a class attribute by assigning a value to a variable name outside of .__init__().

Let's add a species to the Dog-class.

In [3]:
class Dog:
    # Class attribute
    species = "Dog"

    def __init__(self, name, age):
        self.name = name
        self.age = age

## Instantiating an object

In [4]:
class Dog:
    pass

Creating a new object from a class is called instantiating an object. You can instantiate a new Dog object by typing the name of the class, followed by opening and closing parentheses:

In [5]:
Dog()

<__main__.Dog at 0x183aa9247c0>

You now have a new Dog object at the address shown in the output above (here 0x183aa9247c0). This string of letters and numbers is a memory address that indicates where the Dog object is stored in your computer's memory. Note that the address you see on your screen will be different.

Let's see what happens when we instantiate another Dog-object.

In [6]:
Dog()

<__main__.Dog at 0x183aa9247f0>

The new Dog instance is located at a different memory address. That’s because it’s an entirely new instance and is completely unique from the first Dog object that you instantiated.

In [7]:
bo = Dog()
diesel = Dog()

bo == diesel

False

In [8]:
bo = Dog()

type(bo)

__main__.Dog

## Class and Instance attributes

Let's create a new Dog class with a class attribute called .species and two instance attributes called .name and .age:

In [9]:
class Dog:
    # class attribute
    species = "Dog"
    def __init__(self, name, age):
        # instance attributes
        self.name = name
        self.age = age

To instantiate objects of this Dog class, you need to provide values for the name and age. If you don’t, then Python raises a TypeError:

In [10]:
Dog()

TypeError: __init__() missing 2 required positional arguments: 'name' and 'age'

To pass arguments to the name and age parameters, put values into the parentheses after the class name:

In [11]:
bo = Dog(name = 'Bo', age = 6)
diesel = Dog('Diesel', 11)

To get the values of the attributes, just type a dot after the instance with the attribute name after the dot. Just like below:

In [12]:
print(bo.name)
print(bo.age)

print(diesel.name)
print(diesel.age)

Bo
6
Diesel
11


You can access the class attribute the same way:

In [13]:
bo.species

'Dog'

The attributes of a created object can be changed later on as well. Let's see how that works below.

In [14]:
bo.age = 1
bo.age

1

In [15]:
bo.species = 'Cat'
bo.species

'Cat'

Suddenly, the 6-year old dog Bo has become a 1-year old kitten. In this case it sounds strange, but in Python this is possible.

The most important thing to remember here is that custom objects are mutable by default. An object is mutable if it can be altered dynamically.
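
One detail in the cells above is worth spelling out: assigning `bo.species = 'Cat'` does not change the class attribute. It creates an instance attribute on `bo` that hides the class attribute for that one object. The sketch below illustrates this; it redefines a small `Dog` class so that it runs on its own.

```python
class Dog:
    species = "Dog"  # class attribute, shared by all instances

    def __init__(self, name, age):
        self.name = name
        self.age = age


bo = Dog("Bo", 6)
diesel = Dog("Diesel", 11)

# Assigning through the instance creates a new instance attribute on bo only
bo.species = "Cat"

print(bo.species)      # Cat
print(diesel.species)  # Dog
print(Dog.species)     # Dog

# Deleting the instance attribute makes the class attribute visible again
del bo.species
print(bo.species)      # Dog
```

To change the value for every dog at once, you would assign to `Dog.species` instead.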

## Instance methods

Instance methods are functions that are defined inside a class and can only be called from an instance of that class. Just like .__init__(), an instance method’s first parameter is always self.

In [17]:
class Dog:
    species = "Dog"

    def __init__ (self, name, age):
        self.name = name
        self.age = age

    # Instance method
    def description (self):
        return f"{self.name} is {self.age} years old."

    # Another instance method
    def speak (self, sound):
        return f"{self.name} says {sound}."
    
    def bark (self):
        return f"Woof!"

This Dog class has 3 instance methods:
1. .description() returns a string displaying the name and age of the dog.
2. .speak() has one parameter called sound and returns a string containing the dog’s name and the sound the dog makes.
3. .bark() returns the default barking sound of a dog.

In [18]:
diesel = Dog('Diesel', 8)

In [19]:
diesel.description

<bound method Dog.description of <__main__.Dog object at 0x00000183ABBA6BE0>>

In [20]:
diesel.description()

'Diesel is 8 years old.'

As you can see in the output above, we need the parentheses when we call an instance method. Only the name of the instance method is not enough.

In [21]:
diesel.speak('woof')

'Diesel says woof.'

In [22]:
diesel.bark()

'Woof!'
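
To see what the self parameter does, it can help to call a method through the class and pass the instance yourself. The sketch below uses a reduced `Dog` class with only one method; both calls should give the same result.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def description(self):
        return f"{self.name} is {self.age} years old."


diesel = Dog("Diesel", 8)

via_instance = diesel.description()  # Python passes diesel as self
via_class = Dog.description(diesel)  # here we pass the instance explicitly

print(via_instance)               # Diesel is 8 years old.
print(via_instance == via_class)  # True
```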

Remember the message you get when you run the line below? It is a cryptic-looking message telling you that diesel is a Dog object, followed by its memory address. This message isn’t very helpful. But you can change what gets printed by defining a special instance method called .__str__().

In [23]:
print(diesel)

<__main__.Dog object at 0x00000183ABBA6BE0>


In [24]:
class Dog:
    species = "Dog"

    def __init__ (self, name, age):
        self.name = name
        self.age = age

    # Instance method
    def __str__ (self):
        return f"{self.name} is {self.age} years old."

    # Another instance method
    def speak (self, sound):
        return f"{self.name} says {sound}."
    
    def bark (self):
        return f"Woof!"

In [25]:
diesel = Dog('Diesel', 8)
print(diesel)

Diesel is 8 years old.


Methods like ```.__init__()``` and ```.__str__()``` are called dunder methods because they begin and end with double underscores. There are many dunder methods that you can use to customize classes in Python. These methods are an important part of object-oriented programming in Python, so dive into them if you want to master it!
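
As an illustration of two other dunder methods, the sketch below adds `.__repr__()` and `.__eq__()` to a reduced `Dog` class. The choice to treat two dogs as equal when name and age match is an assumption made for this example, not a rule from the course material. Without `.__eq__()`, Python compares objects by identity, which is why `bo == diesel` returned False earlier.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        # Shown when the object is echoed in a notebook cell or inside a list
        return f"Dog(name={self.name!r}, age={self.age!r})"

    def __eq__(self, other):
        # Assumption for this example: same name and age means equal
        if not isinstance(other, Dog):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)


first = Dog("Bo", 6)
second = Dog("Bo", 6)

print(repr(first))      # Dog(name='Bo', age=6)
print(first == second)  # True
print(first is second)  # False, they are still two separate objects
```

Note that a class that defines `.__eq__()` without `.__hash__()` produces unhashable instances, so these dogs cannot be used as dictionary keys or set members.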

<div class="alert alert-success">

<h3>Exercise 1</h3>

We have introduced Classes with some example code and explanation, but now it's time to write your own Class.

Let's create a Car-class with at least 3 instance attributes: brand, color and mileage. In addition, write at least two instance methods: sound and a description (let's try the dunder method here).

</div>

<div class="alert alert-success">

<h3>Exercise 2</h3>

Let's create an Elevator-class with the attributes: num_floors, position, num_passengers and direction (-1 for going down, 0 for standing still, 1 for going up). Create methods to create Elevator instances, a method that returns whether an elevator is available, and a method that calls an elevator. When someone calls an elevator, check whether it is available. If so, change its position to the floor on which it is called. If the elevator is not available, return "Try another elevator".

Finally, create an elevator instance and check whether it's available. Then call the elevator and print its position and its number of passengers, after it has been called.

We will build on this class in the exercises at the end of this notebook!

</div>

[0, 0]
[7, 4]


## Inheritance from other Classes

Inheritance is the process by which one class takes on the attributes and methods of another. Newly formed classes are called child classes, and the classes that child classes are derived from are called parent classes.

Remember that we already made instances of the class Dog, called Bo and Diesel? They are both dogs, but they are of different breeds. And different breeds make different bark sounds. We already had an instance method bark that returned the 'Woof!' sound. But this can be different for every breed. Let's see how we can make this work with inheritance.

In [26]:
class Dog:
    species = "Dog"

    def __init__(self, name, age, breed):
        self.name = name
        self.age = age
        self.breed = breed
        
    def bark (self):
        return f"Woof!"

Probably your first thought for differentiating the breeds is to add an additional attribute breed. That's indeed an option to distinguish the breeds, but it does not change a thing about the bark method, as every dog will still Woof!

In [None]:
bo = Dog('Bo', 8, 'Labrador')
bo.bark()

But we are doing things differently now. Start by creating your parent class first, which can be inherited by the children-classes we make afterwards.

In [2]:
class Dog:
    species = "Dog"

    def __init__ (self, name, age):
        self.name = name
        self.age = age

    # Instance method
    def __str__ (self):
        return f"{self.name} is {self.age} years old."

    # Another instance method
    def speak (self, sound):
        return f"{self.name} says {sound}."
    
    def bark (self):
        return f"Woof!"

We can create a child class as shown below. Just create a class like we learned to, but add the parent class you want to inherit from between parentheses.

In [28]:
class Labrador (Dog):
    pass

In [29]:
bo = Labrador('Bo', 8)

In [30]:
type(bo)

__main__.Labrador

As we can see above, we can call the Labrador-class exactly like the parent Dog-class. Now Bo is from class Labrador instead of just a general Dog.

Python also has a built-in function, isinstance(), that can check if bo is an instance of Labrador. Look at the code below:

In [31]:
isinstance(bo, Labrador)

True

And is Bo also an instance of Dog? Yes, of course.

In [32]:
isinstance(bo, Dog)

True

But what happens when we create a child class Bulldog and let Diesel be a Bulldog? Is Diesel also an instance of Labrador?

In [33]:
class Bulldog (Dog):
    pass

In [34]:
diesel = Bulldog('Diesel', 6)

In [35]:
isinstance(diesel, Labrador)

False

<div class="alert alert-success">

<h3>Exercise 3</h3>
    
As you can see above, diesel is not an instance of Labrador. However, Labrador and Bulldog are both subclasses of Dog.
But how can we let Diesel be an instance of both Dog and Bulldog, as well as an instance of Labrador? Create the class Bulldog in such a way that each Bulldog is a Labrador.

</div>



In [None]:
#Enter your code here











## Extending functionality of parent Class

Since different breeds of dogs have slightly different barks, you want to provide a default value for the sound argument of their respective .bark() methods. To do this, you need to override .bark() in the class definition for each breed.

To override a method defined on the parent class, you define a method with the same name on the child class. Here’s what that looks like for the Bulldog class:

In [3]:
class Bulldog(Dog):
    def bark(self, sound="Ruff"):
        return f"{sound}"

In [4]:
diesel = Bulldog('Diesel', 8)

In [5]:
diesel.bark()

'Ruff'

In [6]:
diesel.bark('Woof')

'Woof'

One thing to remember about inheritance is that when you change the bark method in the Dog class at this point, the bark method of Bulldog won't change. That's because we have overridden the bark method in the Bulldog class. Let's look at the code below.

In [7]:
class Dog:
    species = "Dog"

    def __init__ (self, name, age):
        self.name = name
        self.age = age

    # Instance method
    def __str__ (self):
        return f"{self.name} is {self.age} years old."

    # Another instance method
    def speak (self, sound):
        return f"{self.name} says {sound}."
    
    def bark (self, sound):
        return f"{self.name} says: {sound}!"

In [8]:
diesel = Bulldog('Diesel', 8)

In [9]:
diesel.bark()

'Ruff'

However, there is a way to access the parent class from inside a method of a child class by using super():

In [10]:
class Bulldog(Dog):
    def bark(self, sound="Ruff"):
        return super().bark(sound)

In [11]:
diesel = Bulldog('Diesel', 8)

In [12]:
diesel.bark('Woof')

'Diesel says: Woof!'
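
The same super() mechanism is commonly used inside .__init__() when a child class needs extra attributes. The sketch below is one way to do that; the `weight_kg` attribute is hypothetical and only there for illustration.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def bark(self, sound):
        return f"{self.name} says: {sound}!"


class Bulldog(Dog):
    def __init__(self, name, age, weight_kg):
        # Let the parent class set name and age, then add the extra attribute
        super().__init__(name, age)
        self.weight_kg = weight_kg  # hypothetical attribute, for illustration only

    def bark(self, sound="Ruff"):
        return super().bark(sound)


diesel = Bulldog("Diesel", 8, weight_kg=23)
print(diesel.name, diesel.age, diesel.weight_kg)  # Diesel 8 23
print(diesel.bark())                              # Diesel says: Ruff!
```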

<div class="alert alert-success">

<h3>Exercise 4</h3>
    
Let's see if you understand the logic of inheritance. Continue with the Car class you created earlier and create two child classes (Electric and Non-Electric) that each make a different sound.

</div>

---

# 4.2 NumPy & Vectorization

Source: https://numpy.org/devdocs/user/quickstart.html

In [28]:
import numpy as np

### Creation of arrays

There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.

One option is to give a Python list as an argument.

In [29]:
arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

Another option is to add one or more tuples as an argument.

In [30]:
arr2 = np.array([(1.8, 6, 2.4), (2, 3.1, 7)])
arr2

array([[1.8, 6. , 2.4],
       [2. , 3.1, 7. ]])

And not like this:

In [31]:
arr = np.array(1, 2, 3, 4, 5)

TypeError: array() takes from 1 to 2 positional arguments but 5 were given

When creating an array, you can also specify the data type of its values.

In [32]:
arr3 = np.array([(1.8, 6, 2.4), (2, 3.1, 7)], dtype=float)
arr3

array([[1.8, 6. , 2.4],
       [2. , 3.1, 7. ]])

In [33]:
arr4 = np.array([(1.8, 6, 2.4), (2, 3.1, 7)], dtype=int)
arr4

array([[1, 6, 2],
       [2, 3, 7]])

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array without initializing its entries, so its initial content is arbitrary and depends on the state of the memory. By default, the dtype of the created array is float64, but it can be specified via the keyword argument dtype.

In [34]:
np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [35]:
np.ones((3, 4, 5), dtype = int)

array([[[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]])

In [36]:
np.empty((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

If you have an existing array and want to create a new array with the same shape and data type but filled with zeros or ones, the functions np.zeros_like() and np.ones_like() are your friends. Let's see how that works.

In [37]:
arr5 = np.array([(1.8, 6, 2.4), (2, 3.1, 7)], dtype=float)
arr5

array([[1.8, 6. , 2.4],
       [2. , 3.1, 7. ]])

In [38]:
np.zeros_like(arr5)

array([[0., 0., 0.],
       [0., 0., 0.]])

In [39]:
np.ones_like(arr5)

array([[1., 1., 1.],
       [1., 1., 1.]])

To create sequences of numbers, NumPy provides the arange function which is analogous to the Python built-in range, but returns an array.

In [40]:
# For example with ints
np.arange(0, 21, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [41]:
# But also possible with floats
np.arange(0, 22, 1.5)

array([ 0. ,  1.5,  3. ,  4.5,  6. ,  7.5,  9. , 10.5, 12. , 13.5, 15. ,
       16.5, 18. , 19.5, 21. ])

It is also possible to create an array with a given number of evenly spaced values between a and b. This is where the np.linspace() function comes in.

In [42]:
np.linspace(0, 1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])
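
The difference between arange and linspace can be summarized in a small sketch: arange takes a step size and excludes the stop value, while linspace takes the number of points and includes the stop value unless told otherwise.

```python
import numpy as np

# arange takes a step size and excludes the stop value
by_step = np.arange(0, 1, 0.25)

# linspace takes the number of points and includes the stop value by default
by_count = np.linspace(0, 1, 5)

# endpoint=False excludes the stop value
without_end = np.linspace(0, 1, 4, endpoint=False)

print(by_step)      # [0.   0.25 0.5  0.75]
print(by_count)     # [0.   0.25 0.5  0.75 1.  ]
print(without_end)  # [0.   0.25 0.5  0.75]
```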

It is also possible to create arrays with random values. The np.random module is very useful for this.

In [45]:
np.random.randint(10, size=(5, 4))

array([[8, 5, 9, 4],
       [1, 0, 3, 0],
       [9, 7, 2, 6],
       [2, 4, 2, 9],
       [5, 7, 2, 3]])
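
If you need the same random numbers on every run, for example to make a notebook reproducible, a seeded Generator is one option. This is a sketch using np.random.default_rng; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded Generator produces the same numbers on every run
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary
first = rng.integers(0, 10, size=(5, 4))  # integers from 0 up to, not including, 10

# A second Generator with the same seed reproduces the same array
rng_again = np.random.default_rng(seed=42)
second = rng_again.integers(0, 10, size=(5, 4))

print(first.shape)                    # (5, 4)
print(np.array_equal(first, second))  # True
```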

### Dimensions

In [46]:
arr1 = np.array([1, 2, 3, 4, 5])

In [47]:
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

In [48]:
arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

In [49]:
print(f'Array 1 has {arr1.ndim} dimension(s).')
print(f'Array 2 has {arr2.ndim} dimension(s).')
print(f'Array 3 has {arr3.ndim} dimension(s).')

Array 1 has 1 dimension(s).
Array 2 has 2 dimension(s).
Array 3 has 3 dimension(s).


In [50]:
arrx = np.array([1, 2, 3, 4], ndmin=6)

In [51]:
arrx.ndim

6

In [52]:
print(arrx)

[[[[[[1 2 3 4]]]]]]


### Basic operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [53]:
a = np.array([20, 30, 40, 50])

In [54]:
b = np.arange(4)
b

array([0, 1, 2, 3])

In [55]:
c = a - b
c

array([20, 29, 38, 47])

In [56]:
b**2

array([0, 1, 4, 9])

In [57]:
10 * np.sin(a)

array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])

In [58]:
a < 35

array([ True,  True, False, False])

Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the @ operator (in python >=3.5) or the dot function or method:

In [59]:
a = np.array([[1, 1],
              [0, 1]])
b = np.array([[2, 0],
              [3, 4]])

In [60]:
a * b     # elementwise product

array([[2, 0],
       [0, 4]])

In [61]:
a @ b     # matrix product (see explanation below)

array([[5, 4],
       [3, 4]])


<img src="https://www.codingem.com/wp-content/uploads/2022/02/matrix-multiplication-2.png" alt="NumPy multiply" style="width: 500px;" ></img>

You can also add NumPy arrays with a short notation. Do you recognise this from earlier in the course?

In [62]:
b += a
b

array([[3, 1],
       [3, 5]])

### Broadcasting

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

In [63]:
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
c = 2

In [64]:
a*b

array([2, 4, 6])

In [65]:
a*c

array([2, 4, 6])

As we can see, both operations end up with the same result. That's because in the latter case the value is stretched.

![image.png](attachment:image.png)

**Broadcasting rules**


When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

1. they are equal, or
2. one of them is 1

If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.

In [66]:
a = np.array([[ 0,  0,  0],
           [10, 10, 10],
           [20, 20, 20],
           [30, 30, 30]])
b = np.array([1, 2, 3])
a + b

array([[ 1,  2,  3],
       [11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

Adding a one-dimensional array to a two-dimensional array results in broadcasting if the number of elements in the 1-D array matches the number of columns in the 2-D array.

![image.png](attachment:image.png)

In [67]:
b = np.array([1, 2, 3, 4])
a + b

ValueError: operands could not be broadcast together with shapes (4,3) (4,) 

When the trailing dimensions of the arrays are unequal, broadcasting fails because it is impossible to align the values in the rows of the first array with the elements of the second array for element-by-element addition.

![image.png](attachment:image.png)
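
If the intention was to add one value per row rather than one per column, the failing example can be made to work by giving the 1-D array a trailing dimension of size 1. The sketch below assumes that intention and uses np.newaxis to reshape it.

```python
import numpy as np

a = np.array([[ 0,  0,  0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])  # shape (4, 3)
b = np.array([1, 2, 3, 4])    # shape (4,)

# Turn b into a column of shape (4, 1); the trailing 1 can stretch to 3
b_column = b[:, np.newaxis]

result = a + b_column
print(b_column.shape)  # (4, 1)
print(result)
# [[ 1  1  1]
#  [12 12 12]
#  [23 23 23]
#  [34 34 34]]
```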

Due to broadcasting, the operations below work.

In [None]:
a = np.random.randint(100, size = (4, 3, 5))
b = np.random.randint(100, size = (4, 1, 5))
a + b

In [None]:
a = np.random.randint(100, size = (4, 1, 5))
b = np.random.randint(100, size = (1, 3, 1))
a + b

In [None]:
a = np.random.randint(100, size = (4, 3, 5))
b = np.random.randint(100, size = (1, 5))
a + b

But these operations do NOT work.

In [None]:
a = np.random.randint(100, size = (4, 3, 5))
b = np.random.randint(100, size = (1, 3, 4))
a + b

In [None]:
a = np.random.randint(100, size = (4, 3, 5))
b = np.random.randint(100, size = (2, 5))
a + b
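
The broadcasting rules can be written out as a small pure-Python function, which makes it easier to predict whether two shapes are compatible before running an operation. The helper `broadcast_shape` below is a hypothetical name for this sketch and not part of NumPy; it handles exactly two shapes.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; missing dimensions count as 1
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
    return tuple(reversed(result))


print(broadcast_shape((4, 3, 5), (4, 1, 5)))  # (4, 3, 5)
print(broadcast_shape((4, 1, 5), (1, 3, 1)))  # (4, 3, 5)
print(broadcast_shape((4, 3, 5), (1, 5)))     # (4, 3, 5)

# The two failing combinations from the cells above
for bad in [((4, 3, 5), (1, 3, 4)), ((4, 3, 5), (2, 5))]:
    try:
        broadcast_shape(*bad)
    except ValueError as error:
        print(error)
```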

<div class="alert alert-success">

<h3>Playing with arrays</h3>

* Create multiple arrays with different dimensions and fill it in different ways
* Do some basic operations on your arrays and see if happens what you expected

In [None]:
# Enter your code here












### Methods and universal functions

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

In [None]:
a = np.array([[1, 1],
              [0, 1]])

b = np.array([[2, 0],
              [3, 4]])

In [None]:
a.sum()

In [None]:
b.mean()

In [None]:
a.min()

In [None]:
b.max()

If you add the axis parameter, the operation is applied along that axis: axis=0 works down the rows and returns one result per column, while axis=1 works across the columns and returns one result per row.

In [None]:
a.sum(axis=0)

In [None]:
b.mean(axis=1)
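
Because the arrays above are square, it is hard to see which axis is which. A non-square array makes the effect of the axis parameter visible in the shape of the result, as in this small sketch.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]], shape (2, 3)

column_sums = a.sum(axis=0)  # collapses the rows: one result per column
row_sums = a.sum(axis=1)     # collapses the columns: one result per row

print(column_sums)  # [3 5 7]
print(row_sums)     # [ 3 12]
```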

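NumPy provides many universal functions, called **ufuncs** for short. A ufunc operates element by element on arrays. (np.concatenate in the cells below is a regular NumPy function that joins arrays, not a ufunc.)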

In [None]:
np.add(a, b)

In [None]:
np.concatenate((a, b))

In [None]:
np.sqrt(a)

In [None]:
np.square(b)

Of course there are a lot more functions out there. Some of them are listed below:

all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, invert, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where
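
Not every NumPy function is a ufunc. One way to check is to test whether the function object is an instance of np.ufunc, as in this short sketch.

```python
import numpy as np

# Real ufuncs are instances of np.ufunc and work element by element
print(isinstance(np.add, np.ufunc))   # True
print(isinstance(np.sqrt, np.ufunc))  # True

# concatenate joins arrays; it is a regular NumPy function, not a ufunc
print(isinstance(np.concatenate, np.ufunc))  # False

a = np.array([1, 4, 9])
roots = np.sqrt(a)  # element-wise, returns a new array
print(roots)        # [1. 2. 3.]
```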

In [14]:
x = [1, 2, 3, 4,2,3,4,5,6,7,8,3,4,5,3,2,3,4,5]
y = [4, 5, 6, 7,3,4,2,3,4,5,3,2,23,4,5,3,2,1,3]
z = []

for i, j in zip(x, y):
  z.append(i * j)
print(z)

[4, 10, 18, 28, 6, 12, 8, 15, 24, 35, 24, 6, 92, 20, 15, 6, 6, 4, 15]


In [15]:
x = np.array(x)
y = np.array(y)
z = np.multiply(x, y)

print(z)


[ 4 10 18 28  6 12  8 15 24 35 24  6 92 20 15  6  6  4 15]


<div class="alert alert-success">

<h3>Getting to know the ufunc</h3>

* Create some new arrays
* Apply some methods on your newly created arrays
* Apply some new universal functions on them

In [None]:
# Enter your code here










### Accessing elements

In [None]:
arr = np.array([1, 2, 3, 4])

In [None]:
arr[1]

In [None]:
arr[0] + arr[-1]

In [None]:
arr2 = np.array([[1,2,3,4,5], [6,7,8,9,10]])

In [None]:
print(f'4th element on 1st row: {arr2[0, 3]}')

In [None]:
print(f'1st element on 2nd row: {arr2[1, 0]}')

In [None]:
arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

Guess what the result of the code block below is before running the code. Did you get it right? And can you explain it?

In [None]:
arr3[0][1][2]

### Slicing

You can slice arrays with the same start:stop:step notation that you use for lists.

In [None]:
arr = np.arange(1,11)
arr

In [None]:
arr[1:7]

In [None]:
arr[:2]

In [None]:
arr[8:]

In [None]:
arr[-4:-1]

In [None]:
arr[1:7:2]

Let's see how it works with 2+ dimensional arrays below.

In [None]:
arr2 = np.array([[1,2,3,4,5], [6,7,8,9,10]])

In [None]:
arr2[0, 2:]

In [None]:
arr2[1, :3]

In [None]:
arr2[:, 3]

### Iterating over arrays

In [16]:
arr2 = np.array([[1,2,3,4,5], [6,7,8,9,10]])

You can loop over arrays row by row...

In [17]:
for row in arr2:
    print(row)

[1 2 3 4 5]
[ 6  7  8  9 10]


or element by element.

In [18]:
for element in arr2.flat:
    print(element)

1
2
3
4
5
6
7
8
9
10


And we can use list-comprehensions for them as well.

In [None]:
[x for x in arr2]

In [None]:
[x for x in arr2.flat]
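
When you need the position of each element as well as its value, np.ndenumerate is one option. The sketch below collects the indices of all even values; the variable name `positions_of_even` is just chosen for this example.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# ndenumerate yields (index tuple, value) pairs
positions_of_even = [index for index, value in np.ndenumerate(arr2) if value % 2 == 0]
print(positions_of_even)
# [(0, 1), (0, 3), (1, 0), (1, 2), (1, 4)]
```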

<div class="alert alert-success">

<h3>Access, slice and iterate</h3>

* Create some new arrays with different dimensions
* Define which value to access and try to access it afterwards
* Make some slices of your arrays
* Try to iterate over your arrays. What is your preferred way?

In [None]:
# Enter your code here













### Copies and Views

When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

**No copy at all**

Simple assignments make no copy of objects or their data.

In [19]:
a = np.random.randint(5, size=(2, 4))
a

array([[4, 3, 3, 0],
       [0, 2, 4, 2]])

Let's assign the array to a new variable.

In [20]:
b = a

And check if they are the same.

In [21]:
b is a

True

As we see above, they are. But what happens when we assign a new value to any existing value in the array.

In [22]:
b[0, 0] = 99

In [23]:
a

array([[99,  3,  3,  0],
       [ 0,  2,  4,  2]])

We assigned the new value through b, but a shows the change as well. That's because b = a makes both names refer to the same array object; no copy is made.

**View or shallow copy**

Different array objects can share the same data. The .view() method creates a new array object that looks at the same data.

Let's create a view of the array above and check if they are the same.

In [24]:
c = a.view()
c is a

False

As we can see, they are not the same. c is a view of a and only shares the same data.

With the .base attribute we can check which array a view was created from.

In [25]:
c.base

array([[99,  3,  3,  0],
       [ 0,  2,  4,  2]])

In [26]:
c.base is a

True

We now know for sure that a and c are not the same. Let's check what happens when we reshape the array.

In [27]:
c = c.reshape((4, 2))
a

array([[99,  3,  3,  0],
       [ 0,  2,  4,  2]])

In [28]:
c

array([[99,  3],
       [ 3,  0],
       [ 0,  2],
       [ 4,  2]])

The shape of c changed from (2,4) to (4,2), but the shape of a stayed the same.

Now we know that a and c are not the same object, and that their shapes are different. But what happens when we change the data of one of them?

In [29]:
c[0, 1] = 123 
a

array([[ 99, 123,   3,   0],
       [  0,   2,   4,   2]])

In [30]:
a[1,3] = 789
c

array([[ 99, 123],
       [  3,   0],
       [  0,   2],
       [  4, 789]])

As we can see above, if we change the data of either a or c the data of both changes. That's because they share the same data.
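
Slicing an array also returns a view, which is an important difference from slicing a list. The sketch below uses np.shares_memory to confirm this and contrasts it with a list slice.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)

s = a[:, 1:3]                  # basic slice of an array: a view
print(np.shares_memory(a, s))  # True

s[:] = 0                       # writes through to a
print(a)
# [[0 0 0 3]
#  [4 0 0 7]]

numbers = [0, 1, 2, 3]
part = numbers[1:3]            # slice of a list: a new list
part[0] = 99
print(numbers)                 # [0, 1, 2, 3]
```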

**Deep copy**

The .copy()-method makes a complete copy of the array and its data. Let's try that below.

In [None]:
d = a.copy()
d is a

In [None]:
d.base is a

In [None]:
d[0, 0] = 9999
d

In [None]:
a

As we can see in the results above, d and a are not the same. In addition, a is also not the base of d and when we change data from d it does not change data from a.

Let's now check what happens to each of the four arrays above when we change data in a. Did you expect the outcome?

In [None]:
a[0, 0] = 1

In [None]:
a

In [None]:
b

In [None]:
c

In [None]:
d

### Vectorization

Vectorization is a technique in which operations are applied to whole arrays at once instead of looping over individual elements. We use highly optimized functions from modules such as NumPy, which reduces the running time of the code. Vectorized array operations are usually much faster than their pure Python equivalents, with the biggest impact in numerical computations.

Python for loops are slower than equivalent loops in C or C++. Python is an interpreted, dynamically typed language, so every iteration carries overhead for type checks and object handling, and there are few compiler-level optimizations. NumPy implements its arrays and array operations in C, so vectorized operations on NumPy arrays run the loop in compiled code.

In [36]:
import numpy as np
from timeit import Timer
 
# Creating a large array of size 10**5
array = np.random.randint(100, size=10**5)
 
def sum_with_forloop():
  sum_array=0
  for element in array:
    sum_array += element
 
def sum_with_builtin_method():
  sum_array = sum(array)
 
def sum_with_numpy():
  sum_array = np.sum(array)
 
time_forloop = Timer(sum_with_forloop).timeit(1)
time_builtin = Timer(sum_with_builtin_method).timeit(1)
time_numpy = Timer(sum_with_numpy).timeit(1)
 
print("Summing elements takes %0.9f units using for loop"%time_forloop)
print("Summing elements takes %0.9f units using builtin method"%time_builtin)
print("Summing elements takes %0.9f units using numpy"%time_numpy)

Summing elements takes 0.012368100 units using for loop
Summing elements takes 0.008480300 units using builtin method
Summing elements takes 0.000104400 units using numpy


In [37]:
time_forloop / time_numpy

118.46839080701812

In [38]:
time_builtin / time_numpy

81.22892720499635

This means that for this operation, on this run, NumPy's vectorized np.sum() was about 80 times faster than the built-in sum() and almost 120 times faster than a for loop. The exact ratios will differ between machines and runs.
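
A single timed run can be noisy. A slightly more robust sketch, assuming the same kind of random array, runs each function several times with timeit.repeat and keeps the best measurement. The repeat counts are arbitrary choices for this example.

```python
import timeit

import numpy as np

array = np.random.randint(100, size=10**5)


def sum_with_forloop():
    total = 0
    for element in array:
        total += element
    return total


def sum_with_numpy():
    return np.sum(array)


# Run each function 5 times per measurement, repeat 3 times, keep the best
time_forloop = min(timeit.repeat(sum_with_forloop, number=5, repeat=3)) / 5
time_numpy = min(timeit.repeat(sum_with_numpy, number=5, repeat=3)) / 5

print(f"for loop: {time_forloop:.6f} s per call")
print(f"numpy:    {time_numpy:.6f} s per call")
print(f"speed-up: {time_forloop / time_numpy:.0f}x")
```

Returning the totals from both functions also makes it possible to check that the two approaches give the same answer.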

### NumPy in Pandas

In [1]:
import pandas as pd

In [None]:
df_new = pd.DataFrame()

In [None]:
df_new['first_column'] = np.random.randint(1000, size = (10**3))
df_new['second_column'] = np.random.randint(1000, size = (10**3))
df_new.shape

In [None]:
df_new['sum_column'] = np.add(df_new.first_column, df_new.second_column)

In [None]:
df_new

<div class="alert alert-success">

# Selfstudy 
*Exercises from Think Python, picked from H10-12, Allen B. Downey*

### H10, Exercise 3  
Write a function called middle that takes a list and returns a new list that contains all but the first and last elements. For example:
```python
>>> t = [1, 2, 3, 4]
>>> middle(t)
[2, 3]
 ```
    
### H10, Exercise 4  
Write a function called chop that takes a list, modifies it by removing the first and last elements, and returns None. For example:
```python
>>> t = [1, 2, 3, 4]
>>> chop(t)
>>> t
[2, 3]
```
    
### H10, Exercise 5   
Write a function called is_sorted that takes a list as a parameter and returns True if the list is sorted in ascending order and False otherwise.

### H10, Exercise 7  
Write a function called has_duplicates that takes a list and returns True if there is any element that appears more than once. It should not modify the original list.

### H11, Exercise 1  
Write a function that reads the words in _Class2_Words.txt_ (created during class 3) and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary.

### H12, Exercise 1  
Write a function called most_frequent that takes a string and prints the letters in decreasing order of frequency. 

Optional: Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at (http://en.wikipedia.org/wiki/Letter_frequencies).

### More on classes 
In this exercise we will expand our elevator class.

* Create a class named 'Building', containing an 'init' method and an attribute 'elevators'. This is an empty list in which we will store an overview of all elevators in the building.
* Create a method in 'Building' to add an elevator to this building, by appending it to the list of elevators.
* Create a method in 'Building' that finds all available elevators. Return a list that contains all elevators that have no passengers.
* Add 5 elevators to a building instance. Test your class by calling 2 elevators and finding which elevators are still available

[<__main__.Elevator object at 0x000001223AA85790>, <__main__.Elevator object at 0x000001223AA85A60>, <__main__.Elevator object at 0x000001223AA85820>]


<div class="alert alert-success">
    
# And...
* Continue with your project and apply your newly gained knowledge