# Intro to Numpy

_This is drawn from multiple online tutorials_

Numpy is the fundamental package for numeric computing with Python. It provides powerful ways to create, store, and manipulate data, and it integrates quickly and smoothly with a wide variety of databases. It is also the foundation that Pandas is built on, which we will look at later.

This notebook will show how to create an array with certain data types, manipulate that array, select elements from arrays, and perform basic operations.

In [None]:
# Useful imports
import numpy as np
import math

# Array Creation

Arrays are displayed as a list or a list of lists, and they can be created from lists as well. To create an
array, we pass a list as the argument to np.array()

In [None]:

a = np.array([1, 2, 3])
print(a)
# We can print the number of dimensions of an array using the ndim attribute
print(a.ndim)

If we pass a list of lists to np.array(), we create a multi-dimensional array, for instance, a matrix

In [None]:

b = np.array([[1,2,3],[4,5,6]])
b

We can print out the length of each dimension by reading the shape attribute, which returns a tuple

In [None]:

b.shape

We can also check the type of items in the array

In [None]:

a.dtype

Besides integers, floats are also accepted in numpy arrays

In [None]:

c = np.array([2.2, 5, 1.1])
c.dtype.name

In [None]:
# Let's look at the data in our array
c

Note that numpy automatically converts integers, like 5, up to floats, since there is no loss of precision.
Numpy will try to give you the best data type possible while keeping the data in the array homogeneous, which
means all of the same type
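
If the inferred type is not the one you want, you can ask for one explicitly. The following is a small sketch using the `dtype` argument of `np.array` and the `astype` method; the variable names are made up for illustration.

```python
import numpy as np

# Ask for floats up front, even though every value is an integer
ints_as_floats = np.array([1, 2, 3], dtype=float)
print(ints_as_floats.dtype)

# Convert an existing float array to integers; the fractional part is dropped
truncated = np.array([2.2, 5, 1.1]).astype(int)
print(truncated)
```

Going from floats to integers does lose information, which is why numpy never does it automatically.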

Sometimes we know the shape of an array that we want to create, but not what we want to be in it. Numpy
offers several functions to create arrays with initial placeholders, such as zeros or ones.
Let's create two arrays, both the same shape but with different filler values

In [None]:

d = np.zeros((2,3))
print(d)

e = np.ones((2,3))
print(e)

We can also generate an array with random numbers

In [None]:

np.random.rand(2,3)

You'll see zeros, ones, and rand used quite often to create example arrays, especially in Stack Overflow
posts and other forums.

We can also create a sequence of numbers in an array with the arange() function. The first argument is the
starting bound, the second argument is the ending bound, and the third argument is the difference between
consecutive numbers

Let's create an array of every even number from ten (inclusive) to fifty (exclusive)

In [None]:

f = np.arange(10, 50, 2)
f

If we want to generate a sequence of floats, we can use the linspace() function. In this function the third
argument isn't the difference between two numbers, but the total number of items you want to generate

In [None]:

np.linspace( 0, 2, 15 ) # 15 numbers from 0 (inclusive) to 2 (inclusive)
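
Since linspace() takes a count rather than a step, it can be handy to see what step it ended up using. A minimal sketch, assuming the optional `retstep=True` argument, which makes linspace() return the spacing alongside the array:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 2, 15, retstep=True)
print(len(values))
print(step)  # (2 - 0) / (15 - 1), because both endpoints are included
```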

# Array Operations

We can do many things on arrays, such as mathematical manipulation (addition, subtraction, square,
exponents) as well as use boolean arrays, which hold True/False values. We can also do matrix manipulation such
as product, transpose, inverse, and so forth.

In [None]:
# Arithmetic operators on arrays apply elementwise.

# Let's create a couple of arrays
a = np.array([10,20,30,40])
b = np.array([1, 2, 3,4])

# Now let's look at a minus b
c = a-b
print(c)

# And let's look at a times b
d = a*b
print(d)

In [None]:

fahrenheit = np.array([0,-10,-5,-15,0])

# The formula for conversion is (°F − 32) × 5/9 = °C
celsius = (fahrenheit - 32) * (5/9)
celsius

Another useful and important manipulation is the boolean array. We can apply a comparison operator to an array, and a
boolean array will be returned with one value for each element in the original: True if the element meets the condition and False otherwise.
For instance, we might want a boolean array that checks which Celsius temperatures are greater than -20 degrees

In [None]:

celsius > -20
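
A boolean array can also be summarised directly. The sketch below rebuilds the Celsius values from the same Fahrenheit readings and counts how many pass the test; the name `is_warmer` is made up for illustration.

```python
import numpy as np

celsius = (np.array([0, -10, -5, -15, 0]) - 32) * (5 / 9)
is_warmer = celsius > -20

# True counts as 1 and False as 0, so sum() counts the matches
print(is_warmer.sum())
# any() asks whether at least one is True, all() whether every one is
print(is_warmer.any(), is_warmer.all())
```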

In [None]:
# Here's another example: we can use the modulus operator to check whether the numbers in an array are even. Recall that modulus performs division but keeps only the remainder
celsius%2 == 0

Besides elementwise manipulation, it is important to know that numpy supports matrix manipulation. Let's
look at the matrix product. If we want to do an elementwise product, we use the "*" sign

In [None]:

A = np.array([[1,1],[0,1]])
B = np.array([[2,0],[3,4]])
print(A*B)

# if we want to do matrix product, we use the "@" sign or use the dot function
print(A@B)
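
The dot function mentioned in the comment above can be written in two ways. As a quick sketch, all three spellings below give the same matrix product for two-dimensional arrays:

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

# Operator form, function form, and method form of the matrix product
print(A @ B)
print(np.dot(A, B))
print(A.dot(B))
```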

You don't have to worry about complex matrix operations for this course, but it's important to know that
numpy is the underpinning of scientific computing libraries in python, and that it is capable of doing both
element-wise operations (the asterisk) as well as matrix-level operations (the @ sign).

A few more linear algebra concepts are worth layering in here. You might recall that the product of two
matrices is only defined when the inner dimensions of the two matrices are the same, that is, when the number
of columns in the first matrix equals the number of rows in the second. The dimensions refer to the number of
elements both horizontally and vertically in the rendered matrices you've seen here. We
can use numpy to quickly see the shape of a matrix:

In [None]:

A.shape
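
To make the inner-dimension rule concrete, here is a small sketch with two made-up arrays, `left` and `right`, whose shapes are chosen so that the product works in one order and fails in the other:

```python
import numpy as np

left = np.ones((2, 3))   # 2 rows, 3 columns
right = np.ones((3, 4))  # 3 rows, 4 columns

# Inner dimensions match (3 and 3), so the product has shape (2, 4)
print((left @ right).shape)

# Reversing the order pairs 4 columns with 2 rows, which numpy rejects
try:
    right @ left
except ValueError as err:
    print("cannot multiply:", err)
```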

When manipulating arrays of different types, the type of the resulting array will correspond to 
the more general of the two types. This is called upcasting.

In [None]:


# Let's create an array of integers
array1 = np.array([[1, 2, 3], [4, 5, 6]])
print(array1.dtype)

# Now let's create an array of floats
array2 = np.array([[7.1, 8.2, 9.1], [10.4, 11.2, 12.3]])
print(array2.dtype)

Integers (int) are whole numbers only, and floating point numbers (float) can have a whole number portion
and a decimal portion. The 64 in this example refers to the number of bits used to store each number,
which determines the range (or precision) of the numbers that can be
represented.

In [None]:
# Let's do an addition for the two arrays
array3=array1+array2
print(array3)
print(array3.dtype)

Notice how the items in the resulting array have been upcast into floating point numbers
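
If you want to know the result type without doing the arithmetic, `np.result_type` reports it. A minimal sketch with two small made-up arrays:

```python
import numpy as np

int_array = np.array([1, 2, 3])
float_array = np.array([0.5, 1.5, 2.5])

# result_type reports the dtype an operation on these inputs would produce
print(np.result_type(int_array, float_array))
print((int_array + float_array).dtype)
```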

Numpy arrays have many useful aggregation functions, such as sum(), max(), min(), and mean()

In [None]:

print(array3.sum())
print(array3.max())
print(array3.min())
print(array3.mean())

For two-dimensional arrays, we can do the same thing for each row or column.
Let's create an array with 15 elements, ranging from 1 to 15,
with a shape of 3x5

In [None]:

b = np.arange(1,16,1).reshape(3,5)
print(b)
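
Per-row and per-column aggregation is done with the `axis` argument that the aggregation functions accept. A short sketch on the same 3x5 array:

```python
import numpy as np

b = np.arange(1, 16, 1).reshape(3, 5)

# axis=0 collapses the rows, leaving one result per column
print(b.sum(axis=0))
# axis=1 collapses the columns, leaving one result per row
print(b.max(axis=1))
```

The first call returns five column sums and the second returns three row maximums.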

Now, we often think about two dimensional arrays being made up of rows and columns, but you can also think
of these arrays as just a giant ordered list of numbers, and the *shape* of the array, the number of rows
and columns, is just an abstraction that we have for a particular purpose. Actually, this is exactly how
basic images are stored in computer environments.
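
The "ordered list plus a shape" idea can be seen directly. In this sketch, `ravel()` shows the 15 numbers in order, and `reshape()` presents the same numbers with a different number of rows and columns:

```python
import numpy as np

b = np.arange(1, 16, 1).reshape(3, 5)

# ravel() shows the underlying ordered list of 15 numbers
print(b.ravel())
# The same 15 numbers viewed as 5 rows of 3 instead of 3 rows of 5
print(b.reshape(5, 3))
```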

# Indexing, Slicing and Iterating

Indexing, slicing and iterating are extremely important for data manipulation and analysis because these techniques allow us to select data based on conditions, and copy or update data.

## Indexing

First we are going to look at integer indexing. A one-dimensional array works in a similar way to a list:
to get an element in a one-dimensional array, we simply use the offset index.

In [None]:

a = np.array([1,3,5,7])
a[2]

In [None]:
# For a multidimensional array, we need to use integer array indexing. Let's create a new multidimensional array
a = np.array([[1,2], [3, 4], [5, 6]])
a

If we want to select one particular element, we can do so by entering the index, which consists of two
integers, the first being the row and the second the column

In [None]:

a[1,1] # remember in python we start at 0!

If we want to get multiple elements,
for example 1, 4, and 6, and put them into a one-dimensional array,
we can pass the indexed elements directly to the array function

In [None]:
np.array([a[0, 0], a[1, 1], a[2, 1]])

We can also do that by using another form of array indexing, which essentially "zips" the first list and the
second list up

In [None]:

print(a[[0, 1, 2], [0, 1, 1]])
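
To see the "zip" behaviour spelled out, this sketch picks the same elements one pair at a time with Python's built-in `zip` and compares that with the indexing form. The names `rows`, `cols`, and `picked_one_by_one` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
rows = [0, 1, 2]
cols = [0, 1, 1]

# Pair each row index with the column index at the same position
picked_one_by_one = [int(a[r, c]) for r, c in zip(rows, cols)]
print(picked_one_by_one)

# The indexing form does the same pairing in a single step
print(a[rows, cols])
```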

## Boolean Indexing

Boolean indexing allows us to select arbitrary elements based on conditions. For example, in the matrix we
just talked about, we want to find elements that are greater than 5, so we set up the condition a > 5

In [None]:

print(a >5)
# This returns a boolean array showing whether the value at each index is greater than 5

We can then place this array of booleans like a mask over the original array to return a one-dimensional
array containing only the elements where the mask is True.

In [None]:
print(a[a>5])
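
Masks can be combined, and they can be used to assign values as well as to select them. A brief sketch on the same matrix, where `capped` is a made-up name for a copy that gets modified:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); each condition needs parentheses
print(a[(a > 2) & (a % 2 == 0)])

# A mask can also pick the elements to overwrite
capped = a.copy()
capped[capped > 5] = 5
print(capped)
```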

This functionality is essential in the pandas toolkit, as we will see later.

## Slicing

Slicing is a way to create a sub-array based on the original array. For one-dimensional arrays, slicing 
works in similar ways to a list. To slice, we use the : sign. For instance, if we put :3 in the indexing
brackets, we get elements from index 0 to index 3 (excluding index 3)

In [None]:

a = np.array([0,1,2,3,4,5])
print(a[:3])

In [None]:
# By putting 2:4 in the bracket, we get elements from index 2 to index 4 (excluding index 4)
print(a[2:4])

In [None]:
# For multi-dimensional arrays, it works similarly. Let's see an example
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
a

First, if we put one argument in the brackets, for example a[:2], then we get all the elements from the
first row (index 0) and the second row (index 1)

In [None]:

a[:2]

If we add another argument to the array, for example a[:2, 1:3], we get the first two rows but then the
second and third column values only

In [None]:
a[:2, 1:3]

So, in multidimensional arrays, the first argument is for selecting rows, and the second argument is for 
selecting columns
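
Two more patterns follow from the same rule. In this sketch, a bare colon keeps a whole dimension and a third slice value acts as a step:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# A bare colon keeps every row; the integer picks a single column
print(a[:, 1])
# A third slice value is a step: every other column of the last row
print(a[-1, ::2])
```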

It is important to realize that a slice of an array is a view into the same data, not a copy. The sub-array
refers to the original array's memory, so modifying the sub-array will also modify the original array

Here we will change the element at position [0, 0] of the sub-array, which is 2, to 50. We can then see that
the corresponding value in the original array, at position [0, 1], has changed to 50 as well

In [None]:
sub_array = a[:2, 1:3]
print("sub array index [0,0] value before change:", sub_array[0,0])
sub_array[0,0] = 50
print("sub array index [0,0] value after change:", sub_array[0,0])
print("original array index [0,1] value after change:", a[0,1])
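
When you want a sub-array that can be changed without touching the original, one option is to call `copy()` on the slice. The sketch below uses `np.shares_memory` to show the difference; `view` and `independent` are made-up names.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                # shares memory with a
independent = a[:2, 1:3].copy()  # owns its own data

print(np.shares_memory(a, view))
print(np.shares_memory(a, independent))

# Changing the copy leaves the original array untouched
independent[0, 0] = 50
print(a[0, 1])
```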

So that's a quick tour of numpy, the core scientific computing library in python, which serves as a basis for pandas.