# Section 2.3 | Getting Started with NumPy

## Why NumPy?

NumPy (often pronounced "Num-pie") is a widely used module that is optimized to efficiently execute a variety of mathematical operations. NumPy will be extremely useful for your budding career as an astronomer because of everything it can help you do: read in and manipulate large amounts of data, quickly do simple or complex math, and much more. While using NumPy, you will be working with objects called __"arrays"__. Arrays are similar to lists, with some key differences. Let's go over the differences between arrays and lists.


## Lists vs Arrays; Why use arrays instead of lists?
The most important thing about arrays is that they allow you to do math operations on the whole array at once, as well as on individual elements. We will demonstrate this utility later on! Below are some key similarities and differences between arrays and lists.

__Similarities:__

> Both store multiple pieces of information <br>
> You can store different types of information in them (floats, strings, etc.), although NumPy converts all the elements of a single array to one shared data type <br>
> Both are indexed and iterated over identically to one another <br>

__Differences__:

> Arrays have to be created using NumPy, whereas lists are built into Python <br>
> You can do math on numerical elements within arrays, but __not__ lists <br>

You may recall that in [Module 2: Section 1](https://github.com/bueno646/CIERA-HS-Program-2021/blob/master/IDEASpy-Mike-Updates/Module_2/Section_1.ipynb) we likened lists to a row of lockers. Since both store data, but arrays allow you to do math on numerical elements, we will think of arrays as turbo charged lists.

## NumPy Arrays Basics
At the start of a Jupyter notebook, a Python interactive session, or any Python script, __you must always first import the package as is done below__.

#### Importing NumPy

In [1]:
import numpy as np

As with other modules, it's wise to import NumPy under a distinct name, such as np (you'll see this used frequently), to avoid confusing its functions and methods with others from the math module, or with Python's built-in functions.
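To illustrate why the np prefix matters, here is a minimal sketch comparing the square-root function from Python's math module with the one from NumPy. The variable names (values, roots) are made up for this example.

```python
import math
import numpy as np

# math.sqrt works on one number at a time
print(math.sqrt(16))          # 4.0

# np.sqrt works on a single number or on every element of an array
values = np.array([1, 4, 9, 16])
roots = np.sqrt(values)
print(roots.tolist())         # [1.0, 2.0, 3.0, 4.0]
```

Because each function is called through its module name (math. or np.), there is never any doubt about which version of sqrt is being used.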



### Lists vs Arrays revisited

In section 1.2 of this notebook, we mentioned two differences between arrays and lists. These differences are important to consider when thinking about NumPy's utility. As you do more complex math and work with larger amounts of data, NumPy will likely become a familiar tool because of the wide range of mathematical operations it can perform and how quickly it performs them.

__Differences__:

> Arrays have to be created using NumPy, whereas lists are built into Python <br>
> You can do math on numerical elements within arrays, but __not__ lists <br>



Let's take a look at these differences in the code below.

#### Arrays have to be created using NumPy
To create a simple one-dimensional array, pass a list (remember Python lists: x = [1, 2, 3, 4]) as the argument to "np.array", as in the example in the cell below.

In [None]:
example_array = np.array([1, 2, 3, 4])

#### You can do math on numerical elements within arrays, but __not__ lists
Let's say you have a list of planetary radii (for bodies within the solar system, in meters) and you want to quickly calculate the surface area of each planet. As you may recall, the surface area $A$ of a sphere of radius $R$ is $A = 4\pi R^{2}$.

Let's take a look at the code below to see what happens if we try to square the __list__ of planetary radii and then multiply it by $4\pi$ to get a __list__ of areas.

In [10]:
#define list of radii and planet names
list_of_radii   = [71492000, 60268000, 25559000, 24764000, 6378000, 6052000, 3396000, 2439000, 1195000]
list_of_planets = ['jupiter', 'saturn', 'uranus', 'neptune', 'earth', 'venus', 'mars', 'mercury', 'pluto']


##### Let's try squaring the radii (before multiplying by $4\pi$)

In [12]:
# recall that we use ** to raise a number or object to a power
list_of_radii**2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

##### That didn't work
Let's try using an __array__ of planetary radii instead. <br>
Pay close attention to the comments in the cell below describing the code

In [2]:
# create array here
array_of_radii   = np.array([71492000, 60268000, 25559000, 24764000,
                             6378000, 6052000, 3396000, 2439000, 1195000])

# square the whole array
radii_squared    = array_of_radii**2

# Print the squared radii
print(radii_squared)
print()

# Let's print the square of the first element in "array_of_radii" to make sure our code works as expected
print(array_of_radii[0]**2)
print(radii_squared[0])

print("if the last two numbers are the same, our code worked as expected")

[5111106064000000 3632231824000000  653262481000000  613255696000000
   40678884000000   36626704000000   11532816000000    5948721000000
    1428025000000]

5111106064000000
5111106064000000
if the last two numbers are the same, our code worked as expected
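To finish the calculation, the squared radii still need to be multiplied by $4\pi$. The sketch below is one way this could look; the names radii_m, surface_areas, and list_areas are made up for illustration. It assumes the radii are stored as floats (dtype=float), which avoids integer overflow on systems where NumPy's default integer is only 32 bits wide.

```python
import math
import numpy as np

planet_names = ['jupiter', 'saturn', 'uranus', 'neptune', 'earth',
                'venus', 'mars', 'mercury', 'pluto']
list_of_radii = [71492000, 60268000, 25559000, 24764000,
                 6378000, 6052000, 3396000, 2439000, 1195000]

# Store the radii (in meters) as floats to keep the large squares safe from overflow
radii_m = np.array(list_of_radii, dtype=float)

# A = 4 * pi * R^2, applied to every radius in a single line
surface_areas = 4 * np.pi * radii_m**2

for name, area in zip(planet_names, surface_areas):
    print(f"{name}: {area:.3e} square meters")

# With a plain list, the same calculation needs a loop or a list comprehension
list_areas = [4 * math.pi * r**2 for r in list_of_radii]
```

Both approaches give the same numbers, but the array version expresses the formula directly without writing out a loop.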


###### __Note__: You can create an array from an existing list stored in a variable with np.array( ), as done below

In [3]:
array_of_radii_note   = np.array(list_of_radii)

print(array_of_radii_note)
print("below is the same array, created earlier by passing the list directly to np.array")
print(array_of_radii)

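__Note__: the cell above relies on list_of_radii, which is defined in the earlier cell containing the list of radii. If you see "NameError: name 'list_of_radii' is not defined", run that earlier cell first and then rerun this one.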

## Checking the length, shape, or size of your Array
When we covered lists, we used Python's "len" function as follows:

> example_list = ["pear","apple","cherry"]<br>
> print(len(example_list))

Arrays have a variety of built in tools, called attributes. Array attributes allow you to extract information about the array itself. In the code block below, we will see the attribute needed to get information about the shape of your array - which will tell you the length and size of your array.

In [3]:
example_array = np.array([1, 2, 3, 4])
example_array.shape

(4,)

This array attribute is telling you that the array you created is a one-dimensional array of length 4.

We can see the format (__or "syntax"__) for calling attributes above: Name_of_your_array.attribute

There are two other array attributes we will go over:

> .size - reports the number of elements in an array <br>
> .dtype - reports the data-type of the array’s elements.

Let's look at examples for these in the code cells below!

#### Array Attributes: Size & Dtype

In [8]:
example_array = np.array([1, 2, 3, 4])
print("The size of this array is",example_array.size)
print()
print("The data type of the elements in this array is", example_array.dtype)

The size of this array is 4

The data type of the elements in this array is int64


In the example above, the size attribute told us the number of elements in our array: 4. In line 4, the dtype attribute told us that our array is composed of integers. NumPy automatically determines the dtype from the values contained in the array.
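As a small sketch of how this automatic choice works, the two arrays below differ by a single decimal value. The variable names are made up for this example, and the exact integer type reported can depend on your system.

```python
import numpy as np

all_integers = np.array([1, 2, 3, 4])
has_a_decimal = np.array([1, 2, 3.5, 4])

# Only whole numbers: NumPy picks an integer type (commonly int64)
print(all_integers.dtype)

# One decimal value is enough for NumPy to store every element as a float
print(has_a_decimal.dtype)    # float64
```

An array always has a single dtype, so NumPy picks one that can represent every value you gave it.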



#### Creating an Array with a specific data type
You can set the data type manually by passing a "parameter" when you create the array, as done in the example below. A parameter is a value you pass into a function to control what it does, whereas an attribute reports information about an object that already exists. In the cell below, passing "dtype=float" into "np.array" stores the elements as floating-point numbers instead of integers. (Older tutorials use "np.float" here; that alias was removed in NumPy 1.24, so use "float" or "np.float64" instead.)

We won't cover all the other optional arguments, parameters, or attributes that arrays can take, but the following links may be helpful if you want to learn more:

> __[Numpy Array Arguments and Parameters](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html)__ <br>
> __[Numpy Array Attributes](https://numpy.org/doc/stable/reference/arrays.ndarray.html#array-attributes)__ (search for the "array attributes" header)


In [14]:
example_list = np.array([1, 2, 3, 4], dtype=float)

In the cell below, do the following

> Check the dtype again <br>
> Print out the array in the cell below

What do you think is different from the example under 1.4.0.1?

__your answer here:__



In [None]:
example_list = np.array([1, 2, 3, 4], dtype=float)

# Your code here





#### Common Mistake when Making an Array 

A common mistake to avoid is calling np.array with multiple numeric arguments, rather than providing a single list of numbers as an argument:

> example_array = np.array(1, 2, 3, 4)               # __Incorrect__

> example_array = np.array([1, 2, 3, 4])             # __Correct__

__note:__ In the incorrect example, there are __missing square brackets__ to let numpy know 1,2,3,4 is a list [ 1,2,3,4].
In the __correct__ example, there are brackets around 1,2,3,4, which lets numpy know it is a list.

Try out the incorrect way in the cell below and see what happens.

In [9]:
example_array = np.array(1, 2, 3, 4)

TypeError: array() takes from 1 to 2 positional arguments but 4 were given

You can combine multiple lists to create a two-dimensional array by passing a sequence of lists to np.array. In the example below, the outer parentheses belong to the function call, and the inner parentheses group the lists, separated by commas, into a single sequence (everything within the inner parentheses makes up the contents of the array):

In [4]:
b = np.array(([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))

What is the shape of this array? In the cell below, try creating an array containing 4 rows and 2 columns (shape (4,2)).

If you have Python lists defined, you can also convert them into a NumPy array like this:

In [5]:
list1 = [1, 2, 3, 4, 5]
list2 = [6, 7, 8, 9, 10]
b = np.array((list1, list2))

Again, note the additional inner parentheses required when using a __sequence__ of lists.

### Indexing and Slicing

To reference elements of a simple one-dimensional array, you can use syntax just like you used for Python lists:

In [6]:
a = np.array([2, 4, 6, 8])
print(a[2])                # This should print out the third array element (6)


6


When you have two-dimensional arrays, you use one index per axis, separated by a comma:

In [7]:
b = np.array(([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
print(b[1,4])          # This should print out the number 10

10


The first index (here, 1) indicates which sequence or vector we are referencing, and the second index (here, 4) indicates which element in that vector. A three-dimensional array would require three indices to access a single element, and so on. In this course, we won't go beyond 2-D arrays, but if you want to learn more about working with N-dimensional arrays, check out this __[tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html)__.

What happens when you list just one index when referencing your two-dimensional array? 
Try it out in the cell below:

Array slicing is a bit trickier. The general syntax for a slice is 
> array[start:stop:step, start:stop:step, ...]

depending on the dimensions of the array. Any or all of the values start, stop, and step may be left out (and if step is left out the colon in front of it may also be left out). Again, this syntax works similarly to Python lists, except now we may have more than one axis to reference. You may need a bit of practice to get used to how array slicing works, so here we go.
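As a warm-up before the practice problems, here is a short sketch of a few slices on a small 3x4 array. The array name grid and its values are made up for this example; the tens digit of each number is its row and the ones digit is its column, which makes the results easy to check by eye.

```python
import numpy as np

grid = np.array([[ 0,  1,  2,  3],
                 [10, 11, 12, 13],
                 [20, 21, 22, 23]])

first_row = grid[0, :]              # row 0, every column
second_column = grid[:, 1]          # every row, column 1
top_right_block = grid[0:2, 2:4]    # rows 0 and 1, columns 2 and 3
every_other_row = grid[::2, :]      # rows 0 and 2 (step of 2), every column

print(first_row.tolist())           # [0, 1, 2, 3]
print(second_column.tolist())       # [1, 11, 21]
print(top_right_block.tolist())     # [[2, 3], [12, 13]]
print(every_other_row.tolist())     # [[0, 1, 2, 3], [20, 21, 22, 23]]
```

Note that, just like with lists, the stop value of a slice is not included in the result.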



## Practice

Try out the following in the cell below:

> 1. Create any 4x5 array and assign it to the variable "a"<br>
> 2. Do you know what a[1] will equal? Print to check.<br>
> 3. Knowing the basic syntax for slicing, array[start:stop:step, start:stop:step, ...], can you predict the output of these slices? Take a guess first, then print each of the following to check.<br>
>>a[:, 3]<br>
>>a[1:4, 2:5]<br>
>>a[1:, 2]<br>
>>a[::2, ::-1]<br>

## Functions for Generating Arrays

Sometimes you know you want to create an array of a certain size, but you don't yet know the numbers that will fill it. Hence, NumPy offers several functions to create arrays with initial placeholder content (such as all zeros). Take a look at the output of the two examples below by assigning each line to a variable and printing it in the cell below.

>np.zeros((3,4))    # creates an array of shape (3,4) filled with zeros<br>
>np.ones((5,2))     # creates an array of shape (5,2) filled with ones<br>

Sometimes you'll want to create an array with a sequence of automatically generated numbers. This can be useful for plotting, as well as for many other applications. 

One way to do this is with the very useful NumPy function linspace. Most often, you'll want to provide three arguments (but there are additional arguments you can include): 
>linspace(min, max, num) 

The arguments min and max specify the range of linear space you want to include (both endpoints are included by default), and num specifies how many numbers you want to select over that space. If you leave out the third argument, you'll get the default of 50 points. The code snippet below shows how it works.

In [8]:
np.linspace(0,10)                      # 50 (default) linearly-spaced numbers from 0 to 10
np.linspace(0,10,200)                  # With 3 arguments, 200 linearly-spaced numbers are generated
x = np.linspace(0, 2*np.pi, 100)       # This is helpful, for example, if you wanted to plot the sin( ) function

Note that since we've imported NumPy previously, we can reference the value of pi with np.pi. 
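To make the behavior of linspace concrete, here is a minimal sketch using only 5 points so that the whole result is easy to read. The variable names are made up for this example.

```python
import numpy as np

# 5 evenly spaced numbers from 0 to 10, with both endpoints included
points = np.linspace(0, 10, 5)
print(points.tolist())            # [0.0, 2.5, 5.0, 7.5, 10.0]

# Leaving out the third argument gives the default of 50 points
default_points = np.linspace(0, 10)
print(default_points.size)        # 50
```

Since both endpoints are included, 5 points divide the range into 4 equal steps of 2.5.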

It's very common to need to generate a bunch of random numbers, whether integers or floats. Again, NumPy has functions for that, too. 

> np.random.randint(5,100,20) &nbsp; &nbsp; # 1D array of 20 integers drawn uniformly over range [5,100) (excluding 100)<br>

> np.random.randint(-10,10,(3,5)) &nbsp; &nbsp; # array of shape (3,5) with integers drawn uniformly over range [-10,10) (excluding 10)<br>

> np.random.random(25) &nbsp; &nbsp; # 1D array of 25 floats drawn uniformly from [0,1) (excluding 1)<br>

> np.random.random((3,5)) &nbsp; &nbsp; # array of shape (3,5) with floats drawn uniformly from [0,1) (excluding 1)<br>

> np.random.uniform(5,20,100)    # array of 100 floats drawn from range [5,20) (excluding 20) <br>
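The sketch below shows one way to check that these functions respect the ranges and shapes described above, using a simulated six-sided die as a made-up example. The call to np.random.seed is an addition for this illustration: it fixes the starting point of the random number generator so that the same "random" numbers come out every time the cell is run.

```python
import numpy as np

np.random.seed(42)    # optional: makes the random numbers repeatable

# Ten rolls of a six-sided die: integers from 1 up to (but not including) 7
dice_rolls = np.random.randint(1, 7, 10)
print(dice_rolls.min() >= 1, dice_rolls.max() <= 6)    # True True

# A 3x5 table of floats between 0 (included) and 1 (excluded)
random_table = np.random.random((3, 5))
print(random_table.shape)                              # (3, 5)

# 100 floats between 5 (included) and 20 (excluded)
random_floats = np.random.uniform(5, 20, 100)
print(random_floats.size)                              # 100
```

Remember that the upper limit is always excluded, which is why the die uses 7 rather than 6 as its upper limit.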

### Basic Array Operations

As a final note, NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called universal functions; they operate elementwise on an array, producing an array as output. Similarly, basic arithmetic operations such as addition, subtraction, multiplication, division, and exponentiation act elementwise on arrays. Please note that array multiplication in NumPy is not matrix multiplication (aka the dot product), but this can be done via the dot function. For two arrays a and b, these operations work as follows:

> np.sin(a)           &nbsp; &nbsp; &nbsp; &nbsp; # elementwise computation of the sin of each element<br>
> a+b                 &nbsp; &nbsp; &nbsp; &nbsp; # elementwise addition of the arrays<br>
> a&ast;&ast;3               &nbsp; &nbsp; &nbsp; &nbsp; # elementwise exponentiation of each number in the array<br>
> a&ast;b                &nbsp; &nbsp; &nbsp; &nbsp; # elementwise multiplication of the elements of the array<br>
> np.dot(a,b)         &nbsp; &nbsp; &nbsp; &nbsp; # matrix multiplication, or dot product, of the arrays<br>
> a.sum()             &nbsp; &nbsp; &nbsp; &nbsp; # returns the sum of all elements of the array<br>
> (a&ast;b).sum()         &nbsp; &nbsp; &nbsp; &nbsp; # in one step, multiply the elements of two arrays, then sum the result<br>
> a.min()             &nbsp; &nbsp; &nbsp; &nbsp; # returns the min value of the array<br>
> a.max()             &nbsp; &nbsp; &nbsp; &nbsp; # returns the max value of the array<br>
> a.argmax()          &nbsp; &nbsp; &nbsp; &nbsp; # returns the index of the max value of the array<br>
> a.std()             &nbsp; &nbsp; &nbsp; &nbsp; # returns the standard deviation of the array of numbers<br>
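Here is a short worked sketch of several of these operations on two small one-dimensional arrays, chosen so that the results can be checked by hand. The values of a and b are made up for this example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print((a + b).tolist())       # [5, 7, 9]    elementwise addition
print((a * b).tolist())       # [4, 10, 18]  elementwise multiplication
print((a ** 2).tolist())      # [1, 4, 9]    elementwise exponentiation

# The dot product multiplies matching elements and then adds them up:
# 1*4 + 2*5 + 3*6 = 32
print(np.dot(a, b))           # 32
print((a * b).sum())          # 32

print(a.max(), a.argmax())    # 3 2  (the largest value, 3, sits at index 2)
```

For one-dimensional arrays of the same length, np.dot(a, b) and (a&ast;b).sum() give the same number, which is a handy way to remember the difference between elementwise multiplication and the dot product.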

## Practice

In the cell below, generate an array of shape (5,4) filled with random data and perform each of the operations listed above.

With NumPy being as powerful as it is, we recommend that you check out the NumPy __[user guide](https://docs.scipy.org/doc/numpy/user/index.html#user)__ for a more thorough introduction. Google can often help you find what you are looking for too!

## Takeaways

> - There are many different ways to create arrays: np.array, np.zeros, np.ones, np.linspace, np.random.randint, np.random.random, and several others.<br>
> - Indexing and slicing arrays works very much the same way as it does for Python lists.<br>
> - There are tons of useful functions for operating on arrays, including doing element-wise operations, computing sums or other array-wide quantities, and more complex operations, like computing the dot product.<br>