---
# Datatypes

In computer programming, a data type is a classification identifying one of various types that data
can have. 

The most common data types we will see in this class are:

* **Integers** (`int`): Integers are whole numbers (negative, zero, or positive): ... -3, -2, -1, 0, 1, 2, 3, 4, ...

* **Floating Point** (`float`): Floating-point values are numbers with a decimal point: 1.2, 34.98, -67.23354435, ...

  * **Scientific Notation** - Floating point values can also be expressed in scientific notation: `1e3 = 1000`


* **Booleans** (`bool`): Boolean types can have only one of two values: `True` or `False`. In many languages 0 is considered `False`, and any other value is considered `True`.

* **Strings** (`str`): Strings are composed of characters: `'a'`, `'spam'`, `'spam spam eggs and spam'`. Quotes (`'` or `"`) are used to specify a string. For example, `'12'` refers to the string, not the integer.

In [None]:
my_var_a = 1
my_var_b = 2.3e3
my_var_c = True
my_var_d = 'Spam'
my_var_e = '4.5'

### You can use the command `type()` to see what your datatype is

In [None]:
type(my_var_a), type(my_var_b), type(my_var_c), type(my_var_d), type(my_var_e)

### Python is pretty good at figuring out what you need if you use different datatypes

In [None]:
my_var_a + my_var_b

In [None]:
type(my_var_a + my_var_b)

In [None]:
my_var_a + my_var_c    # True = 1

In [None]:
type(my_var_a + my_var_c)

### However, trying to do math with `str` is always problematic

In [None]:
my_var_b + my_var_e
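Adding a `float` to a `str` raises a `TypeError`. Here is a minimal sketch of what happens and one way around it. The two variables are redefined so the cell runs on its own, and `failed` and `total` are just illustrative names:

```python
my_var_b = 2.3e3    # float
my_var_e = '4.5'    # str that happens to look like a number

try:
    my_var_b + my_var_e
    failed = False
except TypeError as err:
    # Python will not guess whether you meant addition or concatenation
    failed = True
    print("TypeError:", err)

# Convert the string to a float first, then the math works
total = my_var_b + float(my_var_e)
print(total)
```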

### You can convert datatypes using `float()`, `str()`, `int()`, `bool()`

In [None]:
bool(my_var_b)

In [None]:
str(my_var_b)

In [None]:
int(my_var_b)

### `str()` math is strange ...

In [None]:
str(my_var_b) + my_var_e
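With two strings, `+` means "join end to end" (concatenation), not addition. The sketch below compares gluing the strings together with converting them to numbers first. The variables are redefined so it stands alone, and `glued` and `added` are illustrative names:

```python
my_var_b = 2.3e3
my_var_e = '4.5'

glued = str(my_var_b) + my_var_e                  # '2300.0' followed by '4.5'
added = float(str(my_var_b)) + float(my_var_e)    # numbers again, so real addition

print(glued)
print(added)
```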

---

# `NumPy` (Numerical Python)

## The fundamental package for scientific computing with Python.

### Load the numpy library:

In [None]:
import numpy as np

## To use `numpy` commands, put a `np.` in front of the command

- ${\pi}$ ⭢ `np.pi`
- $\sin({\pi})$ ⭢ `np.sin(np.pi)`
- $\sqrt{2}$ ⭢ `np.sqrt(2)`
- `np.log(), np.exp(), np.mean()`

## Here is a link to all [numpy Math functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html) and [Statistical functions](https://numpy.org/doc/stable/reference/routines.statistics.html).


----
## My most common mistake!

* #### Many tools and languages (e.g. MATLAB, R, spreadsheets) use the caret `^` for exponentiation (e.g. `2^4 = 16`).
* ### **Python does not do this!** 
* #### Python uses the `**` for exponentiation (e.g. `2 ** 4 = 16`)

In [None]:
2 ** 4

#### In Python the caret `^` is the [bitwise XOR operator](https://wiki.python.org/moin/BitwiseOperators). It works on integers, not floats, and is very likely NOT what you wanted!

In [None]:
2 ^ 4
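To see why `2 ^ 4` does not give 16, it helps to look at the numbers in binary. A small sketch using only built-in Python (`xor_result` and `power_result` are illustrative names):

```python
# XOR compares the binary digits of the two integers:
#   2 = 0b010
#   4 = 0b100
# A result bit is 1 only where the two input bits differ.
xor_result = 2 ^ 4
power_result = 2 ** 4

print(bin(2), bin(4), bin(xor_result))
print(xor_result, power_result)
```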

#### You can also use the numpy `power()` function - `np.power(x, y) = x ** y`

In [None]:
np.power(2,4)

### Somewhat related ...

The following three expressions are equivalent ways to calculate: $\large e^{3.8}$

In [None]:
np.e ** 3.8

In [None]:
np.power(np.e, 3.8)

In [None]:
np.exp(3.8)

### Use: `np.exp(3.8)` - it is so much easier to debug!

---
# Python Operator Precedence

### In Python, expressions are evaluated according to the order of operator precedence.

### For standard mathematical expressions the order from highest to lowest precedence is:

* Parentheses `()`
* Exponentiation `**`
* Multiplication `*` and Division `/` (same precedence)
* Addition `+` and Subtraction `-` (same precedence)

### Operators with the same precedence are evaluated left to right (except `**`, which groups right to left) - [Full List of Operator Precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence)
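A few quick checks of how precedence plays out in practice. This is a sketch with built-in operators only, and the single-letter names are just for illustration. Note that `*` and `/` share one precedence level, as do `+` and `-`, and that `**` is the exception that groups right to left:

```python
a = 2 + 3 * 4        # multiplication first: 2 + 12
b = (2 + 3) * 4      # parentheses first: 5 * 4
c = 8 / 4 * 2        # same precedence, left to right: (8 / 4) * 2
d = 10 - 4 + 2       # same precedence, left to right: (10 - 4) + 2
e = 2 ** 3 ** 2      # ** groups right to left: 2 ** (3 ** 2)
f = -2 ** 2          # ** binds tighter than the minus sign: -(2 ** 2)

print(a, b, c, d, e, f)
```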

---

## An Example


$$ \Large
\frac{1329}{\sqrt{\mathrm{3.4}}} = 720.75
$$



### First Try - Not what I expected

In [None]:
1329 / 3.4 ** 1/2

#### This is evaluated as: `1329 / (3.4 ** 1) / 2`

* `(3.4 ** 1) = 3.4`
* `1329 / 3.4 = 390.88`
* `390.88 / 2 = 195.44`
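The evaluation order above can be checked by writing out the parentheses explicitly. A quick sketch with built-in operators only (`first_try`, `spelled_out`, and `intended` are illustrative names):

```python
first_try = 1329 / 3.4 ** 1/2
spelled_out = (1329 / (3.4 ** 1)) / 2    # how Python groups the first try
intended = 1329 / (3.4 ** (1/2))         # square root in the denominator

print(first_try, spelled_out, intended)
```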

### This is probably what you wanted:

* Make the 1/2 fraction evaluate first

In [None]:
1329 / 3.4 ** (1/2)

### Better ways to do this:
 - Use one of these forms - much easier to read and debug

In [None]:
1329 / np.sqrt(3.4)

In [None]:
1329 / np.power(3.4, 1/2)

----

# Arrays - Collections of datatypes

### Our basic array will be the NumPy array

* Each element of the array has a **Value**
* The *position* of each **Value** is called its **Index**

![Image of Index](https://uwashington-astro300.github.io/A300_images/PosIndex_sm.png)

In [None]:
my_array = np.array([7, 4, 8, 5, 7, 3])

In [None]:
my_array

## Indexing

In [None]:
my_array[0]    # The Value at Index = 0

In [None]:
my_array[-1]    # The last Value in the array

![Image of Index](https://uwashington-astro300.github.io/A300_images/NegIndex_sm.png)

## Slices

`x[start:stop:step]`
 
- `start` is the first Index that you want [default = first element]
- `stop`  is the first Index that you **do not** want [default = the end of the array, so the last element is included]
- `step`  defines size of `step` and whether you are moving forwards (positive) or backwards (negative) [default = 1]

In [None]:
my_array

In [None]:
my_array[0:4]           # first 4 items

In [None]:
my_array[:4]            # same

In [None]:
my_array[0:4:2]         # every 2nd item of the first four (step = 2)

In [None]:
my_array[::-1]          # Reverse the array
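A few more slices that show the `start`, `stop`, and `step` defaults at work. This is a sketch that rebuilds the same example array so it runs on its own:

```python
import numpy as np

my_array = np.array([7, 4, 8, 5, 7, 3])

print(my_array[2:5])     # Index 2, 3, 4 (Index 5 is not included)
print(my_array[2:])      # Index 2 through the end of the array
print(my_array[-3:])     # the last 3 items
print(my_array[1::2])    # every 2nd item, starting at Index 1
```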

## Number of elements in an array

In [None]:
np.size(my_array)

---
# Sorting

- By default sorting will be from small to large
- Can reverse by using the `[::-1]` slice

In [None]:
my_array = np.array([7, 4, 8, 5, 7, 3])

In [None]:
np.sort(my_array)

In [None]:
np.sort(my_array)[::-1]
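One detail worth knowing: `np.sort()` returns a sorted copy and leaves the original array alone, while the array's own `.sort()` method sorts it in place. A minimal sketch (`sorted_copy` and `in_place` are illustrative names):

```python
import numpy as np

my_array = np.array([7, 4, 8, 5, 7, 3])

sorted_copy = np.sort(my_array)   # new, sorted array
print(sorted_copy)
print(my_array)                   # original order is untouched

in_place = my_array.copy()
in_place.sort()                   # sorts this array itself; returns None
print(in_place)
```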

## A very common task is to find the largest (or smallest) values in an array

 - You can do this easily with a sort and slice
   - The smallest value will always be the first value of a sorted array
   - The largest value will be the last

In [None]:
np.sort(my_array)[0]      # smallest value of an array

In [None]:
np.sort(my_array)[0:3]    # smallest 3 values of an array

In [None]:
np.sort(my_array)[-1]     # largest value of an array

In [None]:
np.sort(my_array)[-3:]    # largest 3 values of an array

### If you only want the largest (or smallest) value you can use `np.min()` and `np.max()`

In [None]:
np.min(my_array)

In [None]:
np.max(my_array)

---
 
 ### Another very common task is to find WHERE the largest (or smallest) value is in an array

 - `np.argmax()` gives you the **index** of the (first) maximum value
 - `np.argmin()` gives you the **index** of the (first) minimum value

In [None]:
my_flux = np.array([1, 1, 2, 5, 8, 6, 2, 2, 1])

In [None]:
np.argmax(my_flux)

In [None]:
my_flux[4]

#### A Better way to do this

In [None]:
my_flux[np.argmax(my_flux)]

In [None]:
my_flux[np.argmin(my_flux)]
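The word "first" matters when a value is repeated. In `my_flux` the minimum value appears three times, and `np.argmin()` reports only the first position. If you need every position, one possible approach (a sketch, not the only way) is to compare against the minimum and use `np.flatnonzero()`:

```python
import numpy as np

my_flux = np.array([1, 1, 2, 5, 8, 6, 2, 2, 1])

first_min = np.argmin(my_flux)                          # first Index of the minimum
all_min = np.flatnonzero(my_flux == np.min(my_flux))    # every Index of the minimum

print(first_min)
print(all_min)
```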

#### A pseudo astronomical example:

- The data in `my_flux` represents the measured fluxes at the wavelengths in the array `my_wavelengths`
- Find the wavelength where the flux is at the maximum

In [None]:
my_wavelengths = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50])

In [None]:
my_wavelengths[np.argmax(my_flux)]

---
# Creating Arrays

### NumPy has a wide variety of ways to create arrays: [Array creation routines](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.array-creation.html)
### The most useful ones are:

-  `np.arange(start, stop, step)`    Return evenly spaced values within a given interval (`step` is optional).
-  `np.linspace(start, stop, number of points)`    Return evenly spaced numbers over a specified interval.
-  `np.logspace(start, stop, number of points, base=10)`    Return numbers spaced evenly on a log scale, from `base**start` to `base**stop`. Pass `base` as a keyword argument.

In [None]:
array_two = np.arange(10,20)

array_two

In [None]:
array_three = np.linspace(10,20,7)  # 7 points equally spaced between 10 and 20

array_three

In [None]:
array_four = np.logspace(1, 2, 5, base=10)  # 5 points equally log-spaced between 10**1 and 10**2

array_four
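The three routines treat the `stop` value differently, which is an easy thing to trip over. A small side-by-side sketch (the `with_...` names and the `same` flag are illustrative):

```python
import numpy as np

with_arange = np.arange(10, 20, 2.5)        # stop (20) is NOT included
with_linspace = np.linspace(10, 20, 5)      # stop (20) IS included
with_logspace = np.logspace(1, 2, 5)        # default base is 10

print(with_arange)
print(with_linspace)
print(with_logspace)

# logspace is the same as raising the base to a linspace of exponents
same = np.allclose(with_logspace, 10 ** np.linspace(1, 2, 5))
print(same)
```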

---
# Assignment vs. Comparison

* Assignment: The variable named on the left should now refer to the value on the right
* Comparison: Compare the variable/value on the left with the variable/value on the right and return a `True` or `False`

### Assignments

In [None]:
my_x = 10
my_y = 20
my_z = my_x + my_y

In [None]:
my_x, my_y, my_z

### Comparisons

In [None]:
my_x == my_y

In [None]:
my_x <= my_z
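Comparisons also work on whole NumPy arrays, where they are applied to each element and return an array of `True`/`False` values. A minimal sketch, where `values` is a made-up example array:

```python
import numpy as np

my_x = 10
values = np.array([7, 4, 8, 5, 7, 3])    # hypothetical example array

single = my_x == 10         # one comparison, one True/False
elementwise = values > 5    # one True/False per element

print(single)
print(elementwise)
```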

---

# Creating subsets of data

## Masking - `array[comparison]`

In [None]:
my_array

#### A subset of `my_array` with values greater than 5

In [None]:
my_array[my_array > 5]

#### A subset of `my_array` with values greater than or equal to 5

In [None]:
my_array[my_array >= 5]

#### A subset of `my_array` with values greater than 3 AND less than 6 - (`&`)

In [None]:
my_array[(my_array > 3) & (my_array < 6)]

#### A subset of `my_array` with values less than 4 OR greater than 6 - (`|`)

In [None]:
my_array[(my_array < 4) | (my_array > 6)]

### All together now - Fraction

- Find the fraction of values of `my_array` larger than 5

In [None]:
np.size(my_array[my_array > 5]) / np.size(my_array)
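The same fraction can be computed directly from the mask, because `True` counts as 1 and `False` as 0. The sketch below shows two alternatives next to the version above. It rebuilds the example array so it runs on its own, and `mask`, `by_size`, `by_count`, and `by_mean` are illustrative names:

```python
import numpy as np

my_array = np.array([7, 4, 8, 5, 7, 3])
mask = my_array > 5

by_size = np.size(my_array[mask]) / np.size(my_array)
by_count = np.count_nonzero(mask) / np.size(my_array)
by_mean = np.mean(mask)      # mean of 1s and 0s is the fraction of True values

print(by_size, by_count, by_mean)
```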

---

# Lists vs. Arrays

 - Python's `List` is another way to collect datatypes.
 - `Lists` share many of the same properties as `Arrays`.
 - However, there are some important differences!
 - We will almost always use `Arrays` in this class.

In [None]:
my_list = [1,2,3,4]

In [None]:
type(my_list)

In [None]:
my_array = np.array([1,2,3,4])

In [None]:
type(my_array)

## List and Arrays - Math is different 

In [None]:
my_list * 3

In [None]:
my_array * 3
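The same difference shows up with `+`: lists are joined end to end, while arrays are added element by element. A short sketch comparing both operators side by side:

```python
import numpy as np

my_list = [1, 2, 3, 4]
my_array = np.array([1, 2, 3, 4])

print(my_list * 3)          # repeats the list 3 times
print(my_array * 3)         # multiplies each element by 3

print(my_list + my_list)    # joins the two lists end to end
print(my_array + my_array)  # adds element by element
```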

## Lists can have multiple datatypes

In [None]:
my_other_list = [1, "one", 1.0, True]

In [None]:
my_other_list

In [None]:
type(my_other_list[0]), type(my_other_list[1]), type(my_other_list[2]), type(my_other_list[3])

## Arrays cannot have multiple datatypes

 - All of the values will be converted to the "highest" datatype in the array.
 - `str` > `float` > `int` > `bool`

In [None]:
my_other_array = np.array([1, "one", 1.0, True])

In [None]:
my_other_array

In [None]:
type(my_other_array[0]), type(my_other_array[1]), type(my_other_array[2]), type(my_other_array[3])

In [None]:
my_other_array = np.array([1, 1.0, True])

my_other_array

In [None]:
type(my_other_array[0]), type(my_other_array[1]), type(my_other_array[2])
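Instead of checking each element with `type()`, you can ask the array for its `.dtype`, which reports the single datatype NumPy chose for the whole array. A minimal sketch (the three array names are illustrative, and the exact integer and string widths printed can differ between platforms):

```python
import numpy as np

mixed_with_str = np.array([1, "one", 1.0, True])   # everything becomes a string
mixed_numbers = np.array([1, 1.0, True])           # everything becomes a float
ints_and_bools = np.array([1, True])               # everything becomes an integer

print(mixed_with_str.dtype, mixed_with_str)
print(mixed_numbers.dtype, mixed_numbers)
print(ints_and_bools.dtype, ints_and_bools)
```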

### The fact that numpy arrays can only have one datatype is one of the main reasons they are so fast