In [None]:
import numpy as np

# The NumPy array vs. the built-in Python list

![Memory layout of a NumPy array vs. a Python list](https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png)

- With NumPy, you can specify the exact format (a.k.a. dtype) of the data.
- This formatted data is stored in one contiguous block of memory, *in order*.
- Therefore, NumPy has some clear advantages:
    - Your computer doesn't have to look up the location of every single element separately
    - Mathematical operations work the same way for every element (for example, note that the same mathematical operation can be algorithmically very different for `float` vs `int` dtypes)
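
As a rough illustration of the points above, the sketch below inspects how an array is stored. The array `arr` and its values are made up for this example; `dtype`, `itemsize`, `nbytes`, and `flags` are standard array attributes.

```python
import numpy as np

# Hypothetical example array with an explicitly chosen dtype
arr = np.array([1, 2, 3, 4], dtype=np.int32)

print(arr.dtype)     # the single dtype shared by every element
print(arr.itemsize)  # bytes used per element (4 for int32)
print(arr.nbytes)    # total bytes of data: itemsize * number of elements
print(arr.flags["C_CONTIGUOUS"])  # True when the data sits in one contiguous block
```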

# Array creation methods review (with dtypes in mind)

## `np.arange` (arguments: [start=0,] stop, [step=1])

In [None]:
# np.arange creates int arrays by default if the arguments are ints (5 is an int)
a = np.arange(5)
a

In [None]:
# We can easily convert ints to floats and vice versa with the .astype() method
a.astype(float)

In [None]:
# We also could have just specified dtype=float during construction
np.arange(4, 10, 2, dtype=float)

## `np.zeros`, `np.ones` (only positional argument is shape, a.k.a. array size)

In [None]:
# These create floats by default
np.ones(3), np.zeros(4)

In [None]:
np.ones(3, dtype=int), np.zeros(3, dtype=int)

In [None]:
# They are also commonly used for the boolean dtype, where 1=True and 0=False
np.ones(3, dtype=bool), np.zeros(3, dtype=bool)

## `np.linspace`, `np.logspace` (arguments: start, stop, [num=50])

In [None]:
# np.linspace() generates `num` numbers between (and including) `start` and `stop`
# intermediate numbers are spaced evenly in linear-space - dtype is float by default
np.linspace(1, 10, 10)

In [None]:
# np.logspace() generates `num` numbers between (and including) 10^(`start`) and 10^(`stop`)
# intermediate numbers are spaced evenly in log-space - equivalent to 10 ** np.linspace(...)
np.logspace(0, 1, 10)
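
The equivalence mentioned in the comment above can be checked directly. This is only a small sketch; the variable names are chosen for illustration.

```python
import numpy as np

log_grid = np.logspace(0, 1, 10)
manual_grid = 10 ** np.linspace(0, 1, 10)

# np.allclose compares the two arrays within floating-point tolerance
print(np.allclose(log_grid, manual_grid))
```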

In [None]:
# While you could construct these with ints or bools, it probably wouldn't make much sense.
# But you can change the precision of the floats by specifying
# dtype=np.float16, np.float32, np.float64 (the default), or np.float128 (not available on all platforms)
np.logspace(0, 1, 10, dtype=np.float16)
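
To get a feel for what "precision" means here, the sketch below uses `np.finfo` to compare the float types that are available on all platforms. The variable names are illustrative only.

```python
import numpy as np

# Compare the number of bits and the machine epsilon (relative step near 1.0)
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.bits, info.eps)

# The same value stored at lower precision keeps fewer correct digits
third_16 = np.float16(1) / np.float16(3)
third_64 = np.float64(1) / np.float64(3)
print(third_16, third_64)
```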

## `np.array` (argument: any iterable, e.g. a list)

In [None]:
# If all elements of the list are ints, the array dtype will be int
np.array([1, 2, 3])

In [None]:
# If there are any floats, everything will be converted to floats
np.array([1.0, 2, 3])

In [None]:
# If there are any other Python objects, numerical operations might not work
a = np.array([1, "2.0", set()])
b = np.array(["1", "2.0", None])
print(a.dtype)
print(a == b)  # some operations may work, but there is no performance benefit for object arrays

## `np.random.default_rng()` to generate random numbers

First create a random number generator: `rng = np.random.default_rng()`.
- `rng.uniform(low=0, high=1, size=None)`: Uniform distribution over [low, high)
- `rng.normal(loc=0, scale=1, size=None)`: Normal distribution
- `rng.poisson(lam=1.0, size=None)`: Poisson distribution
- Many more distributions to choose from
- (and in principle, you can transform samples from `rng.uniform` into almost any other distribution you like)
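
The last bullet can be sketched with inverse transform sampling, which is one common way to do this (the list above does not name a specific method). The example turns uniform draws into exponentially distributed ones; the seed, the `rate` value, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
rate = 2.0  # hypothetical rate parameter of the target exponential distribution

# Draw uniform samples on [0, 1)
u = rng.uniform(size=100_000)

# Invert the exponential CDF F(x) = 1 - exp(-rate * x), giving x = -ln(1 - u) / rate
samples = -np.log(1 - u) / rate

# The sample mean should be close to the theoretical mean 1 / rate
print(samples.mean())
```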

In [None]:
# Try changing the bounds and size parameter to generate an array with several elements
rng = np.random.default_rng()
rng.uniform()

In [None]:
# Try changing the loc and scale parameters to change the center/width of the normal distribution
rng.normal()

In [None]:
# Try changing the expected rate `lam` and the size of the output array
rng.poisson()

# Array Methods
Like all other data structures in Python, NumPy arrays are objects. These objects have methods associated with them.   
Recall that methods are just like functions but are associated with an object. Whereas functions take an object as an input and return another object as the output, methods act on the object they are associated with and may alter the object itself. For example, the `array.sum()` **method** returns the sum of all the elements of the array, while the `np.sum()` **function** takes an array as its input and returns the sum of its elements as its output. Both do the same thing but are accessed in different ways. A complete list of the methods and attributes associated with any NumPy array can be found [here](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html).
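
To illustrate the remark that a method may alter the object itself, here is a small sketch comparing the `np.sort()` function with the `.sort()` method. The array `values` is made-up example data.

```python
import numpy as np

values = np.array([3, 1, 2])  # hypothetical example data

# The function returns a new sorted array and leaves the input untouched
sorted_copy = np.sort(values)
print(values, sorted_copy)

# The method sorts the array in place and returns None
result = values.sort()
print(values, result)
```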

In [None]:
pos_1 = np.array([1, 2, 3])
pos_1.sum()

In [None]:
np.sum(pos_1)

# Element-wise comparisons

Operators like `<`, `>`, `<=`, `>=`, `==`, and `!=` work element-wise over NumPy arrays.

In [None]:
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
print(a < b)
print(a == b)

We can save the output "boolean" arrays of True and False values. These arrays can be useful for "masking" or calculating the fraction of values which are True, for example.

In [None]:
a_less_than_b = a < b
print(a_less_than_b)

In [None]:
# We can mask a and b to only show their values where a is less than b
print(a[a_less_than_b])
print(b[a_less_than_b])

In [None]:
# Since True = 1 and False = 0, we can use sum to find how many values in `a` are less than their `b` counterpart
print("There are", np.sum(a_less_than_b), "values of a less than b")

In [None]:
# We can find the opposite of a boolean array
print(a_less_than_b)
print(np.logical_not(a_less_than_b))

## Calculate the *percentage* of values in `a` that are less than their `b` counterpart

In [None]:
## INSERT CODE HERE ##

# Exercise: Generating random numbers and masking

1. Generate an array of length 100,000 where each element is drawn from a Gaussian distribution with
    - population mean $\mu = 4$ and
    - population scatter $\sigma = 2$
2. What is the fraction of elements whose value is less than zero?
3. Use the matplotlib skills you learned yesterday and plot a histogram of your generated values
    - make all positive values a different color from negative values

In [None]:
## INSERT CODE HERE ##

In [None]:
import matplotlib.pyplot as plt
## INSERT PLOTTING CODE HERE ##

# More mathematical functions

In addition to the elementary operations (`+`, `-`, `*`, `/`), NumPy has a host of built-in mathematical functions which also operate element-wise. These functions are generally faster than using loops to calculate the value of the function on each element. Let's see this in action by calculating the $\sin$ of an array using plain Python and also NumPy. We will measure the runtime using the `%%timeit` magic command.

In [None]:
from math import sin

In [None]:
%%timeit

vals = [i for i in range(10000)]  # Create a list of 10,000 integers starting with 0
sin_vals = [sin(i) for i in vals]  # Use a list comprehension to calculate the sin of each integer

**NOTE:** The above way of using a `for` loop to generate a list is called a list comprehension. So this line of code
```python
numbers = [2 * i for i in range(100)]
```
is equivalent to doing
```python
numbers = []
for i in range(100):
    numbers.append(2 * i)
```
Both achieve the same thing, but the former is the more "pythonic" way.

In [None]:
%%timeit

vals = np.arange(10000)
sin_vals = np.sin(vals)

As seen above, we can use the NumPy `sin` function, which takes an array as its input. Almost all common mathematical functions are implemented as native NumPy functions. A complete list can be found [here](https://numpy.org/doc/stable/reference/routines.math.html?highlight=mathematical%20functions).
   
We can see a huge improvement in runtime when using NumPy over barebones Python. This is because NumPy functions like these (a.k.a. `ufuncs`) are implemented in compiled languages such as C and made available to be used in Python. Whenever possible, it is recommended to use NumPy's built-in data structures and functions. Functions implemented this way are also called **vectorized** functions. Vectorization is a theme we will visit many times in this tutorial.    
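
As a quick sanity check on the comparison above, the sketch below confirms that the loop and the vectorized call produce the same values, and times both with the standard-library `timeit` module (which, unlike `%%timeit`, also works outside a notebook). The number of repetitions is arbitrary, and the measured times will depend on your machine.

```python
import timeit
from math import sin

import numpy as np

vals = np.arange(10000)

loop_result = [sin(v) for v in vals.tolist()]  # plain Python, one element at a time
vectorized_result = np.sin(vals)               # one call over the whole array

print(np.allclose(loop_result, vectorized_result))

# Time 20 repetitions of each approach
loop_time = timeit.timeit(lambda: [sin(v) for v in vals.tolist()], number=20)
numpy_time = timeit.timeit(lambda: np.sin(vals), number=20)
print(loop_time, numpy_time)
```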


**NOTE:** Though they might look very similar, Python lists are completely different objects from NumPy arrays. `list` is a native Python data type while the NumPy array is not. List methods like `.append()` do not work on NumPy arrays, and vice versa. Moreover, the elements of a list can each be of a different data type, for example
```python
example_list = [2.0, 1, "A sentence"]
```
is a list which has a `float`, an `int` and a `str` as its elements. A NumPy array cannot mix types this way: all of its elements share a single dtype, so mixed input is converted to one common type.
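
To see what NumPy does with such a list, the sketch below passes `example_list` to `np.array`. The names `example_array` and `object_array` are introduced here only for illustration.

```python
import numpy as np

example_list = [2.0, 1, "A sentence"]
example_array = np.array(example_list)

print(example_array.dtype)  # a Unicode string dtype: the numbers were converted to text
print(example_array)

# Requesting dtype=object keeps the original Python objects instead,
# but gives up the performance benefits of a numeric dtype
object_array = np.array(example_list, dtype=object)
print(object_array.dtype)
```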
  
## Plotting $\sin(x)$

Let's try this out by plotting $\sin(x)$ between $0$ and $2\pi$. The first step is to define a NumPy array for x. (*Hint: Google how to define $\pi$ using NumPy.*)

In [None]:
## INSERT CODE HERE ##

Next, we define the array for $\sin(x)$.

In [None]:
## INSERT CODE HERE ##

Finally, use the matplotlib skills you learned yesterday to plot! 

In [None]:
import matplotlib.pyplot as plt
## INSERT CODE HERE ##

# Plotting the galaxy luminosity function

The number density of galaxies having a given luminosity $(\Phi(L))$ is found to follow the functional form
$$ \Phi(L)=\left( \dfrac{\Phi^*}{L^*} \right) \left( \dfrac{L}{L^*} \right)^{\alpha}10^{\left(-\dfrac{L}{L^*}\right)}$$

Assuming the normalization constant $\left( \dfrac{\Phi^*}{L^*} \right)$ to be unity and $\alpha = -1.5, -1.0 \text{ and} -0.5$,  plot $\log(\Phi(L))$ versus $\dfrac{L}{L^*}$. Follow these steps:

Take a look at the documentation for `np.logspace`. Generate the $x$ axis (i.e. $\dfrac{L}{L^*}$) as a logarithmically spaced grid of 50 data points between $10^{-2}$ and $10$.

In [None]:
## INSERT CODE HERE ##

Use NumPy functions and binary operations on the above array to generate the $y$ axis (i.e. $\log(\Phi(L))$). The function for the base-10 logarithm in NumPy is `np.log10`, whereas the function for the natural logarithm is `np.log`.   

Since we are doing the same operation three times for different input parameters, it is really helpful to define a function. 

In [None]:
def get_log_phi(lum, alpha):
    """
    Function to calculate the log of luminosity function
    
    Arguments:
    lum (array): Values of L/L^star
    alpha (float): Faint end slope parameter
    
    Returns:
    array: The log of the luminosity function
    """
    
    # COMPLETE THESE TWO LINES OF CODE
    phi = 
    log_phi = 

    return log_phi

**NOTE:** The triple-quoted block of text following the function definition is called a docstring. It is good practice to add a docstring whenever you define a function and intend to reuse it.

The following is an example of a very basic docstring. 

```python
def a_generic_function(input1, input2):
    """
    This is what the function does
    
    Arguments:
    input1 (data_type): Meaning of the first input
    input2 (data_type): Meaning of the second input
    
    Returns:
    data_type: Meaning of the return value
    
    """
    
    return something
```

There are multiple formatting schemes for docstrings. The one followed by the Scientific Python community can be found [here](https://docs.scipy.org/doc/numpy/docs/howto_document.html).  
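
For comparison, here is a sketch of a docstring written in the section-based style used by NumPy and much of the Scientific Python community. The function `rescale` is a hypothetical example written for this note; it is not part of NumPy.

```python
import numpy as np


def rescale(values, factor=1.0):
    """
    Multiply every element of an array by a constant factor.

    Parameters
    ----------
    values : array_like
        Input values to rescale.
    factor : float, optional
        Multiplicative factor. Default is 1.0.

    Returns
    -------
    numpy.ndarray
        The rescaled values.
    """
    return np.asarray(values) * factor


print(rescale([1, 2, 3], factor=2.0))
```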

A function's docstring can be accessed using the usual methods of accessing the documentation for any function imported from a library. Access the docstring for `get_log_phi`, the function we defined above:

In [None]:
# COMPLETE THIS LINE OF CODE
get_log_phi

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))


# COMPLETE THESE THREE LINES OF CODE

ax.plot(, , label=r"$\alpha=-1.5$")  # for alpha = -1.5
ax.plot(, , label=r"$\alpha=-1.0$")  # for alpha = -1.0
ax.plot(, , label=r"$\alpha=-0.5$")  # for alpha = -0.5

ax.set_xlabel(r"$\dfrac{L}{L^{\star}}$", fontsize=20)  # Print the x label in LaTeX
ax.set_ylabel(r"$\log(\Phi(L))$", fontsize=20)  # Print the y label in LaTeX
ax.set_xscale("log")  # Set the spacing in the x axis logarithmically
ax.legend()