**<font color='red'>Change the name from assignment_lastname to include YOUR last name</font>**. This will make sure that the assignment doesn't get overwritten when you do a git pull in class. 

# Homework 6 - Plotting Practice

## Due: Nov. 5th

Skills: Plotting, reading in data files

Always start with your import statements!

In [None]:
import numpy as np
import pandas as pd #You need this to read in the data file from the paper
import matplotlib.pyplot as plt #You need this to be able to plot!

#This line is key if you want your plots to show up in the Jupyter notebook!
%matplotlib inline

### For your last coding homework, you will be reading in a data file from the [paper you read for part A](https://iopscience.iop.org/article/10.3847/1538-4357/aa7d07/pdf) of homework 5 and working to recreate some of the plots in Figure 1. Please refer back to the notebook we worked on in class with plotting examples!

Use the **pandas** module to read in the data file that was used to create the plots in Figure 1:

In [None]:
data = pd.read_csv('../data/terrazas.txt', delimiter='\t', comment='#')
data.head()

### (not so) Quick aside on the **pandas** module (semi-optional)

**pandas** is a widely used Python module for reading and manipulating tabular data, e.g. spreadsheets. We may or may not have covered this in class, but optional lesson notebook 8 is on working with data and pandas. To use pandas, we need to first import it by running ```import pandas as pd```, which was done in the first code cell. 

After importing, we can use any of the pandas functions by running ```pd.function_name```, replacing *function_name* with the function to use. An important function in pandas is ```pd.read_csv```, which takes the path to the data file (i.e. spreadsheet) as its first argument and turns that spreadsheet into a pandas *DataFrame*, which is pandas' version of a spreadsheet. 

In the cell above, we run

```data = pd.read_csv('../data/terrazas.txt', delimiter='\t', comment='#')```

which reads in the galaxy data and turns it into a DataFrame. Let's look at the DataFrame by running

```data.head()```
![image-4.png](attachment:image-4.png)

This displays the first 5 rows of the data, which gives you a sense of what's present.
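To see what the `delimiter='\t'` and `comment='#'` arguments do, here is a minimal sketch that reads a tiny tab-separated table from a string instead of from `terrazas.txt`. The column names and numbers are made up for illustration and are not from the paper; `io.StringIO` is a standard-library helper that lets `pd.read_csv` treat a string as if it were a file.

```python
import io

import pandas as pd

# A tiny, made-up tab-separated "file" (NOT the real galaxy data).
# The first line starts with '#', so comment='#' makes pandas skip it.
text = (
    "# made-up example table\n"
    "Name\tM_*\tSFR\n"
    "galaxy_a\t10.5\t0.3\n"
    "galaxy_b\t11.2\t-1.1\n"
    "galaxy_c\t9.8\t0.7\n"
)

# delimiter='\t' splits each line on tab characters
df = pd.read_csv(io.StringIO(text), delimiter='\t', comment='#')

print(df.shape)    # (number of rows, number of columns)
print(df.head(2))  # head(n) shows the first n rows; the default is 5
```

Because the comment line is skipped, the first remaining line (`Name`, `M_*`, `SFR`) becomes the column header.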

#### Accessing data using pandas 

To get a single column (e.g. the name column), you can run

```data['Name']```


You can slice the DataFrame with indices similarly to a numpy array. To get a single row (e.g. the first), you can run 

```data.iloc[0]```

You can also get multiple rows using either a start and stop:

```data.iloc[2:6]``` 

which will return rows with indices 2, 3, 4, 5. Or by using an array of bools:

```
method_is_star = data['Method'] == 'star'
star_rows = data[method_is_star]
```
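Here is a small self-contained sketch of these selection patterns on a made-up DataFrame (the names and methods are invented; only the column name `Method` matches the real table). It also shows that negative positions count from the end, just like with numpy arrays.

```python
import pandas as pd

# Made-up example table, for illustration only
df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'Method': ['star', 'gas', 'star', 'maser', 'CO', 'star', 'RM'],
})

first_row = df.iloc[0]       # a single row by position
middle_rows = df.iloc[2:6]   # positions 2, 3, 4, 5 (the stop value 6 is excluded)
last_two = df.iloc[-2:]      # negative positions count back from the end

# Boolean mask: True where Method is 'star', False everywhere else
method_is_star = df['Method'] == 'star'
star_rows = df[method_is_star]  # keeps only the rows where the mask is True

print(middle_rows)
print(star_rows)
```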

Finally, you may notice that the returned slices look something like this:
![image-3.png](attachment:image-3.png)

When you select a single column (like ```data['M_*']```) or a single row, the object returned is a pd.Series object. I won't say too much more about Series, other than that you can turn a Series into a numpy array by adding ```.values``` to the end of the Series:

```
galaxy_masses = data['M_*'] # this is a series
galaxy_masses_array = galaxy_masses.values
```
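A minimal check of the Series-to-array conversion, using made-up mass values rather than the real `M_*` column. `.values` works, and `.to_numpy()` is the method the pandas documentation currently recommends for the same job.

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame({'M_*': [10.5, 11.2, 9.8]})

galaxy_masses = df['M_*']                   # a pd.Series (values plus an index)
galaxy_masses_array = galaxy_masses.values  # a plain numpy array
same_array = galaxy_masses.to_numpy()       # equivalent, newer-style spelling

print(type(galaxy_masses))
print(type(galaxy_masses_array))
```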

If you would like to see more examples with pandas, I'd recommend this [tutorial](https://www.datacamp.com/tutorial/pandas), which has lots of code with explanations.

### Play around with the data here
To check out what your data file looks like, you can display the first three lines with ```data.head(3)```. Play around to try to display the first ten lines, lines 20-30, and the last three lines (*Hint: to display the last line, you would type data.iloc[-1:]* )

To work with individual columns from a data table, you can select them by name. Execute the cell below to check out only the `'Name'` column from the data table.

In [None]:
data['Name'].values

## Exercises

**1)** Display only the **last** 10 lines of the `'M_BH'` column.

**2)** In the cell below, call the `ax.scatter` function with the correct arguments to make a scatter plot of the stellar mass `'M_*'` and the star formation rate `'SFR'`. (Note: you can plot pd.Series against each other, or you can use ```.values``` to get the numpy arrays)

In [None]:
fig, ax = plt.subplots(figsize=(7,7))


**3)** Copy your code to create the scatter plot from the previous question and add X and Y axis labels. Increase the font size of your axis labels to size 18.

In [None]:
fig, ax = plt.subplots(figsize=(7,7))


**4)** You can also make scatter plots with the `plot` command. Figure out how to make a scatter plot that looks like the one you made with `scatter` using the `plot` command. I suggest checking out this [plot gallery](https://matplotlib.org/stable/gallery/index.html) from the `matplotlib` creators. If you click a plot, you will see code that shows how the plot was made. There is a nice scatter plot on the top row that was made with the `plot` command. There are a ton of these plot galleries online and it is a great way to figure out how to make a new plot!

In [None]:
fig, ax = plt.subplots(figsize=(7,7))
#Hint: you'll need one more item inside the function call to get points instead of lines. 
#Check out the last item in the function call in the example in the plot gallery linked above


### Adding colors and labels

It can be helpful to color code points on a plot to understand where certain types of objects lie on your diagram. You can create masks/boolean arrays to select certain types of points. For example, if I wanted to select all of the galaxies in `data` that have black holes measured with the method `star`, I would type:<br>

In [None]:
star = (data['Method'] == 'star')
print(star)

`star` is a mask/boolean array: an array of `True` and `False` values that is the same length as the full `data` table. The value of `star` is `True` where the `Method` column value is `star` and `False` where it is not. You can select the rows of `data` where `star` is `True` using the following command:

In [None]:
data[star] # Notice how only rows where the Method is star are displayed below

If you just want to plot out the values in the `SFR` column for the galaxies with `Method = star`, you could type:

In [None]:
data[star]['SFR'] # this is a pd.Series; you can add .values to the end to make this a numpy array

To find the unique values in an array, you can use the `np.unique` function. Execute the cell below to see the unique values in the `Method` column:

In [None]:
print(np.unique(data['Method']))

**5)** Create a scatter plot with the stellar mass on the X-axis and the black hole mass on the Y-axis (using either the `scatter` or `plot` command, whichever you prefer). *Make sure to label your axes!*

In [None]:
fig,ax=plt.subplots(figsize=(7,7))

#Fill in the correct arguments to the ax.scatter function call to plot M_BH against M_*


#Don't forget to include axis labels


**6)** Recreate the scatter plot above, but color code each point by that point's measurement method. Make sure to add a label for each set of points and include a legend on your plot. *Make sure to label your axes!*

There are six measurement methods: star, CO, RM, gas, maser, and star_gas.

**You can choose either a or b**, depending on your comfort level with for loops.

**a)** without a for loop

In [None]:
fig,ax=plt.subplots(figsize=(7,7))

# Define six different masking arrays that say whether a particular detection method was used


#Plot each type of point below:
#You will need six ax.scatter calls, one for each measurement type


#Don't forget to include axis labels


#Include the command to display the legend


**b)** with a for loop

In [None]:
measurement_methods = ['star', 'CO', 'RM', 'gas', 'maser', 'star_gas'] # list of different methods

fig,ax=plt.subplots(figsize=(7,7))

for method in measurement_methods:
    print(method) # print out string stored in method (you can comment out or delete this line)
    
    # Define a masking array for the measurement method currently set to method
    
    # Use ax.scatter to plot the points for this method
    
# Add axis labels and legend


**Optional Challenge** (this doesn't count for anything, but it's here if you want to do it):<br>
<br>
Create a scatter plot that looks like the first plot in Figure 1 from the paper. Plot SFR/$M_{*}$ on the Y-axis and $M_{*}$ on the X-axis and color code the points by their value of $M_{BH}$. Include a color bar. Don't worry about error bars or the gray points or background line.<br>
*Hint: check out the example on [this stack overflow page](https://stackoverflow.com/questions/6063876/matplotlib-colorbar-for-scatter)*<br>
*Another hint: when using a colormap, it helps to set `vmin` and `vmax` explicitly. A good way to set these is for `vmin` to be the minimum value of the array you're using to color code and `vmax` to be the maximum value of that array. You can find the minimum value of an array with `numpy` using the `np.amin(array)` function and find the max with the `np.amax` function.*<br>
*Ok, last hint: if you really want your color map to look like the one in the paper, check out [this page](https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html) from `matplotlib` that shows what all their color maps look like and choose one that looks like the one in the paper. I'm sure this plot was made with python so you should be able to find the exact same one!*
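As a starting point for the challenge, here is a minimal sketch of a color-coded scatter plot with a color bar, using randomly generated placeholder data instead of the galaxy table. The `c=` argument gives each point a value, `vmin`/`vmax` set the range the colormap spans (here with `np.amin`/`np.amax` as suggested above), and `fig.colorbar` draws the bar. `'viridis'` is only a placeholder colormap; swap in whichever one matches the paper, and swap in your own columns for `x`, `y`, and `color_values`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random placeholder data (NOT from the paper)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = rng.uniform(0, 10, 50)
color_values = rng.uniform(0, 1, 50)  # the quantity used to color each point

fig, ax = plt.subplots(figsize=(7, 7))

# c= sets each point's color from color_values; vmin/vmax fix the colormap range
points = ax.scatter(x, y, c=color_values, cmap='viridis',
                    vmin=np.amin(color_values), vmax=np.amax(color_values))

# The color bar needs the object returned by scatter so it knows the colormap and range
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('color-coded quantity', fontsize=18)

ax.set_xlabel('x quantity', fontsize=18)
ax.set_ylabel('y quantity', fontsize=18)
```

Passing `ax=ax` to `fig.colorbar` tells matplotlib to take space from that axes for the bar, so the figure ends up with two axes: the plot and the color bar.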