# ExoStat Lab 03: Transit and Radial Velocity Methods

**Administrative details:**

- This Lab will be turned in for credit.

- Some questions in this lab are the same as the Practice 04 questions found on the main [YData website](http://ydata123.org/sp19/).

- Collaborating on the ExoStat Labs is encouraged. If you get stuck for a while on a question, feel free to ask a neighbor or come to the instructor's or TF's office hours for additional help. (Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it.) Please don't just share answers, though.

This term we will be using Piazza for class discussion. Find our class page [here](https://piazza.com/yale/spring2019/sds170/home).

You can read more about course policies on our [Canvas site](https://canvas.yale.edu).

**Deadline:**

This assignment is due Monday, February 12th at 11:59 P.M. Late work will not be accepted as per the course policies (see the Syllabus and Course policies on [Canvas](https://canvas.yale.edu)).

Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged. Refer to the policies page to learn more about how to learn cooperatively.


#### Today's ExoStat Lab

1.  Practice with applying functions to a column and visualization methods
* [Applying a Function to a Column](https://www.inferentialthinking.com/chapters/08/1/applying-a-function-to-a-column.html)
* [Visualizations](https://www.inferentialthinking.com/chapters/07/visualization.html)

2.  More on the Transit Method

3.  Getting started with the Radial Velocity Method

**Submission:**

Submit your assignment both as a .pdf and .ipynb (Jupyter notebook) in Canvas.  

To produce the .pdf, please do the following in order to preserve the cell structure of the notebook:  
1.  Go to "File" at the top-left of your Jupyter Notebook
2.  Under "Download as", select "HTML (.html)"
3.  After the .html has downloaded, open it and then select "File" and "Print" (note you will not actually be printing)
4.  From the print window, select the option to save as a .pdf

To produce the .ipynb, please do the following:  
1.  Go to "File" at the top-left of your Jupyter Notebook
2.  Under "Download as", select "Notebook (.ipynb)"

In [None]:
import numpy as np
from datascience import *

# These lines set up graphing capabilities.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

## 1. Functions and CEO Incomes

In this question, we'll look at the 2015 compensation of CEOs at the 100 largest companies in California. The data was compiled from a [Los Angeles Times analysis](http://spreadsheets.latimes.com/california-ceo-compensation/), and ultimately came from [filings](https://www.sec.gov/answers/proxyhtf.htm) mandated by the SEC from all publicly-traded companies. Two companies have two CEOs, so there are 102 CEOs in the dataset.

We've copied the raw data from the LA Times page into a file called `raw_compensation.csv`. (The page notes that all dollar amounts are in millions of dollars.)

In [None]:
raw_compensation = Table.read_table('raw_compensation.csv')
raw_compensation

**Question 1.** We want to compute the average of the CEOs' pay. Try running the cell below.

In [None]:
np.average(raw_compensation.column("Total Pay"))

You should see an error. Let's examine why this error occurred by looking at the values in the "Total Pay" column. Use the `type` function and set `total_pay_type` to the type of the first value in the "Total Pay" column.

In [None]:
total_pay_type = ...
total_pay_type

**Question 2.** You should have found that the values in "Total Pay" column are strings. It doesn't make sense to take the average of string values, so we need to convert them to numbers if we want to do this. Extract the first value in the "Total Pay" column.  It's Mark Hurd's pay in 2015, in *millions* of dollars.  Call it `mark_hurd_pay_string`.

In [None]:
mark_hurd_pay_string = ...
mark_hurd_pay_string

**Question 3.** Convert `mark_hurd_pay_string` to a number of *dollars*.  The string method `strip` will be useful for removing the dollar sign; it removes a specified character from the start or end of a string.  For example, the value of `"100%".strip("%")` is the string `"100"`.  You'll also need the function `float`, which converts a string that looks like a number to an actual number.  Last, remember that the answer should be in dollars, not millions of dollars.
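As a small illustration of how `strip` and `float` fit together, here is a sketch on a made-up percent string rather than the CEO data. The names `example_string`, `stripped`, `as_number`, and `as_proportion` are hypothetical and exist only for this example.

```python
# Hypothetical example string; not taken from the dataset.
example_string = "12.5%"

# strip removes the given character from both ends and returns a new string.
stripped = example_string.strip("%")

# float turns the numeric-looking string into an actual number.
as_number = float(stripped)

# Rescale the number, here from a percentage to a proportion.
as_proportion = as_number / 100

print(stripped, as_number, as_proportion)
```

The same three steps (strip, convert, rescale) apply to a pay string, with a different character to strip and a different scale factor.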

In [None]:
mark_hurd_pay = ...
mark_hurd_pay

To compute the average pay, we need to do this for every CEO.  But that looks like it would involve copying this code 102 times.

This is where functions come in.  First, we'll define a new function, giving a name to the expression that converts "total pay" strings to numeric values.  Later in these exercises we'll see the payoff: we can call that function on every pay string in the dataset at once.

**Question 4.** Copy the expression you used to compute `mark_hurd_pay` as the `return` expression of the function below, but replace the specific `mark_hurd_pay_string` with the generic `pay_string` name specified in the first line of the `def` statement.

*Hint*: When dealing with functions, you should generally not be referencing any variable outside of the function. Usually, you want to be working with the arguments that are passed into it, such as `pay_string` for this function. If you're using `mark_hurd_pay_string` within your function, you're referencing an outside variable! 

In [None]:
def convert_pay_string_to_number(pay_string):
    """Converts a pay string like '$100' (in millions) to a number of dollars."""
    return ...

Running that cell doesn't convert any particular pay string. Instead, it creates a function called `convert_pay_string_to_number` that can convert any string with the right format to a number of dollars.

We can call our function just like we call the built-in functions we've seen. It takes one argument, a string, and it returns a number.

In [None]:
convert_pay_string_to_number('$42')

In [None]:
convert_pay_string_to_number(mark_hurd_pay_string)

In [None]:
# We can also compute Safra Catz's pay in the same way:
convert_pay_string_to_number(raw_compensation.where("Name", are.containing("Safra")).column("Total Pay").item(0))

So, what have we gained by defining the `convert_pay_string_to_number` function? 
Well, without it, we'd have to copy that `10**6 * float(pay_string.strip("$"))` code line each time we wanted to convert a pay string.  Now we just call a function whose name says exactly what it's doing.

Soon, we'll see how to apply this function to every pay string in a single expression. First, let's take a brief detour and introduce `interact`.

### Using `interact`

We've included a nifty function called `interact` that allows you to
call a function with different arguments.

To use it, call `interact` with the function you want to interact with as the
first argument, then specify a default value for each argument of the original
function like so:

In [None]:
_ = interact(convert_pay_string_to_number, pay_string='$42')

You can now change the value in the textbox to automatically call
`convert_pay_string_to_number` with the argument you enter in the `pay_string`
textbox. For example, entering in `'$49'` in the textbox will display the result of
running `convert_pay_string_to_number('$49')`. Neat!

Note that we'll never ask you to write the `interact` function calls yourself as
part of a question. However, we'll include it here and there where it's helpful
and you'll probably find it useful to use yourself.

Now, let's continue on and write more functions.

## 2. Defining functions

Let's write a very simple function that converts a proportion to a percentage by multiplying it by 100.  For example, the value of `to_percentage(.5)` should be the number 50.  (No percent sign)

A function definition has a few parts.

##### `def`
It always starts with `def` (short for **def**ine):

    def

##### Name
Next comes the name of the function.  Let's call our function `to_percentage`.
    
    def to_percentage

##### Signature
Next comes something called the *signature* of the function.  This tells Python how many arguments your function should have, and what names you'll use to refer to those arguments in the function's code.  `to_percentage` should take one argument, and we'll call that argument `proportion` since it should be a proportion.

    def to_percentage(proportion)

We put a colon after the signature to tell Python it's over.

    def to_percentage(proportion):

##### Documentation
Functions can do complicated things, so you should write an explanation of what your function does.  For small functions, this is less important, but it's a good habit to learn from the start.  Conventionally, Python functions are documented by writing a triple-quoted string:

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
    
    
##### Body
Now we start writing code that runs when the function is called.  This is called the *body* of the function.  We can write anything we could write anywhere else.  First let's give a name to the number we multiply a proportion by to get a percentage.

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
        factor = 100

##### `return`
The special instruction `return` in a function's body tells Python to make the value of the function call equal to whatever comes right after `return`.  We want the value of `to_percentage(.5)` to be the proportion .5 times the factor 100, so we write:

    def to_percentage(proportion):
        """Converts a proportion to a percentage."""
        factor = 100
        return proportion * factor

Note that `return` inside a function gives the function call a value, while `print`, which we have used before, is a function that returns no value and simply displays its argument. The two are **very** different. 

**Question 1.** Define `to_percentage` in the cell below.  Call your function to convert the proportion .2 to a percentage.  Name that percentage `twenty_percent`.

In [None]:
def ...
    """ ... """
    ... = ...
    return ...

twenty_percent = ...
twenty_percent

Like the built-in functions, you can use named values as arguments to your function.

**Question 2.** Use `to_percentage` again to convert the proportion named `a_proportion` (defined below) to a percentage called `a_percentage`.

*Note:* You don't need to define `to_percentage` again!  Just like other named things, functions stick around after you define them.

In [None]:
a_proportion = 2**(.5) / 2
a_percentage = ...
a_percentage

Here's something important about functions: the names assigned within a function body are only accessible within the function body. Once the function has returned, those names are gone.  So even though you defined `factor = 100` inside  the body of the `to_percentage` function up above and then called `to_percentage`, you cannot refer to `factor` anywhere except inside the body of `to_percentage`:

In [None]:
# You should see an error when you run this.  (If you don't, you might
# have defined factor somewhere above.)
factor
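The same scoping rule can be checked without stopping on an error. The sketch below uses a made-up function, `double_with_local`, and a made-up local name, `local_factor`; it catches the `NameError` that Python raises when a name defined only inside a function body is used outside it.

```python
def double_with_local(x):
    """Doubles x using a name that exists only inside this function."""
    local_factor = 2  # hypothetical local name, visible only in this body
    return x * local_factor

result = double_with_local(21)

# Outside the function, local_factor was never defined, so looking it up fails.
try:
    local_factor
    name_is_visible = True
except NameError:
    name_is_visible = False

print(result, name_is_visible)
```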

As we've seen with the built-in functions, functions can also take strings (or arrays, or tables) as arguments, and they can return those things, too.

**Question 3.** Define a function called `disemvowel`.  It should take a single string as its argument.  (You can call that argument whatever you want.)  It should return a copy of that string, but with all the characters that are vowels removed.  (In English, the vowels are the characters "a", "e", "i", "o", and "u".)

*Hint:* To remove all the "a"s from a string, you can use `that_string.replace("a", "")`.  The `.replace` method for strings returns another string, so you can call `replace` multiple times, one after the other. 
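To see how chained `replace` calls behave, here is a sketch on an unrelated, made-up string. It removes punctuation instead of vowels, and the names `messy` and `cleaned` are hypothetical.

```python
# Made-up input string for illustration only.
messy = "a-b_c-d"

# Each replace call returns a new string, so the next call can be
# applied directly to that result.
cleaned = messy.replace("-", "").replace("_", "")

print(cleaned)
```

The original string is unchanged, since string methods return new strings rather than modifying the old one.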

In [None]:
def disemvowel(a_string):
    ...
    ...

# An example call to your function.  (It's often helpful to run
# an example call from time to time while you're writing a function,
# to see how it currently works.)
disemvowel("Can you read this without vowels?")

In [None]:
# Alternatively, you can use interact to call your function
_ = interact(disemvowel, a_string='Hello world')

##### Calls on calls on calls
Just as you write a series of lines to build up a complex computation, it's useful to define a series of small functions that build on each other.  Since you can write any code inside a function's body, you can call other functions you've written.

If a function is like a recipe, defining a function in terms of other functions is like having a recipe for cake telling you to follow another recipe to make the frosting, and another to make the sprinkles.  This makes the cake recipe shorter and clearer, and it avoids having a bunch of duplicated frosting recipes.  It's a foundation of productive programming.

For example, suppose you want to count the number of characters *that aren't vowels* in a piece of text.  One way to do that is to remove all the vowels and count the size of the remaining string.

**Question 4.** Write a function called `num_non_vowels`.  It should take a string as its argument and return a number.  The number should be the number of characters in the argument string that aren't vowels.

*Hint:* The function `len` takes a string as its argument and returns the number of characters in it.

In [None]:
def num_non_vowels(a_string):
    """The number of characters in a string, minus the vowels."""
    ...

# Try calling your function yourself to make sure the output is what
# you expect. You can also use the interact function if you'd like.

Functions can also encapsulate code that *does things* rather than just computing values.  For example, if you call `print` inside a function, and then call that function, something will get printed.

The `movies_by_year` dataset in the textbook has information about movie sales in recent years.  Suppose you'd like to display the year with the 5th-highest total gross movie sales, printed in a human-readable way.  You might do this:

In [None]:
movies_by_year = Table.read_table("movies_by_year.csv")
rank = 5
fifth_from_top_movie_year = movies_by_year.sort("Total Gross", descending=True).column("Year").item(rank-1)
print("Year number", rank, "for total gross movie sales was:", fifth_from_top_movie_year)

After writing this, you realize you also wanted to print out the 2nd and 3rd-highest years.  Instead of copying your code, you decide to put it in a function.  Since the rank varies, you make that an argument to your function.

**Question 5.** Write a function called `print_kth_top_movie_year`.  It should take a single argument, the rank of the year (like 2, 3, or 5 in the above examples).  It should print out a message like the one above.  It shouldn't have a `return` statement.

In [None]:
def print_kth_top_movie_year(k):
    # Our solution used 2 lines.
    ...
    ...

# Example calls to your function:
print_kth_top_movie_year(2)
print_kth_top_movie_year(3)

In [None]:
# interact also allows you to pass in an array for a function argument. It will
# then present a dropdown menu of options.
_ = interact(print_kth_top_movie_year, k=np.arange(1, 10))

##### Print is not the same as Return
The `print_kth_top_movie_year(k)` function prints the year with the `k`th-highest total gross movie sales. However, since the function does not return any value, we cannot use that result after calling it. Let's look at an example of a function that prints a value but does not return it.

In [None]:
def print_number_five():
    print(5)

In [None]:
print_number_five()

However, if we try to use the output of `print_number_five()`, we see that we get an error when we try to add the number 5 to it!

In [None]:
print_number_five_output = print_number_five()
print_number_five_output + 5

It may seem that `print_number_five()` is returning a value, 5. In reality, it just displays the number 5 to you without giving you the actual value! If your function prints out a value without returning it and you try to use it, you will run into errors so be careful!
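To make the difference concrete, the sketch below puts a printing function next to a returning one. The names `show_five` and `give_five` are made up for this illustration.

```python
def show_five():
    print(5)      # displays 5, but the call itself evaluates to None

def give_five():
    return 5      # the call evaluates to the number 5

shown = show_five()
given = give_five()

print(shown is None)   # the printing function handed back nothing usable
print(given + 5)       # the returned value can be used in arithmetic
```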

## 3. `apply`ing functions

Defining a function is a lot like giving a name to a value with `=`.  In fact, a function is a value just like the number 1 or the text "the"!

For example, we can make a new name for the built-in function `max` if we want:

In [None]:
our_name_for_max = max
our_name_for_max(2, 6)

The old name for `max` is still around:

In [None]:
max(2, 6)

Try just writing `max` or `our_name_for_max` (or the name of any other function) in a cell, and run that cell.  Python will print out a (very brief) description of the function.

In [None]:
max

Why is this useful?  Since functions are just values, it's possible to pass them as arguments to other functions.  Here's a simple but not-so-practical example: we can make an array of functions.

In [None]:
make_array(max, np.average, are.equal_to)

**Question 1.** Make an array containing any 3 other functions you've seen.  Call it `some_functions`.

In [None]:
some_functions = ...
some_functions

Working with functions as values can lead to some funny-looking code.  For example, see if you can figure out why this works:

In [None]:
make_array(max, np.average, are.equal_to).item(0)(4, -2, 7)

Here's a simpler example that's actually useful: the table method `apply`.

`apply` calls a function many times, once on *each* element in a column of a table.  It produces an array of the results.  Here we use `apply` to convert every CEO's pay to a number, using the function you defined:

In [None]:
raw_compensation.apply(convert_pay_string_to_number, "Total Pay")

Here's an illustration of what that did:

<img src="apply.png"/>

Note that we didn't write something like `convert_pay_string_to_number()` or `convert_pay_string_to_number("Total Pay")`.  The job of `apply` is to call the function we give it, so instead of calling `convert_pay_string_to_number` ourselves, we just write its name as an argument to `apply`.
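A rough way to picture this is a loop that calls the function once per element and collects the results in an array. The sketch below is not the `datascience` implementation; `apply_to_each` and `add_exclamation` are hypothetical helpers written with plain Python and `numpy`.

```python
import numpy as np

def apply_to_each(function, values):
    """Calls function once per element of values and returns an array of results."""
    return np.array([function(value) for value in values])

def add_exclamation(text):
    """A made-up function to apply to each element."""
    return text + "!"

# The function is passed by name, without parentheses, just as with apply.
shouted = apply_to_each(add_exclamation, ["a", "b", "c"])
print(shouted)
```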

**Question 2.** Using `apply`, make a table that's a copy of `raw_compensation` with one more column called "Total Pay (\$)".  It should be the result of applying `convert_pay_string_to_number` to the "Total Pay" column, as we did above, and creating a new table which is the old one, but with the additional "Total Pay (\$)" column.  Call the new table `compensation`.

In [None]:
compensation = raw_compensation.with_column(
    "Total Pay ($)",
    ...)
compensation

Now that we have the pay in numbers, we can compute things about them.

**Question 3.** Compute the average total pay of the CEOs in the dataset.

In [None]:
average_total_pay = ...
average_total_pay

**Question 4.** Companies pay executives in a variety of ways: directly in cash; by granting stock or other "equity" in the company; or with ancillary benefits (like private jets).  Compute the proportion of each CEO's pay that was cash.  (Your answer should be an array of numbers, one for each CEO in the dataset.)

In [None]:
cash_proportion = ...
cash_proportion

Check out the "% Change" column in `compensation`.  It shows the percentage increase in the CEO's pay from the previous year.  For CEOs with no previous year on record, it instead says "(No previous year)".  The values in this column are *strings*, not numbers, so like the "Total Pay" column, it's not usable without a bit of extra work.

Given your current pay and the percentage increase from the previous year, you can compute your previous year's pay.  For example, if your pay is $\$100$ this year, and that's an increase of 50% from the previous year, then your previous year's pay was $\frac{\$100}{1 + \frac{50}{100}}$, or around \$66.67.
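The arithmetic in that example can be written as a short function. This is only a sketch of the formula in the text; `previous_year_pay` is a hypothetical helper name, and the inputs are plain numbers rather than the strings stored in the table.

```python
def previous_year_pay(current_pay, percent_increase):
    """Pay one year earlier, given current pay and the percentage increase since then."""
    return current_pay / (1 + percent_increase / 100)

# The worked example from the text: $100 now, after a 50% increase.
example_previous = previous_year_pay(100, 50)
print(round(example_previous, 2))
```

Dividing (rather than subtracting 50% of the current pay) matters because the percentage increase is measured relative to the earlier year's pay.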

**Question 5.** Create a new table called `with_previous_compensation`.  It should be a copy of `compensation`, but with the "(No previous year)" CEOs filtered out, and with an extra column called "2014 Total Pay ($)".  That column should have each CEO's pay in 2014.

*Hint 1:* You can print out your results after each step to make sure you're on the right track.

*Hint 2:* We've provided a structure that you can use to get to the answer. However, if it's confusing, feel free to delete the current structure and approach the problem your own way!

In [None]:
# Definition to turn percent to number
def percent_string_to_num(percent_string):
    return ...

# Compensation table where there is a previous year
having_previous_year = ...

# Get the percent changes as numbers instead of strings
percent_changes = ...

# Calculate the previous years pay
previous_pay = ...

# Put the previous pay column into the compensation table
with_previous_compensation = ...

with_previous_compensation

**Question 6.** What was the average pay of these CEOs in 2014?

In [None]:
average_pay_2014 = ...
average_pay_2014

## 4. Histograms
Earlier, we computed the average pay among the CEOs in our 102-CEO dataset.  The average doesn't tell us everything about the amounts CEOs are paid, though.  Maybe just a few CEOs make the bulk of the money, even among these 102.

We can use a *histogram* method to display more information about a set of numbers.  The table method `hist` takes a single argument, the name of a column of numbers.  It produces a histogram of the numbers in that column.
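Behind any histogram is a count of how many values fall in each bin. The sketch below computes such counts with `np.histogram` on made-up numbers; it does not draw anything and is not how `hist` is implemented, and the names `values` and `bin_edges` are assumptions for this example.

```python
import numpy as np

# Made-up values and bin edges, for illustration only.
values = np.array([1, 2, 2, 3, 9])
bin_edges = np.array([0, 5, 10])

# counts[i] is the number of values between bin_edges[i] and bin_edges[i + 1].
counts, edges = np.histogram(values, bins=bin_edges)
print(counts)
```

Four of the five values land in the first bin, which is the kind of lopsidedness an average alone would hide.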

**Question 1.** Make a histogram of the pay of the CEOs in `compensation`.

In [None]:
...

**Question 2.** Looking at the histogram, how many CEOs made more than $30 million? Answer the question with code.  *Hint:* Use the table method `where` and the property `num_rows`.

In [None]:
num_ceos_more_than_30_million_2 = ...
num_ceos_more_than_30_million_2

## 5. Periodogram:  estimating the period of an exoplanet

In this section, we are going to continue our investigation of some Kepler transit data using the `lightkurve` module.  In particular, we are going to see how periodograms can help us estimate the orbital period of a potential planet.  Additional discussion of the periodogram function in `lightkurve` can be found in the [user guide](https://docs.lightkurve.org/tutorials/index.html).

In the first part of this section, we will walk through an analysis where we estimate the period using a periodogram.  Then you will have a chance to try this out on your own.

Recall that this module has not been previously added to our computing cluster so we have to add an extra line of code `!pip install lightkurve` before importing the module.

In [None]:
!pip install lightkurve

In [None]:
from lightkurve import *
import astropy.units as u  

First we are going to load in the same Kepler data we used in the last lab:  Kepler ID 6922244.
This is exoplanet [Kepler-8b](https://en.wikipedia.org/wiki/Kepler-8b), which was one of the first five planets confirmed by the Kepler mission.  It has a mass of about 0.603 M$_J$, a radius of about 1.419 R$_J$, and a semimajor axis of 0.0483 AU (close in orbit!).  

In [None]:
tpf = search_targetpixelfile(6922244, quarter=4).download()

Next, we will convert the `tpf` to a light curve and plot it.

In [None]:
lc = tpf.to_lightcurve(aperture_mask=tpf.pipeline_mask)
lc.plot()

We can flatten the light curve to remove the trend:

In [None]:
flat_lc = lc.flatten(window_length=401)
flat_lc.plot()

Since we know the period for the planet, 3.5225 days, let's see what the folded light curve looks like:

In [None]:
folded_lc = flat_lc.fold(period=3.5225)
folded_lc.plot()

Now let's take a step back.  We used `period=3.5225` days when we folded the light curve, but in practice we wouldn't necessarily even know that a planet was present!  We can use the `periodogram` method to estimate the period present in the flattened light curve.

First let's look at the flattened light curve again, but this time as a scatter plot:

In [None]:
flat_lc.scatter()

Next we can convert the detrended light curve to a periodogram and plot it.

In [None]:
pg = flat_lc.to_periodogram()
pg.plot()
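To make the idea of a periodogram concrete, here is a sketch that estimates the period of a synthetic, evenly sampled signal with a plain `numpy` FFT. This is not how `lightkurve` computes its periodogram, and real Kepler data are noisier and not perfectly evenly sampled; the period, cadence, and amplitude below are made-up values.

```python
import numpy as np

true_period = 4.0     # days (made-up value)
cadence = 0.25        # days between samples (made-up value)
time = np.arange(400) * cadence
flux = 1.0 + 0.01 * np.sin(2 * np.pi * time / true_period)

# Remove the mean so the zero-frequency term does not dominate the spectrum.
spectrum = np.fft.rfft(flux - flux.mean())
power = np.abs(spectrum) ** 2
frequency = np.fft.rfftfreq(len(time), d=cadence)   # cycles per day

# Skip frequency 0 so that converting to a period never divides by zero.
best = np.argmax(power[1:]) + 1
estimated_period = 1 / frequency[best]
print(estimated_period)
```

The power is largest at the frequency where the signal repeats, and the period is the reciprocal of that frequency.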

We can view the data as a table as well:

In [None]:
Table().with_columns("Frequency", pg.frequency, 
                     "Power", pg.power).sort("Power", descending = True)

Since we typically work in terms of period rather than frequency, we can change the format of the plot to `period`.  We also switch to a log scale (though you can try plotting on the usual scale and see which you prefer).  The maximum power is marked with a red dot, noting that `pg.period_at_max_power` gives the period of maximum power and `max(pg.power)` gives the height.

In [None]:
pg.plot(format='period', scale='log');
plt.scatter(pg.period_at_max_power, max(pg.power), color = "red")


A table can be created with period instead of frequency as well:

In [None]:
Table().with_columns("Period", pg.period, 
                     "Power", pg.power).sort("Power", descending = True)

OK, so now that we have the estimated period, we can fold the light curve accordingly!  Let's define the period and then plot the folded light curve.

In [None]:
period = pg.period_at_max_power
print('Best period: {}'.format(period))
flat_lc.fold(period.value).scatter();

Hmmm...well, this doesn't look all that great.  It appears that the top part of the transit has points covering it.  This suggests that the true period is some multiple of our estimate.  Let's try doubling the period estimate and see what happens.

In [None]:
period = pg.period_at_max_power * 2
print('Best period: {}'.format(period))
flat_lc.fold(period.value).scatter();

This looks much better!  We get our final estimate of about 3.5226 days, which is close to the generally accepted estimate of 3.5225 days.  Sometimes the estimated period is a fraction or multiple (a harmonic) of the true period.
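Why does folding at half the true period put points on top of the transit? The sketch below uses a synthetic box-shaped transit and a simplified fold (time modulo period), which is an assumption for illustration and not `lightkurve`'s own `fold`. All numbers and the helper names `fold` and `out_of_transit_share` are made up.

```python
import numpy as np

true_period = 4.0    # days (made-up value)
duration = 0.5       # transit duration in days (made-up value)
time = np.arange(800) * 0.05
in_transit = (time % true_period) < duration
flux = np.where(in_transit, 0.99, 1.0)

def fold(time, period):
    """Position of each observation within one cycle of the given period."""
    return time % period

def out_of_transit_share(period):
    """Among points folded onto the transit phases, the share that are not in transit."""
    phase = fold(time, period)
    at_transit_phase = phase < duration
    return np.mean(flux[at_transit_phase] == 1.0)

share_true = out_of_transit_share(true_period)
share_half = out_of_transit_share(true_period / 2)
print(share_true, share_half)
```

At the true period, only in-transit points land on the transit phases. At half the period, every other cycle contributes full-brightness points to those same phases, which is the "points covering the transit" pattern.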

**Question 5.1.** Now you get to try!  An unnamed set of Kepler transit data needs to be loaded.  The name of the file is `lab03_kepler.csv`.  Begin by loading in the data and call it `kepler`.

In [None]:
kepler = ...

**Question 5.2.**  Next plot the light curve. First you can turn the imported data `kepler` into a `LightCurve` object that is recognizable by the `lightkurve` module.  You can use the following code:
`lc = LightCurve(kepler.column("Time"), kepler.column("Flux"))` and check the type with `type(lc)`.

Don't forget to plot the light curve!

In [None]:
# Create the LightCurve object and plot the light curve here...



**Question 5.3.** Next flatten the light curve to remove the trend.  You may (or may not) need to try a couple `window_length`s...just remember that it has to be an odd number.  Plot the flattened light curve as well.

In [None]:
# Put your answer here...





**Question 5.4.**  Okay, so now we have our flattened light curve and we need to estimate the period.  To do this, we can use the `to_periodogram()` method as we did earlier.  This will give us the Power Spectral Density plotted against the frequency.

In [None]:
# Put answer here





**Question 5.5.**  Now plot the periodogram again, but use `period` for the horizontal axis.  (See the previous example above where we did this.)  You can also set the `scale='log'`.

In [None]:
# Put your answer here



**Question 5.6.**  Now we can figure out the estimate of the period at the maximum power.  Determine the period and plot the folded light curve using your estimated period.  Also, record the period estimate in a cell below.

In [None]:
# Put your code here
    

Period estimate = [ADD ANSWER]

**Question 5.7.**  Well, the folded light curve above doesn't look quite right, but it is close.  Try adjusting the period slightly to see if you can get the transits to line up a bit better.  Try scaling the period estimate up on the order of 1.001298.  Record your final period estimate in a cell below.

In [None]:
# Put your code here.


Final period estimate = [ADD ANSWER]

## 6. Radial Velocity Light Curve

In this Section, we are going to explore the RV curve of the exoplanet [51 Pegasi b](https://en.wikipedia.org/wiki/51_Pegasi_b).  This exoplanet was discovered using the RV method in 1995, and it was the first exoplanet discovered orbiting a star similar to our Sun!
The host star, 51 Pegasi, is about 50 light years away.

This discovery was published in Nature by [Michel Mayor & Didier Queloz (1995)](https://www.nature.com/articles/378355a0).

The particular dataset we are using was taken from [this website](https://iopscience.iop.org/article/10.1086/304088/fulltext/).

**Question 6.1.**  To begin, let's read in the data.  The data file is called `51pegb.csv` so read it in and call the table `data`.

In [None]:
data = ...

**Question 6.2.** Check the data type of each column.  Are they numerical?  If not, convert them to numerical data.  You may find you need to define a function to convert the string to a float (and the `.strip` method may be useful to get rid of non-numerical features in the string), and then use `apply` to convert the whole column.  Be sure to add the converted column to `data`.

In [None]:
...

**Question 6.3.** Take a look at the data by producing a plot.  When you first do this you will notice that the `JD` values get quite large, but the observations become less frequent.  For today, let's just use the observations with `JD` < 31 days.  You can create a new data table, call it `data2`, and only include the rows with `JD` < 31 days.

Then use `plt.scatter` to produce the scatterplot, `plt.xlabel` and `plt.ylabel` to add axis labels, and `plt.ylim` to change the y-axis limits to -80 to 80 m/s.

In [None]:
data2 = ...

**Question 6.4.** Next we want to add our RV curve model to the plot.  Today we are assuming we already know all about this planet, so we can just use the known values in the model.  Below we go through several steps for setting up the model so we can add it to our plot.

First, let's check the range of values for `JD` so we know what values to consider in the model.

In [None]:
#Notice the range of the JD column
[np.min(data2.column(3)),np.max(data2.column(3))]

Now let's define an array of values within that range.  The more values we use, the smoother our model is going to look when we plot it.  Let's add a point for every quarter of a day in the range.

In [None]:
#Set the range of values to evaluate your function
jd_range = np.arange(2,32, .25)
jd_range

Now we just need to define a function that specifies our model.  We are going to assume that 51 Pegasi b has a circular orbit (eccentricity = 0)...this isn't too far off the truth since the eccentricity is estimated at [$e=0.013$](https://en.wikipedia.org/wiki/51_Pegasi_b).  We are also going to assume the planet is observed edge-on.  Therefore, the only parameters we will use in our model (which will be a sine curve) are the day (`x`), the orbital period (`P`), and the semi-amplitude (`K`).

In [None]:
#Define the RV curve
def rv(x,K,P): 
    return K*np.sin(x*2*np.pi/P)
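To get a feel for the model before plugging in real values, the sketch below evaluates the same sine curve at a few days using made-up parameters. `example_K` and `example_P` are hypothetical and are not the values for 51 Pegasi b.

```python
import numpy as np

def rv(x, K, P):
    """Sine-curve RV model: semi-amplitude K, period P, evaluated at day x."""
    return K * np.sin(x * 2 * np.pi / P)

example_K = 10.0   # m/s (made-up value)
example_P = 8.0    # days (made-up value)

# Evaluate at the start, quarter, half, three-quarter, and full period.
days = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
model = rv(days, example_K, example_P)
print(np.round(model, 6))
```

The curve reaches `+K` a quarter of the way through the period and `-K` three quarters of the way through, then repeats.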

Now all you need to do is set the values for `K` and `P` for the function, and run the function on `jd_range` to get your x and y values to plot the model curve.  Use your same code as for the earlier scatterplot, but now add a `plt.plot` with the new model data.

You can look up the values for `K` and `P` [here](https://en.wikipedia.org/wiki/51_Pegasi_b).

In [None]:
K = ...
P = ...


**Question 6.5.**  Why do we fit a model to the data?  Why don't we just connect the dots?

Add your answer here.

**Question 6.6.** What are the orbital period of the planet and the semi-amplitude of the RV curve?  (We used these above, but confirm that you also notice that period in the data.)

In [None]:
period = ...
amplitude = ...

**Question 6.7.**  Calculate the approximate mass of the planet ($M_P$) in $M_J$ using the `period` (`P`, in days) and `amplitude` (`K`, in m s$^{-1}$) from above.  $M_\star$ is the mass of the star in Solar masses. 
Note that this equation assumes a circular orbit and viewing the system edge-on.  

$M_P \approx 4.919\times10^{-3} P^\frac{1}{3} K  M_\star^{\frac{2}{3}}$
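As a sketch of how this formula translates to code, the function below evaluates it for made-up inputs, assuming the stellar mass is expressed in Solar masses. The name `planet_mass_mj` and the example values are hypothetical and are not the values for 51 Pegasi b.

```python
def planet_mass_mj(P_days, K_ms, M_star_solar):
    """Approximate planet mass in Jupiter masses (circular, edge-on orbit assumed)."""
    return 4.919e-3 * P_days ** (1 / 3) * K_ms * M_star_solar ** (2 / 3)

# Made-up inputs: a 10-day period, a 30 m/s semi-amplitude, a Sun-like star.
example_mass = planet_mass_mj(P_days=10.0, K_ms=30.0, M_star_solar=1.0)
print(example_mass)
```

Note that the period enters only through a cube root, so the result is far more sensitive to `K` than to `P`.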

In [None]:
mass = ...

**Question 6.8.** How does your calculation above compare to the generally accepted estimate displayed in the table [here](https://en.wikipedia.org/wiki/51_Pegasi)?

Put your answer here.

**Question 6.9.** Let's compare our model predictions to the observations.  To do this, we need to get model predictions at the JD locations of our data.  Use the `JD` column of our data in the model function, `rv`, that we defined above.  Then add these points to the same scatterplot as above.  You may want to use the argument `color="green"` in `plt.scatter` so you can distinguish the points.

In [None]:
...

**Question 6.10.** Next let's plot the residuals.  The residuals are defined as the observations minus the predicted values at the same `JD` values as the observations (which you calculated above).  Calculate these residuals and then plot them.  Note that the vertical axis scale will be different from the scale you used above.
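The residual calculation itself is an element-wise subtraction of two arrays of the same length. The sketch below uses made-up numbers, and the names `observed` and `predicted` are hypothetical stand-ins for the data column and the model values.

```python
import numpy as np

# Made-up velocities in m/s, for illustration only.
observed = np.array([12.0, -3.0, 7.5])
predicted = np.array([10.0, -1.0, 8.0])

# Residual = observation minus prediction, one value per observation.
residuals = observed - predicted
print(residuals)
```

A positive residual means the observation sits above the model curve, and a negative one means it sits below.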

In [None]:
...

**Question 6.11.** Checking the residuals is necessary for validating the model.  But in our setting, the residuals may be of more interest.  Why might astronomers be interested in looking at the residuals after fitting for an exoplanet?

Add your response here.

**Submission:** Once you're finished, follow the instructions at the top of this notebook to save as a .pdf and .ipynb.  Then submit the two files through Canvas.