## Notebook 2: Visualizing data
___

#### Please upload your completed notebook to Canvas as an .ipynb file
#### Title the file as: LastName_notebook2.ipynb

### Original work statement:

Please write your name and the names of your collaborators in this cell.

Please be sure to cite sources along the way as appropriate.

### Your name:
#### Collaborators:

You can edit this notebook directly by adding code and text cells as needed. As always, begin by importing the necessary packages.
___

# Problem 1: The Hertzsprung-Russell diagram (H-R diagram)
*This problem is from the course textbook: Intro to Machine Learning for Physics and Astronomy by Viviana Acquaviva*

The H-R diagram is a visual map that relates stellar brightness to stellar temperature or stellar color. We describe it as a map because a star moves within the H-R diagram over its lifetime, depending on its life stage. We will be creating an H-R diagram in this problem.

## 1a. Import the data from the file HIP_star.dat

You can read about the data here:
https://astrostatistics.psu.edu/datasets/HIP_star.html

The file has 9 columns, but we are only interested in three of them:
- Vmag, the apparent visual magnitude of the star. This is the 2nd column.
- Plx, the parallax angle of the star in milliarcseconds ($10^{-3}$ arcseconds). This is the 5th column.
- B-V, the color of the star. This is the 9th column.

You will want to skip the first row when opening the file.
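
One possible way to read only the needed columns is `numpy.genfromtxt` with `usecols` and `skip_header`. The sketch below runs on a small made-up table held in memory so that it is self-contained; the rows, the name `sample_text`, and the whitespace-separated layout are assumptions, so check them against the real file. Column indices are zero-based, so the 2nd, 5th, and 9th columns are 1, 4, and 8.

```python
import io

import numpy as np

# Made-up stand-in for HIP_star.dat: one header row, then 9 whitespace-separated columns.
sample_text = """HIP Vmag RA DE Plx pmRA pmDE e_Plx B-V
1 9.10 0.001 1.089 3.54 -5.20 -1.88 1.39 0.482
2 9.27 0.004 -19.499 21.90 181.21 -0.93 3.10 0.999
3 6.61 0.005 38.859 2.81 5.24 -2.91 0.63 -0.019
"""

# skip_header=1 skips the first row; usecols picks the 2nd, 5th, and 9th columns.
data = np.genfromtxt(io.StringIO(sample_text), skip_header=1, usecols=(1, 4, 8))
print(data.shape)  # one row per star, one column per selected quantity
```

For the real file, the path `"HIP_star.dat"` would take the place of `io.StringIO(sample_text)`.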



## 1b. Create three variables:
- one to store Vmag
- one to store Plx
- one to store B-V

i. How many stars are in this data set?

## 1c. Plot a histogram of B-V
i. What can you say about the distribution of B-V?

ii. Calculate its mean, median and standard deviation
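
If a column contains missing values (NaN), `np.mean` and friends return NaN, while the `nan`-aware versions skip those entries. The sketch below uses a synthetic array, `bv_demo`, as a stand-in; whether the real B-V column has missing entries is an assumption to check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the B-V column (made-up values, not the real data).
bv_demo = rng.normal(loc=0.7, scale=0.3, size=500)
bv_demo[::50] = np.nan  # pretend a few entries are missing

# NaN-aware statistics ignore the missing entries.
mean_bv = np.nanmean(bv_demo)
median_bv = np.nanmedian(bv_demo)
std_bv = np.nanstd(bv_demo)
print(f"mean={mean_bv:.3f}, median={median_bv:.3f}, std={std_bv:.3f}")
```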

## 1d. Define a function that calculates the log base 10 of the Luminosity of the star.

The log Luminosity of a star is related to its apparent magnitude (brightness) and its distance away from us by the following equation:

$$ \log_{10} L = \frac{15 - m - 5\log_{10}\theta}{2.5} $$
where $m$ is the apparent visual magnitude and $\theta$ is the parallax angle in milliarcseconds, which is inversely related to distance. Luminosity is expressed in units of solar luminosity, $L_{\odot}$, so $\log_{10} L = 0$ for our Sun.
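
A minimal sketch of such a function is shown below. The name `log_luminosity` and its argument names are illustrative choices, and the parallax is assumed to be in milliarcseconds, as in the data file. Non-positive parallaxes, if any are present, would produce `nan` or `-inf` from `np.log10`.

```python
import numpy as np


def log_luminosity(vmag, plx_mas):
    """Return log10(L / L_sun) from apparent magnitude and parallax in milliarcseconds."""
    vmag = np.asarray(vmag, dtype=float)
    plx_mas = np.asarray(plx_mas, dtype=float)
    return (15.0 - vmag - 5.0 * np.log10(plx_mas)) / 2.5


# Works on single values and on arrays alike.
print(log_luminosity(10.0, 1.0))
print(log_luminosity([5.0, 10.0], [10.0, 100.0]))
```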

## 1e. Make the H-R diagram.
Create a scatter plot with B-V on the x-axis and log luminosity on the y-axis. This is called an H-R diagram (Hertzsprung-Russell diagram). It encodes information about the temperature of stars (expressed by color, or B-V) and their luminosity.

Don't forget to label your axes as appropriate.

Adjust the axis limits to [-0.5, 3] for B-V and [-3, 4] for log L.

## 1f. Incorporate apparent magnitude information.
Make another scatter plot, this time with colors arranged according to apparent magnitude of the star, and where the size of the markers is also proportional to the brightness of each star.

Include a colorbar. Remember, you will want to choose a colormap! Scroll here to find them: https://matplotlib.org/stable/users/explain/colors/colormaps.html
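
One thing to watch for: a smaller magnitude means a brighter star, so the marker size has to grow as Vmag shrinks. The sketch below shows one simple way to do this with synthetic data; the arrays `bv_demo`, `logl_demo`, and `vmag_demo`, the linear size scaling, and the `viridis_r` colormap are all illustrative assumptions rather than the required choices.

```python
import matplotlib

matplotlib.use("Agg")  # lets the sketch run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for B-V, log luminosity, and apparent magnitude.
bv_demo = rng.uniform(-0.3, 2.0, size=200)
logl_demo = rng.uniform(-2.5, 3.5, size=200)
vmag_demo = rng.uniform(2.0, 12.0, size=200)

# Smaller magnitude = brighter star, so invert the magnitude before scaling marker area.
vmag_span = vmag_demo.max() - vmag_demo.min()
sizes = 5.0 + 40.0 * (vmag_demo.max() - vmag_demo) / vmag_span

fig, ax = plt.subplots()
sc = ax.scatter(bv_demo, logl_demo, c=vmag_demo, s=sizes, cmap="viridis_r")
ax.set_xlim(-0.5, 3)
ax.set_ylim(-3, 4)
ax.set_xlabel("B-V [mag]")
ax.set_ylabel(r"$\log_{10}(L / L_\odot)$")

# The colorbar maps marker color back to apparent magnitude.
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("Apparent magnitude Vmag")
```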


## 1g. What trends do you notice in the H-R diagram?
Important note: Stellar color is a proxy for temperature. An H-R diagram that shows temperature on the x-axis has temperature in descending order, going left to right from hotter to cooler stars. Using color as a proxy, bluer stars (stars that have B-V < 0) are hotter than redder stars (stars that have B-V > 0).

# Problem 2: Creating a mock data set
A powerful tool for research is creating a mock dataset that is based on known physical processes. In this problem, we're going to create a mock set of stellar parallaxes in order to obtain their distances.

The formula for obtaining distance to stars through parallax is:
$$ D = \frac{57.3^{\circ} \times 1 {\rm AU}}{\theta}$$

where D is the distance to the star (typically expressed in units of parsecs), 1AU is the radius of our observer base here on Earth (recall from class), and $\theta$ is the parallax angle in *degrees*. (In case it's hard to see, that says 57.3 degrees)
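
As a quick check of the formula, take a parallax of one arcsecond, $\theta = 1'' = (1/3600)^{\circ}$:

$$ D = \frac{57.3^{\circ} \times 1 {\rm AU}}{(1/3600)^{\circ}} \approx 2.06 \times 10^{5}\ {\rm AU} \approx 1\ {\rm pc}$$

A star with a parallax of 1 arcsecond is 1 parsec away, so the distance in parsecs is 1 divided by the parallax in arcseconds, or 1000 divided by the parallax in milliarcseconds.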

## 2a. Plot a histogram of the parallax angles from the data set in Problem 1.
In order to create a statistical data set of parallaxes, we first want to look at the distribution of known parallaxes. The histogram of parallaxes from the real data will guide our decisions.

i. What is the range of the parallaxes (in milliarcseconds)?

ii. What shape does the distribution have? What are the most/least frequent occurrences? Is there a trend?


## 2b. Create an array to store 1000 mock stellar parallaxes with a similar distribution as the real parallaxes.
Be sure to plot the histogram of the mock distribution to check your results.

Note: Please don't take too long on this. If you get stuck here, you can just randomly sample 1000 data points from the real data set in Problem 1 (you can look at `numpy.random.choice`).
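
A minimal sketch of the fallback approach is shown below, using the `choice` method of NumPy's `Generator`, the newer counterpart of `numpy.random.choice`. The short array `plx_real_demo` is a made-up stand-in for the real parallaxes, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up stand-in for the real parallax array from Problem 1 (milliarcseconds).
plx_real_demo = np.array([3.5, 21.9, 2.8, 7.8, 2.9, 18.8, 17.7, 5.2, 4.8, 23.8])

# Drawing with replacement makes the sample follow the empirical distribution.
plx_mock = rng.choice(plx_real_demo, size=1000, replace=True)
print(plx_mock.shape)
```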

## 2c. Get the distance, in parsecs, to the mock stars from your mock parallaxes
You may want to use astropy for your units inside your function. Convert the output to parsecs before using the .value attribute! We don't want to output the unit itself!
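
For comparison, here is a sketch of the same conversion in plain NumPy, without astropy units, which can serve as a cross-check on your astropy result. The function name `distance_pc` and the constant names are illustrative; the code uses $180/\pi$ in place of the rounded 57.3 and assumes the parallax is given in milliarcseconds.

```python
import numpy as np

AU_PER_PARSEC = 206264.806  # astronomical units in one parsec
DEG_PER_RADIAN = 180.0 / np.pi  # about 57.3 degrees


def distance_pc(plx_mas):
    """Return distance in parsecs for parallax angles given in milliarcseconds."""
    # Convert milliarcseconds to arcseconds, then to degrees.
    plx_deg = np.asarray(plx_mas, dtype=float) / 1000.0 / 3600.0
    # D = 57.3 deg x 1 AU / theta, which gives the distance in AU.
    distance_au = DEG_PER_RADIAN * 1.0 / plx_deg
    return distance_au / AU_PER_PARSEC


print(distance_pc([1000.0, 100.0, 10.0]))  # approximately 1, 10, and 100 parsecs
```

With astropy, the `astropy.units.parallax()` equivalency performs the same angle-to-distance conversion.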

## 2d. Create an array of mock errors on the measurement.
Think about what might be a source of error and how that might present in your measurements. Let's imagine that our instrument is more precise with larger parallaxes. Therefore, create a mock array of errors on the stars where the smaller the parallax angle, the higher the error.
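
One simple error model with this property is an error that scales as one over the parallax. The sketch below is only one of many reasonable choices; the uniform mock parallaxes, the `error_scale` value of 2.0, and the variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up mock parallaxes in milliarcseconds.
plx_mock_demo = rng.uniform(1.0, 50.0, size=1000)

# Assumed error model: the error (in mas) falls off as 1 / parallax.
error_scale = 2.0  # arbitrary overall scale
plx_err = error_scale / plx_mock_demo

print(plx_err.min(), plx_err.max())
```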

## 2e. Make a plot of the mock distances as a function of mock parallax with the error bars as a shaded region behind the line plot (CHALLENGE, I believe in you!)

Don't forget to label your axes and provide units as appropriate.
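
A shaded band can be drawn with matplotlib's `fill_between`. The sketch below is one way to set it up under stated assumptions: the mock parallaxes and the 1/parallax error model are made up, the band is obtained by converting parallax plus and minus its error to distance, and the arrays are sorted by parallax so the line and band are drawn left to right.

```python
import matplotlib

matplotlib.use("Agg")  # lets the sketch run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up mock parallaxes (mas), sorted so the line plot does not zigzag.
plx = np.sort(np.random.default_rng(3).uniform(2.0, 50.0, size=1000))
plx_err = 0.5 / plx  # assumed error model: larger error at smaller parallax

# Distance in parsecs is 1000 / parallax in mas; propagate the error through the bounds.
dist = 1000.0 / plx
dist_lo = 1000.0 / (plx + plx_err)
dist_hi = 1000.0 / (plx - plx_err)

fig, ax = plt.subplots()
ax.fill_between(plx, dist_lo, dist_hi, alpha=0.3, label="error band")
ax.plot(plx, dist, label="mock distance")
ax.set_xlabel("Parallax [mas]")
ax.set_ylabel("Distance [pc]")
ax.legend()
```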

## 2f. Does your plot make sense? Explain.