# Using Jupyter and Python for data analysis
This is a *Jupyter notebook* with blocks of code called *cells*. You can press shift+ENTER to *run* a cell and go on to the next one. You can also edit the code and run it again to see how the output changes.

If you're running this on Google Colab, you'll see a popup window the first time saying "Warning". Don't worry, it's safe. Click on "run anyway".

Try running the following cells by pressing SHIFT and ENTER (at the same time) for each one.

*You won't hurt anything by experimenting. If you break it, close the tab and open the activity again to start over.*  

## Part 1: Running Python code

In [None]:
# Click on this cell. Then, press SHIFT and ENTER at the same time.
2+2

In [None]:
# This is called a "comment". It's a message to other humans.
# Starting a line with # tells Python to skip that line.
# Python will run the next line because it doesn't start with #
5-4

In [None]:
# the following lines define variables called "a" and "b"
a = 4
b = 3

# the next line shows us what a plus b is.
a+b

In [None]:
c = a*a # this line calculates a times a and saves the result as a variable called "c"
c       # this line tells the program to show us what "c" is.

In [None]:
# this coding language is called "python"
d = "I just coded in Python" # yep you did!
d

Try editing some of the code above.
- Edit some code to do a different calculation
- Add a comment somewhere

You can run a cell again by pressing shift+ENTER. 

In [None]:
# Can you figure out what ** does?
e = b**a
e

## Part 2: Markdown
The cells above are *code cells* that let you run code. This is a *markdown cell*, which holds text that Python doesn't run. Instead, you can format markdown text so it displays nicely.

Double-click on this cell to see the markdown text underneath. Running a markdown cell turns it into pretty, formatted text.
- here's a bullet point
- and another list item in *italics* and **bold**.
- this is a hyperlink to [my favorite thing on the web](https://www.youtube.com/watch?v=dQw4w9WgXcQ)
- You can even embed images  
![cute kitten](https://github.com/adamlamee/CODINGinK12/raw/master/notebooks/1dayoldkitten.png)  

### Try this
Double-click on this cell to see the code that formats this text. Make a few edits and press shift+ENTER to see the changes.

Read more about [formatting the markdown text](https://help.github.com/articles/basic-writing-and-formatting-syntax/) in a cell, like this one, or go to Help > Markdown > Basic Writing and Formatting Text.  

## Part 3: Analyzing stellar data  
Now you'll analyze the properties of over 100,000 stars. 

In [None]:
# Import modules that contain functions we need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Read in the star data from a CSV file stored on GitHub.
data = pd.read_csv("https://github.com/adamlamee/UCF_labs/raw/main/data/stars.csv")

In [None]:
# this shows the first 5 rows of the data set
data.head(5)

In [None]:
# The .shape attribute gives the (number of rows, number of columns) in the data set.
data.shape

### Self-check
- In the table above, what do you think the column headings represent? You can check on the [Astronomy Nexus page](http://www.astronexus.com/hyg) that collected the data.
- How many stars are included in this data set? See the data.shape line of code above.
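Three of the columns are linked by a standard astronomy relation. A star's absolute magnitude M is the apparent magnitude m it would have if it sat 10 parsecs away, so M = m - 5 log10(d / 10), with the distance d in parsecs (dust between us and the star is ignored). The sketch below assumes that `mag`, `absmag`, and `dist` in this data set are apparent magnitude, absolute magnitude, and distance in parsecs; confirm that on the Astronomy Nexus page. The function name `absolute_magnitude` and the example star are made up for illustration.

```python
import numpy as np

def absolute_magnitude(mag, dist_pc):
    """Hypothetical helper: absolute magnitude from apparent magnitude
    and distance in parsecs, using M = m - 5 * log10(d / 10).
    Ignores dimming by interstellar dust."""
    return mag - 5 * np.log10(dist_pc / 10)

# A made-up star: apparent magnitude 5.0 seen from 100 parsecs away
print(absolute_magnitude(5.0, 100.0))  # → 0.0

# At exactly 10 parsecs, absolute and apparent magnitude are equal
print(absolute_magnitude(3.0, 10.0))   # → 3.0
```

Because `np.log10` works element by element, the same function also accepts whole columns. If the column meanings above hold, `absolute_magnitude(data['mag'], data['dist'])` could be compared against `data['absmag']` for a few rows as a check.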

When you're ready, run the code below.  

In [None]:
# plots a histogram of one column
plt.hist(data['dist'])
plt.title("Any idea for a title?")
plt.xlabel("label me?")
plt.ylabel("me, too")
plt.show()

### Self-check questions
Use the graph above to answer the following questions:
- Which column is shown in the histogram? See the plt.hist line above.
- What does the shape of the graph say about the stars in this data set? On a histogram, tall bars mean more values in that range. The introduction to the [histogram Wikipedia page](https://en.wikipedia.org/wiki/Histogram) gives a brief explanation.
- Give some better labels to the graph and run the code again to see how they look.
    - The horizontal axis shows distance values in parsecs.
    - The vertical axis of a histogram is often labeled "number of values" (or sometimes "frequency" or "counts") since it shows how many values fell into each bin.
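A histogram splits the range of values into bins and counts how many values land in each bin; the bar heights are those counts. `np.histogram` returns the counts without drawing anything, which makes the idea easy to check. The distances below are made up for illustration, not taken from the star data.

```python
import numpy as np

# Made-up distances in parsecs (not from the real data set)
distances = np.array([2, 3, 4, 12, 15, 25])

# Split 0-30 pc into three equal bins: 0-10, 10-20, 20-30
counts, edges = np.histogram(distances, bins=3, range=(0, 30))

print(counts)  # → [3 2 1]
print(edges)   # the bin boundaries: 0, 10, 20, 30
```

The first bin would be the tallest bar because three of the six values fall between 0 and 10 pc. `plt.hist(data['dist'])` does the same counting and then draws the bars; by default it uses 10 bins.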

When you're ready, run the code below.  

In [None]:
# draws a scatter plot
plt.scatter(data['temp'], data['absmag'], s=1, alpha=0.2, edgecolors='none')
plt.xlim(2000,15000)
plt.ylim(20,-15)
plt.title("Any idea for a title?")
plt.ylabel("Absolute Magnitude")
plt.xlabel("Temperature (K)")
plt.show()

### Self-check
Use the graph above to answer the following questions:
- What patterns do you see in the graph?
- The y-axis shows brighter stars at the top and dimmer stars toward the bottom. What's strange about the **absolute magnitude** scale?
- Some stars aren't very hot, but they're very bright because they're so big (called *giants* and *supergiants*). Where are those on the graph?
- Other stars are very hot but so small that they aren't very bright (called *white dwarfs*). How could that happen? Where might you find them on the graph?

When you're ready, run the code below.

In [None]:
# These are the abbreviations for all the constellations
data['con'].sort_values().unique()
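In the cell above, `.sort_values()` puts the column in alphabetical order and `.unique()` keeps one copy of each value. Here is the same chain on a short, made-up list of abbreviations with repeats:

```python
import pandas as pd

# Made-up constellation abbreviations, with repeats
con = pd.Series(["Vir", "Ori", "Vir", "And", "Ori"])

# Sort alphabetically, then keep one copy of each value
abbreviations = con.sort_values().unique()
print(abbreviations)  # And, Ori, Vir
```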

In [None]:
# This filters the data based on the "con" column
constellation = data.query('con == "Vir"') # makes a new data set called "constellation"
constellation = constellation.sort_values('mag').head(10) # keeps only the 10 brightest stars in the constellation

# plots the constellation's stars in red over the big graph of all stars
plt.scatter(data['temp'], data['absmag'], s=1, alpha=0.2, edgecolors='none')
plt.scatter(constellation['temp'], constellation['absmag'], color='red', edgecolors='none')
plt.xlim(2000,15000)
plt.ylim(20,-15)
plt.title("Types of stars in a constellation")
plt.ylabel("Absolute Magnitude")
plt.xlabel("Temperature (K)")
plt.show()

### Self-check
The graph above shows the ten brightest stars in a constellation (as red points) on top of the entire data set of stars.  
- Using the graphic below, which types of stars are in Virgo?
- The code above lists the abbreviation for each constellation, then filters the data set for just the stars in one constellation (called a *query*). Try plotting a different constellation by changing "Vir" in the query line. Is it made of different types of stars?

![HR diagram](https://raw.githubusercontent.com/adamlamee/CODINGinK12/master/data/H-R-diagram.jpeg)  
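The Virgo cell chains three steps: `query` keeps the rows that match a condition, `sort_values('mag')` orders them from smallest to largest magnitude (brightest first, as the code comment there implies), and `head` keeps the first few rows. The sketch below runs the same steps on a tiny, made-up table. It also uses `@choice` inside the query string, which tells pandas to look up the Python variable `choice`, so you only have to change one line to try a different constellation. The table values and the variable name `choice` are hypothetical.

```python
import pandas as pd

# Made-up stars: constellation abbreviation and apparent magnitude
stars = pd.DataFrame({
    "con": ["Vir", "Ori", "Vir", "Vir", "Ori"],
    "mag": [4.4, 0.5, 1.0, 2.7, 1.6],
})

choice = "Vir"  # change this to try a different constellation

# Keep only rows whose "con" matches the variable (@ refers to it)
in_con = stars.query("con == @choice")

# Smallest magnitude first, then keep the top 2 rows
brightest = in_con.sort_values("mag").head(2)
print(brightest)
```

With `choice = "Vir"`, the two rows kept are the Virgo entries with magnitudes 1.0 and 2.7. In the real notebook, `head(10)` plays the same role for the ten brightest stars.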

---
## Saving Your Work
This notebook is running on a Google server on a distant planet, and everything you've done is deleted when you close this tab. To save your work for later use or analysis, you have a few options:
- File > "Save a copy in Drive" will save it to your Google Drive in a folder called "Colab Notebooks". You can run it later from there.  
- File > "Download .ipynb" to save to your computer (and run with Jupyter software later)
- File > Print to ... um ... print.
- To save an image of a graph or chart to your computer, right-click on it and select "Save image as..."

## Credits
This notebook was designed by [Adam LaMee](https://adamlamee.github.io/). Thanks to the great folks at [Binder](https://mybinder.org/) and [Google Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb) for making this notebook interactive without you needing to download it or install [Jupyter](https://jupyter.org/) on your own device. This document is shared with a [CC BY-NC-SA license](https://creativecommons.org/licenses/by-nc-sa/4.0/). This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.  