# Introduction to Python and Jupyter Notebooks

## Introduction to Jupyter Notebooks
There are two modes: edit mode and command mode.
- Press `esc` to enter command mode
- Press `enter` to enter edit mode
- Press `B` in command mode to add a cell below
    - Press `Y` to change the cell to a code cell
    - Press `M` to change the cell to a markdown cell
- Press `DD` in command mode to delete a cell
- Press `ctrl + enter` to run a cell

This guide has more information: [Jupyter notebook shortcuts](https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330)

# Introduction to Markdown
Markdown is used to annotate your code and provide more information.
Markdown cells aren't strictly required but are helpful for documenting your progress.

Here is some useful Markdown syntax:

```verbatim
# Heading 1
## Heading 2

List
- Ele 1 
- Ele 2
    - Ele 2.1
- Ele 1

Links  
[Name of Link](Link)

Styling for Code  
`variable/code`
```

P.S. Notice that we need two spaces at the end of a line to create a line break  
The code above renders as follows:

# Heading 1
## Heading 2

List
- Ele 1 
- Ele 2
    - Ele 2.1
- Ele 1

Links  
[Name of Link](Link)

Styling for Code  

`variable/code`

This guide has more information on Markdown syntax: [markdown guide](https://www.markdownguide.org/basic-syntax/)

# **TODO: Write some markdown in this cell with details about your research project**

# Why Python?

Python is a general-purpose programming language that is widely used for data science.
Jupyter Notebooks allow Python to be run in distinct blocks (cells).

A brief overview of Python features:
```python
a = True
b = False
a & b # True AND False = False
a | b # True OR False = True
a == b # True equals False = False
```
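
For single `True`/`False` values, Python also has the keywords `and`, `or` and `not`, which are the usual choice inside `if` statements. The `&` and `|` operators shown above give the same results for plain booleans and are the ones used later to combine column conditions. A minimal sketch (the variable names are only illustrative):

```python
a = True
b = False

# Keyword operators work on single True/False values
both = a and b      # both must be True
either = a or b     # at least one must be True
negated = not a     # flips the value

# For plain booleans, & and | agree with and / or
same_and = (a & b) == (a and b)
same_or = (a | b) == (a or b)

print(both, either, negated, same_and, same_or)
```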

In [None]:
a = True
b = False
a & b # true AND false = false

In [None]:
# If Statements
variable = 50
if variable > 10:
    print('greater than 10')
else:
    print('less than or equal to 10')

In [None]:
# i takes each value in [0, 5), i.e. 0, 1, 2, 3, 4
for i in range(0, 5):
    print(i)

In [None]:
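# Avoid naming a variable `list`, since that hides Python's built-in list type
subjects = ['math', 'physics', 'astronomy', 'chemistry', 'biology', 'astrology', 'alchemy']
for ele in subjects:
    print(ele)

In [None]:
# enumerate gives the position (starting at 0) along with each element
for index, ele in enumerate(subjects):
    print(index, ele)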

# TODO: Create a loop that plays the game fizzbuzz
- If number is divisible by 3, print fizz
- If number is divisible by 5, print buzz
- If number is divisible by both, print fizzbuzz (one line)
- Otherwise, print the number

Here is an example of the output for the numbers 1 to 20:
```text
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
16
17
fizz
19
buzz
```

In [None]:
# Put your code here

## Importing Data
Let's start by analysing some data that I used in my project.

In [None]:
# Import the packages that I will use
import pandas as pd # Data analysis library
import matplotlib.pyplot as plt # Data visualisation library
import numpy as np # Package for mathematical computing
from astropy.table import Table # Astropy can read .fits files

Import a FITS file and convert it to a pandas DataFrame
- I am using a trimmed-down version of Galah Data Release 4
- More information on this can be found on the [galah-survey](https://www.galah-survey.org/dr3/using_the_data/#recommended-columns) website

In [None]:
# Read the data that we will be using
glh = Table.read("data/galah_dr4_trimmed.fits", format="fits", hdu=1)
# Keep only one-dimensional columns, since pandas cannot convert multidimensional ones
names = [name for name in glh.colnames if len(glh[name].shape) <= 1]
glh = glh[names].to_pandas()

In [None]:
# Shows the first five rows of your dataset
glh.head()

In [None]:
# Displays the dataset (pandas shows the first and last few rows, plus its size)
glh

In [None]:
glh.columns.values

### Indexing and Accessing Data

In [None]:
# We can access a specific column by using brackets and the column name:
glh['tmass_id']
# Dot notation is equivalent when the column name is a valid Python identifier
glh.tmass_id

In [None]:
# We can access multiple columns by using a list [...]
glh[['tmass_id', 'ra', 'dec']]

In [None]:
# Conditional filtering: keep only the rows where both conditions hold
glh[(glh.ra > 0) & (glh.dec < 0)]
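
To see what the filter is doing, here is a minimal sketch on a small made-up table (the values and the name `toy` are hypothetical, not Galah data). Each comparison produces a column of `True`/`False` values, and `&` keeps the rows where both are `True`:

```python
import pandas as pd

# Made-up coordinates for illustration only (not real Galah values)
toy = pd.DataFrame({
    "ra": [10.0, 250.0, 0.0, 120.0],
    "dec": [-30.0, 15.0, -60.0, -5.0],
})

# Each comparison gives a boolean Series, one True/False per row
mask = (toy.ra > 0) & (toy.dec < 0)

# Indexing with the mask keeps only the rows where it is True
selected = toy[mask]

print(mask.tolist())
print(len(selected))  # number of rows that pass both conditions
```

The parentheses around each comparison are needed because `&` binds more tightly than `>` and `<`.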

# TODO: Figure out how many stars have a metallicity greater than 0
Recall metallicity is $$ \Big[ \frac{Fe}{H} \Big]$$

In [None]:
# Put your code in this cell

# TODO: Filter out stars with the following flags
Flags indicate that a measurement may not be reliable. (The flags are stored as bitmasks, if you are curious.)
Keep only the stars that satisfy the following conditions:
- `flag_sp` equals 0
- `flag_fe_h` equals 0
- `flag_x_fe` equals 0 (pick an element `x` that you are interested in and check whether it has a flag column)
- `snr_c3_iraf` is greater than 30
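
For the curious, a bitmask packs several yes/no warnings into one integer, with each warning assigned its own power of two. The sketch below uses made-up warning names and bit values purely for illustration; the actual meaning of each bit in the Galah flags is defined in the survey documentation.

```python
# Hypothetical warnings, each given its own bit (NOT the real Galah definitions)
WARNING_A = 1  # binary 001
WARNING_B = 2  # binary 010
WARNING_C = 4  # binary 100

# Combining A and C gives 5, which is binary 101
flag = WARNING_A | WARNING_C

# Test each bit with & to see which warnings are set
has_a = (flag & WARNING_A) != 0
has_b = (flag & WARNING_B) != 0
has_c = (flag & WARNING_C) != 0

# A flag of 0 means no warning bit is set at all
is_clean = flag == 0

print(flag, has_a, has_b, has_c, is_clean)
```

This is why requiring a flag to equal 0 keeps only the rows with no warnings raised.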

In [None]:
# Put your code in this cell

### Editing Data
We can add columns to our data by computing them from existing columns  
Recall that:
$$ \Big[\frac{C}{N}\Big] = \Big[\frac{C}{Fe}\Big] - \Big[\frac{N}{Fe}\Big] $$

In [None]:
# Define Relative Abundance of C/N
glh['c_n'] = glh.c_fe - glh.n_fe

In [None]:
glh.c_n

### Graphing Data with Matplotlib
You can right-click a plot and save it as a .png to include in your report
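
A figure can also be saved from code with Matplotlib's `savefig`, which is easier to reproduce than right-clicking. A minimal sketch with made-up points; the file name `example_scatter.png` is only an example:

```python
from pathlib import Path

import matplotlib.pyplot as plt

# Made-up points for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 1, 3]

fig, ax = plt.subplots()
ax.plot(x, y, 'x')  # draw each point as an x marker
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Example Scatterplot")

# Write the figure to a PNG file; dpi controls the resolution
output_path = Path("example_scatter.png")
fig.savefig(output_path, dpi=150)
```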

In [None]:
# Scatterplot
# plt.plot(x, y, additional options)
plt.plot(
    glh.dec, 
    glh.ra, 
    'x',
    markersize=2,
    color='red'
) # Plot each data point as an x marker
plt.xlabel("dec")
plt.ylabel("ra")
plt.xlim(-90, 30) # Change the x-axis range
plt.ylim(-10, 250) # Change the y-axis range
plt.title("Position of Stars")

In [None]:
# Histogram 
counts, bins = np.histogram(glh.teff.dropna()) # Note: I have removed any NaN (not-a-number) values
plt.stairs(counts, bins)
plt.title("Effective Temperature of Stars")
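
The `dropna()` call matters because NumPy cannot work out a bin range when the input contains NaN. A minimal sketch with made-up temperatures (not Galah values) shows the cleaning step on its own:

```python
import numpy as np
import pandas as pd

# Made-up temperatures with two missing entries, for illustration only
values = pd.Series([4500.0, np.nan, 5800.0, 6100.0, np.nan])

# Count the missing entries, then drop them
n_missing = int(values.isna().sum())
cleaned = values.dropna()

# Bin the remaining values into two equal-width bins
counts, bins = np.histogram(cleaned, bins=2)

print(n_missing, len(cleaned), counts.tolist())
```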

# TODO: Graph Effective Temperature vs Surface Gravity
Try to find the main sequence stars graphically  
After that, save your graph as .png

In [None]:
# Put your code here

### Exporting and Importing Data

In [None]:
head = glh.head()
head

In [None]:
# index=False leaves the row numbers out of the file
head.to_csv('data/test.csv', index=False)

In [None]:
new_head = pd.read_csv('data/test.csv')
new_head

### Exporting LaTeX Tables
We can export tables as LaTeX code to include in your reports

In [None]:
glh.head()[['tmass_id', 'ra', 'dec']]

In [None]:
# to_latex returns a string, so print it to get output that can be pasted into LaTeX
print(glh.head()[['tmass_id', 'ra', 'dec']].to_latex(index=False))

# Advice for Coding
- Name your variables something sensible
    - Variable names like `a` are short and easy to type but won't mean much to someone else
    - Instead, try giving more descriptive names or abbreviations, e.g. `glh` is an abbreviation for Galah
- Version Control
    - Make sure to save your data periodically
    - You don't want to have to redo hours of work if you mess something up
- Project Structure
    - Make sure to include folders and sub-folders
        - A folder for processed data vs raw data
        - Have a folder for figures  
    - Make different Jupyter Notebooks for different purposes (e.g. one for data cleaning, one for ML, etc.)
- Coding 
    - Google is your friend
    - Reading documentation for 5 minutes is probably better than 1 hour of trial and error
    - Research as you go
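
The project structure advice above could be set up with a few lines of Python. This is only a sketch: the folder names and the helper `create_project_folders` are hypothetical, so adapt them to your own project.

```python
from pathlib import Path
import tempfile


def create_project_folders(root):
    """Create a simple folder layout under `root` and return the folders."""
    root = Path(root)
    folders = [
        root / "data" / "raw",        # untouched input files
        root / "data" / "processed",  # cleaned or derived tables
        root / "figures",             # saved plots for the report
        root / "notebooks",           # one notebook per purpose
    ]
    for folder in folders:
        # parents=True creates missing parent folders; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Demonstrate in a temporary directory so nothing in your project is touched
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_folders(tmp)
    created_names = sorted(f.relative_to(tmp).as_posix() for f in created)
    all_exist = all(f.is_dir() for f in created)

print(created_names, all_exist)
```

In a real project you would pass your own project folder instead of the temporary directory.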
 
# Task
- Get started with your project
   - Create a new Jupyter Notebook and import the data
   - Figure out what type of data you are working with
   - Graph some data to see if you can find any relationships