[Pre-MAP Course Website](http://depts.washington.edu/premap/seminar/cohort-17-2021-seminar/) | [Pre-MAP GitHub](https://github.com/UWPreMAP/PreMAP2021) | [Google](https://www.google.com)

### Each time you access the PreMAP2021 directory make sure your files are up to date
1. Open a terminal tab (New -> Terminal), then change into the PreMAP2021 directory:
```bash
cd PreMAP2021
```
2. Pull any newly added files into the directory by running:
```bash
git pull
```
3. Type in your terminal:
```bash
cd lessons
```
4. If you're on the AstroLab computer, type in your terminal:
```bash
jupyter notebook
```
This opens a webpage that lists the lessons. Select this lesson, then edit and run the cells to follow along. Remember to change "Lastname" to your last name.

Plan: Pandas (just how to get data/read a CSV file and how to add a column to a dataframe) -> Subplots (for people who haven't done it yet)

```python
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
```

# Python dictionary

Before we talk about working with data, we're going to learn a new Python type: the dictionary!

A dictionary in Python is a lot like a dictionary in the real world: it is an object that associates a key with a value, just as a real dictionary associates a word with its definition.

In Python, we declare a dictionary with curly brackets `{}`. Inside the curly brackets, we write out keys and values, like so:

```python
SampleDictionary = {"Aardvark": "A type of animal", "Bear": "A kind of scary furry mammal"}
```

Dictionaries hold two kinds of things: <b>keys</b> and <b>values</b>. Each <b>value</b> is associated with a <b>key</b>. When defining a dictionary, the key goes to the left of the colon and the value goes to the right. In the example above, both the keys and the values are strings. Values can be any type (ints, floats, or even lists), but keys must be a type that can't be changed after it is created, such as a string, int, or float, so a list cannot be used as a key.
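Here is a small sketch showing keys and values of different types, and what happens if you try to use a list as a key. The dictionary name `mixed` and its contents are made up purely for illustration.

```python
# Keys: a string, an int, and a float. Values: a string, a float, and a list.
mixed = {"Bear": "A kind of scary furry mammal", 1: 3.14, 2.5: ["a", "list", "value"]}
print(mixed[1])    # 3.14
print(mixed[2.5])  # ['a', 'list', 'value']

# A list can be a value, but not a key: lists can be changed, so Python refuses them as keys
list_key_failed = False
try:
    broken = {["a", "b"]: "this will not work"}
except TypeError:
    list_key_failed = True
print(list_key_failed)  # True
```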

You can access a value in the dictionary by "indexing" into the dictionary with that value's key. For example, to see the definition of "Bear", type:

```python
SampleDictionary["Bear"]
```

```
'A kind of scary furry mammal'
```
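One thing worth knowing early: indexing with a key that isn't in the dictionary raises an error. This sketch (using "Cat" as an example of a missing key) shows the error, how to check for a key with `in`, and how the built-in `.get()` method returns a fallback value instead.

```python
SampleDictionary = {"Aardvark": "A type of animal", "Bear": "A kind of scary furry mammal"}

# Indexing with a key that isn't in the dictionary raises a KeyError
try:
    SampleDictionary["Cat"]
except KeyError:
    print("'Cat' is not a key in SampleDictionary")

# The `in` keyword checks whether a key exists before you index
print("Cat" in SampleDictionary)   # False
print("Bear" in SampleDictionary)  # True

# .get() returns a fallback value instead of raising an error
print(SampleDictionary.get("Cat", "no definition yet"))  # no definition yet
```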

You can add a key-value pair to a dictionary by assigning the value to the key, much like assigning a variable:

```python
SampleDictionary["Fish"] = "A swimming not-mammal"
print(SampleDictionary)
```

```
{'Aardvark': 'A type of animal', 'Bear': 'A kind of scary furry mammal', 'Fish': 'A swimming not-mammal'}
```
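The same assignment syntax also works on a key that already exists, in which case it replaces the old value rather than adding a new entry. A quick sketch (the new definition of "Bear" is made up):

```python
SampleDictionary = {"Aardvark": "A type of animal", "Bear": "A kind of scary furry mammal"}

# Assigning to a key that already exists overwrites its old value
SampleDictionary["Bear"] = "A large, usually furry mammal"
print(SampleDictionary["Bear"])  # A large, usually furry mammal
print(len(SampleDictionary))     # 2 (no new key was added)
```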


Finally, you can see all of the keys or all of the values in a dictionary with `SampleDictionary.keys()` or `SampleDictionary.values()`:

```python
print(SampleDictionary.keys())
print(SampleDictionary.values())
```

```
dict_keys(['Aardvark', 'Bear', 'Fish'])
dict_values(['A type of animal', 'A kind of scary furry mammal', 'A swimming not-mammal'])
```
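If you want keys and values together, the built-in `.items()` method gives (key, value) pairs, which is convenient for looping. This sketch also turns the keys into an ordinary list with `list()`; dictionaries keep keys in the order they were added.

```python
SampleDictionary = {"Aardvark": "A type of animal",
                    "Bear": "A kind of scary furry mammal",
                    "Fish": "A swimming not-mammal"}

# .items() gives (key, value) pairs, which is handy for looping over a dictionary
for word, definition in SampleDictionary.items():
    print(word + ":", definition)

# list() turns the keys view into an ordinary Python list
words = list(SampleDictionary.keys())
print(words)  # ['Aardvark', 'Bear', 'Fish']
```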


## Example 1: Making your own dictionary

A common code assigns a number to each letter of the alphabet, so that A = 1, B = 2, and so on.

In the cell below, make your own dictionary where the keys are the first 5 letters of the alphabet and each value is that letter's corresponding number. Print out the dictionary, then access some of its values using the letters as keys.

# Working with data and pandas

`pandas` is a package for managing data, and it builds heavily on the idea of a dictionary.

Run the cell below to import it. A common way to abbreviate `pandas` (like `numpy` as `np`) is as `pd`.

```python
import pandas as pd
```

## The pandas dataframe

`pandas` introduces a new type called a dataframe (`DataFrame` in code). If it helps, you can think of it like an Excel sheet: it organizes data into columns, and each column has a name. Thinking of it like a dictionary, the column name is the key and the column's data is the value.
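To see the dictionary connection directly, you can build a dataframe from a dictionary yourself. In this sketch the dataframe name `planets` is just an example; each key becomes a column name and each list becomes that column's data.

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's values
planets = pd.DataFrame({"Name": ["Mercury", "Venus", "Earth"],
                        "Moons": [0, 0, 1]})
print(planets)

# Indexing with a column name works just like indexing a dictionary with a key
print(planets["Name"].tolist())  # ['Mercury', 'Venus', 'Earth']
```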

We can load a dataframe from a text file using the `read_csv` function in pandas. Run the cell below to load the `PlanetEvolution` file into a pandas dataframe. 

(A quick note: a .csv file is a plain-text way to organize data. CSV stands for comma-separated values, which means the columns in each row are separated by commas. The `read_csv` function knows this and uses those commas to tell which value belongs to which column.)
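To see how commas turn into columns, here is a sketch that feeds `read_csv` a tiny CSV written as text instead of a file. The numbers are the first few `Time` and `SurfWaterMass` values from this lesson's data, rounded for illustration, and `io.StringIO` is used only so the example doesn't need a file on disk.

```python
import io
import pandas as pd

# A tiny CSV written as text: the first line is the header (column names),
# and commas separate the columns on every line
csv_text = """Time,SurfWaterMass
0.00,1.86
0.01,1.66
0.02,1.45
"""

# read_csv accepts file-like objects, so io.StringIO lets us skip writing a file to disk
small = pd.read_csv(io.StringIO(csv_text))
print(small)
print(small.shape)  # (3, 2) -> 3 rows, 2 columns
```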

```python
data = pd.read_csv("data/PlanetEvolution.csv")
```

Now the variable `data` holds a `pandas` dataframe containing the data from "PlanetEvolution.csv". Whenever we load a dataframe, we want to get a feel for the data. A good first step is the `.head()` method that every dataframe has, which shows the first 5 rows of the data. Try running the cell below to see what it does.

- Getting a feel for the data
- Getting what you want from the dataframe

```python
data.head()
```

| | Time | SurfWaterMass | EruptionRate | TMan | MagMom |
|---|---|---|---|---|---|
| 0 | 0.0 | 1.864983 | 3221604000000000.0 | 2976.163 | 0.0 |
| 1 | 0.01 | 1.657204 | 42274070000.0 | 2515.928065 | 0.0 |
| 2 | 0.02 | 1.44577 | 23387860000.0 | 2488.819349 | 0.0 |
| 3 | 0.03 | 1.234176 | 17165400000.0 | 2472.629132 | 0.0 |
| 4 | 0.04 | 1.022522 | 13953160000.0 | 2461.071795 | 0.0 |
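`.head()` also accepts a number of rows, and its partner `.tail()` shows the end of the dataframe instead. A sketch on a made-up 10-row dataframe (the name `df` and its values are only for illustration):

```python
import pandas as pd

# A made-up dataframe with 10 rows so there is something to trim
df = pd.DataFrame({"Time": [i * 0.01 for i in range(10)]})

print(df.head(3))  # .head(n) shows the first n rows (n defaults to 5)
print(df.tail())   # .tail() shows the last 5 rows instead
```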


Another helpful way to get familiar with the data (especially when there are a lot of columns) is the `.columns` attribute that every dataframe has. It returns a pandas `Index` object listing all the column names in that dataframe. Run the cell below to see what it does.

```python
data.columns
```

```
Index(['Time', 'SurfWaterMass', 'EruptionRate', 'TMan', 'MagMom'], dtype='object')
```
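A few things you can do with `.columns` and related attributes, sketched on a small dataframe built from the first two `Time` and `TMan` values above: turn the column names into a plain list, check whether a column exists, and use `.shape` to count rows and columns.

```python
import pandas as pd

# A small dataframe built from the first two rows of Time and TMan shown above
df = pd.DataFrame({"Time": [0.0, 0.01], "TMan": [2976.163, 2515.928065]})

print(isinstance(df.columns, pd.Index))  # True: .columns is a pandas Index
print(list(df.columns))                   # ['Time', 'TMan']
print("TMan" in df.columns)               # True
print(df.shape)                           # (2, 2) -> (rows, columns)
```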

Finally, let's get data out of this dataframe. Pandas dataframes work very much like Python dictionaries: you get the data in a column by "indexing" into the dataframe with the column's name, like below:

```python
print(data["Time"])
```

```
0      0.00
1      0.01
2      0.02
3      0.03
4      0.04
       ... 
446    4.46
447    4.47
448    4.48
449    4.49
450    4.50
Name: Time, Length: 451, dtype: float64
```
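Indexing with a single column name gives back one column, while indexing with a list of column names gives back a smaller dataframe. A sketch using the first three rows of three columns shown above (the names `df`, `time`, and `subset` are just examples):

```python
import pandas as pd

# A small dataframe using the first three rows of three columns shown above
df = pd.DataFrame({"Time": [0.0, 0.01, 0.02],
                   "SurfWaterMass": [1.864983, 1.657204, 1.44577],
                   "TMan": [2976.163, 2515.928065, 2488.819349]})

time = df["Time"]                       # one column name -> a Series
subset = df[["Time", "SurfWaterMass"]]  # a list of column names -> a smaller DataFrame
print(type(time).__name__)    # Series
print(type(subset).__name__)  # DataFrame
print(subset.shape)           # (3, 2)
```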


Notice that this looks a little different from a numpy array. This is actually a `pandas` <b>Series</b> (which you can read more about <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.html">here</a> and <a href="https://towardsdatascience.com/a-practical-introduction-to-pandas-series-9915521cdc69">here</a>). A Series works a little differently than a numpy array does, but it's easy to get a numpy array from a Series: just use the `.values` attribute that every Series has.

```python
print(type(data["Time"]))
print(type(data["Time"].values))
```

```
<class 'pandas.core.series.Series'>
<class 'numpy.ndarray'>
```
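Besides `.values`, a Series also has a `.to_numpy()` method, which the pandas documentation suggests in newer versions; for a plain numeric column like this one, both give the same numpy array. A sketch on a made-up three-row `Time` column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Time": [0.0, 0.01, 0.02]})

values_array = df["Time"].values      # the approach used in this lesson
numpy_array = df["Time"].to_numpy()   # the method newer pandas docs suggest
print(type(numpy_array))              # <class 'numpy.ndarray'>
print(np.array_equal(values_array, numpy_array))  # True

# Once you have a numpy array, the usual numpy functions apply
print(np.max(numpy_array))  # 0.02
```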


## Example 2: Getting data from a pandas dataframe

Print the Time column and the SurfWaterMass column of the data. Then plot SurfWaterMass against Time (SurfWaterMass on the y-axis, Time on the x-axis).