In [None]:
%matplotlib inline
import os
import matplotlib.pyplot as plt
import numpy as np
import seaborn

from IPython.display import HTML, Image

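# The 'seaborn-darkgrid' style was renamed 'seaborn-v0_8-darkgrid' in Matplotlib 3.6
style = 'seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available else 'seaborn-darkgrid'
plt.style.use(style)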

# Python Data Structures

Different types and ways of holding data. Essentially variables.

* Built-in common types:
    * Scalar (single-value):   
        `str`, `int`, `float`, `bool`, `complex`, ...
        
    * Compound (multi-value):   
        `list`, `dict`, `tuple`, `set`, ... 
        
    * Fancy things:  
        `datetime`  
        [`collections`](https://docs.python.org/3/library/collections.html):    
        `namedtuple`, `defaultdict`, `deque`, `Counter`, ...
        
    * `numpy`:  
        `numpy.array`
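The `collections` types listed above are ready-made helpers for common patterns. Here is a quick sketch of four of them. The star name and values are only illustrative.

```python
from collections import Counter, defaultdict, deque, namedtuple

# namedtuple: a lightweight, immutable record with named fields
Star = namedtuple('Star', ['name', 'magnitude'])
vega = Star('Vega', 0.03)
print(vega.name, vega.magnitude)        # Vega 0.03

# Counter: counts how often each item appears
counts = Counter(['star', 'galaxy', 'star'])
print(counts['star'])                   # 2

# defaultdict: a missing key is created on first access with a default (here an empty list)
groups = defaultdict(list)
groups['bright'].append('Vega')
print(dict(groups))                     # {'bright': ['Vega']}

# deque with maxlen: keeps only the most recent items
recent = deque(maxlen=3)
for frame in range(5):
    recent.append(frame)
print(list(recent))                     # [2, 3, 4]
```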

## Scalar

You are probably most familiar with these types.

In [None]:
foo = 42                 # int
bar = 'meaning of life'  # str
baz = 3.14159            # float
bam = False              # bool
moo = 42j                # complex uses 'j'

In [None]:
print(foo * baz)

In [None]:
print(foo * bar)

In [None]:
print(foo * bam)
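If you are reading rather than running the cells above, note that Python chooses what `*` means from the types of its operands. The following sketch uses the same values.

```python
foo = 42
bar = 'meaning of life'
baz = 3.14159
bam = False

# int * float -> float
product = foo * baz
print(type(product))                    # <class 'float'>

# int * str -> the string repeated 42 times
repeated = foo * bar
print(len(repeated) == 42 * len(bar))   # True

# bool is a subclass of int (False == 0, True == 1), so int * bool is an int
print(foo * bam)                        # 0
print(isinstance(bam, int))             # True
```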

## Compound

You are probably familiar with these types too, especially `list` and `dict`.

In [None]:
foo = [1, 2, 3, 4, 5]

bar = {
    'stars': 1e12,
    'galaxies': 1000,
    'black_holes': 1
}

In [None]:
print(f"There are approx {bar['stars']} stars in a monolith.")

In [None]:
Image('https://s.hdnux.com/photos/63/02/07/13377985/3/920x920.jpg')

In [None]:
for i in foo:
    print(i)

In [None]:
for obj, num in bar.items():
    print(f"There are {num} {obj} in the Milky Way")

## More on dictionaries

Wilfred says, "The most useful of data types."

Think of a dictionary as a way to hold and organize all of your other variables, but with convenient <span style="text-decoration: line-through">semantic labelling</span> names.

In [None]:
location = {
    'name': "Macquarie University",
    'latitude': -33.7771,     # degrees (negative = south of the equator)
    'longitude': 151.1180,    # degrees
    'elevation': 100,         # meters
    'pressure': 1000,         # mbar 
    'horizon': 30,            # degrees
    'timezone': 'Australia/Sydney', 
}

location
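The next sketch shows a few everyday dictionary operations. It uses a trimmed-down copy of `location` so the cell stands on its own.

```python
site = {
    'name': "Macquarie University",
    'elevation': 100,                   # meters
    'timezone': 'Australia/Sydney',
}

# Look up a value by key (raises KeyError if the key is missing)
print(site['name'])                     # Macquarie University

# .get() returns a fallback instead of raising when the key is missing
print(site.get('humidity', 'unknown'))  # unknown

# Check whether a key exists
print('timezone' in site)               # True

# Add entries one at a time, or several at once with update()
site['horizon'] = 30                    # degrees
site.update({'pressure': 1000})         # mbar

print(sorted(site))                     # ['elevation', 'horizon', 'name', 'pressure', 'timezone']
```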

We can then use this dictionary in various ways:

In [None]:
from astroplan import Observer
from astropy import units as u

mqu = Observer(
    longitude=location['longitude'] * u.deg,
    latitude=location['latitude'] * u.deg,
    elevation=location['elevation'] * u.meter,
    timezone=location['timezone'],
    name=location['name']
)

mqu

In [None]:
from astroplan import FixedTarget
from astropy.time import Time
from astroplan.plots import plot_airmass

# Look up the target's coordinates by name (needs an internet connection)
arcturus = FixedTarget.from_name('Arcturus')

# Plot airmass versus time for the target as seen from our observer `mqu`
plot_airmass(arcturus, mqu, Time.now() + 1 * u.hour);

## But...

What happens when we need to use this information in a number of different scripts or programs?

* Copy and paste `location` into every script we run?
* What if our `elevation` is incorrect, and we only discover this after copying and pasting it 1000 times?

## Serialization

> ...serialization (or serialisation) is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment). 

> When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. 

https://en.wikipedia.org/wiki/Serialization

<span style="color: red; font-size: 48px">Save stuff. Use it later.</span>

## Pickle

Built-in Python serialization format.

* Python only.
* Not human readable.
* They say it's slow.
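Below is a minimal sketch of a round trip with the standard-library `pickle` module. The output is bytes rather than text, which is why it isn't human-readable. Only unpickle data you trust, because loading a pickle can execute arbitrary code. The filename `my_location.pkl` is just an example.

```python
import pickle

location = {
    'name': "Macquarie University",
    'elevation': 100,                      # meters
    'timezone': 'Australia/Sydney',
}

# Serialize: dict -> bytes
data = pickle.dumps(location)
print(type(data))                          # <class 'bytes'>

# Deserialize: bytes -> an equal (but separate) dict
restored = pickle.loads(data)
print(restored == location)                # True
print(restored is location)                # False

# The same round trip via a file; note the binary modes 'wb' and 'rb'
with open('my_location.pkl', 'wb') as f:
    pickle.dump(location, f)

with open('my_location.pkl', 'rb') as f:
    from_file = pickle.load(f)
```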

## JSON and YAML

Easy-peasy serialization.

#### JAY-sun
JavaScript Object Notation


#### /ˈjæməl/  (think camel)
YAML Ain't Markup Language (originally "Yet Another Markup Language")

## Advantages

* Any language
* Human-readable*
* Faster

*JSON isn't especially friendly for humans to read or edit.

The basic idea with serialization is to convert your object into a text string*, which can be saved to a regular file.

When you want to use the object again, you deserialize the string and get back an equivalent object.

Serialize: `object` -> `string`  
Deserialize: `string` -> `object`


*The string can also be encoded as bytes (e.g. binary JSON formats such as `jsonb`); not discussed here.
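The following sketch shows the serialize/deserialize pair using the standard-library `json` module. `json.dumps` produces a string and `json.loads` turns it back into an object. The last lines show one limit of "semantically identical": JSON has no tuple type.

```python
import json

location = {
    'name': "Macquarie University",
    'elevation': 100,                      # meters
    'timezone': 'Australia/Sydney',
}

# Serialize: object -> string
text = json.dumps(location)
print(type(text))                          # <class 'str'>

# Deserialize: string -> object
restored = json.loads(text)
print(restored == location)                # True

# Caveat: JSON only knows arrays, so a tuple comes back as a list
print(json.loads(json.dumps((1, 2))))      # [1, 2]
```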

### Differences between YAML and JSON

[Stolen from here](https://www.json2yaml.com/yaml-vs-json)


#### YAML vs JSON
YAML is best suited for configuration, whereas JSON is better as a serialization format or for serving up data from your APIs.

In some cases, YAML has a couple of big advantages over JSON, including:

* **comments**
* ability to self-reference (anchors and aliases)
* support for complex datatypes
* more...

**Write your configuration files in YAML format where you have the opportunity - it is designed to be readable and editable by humans.**
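Below is a small sketch of the first two YAML features listed above, parsed with PyYAML's `yaml.safe_load`. Comments start with `#`. An anchor (`&name`) marks a node that an alias (`*name`) can reuse later. The keys and values are made up for illustration.

```python
import yaml  # pip install pyyaml

config_text = """
# Comments are allowed anywhere in YAML.
# '&macquarie' anchors the mapping below so it can be reused.
site: &macquarie
  elevation: 100   # meters
  horizon: 30      # degrees

telescopes:
  # '*macquarie' reuses the anchored mapping instead of repeating it
  small_scope:
    site: *macquarie
  big_scope:
    site: *macquarie
"""

config = yaml.safe_load(config_text)
print(config['telescopes']['big_scope']['site'])   # {'elevation': 100, 'horizon': 30}
```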

#### JSON vs YAML
JSON wins as a serialization format. 

* It is more explicit and more suitable for data interchange between your APIs.
* YAML (version 1.2) is a superset of JSON, which means you can parse JSON with a YAML parser. (Wilfred says, "Don't do this.")
* YAML also lets you mix JSON-style syntax into the same document: `[..., ...]` for arrays and `{ "foo": "bar" }` for objects. (Wilfred says, "Don't do this either.")

### JSON

Machines talking to machines

### YAML

Humans talking to machines (and vice versa)

Let's take another look at our `location` object:

In [None]:
location

In [None]:
import json  # Part of Python standard library

json.dumps(location)

In [None]:
# It is just a string
type(json.dumps(location))

In [None]:
# We can save to a file
with open('my_location.json', 'w') as f:
    f.write(json.dumps(location))

In [None]:
# Using shell commands from Jupyter with exclamation point
!ls -l

In [None]:
!cat my_location.json
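To read the file back, `json.load` parses directly from an open file. It is the file-object counterpart of `json.loads`, and `json.dump` likewise writes straight to a file. Here is a self-contained sketch. It uses a separate, illustrative filename so it doesn't overwrite `my_location.json`.

```python
import json

location = {'name': "Macquarie University", 'elevation': 100}

# json.dump writes straight to the file; indent=2 makes the file easier to read
with open('my_location_pretty.json', 'w') as f:
    json.dump(location, f, indent=2)

# json.load reads and parses the file in one step
with open('my_location_pretty.json') as f:
    loaded = json.load(f)

print(loaded == location)                  # True
```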

In [None]:
import yaml  # pip install pyyaml (ruamel.yaml is an alternative, but it is imported as ruamel.yaml)

yaml.dump(location)

In [None]:
type(yaml.dump(location))

In [None]:
# We can save to a file
with open('my_location.yaml', 'w') as f:
    f.write(yaml.dump(location))

In [None]:
!ls -l

In [None]:
!cat my_location.yaml
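You may notice that `yaml.dump` sorted the keys alphabetically. In PyYAML 5.1 and later, `sort_keys=False` keeps the dictionary's own order, and `yaml.safe_dump` limits the output to plain YAML types. The sketch below writes directly to a file. The filename is illustrative.

```python
import yaml

location = {
    'name': "Macquarie University",
    'elevation': 100,
    'timezone': 'Australia/Sydney',
}

# Default behaviour: keys come out sorted alphabetically
print(yaml.safe_dump(location))

# Keep insertion order and write straight to an open file
with open('my_location_ordered.yaml', 'w') as f:
    yaml.safe_dump(location, f, sort_keys=False)

with open('my_location_ordered.yaml') as f:
    print(f.read())
```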

### YAML as super-fancy persistent dictionary

YAML (and JSON) handle the common built-in types: strings, numbers, booleans, `None`, lists, and dicts.*

This means they can handle a list inside a dict inside a dict, and so on.


*Fancier types such as `datetime` or `numpy.array` can be handled too; it is only slightly more involved.
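Here is one way to handle "fancy" types with the standard `json` module: pass a fallback function via the `default=` argument of `json.dumps`. The helper `to_serializable` is hypothetical, and the date and flux values are illustrative. On the way back in, you convert the strings and lists yourself.

```python
import json
from datetime import datetime

import numpy as np


def to_serializable(obj):
    """Hypothetical fallback for objects the json module can't handle on its own."""
    if isinstance(obj, datetime):
        return obj.isoformat()              # datetime -> ISO 8601 string
    if isinstance(obj, np.ndarray):
        return obj.tolist()                 # array -> (nested) list of Python numbers
    if isinstance(obj, np.generic):
        return obj.item()                   # NumPy scalar (e.g. np.int64) -> Python scalar
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {
    'observed': datetime(2019, 5, 1, 12, 30),
    'flux': np.array([1.0, 0.98, 1.02]),
}

# json calls `default` for every object it doesn't know how to serialize
text = json.dumps(record, default=to_serializable)
print(text)   # {"observed": "2019-05-01T12:30:00", "flux": [1.0, 0.98, 1.02]}

# Deserializing gives back plain strings and lists, so convert them explicitly
loaded = json.loads(text)
observed = datetime.fromisoformat(loaded['observed'])
flux = np.array(loaded['flux'])
```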

In [None]:
foobar = {
    'baz': {
        'bar': [1, 2, 3, 4, 5],
        'bam': [
            {'one': 1},
            {'two': 2}
        ]
    },
    'moo': 42,
    'boo': "Hello World"
}

foobar

In [None]:
json.dumps(foobar)

In [None]:
yaml.dump(foobar)
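Both formats rebuild the nested structure exactly. The following quick check of the round trip uses a copy of `foobar` so the cell runs on its own.

```python
import json

import yaml

foobar = {
    'baz': {
        'bar': [1, 2, 3, 4, 5],
        'bam': [
            {'one': 1},
            {'two': 2}
        ]
    },
    'moo': 42,
    'boo': "Hello World"
}

# Serialize and immediately deserialize with each format
from_json = json.loads(json.dumps(foobar))
from_yaml = yaml.safe_load(yaml.dump(foobar))

print(from_json == foobar)                 # True
print(from_yaml == foobar)                 # True
print(from_yaml['baz']['bam'][1]['two'])   # 2
```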

### A real configuration file

Here we write out all of our configuration details in a big YAML file called `config.yaml`.

In [None]:
!cat config.yaml

Accessing this file is as simple as loading it:

In [None]:
# Note the use of `safe_load`
with open('config.yaml', 'r') as f:
    my_config = yaml.safe_load(f.read())
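Why `safe_load`? It only constructs plain Python objects (dicts, lists, strings, numbers, ...). PyYAML's full loaders can also build arbitrary Python objects from special `!!python/...` tags, which is risky for files you didn't write yourself. The sketch below shows `safe_load` refusing such a tag. The command inside the tag is never run.

```python
import yaml

# A YAML document that *asks* to call os.system; safe_load will not honour the tag
suspicious = "!!python/object/apply:os.system ['echo this should not run']"

try:
    yaml.safe_load(suspicious)
    refused = False
except yaml.YAMLError as err:          # ConstructorError is a subclass of YAMLError
    refused = True
    print("Refused:", err)

# Ordinary configuration still loads fine
print(yaml.safe_load("horizon: 30"))   # {'horizon': 30}
```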

In [None]:
my_config

In [None]:
location = my_config['location']

location

Let's make ourselves a quick load function:

In [None]:
def load_config(filename):
    with open(filename, 'r') as f:
        my_config = yaml.safe_load(f.read())
        
    return my_config

### Change file externally

Change the file outside this notebook (for example, in a text editor), then re-read it:

In [None]:
my_config = load_config('config.yaml')

In [None]:
my_config

In [None]:
planet_names = [
    'Alpha',
    'Beta',
    'Gamma',
    'Bob'
]

In [None]:
planets = dict()

# Generate a random number of fake light curves for each planet
for i, name in enumerate(planet_names):
    # Set up the directory name for this planet and create it
    planet_directory_name = f'data/planet_{i:03d}'
    os.makedirs(planet_directory_name, exist_ok=True)

    light_curves = list()
    # np.random.randint excludes the upper bound, so this gives 1 to 5 light curves
    for j in range(np.random.randint(1, 6)):

        # Generate a fake light curve: 100 normally distributed points around 1.0
        lc0 = np.random.normal(1, 0.1, size=100)

        # Set up json filename
        lc_filename = f'{planet_directory_name}/lc_{j:03d}.json'

        # Save light curve to a json file (probably not ideal for large data)
        with open(lc_filename, 'w') as f:
            f.write(json.dumps(lc0.tolist()))

        # Add the name of our file to the list
        light_curves.append(lc_filename)

    # Store this planet's list of files in our dictionary
    planets[name] = light_curves

In [None]:
# Look at our config
planets

In [None]:
# Save our config to a yaml file
with open('planet_config.yaml', 'w') as f:
    f.write(yaml.dump(planets))

In [None]:
!cat planet_config.yaml

### A different script

Now we go to an entirely different script: [Use Planet Config](Use-Planet-Config.ipynb)
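That notebook isn't reproduced here. As a rough guide, a script that uses `planet_config.yaml` might look like this hypothetical sketch: load the YAML, then read each light-curve JSON file back into a NumPy array. The function name `load_light_curves` is made up for illustration.

```python
import json
import os

import numpy as np
import yaml


def load_light_curves(config_filename):
    """Hypothetical helper: return {planet_name: [light-curve arrays]}."""
    with open(config_filename) as f:
        planets = yaml.safe_load(f)

    light_curves = dict()
    for name, filenames in planets.items():
        arrays = list()
        for lc_filename in filenames:
            with open(lc_filename) as lc_file:
                arrays.append(np.array(json.load(lc_file)))
        light_curves[name] = arrays
    return light_curves


# Only run if the config written by the cells above exists
if os.path.exists('planet_config.yaml'):
    curves = load_light_curves('planet_config.yaml')
    for name, arrays in curves.items():
        print(name, len(arrays), 'light curves')
```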

## Logging

Logging gives you flexible output: it makes it easy to separate informational messages from warnings and from debugging output.

In [None]:
display(HTML('<iframe src="https://docs.python.org/3.7/library/logging.html" width=800 height=600></iframe>'))
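Here is a minimal sketch using the standard `logging` module. One named logger sends messages to two handlers with different levels: the console shows INFO and above, while an in-memory buffer keeps everything, including DEBUG. The logger name and messages are hypothetical.

```python
import io
import logging

# A named logger; 'planet_pipeline' is a hypothetical name
logger = logging.getLogger('planet_pipeline')
logger.setLevel(logging.DEBUG)        # the logger itself lets every level through
logger.handlers.clear()               # avoid duplicate handlers when the cell is re-run
logger.propagate = False              # don't also pass messages up to the root logger

formatter = logging.Formatter('%(levelname)s - %(message)s')

# Handler 1: the console only shows INFO and above
console = logging.StreamHandler()     # writes to sys.stderr by default
console.setLevel(logging.INFO)
console.setFormatter(formatter)
logger.addHandler(console)

# Handler 2: keep everything, including DEBUG, in memory
# (a logging.FileHandler pointed at a file would work the same way)
debug_buffer = io.StringIO()
debug_handler = logging.StreamHandler(debug_buffer)
debug_handler.setLevel(logging.DEBUG)
debug_handler.setFormatter(formatter)
logger.addHandler(debug_handler)

logger.debug('Loaded planet_config.yaml')
logger.info('Processing 4 planets')
logger.warning('Light curve file missing, skipping')
```

Changing a handler's level, for example setting `console` to `logging.DEBUG` while you track down a problem, changes what you see without touching the `logger.debug(...)` calls themselves.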