# Basic Data Structures

## Lists

A list is most commonly designated with **square brackets**. Items in a list are separated by **commas** and can have different data types (strings, integers, etc.). List items can be accessed with an index or indices.

**Resources**
-  https://www.tutorialspoint.com/python/python_lists.htm
-  https://developers.google.com/edu/python/lists

### Creating a list

In [1]:
stars = ['Betelgeuse', 'Polaris', 'Sirius']
constellations = list(['Cassiopeia', 'Draco', 'Orion'])
messierObjects = [31, 'Crab nebula'] # different data types

### Accessing list items/elements

An **index** is used to access a single value in a list.
A **slice** is used to access a range of elements in a list. The slice **[x:y]** covers the elements from the starting index **x** up to, but not including, index **y**.

In [2]:
# Index
print(stars[0]) # 0 is the index of the first item in the list: Betelgeuse
print(stars[1]) # Polaris
print(stars[2]) # Sirius
print(stars[-1]) # -1 is the index of the last item in the list, -2 is the second to last, and so on...

Betelgeuse
Polaris
Sirius
Sirius


In [3]:
# Get the index or position of a known list item
index = messierObjects.index('Crab nebula') # the method .index() returns the index of the first matching item in a list
print(index)
print(messierObjects[index])

1
Crab nebula


In [4]:
# Slice
print(stars[0:2]) # prints the first and second items
print(constellations[1:3]) # prints the second and third items

['Betelgeuse', 'Polaris']
['Draco', 'Orion']


In [5]:
len(stars) # get length of list

3

### List comprehension

If a list is not explicitly defined (like our stars or constellations lists), it is normally populated using a for loop. A list comprehension (the second cell below) is the more "pythonic" way to create a list, since it is more concise and usually faster.

In [6]:
squares1 = []

for number in range(10):
    squares1.append(number**2) # The ** operator is a power
    
squares1

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [7]:
# List comprehension follows the logic: [expression for item in list if conditional]
squares2 = [x**2 for x in range(10)]
squares2

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
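
The comment in the cell above mentions an optional `if conditional` that the example itself does not use. As a small sketch (the names `even_squares` and `even_squares_loop` are only illustrative), a condition filters which items end up in the new list:

```python
# Keep only the squares of even numbers; the condition is tested for every x
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# The equivalent for loop, for comparison
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)
```

Both versions build the same list; the comprehension just folds the loop and the test into one expression.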

### Iterating through a list or lists

In [8]:
for star in stars:
    print(star)

Betelgeuse
Polaris
Sirius


In [9]:
for s, c in zip(stars, constellations): # zip() pairs up the items at the same position in each list and yields them as tuples
    print(s, c)

Betelgeuse Cassiopeia
Polaris Draco
Sirius Orion


In [10]:
for number, s in enumerate(stars): # enumerate() adds a counter to an iterable
    print(number, s)

0 Betelgeuse
1 Polaris
2 Sirius


In [11]:
i = 0
while i < len(stars):
    print(stars[i])
    i += 1 # same as i = i+1

Betelgeuse
Polaris
Sirius


In [12]:
for s in range(1, len(stars)): # range(1, n) starts at index 1, so the first item is skipped
    print(stars[s])

Polaris
Sirius


### Appending, inserting, and removing items in a list
The method **.append()** appends, or adds, an item to the end of a list. The method **.insert()** inserts an item at a specific index in a list.

In [13]:
# Add to a list
stars.append('Gemini')
stars

['Betelgeuse', 'Polaris', 'Sirius', 'Gemini']

In [14]:
# Oops, Gemini is not a star so I want to remove it
stars.remove('Gemini')
stars

['Betelgeuse', 'Polaris', 'Sirius']

In [15]:
# I want to add Gemini to the constellations list but then it would not be in alphabetical order.
# So instead of .append(), I use .insert()
constellations.insert(2, 'Gemini') # where 2 is the index of the 3rd list item
constellations

['Cassiopeia', 'Draco', 'Gemini', 'Orion']

In [16]:
# Sagittarius is my zodiac sign so I want to replace Gemini with it.
constellations[2] = 'Sagittarius'
constellations

['Cassiopeia', 'Draco', 'Sagittarius', 'Orion']

In [17]:
# Now the list is out of alphabetical order but I can sort it
constellations.sort()
constellations

['Cassiopeia', 'Draco', 'Orion', 'Sagittarius']

## Dictionaries

A dictionary is most commonly designated with **curly brackets**. It is a set of key-value pairs (key: value) separated by commas. Values in a dictionary are accessed via a unique key. Similar to a list, a dictionary can contain any kind of data type. **Spoiler**: the header information of a FITS file is contained in a dictionary.

**Resources**
-  https://www.tutorialspoint.com/python/python_dictionary.htm
-  https://hackernoon.com/python-basics-10-dictionaries-and-dictionary-methods-4e9efa70f5b9


### Creating a dictionary

In [18]:
# The keys are Messier, NGC, Constellation, and Apparent magnitude
eagleNebula = {'Messier': 'M16', 'NGC': 'NGC 6611', 'Constellation': 'Serpens', 'Apparent magnitude': 6}
dumbbellNebula = dict({'Messier': 'M27', 'NGC': 'NGC 6853', 'Constellation': 'Vulpecula', 'Apparent magnitude': 7.5})

### Accessing dictionary keys and values
The method **.keys()** returns all dictionary keys. The method **.values()** returns all dictionary values. The method **.items()** returns all key-value pairs. To test whether a given key is in the dictionary, use the **in** operator, which evaluates to True or False (the older **.has_key()** method was removed in Python 3). An individual value is retrieved using the syntax **dictionary['key']**, where dictionary is the name of the dictionary and 'key' is the name of a key, or with the method **.get()**.

In [19]:
eagleNebula.keys() # return all dictionary keys

dict_keys(['Constellation', 'Apparent magnitude', 'NGC', 'Messier'])

In [20]:
eagleNebula.values() # return all dictionary values

dict_values(['Serpens', 6, 'NGC 6611', 'M16'])

In [21]:
eagleNebula.items() # returns all key-value pairs

dict_items([('Constellation', 'Serpens'), ('Apparent magnitude', 6), ('NGC', 'NGC 6611'), ('Messier', 'M16')])

In [22]:
eagleNebula['Messier'] # get the value of the Messier key

'M16'

In [23]:
eagleNebula['NGC'] # get the value of the NGC key

'NGC 6611'

In [24]:
eagleNebula.get('NGC')

'NGC 6611'

In [25]:
# Can also check dictionary keys by using the "in" and "not in" operators
if 'Messier' in eagleNebula:
    print('Messier is a key')
    
if 'RA' not in eagleNebula:
    print('RA is not a key')

Messier is a key
RA is not a key
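
Square-bracket access raises a `KeyError` when the key is missing, whereas `.get()` returns `None` or a default you supply. A short sketch of the difference, using a small stand-in dictionary (`nebula` is an illustrative name, not part of the notebook):

```python
nebula = {'Messier': 'M16', 'NGC': 'NGC 6611'}

print(nebula.get('NGC'))            # NGC 6611
print(nebula.get('RA'))             # None, because the key is missing
print(nebula.get('RA', 'unknown'))  # unknown, the supplied default

# Square-bracket access raises KeyError for a missing key
try:
    nebula['RA']
except KeyError as err:
    print('Missing key:', err)
```

`.get()` is convenient when a missing key is expected; the `in` test is clearer when the two cases need different handling.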


### Updating a dictionary

In [26]:
eagleNebula['Object type'] = 'Nebula' # Add key and value

if 'Object type' in eagleNebula:
    print(eagleNebula['Object type'])

Nebula


In [27]:
if 'Object type' in eagleNebula:
    eagleNebula['Object type'] = 'Nebula, H II region with cluster' # updating existing key
    
print(eagleNebula['Object type'])

Nebula, H II region with cluster


In [28]:
if 'Object type' in eagleNebula:
    del eagleNebula['Object type'] # delete an existing key

The method **.clear()** removes all key-value pairs.
But first, let's make a copy using the method **.copy()**.

In [29]:
newEagleNebula = eagleNebula.copy()
eagleNebula.clear()
print(eagleNebula)
print(newEagleNebula)

{}
{'NGC': 'NGC 6611', 'Apparent magnitude': 6, 'Constellation': 'Serpens', 'Messier': 'M16'}


### Iterating through a dictionary

In [30]:
for item in newEagleNebula.items():
    print(item)

('NGC', 'NGC 6611')
('Apparent magnitude', 6)
('Constellation', 'Serpens')
('Messier', 'M16')


In [31]:
for key in newEagleNebula.keys():
    print(key)

NGC
Apparent magnitude
Constellation
Messier
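
Because `.items()` yields (key, value) tuples, the pair can be unpacked directly in the for statement. A minimal sketch, again with a small stand-in dictionary (`nebula` is an illustrative name, not part of the notebook):

```python
nebula = {'Messier': 'M16', 'NGC': 'NGC 6611', 'Constellation': 'Serpens'}

# Unpack each (key, value) tuple into two loop variables
for key, value in nebula.items():
    print(key, '->', value)
```

This avoids a second lookup such as `nebula[key]` inside the loop.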


## NumPy Arrays
NumPy is a Python library for scientific computing that provides a powerful N-dimensional array object. We will only scratch the surface of NumPy, so it is recommended to review the additional resources provided below. **Spoiler:** The image data in a FITS file is contained in a NumPy array.

**Resources**
-  https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html
-  https://docs.scipy.org/doc/numpy/reference/
-  https://docs.scipy.org/doc/numpy/user/quickstart.html
-  https://docs.scipy.org/doc/numpy-1.10.1/reference/routines.array-creation.html
-  https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html

### Create basic arrays

In [32]:
import numpy as np

oneD = np.array([1,2,3]) # create a one-dimensional array with 3 elements
oneD

array([1, 2, 3])

In [33]:
twoD = np.array([[1,2,3], [4,5,6]]) # create a two-dimensional array
twoD

array([[1, 2, 3],
       [4, 5, 6]])

In [34]:
zeros = np.zeros((2,2)) # create a two-dimensional array of zeros
zeros

array([[ 0.,  0.],
       [ 0.,  0.]])

In [35]:
ones = np.ones((2,2)) # create a two-dimensional array of ones
ones

array([[ 1.,  1.],
       [ 1.,  1.]])

In [36]:
randoms = np.random.random((2,2)) # create a two-dimensional array with random values
randoms

array([[ 0.7332228 ,  0.07955718],
       [ 0.91656148,  0.95936189]])

In [37]:
randomIntArray = np.random.randint(10, size=6)
randomIntArray

array([3, 3, 1, 7, 8, 4])

In [38]:
floatArray = np.array(range(10), dtype=float) # create an array full of floats
floatArray

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [39]:
intArray = np.array(range(10), dtype=int) # create an array full of ints
intArray

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [40]:
spacedArray = np.arange(1, 10, 2) # evenly spaced values from 1 up to (but not including) 10, in steps of 2
spacedArray

array([1, 3, 5, 7, 9])

In [41]:
np.concatenate([floatArray, intArray]) # combine arrays

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.,  0.,  1.,  2.,
        3.,  4.,  5.,  6.,  7.,  8.,  9.])

### Accessing arrays

Array values are accessed and modified using an index or indices, just like a list.

In [42]:
oneD[0]

1

In [43]:
oneD[0] = 100
oneD

array([100,   2,   3])

In [44]:
twoD[1, 2]

6

In [45]:
np.where(twoD==5) # returns the index of where 5 is in the array

(array([1]), array([1]))

In [46]:
twoD[1, 1]

5
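
When called with only a condition, `np.where` returns a tuple with one index array per dimension: for a two-dimensional array, the first holds the row indices and the second the column indices. A sketch with an illustrative array (`grid` and `positions` are names made up for this example) in which the value appears more than once:

```python
import numpy as np

grid = np.array([[1, 5, 3],
                 [5, 2, 5]])

rows, cols = np.where(grid == 5)
print(rows)  # [0 1 1]
print(cols)  # [1 0 2]

# Pair the indices up to get (row, column) positions
positions = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(positions)  # [(0, 1), (1, 0), (1, 2)]

# The same index arrays can be used directly to pull out the values
print(grid[rows, cols])  # [5 5 5]
```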

In [47]:
# Arrays can be sliced using the syntax: x[start:stop:step]
x = np.arange(10)
x[:5] # first 5 elements

array([0, 1, 2, 3, 4])

In [48]:
x[5:] # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [49]:
x[4:7] # specified range of the array (a subarray)

array([4, 5, 6])

In [50]:
x[::2] # every other element

array([0, 2, 4, 6, 8])

In [51]:
x[1::2]  # every other element, starting at index 1

array([1, 3, 5, 7, 9])

In [52]:
x[::-1]  # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [53]:
x[5::-2]  # reversed every other from index 5

array([5, 3, 1])

### Basic array operations, functions, variables, and statistics



In [54]:
twoD.size # returns the number of elements

6

In [55]:
twoD.ndim # returns the number of dimensions

2

In [56]:
twoD.shape # returns the shape of the array

(2, 3)

In [57]:
twoD[:2, :2]

array([[1, 2],
       [4, 5]])

In [58]:
np.sqrt(4)

2.0

In [59]:
np.pi

3.141592653589793

In [60]:
np.cos(120) # the argument is in radians, not degrees

0.8141809705265618
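
NumPy's trigonometric functions take angles in radians, so the result above is the cosine of 120 radians. If 120 degrees was intended, the angle has to be converted first. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

angle_deg = 120
angle_rad = np.deg2rad(angle_deg)  # convert degrees to radians

print(np.cos(angle_rad))  # approximately -0.5
```

`np.radians()` performs the same conversion as `np.deg2rad()`.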

In [61]:
twoD.max()

6

In [62]:
twoD.min()

1

In [63]:
np.mean(twoD)

3.5

In [64]:
np.median(twoD, axis=0)

array([ 2.5,  3.5,  4.5])

In [65]:
np.median(twoD, axis=1)

array([ 2.,  5.])
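
The `axis` argument names the dimension that is collapsed: `axis=0` combines values down the rows and gives one result per column, while `axis=1` combines across the columns and gives one result per row. A short sketch using `sum`, where the arithmetic is easy to check by hand (`twoD` is redefined here so the block runs on its own; `col_sums` and `row_sums` are illustrative names):

```python
import numpy as np

twoD = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = twoD.sum(axis=0)  # collapse the rows: one result per column
row_sums = twoD.sum(axis=1)  # collapse the columns: one result per row

print(col_sums)    # [5 7 9]
print(row_sums)    # [ 6 15]
print(twoD.sum())  # 21, no axis given, so every element is summed
```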

In [66]:
twoD.std()

1.707825127659933

In [67]:
twoD.sum()

21

### Arithmetic with Arrays

In [68]:
np.power(spacedArray, 2) # spacedArray raised to the second power

array([ 1,  9, 25, 49, 81])

In [69]:
a = np.arange(1,6,1)
b = np.arange(11, 16, 1)
print(a)
print(b)

[1 2 3 4 5]
[11 12 13 14 15]


In [70]:
# Addition
print(a+1)
print(a+b)

[2 3 4 5 6]
[12 14 16 18 20]


**Note:** Array multiplication is not matrix multiplication. Use .dot() for the dot product.

In [71]:
# Multiplication
print(np.arange(5)*2)

[0 2 4 6 8]
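
To illustrate the note above, here is a small sketch comparing element-wise multiplication with the matrix product for two 2x2 arrays (`m` and `n` are illustrative names):

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])

elementwise = m * n  # multiplies matching elements: [[5, 12], [21, 32]]
matrix = m.dot(n)    # matrix product: [[19, 22], [43, 50]]

print(elementwise)
print(matrix)
print(m @ n)         # the @ operator gives the same matrix product
```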


## Summary

### Lists
-  Items in a list are accessed by an **index**, or the position, of an item
-  A list is an ordered set of objects
-  Lists are mutable objects

### Dictionaries
-  Items in a dictionary are accessed via **keys**
-  A dictionary is a set of key-value pairs looked up by key rather than by position (insertion order is only guaranteed from Python 3.7 onward)
-  Dictionaries are mutable objects

### NumPy Arrays
-  Array values can be accessed and modified using an index or indices just like a list.
-  A subarray can be accessed using slices just like a list