# Dictionaries and DataFrames

## Dictionaries

* Dictionaries are (sort of) a generalized version of arrays. Where an array stores `Index,Value` pairs, a dictionary stores `Key:Value` pairs.
* They can be created via a comma-separated list of `Key:Value` pairs within curly braces `{}`.
* Dictionaries are at the heart of a lot of what goes on in Python "under-the-hood".

![Python Dict](./images/PyDict.jpg)
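
As a small illustration of the "under-the-hood" point, the sketch below shows two places where Python itself hands you a dictionary: the attributes of an object and the keyword arguments of a function. The `Star` class and the `collect_keywords` function are hypothetical names made up for this example.

```python
# Hypothetical class, used only to illustrate how attributes are stored.
class Star:
    def __init__(self, name, distance):
        self.name = name
        self.distance = distance

sirius = Star('Sirius', 8.6)

# vars() returns the dictionary that holds the instance's attributes.
instance_attributes = vars(sirius)
print(instance_attributes)

# Keyword arguments collected with ** arrive as a dictionary.
def collect_keywords(**kwargs):
    return kwargs

keyword_arguments = collect_keywords(name='Vega', distance=25)
print(keyword_arguments)
```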

In [None]:
import numpy as np

In [None]:
numbers = {'one': 1, 
           'two': np.array([2, 2]), 
           'three': np.array([3, 3, 3])}

numbers

#### Access a `value` via the `key`

In [None]:
numbers['two']

#### Add an `index` after the `key` to get a single element of a `value`

In [None]:
numbers['two'][0]

#### New items can be added to the dictionary by assigning to a new `key`

In [None]:
numbers['ninety'] = np.array(['n', 'i', 'n', 'e', 't', 'y'])

numbers

----



In [None]:
numbers.keys()

In [None]:
numbers.values()

In [None]:
for my_key,my_value in numbers.items():
    print(my_key, my_value)
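
Looking up a key that does not exist raises a `KeyError`. A minimal sketch of two common ways to guard against that, the `in` test and `.get()` with a default, is below. The dictionary here is a cut-down stand-in for `numbers` that uses plain lists, so the block runs on its own.

```python
# Stand-in for the numbers dictionary above (plain lists instead of arrays).
numbers = {'one': 1, 'two': [2, 2]}

# Membership tests check the keys, not the values.
has_two = 'two' in numbers
has_ten = 'ten' in numbers

# .get() returns a default instead of raising when the key is missing.
missing_default = numbers.get('ten', 0)

# Direct indexing with a missing key raises KeyError.
try:
    numbers['ten']
    lookup_failed = False
except KeyError as error:
    lookup_failed = True
    print('No such key:', error)
```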

---

# The `pandas` package - Python Data Analysis Library - `DataFrame`

In [None]:
import pandas as pd

### Make some arrays of data

In [None]:
my_star_name = np.array(['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus', 'Vega', 'Capella', 'Rigel'])

my_star_name

In [None]:
my_star_dist = np.array([8.6, 74, 4.3, 34, 25, 41, 1400])

my_star_dist

In [None]:
my_star_appmag = np.array([-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12])

my_star_appmag

### You can use the arrays as the "data" part of a dictionary

In [None]:
my_star_name_dict = {'Name': my_star_name}

In [None]:
my_star_name_dict

In [None]:
my_star_name_dict['Name'][0]

## Arrays -> Dictionaries -> `DataFrame`

* Each Array becomes the `value` of one `Key:Value` pair in a Dictionary
* Each `Key:Value` pair becomes a Column in the `DataFrame`, with the `key` as the column label

In [None]:
star_table = pd.DataFrame(
    {'Name': my_star_name,
     'Distance': my_star_dist,
     'AppMag': my_star_appmag
    }
)

In [None]:
star_table
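
A minimal sketch of the same construction on a two-star table, to show where the column labels come from and what happens when the arrays differ in length. The variable names (`names`, `distances`, `small_table`) are made up for this example.

```python
import numpy as np
import pandas as pd

names = np.array(['Sirius', 'Vega'])
distances = np.array([8.6, 25])

# The dictionary keys become the column labels.
small_table = pd.DataFrame({'Name': names, 'Distance': distances})
column_labels = list(small_table.columns)
table_shape = small_table.shape   # (rows, columns)

# Arrays of different lengths cannot be combined into one table.
try:
    pd.DataFrame({'Name': names, 'Distance': np.array([8.6, 25, 41])})
    lengths_ok = True
except ValueError as error:
    lengths_ok = False
    print('Could not build table:', error)
```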

### Notice that each row has an `index` assigned to it.

In [None]:
print(star_table)

In [None]:
star_table.info()

In [None]:
star_table.describe()

In [None]:
star_table.min()

##### `.min(), .max(), .mean(), .std()`
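
One thing to watch for: `.min()` and `.max()` work on text columns (alphabetical order), but `.mean()` and `.std()` only make sense for numbers, and recent `pandas` versions may raise an error if a text column is included. The sketch below sidesteps that by passing `numeric_only=True`. It rebuilds a three-star version of the table so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Three rows taken from the star data above.
small_table = pd.DataFrame(
    {'Name': np.array(['Sirius', 'Canopus', 'Vega']),
     'Distance': np.array([8.6, 74, 25]),
     'AppMag': np.array([-1.46, -0.72, 0.03])
    }
)

# Restrict the statistics to the numeric columns.
column_means = small_table.mean(numeric_only=True)
column_stds = small_table.std(numeric_only=True)

# A single numeric column can also be used directly.
closest_distance = small_table['Distance'].min()

print(column_means)
print(column_stds)
```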

### Number of values

* Again, many ways to count

In [None]:
star_table.count()

In [None]:
star_table['Name'].count()

In [None]:
np.size(star_table['Name'])

In [None]:
len(star_table['Name'])
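
These ways of counting agree for `star_table` because it has no missing values. When a column contains `NaN`, `.count()` skips it while `len()` and `np.size()` do not. A minimal sketch, using a made-up `Series` with one missing entry:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
distances = pd.Series([8.6, np.nan, 25.0])

n_valid = distances.count()    # counts only non-missing values
n_length = len(distances)      # counts every entry
n_size = np.size(distances)    # also counts every entry

print(n_valid, n_length, n_size)
```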

## Pieces


* `head()`
* `tail()`
* `loc[row, column]`

In [None]:
star_table

In [None]:
star_table.head(2)

In [None]:
star_table.tail(2)

In [None]:
star_table.loc[2:3, :]

#### Notice that `.loc` slices in `pandas` are different than `numpy` slices - they include the end point

In [None]:
star_table.loc[2:3, ['Name', 'AppMag']]
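
A side-by-side sketch of the end-point behavior. `.loc` slices by label and includes the end point, while `.iloc` slices by position and follows the usual `numpy` rule. The `values` array and `table` are made up for this example.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30, 40, 50])
table = pd.DataFrame({'Value': values})

numpy_slice = values[2:3]            # position 2 only
loc_slice = table.loc[2:3, 'Value']  # labels 2 AND 3
iloc_slice = table.iloc[2:3, 0]      # position 2 only

print(len(numpy_slice), len(loc_slice), len(iloc_slice))
```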

In [None]:
star_table.loc[:, 'Distance']

In [None]:
star_table['Distance']

In [None]:
star_table['Distance'].count()

## Sorting (`.sort_values`)

In [None]:
star_table.sort_values(['Name'])

In [None]:
star_table.sort_values(['Distance'])

In [None]:
star_table.sort_values(
    ['Distance'],
    ascending=False
)

#### The original table is unchanged

In [None]:
star_table

In [None]:
star_table.sort_values(
    ['Distance'],
    ascending=False,
    inplace=True
)

#### The original table is changed

In [None]:
star_table

#### Notice that the row-index has **NOT** been renumbered! Each row kept its original label

In [None]:
star_table.loc[2:4, :]
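
Because the labels travel with the rows, `.loc` (by label) and `.iloc` (by position) can point at different rows after a sort. A minimal sketch on a rebuilt copy of the table, so the block runs on its own. The name `sorted_table` is made up for this example.

```python
import numpy as np
import pandas as pd

star_table = pd.DataFrame(
    {'Name': np.array(['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
                       'Vega', 'Capella', 'Rigel']),
     'Distance': np.array([8.6, 74, 4.3, 34, 25, 41, 1400])
    }
)

sorted_table = star_table.sort_values(['Distance'], ascending=False)

# The labels are the original row numbers, now out of order.
labels_after_sort = list(sorted_table.index)

# .iloc[0] is the first row by position, .loc[0] is the row labeled 0.
first_by_position = sorted_table.iloc[0]['Name']
row_labeled_zero = sorted_table.loc[0, 'Name']

print(labels_after_sort, first_by_position, row_labeled_zero)
```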

## Resetting the index (`.reset_index`)

In [None]:
star_table.reset_index(drop=True, inplace=True)

In [None]:
star_table

In [None]:
star_table.loc[2:4, :]

## Picking out data (`.query`)

In [None]:
star_table.query("Distance < 35")

In [None]:
star_table.query("Distance < 35").count()

In [None]:
star_table.query("Distance < 35")['Name'].count()

In [None]:
star_table.query("Distance < 35 and AppMag < 0")
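
Two variations that may be useful: a Python variable can be referenced inside the query string with `@`, and the same selection can be written as a boolean mask. A minimal sketch on a rebuilt copy of the table. The variable `max_distance` is made up for this example.

```python
import numpy as np
import pandas as pd

star_table = pd.DataFrame(
    {'Name': np.array(['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
                       'Vega', 'Capella', 'Rigel']),
     'Distance': np.array([8.6, 74, 4.3, 34, 25, 41, 1400]),
     'AppMag': np.array([-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12])
    }
)

# Refer to a Python variable inside the query string with @.
max_distance = 35
by_variable = star_table.query("Distance < @max_distance")

# The two-condition query and the equivalent boolean mask.
by_query = star_table.query("Distance < 35 and AppMag < 0")
by_mask = star_table[(star_table['Distance'] < 35) & (star_table['AppMag'] < 0)]

print(by_query)
```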

## Methods

* The `pandas` package has a huge number of different ways to explore, manipulate, and extract data from a `DataFrame`.
* The [`DataFrame` reference page](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
lists all of the different methods that can be used on a `DataFrame`.
* These methods can be chained together to explore a `DataFrame`.

----

#### Example: Find the name of the star in `star_table` with a distance closest to 30 l.y.

In [None]:
star_table

#### Write a function to calculate the absolute difference |distance - my_dist|

In [None]:
def find_distance(distance, my_dist):
    result = np.abs(distance - my_dist)
    return result

In [None]:
my_distance_value = 30

#### `.assign()` creates a new column

In [None]:
new_star_table = (
    star_table
    .assign(delta_distance=find_distance(star_table['Distance'], my_distance_value))
)

In [None]:
new_star_table

#### `.sort_values()` by the new column

In [None]:
new_star_table = (
    star_table
    .assign(delta_distance=find_distance(star_table['Distance'], my_distance_value))
    .sort_values(['delta_distance'])
    .reset_index(drop=True)
)

new_star_table

#### The smallest value is at the top of the table

In [None]:
new_star_table.head(1)

In [None]:
new_star_table.loc[:0]

In [None]:
new_star_table['Name'][0]
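
For comparison, the same answer can be reached without sorting. `.idxmin()` returns the index label of the smallest value, and that label can be passed to `.loc`. This is an alternative sketch, not the approach used above, and it rebuilds the table so the block runs on its own.

```python
import numpy as np
import pandas as pd

star_table = pd.DataFrame(
    {'Name': np.array(['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus',
                       'Vega', 'Capella', 'Rigel']),
     'Distance': np.array([8.6, 74, 4.3, 34, 25, 41, 1400])
    }
)

my_distance_value = 30

# Absolute difference between each distance and the target value.
delta_distance = (star_table['Distance'] - my_distance_value).abs()

# Index label of the smallest difference, then the matching name.
closest_label = delta_distance.idxmin()
closest_name = star_table.loc[closest_label, 'Name']

print(closest_name)
```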

#### The original table is unchanged

In [None]:
star_table

---

## Back to `numpy`: `DataFrame` -> Arrays

* `.to_numpy()` returns a 2-D `numpy` array; indexing it with a single integer gives one row of the table

In [None]:
star_table_numpy = star_table.to_numpy()

In [None]:
star_table_numpy

In [None]:
star_table_numpy[0]

### To get a column you need to use the column label

In [None]:
star_table_numpy_name = star_table['Name'].to_numpy()

In [None]:
star_table_numpy_name

In [None]:
my_star_name
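
One detail worth checking when going back to `numpy` is the `dtype`. A table that mixes text and numbers converts to a generic `object` array, while a purely numeric selection keeps a numeric `dtype`. A minimal sketch on a two-star table, with names made up for this example:

```python
import numpy as np
import pandas as pd

small_table = pd.DataFrame(
    {'Name': np.array(['Sirius', 'Vega']),
     'Distance': np.array([8.6, 25]),
     'AppMag': np.array([-1.46, 0.03])
    }
)

# Mixed text and numbers: the result is a 2-D object array.
whole_table = small_table.to_numpy()

# Numeric columns only: the result is a 2-D float array.
numeric_part = small_table[['Distance', 'AppMag']].to_numpy()

# A single column: the result is a 1-D array.
distance_column = small_table['Distance'].to_numpy()

print(whole_table.dtype, numeric_part.dtype, distance_column.shape)
```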