# Dictionaries and DataFrames

## Dictionaries

* Dictionaries are (sort of) a generalized version of arrays. Instead of the `Index,Value` pairs of arrays, dictionaries use `Key:Value` pairs.
* They can be created via a comma-separated list of `Key:Value` pairs within curly braces `{}`.
* Dictionaries are at the heart of a lot of what goes on in Python "under-the-hood".
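
As a rough illustration of the "under-the-hood" point: the attributes of an ordinary object are normally stored in a dictionary, which `vars()` lets you inspect. The `Star` class below is a made-up example and is not used elsewhere in this notebook.

```python
# Hypothetical class, used only to show where Python keeps instance attributes.
class Star:
    def __init__(self, name, distance):
        self.name = name
        self.distance = distance

sirius = Star('Sirius', 8.6)

# vars() returns the instance's attribute dictionary (its __dict__).
attributes = vars(sirius)
print(attributes)
```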

In [0]:
numbers = {'one':1, 'two':2, 'three':3}

numbers

#### Access a `value` via the `key`

In [0]:
numbers['two']

#### New items can be added to the dictionary using indexing

In [0]:
numbers['ninety'] = 90

numbers
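
Two details that the cells above do not show, sketched here on a fresh copy of the dictionary: assigning to a key that already exists overwrites its value, and asking for a key that is not there raises a `KeyError`. The `in` test and the `.get()` method are standard ways to avoid that error; the default value `0` is an arbitrary choice for this example.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Assigning to an existing key replaces its value; no second entry is created.
numbers['two'] = 22

# Indexing with a key that is not present raises KeyError ...
try:
    numbers['four']
    missing_raised = False
except KeyError:
    missing_raised = True

# ... so test with `in` first, or use .get() to fall back to a default.
has_four = 'four' in numbers
four_or_default = numbers.get('four', 0)

print(numbers, missing_raised, has_four, four_or_default)
```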

----

#### Keys, values, and `key, value` pairs



In [0]:
numbers.keys()

In [0]:
numbers.values()

In [0]:
for key,value in numbers.items():
    print(key, value)

---

# The `pandas` package - Python Data Analysis Library - `DataFrame`

In [0]:
import pandas as pd
import numpy as np

In [0]:
my_star_name = np.array(['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus', 'Vega', 'Capella', 'Rigel'])
my_star_dist = np.array([8.6, 74, 4.3, 34, 25, 41, 1400])                  # distance in light-years
my_star_appmag = np.array([-1.46, -0.72, -0.27, -0.04, 0.03, 0.08, 0.12])  # apparent magnitude

In [0]:
my_star_name, my_star_dist, my_star_appmag

In [0]:
star_table = pd.DataFrame(
    {'Name': my_star_name,
     'Distance': my_star_dist,
     'AppMag': my_star_appmag
    }
)

In [0]:
star_table

### Notice that each row has an `index` assigned to it.

In [0]:
print(star_table)
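
The table above is built from a dictionary, which ties the two halves of this notebook together: each key becomes a column name and each value becomes that column's data. A minimal sketch on a two-row stand-in table (the name `demo` is only for this example) shows the column names and the default row index that `pandas` assigns.

```python
import pandas as pd

# Stand-in data: two of the stars from this notebook, given as plain lists.
columns = {'Name': ['Sirius', 'Vega'], 'Distance': [8.6, 25.0]}
demo = pd.DataFrame(columns)

# The dictionary keys are now the column names.
column_names = list(demo.columns)

# With no index given, rows are labelled 0, 1, 2, ...
row_labels = list(demo.index)

print(column_names, row_labels)
```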

In [0]:
star_table.info()

In [0]:
star_table.describe()

In [0]:
star_table.count()

In [0]:
star_table.min()

##### Other summary methods: `.min()`, `.max()`, `.mean()`, `.std()`, `.count()`
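
These methods also work on a single column, where each returns one number. The sketch below uses a small stand-in table (`demo`, not the notebook's `star_table`). When a method such as `.mean()` is applied to a whole table that contains a text column, passing `numeric_only=True` restricts it to the numeric columns; recent `pandas` versions may raise an error without it.

```python
import pandas as pd

# Small stand-in table with three of the stars used in this notebook.
demo = pd.DataFrame({
    'Name': ['Sirius', 'Vega', 'Arcturus'],
    'Distance': [8.6, 25.0, 34.0],
})

# On a single numeric column each method returns one number.
nearest = demo['Distance'].min()
farthest = demo['Distance'].max()
average = demo['Distance'].mean()

# On the whole table, numeric_only=True skips the text column.
column_means = demo.mean(numeric_only=True)

print(nearest, farthest, average)
print(column_means)
```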

### Selecting pieces of the table

* `.head()`
* `.tail()`
* `.loc[row, column]`

In [0]:
star_table

In [0]:
star_table.head(2)

In [0]:
star_table.tail(2)

In [0]:
star_table.loc[2:3, :]

In [0]:
star_table.loc[2:3, ['Name', 'AppMag']]

In [0]:
star_table.loc[:, 'Distance']

In [0]:
star_table['Distance']
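
One detail worth noting from the cells above: `.loc` slices by row label and includes both endpoints, unlike ordinary Python list slicing. A minimal sketch on a stand-in table (`demo` is a name used only here):

```python
import pandas as pd

demo = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Rigil_Kentaurus', 'Arcturus'],
    'Distance': [8.6, 74.0, 4.3, 34.0],
})

# Label-based slice: the rows labelled 2 AND 3 are both returned.
rows = demo.loc[2:3, :]

# Selecting one column gives a Series; both spellings return the same data.
by_loc = demo.loc[:, 'Distance']
by_bracket = demo['Distance']

print(rows)
print(by_loc.equals(by_bracket))
```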

## Sorting (`.sort_values`)

In [0]:
star_table.sort_values(['Name'])

In [0]:
star_table.sort_values(['Distance'])

In [0]:
star_table.sort_values(
    ['Distance'],
    ascending=False
)

#### `.sort_values` returns a sorted copy; the original table is unchanged

In [0]:
star_table

In [0]:
star_table.sort_values(
    ['Distance'],
    ascending=False,
    inplace=True
)

#### With `inplace=True`, the original table is changed

In [0]:
star_table

#### Notice that the row-index labels moved with their rows; they have **NOT** been renumbered!

In [0]:
star_table.loc[2:4, :]

## Resetting the index (`.reset_index`)

In [0]:
star_table.reset_index(drop=True, inplace=True)

In [0]:
star_table

In [0]:
star_table.loc[2:4, :]
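
The effect of sorting and then resetting the index can be checked directly on a small stand-in table. This is only a sketch: `demo`, `sorted_demo`, and `renumbered` are names invented for the example, and it uses the non-`inplace` form so that each step stays visible.

```python
import pandas as pd

demo = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Vega'],
    'Distance': [8.6, 74.0, 25.0],
})

# Sort by distance, farthest first; this returns a new, sorted table.
sorted_demo = demo.sort_values(['Distance'], ascending=False)

# Each row keeps the label it had before sorting.
labels_after_sort = list(sorted_demo.index)

# reset_index(drop=True) renumbers the rows 0, 1, 2, ... in their new order
# and discards the old labels instead of keeping them as a column.
renumbered = sorted_demo.reset_index(drop=True)
labels_after_reset = list(renumbered.index)

print(labels_after_sort, labels_after_reset)
```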

## Picking out data (`.query`)

In [0]:
star_table.query("Distance < 35")

In [0]:
star_table.query("Distance < 35").count()

In [0]:
star_table.query("Distance < 35")['Name'].count()

In [0]:
star_table.query("Distance < 35 and AppMag < 0")
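
For comparison, the same kind of selection can be written without a query string, using a boolean mask. The sketch below runs both forms on a stand-in table (`demo`) and checks that they agree; note that the mask form needs `&` rather than `and`, with parentheses around each condition.

```python
import pandas as pd

demo = pd.DataFrame({
    'Name': ['Sirius', 'Canopus', 'Vega', 'Arcturus'],
    'Distance': [8.6, 74.0, 25.0, 34.0],
    'AppMag': [-1.46, -0.72, 0.03, -0.04],
})

# Query string, in the same style as the cells above.
from_query = demo.query("Distance < 35 and AppMag < 0")

# The same selection written as a boolean mask.
mask = (demo['Distance'] < 35) & (demo['AppMag'] < 0)
from_mask = demo[mask]

print(from_query)
print(from_query.equals(from_mask))
```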

## Methods

* The `pandas` package has a huge number of different ways to explore, manipulate, and extract data from a `DataFrame`.
* The [`DataFrame` reference page](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) lists all of the different methods that can be used on a `DataFrame`.
* These methods can be chained together, each one operating on the result of the previous one, to explore a `DataFrame`.

----

#### Simple example: AppMag / 2 for all stars with Distance < 35

In [0]:
star_table

In [0]:
star_table.query("Distance < 35")['AppMag'].div(2)

#### More complicated example: Find the name of the star in `star_table` with a distance closest to 30 light-years

In [0]:
star_table

In [0]:
my_distance_value = 30

#### We will just work with the `Distance` column

In [0]:
star_table.loc[:, 'Distance']

#### `.sub()` subtracts a value

In [0]:
star_table.loc[:, 'Distance'].sub(my_distance_value)

#### `.abs()` takes the absolute value

In [0]:
star_table.loc[:, 'Distance'].sub(my_distance_value).abs()

#### `.idxmin()` returns the row-index of the minimum value

In [0]:
star_table.loc[:, 'Distance'].sub(my_distance_value).abs().idxmin()

In [0]:
star_table.loc[3, :]

In [0]:
star_table.loc[3, 'Name']

In [0]:
my_min_index = (
    star_table
    .loc[:, 'Distance']
    .sub(my_distance_value)
    .abs()
    .idxmin()
)

star_table.loc[my_min_index, 'Name']
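
If this lookup is needed more than once, the chain can be wrapped in a small function. This is one possible sketch, not part of `pandas`: the function name `closest_star_name` and the stand-in table `demo` are made up for the example, and it assumes the table has `Name` and `Distance` columns.

```python
import pandas as pd

def closest_star_name(table, target_distance):
    """Return the Name of the row whose Distance is nearest target_distance."""
    nearest_label = (
        table['Distance']
        .sub(target_distance)   # signed difference from the target
        .abs()                  # size of the difference
        .idxmin()               # row label of the smallest difference
    )
    return table.loc[nearest_label, 'Name']

demo = pd.DataFrame({
    'Name': ['Sirius', 'Vega', 'Arcturus', 'Capella'],
    'Distance': [8.6, 25.0, 34.0, 41.0],
})

near_30 = closest_star_name(demo, 30)
near_10 = closest_star_name(demo, 10)
print(near_30, near_10)
```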

## Saving a table (`.to_csv`)

In [0]:
# The ./Data folder must already exist; to_csv does not create it.
star_table.to_csv('./Data/NewStarTable.csv', index=False)
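
To check that a table was saved as expected, it can be read back with `pd.read_csv`, which is not used elsewhere in this notebook. The sketch below writes a stand-in table to a temporary folder rather than `./Data`, so it runs anywhere; the file name `demo_table.csv` is made up for the example.

```python
import os
import tempfile

import pandas as pd

demo = pd.DataFrame({
    'Name': ['Sirius', 'Vega'],
    'Distance': [8.6, 25.0],
})

# Write to a temporary folder so nothing is left behind afterwards.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo_table.csv')
    demo.to_csv(path, index=False)   # index=False: do not write the row labels
    restored = pd.read_csv(path)

print(restored)
```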