In [None]:
from astropy.table import Table

import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt

# Getting your data into Python

## Tables and dataframes

For more detail about tables, see [this tutorial notebook about tables](../Extras/06-Tables/Tables.ipynb) and the [astropy documentation on tables](https://docs.astropy.org/en/stable/table/index.html).

## Quick table tour

### Creating tables

There is a great deal of flexibility in how a table can be constructed:

- Read an existing table from a file or web URL
- Add columns of data one by one
- Add rows of data one by one
- From an existing data structure in memory:

  - List of data columns
  - Dict of data columns
  - List of row dicts
  - NumPy homogeneous array or structured array
  - List of row records
  
See the documentation section on [Constructing a table](http://astropy.readthedocs.org/en/stable/table/construct_table.html) for the details and plenty of examples.

In [None]:
t = Table()
t['name'] = ['larry', 'curly', 'moe', 'shemp', 'abbot', 'costello']
t['flux'] = [10.3, 5.4, 123, 589, 210, 34]

### Looking at your table

In a Jupyter notebook, showing a table will produce a nice HTML representation of the table:

In [None]:
t

If you do the same in a terminal session, you get a different view that isn't as pretty but gives a bit more information about the table:

    >>> t
    <Table rows=4 names=('name','flux')>
    array([('source 1', 1.2), ('source 2', 2.2), ('source 3', 3.1),
           ('source 4', 4.3)], 
          dtype=[('name', 'S8'), ('flux', '<f8')])

To get a plain view that is the same in the notebook and the terminal, use `print(t)`.

Get the table column names and data types using the `colnames` and `dtype` properties:

In [None]:
t.colnames

In [None]:
t.dtype

### Accessing parts of the table

We can access the columns and rows as for NumPy structured arrays. Notice that the outputs are `Column`, `Row`, or `Table` objects depending on the context.

In [None]:
t['flux']  # Flux column (notice meta attributes)

#### Row numbering starts at zero

In [None]:
t['flux'][1]  # Row 1 of flux column

In [None]:
t[1]  # Row object with the values of row 1

In [None]:
t[1]['flux']  # Flux column of row 1

#### Slice syntax in Python

A range of items in Python is called a slice. The slice `[1:3]` means starting at item 1 and going up to, but not including, item 3.

In [None]:
t[1:3]  # 2nd and 3rd rows in a new table

In [None]:
t[-1]
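
The same slice and negative-index rules apply to ordinary Python lists. The sketch below uses a plain list of names as a stand-in for table rows:

```python
# A plain list standing in for the rows of a table
rows = ['larry', 'curly', 'moe', 'shemp', 'abbot', 'costello']

middle = rows[1:3]       # items 1 and 2; item 3 is excluded
last = rows[-1]          # negative indices count back from the end
last_two = rows[-2:]     # from the second-to-last item to the end
every_other = rows[::2]  # a third number in the slice is the step
```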

Another type of slice lets you list the items you want.

In [None]:
t[[0, 4]]  # Rows 1 and 5 in human (one-based) counting

#### Boolean slice

In another type of slicing you provide a list with one entry per row in the table, each either `True` or `False`. Rows that are `True` are kept, and rows that are `False` are dropped.

In [None]:
boo_slice = [False, True, False, True, False, False]
t[boo_slice]

Boolean indexing may seem more difficult than other methods of slicing at the moment, but it turns out to be *extremely powerful*.

### Making new columns 

Since we have a flux column in our data we should probably calculate a magnitude.

The [numpy](https://numpy.org) package, which everyone abbreviates to `np` in code, has just about any mathematical function or operation you might need, optimized to work with reasonably large arrays.

Instrumental magnitude is given by $m = -2.5 \log_{10}(flux)$, so the code to create a new `mag` column is below.

In [None]:
t['mag'] = -2.5 * np.log10(t['flux'])
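
As a quick sanity check of the formula, here is a worked example with made-up fluxes that each differ by a factor of 10. Each factor of 10 in flux should change the magnitude by exactly 2.5, with brighter sources getting more negative values.

```python
import numpy as np

# Made-up fluxes, each 10 times the previous one
flux = np.array([1.0, 10.0, 100.0])

# Instrumental magnitude: m = -2.5 * log10(flux)
mag = -2.5 * np.log10(flux)

# Difference between neighboring magnitudes
step = np.diff(mag)
```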

Let's also add a column for the B-V color of these stars.

In [None]:
t['B-V'] = [0.3, -0.1, 0.74, -0.05, 0.41, 1.5]

In [None]:
t

### Boolean indexing 🥳 ♬♪♬

#### Boolean indexing is one of the most powerful properties of tables (and numpy arrays).

In the first example of this, above, we manually typed out `True` or `False` for each row in the table. That is tedious and error-prone.

Another way to create a boolean index is to use comparison operators. 

The cell below creates a mask that is true for all rows where the B-V color is less than zero.

In [None]:
blue_stars = t['B-V'] < 0  # Define boolean index that is true for all stars with negative colors
blue_stars  # display the mask

Once you have created the mask you can index the table with it.

In [None]:
t[blue_stars]

#### Boolean indexes can be combined

There are three operators for combining masks:

+ `&` means logical "and"; it results in `True` if both inputs are `True`
+ `|` means logical "or"; it results in `True` if either input is `True`
+ `~` means logical "not"; `True` is turned into `False` and the other way around
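
A small sketch of the three operators on plain NumPy arrays. The values are the color and flux numbers used in this notebook; the flux threshold of 100 is an arbitrary choice made only for this illustration. When comparisons are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `<` and `>`.

```python
import numpy as np

color = np.array([0.3, -0.1, 0.74, -0.05, 0.41, 1.5])  # B-V values
flux = np.array([10.3, 5.4, 123, 589, 210, 34])

# Arbitrary threshold, chosen only for this example
high_flux_limit = 100

# Each comparison is wrapped in parentheses before combining
blue_and_high = (color < 0) & (flux > high_flux_limit)
blue_or_high = (color < 0) | (flux > high_flux_limit)
not_blue = ~(color < 0)
```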

Let's now select stars which are blue and bright. For the sake of argument, let's assume by "bright" we mean "brighter than magnitude -4".

In [None]:
blue_stars = t['B-V'] < 0
bright_stars = t['mag'] < -4 # Remember, the magnitude system is backwards

t[blue_stars & bright_stars]  # Return stars which are both blue and bright

In [None]:
t[blue_stars | bright_stars] # Return stars which are either blue OR bright OR both

## Reading and writing tabular data

### Writing

Astropy can write data to a [large variety of formats](https://docs.astropy.org/en/stable/table/io.html#supported-formats), including CSV and FITS. 

One particularly useful format is ECSV, a plain-text CSV file with a header that stores extra information about the table, such as column data types and units.

As an example, we write out our table in three different formats.

In [None]:
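# overwrite=True lets this cell be re-run when the files already exist
t.write('my_table.fits', overwrite=True)
t.write('my_table.csv', overwrite=True)
t.write('my_table.ecsv', overwrite=True)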

#### Specifying the format

You can specify the format by the file extension, or by using the `format` keyword argument. In the example below we save the table as tab-separated text.

In [None]:
t.write('my_table.txt', format='ascii.tab', overwrite=True)

### Reading

Astropy can sometimes guess the format of a file you are reading based on the file extension (the part of the file name after the `.`).

In [None]:
my_table_again = Table.read('my_table.ecsv')

In [None]:
my_table_again

Sometimes, though, astropy cannot guess the format:

In [None]:
my_table_from_txt = Table.read('my_table.txt')

In cases like this, you may have to tell astropy that the file is `ascii`:

In [None]:
my_table_from_txt = Table.read('my_table.txt', format='ascii')

Let's check that that actually worked by displaying the first 3 rows of the table.

In [None]:
my_table_from_txt[:3]

## Interfacing with Pandas
Astropy `Table` includes `to_pandas()` and `from_pandas()` [methods](http://docs.astropy.org/en/stable/table/pandas.html) that facilitate conversion to and from [pandas](http://pandas.pydata.org) `DataFrame` objects. There are a few caveats in making these conversions:
 - Tables with multi-dimensional columns cannot be converted.
 - Masked values are converted to `numpy.nan`. Numerical columns, int or float, are thus converted to `float`, while string columns with missing values are converted to object columns with `numpy.nan` values to indicate missing or masked data. Therefore, you cannot always round-trip between `Table` and `DataFrame`.

In [None]:
my_table_data_frame = t.to_pandas()
my_table_data_frame

In [None]:
t_pd = Table.from_pandas(my_table_data_frame)