<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives](#Learning-Objectives)
	* [Some Simple Setup](#Some-Simple-Setup)
	* [Compound Data:  Structured Arrays / Record Arrays:  `np.record`](#Compound-Data:--Structured-Arrays-/-Record-Arrays:--np.record)
* [DataFrames](#DataFrames)
* [IO on arrays](#IO-on-arrays)
* [Learning Objectives:](#Learning-Objectives:)
* [Exercise:  Data I/O](#Exercise:--Data-I/O)
	* [Some Simple Setup](#Some-Simple-Setup)
	* [CSV Data](#CSV-Data)

# Learning Objectives

After completion of this module, learners should be able to:

* Use the `np.record` data type.

## Some Simple Setup

In [None]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import os.path as osp
import numpy.random as npr
vsep = "\n-------------------\n"

def dump_array(arr):
    print("%s array of %s:" % (arr.shape, arr.dtype))
    print(arr)

## Compound Data:  Structured Arrays / Record Arrays:  `np.record`

Every element of a NumPy array has the same type, but that type can be a compound type (i.e., a record or a struct).

Two main recommended ways of specifying type codes:
  
  * b1, i1, i2, i4, i8, u1, u2, u4, u8, f2, f4, f8, c8, c16, S&lt;n&gt;
   (booleans, ints, unsigned ints, floats, complex numbers, and fixed-length byte strings, each with a given *byte* length; older code spells the string code as a&lt;n&gt;)
  * int8,...,uint8,...,float16, float32, float64, complex64, complex128
   (similar but with *bit* sizes)
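
As a small illustration of the two spellings above, the sketch below checks that a byte-size code and the matching bit-size name describe the same dtype. The particular pairs chosen are only examples.

```python
import numpy as np

# Each pair names the same dtype: a byte-size code and a bit-size name.
pairs = [("b1", np.bool_), ("i4", np.int32), ("u2", np.uint16),
         ("f8", np.float64), ("c16", np.complex128)]

for code, named in pairs:
    dt = np.dtype(code)
    same = dt == np.dtype(named)
    print(code, "->", dt.name, "| itemsize:", dt.itemsize, "| same dtype:", same)

# A fixed-length byte string takes exactly n bytes per element.
s10 = np.dtype("S10")
print("S10 itemsize:", s10.itemsize)
```

The number in a code such as `i4` counts bytes, while the number in a name such as `int32` counts bits, so `i4` and `int32` are the same type.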

In [None]:
# a record with a 4 byte int, a 4 byte float, 
# and 10 bytes of characters (ascii values)
x = np.zeros((2,), dtype=('i4,f4,S10'))
print(x)
print(repr(x), end=vsep)

x[:] = [(1, 5., 'Hello'), (2, 6., 'World')]
print(x)
print(repr(x), end=vsep)

print("a field:")
print(x['f1'])
print(repr(x['f1']))
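
The cell above relies on the default field names (`f0`, `f1`, `f2`). A minimal sketch of the same idea with explicit field names, plus a record-array view that allows attribute-style access, might look like this. The field names `id`, `score`, and `label` are made up for illustration.

```python
import numpy as np

# Name the fields explicitly instead of relying on the defaults f0, f1, f2.
dt = np.dtype([("id", "i4"), ("score", "f4"), ("label", "S10")])
x = np.array([(1, 5.0, b"Hello"), (2, 6.0, b"World")], dtype=dt)

# Structured array: fields are accessed by name with indexing.
print(x["score"])

# Record array: a view on the same memory with attribute-style access.
rec = x.view(np.recarray)
print(rec.score)

# Individual elements of a record array are np.record scalars.
print(type(rec[0]))
```

The view does not copy any data, so changes made through `rec` are visible in `x`.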

In [None]:
%%file tmp/patient-records.csv
name,date,weight(kg),height(cm)
Mark,2011-01-01,86.1,180
Barb,2012-02-03,65.7,167
Ethan,2013-04-06,29.45,127

In [None]:
patient_dtype = [("name", "S10"),
                 ("visit_date", 'datetime64[D]'),
                 ("weight", np.float64),  # np.float was removed from NumPy
                 ("height", np.int64)]    # np.int was removed from NumPy
data = np.loadtxt("tmp/patient-records.csv", 
                  skiprows=1, 
                  delimiter=",", 
                  dtype=patient_dtype,
                  converters = {1: np.datetime64})

print("first row: ", data[0])
print("all weights: ", data['weight'])

# BMI = kg / m**2
print("BMIs:", data['weight'] / (data['height']/100.0)**2)

# DataFrames

``recarrays`` are row-oriented: each row packs fields of several dtypes together.

``DataFrames`` are column-oriented: each column holds a single dtype, while a row still spans several dtypes.

This often leads to more efficient storage and access patterns, because access is typically by *column*.
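
To make the row-versus-column distinction concrete, the sketch below compares strides using NumPy only. The dictionary of contiguous arrays is a rough stand-in for a column-oriented layout, not a description of pandas internals.

```python
import numpy as np

dt = np.dtype([("weight", "f8"), ("height", "i8")])
rows = np.array([(86.1, 180), (65.7, 167), (29.45, 127)], dtype=dt)

# Row-oriented: one field is a strided view that skips over the other fields.
w_view = rows["weight"]
print("record size:", rows.dtype.itemsize, "| field strides:", w_view.strides)

# Column-oriented (rough model): each column is its own contiguous array.
columns = {name: np.ascontiguousarray(rows[name]) for name in dt.names}
print("column strides:", columns["weight"].strides)
```

Reading one field from the structured array steps over a whole record at a time, while the separate column array is densely packed.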

In [None]:
import pandas as pd

df = pd.DataFrame.from_records(data)
df

In [None]:
df.dtypes

In [None]:
print("first row:\n", df.loc[0], "\n")
print("all weights:\n", df['weight'], "\n")

# BMI = kg / m**2
print("BMIs:\n", df['weight'] / (df['height']/100.0)**2)

# IO on arrays

We can also save arrays to disk and load them back.

In [None]:
# saving / loading data
np.savez('tmp/data.npz', data=data) # each keyword names one stored array
dataz = np.load('tmp/data.npz')

print(dataz.files)     # list of arrays stored in this archive
print(dataz['data'])

In [None]:
# cleanup
!rm tmp/data.npz

# Learning Objectives:

After completion of this module, learners should be able to:

* Import data from a comma-separated values (CSV) file directly into a `numpy` ndarray.

# Exercise:  Data I/O

## Some Simple Setup

In [None]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import os.path as osp
import numpy.random as npr
vsep = "\n-------------------\n"

def dump_array(arr):
    print("%s array of %s:" % (arr.shape, arr.dtype))
    print(arr)

## CSV Data

Use the code snippet below to grab some stock market data from Yahoo, then read the data into an array of appropriately structured records.  The data covers all of 2013.  Find the highest high and the lowest low for your ticker symbol.  For extra credit, graph the daily closes.

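Use `help(np.genfromtxt)` or `help(np.recfromcsv)` to find the arguments needed to read the data correctly.  Note that `np.recfromcsv` exists only in older NumPy releases; `np.genfromtxt` with `delimiter=","` covers the same ground.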

In [None]:
try:
    from urllib.request import urlretrieve
except ImportError:
    from urllib import urlretrieve # Python 2.7
sym = "GE"

base_url = "http://ichart.finance.yahoo.com/table.csv?"
url_args = "s=%s&d=11&e=31&f=2013&g=d&a=0&b=2&c=2013&ignore=.csv"
quote_url = base_url + url_args

urlretrieve(quote_url % sym, "tmp/"+sym+".csv")

In [None]:
# note, on unix-heritage boxes, you can use head and tail 
# to glance at the data
!head -5 tmp/GE.csv
print("...")
!tail -5 tmp/GE.csv
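
As a starting point, here is a minimal sketch of reading a CSV with a header row into structured records using `np.genfromtxt`. It reads from an in-memory string so it does not depend on the download. The rows are made-up placeholder values, not real quotes, and the column layout (`Date,Open,High,Low,Close,Volume`) is an assumption; check the header of your own file with `head`.

```python
import io
import numpy as np

# Made-up rows in an assumed Date,Open,High,Low,Close,Volume layout.
# These are NOT real quotes; they only illustrate the parsing step.
sample = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2013-01-04,10.0,10.5,9.8,10.2,1000\n"
    "2013-01-03,9.9,10.1,9.5,10.0,1200\n"
    "2013-01-02,9.7,10.0,9.6,9.9,900\n"
)

# names=True takes field names from the header row;
# dtype=None lets genfromtxt infer a type for each column.
quotes = np.genfromtxt(sample, delimiter=",", names=True,
                       dtype=None, encoding="utf-8")

print(quotes.dtype.names)
print("highest high:", quotes["High"].max())
print("lowest low:", quotes["Low"].min())
```

For the real file, pass the path (for example `"tmp/GE.csv"`) in place of `sample`.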

<img src='img/copyright.png'>