# Todo 

# 10 Minutes to Blaze 

This section gives a short introduction to Blaze. After reading it, beginners should be able to perform basic data operations with Blaze.

In [2]:
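import blaze as bz
from blaze import odo

# Location of the downloaded sample data; change this to your own path
DATA_DIR = '/Users/Will/Data/blaze_data/'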

# Data Input and Output

The data used in this section is available [here](https://github.com/Will-So/blaze_data). Blaze reads every supported format through the same interface: whatever the data store, we load it with the `Data` object.

## Reading from a CSV

In [5]:
from blaze import Data

In [14]:
Data(DATA_DIR + 'iris.csv')

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |


## Reading from an SQL Database

In [48]:
DATA_DIR

'/Users/Will/Data/blaze_data/'

In [75]:
db = Data('sqlite:////' + DATA_DIR + 'lahman2013.sqlite')

The `fields` attribute lists the tables in the database. In IPython, you can also type `db.` and press Tab to see all the tables.

In [76]:
print(db.fields)

['AllstarFull', 'Appearances', 'AwardsManagers', 'AwardsPlayers', 'AwardsShareManagers', 'AwardsSharePlayers', 'Batting', 'BattingPost', 'Fielding', 'FieldingOF', 'FieldingPost', 'HallOfFame', 'Managers', 'ManagersHalf', 'Master', 'Pitching', 'PitchingPost', 'Salaries', 'Schools', 'SchoolsPlayers', 'SeriesPost', 'Teams', 'TeamsFranchises', 'TeamsHalf', 'temp']
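For a SQLite database, these names are the tables recorded in SQLite's own `sqlite_master` catalog. The sketch below is only a point of comparison, not how Blaze is implemented: it uses the standard-library `sqlite3` module on a throwaway in-memory database containing two of the table names listed above, and `list_tables` is a hypothetical helper written for this example.

```python
import sqlite3


def list_tables(conn):
    """Return the names of the tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# In-memory stand-in for lahman2013.sqlite, with two of its table names
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Batting (playerID TEXT, yearID INTEGER, H REAL)")
conn.execute("CREATE TABLE AllstarFull (playerID TEXT, yearID INTEGER)")
print(list_tables(conn))  # ['AllstarFull', 'Batting']
```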


In [80]:
df = db.Batting
df

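|   | playerID | yearID | stint | teamID | lgID | G | G_batting | AB | R | H | 2B | 3B | HR | RBI | SB | CS | BB | SO | IBB | HBP | SH | SF | GIDP | G_old |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | aardsda01 | 2004 | 1 | SFN | NL | 11 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 |
| 1 | aardsda01 | 2006 | 1 | CHN | NL | 45 | 43.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 45.0 |
| 2 | aardsda01 | 2007 | 1 | CHA | AL | 25 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| 3 | aardsda01 | 2008 | 1 | BOS | AL | 47 | 5.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 |
| 4 | aardsda01 | 2009 | 1 | SEA | AL | 73 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 5 | aardsda01 | 2010 | 1 | SEA | AL | 53 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 6 | aardsda01 | 2012 | 1 | NYA | AL | 1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 7 | aardsda01 | 2013 | 1 | NYN | NL | 43 | 43.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 8 | aaronha01 | 1954 | 1 | ML1 | NL | 122 | 122.0 | 468.0 | 58.0 | 131.0 | 27.0 | 6.0 | 13.0 | 69.0 | 2.0 | 2.0 | 28.0 | 39.0 |  | 3.0 | 6.0 | 4.0 | 13.0 | 122.0 |
| 9 | aaronha01 | 1955 | 1 | ML1 | NL | 153 | 153.0 | 602.0 | 105.0 | 189.0 | 37.0 | 9.0 | 27.0 | 106.0 | 3.0 | 1.0 | 49.0 | 61.0 | 5.0 | 3.0 | 7.0 | 4.0 | 20.0 | 153.0 |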


## Reading from HDF5

First, we load the HDF5 file itself and use `fields` to list the datasets it contains.

In [59]:
Data(DATA_DIR + 'sample.hdf5').fields

['info', 'points', 'z']

The file contains three datasets. To load just one of them, append `::` followed by the dataset's path within the file:

In [40]:
Data(DATA_DIR + 'sample.hdf5::/z')

|   | z |
|---|---|
| 0 | 0.628902 |
| 1 | 0.797281 |
| 2 | 0.524203 |
| 3 | 0.04775 |
| 4 | 0.238235 |
| 5 | 0.681711 |
| 6 | 0.772755 |
| 7 | 0.979527 |
| 8 | 0.294134 |
| 9 | 0.787979 |
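The `::` in `'sample.hdf5::/z'` separates the file name from the path of a dataset inside that file. To make the convention concrete, here is a small helper, `split_datapath`, that splits such a string into its two parts. It is hypothetical and written for this tutorial; it is not a Blaze or odo function.

```python
def split_datapath(uri):
    """Split 'file::/path/in/file' into (file, datapath).

    Hypothetical helper illustrating the '::' convention; it splits at
    the last '::' and returns datapath=None when there is no separator.
    """
    if "::" not in uri:
        return uri, None
    filename, datapath = uri.rsplit("::", 1)
    return filename, datapath


print(split_datapath("sample.hdf5::/z"))  # ('sample.hdf5', '/z')
print(split_datapath("sample.hdf5"))      # ('sample.hdf5', None)
```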


HDF5 files are more complex to work with than most other formats. For more details, see the [odo HDF5 documentation](https://odo.readthedocs.org/en/latest/hdf5.html).

## Reading Other Files

## Writing Other Files

# Selecting 

# Dealing with Missing Data 

Pending `isnull` implementation.

# Basic Exploratory Analysis

## Selection

In [79]:
db.AllstarFull

|   | playerID | yearID | gameNum | gameID | teamID | lgID | GP | startingPos |
|---|---|---|---|---|---|---|---|---|
| 0 | aaronha01 | 1955 | 0 | NLS195507120 | ML1 | NL | 1 |  |
| 1 | aaronha01 | 1956 | 0 | ALS195607100 | ML1 | NL | 1 |  |
| 2 | aaronha01 | 1957 | 0 | NLS195707090 | ML1 | NL | 1 | 9.0 |
| 3 | aaronha01 | 1958 | 0 | ALS195807080 | ML1 | NL | 1 | 9.0 |
| 4 | aaronha01 | 1959 | 1 | NLS195907070 | ML1 | NL | 1 | 9.0 |
| 5 | aaronha01 | 1959 | 2 | NLS195908030 | ML1 | NL | 1 | 9.0 |
| 6 | aaronha01 | 1960 | 1 | ALS196007110 | ML1 | NL | 1 | 9.0 |
| 7 | aaronha01 | 1960 | 2 | ALS196007130 | ML1 | NL | 1 | 9.0 |
| 8 | aaronha01 | 1961 | 1 | NLS196107110 | ML1 | NL | 1 |  |
| 9 | aaronha01 | 1961 | 2 | ALS196107310 | ML1 | NL | 1 |  |


In [81]:
df[['playerID', 'yearID', 'H']]

|   | playerID | yearID | H |
|---|---|---|---|
| 0 | aardsda01 | 2004 | 0.0 |
| 1 | aardsda01 | 2006 | 0.0 |
| 2 | aardsda01 | 2007 | 0.0 |
| 3 | aardsda01 | 2008 | 0.0 |
| 4 | aardsda01 | 2009 | 0.0 |
| 5 | aardsda01 | 2010 | 0.0 |
| 6 | aardsda01 | 2012 |  |
| 7 | aardsda01 | 2013 | 0.0 |
| 8 | aaronha01 | 1954 | 131.0 |
| 9 | aaronha01 | 1955 | 189.0 |
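Because `db` is backed by SQLite, Blaze translates expressions such as this column selection into SQL and lets the database do the work. The sketch below illustrates the analogy with plain `sqlite3`: it loads four rows copied from the `Batting` output above into an in-memory table and runs the projection that `df[['playerID', 'yearID', 'H']]` roughly corresponds to. It is not the exact SQL that Blaze generates.

```python
import sqlite3

# Four (playerID, yearID, teamID, H) rows copied from the Batting output above
ROWS = [
    ("aardsda01", 2004, "SFN", 0.0),
    ("aardsda01", 2012, "NYA", None),  # missing H, shown as a blank cell
    ("aaronha01", 1954, "ML1", 131.0),
    ("aaronha01", 1955, "ML1", 189.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, teamID TEXT, H REAL)"
)
conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)", ROWS)

# Projection: keep only three columns (teamID is dropped)
projected = conn.execute(
    "SELECT playerID, yearID, H FROM Batting ORDER BY playerID, yearID"
).fetchall()
for row in projected:
    print(row)
```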


In [98]:
df.H[5:10]

Data:       Engine(sqlite://///Users/Will/Data/blaze_data/lahman2013.sqlite)
Expr:       _34.Batting.H[5:10]
DataShape:  5 * ?int32
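The output above describes the expression rather than listing its values: `Expr` records the slice and `DataShape` reports that it will produce 5 values. In SQL terms, taking rows 5 through 9 corresponds to `LIMIT 5 OFFSET 5`. We assume, rather than verify here, that the SQL backend emits something equivalent. Below is a minimal `sqlite3` sketch on the ten `H` values shown earlier; `sql_slice` is a hypothetical helper.

```python
import sqlite3

# H values of the first ten Batting rows shown earlier (None = NULL)
H_VALUES = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, None, 0.0, 131.0, 189.0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Batting (H REAL)")
conn.executemany("INSERT INTO Batting (H) VALUES (?)", [(h,) for h in H_VALUES])


def sql_slice(conn, start, stop):
    """Emulate column[start:stop] with LIMIT/OFFSET, keeping rows in rowid order."""
    rows = conn.execute(
        "SELECT H FROM Batting ORDER BY rowid LIMIT ? OFFSET ?",
        (stop - start, start),
    )
    return [h for (h,) in rows]


print(sql_slice(conn, 5, 10))  # [0.0, None, 0.0, 131.0, 189.0]
print(H_VALUES[5:10])          # the same values via a Python slice
```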

## Basic Arithmetic

In [95]:
df1 = df.H * 2
df1


|   | H |
|---|---|
| 0 | 0.0 |
| 1 | 0.0 |
| 2 | 0.0 |
| 3 | 0.0 |
| 4 | 0.0 |
| 5 | 0.0 |
| 6 |  |
| 7 | 0.0 |
| 8 | 262.0 |
| 9 | 378.0 |
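Row 6 is blank in this result because its `H` value is missing (NULL in the database), and in SQL any arithmetic involving NULL produces NULL. The following `sqlite3` sketch reproduces that behavior for `df.H * 2` on the ten `H` values shown earlier. It matches the result above but is not the query Blaze itself builds.

```python
import sqlite3

# H values of the first ten Batting rows (None = NULL in the database)
H_VALUES = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, None, 0.0, 131.0, 189.0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Batting (H REAL)")
conn.executemany("INSERT INTO Batting (H) VALUES (?)", [(h,) for h in H_VALUES])

# SQL counterpart of df.H * 2: NULL * 2 is NULL, which sqlite3 returns as None
doubled = [h for (h,) in conn.execute("SELECT H * 2 FROM Batting ORDER BY rowid")]
print(doubled)
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, None, 0.0, 262.0, 378.0]
```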


In [100]:
type(df[:5])

blaze.expr.expressions.Slice

## Basic Descriptive Stats

In [93]:
df.H.count()

In [84]:
df.H.max()
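No output is shown for these cells. Over the full table, `count()` should give the number of non-missing `H` values and `max()` the largest one. With the SQLite backend they correspond to SQL's `COUNT(H)` and `MAX(H)`, both of which skip NULLs. The sketch below applies those aggregates to the ten `H` values shown earlier, so its numbers are not the full-table results.

```python
import sqlite3

# H values of the first ten Batting rows (None = NULL in the database)
H_VALUES = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, None, 0.0, 131.0, 189.0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Batting (H REAL)")
conn.executemany("INSERT INTO Batting (H) VALUES (?)", [(h,) for h in H_VALUES])

# COUNT(H) and MAX(H) both ignore the NULL row
n_hits, max_hits = conn.execute("SELECT COUNT(H), MAX(H) FROM Batting").fetchone()
print(n_hits, max_hits)  # 9 189.0

# The same reductions in plain Python, skipping missing values explicitly
present = [h for h in H_VALUES if h is not None]
assert (len(present), max(present)) == (n_hits, max_hits)
```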

## Plotting

# Operations 

# Merging

In [None]:
from blaze import join
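`join` merges two tables on shared columns. As a sketch of the semantics only, the block below performs an inner join with `sqlite3` between a few `Batting` rows and a few `AllstarFull` rows copied from the outputs above, matching on `playerID` and `yearID`. The Blaze `join` call itself is not shown. Only Aaron's 1955 season appears in both samples, so it is the only row that survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Batting (playerID TEXT, yearID INTEGER, H REAL)")
conn.execute("CREATE TABLE AllstarFull (playerID TEXT, yearID INTEGER, gameID TEXT)")

# A few rows copied from the Batting and AllstarFull outputs in this notebook
conn.executemany("INSERT INTO Batting VALUES (?, ?, ?)", [
    ("aardsda01", 2004, 0.0),
    ("aaronha01", 1954, 131.0),
    ("aaronha01", 1955, 189.0),
])
conn.executemany("INSERT INTO AllstarFull VALUES (?, ?, ?)", [
    ("aaronha01", 1955, "NLS195507120"),
    ("aaronha01", 1956, "ALS195607100"),
])

# Inner join on the key columns: only seasons present in both tables remain
merged = conn.execute("""
    SELECT b.playerID, b.yearID, b.H, a.gameID
    FROM Batting AS b
    JOIN AllstarFull AS a
      ON b.playerID = a.playerID AND b.yearID = a.yearID
""").fetchall()
print(merged)  # [('aaronha01', 1955, 189.0, 'NLS195507120')]
```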

# Grouping 

# Converting Data 

# Gotchas

# No Read

In [4]:
DATA_DIR = '/Users/Will/Data/blaze_data/'