# Basic NumPy Manipulation
In this section, we'll work through some examples of manipulating NumPy arrays.
NumPy stands for "Numerical Python".  We'll learn about:
- Importing data
- Indexing
- Slicing and concatenating arrays
- Transposing and reshaping
- Inverting a matrix

In this lab, we're going to hit the major NumPy operations on arrays.  It's mostly practice with NumPy, without much of a driving purpose or goal.


Here's an inline LaTeX example...  Just for fun!  See how cool LaTeX is?  It's pronounced "Lay-tech", by the way.  Not "latex", the stuff that surgical gloves are sometimes made of.

$\large\frac{a}{b}$

In [2]:
import numpy as np


# Importing Data

NumPy has some nice ways to load data from files and other sources.  `np.genfromtxt()` is a very flexible function for loading data from a text file.  Most commonly these files are "CSV" files - **Comma Separated Values**.

Here is a snippet from the AverageRainfallByState.csv file:

```
State,Inches,MM,Rank
Alabama,58.3,1480,4
Alaska,22.5,572,39
Arizona,13.6,345,47
```

See how each value is separated by a comma?  That's the **delimiter** - it delimits the data elements.

The first row is a **header**: it names the columns rather than holding data.

You could load that file into Excel or Google Sheets.  Or export an Excel sheet as a CSV. 
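To see how the delimiter and header row look once the data is loaded, here is a minimal sketch that reads the snippet above from an in-memory string instead of the real file. The names `sample_csv` and `sample` are made up for illustration.

```python
import io
import numpy as np

# The same four lines shown in the snippet, held in memory instead of on disk.
sample_csv = io.StringIO(
    "State,Inches,MM,Rank\n"
    "Alabama,58.3,1480,4\n"
    "Alaska,22.5,572,39\n"
    "Arizona,13.6,345,47\n"
)

# genfromtxt accepts a file-like object as well as a filename.
sample = np.genfromtxt(sample_csv, delimiter=",", dtype=str)

print(sample.shape)  # one row per line, one column per comma-separated field
print(sample[0])     # the header comes in as an ordinary row of strings
```

Because everything is read as strings, the header row sits in the same array as the data. That is why the later steps slice it off before converting to numbers.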

#### --> 1. In the following cell, load the data into a NumPy Array called `rainfallData` using:

`    np.genfromtxt (  FILENAME,  delimiter (a comma),  dtype=str)`

The `dtype=str` tells NumPy to make an array of strings.

**Double Click Here for answer**
<!--
rainfallData = np.genfromtxt ('AverageRainfallByState.csv', delimiter=",", dtype=str)

-->



In [3]:


###
rainfallData = np.genfromtxt ('AverageRainfallByState.csv', delimiter=",", dtype=str)




### Indexing.
For Python lists, you index with `[ ]`.  I like to read this as "at position", so...

`rocks[2]` is *rocks at position 2*.  Note the first position is 0.  That should be feldspar.

In NumPy, you can specify indices with commas between the dimensions.  So...
```
a = np.array([[1,2,3],[4,5,6]])
print(a[1,2])
```
should print 6.

#### --> 2.  Index into NumPy

Print the value in the third row (index 2, not index 3), first column of `rainfallData`.  It should be "Alaska".

**Double Click Here for answer**
<!--
print(rainfallData[2,0])
-->



In [10]:
rocks=['garnet', 'quartz', 'feldspar', 'perovskite', "Ben (he's a bit boring.)"]
print(rocks[2])

a = np.array([[1,2,3],[4,5,6]])
print(a[1,2])

###
print(rainfallData[2,0])

feldspar
6
Alaska


### Slicing.
There are some really common **slices** to know.  Try executing the code below...



In [6]:
rocks=['garnet', 'quartz', 'feldspar', 'perovskite', "Ben (he's a bit boring.)"]
print("All but the first: ", rocks[1:])  # start at 1, go up to the very end.
print("All but the last: ", rocks[:-1])  # start at the beginning, 0, and go up to 1 before the end
print("2nd, 3rd, 4th: ", rocks[1:4]) # note the first rock is at index 0.
print("All: ", rocks[:])  # technically this makes a COPY of the list too.

All but the first:  ['quartz', 'feldspar', 'perovskite', "Ben (he's a bit boring.)"]
All but the last:  ['garnet', 'quartz', 'feldspar', 'perovskite']
2nd, 3rd, 4th:  ['quartz', 'feldspar', 'perovskite']
All:  ['garnet', 'quartz', 'feldspar', 'perovskite', "Ben (he's a bit boring.)"]


### Slicing Out Array Pieces with NumPy

In NumPy, you can specify a slice on each dimension.  If you put `:` for one of the indices, you get the entire range along that dimension.  And if you give a single row or column index instead of a slice, the dimensionality drops by one.

Try executing the cell below.  That'll give you an idea of how to do this next party.  PARTY ON.

*Note that NumPy also defines `...` (Ellipsis), which means "all the values in all the other dimensions".  It isn't really that useful for 2D arrays.*
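The dimensionality drop is easiest to see by checking `.shape`. Here is a small sketch on the same 2x3 array `a` used earlier; the names `row_vector`, `row_matrix`, and `last_col` are just for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

row_vector = a[0, :]    # a single index on the rows gives a 1-D result
row_matrix = a[0:1, :]  # a one-element slice keeps the result 2-D
last_col = a[..., -1]   # Ellipsis version of a[:, -1] for a 2-D array

print(row_vector.shape)  # (3,)
print(row_matrix.shape)  # (1, 3)
print(last_col)          # [3 6]
```

If you ever need to keep a single row as a 2-D array, the one-element slice `0:1` is one way to do it.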


#### --> 3.  Parse out the values of rainfallData
* Store the headers in a vector `header` by selecting out and saving the first row.
* Store the state names in a vector `states` by selecting out the first column.  Remember you'll need to exclude the first row too because that's a header.
* Store the values in an array `rainfallDataValues` by cutting out the state names and header.
* Print the header, print the first 3 states, print the first 3 rows of data.

You should get, roughly:
```
['State' 'Inches' 'MM' 'Rank']
['Alabama' 'Alaska' 'Arizona']
[['58.3' '1480' '4']
 ['22.5' '572' '39']
 ['13.6' '345' '47']]
```

**Double Click Here for answer**
<!--

header = rainfallData[0, :]
states = rainfallData[1:, 0]
rainfallDataValues = rainfallData[ 1:, 1: ]
print(header)
print(states[0:3])
print(rainfallDataValues[0:3])

-->


In [38]:
print("A is, as you remember: \n", a)
print("The first row, as a vector: ", a[0,:])  
print("The first column, as a vector: ", a[:,0])
print("And this is cool...")
print("All but the first column: ")
print(a[:, 1:])



###
header = rainfallData[0, :] 
states = rainfallData[1:, 0]
rainfallDataValues = rainfallData[ 1:, 1: ]
print(header)
print(states[0:3])
print(rainfallDataValues[0:3])

A is, as you remember: 
 [[1 2 3]
 [4 5 6]]
The first row, as a vector:  [1 2 3]
The first column, as a vector:  [1 4]
And this is cool...
All but the first column: 
[[2 3]
 [5 6]]
['State' 'Inches' 'MM' 'Rank']
['Alabama' 'Alaska' 'Arizona']
[['58.3' '1480' '4']
 ['22.5' '572' '39']
 ['13.6' '345' '47']]


## Converting Types
NumPy arrays are homogeneous - every element has the same type.  You can cast them with the `astype()` method.
You can inspect the type of `a` with `a.dtype`.  That'll show you that the type is int64, which is very specific:  integers that are 64 bits of data.  Roughly values +/- 2^63 (one bit for sign), or about +/- 9 billion billion.

Note that `a.dtype` is **not** a method call.  It's an attribute, so there are no parentheses.

You can convert `a` to floats instead of ints with `a_floats = a.astype(float)`, which stores the result in the variable `a_floats`.
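If you want to check the integer range yourself, `np.iinfo` reports the limits of an integer dtype. This is a minimal sketch: the dtype is set to `np.int64` explicitly here, since the default integer type can differ between platforms, and `limits` is just an illustrative name.

```python
import numpy as np

# Force int64 so the result does not depend on the platform default.
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

limits = np.iinfo(a.dtype)     # smallest and largest representable values
print(limits.min, limits.max)  # -2**63 and 2**63 - 1

a_floats = a.astype(float)     # astype returns a new array...
print(a.dtype, a_floats.dtype) # ...and leaves the original dtype alone
```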

#### --> 4. Convert rainfallDataValues to float
And verify the type.

**Double Click Here for answer**
<!--
rainfallDataValues = rainfallDataValues.astype(float)
print(rainfallDataValues.dtype)
-->



In [49]:
print(a.dtype )
a_floats = a.astype(float)
print(a_floats.dtype)

###
rainfallDataValues = rainfallDataValues.astype(float)
print(rainfallDataValues.dtype)

int64
float64
float64


## Reshaping

We're now going to play briefly with reshaping, concatenating, and stacking.

If you have a one-dimensional array, you can make it a 2D array.  And you can let NumPy figure out the size of one of the dimensions by putting a -1 for that size.


In [30]:
a1 = np.linspace(1,12,12)
print(a1)

a2 = a1.reshape( (-1,4 ))
print("\nReshaped (-1, 4)... " , a2.shape, "which is: \n", a2)

# See how you can reshape an array even if it isn't flat.
# a2 is not flat. 
a3 = a2.reshape( (2,-1) )
print("\nReshaped (2, -1)... " , a3.shape, "which is: \n", a3)

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12.]

Reshaped (-1, 4)...  (3, 4) which is: 
 [[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]
 [ 9. 10. 11. 12.]]

Reshaped (2, -1)...  (2, 6) which is: 
 [[ 1.  2.  3.  4.  5.  6.]
 [ 7.  8.  9. 10. 11. 12.]]


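`reshape` returns a view of the same data whenever it can, not a copy, so changing `a2` also changes `a1`.  Call `.copy()` if you need independent data.

You can also use `np.resize(a1, (dimensions))`, which returns a new array and fills in data by repeating the original values to reach the right size.  There is also a method form, `a1.resize((dimensions))`, which resizes in place and pads with zeros instead of repeating, so the two are not interchangeable.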
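A quick way to check how `reshape` and the two flavors of `resize` treat the underlying data is to try them on a tiny array. This is only a sketch: the names `base`, `grid`, `repeated`, and `padded` are made up, and `refcheck=False` is passed just so the in-place resize doesn't complain about extra references when run inside a notebook.

```python
import numpy as np

base = np.arange(12)
grid = base.reshape((3, 4))

# reshape shares memory with the original when it can...
print(np.shares_memory(base, grid))
grid[0, 0] = 99
print(base[0])  # ...so a write through grid shows up in base

# np.resize (the function) returns a new array, repeating values to fill it.
repeated = np.resize(np.arange(4), (2, 3))
print(repeated)

# ndarray.resize (the method) changes the array in place and pads with zeros.
padded = np.arange(4)
padded.resize((2, 3), refcheck=False)
print(padded)
```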

## Transposing

In Octave, you can do `A'` to transpose an array.  In NumPy, you use `somearray.T`

That's the same as saying `somearray.transpose()`



In [39]:
print(a2)
print(a2.T)

[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]
 [ 9. 10. 11. 12.]]
[[ 1.  5.  9.]
 [ 2.  6. 10.]
 [ 3.  7. 11.]
 [ 4.  8. 12.]]


## Practice with Resizing and Transposing

#### --> 5.  Reshaping and resizing play.
* Create a 4x4 identity matrix, named `i`
* Flatten it.  Write your flatten statement so that it doesn't care what the size is (use -1).  Store that in `iflat`
* Convert it to boolean.  Store that in `iflatbool`
* Chop it in half. Store the first half in `ihalfbool`.  It'll have 8 elements.  Use `.size` on it to figure out how big it is, and `//` (integer division) to work out what *half* is
* Convert that to a 2-row array called `ibox`.  Use -1 for the columns.
* Transpose that into a 4x2 array called `iboxtall`.

You should get:

```
[[ True False]
 [False  True]
 [False False]
 [False False]]
```

**Double Click Here for answer**
<!--
i = np.eye(4)
iflat = i.reshape(-1)
iflat.size
iflatbool = iflat.astype(bool)
ihalfbool = iflatbool[ : iflatbool.size // 2]
ibox = ihalfbool.reshape( (2,-1) )
iboxtall = ibox.T
print(iboxtall)
-->



In [45]:
###
i = np.eye(4)
iflat = i.reshape(-1)
iflat.size
iflatbool = iflat.astype(bool)
ihalfbool = iflatbool[ : iflatbool.size // 2]
ibox = ihalfbool.reshape( (2,-1) )
iboxtall = ibox.T


#
print(iboxtall)

[[ True False]
 [False  True]
 [False False]
 [False False]]


## Inverting

We'll close by showing that NumPy can invert matrices.  Inverting a matrix is what you do when you solve a big system of linear equations (roughly).  There's a lot of technical math and linear algebra specifically on how and why this works, but you may have used *row reduction* in high school to solve a set of linear equations. That basically computes a matrix inverse.

We won't be doing a lot with matrix inverses in this class.  Only for Multivariable Regression really.
There are a few ways to find an inverse.

* np.linalg.inv - the exact inverse.  Matrix must be square and nonsingular (not "single" - have an inverse "spouse")
* np.linalg.pinv - pseudo-inverse.  This works with singular (and even non-square) matrices, and gives A solution (there might be more than one solution): the least-squares one with the smallest norm.  It's computed numerically from a singular value decomposition (SVD), not by gradient descent (which we'll learn a lot about later), and it's generally slower than `inv`.

A few Octave exercises use pinv.
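To make the two functions concrete, here is a minimal sketch on a made-up 2x2 system. The matrices `A` and `S` and the vector `b` are illustrative values, not data from this lab.

```python
import numpy as np

# The system 2x + y = 3 and x + 3y = 5, written as A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x_inv = np.linalg.inv(A) @ b     # solve by forming the inverse explicitly
x_solve = np.linalg.solve(A, b)  # solve directly, without forming the inverse
print(x_inv, x_solve)

# S is singular (its second row is twice the first), so it has no inverse,
# but the pseudo-inverse still exists.
S = np.array([[1.0, 2.0], [2.0, 4.0]])
S_pinv = np.linalg.pinv(S)
print(np.allclose(S @ S_pinv @ S, S))  # a defining property of the pseudo-inverse
```

When all you want is the solution to a system, `np.linalg.solve` is usually preferred over computing the inverse and multiplying.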





# Some stats on Rainfall

There are a lot of stats you can get on a NumPy array.  But NumPy is designed primarily for storing and manipulating data, and not as much for stats.



In [55]:
print("Average rainfall for all states", rainfallDataValues[:,0].mean(axis=0), " inches per year" )



Average rainfall for all states 37.078  inches per year
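A few more of the built-in statistics, sketched on just the three states from the CSV snippet at the top so that it runs without the data file. The arrays `sample_states` and `sample_inches` are illustrative stand-ins for `states` and the first column of `rainfallDataValues`.

```python
import numpy as np

# Three rows copied from the CSV snippet; not the full data set.
sample_states = np.array(['Alabama', 'Alaska', 'Arizona'])
sample_inches = np.array([58.3, 22.5, 13.6])

print("mean:", sample_inches.mean())
print("min:", sample_inches.min(), "max:", sample_inches.max())
print("std:", sample_inches.std())

# argmax/argmin return positions, which can index the matching names array.
wettest = sample_states[sample_inches.argmax()]
driest = sample_states[sample_inches.argmin()]
print(wettest, driest)
```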
