# 1. Basic Arrays and Data Types

## 1.1 What is an array?

An array is an ordered and structured collection of elements. Arrays are structured around the number of dimensions they contain, as well as how many elements exist along each dimension. Today we will focus on arrays that have only one dimension.

For example, we can have a one-dimensional array that contains six elements:

  4 5 6 7 8 9


### 1.1.1 Why numpy arrays and not just python lists or tuples?

Numpy arrays are structured in memory very differently from python lists or tuples. The full array is stored within a single large block in the computer's memory, which makes many operations much quicker, especially when the number of elements in the array becomes large.
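
As a small illustration, the sketch below doubles every element of the same data, once held as a list and once as an array. The names `values_list` and `values_array` are made up for this example. It does not measure speed; it only shows that the array version expresses the whole operation in one step instead of an explicit Python loop.

```python
import numpy

# hypothetical example data: the integers 0..9 as a list and as an array
values_list = list(range(10))
values_array = numpy.array(values_list)

# with a list we visit every element in a Python loop
doubled_list = [2 * v for v in values_list]

# with an array the multiplication is applied to all elements at once
doubled_array = 2 * values_array

print(doubled_list)
print(doubled_array)
```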

## 1.2 Creating an array

### 1.2.1 Creating an array from Python lists or tuples

Perhaps the most straightforward way to create a numpy array is by passing a python list or a tuple to the `numpy.array` function. 

In [None]:
import numpy

In [None]:
# create a numpy array from a python list:
my_list = [1, 3, 5, 7, 9]
my_array = numpy.array(my_list)
print( my_array )

In [None]:
# numpy arrays can also be created based on python tuples:
my_tuple = (11, 12, 13, 14, 15)
my_array =  numpy.array(my_tuple) 
print( my_array )

### 1.2.2 Creating an array from a file

Another common way of creating a numpy array is to load the information from a file. There are two major file types that can be read: text files (csv and similar data types) and binary files. To load a text file, use `numpy.loadtxt()`. Next week we will discuss `numpy.load()` for dealing with binary files.

In [None]:
# code to create a file to load:
a = numpy.array([2, 4, 6, 8])
# fmt="%d" writes the values as integers, so they can be read back as int32
numpy.savetxt("sample_file.txt", a, fmt="%d")

# code to load that file:
b = numpy.loadtxt("sample_file.txt", dtype=numpy.int32)

# check values....
print(a)
print(b)

Given the example above, I suspect you will not be surprised to learn that the function to write a numpy array to a text file is `numpy.savetxt()`.
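
To see what such a text file looks like, the sketch below writes a small array, reads the raw text back, and then loads it again with `numpy.loadtxt()`. The temporary folder and the file name `sample_values.txt` are assumptions made only for this example.

```python
import os
import tempfile

import numpy

values = numpy.array([2, 4, 6, 8])

# work inside a temporary folder so no file is left behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample_values.txt")  # hypothetical file name

    # write the values as integers; a 1-D array is stored one value per line
    numpy.savetxt(path, values, fmt="%d")

    # look at the raw text that was written
    with open(path) as handle:
        text = handle.read()

    # load the file back into an integer array
    loaded = numpy.loadtxt(path, dtype=numpy.int32)

print(text)
print(loaded, loaded.dtype)
```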

### 1.2.3 Intrinsic NumPy array creation functions

Numpy has a function similar to Python's `range()`: `arange(a, b, s)`, where $a$ is the start point, $b$ is the end point and $s$ is the step size. Values are generated within the half-open interval [start, stop), in other words the interval including start but excluding stop. Unlike `range()`, which only supports integers, `arange` accepts both integers and floats as inputs.

In [None]:
# fixed step size:
x = numpy.arange(1, 11.9, 2.1)
print( x )

Another function to create a range is the `linspace(a, b, i)` function, where $a$ is the start point, $b$ is the end point, and $i$ is the number of items. It returns evenly spaced numbers over the interval [start, stop]. The endpoint of the interval is included by default but can optionally be excluded by setting `endpoint=False`.

The advantage of the `linspace` function is that you can specify the number of items and the advantage of the `arange` function is that you can specify the step size.
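
The two functions can produce the same values when the step size and the number of items agree. The sketch below is one such case; the variable names and the numbers 0, 2, 0.25 and 8 are chosen only for illustration. `retstep=True` asks `linspace` to also return the step size it computed.

```python
import numpy

# arange: we choose the step size, the number of items follows from it
by_step = numpy.arange(0, 2, 0.25)

# linspace: we choose the number of items, the step size follows from it
by_count, step = numpy.linspace(0, 2, 8, endpoint=False, retstep=True)

print(by_step)
print(by_count)
print(step)
```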

In [None]:
# fixed number of items:
y = numpy.linspace(0, 2, 9)
print( y )

We can also use the `zeros` or `ones` functions to create an array of all zeros or ones with the desired shape:

In [None]:
z = numpy.zeros([10])
print( z )

In [None]:
o = numpy.ones([10])
print( o )

### 1.2.4 Creating arrays of random values

If we want to create an array with random numbers we can use the `numpy.random.random(n)` function. This function returns an array with `n` float numbers in the half-open interval `[0.0, 1.0)`.

In [None]:
# create array with four random numbers between 0 and 1:
print( numpy.random.random(4) )

`numpy.random.randint(low, high, n)` returns `n` random integers from the "discrete uniform" distribution in the "half-open" interval `[low, high)`. If `high` is None (the default), then results are from `[0, low)`.

In [None]:
# create array with 12 random integers from 5 up to (but not including) 10:
print( numpy.random.randint(5, 10, 12) )

The expression `numpy.random.uniform(low, high, n)` returns an array with `n` random numbers uniformly drawn from the half-open interval `[low, high)` (includes low, but excludes high).

In [None]:
# create array with 12 random numbers with uniform distribution between 5 and 10:
print( numpy.random.uniform(5, 10, 12) )

`numpy.random.normal(loc=0.0, scale=1.0, size=None)` draws random samples from a normal (Gaussian) distribution. `loc` and `scale` define respectively the mean and standard deviation of the normal distribution.

In [None]:
# create array with 10 random numbers with normal distribution with mean 5 and standard deviation of 3:
print( numpy.random.normal(5, 3, 10) )
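
Because the values are random, a simple way to get a feel for the intervals described above is to draw many samples and look at the smallest and largest ones. The sketch below does this; the sample size `n` and the variable names are assumptions for the example, and the printed numbers will differ on every run.

```python
import numpy

n = 1000  # hypothetical sample size for this check

floats = numpy.random.random(n)            # drawn from [0.0, 1.0)
integers = numpy.random.randint(5, 10, n)  # drawn from 5, 6, 7, 8, 9
uniform = numpy.random.uniform(5, 10, n)   # drawn from [5, 10)

# the smallest and largest value of each sample stay inside the interval
print(floats.min(), floats.max())
print(integers.min(), integers.max())
print(uniform.min(), uniform.max())
```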

For more information about creating arrays: 
http://docs.scipy.org/doc/numpy/user/basics.creation.html

## 1.3 Utility Functions and attributes

### 1.3.1 Size and shape
Often it is really useful to check the size and shape of an array when debugging your code. For an array `a` you can use the attribute `a.size` to show the total number of elements in the array, the attribute `a.ndim` to see the number of dimensions, and the attribute `a.shape` to see the number of elements along each dimension. These can differ when we consider arrays that contain more than one dimension (next week).

In [None]:
a = numpy.random.uniform(1, 100, 1000000)
print(numpy.size(a), numpy.ndim(a), numpy.shape(a))
print(a.size, a.ndim, a.shape)

### 1.3.2 Checking equivalence
To check equivalence between two arrays, you could check:
- the match between their elements using `numpy.allclose()`

In [None]:
a = numpy.array([2, 4, 6, 8])
b = a

In [None]:
print(numpy.allclose(a, b))

To check whether two names refer to the same array in computer memory, we can use `id()`:


In [None]:
print(id(a) == id(b))
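
Having equal elements and being the same object in memory are different things. As a sketch of the distinction, the example below compares an array with a second name for it and with a copy made by the `copy()` method; the variable `c` is introduced only for this illustration.

```python
import numpy

a = numpy.array([2, 4, 6, 8])
b = a          # b is a second name for the same array
c = a.copy()   # c is a new array holding the same values

print(numpy.allclose(a, b), id(a) == id(b))  # equal values, same object
print(numpy.allclose(a, c), id(a) == id(c))  # equal values, different object

# changing a also changes b, but leaves the copy c untouched
a[0] = 100
print(b)
print(c)
```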

## Exercise 1.1

Create an array of length 15 with equally spaced values from 13 to 99. Confirm that the number of elements is correct.

## Exercise 1.2

Create an array that ranges from 0 to 1 in steps of 0.015.

## Exercise 1.3

Create an array of length 10,000,000 with random values between 0 and 100. Confirm that the number of elements is correct.

-----



## 1.4 Arrays and data types

### 1.4.1 Data types in Python

Before discussing Numpy datatypes, first a small list of the data types in Python:

##### Immutable types:
- boolean (True, False)
- int (integer)
- float
- complex 
- str (string)
- bytes
- tuple ( )

Objects of these types cannot be changed after they are created. Any operation that appears to modify one creates a new object instead.

##### Mutable types:
- list [ ]
- set
- dict { } (dictionary)

Objects of these types can be changed in place after they are created.
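
A minimal sketch of the difference, using a list and a tuple with the same contents (the variable names are chosen only for this example):

```python
# a list is mutable: an element can be replaced in place
my_list = [1, 2, 3]
my_list[0] = 10
print(my_list)

# a tuple is immutable: the same assignment raises a TypeError
my_tuple = (1, 2, 3)
try:
    my_tuple[0] = 10
except TypeError as error:
    print("TypeError:", error)

# the tuple still holds its original values
print(my_tuple)
```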

### 1.4.2 Data types in Numpy

Numpy is based on **arrays**. You can think of an array as a list, or a table, where each cell of the table contains an item of the same **datatype**. The elements of an array must all be the same data type. 

The 6 basic data types of Numpy arrays are:
- float (float16, float32, or float64)
- integer (int8, int16, int32, or int64)
- unsigned integer: this number cannot be negative (uint8, uint16, uint32, or uint64)
- boolean (bool)
- complex (complex64 or complex128)
- string (for example `<U3` or `<U64`, where the number indicates the maximum length of the strings)

The data type of the numpy array `x` can be obtained with the `x.dtype` attribute. 

#### 1.4.2.1 Memory Details

The numbers 8, 16, 32, 64 in the name of datatypes are used to indicate the amount of memory storage.

For example, the data type `int8` only uses 1 byte (8 bits) and therefore this variable can only store a number in the range of -128 to 127. The data type int16 uses 2 bytes and therefore this variable can store a number in the range of -32768 to 32767. 

For the unsigned integer, the number can never be negative and therefore the data type `uint8` can have a value in the range of 0 to 255, where $255 = 2^8 - 1$.
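
These limits do not have to be memorised. As a sketch, `numpy.iinfo()` reports the smallest and largest value an integer data type can hold; the loop over three example types below is only an illustration.

```python
import numpy

# print the smallest and largest value for a few integer data types
for kind in [numpy.int8, numpy.int16, numpy.uint8]:
    info = numpy.iinfo(kind)
    print(kind.__name__, info.min, info.max)
```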

We can use the `itemsize` attribute of a numpy array to retrieve the size of one array element in bytes.


In [None]:
# array with integers:
x = numpy.array([1, 3, 5, 7, 9])
print( x, x.dtype, x.itemsize )

In [None]:
x = numpy.array([1, 3, 5, 7, 9], dtype='int8')
print( x, x.dtype, x.itemsize )

In [None]:
# array with floats:
y = numpy.array([2.2, 4.4, 6.6, 8.8])
print( y, y.dtype, y.itemsize )

In [None]:
# array with booleans:
z = numpy.array([True, False, True])
print( z, z.dtype, z.itemsize )

In [None]:
# array with strings:
x = numpy.array(["a", "b", "cde"])
print( x, x.dtype, x.itemsize )

### 1.4.3 Specifying data types

If you initialize an array with multiple data types, the elements are converted to the same type. 
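
As a sketch of this conversion, the example below builds three small arrays from mixed inputs and prints the data type numpy settles on. The variable names are made up for the illustration.

```python
import numpy

# integer and float: everything becomes a float
int_and_float = numpy.array([1, 2.5])

# integer, float and complex: everything becomes complex
with_complex = numpy.array([1, 2.5, 3j])

# numbers and a string: everything becomes a string
with_string = numpy.array([1, 2.5, "a"])

print(int_and_float, int_and_float.dtype)
print(with_complex, with_complex.dtype)
print(with_string, with_string.dtype)
```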

You can specify the data type of the array when you create the array with the `array()` function using the `dtype` keyword argument.

In [None]:
# explicitly specify the data type:
y = numpy.array([9, 8, 7, 6], dtype='float')
print( y, y.dtype, y.itemsize )

Note that not every combination is possible. For example, when your array contains text you cannot choose float as a data type, because text cannot be converted to floats unless every string exactly represents a number.

In [None]:
# Try to convert strings to integers
strings = ["12", "3", "24"]
z = numpy.array(strings, dtype='int')
print( z, z.dtype, z.itemsize )

In [None]:
# This will produce an error
numerals = ["one", "two", "three"]
a = numpy.array(numerals, dtype='int')
print( a, a.dtype, a.itemsize )

In [None]:
# array with mixed data types:
x = numpy.array([1, 3.4, True, 2.3+4.5j, "a"])
print( x, x.dtype, x.itemsize )

### 1.4.4 Converting to a different data type

Use the `astype()` method to convert an existing Numpy array to a different type.

In [None]:
# convert an array with integers to the data type float:
x = numpy.array([1, 3, 5, 7, 9])
print( x, x.dtype, x.itemsize )

g = x.astype('float')
print( g, g.dtype, g.itemsize )

In [None]:
# convert an array with strings to float with 64 bits:
y = numpy.array(["1.4", "3.4", "5.4"])
print( y, y.dtype, y.itemsize )

h = y.astype('float64')
print( h, h.dtype, h.itemsize )

Sometimes, the data types are converted but the content is slightly changed. 
For example, when converting from float to integer, the fractional part is dropped (truncation toward zero), which is the same as rounding down only for non-negative numbers. 
Likewise, when converting an array with only zeros and ones to the data type Boolean, the 0 is converted to False and the 1 is converted to True. 
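
Negative values show how `astype('int')` treats the fractional part. The sketch below compares it with `numpy.floor()`, which always rounds down; the example values and variable names are chosen only for this illustration.

```python
import numpy

x = numpy.array([-2.8, -0.5, 0.5, 2.8])

truncated = x.astype('int')  # drops the fractional part
floored = numpy.floor(x)     # rounds down to the next lower whole number

print(truncated)
print(floored)
```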

In [None]:
# from float to integer
x = numpy.array([2.2, 3.2, 2.8])
print( x, x.dtype, x.itemsize)

a = x.astype('int')
print( a, a.dtype, a.itemsize )

In [None]:
# from 0-1 to boolean
x = numpy.array([0, 1, 1, 0])
print( x, x.dtype, x.itemsize )

a = x.astype('bool')
print( a, a.dtype, a.itemsize )

Not every data type can be converted to all other data types. Some examples:


In [None]:
# from non-numeric strings to integer (this raises a ValueError)
x = numpy.array(["a", "b", "c"])
a = x.astype('int')

## Exercise 1.4

Create an array of length 5 containing random values between 1 and 10. Determine the type of the array. Convert the array to type `<U8`, which is a string of at most 8 characters, and then convert it back to an array of float values. Describe what happens to the values.

## Exercise 1.5

Create an array of length 5 containing random values between 1 and 10. Convert the resulting array into an array of integers. Then convert it back to float values. Describe what happens to the values.

------

## 1.5 Indexing arrays

Indexing, as you might remember from Data Processing or Intro to R, involves selecting a subset of an array. There are many ways to address content in an array; we will discuss three here and more next week (for more complete information about indexing see
http://docs.scipy.org/doc/numpy/user/basics.indexing.html):

### 1.5.1 Linear indexing

Indexing in a 1-dimensional array is the same as the indexing in a Python list (we will revisit this next week when we discuss multi-dimensional arrays).

When accessing more than one element from an array, the slicing operator `":"` can be used, and it works the same way as with python lists.
If the index is `[a:b]` then the indices that are used are `a` up to but not including `b`. 

In [None]:
a = numpy.arange(10)
print(a)

In [None]:
print( a[0] )
print( a[3] )
print( a[9] )
print( a[-1] )

In [None]:
print( a[:] )
print( a[3:6] )
print( a[:4] )
print( a[-4:] )

## Exercise 1.6

Build an array of 100 random values between 0 and 1. On three separate lines print A) the first 4 elements, B) the last 3 elements, C) the 37th to 42nd elements.

## Exercise 1.7

Build an array of 10,000,000 random values between 0 and 1. Print the minimum value of the array as well as the five values that occur before the minimum.

-----

### 1.5.2 Boolean indexing

Indexing an array with a boolean array of the same length returns all values for which the index is True.

In [None]:
a = numpy.random.uniform(-0.5, .5, 25)
print(a)

In [None]:
# Create a boolean index for positive numbers in array a
index = a > 0.0
print( index )

In [None]:
# Print all the positive numbers
b =  a[index] 
print( b )

#### Aside about length

The number of elements of b is at most the length of a, because only the elements where the index is True are kept:

In [None]:
print( a.size )
print( b.size )
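
Since True counts as 1 and False as 0, summing a boolean index gives the number of selected elements directly. The sketch below uses a small fixed array instead of random values so the result is easy to follow, and also combines two conditions with the `&` operator; the name `small_positive` is made up for the example.

```python
import numpy

a = numpy.array([-0.4, 0.1, 0.3, -0.2, 0.5])
index = a > 0.0

# summing the index counts the True entries, which matches the selection size
print(index.sum(), a[index].size)

# two conditions can be combined element by element with & (and)
small_positive = (a > 0.0) & (a < 0.4)
print(a[small_positive])
```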

### 1.5.3 Indexing with an array of indices

Pass a separate array that stores the indices as integers, and you get back exactly the elements of the array at those indices.

In [None]:
b = numpy.linspace(0, 1, 10)
print( b )

In [None]:
# Print numbers at prime indices
index = numpy.array([ 2, 3, 5, 7])
print( b[index] )
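
Boolean indexing and indexing with an array of indices are closely related. As a sketch, `numpy.flatnonzero()` turns a boolean index into the integer positions where it is True, and both forms then select the same elements. The threshold 0.5 and the name `mask` are assumptions for this example.

```python
import numpy

b = numpy.linspace(0, 1, 10)

# boolean index: True where the value is larger than 0.5
mask = b > 0.5

# integer positions of the True entries
index = numpy.flatnonzero(mask)

print(index)
print(b[index])
print(numpy.allclose(b[index], b[mask]))
```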

## Exercise 1.8

Draw 1,000,000 samples from a normal distribution with a mean of 1 and a standard deviation of 0.5. Compute the proportion of samples that are below 0.


## Exercise 1.9

Build an array with 10,000 random values between 0 and 1. Determine the value of the entry that is the largest value of all entries whose index is divisible by 5.
