# 1. Numpy data types

## Numpy and scipy

- Numpy: data structures and basic operations for scientific computing

   - This week we'll discuss numpy datatypes

- Scipy: specialized functionality for specific scientific domains
- Pandas: data frames in the style of R
   - especially useful for time series


## Numpy resources

- Scipy Lecture Notes: http://www.scipy-lectures.org/
- A short numpy tutorial: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
- Basics of numpy: https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html

- Library reference: http://docs.scipy.org/doc/numpy/reference/index.html#reference
- User guide: http://docs.scipy.org/doc/numpy/user/index.html#user



In [4]:
import numpy as np

In [5]:
np.array([123,345])


array([123, 345])

## Data types in Python

Before discussing Numpy datatypes, first a small review of the data types in Python:

- boolean (True, False)
- int (integer), float, complex 
- str (string)
- bytes
- list 
- tuple
- set
- dict  (dictionary)


The types can be divided into immutable and mutable types. 
The content of immutable types cannot be changed after creation. 

Which are the immutable Python data types?

In [6]:

a = set([1,2,3])
a.add(4)


In [7]:
print(a)

{1, 2, 3, 4}
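
As a small illustration of the difference between mutable and immutable types, the sketch below changes a list in place and then tries the same with a tuple. The variable names are only illustrative.

```python
# A list is mutable: its items can be replaced in place.
numbers_list = [1, 2, 3]
numbers_list[0] = 10
print(numbers_list)

# A tuple is immutable: item assignment raises a TypeError.
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10
    tuple_changed = True
except TypeError as err:
    tuple_changed = False
    print("TypeError:", err)
```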


In [8]:
a = [1, "two", 3.0]
for ai in a:
    print(type(ai))

<class 'int'>
<class 'str'>
<class 'float'>


## Numpy array

- Numpy is based on **arrays**. 
- An array is a kind of table in which every cell contains an item of the same **datatype**
- One array can only have one data type
- This seems inconvenient. What advantages does this have?
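
One possible starting point for the discussion: because every item has the same type, every item takes the same number of bytes, so the memory an array needs follows directly from its length. A minimal sketch using the `itemsize` and `nbytes` attributes (the array `values` is just an example):

```python
import numpy as np

values = np.arange(1000, dtype=np.int32)
print(values.itemsize)  # bytes per item: 4
print(values.nbytes)    # bytes for all 1000 items: 4000
```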



## Numerical datatypes


The five basic kinds of numerical data type in numpy are:

- float (float16, float32, or float64)
- integer (int8, int16, int32, or int64)
- unsigned integer: this number cannot be negative (uint8, uint16, uint32, or uint64)
- boolean (bool)
- complex (complex64 or complex128)

In [9]:
x = np.array([1,2,3], dtype=np.int32)
print(x.dtype)
y = np.array([1.0, 2.0, 3.0])
print(y.dtype)

int32
float64


## Bits and bytes
The data types `int8` and `uint8` use 1 byte. A byte contains 8 bits, and each bit has either the value 0 or the value 1.
How many different values can an `int8` represent?

In [10]:
2**8

256
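
The 256 values are laid out differently for the signed and the unsigned type. As a quick check, `np.iinfo` reports the smallest and largest value an integer type can hold:

```python
import numpy as np

for dtype in (np.int8, np.uint8):
    info = np.iinfo(dtype)
    # int8 covers -128..127, uint8 covers 0..255
    print(dtype.__name__, info.min, info.max)
```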

In [11]:
# Array with bytes
x = np.array([1, 2, 3], dtype=np.uint8)
print(x.dtype)
# array with booleans:
z = np.array([True, False, True])
print(z.dtype)

uint8
bool


## Strings

Another important datatype is:

- string
- in numpy, the string datatype includes an indication of its size (for example, `<U4` holds Unicode strings of at most 4 characters).


In [12]:
a = np.array(["abcf", "de"])
print(a.dtype)


<U4
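
The fixed size has a practical consequence worth knowing about: a longer string assigned into an existing array is cut to fit. A small sketch, with `words` as an illustrative name:

```python
import numpy as np

words = np.array(["abcf", "de"])
print(words.dtype)        # at most 4 characters per item

words[1] = "longer text"  # silently truncated to 4 characters
print(words)
```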


## Implicit conversions
- when you create an array from a list with mixed data, numpy converts all items to one common type
- it is best to avoid this and to specify the desired result with `dtype=`.

In [14]:
l = [1, 3.4, True, 2.3+4.5j, "a"]
print(l)
a = np.array(l)
print(a)
print(a.dtype)
print(a[0]+2)

[1, 3.4, True, (2.3+4.5j), 'a']
['1' '3.4' 'True' '(2.3+4.5j)' 'a']
<U64


TypeError: Can't convert 'int' object to str implicitly
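
The conversion also happens, more quietly, when the list mixes only numerical types. A minimal sketch (the exact integer type in the second case depends on the platform):

```python
import numpy as np

# int and float together become float
ints_and_floats = np.array([1, 2.5])
print(ints_and_floats, ints_and_floats.dtype)

# bool and int together become int
bools_and_ints = np.array([True, 2])
print(bools_and_ints, bools_and_ints.dtype)
```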

## Explicit conversions

- specify datatype when creating an array from a list
- convert array of one type into another

In [15]:
# explicitly specify the data type:
y = np.array([9, 8, 7, 6], dtype=np.float32)
print(y, y.dtype)
# this also works
y = np.array([9, 8, 7, 6], dtype='float32')
print(y, y.dtype)


[ 9.  8.  7.  6.] float32
[ 9.  8.  7.  6.] float32


In [16]:
strings = np.array(["12", "3", "24"])
print(strings)
z = strings.astype('int')
print(z)
print(z.dtype)

['12' '3' '24']
[12  3 24]
int64


In [18]:
# convert an array with strings to float:
y = np.array(["1.4", "3.4", "5.8"])
print(y, y.dtype)
h = y.astype('float')
print(h, h.dtype)
g = h.astype('int')
print(g, g.dtype)
print(g.astype('float'))

['1.4' '3.4' '5.8'] <U3
[ 1.4  3.4  5.8] float64
[1 3 5] int64
[ 1.  3.  5.]


In [20]:
numerals = ["one", "two", "three"]
a = np.array(numerals, dtype='int')
print (a, a.dtype)

ValueError: invalid literal for int() with base 10: 'one'

## Lossy conversions

- Some conversions lose information.
  - Converting from float to integer, the decimal part is lost. 
  - Converting from integer to boolean, zeros become False, everything else becomes True.

In [22]:
# from float to integer
x = np.array([2.2, 3.2, 2.8])
print(x)
print(x.dtype)
a = x.astype(np.int64)
print(a)
print(a.dtype)

[ 2.2  3.2  2.8]
float64
[2 3 2]
int64
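
Note that the conversion drops the decimal part (truncation toward zero) rather than rounding. If rounding is what you want, one option is to round first with `np.rint`, as in this sketch:

```python
import numpy as np

x = np.array([2.8, -2.8])
truncated = x.astype(np.int64)         # decimal part dropped: [ 2 -2]
rounded = np.rint(x).astype(np.int64)  # rounded first:        [ 3 -3]
print(truncated, rounded)
```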


In [23]:
# from int to boolean
x = np.array([0, 1, 1, 0, -1, -123, 12], dtype=np.int64)
print(x)
print(x.dtype)
a = x.astype(bool)
print(a)
print(a.dtype)

[   0    1    1    0   -1 -123   12]
int64
[False  True  True False  True  True  True]
bool


## Float operations in numpy

- In Numpy, dividing a nonzero float by zero results in inf (infinity) and a RuntimeWarning.
- In basic Python, division by zero raises a ZeroDivisionError.

In [24]:
1.0 / 0.0

ZeroDivisionError: float division by zero

In [25]:
# division by zero in Numpy
x = np.array([1.0, 2.0])
print(x/0)

[ inf  inf]
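
How numpy reacts to division by zero can be changed locally with the `np.errstate` context manager. The sketch below first silences the warning and then turns it into an exception:

```python
import numpy as np

x = np.array([1.0, 2.0])

# Silence the warning inside this block only.
with np.errstate(divide="ignore"):
    quiet = x / 0
print(quiet)

# Raise an exception instead of warning.
try:
    with np.errstate(divide="raise"):
        x / 0
    raised = False
except FloatingPointError as err:
    raised = True
    print("FloatingPointError:", err)
```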




## Special float values

- numpy has representations for some special values including
    - infinities
        - numpy.inf for (positive) infinity
        - -numpy.inf for negative infinity
        - numpy.PINF and numpy.NINF are older aliases for these two; they were removed in NumPy 2.0
    - Not a number: wrong or undefined value
        - numpy.nan : the NaN value

## Checking for special values

In order to check for these the following functions can be useful:

- numpy.isinf(x) : This function returns True when the value of x is either positive infinity or negative infinity. 
- numpy.isneginf(x) : This function returns True when the value of x is negative infinity.
- numpy.isposinf(x) : This function returns True when the value of x is positive infinity.
- numpy.isfinite(x) : This function returns True when the value of x is neither infinite nor NaN. It is not exactly the opposite of isinf(): both return False for NaN.
- numpy.isnan(x) : This function returns True when the value of x is Not A Number (NAN).        
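
These functions also work element-wise on arrays. A short sketch applying each of them to an array that contains one ordinary value and each of the special values:

```python
import numpy as np

x = np.array([1.0, np.inf, -np.inf, np.nan])
print(np.isinf(x))     # [False  True  True False]
print(np.isneginf(x))  # [False False  True False]
print(np.isposinf(x))  # [False  True False False]
print(np.isfinite(x))  # [ True False False False]
print(np.isnan(x))     # [False False False  True]
```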

# Exercises

## Exercise 1.1

Some conversions between datatypes lose information, and are therefore not reversible. 
Decide which of the following conversions are lossy i.e. not reversible:
1. float -> int
2. bool  -> int
3. int   -> '<U16'
4. int   -> float
5. float64 -> float32
6. float32 -> float64

Write the function `reversible` to check your guesses. This function should take an array of values and the target datatype, and check whether the values in the array can be converted to the target datatype and back to the original one, and whether they stay the same after the conversion.

In [26]:
def reversible(a, dtype):
    """Returns True if all the values in array a stay the same after a round-trip conversion to and from dtype."""
    # 8<---------------
    return np.all(a == a.astype(dtype).astype(a.dtype))

**Note**: this function only checks whether the particular values in the given array are convertible without losing information. It cannot check all the possible values of a given datatype pair.
For example:

In [34]:
a = np.array([0, 1], dtype=np.int32)
print(reversible(a, bool))
b = np.array([0, 1, 2], dtype=np.int32)
print(reversible(b, bool))


True
False
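
For a check at the level of the data types themselves, rather than of particular values, numpy provides `np.can_cast`. With its default `casting='safe'` rule it reports whether numpy considers a conversion safe. This is numpy's own notion of safety and need not coincide with the round-trip test for every pair of types, so treat the sketch below as a complement to `reversible`, not a replacement:

```python
import numpy as np

print(np.can_cast(np.int32, np.float64))    # True
print(np.can_cast(np.float64, np.float32))  # False
print(np.can_cast(np.float32, np.float64))  # True
print(np.can_cast(np.float64, np.int64))    # False
```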


In [35]:
# float -> int
a = np.array([0.0, 0.1, 2.7, 2.2**10])
print(a)
print(reversible(a, np.int64))

[  0.00000000e+00   1.00000000e-01   2.70000000e+00   2.65599228e+03]
False


In [36]:
# bool -> int
a = np.array([True, False])
print(reversible(a, np.int64))

True


In [37]:
# int -> '<U16'
a = np.array([1, 10000, 2**60], dtype='int64')
print(reversible(a, '<U16'))

False


In [38]:
# int -> float
a = np.array([0.0, 1, 10000, 2**62], dtype='int64')
print(reversible(a, np.float64))

True


In [39]:
# float64 -> float32

a = np.array([0.0, 10**-10, 10**100], dtype='float64')
print(reversible(a, np.float32))

False


In [40]:
# float32 -> float64
a = np.array([0.0, 10**-10, 10**100], dtype='float32')
print(reversible(a, np.float64))


True


## Exercise 1.2

The following function `linetofloat` takes a string which contains float numbers separated by 
spaces and returns an array of floats. Complete the definition of the function.
For example:

```
linetofloat("3.14 12.3 4.0") -> array([3.14, 12.3, 4.0])
```

In [41]:
def linetofloat(text):
    """Return array of numbers in input string."""
    # 8<----------
    return np.array(text.split(), dtype=float)


In [42]:
x = "3.14 12.3 4.0"
print(x)
x1 = x.split()
print(np.array(x1, dtype=float))
print(linetofloat(x))

3.14 12.3 4.0
[  3.14  12.3    4.  ]
[  3.14  12.3    4.  ]


## Exercise 1.3

Open and inspect the file [population.txt](population.txt). It contains some numerical data. 
Try to figure out which numpy function you can use to load this data into a numpy array. Load the data, and convert it to `float32`.

In [44]:
# 8<----------------
data = np.loadtxt("population.txt", dtype=np.float32)
print(data)


[[  1900.  30000.   4000.  48300.]
 [  1901.  47200.   6100.  48200.]
 [  1902.  70200.   9800.  41500.]
 [  1903.  77400.  35200.  38200.]
 [  1904.  36300.  59400.  40600.]
 [  1905.  20600.  41700.  39800.]
 [  1906.  18100.  19000.  38600.]
 [  1907.  21400.  13000.  42300.]
 [  1908.  22000.   8300.  44500.]
 [  1909.  25400.   9100.  42100.]
 [  1910.  27100.   7400.  46000.]
 [  1911.  40300.   8000.  46800.]
 [  1912.  57000.  12300.  43800.]
 [  1913.  76600.  19500.  40900.]
 [  1914.  52300.  45700.  39400.]
 [  1915.  19500.  51100.  39000.]
 [  1916.  11200.  29700.  36700.]
 [  1917.   7600.  15800.  41800.]
 [  1918.  14600.   9700.  43300.]
 [  1919.  16200.  10100.  41300.]
 [  1920.  24700.   8600.  47300.]]
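
To experiment with `np.loadtxt` without the file at hand, the function also accepts a file-like object. The sketch below feeds it the first three rows of the output above through `io.StringIO`, assuming whitespace-separated columns; the actual layout of `population.txt` may differ (for instance, it may contain a header line).

```python
import io
import numpy as np

# First three rows of the data shown above, as an in-memory stand-in for the file.
sample = io.StringIO(
    "1900 30000 4000 48300\n"
    "1901 47200 6100 48200\n"
    "1902 70200 9800 41500\n"
)
data = np.loadtxt(sample, dtype=np.float32)
years = data[:, 0]  # first column
print(data.shape, data.dtype)
print(years)
```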


## Exercise 1.4

Let's check the behavior of numpy numeric data with certain important mathematical operations.
First let's make sure we are working with numpy datatypes and not with base Python ones:

In [50]:
a = np.array([0, 1], dtype=float)
zero = a[0]
one = a[1]
print(type(zero))
print(zero.dtype)

<class 'numpy.float64'>
float64


Check what happens in the following cases:

- dividing a float by zero
- dividing zero by zero
- logarithm (`np.log`) of one
- logarithm of zero
- dividing a float by zero and then dividing the result by itself
- dividing a float by zero and then comparing whether the result is equal to itself
- dividing zero by zero and then comparing whether the result is equal to itself

Keep in mind that the behavior of numpy floats is not always what one would naively expect!

In [59]:
# 8< -----
# dividing a float by zero
print(one/zero)

inf




In [60]:
# dividing zero by zero
print(zero/zero)    
    

nan


  


In [61]:
# logarithm of one
print(np.log(one))

0.0


In [62]:
# logarithm of zero
print(np.log(zero))

-inf


  


In [63]:
# dividing a float by zero and then dividing the result by itself
r = one/zero
print(r/r)

nan




In [64]:
# dividing a float by zero and then comparing whether the result is equal to itself
r = one/zero
print(r==r)

True


  


In [65]:
# dividing zero by zero and then comparing whether the result is equal to itself
r = zero/zero
print(r == r)

False
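
Because NaN is not equal to anything, including itself, a comparison with `==` cannot be used to find NaN values in an array. A minimal sketch of the difference between `==` and `np.isnan` (the array names are illustrative):

```python
import numpy as np

values = np.array([0.0, 1.0, 0.0])
with np.errstate(invalid="ignore", divide="ignore"):
    ratios = values / np.array([0.0, 1.0, 1.0])  # [nan, 1., 0.]

print(ratios == np.nan)  # False everywhere, even for the NaN entry
print(np.isnan(ratios))  # True only for the NaN entry
```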


  
