# Manipulating Data using NumPy

![caption](../images/numpy-logo.jpg)
***
NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, you can perform mathematical and logical operations on arrays. This section starts with the basics of NumPy and then covers array creation, data types, indexing, and common array operations.

We can install NumPy with `pip install numpy`.

- Main object: **`ndarray`**

<img src="../images/icon/Technical-Stuff.png" alt="Technical-Stuff" style="width: 100px;float:left; margin-right:15px"/>
<br />

# ndarray
***

The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index.

Every item in an ndarray occupies a block of memory of the same size. Each ndarray is associated with a data-type object (called dtype) that describes its elements. Any single item extracted from an ndarray (by indexing) is represented by a Python object of one of the array scalar types. An instance of the ndarray class can be constructed by the array creation routines described later.
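As a minimal sketch of the points above (the array contents and the choice of `int32` are illustrative assumptions), the following shows that every element takes the same number of bytes and that a single extracted item is an array scalar:

```python
import numpy as np

# A small array with an explicitly chosen element type (illustrative values)
arr = np.array([1, 2, 3], dtype=np.int32)

# Every element occupies the same number of bytes
item_size = arr.itemsize
print(arr.dtype, item_size)

# A single extracted item is a NumPy array scalar, not a plain Python int
first = arr[0]
print(type(first))
```

Here `itemsize` is 4 because an `int32` value is stored in 4 bytes, so the whole array uses 12 bytes.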

You import the library in Python with `import numpy`. The basic ndarray is created using the `array` function in NumPy, as follows:

In [1]:
import numpy
numpy.array

<function numpy.core.multiarray.array>


<img src="../images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br />

## How do I create Arrays in Python?
***
* Create an array from a regular Python list or tuple using the array function. 

* The type of the resulting array is deduced from the type of the elements in the sequences.

`numpy.array` creates an ndarray from any object exposing the array interface, or from any object whose `__array__` method returns an array. Its signature is:

In [2]:
numpy.array(object, dtype = None, copy = True, order = None, subok = False, ndmin = 0)

array(<class 'object'>, dtype=object)
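To illustrate how the element type is deduced and how a couple of the parameters behave, here is a short sketch. The variable names and values are made up for illustration:

```python
import numpy as np

# The dtype is deduced from the elements of the list or tuple
from_list = np.array([1, 2, 3])        # all integers -> an integer dtype
from_tuple = np.array((1.5, 2, 3))     # one float -> float64 for every element

# dtype overrides the deduced type; ndmin pads the shape with leading axes
as_float = np.array([1, 2, 3], dtype=np.float32)
at_least_2d = np.array([1, 2, 3], ndmin=2)

print(from_list.dtype, from_tuple.dtype, as_float.dtype)
print(at_least_2d.shape)
```

The exact integer dtype deduced for `from_list` depends on the platform, which is why the sketch does not assume a specific width.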

### Instructions
* Create a list of numbers as `[10,20,30]` and save it as `my_list`
* Convert the list into an array using `np.array()`

In [1]:
import numpy as np
my_list = [10,20,30]
np.array(my_list)

array([10, 20, 30])

### Instructions
* Create a list of lists as `[[5, 10, 15], [20, 25, 30], [35, 40, 45]]` and save it as `list_of_lists`
* Convert the list of lists into an array using `np.array()` and print it.
* Check the `type()` of the array.

In [2]:
list_of_lists = [[5, 10, 15], [20, 25, 30], [35, 40, 45]]
print(np.array(list_of_lists))
print(type(np.array(list_of_lists)))

[[ 5 10 15]
 [20 25 30]
 [35 40 45]]
<class 'numpy.ndarray'>


An example of what n-dimensional arrays look like:

## Types

![NumPy Array Types](../images/numpy-types1.jpg)

`ndarray` is also known by the alias `array`. Note that `numpy.array` is not the same as the Standard Python Library class `array.array`, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an `ndarray` object are:

***ndarray.ndim***
the number of axes (dimensions) of the array. The number of dimensions is sometimes referred to as the rank.

***ndarray.shape***
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, `shape` will be `(n,m)`. The length of the `shape` tuple is therefore the rank, or number of dimensions, `ndim`.

***ndarray.size***
the total number of elements of the array. This is equal to the product of the elements of shape.

***ndarray.dtype***
an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own: `numpy.int32`, `numpy.int16`, and `numpy.float64` are some examples.

***ndarray.reshape()***
a method (rather than an attribute) that returns an array containing the same data with a new shape.
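The attributes listed above can be inspected on any array. The sketch below uses a small made-up array named `grid`:

```python
import numpy as np

# Six consecutive integers arranged as 2 rows by 3 columns
grid = np.arange(6).reshape(2, 3)

print(grid.ndim)    # number of axes
print(grid.shape)   # size along each axis
print(grid.size)    # total number of elements
print(grid.dtype)   # element type

# reshape returns the same data laid out with a new shape
reshaped = grid.reshape(3, 2)
print(reshaped)
```

Note that `size` equals the product of the entries in `shape` (here 2 x 3 = 6), so a reshape is only possible when the new shape has the same product.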

<img src="../images/icon/Technical-Stuff.png" alt="Technical-Stuff" style="width: 100px;float:left; margin-right:15px"/>
<br />

## numpy.dtype
***
<br/>
The data type or dtype describes the kind of elements that are contained within the array.

* **bool**: Boolean values
<br/><br/>

* **int**: Integer values. Can be int16, int32, or int64.


* **float**: Floating point values. Can be float16, float32, or float64.
<br/><br/>


* **string**: Text. Can be a byte string or Unicode (this distinction is greatly simplified in Python 3).
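As a rough illustration of these kinds of dtype (the values below are invented for the example), a dtype can be requested at creation time or changed afterwards with `astype`:

```python
import numpy as np

# Request a specific integer width when creating the array
ints = np.array([1, 2, 3], dtype=np.int16)

# astype returns a converted copy with the new element type
floats = ints.astype(np.float32)

# Non-zero numbers become True when converted to bool
flags = np.array([0, 1, 2], dtype=bool)

# Text can be stored as Unicode (kind 'U') or as byte strings (kind 'S')
text = np.array(["a", "bc"])
raw_bytes = np.array([b"a", b"bc"])

print(ints.dtype, floats.dtype, flags.dtype)
print(text.dtype.kind, raw_bytes.dtype.kind)
```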

<img src="../images/icon/ppt-icons.png" alt="ppt-icons" style="width: 100px;float:left; margin-right:15px"/>
<br />

## Let's try it ourselves!
***
### Create a vector from the list [10, 20, 30]. Print the dtype and shape.

In [5]:
my_list = [10, 20, 30]

arr = np.array(my_list)

print(arr.dtype)
print(arr.shape)

int64
(3,)


<img src="../images/icon/ppt-icons.png" alt="ppt-icons" style="width: 100px;float:left; margin-right:15px"/>
<br />

### Create a matrix from the list of lists [[5.3, 10.2, 15.1], [20.4, 25.3, 30.9], [35.4, 40.1, 45.6]]. Print the dtype and shape. 
***

In [6]:
my_list1 = [[5.3, 10.2, 15.1], [20.4, 25.3, 30.9], [35.4, 40.1, 45.6]]
arr1 = np.array(my_list1)
print(arr1.dtype)
print(arr1.shape)

float64
(3, 3)


<img src="../images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br />

## Indexing and Selection
***
Basic slicing extends Python's basic concept of slicing to N dimensions. Basic slicing occurs when the selection object is a `slice` object (constructed by `start:stop:step` notation inside brackets), an integer, or a tuple of slice objects and integers. `Ellipsis` and `newaxis` objects can be interspersed with these as well. For backward compatibility with Numeric, older NumPy versions also initiated basic slicing when the selection object was a non-ndarray sequence (such as a `list`) containing `slice` objects.
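The forms of basic slicing described above can be sketched on a small made-up array (the name `table` is only for illustration):

```python
import numpy as np

# 3 rows by 4 columns holding the integers 0..11
table = np.arange(12).reshape(3, 4)

# A tuple of slices: rows 0-1, columns 1-2
block = table[0:2, 1:3]

# start:stop:step on the first axis: every second row
every_other_row = table[::2]

# Ellipsis stands for "all remaining axes": here, the first column
first_column = table[..., 0]

# newaxis inserts an axis of length 1
expanded = table[:, np.newaxis, :]

print(block)
print(every_other_row)
print(first_column)
print(expanded.shape)
```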

Indexing and selection in NumPy follow from what we have learnt in Python. Let us create an array and try them out.

### Instructions
* Create an array from `0` to `9` using `np.arange()` and save it as `arr`
* Print the element at index `5`
* Print the elements from `1` to `9` with an interval of `2`

In [3]:
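# indexing one dimensional array
arr = np.arange(10)

# get the element at index 5
print(arr[5])

# get values in a range: indices 1 up to (not including) 9, in steps of 2
print(arr[1:9:2])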

### Instructions
* Create a two-dimensional array containing `1` to `9`, save it as `arr`, and print it.
* Print the row at index `1`
* Print the value at row index `2`, column index `2`
* Print the value at row index `0`, column index `2`

In [33]:
# indexing two dimensional array
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])

print(arr)
print(arr[1])     # select a row
print(arr[2][2])  # [row][column]
print(arr[0, 2])  # [row, column]

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[4 5 6]
9
3


# Analyzing the Weather using NumPy

<center><img src="../images/weather.jpg" alt="Weather" style="width: 350px;"/></center>
Now it's time to use some of them to learn data manipulation by analyzing a weather data set.

We'll be working with **weather_small_2012.csv**, which contains weather data for each hour in 2012.
Since weather_small_2012.csv is a csv file, rows are separated by line breaks, and columns are
separated by commas:

```
Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),Visibility (km),Stn Press (kPa)
2012-01-01 00:00:00,-1.8,-3.9,86,4,8.0,101.24
2012-01-01 01:00:00,-1.8,-3.7,87,4,8.0,101.24
2012-01-01 02:00:00,-1.8,-3.4,89,7,4.0,101.26
2012-01-01 03:00:00,-1.5,-3.2,88,6,4.0,101.27
```

**To read csv file, we use:**

    numpy.genfromtxt(fileName, delimiter=",")

### Instructions
* The `weather` variable is already defined for you by reading the dataset path with `np.genfromtxt()`
* Print the data type of the dataset using `.dtype`
* Also print the `weather` dataset and have a look at it

In [12]:
# read csv file
weather = np.genfromtxt("../data/weather_small_2012.csv", delimiter=",")

print(weather.dtype)
print(weather)

float64
[[   nan    nan    nan ...    nan    nan    nan]
 [   nan  -1.8   -3.9  ...   4.     8.   101.24]
 [   nan  -1.8   -3.7  ...   4.     8.   101.24]
 ...
 [   nan  -0.5   -1.5  ...  28.     4.8   99.95]
 [   nan  -0.2   -1.8  ...  28.     9.7   99.91]
 [   nan   0.    -2.1  ...  30.    11.3   99.89]]


Many items in this dataset are nan.

* The entire first row is nan because the headers are strings.
* The entire first column is nan because the Date/Time values are strings.
* Depending on the values, some numbers may be displayed in scientific notation, like 1.98600000e+03.

The data type of `weather` is float64. Because all of the values in a NumPy array must have the same
data type, NumPy attempted to convert every column to float when the file was read in.

**Reading In The Data Properly**

***
To read the weather_small_2012.csv file properly, we have to use the correct data type and skip the header.
* The default dtype of `genfromtxt()` is float, so it converts non-numeric values to nan (not a number).
* To avoid nan, we read the values as `|S20` (byte strings of length 20).
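The behavior described above can be reproduced without the data file. The sketch below is an illustration only: it feeds a three-line excerpt of the CSV through `io.StringIO` (an assumption made so the example is self-contained) and uses `usecols`, which is not part of the exercise, as one alternative way to keep just the numeric columns:

```python
import io
import numpy as np

# A tiny in-memory excerpt shaped like the weather file (first three columns only)
csv_text = (
    "Date/Time,Temp (C),Dew Point Temp (C)\n"
    "2012-01-01 00:00:00,-1.8,-3.9\n"
    "2012-01-01 01:00:00,-1.8,-3.7\n"
)

# Default dtype is float: the header row and the date column cannot be converted
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(raw)

# Skipping the header and selecting only numeric columns avoids the nan values
numeric = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", skip_header=1, usecols=(1, 2)
)
print(numeric)
```

Reading everything as `|S20`, as the exercise does, keeps the date column as well, at the cost of converting the numeric columns with `astype` afterwards.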

### Instructions
* Read the Weather data using `np.genfromtxt()` by passing parameters as `path of the dataset`,`dtype='|S20'`, `skip_header=1`, `delimiter=","`
* Print the first row of the dataset (index `0`)

In [13]:
weather = np.genfromtxt("../data/weather_small_2012.csv", dtype='|S20', skip_header=1, delimiter=",")

print(weather.dtype)
print(weather[0])

|S20
[b'2012-01-01 00:00:00' b'-1.8' b'-3.9' b'86' b'4' b'8.0' b'101.24']


### Instructions
* Create an array of temperature and convert it into float16 using `.astype(np.float16)` and save it as `temperatures`.
* Create an array of dew point temperature and convert it into float16 using `.astype(np.float16)` and save it as `dew_point_temperatures`.


In [19]:
# Create an array of temperatures from the data set

temperatures = weather[:, 1].astype(np.float16)
print(temperatures)

dew_point_temperatures = weather[:, 2].astype(np.float16)
print(dew_point_temperatures)

[-1.8 -1.8 -1.8 ... -0.5 -0.2  0. ]
[-3.9 -3.7 -3.4 ... -1.5 -1.8 -2.1]


<img src="../images/icon/Concept-Alert.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br />

# Operations with NumPy arrays
***
NumPy provides a lot of built-in functionality for working with arrays.
**The important concepts to remember are**
- Any operation with a scalar number or a scalar function is applied to each element of the array
- Any operation between two **compatible** (e.g. same shape) arrays is computed element by element, pairing items at the same position
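A minimal sketch of both concepts, using small made-up arrays, including what happens when two arrays are not compatible:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Scalar operation: applied to every element
doubled = a * 2

# Same-shape arrays: elements are paired by position
summed = a + b

# Shapes (3,) and (2,) are not compatible, so NumPy raises ValueError
try:
    a + np.array([1, 2])
    failed = False
except ValueError as exc:
    failed = True
    print("incompatible shapes:", exc)

print(doubled)
print(summed)
```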

<img src="../images/icon/Maths-Insight.png" alt="Technical-Stuff" style="width: 100px;float:left; margin-right:15px"/>
<br />

## Arithmetic (1/2)
***
### Vector Arithmetic
- All operations between arrays are **element-wise**
- This means that if you multiply two 2-D arrays with `*`, it will **NOT** perform matrix multiplication
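To make the difference concrete, the sketch below compares `*` with the matrix product operator `@` on two small made-up matrices:

```python
import numpy as np

m1 = np.array([[1, 2],
               [3, 4]])
m2 = np.array([[5, 6],
               [7, 8]])

# Element-wise product: each entry is multiplied by the entry at the same position
elementwise = m1 * m2

# Matrix product: rows of m1 combined with columns of m2
matrix_product = m1 @ m2

print(elementwise)      # [[ 5 12] [21 32]]
print(matrix_product)   # [[19 22] [43 50]]
```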

<img src="../images/icon/Maths-Insight.png" alt="Technical-Stuff" style="width: 100px;float:left; margin-right:15px"/>
<br />

## Arithmetic (2/2)
***
### Scalar Arithmetic
- Any operation of an array with a scalar will result in **element-wise** computation of that operation
- For example **`my_array + 2`** is the same as adding 2 to each element of array

### Instructions
* Calculate the temperatures from the weather dataset in Fahrenheit

In [20]:
farenheit = (temperatures * 9 / 5) + 32
farenheit

array([28.77, 28.77, 28.77, ..., 31.1 , 31.64, 32.  ], dtype=float16)

### Addition

### Instructions
* Print Vector Addition of `temperatures` and `dew_point_temperatures`
* Print Scalar Addition of `temperatures` + 100

In [21]:
# Total temperature

# Vector Addition
print(temperatures + dew_point_temperatures)

# Scalar Addition
print(temperatures + 100)

[-5.7 -5.5 -5.2 ... -2.  -2.  -2.1]
[ 98.2  98.2  98.2 ...  99.5  99.8 100. ]


### Division
### Instructions
* Create an array from `1` to `9` using `np.arange(1, 10)` with the parameter `dtype=np.float16`, reshape it into a 3 x 3 matrix, and save it as `array1`
* Create an array from `100` to `108` using `np.arange(100, 109)` with the parameter `dtype=np.float16`, reshape it into a 3 x 3 matrix, and save it as `array2`
* Perform Vector Division by dividing `array2` by `array1`
* Perform Scalar Division by dividing `array2` by `3`

In [22]:
array1 = np.arange(1, 10, dtype=np.float16).reshape(3, 3)
array2 = np.arange(100, 109, dtype=np.float16).reshape(3, 3)

print(array1)
print(array2)

print(array2 / array1)  # Vector Division
print(array2 / 3)    # Scalar Division

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[100. 101. 102.]
 [103. 104. 105.]
 [106. 107. 108.]]
[[100.     50.5    34.   ]
 [ 25.75   20.8    17.5  ]
 [ 15.14   13.375  12.   ]]
[[33.34 33.66 34.  ]
 [34.34 34.66 35.  ]
 [35.34 35.66 36.  ]]


## Comparison

Comparison operators such as `>`, `<`, and `==` work element-wise on NumPy arrays and return a boolean array of the same shape.
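As a small sketch with invented temperature values, the boolean array produced by a comparison can also be used as a mask to select or count the matching elements:

```python
import numpy as np

# Illustrative values, not taken from the weather dataset
temps = np.array([-1.8, 0.0, 2.5, 7.1])

# Element-wise comparison gives one boolean per element
mask = temps > 0

# Indexing with the mask keeps only the elements where it is True
above_zero = temps[mask]

# True counts as 1, so summing the mask counts the matches
count = mask.sum()

print(mask)
print(above_zero)
print(count)
```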

### Instructions
* Build a boolean array marking the `temperatures` that are above `0` degrees Celsius and save it as `greater_than_0`
* Print `temperatures` and `greater_than_0`
* Print type of `greater_than_0` using `type()`
* Print type of `greater_than_0` using `.dtype`

In [23]:
# Find those temperatures that are above 0 degrees Celsius

greater_than_0 = temperatures > 0

print(temperatures)
print(greater_than_0)

print(type(greater_than_0))
print(greater_than_0.dtype)

[-1.8 -1.8 -1.8 ... -0.5 -0.2  0. ]
[False False False ... False False False]
<class 'numpy.ndarray'>
bool


### Instructions
* Create 3 X 3 matrix using `1` to `9` numbers and save it as `arr`
* Check the condition for `2` or `5` in `arr` and save it as `two_or_five` and print it.

In [29]:
# multiple conditions
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr)
two_or_five = (arr == 2) | (arr == 5)
print(two_or_five)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[False  True False]
 [False  True False]
 [False False False]]


<img src="../images/icon/Technical-Stuff.png" alt="Concept-Alert" style="width: 100px;float:left; margin-right:15px"/>
<br />

## Aggregation 
***
* **`sum()`:** Computes the sum of all the elements in a vector, or the sum along a dimension in a matrix.
* **`mean()`:** Computes the average of all the elements in a vector, or the average along a dimension in a matrix.
* **`max()`/`min()`:** Identifies the maximum/minimum value among all the elements in a vector, or along a dimension in a matrix.
* **`argmax()`/`argmin()`:** Returns the index of maximum/minimum element.
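The phrase "along a dimension" corresponds to the `axis` argument. The sketch below shows the idea on a small made-up matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()               # all elements
column_sums = m.sum(axis=0)   # collapse the rows: one sum per column
row_sums = m.sum(axis=1)      # collapse the columns: one sum per row
row_means = m.mean(axis=1)    # one average per row
column_max = m.max(axis=0)    # one maximum per column

# Without an axis, argmax returns the index into the flattened array
flat_index = m.argmax()

print(total, column_sums, row_sums)
print(row_means, column_max, flat_index)
```

For a one-dimensional array such as `temperatures`, there is only one axis, so the plain calls without `axis` are all that is needed.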

### Instructions
* Find Max temperature using `.max()` on `temperatures`
* Find Min temperature using `.min()` on `temperatures`
* Find Mean temperature using `.mean()` on `temperatures`
* Find index of Max temperature using `.argmax()` on `temperatures`
* Find index of Min temperature using `.argmin()` on `temperatures`

In [58]:
# Find max, min, mean temperature
print('Max: ', temperatures.max())
print('Min: ', temperatures.min())
print('Mean: ', temperatures.mean())

# Find index of max/min temperature
print('Argmax: ', temperatures.argmax())
print('Argmin: ', temperatures.argmin())

('Max: ', 33.0)
('Min: ', -23.297)
('Mean: ', 8.7969)
('Argmax: ', 4143)
('Argmin: ', 344)


## Quiz
1. Which of these best describes an array?

- a) A data structure that shows a hierarchical behavior
- b) Container of objects of similar types
- c) Container of objects of mixed types
- d) All of the mentioned

Answer: b

Explanation: An array contains elements of the same type only.

2. What gets printed?

```python
import numpy as np
a = np.array([1,2,3,5,8])
print(a.ndim)
```

- a) 0
- b) 1
- c) 2
- d) 3
- e) 5

Answer: b

Explanation: `ndim` is the number of dimensions in the array, not the number of elements.

3. What is the output of the following code snippet?

```python
import numpy as np
a = np.arange(12).reshape(3,4)
print(a[2,1])
```

Answer: 9

# Further Reading
***
- Python Official Documentation: https://docs.python.org/
- NumPy documentation: http://www.numpy.org/

# Thank You
***
### Coming up next...

- Data Wrangling with Pandas

For further queries, reach out to academics@greyatom.com.