**NumPy** is a Python library used for working with arrays.



**Installation of NumPy** If Python and pip are already installed on your system, install NumPy using this command:
```
C:\Users\Your Name>pip install numpy
```

**Import NumPy** Once NumPy is installed, import it in your applications using the `import` keyword:

In [1]:
import numpy as np

In [2]:
# Now numpy is imported and ready to use.
arr = np.array([1,2,3,4,5])
print(arr)

[1 2 3 4 5]


### Introducing NumPy Arrays

In [3]:
# Simple array creation
a = np.array([0,1,2,3])
a

array([0, 1, 2, 3])

In [4]:
# Checking the type
type(a)

numpy.ndarray

In [5]:
# Data type of the elements
a.dtype

dtype('int32')

In [6]:
# Number of dimensions
a.ndim

1

In [7]:
# Array shape
a.shape

(4,)

In [8]:
# Bytes per element
a.itemsize

4

In [9]:
# Bytes of memory used
a.nbytes

16
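The attributes above are related: `nbytes` is the element count (`size`) times `itemsize`. The sketch below sets the dtype explicitly, which is an assumption made here so the numbers do not depend on the platform's default integer type (the `int32` shown above is typical on Windows, while other systems often default to `int64`).

```python
import numpy as np

# Request specific dtypes so the byte counts are platform-independent
a32 = np.array([0, 1, 2, 3], dtype=np.int32)
a64 = np.array([0, 1, 2, 3], dtype=np.int64)

for arr in (a32, a64):
    # nbytes is the element count times the bytes per element
    assert arr.nbytes == arr.size * arr.itemsize
    print(arr.dtype, arr.itemsize, arr.nbytes)
```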

### Benefits of using Numpy arrays

There are a couple of important benefits of using Numpy arrays instead of Python lists for operating on numerical data:

- **Ease of use**: You can write small, concise and intuitive mathematical expressions like `(kanto * weights).sum()` rather than using loops & custom functions like `crop_yield`.
- **Performance**: Numpy operations and functions are implemented internally in C, which makes them much faster than Python statements & loops, which are interpreted at runtime.
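To make the ease-of-use point concrete, here is a minimal sketch comparing a loop-based weighted sum with the equivalent single Numpy expression. The `region` and `weights` values and the `weighted_sum_loop` helper are hypothetical, chosen only for illustration.

```python
import numpy as np

def weighted_sum_loop(values, weights):
    """Weighted sum using a plain Python loop (hypothetical helper)."""
    total = 0
    for v, w in zip(values, weights):
        total += v * w
    return total

# Hypothetical measurements and weights, for illustration only
region = [73, 67, 43]
weights = [0.3, 0.2, 0.5]

loop_result = weighted_sum_loop(region, weights)
# Same computation as a single Numpy expression
numpy_result = (np.array(region) * np.array(weights)).sum()

print(np.isclose(loop_result, numpy_result))  # True
```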

Here's a quick comparison of element-wise addition of two vectors with a million elements each, using Python loops vs. Numpy arrays.

In [10]:
# Python lists
arra1 = list(range(1000000))
arra2 = list(range(1000000))

# Numpy array
arra1_np = np.array(arra1)
arra2_np = np.array(arra2)

In [11]:
%%time
result = []
for x1, x2 in zip(arra1, arra2):
    result.append(x1 + x2)
result

Wall time: 299 ms


[0, 2, 4, 6, 8, 10, ...]

In [12]:
%%time
arra1_np + arra2_np

Wall time: 2 ms


array([      0,       2,       4, ..., 1999994, 1999996, 1999998])
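`%%time` is an IPython/Jupyter cell magic, so it is not available in a plain Python script. As a rough sketch, a similar comparison can be made with the standard-library `timeit` module. Absolute timings depend on the machine, so none are shown here.

```python
import timeit
import numpy as np

n = 1_000_000
list1, list2 = list(range(n)), list(range(n))
arr1, arr2 = np.arange(n), np.arange(n)

# Time 5 runs of each approach; timeit returns the total elapsed seconds
loop_time = timeit.timeit(
    lambda: [x1 + x2 for x1, x2 in zip(list1, list2)], number=5)
numpy_time = timeit.timeit(lambda: arr1 + arr2, number=5)

print(f"Python loop: {loop_time:.3f} s, Numpy: {numpy_time:.3f} s")
```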

### Multi-Dimensional Arrays
<img src="https://python.astrotech.io/_images/array-axis.png" width="500">

**0-D Arrays**, or scalars, are the elements in an `array`. Each value in an array is a `0-D array`.

In [13]:
arr0 = np.array(12)
print(arr0)

12


In [14]:
arr0.ndim

0

In [15]:
arr0.shape

()

In [16]:
arr0.size

1

**1-D Arrays**: An array that has `0-D arrays` as its elements is called a uni-dimensional or `1-D array`.

In [17]:
arr1 = np.array([1,2,3,4,5,6,7,8,9,0])
print(arr1)

[1 2 3 4 5 6 7 8 9 0]


In [18]:
arr1.ndim

1

In [19]:
arr1.shape

(10,)

In [20]:
arr1.size

10

In [21]:
arr1 = np.array([12,34,56,78,90])
print(arr1)

[12 34 56 78 90]


In [22]:
arr1.ndim

1

In [23]:
arr1.shape

(5,)

In [24]:
arr1.size

5

**2-D Arrays**: An array that has `1-D arrays` as its elements is called a `2-D array`.

In [25]:
arr2 = np.array([[1,2,3],[4,5,6]])
print(arr2)

[[1 2 3]
 [4 5 6]]


In [26]:
arr2.ndim

2

In [27]:
arr2.shape

(2, 3)

In [28]:
arr2.size

6

**3-D Arrays**: An array that has `2-D arrays` as its elements is called a `3-D array`.

In [29]:
arr3 = np.array([[[1,2],[3,4],[5,6]],[[7,6],[8,7],[9,8]]])
print(arr3)

[[[1 2]
  [3 4]
  [5 6]]

 [[7 6]
  [8 7]
  [9 8]]]


In [30]:
arr3.ndim

3

In [31]:
arr3.shape

(2, 3, 2)

In [32]:
arr3.size

12
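The three attributes used above are tied together: `ndim` is the length of the `shape` tuple, and `size` is the product of its entries. A short sketch that checks this for arrays like the ones defined above:

```python
import numpy as np

arr0 = np.array(12)
arr1 = np.array([12, 34, 56, 78, 90])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
arr3 = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 6], [8, 7], [9, 8]]])

for arr in (arr0, arr1, arr2, arr3):
    # ndim is the length of the shape tuple
    assert arr.ndim == len(arr.shape)
    # size is the product of the shape entries (1 for the empty shape)
    assert arr.size == int(np.prod(arr.shape))
    print(arr.ndim, arr.shape, arr.size)
```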

## Array indexing and slicing

**Indexing**: Numpy extends Python's list indexing notation using `[]` to multiple dimensions in a fairly intuitive fashion. You can provide a comma-separated list of indices or ranges to select a specific element or a subarray (also called a slice) from a Numpy array.

In [33]:
b = np.array([
    [[11,12,13,14],
     [13,14,15,19]],
    
    [[15,16,17,21],
     [63,92,36,18]],
    
    [[98,32,81,23],
     [17,18,19,43]]])
print(b)

[[[11 12 13 14]
  [13 14 15 19]]

 [[15 16 17 21]
  [63 92 36 18]]

 [[98 32 81 23]
  [17 18 19 43]]]


In [34]:
b.shape

(3, 2, 4)

In [35]:
# Single element
b[2, 1, 3]

43

In [36]:
# Subarray using ranges
b[1:, 0:1, :2]

array([[[15, 16]],

       [[98, 32]]])

In [37]:
# Mixing indices and ranges
b[:2, 0, 3]

array([14, 21])

In [38]:
# Mixing indices and ranges
b[1:, 1, 0]

array([63, 17])

In [39]:
# Using fewer indices
b[2]

array([[98, 32, 81, 23],
       [17, 18, 19, 43]])

In [40]:
# Using fewer indices
b[1:, 0]

array([[15, 16, 17, 21],
       [98, 32, 81, 23]])

In [41]:
# Using fewer indices
b[1:, 0, 0]

array([15, 98])

The notation and results can be confusing at first, so take your time to experiment and become comfortable with it. Try out some examples of array indexing and slicing with different combinations of indices and ranges. Here are some more examples demonstrated visually:

<img src="https://scipy-lectures.org/_images/numpy_indexing.png" width="360">
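Two patterns in the indexing examples are worth spelling out. The sketch below reuses the array `b` from above to show that omitted trailing indices act like full slices, and that an integer index removes an axis while a range keeps it.

```python
import numpy as np

b = np.array([
    [[11, 12, 13, 14], [13, 14, 15, 19]],
    [[15, 16, 17, 21], [63, 92, 36, 18]],
    [[98, 32, 81, 23], [17, 18, 19, 43]]])

# Omitted trailing indices behave like full slices
assert np.array_equal(b[2], b[2, :, :])
assert np.array_equal(b[1:, 0], b[1:, 0, :])

# An integer index drops that axis; a range keeps it (with length 1 here)
print(b[1:, 0, :2].shape)    # (2, 2)
print(b[1:, 0:1, :2].shape)  # (2, 1, 2)
```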

## Other ways of creating Numpy arrays

Numpy also provides some handy functions to create arrays of a desired shape with fixed or random values. Check out the [official documentation](https://numpy.org/doc/stable/reference/routines.array-creation.html) or use the `help` function to learn more about the following functions.

In [42]:
# All zeros
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [43]:
# All zeros
np.zeros((3,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [44]:
# All ones
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [45]:
# All ones
np.ones((3,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [46]:
# Identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [47]:
# Identity matrix
np.eye(2)

array([[1., 0.],
       [0., 1.]])

In [48]:
# Random vector
np.random.rand(5)

array([0.17460155, 0.03023898, 0.2398998 , 0.80810786, 0.16939752])

In [49]:
# Random matrix
np.random.randn(2,3)

array([[-0.31689618,  1.48860477,  0.37796509],
       [ 0.35272864, -0.84092933,  0.31583384]])

In [50]:
# Fixed value
np.full([2,3],42)

array([[42, 42, 42],
       [42, 42, 42]])

In [51]:
# Fixed value
np.full([3,3],33)

array([[33, 33, 33],
       [33, 33, 33],
       [33, 33, 33]])

In [52]:
# Range with start, end and step
np.arange(10,90,3)

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
       61, 64, 67, 70, 73, 76, 79, 82, 85, 88])

In [53]:
# Equally spaced numbers in a range: start, end and number of elements
np.linspace(3, 27, 5)

array([ 3.,  9., 15., 21., 27.])
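A few details of these creation functions are easy to miss. The sketch below illustrates them: `np.arange` excludes the end value while `np.linspace` includes it by default, `np.zeros` returns floats unless a `dtype` is given, and a seeded generator from `np.random.default_rng` gives reproducible random values. The seed `42` is an arbitrary choice for illustration.

```python
import numpy as np

# arange excludes the end value; linspace includes it by default
print(np.arange(0, 10, 2))      # [0 2 4 6 8]
print(np.linspace(0, 10, 6))    # [ 0.  2.  4.  6.  8. 10.]

# zeros and ones return floats unless a dtype is requested
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros.dtype.kind)     # i  (an integer type)

# A seeded generator makes random values reproducible between runs
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
assert np.array_equal(rng1.random(5), rng2.random(5))
```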

## Arithmetic operations and broadcasting

Numpy arrays support arithmetic operators like `+`, `-`, `*` etc. You can perform an arithmetic operation with a single number (also called a scalar), or with another array of the same shape. This makes it really easy to write mathematical expressions with multi-dimensional arrays.


In [54]:
c = np.array([[1,2,3,4],
              [5,6,7,8],
              [9,1,2,3]])

In [55]:
d = np.array([[11,12,13,14],
              [15,16,17,18],
              [19,11,12,13]])

In [56]:
# Element-wise addition
c + d

array([[12, 14, 16, 18],
       [20, 22, 24, 26],
       [28, 12, 14, 16]])

In [57]:
# Addition with scalar
c + 3

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12,  4,  5,  6]])

In [58]:
# Element-wise subtraction
d - c

array([[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]])

In [59]:
# Element-wise multiplication
d * c

array([[ 11,  24,  39,  56],
       [ 75,  96, 119, 144],
       [171,  11,  24,  39]])

In [60]:
# Multiplication with scalar
c * 100

array([[100, 200, 300, 400],
       [500, 600, 700, 800],
       [900, 100, 200, 300]])

In [61]:
# Division
d / 2

array([[5.5, 6. , 6.5, 7. ],
       [7.5, 8. , 8.5, 9. ],
       [9.5, 5.5, 6. , 6.5]])

In [62]:
# Floor division, giving integer results
d // 2

array([[5, 6, 6, 7],
       [7, 8, 8, 9],
       [9, 5, 6, 6]], dtype=int32)

In [63]:
d

array([[11, 12, 13, 14],
       [15, 16, 17, 18],
       [19, 11, 12, 13]])

In [64]:
# Modulus with scalar
d % 4

array([[3, 0, 1, 2],
       [3, 0, 1, 2],
       [3, 3, 0, 1]], dtype=int32)

Numpy arrays also support **broadcasting**, which allows arithmetic operations between two arrays having a different number of dimensions, but compatible shapes. Let's look at an example to see how it works.

In [65]:
c

array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 1, 2, 3]])

In [66]:
c.shape

(3, 4)

In [67]:
e = np.array([4,5,6,7])

In [68]:
e.shape

(4,)

In [69]:
# Broadcasting
c + e

array([[ 5,  7,  9, 11],
       [ 9, 11, 13, 15],
       [13,  6,  8, 10]])

When the expression `c + e` is evaluated, `e` (which has the shape `(4,)`) is replicated 3 times to match the shape `(3, 4)` of `c`. This is pretty useful, because Numpy performs the replication without actually creating 3 copies of the smaller array.

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png" width="360">

Broadcasting only works if one of the arrays can be replicated to exactly match the shape of the other array.

In [70]:
f = np.array([1,2])

In [71]:
f.shape

(2,)

In [72]:
# Shapes are incompatible, so this raises an error
c + f

ValueError: operands could not be broadcast together with shapes (3,4) (2,) 
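As a rough sketch of how compatibility can be checked before running an operation, `np.broadcast_shapes` (available in NumPy 1.20 and later) returns the resulting shape or raises a `ValueError`. The column vector `g` and its values are hypothetical; it shows that an array of shape `(3, 1)` is also compatible with `c`.

```python
import numpy as np

c = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 1, 2, 3]])

# Shapes are compared from the last axis backwards: (3, 4) vs (4,) is fine
print(np.broadcast_shapes(c.shape, (4,)))   # (3, 4)

# (3, 4) vs (2,) fails because the trailing lengths 4 and 2 differ
try:
    np.broadcast_shapes(c.shape, (2,))
except ValueError as err:
    print("incompatible:", err)

# A column of shape (3, 1) is stretched across the 4 columns of c
g = np.array([10, 20, 30]).reshape(3, 1)    # hypothetical values
print(c + g)
```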

In [73]:
j = np.array([[1,2,3],[3,4,5]])
k = np.array([[2,2,3],[1,2,5]])

In [74]:
j == k

array([[False,  True,  True],
       [False, False,  True]])

In [75]:
j != k

array([[ True, False, False],
       [ True,  True, False]])

In [76]:
j >= k

array([[False,  True,  True],
       [ True,  True,  True]])

In [77]:
j < k

array([[ True, False, False],
       [False, False, False]])

A common use case for array comparison is to count the number of equal elements in two arrays using the `sum` method. Remember that `True` evaluates to `1` and `False` evaluates to `0` when booleans are used in arithmetic operations.

In [78]:
(j == k).sum()

3
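A few related ways to summarize a comparison, sketched with the same `j` and `k` as above: `np.count_nonzero` counts the `True` entries directly, `mean` gives the fraction of matching positions, and `np.array_equal` answers whether the arrays match everywhere.

```python
import numpy as np

j = np.array([[1, 2, 3], [3, 4, 5]])
k = np.array([[2, 2, 3], [1, 2, 5]])

matches = (j == k)
print(matches.sum())              # 3, number of equal positions
print(np.count_nonzero(matches))  # 3, same count without relying on True == 1
print(matches.mean())             # 0.5, fraction of equal positions
print(np.array_equal(j, k))       # False, since not every element matches
```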

### Working with CSV data files

Numpy also provides helper functions for reading from & writing to files. Let's download a file `climate.txt`, which contains 10,000 climate measurements (temperature, rainfall & humidity) in the following format:
```
temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
84.00,63.00,38.00
66.00,50.00,52.00
41.00,94.00,77.00
91.00,57.00,96.00
49.00,96.00,99.00
67.00,20.00,28.00
...
```

This format of storing data is known as *comma separated values* or CSV. 

> **CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)




To read this file into a numpy array, we can use the `genfromtxt` function.

In [79]:
import urllib.request

urllib.request.urlretrieve(
    'https://hub.jovian.ml/wp-content/uploads/2020/08/climate.csv',
    'climate.txt')

('climate.txt', <http.client.HTTPMessage at 0x1d505503208>)

In [80]:
climate_data = np.genfromtxt(
    'climate.txt', delimiter=',', skip_header = 1)

In [81]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [82]:
climate_data.shape

(10000, 3)
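If the download is not possible, the same call can be tried on a small in-memory sample. This is a minimal sketch that inlines the first few rows shown above and relies on `np.genfromtxt` accepting a file-like object in place of a filename.

```python
import io
import numpy as np

# First rows of the sample shown above, inlined so no download is needed
csv_text = """temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
"""

# genfromtxt accepts a file-like object as well as a filename
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(sample.shape)   # (3, 3)
print(sample[0])      # [25. 76. 99.]
```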

In [83]:
weights = np.array([0.3, 0.2, 0.5])

In [84]:
weights.shape

(3,)

We can now compute the predicted yields of apples in all the regions, using a single matrix multiplication between `climate_data` (a 10000x3 matrix) and `weights` (a vector of length 3). Here's what it looks like visually:

<img src="https://i.imgur.com/LJ2WKSI.png" width="240">

You can learn about matrices and matrix multiplication by watching the first 3-4 videos of this playlist: https://www.youtube.com/watch?v=xyAuNHPsq-g&list=PLFD0EB975BA0CC1E0&index=1

We can use the `np.matmul` function from Numpy, or simply use the `@` operator to perform matrix multiplication.

In [85]:
yields = climate_data @ weights

In [86]:
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [87]:
yields.shape

(10000,)
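To see what the matrix multiplication computes, the sketch below uses only the first three rows of the data shown above. It checks that `@`, `np.matmul`, and an explicit "multiply by the weights, then sum each row" give the same result, which should match the first three values of `yields`.

```python
import numpy as np

sample = np.array([[25., 76., 99.],
                   [39., 65., 70.],
                   [59., 45., 77.]])
weights = np.array([0.3, 0.2, 0.5])

via_operator = sample @ weights
via_function = np.matmul(sample, weights)
# Broadcasting the weights across rows, then summing each row
via_broadcast = (sample * weights).sum(axis=1)

assert np.allclose(via_operator, via_function)
assert np.allclose(via_operator, via_broadcast)
print(np.round(via_operator, 1))
```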

We can now add the `yields` back to `climate_data` as a fourth column using the [`np.concatenate`](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html) function.

In [93]:
climate_results = np.concatenate((
    climate_data, yields.reshape(10000, 1)), axis=1)

In [94]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])

There are a couple of subtleties here:

* We need to provide the `axis` argument to `np.concatenate` to specify the dimension along which concatenation should be performed.

* The arrays being concatenated should have the same number of dimensions, and the same length along each dimension, except the one along which concatenation is being performed. We use the [`np.reshape`](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) function here to change the shape of `yields` from `(10000,)` to `(10000,1)`.

Here's a visual explanation of `np.concatenate` along `axis=1` (can you guess what `axis=0` results in):

<img src="https://www.w3resource.com/w3r_images/python-numpy-image-exercise-58.png" width="300">

The best way to understand what a Numpy function does is to experiment with it and read the documentation using the `help` function to learn about its arguments & return values. Take some time to experiment with `np.concatenate` and `np.reshape`.
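As a starting point for such experiments, here is a small sketch contrasting `axis=1` and `axis=0`. The arrays are hypothetical. Passing `-1` to `reshape` asks Numpy to infer that dimension, which avoids hard-coding a row count such as `10000`.

```python
import numpy as np

data = np.array([[1., 2., 3.],
                 [4., 5., 6.]])          # hypothetical 2x3 data
extra_col = np.array([10., 20.])         # one value per row
extra_row = np.array([7., 8., 9.])       # one value per column

# axis=1 appends columns; reshape(-1, 1) infers the row count
wider = np.concatenate((data, extra_col.reshape(-1, 1)), axis=1)
# axis=0 appends rows; the 1-D row must become shape (1, 3) first
taller = np.concatenate((data, extra_row.reshape(1, -1)), axis=0)

print(wider.shape)    # (2, 4)
print(taller.shape)   # (3, 3)
```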

Let's write the final results from our computation above back to a file using the `np.savetxt` function.

In [95]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])

In [96]:
np.savetxt('climate_results.txt',
           climate_results,
           fmt='%.2f',
           delimiter=',',
           header='temperature,rainfall,humidity,yield_apples',
           comments='')

The results are written back in the CSV format to the file `climate_results.txt`.

```
temperature,rainfall,humidity,yield_apples
25.00,76.00,99.00,72.20
39.00,65.00,70.00,59.70
59.00,45.00,77.00,65.20
84.00,63.00,38.00,56.80
66.00,50.00,52.00,55.80
41.00,94.00,77.00,69.60
91.00,57.00,96.00,86.70
49.00,96.00,99.00,83.40
67.00,20.00,28.00,38.10
...
```
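A quick way to confirm that a saved file is usable is to read it back. The sketch below does a round trip on two sample rows in a temporary directory. The file name is hypothetical, and `delimiter=','` is passed explicitly because `np.savetxt` separates values with a space by default.

```python
import os
import tempfile
import numpy as np

results = np.array([[25., 76., 99., 72.2],
                    [39., 65., 70., 59.7]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'results_sample.txt')  # hypothetical file name
    np.savetxt(path, results, fmt='%.2f', delimiter=',',
               header='temperature,rainfall,humidity,yield_apples',
               comments='')
    with open(path) as f:
        first_line = f.readline().strip()
    # skiprows=1 skips the header line when reading the values back
    loaded = np.loadtxt(path, delimiter=',', skiprows=1)

print(first_line)
print(np.allclose(loaded, results))   # True
```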



Numpy provides hundreds of functions for performing operations on arrays. Here are some common functions:


* Mathematics: `np.sum`, `np.exp`, `np.round`, arithmetic operators
* Array manipulation: `np.reshape`, `np.stack`, `np.concatenate`, `np.split`
* Linear Algebra: `np.matmul`, `np.dot`, `np.transpose`, `np.linalg.eigvals`
* Statistics: `np.mean`, `np.median`, `np.std`, `np.max`
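A brief sketch exercising a few of the functions listed above on a small, hypothetical matrix. Note that the eigenvalue routine is accessed through the `np.linalg` submodule.

```python
import numpy as np

m = np.array([[2., 0.],
              [0., 3.]])                 # hypothetical diagonal matrix

print(np.sum(m))                         # 5.0
print(np.mean(m), np.max(m))             # 1.25 3.0
print(np.transpose(m).shape)             # (2, 2)
# Eigenvalue routines live in the np.linalg submodule
print(np.linalg.eigvals(m))              # eigenvalues 2 and 3
top, bottom = np.split(m, 2)             # two arrays of shape (1, 2)
print(np.stack([top, bottom]).shape)     # (2, 1, 2)
```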

> **How to find the function you need?** Since Numpy offers hundreds of functions for operating on arrays, it can sometimes be hard to find exactly what you need. The easiest way to find the right function is to do a web search, e.g. searching for "How to join numpy arrays" leads to [this tutorial on array concatenation](https://cmdlinetips.com/2018/04/how-to-concatenate-arrays-in-numpy/).

You can find a full list of array functions here: https://numpy.org/doc/stable/reference/routines.html