## NumPy
- NumPy is one of the most important foundational packages for numerical computing and data analysis in Python. Most computational packages providing scientific functionality use NumPy's array objects as the lingua franca for data exchange.

Let's say we want to use climate data like the temperature, rainfall and humidity in a region to determine if the region is well suited for growing apples. A really simple approach for doing this would be to formulate the relationship between the annual yield of apples (tons per hectare) and the climatic conditions like the average temperature (in degrees Fahrenheit), rainfall (in millimeters) & average relative humidity (in percentage) as a linear equation.

$$
\text{yield\_of\_apples} = w_1 \cdot \text{temperature} + w_2 \cdot \text{rainfall} + w_3 \cdot \text{humidity}
$$


- We're expressing the yield of apples as a weighted sum of the temperature, rainfall and humidity. Obviously, this is an approximation, since the actual relation may not necessarily be linear. But a simple linear model like this often works well in practice.
- Based on some statistical analysis of historical data, we might be able to come up with reasonable values for the weights `w1`, `w2` and `w3`. Here's an example set of values:  

In [1]:
w1, w2, w3 = 0.3, 0.2, 0.5

Given some climate data for a region, we can now predict what the yield of apples in the region might look like. Here's some sample data:

| Region | Temp.(F) | Rainfall (mm) | Humidity (%) |
|---|---|---|---|
| Kanto | 73 | 67 | 43 |
| Johto | 91 | 88 | 64 |
| Hoenn | 87 | 134 | 58 |
| Sinnoh | 102 | 43 | 37 |
| Unova | 69 | 96 | 70 |

To begin, we can define some variables to record the climate data for a region.

In [2]:
Kanto = [73, 67, 43]
Johto = [91, 88, 64]
Hoenn = [87, 134, 58]
Sinnoh = [102, 43, 37]
Unova = [69, 96, 70]

In [3]:
weights = [w1, w2, w3]

The built-in `zip` function pairs up the items of two sequences:

```python
list_one = [23, 24, 25, 26]
list_two = ["D", "E", "E", "P"]
for x, y in zip(list_one, list_two):
    print(x, y, end=" ")          # 23 D 24 E 25 E 26 P 
```
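One detail worth keeping in mind is that `zip` stops as soon as the shorter input is exhausted, so lists of different lengths are truncated without any error. A minimal sketch of that behaviour follows; the lists `temps` and `labels` are made up for illustration and are not part of the climate data.

```python
# Hypothetical lists of unequal length, used only for illustration.
temps = [73, 67, 43]
labels = ["temperature", "rainfall"]

# zip pairs items by position and stops at the shorter input,
# so the third value (43) is dropped without raising an error.
pairs = list(zip(labels, temps))
print(pairs)  # [('temperature', 73), ('rainfall', 67)]
```

This matters for the weighted sum later on: if a region list and the weights list had different lengths, the extra values would simply be ignored.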

### Now we can write a crop_yield function to calculate the yield of apples.

In [5]:
for x,  y in zip(Kanto, weights):
    print(x, y)

73 0.3
67 0.2
43 0.5


In [6]:
def crop_yield(region, weights):
    result = 0
    for x, y in zip(region, weights):
        result += x*y
    return result

In [7]:
crop_yield(Kanto, weights)

56.8

In [8]:
crop_yield(Johto, weights)

76.9

In [9]:
crop_yield(Unova, weights)

74.9
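The same weighted sum can also be written more compactly with `sum` and a generator expression. The sketch below is only an alternative spelling of the loop above; the name `crop_yield_sum` is hypothetical, and the values are copied from the Kanto row and the weights defined earlier.

```python
# Values copied from the cells above.
kanto = [73, 67, 43]
weights = [0.3, 0.2, 0.5]

def crop_yield_sum(region, weights):
    """Weighted sum of the climate values, written as a single expression."""
    return sum(x * w for x, w in zip(region, weights))

kanto_yield = crop_yield_sum(kanto, weights)

# Rounding hides tiny floating-point differences in the last digits.
print(round(kanto_yield, 2))  # 56.8
```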

### The calculation performed by the crop_yield function is also called the dot product of two vectors.
The NumPy library provides a built-in function to compute the dot product of two vectors. The lists are usually converted to NumPy arrays first, although `np.dot` also accepts plain lists and converts them internally.
- First: install NumPy
```
!pip install numpy
```
- Second: import numpy as np

In [10]:
!pip install numpy



In [11]:
pip show numpy

Note: you may need to restart the kernel to use updated packages.
Name: numpy

Version: 2.2.3
Summary: Fundamental package for array computing in Python
Home-page: https://numpy.org
Author: Travis E. Oliphant et al.
Author-email: 
License: Copyright (c) 2005-2024, NumPy Developers.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are
 met:

     * Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.

     * Redistributions in binary form must reproduce the above
        copyright notice, this list of conditions and the following
        disclaimer in the documentation and/or other materials provided
        with the distribution.

     * Neither the name of the NumPy Developers nor the names of any
        contributors may be used to endorse or promote products derived
        from this software with

In [12]:
import numpy as np

NumPy arrays can now be created using the `np.array` function.

In [13]:
kanto = np.array([73, 67, 43])

In [14]:
kanto

array([73, 67, 43])

In [15]:
type(kanto)

numpy.ndarray

- NumPy arrays also support indexing.

In [16]:
kanto[1]

np.int64(67)

In [17]:
print(kanto[1])

67


- The built-in `help` function shows the documentation of any function, for example `help(np.dot)`:

In [18]:
help(np.dot)

Help on _ArrayFunctionDispatcher in module numpy:

dot(...)
    dot(a, b, out=None)

    Dot product of two arrays. Specifically,

    - If both `a` and `b` are 1-D arrays, it is inner product of vectors
      (without complex conjugation).

    - If both `a` and `b` are 2-D arrays, it is matrix multiplication,
      but using :func:`matmul` or ``a @ b`` is preferred.

    - If either `a` or `b` is 0-D (scalar), it is equivalent to
      :func:`multiply` and using ``numpy.multiply(a, b)`` or ``a * b`` is
      preferred.

    - If `a` is an N-D array and `b` is a 1-D array, it is a sum product over
      the last axis of `a` and `b`.

    - If `a` is an N-D array and `b` is an M-D array (where ``M>=2``), it is a
      sum product over the last axis of `a` and the second-to-last axis of
      `b`::

        dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

    It uses an optimized BLAS library when possible (see `numpy.linalg`).

    Parameters
    ----------
    a : array_like
        Fir

### Dot Product
- np.dot()

In [19]:
a = [1, 2, 3]
b = [3, 2, 1]
print(np.dot(a, b))

# Convert the lists to arrays so that `*` multiplies element-wise
a = np.array(a)
b = np.array(b)

10


In [20]:
a * b

array([3, 4, 3])

In [21]:
# sum
(a * b).sum()

np.int64(10)

- The `*` operator performs an element-wise multiplication of two arrays (assuming they have the same size).

In [22]:
np.dot(Kanto, weights)

np.float64(56.8)

In [23]:
print(np.dot(Kanto, weights))

56.8


#### Creating arrays

Using `np.fromiter()`:

In [24]:
iterable = (a for a in range(8))
print(np.fromiter(iterable, int))

[0 1 2 3 4 5 6 7]


In [25]:
# One D arrays
np.arange(1, 10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [26]:
print(np.arange(1, 5))

[1 2 3 4]


In [27]:
print(np.linspace(1, 10, 5))

[ 1.    3.25  5.5   7.75 10.  ]
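`np.arange` and `np.linspace` both produce evenly spaced values, but they are parameterised differently: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default. A small sketch of the difference, with arbitrarily chosen bounds:

```python
import numpy as np

# arange: fixed step of 2.5, stop value (10) is excluded.
stepped = np.arange(0, 10, 2.5)
print(stepped.tolist())  # [0.0, 2.5, 5.0, 7.5]

# linspace: exactly 5 points, stop value (10) is included by default.
spaced = np.linspace(0, 10, 5)
print(spaced.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
```

When the number of points matters more than the exact step, `linspace` is usually the easier choice.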


### Benefits of using NumPy
- Ease of use
- Performance: NumPy operations and functions are implemented internally in compiled code (mostly C), which makes them much faster than equivalent Python loops.

Let's see an example

In [28]:
# Example:
arr1 = list(range(100000))
arr2 = list(range(100000, 200000))

# numpy arrays
arr1_np = np.array(arr1)
arr2_np = np.array(arr2)

`%%time` is a Jupyter Notebook magic command that measures the execution time of a single code cell.

In [29]:
%%time
result = 0
for x, y in zip(arr1, arr2):
    result += x * y
result

CPU times: total: 15.6 ms
Wall time: 26 ms


833323333350000

In [30]:
%%time
np.dot(arr1_np, arr2_np)

CPU times: total: 0 ns
Wall time: 1.01 ms


np.int64(833323333350000)
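Outside a notebook, where `%%time` is not available, a similar comparison can be sketched with the standard-library `timeit` module. The helper names `loop_dot` and `numpy_dot` are hypothetical, and the actual timings will vary from machine to machine, so no timing output is shown.

```python
import timeit
import numpy as np

n = 100_000
list_a, list_b = list(range(n)), list(range(n, 2 * n))
# int64 is requested explicitly so the large sum cannot overflow.
arr_a = np.array(list_a, dtype=np.int64)
arr_b = np.array(list_b, dtype=np.int64)

def loop_dot():
    """Dot product with a plain Python loop."""
    total = 0
    for x, y in zip(list_a, list_b):
        total += x * y
    return total

def numpy_dot():
    """Dot product delegated to NumPy."""
    return int(np.dot(arr_a, arr_b))

# Both approaches must agree before their timings are worth comparing.
assert loop_dot() == numpy_dot()

runs = 20
loop_time = timeit.timeit(loop_dot, number=runs) / runs
numpy_time = timeit.timeit(numpy_dot, number=runs) / runs
print(f"loop: {loop_time * 1e3:.3f} ms, numpy: {numpy_time * 1e3:.3f} ms")
```

Averaging over several runs gives a steadier number than timing a single execution.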

### Multi-dimensional numpy arrays
- Climate data for all the regions

In [31]:
# 2d array
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
climate_data

array([[ 73,  67,  43],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

In [32]:
# shape
climate_data.shape

(5, 3)

In [33]:
# 3d array: [[[], []], [[], []]]
arr3 = np.array([[[1, 2, 3], [4, 5, 6]],[[7, 8, 9], [10, 11, 12]]])
arr3

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [34]:
# shape
arr3.shape

(2, 2, 3)

In [35]:
arr4 = np.array([1, 2, 3])
arr4.shape

(3,)

In [36]:
# Data type
arr4.dtype

dtype('int64')
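All elements of a NumPy array share a single data type, which NumPy infers from the values unless told otherwise. The sketch below illustrates this with made-up arrays; the exact integer width (for example `int64`) can depend on the platform and NumPy version, so only the kind of type is checked for the integer case.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])        # one float makes the whole array float
as_float = ints.astype(np.float64)   # explicit conversion returns a new array

print(ints.dtype.kind)   # i  (an integer type; the width may vary by platform)
print(mixed.dtype)       # float64
print(as_float.dtype)    # float64
```

Note that `astype` leaves the original array untouched; `ints` is still an integer array afterwards.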

### Now we can compute the predicted yield of apples in all regions, using a single matrix multiplication between climate_data (a 5x3 matrix) and weights (a vector of length 3)

- We can use the `np.matmul` function from NumPy, or simply use the `@` operator, to perform matrix multiplication.

In [37]:
# matrix multiplication
data = np.matmul(climate_data, weights)

In [38]:
for i in data:
    print("Data:", i)

Data: 56.8
Data: 76.9
Data: 81.9
Data: 57.699999999999996
Data: 74.9


In [39]:
climate_data @ weights

array([56.8, 76.9, 81.9, 57.7, 74.9])
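To make the connection with the earlier dot product explicit, the following sketch checks that multiplying the 5x3 matrix by the weight vector gives the same numbers as taking the dot product of each row with the weights. The data is copied from the cells above; `via_rows` is a name introduced here for illustration.

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

via_matmul = np.matmul(climate_data, weights)
via_operator = climate_data @ weights
# One dot product per region (row), collected into an array.
via_rows = np.array([np.dot(row, weights) for row in climate_data])

print(via_operator.shape)                  # (5,)
print(np.allclose(via_matmul, via_rows))   # True
```

`np.allclose` is used instead of `==` because floating-point results can differ in the last digits, as the `57.699999999999996` above shows.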

## Working with CSV (Comma-Separated Values) files:
- NumPy also provides helper functions for reading from and writing to files.

In [40]:
# import urllib.request
# url = "https://raw.githubusercontent.com/the-stranger-web/jovian_Data_Analyst/refs/heads/main/italy-covid-daywise.csv"
# urllib.request.urlretrieve(url, 'data.txt')

- We have a climate.txt file.
- To read this into numpy array, we can use `genfromtxt` function

In [42]:
climate_data1 = np.genfromtxt('./data/climate.txt', delimiter=',', skip_header=1)

In [43]:
climate_data1

array([[44.97, 25.79, 60.28],
       [89.98, 68.89, 46.56],
       [86.53, 30.24, 74.74],
       ...,
       [35.78, 78.78, 40.17],
       [40.24, 44.66, 75.39],
       [58.95, 71.33, 71.11]], shape=(10000, 3))
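Since `climate.txt` itself is not reproduced here, the behaviour of `genfromtxt` can be tried on a small in-memory stand-in. The sketch below assumes a comma-separated file with one header row; the contents of `csv_text` are invented for illustration, including one deliberately empty field.

```python
import io
import numpy as np

# Hypothetical file contents; not the real climate.txt.
csv_text = """temperature,rainfall,humidity
25.0,76.0,99.0
39.0,65.0,70.0
59.0,,77.0
"""

# genfromtxt accepts file-like objects as well as file paths.
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(sample.shape)            # (3, 3)
print(sample.dtype)            # float64
print(np.isnan(sample[2, 1]))  # True: the empty field was read as nan
```

The values are read as floats by default, and missing entries become `nan` rather than causing an error.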

#### Now we can use the matrix multiplication operator

In [44]:
weights = np.array([0.3, 0.2, 0.5])

In [45]:
yields = climate_data1 @ weights

In [46]:
yields

array([48.789, 64.052, 69.377, ..., 46.575, 58.699, 67.506],
      shape=(10000,))

#### Now we can add the yields to climate_data1 as a new column
- Using the `np.concatenate` function

In [47]:
climate_result = np.concatenate((climate_data1, yields.reshape(10000, 1)), axis=1)

In [48]:
climate_result

array([[44.97 , 25.79 , 60.28 , 48.789],
       [89.98 , 68.89 , 46.56 , 64.052],
       [86.53 , 30.24 , 74.74 , 69.377],
       ...,
       [35.78 , 78.78 , 40.17 , 46.575],
       [40.24 , 44.66 , 75.39 , 58.699],
       [58.95 , 71.33 , 71.11 , 67.506]], shape=(10000, 4))
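The row count does not have to be hard-coded when reshaping. As a sketch, `reshape(-1, 1)` lets NumPy infer the number of rows, and `np.column_stack` gives the same result without an explicit reshape. The small `data` array is a made-up stand-in for `climate_data1`.

```python
import numpy as np

# Small stand-in for climate_data1 (hypothetical values).
data = np.array([[25.0, 76.0, 99.0],
                 [39.0, 65.0, 70.0]])
weights = np.array([0.3, 0.2, 0.5])
yields = data @ weights

# -1 tells NumPy to infer the number of rows from the array's size.
with_reshape = np.concatenate((data, yields.reshape(-1, 1)), axis=1)
# column_stack treats the 1-D array as a column automatically.
with_stack = np.column_stack((data, yields))

print(with_reshape.shape)                        # (2, 4)
print(np.array_equal(with_reshape, with_stack))  # True
```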

#### Save to a text file
- Using `np.savetxt`

In [49]:
np.savetxt('climate_result',
           climate_result,
           fmt='%.2f',
           delimiter=',',  # match the comma-separated header (default is a space)
           header='temperature,rainfall,humidity,yield_apples',
           comments="")
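One way to check a saved file is to read it back and compare it with the original array. The sketch below does this in a temporary directory so that no files are left behind; it assumes comma-delimited output is wanted, and the `table` values and the file name are placeholders.

```python
import os
import tempfile
import numpy as np

# Hypothetical table: three climate columns plus a yield column.
table = np.array([[25.0, 76.0, 99.0, 72.2],
                  [39.0, 65.0, 70.0, 59.7]])
header = "temperature,rainfall,humidity,yield_apples"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "climate_result.txt")
    # comments="" stops savetxt from prefixing the header line with "# ".
    np.savetxt(path, table, fmt="%.2f", delimiter=",",
               header=header, comments="")
    with open(path) as f:
        first_line = f.readline().strip()
    reloaded = np.genfromtxt(path, delimiter=",", skip_header=1)

print(first_line)                    # temperature,rainfall,humidity,yield_apples
print(np.allclose(reloaded, table))  # True
```

Because `fmt='%.2f'` keeps only two decimals, the round trip is exact only for values that already have at most two decimal places.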

## Arithmetic Operators: +, -, *, /

In [50]:
# First 3x3 array
arr1 = np.array([[1, 2, 3], 
                 [4, 5, 6], 
                 [7, 8, 9]])

# Second 3x3 array
arr2 = np.array([[9, 8, 7], 
                 [6, 5, 4], 
                 [3, 2, 1]])

In [51]:
# adding a scalar
arr1 + 3

array([[ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [52]:
# Adding two arrays
arr1 + arr2

array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])

In [53]:
# Subtraction by scalar
arr2 - 2

array([[ 7,  6,  5],
       [ 4,  3,  2],
       [ 1,  0, -1]])

In [54]:
# Subtraction of two arrays
arr2 - arr1

array([[ 8,  6,  4],
       [ 2,  0, -2],
       [-4, -6, -8]])

In [55]:
# divide by scalar
arr1/2

array([[0.5, 1. , 1.5],
       [2. , 2.5, 3. ],
       [3.5, 4. , 4.5]])

In [56]:
arr2 / arr1

array([[9.        , 4.        , 2.33333333],
       [1.5       , 1.        , 0.66666667],
       [0.42857143, 0.25      , 0.11111111]])

In [57]:
# Element-wise multiplication
arr1 * arr2

array([[ 9, 16, 21],
       [24, 25, 24],
       [21, 16,  9]])

In [58]:
# Modulus with scalar
arr2 % 3

array([[0, 2, 1],
       [0, 2, 1],
       [0, 2, 1]])

#### NumPy supports broadcasting, which allows arithmetic operations between two arrays that have a different number of dimensions but compatible shapes

In [59]:
arr1

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [60]:
arr3 = np.array([2, 4, 6])

In [61]:
arr1 + arr3

array([[ 3,  6,  9],
       [ 6,  9, 12],
       [ 9, 12, 15]])

Here `arr3` is treated as if it were repeated along the rows, i.e. as `[[2, 4, 6], [2, 4, 6], [2, 4, 6]]`, so that it matches the shape of `arr1`.

---

Broadcasting only works if one of the arrays can be replicated to exactly match the shape of the other array.

In [62]:
arr4 = np.array([1, 5])

In [63]:
# arr1 + arr4
# This gives an error: operands could not be broadcast together with shapes (3,3) (2,)
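The direction of broadcasting depends on the shape of the smaller array. As a sketch, a shape `(3,)` array is added to every row, while the same values reshaped to `(3, 1)` are added to every column; an incompatible shape raises a `ValueError`. The names `row`, `col` and `broadcast_ok` are introduced here for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
row = np.array([2, 4, 6])   # shape (3,): stretched down the rows
col = row.reshape(3, 1)     # shape (3, 1): stretched across the columns

print((arr1 + row)[0])  # [3 6 9]  -> 1+2, 2+4, 3+6
print((arr1 + col)[0])  # [3 4 5]  -> 1+2, 2+2, 3+2

# A shape (2,) array cannot be stretched to 3 columns, so this fails.
try:
    arr1 + np.array([1, 5])
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```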

### NumPy also supports comparison operators: ==, !=, >, < etc.

In [64]:
arr1 == arr2

array([[False, False, False],
       [False,  True, False],
       [False, False, False]])

In [65]:
arr1 >= arr2

array([[False, False, False],
       [False,  True,  True],
       [ True,  True,  True]])

In [66]:
arr1 != arr2

array([[ True,  True,  True],
       [ True, False,  True],
       [ True,  True,  True]])

In [67]:
(arr1 < arr2).dtype
# The result is a Boolean array

dtype('bool')

In [68]:
(arr1 > arr2).sum()

np.int64(4)
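A Boolean array produced by a comparison can be used for more than counting. The sketch below shows two common follow-ups, assuming the same `arr1` and `arr2` as above: `np.count_nonzero` as an alternative way to count the `True` entries, and Boolean indexing to pull out the matching elements. The names `mask`, `count` and `selected` are illustrative.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

mask = arr1 > arr2               # Boolean array, same shape as the inputs
count = np.count_nonzero(mask)   # number of True entries
selected = arr1[mask]            # elements of arr1 where the mask is True

print(count)     # 4
print(selected)  # [6 7 8 9]
```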

#### **Step 1: Element-wise Comparison (`arr1 > arr2`)**

This creates a **Boolean array** where each element is `True (1)` if `arr1[i, j] > arr2[i, j]`, otherwise `False (0)`.

| **arr1** | **arr2** | **arr1 > arr2** |
|----------|----------|----------------|
| 1        | 9        | **False (0)**   |
| 2        | 8        | **False (0)**   |
| 3        | 7        | **False (0)**   |
| 4        | 6        | **False (0)**   |
| 5        | 5        | **False (0)**   |
| 6        | 4        | **True (1)**    |
| 7        | 3        | **True (1)**    |
| 8        | 2        | **True (1)**    |
| 9        | 1        | **True (1)**    |

The resulting Boolean array:

```python
[[False False False]
 [False False  True]
 [ True  True  True]]
```
#### **Step 2: Summing the `True` Values**
- `True` is equivalent to `1`, and `False` is `0`.

The total count of `True (1)` values is:

```python
0 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 1 = 4
```



#### Array Indexing and Slicing

In [69]:
arr = np.array([
    [[1, 2, 3, 4], [5, 6, 7, 8]],  
    [[9, 10, 11, 12], [13, 14, 15, 16]],  
    [[17, 18, 19, 20], [21, 22, 23, 24]]  
])

In [70]:
arr.shape

(3, 2, 4)

In [71]:
# Single element
arr[1, 1, 2]

np.int64(15)

In [72]:
# Working:
print("first: ",arr[1])
print("second: ",arr[1, 1])
print("third: ",arr[1, 1, 2])

first:  [[ 9 10 11 12]
 [13 14 15 16]]
second:  [13 14 15 16]
third:  15


In [73]:
arr

array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8]],

       [[ 9, 10, 11, 12],
        [13, 14, 15, 16]],

       [[17, 18, 19, 20],
        [21, 22, 23, 24]]])

In [74]:
# Sub-array
arr[1:, 0:1, 2:]

array([[[11, 12]],

       [[19, 20]]])

In [75]:
arr[1:]

array([[[ 9, 10, 11, 12],
        [13, 14, 15, 16]],

       [[17, 18, 19, 20],
        [21, 22, 23, 24]]])

In [76]:
arr[1:, 0:1]

array([[[ 9, 10, 11, 12]],

       [[17, 18, 19, 20]]])

In [77]:
arr

array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8]],

       [[ 9, 10, 11, 12],
        [13, 14, 15, 16]],

       [[17, 18, 19, 20],
        [21, 22, 23, 24]]])

In [78]:
# Mixing indices and ranges
arr[1:, 1, 3]

array([16, 24])

In [79]:
arr[1: , 1, :3]

array([[13, 14, 15],
       [21, 22, 23]])

Note: a range (slice) preserves the dimension it is applied to, whereas a single integer index removes it.
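The effect on the number of dimensions is easiest to see by comparing shapes. In this sketch the array is rebuilt with `np.arange(1, 25).reshape(3, 2, 4)`, which holds the same values as `arr` above.

```python
import numpy as np

# Same values as the 3x2x4 array used in the cells above.
arr = np.arange(1, 25).reshape(3, 2, 4)

print(arr[1:, 0:1, 2:].shape)  # (2, 1, 2): three ranges keep all three dimensions
print(arr[1:, 0, 2:].shape)    # (2, 2):    one integer index drops one dimension
print(arr[1:, 1, 3].shape)     # (2,):      two integer indices drop two dimensions
print(arr[1, 1, 2])            # 15:        three indices give a single element
```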

### Other ways of creating NumPy arrays

#### Create a NumPy array using `numpy.arange()`
```python
import numpy as np

print(np.arange(1, 10))
```

#### Create a NumPy array using `numpy.linspace()`
```python
print(np.linspace(1, 10, 3))
```

#### Create a NumPy array using `numpy.zeros()`
```python
print(np.zeros(5, dtype=int))
```

#### Create a NumPy array using `numpy.ones()`
```python
print(np.ones(5, dtype=int))
```

#### Create a NumPy array using `numpy.random.rand()`
```python
print(np.random.rand(5))
```

#### Create a NumPy array using `numpy.random.randint()`
```python
print(np.random.randint(5, size=10))
```


In [80]:
# All zeros
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [81]:
# All ones
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [82]:
# Identity Matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [83]:
# Random vector
np.random.rand(3)

array([0.39541664, 0.90540893, 0.88909371])

In [84]:
# Random matrix
np.random.randn(2,3)

array([[ 1.20938841,  0.67628984,  0.8305718 ],
       [-0.57515529,  0.77443766,  0.04797616]])
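The two random functions above draw from different distributions: `np.random.rand` samples uniformly from [0, 1), while `np.random.randn` samples from the standard normal distribution, which is why negative values can appear. Because the numbers change on every run, a seeded generator is one way to get reproducible results. The sketch below uses `np.random.default_rng`; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed, chosen for illustration
uniform = rng.random(3)                # uniform samples in [0, 1)
normal = rng.standard_normal((2, 3))   # standard normal samples, shape (2, 3)

# Re-creating the generator with the same seed reproduces the same numbers.
again = np.random.default_rng(seed=42).random(3)

print(uniform.shape, normal.shape)      # (3,) (2, 3)
print(np.array_equal(uniform, again))   # True
```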

In [85]:
# Fixed values
np.full([2, 3], 44)

array([[44, 44, 44],
       [44, 44, 44]])

In [86]:
# Range with start, end and step
np.arange(10, 90, 3)

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
       61, 64, 67, 70, 73, 76, 79, 82, 85, 88])

In [87]:
# Equally spaced numbers in range
np.linspace(3, 27, 7)

array([ 3.,  7., 11., 15., 19., 23., 27.])