### Suppose we want to use climate data such as temperature, rainfall, and humidity to determine whether a region is well suited for growing apples. A simple approach is to model the annual yield of apples (tons per hectare) as a linear function of the climatic conditions: average temperature (in degrees Fahrenheit), rainfall (in millimeters), and average relative humidity (as a percentage).

## yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity

#### Based on some statistical analysis of historical data, we might come up with reasonable values for the weights w1, w2, and w3:

In [1]:
w1, w2, w3 = 0.3, 0.2, 0.5

In [2]:
# Represent each region's climate data (temperature, rainfall, humidity) as a list, i.e. a 1D vector
kanto = [73, 67, 43]
johto = [91, 88, 64]
hoenn = [87, 134, 58]
sinnoh = [102, 43, 37]
unova = [69, 96, 70]

In [3]:
weights = [w1, w2, w3]

In [4]:
# Dot product: the sum of the element-by-element products of two vectors
# zip pairs up the corresponding elements of the two lists
def crop_yield(region, weights):
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

In [5]:
crop_yield(kanto, weights)

56.8

In [6]:
crop_yield(johto, weights)

76.9

In [7]:
crop_yield(sinnoh, weights)

57.699999999999996

### The calculation performed by the crop_yield function (element-wise multiplication of two vectors, followed by a sum of the results) is called the dot product. The NumPy library provides a built-in function to compute the dot product of two vectors.

In [8]:
import numpy as np

In [11]:
kanto = np.array([73, 67, 43])
kanto

array([73, 67, 43])

In [12]:
weights = np.array([w1, w2, w3])
weights

array([0.3, 0.2, 0.5])

In [13]:
type(kanto)

numpy.ndarray

In [14]:
np.dot(kanto, weights)

np.float64(56.8)

In [15]:
(kanto * weights)

array([21.9, 13.4, 21.5])

In [16]:
(kanto * weights).sum()

np.float64(56.8)
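
The three approaches above (the loop, `np.dot`, and multiply-then-sum) compute the same quantity, although floating-point rounding can make the last digits differ, as the `57.699999999999996` result for sinnoh suggests. The following is a minimal sketch of how the results could be compared with a tolerance rather than with `==`; the variable names `loop_result`, `dot_result`, and `sum_result` are illustrative and not part of the original notebook.

```python
import math
import numpy as np

w1, w2, w3 = 0.3, 0.2, 0.5
sinnoh = np.array([102, 43, 37])
weights = np.array([w1, w2, w3])

# Plain Python loop, mirroring crop_yield
loop_result = 0
for x, w in zip(sinnoh.tolist(), weights.tolist()):
    loop_result += x * w

# The two NumPy formulations of the same dot product
dot_result = np.dot(sinnoh, weights)
sum_result = (sinnoh * weights).sum()

print(loop_result, dot_result, sum_result)

# Compare with a tolerance instead of exact equality
print(math.isclose(loop_result, 57.7))
print(np.isclose(dot_result, sum_result))
```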

## Benefits of using NumPy arrays
### Ease of use: You can write small, concise, and intuitive mathematical expressions like (kanto * weights).sum() rather than using loops and custom functions like crop_yield.
### Performance: NumPy operations and functions are implemented internally in compiled code (mostly C), which makes them much faster than Python statements and loops that are interpreted at runtime.

In [17]:
# Python lists
arr1 = list(range(1000000))
arr2 = list(range(1000000, 2000000))

# Numpy arrays
arr1_np = np.array(arr1)
arr2_np = np.array(arr2)

In [18]:
%%time
result = 0
for x1, x2 in zip(arr1, arr2):
    result += x1*x2
result

CPU times: total: 375 ms
Wall time: 375 ms


833332333333500000

In [19]:
%%time
np.dot(arr1_np, arr2_np)

CPU times: total: 0 ns
Wall time: 2.3 ms


np.int64(833332333333500000)
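
A single `%%time` run can be noisy, and `%%time` only works inside IPython or Jupyter. As a rough sketch, the same comparison could be repeated with the standard-library `timeit` module. The helper `python_dot` and the smaller array size are assumptions made here to keep the example quick; the arrays are created with an explicit `int64` dtype so the large sum does not overflow on platforms where NumPy's default integer is 32-bit.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the notebook's 1,000,000, to keep the sketch fast
arr1 = list(range(n))
arr2 = list(range(n, 2 * n))
arr1_np = np.array(arr1, dtype=np.int64)
arr2_np = np.array(arr2, dtype=np.int64)

def python_dot(a, b):
    """Dot product of two sequences using a plain Python loop."""
    return sum(x * y for x, y in zip(a, b))

calls = 5
# Take the best of three repeats to reduce the effect of background noise
loop_time = min(timeit.repeat(lambda: python_dot(arr1, arr2), number=calls, repeat=3))
numpy_time = min(timeit.repeat(lambda: np.dot(arr1_np, arr2_np), number=calls, repeat=3))

print(f"loop:  {loop_time / calls:.6f} s per call")
print(f"numpy: {numpy_time / calls:.6f} s per call")
```

The exact timings depend on the machine, so no particular speed-up figure should be read into the output.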

## Multi-dimensional NumPy arrays: represent the climate data for all the regions using a single 2-dimensional NumPy array.

In [20]:
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
climate_data

array([[ 73,  67,  43],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

In [21]:
# 2D array (matrix)
climate_data.shape

(5, 3)

In [22]:
# 1D array (vector)
weights.shape

(3,)

In [25]:
# 3D array 
arr3 = np.array([
    [[11, 12, 13], 
     [13, 14, 15]], 
    [[15, 16, 17], 
     [17, 18, 19.5]]])
arr3.shape

(2, 2, 3)

### We can now compute the predicted yields of apples in all the regions, using a single matrix multiplication between climate_data (a 5x3 matrix) and weights (a vector of length 3).

In [28]:
np.matmul(climate_data, weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])

In [27]:
climate_data @ weights

array([56.8, 76.9, 81.9, 57.7, 74.9])
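
Multiplying the (5, 3) matrix by the length-3 vector amounts to taking the dot product of each row with `weights`. The sketch below checks this on the five regions; `row_by_row` and `all_at_once` are names introduced here for illustration.

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

# One dot product per region (row)
row_by_row = np.array([np.dot(row, weights) for row in climate_data])

# The same computation as a single matrix-vector product
all_at_once = climate_data @ weights

print(np.allclose(row_by_row, all_at_once))
print(all_at_once.shape)  # one predicted yield per region
```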

In [30]:
import urllib.request

urllib.request.urlretrieve(
    'https://gist.github.com/BirajCoder/a4ffcb76fd6fb221d76ac2ee2b8584e9/raw/4054f90adfd361b7aa4255e99c2e874664094cea/climate.csv', 
    'climate.txt')

('climate.txt', <http.client.HTTPMessage at 0x28f6831fa80>)

In [32]:
climate_data = np.genfromtxt('climate.txt', delimiter=',', skip_header=1)
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [33]:
climate_data.shape

(10000, 3)

In [35]:
yields = climate_data @ weights
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [36]:
yields.shape

(10000,)

In [39]:
yields.reshape(10000,1)

array([[72.2],
       [59.7],
       [65.2],
       ...,
       [71.1],
       [80.7],
       [73.4]])

In [37]:
climate_results = np.concatenate((climate_data, yields.reshape(10000, 1)), axis=1)
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])
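
The call `yields.reshape(10000, 1)` hard-codes the number of rows. One possible alternative, sketched here on the first three rows of the data, is to pass `-1` so that NumPy infers the row count, or to use `np.column_stack`, which treats a 1D array as a column. The names `with_reshape` and `with_stack` are illustrative only.

```python
import numpy as np

# First three rows of the climate data shown above
climate_data = np.array([[25., 76., 99.],
                         [39., 65., 70.],
                         [59., 45., 77.]])
weights = np.array([0.3, 0.2, 0.5])
yields = climate_data @ weights

# -1 lets NumPy work out the number of rows from the array's size
with_reshape = np.concatenate((climate_data, yields.reshape(-1, 1)), axis=1)

# column_stack appends the 1D array as a new column directly
with_stack = np.column_stack((climate_data, yields))

print(with_reshape.shape)
print(np.array_equal(with_reshape, with_stack))
```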

In [40]:
np.savetxt('climate_results.txt', 
           climate_results, 
           fmt='%.2f', 
           delimiter=',',
           header='temperature,rainfall,humidity,yield_apples', 
           comments='')
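
To confirm that a file written this way can be read back, a round trip can be sketched as follows. This example makes two assumptions to stay self-contained: it writes to an in-memory `io.StringIO` buffer instead of a file on disk, and it uses only two sample rows.

```python
import io
import numpy as np

# Two sample rows: temperature, rainfall, humidity, predicted yield
climate_results = np.array([[25., 76., 99., 72.2],
                            [39., 65., 70., 59.7]])

# Write to an in-memory buffer (stands in for 'climate_results.txt')
buffer = io.StringIO()
np.savetxt(buffer, climate_results, fmt='%.2f', delimiter=',',
           header='temperature,rainfall,humidity,yield_apples', comments='')

# The first line is the header; comments='' removes the default '# ' prefix
header_line = buffer.getvalue().splitlines()[0]
print(header_line)

# Read the data back, skipping the header row
buffer.seek(0)
loaded = np.genfromtxt(buffer, delimiter=',', skip_header=1)
print(loaded)
```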

## Arithmetic operations, broadcasting and comparison

In [54]:
arr2 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])

In [55]:
arr3 = np.array([[11, 12, 13, 14], 
                 [15, 16, 17, 18], 
                 [19, 11, 12, 13]])

In [56]:
arr2+12

array([[13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 13, 14, 15]])

In [57]:
arr3*2

array([[22, 24, 26, 28],
       [30, 32, 34, 36],
       [38, 22, 24, 26]])

In [59]:
arr2+arr3

array([[12, 14, 16, 18],
       [20, 22, 24, 26],
       [28, 12, 14, 16]])

In [61]:
# Broadcasting: the smaller array is stretched (virtually replicated) to match the shape of the higher-dimensional array
arr4 = np.array([4, 5, 6, 7])

In [62]:
arr2+arr4

array([[ 5,  7,  9, 11],
       [ 9, 11, 13, 15],
       [13,  6,  8, 10]])
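
Broadcasting works here because the length of `arr4` (4) matches the last dimension of `arr2` (3 rows, 4 columns). As a small sketch, `np.broadcast_to` makes the stretching explicit, and a deliberately mismatched array shows what happens when the shapes are incompatible. The names `arr4_stretched`, `arr5`, and `broadcast_failed` are introduced here for illustration.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr4 = np.array([4, 5, 6, 7])

# Explicitly stretch arr4 to arr2's shape (a read-only view, no data is copied)
arr4_stretched = np.broadcast_to(arr4, arr2.shape)
print(arr4_stretched.shape)
print(np.array_equal(arr2 + arr4, arr2 + arr4_stretched))

# Shapes (3, 4) and (2,) are incompatible: the trailing dimensions 4 and 2 differ
arr5 = np.array([7, 8])
try:
    arr2 + arr5
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Cannot broadcast:", err)
```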