## Working with numerical data

###### Suppose we want to use climate data such as temperature, rainfall, and humidity to determine whether a region is well suited for growing apples. A simple approach is to express the relationship between the annual yield of apples (tons per hectare) and the climatic conditions, namely the average temperature (in degrees Fahrenheit), rainfall (in millimeters), and average relative humidity (in percent), as a linear equation.

```yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity```

###### We're expressing the yield of apples as a weighted sum of the temperature, rainfall, and humidity. This equation is an approximation: the actual relationship may not be linear, and other factors may be involved. Still, a simple linear model like this often works well in practice.

###### Based on some statistical analysis of historical data, we might come up with reasonable values for the weights `w1`, `w2`, and `w3`. Here's an example set of values:



In [2]:
w1, w2, w3 = 0.3, 0.2, 0.5

###### Given some climate data for a region, we can now predict the yield of apples. Here's some sample data:

<img src="https://i.imgur.com/TXPBiqv.png" alt="data" width="500"/>

In [3]:
kanto = [73, 67, 43]
johto = [91, 88, 64]
hoenn = [87, 134, 58]
sinnoh = [102, 43, 37]
unova = [69, 96, 70]
weights = [w1, w2, w3]

In [4]:
print(tuple(zip(kanto, weights)))

((73, 0.3), (67, 0.2), (43, 0.5))


###### We can now write a function `crop_yield` to calculate the yield of apples (or any other crop) given the climate data and the respective weights.

In [5]:
def crop_yield(region, weights):
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

In [6]:
crop_yield(kanto, weights)

56.8

In [7]:
crop_yield(johto, weights)

76.9

In [8]:
crop_yield(unova, weights)

74.9
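
###### As a sanity check, the first prediction above can be reproduced by expanding the weighted sum term by term. This is a minimal sketch that reuses the Kanto values and the example weights; the variable names `temperature`, `rainfall`, and `humidity` are introduced here only for illustration.

```python
import math

w1, w2, w3 = 0.3, 0.2, 0.5
temperature, rainfall, humidity = 73, 67, 43  # Kanto values from the sample data

# Weighted sum written out term by term
yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity

# Floating-point sums are best compared with a tolerance
print(math.isclose(yield_of_apples, 56.8))
```

###### Comparing with `math.isclose` rather than `==` avoids surprises from floating-point rounding in the intermediate products.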

## Going from Python lists to NumPy arrays

In [9]:
import numpy as np

In [10]:
kanto = np.array([73, 67, 43])

In [11]:
kanto

array([73, 67, 43])

In [12]:
weights = np.array([w1, w2, w3])

In [13]:
weights

array([0.3, 0.2, 0.5])

In [14]:
type(kanto)

numpy.ndarray

In [15]:
type(weights)

numpy.ndarray

In [16]:
weights[0]

0.3

In [17]:
kanto[2]

43

## Operating on NumPy arrays

In [18]:
np.dot(kanto, weights)

56.8

In [19]:
(kanto * weights).sum()

56.8

In [20]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

In [21]:
arr1 * arr2

array([ 4, 10, 18])

In [22]:
arr2.sum()

15

In [23]:
# Python lists
arr1 = list(range(1000000))
arr2 = list(range(1000000, 2000000))

# Numpy arrays
arr1_np = np.array(arr1, dtype=np.int64)
arr2_np = np.array(arr2, dtype=np.int64)

In [24]:
%%time
result = 0
for x1, x2 in zip(arr1, arr2):
    result += x1*x2
result

CPU times: total: 46.9 ms
Wall time: 134 ms


833332333333500000

In [25]:
%%time
np.dot(arr1_np, arr2_np)

CPU times: total: 0 ns
Wall time: 1.14 ms


833332333333500000
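
###### A single `%%time` run can be noisy, so one possible way to repeat the comparison outside a notebook is with the standard `timeit` module. The sketch below uses smaller arrays than the cell above to keep the run short; the helper names `dot_with_loop` and `dot_with_numpy` are illustrative, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the example above to keep the run short
list1 = list(range(n))
list2 = list(range(n, 2 * n))
np1 = np.array(list1, dtype=np.int64)
np2 = np.array(list2, dtype=np.int64)

def dot_with_loop():
    # Pure Python: multiply and accumulate one pair at a time
    total = 0
    for x1, x2 in zip(list1, list2):
        total += x1 * x2
    return total

def dot_with_numpy():
    # NumPy: the whole dot product runs in compiled code
    return np.dot(np1, np2)

# Both approaches should produce the same number
assert dot_with_loop() == dot_with_numpy()

loop_time = timeit.timeit(dot_with_loop, number=10)
numpy_time = timeit.timeit(dot_with_numpy, number=10)
print(f"loop:  {loop_time:.4f} s for 10 runs")
print(f"numpy: {numpy_time:.4f} s for 10 runs")
```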

## Multi-dimensional NumPy arrays

In [26]:
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

In [27]:
climate_data

array([[ 73,  67,  43],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

<img src="https://fgnt.github.io/python_crashkurs_doc/_images/numpy_array_t.png" alt="matrix" width="500"/>

In [28]:
# 2D array (matrix)
climate_data.shape

(5, 3)

In [29]:
weights

array([0.3, 0.2, 0.5])

In [30]:
# 1D array (vector)
weights.shape

(3,)

In [31]:
# 3D array
arr3 = np.array([
    [[11, 12, 13],
     [13, 14, 15]],
    [[15, 16, 17],
     [17, 18, 19.5]]])


In [32]:
arr3.shape

(2, 2, 3)

In [33]:
weights.dtype

dtype('float64')

In [34]:
climate_data.dtype

dtype('int32')

In [35]:
arr3.dtype

dtype('float64')
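
###### The `dtype` results above follow from how NumPy picks one element type for the whole array: a single float among the values is enough to make every element a float. A small sketch with illustrative arrays `ints` and `mixed`:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])  # one float promotes every element to float

# Integer width depends on the platform (int32 or int64), so check the kind
print(ints.dtype.kind)
print(mixed.dtype)
print(mixed)

# The element type can also be requested explicitly
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)
```

###### This is why the 3D array containing `19.5` reports `float64` even though most of its values were written as integers.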

###### We can now compute the predicted yields of apples in all the regions, using a single matrix multiplication between `climate_data` (a 5x3 matrix) and `weights` (a vector of length 3). Here's what it looks like visually:

<img src="https://i.imgur.com/LJ2WKSI.png" alt="matrix mult" width="500"/>


In [36]:
np.matmul(climate_data, weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])

In [37]:
climate_data @ weights

array([56.8, 76.9, 81.9, 57.7, 74.9])
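
###### To see what the matrix multiplication is doing, it can be compared with computing one dot product per row. The sketch below rebuilds the 5x3 sample data; the names `row_by_row` and `all_at_once` are illustrative.

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

# One dot product per row, collected into a new array
row_by_row = np.array([np.dot(row, weights) for row in climate_data])

# The same computation as a single matrix-vector product
all_at_once = climate_data @ weights

print(all_at_once.shape)
print(np.allclose(row_by_row, all_at_once))
```

###### Each entry of the result is the weighted sum for one region, so a (5, 3) matrix times a length-3 vector gives a length-5 vector.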

## Working with CSV data files

In [38]:
climate_data = np.genfromtxt('climate.csv', delimiter=',', skip_header=1)

In [39]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [40]:
climate_data.shape

(10000, 3)

In [41]:
weights = np.array([0.3, 0.2, 0.5])

In [42]:
yields = climate_data @ weights

In [43]:
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [44]:
yields.shape

(10000,)

In [45]:
yields.reshape(10000, 1)

array([[72.2],
       [59.7],
       [65.2],
       ...,
       [71.1],
       [80.7],
       [73.4]])

In [46]:
climate_results = np.concatenate(
    (climate_data, yields.reshape(10000, 1)), axis=1)


In [47]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])
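
###### The call `yields.reshape(10000, 1)` hard-codes the number of rows. A possible alternative, sketched here on a three-row stand-in for the CSV data (the array `data` is illustrative), is to let NumPy infer the row count with `-1`, or to use `np.column_stack`, which accepts the 1D array directly.

```python
import numpy as np

data = np.array([[25., 76., 99.],
                 [39., 65., 70.],
                 [59., 45., 77.]])
weights = np.array([0.3, 0.2, 0.5])
yields = data @ weights                     # shape (3,)

# -1 asks NumPy to infer the row count from the array's size
as_column = yields.reshape(-1, 1)           # shape (3, 1)
with_concat = np.concatenate((data, as_column), axis=1)

# column_stack treats 1D inputs as columns, so no reshape is needed
with_stack = np.column_stack((data, yields))

print(with_concat.shape)
print(np.array_equal(with_concat, with_stack))
```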

In [48]:
np.savetxt('climate_results.txt',
            climate_results,
            fmt='%.2f',
            delimiter=',',
            header='temperature,rainfall,humidity,yielded_apples',
            comments='')
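
###### One way to check that a file written like this can be read back is a round trip through `np.savetxt` and `np.genfromtxt`. The sketch below writes to an in-memory `io.StringIO` buffer instead of a file on disk, and uses a two-row illustrative array `results` in place of the full data.

```python
import io
import numpy as np

results = np.array([[25.0, 76.0, 99.0, 72.2],
                    [39.0, 65.0, 70.0, 59.7]])

# Write the header row and the data as comma-separated text
buffer = io.StringIO()
np.savetxt(buffer, results, fmt='%.2f', delimiter=',',
           header='temperature,rainfall,humidity,yielded_apples',
           comments='')

# Rewind and read it back, skipping the header row
buffer.seek(0)
loaded = np.genfromtxt(buffer, delimiter=',', skip_header=1)

print(loaded.shape)
print(np.allclose(loaded, results))
```

###### Passing `comments=''` matters here: without it, `np.savetxt` prefixes the header line with `# `.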

## Arithmetic operations, broadcasting and comparison

In [49]:
arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])

In [51]:
arr3 = np.array([[11, 12, 13, 14],
                 [15, 16, 17, 18],
                 [19, 11, 12, 13]])

In [52]:
# Adding a scalar
arr2 + 3

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12,  4,  5,  6]])

In [53]:
# Element-wise subtraction
arr3 - arr2

array([[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]])

In [54]:
# Division by scalar
arr2 / 2

array([[0.5, 1. , 1.5, 2. ],
       [2.5, 3. , 3.5, 4. ],
       [4.5, 0.5, 1. , 1.5]])

In [55]:
# Element-wise multiplication
arr2 * arr3

array([[ 11,  24,  39,  56],
       [ 75,  96, 119, 144],
       [171,  11,  24,  39]])

In [56]:
# Modulus with scalar
arr2 % 4

array([[1, 2, 3, 0],
       [1, 2, 3, 0],
       [1, 1, 2, 3]], dtype=int32)

## Array Broadcasting

In [57]:
arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])

In [58]:
arr2.shape

(3, 4)

In [59]:
arr4 = np.array([4, 5, 6, 7])

In [60]:
arr4.shape

(4,)

In [61]:
arr2 + arr4

array([[ 5,  7,  9, 11],
       [ 9, 11, 13, 15],
       [13,  6,  8, 10]])

In [62]:
arr5 = np.array([7, 8])

In [63]:
arr5.shape

(2,)

In [64]:
arr2 + arr5

ValueError: operands could not be broadcast together with shapes (3,4) (2,) 
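
###### The error above comes from the broadcasting rule: shapes are compared starting from the last dimension, and two dimensions are compatible only if they are equal or one of them is 1. The sketch below shows a similar failure with an illustrative length-3 array `per_row`, and how reshaping it to a column makes the shapes compatible.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])          # shape (3, 4)

per_row = np.array([10, 20, 30])        # shape (3,): trailing dims 4 vs 3 clash

try:
    arr2 + per_row
except ValueError as err:
    print("broadcast failed:", err)

# Reshaping to (3, 1) makes the trailing dimension 1, which can stretch to 4
result = arr2 + per_row.reshape(3, 1)
print(result)
```

###### With the column shape, each value of `per_row` is added to every element of the matching row.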

## Array Comparison

In [65]:
arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

In [66]:
arr1 == arr2

array([[False,  True,  True],
       [False, False,  True]])

In [67]:
arr1 != arr2

array([[ True, False, False],
       [ True,  True, False]])

In [68]:
arr1 >= arr2

array([[False,  True,  True],
       [ True,  True,  True]])

In [69]:
arr1 < arr2

array([[ True, False, False],
       [False, False, False]])

In [70]:
(arr1 == arr2).sum()

3
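
###### Summing a comparison works because `True` counts as 1 and `False` as 0. A short sketch of a few related checks on the same boolean result (the name `matches` is illustrative):

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

matches = arr1 == arr2                  # boolean array, same shape as the inputs

print(matches.sum())                    # number of equal positions
print(np.count_nonzero(matches))        # same count, stated more explicitly
print(matches.all())                    # are the arrays equal everywhere?
print(matches.any())                    # do they agree anywhere?
```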

## Array indexing and slicing

In [71]:
arr3 = np.array([
    [[11, 12, 13, 14],
     [13, 14, 15, 19]],

    [[15, 16, 17, 21],
     [63, 92, 36, 18]],

    [[98, 32, 81, 23],
     [17, 18, 19.5, 43]]])


In [72]:
arr3.shape

(3, 2, 4)

In [76]:
# Single element
arr3[1, 1, 2]

36.0

In [87]:
# Subarray using ranges
arr3[1:, 0:1, :2]

array([[[15., 16.]],

       [[98., 32.]]])

In [94]:
# Mixing indices and ranges
arr3[1:, 1, :3]

array([[63. , 92. , 36. ],
       [17. , 18. , 19.5]])

In [95]:
# Using fewer indices
arr3[1]

array([[15., 16., 17., 21.],
       [63., 92., 36., 18.]])

In [97]:
# Using fewer indices
arr3[:2, 1]

array([[13., 14., 15., 19.],
       [63., 92., 36., 18.]])

In [98]:
# Using too many indices
arr3[1, 3, 2, 1]

IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
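
###### A useful way to read the indexing examples above is to look at the shape of each result: a range keeps its axis, while an integer index removes it. The sketch below rebuilds the same 3D array and prints the shapes only.

```python
import numpy as np

arr3 = np.array([
    [[11, 12, 13, 14],
     [13, 14, 15, 19]],
    [[15, 16, 17, 21],
     [63, 92, 36, 18]],
    [[98, 32, 81, 23],
     [17, 18, 19.5, 43]]])              # shape (3, 2, 4)

print(arr3[1:, 0:1, :2].shape)   # range on every axis
print(arr3[1:, 0, :2].shape)     # integer on the middle axis
print(arr3[1].shape)             # omitted trailing axes are taken whole
print(arr3[1, 1, 2].shape)       # integers on all axes leave a scalar
```

###### The four results have shapes `(2, 1, 2)`, `(2, 2)`, `(2, 4)`, and `()`, so each integer index lowers the number of dimensions by one.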

## Other ways of creating NumPy arrays

In [99]:
# All zeros
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [100]:
# All ones
np.ones([2, 2, 3])

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [105]:
# Identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [102]:
# Random vector
np.random.rand(5)

array([0.80909611, 0.13538636, 0.60539344, 0.11200257, 0.29376835])

In [110]:
# Random matrix
np.random.randn(2, 3) # rand vs. randn - what's the difference?

array([[ 0.52403251, -0.55155186,  0.39711825],
       [ 0.66250033,  0.42218412,  1.22593525]])
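
###### On the question in the comment above: `np.random.rand` draws from a uniform distribution over [0, 1), while `np.random.randn` draws from the standard normal distribution (mean 0, standard deviation 1), so its values can be negative or greater than 1. A small sketch that checks this on large samples; the seed and sample size are arbitrary choices made here for repeatability.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

uniform = np.random.rand(100_000)     # uniform on [0, 1)
normal = np.random.randn(100_000)     # standard normal: mean 0, std 1

# Uniform samples stay inside [0, 1) and average close to 0.5
print(uniform.min() >= 0.0, uniform.max() < 1.0)
print(round(uniform.mean(), 2))

# Normal samples are centered near 0 with spread near 1
print(round(normal.mean(), 2), round(normal.std(), 2))
print((normal < 0).any())
```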

In [113]:
# Fixed value
np.full([2, 3], 42)

array([[42, 42, 42],
       [42, 42, 42]])

In [116]:
# Range with start, end and step
np.arange(10, 90, 3)

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
       61, 64, 67, 70, 73, 76, 79, 82, 85, 88])

In [125]:
# Equally spaced numbers in a range
np.linspace(3, 27, 9)

array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27.])
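
###### The last two helpers differ in what the third argument means and in whether the end value is included. A brief sketch comparing them on the same range (the names `stepped` and `spaced` are illustrative):

```python
import numpy as np

stepped = np.arange(3, 27, 3)      # third argument is the step; 27 is excluded
spaced = np.linspace(3, 27, 9)     # third argument is the count; 27 is included

print(stepped)
print(spaced)

# linspace can also report the spacing it chose
_, step = np.linspace(3, 27, 9, retstep=True)
print(step)
```

###### `np.arange` is convenient when the step is known, and `np.linspace` when the number of points is known.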
