# Working with lists

In [114]:
w1, w2, w3 = 0.3, 0.2, 0.5

In [115]:
kanto_temp = 73
kanto_rainfall = 67
kanto_humidity = 43

In [116]:
kanto_yield_apples = kanto_temp * w1 + kanto_rainfall * w2 + kanto_humidity * w3
kanto_yield_apples

56.8

In [117]:
print("The expected yield of apples in the Kanto region is {} tons per hectare".format(kanto_yield_apples))

The expected yield of apples in the Kanto region is 56.8 tons per hectare


To make it slightly easier to perform the above computation for multiple regions, we can represent the climate data for each region as a vector, i.e., a list of numbers.

In [118]:
kanto = [73, 67, 43]
johto = [91, 88, 64]
hoenn = [87, 134, 58]
sinnoh = [102, 43, 37]
unova = [69, 96, 70]

The three numbers in each vector represent the temperature, rainfall, and humidity of the region, in that order.

In [119]:
weights = [w1, w2, w3]

In [120]:
def crop_yield(region, weights):
    # Weighted sum: multiply each climate value by its weight and add the products
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

In [121]:
crop_yield(kanto, weights)

56.8
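As a minimal sketch of how the list-based approach extends to several regions, the loop below applies the same weighted sum to each of the five vectors defined earlier. The names `regions`, `weighted_sum`, and `predicted` are introduced here only for illustration and do not appear in the notebook.

```python
# Climate vectors: [temperature, rainfall, humidity], copied from the cells above
regions = {
    "kanto": [73, 67, 43],
    "johto": [91, 88, 64],
    "hoenn": [87, 134, 58],
    "sinnoh": [102, 43, 37],
    "unova": [69, 96, 70],
}
weights = [0.3, 0.2, 0.5]

def weighted_sum(values, weights):
    """Multiply each value by its weight and add up the products."""
    return sum(x * w for x, w in zip(values, weights))

# Hypothetical helper dict: region name -> predicted yield
predicted = {name: weighted_sum(values, weights) for name, values in regions.items()}

for name, value in predicted.items():
    print(f"{name}: {value:.1f}")
```

This works, but every new computation needs another explicit loop, which is the motivation for the array-based approach that follows.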

# Going from Python lists to Numpy arrays

In [122]:
import numpy as np

In [123]:
kanto = np.array([73, 67, 43])

In [124]:
kanto

array([73, 67, 43])

In [125]:
weights = np.array([w1, w2, w3])

In [126]:
weights

array([0.3, 0.2, 0.5])

In [127]:
type(kanto)

numpy.ndarray

Numpy arrays have the type `ndarray`.

## Operating on Numpy arrays

We can now compute the dot product of the two vectors using the `np.dot` function.

In [128]:
np.dot(kanto, weights)

56.8

In [129]:
(kanto * weights).sum()

56.8

The `*` operator performs an element-wise multiplication of two arrays if they have the same shape. The `sum` method calculates the sum of the numbers in an array.

In [130]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

In [131]:
arr1 * arr2

array([ 4, 10, 18])

In [132]:
arr1.sum()

6

## Benefits of using Numpy arrays

Numpy arrays offer the following benefits over Python lists for operating on numerical data:

- **Ease of use**: You can write small, concise, and intuitive mathematical expressions like `(kanto * weights).sum()` rather than using loops & custom functions like `crop_yield`.
- **Performance**: Numpy operations and functions are implemented internally in compiled C code, which makes them much faster than equivalent Python statements and loops that are interpreted at runtime.

Here's a comparison of dot products performed using Python loops vs. Numpy arrays on two vectors with a million elements each.
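The comparison can be sketched as follows. This is one possible way to time the two approaches, not a rigorous benchmark: the variable names are chosen here for illustration, `time.perf_counter` is used for timing, and the measured times depend on the machine. The `int64` dtype is requested explicitly because the result is too large for a 32-bit integer.

```python
import time
import numpy as np

n = 1_000_000
# int64 is requested explicitly so the large result cannot overflow
arr1_np = np.arange(n, dtype=np.int64)
arr2_np = np.arange(n, 2 * n, dtype=np.int64)
list1 = arr1_np.tolist()
list2 = arr2_np.tolist()

# Dot product with a plain Python loop over two lists
start = time.perf_counter()
loop_result = 0
for x1, x2 in zip(list1, list2):
    loop_result += x1 * x2
loop_seconds = time.perf_counter() - start

# Dot product with Numpy
start = time.perf_counter()
numpy_result = int(np.dot(arr1_np, arr2_np))
numpy_seconds = time.perf_counter() - start

print(f"Python loop: {loop_seconds:.4f} s")
print(f"np.dot:      {numpy_seconds:.4f} s")
print("Same result:", loop_result == numpy_result)
```

Both approaches produce the same number; only the time taken differs.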

# Multidimensional Numpy array

In [133]:
climate_data = np.array([[73, 67, 47],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

In [134]:
climate_data

array([[ 73,  67,  47],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

If you've taken a linear algebra class in high school, you may recognize the above 2-d array as a matrix with five rows and three columns. Each row represents one region, and the columns represent temperature, rainfall, and humidity, respectively.

Numpy arrays can have any number of dimensions and different lengths along each dimension. We can inspect the length along each dimension using the `.shape` property of an array.

<img src="https://fgnt.github.io/python_crashkurs_doc/_images/numpy_array_t.png" width="420">


In [135]:
climate_data.shape

(5, 3)

In [136]:
# 3D array

arr3 = np.array([
    [[11, 12, 13], 
     [13, 14, 15]], 
    [[15, 16, 17], 
     [17, 18, 19.5]]])

In [137]:
arr3.shape

(2, 2, 3)

# Matrix Multiplication

We can now compute the predicted yields of apples in all the regions, using a single matrix multiplication between `climate_data` (a 5x3 matrix) and `weights` (a vector of length 3). Here's what it looks like visually:

<img src="https://i.imgur.com/LJ2WKSI.png" width="240">

We can use the `np.matmul` function or the `@` operator to perform matrix multiplication.

In [138]:
np.matmul(climate_data, weights)

array([58.8, 76.9, 81.9, 57.7, 74.9])

In [139]:
climate_data @ weights

array([58.8, 76.9, 81.9, 57.7, 74.9])
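To make explicit what the matrix multiplication computes, the sketch below checks that each entry of the result is the dot product of one row of `climate_data` with `weights`. The names `row_by_row` and `all_at_once` are introduced here for illustration.

```python
import numpy as np

climate_data = np.array([[73, 67, 47],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

# One dot product per row (region), collected into a new array
row_by_row = np.array([np.dot(row, weights) for row in climate_data])

# The matrix-vector product computes the same five numbers at once
all_at_once = climate_data @ weights

print(np.allclose(row_by_row, all_at_once))
```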

# Working with CSV data files

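CSV stands for comma-separated values: a plain-text file format in which each line holds one row of data and the values in a row are separated by commas.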

In [140]:
import urllib.request

urllib.request.urlretrieve(
    'https://gist.github.com/BirajCoder/a4ffcb76fd6fb221d76ac2ee2b8584e9/raw/4054f90adfd361b7aa4255e99c2e874664094cea/climate.csv', 
    'climate.txt')

('climate.txt', <http.client.HTTPMessage at 0x26b4d9ca670>)

In [141]:
climate_data = np.genfromtxt('climate.txt', delimiter=',', skip_header=1)

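To read this file into a Numpy array, we use the `genfromtxt` function. The first argument, `'climate.txt'`, is the path of the file to read. The `delimiter` argument specifies the character that separates the values within a line, here a comma (`','`). The `skip_header=1` argument tells Numpy to skip the first line of the file, which is the header row rather than data.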
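The effect of these arguments can be tried without downloading anything. The sketch below feeds `genfromtxt` a small in-memory CSV through `io.StringIO`; the sample rows and the names `sample_csv` and `sample` are made up for this illustration.

```python
import io
import numpy as np

# Hypothetical three-line CSV standing in for the downloaded file
sample_csv = io.StringIO(
    "temperature,rainfall,humidity\n"
    "25,76,99\n"
    "39,65,70\n"
)

# Skip the header line and split each remaining line on commas
sample = np.genfromtxt(sample_csv, delimiter=',', skip_header=1)

print(sample.shape)
print(sample.dtype)
```

Note that the values are read as floating-point numbers, which is why the array printed from the real file shows `25.` rather than `25`.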

In [142]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [143]:
weights

array([0.3, 0.2, 0.5])

In [144]:
yields = climate_data @ weights

In [145]:
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [146]:
yields.shape

(10000,)

Let's add `yields` to `climate_data` as a fourth column using the `np.concatenate` function. Since `yields` is one-dimensional, we first reshape it into a column of shape `(10000, 1)` so that it can be joined along `axis=1`.

In [147]:
climate_results = np.concatenate((climate_data, yields.reshape(10000, 1)), axis=1)

In [148]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])
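The same column-append step can be sketched on a three-row sample. The variation shown here, `reshape(-1, 1)`, is an alternative to hard-coding the row count: `-1` asks Numpy to infer that dimension from the array's size. The names `data`, `predicted`, `column`, and `combined` are illustrative only.

```python
import numpy as np

data = np.array([[25., 76., 99.],
                 [39., 65., 70.],
                 [59., 45., 77.]])
weights = np.array([0.3, 0.2, 0.5])
predicted = data @ weights            # shape (3,)

# -1 asks Numpy to infer the row count, so the size is not hard-coded
column = predicted.reshape(-1, 1)     # shape (3, 1)

# axis=1 joins the arrays side by side, adding a fourth column
combined = np.concatenate((data, column), axis=1)
print(combined.shape)
```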

# Saving the results to a file in the current directory

Let's write the final results from our computation above back to a file using the `np.savetxt` function.

In [149]:
np.savetxt('climate_results.txt',
           climate_results,
           fmt='%.2f',
           delimiter=',',
           header='temperature,rainfall,humidity,yield_apples',
           comments='')
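One way to check that a file written like this can be read back is a small round trip, sketched below. It writes a two-row sample to a temporary directory (so nothing is left in the current directory) and reloads it with `genfromtxt`. The sample values, the file name `results_sample.txt`, and the variable names are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

results = np.array([[25., 76., 99., 72.2],
                    [39., 65., 70., 59.7]])

# Write to a temporary directory so the example leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'results_sample.txt')
    np.savetxt(path, results, fmt='%.2f', delimiter=',',
               header='temperature,rainfall,humidity,yield_apples',
               comments='')
    # comments='' means the header is written without a leading '# '
    with open(path) as f:
        first_line = f.readline().strip()
    loaded = np.genfromtxt(path, delimiter=',', skip_header=1)

print(first_line)
print(np.allclose(loaded, results))
```

Keep in mind that `fmt='%.2f'` rounds every value to two decimal places, so data with more precision would not survive the round trip exactly.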

# Arithmetic operations, broadcasting and comparison

In [150]:
arr2 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])

In [151]:
arr3 = np.array([[11, 12, 13, 14],
                 [15, 16, 17, 18],
                 [19, 11, 12, 13]])

### Adding a scalar

In [152]:
arr2 + 3

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12,  4,  5,  6]])

### Element-wise subtraction

In [153]:
arr3 - arr2

array([[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]])

### Division by a scalar

In [154]:
arr2 / 2

array([[0.5, 1. , 1.5, 2. ],
       [2.5, 3. , 3.5, 4. ],
       [4.5, 0.5, 1. , 1.5]])

### Element-wise multiplication

In [155]:
arr2 * arr3

array([[ 11,  24,  39,  56],
       [ 75,  96, 119, 144],
       [171,  11,  24,  39]])

### Modulus with a scalar

In [156]:
arr3 % 5

array([[1, 2, 3, 4],
       [0, 1, 2, 3],
       [4, 1, 2, 3]], dtype=int32)

# Array Broadcasting

Numpy arrays also support *broadcasting*, allowing arithmetic operations between two arrays with different numbers of dimensions but compatible shapes. Let's look at an example to see how it works.

In [157]:
arr2 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])
arr2.shape

(3, 4)

In [158]:
arr4 = np.array([4, 5, 6, 7])

arr4.shape

(4,)

In [159]:
arr2 + arr4

array([[ 5,  7,  9, 11],
       [ 9, 11, 13, 15],
       [13,  6,  8, 10]])

When the expression `arr2 + arr4` is evaluated, `arr4` (which has the shape `(4,)`) is replicated three times to match the shape `(3, 4)` of `arr2`. Numpy performs the replication without actually creating three copies of the smaller array, which improves performance and uses less memory.
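The idea of replicating without copying can be illustrated with `np.broadcast_to`, which returns a read-only view of an array stretched to a larger shape. This is only an illustration of the concept; it is not a claim that Numpy calls this function internally when evaluating `arr2 + arr4`. The name `stretched` is introduced here for the example.

```python
import numpy as np

arr4 = np.array([4, 5, 6, 7])

# np.broadcast_to returns a read-only view with the requested shape
stretched = np.broadcast_to(arr4, (3, 4))

print(stretched.shape)
print(stretched.strides)                  # first stride is 0: every row reuses the same data
print(np.shares_memory(stretched, arr4))  # the view points at arr4's memory
```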

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png" width="360">

Broadcasting only works if one of the arrays can be replicated to match the other array's shape.
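A case where broadcasting fails can be sketched as follows. The array `arr5`, with shape `(2,)`, is a hypothetical example chosen because its two elements cannot be replicated to fill rows of length four; the flag `broadcast_ok` is only there to record the outcome.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr5 = np.array([7, 8])   # hypothetical array of shape (2,), incompatible with (3, 4)

try:
    arr2 + arr5
    broadcast_ok = True
except ValueError as err:
    # Numpy raises ValueError when the shapes cannot be broadcast together
    broadcast_ok = False
    print("Broadcasting failed:", err)
```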

# Array Comparison

In [160]:
arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

In [161]:
arr1 == arr2

array([[False,  True,  True],
       [False, False,  True]])

In [162]:
arr1 >= arr2

array([[False,  True,  True],
       [ True,  True,  True]])

In [163]:
(arr1 == arr2).sum()

3
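The `sum` call above works because `True` counts as 1 and `False` as 0 in arithmetic, so summing a boolean array counts the positions where the comparison holds. A short sketch of this, with the names `equal_mask`, `equal_count`, and `equal_fraction` introduced for illustration:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

# Boolean array: True wherever the two arrays hold the same value
equal_mask = arr1 == arr2

# True counts as 1 and False as 0 when summed
equal_count = int(equal_mask.sum())

# The mean of a boolean array is the fraction of True entries
equal_fraction = equal_mask.mean()

print(equal_count, equal_fraction)
```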

## Array indexing and slicing

Indexing starts at 0.

Numpy extends Python's list indexing notation using `[]` to multiple dimensions in an intuitive fashion. You can provide a comma-separated list of indices or ranges to select a specific element or a subarray (also called a slice) from a Numpy array.

In [164]:
arr3 = np.array([
    [[11, 12, 13, 14], 
     [13, 14, 15, 19]], 
    
    [[15, 16, 17, 21], 
     [63, 92, 36, 18]], 
    
    [[98, 32, 81, 23],      
     [17, 18, 19.5, 43]]])

In [165]:
arr3.shape

(3, 2, 4)

In [166]:
# Single element
arr3[1, 1, 2]

36.0

In [167]:
# Subarray using ranges
arr3[1:, 0:1, :2]

array([[[15., 16.]],

       [[98., 32.]]])

In [168]:
# Mixing indices and ranges
arr3[1:, 1, 3]

array([18., 43.])

In [169]:
# Mixing indices and ranges
arr3[1:, 1, :3]

array([[63. , 92. , 36. ],
       [17. , 18. , 19.5]])

In [170]:
# Using fewer indices
arr3[1]

array([[15., 16., 17., 21.],
       [63., 92., 36., 18.]])

In [171]:
# Using fewer indices
arr3[:2, 1]

array([[13., 14., 15., 19.],
       [63., 92., 36., 18.]])
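One detail in the examples above is worth spelling out: a single index removes its dimension from the result, while a range keeps it, even if the range selects only one element. The sketch below compares the two on the same array; `with_range` and `with_index` are names chosen for this example.

```python
import numpy as np

arr3 = np.array([
    [[11, 12, 13, 14],
     [13, 14, 15, 19]],
    [[15, 16, 17, 21],
     [63, 92, 36, 18]],
    [[98, 32, 81, 23],
     [17, 18, 19.5, 43]]])

with_range = arr3[1:, 0:1, :2]   # range 0:1 keeps the middle dimension
with_index = arr3[1:, 0, :2]     # index 0 drops the middle dimension

print(with_range.shape)
print(with_index.shape)
```

Both selections contain the same four numbers; only the number of dimensions differs.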

## Other ways of creating Numpy arrays
See the [official documentation](https://numpy.org/doc/stable/reference/routines.array-creation.html) for the full list of array creation routines.

In [172]:
# All zeros
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [173]:
# All ones
np.ones([2, 2, 3])

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [174]:
# Identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [175]:
# Random vector
np.random.rand(5)

array([0.29476959, 0.91758798, 0.70093819, 0.68854189, 0.3380881 ])

In [176]:
# Fixed value
np.full([2, 3], 42)

array([[42, 42, 42],
       [42, 42, 42]])

In [177]:
# Range with start, end and step
np.arange(10, 90, 3)

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
       61, 64, 67, 70, 73, 76, 79, 82, 85, 88])

In [178]:
# Equally spaced numbers in a range
np.linspace(3, 27, 9)

array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27.])
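The last two functions differ in how they treat the end of the range: `np.arange` takes a step size and excludes the end value, while `np.linspace` takes a number of points and includes the end value by default. A small sketch using the same endpoints, with `stepped` and `spaced` as illustrative names:

```python
import numpy as np

stepped = np.arange(3, 27, 3)      # step of 3; the end value 27 is excluded
spaced = np.linspace(3, 27, 9)     # 9 points; the end value 27 is included

print(stepped)
print(spaced)
```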