# Numerical Computing with Python and Numpy

This notebook covers the following topics:

- Working with numerical data in Python
- Going from Python lists to Numpy arrays
- Multi-dimensional Numpy arrays and their benefits
- Array operations, broadcasting, indexing, and slicing
- Working with CSV data files using Numpy

## Working with numerical data

The "data" in *Data Analysis* typically refers to numerical data, e.g., stock prices, sales figures, sensor measurements, sports scores, database tables, etc. The [Numpy](https://numpy.org) library provides specialized data structures, functions, and other tools for numerical computing in Python. Let's work through an example to see why & how to use Numpy for working with numerical data.


> Suppose we want to use climate data like the temperature, rainfall, and humidity to determine if a region is well suited for growing apples. A simple approach would be to express the relationship between the annual yield of apples (tons per hectare) and climatic conditions such as the average temperature (in degrees Fahrenheit), rainfall (in millimeters), and average relative humidity (in percent) as a linear equation.
>
> `yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity`

We're expressing the yield of apples as a weighted sum of the temperature, rainfall, and humidity. This equation is an approximation since the actual relationship may not necessarily be linear, and there may be other factors involved. But a simple linear model like this often works well in practice.

Based on some statistical analysis of historical data, we might come up with reasonable values for the weights `w1`, `w2`, and `w3`. Here's an example set of values: `w1 = 0.3`, `w2 = 0.2`, and `w3 = 0.5`.

Given some climate data for a region, we can now predict the yield of apples. Here's some sample data:

<img src="https://i.imgur.com/TXPBiqv.png" style="width:360px;">

To begin, we can define some variables to record climate data for a region.

In [41]:
import jovian

In [38]:
w1, w2, w3 = 0.3, 0.2, 0.5

In [2]:
kanto_temp = 73
kanto_rainfall = 67
kanto_humidity = 43

In [4]:
kanto_yield_apples = kanto_temp * w1 + kanto_rainfall * w2 + kanto_humidity * w3
kanto_yield_apples

56.8

In [5]:
kanto = [73, 67, 43]
johto = [91, 88, 64]
hoenn = [87, 134, 58]
sinnoh = [102, 43, 37]
unova = [69, 96, 70]

In [6]:
weights = [w1, w2, w3]

#### Zip function

In [18]:
a = zip(kanto,weights)

In [19]:
a

<zip at 0x7fe6423be900>

In [20]:
print(tuple(a))

((73, 0.3), (67, 0.2), (43, 0.5))


In [14]:
for item,t in zip(kanto,weights):
    print(item,t)

73 0.3
67 0.2
43 0.5
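
`zip` pairs up elements from two or more iterables position by position. Two details are easy to miss: the object it returns is a one-shot iterator (which is why `a` displayed as `<zip at ...>` until it was converted to a tuple), and it stops at the shortest input. The sketch below illustrates both points. The names `sample_readings` and `sample_weights` are placeholders introduced here, not variables from the notebook.

```python
# Placeholder values that mirror the Kanto example
sample_readings = [73, 67, 43]
sample_weights = [0.3, 0.2, 0.5]

pairs = zip(sample_readings, sample_weights)
first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # nothing left, so this is empty

# zip stops as soon as the shortest input runs out
truncated = list(zip(sample_readings, [0.3, 0.2]))

print(first_pass)    # [(73, 0.3), (67, 0.2), (43, 0.5)]
print(second_pass)   # []
print(truncated)     # [(73, 0.3), (67, 0.2)]
```

If the pairs are needed more than once, it is usually simplest to call `zip` again or to store the result in a list.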


We can now write a function `crop_yield` to calculate the yield of apples (or any other crop) given the climate data and the respective weights.

In [23]:
def crop_yield(region, weights):
    # Weighted sum: multiply each measurement by its weight and add up
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

In [25]:
crop_yield(kanto,weights)

56.8
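
Because `crop_yield` accepts any list of measurements, the same function can be applied to every region. The following is a minimal sketch, assuming the five regions are collected in a dictionary called `regions` (a name introduced here for illustration; the notebook keeps each region in its own variable). Results are rounded for display because floating-point sums may carry tiny rounding errors.

```python
def crop_yield(region, weights):
    """Weighted sum of the climate measurements for one region."""
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

weights = [0.3, 0.2, 0.5]

# Hypothetical container for the five regions defined above
regions = {
    "kanto": [73, 67, 43],
    "johto": [91, 88, 64],
    "hoenn": [87, 134, 58],
    "sinnoh": [102, 43, 37],
    "unova": [69, 96, 70],
}

yields = {name: crop_yield(data, weights) for name, data in regions.items()}

for name, value in yields.items():
    print(name, round(value, 1))
```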

## Going from Python lists to Numpy arrays


The calculation performed by `crop_yield` (element-wise multiplication of two vectors, followed by a sum of the results) is called the *dot product*. Learn more about the dot product here: https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/dot-cross-products/v/vector-dot-product-and-vector-length

The Numpy library provides a built-in function to compute the dot product of two vectors. However, we must first convert the lists into Numpy arrays.

Let's install the Numpy library using the `pip` package manager.

In [28]:
!pip install numpy --upgrade --quiet

In [29]:
import numpy as np

In [30]:
kanto = np.array([73, 67, 43])

In [31]:
kanto

array([73, 67, 43])

In [32]:
type(kanto)

numpy.ndarray

In [33]:
weights = np.array([w1, w2, w3])

Just like lists, Numpy arrays support the indexing notation `[]`.

In [35]:
weights[2]

0.5
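
Negative indices and slices behave much as they do for lists. A short sketch, re-creating `weights` so that the snippet runs on its own:

```python
import numpy as np

weights = np.array([0.3, 0.2, 0.5])

last = weights[-1]        # negative indices count from the end
first_two = weights[:2]   # slicing gives back another Numpy array

print(last)
print(first_two)
print(type(first_two))
```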

## Operating on Numpy arrays

We can now compute the dot product of the two vectors using the `np.dot` function.

In [36]:
np.dot(kanto,weights)

56.8

In [37]:
(kanto * weights).sum()

56.8
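
The two cells above compute the same quantity: `*` multiplies the arrays element by element, and `.sum()` adds up the products. As a rough check, the sketch below compares both against a plain Python loop. It uses `np.isclose` rather than `==`, since floating-point results can differ in the last few digits.

```python
import numpy as np

kanto = np.array([73, 67, 43])
weights = np.array([0.3, 0.2, 0.5])

elementwise = kanto * weights      # per-feature contributions
via_sum = elementwise.sum()
via_dot = np.dot(kanto, weights)
via_loop = sum(x * w for x, w in zip(kanto, weights))

print(elementwise)
print(np.isclose(via_sum, via_dot), np.isclose(via_dot, via_loop))
```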

## Multi-dimensional Numpy arrays 

We can now go one step further and represent the climate data for all the regions using a single 2-dimensional Numpy array.

In [43]:
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

In [44]:
climate_data

array([[ 73,  67,  43],
       [ 91,  88,  64],
       [ 87, 134,  58],
       [102,  43,  37],
       [ 69,  96,  70]])

In [45]:
# 2D array (matrix)
climate_data.shape

(5, 3)

In [46]:
weights

array([0.3, 0.2, 0.5])

In [47]:
weights.shape

(3,)

In [48]:
# 3D array
arr3 = np.array([
    [[11, 12, 13], 
     [13, 14, 15]], 
    [[15, 16, 17], 
     [17, 18, 19.5]]])

In [49]:
arr3.shape

(2, 2, 3)
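
The `.shape` tuple lists the length along each axis, from the outermost brackets to the innermost, and `.ndim` gives the number of axes. The sketch below compares a 1D, 2D, and 3D array; the names `vector`, `matrix`, and `cube` are chosen here for illustration.

```python
import numpy as np

vector = np.array([0.3, 0.2, 0.5])
matrix = np.array([[73, 67, 43],
                   [91, 88, 64]])
cube = np.array([
    [[11, 12, 13],
     [13, 14, 15]],
    [[15, 16, 17],
     [17, 18, 19.5]]])

for arr in (vector, matrix, cube):
    print(arr.ndim, arr.shape, arr.size)
# 1 (3,) 3
# 2 (2, 3) 6
# 3 (2, 2, 3) 12

# One index per axis selects a single element
corner = cube[1, 1, 2]
print(corner)   # 19.5
```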

In [50]:
weights.dtype

dtype('float64')

In [54]:
climate_data.dtype

dtype('int64')

In [55]:
type(climate_data)

numpy.ndarray

All the elements in a Numpy array have the same data type. You can check the data type of an array using the `.dtype` attribute.

In [56]:
arr3.dtype

dtype('float64')

In [57]:
arr3

array([[[11. , 12. , 13. ],
        [13. , 14. , 15. ]],

       [[15. , 16. , 17. ],
        [17. , 18. , 19.5]]])
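
`arr3` ends up as `float64` even though most of its entries were written as integers: a single float (`19.5`) is enough for Numpy to store every element as a float. A minimal sketch of this behavior, using throwaway arrays named here for illustration:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])         # one float promotes every element
as_float = ints.astype(np.float64)    # explicit conversion returns a new array

print(ints.dtype)    # an integer type; the exact width depends on the platform
print(mixed.dtype)
print(as_float)
```

Note that `astype` leaves the original array unchanged, so `ints` keeps its integer type.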

We can now compute the predicted yields of apples in all the regions, using a single matrix multiplication between `climate_data` (a 5x3 matrix) and `weights` (a vector of length 3). Here's what it looks like visually:

<img src="https://i.imgur.com/LJ2WKSI.png" width="240">

You can learn about matrices and matrix multiplication by watching the first 3-4 videos of this playlist: https://www.youtube.com/watch?v=xyAuNHPsq-g&list=PLFD0EB975BA0CC1E0&index=1 .

We can use the `np.matmul` function or the `@` operator to perform matrix multiplication.

In [58]:
np.matmul(climate_data,weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])

In [60]:
(climate_data @ weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])
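
Multiplying a `(5, 3)` matrix by a vector of length 3 takes the dot product of each row with the vector, producing one value per region. The sketch below checks this row by row. The variable `row_by_row` is introduced here purely for comparison.

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

yields = climate_data @ weights

# Same computation, one region (row) at a time
row_by_row = np.array([np.dot(row, weights) for row in climate_data])

print(yields.shape)                      # (5,)
print(np.allclose(yields, row_by_row))   # True
```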

In [61]:
np.array([1,2,4])

array([1, 2, 4])