![NumPy logo](images/numpy.png)

# NumPy Tutorials - PART 1: Basics

## 1 Setup

### 1.1 Install library

See the Setup section in [README.md][1] for installing NumPy along with the other libraries.

[1]: https://github.com/DheemanthBhat/python-data-science-libraries/blob/main/README.md

### 1.2 Import library

In [1]:
import numpy as np


np.set_printoptions(linewidth=88)

print("NumPy version:", np.__version__)

NumPy version: 2.3.0


## 2 Create

### 2.1 One Dimensional Array

Create one dimensional NumPy array.

In [2]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

### 2.2 Two Dimensional Array

Create two dimensional NumPy array.

In [3]:
np.array(
    [
        [1, 2, 3, 4, 5],
        [6, 7, 8, 9, 0],
    ]
)

array([[1, 2, 3, 4, 5],
       [6, 7, 8, 9, 0]])

### 2.3 Multi Dimensional Array

Create three dimensional NumPy array.

In [4]:
np.array(
    [
        [
            [1, 2, 3, 4, 5],
            [6, 7, 8, 9, 0],
        ],
        [
            [1, 2, 3, 4, 5],
            [6, 7, 8, 9, 0],
        ],
    ]
)

array([[[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 0]],

       [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 0]]])

### 2.4 Range using `arange()`

NumPy's `arange()` takes three main parameters:

1. `start`: Starting value in NumPy array (inclusive).
2. `stop`: Ending value in NumPy array (exclusive).
3. `step`: Step size between two elements.

> **Note**:
> 
> Stop value is exclusive, meaning the stop value is not included in the resulting NumPy array.

In [5]:
np.arange(start=1, stop=100, step=1)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
       21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
       41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
       61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
       81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

#### Calculate stop value

When you need an array of a given length and the step size is not the default of 1, calculate the stop value as:

```
stop_value = start_value + (length * step_size)
```

> **Note**:
> 
> The computed stop value is still exclusive.

##### Quiz #1

Create an array of length `10` starting from `6` with a step value of `5`

In [6]:
start_val = 6
length = 10
step_size = 5

# Calculate stop value.
stop_val = start_val + (length * step_size)

print("Start value:", start_val)
print("Stop value:", stop_val)

# Generate NumPy array.
np.arange(start=start_val, stop=stop_val, step=step_size)

Start value: 6
Stop value: 56


array([ 6, 11, 16, 21, 26, 31, 36, 41, 46, 51])
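
The stop value formula can be wrapped in a small helper and checked against the requested length. This is a minimal sketch: the name `arange_by_length` is hypothetical and not part of NumPy.

```python
import numpy as np


def arange_by_length(start, length, step):
    """Return `length` values beginning at `start`, spaced `step` apart."""
    # Hypothetical helper: stop is exclusive, so start + length * step
    # is the first value that is NOT generated.
    stop = start + (length * step)
    return np.arange(start=start, stop=stop, step=step)


quiz = arange_by_length(start=6, length=10, step=5)
print(quiz.tolist())  # [6, 11, 16, 21, 26, 31, 36, 41, 46, 51]
print(len(quiz))      # 10
```

The sketch assumes integer arguments. With a floating-point step, rounding can change the resulting length.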

## 3 Why NumPy array?

### 3.1 Python `list` vs NumPy array

#### Normal Python list

1. Python lists can store heterogeneous values.
2. A Python list is an array of pointers to objects, each allocated separately on the heap, so it uses more memory than a NumPy array holding the same values.
3. Math operations on a Python list take significantly longer than the same operations on a NumPy array.

In [7]:
start_val = 0
stop_val = 5_000_000

In [8]:
%%timeit

[i**2 for i in range(start_val, stop_val)]

349 ms ± 7.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### NumPy array

1. NumPy arrays are contiguous blocks of memory with homogeneous data types, stored more compactly.

In [9]:
%%timeit

np.arange(start_val, stop_val) ** 2

19.8 ms ± 111 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
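
The memory difference described above can be estimated as well. The sketch below is a rough comparison, not a benchmark: `sys.getsizeof` reports CPython-specific object sizes, and the array length `n` is an arbitrary choice.

```python
import sys

import numpy as np

n = 1_000  # arbitrary example length

py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # fixed dtype so the size is predictable

# A list stores pointers; each int object is allocated separately.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# An array stores raw 8-byte integers in one contiguous buffer.
array_bytes = np_arr.nbytes

print("List bytes (pointers + int objects):", list_bytes)
print("Array data bytes:", array_bytes)
```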


### 3.2 Python `range` vs NumPy `arange`

NumPy's `arange()` supports a fractional (floating-point) step value, whereas Python's `range()` function **does not**: it accepts only integers.

In [10]:
try:
    print(np.arange(0, 5, 0.5))
    print(range(0, 5, 0.5))

except TypeError as err:
    print("Error:", err)

[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5]
Error: 'float' object cannot be interpreted as an integer


## 4 Properties of NumPy array

Some important properties of a NumPy array are exposed as attributes:

1. `ndim`: Number of dimensions (axes) of the NumPy array.
2. `shape`: Tuple with the length of each dimension, e.g. (rows, columns) for a two dimensional array.
3. `size`: Total number of elements across all dimensions of the NumPy array.
4. `dtype`: Datatype of the elements in the NumPy array.

Create sample NumPy arrays to test these properties.

In [11]:
arr_1 = np.array([1, 2, 3, 4, 5])

arr_2 = np.array(
    [
        [1, 2.2, 3.0, 4.0, 5],
        [6.0, 7, 8, 9, 0],
    ]
)

arr_3 = np.array(
    [
        [
            [1, 2, 3, 4, 5],
            [6, 7, 8, 9, 0],
        ],
        [
            [1, 2, 3, 4, 5],
            [6, 7, 8, 9, 0],
        ],
    ]
)

#### `ndim`

In [12]:
arr_1.ndim

1

In [13]:
arr_2.ndim

2

In [14]:
arr_3.ndim

3

#### `shape`

In [15]:
arr_1.shape

(5,)

In [16]:
arr_2.shape

(2, 5)

In [17]:
arr_3.shape

(2, 2, 5)

#### `size`

In [18]:
arr_1.size

5

In [19]:
arr_2.size

10

In [20]:
arr_3.size

20

#### `dtype`

In [21]:
arr_1.dtype

dtype('int64')

In [22]:
arr_2.dtype

dtype('float64')
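
The properties are related to each other: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A minimal sketch, using an arbitrary example array built with `np.zeros()`:

```python
import math

import numpy as np

arr = np.zeros((2, 3, 4))  # example array; the shape is arbitrary

print(arr.ndim)   # 3
print(arr.shape)  # (2, 3, 4)
print(arr.size)   # 24

# ndim is the length of shape; size is the product of its entries.
print(arr.ndim == len(arr.shape))        # True
print(arr.size == math.prod(arr.shape))  # True
```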

## 5 Datatype

### 5.1 Supported datatypes

### 5.2 Typecasting

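When values of different types are mixed in one array, NumPy converts every element to the most general type present, in this order of precedence:

String > Float > Integer > Boolean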

#### Automatic type casting

In [23]:
arr = np.array([1, "abcd", 3.14, True])
arr

array(['1', 'abcd', '3.14', 'True'], dtype='<U32')

In [24]:
arr.dtype

dtype('<U32')

In [25]:
arr = np.array(["a", 3.14])
arr.dtype

dtype('<U32')

> **Note**:
> 
> `U` in `<U32` stands for Unicode, `<` marks little-endian byte order, and `32` is the maximum string length in characters.
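
The hierarchy String > Float > Integer > Boolean can be checked one level at a time. The sketch below inspects `dtype.kind`, a one-letter code for the type family (`'i'` signed integer, `'f'` float, `'U'` Unicode string), so the result does not depend on the platform's default integer width.

```python
import numpy as np

# Boolean + Integer -> Integer
print(np.array([True, 2]).dtype.kind)            # i

# Boolean + Integer + Float -> Float
print(np.array([True, 2, 3.5]).dtype.kind)       # f

# Anything + String -> String (Unicode)
print(np.array([True, 2, 3.5, "x"]).dtype.kind)  # U
```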

#### Using `dtype` property
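
A minimal sketch, assuming this heading refers to the `dtype` argument of `np.array()`, which sets the element type explicitly when the array is created instead of leaving it to automatic casting:

```python
import numpy as np

ints = np.array([1, 2, 3])                      # dtype inferred from the values
floats = np.array([1, 2, 3], dtype=np.float64)  # dtype requested explicitly

print(ints.dtype.kind)  # i
print(floats.tolist())  # [1.0, 2.0, 3.0]
print(floats.dtype)     # float64
```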

#### Using `astype` function
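
A minimal sketch of converting an existing array with `astype()`. The sample values are made up for illustration.

```python
import numpy as np

floats = np.array([1.7, -2.3, 3.0])

# astype returns a new array; the original is left unchanged.
# Converting float to integer drops the fractional part.
ints = floats.astype(np.int64)

print(ints.tolist())    # [1, -2, 3]
print(floats.tolist())  # [1.7, -2.3, 3.0]

# Numeric strings can be converted as well.
parsed = np.array(["1.5", "2"]).astype(np.float64)
print(parsed.tolist())  # [1.5, 2.0]
```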

## 6 Indexing

Slicing uses the syntax `arr[start:stop:step]`. As with `arange()`, `start` is inclusive and `stop` is exclusive.
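
A short sketch of how the three slice parts behave, including omitted and negative values:

```python
import numpy as np

arr = np.arange(10)  # values 0 through 9

print(arr[2:8:3].tolist())  # [2, 5]
print(arr[:3].tolist())     # [0, 1, 2]
print(arr[-3:].tolist())    # [7, 8, 9]
print(arr[::-1].tolist())   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

Omitting `start` or `stop` runs the slice to the corresponding end of the array, and a negative `step` walks the array backwards.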

### 6.1 Fetch

#### Fetch Usecases

##### 1 Fetch even numbers

In [26]:
arr = np.arange(0, 10)
arr[::2]

array([0, 2, 4, 6, 8])

##### 2 Fetch odd numbers

In [27]:
arr = np.arange(1, 10)
arr[::2]

array([1, 3, 5, 7, 9])

### 6.2 Updating / Broadcasting

#### Update Usecases

##### 1 Update array

In [28]:
arr = np.arange(5)
arr[2:4] = 0
arr

array([0, 1, 0, 0, 4])

In [29]:
arr = np.array([0, 1, 2, 3, 4, 5])
arr[4:] = 10
arr

array([ 0,  1,  2,  3, 10, 10])

In [30]:
arr_1 = np.array([1, 2, 3, 4, 5])
arr_2 = np.array([8, 7, 6])

print(f"Array 1:", arr_1)
print(f"Array 2:", arr_2)

print(f"{arr_1[2:]} in Array 1 will be updated with {arr_2[::-1]} from Array 2")
arr_1[2:] = arr_2[::-1]

arr_1

Array 1: [1 2 3 4 5]
Array 2: [8 7 6]
[3 4 5] in Array 1 will be updated with [6 7 8] from Array 2


array([1, 2, 6, 7, 8])

In [31]:
arr_1 = np.array([1, 2, 3, 4, 5])
arr_2 = np.array([8, 7, 6])

print(f"Array 1:", arr_1)
print(f"Array 2:", arr_2)

print(f"{arr_1[3:]} in Array 1 will be updated with {arr_2[::-2]} from Array 2")
arr_1[3:] = arr_2[::-2]

arr_1

Array 1: [1 2 3 4 5]
Array 2: [8 7 6]
[4 5] in Array 1 will be updated with [6 8] from Array 2


array([1, 2, 3, 6, 8])

In [32]:
try:
    arr_1 = np.array([1, 2, 3, 4, 5])
    arr_2 = np.array([0, 9, 8, 7, 6])

    print(f"Array 1:", arr_1)
    print(f"Array 2:", arr_2)

    print(f"{arr_1[3:]} in Array 1 will be updated with {arr_2[::-2]} from Array 2")

    arr_1[3:] = arr_2[::-2]

    arr_1

except ValueError as err:
    print("Error:", err)

Array 1: [1 2 3 4 5]
Array 2: [0 9 8 7 6]
[4 5] in Array 1 will be updated with [6 8 0] from Array 2
Error: could not broadcast input array from shape (3,) into shape (2,)


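> **Note**:
> 1. Broadcasting requires the assigned value to be compatible with the slice being updated: either a single value, or an array of the same length as the slice.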

## 7 Filtering

### 7.1 Index based filtering

### 7.2 Conditional filtering

In [33]:
x = np.array([-5, 9, 20, 25, -3, 5, 16, 10, -8])

x[(x >= -5) & (x <= 15)] *= -1

print(x)

[  5  -9  20  25   3  -5  16 -10  -8]
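
The expression inside the brackets above is a boolean mask: an array of `True`/`False` values with the same shape as `x`. The sketch below builds the same mask separately so it can be inspected.

```python
import numpy as np

x = np.array([-5, 9, 20, 25, -3, 5, 16, 10, -8])

# The comparison is evaluated element-wise and yields a boolean array.
mask = (x >= -5) & (x <= 15)
print(mask.tolist())
# [True, True, False, False, True, True, False, True, False]

print(x[mask].tolist())        # [-5, 9, -3, 5, 10]
print(np.count_nonzero(mask))  # 5
```

The parentheses around each comparison are required because `&` binds more tightly than `>=` and `<=`.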


In [34]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr[::2] = range(10, 50, 10)
print(arr)

[10  2 20  4 30  6 40  8]


In [35]:
marks_arr = np.array([20, 35, 68, 82, 83, 70, 90])

# Select marks for each grade: distinction (>= 80) and first division (60 to 79).
distinction = marks_arr[marks_arr >= 80]
first_div = marks_arr[(marks_arr >= 60) & (marks_arr < 80)]

distinction_count = len(distinction)
first_div_count = len(first_div)

ratio = distinction_count / first_div_count

round(ratio, 2)

1.5

## 8 Import

NumPy can read data from text or CSV files. [Documentation][2]

[2]: https://numpy.org/doc/2.1/user/how-to-io.html
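
A minimal sketch of reading CSV data with `np.loadtxt()`. To keep it self-contained, the CSV content is a made-up string wrapped in `io.StringIO` rather than a file on disk.

```python
import io

import numpy as np

# Hypothetical CSV content, used in place of a file on disk.
csv_text = """name,score
a,7
b,9
c,10
"""

# Skip the header row and read only the numeric column (index 1).
scores = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=1)

print(scores.tolist())  # [7.0, 9.0, 10.0]
print(scores.dtype)     # float64
```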

## 9 Project

Let's solve a small use case using NumPy.

### 9.1 Problem Statement

#### Net Promoter Score (NPS)

1. Net Promoter Score (NPS) is a metric used to gauge customer loyalty and satisfaction.
2. It measures the likelihood of customers recommending a company, product, or service to others.
3. The score is derived from a single survey question asking respondents to rate their likelihood of recommending on a scale of 0 to 10.

Compute NPS for [this dataset][3] from Kaggle.

[3]: https://www.kaggle.com/datasets/charlottetu/npsbank

### 9.2 Solution

```
NPS = (% of Promoters) - (% of Detractors)

Where

Promoters  := Score > 8  and Score <= 10
Detractors := Score >= 0 and Score < 7
```
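
As a worked example of the formula, the sketch below computes NPS for ten made-up survey responses. The function name `net_promoter_score` is hypothetical.

```python
import numpy as np


def net_promoter_score(scores):
    """Percentage of promoters minus percentage of detractors."""
    scores = np.asarray(scores)
    promoters = np.count_nonzero(scores > 8)   # scores of 9 or 10
    detractors = np.count_nonzero(scores < 7)  # scores below 7
    return (promoters - detractors) / scores.size * 100


sample = [10, 9, 9, 8, 7, 6, 10, 2, 9, 10]  # made-up survey responses
print(round(net_promoter_score(sample), 2))  # 40.0
```

Here 6 of the 10 responses are promoters (60%) and 2 are detractors (20%), so the NPS is 60 - 20 = 40.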

### 9.3 Implementation

In [36]:
import os


input_file_path = os.path.join("data", "NPStimeseries.csv")
print("Importing CSV file from path:", input_file_path)

Importing CSV file from path: data\NPStimeseries.csv


In [37]:
# fname = Input file's relative path.
# delimiter = Columns are separated by a comma.
# skiprows = Skip the first row containing header text.
# usecols = Use only the column at index 6 (zero-based, i.e. the 7th column) containing NPS information.
ratings = np.loadtxt(fname=input_file_path, delimiter=",", skiprows=1, usecols=6)
print(f"There are {ratings.size} ratings in CSV file.")

There are 5000 ratings in CSV file.


> **Note**:  
> `ratings.size` equals the total number of rows because the NumPy array is one dimensional (a vector).

In [38]:
total_count = ratings.size
promoters = ratings[ratings > 8]
detractors = ratings[ratings < 7]

pro_per = promoters.size / total_count * 100
det_per = detractors.size / total_count * 100

nps = pro_per - det_per
print("NPS:", round(nps, 2))

NPS: 11.8
