# Introduction to Data Science - Lab 06 (Numpy)

### Table of Contents

* Installation and Importing
* Numpy Arrays
* Array Attributes
* Indexing and Slicing
* Data Types
* Math Functions
* Linear Algebra
* Statistics & Probability
* Structured Arrays
* Special Functions
* Missing Data
* Lab Task




# Numpy
Numpy is the core library for scientific computing in Python. It
provides a high-performance multidimensional array object and tools for
working with these arrays, and it supports mathematical, logical, and
statistical operations on them.

To use Numpy, we first need to import the `numpy` package. By
convention, we import it using the alias `np`. Then, when we want to use
modules or functions in this library, we preface them with `np.`

**Installation**

In [None]:
pip install numpy



**Importing**

In [None]:
import numpy as np

## Numpy Arrays

A numpy array is a grid of values, all of the same type, and is indexed
by a tuple of nonnegative integers. The number of dimensions is the rank
of the array; the shape of an array is a tuple of integers giving the
size of the array along each dimension.
We can create a `numpy` array by passing a Python list to `np.array()`.

In [None]:
a = np.array([1, 2, 3])  # Create a rank 1 array
a

array([1, 2, 3])

This creates the array shown below:

![](http://jalammar.github.io/images/numpy/create-numpy-array-1.png)

In [None]:
print(type(a), a.shape, a[0], a[1], a[2])
a[0] = 5                 # Change an element of the array
print(a)

<class 'numpy.ndarray'> (3,) 1 2 3
[5 2 3]


To create a `numpy` array with more dimensions, we can pass nested
lists, like this:

![](http://jalammar.github.io/images/numpy/numpy-array-create-2d.png)

![](http://jalammar.github.io/images/numpy/numpy-3d-array.png)

In [None]:
b = np.array([[1,2],[3,4]])   # Create a rank 2 array
print(b)

[[1 2]
 [3 4]]


In [None]:
print(b.shape)

(2, 2)


There are often cases when we want numpy to initialize the values of the
array for us. numpy provides functions like `ones()`, `zeros()`, and
`random.random()` for these cases. We just pass them the number of
elements we want them to generate:

![](http://jalammar.github.io/images/numpy/create-numpy-array-ones-zeros-random.png)

We can also use these methods to produce multi-dimensional arrays, as
long as we pass them a tuple describing the dimensions of the matrix we
want to create:

![](http://jalammar.github.io/images/numpy/numpy-matrix-ones-zeros-random.png)

![](http://jalammar.github.io/images/numpy/numpy-3d-array-creation.png)
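As a small sketch of the calls illustrated above (the variable names are chosen here for illustration only), passing a single integer gives a one-dimensional array, while passing a tuple gives a multi-dimensional one:

```python
import numpy as np

ones_1d = np.ones(3)             # three ones: [1. 1. 1.]
zeros_1d = np.zeros(3)           # three zeros: [0. 0. 0.]
rand_1d = np.random.random(3)    # three random floats in [0, 1)

ones_2d = np.ones((3, 2))        # 3 rows, 2 columns of ones

print(ones_1d, zeros_1d)
print(rand_1d.shape, ones_2d.shape)   # (3,) (3, 2)
```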

Sometimes, we need an array of a specific shape with “placeholder”
values that we plan to fill in with the result of a computation. The
`zeros` or `ones` functions are handy for this:

In [None]:
a = np.zeros((2,2))  # Create an array of all zeros
print(a)

[[0. 0.]
 [0. 0.]]


In [None]:
b = np.ones((1,2,3))   # Create an array of all ones
print(b)

[[[1. 1. 1.]
  [1. 1. 1.]]]


In [None]:
c = np.full((2,2,4), 7) # Create a constant array
print(c)

[[[7 7 7 7]
  [7 7 7 7]]

 [[7 7 7 7]
  [7 7 7 7]]]


In [None]:
d = np.eye(3)        # Create a 3x3 identity matrix
print(d)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [None]:
e = np.random.random((2,2)) # Create an array filled with random values
print(e)

[[0.42976554 0.86200914]
 [0.1015457  0.6176782 ]]


Numpy also has two useful functions for creating sequences of numbers:
`arange` and `linspace`.

The `arange` function accepts three arguments, which define the start
value, stop value of a half-open interval, and step size. (The default
step size, if not explicitly specified, is 1; the default start value,
if not explicitly specified, is 0.)

The `linspace` function is similar, but we can specify the number of
values instead of the step size, and it will create a sequence of evenly
spaced values.

In [None]:
f = np.arange(10,50,5)   # Create an array of values starting at 10 in increments of 5
print(f)

[10 15 20 25 30 35 40 45]


Note this ends on 45, not 50 (does not include the top end of the
interval).

In [None]:
g = np.linspace(0., 1., num=5)
print(g)

[0.   0.25 0.5  0.75 1.  ]


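Sometimes, we may want to construct an array from existing arrays by “stacking” them, either vertically or horizontally. We can use `vstack()` and `hstack()`, respectively. (`row_stack` is an older alias of `vstack`. `column_stack()` is not equivalent to `hstack()` for 1-D inputs: it places each input in its own column of a 2-D array.)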

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.vstack((a,b))

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.hstack((a,b))

array([1, 2, 3, 4, 5, 6])
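For 1-D inputs, `hstack` and `column_stack` give different results. A short sketch that makes the difference visible (the names `h` and `c` are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

h = np.hstack((a, b))          # joins the arrays end to end, shape (6,)
c = np.column_stack((a, b))    # each input becomes a column, shape (3, 2)

print(h.shape)   # (6,)
print(c)
# [[1 4]
#  [2 5]
#  [3 6]]
```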

## Array Attributes

In [None]:
# Array attributes give details about arrays

arr = np.arange(10)
arr.shape      # Shape of array
arr.ndim       # Number of dimensions
arr.dtype      # Data type of elements
arr.size       # Total number of elements
arr.itemsize   # Memory size of each element in bytes

8
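In a notebook, only the last expression in a cell is displayed, which is why the output above shows just the `itemsize`. A minimal sketch that prints each attribute explicitly, using a hypothetical 3x4 array `m`:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # 3x4 array holding 0..11

print(m.shape)     # (3, 4)
print(m.ndim)      # 2
print(m.size)      # 12
print(m.dtype)     # platform-dependent integer type, e.g. int64
print(m.itemsize)  # bytes per element, e.g. 8 for int64
print(m.nbytes)    # total bytes: size * itemsize
```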

## Indexing and Slicing

We can index and slice numpy arrays in all the ways we can slice Python
lists:

![](http://jalammar.github.io/images/numpy/numpy-array-slice.png)

And you can index and slice numpy arrays in multiple dimensions. If
slicing an array with more than one dimension, you should specify a
slice for each dimension:

![](http://jalammar.github.io/images/numpy/numpy-matrix-indexing.png)
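The slicing patterns pictured above can be sketched in code as follows; the arrays `data` and `m` are small illustrative examples:

```python
import numpy as np

data = np.array([1, 2, 3])
print(data[0])      # 1
print(data[0:2])    # [1 2]
print(data[1:])     # [2 3]
print(data[-2:])    # [2 3]

m = np.array([[1, 2], [3, 4], [5, 6]])
print(m[0, 1])      # 2  (row 0, column 1)
print(m[1:3])       # rows 1 and 2
print(m[0:2, 0])    # column 0 of rows 0 and 1: [1 3]
```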

Slicing returns a view of the original data, not a copy:

In [None]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print(b)

[[2 3]
 [6 7]]


Because `b` is a view into `a`, updating a value in `b` also updates `a`:

In [None]:
print(a[0, 1])
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])

2
77


In [None]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:3, :]  # Rank 2 view of the second and third rows of a
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)

[5 6 7 8] (4,)
[[ 5  6  7  8]
 [ 9 10 11 12]] (2, 4)


Boolean array indexing: Boolean array indexing lets you pick out
arbitrary elements of an array. Frequently this type of indexing is used
to select the elements of an array that satisfy some condition. Here is
an example:

In [None]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;
                    # this returns a numpy array of Booleans of the same
                    # shape as a, where each slot of bool_idx tells
                    # whether that element of a is > 2.

print(bool_idx)

[[False False]
 [ True  True]
 [ True  True]]
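The Boolean array by itself only marks positions. To actually pick out the matching elements, it is used as an index. A minimal sketch continuing the example:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
bool_idx = (a > 2)

selected = a[bool_idx]     # rank 1 array of the elements where bool_idx is True
print(selected)            # [3 4 5 6]
print(a[a > 2])            # same selection in one expression: [3 4 5 6]

a[a > 4] = 0               # a mask can also be used for assignment
print(a)
```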


## Data Types

In [None]:
# Working with data types (dtype)

arr = np.array([1,2,3])
arr.dtype        # Current type
arr.astype(float)  # Convert to float

array([1., 2., 3.])
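Two details are worth seeing in code: `astype` returns a new array rather than changing the original, and converting floats to integers truncates the fractional part. A short sketch with illustrative arrays `f`, `i`, and `g`:

```python
import numpy as np

f = np.array([1.7, 2.2, 3.9])
print(f.dtype)              # float64

i = f.astype(np.int32)      # conversion to integer truncates toward zero
print(i, i.dtype)           # [1 2 3] int32

g = np.array([1, 2, 3], dtype=np.float32)   # dtype chosen at creation time
print(g.dtype)              # float32

print(f.dtype)              # still float64: astype returned a new array
```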

## Math Functions
What makes working with `numpy` so powerful and convenient is that it
comes with many *vectorized* math functions for computation over
elements of an array. These functions are highly optimized and are
*very* fast - much, much faster than using an explicit `for` loop.

For example, let’s create a large array of random values and then sum it
both ways. We’ll use a `%%time` *cell magic* to time them.

In [None]:
a = np.random.random(100000000)
len(a)

100000000
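A minimal sketch of the two ways of summing, assuming `time.perf_counter` in place of the `%%time` magic so that it also runs outside a notebook, and a smaller array (`values`, a name chosen here) so that the explicit loop finishes quickly:

```python
import time
import numpy as np

values = np.random.random(1_000_000)   # smaller array, to keep the loop quick

start = time.perf_counter()
loop_total = 0.0
for v in values:                        # explicit Python loop
    loop_total += v
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_total = np.sum(values)              # vectorized sum
vec_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vec_seconds:.4f} s")
print(np.isclose(loop_total, vec_total))   # both approaches agree on the total
```

The exact timings depend on the machine, so none are quoted here.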

Look at the “Wall Time” in the output - note how much faster the
vectorized version of the operation is! This type of fast computation is
a major enabler of machine learning, which requires a *lot* of
computation.

Whenever possible, we will try to use these vectorized operations.

Some mathematical functions are available both as operator overloads and
as functions in the numpy module.

For example, you can perform an elementwise sum on two arrays using
either the `+` operator or the `add()` function.

![](http://jalammar.github.io/images/numpy/numpy-arrays-adding-1.png)

![](http://jalammar.github.io/images/numpy/numpy-matrix-arithmetic.png)

In [None]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


And this works for other operations as well, not only addition:

![](http://jalammar.github.io/images/numpy/numpy-array-subtract-multiply-divide.png)

In [None]:
# Elementwise difference; both produce the array
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
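Multiplication and division follow the same pattern. Note that `*` is elementwise, not matrix multiplication. A brief sketch using the same `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

print(x * y)    # elementwise product, same as np.multiply(x, y)
print(x / y)    # elementwise quotient, same as np.divide(x, y)
print(x * 2)    # a scalar is broadcast to every element
```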


In [None]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


We use the `dot()` function to compute inner
products of vectors, to multiply a vector by a matrix, and to multiply
matrices. `dot()` is available both as a function in the numpy module
and as an instance method of array objects:

![](http://jalammar.github.io/images/numpy/numpy-matrix-dot-product-1.png)

In [None]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

219
219


In [None]:
print(np.dot(x, y))

[[19 22]
 [43 50]]
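The matrix-by-vector case mentioned above works the same way. A small sketch with the same `x` and `v`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
v = np.array([9, 10])

print(x.dot(v))      # [29 67]
print(np.dot(x, v))  # [29 67]
print(x @ v)         # [29 67]
```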


Besides the functions that overload operators, Numpy also provides
many useful functions for performing computations on arrays, such as
`min()`, `max()`, `sum()`, and others:

![](http://jalammar.github.io/images/numpy/numpy-matrix-aggregation-1.png)

In [None]:
x = np.array([[1, 2], [3, 4], [5, 6]])

print(np.max(x))
print(np.min(x))
print(np.sum(x))

6
1
21


Not only can we aggregate all the values in a matrix using these
functions, but we can also aggregate across the rows or columns by using
the `axis` parameter:

![](http://jalammar.github.io/images/numpy/numpy-matrix-aggregation-4.png)

In [None]:
x = np.array([[1, 2], [5, 3], [4, 6]])

print(np.max(x, axis=0))  # Compute max of each column; prints "[5 6]"
print(np.max(x, axis=1))  # Compute max of each row; prints "[2 5 6]"

[5 6]
[2 5 6]


You can find the full list of mathematical functions provided by numpy
in the
[documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

Apart from computing mathematical functions using arrays, we frequently
need to reshape or otherwise manipulate data in arrays. The simplest
example of this type of operation is transposing a matrix; to transpose
a matrix, simply use the T attribute of an array object.

![](http://jalammar.github.io/images/numpy/numpy-transpose.png)

In [None]:
x = np.array([[1, 2], [3, 4], [5, 6]])

print(x)
print("transpose\n", x.T)

[[1 2]
 [3 4]
 [5 6]]
transpose
 [[1 3 5]
 [2 4 6]]
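Reshaping is the other common manipulation mentioned above. A minimal sketch using `reshape()` on an illustrative array `data`:

```python
import numpy as np

data = np.arange(1, 7)        # [1 2 3 4 5 6]

print(data.reshape(2, 3))     # 2 rows, 3 columns
print(data.reshape(3, 2))     # 3 rows, 2 columns
print(data.reshape(-1, 2))    # -1 lets numpy infer the row count (3 here)
```

The total number of elements must stay the same, so a 6-element array can become 2x3 or 3x2 but not 4x2.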


## Linear Algebra

In [None]:
# Linear algebra operations

A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])

np.dot(A,B)       # Matrix product (for 2-D arrays, same as A @ B)
A @ B             # Matrix multiplication operator

np.linalg.det(A)  # Determinant
np.linalg.inv(A)  # Inverse
np.linalg.eig(A)  # Eigenvalues and eigenvectors
np.linalg.solve(A, np.array([1,2]))  # Solve Ax=b

array([0. , 0.5])
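Only the last result is displayed in the cell above. A sketch that prints each result and checks two of them (the names `det`, `inv`, and `x` are illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([1, 2])

det = np.linalg.det(A)                         # approximately -2.0
inv = np.linalg.inv(A)
eigenvalues, eigenvectors = np.linalg.eig(A)   # eig returns two arrays
x = np.linalg.solve(A, b)                      # solves A @ x = b

print(det)
print(inv)
print(eigenvalues)
print(x)

# Sanity checks: A times its inverse is the identity, and x satisfies the system
print(np.allclose(A @ inv, np.eye(2)))     # True
print(np.allclose(A @ x, b))               # True
```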

## Statistics & Probability

In [None]:
# Statistics and probability

arr = np.array([1,2,3,4,5])
np.mean(arr)
np.median(arr)
np.std(arr)
np.percentile(arr, 75)

# Random numbers
np.random.seed(42)
np.random.normal(0,1,10)   # Normal distribution
np.random.binomial(n=10, p=0.5, size=5)  # Binomial distribution

array([4, 4, 4, 5, 5])
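A short sketch that prints the summary statistics individually and shows what seeding does. The `ddof=1` variant and the names `first` and `second` are additions for illustration:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(np.mean(arr))             # 3.0
print(np.median(arr))           # 3.0
print(np.std(arr))              # population standard deviation, about 1.414
print(np.std(arr, ddof=1))      # sample standard deviation, about 1.581
print(np.percentile(arr, 75))   # 4.0

# Re-seeding with the same value reproduces the same random draws
np.random.seed(42)
first = np.random.normal(0, 1, 3)
np.random.seed(42)
second = np.random.normal(0, 1, 3)
print(np.array_equal(first, second))   # True
```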

## Structured Arrays

In [None]:
# Structured arrays with named fields

dt = np.dtype([('name','S10'), ('age','i4')])
arr = np.array([('Alice',25),('Bob',30)], dtype=dt)
arr['name']
arr['age']

array([25, 30], dtype=int32)
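With `'S10'` the names are stored as byte strings, so `arr['name']` displays values such as `b'Alice'`. The sketch below assumes a Unicode field (`'U10'`) instead and shows filtering and sorting by field; the array name `people` is illustrative:

```python
import numpy as np

# 'U10' (Unicode, up to 10 characters) is used here instead of 'S10'
dt = np.dtype([('name', 'U10'), ('age', 'i4')])
people = np.array([('Alice', 25), ('Bob', 30)], dtype=dt)

print(people['name'])                         # ['Alice' 'Bob']
print(people[0])                              # first record
print(people[people['age'] > 26])             # records filtered by a field
print(np.sort(people, order='age')['name'])   # names ordered by age
```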

## Special Functions

In [None]:
# Special utility functions

arr = np.arange(5)

x, y = np.meshgrid(arr, arr)
np.tile(arr, 2)
np.repeat(arr, 3)
np.unique([1,2,2,3])
np.argsort([3,1,2])
np.diff([1,2,4,7])
np.gradient([1,2,4,7])
np.cumsum([1,2,3])
np.cumprod([1,2,3])

array([1, 2, 6])
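Since the cell above displays only the last result, here is a sketch that prints what each function returns, using a shorter `arr` to keep the output small:

```python
import numpy as np

arr = np.arange(3)                     # [0 1 2]

xx, yy = np.meshgrid(arr, arr)         # coordinate grids, each of shape (3, 3)
print(xx.shape, yy.shape)

print(np.tile(arr, 2))                 # [0 1 2 0 1 2]   whole array repeated
print(np.repeat(arr, 2))               # [0 0 1 1 2 2]   each element repeated
print(np.unique([1, 2, 2, 3]))         # [1 2 3]
print(np.argsort([3, 1, 2]))           # [1 2 0]   indices that would sort the input
print(np.diff([1, 2, 4, 7]))           # [1 2 3]   differences between neighbours
print(np.gradient([1, 2, 4, 7]))       # central differences: 1, 1.5, 2.5, 3
print(np.cumsum([1, 2, 3]))            # [1 3 6]
print(np.cumprod([1, 2, 3]))           # [1 2 6]
```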

## Missing Data

In [None]:
# Handling missing data

arr = np.array([1,2,np.nan,4])
np.nanmean(arr)
np.nanmax(arr)

import numpy.ma as ma
ma.masked_array(arr, mask=[0,0,1,0])
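To see why the `nan`-aware functions are needed, the sketch below compares them with the ordinary mean. `ma.masked_invalid` is used here as one way to build the mask automatically rather than listing it by hand:

```python
import numpy as np
import numpy.ma as ma

arr = np.array([1, 2, np.nan, 4])

print(np.mean(arr))        # nan: a single NaN propagates through the ordinary mean
print(np.nanmean(arr))     # mean of 1, 2 and 4, ignoring the NaN
print(np.isnan(arr))       # [False False  True False]

masked = ma.masked_invalid(arr)   # masks NaN and inf entries
print(masked.mean())              # same value as np.nanmean(arr)
```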

### Lab Task
#### Question # 01
1. Create a random 3x3 matrix M. Compute:
   * transpose, determinant, inverse
   * eigenvalues & eigenvectors
2. Create two matrices X (3x4) and Y (4x2). Perform matrix multiplication.
3. Solve the system of equations using NumPy:

$$
\begin{cases}
2x + y - z = 8 \\
-3x - y + 2z = -11 \\
-2x + y + 2z = -3
\end{cases}
$$

(Hint: Represent as $AX = B$.)

#### Question # 02
1. Create a 10×10 matrix of random integers between 1 and 100.
   * Extract the main diagonal and compute its mean.
   * Replace the last column with the sum of each row.
2. Generate a 5×5 matrix where each element is defined as A[i,j] = i^2 + j^2.
   * Do this without loops, using only NumPy broadcasting.
3. Create a 1D array of size 20 with random integers.
   * Replace all values greater than the mean with -1.
   * Extract the indices of the top 5 largest elements.

#### Question # 03
##### Stock Trading Decision Using Linear Regression  

You are an investor deciding whether to buy a stock or keep cash. You will use linear regression (implemented only with NumPy) to predict the next day’s price and guide your decision.  

##### **1. Generate Stock Prices (Synthetic Data)**  

We simulate prices with a **true trend** (linear growth) plus random noise:  

$$y_t = \text{intercept} + \text{slope} \cdot t + \epsilon_t$$  

where:  
- slope = 0.2 → true upward trend per day  
- intercept = 50.0 → starting baseline  
- $$\epsilon_t \sim \mathcal{N}(0, 1.5)$$ → random noise  

Formula in NumPy:  

$$\text{prices} = \text{intercept} + \text{slope} \cdot \text{days} + \text{noise}$$  

Represent days as:  

$$X = [1, 2, 3, \dots, 100]$$  

Represent prices as:  

$$y = [y_1, y_2, \dots, y_{100}]$$  

##### **2. Fit Linear Regression**  

Model:  

$$\hat{y} = mX + c$$  

where:
* m = slope
* c = intercept

Formulas:  

$$m = \frac{\sum (X - \bar{X})(y - \bar{y})}{\sum (X - \bar{X})^2}$$  

$$c = \bar{y} - m \cdot \bar{X}$$  
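As a sanity check on the two formulas, the sketch below applies them to toy data that lies exactly on the line $y = 2x + 1$. This is an illustrative dataset, not the stock prices for this task, and the formulas should recover the slope and intercept exactly:

```python
import numpy as np

# Toy data lying exactly on the line y = 2x + 1 (not the lab's stock data)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * X + 1.0

X_mean, y_mean = X.mean(), y.mean()

m = np.sum((X - X_mean) * (y - y_mean)) / np.sum((X - X_mean) ** 2)   # slope
c = y_mean - m * X_mean                                                # intercept

print(m, c)   # 2.0 1.0
```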


##### **3. Predict Next Day Price**  

$$\hat{y}_{101} = m \cdot 101 + c$$  


##### **4. Decision Rule (Buy or Cash)**  

- If $$\hat{y}_{101} > y_{100}$$ → **Buy the stock**  
- Else → **Keep cash**  

##### **5. Model Evaluation**

$$MSE = \frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2$$  


##### **6. Trading Decision Outcome**  

Assume you start with **\$1000**.  

- If decision = Buy:  
  $$\text{Final Value} = 1000 \cdot \frac{y_{101}}{y_{100}}$$  

- If decision = Cash:  
  $$\text{Final Value} = 1000$$  



In [1]:
x = [1, 2, 3]
y = x
x.append(4)
y = y + [5]
print(x, y)


[1, 2, 3, 4] [1, 2, 3, 4, 5]


In [2]:
for i in range(-1, -5, -1):
    print(i * " * ")








In [3]:
a, b, c = 5, 10, 0
if a < b < c or c == 0 and not a == b:
    print("True branch")
else:
    print("False branch")


True branch


In [4]:
s = "Python"
print(s[::-1][::-1][2:5])


tho


In [5]:
def tricky(x, lst=[]):
    lst.append(x)
    return lst

print(tricky(1))
print(tricky(2))
print(tricky(3, []))
print(tricky(4))


[1]
[1, 2]
[3]
[1, 2, 4]
