## NumPy introduction

In [11]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.


In [12]:
import numpy as np

# What is an array?

In NumPy, an array is a grid of values of the same type, indexed by a tuple of non-negative integers. Here's a simple definition and example:

# Definition: 

A NumPy array (ndarray) is a multi-dimensional, homogeneous container for numerical data.

# Short Example: 

import numpy as np

# Create a simple 1D array 

arr = np.array([1, 2, 3, 4]) 

print(arr)  # Output: [1 2 3 4]

In [13]:
arr = np.array([1, 2, 3, 4, 5])
print(arr)

[1 2 3 4 5]


# Key Features: 

 - Fixed-size, efficient storage.

 - Supports mathematical operations on entire arrays.

 - Can be 1D (vector), 2D (matrix), or n-dimensional. 
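As a minimal sketch of the features listed above, the snippet below applies one arithmetic expression to a whole array and compares it with an equivalent Python loop. The names `prices`, `with_tax`, and `mixed` are illustrative and do not appear elsewhere in the notebook.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])  # hypothetical sample data

# One expression operates on every element; no explicit loop is needed.
with_tax = prices * 1.5

# The equivalent pure-Python version needs an explicit loop.
with_tax_list = [p * 1.5 for p in prices.tolist()]

print(with_tax.tolist())   # [15.0, 30.0, 45.0]
print(with_tax_list)       # [15.0, 30.0, 45.0]

# Homogeneous storage: mixing ints and floats upcasts everything to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)         # float64
```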


In [14]:
import numpy as np
np.arange(6)

array([0, 1, 2, 3, 4, 5])

In [15]:
# One-dimensional array
one_d_array = np.array([10, 20, 30, 40, 50])
print("1D array:", one_d_array)

# Two-dimensional array
two_d_array = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:\n", two_d_array)

1D array: [10 20 30 40 50]
2D array:
 [[1 2 3]
 [4 5 6]]


Why do we have 1D, 2D, and 3D arrays, and where do we need them in data science?

In [None]:
In data science and programming, 1D, 2D, and 3D arrays (or higher-dimensional arrays) are used to represent and manipulate data efficiently.

 Here's why they exist and where they are applied: 
 

# 1. 1D Arrays (Vectors) 

- What? A single row or column of elements (e.g., [1, 2, 3]).

- Why? Simple, memory-efficient storage for linear data. 

- Data Science Use Cases: 

- Storing feature values (e.g., a single column in a dataset like [age_1, age_2, age_3]).

- Labels/target variables in supervised learning (e.g., y = [0, 1, 0] for binary classification).

- Time-series data (e.g., stock prices over time).

In [None]:
# example 

import numpy as np
arr_1d = np.array([1, 2, 3])  # 1D array

# 2. 2D Arrays (Matrices)
   
- What? A grid of rows and columns (e.g., a table).

- Why? Represents structured data with relationships between rows/columns.

- Data Science Use Cases:

- Tabular datasets (e.g., CSV/Excel files where rows = samples, columns = features).

- Image data (grayscale images as height × width matrices).

- Weight matrices in neural networks.

In [None]:
# example 

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])  # 2D array (2x3)

# 3. 3D Arrays (Tensors)

- What? A "cube" of data (multiple 2D arrays stacked).

- Why? Captures complex structures like color images or sequential data.

- Data Science Use Cases:

- Color Images: 3D shape = (height, width, channels) (e.g., RGB = 3 channels).

- Time-series + Features: 3D shape = (samples, timesteps, features) (used in RNNs).

- Video Data: 3D shape = (frames, height, width) for grayscale video; color video adds a channels axis and becomes 4D.

In [None]:
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 3D array (2x2x2)

Higher-Dimensional Arrays (4D+) 

- Use Cases:

- Batch Processing: 4D = (batch_size, height, width, channels) (common in deep learning).

- Medical Imaging: 3D MRI scans + time = 4D.
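To make the batch layout concrete, here is a small sketch that stacks two 3D "images" into one 4D batch. The arrays are placeholders filled with zeros, and the 4x4 size is an arbitrary assumption chosen to keep the output short.

```python
import numpy as np

# Hypothetical 4x4 RGB "images" filled with zeros: (height, width, channels)
image_a = np.zeros((4, 4, 3))
image_b = np.zeros((4, 4, 3))

# Stacking along a new leading axis yields (batch_size, height, width, channels)
batch = np.stack([image_a, image_b])

print(image_a.ndim, image_a.shape)  # 3 (4, 4, 3)
print(batch.ndim, batch.shape)      # 4 (2, 4, 4, 3)
```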

Key Differences 

| Dimension | Structure | Example Use Case |
|---|---|---|
| 1D | Vector | Feature columns, time-series |
| 2D | Matrix | Tabular data, grayscale images |
| 3D | Tensor | Color images, RNN input |
| 4D+ | Batch of tensors | CNN input (batches of images) |


Why Does This Matter in Data Science? 

1. Efficiency: Libraries like NumPy/PyTorch optimize operations based on array dimensions.

2. Model Inputs: Machine learning models expect specific shapes (e.g., CNNs need 3D/4D for images).

3. Data Representation: Higher dimensions capture real-world complexity (e.g., videos = 3D + time).
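A common case of the "specific shapes" point is turning a 1D column into a 2D array of shape (n_samples, n_features), which is the layout scikit-learn estimators generally expect. The sketch below uses a made-up `ages` column; only `reshape` is needed.

```python
import numpy as np

ages = np.array([23, 35, 41, 58])   # hypothetical 1D feature column
print(ages.shape)                   # (4,)

# -1 asks NumPy to infer that axis length from the total number of elements
ages_2d = ages.reshape(-1, 1)
print(ages_2d.shape)                # (4, 1)
```

Reshaping does not change the data, only how the same elements are indexed.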

Example Workflow: 

- 1D: Preprocess a column of data → [1.2, 3.4, 5.6].

- 2D: Feed a CSV → [[1, 2, 3], [4, 5, 6]] to a model.

- 3D: Train a CNN on RGB images → (256, 256, 3) tensors.

-----------------------------------------------------------------


TL;DR 

1D: Single-axis data (lists/vectors).

2D: Tables, matrices, grayscale images.

3D+: Color images, videos, batches of data.

Understanding dimensions helps structure data for algorithms (e.g., scikit-learn, TensorFlow).

In [None]:
identity = np.eye(5)   # creating an identity matrix
identity

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [18]:
identity.dtype

dtype('float64')

Array attributes

In [19]:
# Display attributes of the array 'arr'
print("arr.ndim (number of dimensions):", arr.ndim)
print("arr.shape (shape of array):", arr.shape)
print("arr.size (total number of elements):", arr.size)
print("arr.dtype (data type of elements):", arr.dtype)
print("arr.itemsize (size of each element in bytes):", arr.itemsize)
print("arr.nbytes (total bytes consumed):", arr.nbytes)

arr.ndim (number of dimensions): 1
arr.shape (shape of array): (5,)
arr.size (total number of elements): 5
arr.dtype (data type of elements): int64
arr.itemsize (size of each element in bytes): 8
arr.nbytes (total bytes consumed): 40
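The attributes are related to each other: `size` is the product of the entries in `shape`, and `nbytes` is `size` times `itemsize`. The sketch below checks this on a small 2D array; the name `grid` is illustrative, and the dtype is fixed to `int64` so the byte counts do not depend on the platform.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(grid.ndim)      # 2
print(grid.shape)     # (2, 3)
print(grid.size)      # 6
print(grid.itemsize)  # 8
print(grid.nbytes)    # 48

# size is the product of the shape; nbytes is size times itemsize
assert grid.size == grid.shape[0] * grid.shape[1]
assert grid.nbytes == grid.size * grid.itemsize
```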


Basic operations

In [20]:
# Addition
add_result = arr + one_d_array
print("Addition:", add_result)

# Subtraction
sub_result = one_d_array - arr
print("Subtraction:", sub_result)

# Element-wise multiplication
mul_result = arr * one_d_array
print("Element-wise Multiplication:", mul_result)

# Element-wise division
div_result = one_d_array / arr
print("Element-wise Division:", div_result)

Addition: [11 22 33 44 55]
Subtraction: [ 9 18 27 36 45]
Element-wise Multiplication: [ 10  40  90 160 250]
Element-wise Division: [10. 10. 10. 10. 10.]
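The operations above work because both arrays have the same shape. When shapes differ, NumPy tries to broadcast them. The following sketch shows a scalar and a 1D row being broadcast, and what happens when shapes are incompatible; `table` and `offsets` are hypothetical names.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# A scalar is broadcast across every element
print(arr * 2)          # [ 2  4  6  8 10]

# A 1D array of length 3 is broadcast across each row of a (2, 3) array
table = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = table + offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# Shapes that cannot be broadcast raise a ValueError
try:
    arr + offsets        # shapes (5,) and (3,) are incompatible
except ValueError as exc:
    print("ValueError:", exc)
```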


# NumPy (part 2)

In [21]:
import numpy as np

In [22]:
# np.empty allocates memory without initializing it, so the printed values are arbitrary
empty = np.empty((2, 3))
print(empty)

[[4.9e-324 9.9e-324 1.5e-323]
 [2.0e-323 2.5e-323 3.0e-323]]
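Because `np.empty` leaves its contents uninitialized, it is only appropriate when every element will be overwritten afterwards. When known starting values are needed, the initialized constructors sketched below are the usual alternatives (variable names are illustrative).

```python
import numpy as np

zeros = np.zeros((2, 3))         # every element is 0.0
ones = np.ones((2, 3))           # every element is 1.0
sevens = np.full((2, 3), 7)      # every element is 7

print(zeros)
print(ones)
print(sevens)
```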


In [23]:
x = np.arange(6) 
x

array([0, 1, 2, 3, 4, 5])

In [27]:
# array of even numbers from 0 to 22 

even = np.arange(0, 23, 2) 
even

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22])

In [28]:
# array of odd numbers from 1 to 21 (the stop value 23 is excluded)
odd = np.arange(1, 23, 2)
odd

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21])

In [29]:
# custom step: every third number from 0 up to (but not including) 23
diff = np.arange(0, 23, 3)
diff

array([ 0,  3,  6,  9, 12, 15, 18, 21])

In [30]:
# 10 evenly spaced values from 0 to 10 (both endpoints included)
lin = np.linspace(0, 10, num=10)
lin

array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

In [31]:
# another linspace example: 3 evenly spaced values from 0 to 10
lin = np.linspace(0, 10, num=3)
lin

array([ 0.,  5., 10.])
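The practical difference between the two functions is what you specify: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default. A short side-by-side sketch, using the real `retstep` and `endpoint` parameters of `np.linspace`:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
print(np.arange(0, 10, 2.5).tolist())        # [0.0, 2.5, 5.0, 7.5]

# linspace: you choose the number of points; the stop value is included
points, step = np.linspace(0, 10, num=5, retstep=True)
print(points.tolist())                       # [0.0, 2.5, 5.0, 7.5, 10.0]
print(step)                                  # 2.5

# endpoint=False excludes the stop value, matching the arange call above
no_end = np.linspace(0, 10, num=4, endpoint=False)
print(no_end.tolist())                       # [0.0, 2.5, 5.0, 7.5]
```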

In [32]:
# set the dtype explicitly when creating an array

x = np.ones(2, dtype=np.int64) 
x

array([1, 1])

What is the difference between int16, int32, and int64?

What do these names mean, and why would we choose one over another?

The terms int32, int64, int16, etc., refer to integer data types of different sizes.
The number (16, 32, 64) indicates how many bits each integer occupies in memory.
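The bit width determines the range of representable values, and NumPy can report that range directly through `np.iinfo`. A minimal sketch:

```python
import numpy as np

# Query the bit width and value range of each signed integer type
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(f"{info.dtype}: {info.bits} bits, min={info.min}, max={info.max}")
```

For an n-bit signed integer the range is -2^(n-1) to 2^(n-1) - 1, which is why `int8` tops out at 127.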

In [None]:
import pandas as pd

# Key differences between int8, int16, int32, and int64 in NumPy
# (the unsigned ranges apply to the matching uint8, uint16, uint32, and uint64 types)


data = {
    "Type": ["int8", "int16", "int32", "int64"],
    "Bits": [8, 16, 32, 64],
    "Bytes": [1, 2, 4, 8],
    "Range (Signed)": ["-128 to 127", "-32,768 to 32,767", "-2,147,483,648 to 2,147,483,647", "-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807"],
    "Range (Unsigned)": ["0 to 255", "0 to 65,535", "0 to 4,294,967,295", "0 to 18,446,744,073,709,551,615"],
    "Common Usage": [
        "Small numbers, saving memory",
        "Medium-range integers (e.g., audio samples)",
        "Default in many languages, general use",
        "Large numbers (e.g., timestamps, big calculations)"
    ]
}

df_types = pd.DataFrame(data)
print(df_types)

# Why Use Different Integer Sizes?
1. Memory Efficiency

Smaller types (e.g., int8, int16) save memory when you don’t need large numbers.

Useful in embedded systems or large arrays (e.g., uint8 for 0-255 pixel values in images).

2. Range Requirements

int32 is the default integer size in many languages (e.g., int in C and Java) and is enough for most calculations; NumPy itself defaults to int64 on most 64-bit platforms, as the dtype outputs in this notebook show.

int64 is needed for very large numbers (e.g., file sizes, timestamps).

3. Performance Trade-offs

Smaller integers may be faster on some CPUs, but modern processors optimize for 32/64-bit.

Using unnecessarily large types wastes memory and bandwidth.
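The memory argument is easy to check with `nbytes`. The sketch below stores the same 1,000 values as `int64` and as `int16`; the array contents are arbitrary and were chosen so that they fit in the smaller type.

```python
import numpy as np

values = np.arange(1000, dtype=np.int64)   # values 0..999
print(values.nbytes)                        # 8000

# 0..999 fits in int16 (max 32,767), so the cast is lossless here
compact = values.astype(np.int16)
print(compact.nbytes)                       # 2000
print(np.array_equal(values, compact))      # True
```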

In [34]:
import numpy as np

smallNum = np.int8(100)      # 1 byte (-128 to 127)
mediumNum = np.int32(50000)  # 4 bytes (~-2.1 billion to 2.1 billion)
bigNum = np.int64(1e18)      # 8 bytes (~-9.2 quintillion to 9.2 quintillion)

## When to use which

Use int8/int16: For small numbers (e.g., counters, sensor readings).

Use int32: Default choice for most applications.

Use int64: When dealing with very large numbers (e.g., financial calculations, nanoseconds in time).
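The main risk of picking a type that is too small is overflow: array arithmetic in a fixed-width integer type wraps around instead of growing. The sketch below shows this with `int8` and one way to avoid it; `readings` is a made-up example array.

```python
import numpy as np

readings = np.array([100, 120], dtype=np.int8)

# 100 + 100 = 200 does not fit in int8 (max 127), so the result wraps around
wrapped = readings + readings
print(wrapped)                      # [-56 -16]

# Casting to a wider type before the arithmetic avoids the overflow
safe = readings.astype(np.int16) + readings.astype(np.int16)
print(safe)                         # [200 240]
```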

## Modern Systems Default to 32/64-bit
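Python's built-in int has arbitrary precision, and JavaScript's Number is a 64-bit floating-point value. NumPy, by contrast, uses fixed-width integers (int64 by default on most 64-bit platforms).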

In C/C++, int is usually 32-bit, while long can be 32 or 64-bit depending on the system.

In [35]:
# Sort the array 'arr' in ascending order
sorted_arr = np.sort(arr)
print("Sorted arr:", sorted_arr)

Sorted arr: [1 2 3 4 5]
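Since `arr` was already in ascending order, the effect of sorting is easier to see on unsorted data. The sketch below uses a hypothetical `scores` array and also shows descending order and `np.argsort`, which returns the indices that would sort the array.

```python
import numpy as np

scores = np.array([42, 7, 19, 3])   # hypothetical unsorted data

print(np.sort(scores))        # [ 3  7 19 42]  (returns a sorted copy)
print(scores)                 # [42  7 19  3]  (original is unchanged)
print(np.sort(scores)[::-1])  # [42 19  7  3]  (descending order)
print(np.argsort(scores))     # [3 1 2 0]      (indices that would sort it)
```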


In [37]:
a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10]) 


In [40]:
# concatenate these arrays 
c = np.concatenate((a, b)) 
c

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [41]:
# Concatenate two 2D arrays along rows (axis=0)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
concatenated_2d = np.concatenate((arr1, arr2), axis=0)
print(concatenated_2d)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]


In [42]:
# Concatenate arr1 and arr2 along columns (axis=1)
concatenated_2d_cols = np.concatenate((arr1, arr2), axis=1)
print(concatenated_2d_cols)

[[1 2 5 6]
 [3 4 7 8]]
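For 2D arrays, `np.vstack` and `np.hstack` are shorthand for the two concatenations shown above. The sketch below also shows the error raised when shapes do not line up on the axes that are not being joined.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

rows = np.vstack((arr1, arr2))   # same as concatenate(..., axis=0)
cols = np.hstack((arr1, arr2))   # same as concatenate(..., axis=1) for 2D input

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)

# Shapes must match on every axis except the one being joined
try:
    np.concatenate((arr1, np.array([[9, 9, 9]])), axis=0)
except ValueError as exc:
    print("ValueError:", exc)
```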


## Three-dimensional arrays

In [43]:
# Stack 'a' and 'b' into a (2, 5) array, then reshape to add a leading axis: (1, 2, 5)
three_d_array = np.stack([a, b]).reshape(1, 2, 5)
print(three_d_array)
print("Shape:", three_d_array.shape)

[[[ 1  2  3  4  5]
  [ 6  7  8  9 10]]]
Shape: (1, 2, 5)
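Unlike `np.concatenate`, `np.stack` joins arrays along a new axis, and its `axis` argument controls where that axis goes. The sketch below shows both placements, plus `np.newaxis` as an alternative to `reshape` for adding a leading axis.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

stacked = np.stack([a, b])             # new axis 0 -> shape (2, 5)
paired = np.stack([a, b], axis=1)      # new axis 1 -> shape (5, 2)
with_batch = stacked[np.newaxis, ...]  # prepend an axis -> shape (1, 2, 5)

print(stacked.shape, paired.shape, with_batch.shape)
# (2, 5) (5, 2) (1, 2, 5)
```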


In [44]:
import numpy as np

# Create a 3D array with dimensions 2x3x4
array_3d = np.array([
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]
])

print(array_3d)
print("Shape:", array_3d.shape)  # Output: (2, 3, 4)

[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]]
Shape: (2, 3, 4)


In [46]:
arr.size

5

In [48]:
array_3d.dtype

dtype('int64')

In [50]:
array_3d.shape

(2, 3, 4)
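The shape (2, 3, 4) can be read as 2 blocks, each with 3 rows and 4 columns, and an index supplies one position per axis. The sketch below rebuilds the same values with `np.arange(...).reshape(...)` so that it runs on its own.

```python
import numpy as np

array_3d = np.arange(1, 25).reshape(2, 3, 4)  # same values as the 2x3x4 array above

print(array_3d[0].shape)      # (3, 4): the first 3x4 block
print(array_3d[1, 2])         # [21 22 23 24]: last row of the second block
print(array_3d[1, 2, 3])      # 24: a single element
print(array_3d[:, 0, :])      # first row of every block
# [[ 1  2  3  4]
#  [13 14 15 16]]
```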

# Indexing and slicing

In [51]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8]) 
a

array([1, 2, 3, 4, 5, 6, 7, 8])

In [52]:
# Indexing and slicing examples with array 'a'

# Access the first element
first_element = a[0]
print("First element:", first_element)

# Access the last element
last_element = a[-1]
print("Last element:", last_element)

# Slice: elements from index 2 up to, but not including, index 5
slice_2_to_5 = a[2:5]
print("Elements from index 2 to 4:", slice_2_to_5)

# Slice: every other element
every_other = a[::2]
print("Every other element:", every_other)

# Reverse the array
reversed_a = a[::-1]
print("Reversed array:", reversed_a)

First element: 1
Last element: 8
Elements from index 2 to 4: [3 4 5]
Every other element: [1 3 5 7]
Reversed array: [8 7 6 5 4 3 2 1]
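One detail worth knowing about the slices above is that a basic slice is a view of the original data, not a copy. The sketch below shows the consequence, how `.copy()` avoids it, and a boolean mask as another common way to select elements.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# A basic slice is a view: writing through it changes the original array
view = a[2:5]
view[0] = 99
print(a)                 # [ 1  2 99  4  5  6  7  8]

# .copy() gives independent data
independent = a[2:5].copy()
independent[0] = -1
print(a[2])              # 99 (unchanged)

# Boolean mask: select elements that satisfy a condition
print(a[a % 2 == 0])     # [2 4 6 8]
```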
