# Basics:

### Lists :
- A built-in data structure for storing collections of values, which may be numeric, categorical, or mixed
- Well suited to storing small datasets or feature names

#### Lists for Multi-dimension data :
- Nested lists can represent multi-dimensional data, where each level corresponds to a dimension
- Accessing and modifying elements becomes cumbersome as nesting deepens, since every level needs its own index

In [1]:
# Storing a row from a dataset:
row = [1,2,3,4,5,"One","Two","Three"] # A list can hold heterogeneous data

# Storing a dataset: each row can hold different data types and have a different length
dataset = [
    [1,2,3,4,5,6],
    ["One","Two","Three","Four","Five"],
    [1,2,3,4,"Five","Six","Seven"],
    [1.2,2.1,3.4,4.3,4.5,5.4,5.6,6.5]
]

# Accessing the stored values.
# Values are accessed by chaining indices, similar to matrix coordinates (indices start at 0).
print(row[3]) # Value: 4
print(dataset[1][3]) # Value: Four

4
Four
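
Because a nested list has no notion of columns, pulling one column out takes an explicit comprehension or a transpose, which is part of why access gets cumbersome. A minimal sketch, using a small hypothetical `table` that is not part of the cells above:

```python
# Hypothetical 3x3 table stored as a list of rows
table = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# A column has to be collected row by row
second_column = [r[1] for r in table]
print(second_column)  # [2, 5, 8]

# zip(*table) transposes rows into columns
columns = [list(col) for col in zip(*table)]
print(columns[0])  # [1, 4, 7]
```

Note that `zip` stops at the shortest row, so this only behaves like a transpose when every row has the same length.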


In [26]:
# 3D representation of data using a list
list_data = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]]
]

# Accessing data from the 3D list
print(list_data[0][2][1]) # Value: 6

#Modifying elements
list_data[0][2][1] = 10
print(list_data[0][2][1]) # Value: 10

6
10


In [27]:
# Calculating the dimensions of the nested list defined above (assumes every sub-list at a level has the same length)

samples = len(list_data)                  # Number of samples (outermost list)
features = len(list_data[0])              # Number of features (next level)
channels = len(list_data[0][0])           # Number of channels (innermost list)

print(f"Dimensions: {samples}x{features}x{channels}")

Dimensions: 2x3x2
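
The per-level `len` calls above can be generalized to any nesting depth. The helper below is a hypothetical sketch (the name `nested_shape` is an assumption, not a built-in) and it only inspects the first element at each level, so it assumes the list is regular rather than ragged:

```python
def nested_shape(data):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]  # descend into the first element only
    return tuple(shape)

cube = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
print(nested_shape(cube))  # (2, 3, 2)
```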


### Pandas Dataframe :
- A DataFrame is a 2D data structure similar to a spreadsheet, with labeled rows (index) and labeled columns.
- Used for loading, cleaning, and transforming datasets
- Handles mixed types (numeric + categorical), with one data type per column

#### Pandas Dataframe for Multi-dimension data :
- Pandas primarily handles 2D tabular data
- For higher dimensions, use MultiIndex or pivot tables to represent multi-dimensional relationships.

In [28]:
import pandas as pd

# The DataFrame constructor from pandas is used here to wrap a dictionary: each key becomes a column name.
# Since each key maps to its own list, different columns can hold different data types.
pd_data = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "class": ["setosa", "versicolor", "virginica"]
})

# Selecting the feature columns
# The .values attribute returns the underlying data as a NumPy array.
# Double brackets [[...]] pass a list of column names, which selects several columns at once and returns a DataFrame.
X = pd_data[["sepal_length", "sepal_width", "petal_length", "petal_width"]].values
y = pd_data["class"].values

print(X)
print(y)

[[5.1 3.5 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.3 3.3 6.  2.5]]
['setosa' 'versicolor' 'virginica']
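
Since each column carries its own data type, numeric and categorical columns can also be separated by type instead of by name. A small sketch on a reduced, hypothetical copy of the table above (`df` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_length": [1.4, 4.7, 6.0],
    "class": ["setosa", "versicolor", "virginica"],
})

# Each column has its own dtype
print(df.dtypes)

# Split the columns by type rather than listing names by hand
numeric_part = df.select_dtypes(include="number")
categorical_part = df.select_dtypes(exclude="number")

print(list(numeric_part.columns))      # ['sepal_length', 'petal_length']
print(list(categorical_part.columns))  # ['class']
```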


In [29]:
import pandas as pd

# Two parallel lists of labels are defined, one per index level (sample, feature)
# A DataFrame itself is 2D; additional dimensions are encoded as levels of the index
pd_2D_data = [
    ["sample1", "sample1", "sample2", "sample2"],
    ["feature1", "feature2", "feature1", "feature2"]
]

# MultiIndex.from_arrays combines the label lists into a MultiIndex (a hierarchical index)
# A MultiIndex is not a DataFrame; it only labels the rows (or columns) of one
index = pd.MultiIndex.from_arrays(pd_2D_data, names=["Sample", "Feature"])

# Here a DataFrame is built with the MultiIndex as its row index
# Each of the four values is labeled by a (Sample, Feature) pair
pd_2D_data = pd.DataFrame([1, 2, 3, 4], index=index, columns=["Value"])
print(pd_2D_data)
print('--------------------------')
# Accessing data using the "loc" attribute
print(pd_2D_data.loc["sample1"])  # Accessing data for sample1

                  Value
Sample  Feature        
sample1 feature1      1
        feature2      2
sample2 feature1      3
        feature2      4
--------------------------
          Value
Feature        
feature1      1
feature2      2


In [36]:
# Checking dimensions
print("Shape:", pd_2D_data.shape)

Shape: (4, 1)
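
The shape `(4, 1)` shows that the extra dimension lives in the index, not in the table itself. One way to make the sample-by-feature layout explicit is to select with one label per level, or to move a level into the columns with `unstack`. A minimal sketch that rebuilds an equivalent frame under the hypothetical names `long_df` and `wide_df`:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["sample1", "sample2"], ["feature1", "feature2"]],
    names=["Sample", "Feature"],
)
long_df = pd.DataFrame({"Value": [1, 2, 3, 4]}, index=index)

# Select a single cell by giving one label per index level
print(long_df.loc[("sample1", "feature2"), "Value"])  # 2

# Move the "Feature" level into the columns: one row per sample
wide_df = long_df["Value"].unstack("Feature")
print(wide_df.shape)  # (2, 2)
```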


### Numpy Array :
- NumPy arrays are multi-dimensional, homogeneously typed arrays optimized for numerical computations
- Efficient storage and manipulation of numerical data
- Matrix operations and numerical computations
- Supports vectorized operations

#### Numpy Array for Multi-dimension data :
- NumPy arrays natively support multi-dimensional data (e.g., 3D, 4D arrays)
- Efficient for numerical computations, slicing, and broadcasting.

In [30]:
import numpy as np

# Here a 2D list is converted into an array using the array function from numpy
# A NumPy array holds homogeneous data, i.e. every element shares one dtype (float64 here)
np_data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5]
])

#Accessing elements
# Here the data access format is represented as : A[row, column]
print(np_data[0, 1]) # Value: 3.5
print(np_data[:, 0]) # Print values from the first column
print(np_data[0, :]) # Print values from the first row

3.5
[5.1 7.  6.3]
[5.1 3.5 1.4 0.2]


In [31]:
# 3D representation of data using numpy array

np_3D_data = np.array([
    [[1,2], [2,3], [3,4]],
    [[2,1], [3,2], [4,3]],
    [[5,4], [5,6], [7,6]]
])

#Accessing data using a similar co-ordinate system
print(np_3D_data[1, 1, 1]) #Value: 2
print(np_3D_data[:, :, 0]) # First element along the last axis, for every position in the first two axes

2
[[1 2 3]
 [2 3 4]
 [5 5 7]]


In [32]:
# Calculating the Shape and dimensions of np-array

print("Shape:", np_3D_data.shape)
print("Number of dimensions:", np_3D_data.ndim)

Shape: (3, 3, 2)
Number of dimensions: 3
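
The vectorized operations and broadcasting mentioned in the bullets above can be illustrated with a short sketch. The array repeats the 2D feature values used earlier; the names `features`, `scaled`, and `centered` are illustrative:

```python
import numpy as np

features = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
])

# Vectorized: one expression is applied to every element, no Python loop
scaled = features * 10

# Broadcasting: the (4,) vector of column means is stretched across all 3 rows
column_means = features.mean(axis=0)
centered = features - column_means

print(centered.shape)         # (3, 4)
print(centered.mean(axis=0))  # approximately 0 for every column
```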


### Sparse Matrix :
- Used to store large datasets with many zero values efficiently, common in text data
- Saves memory by only storing non-zero elements

#### Sparse Matrix for Multi-dimension data :
- Sparse matrices generally represent 2D data.
- Multi-dimensional data can be flattened or converted into multiple sparse matrices.

In [33]:
from scipy.sparse import csr_matrix

# Creating a sparse matrix
# Here we first define the matrix using the numpy array.
sm_data = np.array([
    [1, 0, 0],
    [0, 2, 0],
    [0, 0, 3]
])
# Here the matrix is converted into a sparse matrix in CSR (Compressed Sparse Row) format using csr_matrix.
sparse_matrix = csr_matrix(sm_data)

# Representing the data
print(sparse_matrix)  # Prints sparse representation
print('--------------')
print(sparse_matrix.data)
print('--------------')
print(sparse_matrix.toarray())  # Converts to dense array

  (0, 0)	1
  (1, 1)	2
  (2, 2)	3
--------------
[1 2 3]
--------------
[[1 0 0]
 [0 2 0]
 [0 0 3]]
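
To see how only the non-zero elements are stored, the three arrays that back a CSR matrix can be inspected directly. A small sketch on the same diagonal matrix (the names `dense` and `sparse` are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([
    [1, 0, 0],
    [0, 2, 0],
    [0, 0, 3],
])
sparse = csr_matrix(dense)

print(sparse.nnz)      # 3 stored values out of 9 cells
print(sparse.data)     # [1 2 3]   the non-zero values
print(sparse.indices)  # [0 1 2]   column index of each stored value
print(sparse.indptr)   # [0 1 2 3] row i owns data[indptr[i]:indptr[i+1]]
```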


In [34]:
from scipy.sparse import csr_matrix

# Defining a 3D dataset (2 samples, 3 features, 2 channels)
sm_3D_data = np.array([
    [[1, 0], [0, 4], [5, 0]],
    [[0, 0], [9, 0], [0, 12]]
])

# Flatten the 3D data into 2D (samples x features*channels)
flattened_data = sm_3D_data.reshape(sm_3D_data.shape[0], -1)
print(flattened_data)
print("---------------------------------")

sparse_3D_matrix = csr_matrix(flattened_data)

print(sparse_3D_matrix)  # Sparse representation


[[ 1  0  0  4  5  0]
 [ 0  0  9  0  0 12]]
---------------------------------
  (0, 0)	1
  (0, 3)	4
  (0, 4)	5
  (1, 2)	9
  (1, 5)	12


In [37]:
# Checking dimensions
print("Shape:", sparse_3D_matrix.shape)

Shape: (2, 6)
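
Flattening is one option; the other one named in the bullets above is to keep several sparse matrices. The sketch below stores one matrix per channel and then rebuilds the 3D array, assuming the same (samples, features, channels) layout; `data_3d`, `per_channel`, and `restored` are illustrative names:

```python
import numpy as np
from scipy.sparse import csr_matrix

data_3d = np.array([
    [[1, 0], [0, 4], [5, 0]],
    [[0, 0], [9, 0], [0, 12]],
])

# One (samples x features) sparse matrix per channel
per_channel = [csr_matrix(data_3d[:, :, c]) for c in range(data_3d.shape[2])]
print([m.shape for m in per_channel])  # [(2, 3), (2, 3)]

# Stack the dense channels back along the last axis to recover the 3D array
restored = np.stack([m.toarray() for m in per_channel], axis=-1)
print(np.array_equal(restored, data_3d))  # True
```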


### Tensor :
- Libraries like PyTorch or TensorFlow use tensors for deep learning tasks

#### Tensor for Multi-dimension data :
- Tensors are explicitly designed for handling multi-dimensional data efficiently

In [14]:
import torch

# Here a nested list is converted into a tensor using the tensor function from torch
# All elements of a tensor share a single dtype (Python floats become float32 by default)
tensor_data = torch.tensor([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5]
])

print(tensor_data)

tensor([[5.1000, 3.5000, 1.4000, 0.2000],
        [7.0000, 3.2000, 4.7000, 1.4000],
        [6.3000, 3.3000, 6.0000, 2.5000]])


In [35]:
import torch

# Creating a 4D tensor (Batch size, Channels, Height, Width)
tensor_4D_data = torch.tensor([
    [
        [[1, 2], [3, 4]],
        [[5, 6], [7, 8]]
    ],
    [
        [[9, 10], [11, 12]],
        [[13, 14], [15, 16]]
    ]
])

# Accessing data
print(tensor_4D_data[0, 1, 1, 1])  # Output: tensor(8) (Batch 0, Channel 1, Height 1, Width 1)

# Operations on the tensor
tensor_4D_data += 1  # Increment all elements by 1
print(tensor_4D_data)


tensor(8)
tensor([[[[ 2,  3],
          [ 4,  5]],

         [[ 6,  7],
          [ 8,  9]]],


        [[[10, 11],
          [12, 13]],

         [[14, 15],
          [16, 17]]]])


In [38]:
# Checking dimensions
print("Shape:", tensor_4D_data.shape)
print("Number of dimensions:", tensor_4D_data.ndimension())

Shape: torch.Size([2, 2, 2, 2])
Number of dimensions: 4
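
Tensors interoperate with NumPy arrays and support reductions over chosen dimensions. The following sketch assumes PyTorch and NumPy are both installed and uses hypothetical values, not the tensor defined above:

```python
import numpy as np
import torch

# Hypothetical batch laid out as (Batch size, Channels, Height, Width)
np_batch = np.arange(16, dtype=np.float32).reshape(2, 2, 2, 2)

# from_numpy shares memory with the NumPy array rather than copying it
batch = torch.from_numpy(np_batch)
print(batch.shape)  # torch.Size([2, 2, 2, 2])
print(batch.dtype)  # torch.float32

# Reduce over height and width to get one mean per (batch, channel) pair
channel_means = batch.mean(dim=(2, 3))
print(channel_means.shape)  # torch.Size([2, 2])
```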
