In [1]:
import numpy as np

In [14]:
# Creation of 2 matrices 
mat_1 = np.array([[4,3], [5,6]])
mat_2 = np.array([[5,6], [7,8]])
print("Matrix 1  \n", mat_1, "\n")
print("Matrix 2 \n", mat_2, "\n")

Matrix 1  
 [[4 3]
 [5 6]] 

Matrix 2 
 [[5 6]
 [7 8]] 



In [21]:
# Addition and subtraction operation
print("Addition: \n", np.add(mat_2, mat_1), "\n")
print("Subtraction: \n", np.subtract(mat_2, mat_1))

Addition: 
 [[ 9  9]
 [12 14]] 

Subtraction: 
 [[1 3]
 [2 2]]


In [20]:
# shape and size of array
print("Shape: ")
print(mat_1.shape)
print("")
print("Size: ")
print(mat_1.size)

Shape: 
(2, 2)

Size: 
4


### Understanding the difference between dense and sparse matrices
The main difference between dense and sparse matrices lies in the way they store and represent data.

A dense matrix is one where most of the elements are non-zero. Dense matrices are typically represented as 2D arrays, where each element of the array corresponds to a value in the matrix. Every element is stored, whether it is zero or not, so memory use grows with the full number of rows times columns. Operations on dense matrices are generally straightforward and efficient, as the data is contiguous in memory.

On the other hand, a sparse matrix is one where the majority of elements are zero. Sparse matrices are often encountered in real-world scenarios where data is inherently sparse, such as text data or social networks. Instead of storing every element, sparse formats store only the non-zero values together with information about their positions. For example, the coordinate (COO) format keeps a row and column index for each value, while the compressed sparse row (CSR) format keeps column indices plus one offset per row. This reduces memory usage significantly when the matrix is sparse enough, but some operations may be slower because of the extra indexing needed to work with the compressed format.

The choice between using a dense or sparse matrix representation depends on the specific characteristics of the data and the operations that need to be performed. Dense matrices are more suitable when most of the elements are non-zero and memory is not a major concern. Sparse matrices, on the other hand, are preferable when the data is sparse, as they offer memory efficiency and can provide computational advantages for certain algorithms tailored to exploit the sparsity.

In [27]:
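import numpy as np
import scipy.sparse
# a small matrix stored densely as a nested Python list (mostly zeros)
dense_matrix = [[0,0], [0,17], [16,0]]
print(dense_matrix)
# convert to compressed sparse row (CSR) format, which keeps only the non-zero entries
sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)
# printing shows each stored value as (row, column) value
print(sparse_matrix)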

[[0, 0], [0, 17], [16, 0]]
  (1, 1)	17
  (2, 0)	16
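
The output above lists only the two non-zero entries. As a minimal sketch (reusing the same 3x2 example, with illustrative variable names that are not part of the original cell), the three arrays that SciPy's CSR format keeps internally can be inspected directly:

```python
import numpy as np
import scipy.sparse

dense = np.array([[0, 0], [0, 17], [16, 0]])
sparse = scipy.sparse.csr_matrix(dense)

# CSR stores three 1-D arrays instead of the full 2-D grid
print("data:   ", sparse.data)     # the non-zero values, row by row
print("indices:", sparse.indices)  # column index of each stored value
print("indptr: ", sparse.indptr)   # row i's values are data[indptr[i]:indptr[i+1]]

# converting back recovers the original dense layout
restored = sparse.toarray()
print(restored)
```

For a matrix this small the bookkeeping arrays cost more than they save; the memory benefit only appears when the matrix is large and mostly zero.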


In [33]:
# transpose matrix 
print("Matrix: ")
print(mat_1)
print("")
print("Transpose:")
print(mat_1.T)

Matrix: 
[[4 3]
 [5 6]]

Transpose:
[[4 5]
 [3 6]]


### What is Normalization?
Normalization, in the context of data processing and analysis, refers to the process of transforming data to a common scale or range. The goal of normalization is to bring the values of different variables or features onto a similar scale, enabling fair comparisons and eliminating biases that may arise due to varying magnitudes or units.

Normalization is particularly important when dealing with datasets that contain features with significantly different scales. For example, consider a dataset that includes two features: "age" ranging from 0 to 100 and "income" ranging from 0 to 1,000,000. If these features are used in a machine learning model without normalization, the "income" feature may dominate the learning process due to its larger values, leading to biased results.

There are various techniques for normalization, but two common methods are:

**Min-Max Scaling (Normalization):** This technique rescales the data to a specified range, typically between 0 and 1. The formula for min-max scaling is:

    X_normalized = (X - X_min) / (X_max - X_min)

Here, X represents the original value, X_min is the minimum value of the feature, and X_max is the maximum value of the feature. Min-max scaling preserves the relative relationships between the data points but compresses the range of values.
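
As a small sketch of the formula above, assuming a made-up `ages` array (not part of the original data), min-max scaling with NumPy might look like this:

```python
import numpy as np

# hypothetical feature values, chosen only for illustration
ages = np.array([20.0, 30.0, 40.0, 60.0])

x_min = ages.min()
x_max = ages.max()

# X_normalized = (X - X_min) / (X_max - X_min)
ages_scaled = (ages - x_min) / (x_max - x_min)
print(ages_scaled)
```

The smallest value maps to 0 and the largest to 1. Note that this sketch would divide by zero if every value in the feature were identical, so a real pipeline would need to handle that case.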

**Z-Score Standardization:** This technique transforms the data to have a mean of 0 and a standard deviation of 1. It is also known as standardization. The formula for z-score standardization is:

    X_standardized = (X - X_mean) / X_std
    
Here, X represents the original value, X_mean is the mean of the feature, and X_std is the standard deviation of the feature. Z-score standardization centers the data around the mean and scales it by the standard deviation.
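
The same formula can be sketched in NumPy. The `values` array below is a hypothetical example, and the sketch assumes the population standard deviation (NumPy's default, `ddof=0`):

```python
import numpy as np

# hypothetical feature values, chosen only for illustration
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

x_mean = values.mean()
x_std = values.std()  # population standard deviation (ddof=0)

# X_standardized = (X - X_mean) / X_std
standardized = (values - x_mean) / x_std
print(standardized)

# after standardization the mean is ~0 and the standard deviation is ~1
print(standardized.mean(), standardized.std())
```

Passing `ddof=1` to `std` would use the sample standard deviation instead, which gives slightly different scaled values on small datasets.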

The choice of normalization technique depends on the specific requirements of the data and the analysis or model being used. Normalization is commonly applied as a preprocessing step before feeding the data into machine learning algorithms to improve their performance and ensure fair comparisons between features.


In [37]:
# getting the diagonal of the matrix
print("diagonal of matrix: \n",mat_1.diagonal())
print("")
# sum of the diagonal elements (the trace)
print("sum of diagonal: ", mat_1.diagonal().sum())

diagonal of matrix: 
 [4 6]

sum of diagonal:  10


### Imputing Techniques

**Imputing techniques**, in the context of data analysis and machine learning, are methods used to fill in missing or incomplete data values. Missing data can occur for various reasons, such as data collection errors, sensor failures, or incomplete survey responses. Imputing techniques aim to estimate or substitute missing values to maintain the integrity and usefulness of the data.

**Mean/Median/Mode Imputation:** This technique involves replacing missing values with the mean (for numerical data), median (for data with outliers or skewed distributions), or mode (for categorical data) of the available values for that feature.
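
A minimal sketch of these three strategies follows. The `values` and `colors` data are hypothetical, with missing entries marked as `NaN` and `None` respectively; this is one possible way to do it with NumPy and the standard library, not a definitive implementation:

```python
import numpy as np
from collections import Counter

# hypothetical numeric feature with missing entries marked as NaN
values = np.array([1.0, np.nan, 3.0, 100.0, np.nan])
missing = np.isnan(values)

# nanmean / nanmedian ignore NaN when computing the statistic
mean_filled = np.where(missing, np.nanmean(values), values)
median_filled = np.where(missing, np.nanmedian(values), values)
print(mean_filled)
print(median_filled)

# hypothetical categorical feature with missing entries marked as None
colors = ["red", None, "blue", "red", None]
mode = Counter(c for c in colors if c is not None).most_common(1)[0][0]
mode_filled = [mode if c is None else c for c in colors]
print(mode_filled)
```

Here the outlier 100 pulls the mean up to roughly 34.7 while the median stays at 3, which illustrates why the median is often preferred for skewed data.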

In [39]:
print("Mean: ", np.mean(mat_1))
print("Median: ", np.median(mat_1))
print("Standard Deviation: ", np.std(mat_1))

Mean:  4.5
Median:  4.5
Standard Deviation:  1.118033988749895


In [42]:
print("Determinant: ", np.linalg.det(mat_1))
print("Matrix Multiplication: \n", np.matmul(mat_1, mat_2))

Determinant:  9.000000000000002
Matrix Multiplication: 
 [[41 48]
 [67 78]]
