# Data Structures & NumPy Arrays

![image.png](attachment:8e9e95f9-6a5d-4aa8-ba66-b62e882ee31b.png)

#### HEMANT THAPA

In [1]:
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics as st
import sympy as sp
import json
import requests
from io import StringIO

## 1. Introduction

A matrix is a rectangular array of numbers, symbols, or expressions arranged in rows and columns. It is commonly used in various fields, including mathematics, physics, computer science, and data science. Matrices are essential for representing and solving systems of linear equations, transformations, and various mathematical operations.

## Matrix Notation

A matrix is usually represented using a capital letter, such as A. It consists of rows and columns, and the dimensions of a matrix are often denoted as (m x n), where m represents the number of rows, and n represents the number of columns.

For example, a 2x3 matrix A is defined as:

$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23}
\end{bmatrix}
$$

Here, $a_{ij}$ represents the element in the $i$-th row and $j$-th column.

An example of a simple matrix equation is:

$$
A \cdot X = B
$$

Where:
- $A$ is a matrix.
- $X$ is a matrix representing unknown variables.
- $B$ is another matrix.

Matrix equations like this can be solved with linear algebra techniques. For example, when $A$ is square and invertible, the solution is $X = A^{-1} B$.

Matrices are fundamental in solving linear systems, data transformations, and many other mathematical and scientific applications.
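As a minimal sketch of the equation $A \cdot X = B$, the cell below solves a small system with `np.linalg.solve`. The 2x2 coefficients are hypothetical values chosen only for illustration, and the approach assumes $A$ is square and non-singular.

```python
import numpy as np

# Hypothetical system, chosen for illustration:
#   2x + 1y = 5
#   1x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([5.0, 10.0])

# np.linalg.solve expects a square, non-singular coefficient matrix.
X = np.linalg.solve(A, B)

print(A.shape)  # (number of rows, number of columns)
print(X)        # the unknowns x and y

# Check the solution by substituting it back into A · X = B.
print(np.allclose(A @ X, B))
```

Solving the system directly is usually preferred over computing the inverse of $A$ explicitly.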


## 2. Rows & Columns

$$
\begin{matrix}
\text{Student} & \text{Notebooks} & \text{Pens} \\
\text{Alice} & 5 & 3 \\
\text{Bob} & 4 & 2 \\
\text{Carol} & 6 & 1 \\
\end{matrix}
$$


$$
\begin{matrix}
 & \text{Column 1} & \text{Column 2} & \text{Column 3} \\
\text{Row 1} & \text{Student} & \text{Notebooks} & \text{Pens} \\
\text{Row 2} & \text{Alice} & 5 & 3 \\
\text{Row 3} & \text{Bob} & 4 & 2 \\
\text{Row 4} & \text{Carol} & 6 & 1 \\
\end{matrix}
$$


## 3. Differences Between Lists and Arrays 
#### 1. Data Type

- **Lists**: Lists in Python can store elements of different data types. A single list can contain integers, strings, floats, and other data types in any combination.

- **Arrays**: Arrays are typically used to store elements of the same data type. In Python, libraries like NumPy are commonly used to create arrays, and they are usually homogeneous, containing elements of the same data type, such as integers or floats.
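The data type difference can be sketched as follows. The variable names are illustrative only, and the exact integer dtype depends on the platform.

```python
import numpy as np

# A list can mix types freely; each element keeps its own type.
mixed_list = [1, "two", 3.0]
print([type(x).__name__ for x in mixed_list])

# A NumPy array has a single dtype shared by all of its elements.
int_array = np.array([1, 2, 3])
print(int_array.dtype)  # an integer dtype (platform-dependent width)

# Mixing integers and floats upcasts every element to a float dtype.
upcast_array = np.array([1, 2, 3.5])
print(upcast_array.dtype)
```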

#### 2. Performance

- **Lists**: Lists are flexible but not optimized for mathematical operations. When performing mathematical operations on list elements, you may need to use loops or list comprehensions, which can be slower for large datasets.

- **Arrays**: Arrays, especially NumPy arrays, are highly optimized for numerical operations. They allow for vectorized operations, which means you can perform element-wise operations on entire arrays without the need for explicit loops. This makes arrays more efficient for numerical and scientific computing.
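A small sketch of the difference, using hypothetical values: the list needs a comprehension to double each element, while the array does it in one vectorized expression.

```python
import numpy as np

prices = [1.5, 2.0, 3.25]  # hypothetical values

# With a list, each element is processed explicitly.
doubled_list = [p * 2 for p in prices]

# With an array, the multiplication applies to every element at once.
price_array = np.array(prices)
doubled_array = price_array * 2

print(doubled_list)
print(doubled_array)

# Note: multiplying the list itself repeats it instead of scaling it.
print(len(prices * 2))
```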

#### 3. Size and Memory Efficiency

- **Lists**: Lists are generally less memory-efficient than arrays. A list stores references to separate Python objects, and each object carries its own type information and other overhead.

- **Arrays**: Arrays, especially NumPy arrays, are more memory-efficient because they store elements of a single data type in one compact block, with the data type recorded once for the whole array.
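The comparison can be approximated with `sys.getsizeof` and the array's `nbytes` attribute. This is a rough sketch that assumes CPython; exact byte counts vary between Python versions and platforms, so only the array size is predictable here.

```python
import sys
import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)

# A list holds references to separate int objects, so count both
# the list container and the objects it points to.
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)

# An int64 array stores 8 bytes per element in a single buffer.
array_bytes = numbers_array.nbytes

print(list_bytes, array_bytes)
```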

#### 4. Functionality

- **Lists**: Lists are a built-in Python type, offering a variety of methods for adding, removing, and manipulating elements.

- **Arrays**: Arrays, particularly NumPy arrays, provide a wide range of mathematical and numerical functions for operations like linear algebra, statistical analysis, and more. NumPy is widely used in scientific and data analysis applications.
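As a brief illustration with made-up values, list methods change the contents of the container, while NumPy functions compute over the values.

```python
import numpy as np

# Built-in list methods modify the container in place.
pens_list = [3, 2, 1]
pens_list.append(4)   # [3, 2, 1, 4]
pens_list.remove(2)   # [3, 1, 4]
pens_list.sort()      # [1, 3, 4]

# NumPy functions operate on the numerical values.
pens_array = np.array(pens_list)
print(pens_array.sum())                # total of all elements
print(pens_array.mean())               # arithmetic mean
print(np.dot(pens_array, pens_array))  # 1*1 + 3*3 + 4*4
```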


## 4. Creating Arrays from Data Structures

In [2]:
#dictionary of student records
#outer dictionary
student_records_dict = {
    #inner dictionary
    'Alice': {
        'Notebooks': 5,
        'Pens': 3
    },
    'Bob': {
        'Notebooks': 4,
        'Pens': 2
    },
    'Carol': {
        'Notebooks': 6,
        'Pens': 1
    }
}

In [3]:
#printing dictionary
print(student_records_dict)

{'Alice': {'Notebooks': 5, 'Pens': 3}, 'Bob': {'Notebooks': 4, 'Pens': 2}, 'Carol': {'Notebooks': 6, 'Pens': 1}}


In [4]:
#printing the information for 'Alice' 
print(student_records_dict['Alice'])
#printing the information for 'Bob' 
print(student_records_dict['Bob'])
#printing the information for 'Carol' 
print(student_records_dict['Carol'])

{'Notebooks': 5, 'Pens': 3}
{'Notebooks': 4, 'Pens': 2}
{'Notebooks': 6, 'Pens': 1}


In [5]:
#checking datatypes
type(student_records_dict)

dict

In [6]:
#external dictionary of ages
ages_dict = {
    'Alice': 20,
    'Bob': 21,
    'Carol': 19
}

The `.items()` method is a built-in method for dictionaries. It returns a view of the dictionary's key-value pairs as tuples.

Each tuple in the sequence represents a key-value pair, with the key as the first element and the value as the second element.

In [7]:
#loop through the items in 'ages_dict' dictionary.
for i, j in ages_dict.items():
    print(i,j)

Alice 20
Bob 21
Carol 19


In [8]:
#loop through the items in the 'ages_dict' dictionary.
for i, j in ages_dict.items():
    #'i' is the student's name and 'j' is the age.
    #look up the student's record by name 'i' and set its 'Age' attribute to 'j'.
    student_records_dict[i]['Age'] = j

In [9]:
#updated dictionary
student_records_dict

{'Alice': {'Notebooks': 5, 'Pens': 3, 'Age': 20},
 'Bob': {'Notebooks': 4, 'Pens': 2, 'Age': 21},
 'Carol': {'Notebooks': 6, 'Pens': 1, 'Age': 19}}

#### Converting data to matrix

The `.keys()` method is a built-in method for dictionaries. It returns a view of the keys present in a dictionary, which are the unique identifiers for each entry.

In [10]:
#list of names
student_names = list(student_records_dict.keys())

In [11]:
print(student_names)

['Alice', 'Bob', 'Carol']


In [12]:
#list of ages
student_ages = []
#for loop to iterate over student_names
for i in student_names:
    #retrieve the 'Age' attribute from the 'student_records_dict' for the current student.
    age = student_records_dict[i]['Age']
    #append the retrieved age to the 'student_ages' list.
    student_ages.append(age)

In [13]:
student_ages

[20, 21, 19]

In [14]:
#empty list to store student stationery information.
student_stationary = []
#iterate through each student's name in the 'student_names' list.
for i in student_names:
    #retrieve the attribute dictionary for the current student from 'student_records_dict'.
    attributes_dict = student_records_dict[i]
    #extract the number of notebooks and pens from the attribute dictionary.
    notebooks = attributes_dict['Notebooks']
    pens = attributes_dict['Pens']
    #create a list, 'attributes', with the number of notebooks and pens for the current student.
    attributes = [notebooks, pens]
    #append 'attributes' to 'student_stationary' as the row for the current student.
    student_stationary.append(attributes)

In [15]:
student_stationary

[[5, 3], [4, 2], [6, 1]]
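The loops above could also be written as comprehensions. This is only an alternative sketch, not a replacement for the cells above; it redefines the dictionary so that it runs on its own and relies on dictionaries preserving insertion order (Python 3.7+).

```python
student_records_dict = {
    'Alice': {'Notebooks': 5, 'Pens': 3, 'Age': 20},
    'Bob': {'Notebooks': 4, 'Pens': 2, 'Age': 21},
    'Carol': {'Notebooks': 6, 'Pens': 1, 'Age': 19},
}

# Iterating over a dictionary yields its keys.
student_names = list(student_records_dict)

# One age per student, in the same order as the names.
student_ages = [record['Age'] for record in student_records_dict.values()]

# One [notebooks, pens] row per student.
student_stationary = [[record['Notebooks'], record['Pens']]
                      for record in student_records_dict.values()]

print(student_names)
print(student_ages)
print(student_stationary)
```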

#### Matrix Representation

In the following matrices, we have:

- Rows represent individual students.
- Columns represent attributes or characteristics.

#### Student Names Matrix

The student names are listed in rows, and there is one column for names.

| Student Names |
| ------------- |
| Alice         |
| Bob           |
| Carol         |


#### Student Ages Matrix (Column-wise)

The student ages are organized in a single column, with each student's age in a separate row.

| Student Ages |
| ------------ |
| 20           |
| 21           |
| 19           |

#### Student Stationery (Notebooks and Pens) Matrix

The student stationery attributes are arranged in rows, with each row representing a student and two columns for Notebooks and Pens.

| Student     | Notebooks | Pens |
| ----------  | --------- | ---- |
| Alice       | 5         | 3    |
| Bob         | 4         | 2    |
| Carol       | 6         | 1    |


In [16]:
#student_names list into an array
names_matrix = np.array(student_names)
#student_ages list into an array
ages_matrix = np.array(student_ages)
#student_stationary list into an array
stationary_matrix = np.array(student_stationary)

In [17]:
#name columns of matrix
names_matrix

array(['Alice', 'Bob', 'Carol'], dtype='<U5')

In [18]:
#age columns
ages_matrix

array([20, 21, 19])

In [19]:
#stationary rows and columns
stationary_matrix

array([[5, 3],
       [4, 2],
       [6, 1]])

#### Checking the Dimensions of an Array

In [20]:
print(f"Dimension of 'ages_matrix' Array: {ages_matrix.ndim}D")

Dimension of 'ages_matrix' Array: 1D


In [21]:
print(f"Dimension of 'stationary_matrix' Array: {stationary_matrix.ndim}D")

Dimension of 'stationary_matrix' Array: 2D
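Alongside `ndim`, the `shape` attribute reports the length of each dimension. The sketch below recreates the two arrays so that it runs on its own; the `ages_column` name is introduced here only for illustration.

```python
import numpy as np

ages_matrix = np.array([20, 21, 19])
stationary_matrix = np.array([[5, 3], [4, 2], [6, 1]])

# ndim is the number of axes; shape gives the length of each axis.
print(ages_matrix.ndim, ages_matrix.shape)
print(stationary_matrix.ndim, stationary_matrix.shape)

# Reshape the 1D ages into a 3x1 column so it has one row per student.
ages_column = ages_matrix.reshape(-1, 1)
print(ages_column.ndim, ages_column.shape)
```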


#### Converting into dataframe 

In [22]:
df_student_records_dict = pd.DataFrame(student_records_dict).transpose()

In [23]:
df_student_records_dict

|       | Notebooks | Pens | Age |
| ----- | --------- | ---- | --- |
| Alice | 5         | 3    | 20  |
| Bob   | 4         | 2    | 21  |
| Carol | 6         | 1    | 19  |


In [24]:
#selecting columns Notebooks
notebooks = df_student_records_dict['Notebooks'].values

In [25]:
notebooks

array([5, 4, 6], dtype=int64)

In [26]:
#selecting columns Pen
pens = df_student_records_dict['Pens'].values

In [27]:
pens

array([3, 2, 1], dtype=int64)

In [28]:
#selecting columns Age
ages = df_student_records_dict['Age'].values

In [29]:
ages

array([20, 21, 19], dtype=int64)

In [30]:
#converting the whole DataFrame into an array
#the outer list adds a leading dimension, so the result is 3D
student_records_array = np.array([df_student_records_dict])

In [31]:
student_records_array

array([[[ 5,  3, 20],
        [ 4,  2, 21],
        [ 6,  1, 19]]], dtype=int64)
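The extra pair of brackets in the output comes from wrapping the data in a list before calling `np.array`. As a hedged sketch, `DataFrame.to_numpy()` returns the 2D matrix directly, and wrapping that result in a list reproduces the leading axis of length 1. The names `records`, `array_2d`, and `array_3d` are illustrative.

```python
import numpy as np
import pandas as pd

records = {
    'Alice': {'Notebooks': 5, 'Pens': 3, 'Age': 20},
    'Bob': {'Notebooks': 4, 'Pens': 2, 'Age': 21},
    'Carol': {'Notebooks': 6, 'Pens': 1, 'Age': 19},
}
df = pd.DataFrame(records).transpose()

# One row per student, one column per attribute.
array_2d = df.to_numpy()

# Wrapping the matrix in a list adds a leading axis of length 1.
array_3d = np.array([array_2d])

print(array_2d.shape)
print(array_3d.shape)

# Indexing the first (and only) block recovers the 2D matrix.
print(np.array_equal(array_3d[0], array_2d))
```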

#### 1.2 Dictionaries of Lists

In [32]:
#dictionary of student data
student_data = {
    #Name, Notebook & Pens attributes in the form of list
    'Name': ['Alice', 'Bob', 'Carol'],
    'Notebooks': [5, 4, 6],
    'Pens': [3, 2, 1]
}

In [33]:
#ages
ages = [20, 21, 19]

In [34]:
#adding age attributes
student_data['Age'] = ages

In [35]:
#printing data
print(student_data)

{'Name': ['Alice', 'Bob', 'Carol'], 'Notebooks': [5, 4, 6], 'Pens': [3, 2, 1], 'Age': [20, 21, 19]}


In [36]:
#checking data types
type(student_data)

dict

In [37]:
#printing each list
print(student_data['Name'])
print(student_data['Notebooks'])
print(student_data['Pens'])
print(student_data['Age'])

['Alice', 'Bob', 'Carol']
[5, 4, 6]
[3, 2, 1]
[20, 21, 19]


In [38]:
#empty lists for different student attributes.
names = []
notebooks = []
pens = []
age = []
#iterate through the keys of the 'student_data' dictionary.
for i in student_data:
    #check if the attribute name matches a specific attribute and assign the values to the corresponding list.
    if i == 'Name':
        names = student_data[i]
    elif i == 'Notebooks':
        notebooks = student_data[i]
    elif i == 'Pens':
        pens = student_data[i]
    elif i == 'Age':
        age = student_data[i]

In [39]:
print(names)
print(notebooks)
print(pens)
print(age)

['Alice', 'Bob', 'Carol']
[5, 4, 6]
[3, 2, 1]
[20, 21, 19]


### `np.column_stack` in NumPy

You often encounter scenarios where you need to combine multiple arrays as columns to create a structured 2D array. This is where NumPy's `np.column_stack` function becomes valuable. It allows you to stack arrays horizontally, creating a new 2D array with each input array as a separate column.

### Syntax

The `np.column_stack` function has the following syntax:

```python
numpy.column_stack(tup)
```

Here, `tup` is a sequence of 1D or 2D arrays that all have the same first dimension.

In [40]:
#combining columns
combined_array = np.column_stack((notebooks, pens, age))

In [41]:
combined_array

array([[ 5,  3, 20],
       [ 4,  2, 21],
       [ 6,  1, 19]])
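To make the column layout easier to see, the sketch below stacks only two of the lists and compares the result with `np.vstack`, which places each input on its own row instead. The names `by_columns` and `by_rows` are illustrative.

```python
import numpy as np

notebooks = [5, 4, 6]
pens = [3, 2, 1]

# Each 1D input becomes one column: one row per student.
by_columns = np.column_stack((notebooks, pens))

# np.vstack places each input on its own row instead.
by_rows = np.vstack((notebooks, pens))

print(by_columns.shape)
print(by_rows.shape)

# The two layouts are transposes of each other.
print(np.array_equal(by_columns, by_rows.T))
```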

In [42]:
#creating dataframe
df_student_data = pd.DataFrame(student_data)

In [43]:
df_student_data

|   | Name  | Notebooks | Pens | Age |
| - | ----- | --------- | ---- | --- |
| 0 | Alice | 5         | 3    | 20  |
| 1 | Bob   | 4         | 2    | 21  |
| 2 | Carol | 6         | 1    | 19  |


#### SLICING ROWS & COLUMNS

#### `iloc` Attribute in Pandas

In Pandas, the `iloc` attribute is a powerful tool for integer-location based indexing, which allows you to select and manipulate data in DataFrames or Series by specifying row and column positions using integers. Here's how `iloc` works:

#### Selecting data points

- **Specific Element Selection:** You can use `df.iloc[row_index, column_index]` to select a particular element at the intersection of a row and column. Indexing starts at 0, so the first row and column have an index of 0.

- **Selecting Entire Rows:** Omitting the column index (`df.iloc[row_index]`) allows you to select an entire row. This is useful when you want to work with an entire row of data.

- **Selecting Entire Columns:** Using `:` in place of the row index (`df.iloc[:, column_index]`) selects an entire column. This is a common operation when you need data from a single variable or feature.

- **Range Selection:** Slicing can be used to select a range of rows or columns. For example, `df.iloc[1:4, 2:5]` selects rows 1 to 3 and columns 2 to 4.


- `iloc` is particularly useful when you want to retrieve data by position, regardless of the index labels. It's ideal for scenarios where you need to perform operations based on the numerical location of data.

- The `iloc` attribute simplifies the process of selecting and manipulating data within Pandas DataFrames and Series, making it a valuable tool in data analysis and manipulation tasks.
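The four selection patterns listed above can be sketched on the student DataFrame. The frame is rebuilt here so the example runs on its own, and the result names (`element`, `row`, `column`, `block`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Notebooks': [5, 4, 6],
    'Pens': [3, 2, 1],
    'Age': [20, 21, 19],
})

# Specific element: row position 1, column position 2 (Bob's pens).
element = df.iloc[1, 2]

# Entire row: the first row, returned as a Series.
row = df.iloc[0]

# Entire column: all rows of column position 1 ('Notebooks').
column = df.iloc[:, 1]

# Range: rows 0 to 1 and columns 1 to 2 (the end position is excluded).
block = df.iloc[0:2, 1:3]

print(element)
print(row)
print(column)
print(block)
```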


In [44]:
#selecting notebooks column
notebooks = df_student_data.iloc[:,1:2].values

`.values` extracts the underlying data from a Pandas DataFrame or Series as a NumPy array.

In [45]:
notebooks

array([[5],
       [4],
       [6]], dtype=int64)

In [46]:
#selecting pens column
pens = df_student_data.iloc[:,2:3].values

In [47]:
pens

array([[3],
       [2],
       [1]], dtype=int64)

In [48]:
#selecting age column
age = df_student_data.iloc[:,3:4].values

In [49]:
age

array([[20],
       [21],
       [19]], dtype=int64)
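The outputs above are 2D columns of shape (3, 1) because a slice such as `1:2` keeps the result as a DataFrame. As a sketch, selecting with a single integer position gives a 1D array instead. It uses `to_numpy()`, which the pandas documentation recommends over `.values`; the names `as_column` and `as_vector` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Notebooks': [5, 4, 6],
    'Pens': [3, 2, 1],
    'Age': [20, 21, 19],
})

# A slice keeps the result as a DataFrame, so the array is 2D.
as_column = df.iloc[:, 1:2].to_numpy()

# A single integer position returns a Series, so the array is 1D.
as_vector = df.iloc[:, 1].to_numpy()

print(as_column.shape)
print(as_vector.shape)

# ravel() flattens the 2D column into the same 1D values.
print(as_column.ravel())
```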

In [50]:
#converting into array
student_data_array = np.array([df_student_data], dtype=object)

In [51]:
print(f"Dimension of 'student_data_array' Array: {student_data_array.ndim}D")

Dimension of 'student_data_array' Array: 3D


In [52]:
#remove the 'Name' column
#slicing along the last axis of the 3D array
student_data_array_without_name = student_data_array[:, :, 1:]

In [53]:
#print the updated NumPy array without the 'name' column
print(student_data_array_without_name)

[[[5 3 20]
  [4 2 21]
  [6 1 19]]]
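After the 'Name' column is dropped, the remaining values are all integers, but the array still has `dtype=object`. One possible follow-up, sketched here under the assumption that every remaining value is an integer, is to convert it with `astype` so that numerical operations work on a numeric dtype.

```python
import numpy as np

object_array = np.array([[['Alice', 5, 3, 20],
                          ['Bob', 4, 2, 21],
                          ['Carol', 6, 1, 19]]], dtype=object)

# Drop the first column along the last axis; the dtype is still object.
numeric_part = object_array[:, :, 1:]
print(numeric_part.dtype)

# Convert to a numeric dtype once only numbers remain.
numeric_array = numeric_part.astype(np.int64)
print(numeric_array.dtype)

# Sum over the student axis: totals for notebooks, pens, and ages.
print(numeric_array.sum(axis=1))
```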


#### 1.3 Lists of Dictionaries

In [54]:
#list of students
student_list = [
    #Name, Notebook & Pens attributes in the form of dictionary
    {'Name': 'Alice', 'Notebooks': 5, 'Pens': 3},
    {'Name': 'Bob', 'Notebooks': 4, 'Pens': 2},
    {'Name': 'Carol', 'Notebooks': 6, 'Pens': 1}
]

In [55]:
#adding age from outside of the student list
ages = [20, 21, 19]
for i, j in enumerate(student_list):
    j['Age'] = ages[i]

In [56]:
print(student_list)

[{'Name': 'Alice', 'Notebooks': 5, 'Pens': 3, 'Age': 20}, {'Name': 'Bob', 'Notebooks': 4, 'Pens': 2, 'Age': 21}, {'Name': 'Carol', 'Notebooks': 6, 'Pens': 1, 'Age': 19}]


In [57]:
#checking data types
type(student_list)

list

In [58]:
#Alice
student_list[0]

{'Name': 'Alice', 'Notebooks': 5, 'Pens': 3, 'Age': 20}

In [59]:
#Bob
student_list[1]

{'Name': 'Bob', 'Notebooks': 4, 'Pens': 2, 'Age': 21}

In [60]:
#Carol
student_list[2]

{'Name': 'Carol', 'Notebooks': 6, 'Pens': 1, 'Age': 19}

In [61]:
#empty lists to store individual attributes.
name = []
notebooks = []
pens = []
age = []
#iterate through the 'student_list' and extract values for each attribute.
for i in student_list:
    #append the attribute from each student to the list.
    name.append(i['Name'])
    notebooks.append(i['Notebooks'])
    pens.append(i['Pens'])
    age.append(i['Age'])

In [62]:
print(name)
print(notebooks)
print(pens)
print(age)

['Alice', 'Bob', 'Carol']
[5, 4, 6]
[3, 2, 1]
[20, 21, 19]


In [63]:
#combining all columns into array
combined_array = np.column_stack((notebooks, pens, age))

In [64]:
combined_array

array([[ 5,  3, 20],
       [ 4,  2, 21],
       [ 6,  1, 19]])

In [65]:
#creating dataframe
df_student_list = pd.DataFrame(student_list)

In [66]:
df_student_list

|   | Name  | Notebooks | Pens | Age |
| - | ----- | --------- | ---- | --- |
| 0 | Alice | 5         | 3    | 20  |
| 1 | Bob   | 4         | 2    | 21  |
| 2 | Carol | 6         | 1    | 19  |


In [67]:
#create array from data frame
df_student_list_array = np.array([df_student_list])

In [68]:
df_student_list_array

array([[['Alice', 5, 3, 20],
        ['Bob', 4, 2, 21],
        ['Carol', 6, 1, 19]]], dtype=object)

In [69]:
#checking dimension
df_student_list_array.ndim

3

In [70]:
#slicing 3D array
student_data_array_without_name = df_student_list_array[:, :, 1:]

In [71]:
student_data_array_without_name

array([[[5, 3, 20],
        [4, 2, 21],
        [6, 1, 19]]], dtype=object)

#### 1.4 JSON (Serialized List of Dictionaries)

In [72]:
#student dataset
students = [
    {'Name': 'Alice', 'Notebooks': 5, 'Pens': 3, 'Ages': 20},
    {'Name': 'Bob', 'Notebooks': 4, 'Pens': 2, 'Ages': 22},
    {'Name': 'Carol', 'Notebooks': 6, 'Pens': 1, 'Ages': 19}
]

In [73]:
#converting the list of dictionaries to JSON format
#using indent=3 to pretty-print with three spaces per nesting level
json_data = json.dumps(students, indent=3)  

In [74]:
#print the JSON data
print(json_data)

[
   {
      "Name": "Alice",
      "Notebooks": 5,
      "Pens": 3,
      "Ages": 20
   },
   {
      "Name": "Bob",
      "Notebooks": 4,
      "Pens": 2,
      "Ages": 22
   },
   {
      "Name": "Carol",
      "Notebooks": 6,
      "Pens": 1,
      "Ages": 19
   }
]


In [75]:
json_data

'[\n   {\n      "Name": "Alice",\n      "Notebooks": 5,\n      "Pens": 3,\n      "Ages": 20\n   },\n   {\n      "Name": "Bob",\n      "Notebooks": 4,\n      "Pens": 2,\n      "Ages": 22\n   },\n   {\n      "Name": "Carol",\n      "Notebooks": 6,\n      "Pens": 1,\n      "Ages": 19\n   }\n]'

In [76]:
#parse the JSON data into a list of dictionaries
data = json.loads(json_data)

In [77]:
data

[{'Name': 'Alice', 'Notebooks': 5, 'Pens': 3, 'Ages': 20},
 {'Name': 'Bob', 'Notebooks': 4, 'Pens': 2, 'Ages': 22},
 {'Name': 'Carol', 'Notebooks': 6, 'Pens': 1, 'Ages': 19}]
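The round trip above can be checked directly: `json.dumps` produces a string, and `json.loads` turns it back into Python objects. This small sketch repeats the data so that it runs on its own.

```python
import json

students = [
    {'Name': 'Alice', 'Notebooks': 5, 'Pens': 3, 'Ages': 20},
    {'Name': 'Bob', 'Notebooks': 4, 'Pens': 2, 'Ages': 22},
    {'Name': 'Carol', 'Notebooks': 6, 'Pens': 1, 'Ages': 19},
]

# Serialize to a JSON string, then parse it back.
json_data = json.dumps(students, indent=3)
data = json.loads(json_data)

print(type(json_data).__name__)  # the serialized form is a string
print(type(data).__name__)       # the parsed form is a list again

# For string keys and integer values, the round trip is lossless.
print(data == students)
```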

In [78]:
#initialize empty lists for each attribute
names = []
ages = []
notebooks = []
pens = []
#iterate through the data and extract the attributes
for student in data:
    names.append(student['Name'])
    ages.append(student['Ages'])
    notebooks.append(student['Notebooks'])
    pens.append(student['Pens'])

In [79]:
print(names)
print(ages)
print(notebooks)
print(pens)

['Alice', 'Bob', 'Carol']
[20, 22, 19]
[5, 4, 6]
[3, 2, 1]


In [80]:
#reading the JSON string with pandas (StringIO wraps it as a file-like object)
df_json_student_data = pd.read_json(StringIO(json_data), orient='records')

In [81]:
df_json_student_data

|   | Name  | Notebooks | Pens | Ages |
| - | ----- | --------- | ---- | ---- |
| 0 | Alice | 5         | 3    | 20   |
| 1 | Bob   | 4         | 2    | 22   |
| 2 | Carol | 6         | 1    | 19   |


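In [82]:
#wrap the DataFrame in a list and convert it into a 3D object array
df_json_student_data = np.array([df_json_student_data])

In [83]:
#drop the first column ('Name') by slicing the last axis
df_json_student_data = df_json_student_data[:, :, 1:]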

In [84]:
df_json_student_data

array([[[5, 3, 20],
        [4, 2, 22],
        [6, 1, 19]]], dtype=object)
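An alternative to slicing by position is to select the numeric columns by name before converting, which gives a 2D array with an integer dtype instead of `object`. This is only a sketch; the `numeric_columns` list is an assumption about which columns are wanted.

```python
import pandas as pd

df = pd.DataFrame([
    {'Name': 'Alice', 'Notebooks': 5, 'Pens': 3, 'Ages': 20},
    {'Name': 'Bob', 'Notebooks': 4, 'Pens': 2, 'Ages': 22},
    {'Name': 'Carol', 'Notebooks': 6, 'Pens': 1, 'Ages': 19},
])

# Assumed selection: every column except the text column 'Name'.
numeric_columns = ['Notebooks', 'Pens', 'Ages']
numeric_array = df[numeric_columns].to_numpy()

print(numeric_array.dtype)
print(numeric_array.shape)
print(numeric_array)
```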
