# ***Scenario: Student Performance Analysis***

We'll use the [Student Performance Data Set from the UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/320/student+performance). This dataset includes student grades along with demographic, social, and school-related features.

This is just an example of how to apply the array manipulation operations discussed in the tutorial to a real-world scenario.

## Load the Dataset

First, we'll load the dataset into a Pandas DataFrame and then convert it into a NumPy array for manipulation.

In [3]:
import pandas as pd
import numpy as np

# Load the dataset (a semicolon-separated file expected in the working directory)
path = "student-mat.csv"
df = pd.read_csv(path, sep=';')

# Display the first few rows
print(df.head())

  school sex  age address famsize Pstatus  Medu  Fedu     Mjob      Fjob  ...  \
0     GP   F   18       U     GT3       A     4     4  at_home   teacher  ...   
1     GP   F   17       U     GT3       T     1     1  at_home     other  ...   
2     GP   F   15       U     LE3       T     1     1  at_home     other  ...   
3     GP   F   15       U     GT3       T     4     2   health  services  ...   
4     GP   F   16       U     GT3       T     3     3    other     other  ...   

  famrel freetime  goout  Dalc  Walc health absences  G1  G2  G3  
0      4        3      4     1     1      3        6   5   6   6  
1      5        3      3     1     1      3        4   5   5   6  
2      4        3      2     2     3      3       10   7   8  10  
3      3        2      2     1     1      5        2  15  14  15  
4      4        3      2     1     2      5        4   6  10  10  

[5 rows x 33 columns]
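
The `sep=';'` argument matters because the file uses semicolons rather than commas between fields. The following is a minimal sketch of that effect, assuming a hypothetical three-row sample held in memory (the `sample_csv` string is illustrative and is not part of the dataset):

```python
import io

import pandas as pd

# Hypothetical sample in the same semicolon-separated layout as student-mat.csv
sample_csv = "school;sex;age;G3\nGP;F;18;6\nGP;F;17;6\nGP;M;15;10\n"

# With the default comma separator, every line is read as one single field
wrong = pd.read_csv(io.StringIO(sample_csv))

# With sep=';' the fields are split into separate columns
right = pd.read_csv(io.StringIO(sample_csv), sep=";")

print(wrong.shape)  # (3, 1)
print(right.shape)  # (3, 4)
```

Checking `df.shape` right after loading is a quick way to catch a wrong separator.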


## Convert to NumPy Array

We'll extract numerical columns for analysis.

In [10]:
# Select numerical columns
numerical_cols = df.select_dtypes(include=[np.number]).columns
print(numerical_cols)
data = df[numerical_cols].to_numpy()

# Display the shape of the data
print("Data shape:", data.shape)

Index(['age', 'Medu', 'Fedu', 'traveltime', 'studytime', 'failures', 'famrel',
       'freetime', 'goout', 'Dalc', 'Walc', 'health', 'absences', 'G1', 'G2',
       'G3'],
      dtype='object')
Data shape: (395, 16)
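
Selecting the numerical columns first keeps the resulting array numeric. As a small sketch, assuming a hypothetical two-row DataFrame named `toy`, converting mixed columns directly gives an `object` array, while the filtered version gives an integer array:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one text column and two integer columns
toy = pd.DataFrame({"school": ["GP", "MS"], "age": [18, 17], "G3": [6, 10]})

# Converting everything at once falls back to a generic object array
mixed = toy.to_numpy()

# Keeping only numeric columns preserves a numeric dtype
numeric = toy.select_dtypes(include=[np.number]).to_numpy()

print(mixed.dtype)    # object
print(numeric.dtype)  # an integer dtype (int64 on most platforms)
print(numeric.shape)  # (2, 2)
```

Numeric arrays are what functions such as `np.sort` and `np.argmax` are designed to work on efficiently.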


## Apply NumPy Functions

Now, let's apply various NumPy functions to this dataset.
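
Some of these functions return a copy of the data and others return a view that shares memory with the original. Here is a minimal sketch of the difference between `flatten()` and `ravel()`, assuming a small hypothetical array named `small`:

```python
import numpy as np

# Hypothetical 2x3 array used only for illustration
small = np.arange(6).reshape(2, 3)

flat_copy = small.flatten()  # always a new copy
flat_view = small.ravel()    # a view here, since the array is contiguous

flat_copy[0] = 99  # does not affect small
flat_view[1] = -1  # changes small[0, 1] as well

print(small)
print(np.shares_memory(small, flat_copy))  # False
print(np.shares_memory(small, flat_view))  # True
```

When the original array must stay untouched, `flatten()` (or an explicit `.copy()`) is the safer choice.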

In [14]:
# Reshape the data to a 3D array:
#   data.shape[0]: the number of rows (e.g., samples)
#   data.shape[1]: the number of columns (e.g., features)
#   1: adds a third dimension with size 1

reshaped_data = data.reshape(data.shape[0], data.shape[1], 1)

print("Reshaped data shape:", reshaped_data.shape)


# flatten() and ravel()
# Flatten the data into a 1D array (flatten() always copies; ravel() returns a view when possible).

flattened_data = data.flatten()
raveled_data = data.ravel()
print("Flattened data length:", len(flattened_data))
print("Raveled data length:", len(raveled_data))

# transpose() and swapaxes()
# Transpose the data and swap axes.

transposed_data = data.T
swapped_data = np.swapaxes(reshaped_data, 0, 2)
print("Transposed data shape:", transposed_data.shape)
print("Swapped data shape:", swapped_data.shape)

# concatenate(), vstack(), and hstack()

# Create a dummy array for demonstration
dummy_array = np.ones_like(data)

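concatenated_data = np.concatenate((data, dummy_array), axis=1)  # join along columns
vstacked_data = np.vstack((data, dummy_array))  # stack rows on top of each other
hstacked_data = np.hstack((data, dummy_array))  # same result as concatenate with axis=1 for 2D arrays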
print("Concatenated data shape:", concatenated_data.shape)
print("Vstacked data shape:", vstacked_data.shape)
print("Hstacked data shape:", hstacked_data.shape)

# split()
split_data = np.split(data, 2, axis=1)
print("Split data shapes:", [arr.shape for arr in split_data])

# sort(), argsort(), argmax(), argmin(), where(), nonzero()
sorted_data = np.sort(data, axis=0)
argsorted_data = np.argsort(data, axis=0)
max_indices = np.argmax(data, axis=0)
min_indices = np.argmin(data, axis=0)
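condition = data > 10
# For a boolean array, np.where(condition) and np.nonzero(condition)
# return the same tuple of (row indices, column indices)
where_indices = np.where(condition)
nonzero_indices = np.nonzero(condition)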
print("Sorted data shape:", sorted_data.shape)
print("Argsorted data shape:", argsorted_data.shape)
print("Max indices:", max_indices)
print("Min indices:", min_indices)
print("Where indices:", where_indices)
print("Nonzero indices:", nonzero_indices)

# broadcast_to(), expand_dims()
# Create a 1D array of ones (one value per column) to broadcast across all rows
broadcast_array = np.ones(data.shape[1], dtype=int)
broadcasted_data = np.broadcast_to(broadcast_array, data.shape)
expanded_data = np.expand_dims(data, axis=2)
print("Broadcasted data shape:", broadcasted_data.shape)
print("Expanded data shape:", expanded_data.shape)

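# unique(), in1d()
# Note: np.in1d flattens its input and is deprecated as of NumPy 2.0;
# np.isin is the recommended replacement and preserves the input shape.
unique_elements = np.unique(data)
in1d_result = np.in1d(data, [15, 16, 17])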
print("Unique elements:", unique_elements)
print("In1d result shape:", in1d_result.shape)




Reshaped data shape: (395, 16, 1)
Flattened data length: 6320
Raveled data length: 6320
Transposed data shape: (16, 395)
Swapped data shape: (1, 16, 395)
Concatenated data shape: (395, 32)
Vstacked data shape: (790, 16)
Hstacked data shape: (395, 32)
Split data shapes: [(395, 8), (395, 8)]
Sorted data shape: (395, 16)
Argsorted data shape: (395, 16)
Max indices: [247   0   0  61  47   2   1   9  18  29  29   3 276  42  47  47]
Min indices: [  2 127  76   1  12   0  25   7   9   0   0   7   6 248 130 128]
Where indices: (array([  0,   1,   2, ..., 393, 393, 394]), array([ 0,  0,  0, ..., 13, 14,  0]))
Nonzero indices: (array([  0,   1,   2, ..., 393, 393, 394]), array([ 0,  0,  0, ..., 13, 14,  0]))
Broadcasted data shape: (395, 16)
Expanded data shape: (395, 16, 1)
Unique elements: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 28 30 38 40 54 56 75]
In1d result shape: (6320,)
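
The index arrays printed above become easier to read once they are tied back to column names. The following is a minimal sketch, assuming a hypothetical three-column array `toy` built from the `age`, `absences`, and `G3` values of the five rows shown by `df.head()`:

```python
import numpy as np

# Hypothetical miniature of the numeric array (values from the df.head() output)
cols = ["age", "absences", "G3"]
toy = np.array([
    [18,  6,  6],
    [17,  4,  6],
    [15, 10, 10],
    [15,  2, 15],
    [16,  4, 10],
])

# Pick one column by name and find the row with the highest final grade
g3 = toy[:, cols.index("G3")]
best_row = int(np.argmax(g3))        # 3
passed_rows = np.where(g3 >= 10)[0]  # [2 3 4]

# On a 2D condition, np.where returns row indices and column indices separately
rows, col_idx = np.where(toy > 10)
for r, c in zip(rows, col_idx):
    print(f"row {r}: {cols[c]} = {toy[r, c]}")

# Per-column means: axis=0 collapses the rows
col_means = toy.mean(axis=0)
print(dict(zip(cols, col_means)))
```

The same pattern should carry over to the full array by using `list(numerical_cols)` in place of the hypothetical `cols` list.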


