# 🐼 **AUPP Data Science Club**

## **Workshop 1: NumPy, Pandas, and Data Exploration**

## NumPy Basics

What is NumPy? <br>
NumPy is a Python library for working with arrays and performing mathematical operations on them. It is the foundational library for numerical computing in Python, and its array operations run in compiled code, which makes them much faster than equivalent Python loops.

---



What NumPy has
- Arrays (1D, 2D, 3D, ... nD): like Python's list but faster and smarter
- Fast math operations and aggregation (sum, mean, max, std...)
- Vectorized operations (do math on an entire array without writing loops)
- Tools for linear algebra, statistics, and much more...

In [None]:
%pip install numpy

## 1.1 Importing NumPy

The standard abbreviation to import NumPy is as `np`:

In [1]:
import numpy as np

print(f"NumPy Version: {np.__version__}")

NumPy Version: 2.2.6


The array object in NumPy is called `ndarray`

## 1.2 Creating NumPy Arrays

The main data structure in NumPy is the array (`ndarray`, short for N-dimensional array).

Key features of the `ndarray` include:
- Multidimensionality: can represent arrays with any number of dimensions (0D, 1D, 2D, 3D...).
- Homogeneity: all elements within a single `ndarray` must be of the exact same data type.
- Fixed Size: once an `ndarray` is created, its total size cannot change.
- Vectorization: Operations are applied across the entire array without explicit `for` loops. 
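
The features listed above can be checked with a small sketch. The variable names (`mixed`, `original`, `extended`, `doubled`) are illustrative and not part of the workshop data.

```python
import numpy as np

# Homogeneity: mixing ints and a float upcasts every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                      # float64

# Fixed size: np.append does not grow the array in place;
# it returns a new, larger array and leaves the original untouched
original = np.array([1, 2, 3])
extended = np.append(original, 4)
print(original.shape, extended.shape)   # (3,) (4,)

# Vectorization: one expression is applied to every element, no loop needed
doubled = original * 2
print(doubled)                          # [2 4 6]
```

Because "growing" an array really means allocating a new one, it is usually cheaper to create an array of the final size once than to append repeatedly.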

In [3]:
# Create a One-dimensional array (1D)
v = np.array([2.5, 20.00, 12.18, 67.00, 518.60])

print("Item prices:", v)
print("Type:", type(v))
print("Shape:", v.shape)  # (5,) means 1D array with 5 elements
print("Dimension:", v.ndim)
print("Data Type:", v.dtype)

Item prices: [  2.5   20.    12.18  67.   518.6 ]
Type: <class 'numpy.ndarray'>
Shape: (5,)
Dimension: 1
Data Type: float64


In [2]:
# Create a Two-dimensional array (2D)
m = np.array(
    [
        [1, 2, 3, 4],
        [99, 98, 97, 96],
        [66, 67, 68, 69]
    ]
)

m

array([[ 1,  2,  3,  4],
       [99, 98, 97, 96],
       [66, 67, 68, 69]])

In [3]:
print("Matrix:", m)
print("Type:", type(m))
print("Shape:", m.shape)  # (3, 4) means a 2D array holding 3 rows of 4 elements each
                          # 3 x 4 matrix
print("Dimension:", m.ndim)
print("Data Type:", m.dtype)

Matrix: [[ 1  2  3  4]
 [99 98 97 96]
 [66 67 68 69]]
Type: <class 'numpy.ndarray'>
Shape: (3, 4)
Dimension: 2
Data Type: int64


### Convert data type

In [17]:
print("Original Data Type:", m.dtype)

# astype returns a converted copy; m itself keeps its integer dtype
m_float = m.astype('float')
print("Float Data Type:", m_float.dtype)
print("Values as float:\n", m_float)

Original Data Type: int64
Float Data Type: float64
Values as float:
 [[ 1.  2.  3.  4.]
 [99. 98. 97. 96.]
 [66. 67. 68. 69.]]


In [23]:
# Create a Three-dimensional array (3D)
t = np.array(
    [
        [
            [1, 2, 3],
            [2, 3, 4],
            [4, 5, 6]
        ],
        [
            [67, 68, 69],
            [71, 72, 73],
            [99, 98, 97]
        ]
    ]
)

t.shape

(2, 3, 3)

### What does an array shape of (2, 3, 3), or in general (m, n, p), mean?

> Answer:

In [26]:
# How about a 0-dimensional array?
s = np.array(68)

s.ndim

0

### Create an array from a sequence of numbers

In [29]:
# np.arange(start, stop, step, dtype): array range

n = np.arange(0, 10, 1)
n

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [40]:
# np.reshape(a, shape, order): give an array a new shape without changing its data

n1 = np.reshape(n, (2, 5))

n1

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
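
As a small extension of the reshape cell above, the sketch below shows two behaviours worth knowing: `-1` lets NumPy infer one dimension, and a shape that does not match the number of elements raises an error. The array here is a fresh illustrative one, not the `n` defined earlier.

```python
import numpy as np

numbers = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4)
grid = numbers.reshape(3, -1)
print(grid.shape)   # (3, 4)

# A shape whose product differs from the element count raises ValueError
try:
    numbers.reshape(5, 3)
except ValueError as err:
    print("Cannot reshape:", err)
```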

### Create initialized arrays

In [44]:
ones_matrix = np.ones((2, 4))

ones_matrix

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [49]:
zeros_tensor = np.zeros((3, 3, 4))

zeros_tensor

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

### Create array with random elements

In [82]:
# np.random.rand already returns an ndarray, so no np.array() wrapper is needed
random_arr = np.random.rand(5).round(2)

random_arr

array([0.5 , 0.71, 0.95, 0.95, 0.24])

### Array Indexing

Access individual elements using square brackets `[]`. <br>
Remember: Python uses **0-based indexing**.

In [84]:
first_val = random_arr[0]
print(f"First value: {first_val}")

last_val = random_arr[-1]
print(f"Last value: {last_val}")

third_val = random_arr[2]
print(f"Third value: {third_val}")

First value: 0.5
Last value: 0.24
Third value: 0.95
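
The same bracket syntax extends to arrays with more than one dimension, using one index per axis. A minimal sketch, assuming a hypothetical `grid` array with the same values as the 2D example earlier:

```python
import numpy as np

grid = np.array([
    [1, 2, 3, 4],
    [99, 98, 97, 96],
    [66, 67, 68, 69],
])

print(grid[0, 0])    # row 0, column 0 -> 1
print(grid[1, -1])   # row 1, last column -> 96
print(grid[2])       # a single index selects a whole row -> [66 67 68 69]
```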


### Array Slicing

Get multiple elements using slicing syntax: `array[start:end]`  
**Note:** The `end` index is **exclusive** (not included)

In [88]:
# Get the first 3 values
first_three_val = random_arr[0:3]  # or random_arr[:3]
print("First 3 values:", first_three_val)

# Get the last 2 values
last_two_val = random_arr[-2:]
print("Last 2 values:", last_two_val)

# Get the middle elements (index 1 to 3)
middle_vals = random_arr[1:-1]
print("Middle values:", middle_vals)

First 3 values: [0.5  0.71 0.95]
Last 2 values: [0.95 0.24]
Middle values: [0.71 0.95 0.95]
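
Slicing also accepts an optional step (`array[start:end:step]`) and works along each axis of a multidimensional array. The sketch below uses illustrative arrays that are not part of the workshop data.

```python
import numpy as np

values = np.arange(10)           # [0 1 2 3 4 5 6 7 8 9]

every_other = values[::2]        # step of 2 -> [0 2 4 6 8]
reversed_values = values[::-1]   # negative step walks backwards

table = values.reshape(2, 5)
first_two_columns = table[:, :2] # all rows, columns 0 and 1 -> [[0 1] [5 6]]

print(every_other)
print(reversed_values)
print(first_two_columns)
```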


### Array Operations

Let's imagine some shopping cart data:

In [96]:
prices = [5.99, 2, 18, 14.67, 7.35]
amounts = [1, 12, 2, 5, 7]
cash_from_customers = [10, 50, 35, 100, 56]

In [106]:
# With normal Python loops
total_prices = []

for price, amount in zip(prices, amounts):
    total = price * amount
    total_prices.append(total)

total_prices

[5.99, 24, 36, 73.35, 51.449999999999996]

In [None]:
# Find the total price for each customer

total_prices = np.array(prices) * np.array(amounts)

total_prices

array([ 5.99, 24.  , 36.  , 73.35, 51.45])

In [None]:
# Add 10% tax to every total
taxed_total_prices = total_prices * 1.1

taxed_total_prices

array([ 6.589, 26.4  , 39.6  , 80.685, 56.595])

In [100]:
# Apply a $5 discount to every total
discounted_prices = total_prices - 5

print("Original prices:", total_prices)
print("After $5 discount:", discounted_prices)

Original prices: [ 5.99 24.   36.   73.35 51.45]
After $5 discount: [ 0.99 19.   31.   68.35 46.45]
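
The `cash_from_customers` list defined with the cart data can be combined with the totals in the same element-wise way. The sketch below is one possible use of it; the names `change` and `paid_enough` are illustrative additions.

```python
import numpy as np

prices = np.array([5.99, 2, 18, 14.67, 7.35])
amounts = np.array([1, 12, 2, 5, 7])
cash_from_customers = np.array([10, 50, 35, 100, 56])

total_prices = prices * amounts

# Element-wise subtraction: change owed to each customer
change = cash_from_customers - total_prices

# Element-wise comparison gives a boolean array
paid_enough = change >= 0

print(change.round(2))
print(paid_enough)   # the third customer is $1 short
```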


### Statistical Analysis
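
A minimal sketch of the aggregation functions mentioned in the introduction (sum, mean, max, std), applied to the cart totals computed above. The values are copied in so the cell runs on its own.

```python
import numpy as np

total_prices = np.array([5.99, 24.0, 36.0, 73.35, 51.45])

print("Sum:", total_prices.sum().round(2))
print("Mean:", total_prices.mean().round(2))
print("Max:", total_prices.max())
print("Min:", total_prices.min())
print("Std:", total_prices.std().round(2))
print("Index of largest:", total_prices.argmax())   # 3
```

Note that `std` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` for the sample standard deviation.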

## Array Creation

NumPy provides several ways of creating arrays, including creating them from Python lists, from linearly spaced ranges, and as random matrices of arbitrary size.

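`np.array` takes a sequence as a single object (for example one list), rather than the elements unpacked as separate arguments: write `np.array([1, 2, 3])`, not `np.array(1, 2, 3)`.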
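
The three creation routes mentioned above can be sketched as follows. The variable names and the seed value are illustrative assumptions.

```python
import numpy as np

# From a Python list: pass the list itself as one argument
from_list = np.array([1, 2, 3])

# Linearly spaced values: 5 points from 0 to 1, both ends included
evenly_spaced = np.linspace(0, 1, 5)    # [0.   0.25 0.5  0.75 1.  ]

# Random matrix of a chosen shape, using a seeded generator for repeatability
rng = np.random.default_rng(seed=42)
random_matrix = rng.random((2, 3))      # floats in [0, 1), shape (2, 3)

print(from_list)
print(evenly_spaced)
print(random_matrix.shape)
```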

Key features of arrays:
- Fixed Size: once an `ndarray` is created, its total size cannot change.
- Homogeneity: mixed inputs are converted to one common data type, as the next cell shows.

In [15]:
items = np.array([87, True, 87-12, 9.9283])

print("Items:", items)
print("Type:", type(items))
print("Shape:", items.shape)

for item in items:
    print(type(item))

Items: [87.      1.     75.      9.9283]
Type: <class 'numpy.ndarray'>
Shape: (4,)
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>


In [43]:
ones_matrix = np.ones((4, 6))

`np.ones` takes the shape as its argument, given as a tuple such as `(4, 6)`.

### Initializing Data Structures (Placeholders)
Before filling an array with calculated data, you often need to create an empty container of the correct size.

- `np.zeros`: creates a default "empty" matrix filled with 0s (e.g., for representing an initial state in a simulation, a blank image, or a zeroed-out gradient in machine learning).
- `np.ones`: creates a matrix of 1s, commonly used to initialize weight matrices, bias vectors, or as a starting point for algorithms that require multiplication.
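
The placeholder pattern can be sketched as below: allocate an array of the right size first, then fill it in. The data and names (`purchases`, `running_totals`, `half_weights`) are hypothetical.

```python
import numpy as np

purchases = [5.99, 24.0, 36.0, 73.35, 51.45]

# Placeholder of the correct size, filled with zeros
running_totals = np.zeros(len(purchases))

# Fill the placeholder step by step with a running total
total = 0.0
for i, amount in enumerate(purchases):
    total += amount
    running_totals[i] = total

print(running_totals.round(2))

# A ones array scaled by a constant gives an array filled with that constant
half_weights = np.ones((2, 3)) * 0.5
print(half_weights)
```

For this particular calculation `np.cumsum(purchases)` gives the same running total in one call; the loop is kept only to show the placeholder being filled.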

## Vectorization


## Aggregation, Manipulation