<a href="https://colab.research.google.com/github/Amrita-GitHub/Mathur/blob/main/week5_lecture_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 5: Introduction to the NumPy Library

## Introduction

Welcome, students. Today's lecture is a thorough introduction to **NumPy**, a foundational Python library for numerical computing. By now you are comfortable with basic Python data structures such as lists, tuples, dictionaries, and sets, as well as control flow, functions, and variables. These built-in structures are versatile and well suited to general programming tasks, but they are limited when it comes to handling large-scale numerical data efficiently and performing complex mathematical operations. This is where NumPy comes in: it offers better performance, better memory efficiency, and a rich collection of functions designed specifically for numerical operations.

## Situating NumPy Among Built-In Data Structures

Before delving into NumPy, let us briefly revisit the Python data structures you are already familiar with:

- **Lists and Tuples:**  
  These are ordered collections that can store items of different data types. However, they are not optimized for numerical computations, particularly when working with large datasets.

- **Dictionaries:**  
  While extremely useful for mapping keys to values, dictionaries are not designed for element-wise arithmetic operations or handling multi-dimensional data.

- **Sets:**  
  Sets are useful for storing unique items, but they lack ordering and do not support indexed operations, which are critical in numerical analysis.

In contrast, **NumPy arrays** are designed with the following advantages:

- **Homogeneous Data Types:**  
  Unlike lists or tuples, NumPy arrays require that all elements be of the same type. This homogeneity allows for optimized memory usage and faster computations.

- **Vectorized Operations:**  
  NumPy supports operations on entire arrays without the need for explicit loops. This feature, known as vectorization, leads to significant performance improvements.

- **Multi-dimensional Arrays:**  
  NumPy naturally handles arrays of two or more dimensions, making it ideal for representing matrices and tensors that frequently appear in business analytics and data science.

- **Broad Functionality:**  
  From statistical computations to linear algebra, NumPy provides an extensive suite of mathematical tools essential for robust data analysis.
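
The contrast between lists and arrays described above can be seen in a short sketch. The variable names below are illustrative only and do not come from the lecture data:

```python
import numpy as np

prices_list = [10, 20, 30]
prices_array = np.array(prices_list)

# Multiplying a list repeats it; multiplying an array scales each element.
print(prices_list * 2)    # [10, 20, 30, 10, 20, 30]
print(prices_array * 2)   # [20 40 60]

# Mixed numeric inputs are converted to one common type (here, float64).
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64
```

The same `*` operator therefore means "repeat" for a list and "multiply every element" for an array, which is why arrays are the natural choice for arithmetic.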

## Installing and Setting Up NumPy

### Installing NumPy Locally

If you are working on your local machine, you can install NumPy via pip:

```bash
pip install numpy
```

Ensure that you are using the appropriate Python environment where your projects are maintained.

### Using NumPy in Google Colab

Google Colab is a convenient platform for running Python notebooks, and it comes with NumPy pre-installed. To verify your NumPy installation in Colab, simply execute:

In [None]:
import numpy as np
print("NumPy version:", np.__version__)

NumPy version: 1.26.4


This command will display the version of NumPy installed on the platform.

## Importing NumPy in Your Notebooks

Regardless of the environment, begin your notebooks by importing NumPy with the commonly used alias `np`:

In [None]:
import numpy as np

This aliasing not only saves time but also improves code readability, particularly when using NumPy’s extensive functionality.

# Creating NumPy Data Structures

NumPy offers several methods for creating arrays, each catering to different needs.

## Creating Arrays from Lists or Tuples

You can convert Python lists or tuples directly into NumPy arrays:

In [None]:
# From a list
sales_list = [250, 300, 450, 500, 350]
sales_array = np.array(sales_list)
print("Sales Array:", sales_array)

# From a tuple
inventory_tuple = (10, 15, 20)
inventory_array = np.array(inventory_tuple)
print("Inventory Array:", inventory_array)

Sales Array: [250 300 450 500 350]
Inventory Array: [10 15 20]


## Generating Arrays with Built-In Functions

NumPy provides functions to generate arrays without needing pre-existing data:

- **`np.arange(start, stop, step)`**  
  Creates an array of evenly spaced values from `start` up to, but not including, `stop`, in increments of `step`.

In [None]:
days = np.arange(1, 8)  # Days 1 to 7
print("Days:", days)

Days: [1 2 3 4 5 6 7]


- **`np.linspace(start, stop, num)`**  
  Creates an array of `num` evenly spaced values from `start` to `stop`, with both endpoints included by default.

In [None]:
prices = np.linspace(10, 100, 10)  # 10 prices from 10 to 100
print("Prices:", prices)

Prices: [ 10.  20.  30.  40.  50.  60.  70.  80.  90. 100.]
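
The difference in how the two functions treat the stop value is easy to overlook, so here is a minimal sketch with made-up numbers. It also shows the `endpoint` parameter of `np.linspace`:

```python
import numpy as np

# arange excludes the stop value: 0, 5, 10, 15 (20 is not included).
steps = np.arange(0, 20, 5)
print(steps)

# linspace includes both endpoints by default...
inclusive = np.linspace(0, 1, 5)
print(inclusive)

# ...unless endpoint=False is passed.
exclusive = np.linspace(0, 1, 5, endpoint=False)
print(exclusive)
```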


<!--
*The following examples using np.zeros and np.eye were in the original notes but have been replaced by advanced slicing topics per the updated practice questions.
-->

# Accessing Elements in NumPy Arrays

Once you have created an array, accessing its elements is both intuitive and efficient.

## Indexing and Slicing in One-Dimensional Arrays

In [None]:
data = np.array([10, 20, 30, 40, 50])
# Accessing individual elements
print("First element:", data[0])
print("Last element:", data[-1])

# Slicing: extracting sub-arrays
print("Elements from index 1 to 3:", data[1:4])

First element: 10
Last element: 50
Elements from index 1 to 3: [20 30 40]


### Advanced Slicing in 1D Arrays (Using Step Values)

Advanced slicing allows you to extract elements using a step parameter. For example:

In [None]:
# Given a 1D array, extract every second element.
arr = np.array([10, 20, 30, 40, 50, 60])
# Use the slicing syntax arr[start:stop:step]
print("Every second element:", arr[::2])

Every second element: [10 30 50]
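
The `start:stop:step` syntax supports a few more patterns that are worth trying. The sketch below reuses the same example values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Every second element between index 1 (inclusive) and index 5 (exclusive).
middle_step = arr[1:5:2]
print(middle_step)        # [20 40]

# A negative step walks backwards, so this reverses the array.
reversed_arr = arr[::-1]
print(reversed_arr)       # [60 50 40 30 20 10]

# A negative start counts from the end: the last three elements.
last_three = arr[-3:]
print(last_three)         # [40 50 60]
```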


## Multi-Dimensional Array Indexing

For two-dimensional or higher arrays, indexing follows a similar pattern:

In [None]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
# Accessing an element (row 2, column 3 in 1-based terms; NumPy indices are zero-based)
print("Element at row 2, column 3:", matrix[1, 2])

# Slicing rows and columns
print("First two rows:\n", matrix[:2, :])
print("Second column:\n", matrix[:, 1])

Element at row 2, column 3: 6
First two rows:
 [[1 2 3]
 [4 5 6]]
Second column:
 [2 5 8]


### Advanced Slicing in 2D Arrays (Extracting Submatrices)

In addition to basic indexing, you can extract submatrices by combining row and column slicing. For example:

In [None]:
# Given a 2D array A, extract the submatrix consisting of the first two rows and the last two columns.
A = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])
# Use combined slicing: rows 0 and 1, and the last two columns.
submatrix = A[:2, -2:]
print("Submatrix (first two rows, last two columns):\n", submatrix)

Submatrix (first two rows, last two columns):
 [[3 4]
 [7 8]]
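
One point to keep in mind when slicing: a basic slice is a *view* of the original array, not an independent copy. The sketch below illustrates this with the same matrix; the names `view` and `independent` are illustrative:

```python
import numpy as np

A = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

# A basic slice shares memory with A, so writing to it changes A as well.
view = A[:2, -2:]
view[0, 0] = 99
print(A[0, 2])            # 99

# Call .copy() when you need a submatrix that is independent of A.
independent = A[:2, -2:].copy()
independent[0, 0] = -1
print(A[0, 2])            # still 99
```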


# Processing Data in NumPy Arrays

One of the major advantages of NumPy is its ability to process data efficiently.

## Vectorized Operations

Vectorized operations allow you to perform element-wise arithmetic without explicit loops:

In [None]:
# Suppose we have an array of sales figures
sales = np.array([100, 150, 200, 250, 300])
# Applying a 10% discount using vectorized multiplication
discounted_sales = sales * 0.90
print("Discounted Sales:", discounted_sales)

Discounted Sales: [ 90. 135. 180. 225. 270.]
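
Vectorized arithmetic also works between two arrays of the same shape, pairing elements by position. The figures below are hypothetical and serve only as an illustration:

```python
import numpy as np

# Hypothetical units sold and unit prices for three products.
units = np.array([3, 5, 2])
unit_prices = np.array([10.0, 4.0, 25.0])

# Element-wise product: revenue per product.
revenue = units * unit_prices
print("Revenue per product:", revenue)

# Reduce to a single total.
total = revenue.sum()
print("Total revenue:", total)
```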


## Using Loops with NumPy Arrays

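Although vectorized operations are preferred, you can still use loops when the logic is too complex to express as array operations. Note, however, that explicit Python loops are usually much slower than the equivalent vectorized code, especially on large arrays: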

In [None]:
# Increment each element by 5 using a loop
incremented_sales = np.empty_like(sales)
for i in range(len(sales)):
    incremented_sales[i] = sales[i] + 5
print("Incremented Sales:", incremented_sales)

Incremented Sales: [105 155 205 255 305]
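
For comparison, the same increment can be written without a loop. This sketch checks that the loop and the vectorized expression give identical results:

```python
import numpy as np

sales = np.array([100, 150, 200, 250, 300])

# Loop version, as in the cell above.
looped = np.empty_like(sales)
for i in range(len(sales)):
    looped[i] = sales[i] + 5

# Vectorized version: the scalar 5 is added to every element at once.
vectorized = sales + 5

print("Same result:", np.array_equal(looped, vectorized))
```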


## Applying Lambda Functions and `np.vectorize`

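You can use lambda functions to define custom operations, and NumPy’s `np.vectorize` can then apply these functions to each element. Keep in mind that `np.vectorize` is provided mainly for convenience: it is essentially a Python loop under the hood, so it does not deliver the speed of true vectorized arithmetic: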

In [None]:
# Define a lambda function to convert sales to revenue (assume price = $19.99)
convert_to_revenue = lambda x: x * 19.99
vectorized_conversion = np.vectorize(convert_to_revenue)
revenues = vectorized_conversion(sales)
print("Revenues:", revenues)

Revenues: [1999.  2998.5 3998.  4997.5 5997. ]
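
When the custom operation is plain arithmetic, as it is here, the expression can be applied to the array directly. A minimal sketch comparing the two approaches:

```python
import numpy as np

sales = np.array([100, 150, 200, 250, 300])

# np.vectorize wraps the lambda and calls it once per element.
via_vectorize = np.vectorize(lambda x: x * 19.99)(sales)

# Direct array arithmetic gives the same values without the wrapper.
direct = sales * 19.99

print("Results match:", np.allclose(via_vectorize, direct))
```

`np.vectorize` remains handy when the function contains logic, such as an `if` statement, that does not translate directly into array operations.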


## Mapping, Filtering, and Reducing Data

### Filtering with Boolean Indexing

Boolean indexing is an extremely powerful tool in NumPy that enables you to filter arrays based on one or more conditions. For example:

In [None]:
# Sample array of sales figures
sales = np.array([100, 150, 200, 250, 300, 350, 400])

# Filter sales greater than 200
high_sales = sales[sales > 200]
print("High Sales (greater than 200):", high_sales)

# Filter sales between 150 and 350 (inclusive)
sales_between = sales[(sales >= 150) & (sales <= 350)]
print("Sales between 150 and 350:", sales_between)

# Filter sales that are either less than 150 or greater than 350 using np.logical_or
extreme_sales = sales[np.logical_or(sales < 150, sales > 350)]
print("Sales less than 150 or greater than 350:", extreme_sales)

# Using np.where to obtain the indices of sales greater than 250
indices = np.where(sales > 250)
print("Indices where sales are greater than 250:", indices)
print("Sales values at these indices:", sales[indices])

# Filter sales using np.extract (for a boolean condition, equivalent to sales[sales > 250])
extracted_sales = np.extract(sales > 250, sales)
print("Extracted sales (greater than 250):", extracted_sales)

High Sales (greater than 200): [250 300 350 400]
Sales between 150 and 350: [150 200 250 300 350]
Sales less than 150 or greater than 350: [100 400]
Indices where sales are greater than 250: (array([4, 5, 6]),)
Sales values at these indices: [300 350 400]
Extracted sales (greater than 250): [300 350 400]
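
Boolean masks can also be combined with the `|` (or) and `~` (not) operators, and counted by summing them. The sketch below uses the same sales figures; note that each comparison must be wrapped in parentheses:

```python
import numpy as np

sales = np.array([100, 150, 200, 250, 300, 350, 400])

# "Or" with the | operator (same result as np.logical_or).
extreme = sales[(sales < 150) | (sales > 350)]
print("Extreme sales:", extreme)

# "Not" with the ~ operator: everything that is NOT above 200.
not_high = sales[~(sales > 200)]
print("Not high:", not_high)

# True counts as 1, so summing a mask counts the matching elements.
high_count = int((sales > 200).sum())
print("Number of high sales:", high_count)
```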


### Traversing Data Using Loops and Lambda Functions

Traversing a NumPy array can be done with simple loops or with lambda functions in conjunction with vectorized operations.

#### Using Loops and Iterators

In [None]:
# Traversing using np.nditer
print("Traversing sales using np.nditer:")
for value in np.nditer(sales):
    print("Sales value:", value)

Traversing sales using np.nditer:
Sales value: 100
Sales value: 150
Sales value: 200
Sales value: 250
Sales value: 300
Sales value: 350
Sales value: 400


#### Using Lambda Functions

While lambda functions are not typically used for traversing arrays element-by-element (as vectorized operations are preferred), you can still apply them using functions such as `np.vectorize` or `np.apply_along_axis`.

**Example 1: Using np.vectorize to apply a lambda function element-wise**

In [None]:
# Define a lambda function to add a fixed commission (e.g., $10) to each sale
add_commission = lambda x: x + 10

# Vectorize the lambda function to apply it to each element of the array
vectorized_add = np.vectorize(add_commission)
commissioned_sales = vectorized_add(sales)
print("Sales after adding commission:", commissioned_sales)

Sales after adding commission: [110 160 210 260 310 360 410]


**Example 2: Using np.apply_along_axis to traverse a multi-dimensional array**

In [None]:
# Create a 2D array representing sales in different regions (rows) over several days (columns)
regional_sales = np.array([
    [100, 200, 150],
    [250, 300, 350],
    [400, 450, 500]
])

# Define a lambda function that calculates the range (max - min) for each row
range_func = lambda row: np.max(row) - np.min(row)

# Apply the lambda function along axis 1 (each row)
sales_range = np.apply_along_axis(range_func, 1, regional_sales)
print("Sales range for each region:", sales_range)

Sales range for each region: [100 100 100]
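
For a simple calculation such as the range, the same result can be obtained with axis-aware reductions and no lambda at all. A sketch using the same regional data:

```python
import numpy as np

regional_sales = np.array([
    [100, 200, 150],
    [250, 300, 350],
    [400, 450, 500]
])

# max and min along axis 1 give one value per row (region).
sales_range = regional_sales.max(axis=1) - regional_sales.min(axis=1)
print("Sales range for each region:", sales_range)

# Same calculation via apply_along_axis, for comparison.
via_apply = np.apply_along_axis(lambda row: row.max() - row.min(), 1, regional_sales)
print("Results match:", np.array_equal(sales_range, via_apply))
```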


### Reducing Data

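Reduction operations in NumPy aggregate data, either over the whole array or along a chosen axis. The examples below start with the standard sum, mean, and standard deviation and then cover several additional aggregations, including cumulative ones: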

In [None]:
# Compute the total, mean, and standard deviation of sales
total_sales = np.sum(sales)
mean_sales = np.mean(sales)
std_sales = np.std(sales)
print("Total Sales:", total_sales)
print("Mean Sales:", mean_sales)
print("Standard Deviation of Sales:", std_sales)

# Additional reduction examples:

# 1. Compute the product of all sales values
product_sales = np.prod(sales)
print("Product of Sales:", product_sales)

# 2. Find the minimum and maximum sales values
min_sales = np.min(sales)
max_sales = np.max(sales)
print("Minimum Sales:", min_sales)
print("Maximum Sales:", max_sales)

# 3. Compute the median sales value
median_sales = np.median(sales)
print("Median Sales:", median_sales)

# 4. Compute the cumulative sum and cumulative product of sales
cumulative_sales = np.cumsum(sales)
cumulative_product = np.cumprod(sales)
print("Cumulative Sales:", cumulative_sales)
print("Cumulative Product of Sales:", cumulative_product)

# 5. Compute the range of sales (difference between max and min)
range_sales = max_sales - min_sales
print("Range of Sales:", range_sales)

Total Sales: 1750
Mean Sales: 250.0
Standard Deviation of Sales: 100.0
Product of Sales: 31500000000000000
Minimum Sales: 100
Maximum Sales: 400
Median Sales: 250.0
Cumulative Sales: [ 100  250  450  700 1000 1350 1750]
Cumulative Product of Sales: [              100             15000           3000000         750000000
      225000000000    78750000000000 31500000000000000]
Range of Sales: 300
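
A detail worth knowing about `np.std`: by default it computes the *population* standard deviation (dividing by n). If your data is a sample, you may want the sample standard deviation (dividing by n - 1), which is requested with the `ddof` parameter. A short sketch with the same sales figures:

```python
import numpy as np

sales = np.array([100, 150, 200, 250, 300, 350, 400])

# Default ddof=0: divide by n (population standard deviation).
population_std = np.std(sales)

# ddof=1: divide by n - 1 (sample standard deviation).
sample_std = np.std(sales, ddof=1)

print("Population std:", population_std)
print("Sample std:", sample_std)
```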


# Important NumPy Methods for Business Analytics

NumPy's comprehensive suite of methods is invaluable in business analytics. Below are some methods and their potential applications:

## **Aggregation Functions:**  

The following NumPy methods are used to aggregate a NumPy array into a single value:
  - `np.sum()`
  - `np.mean()`
  - `np.std()`
  - `np.min()`
  - `np.max()`
  - `np.median()`

## **Cumulative Operations:**  
`np.cumsum()` computes the cumulative sum, which can be useful in tracking progressive sales or inventory levels.

In [None]:
cumulative_sales = np.cumsum(sales)
print("Cumulative Sales:", cumulative_sales)

Cumulative Sales: [ 100  250  450  700 1000 1350 1750]


## **Difference Calculations:**  
`np.diff()` computes the difference between consecutive elements, aiding in trend analysis.

In [None]:
sales_diff = np.diff(sales)
print("Sales Differences:", sales_diff)

Sales Differences: [50 50 50 50 50 50]
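
One common use of `np.diff` in trend analysis is computing period-over-period percentage change. The following is a sketch of that idea, not a built-in NumPy function:

```python
import numpy as np

sales = np.array([100, 150, 200, 250, 300, 350, 400])

# Change from each period to the next, divided by the earlier period's value.
pct_change = np.diff(sales) / sales[:-1] * 100
print("Percentage change:", np.round(pct_change, 2))
```

The result has one element fewer than `sales`, because there is no earlier period to compare the first value against.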


## **Reshaping Arrays:**  

Below we demonstrate reshaping operations, including converting a 1D array into a 2D column array and extracting specific columns from a 2D array.

In [None]:
# Assume 'sales' is a one-dimensional array of sales figures.
sales = np.array([100, 150, 200, 250, 300])
print("Original Sales (1D array):\n", sales)

Original Sales (1D array):
 [100 150 200 250 300]


In [None]:
# Reshape the one-dimensional array into a two-dimensional column array.
reshaped_sales = sales.reshape(5, 1)
print("Reshaped Sales (2D column array):\n", reshaped_sales)

Reshaped Sales (2D column array):
 [[100]
 [150]
 [200]
 [250]
 [300]]
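
When reshaping, you can pass `-1` for one dimension and let NumPy infer its size from the array length. A minimal sketch:

```python
import numpy as np

sales = np.array([100, 150, 200, 250, 300])

# -1 means "whatever size makes the total number of elements fit".
column = sales.reshape(-1, 1)
print(column.shape)       # (5, 1)

# ravel() flattens the array back to one dimension.
flat = column.ravel()
print(flat)               # [100 150 200 250 300]
```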


In [None]:
# -------------------------------------------------------
# Consider a 2D array with two columns.
# For example, an array where each row contains a sales figure and a corresponding discount value.
data = np.array([
    [100, 10],
    [150, 15],
    [200, 20],
    [250, 25],
    [300, 30]
])
print("Original Data (2D array):\n", data)

Original Data (2D array):
 [[100  10]
 [150  15]
 [200  20]
 [250  25]
 [300  30]]


In [None]:
# Extract just the first column (sales) as a one-dimensional array.
sales_column = data[:, 0]  # The ':' selects all rows; '0' selects the first column.
print("Extracted Sales Column (1D array):\n", sales_column)

Extracted Sales Column (1D array):
 [100 150 200 250 300]


In [None]:
# Similarly, extract the second column (discounts) as a one-dimensional array.
discounts_column = data[:, 1]
print("Extracted Discounts Column (1D array):\n", discounts_column)

Extracted Discounts Column (1D array):
 [10 15 20 25 30]


## **Sorting Data:**  
`np.sort()` returns a sorted copy of an array (the original is left unchanged), which is useful for identifying outliers or preparing data for further analysis.

In [None]:
sorted_sales = np.sort(sales)
print("Sorted Sales:", sorted_sales)

Sorted Sales: [100 150 200 250 300]
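
Because the `sales` array above happens to be in ascending order already, the effect of sorting is easier to see on unsorted data. The values below are illustrative; the sketch also shows descending order and `np.argsort`, which returns the positions that would sort the array:

```python
import numpy as np

unsorted_sales = np.array([250, 100, 300, 150, 200])

# Ascending sort (returns a new array; unsorted_sales is unchanged).
ascending = np.sort(unsorted_sales)
print("Ascending:", ascending)

# Descending order: reverse the ascending result.
descending = ascending[::-1]
print("Descending:", descending)

# Indices that would sort the array, e.g. to rank stores or days.
order = np.argsort(unsorted_sales)
print("Sort order (indices):", order)
```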


## **Dot Product and Matrix Multiplication:**  
For financial modeling and other analytical tasks, `np.dot()` and methods in `numpy.linalg` (such as `np.linalg.inv()` and `np.linalg.eig()`) are essential.

In [None]:
# Example of dot product between two arrays
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])
dot_product = np.dot(vector_a, vector_b)
print("Dot Product:", dot_product)

Dot Product: 32
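
The matrix side of this topic can be sketched as follows, using the `@` operator for matrix multiplication and `np.linalg.inv` for the inverse. The order quantities and prices are hypothetical:

```python
import numpy as np

# Hypothetical data: rows are orders, columns are quantities of two products.
quantities = np.array([[2, 1],
                       [1, 3]])
prices = np.array([10.0, 20.0])

# Matrix-vector product: total value of each order.
order_totals = quantities @ prices
print("Order totals:", order_totals)

# Inverting the quantity matrix recovers the prices from the totals.
recovered_prices = np.linalg.inv(quantities) @ order_totals
print("Recovered prices:", recovered_prices)
```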


# Data Persistence in NumPy and Beyond

In business analytics, it is essential not only to process and analyze data but also to **persist** (i.e., save and later reload) your data for further processing, sharing, or archival purposes. In this section, we will explore three primary methods for data persistence:

1. **NumPy's Native Binary Formats:**  
   Utilizing NumPy’s own functions to save arrays in a binary format (.npy and .npz).

2. **CSV Files:**  
   Saving to and loading from comma-separated values (CSV) files, which are human-readable and widely used for data exchange.

3. **MS Excel Spreadsheets:**  
   Although NumPy does not directly support Excel formats, you can integrate with the pandas library to handle Excel files.

Each method has its advantages and is appropriate for different scenarios.

---

## Saving and Loading with NumPy's Native Formats

NumPy provides functions to efficiently save and load arrays in binary formats. These formats are highly efficient and preserve data type and shape information.

### Saving a Single Array (.npy)

In [None]:
# Create a sample array
sales = np.array([100, 150, 200, 250, 300, 225, 310, 485, 320, 190, 320, 276, 312, 378])

# Save the array to a binary file
np.save("sales.npy", sales)
print("Sales array saved to 'sales.npy'.")

Sales array saved to 'sales.npy'.


### Loading a Single Array (.npy)

In [None]:
# Load the array from the binary file
loaded_sales = np.load("sales.npy")
print("Loaded Sales:", loaded_sales)

Loaded Sales: [100 150 200 250 300 225 310 485 320 190 320 276 312 378]


### Saving Multiple Arrays (.npz)

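When you have multiple arrays, you can store them together in a single archive file using the `.npz` format. Note that `np.savez` does not compress the data; use `np.savez_compressed` when you want a compressed archive.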

In [None]:
# Create multiple arrays
inventory = np.array([[10, 15, 20],
                      [12, 18, 25],
                      [14, 20, 30]])
prices = np.array([19.99, 29.99, 39.99])

# Save both arrays into a single .npz file
np.savez("data.npz", inventory=inventory, prices=prices)
print("Inventory and prices arrays saved to 'data.npz'.")

Inventory and prices arrays saved to 'data.npz'.


### Loading Multiple Arrays from a .npz File

In [None]:
# Load the data from the .npz file
data = np.load("data.npz")
loaded_inventory = data["inventory"]
loaded_prices = data["prices"]

print("Loaded Inventory:\n", loaded_inventory)
print("Loaded Prices:", loaded_prices)

Loaded Inventory:
 [[10 15 20]
 [12 18 25]
 [14 20 30]]
Loaded Prices: [19.99 29.99 39.99]
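
If file size matters, `np.savez_compressed` works the same way as `np.savez` but compresses the archive. The sketch below writes to a temporary directory (an assumption made so the example cleans up after itself) and lists the stored array names with the `.files` attribute:

```python
import os
import tempfile

import numpy as np

inventory = np.array([[10, 15, 20],
                      [12, 18, 25],
                      [14, 20, 30]])
prices = np.array([19.99, 29.99, 39.99])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data_compressed.npz")

    # Same keyword-argument interface as np.savez, but compressed.
    np.savez_compressed(path, inventory=inventory, prices=prices)

    # The loaded archive lists its array names in .files.
    with np.load(path) as archive:
        names = sorted(archive.files)
        restored_inventory = archive["inventory"]
        restored_prices = archive["prices"]

print("Arrays in archive:", names)
```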


---

## Saving and Loading CSV Files

CSV files are text-based and widely used for data sharing and interoperability with other applications. While CSV files are not as efficient as binary formats, they are human-readable and easily editable.

### Saving an Array to a CSV File

In [None]:
# Save the sales array to a CSV file
# Here, we include a header and use commas as delimiters.
np.savetxt("sales.csv", sales, fmt="%d", delimiter=",", header="Sales Data", comments="")
print("Sales array saved to 'sales.csv'.")

Sales array saved to 'sales.csv'.


### Loading an Array from a CSV File

In [None]:
# Load the sales array from the CSV file
# The header row is skipped using the skiprows parameter.
# np.loadtxt returns floats by default, which is why the values print as 100., 150., ...
loaded_sales_csv = np.loadtxt("sales.csv", delimiter=",", skiprows=1)
print("Loaded Sales from CSV:", loaded_sales_csv)

Loaded Sales from CSV: [100. 150. 200. 250. 300. 225. 310. 485. 320. 190. 320. 276. 312. 378.]
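
To read the values back as integers, you can pass `dtype=int` to `np.loadtxt`. The sketch below also shows a two-column array round-tripping through CSV; the file name, column names, and temporary directory are assumptions made for illustration:

```python
import os
import tempfile

import numpy as np

# Illustrative table: each row holds a sales figure and a discount.
table = np.array([[100, 10],
                  [150, 15],
                  [200, 20]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sales_discounts.csv")

    # One CSV row per array row, with a header naming the columns.
    np.savetxt(path, table, fmt="%d", delimiter=",",
               header="sales,discount", comments="")

    # dtype=int keeps the values as integers instead of floats.
    restored = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(restored)
```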


---

## Working with MS Excel Spreadsheets

While NumPy does not have built-in support for Excel files, the **pandas** library offers robust functionality for reading from and writing to Excel spreadsheets. Once the data is loaded into a pandas DataFrame, you can convert it to a NumPy array if needed.

### Saving Data to an Excel File using pandas

In [None]:
import pandas as pd

# Create a pandas DataFrame from the sales array
sales_df = pd.DataFrame(sales, columns=["Sales"])
# Write the DataFrame to an Excel file (requires openpyxl: pip install openpyxl)
sales_df.to_excel("sales.xlsx", index=False, sheet_name="SalesData")
print("Sales data saved to 'sales.xlsx'.")

Sales data saved to 'sales.xlsx'.


### Loading Data from an Excel File using pandas

In [None]:
# Read the Excel file into a DataFrame
df = pd.read_excel("sales.xlsx", sheet_name="SalesData")
# Convert the DataFrame to a NumPy array if necessary
sales_from_excel = df.to_numpy()
print("Sales data loaded from Excel:\n", sales_from_excel)

Sales data loaded from Excel:
 [[100]
 [150]
 [200]
 [250]
 [300]
 [225]
 [310]
 [485]
 [320]
 [190]
 [320]
 [276]
 [312]
 [378]]


---

# Real-World Applications in Business Analytics

To consolidate the concepts learned today, consider the following business cases where NumPy plays a critical role.

## Sales and Revenue Analysis

Imagine a retail company analyzing daily sales data:

In [None]:
# Daily units sold over one week
daily_units = np.array([120, 135, 150, 145, 160, 155, 170])
price_per_unit = 19.99

# Calculate daily revenue using vectorized multiplication
daily_revenue = daily_units * price_per_unit
print("Daily Revenue:", daily_revenue)

# Total revenue over the week
weekly_revenue = np.sum(daily_revenue)
print("Weekly Revenue:", weekly_revenue)

Daily Revenue: [2398.8  2698.65 2998.5  2898.55 3198.4  3098.45 3398.3 ]
Weekly Revenue: 20689.649999999998
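
The long decimal in the weekly total is ordinary floating-point rounding, not an error in the data. For reporting, you might round the figure and also identify the best day. A sketch of both, using the same inputs:

```python
import numpy as np

daily_units = np.array([120, 135, 150, 145, 160, 155, 170])
price_per_unit = 19.99
daily_revenue = daily_units * price_per_unit

# Round to cents for presentation.
weekly_revenue = round(float(daily_revenue.sum()), 2)
print(f"Weekly Revenue: ${weekly_revenue:,.2f}")

# argmax returns a zero-based index, so add 1 to get the day number.
best_day = int(np.argmax(daily_revenue)) + 1
print("Best day of the week:", best_day)
```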


## Inventory Management

For multi-store inventory management, NumPy aids in aggregating and analyzing inventory levels:

In [None]:
# Inventory levels for three stores and three product categories
inventory_levels = np.array([[100, 150, 200],
                             [120, 130, 210],
                             [110, 160, 190]])

# Total inventory per store (axis=1 sums across the columns within each row)
store_totals = np.sum(inventory_levels, axis=1)
print("Inventory Totals per Store:", store_totals)

# Total inventory per product category (axis=0 sums down the rows within each column)
category_totals = np.sum(inventory_levels, axis=0)
print("Inventory Totals per Category:", category_totals)

Inventory Totals per Store: [450 460 460]
Inventory Totals per Category: [330 440 600]
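
A natural follow-up question is what share of each store's inventory sits in each category. One way to sketch this is with `keepdims=True`, which keeps the totals as a column so that the division lines up row by row:

```python
import numpy as np

inventory_levels = np.array([[100, 150, 200],
                             [120, 130, 210],
                             [110, 160, 190]])

# keepdims=True gives shape (3, 1) instead of (3,).
store_totals = inventory_levels.sum(axis=1, keepdims=True)

# Each row is divided by its own store total.
category_share = inventory_levels / store_totals
print(np.round(category_share, 3))
```

Each row of `category_share` sums to 1, which is a quick sanity check that the division was applied per store.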


## Financial Modeling

Consider a scenario where you analyze asset returns to compute a covariance matrix—an essential component in portfolio risk analysis:

In [None]:
# Returns of two assets over several periods
returns = np.array([[0.05, 0.07, 0.06, 0.04],
                    [0.02, 0.03, 0.025, 0.035]])
covariance_matrix = np.cov(returns)
print("Covariance Matrix:\n", covariance_matrix)

Covariance Matrix:
 [[ 1.66666667e-04 -1.66666667e-05]
 [-1.66666667e-05  4.16666667e-05]]
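
To connect the covariance matrix to portfolio risk, the sketch below computes the variance of a two-asset portfolio as w · Σ · w and the correlation between the assets. The weights are hypothetical and chosen only for illustration:

```python
import numpy as np

returns = np.array([[0.05, 0.07, 0.06, 0.04],
                    [0.02, 0.03, 0.025, 0.035]])
covariance_matrix = np.cov(returns)

# Hypothetical portfolio weights: 60% in asset 1, 40% in asset 2.
weights = np.array([0.6, 0.4])

# Portfolio variance from the covariance matrix.
portfolio_variance = weights @ covariance_matrix @ weights
print("Portfolio variance:", portfolio_variance)

# Correlation matrix: covariance rescaled to the range [-1, 1].
correlation_matrix = np.corrcoef(returns)
print("Correlation between assets:", correlation_matrix[0, 1])
```

As a check, the same variance can be obtained by first computing the portfolio's return in each period (`weights @ returns`) and taking its sample variance with `ddof=1`.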


# Conclusion

Today’s lecture provided a detailed introduction to NumPy, underscoring its importance in the realm of business analytics and data science. We began by comparing NumPy arrays with standard Python data structures, highlighting the need for efficiency and advanced operations. We then covered practical aspects such as installation (both locally and on Google Colab), importing the library, and various methods for creating, accessing, and processing data with NumPy. In addition, we explored techniques for mapping, filtering, and reducing data, and discussed several key functions essential for business analytics.

Furthermore, we introduced methods for persisting your work by loading and saving NumPy arrays using both binary and CSV file formats. Although NumPy does not directly support MS Excel spreadsheets, you can easily handle Excel data with the help of pandas.

I encourage you to experiment with these techniques and explore further how NumPy integrates with other libraries (such as pandas and scikit-learn) to create a powerful analytics toolkit. For additional reading and practice, please consult the [NumPy Official Documentation](https://numpy.org/doc/) and other recommended resources.

---

*Dr. Tim C. Smith ©️2025*  

*University of South Florida, ISM4641 Python for Business Analytics*

---