# Numpy Assignment
*Name:* Zach Novak

*PID:* za659148

*Date:* 1/26/2025

*Summary:* This notebook first compares the computation time of a simple operation (division) between two differently sized arrays, once using NumPy and once using for loops. The idea is that NumPy can broadcast an operation across arrays of different shapes. This is convenient on its own, and the timing test below checks whether NumPy is also faster than plain Python for loops. Part Two shows the intricacies of converting Python dictionaries to NumPy arrays. For added data clarity, a structured NumPy array is created.


# Part One

In this exercise, use for loops to replicate the result of the following broadcasting operation, then compare your results with the broadcasting output below.

In [None]:
# import modules
import numpy as np
import timeit

# create a 4x4 array and a length-4 array of random numbers
np.random.seed(123)
x = np.random.randn(4, 4)
y = np.random.randn(4)

# print values
print("X values are: \n", x)
print("\nY values are: \n", y)
print("\nA equals: \n", x / y)

X values are: 
 [[-1.0856306   0.99734545  0.2829785  -1.50629471]
 [-0.57860025  1.65143654 -2.42667924 -0.42891263]
 [ 1.26593626 -0.8667404  -0.67888615 -0.09470897]
 [ 1.49138963 -0.638902   -0.44398196 -0.43435128]]

Y values are: 
 [2.20593008 2.18678609 1.0040539  0.3861864 ]

A equals: 
 [[-0.49214189  0.45607819  0.28183596 -3.90043439]
 [-0.26229311  0.75518888 -2.41688145 -1.11063629]
 [ 0.57387869 -0.39635354 -0.67614513 -0.2452416 ]
 [ 0.676082   -0.29216483 -0.44218937 -1.12471925]]
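
The division above works because NumPy broadcasts the shape `(4,)` array `y` across each row of the shape `(4, 4)` array `x`. The sketch below reuses the same seed and makes that step explicit with `np.broadcast_to`. The name `y_expanded` is introduced here only for illustration.

```python
import numpy as np

np.random.seed(123)
x = np.random.randn(4, 4)
y = np.random.randn(4)

# Expand y to a read-only (4, 4) view in which every row repeats y
y_expanded = np.broadcast_to(y, x.shape)

print("Shapes:", x.shape, y.shape, y_expanded.shape)

# Dividing by the expanded view matches the implicit broadcast x / y
print("Same result:", np.allclose(x / y, x / y_expanded))
```

Each column `j` of `x` is therefore divided by `y[j]`, which is exactly what the inner index `j` does in a nested for loop.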


Now that the array is known, it can be replicated with For loops.

In [2]:
# hand-copy the array values printed above (generated with seed 123)
multiarray = [[-1.0856306, 0.99734545, 0.2829785, -1.50629471], 
        [-0.57860025, 1.65143654, -2.42667924, -0.42891263], 
        [1.26593626, -0.8667404, -0.67888615, -0.09470897], 
        [1.49138963, -0.638902, -0.44398196, -0.43435128]]

singlearray = [2.20593008, 2.18678609, 1.0040539, 0.3861864]

# define a function to compute A = X / Y using for loops
def forloop_compute(multiarray, singlearray):
    output = []  # initialize an empty list for the output
    for i in range(len(multiarray)):
        row = []  # create a new row for the single values
        for j in range(len(multiarray[i])):
            row.append(multiarray[i][j] / singlearray[j])  # compute the value and append it to the row
        output.append(row)  # append the completed row of single values to the output
        print(row)
    return output  # return the A array

forloop_compute(multiarray, singlearray)

[-0.49214189055348484, 0.45607819372950187, 0.28183596518075377, -3.900434375731512]
[-0.2622931049564364, 0.7551888808658006, -2.4168814443129, -1.110636288590173]
[0.5738786879410067, -0.3963535363442887, -0.6761451252766411, -0.24524159835768428]
[0.6760820043761315, -0.2921648362963567, -0.442189368518961, -1.1247192547433054]


[[-0.49214189055348484,
  0.45607819372950187,
  0.28183596518075377,
  -3.900434375731512],
 [-0.2622931049564364,
  0.7551888808658006,
  -2.4168814443129,
  -1.110636288590173],
 [0.5738786879410067,
  -0.3963535363442887,
  -0.6761451252766411,
  -0.24524159835768428],
 [0.6760820043761315,
  -0.2921648362963567,
  -0.442189368518961,
  -1.1247192547433054]]

Above is the array produced by the for loops. Below, we compare the computation times of the two methods.

In [3]:
# wrapper function to standardize the args and fit to the timeit function
def wrapper():
    forloop_compute(multiarray, singlearray)

# compute the execution time of the for loop
forloop_execution_time = timeit.timeit(wrapper, number=1)
print(f"For Loop Execution time: {forloop_execution_time} seconds\n")


# define a function to compute A = X / Y using numpy
def numpy_compute():
    A = x / y
    print(A)

# compute the execution time of the numpy computation
numpy_execution_time = timeit.timeit(numpy_compute, number=1)
print(f"Numpy Execution time: {numpy_execution_time} seconds")

# compare the execution times
speed_difference = forloop_execution_time / numpy_execution_time
print(f"\n\nNumpy computation is >>> {round(speed_difference, 2)} <<< times faster than For Loop computation.")

[-0.49214189055348484, 0.45607819372950187, 0.28183596518075377, -3.900434375731512]
[-0.2622931049564364, 0.7551888808658006, -2.4168814443129, -1.110636288590173]
[0.5738786879410067, -0.3963535363442887, -0.6761451252766411, -0.24524159835768428]
[0.6760820043761315, -0.2921648362963567, -0.442189368518961, -1.1247192547433054]
For Loop Execution time: 0.00021239998750388622 seconds

[[-0.49214189  0.45607819  0.28183596 -3.90043439]
 [-0.26229311  0.75518888 -2.41688145 -1.11063629]
 [ 0.57387869 -0.39635354 -0.67614513 -0.2452416 ]
 [ 0.676082   -0.29216483 -0.44218937 -1.12471925]]
Numpy Execution time: 0.00021000008564442396 seconds


Numpy computation is >>> 1.01 <<< times faster than For Loop computation.


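In this run the two timings are nearly identical (a ratio of 1.01). The arrays are tiny, each function is timed only once, and both functions print their results, so the measurement mostly reflects printing and call overhead rather than the division itself. NumPy's speed advantage generally appears with larger arrays and repeated runs, on top of the convenience that broadcasting brings. If a single run does not show NumPy as faster, the test should be run again with more repetitions.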
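
One way to get a steadier comparison is sketched below, assuming larger inputs, no printing inside the timed functions, and an average over several runs. The array size (200 x 200), the repeat count, and the names `loop_divide`, `x_large`, and `y_large` are choices made for this illustration, not part of the original assignment.

```python
import timeit

import numpy as np

rng = np.random.default_rng(123)
x_large = rng.standard_normal((200, 200))
y_large = rng.standard_normal(200)

# Plain Python lists for the loop version, so NumPy is not involved at all
x_list = x_large.tolist()
y_list = y_large.tolist()


def loop_divide(matrix, vector):
    """Divide every row of matrix element-wise by vector using for loops."""
    output = []
    for row in matrix:
        new_row = []
        for j in range(len(row)):
            new_row.append(row[j] / vector[j])
        output.append(new_row)
    return output


# Check that both approaches agree before timing them
assert np.allclose(loop_divide(x_list, y_list), x_large / y_large)

repeats = 10
loop_time = timeit.timeit(lambda: loop_divide(x_list, y_list), number=repeats) / repeats
numpy_time = timeit.timeit(lambda: x_large / y_large, number=repeats) / repeats

print(f"For loop: {loop_time:.6f} seconds per run")
print(f"NumPy:    {numpy_time:.6f} seconds per run")
print(f"Ratio (for loop / NumPy): {loop_time / numpy_time:.1f}")
```

The exact ratio depends on the machine and the array size, so no expected output is shown here.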

# Part Two
Please convert a Python dictionary into a NumPy array using the NumPy library. You can use any dictionary example you want.

In [4]:
# create a dictionary of stock prices and quantities to simulate a portfolio
portfolio_dictionary = {
    "ANET": {"price": 130.36, "quantity": 113.8952164},
    "AVGO": {"price": 245.36, "quantity": 60.637910821},
    "CIEN": {"price": 99.16, "quantity": 50.818172578},
    "GLW": {"price": 54.05, "quantity": 92.472720547},
    "MSFT": {"price": 445.38, "quantity": 22.47245111},
    "NVDA": {"price": 145.13, "quantity": 136.137771424},
    "ORCL": {"price": 184.9, "quantity": 54.185857491},
    "TSM": {"price": 222.64, "quantity": 89.559392418},
}


Extract the data from the dictionary. Passing the nested dictionary straight to NumPy would not produce a tabular array, so the key:value pairs are first flattened into one row per ticker.

In [5]:
# create a list of lists from the dictionary
portfolio_list = [
    [ticker, details["price"], details["quantity"]]
    for ticker, details in portfolio_dictionary.items()
]

print("\n\nPortfolio list of lists: \n", portfolio_list)



Portfolio list of lists: 
 [['ANET', 130.36, 113.8952164], ['AVGO', 245.36, 60.637910821], ['CIEN', 99.16, 50.818172578], ['GLW', 54.05, 92.472720547], ['MSFT', 445.38, 22.47245111], ['NVDA', 145.13, 136.137771424], ['ORCL', 184.9, 54.185857491], ['TSM', 222.64, 89.559392418]]
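
To see why the extraction step matters, the short sketch below passes a dictionary directly to `np.array` and compares it with the list-of-lists route. It uses a one-entry dictionary named `sample`, a hypothetical stand-in for the full portfolio.

```python
import numpy as np

# One-entry stand-in for the portfolio dictionary (illustrative only)
sample = {"ANET": {"price": 130.36, "quantity": 113.8952164}}

# Passing the dictionary directly wraps the whole object in a 0-dimensional array
direct = np.array(sample)
print("Direct conversion:", direct.shape, direct.dtype)

# Flattening to rows first gives a two-dimensional array
rows = [[ticker, d["price"], d["quantity"]] for ticker, d in sample.items()]
from_rows = np.array(rows)
print("From rows:", from_rows.shape)
```

The direct conversion has no rows or columns to index, which is why the list of lists is built first.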


Now the data is in a form that NumPy's np.array() function can convert into a two-dimensional array.

In [6]:
# create a numpy array from the list of lists
portfolio_array = np.array(portfolio_list)
print("\n\nPortfolio Array: \n", portfolio_array)



Portfolio Array: 
 [['ANET' '130.36' '113.8952164']
 ['AVGO' '245.36' '60.637910821']
 ['CIEN' '99.16' '50.818172578']
 ['GLW' '54.05' '92.472720547']
 ['MSFT' '445.38' '22.47245111']
 ['NVDA' '145.13' '136.137771424']
 ['ORCL' '184.9' '54.185857491']
 ['TSM' '222.64' '89.559392418']]
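
The quotes around every value in the output indicate that the numbers were stored as strings. The sketch below, using a two-row subset named `rows` for brevity, checks the array's dtype and shows one possible way to recover numeric columns with `astype(float)`.

```python
import numpy as np

# Two-row subset of the portfolio list (illustrative only)
rows = [["ANET", 130.36, 113.8952164], ["AVGO", 245.36, 60.637910821]]
mixed = np.array(rows)

# Kind "U" means every element is a Unicode string
print("dtype kind:", mixed.dtype.kind)

# Convert the numeric columns back to floats before doing arithmetic
prices = mixed[:, 1].astype(float)
quantities = mixed[:, 2].astype(float)
print("Position values:", prices * quantities)
```

Converting column by column works, but it has to be repeated for every calculation, which is one motivation for a structured array.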


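Because each row mixes a string with numbers, NumPy stored every value above as a string. To make the array easier to work with and, in my opinion, more accurate, a data type is assigned to each field to create a structured array.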

In [7]:
# define the structured data type
dtype = [("Ticker", "U10"), ("Price", "f8"), ("Quantity", "f8")]

# build a list of tuples, since each record of a structured array must be a tuple
portfolio_list_tuple = [
    (ticker, details["price"], details["quantity"])
    for ticker, details in portfolio_dictionary.items()
]

# create the structured numpy array
portfolio_array = np.array(portfolio_list_tuple, dtype=dtype)

# print the structured array
print("\n\nPortfolio Array: \n", portfolio_array)



Portfolio Array: 
 [('ANET', 130.36, 113.8952164 ) ('AVGO', 245.36,  60.63791082)
 ('CIEN',  99.16,  50.81817258) ('GLW',  54.05,  92.47272055)
 ('MSFT', 445.38,  22.47245111) ('NVDA', 145.13, 136.13777142)
 ('ORCL', 184.9 ,  54.18585749) ('TSM', 222.64,  89.55939242)]


An example of working with the structured array:

In [8]:
# access the price and quantity features directly
prices = portfolio_array["Price"]
quantities = portfolio_array["Quantity"]

# print the features
print("Prices:", prices)
print("\nQuantities:", quantities)

# calculate total values
total_values = prices * quantities

# calculate the total value of the portfolio
portfolio_total_value = total_values.sum()
print("\n\nTotal Portfolio Value: \n", portfolio_total_value.round(2))

# calculate the percent allocations of each position according to the portfolio total value
percent_allocation = (total_values / portfolio_total_value) * 100
print("\nPercent Allocations: \n", percent_allocation)

# calculate the average percent allocation of each position.
average_percent_allocation = percent_allocation.mean()
print("\nAverage Percent Allocation: \n", average_percent_allocation)

Prices: [130.36 245.36  99.16  54.05 445.38 145.13 184.9  222.64]

Quantities: [113.8952164   60.63791082  50.81817258  92.47272055  22.47245111
 136.13777142  54.18585749  89.55939242]


Total Portfolio Value: 
 99487.7

Percent Allocations: 
 [14.92383492 14.95473059  5.06507829  5.02388782 10.06031909 19.85941415
 10.07055631 20.04217882]

Average Percent Allocation: 
 12.5
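
Field access also combines with boolean masks and sorting. The sketch below assumes a three-record subset named `sample` with the same field layout as the structured array above. The price threshold of 100 is an arbitrary value chosen for illustration.

```python
import numpy as np

dtype = [("Ticker", "U10"), ("Price", "f8"), ("Quantity", "f8")]

# Three-record subset of the portfolio (illustrative only)
sample = np.array(
    [
        ("ANET", 130.36, 113.8952164),
        ("GLW", 54.05, 92.472720547),
        ("MSFT", 445.38, 22.47245111),
    ],
    dtype=dtype,
)

# Boolean mask: keep tickers priced above the illustrative threshold
expensive = sample[sample["Price"] > 100]["Ticker"]
print("Priced above 100:", expensive)

# Rank tickers by position value, largest first
values = sample["Price"] * sample["Quantity"]
order = np.argsort(values)[::-1]
print("By position value:", sample["Ticker"][order])
```

Because the fields keep their own data types, no string-to-float conversion is needed before filtering or sorting.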


The example above shows how a structured NumPy array keeps each field accessible by name and lets calculations run over whole columns at once.