## Vectorization in NumPy
Vectorization means applying an operation to a whole array at once instead of looping over its elements in Python. Because the work runs inside NumPy's compiled routines, it is usually much faster for element-wise operations such as addition, subtraction, and multiplication.

In [None]:
import numpy as np

arr = np.random.rand(100)

# Pure Python loop
def variance_loop(a):
    mean = sum(a)/len(a)
    return sum((x - mean) ** 2 for x in a) / len(a)

# Vectorization using numpy
def variance_vec(a):
    return np.var(a)

# Process time of loop
%timeit variance_loop(arr)

# Process time of vectorization
%timeit variance_vec(arr)

16.7 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.77 µs ± 20.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
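
Timing alone does not show that the two functions agree. The following is a minimal sketch that checks the loop, a hand-written vectorized formula, and `np.var` against each other. The helper name `variance_manual_vec` and the seeded generator are assumptions added for illustration and are not part of the original cell.

```python
import numpy as np

# Seeded generator so the check is reproducible (assumption, not in the original cell)
rng = np.random.default_rng(0)
arr = rng.random(100)

def variance_loop(a):
    mean = sum(a) / len(a)
    return sum((x - mean) ** 2 for x in a) / len(a)

def variance_manual_vec(a):
    # Same formula as the loop, written as whole-array operations
    return ((a - a.mean()) ** 2).mean()

loop_result = variance_loop(arr)
manual_result = variance_manual_vec(arr)
numpy_result = np.var(arr)  # population variance (ddof=0) by default

# All three should agree up to floating-point rounding
print(np.allclose([loop_result, manual_result], numpy_result))
```

Note that `np.var` divides by `n` by default, which matches the loop above. Passing `ddof=1` would give the sample variance instead.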


In [None]:
arr2 = np.random.rand(10)

def multiply_loop(a, b):
    result = a.copy()  # Create a copy of the input array
    for i in range(len(result)):
        result[i] = result[i] * b  # Assign the result to the new array
    return result

def multiply_vec(a, b):
    return a*b

print(multiply_loop(arr2, 2))
print(multiply_vec(arr2, 2))

[1.1194913376868507, 1.3013562400445657, 0.43488577541958384, 1.7639786040758947, 0.44351264272732327, 0.6325763935536668, 1.4532604112892815, 0.6106031644090735, 1.8715027403796978, 1.4567534521372005]
[1.11949134 1.30135624 0.43488578 1.7639786  0.44351264 0.63257639
 1.45326041 0.61060316 1.87150274 1.45675345]
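
The same `a * b` expression also works when `b` is an array of matching shape. Below is a small sketch of both cases, checked against the loop version. The `weights` array and the seeded generator are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded for reproducibility (assumption)
a = rng.random(10)

def multiply_loop(a, b):
    result = a.copy()
    for i in range(len(result)):
        result[i] = result[i] * b
    return result

# Scalar multiplier: NumPy broadcasts the scalar across every element
scaled = a * 2

# Array multiplier of the same shape: element-by-element product
weights = np.arange(10)  # hypothetical per-element weights
weighted = a * weights

# The vectorized scalar product matches the explicit loop
print(np.array_equal(multiply_loop(a, 2), scaled))
```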


## Pandas Refreshers
A review of core pandas operations: `groupby`, `agg`, `transform`, `apply`, and related methods.
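
As a quick orientation before the exercises, here is a minimal sketch of how `agg`, `transform`, and `apply` differ in the shape of what they return. The `team`/`score` frame is made-up illustration data, not part of the exercises.

```python
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b', 'b'],
    'score': [10, 20, 30, 40, 50],
})

grouped = df.groupby('team')['score']

# agg: collapses each group to one row per group
per_team = grouped.agg(['mean', 'max'])

# transform: returns a result the same length as the input,
# aligned to the original rows
df['team_mean'] = grouped.transform('mean')

# apply: runs an arbitrary function per group; here the range (max - min)
score_range = grouped.apply(lambda s: s.max() - s.min())

print(per_team)
print(df)
print(score_range)
```

A rough rule of thumb: reach for `agg` when you want a summary table, `transform` when you want a new column on the original frame, and `apply` only when neither fits, since it is typically the slowest of the three.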

In [None]:
# 1. Given a DataFrame df_orders(customer_id, order_date, revenue),
# add a column order_rank that numbers each customer’s orders by recency (1 = newest).
import pandas as pd

df_orders = pd.DataFrame({
    'customer_id': [1, 2, 1, 3, 2],
    'order_date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']),
    'revenue': [100, 200, 150, 300, 250]
})

df_orders['order_rank'] = (
    df_orders.groupby('customer_id')['order_date']
    .rank(method='first', ascending=False)
    .astype(int)  # rank() returns floats; cast to whole-number ranks
)
df_orders

# groupby + rank numbers each customer's orders from newest (1) to oldest.
# method='first' breaks ties between identical dates by row order.
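
One way to sanity-check the ranking is to compute it a second way. The sketch below compares the `rank` approach with sorting newest-first and numbering rows with `cumcount`. The column name `order_rank_alt` is introduced here only for the comparison.

```python
import pandas as pd

df_orders = pd.DataFrame({
    'customer_id': [1, 2, 1, 3, 2],
    'order_date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                                  '2023-01-04', '2023-01-05']),
    'revenue': [100, 200, 150, 300, 250],
})

# Approach 1: rank dates within each customer, newest first
df_orders['order_rank'] = (
    df_orders.groupby('customer_id')['order_date']
    .rank(method='first', ascending=False)
    .astype(int)
)

# Approach 2: sort newest-first, then number rows within each customer.
# Assignment aligns on the index, so the sorted order does not leak into df_orders.
df_orders['order_rank_alt'] = (
    df_orders.sort_values('order_date', ascending=False)
    .groupby('customer_id')
    .cumcount() + 1
)

print(df_orders)
```

Both columns should match on this data. They can differ when a customer has several orders with the same date, because each approach breaks ties by a different row order.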

In [71]:
# 2. For df_prices(ticker, date, price), 
# compute a 30-day rolling z-score within each ticker.
import yfinance as yf
import pandas as pd

# Sample df of prices
tickers = ['TSLA', 'NVDA', 'AAPL', '^SPX']
start_date = '2023-01-01'  # Using past dates for actual data
end_date = '2023-05-01'

# Collect one frame per ticker, then concatenate once at the end
frames = []

for ticker in tickers:
    # Download data and keep the closing price, with Date as a column
    data = yf.download(ticker, start=start_date, end=end_date)[['Close']].reset_index()
    # Rename Close to Price
    data = data.rename(columns={'Close': 'Price'})
    # Add ticker column
    data['Ticker'] = ticker
    frames.append(data)

df_prices = pd.concat(frames, ignore_index=True)

# Sort the data
df_prices = df_prices.sort_values(['Ticker', 'Date'])

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
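
The cell above stops after loading and sorting the prices, so the z-score step itself is not shown. The following is a minimal sketch of one way to compute it, under stated assumptions: it uses synthetic random-walk prices for two made-up tickers (`AAA`, `BBB`) instead of a network download, treats "30-day" as a window of 30 rows (trading days) rather than 30 calendar days, and names the output column `zscore_30d`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (hypothetical values, no network call)
rng = np.random.default_rng(0)
dates = pd.bdate_range('2023-01-02', periods=60)
df_prices = pd.concat(
    [
        pd.DataFrame({
            'Ticker': ticker,
            'Date': dates,
            'Price': 100 + rng.normal(0, 1, len(dates)).cumsum(),
        })
        for ticker in ['AAA', 'BBB']
    ],
    ignore_index=True,
).sort_values(['Ticker', 'Date'])

window = 30  # assumption: 30 rows (trading days), not 30 calendar days

# Rolling statistics are computed separately inside each ticker,
# so one ticker's prices never enter another ticker's window
grouped = df_prices.groupby('Ticker')['Price']
rolling_mean = grouped.transform(lambda s: s.rolling(window).mean())
rolling_std = grouped.transform(lambda s: s.rolling(window).std())  # sample std (ddof=1)

df_prices['zscore_30d'] = (df_prices['Price'] - rolling_mean) / rolling_std

print(df_prices.groupby('Ticker').tail(3))
```

The first 29 rows of each ticker are `NaN` because the window is not yet full. Passing `min_periods` to `rolling` would relax that, at the cost of noisier early values.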


In [69]:
# Tasks
# Compute delinquency_rate = (# delinquent / # loans) per industry.
# Present the result sorted descending by rate.
# In one sentence, describe a business implication JP Morgan might draw.

data = {
    'loan_id': range(1, 11),
    'industry': ['Manufacturing','Retail','Healthcare','Manufacturing','Retail',
                 'Tech','Tech','Healthcare','Retail','Manufacturing'],
    'amount': [200,150,100,250,180,300,220,90,130,275],
    'status': ['current','delinquent','current','current','delinquent',
               'current','delinquent','current','delinquent','current']
}
df_loans = pd.DataFrame(data)

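# Share of delinquent loans per industry, broadcast back to every loan row
df_loans['delinquency_rate'] = (
    df_loans.groupby('industry')['status']
    .transform(lambda s: (s == 'delinquent').sum() / len(s))
)

# Keep one row per industry and sort from highest to lowest rate
delinquency_by_industry = (
    df_loans[['industry', 'delinquency_rate']]
    .drop_duplicates()
    .sort_values(by='delinquency_rate', ascending=False)
    .reset_index(drop=True)
)
delinquency_by_industry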

| | industry | delinquency_rate |
|---|---|---|
| 0 | Retail | 1.0 |
| 1 | Tech | 0.5 |
| 2 | Manufacturing | 0.0 |
| 3 | Healthcare | 0.0 |


Retail has the highest delinquency rate in this sample (all three retail loans are delinquent), so JPMC might apply more caution, such as tighter underwriting, when lending to the retail sector.