Please use Markdown cells in your submission to document your thought process. You are expected to follow the clean code and PEP 8 guidelines as much as you can. You should use docstrings for all function declarations.
In this assignment, you will learn several functions from NumPy. Please check their functionality using their documentation or the help() function, and see if you can apply them to solve the homework problems.

# Cosine Similarity
Cosine similarity measures the similarity between two high-dimensional vectors as the cosine of the angle between them. It is widely used in applications such as clustering in machine learning and recommendation systems for e-commerce companies. See more background on cosine similarity in [Wikipedia](https://en.wikipedia.org/wiki/Cosine_similarity). 
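
For reference, the standard definition for two vectors $a$ and $b$ of equal length $n$ can be written as follows (the symbols $a_i$, $b_i$, $n$, and $\theta$ are notation introduced here, not taken from the assignment text):

$$
\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
$$

As a quick worked case, $a = (1, 2, 3)$ and $b = (4, 5, 6)$ give $32 / (\sqrt{14} \sqrt{77}) \approx 0.9746$.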

Write a function named "cosine_similarity". The function takes two 1D numpy arrays as inputs, and returns their cosine similarity. 

**Note**: You are allowed to use np.dot() for inner product, np.sum() for summation, np.sqrt() for calculating square root. Do not use other built-in functions from Numpy such as np.linalg.norm(). 


In [None]:
import numpy as np

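def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return the cosine similarity of two 1D NumPy arrays.

    Returns 0.0 if either vector has zero magnitude.
    """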
    # Compute dot product
    dot_product = np.dot(a, b)
    
    # Compute magnitudes (norms)
    norm_a = np.sqrt(np.sum(a * a))
    norm_b = np.sqrt(np.sum(b * b))
    
    # Avoid division by zero
    if norm_a == 0 or norm_b == 0:
        return 0.0
    
    # Compute cosine similarity
    return dot_product / (norm_a * norm_b)

# Test case
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(cosine_similarity(a, b))  # Output: 0.9746318461970762


0.9746318461970762


# Stock Price Analysis--Part 1
Write a function named "count_max_streak". The function takes the historical stock price data, represented as a 1D numpy array, as input, and returns the maximum number of consecutive days when the stock price increased. 

To test your function, please use `tesla_closing_price` as the 1D numpy array input to your function. The historical stock price of Tesla is provided for you to test your code.

```py
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
```

In [11]:
import numpy as np

def count_max_streak(prices: np.ndarray) -> int:
    """Return the longest run of consecutive day-over-day price increases."""
    if prices.size == 0:
        return 0

    max_streak = 0
    current_streak = 0

    # start from index 1 so we can compare with previous day
    for i in range(1, len(prices)):
        if prices[i] > prices[i - 1]:   # price increased
            current_streak += 1
            if current_streak > max_streak:
                max_streak = current_streak
        else:
            current_streak = 0  # reset when it doesn't increase

    return max_streak

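# Test case
import pandas as pd
tesla = pd.read_csv('TSLA.csv')  # load Tesla stock price from csv file
tesla_np = tesla.to_numpy()  # convert the data to a numpy array
tesla_closing_price = tesla_np[:, 4]  # extract the closing stock price
print(count_max_streak(tesla_closing_price))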
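
When `TSLA.csv` is not at hand, the streak logic can be checked on a small hand-made array. The sketch below is one possible variant, not the required solution: it uses `np.diff` to mark rising days first, and the name `count_max_streak_diff` and the `sample` prices are made up for illustration.

```python
import numpy as np


def count_max_streak_diff(prices):
    """Hypothetical variant: longest run of day-over-day increases."""
    prices = np.asarray(prices, dtype=float)
    if prices.size < 2:
        return 0  # no pair of days to compare

    # rising[i] is True when prices[i + 1] > prices[i]
    rising = np.diff(prices) > 0

    best = current = 0
    for went_up in rising:
        current = current + 1 if went_up else 0
        best = max(best, current)
    return best


# Made-up prices: up, up, down, up, up, up, down
sample = np.array([10.0, 11.0, 12.0, 11.5, 12.0, 13.0, 14.0, 13.0])
print(count_max_streak_diff(sample))  # 3
```

The longest run here is the three increases from 11.5 to 14.0, which is easy to confirm by hand.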



# Stock Price Analysis--Part 2
Write a function named "detect_crash". The function takes a 2D numpy array as input, where the first column represents the historical opening stock price, and the second column represents the historical closing stock price. The function should return a 2D numpy array whose first column contains the indices at which crashes occurred, and whose second column contains the amount of the price drop. 

**Note**: We say there is a stock price crash if the closing price is less than the opening price.

**Hint**: You may find functions np.where() and np.column_stack() helpful. You are also welcome to come up with other solutions without using these functions. 
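
A minimal sketch of what the two hinted functions do on a toy array, independent of the stock data (the array `values` and the names `neg_idx` and `pairs` are made up for illustration):

```python
import numpy as np

values = np.array([3, -1, 4, -5, 9])

# np.where with a single condition returns a tuple of index arrays;
# element [0] holds the positions where the condition is True.
neg_idx = np.where(values < 0)[0]

# np.column_stack places 1D arrays side by side as columns of a 2D array.
pairs = np.column_stack((neg_idx, values[neg_idx]))

print(neg_idx)
print(pairs)
```

Here `neg_idx` holds the positions 1 and 3, and `pairs` has one row per match: (1, -1) and (3, -5). Note that stacking an integer column with a float column yields a float array, so indices stacked next to price drops come back as floats.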

To test your function, please use `open_close_prices` as the 2D numpy array input to your function. You can download the historical stock price of Tesla here.

```py
import numpy as np
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_opening_price = tesla_np[:, 1] # extract the opening stock price of Tesla
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
open_close_prices = np.column_stack((tesla_opening_price, tesla_closing_price)) # form a 2D numpy array using opening and closing prices 
```

In [10]:
import numpy as np

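def detect_crash(open_close_prices: np.ndarray) -> np.ndarray:
    """Return crash-day indices and price drops as a two-column array.

    A crash is a day whose closing price is below its opening price.
    """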
    # Extract opening and closing prices
    opening = open_close_prices[:, 0]
    closing = open_close_prices[:, 1]
    
    # Find where crash occurs (closing < opening)
    crash_indices = np.where(closing < opening)[0]
    
    # Calculate drop amounts
    drop_amounts = opening[crash_indices] - closing[crash_indices]
    
    # Combine indices and drop amounts into a 2D array
    result = np.column_stack((crash_indices, drop_amounts))
    
    return result

# Test case
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_opening_price = tesla_np[:, 1] # extract the opening stock price of Tesla
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
open_close_prices = np.column_stack((tesla_opening_price, tesla_closing_price)) # form a 2D numpy array using opening and closing prices 
print(detect_crash(open_close_prices))


# Infectious Disease Simulation

Write a function named "simulate_disease" to simulate how an infectious disease may propagate over a network. Given the infection probabilities $p_t$ of all individuals at time $t$ and a network connection matrix $A$, the infection probabilities at time step $t+1$ are computed as $p_{t+1} = A p_t$. Here are the requirements of this function:

- The function uses a 1D numpy array to denote the probabilities of all individuals within the network of being infected.
- The network connection among individuals is represented using a 2D symmetric, row-stochastic numpy array, with all entries being non-negative (a row-stochastic matrix is one whose elements add up to 1 in each row). If there are $N$ individuals in the network, the matrix is of dimension $N \times N$. The $(i, j)$-th entry of the matrix represents the connection strength between individuals $i$ and $j$.
- Given the initial probabilities, the network connections, and the prediction time horizon, the function returns the probability of each individual in the network being infected at the end of the prediction time horizon.

You can use the following code snippet to generate a random connection matrix of dimension $N \times N$ to verify your function:
```py
np.random.seed(20)
matrix = np.random.rand(N, N)
symmetric_matrix = (matrix + matrix.T) / 2
connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)
```
You can use the following code snippet to generate an initial infection probability: `np.random.rand(N, 1)`
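
Before running a simulation, it can help to confirm the two properties the update relies on. The sketch below assumes a small network size `N = 4` (chosen only for this check) and reuses the snippets above. It verifies that each row of the connection matrix sums to 1 and that one update step keeps every probability between the smallest and largest initial value, since each new entry is a weighted average of the old ones.

```python
import numpy as np

N = 4  # hypothetical network size, chosen only for this check
np.random.seed(20)
matrix = np.random.rand(N, N)
symmetric_matrix = (matrix + matrix.T) / 2
connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)

# Each row should sum to 1 (row-stochastic).
row_sums = connection_matrix.sum(axis=1)

# np.random.rand(N, 1) returns a column of shape (N, 1);
# ravel() flattens it to the 1D shape (N,) used by the update.
p0 = np.random.rand(N, 1).ravel()

# One update step: every entry of p1 is a weighted average of p0.
p1 = connection_matrix @ p0

print(np.allclose(row_sums, 1.0))
print(p0.min() <= p1.min(), p1.max() <= p0.max())
```

Repeated updates keep averaging the entries, which is consistent with long-horizon results where all individuals end up with nearly the same probability.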

In [1]:
import numpy as np

def simulate_disease(initial_prob: np.ndarray,
                     connection_matrix: np.ndarray,
                     steps: int) -> np.ndarray:
    """Return infection probabilities after `steps` updates of p = A @ p."""
    # make sure it's 1D
    prob = np.asarray(initial_prob).reshape(-1)

    for _ in range(steps):
        # p_{t+1} = A @ p_t
        prob = connection_matrix @ prob

    return prob

# Test case

np.random.seed(20)
N = 5  # number of individuals

# build symmetric, row-stochastic connection matrix
matrix = np.random.rand(N, N)
symmetric_matrix = (matrix + matrix.T) / 2
connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)

# initial infection probability (make it 1D)
initial_prob = np.random.rand(N)

final_prob = simulate_disease(initial_prob, connection_matrix, steps=10)
print(final_prob)



[0.60340112 0.60340112 0.60340112 0.60340112 0.60340112]


# Music Composition

Write a function named "compose_music" that takes a music sheet in A major scale as input and plays a piece of music based on the music sheet. The music sheet specifies a sequence of notes and their durations (see below for an example). You can use the in-class example "generate_sine" function to generate music notes with `fs = 8000`.

You can test your code using the example music_sheet below (simplified from "A Better Tomorrow (Mark's theme)" by Joseph Koo).
```py
music_sheet = [("Note_Cs", 0.5), ("Note_D", 0.5), ("Note_B", 0.5), ("Note_A_high", 1.5), ("Note_A_high", 0.5), ("Note_Gs", 0.5), ("Note_Fs", 0.5), ("Note_Gs", 0.5), ("Note_E", 0.75)]   
```
**Extra challenge**: Can you revise the function to play chords based on a given music sheet? A chord refers to the combination of two music notes. (Extra challenge is not graded).
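
The note names in the music sheet need frequencies before a sine wave can be generated. One way to obtain them, sketched below, assumes twelve-tone equal temperament with A4 = 440 Hz, where a note `k` semitones above A4 has frequency `440 * 2 ** (k / 12)`. The semitone offsets follow the A major scale; the names `A4`, `SEMITONE_OFFSETS`, and `note_freq` are chosen for illustration.

```python
A4 = 440.0  # assumed reference pitch in Hz

# Semitone offsets of the A major scale relative to A4.
SEMITONE_OFFSETS = {
    "Note_A": 0,
    "Note_B": 2,
    "Note_Cs": 4,   # C sharp
    "Note_D": 5,
    "Note_E": 7,
    "Note_Fs": 9,   # F sharp
    "Note_Gs": 11,  # G sharp
    "Note_A_high": 12,
}

# Equal temperament: each semitone multiplies the frequency by 2 ** (1 / 12).
note_freq = {
    name: A4 * 2 ** (offset / 12)
    for name, offset in SEMITONE_OFFSETS.items()
}

for name, freq in note_freq.items():
    print(f"{name}: {freq:.2f} Hz")
```

The octave (`Note_A_high`) comes out at exactly twice the base frequency, 880 Hz.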

In [17]:
import numpy as np

def generate_sine(freq: float, duration: float, fs: int = 8000) -> np.ndarray:
    """Generate a sine wave of given frequency and duration."""
    t = np.linspace(0, duration, int(fs * duration), endpoint=False)
    return np.sin(2 * np.pi * freq * t)

def compose_music(music_sheet, fs: int = 8000) -> np.ndarray:
    """
    Compose music from a music sheet (list of (note_name, duration) tuples)
    in A major scale and return the audio signal as a 1D numpy array.
    """
    # A major scale (starting from A4 = 440 Hz)
    note_freq = {
        "Note_A": 440.0,
        "Note_B": 493.88,
        "Note_Cs": 554.37,   # C#5
        "Note_D": 587.33,
        "Note_E": 659.25,
        "Note_Fs": 739.99,   # F#5
        "Note_Gs": 830.61,   # G#5
        "Note_A_high": 880.0
    }

    audio_pieces = []

    for note_name, dur in music_sheet:
        if note_name not in note_freq:
            # treat unknown note as rest (silence)
            tone = np.zeros(int(fs * dur))
        else:
            freq = note_freq[note_name]
            tone = generate_sine(freq, dur, fs)
        audio_pieces.append(tone)

    # concatenate all notes into one long array
    music = np.concatenate(audio_pieces)

    # optional: avoid clipping by scaling down a bit
    music = music * 0.6

    return music

# Test case
music_sheet = [
    ("Note_Cs", 0.5),
    ("Note_D", 0.5),
    ("Note_B", 0.5),
    ("Note_A_high", 1.5),
    ("Note_A_high", 0.5),
    ("Note_Gs", 0.5),
    ("Note_Fs", 0.5),
    ("Note_Gs", 0.5),
    ("Note_E", 0.75),
]

audio = compose_music(music_sheet, fs=8000)
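
For the ungraded extra challenge, one possible approach is to generate each note of a chord separately and average the waveforms so that the result stays within [-1, 1]. This is a sketch under that assumption; `generate_chord` is a hypothetical helper, and `generate_sine` is redefined here only to keep the example self-contained.

```python
import numpy as np


def generate_sine(freq, duration, fs=8000):
    """Generate a sine wave of the given frequency and duration."""
    t = np.linspace(0, duration, int(fs * duration), endpoint=False)
    return np.sin(2 * np.pi * freq * t)


def generate_chord(freqs, duration, fs=8000):
    """Hypothetical helper: average several sine waves into one chord."""
    tones = [generate_sine(freq, duration, fs) for freq in freqs]
    # Averaging (rather than summing) keeps the amplitude within [-1, 1].
    return np.mean(tones, axis=0)


# A and C sharp sounded together for half a second.
chord = generate_chord([440.0, 554.37], duration=0.5)
print(chord.shape, float(np.max(np.abs(chord))) <= 1.0)
```

A music sheet for chords could then hold a tuple of note names per entry instead of a single name, with each entry passed through a helper like this one.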


# Rock Paper Scissor
Write a function named `play_rock_paper_scissor` so that a user can interact with the computer to play the Rock Paper Scissor game. Here are the expected functionalities:
- For each round of the game, the function should ask the user whether the user wants to start playing by inputting 0 or 1. User input 0 indicates no, and 1 indicates yes.
- If the user wants to start the game, prompt the user to input a choice of action among Rock, Paper, Scissor. For now, you can safely assume the user's input is always among these three options. Randomly generate an action among Rock, Paper, Scissor for the computer. Please refer to our in-class practice problem for an example on random number generation.
- Compare the user input with the computer action, and print the winner based on the following rules:
    - Rock beats Scissors
    - Scissors beats Paper
    - Paper beats Rock
- Prompt the user whether the game should continue or not as we did in the first step.
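
The three rules above can be captured in a small lookup table, which keeps the comparison logic separate from the input and output handling and makes it easy to test. This is only a sketch; the names `BEATS` and `decide_winner` are made up for illustration, and it uses the spelling "Scissor" to match the input options.

```python
# Each key beats the value it maps to.
BEATS = {"Rock": "Scissor", "Scissor": "Paper", "Paper": "Rock"}


def decide_winner(user_choice, computer_choice):
    """Return "Tie", "User", or "Computer" for one round."""
    if user_choice == computer_choice:
        return "Tie"
    if BEATS[user_choice] == computer_choice:
        return "User"
    return "Computer"


print(decide_winner("Rock", "Scissor"))   # User
print(decide_winner("Paper", "Scissor"))  # Computer
print(decide_winner("Paper", "Paper"))    # Tie
```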

In [21]:
import random

def play_rock_paper_scissor():
    """Play rounds of Rock Paper Scissor until the user enters 0."""

    options = ["Rock", "Paper", "Scissor"]

    while True:
        start = int(input("Do you want to play Rock-Paper-Scissor? (1 = Yes, 0 = No): "))
        if start == 0:
            print("Game over. Thanks for playing!")
            break

        # User's move
        user_choice = input("Enter your choice (Rock / Paper / Scissor): ").capitalize()

        # Computer's move
        computer_choice = random.choice(options)
        print(f"Computer chose: {computer_choice}")

        # Determine the winner
        if user_choice == computer_choice:
            print("It's a tie!")
        elif (
            (user_choice == "Rock" and computer_choice == "Scissor") or
            (user_choice == "Scissor" and computer_choice == "Paper") or
            (user_choice == "Paper" and computer_choice == "Rock")
        ):
            print("You win!")
        else:
            print("Computer wins!")

        print()  # add a blank line for readability


# Linear Regression
- You are given a straight line learned from linear regression, denoted as $\hat{y}=ax+b$. Your task is to use a Python function named `eval_predict` to assess prediction quality over a dataset. Please first define the metric being used for assessment. Then let your program calculate and return the defined metric value.
- To learn a high-quality straight line in the form of $\hat{y}=ax+b$, we need to find optimal values for $a$ and $b$. In what follows, you will perform a grid search: try many $(a, b)$ pairs on a rectangular grid and pick the one with the best metric (defined in Problem 7) on the dataset. For example, with $a \in \{0.0, 0.5, 1.0\}$ and $b \in \{-1.0, 0.0, 1.0\}$, you evaluate all 9 combinations, compute the metric for each, and pick the best. Write a Python function named `grid_search` to find the best pair for the given options of $a$ and $b$.
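
The 9-combination example can be worked through on a toy dataset. The sketch below assumes mean squared error as the metric (one common choice; the assignment leaves the metric to you) and uses made-up data lying on the line $y = x$, so the best pair should be $a = 1.0$, $b = 0.0$. The names `scores` and `best_pair` are illustrative.

```python
from itertools import product

import numpy as np

# Toy data lying exactly on the line y = x (made up for this example).
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0])

a_options = [0.0, 0.5, 1.0]
b_options = [-1.0, 0.0, 1.0]

# Evaluate the assumed metric (MSE) for every (a, b) pair on the grid.
scores = {}
for a, b in product(a_options, b_options):
    y_hat = a * x + b
    scores[(a, b)] = float(np.mean((y - y_hat) ** 2))

# The best pair is the one with the smallest error.
best_pair = min(scores, key=scores.get)
print(len(scores), best_pair, scores[best_pair])  # 9 (1.0, 0.0) 0.0
```

With a metric where larger is better, such as $R^2$, the selection would use `max` instead of `min`.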

In [None]:
import numpy as np

def eval_predict(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Return the mean squared error (MSE) between targets and predictions."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse

def grid_search(x: np.ndarray, y: np.ndarray,
                a_values: list, b_values: list) -> tuple:
    """Return (best_a, best_b, best_mse) over all (a, b) pairs on the grid."""
    best_mse = float('inf')
    best_a, best_b = None, None

    for a in a_values:
        for b in b_values:
            y_pred = a * x + b
            mse = eval_predict(y, y_pred)
            if mse < best_mse:
                best_mse = mse
                best_a, best_b = a, b

    return best_a, best_b, best_mse

# Sample dataset
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([1, 3, 5, 7, 9, 11])  # true line: y = 2x + 1

# Candidate parameter grids
a_values = np.arange(0, 3.0, 0.5)
b_values = np.arange(-1, 3.0, 0.5)

best_a, best_b, best_mse = grid_search(x, y, a_values, b_values)
print(f"Best a = {best_a}, Best b = {best_b}, Best MSE = {best_mse:.4f}")


Best a = 2.0, Best b = 1.0, Best MSE = 0.0000


# Sign-Up Validator
Imagine you are building a sign-up page for a new app. To keep the user database clean, you must validate all inputs before creating an account.
Instead of relying on advanced libraries, you’ll use pure Python basics.

### Username Validation
Write a function `validate_username(username)` that:
- Returns False if the username is empty, shorter than 3, or longer than 20 characters.
- Returns False if it contains anything besides letters, digits, underscore _, dot ., or hyphen -.
- Returns True if valid.

### Email Validation (Simple Version)
Write a function `validate_email(email)` that:
- Returns False if it doesn’t contain an "@".
- Splits the email into local part and domain (use .partition("@")).
- Ensures the domain contains at least one "." and doesn’t start/end with ".".
- Returns True if valid, False otherwise.
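
A short illustration of how `str.partition` behaves, since the email check builds on it. It always returns a 3-tuple, and when the separator is missing the last two elements are empty strings. The sample addresses are made up.

```python
# Separator found: (text before, separator, text after)
with_at = "Alice@example.com".partition("@")
print(with_at)  # ('Alice', '@', 'example.com')

# Separator missing: the whole string comes first, then two empty strings
without_at = "no-at-sign".partition("@")
print(without_at)  # ('no-at-sign', '', '')

# Only the first "@" splits the string; later ones stay in the last part
double_at = "a@b@c.com".partition("@")
print(double_at)  # ('a', '@', 'b@c.com')
```

Because only the first "@" is used, an address with two "@" characters leaves one inside the domain part, which a stricter validator might want to reject.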

### Phone Normalization (Simplified)
Write a function `normalize_phone(phone, default_cc="+1")` that:
- Removes all non-digit characters.
- If the number doesn’t start with "+", prepend the default country code.
- Check that the final string has between 9 and 15 digits (excluding +).
- Return the normalized phone number (e.g., "+14155551212"), or "Invalid" if not valid.
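
The steps above can be traced on the sample number from the aggregator example. One subtlety in this sketch is ordering: the check for a leading "+" has to happen before the non-digit characters are removed, otherwise the "+" is lost. The variable names are illustrative.

```python
raw = "(415) 555-1212"
default_cc = "+1"

# Remember whether the user typed a leading "+" before stripping characters.
had_plus = raw.strip().startswith("+")

# Keep digits only.
digits = "".join(ch for ch in raw if ch.isdigit())

# Prepend the default country code only when no "+" was given.
normalized = ("+" if had_plus else default_cc) + digits

# Count the digits of the final string, excluding the "+".
digit_count = sum(ch.isdigit() for ch in normalized)
is_valid = 9 <= digit_count <= 15

print(normalized, digit_count, is_valid)  # +14155551212 11 True
```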

### Sign-Up Aggregator
Write a function `validate_signup(user_info)` where `user_info` is a dictionary, e.g.:
```py
{
    "username": "alice_01",
    "email": "Alice@example.com",
    "password": "Strong!Pass1",
    "phone": "(415) 555-1212",
    "country": "us"
}
```
It should:
- Call each validation function.
- Collect any errors in a dictionary.
- Return a result like:
```py
{
    "ok": True,
    "errors": {},
    "normalized": {
        "username": "alice_01",
        "email": "alice@example.com",
        "phone": "+14155551212",
        "country": "US"
    }
}
```
If there are errors, `ok` should be False and errors should explain them.


In [23]:
def validate_username(username: str) -> bool:
    """Check length (3-20) and allowed characters of a username."""
    if not username:
        return False
    if len(username) < 3 or len(username) > 20:
        return False
    allowed_extra = "._-"
    for ch in username:
        if not (ch.isalnum() or ch in allowed_extra):
            return False
    return True

def validate_email(email: str) -> bool:
    """Check for a non-empty local part and a dotted domain around '@'."""
    if "@" not in email:
        return False
    local, _, domain = email.partition("@")
    if not local or not domain:
        return False
    if "." not in domain:
        return False
    if domain.startswith(".") or domain.endswith("."):
        return False
    return True

def normalize_phone(phone: str, default_cc: str = "+1") -> str:
    """Return the phone number as '+<digits>', or "Invalid" if malformed."""
    original = phone.strip()
    # keep only digits
    digits_only = "".join(ch for ch in original if ch.isdigit())

    # detect if user originally put a +
    has_plus = original.startswith("+")

    if has_plus:
        normalized = "+" + digits_only
    else:
        normalized = default_cc + digits_only

    # check the digit count of the final string (excluding +)
    digit_count = sum(ch.isdigit() for ch in normalized)
    if digit_count < 9 or digit_count > 15:
        return "Invalid"

    return normalized

def validate_signup(user_info: dict) -> dict:
    """Validate sign-up fields; return ok, errors, and normalized values."""
    errors = {}
    normalized = {}

    # username
    username = user_info.get("username", "")
    if not validate_username(username):
        errors["username"] = "Invalid username"
    else:
        normalized["username"] = username

    # email
    email = user_info.get("email", "")
    if not validate_email(email):
        errors["email"] = "Invalid email"
    else:
        # normalize to lowercase
        normalized["email"] = email.lower()

    # phone
    phone = user_info.get("phone", "")
    if phone:
        norm_phone = normalize_phone(phone)
        if norm_phone == "Invalid":
            errors["phone"] = "Invalid phone number"
        else:
            normalized["phone"] = norm_phone
    else:
        errors["phone"] = "Phone required"

    # country (just normalize to upper if present)
    country = user_info.get("country", "")
    if country:
        normalized["country"] = country.upper()

    # password – not validated here per prompt, just pass through
    if "password" in user_info:
        normalized["password"] = user_info["password"]

    ok = len(errors) == 0
    return {
        "ok": ok,
        "errors": errors,
        "normalized": normalized
    }


# Test case
user = {
    "username": "alice_01",
    "email": "Alice@example.com",
    "password": "Strong!Pass1",
    "phone": "(415) 555-1212",
    "country": "us"
}

result = validate_signup(user)
print(result)


{'ok': True, 'errors': {}, 'normalized': {'username': 'alice_01', 'email': 'alice@example.com', 'phone': '+14155551212', 'country': 'US', 'password': 'Strong!Pass1'}}
