## QTM 350: Data Science Computing

### Assignment 06 - AI-Assisted Programming

### Due 06 June 2025

### Instructions

In this assignment, you will use AI tools to help you generate, refactor, and explain code. Ideally, use GitHub Copilot, but if you cannot install the software, feel free to use other available tools (free or otherwise). Please name the tools you use in your assignment.

The main idea is to use natural language as much as possible, whilst remaining attentive to any mistakes the AI tool may produce. Your code should run without issues and provide the correct results. Please feel free to include tests and screenshots.

As always, should you have any questions, please let me know.

### Tasks

1. Use the `/explain` command in Copilot to get an explanation of the following code snippet:

```python
def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib
```

2. Create a new Git repository using GitHub CLI and Copilot suggestions. The suggestions should include code to initialise the repository, add a `README.md` and a `.gitignore` file for Python projects, then add, commit, and push the changes to the repository. Please include the link to your repository below.

3. Use Copilot to refactor the following code to improve its efficiency and readability:

```python
def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

primes = []
for i in range(1, 101):
    if is_prime(i):
        primes.append(i)

print(primes)
```

4. Use the `@terminal` command to generate CLI commands that create a new directory named `data_analysis`, navigate into the directory, create a new Python file called `analysis.py`, and add a shebang line (`#!/usr/bin/env python`) at the top of the file.

5. Write a Python function that calculates the factorial of a number. Deliberately introduce an error in the function, then use Copilot's `/fix` command to identify and correct the issue.

6. Use Copilot to generate documentation comments for the following `R` factorial function:

```r
calculate_factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * calculate_factorial(n - 1))
  }
}
```

7. Use the `/explain` command to get an explanation for the following Python code snippet:

```python
def normalize_data(data):
    min_val = min(data)
    max_val = max(data)
    return [(x - min_val) / (max_val - min_val) for x in data]
```

8. Ask Copilot to translate the following `R` code to `Python`:

```r
library(tidyverse)

# Create a sample dataframe with missing values
df <- tibble(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, 4, 5),
  C = c(1, 2, 3, NA, 5),
  D = c(1, 2, 3, 4, 5)
)

# Define the function
count_missing_values <- function(df) {
  sapply(df, function(col) sum(is.na(col)))
}

# Test the function
result <- count_missing_values(df)
print(result)
```

9. Write a function in both `R` and `Python` that takes a string and returns the number of vowels in the string.

10. Ask Copilot to translate the following binary code into text:

```verbatim
01011001 01101111 01110101 00100111 01110110 01100101 00100000 01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 01110100 01101000 01100101 00100000 01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001
```



1. The code defines a function that returns the first `n` Fibonacci numbers as a list.
    ```python
    def fibonacci(n):
        if n <= 0:
            return []
        elif n == 1:
            return [0]
        elif n == 2:
            return [0, 1]
        else:
            fib = [0, 1]
            for i in range(2, n):
                fib.append(fib[i-1] + fib[i-2])
            return fib
    ```
    Explanation:

    - If `n` is 0 or negative, it returns an empty list (no Fibonacci numbers).
    - If `n` is 1, it returns a list with just `[0]`.
    - If `n` is 2, it returns `[0, 1]`.
    - For `n` greater than 2, it initializes a list `fib` with `[0, 1]` and then uses a loop to calculate each next Fibonacci number by summing the two previous numbers (`fib[i-1] + fib[i-2]`). Each new number is appended to the list.
    - The function returns the list of the first `n` Fibonacci numbers.

    Gotcha:

    The function returns a list of length `n`, starting from 0. If you pass `n=0`, you get an empty list, not `[0]`.
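The behaviour described in this explanation can be checked with a few assertions. This is a minimal sketch: it repeats the function from the task so the block runs on its own, and the test values are chosen here for illustration.

```python
def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib

# Edge cases handled by the explicit branches
assert fibonacci(0) == []
assert fibonacci(-3) == []
assert fibonacci(1) == [0]
assert fibonacci(2) == [0, 1]

# General case: the result always has exactly n elements
assert fibonacci(7) == [0, 1, 1, 2, 3, 5, 8]
assert len(fibonacci(10)) == 10
```

If every assertion passes, the script finishes silently.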

2. Here are step-by-step suggestions and commands to create a new Git repository for a Python project:

    Initialize a new Git repository:
    ```bash
    git init
    ```

    Create a README.md file:
    ```bash
    echo "# Project Title" > README.md
    ```

    Create a .gitignore file for Python projects:
    ```bash
    printf "__pycache__/\n*.pyc\n.env/\n.venv/\n*.pyo\n*.pyd\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\n*.egg-info/\n.installed.cfg\n*.egg\n*.log\n" > .gitignore
    ```

    Add all files to the repository:
    ```bash
    git add .
    ```

    Commit the changes:
    ```bash
    git commit -m "Initial commit with README and .gitignore"
    ```

    (Optional) Create a new repository on GitHub using GitHub CLI:
    ```bash
    gh repo create <repo-name> --public --source=. --remote=origin --push
    ```
    
    Replace `<repo-name>` with your desired repository name.

    https://github.com/Yu-Carl/06-assignment-repo

3. 
    ```python
    import math

    def is_prime(num):
        """Check if a number is prime."""
        if num <= 1:
            return False
        if num == 2:
            return True
        if num % 2 == 0:
            return False
        for i in range(3, math.isqrt(num) + 1, 2):
            if num % i == 0:
                return False
        return True

    primes = [i for i in range(2, 101) if is_prime(i)]

    print(primes)
    ```
    Improvements:

    - Only checks up to the square root of `num` for factors, which is more efficient.
    - Skips even numbers after checking for 2.
    - Uses a list comprehension for generating the list of primes, improving readability.
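A refactor should not change results, so it is worth comparing the two versions directly. The sketch below copies the original function from the task under the made-up name `is_prime_naive` and looks for any input on which the two disagree. The range tested is an arbitrary choice, and `math.isqrt` needs Python 3.8 or later.

```python
import math

def is_prime_naive(num):
    """Original version from the task: trial division by every i < num."""
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

def is_prime(num):
    """Refactored version: odd divisors up to the integer square root."""
    if num <= 1:
        return False
    if num == 2:
        return True
    if num % 2 == 0:
        return False
    for i in range(3, math.isqrt(num) + 1, 2):
        if num % i == 0:
            return False
    return True

# Collect every input where the two versions give different answers
mismatches = [n for n in range(-5, 1000) if is_prime(n) != is_prime_naive(n)]
print(mismatches)  # []
```

An empty list means the two functions agree on every value tested.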

4. 
    ```bash
    mkdir data_analysis && cd data_analysis && echo '#!/usr/bin/env python' > analysis.py
    ```

5. 
    ```python
    def factorial(n):
        f = 1
        for i in range(1, n+1):
            f = f ** i  # deliberate error: exponentiation instead of multiplication
        return f
    ```

    Pseudocode plan:

    - Define a function factorial that takes an integer n.
    - Initialize a variable f to 1.
    - Loop from 1 to n (inclusive).
    - Multiply f by i in each iteration (should use f = f * i, not f = f ** i).
    - Return f.

    Corrected code:
    ```python
    def factorial(n):
        f = 1
        for i in range(1, n+1):
            f = f * i  # Fixed: use multiplication, not exponentiation
        return f
    ```
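The exponentiation bug is easy to miss because it never raises an error: `f` starts at 1, and `1 ** i` is 1 for every `i`, so the buggy function returns 1 for any input. It even gives the right answer for `n = 0` and `n = 1`. A minimal sketch comparing both versions against `math.factorial` (the names `factorial_buggy` and `factorial_fixed` are made up for this illustration):

```python
import math

def factorial_buggy(n):
    f = 1
    for i in range(1, n + 1):
        f = f ** i  # bug: 1 ** i is always 1
    return f

def factorial_fixed(n):
    f = 1
    for i in range(1, n + 1):
        f = f * i
    return f

# Columns: n, buggy result, fixed result, reference value
for n in range(6):
    print(n, factorial_buggy(n), factorial_fixed(n), math.factorial(n))
```

The second column stays at 1 on every row, while the third and fourth columns match.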

6.
    ```r
    #' Calculate the factorial of a number recursively
    #'
    #' This function computes the factorial of a non-negative integer n using recursion.
    #'
    #' @param n A non-negative integer whose factorial is to be computed.
    #' @return The factorial of n.
    #' @examples
    #' calculate_factorial(5) # returns 120
    #' calculate_factorial(0) # returns 1
    calculate_factorial <- function(n) {
      if (n == 0) {
        return(1)
      } else {
        return(n * calculate_factorial(n - 1))
      }
    }
    ```

7. This function normalizes a list of numbers so that all values are scaled between 0 and 1.

    - `min_val = min(data)` finds the smallest value in the list.
    - `max_val = max(data)` finds the largest value in the list.
    - The list comprehension `[(x - min_val) / (max_val - min_val) for x in data]` subtracts the minimum from each value and divides by the range (`max_val - min_val`), scaling all values to the [0, 1] interval.

    Gotcha:

    If all values in `data` are the same, `max_val - min_val` will be 0, causing a division by zero error (`ZeroDivisionError`).
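One way to guard against the constant-input case is sketched below. Returning `0.0` for every element when the range is zero is an assumption made here, not something the task specifies; raising an error would be equally defensible. The name `normalize_data_safe` is hypothetical.

```python
def normalize_data_safe(data):
    """Min-max scale `data` to [0, 1], guarding against a zero range."""
    min_val = min(data)
    max_val = max(data)
    value_range = max_val - min_val
    if value_range == 0:
        # Assumption: map constant input to all zeros instead of dividing by zero
        return [0.0 for _ in data]
    return [(x - min_val) / value_range for x in data]

print(normalize_data_safe([2, 4, 6]))  # [0.0, 0.5, 1.0]
print(normalize_data_safe([3, 3, 3]))  # [0.0, 0.0, 0.0]
```

An empty list still raises `ValueError` from `min`, as it does in the original function.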

8. 
    ```python
    import pandas as pd
    import numpy as np

    # Create a sample DataFrame with missing values
    df = pd.DataFrame({
        'A': [1, 2, np.nan, 4, 5],
        'B': [np.nan, 2, 3, 4, 5],
        'C': [1, 2, 3, np.nan, 5],
        'D': [1, 2, 3, 4, 5]
    })

    # Define the function
    def count_missing_values(df):
        return df.isna().sum()

    # Test the function
    result = count_missing_values(df)
    print(result)
    ```
9. 
    ```python
    def vowels(word):
        count = 0
        for char in word:
            if char in "aeiouAEIOU":
                count = count + 1
        return count
    ```

    ```r
    vowels <- function(word) {
        count <- 0
        # seq_len() gives an empty loop for "", whereas 1:0 would iterate over c(1, 0)
        for (i in seq_len(nchar(word))) {
            if (substring(word, i, i) %in% c("a", "e", "i", "o", "u", "A", "E", "I", "O", "U")) {
                count <- count + 1
            }
        }
        return(count)
    }
    ```
10. You've completed the assignment. Congratulations!
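Because AI tools can get a decoding like this wrong, it is worth checking independently. The sketch below assumes the input is space-separated 8-bit ASCII, as in the task; the helper names `binary_to_text` and `text_to_binary` are made up for this example.

```python
def binary_to_text(bits):
    """Decode space-separated 8-bit binary groups into an ASCII string."""
    return "".join(chr(int(byte, 2)) for byte in bits.split())

def text_to_binary(text):
    """Encode an ASCII string as space-separated 8-bit binary groups."""
    return " ".join(format(ord(ch), "08b") for ch in text)

# First three bytes of the message from the task
print(binary_to_text("01011001 01101111 01110101"))  # You

# Round trip on the decoded sentence
message = "You've completed the assignment. Congratulations!"
assert binary_to_text(text_to_binary(message)) == message
```

Pasting the full binary string from the task into `binary_to_text` checks the whole message in the same way.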