## QTM 350: Data Science Computing

### Assignment 06 - AI-Assisted Programming

### Due 03 March 2025

### Instructions

In this assignment, you will use AI tools to help you generate, refactor, and explain code. Ideally, use GitHub Copilot, but if you cannot install the software, feel free to use other available tools (free or otherwise). Please name the tools you use in your assignment.

The main idea is to use natural language as much as possible, whilst remaining attentive to any mistakes the AI tool may produce. Your code should run without issues and provide the correct results. Please feel free to include tests and screenshots.

As always, should you have any questions, please let me know.

### Tasks

1. Use the `/explain` command in Copilot to get an explanation of the following code snippet:

```python
def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for i in range(2, n):
            fib.append(fib[i-1] + fib[i-2])
        return fib
```

Here's a step-by-step explanation of how the code works:

The function is defined with the name fibonacci and takes a single parameter n, which represents the number of Fibonacci numbers to generate.

The first if statement checks if n is less than or equal to 0. If it is, an empty list [] is returned. This is done to handle the case when n is 0 or a negative number.

The first elif statement checks if n is equal to 1. If it is, a list containing only the number 0 is returned. This is done to handle the case when n is 1.

The second elif statement checks if n is equal to 2. If it is, a list containing the numbers 0 and 1 is returned. This is done to handle the case when n is 2.

If none of the above conditions are met, the else block is executed. This block initializes a list fib with the first two Fibonacci numbers, 0 and 1.

The for loop uses range(2, n), so i takes the values 2, 3, ..., n-1 (n itself is excluded). On each iteration it adds the previous two numbers in the fib list, fib[i-1] and fib[i-2], and appends the sum to the list.

Finally, the function returns the fib list, which contains the first n Fibonacci numbers.

To use this function, call it with a specific value for n and it will return the corresponding Fibonacci sequence as a list. For example, fibonacci(5) returns [0, 1, 1, 2, 3], the first five Fibonacci numbers.
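The three special cases in the function are not strictly needed: a loop that carries the last two values forward already returns [], [0] and [0, 1] for n = 0, 1 and 2. A minimal sketch of that alternative for comparison; the name `fibonacci_pairs` is hypothetical and this is not Copilot's output:

```python
def fibonacci_pairs(n):
    """Return the first n Fibonacci numbers (hypothetical alternative to fibonacci)."""
    result = []
    a, b = 0, 1  # current and next Fibonacci numbers
    # range(n) is empty for n <= 0, so no separate special case is needed
    for _ in range(n):
        result.append(a)
        a, b = b, a + b
    return result


# range(2, n) in the original loop stops before n: here it yields 2, 3, 4
print(list(range(2, 5)))   # [2, 3, 4]
print(fibonacci_pairs(5))  # [0, 1, 1, 2, 3]
```

The tuple assignment `a, b = b, a + b` updates both values at once, so the list never has to be indexed.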




2. Create a new Git repository using GitHub CLI and Copilot suggestions. The suggestions should include code to initialise the repository, add a `README.md` and a `.gitignore` file for Python projects, then add, commit, and push the changes to the repository. Please include the link to your repository below.

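```
# Create a public GitHub repository and clone it locally
# (requires the GitHub CLI, `gh`, installed and authenticated via `gh auth login`)
!gh repo create my-repo --public --clone

# Change directory to the repository
%cd my-repo

# Create a README.md file
!echo "# 06_assignment_repo" > README.md

# Create a .gitignore file with common Python patterns
!printf '__pycache__/\n*.py[cod]\n.venv/\n.ipynb_checkpoints/\n' > .gitignore

# Add all files to the repository
!git add .

# Commit the changes
!git commit -m "Initial commit"

# Name the branch main and push it, setting origin/main as the upstream
!git branch -M main
!git push -u origin main
```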

Output:

```
/Users/danielnickas/QTM350/qtm350/assignments
zsh:1: command not found: gh
```


3. Use Copilot to refactor the following code to improve its efficiency and readability:

```python
def is_prime(num):
  if num <= 1:
    return False
  for i in range(2, int(num**0.5) + 1):
    if num % i == 0:
      return False
  return True

primes = [i for i in range(2, 101) if is_prime(i)]

print(primes)
```
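One possible refactor, sketched under the assumption of Python 3.8+ (for `math.isqrt`): handle 2 and 3 directly, rule out even numbers, then test only odd divisors up to the integer square root. This is an illustration, not Copilot's output.

```python
import math


def is_prime(num: int) -> bool:
    """Return True if num is a prime number."""
    if num < 2:
        return False
    if num < 4:
        return True  # 2 and 3 are prime
    if num % 2 == 0:
        return False
    # Only odd divisors up to the integer square root need checking
    for divisor in range(3, math.isqrt(num) + 1, 2):
        if num % divisor == 0:
            return False
    return True


primes = [n for n in range(2, 101) if is_prime(n)]
print(primes)
```

`math.isqrt` avoids the float round-trip of `int(num**0.5)`, and skipping even divisors roughly halves the loop. When every prime below a bound is needed, a Sieve of Eratosthenes is the usual faster alternative.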



4. Use the `@terminal` command to generate CLI commands that create a new directory named `data_analysis`, navigate into the directory, create a new Python file called `analysis.py`, and add a shebang line (`#!/usr/bin/env python`) at the top of the file.


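```
# Create a new directory named 'data_analysis'
# (-p: no "mkdir: data_analysis: File exists" error if it already exists)
!mkdir -p data_analysis

# Navigate into the directory. Each `!` command runs in its own subshell,
# so `!cd` would not persist; the %cd magic changes the notebook's directory
%cd data_analysis

# Create 'analysis.py' with the shebang as its first line
# (`>` creates the file, so a separate `touch` is not needed)
!echo '#!/usr/bin/env python' > analysis.py
```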
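The same steps can also be done in plain Python without shell escapes. A minimal sketch using the standard library's `pathlib`; the helper name `create_analysis_script` is hypothetical:

```python
from pathlib import Path


def create_analysis_script(base_dir: Path = Path(".")) -> Path:
    """Create base_dir/data_analysis/analysis.py with a shebang first line (hypothetical helper)."""
    target_dir = base_dir / "data_analysis"
    # exist_ok=True: no error if the directory already exists
    target_dir.mkdir(parents=True, exist_ok=True)
    script = target_dir / "analysis.py"
    # write_text creates the file, or overwrites it if it already exists
    script.write_text("#!/usr/bin/env python\n")
    return script
```

Calling `create_analysis_script()` from the notebook's working directory creates `data_analysis/analysis.py`; calling it again simply rewrites the file.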


5. Write a Python function that calculates the factorial of a number. Deliberately introduce an error in the function, then use Copilot's `/fix` command to identify and correct the issue.

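```python
# Initial version: correct for n >= 0, but a negative n never reaches the
# base case, so the recursion continues until Python raises RecursionError
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
```

```python
# Fixed version (Copilot's /fix): negative input is rejected with an
# exception, and n == 0 falls through to the base case
def factorial(n):
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers.")
    elif n == 0:
        return 1
    else:
        return n * factorial(n - 1)
```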
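Both recursive versions make one call per integer, so for large n they hit CPython's default recursion limit (1000, see `sys.getrecursionlimit()`). A minimal iterative sketch that avoids this, cross-checked against the standard library's `math.factorial`; the name `factorial_iterative` is hypothetical:

```python
import math


def factorial_iterative(n: int) -> int:
    """Return n! using a loop instead of recursion (hypothetical variant)."""
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers.")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


# Cross-check against the standard library for small inputs
for n in range(10):
    assert factorial_iterative(n) == math.factorial(n)

print(factorial_iterative(5))  # 120
```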

6. Use Copilot to generate documentation comments for the following `R` factorial function:

```r
calculate_factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * calculate_factorial(n - 1))
  }
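}
```

Documented version, using roxygen2 comments:

```r
#' Calculate the factorial of a given number
#'
#' @param n A non-negative integer; the number to calculate the factorial for.
#' @return The factorial of `n` (0! is defined as 1), as a numeric value.
#' @examples
#' calculate_factorial(5)  # 120
calculate_factorial <- function(n) {
  if (n == 0) {
    return(1)
  } else {
    return(n * calculate_factorial(n - 1))
  }
}
```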

7. Use the `/explain` command to get an explanation for the following Python code snippet:

```python
def normalize_data(data):
    min_val = min(data)
    max_val = max(data)
    return [(x - min_val) / (max_val - min_val) for x in data]
```

The min function is used to find the minimum value in the data list, and it is assigned to the variable min_val.

The max function is used to find the maximum value in the data list, and it is assigned to the variable max_val.

A list comprehension is used to iterate over each element x in the data list.

For each element x, the expression (x - min_val) / (max_val - min_val) is evaluated. 

This expression calculates the normalized value of x by subtracting the minimum value from x and dividing it by the difference between the maximum and minimum values.

The resulting normalized value is added to a new list.

Finally, the new list containing the normalized values is returned as the result of the function.

This code snippet implements min-max normalization, a common technique in machine learning and data analysis. It rescales the data linearly so that the minimum value maps to 0 and the maximum maps to 1, which puts values measured on different scales into a comparable range.
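As written, `normalize_data` fails on two edge cases: an empty list, where `min()` raises `ValueError`, and data whose values are all equal, where `max_val - min_val` is 0 and the division raises `ZeroDivisionError`. A minimal sketch of a guarded variant; the name `normalize_data_safe` and the choice to return 0.0 for constant data are assumptions, not a standard convention:

```python
def normalize_data_safe(data):
    """Min-max scale a list of numbers to [0, 1], guarding edge cases (hypothetical variant)."""
    if not data:
        return []  # min() and max() would raise ValueError on an empty list
    min_val = min(data)
    max_val = max(data)
    span = max_val - min_val
    if span == 0:
        # All values are equal; mapping them to 0.0 is one possible convention
        return [0.0 for _ in data]
    return [(x - min_val) / span for x in data]


print(normalize_data_safe([2, 4, 6]))  # [0.0, 0.5, 1.0]
print(normalize_data_safe([3, 3, 3]))  # [0.0, 0.0, 0.0]
```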

8. Ask Copilot to translate the following `R` code to `Python`:

```r
library(tidyverse)

# Create a sample dataframe with missing values
df <- tibble(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, 4, 5),
  C = c(1, 2, 3, NA, 5),
  D = c(1, 2, 3, 4, 5)
)

# Define the function
count_missing_values <- function(df) {
  sapply(df, function(col) sum(is.na(col)))
}

# Test the function
result <- count_missing_values(df)
print(result)
```


```python
import pandas as pd

# Create a sample dataframe with missing values
df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, 2, 3, 4, 5],
    'C': [1, 2, 3, None, 5],
    'D': [1, 2, 3, 4, 5]
})

# Define the function
def count_missing_values(df):
    return df.isna().sum()

# Test the function
result = count_missing_values(df)
print(result)
```

Output:

```
A    1
B    1
C    1
D    0
dtype: int64
```


9. Write a function in both `R` and `Python` that takes a string and returns the number of vowels in the string.
Solutions in R and Python:

```r
count_vowels <- function(string) {
    vowels <- c("a", "e", "i", "o", "u")
    count <- 0
    for (char in strsplit(tolower(string), "")[[1]]) {
        if (char %in% vowels) {
            count <- count + 1
        }
    }
    return(count)
}
```

```python
def count_vowels(string):
    vowels = ['a', 'e', 'i', 'o', 'u']
    count = 0
    for char in string:
        if char.lower() in vowels:
            count += 1
    return count
```
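A more compact Python variant, sketched with a set and a generator expression; the names `VOWELS` and `count_vowels_compact` are hypothetical, chosen to avoid clashing with the function above:

```python
VOWELS = set("aeiou")


def count_vowels_compact(text):
    """Count a, e, i, o, u (case-insensitive); 'y' is not treated as a vowel."""
    return sum(1 for char in text.lower() if char in VOWELS)


print(count_vowels_compact("Hello World"))  # 3
```

Set membership is a constant-time lookup, and lowering the string once avoids calling `lower()` on every character. In R, the loop could likewise be replaced by the vectorised `sum(strsplit(tolower(string), "")[[1]] %in% c("a", "e", "i", "o", "u"))`.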

10. Ask Copilot to translate the following binary code into text:

```verbatim
01011001 01101111 01110101 00100111 01110110 01100101 00100000 01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 01110100 01101000 01100101 00100000 01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001
```
Output: "You've completed the assignment. Congratulations!"
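The translation can be checked without an AI tool: each space-separated group is the 8-bit binary code of one ASCII character. A minimal sketch that decodes it with `int(byte, 2)` and `chr()`:

```python
# Each space-separated group is the 8-bit binary code of one ASCII character
binary = "01011001 01101111 01110101 00100111 01110110 01100101 00100000 01100011 01101111 01101101 01110000 01101100 01100101 01110100 01100101 01100100 00100000 01110100 01101000 01100101 00100000 01100001 01110011 01110011 01101001 01100111 01101110 01101101 01100101 01101110 01110100 00101110 00100000 01000011 01101111 01101110 01100111 01110010 01100001 01110100 01110101 01101100 01100001 01110100 01101001 01101111 01101110 01110011 00100001"

# int(byte, 2) parses the base-2 string; chr() maps the code point to a character
text = "".join(chr(int(byte, 2)) for byte in binary.split())
print(text)  # You've completed the assignment. Congratulations!
```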