## DSA 2024 Summer School Admittance Check

Thanks for your interest in attending DSA 2024 in Nyeri, Kenya. To attend the summer school you need a basic level of Python proficiency. Completing the following notebook should ensure you have the right background to benefit fully from the Summer School. See you in Nyeri!

In [22]:
import pandas as pd
import numpy as np
import otter



grader = otter.Notebook()

**Question 1:** Write a function `isValid(s)` that takes as argument a string `s` containing a sequence of parentheses '(', ')', '{', '}', '[' and ']', and determines whether the input is valid. An input string is valid if every opening parenthesis has a matching closing one of the same type and the parentheses are well-formed (properly nested), e.g. "(){}[]" is valid.

In [23]:
def isValid(s):
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}
    
    for char in s:
        if char in mapping.values():
            stack.append(char)
        elif char in mapping:
            if not stack or stack.pop() != mapping[char]:
                return False
        else:
            return False
    
    return len(stack) == 0

# Example usage
print(isValid("(){}[]"))  
print(isValid("([{}])"))
print(isValid("({[}])"))  
print(isValid("()"))      
print(isValid("["))


True
True
False
True
False
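
A few edge cases beyond the examples above are worth checking. The sketch below repeats the same stack-based approach in a standalone helper (named `is_balanced` purely for illustration; it is not part of the notebook) and asserts its behavior on an empty string, a closer with no opener, and a non-bracket character. Treating the empty string as valid and rejecting other characters are assumptions that follow the solution above, not requirements stated in the question.

```python
def is_balanced(s):
    """Hypothetical standalone copy of the stack-based bracket check."""
    pairs = {')': '(', '}': '{', ']': '['}
    openers = set(pairs.values())
    stack = []
    for char in s:
        if char in openers:
            # Remember every opener until its closer arrives
            stack.append(char)
        elif char in pairs:
            # A closer must match the most recent unmatched opener
            if not stack or stack.pop() != pairs[char]:
                return False
        else:
            # Assumption: any non-bracket character makes the input invalid
            return False
    # Valid only if no opener is left unmatched
    return not stack

assert is_balanced("")            # assumption: empty input counts as valid
assert not is_balanced(")(")      # closer appears before its opener
assert not is_balanced("(]")      # mismatched bracket types
assert not is_balanced("(a)")     # non-bracket character
assert is_balanced("{[()()]}")    # nested and sequential pairs
```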


In [24]:
grader.check("q1")

ValueError: Tests directory does not exist and no notebook path provided

**Question 2:** Given a paragraph as a string, write a function that returns the number of characters with odd frequencies. E.g. the paragraph ``DSA 2024 Nyeri`` has *10* characters with odd frequencies: the full frequency count is {' ': 2, '2': 2, 'D': 1, 'S': 1, 'A': 1, '0': 1, '4': 1, 'N': 1, 'y': 1, 'e': 1, 'r': 1, 'i': 1}, and *10* of those characters have odd frequencies. So the function should return *10*.

In [10]:
from collections import Counter

def count_odd_frequency_characters(paragraph):
    # Count the frequency of each character in the paragraph
    frequency_count = Counter(paragraph)
    
    # Count the number of characters with odd frequencies
    odd_frequency_count = sum(1 for count in frequency_count.values() if count % 2 != 0)
    
    return odd_frequency_count

# Example usage
paragraph = "DSA 2024 Nyeri"
result = count_odd_frequency_characters(paragraph)
print(result)  


10


In [None]:
grader.check("q2")

**Question 3:** Write a generator function `odd_squares_sum` that yields the running sum of squares of odd numbers, e.g. $1^2 + 3^2 + 5^2 + \dots$, up to a ``limit``.

In [7]:
def odd_squares_sum(limit):
    sum_of_odd_squares = 0
    for number in range(0, limit):
        if number % 2 != 0:
            square = number**2
            sum_of_odd_squares += square
            yield sum_of_odd_squares
    # Runs once the loop is exhausted: prints the final total as a side effect
    print(sum_of_odd_squares)

for sum_of_squares in odd_squares_sum(10):
    print(sum_of_squares)

# Printing the final sum after the loop
print("Final sum:", sum_of_squares)


1
10
35
84
165
165
Final sum: 165
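
The question mentions both an infinite generator and a ``limit``. One possible reading, sketched below, keeps the generator itself unbounded and lets the caller decide how many values to take. The names `odd_squares_running_sum` and `sums_below` are hypothetical and not part of the notebook, and interpreting ``limit`` as an exclusive bound on the odd numbers is an assumption that matches the `range(0, limit)` loop above.

```python
from itertools import count, islice

def odd_squares_running_sum():
    """Yield 1^2, 1^2 + 3^2, 1^2 + 3^2 + 5^2, ... without ever stopping."""
    total = 0
    for odd in count(1, 2):  # 1, 3, 5, ...
        total += odd ** 2
        yield total

def sums_below(limit):
    """Hypothetical helper: running sums over the odd numbers below limit."""
    # There are limit // 2 odd numbers in range(0, limit) for limit >= 0
    how_many = max(limit // 2, 0)
    return list(islice(odd_squares_running_sum(), how_many))

print(sums_below(10))  # [1, 10, 35, 84, 165]
```

Because the bound lives outside the generator, the same generator can also be sliced by count or consumed lazily in a loop with an explicit `break`.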


In [None]:
grader.check("q3")

**Question 4:** Using the `odd_squares_sum` generator defined above, create a list of the sums of squares up to a limit of $20$ and store the result in a `numpy.array` variable called `oddSumList`.

In [8]:

# Creating a list of sums using the generator
sum_list = list(odd_squares_sum(20))

# Converting the list to a numpy array
oddSumList = np.array(sum_list)

# Example usage
print(oddSumList)


1330
[   1   10   35   84  165  286  455  680  969 1330]


In [None]:
grader.check("q4")

**Question 5:** Compute the element-wise remainder of ``oddSumList`` when divided by $5$ and merge it with ``oddSumList``. The final output, stored in the variable `mergedList`, should be a list of tuples, e.g. ``[(1,1), (4,9), (0,25), ...]``

In [9]:
# Computing the element-wise remainder when divided by 5
remainderList = oddSumList % 5

# Merge oddSumList and remainderList into a list of tuples
mergedList = list(zip(remainderList, oddSumList))

# Example usage
print(oddSumList)  
print(mergedList) 

[   1   10   35   84  165  286  455  680  969 1330]
[(1, 1), (0, 10), (0, 35), (4, 84), (0, 165), (1, 286), (0, 455), (0, 680), (4, 969), (0, 1330)]


In [None]:
grader.check("q5")

**Question 6:**  Write a function `greatest_common_divisor` that takes two inputs `a` and `b` and returns the greatest common divisor of the two numbers. E.g. input `(10, 15)` would return `5`

In [5]:
def greatest_common_divisor(a, b):
    while b:
        a, b = b, a % b
    return a

# Example usage
print(greatest_common_divisor(10, 15))


5
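
The question does not say how negative inputs or zero should be handled. With Python's `%` semantics, the loop above can return a negative value when `b` is negative (for instance, `(10, -15)` yields `-5`). The sketch below is one way to normalize the sign, cross-checked against the standard library's `math.gcd`. The name `gcd_non_negative` is hypothetical, and returning a non-negative result is an assumption about what the grader expects.

```python
import math

def gcd_non_negative(a, b):
    """Euclid's algorithm on absolute values, so the result is never negative."""
    a, b = abs(a), abs(b)
    while b:
        # Replace (a, b) with (b, a mod b) until the remainder is zero
        a, b = b, a % b
    return a

# Cross-check against the standard library on a few edge cases
for x, y in [(10, 15), (10, -15), (-10, -15), (0, 7), (0, 0)]:
    assert gcd_non_negative(x, y) == math.gcd(x, y)

print(gcd_non_negative(10, -15))  # 5
```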


In [None]:
grader.check("q6")

**Question 7:**  Write a function `get_3_nearest` that takes in a point of interest ``pt`` and a **list** of points ``ptlist`` and returns a list of the 3 points nearest to ``pt``. Assume the distance between any two points is defined by the `L1-norm`.

In [4]:

def l1_distance(pt1, pt2):
    return sum(abs(a - b) for a, b in zip(pt1, pt2))

def get_3_nearest(pt, ptlist):
    # Calculate the L1 distance for each point in ptlist
    distances = [(l1_distance(pt, p), p) for p in ptlist]
    
    # Sort the points by distance
    distances.sort(key=lambda x: x[0])
    
    # Get the 3 nearest points
    nearest_points = [p for _, p in distances[:3]]
    
    return nearest_points

# Example usage
pt = (0, 0)
ptlist = [(1, 2), (3, 4), (1, -1), (0, 5), (-1, -2)]
print(get_3_nearest(pt, ptlist))


[(1, -1), (1, 2), (-1, -2)]
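
Sorting every distance is fine for short lists. As an alternative sketch, `heapq.nsmallest` from the standard library selects the `k` smallest items without a full sort, which can help when ``ptlist`` is long. The names `l1` and `get_k_nearest` and the `k` parameter are illustrative additions, not part of the question.

```python
import heapq

def l1(p, q):
    """L1 (Manhattan) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))

def get_k_nearest(pt, ptlist, k=3):
    """Hypothetical generalization: the k points of ptlist closest to pt."""
    # nsmallest is documented as equivalent to sorted(ptlist, key=...)[:k]
    return heapq.nsmallest(k, ptlist, key=lambda p: l1(pt, p))

pt = (0, 0)
ptlist = [(1, 2), (3, 4), (1, -1), (0, 5), (-1, -2)]
print(get_k_nearest(pt, ptlist))  # [(1, -1), (1, 2), (-1, -2)]
```

Points at equal distance, such as `(1, 2)` and `(-1, -2)` here, come back in their original list order, which matches the sorted-based solution above.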


In [None]:
grader.check("q7")

**Question 8:**  Write a function `diagonal_vector(M)` that returns a **numpy** array of the **absolute** values of the main diagonal entries of the matrix $M$.

In [3]:
import numpy as np

def diagonal_vector(M):
    # Convert M to a NumPy array if it's not already
    M = np.array(M)
    
    # Extract the main diagonal
    diagonal = np.diag(M)
    
    # Get the absolute values of the main diagonal
    abs_diagonal = np.abs(diagonal)
    
    return abs_diagonal

# Example usage
M = [[-1, 2, 3],
     [4, -5, 6],
     [7, 8, -9]]
print(diagonal_vector(M))


[1 5 9]


In [None]:
grader.check("q8")

**Question 9:**  Write a function `flatten_reverse_lists` that takes in a list of lists and outputs a **reverse** sorted list of the elements of the sublists of the input list (confusing, right?) <br>
Example: given `flatten_reverse_lists([[2,13,44], [6,7]])` it should return `[2,6,7,13,44]`

In [2]:
def flatten_reverse_lists(list_of_lists):
    # Flatten the list of lists
    flattened_list = [item for sublist in list_of_lists for item in sublist]
    
    # Sort the flattened list
    sorted_list = sorted(flattened_list)
    
    return sorted_list

# Example usage
example_input = [[2, 13, 44], [6, 7]]
print(flatten_reverse_lists(example_input))


[2, 6, 7, 13, 44]
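
The question asks for a **reverse** sorted list, while its example output is in ascending order, and the solution above follows the example. The sketch below supports both readings through a flag so the choice is explicit. The name `flatten_sorted` and the `descending` parameter are hypothetical, and which order the grader expects is not something the notebook confirms.

```python
from itertools import chain

def flatten_sorted(list_of_lists, descending=False):
    """Flatten one level of nesting, then sort in the requested direction."""
    # chain.from_iterable walks each sublist in turn without building a temp list
    return sorted(chain.from_iterable(list_of_lists), reverse=descending)

example_input = [[2, 13, 44], [6, 7]]
print(flatten_sorted(example_input))                   # [2, 6, 7, 13, 44]
print(flatten_sorted(example_input, descending=True))  # [44, 13, 7, 6, 2]
```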


In [None]:
grader.check("q9")

**Question 9:** Create a DataFrame mirroring the table below and assign this to `data`.

| flavor | scoops | price |
|-----|-----|-----|
| white chocolate | 1 | 2 |
| vanilla | 1 | 1.5 |
| dark chocolate | 2 | 3 |
| strawberry | 1 | 2 |
| strawberry | 3 | 4 |
| vanilla | 2 | 2 |
| mint | 1 | 4 |
| mint | 2 | 5 |
| white chocolate | 3 | 2 |
| dark chocolate | 3 | 3 |
| white chocolate | 2 | 2 |
| dark chocolate | 5 | 3 |


In [15]:
data = pd.DataFrame({
    "flavor" : ["white chocolate", "vanilla", "dark chocolate", "strawberry", "strawberry", "vanilla", "mint", "mint", "white chocolate",
                "dark chocolate", "white chocolate", "dark chocolate"],
    "scoops" : [1, 1, 2, 1, 3, 2, 1, 2, 3, 3, 2, 5],
    "price": [2, 1.5, 3, 2, 4, 2, 4, 5, 2, 3, 2, 3]
})
duplicates = data.duplicated()
print(duplicates)

print(data.duplicated().sum())

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
dtype: bool
0


**Question 10:** Do the following to the dataframe:
* Create a new column ``total_price`` whose value is equal to $scoops * price$
* Write a function ``groupStatistics(data, groupValue)``. Internally, this function groups ``data`` by ``flavor`` and returns statistics of the ``total_price`` column for the given group ``groupValue``. The statistics are returned as a numpy array containing ``[mean, median, min, max, std]`` of the ``total_price`` column. The ``std`` should be rounded to 2 **decimal places**.



In [19]:
import pandas as pd
import numpy as np

# Sample data (replace this with your actual data loading process)
data = pd.DataFrame({
    "flavor" : ["white chocolate", "vanilla", "dark chocolate", "strawberry", "strawberry", "vanilla", "mint", "mint", "white chocolate",
                "dark chocolate", "white chocolate", "dark chocolate"],
    "scoops" : [1, 1, 2, 1, 3, 2, 1, 2, 3, 3, 2, 5],
    "price": [2, 1.5, 3, 2, 4, 2, 4, 5, 2, 3, 2, 3]
})

# Create new column (total_price)
data["total_price"] = data["scoops"] * data["price"]
print(data.head())

# Define function groupStatistics()
def groupStatistics(data, groupValue):
    group_data_by_flavor = data.groupby("flavor")["total_price"].agg(["mean", "median", "min", "max", "std"]).reset_index()

    # Round standard deviation to 2 decimal places
    group_data_by_flavor['std'] = round(group_data_by_flavor['std'], 2)

    # Extract statistics for the specified groupValue
    stats = group_data_by_flavor[group_data_by_flavor['flavor'] == groupValue][['mean', 'median', 'min', 'max', 'std']].to_numpy()

    # Check if stats has any rows
    if len(stats) > 0:
        return stats[0]  # Return the first row as a numpy array
    else:
        return np.array([np.nan, np.nan, np.nan, np.nan, np.nan])  # Return NaNs if no data found

# Function call 
flavor = 'vanilla'  
stats = groupStatistics(data, flavor)
if not np.isnan(stats).any():
    print(f"Statistics for '{flavor}' group:")
    print("Mean:", stats[0])
    print("Median:", stats[1])
    print("Min:", stats[2])
    print("Max:", stats[3])
    print("Standard Deviation:", stats[4])
else:
    print(f"No statistics found for '{flavor}' group.")

            flavor  scoops  price  total_price
0  white chocolate       1    2.0          2.0
1          vanilla       1    1.5          1.5
2   dark chocolate       2    3.0          6.0
3       strawberry       1    2.0          2.0
4       strawberry       3    4.0         12.0
Statistics for 'vanilla' group:
Mean: 2.75
Median: 2.75
Min: 1.5
Max: 4.0
Standard Deviation: 1.77
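
The reported standard deviation of 1.77 is the *sample* standard deviation: pandas' `std` divides by `n - 1` by default (`ddof=1`), whereas `numpy.std` divides by `n` by default (`ddof=0`). The sketch below reproduces the vanilla figures with the standard library's `statistics` module to make the difference visible. The list `vanilla_totals` is typed in by hand from the two vanilla rows of the table.

```python
import statistics

# total_price = scoops * price for the two vanilla rows: (1, 1.5) and (2, 2)
vanilla_totals = [1 * 1.5, 2 * 2]

sample_std = statistics.stdev(vanilla_totals)       # divides by n - 1
population_std = statistics.pstdev(vanilla_totals)  # divides by n

print(statistics.mean(vanilla_totals))    # 2.75
print(statistics.median(vanilla_totals))  # 2.75
print(round(sample_std, 2))               # 1.77
print(round(population_std, 2))           # 1.25
```

If a grader compares against the population value instead, the results differ noticeably for small groups like this one, so it is worth knowing which convention is in play.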


## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Download the exported ZIP. Take note of the ZIP number and proceed to fill in the summer school form.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)