In [None]:
import csv
import statistics
import warnings
import numpy as np

## SLU 17_3 - Exam Prep I

### Batch 6 - Wave 2 Python Exam

This is the wave 2 Python admission exam from the Lisbon Data Science Starters Academy - Batch 6. The allocated time for this exam was 3 hours. If you plan to take the admission exam this year, it's a good idea to measure the time you need to solve this notebook. You can see how far you get in 3 hours and also how much time you need to complete the notebook. At the same time, it's a good idea to not rush through the exercises and rather concentrate on working thoroughly.

This notebook has 4 exercises, some of which are divided into different parts. Each part is evaluated independently, so if you are stuck or cannot solve a given part, you can move on to the next one.

Note that the last exercise is multiple choice and is only included here so you can review these topics. This year's admission exam will not feature an exercise with this structure.

At the end, you submit the notebook to the portal as usual. You can submit as many times as you like.

## Exercise 1 (6 points)

Consider a csv file that stores information about football matches played in the 2015-16 edition of the [UEFA Champions League](https://en.wikipedia.org/wiki/UEFA_Champions_League), an annual competition played by the best clubs in Europe. This file has five columns:
* `stage`: stage of the competition in which the match was played;
* `date`: the date of the match;
* `team_1`: name of the home team;
* `ft`: score at the end of regular time, given in format "X-Y", where X and Y are the number of goals scored by the home (`team_1`) and away (`team_2`) teams, respectively;
* `team_2`: name of the away team.

File `champions_league_15_16.csv` is an example of such a file. You can preview it using some command line instructions.

### Part I (2 points)

Implement a function that reads a file with the same format as the `champions_league_15_16.csv` file, and stores the data in a list of dictionaries with the following structure:

```
[
    {
        "stage": stage of the competition in which the match was played (type: str),
        "date": date of the match (type: str),
        "team_1": name of the home team (type: str),
        "team_2": name of the away team (type: str),
        "ft": score at the end of regular time (type: str),
    },
]

```

The function should:
1. be called `read_matches`;
2. receive an argument called `file_path`, the path to the file that the function should read the data from;
3. read data from each match in the same format as read from the file;
4. return the list that was created.

Remember to inspect the contents of the file before writing the function, as it may contain header information that should be skipped.

To read the csv file, you shall use Python's [`csv`](https://docs.python.org/3/library/csv.html) module, which we already imported at the top of the Notebook. **Do not use [pandas](https://pandas.pydata.org/) or any other library not included in the coding test requirements file**.
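As a reminder of how the `csv` module behaves, here is a minimal sketch that parses a small in-memory table. The sample data, the column names and the single header row are assumptions made for illustration only; inspect the real file to see what its header actually looks like.

```python
import csv
import io

# Hypothetical sample data, unrelated to the exam file.
sample = io.StringIO("city,country\nLisbon,Portugal\nPorto,Portugal\n")

reader = csv.reader(sample)
header = next(reader)  # consume the (assumed) single header row
rows = [{"city": city, "country": country} for city, country in reader]

print(header)
print(rows)
```

When reading from disk instead of from a string, the `csv` documentation recommends opening the file with `newline=""`.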

In [None]:
def read_matches(file_path):
    """
    Reads the file in file_path, parses it and returns the data in a list of dictionaries.

    Parameters:
    file_path (str): Path to the input file to be parsed.

    Returns:
    matches_parsed_data (list): Matches dataset stored as a list of dictionaries.
    """
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
matches_data = read_matches("champions_league_15_16.csv")

assert isinstance(matches_data, list)
assert all(isinstance(match_data, dict) for match_data in matches_data)
assert len(matches_data) == 10

match_data = matches_data[2]
assert match_data["stage"] == "Group"
assert match_data["date"] == "9 Dec 2015"
assert match_data["team_1"] == "Chelsea FC"
assert match_data["team_2"] == "FC Porto"
assert match_data["ft"] == "2-0"

from more_tests import test_exercise_2_part_I
test_exercise_2_part_I(read_matches)

### Part II a (2 points)

Consider now that we want to use our dataset to compute some statistics about the number of goals scored in matches of the 15-16 UEFA Champions League.

Start by implementing a function called `get_goals_from_ft` that:
1. receives as input a list like the one we created in Part I of this exercise;
2. parses the value associated with the key "ft" of each dictionary to obtain the total number of goals in each match as an integer;
3. returns a list with the total number of goals in each match (the list indexing must match that of the input list).
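The parsing step for a single score can be sketched as follows. The score string below is a hypothetical value in the "X-Y" format, and the variable names are chosen for illustration.

```python
score = "3-2"  # hypothetical score string in the "X-Y" format

# Split on the dash and convert each side to an integer.
home_goals, away_goals = (int(part) for part in score.split("-"))
total_goals = home_goals + away_goals

print(total_goals)  # 5
```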

In [None]:
def get_goals_from_ft(matches_data):
    """
    Computes the total number of goals in each match of the dataset.

    Parameters:
    matches_data (list): List with the matches dataset.

    Returns:
    matches_goals (list): List with the total number of goals in each match of the dataset.
    """
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
matches_data = [
    {"stage": "Group", "date": "9 Dec 2015", "team_1": "KAA Gent", "team_2": "Zenit St. Petersburg", "ft": "2-1"}, 
    {"stage": "Group", "date": "9 Dec 2015", "team_1": "Valencia CF", "team_2": "Olympique Lyon", "ft": "0-2"},
    {"stage": "Knockout", "date": "26 Apr 2016", "team_1": "Manchester City FC", "team_2": "Real Madrid CF", "ft": "0-0"},
    {"stage": "Knockout", "date": "27 Apr 2016", "team_1": "Atletico Madrid", "team_2": "Bayern München", "ft": "1-0"},
]


matches_goals = get_goals_from_ft(matches_data)
assert isinstance(matches_goals, list)
assert all(isinstance(match_goals, int) for match_goals in matches_goals)
assert len(matches_goals) == len(matches_data)
assert matches_goals == [3, 2, 0, 1]

from more_tests import test_exercise_2_part_II_a
test_exercise_2_part_II_a(get_goals_from_ft)

### Part II b (2 points)

Consider now that you are given a list with the number of goals scored in each match like that obtained in the previous exercise, e.g. `[2, 0, 3, 0]`.

Create a function called `get_goals_stats` that computes:
* the average number of goals per match, as a float, rounded to one (1) decimal place
* the most common number of goals in a match as an integer (*i.e.*, the [mode](https://en.wikipedia.org/wiki/Mode_(statistics)) of the set of goals)

This function should:
1. receive as input a list like the one we created in Part II a of this exercise;
2. return a tuple with the average and most common number of goals in a match in the first and second elements, respectively.

You can (and are encouraged to) use Python's [`statistics`](https://docs.python.org/3/library/statistics.html) module, which we already imported at the top of the Notebook.
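For reference, this is a small sketch of the two `statistics` functions that are relevant here, applied to a hypothetical sample. The explicit `float()` call is a precaution: `statistics.mean` can return an integer when the input is a list of integers whose mean is a whole number.

```python
import statistics

values = [1, 2, 2]  # hypothetical sample

# Mean rounded to one decimal place, forced to float.
average = round(float(statistics.mean(values)), 1)
# Most frequent value in the sample.
most_common = statistics.mode(values)

print(average, most_common)  # 1.7 2
```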

In [None]:
def get_goals_stats(goals_list):
    """
    Computes statistics (mean and mode) of the number of goals per match.

    Parameters:
    goals_list (list): List with number of goals in a set of football matches.

    Returns:
    goals_average (float): Average number of goals per match.
    goals_mode (int): Most common number of goals in a match.
    """
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
goals_list = [2, 0, 3, 0]

goals_stats = get_goals_stats(goals_list)
assert isinstance(goals_stats, tuple)
assert len(goals_stats) == 2

goals_average, goals_mode = goals_stats
assert isinstance(goals_average, float)
assert isinstance(goals_mode, int)
np.testing.assert_almost_equal(goals_average, 1.3, decimal=1)
assert goals_mode == 0

from more_tests import test_exercise_2_part_II_b
test_exercise_2_part_II_b(get_goals_stats)

## Exercise 2 (4 points)

It's almost vacation time (a.k.a. [vacay](https://en.wiktionary.org/wiki/vacay)) and you are preparing a road trip with friends. During the trip, you can pick up or drop off passengers, as well as change the volume of carried luggage.

![Packed car trunk](images/car-packed.jpg)

In this exercise, you'll use object oriented programming concepts to model the car used in this road trip.

The model of the car should store information about the number of passengers and the volume of luggage carried by the car.

### Class and methods

Implement a class to represent a vacay car, `VacayCar`. In this class, you'll need to store the following information:
* `num_passenger_seats` (type: int): the total number of passenger seats in the car;
* `num_passengers` (type: int): the current number of passengers in the car;
* `trunk_volume` (type: float): the total volume available to store luggage in the car;
* `luggage_volume` (type: float): the current volume of luggage in the car.

At **initialization (1 point)**, instances of `VacayCar` should:
* be given a total number of passenger seats and a total volume available to store luggage;
* be given an initial volume of luggage, corresponding to the driver's luggage;
* always have zero passengers.

Additionally, `VacayCar` should have two methods, described below:

* **`update_passengers_and_luggage` (2 points)**:

  This method updates the current number of passengers and luggage volume in the car. It should:

  1. receive as arguments the increment / decrement in number of passengers (type: int) and luggage volume (type: float), `num_passengers_diff` and `luggage_volume_diff`, respectively. For example, `num_passengers_diff = 1` and `luggage_volume_diff = -30` means that the number of passengers increases by 1 and the volume of luggage decreases by 30;
  2. update the current number of passengers and luggage volume in the car *only if* the required update is valid (*i.e.*, if the updated number of passengers and luggage volume are *both* nonnegative and do not exceed the capacity of the car);
  3. issue a warning with the message "Could not update passengers and luggage." if the update is invalid. To issue this warning, you should use the [`warnings.warn`](https://docs.python.org/3/library/warnings.html#warnings.warn) method (note that we have already imported the `warnings` module for you).


* **`estimate_cost` (1 point)**:
  
  This method estimates the cost of travelling a given distance with the car in the current status. It should:
  
  1. receive as argument a distance, `distance` (type: float);
  2. estimate the cost according to the formula:
     $$ \rm{cost} (d, P, V) = 0.12 \times d + 0.03 \times d \times P + 0.01 \times d \times V \rm{\ ,}$$
     where $d$ is the distance to be travelled, $P$ is the number of passengers in the car and $V$ is its volume of luggage;
  3. return the estimated cost.
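Two ingredients of the specification above can be tried in isolation before writing the class: issuing a warning with `warnings.warn`, and evaluating the cost formula. The sketch below uses two hypothetical standalone helpers, `check_level` and `trip_cost`, which are not part of the required class interface, and the warning message is made up for the example.

```python
import warnings


def check_level(level, capacity):
    """Hypothetical helper: warn when a level falls outside [0, capacity]."""
    if 0 <= level <= capacity:
        return True
    warnings.warn("Level out of range.")
    return False


def trip_cost(distance, num_passengers, luggage_volume):
    """Cost formula from the text, written as a standalone function."""
    return (
        0.12 * distance
        + 0.03 * distance * num_passengers
        + 0.01 * distance * luggage_volume
    )


# Record warnings instead of printing them, so they can be counted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_level(5, capacity=4)

print(ok, len(caught), caught[0].message)
print(trip_cost(100.0, 2, 50.0))  # approximately 68.0 (12 + 6 + 50)
```

Recording warnings with `warnings.catch_warnings(record=True)` is also how the test cell checks whether a warning was issued.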

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
score = 0

try:
    vacay_car = VacayCar(num_passenger_seats=4, trunk_volume=300., luggage_volume=20.)
    assert vacay_car.num_passenger_seats == 4, "The number of passenger seats of `vacay_car` is wrong."
    assert vacay_car.num_passengers == 0, "The number of passengers of `vacay_car` is wrong."
    
    np.testing.assert_almost_equal(vacay_car.trunk_volume, 300., decimal=0, err_msg="The trunk volume of `vacay_car` is wrong.")
    np.testing.assert_almost_equal(vacay_car.luggage_volume, 20., decimal=0, err_msg="The luggage volume of `vacay_car` is wrong.")
except AssertionError as e:
    print(e)
    pass
else:
    score += 1

try:
    # valid updates
    vacay_car = VacayCar(num_passenger_seats=4, trunk_volume=300., luggage_volume=20.)
    
    for (
        num_passenger_diff, 
        luggage_volume_diff, 
        updated_num_passengers, 
        updated_luggage_volume,
    ) in zip(
        [1, 0, 2, -3], 
        [0., 10., 20., -30.],
        [1, 1, 3, 0],
        [20., 30., 50., 20.],
    ):
        with warnings.catch_warnings(record=True) as w:    
            vacay_car.update_passengers_and_luggage(num_passenger_diff, luggage_volume_diff)
            assert vacay_car.num_passengers == updated_num_passengers, "The number of passengers of `vacay_car` is wrong."
            assert vacay_car.num_passenger_seats == 4, "The number of passenger seats of `vacay_car` is wrong."
            np.testing.assert_almost_equal(vacay_car.luggage_volume, updated_luggage_volume, decimal=0, err_msg="The luggage volume of `vacay_car` is wrong.")
            np.testing.assert_almost_equal(vacay_car.trunk_volume, 300., decimal=0, err_msg="The trunk volume of `vacay_car` is wrong.")
            assert len(w) == 0, "Warning is incorrectly raised, the new number of passengers and luggage volume are valid."
    
    # invalid updates
    vacay_car = VacayCar(num_passenger_seats=4, trunk_volume=300., luggage_volume=20.)
    
    for (
        num_passenger_diff, 
        luggage_volume_diff,
    ) in zip(
        [5, 0, -1, 2, -5], 
        [0., 300., 10., -30., -100.],
    ):
        with warnings.catch_warnings(record=True) as w:    
            vacay_car.update_passengers_and_luggage(num_passenger_diff, luggage_volume_diff)
            assert vacay_car.num_passengers == 0, "The number of passengers of `vacay_car` is wrong."
            assert vacay_car.num_passenger_seats == 4, "The number of passenger seats of `vacay_car` is wrong."
            np.testing.assert_almost_equal(vacay_car.luggage_volume, 20., decimal=0, err_msg="The luggage volume of `vacay_car` is wrong.")
            np.testing.assert_almost_equal(vacay_car.trunk_volume, 300., decimal=0, err_msg="The trunk volume of `vacay_car` is wrong.")
            assert len(w) == 1, "A warning was not raised with an invalid new number of passengers or luggage volume."
        
except AssertionError as e:
    print(e)
    pass
else:
    score += 2

try:
    vacay_car = VacayCar(num_passenger_seats=4, trunk_volume=300., luggage_volume=200.)
    
    cost_200 = vacay_car.estimate_cost(200.)
    np.testing.assert_almost_equal(cost_200, 424., decimal=0, err_msg="The estimated cost of a travel with `vacay_car` is wrong.")
    
    
    # force vacay_car update
    vacay_car.num_passengers=2
    vacay_car.luggage_volume=150.
    
    cost_150 = vacay_car.estimate_cost(150.)
    np.testing.assert_almost_equal(cost_150, 252., decimal=0, err_msg="The estimated cost of a travel with `vacay_car` is wrong.")
    
except AssertionError as e:
    print(e)
    pass
else:
    score += 1

if score == 0:
    raise AssertionError("Not enough correct answers to score points.")

print(f"Your score is {score} / 4 in Exercise 2.")

score

## Exercise 3 (4 points)

Consider square matrices (*i.e.*, matrices with the same number of rows and columns) whose entries are either `0` or `1`. An example of such a matrix is shown below.

<code>
[[1, 0, 0],
 [0, 1, 0],
 [1, 1, 0]]
</code>

This type of matrix can be used to represent connections between items in a collection, labeled by the row and column indices. More specifically, if items `i` and `j` are connected, then the entry of row `i`, column `j` of the matrix is `1`. This could be used, for example, to express that cities `i` and `j` have a flight connection.

### Part I (2 points)

Write a function, named `count_nonzero_entries` that receives as argument a numpy array representing a matrix such as those described above, and returns an integer indicating the number of nonzero entries of that matrix.

**Hint:** the numpy function `np.sum` may be useful in this exercise.
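To see why the hint applies, here is a minimal sketch on a hypothetical 2x2 matrix: when every entry is `0` or `1`, the sum of all entries equals the number of nonzero entries. `np.count_nonzero` is shown alongside as a cross-check.

```python
import numpy as np

grid = np.array([[0, 1], [1, 1]])  # hypothetical 2x2 example

total = np.sum(grid)                 # the ones add up to the nonzero count
also_total = np.count_nonzero(grid)  # counts nonzero entries directly

print(total, also_total)  # 3 3
```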

In [None]:
def count_nonzero_entries(matrix):
    """
    Counts nonzero entries in a matrix of zeros and ones.

    Parameters:
    matrix (np.ndarray): Matrix of zeros and ones.

    Returns:
    num_nonzero_entries (int): Number of nonzero entries in the input matrix.
    """
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
matrix = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0]
])

num_nonzero_entries = count_nonzero_entries(matrix)
assert isinstance(num_nonzero_entries, (int, np.integer))
assert num_nonzero_entries == 4

### Part II (2 points)

Implement a function called `get_nonzero_entries_indices` that, given a matrix such as those described above, returns an array with the indices of the nonzero entries of that matrix.

The return array should have shape `(num_nonzero_entries, 2)`, where `num_nonzero_entries` is the number of nonzero entries of the input matrix. The elements of this array along the second dimension should correspond to the row and column indices of each nonzero entry, respectively.

For example, the matrix shown in the description of this exercise has ones in entries `[0, 0]`, `[1, 1]`, `[2, 0]` and `[2, 1]`, so `get_nonzero_entries_indices` should return the numpy array corresponding to

<code>
[[0, 0],
 [1, 1],
 [2, 0],
 [2, 1]]
</code>


**Note:** The order in which the nonzero entries are presented in the output array is irrelevant. For example, given the [identity matrix](https://en.wikipedia.org/wiki/Identity_matrix) of size 2 as input to `get_nonzero_entries_indices`, both `[[0, 0], [1, 1]]` and `[[1, 1], [0, 0]]` are considered as correct returns.
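The identity-matrix case from the note can be checked with a short sketch. It uses `np.argwhere`, which is one possible numpy function for this task and not necessarily the intended one; it returns one `[row, column]` pair per nonzero entry.

```python
import numpy as np

identity = np.eye(2, dtype=int)  # identity matrix of size 2

# One row per nonzero entry, holding its row and column index.
indices = np.argwhere(identity)

print(indices.shape)  # (2, 2)
print(indices)
```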

In [None]:
def get_nonzero_entries_indices(matrix):
    """
    Determines the indices of the nonzero entries in a matrix of zeros and ones.

    Parameters:
    matrix (np.ndarray): Matrix of zeros and ones.

    Returns:
    nonzero_entries_indices (np.ndarray): Array with indices of nonzero entries of the input matrix.
      Has shape `(num_nonzero_entries, 2)`, where `num_nonzero_entries` is the number of nonzero
      entries of the input matrix.
    """
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
matrix = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0]
])

nonzero_entries_indices = get_nonzero_entries_indices(matrix)
assert isinstance(nonzero_entries_indices, np.ndarray)
assert nonzero_entries_indices.ndim == 2
assert nonzero_entries_indices.shape == (4, 2)

for correct_row in [
    [0, 0],
    [1, 1],
    [2, 0],
    [2, 1],
]:
    assert any([np.array_equal(nonzero_entries_indices_row, correct_row) for nonzero_entries_indices_row in nonzero_entries_indices])

## Exercise 4 (6 points)

This exercise is a quiz with 12 multiple choice questions. Note that this year's admission exam will not feature an exercise with this structure.

In the cell below, we've declared a dictionary called `answers`.

You should fill in that dictionary with your answers, using as keys the question numbers, like `question_x`, and as values the numbers from `1` to `4` that correspond to the right answer.
For each question, you should provide only one answer, *i.e.*, the dict values should have type **int**.

For example, if you want to answer Question 1 with choice number 2, then you do:
```
answers["question_1"] = 2
```

In [None]:
answers = {}

### Part I (questions 1-5)

Imagine that you are using a Unix-based machine, and that you have a terminal with the working directory `/users/mary`. This directory has in it:
- a file named `example_data.csv`;
- a directory named `documents` which contains some files.

The file tree below illustrates the contents and structure of `/users/mary`, **that should be considered to answer Questions 1-5**.


```text
/users/mary
├── example_data.csv
└── documents
    ├── profile_picture.png
    ├── notes.txt
    └── .gitignore
```

#### Question 1
Which command would you use to print the current working directory?
1. `pwd`
2. `cd .`
3. `ls ..`
4. `ls .`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_1"] = ...

#### Question 2
Which command would you use to see the first 10 lines of `example_data.csv`?
1. `head -10 example_data.csv`
2. `cat example_data.csv`
3. `head -10 documents/example_data.csv`
4. `cat documents/example_data.csv`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_2"] = ...

#### Question 3
After running one of the following commands, there will be 2 files in the `documents` directory. Which command is it?
1. `cp notes.txt documents`
2. `mv notes.txt documents`
3. `cp documents/notes.txt .`
4. `mv documents/notes.txt .`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_3"] = ...

#### Question 4
The command `rm documents/profile-picture.png`
1. Creates a new file in the `documents` directory
2. Removes a file from the `documents` directory
3. Removes the entire `documents` directory
4. Raises an error

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_4"] = ...

#### Question 5
Based on the structure of `/users/mary` and its subdirectories, which folder is likely to be using git to version control its contents? 
1. `/users/mary`
2. `/users/mary/documents`
3. Both of the above
4. None of the above

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_5"] = ...

### Part II (questions 6-12)

Answer the following questions about git and Python programming.

#### Question 6
Consider that `numbers` is a list of integers. Which of the following instructions builds a list where each element is the square root of an element of `numbers`? 
1. `[abs(number) for number in numbers]`
2. `(abs(number) for number in numbers)`
3. `[sqrt(number) for number in numbers]`
4. `(sqrt(number) for number in numbers)`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_6"] = ...

#### Question 7
Which of the following sentences is false?

1. Dictionaries can have integers as keys.
2. Dictionaries can have floats as keys.
3. Dictionaries can have lists as keys.
4. Dictionaries can have tuples of integers as keys.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_7"] = ...

#### Question 8
How do you import function `my_function` from module `my_module`?

1. `import my_function`
2. `import my_module`
3. `from my_function import my_module`
4. `from my_module import my_function`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_8"] = ...

#### Question 9
Consider that the variable `a` stores the float `5.0`. Which of the following instructions would return `10.`?
1. `a * a`
2. `a / a`
3. `a * 2`
4. `a *= 2.`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_9"] = ...

#### Question 10
You are working on a project that uses git for version control. How do you create a new branch named `feature_dev` to work on a new feature?

1. `git checkout main`
2. `git checkout -b main`
3. `git checkout -b feature_dev`
4. `git checkout feature_dev`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_10"] = ...

#### Question 11
While working on a new git branch, you created a file `script.py`. How do you commit your changes to a local branch?

1. `git add script.py && git commit -m "creates a cool script file"`
2. `git status && git commit -m "creates a cool script file"`
3. `git status && git push`
4. `git add script.py && git push`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_11"] = ...

#### Question 12
How do you upload changes made in a local repository to a remote repository?

1. `git pull`
2. `git status`
3. `git push`
4. `git reset`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# answers["question_12"] = ...

In [None]:
from hidden_tests import test_exercise_1
test_exercise_1(answers)


### Submit your work!

To submit your work, [follow the steps here, in the step "Grading the Exercise Notebook"!](https://github.com/LDSSA/ds-prep-course-2023/blob/main/weekly-workflow.md#link-to-grading)