# JBI010 - Exercises 4 24-25

This Jupyter notebook provides exercises for practicing your programming skills.
 
**Notes**:

* Submit your _personalized_ notebook to Canvas.

* You then get automatic feedback from Momotor.

* A score below 80% is treated as a 0.

* A score of 80% or more is treated as a 10.

* The best five out of six exercise sets count.

## Table of Contents

<div class="toc" style="margin-top: 1em;">
    <ul class="toc-item">
        <li>
            <span><a href="#1-sorting-pairs" data-toc-modified-id="1.-Sorting-Pairs">1. Sorting Pairs</a></span>
        </li>
        <li>
            <span><a href="#2-dictionary-of-continents" data-toc-modified-id="2.-Dictionary-of-Continents">2. Dictionary of Continents</a></span>
        </li>
        <li>
            <span><a href="#3-current-date" data-toc-modified-id="3.-Current-Date">3. Current Date</a></span>
        </li>
        <li>
            <span><a href="#4-remove-invalid-dates" data-toc-modified-id="4.-Remove-Invalid-Dates">4. Remove Invalid Dates</a></span>
        </li>
        <li>
            <span><a href="#5-movie-dialogues" data-toc-modified-id="5.-Movie-Dialogues">5. Movie Dialogues</a></span>
        </li>
        <li>
            <span><a href="#6-movie-categories" data-toc-modified-id="6.-Movie-Categories">6. Movie Categories</a></span>
        </li>
        </ul>
</div>


# Introduction to This Template Notebook

<div class="alert alert-danger" role="danger">
<h3>Integrity</h3>
<ul>
    <li>In this course, you must act according to the rules of the TU/e code of scientific conduct.</li>
    <li>All the exercises and the graded assignments are to be done within your programming homework group.</li>
    <li>You must not copy from the Internet, your friends, books... If you represent other people's work as your own, then that constitutes fraud and will be reported to the Examination Committee.</li>
    <li>Making your work available to others (complicity) also constitutes fraud.</li>
</ul>
</div>

You are expected to work with Python 3 code in this notebook.

The locations where you should write your solutions can be recognized by
**marker lines**,
which look like this:

>`#//`
>    `BEGIN_TODO [Label]` `Description` `(n points)`
>
>`#//`
>    `END_TODO [Label]`

<div class="alert alert-warning" role="alert">
    <h3>Markers</h3>
    Do NOT modify or delete these marker lines.  Keep them as they are.<br/>
    NEVER write code that is needed for grading <i>outside</i> the marked blocks. It is invisible there.
</div>

Proceed in this notebook as follows:
* **Personalize** the notebook (see below).
* **Read** the text.
* **Fill in** your solutions between `BEGIN_TODO` and `END_TODO` marker lines.
* **Run** _all_ code cells (also the ones _without_ your code),
    _in linear order_ from the first code cell.

**Personalize your notebook**:
1. Fill in your _full name_, _identification number_, and the current _date_ as strings between quotes.
1. Run the code cell by placing the cursor in it and pressing **Control-Enter**.


In [None]:
#// BEGIN_TODO [Author] Name, Id.nr., Date, as strings (1 point)

AUTHOR_NAME = 'Your Full Name'
AUTHOR_ID_NR = '1234567'
AUTHOR_DATE = '2024-08-20'  # when first modified, e.g. '2024-08-20'

#// END_TODO [Author]

AUTHOR_NAME, AUTHOR_ID_NR, AUTHOR_DATE


## How to Submit Your Work

1. **Rename the notebook**, replacing `...-template.ipynb` with `...-yourIDnr.ipynb`, where `yourIDnr` is your TU/e identification number.

1. **Before submitting**, you must run your notebook by doing **Kernel > Restart & Run All**. Make sure that your notebook runs without errors **in linear order**.

1. Submit the executed notebook with your work for the appropriate assignment in **Canvas**.

1. In the **Momotor** tab in Canvas, you can select that assignment again to find some feedback on your submitted work.
  
1. If there are any problems reported by _Momotor_, then you need to fix those issues and **resubmit the fixed notebook**.


## Preliminaries

Run the cell below. It imports the modules used in the exercises.

In [None]:
# Imports
import datetime
import re
from typing import Dict, List, Tuple


## Important Reminder

Follow all coding conventions defined in the Python Coding Standard document. Remember that you are not just programming for a **machine**, you are mainly programming for other **humans**! In particular:  
- all function definitions must have **type hints** and a **docstring**;
- a *valid docstring* starts with a **capital letter** and ends with a **period**.

## Nb Mypy

The following cell will attempt to enable `mypy` type checking in the notebook.
`mypy` is an optional static type checker for Python,
which can help you identify type errors in your code.
If you prefer not to use it, just comment out the code in the cell.

For this to work, you need to have installed [`Nb-Mypy`](https://pypi.org/project/nb-mypy/).
This is experimental and optional.
Some additional examples on its use can be found [here](https://gitlab.tue.nl/jupyter-projects/nb_mypy/-/blob/master/Nb_Mypy.ipynb).

**Note**:

* Type checking can be picky.
  In some cases, you can ignore the `nb-mypy` message.
* In case of doubt, ask for help.

In [None]:
# Enable mypy type checking

try:
    %load_ext nb_mypy
except ModuleNotFoundError:
    print('Type checking facility (Nb Mypy) is not installed.')
    print('To use this facility, install Nb Mypy by executing (in a cell):')
    print('  !python3 -m pip install nb_mypy')

## 1. Sorting Pairs

Create the function `sort_pair`, which takes two integers as arguments.  
The function returns the pair of values in ascending order.

**Notes**:

* This function returns something and does not print.
* This function must be used in the next task.
* Use `return num1, num2` to return a tuple of values.
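
As a small illustration of the last note, a function can return several values as a tuple, which the caller can then unpack. The function `quotient_and_remainder` below is a hypothetical example that is not part of the exercise; it only sketches the mechanism.

```python
from typing import Tuple


def quotient_and_remainder(dividend: int, divisor: int) -> Tuple[int, int]:
    """Return the integer quotient and the remainder of a division."""
    # The comma builds a tuple; parentheses are optional here.
    return dividend // divisor, dividend % divisor


# Tuple unpacking assigns each element to its own name.
quotient, remainder = quotient_and_remainder(17, 5)
print(quotient, remainder)  # 3 2
```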

In [None]:
def sort_pair(num1: int, num2: int) -> Tuple[int, int]:
    """Return the values num1, num2 as tuple in sorted order.
    """
#// BEGIN_TODO [Sorting_pair] Sorting Pairs

# ===== =====> Replace this line by your code. <===== ===== #

#// END_TODO [Sorting_pair]

print(sort_pair(1, 2))
a, b = 3, 2
a, b = sort_pair(a, b)
a, b

## 2. Dictionary of Continents

In the next cell, we have the dictionary `countries_dict` containing the names of some countries and their corresponding capitals and continents. This dictionary is useful for looking up information about countries. However, we want to have a dictionary to search for information about continents. This is known as a *reverse lookup*.

Create the function `create_continents_dict`, which takes a dictionary like `countries_dict` as input and returns a dictionary whose keys correspond to the names of the continents contained in the input dictionary. For each continent key, its value will be a list of the corresponding countries listed in the input dictionary.

**Example:**
```Python
# Input
countries_dict = {
    'Andorra': ('Europe', 'Andorra la Vella'),
    'Afghanistan': ('Asia', 'Kabul'),
    'Antigua and Barbuda': ('North America', "St. John's"),
    'Albania': ('Europe', 'Tirana'),
    'Armenia': ('Asia', 'Yerevan')
}

# Output
{'Europe': ['Andorra', 'Albania'],
 'Asia': ['Afghanistan', 'Armenia'],
 'North America': ['Antigua and Barbuda']}
```
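
The idea of a reverse lookup can be sketched on unrelated data. The dictionary `fruit_colours` and the other names below are hypothetical and chosen only for illustration; using `dict.setdefault` is one possible way to group keys by value, not the required one.

```python
from typing import Dict, List

# Hypothetical data: each fruit maps to its colour.
fruit_colours: Dict[str, str] = {
    'banana': 'yellow',
    'cherry': 'red',
    'lemon': 'yellow',
    'strawberry': 'red',
}

# Reverse lookup: each colour maps to the list of fruits with that colour.
colour_fruits: Dict[str, List[str]] = {}
for fruit, colour in fruit_colours.items():
    # setdefault inserts an empty list the first time a colour is seen.
    colour_fruits.setdefault(colour, []).append(fruit)

print(colour_fruits)
# {'yellow': ['banana', 'lemon'], 'red': ['cherry', 'strawberry']}
```

Because dictionaries preserve insertion order, the fruits in each list appear in the same order as in the original dictionary.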

In [None]:
countries_dict = {'Andorra': ('Europe', 'Andorra la Vella'),
                  'Afghanistan': ('Asia', 'Kabul'),
                  'Antigua and Barbuda': ('North America', "St. John's"),
                  'Albania': ('Europe', 'Tirana'),
                  'Armenia': ('Asia', 'Yerevan'),
                  'Angola': ('Africa', 'Luanda'),
                  'Argentina': ('South America', 'Buenos Aires'),
                  'Austria': ('Europe', 'Vienna'),
                  'Australia': ('Oceania', 'Canberra'),
                  'Azerbaijan': ('Asia', 'Baku'),
                  'Barbados': ('North America', 'Bridgetown'),
                  'Bangladesh': ('Asia', 'Dhaka'),
                  'Belgium': ('Europe', 'Brussels')}

In [None]:
#// BEGIN_TODO [Dictionary_of_continents] Dictionary of continents

# ===== =====> Replace this line by your code. <===== ===== #

#// END_TODO [Dictionary_of_continents]

print(create_continents_dict(countries_dict))

## 3. Current Date

In this exercise, you must use the `datetime` module and investigate which methods can help you when solving the problem. 
Create the function `get_current_date`, which has no parameters and returns a tuple with the year, month, and day as elements in the given order.
All these elements should be integers.
You must discover how to get the current date by using the `datetime` module features.

Afterward, create the function `tuple_to_date`, which takes a tuple containing the year, month, and day of a date, in that order, and returns a `datetime.date` with the corresponding Gregorian values.

**Example:**
```python
# Program
date_tuple = get_current_date()  # Example: executed on 1 October 2022
date_datetime = tuple_to_date(date_tuple)

print(date_datetime.year)
print(date_datetime.month)
print(date_datetime.day)

# Output
2022
10
1
```
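
For orientation, a `datetime.date` object is constructed from a year, a month, and a day, and exposes these as attributes. The sketch below uses hypothetical fixed values; obtaining the *current* date is left for you to discover in the `datetime` documentation.

```python
import datetime

# Hypothetical fixed values; the exercise asks for the current date instead.
year, month, day = 2022, 10, 1

example_date = datetime.date(year, month, day)
print(example_date.year, example_date.month, example_date.day)  # 2022 10 1
print(example_date.isoformat())  # 2022-10-01
```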

In [None]:
import datetime

#// BEGIN_TODO [Current_date] Current date 

# ===== =====> Replace this line by your code. <===== ===== #

#// END_TODO [Current_date]

date_tuple = get_current_date()
date_date = tuple_to_date(date_tuple)

print(date_date.year)
print(date_date.month)
print(date_date.day)

## 4. Remove Invalid Dates

Create the function `remove_invalid_dates`, which takes a list of tuples as an argument and returns a list of `datetime.date` objects with valid dates.
A valid year has a value equal to or less than 2022 (current year); a valid month is an integer between 1 and 12, and a valid day is an integer between 1 and 31.
Use a list comprehension to compute the expected list. 
You are also encouraged to use the `tuple_to_date` function developed in the exercise "Current Date".

Afterward, create the function `get_invalid_dates`, which does the opposite of `remove_invalid_dates`: it also takes a list of tuples as an argument, but returns a list of the tuples that represent invalid dates. Again, use a list comprehension to compute the expected list.

**Note:** Preserve the order of the elements in both lists.
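
A list comprehension can filter and transform elements in one expression while keeping their original order. The sketch below uses hypothetical `(name, score)` pairs rather than dates, so it illustrates the technique only.

```python
from typing import List, Tuple

# Hypothetical data: (name, score) pairs.
results: List[Tuple[str, int]] = [('Ann', 82), ('Bob', 47), ('Cas', 91), ('Dee', 55)]

# Keep only the pairs that satisfy the condition and transform each kept pair.
passed: List[str] = [name.upper() for name, score in results if score >= 80]

# The complementary condition selects the remaining pairs, unchanged.
failed: List[Tuple[str, int]] = [pair for pair in results if pair[1] < 80]

print(passed)  # ['ANN', 'CAS']
print(failed)  # [('Bob', 47), ('Dee', 55)]
```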

**Example:**
```python
# Program
dates = [(1903, 10, 3), (-230, -10, 2), (1947, 1, 2), (1990, 30, 30), (1678, 3, 30), 
         (680, 11, 23), (2500, 12, 12)]

print(remove_invalid_dates(dates))
print(get_invalid_dates(dates))

# Output valid dates
[datetime.date(1903, 10, 3), 
 datetime.date(1947, 1, 2), 
 datetime.date(1678, 3, 30), 
 datetime.date(680, 11, 23)]

# Output invalid dates
[(-230, -10, 2), (1990, 30, 30), (2500, 12, 12)]

```
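
Be aware that `datetime.date` performs its own, stricter validation: a tuple that passes the range checks described in this exercise can still be an impossible calendar date, in which case the constructor raises `ValueError`. The sketch below demonstrates this with a hypothetical value that does not occur in the example list.

```python
import datetime

# Passes the simple range checks (month 1-12, day 1-31) but is not a real date.
candidate = (2001, 2, 30)

try:
    datetime.date(*candidate)
    is_real_date = True
except ValueError:
    # Raised because February never has 30 days.
    is_real_date = False

print(is_real_date)  # False
```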

In [None]:
#// BEGIN_TODO [Remove_invalid_dates] Remove invalid dates

# ===== =====> Replace this line by your code. <===== ===== #

#// END_TODO [Remove_invalid_dates]

dates = [(1903, 10, 3), (-230, -10, 2), (1947, 1, 2), (1990, 30, 30), (1678, 3, 30), 
         (680, 11, 23), (2500, 12, 12)]

print(remove_invalid_dates(dates))
print(get_invalid_dates(dates))

## 5. Movie Dialogues

Create the function `extract_dialogue`, which takes a line of dialogue (string) as input and returns a tuple with two elements: the name of the character (string) and that character's line in the dialogue (string). 
A typical input for the function looks like this: 

```python
"L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie."
```

Use **regular expressions** to extract the name of the character and their line in the dialogue. 
Then, return a tuple like the following:

```python
("BIANCA", "Okay -- you're gonna need to learn how to lie.")
```

Notice that whitespace at the beginning and end of both elements in the tuple has been removed.
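
When writing a regular expression for this format, keep in mind that `+` and `$` are metacharacters and must be escaped to match them literally. The sketch below shows one possible way to break the sample line into fields with `re.split`; the variable names are illustrative, and other approaches (such as `re.match` with groups) work as well.

```python
import re

line = "L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie."

# '+' and '$' are regex metacharacters, so they are escaped with backslashes.
# The \s* on both sides also consumes the whitespace around the separator.
fields = re.split(r'\s*\+\+\+\$\+\+\+\s*', line)

print(fields)
# ['L872', 'u0', 'm0', 'BIANCA', "Okay -- you're gonna need to learn how to lie."]
```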

Afterward, define the function `extract_all_lines` which gets a path to a file as input (string) and returns a list of tuples.
The function reads each line of the file and extracts the name of the character and its dialogue.
Each character-dialogue tuple is added to the list in the order in which each line is read from the file.
To do so, use the `extract_dialogue` function defined before.

**Example:**
```python
# Program
extract_all_lines('datasets/movie_lines.txt')

# Output
[('BIANCA', 'They do not!'),
 ('CAMERON', 'They do to!'),
 ('BIANCA', 'I hope so.'),
 ('CAMERON', 'She okay?'),
 ('BIANCA', "Let's go."),
 ('CAMERON', 'Wow'),
 ('BIANCA', "Okay -- you're gonna need to learn how to lie."),
 ('CAMERON', 'No'),
 ...]
```
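
Reading a text file line by line can be sketched as follows. The temporary file and its contents are hypothetical and exist only to make the example self-contained; depending on the data file, you may need to pass an explicit `encoding` argument to `open`.

```python
import os
import tempfile
from typing import List

# Create a small temporary file so that the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    path = tmp.name

stripped_lines: List[str] = []
with open(path) as file:
    for line in file:  # Iterates over the file one line at a time.
        stripped_lines.append(line.strip())  # strip() removes the trailing newline.

os.remove(path)  # Clean up the temporary file.
print(stripped_lines)  # ['first line', 'second line']
```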

In [None]:
#// BEGIN_TODO [Movie_dialogues] Movie dialogues

# ===== =====> Replace this line by your code. <===== ===== #

#// END_TODO [Movie_dialogues] 

extract_all_lines('datasets/movie_lines.txt')

## 6. Movie Categories

Create the function `extract_categories`, which takes a line of a movie metadata file as input and returns a list of strings with the movie's categories. A typical input for the function looks like this:

```python
"m4 +++$+++ 48 hrs. +++$+++ 1982 +++$+++ 6.90 +++$+++ 22289 +++$+++ ['action', 'comedy', 'crime', 'drama', 'thriller']"
```

Use **regular expressions** to extract the categories from the input line. Remove the whitespace and quotation marks surrounding each category. Then, return a list like the following:

```python
['action', 'comedy', 'crime', 'drama', 'thriller']
```

Notice that whitespace at the beginning and end of the elements of the list, as well as the extra quotes, has been removed.
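
One way to pull quoted items out of a string is `re.findall` with a capture group. The sketch below works on a shortened, hypothetical string rather than a full metadata line, and is only one of several workable patterns.

```python
import re

text = "['action', 'comedy', 'crime']"

# The parentheses form a capture group: findall returns only the captured part,
# i.e. the characters between each pair of single quotes.
quoted_items = re.findall(r"'([^']*)'", text)

print(quoted_items)  # ['action', 'comedy', 'crime']
```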

Then, define the function `extract_all_categories`, which gets a path to a file as input (string) and returns a dictionary.
The function reads each line of the file and extracts the list of categories on that line.
To do so, use the `extract_categories` function defined before.
This list is then used to update the dictionary of all categories, where each key is a category and each value is the number of movies in that category.

**Example:**
```python
# Program
extract_all_categories('datasets/movie_titles_metadata.txt')

# Output
{'comedy': 21,
 'romance': 14,
 'adventure': 12,
 'biography': 5,
 ... }
```
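
Counting how often each item occurs across several lists can be sketched with a plain dictionary and `dict.get`. The data in `label_lists` is hypothetical and unrelated to the movie file; it only illustrates the counting pattern.

```python
from typing import Dict, List

# Hypothetical input: one list of labels per record.
label_lists: List[List[str]] = [
    ['comedy', 'romance'],
    ['comedy'],
    ['drama', 'romance', 'comedy'],
]

frequencies: Dict[str, int] = {}
for labels in label_lists:
    for label in labels:
        # get() returns 0 for a label that has not been counted yet.
        frequencies[label] = frequencies.get(label, 0) + 1

print(frequencies)  # {'comedy': 3, 'romance': 2, 'drama': 1}
```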

In [None]:
#// BEGIN_TODO [Movie_categories] Movie categories

# ===== =====> Replace this line by your code. <===== ===== #

#// END_TODO [Movie_categories] 

extract_all_categories('datasets/movie_titles_metadata.txt')


---

In [None]:
# List of all defined names
%whos

---

# (End of Notebook)

&copy; 2017-2024 - **TU/e** - Eindhoven University of Technology
