# Final Exam: 

In this exam you will work on the following concepts:

* Handling dates and times in Python and Pandas
* Handling text data in Pandas with regular expressions
* Error handling in Python
* Unit testing in Python

Rules:
- The time limit is 80 minutes.
- Every solution must be written in Python, and the result must be printed or returned.
- 5 minutes before the end, a warning will be given to submit what you have so far. No submissions will be accepted after the end of the exam.
- You can resubmit as many times as you want.
- The exam is open-book, open-notes, open-internet, open-everything except for communicating with anyone by any means, sharing solutions, or using AI in any form.
    - Any student who does not comply with these rules will fail immediately and will be reported to the university.
- If you have a question, please raise your hand. I will come to you.
- For every exercise with a single correct result, the expected result is provided, but you still need to write the code that produces it.
- If you cannot solve an exercise and need its solution for a later exercise, you can copy my solution, with a 75% penalty on the exercise that uses it.
    - E.g., you cannot solve exercise 1 (1 point), but you need its solution for exercise 2 (1 point). You can copy the solution, but the maximum grade you can get on exercise 2 is 25% of its points, so the total grade for both exercises will be at most 0.25 points.


Submission:
- Before the end of the exam, submit the following files using the link provided in the final exam announcement.

    * This Jupyter Notebook with your solutions and experiments
    * A script called `functions.py` with all the functions you implemented and used in the Jupyter Notebook
    * A script called `test_functions.py` with the tests for all the functions you implemented in the `functions.py` script.


## Exercise 1

Create a function called `calendar()` that receives two strings with dates in the format `YYYY-MM-DD` and returns a pandas DataFrame with the following elements:

* A `DatetimeIndex` with one entry per hour for every day between the two dates (both dates included)
* The following columns:
    * `day`: the day of the month as an integer
    * `month`: the month of the year as an integer
    * `year`: the year as an integer
    * `day_of_week`: the day of the week as an integer (0 is Monday, 6 is Sunday)
    * `is_weekend`: a boolean indicating if the day is a weekend or not
    * `quarter`: the quarter of the year with this format 'YYYY-Qn', where n is the quarter number

The DataFrame must have one row for each hour of each day between the two dates.

One row example:

| DatetimeIndex           | day | month | year | day_of_week | is_weekend | quarter |
|---------------------|-----|-------|------|-------------|------------|---------|
| 2025-01-01 00:00:00 | 1   | 1     | 2025 | 2           | False      | 2025-Q1 |

Make sure that the last day is included in full, up to and including its last hour (23:00).
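A minimal sketch of one way to meet this specification, assuming pandas is installed. `pd.date_range` generates one timestamp per hour from 00:00 on the start date through 23:00 on the end date, and every column is derived from that index. The parameter names `start` and `end` are illustrative choices, not part of the exercise.

```python
import pandas as pd


def calendar(start: str, end: str) -> pd.DataFrame:
    """Hourly calendar from `start` 00:00 through `end` 23:00 (both 'YYYY-MM-DD')."""
    # Extend the end bound by 23 hours so the last day is covered in full
    index = pd.date_range(
        start=pd.Timestamp(start),
        end=pd.Timestamp(end) + pd.Timedelta(hours=23),
        freq=pd.offsets.Hour(1),
    )
    df = pd.DataFrame(index=index)
    df["day"] = index.day
    df["month"] = index.month
    df["year"] = index.year
    df["day_of_week"] = index.dayofweek  # 0 = Monday, 6 = Sunday
    df["is_weekend"] = index.dayofweek >= 5  # Saturday or Sunday
    # 'YYYY-Qn', e.g. '2025-Q1'
    df["quarter"] = [f"{ts.year}-Q{ts.quarter}" for ts in index]
    return df
```

Passing `pd.offsets.Hour(1)` instead of a string alias avoids the `'H'` to `'h'` alias change introduced in recent pandas versions. For '2025-01-01' to '2025-01-02' this produces 2 days × 24 hours = 48 rows.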

In [None]:
import pandas as pd
from datetime import datetime

# I was supposed to use pd.date_range with hourly frequency, but I didn't read the question fully:
# this version returns only one row per input date

def calendar(string1, string2):
    date1 = pd.to_datetime(string1)
    date2 = pd.to_datetime(string2)
    df = pd.DataFrame(
        {
            'DatetimeIndex': [pd.to_datetime(date1), pd.to_datetime(date2)],
            'day':[date1.day, date2.day],
            'month': [date1.month, date2.month],
            'year': [date1.year, date2.year],
            'day_of_week': [date1.day_of_week, date2.day_of_week],
            'is_weekend': [date1.day_name() in ['Saturday', 'Sunday'], date2.day_name() in ['Saturday', 'Sunday']],
            'quarter': [str(date1.year) + 'Q' + str(date1.quarter), str(date2.year) + 'Q' + str(date2.quarter)]
        }
    )
    return df.set_index('DatetimeIndex')

calendar('2025-01-01', '2025-01-01')

['day', 'month', 'year', 'day_of_week', 'is_weekend', 'quarter']

In [11]:
# try the function here

calendar('2025-01-01', '2025-03-20')

| DatetimeIndex | day | month | year | day_of_week | is_weekend | quarter |
|---------------|-----|-------|------|-------------|------------|---------|
| 2025-01-01    | 1   | 1     | 2025 | 2           | False      | 2025Q1  |
| 2025-03-20    | 20  | 3     | 2025 | 3           | False      | 2025Q1  |


## Exercise 2

Create a function called `analyze_text()` that receives a list of strings and returns a pandas DataFrame with the following columns:

* `text`: the original text
* `number_of_words`: the number of words in the text
* `number_of_vowels`: the number of vowels in the text
* `number_of_consonants`: the number of consonants in the text
* `number_of_unique_vowels`: the number of unique vowels in the text
* `number_of_unique_consonants`: the number of unique consonants in the text

Make sure to use regex to count the number of words, vowels, and consonants.

One row example:

| text | number_of_words | number_of_vowels | number_of_consonants | number_of_unique_vowels | number_of_unique_consonants |
|------|-----------------|------------------|----------------------|-------------------------|-----------------------------|
| 'This is a text' | 4 | 4 | 7 | 3 | 4 |

The result of the function must be a DataFrame.

Expected result for this list of strings:

```python
analyze_text(["Check, check, check, is this thing on?"])
```

| text                                | number_of_words | number_of_vowels | number_of_consonants | number_of_unique_vowels | number_of_unique_consonants |
|-------------------------------------|-----------------|------------------|----------------------|-------------------------|-----------------------------|
| Check, check, check, is this thing on? | 7               | 7                | 21                   | 3                       | 8                           |
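A minimal regex-based sketch, assuming pandas is installed and that "vowels" means a, e, i, o, u and "consonants" the other 21 ASCII letters, in either case, with digits, punctuation and spaces ignored. The two examples above disagree on case: the provided solution (8 unique consonants) only comes out if `C` and `c` count separately, while the one-row example ('This is a text', 4 unique consonants) only comes out if `T` and `t` are merged. For that reason the sketch exposes a hypothetical `case_sensitive` flag, defaulting to the provided solution's behaviour.

```python
import re

import pandas as pd

# Assumption: vowels are a, e, i, o, u and consonants are the other 21 ASCII
# letters, in either case; digits, punctuation and whitespace are ignored.
VOWEL_RE = re.compile(r"[aeiou]", re.IGNORECASE)
CONSONANT_RE = re.compile(r"[b-df-hj-np-tv-z]", re.IGNORECASE)
WORD_RE = re.compile(r"\w+")  # runs of letters, digits or underscores

COLUMNS = [
    "text",
    "number_of_words",
    "number_of_vowels",
    "number_of_consonants",
    "number_of_unique_vowels",
    "number_of_unique_consonants",
]


def analyze_text(texts, case_sensitive=True):
    """Return one row of word/vowel/consonant counts per string in `texts`.

    `case_sensitive` is a hypothetical flag: True counts 'C' and 'c' as two
    unique consonants, False folds them together.
    """
    rows = []
    for text in texts:
        vowels = VOWEL_RE.findall(text)
        consonants = CONSONANT_RE.findall(text)
        if not case_sensitive:
            vowels = [v.lower() for v in vowels]
            consonants = [c.lower() for c in consonants]
        rows.append([
            text,
            len(WORD_RE.findall(text)),
            len(vowels),
            len(consonants),
            len(set(vowels)),
            len(set(consonants)),
        ])
    # Passing `columns` keeps the schema even for an empty input list
    return pd.DataFrame(rows, columns=COLUMNS)
```

With the default, `analyze_text(["Check, check, check, is this thing on?"])` gives 7, 7, 21, 3 and 8, matching the provided solution; `analyze_text(["This is a text"], case_sensitive=False)` gives 4, 4, 7, 3 and 4, matching the one-row example.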

In [38]:
# * `text`: the original text
# * `number_of_words`: the number of words in the text
# * `number_of_vowels`: the number of vowels in the text
# * `number_of_consonants`: the number of consonants in the text
# * `number_of_unique_vowels`: the number of unique vowels in the text
# * `number_of_unique_consonants`: the number of unique consonants in the text

import re

def analyze_text(list_of_strings):
    vowel_pattern = r'[a, e, i, o, u]'
    consonant_pattern = r'[^aeiou\s]'

    dictionary = {}
    dictionary['text'] = [sentence for sentence in list_of_strings]
    dictionary['number_of_words'] = [len(list(sentence)) for sentence in list_of_strings]
    dictionary['number_of_vowels'] = [len(re.findall(vowel_pattern, sentence)) for sentence in list_of_strings]
    dictionary['number_of_consonants'] = [len(re.findall(consonant_pattern, sentence)) for sentence in list_of_strings]
    dictionary['number_of_unique_vowels'] = [len(set(re.findall(vowel_pattern, sentence))) for sentence in list_of_strings]
    dictionary['number_of_unique_consonants'] = [len(set(re.findall(consonant_pattern, sentence))) for sentence in list_of_strings]

    df = pd.DataFrame(dictionary)
    return df

In [40]:
# try the function here

analyze_text(["Check, check, check, is this thing on?"])

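|   | text | number_of_words | number_of_vowels | number_of_consonants | number_of_unique_vowels | number_of_unique_consonants |
|---|------|-----------------|------------------|----------------------|-------------------------|-----------------------------|
| 0 | Check, check, check, is this thing on? | 38 | 16 | 25 | 5 | 10 |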


## Exercise 3

Let's use `try-except` to handle errors in Python.

Apply the following in `calendar()`:

* If either input is not a string, raise a `TypeError` with the message "The input must be a string."
* If the input strings are not in the format `YYYY-MM-DD`, raise a `ValueError` with the message "The input must be in the format 'YYYY-MM-DD'."
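A sketch of a hypothetical helper, `validate_date_string()`, that `calendar()` could call on each argument before building the index. It uses `try-except` around `datetime.strptime`, which rejects impossible dates such as '2025-02-30', and then `re.fullmatch`, because `strptime` with `%Y-%m-%d` also accepts unpadded strings such as '2025-1-1'.

```python
import re
from datetime import datetime

MESSAGE = "The input must be in the format 'YYYY-MM-DD'."


def validate_date_string(value):
    """Hypothetical helper: return `value` unchanged if it is a valid 'YYYY-MM-DD' date."""
    if not isinstance(value, str):
        raise TypeError("The input must be a string.")
    # strptime raises ValueError for wrong layouts and impossible dates
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # `from None` hides strptime's own traceback from the caller
        raise ValueError(MESSAGE) from None
    # strptime accepts '2025-1-1', so also enforce zero-padded fields
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(MESSAGE)
    return value
```

Inside `calendar()`, calling `validate_date_string(string1)` and `validate_date_string(string2)` first keeps the error handling in one place.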

In [42]:
import re

import pandas as pd

def calendar(string1, string2):
    # Both inputs must be strings
    if not isinstance(string1, str) or not isinstance(string2, str):
        raise TypeError('The input must be a string.')

    # re.fullmatch anchors both ends, so trailing text such as '2025-01-01abc' is rejected
    pattern_check = r'\d{4}-\d{2}-\d{2}'
    if not re.fullmatch(pattern_check, string1) or not re.fullmatch(pattern_check, string2):
        raise ValueError("The input must be in the format 'YYYY-MM-DD'.")

    # A string with the right shape can still be an impossible date, e.g. '2025-13-45'
    try:
        date1 = pd.to_datetime(string1)
        date2 = pd.to_datetime(string2)
    except ValueError:
        raise ValueError("The input must be in the format 'YYYY-MM-DD'.") from None

    df = pd.DataFrame(
        {
            'DatetimeIndex': [pd.to_datetime(date1), pd.to_datetime(date2)],
            'day':[date1.day, date2.day],
            'month': [date1.month, date2.month],
            'year': [date1.year, date2.year],
            'day_of_week': [date1.day_of_week, date2.day_of_week],
            'is_weekend': [date1.day_name() in ['Saturday', 'Sunday'], date2.day_name() in ['Saturday', 'Sunday']],
            'quarter': [str(date1.year) + 'Q' + str(date1.quarter), str(date2.year) + 'Q' + str(date2.quarter)]
        }
    )
    return df.set_index('DatetimeIndex')

print(calendar('2025-01-01', '2025-01-01'))

               day  month  year  day_of_week  is_weekend quarter
DatetimeIndex                                                   
2025-01-01       1      1  2025            2       False  2025Q1
2025-01-01       1      1  2025            2       False  2025Q1


## Exercise 4

Now for `analyze_text()`:

* If the input is not a list, raise a `TypeError` with the message "The input must be a list."
* If the input list contains elements that are not strings, raise a `ValueError` with the message "All elements in the list must be strings."
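A minimal sketch of the two checks as a hypothetical helper, `validate_text_list()`, that `analyze_text()` could call first, followed by a caller handling the error with `try-except`. Raising `ValueError` for non-string elements follows the exercise statement.

```python
def validate_text_list(texts):
    """Hypothetical helper implementing the input checks for analyze_text()."""
    if not isinstance(texts, list):
        raise TypeError("The input must be a list.")
    # all() stops at the first element that is not a string
    if not all(isinstance(item, str) for item in texts):
        raise ValueError("All elements in the list must be strings.")
    return texts


# Example: handling the error at the call site
try:
    validate_text_list(["This is a text", 42])
except ValueError as err:
    print(f"Invalid input: {err}")  # Invalid input: All elements in the list must be strings.
```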

In [44]:
import re

def analyze_text(list_of_strings):
    if not isinstance(list_of_strings, list):
        raise TypeError('The input must be a list.')
    for item in list_of_strings:
        if not isinstance(item, str):
            raise ValueError('All elements in the list must be strings.')

    vowel_pattern = r'[a, e, i, o, u]'
    consonant_pattern = r'[^aeiou\s]'

    dictionary = {}
    dictionary['text'] = [sentence for sentence in list_of_strings]
    dictionary['number_of_words'] = [len(list(sentence)) for sentence in list_of_strings]
    dictionary['number_of_vowels'] = [len(re.findall(vowel_pattern, sentence)) for sentence in list_of_strings]
    dictionary['number_of_consonants'] = [len(re.findall(consonant_pattern, sentence)) for sentence in list_of_strings]
    dictionary['number_of_unique_vowels'] = [len(set(re.findall(vowel_pattern, sentence))) for sentence in list_of_strings]
    dictionary['number_of_unique_consonants'] = [len(set(re.findall(consonant_pattern, sentence))) for sentence in list_of_strings]

    df = pd.DataFrame(dictionary)
    return df

## Exercise 5

Put your functions in a script called `functions.py`.

Create a new function in your script called `main()` that prints the following:

* A calendar from '2025-01-01' to '2025-01-02'
* The text analysis for this list `["Check, check, check, is this thing on?"]`

`main()` should run only when the script is executed directly (e.g. `python functions.py`), not when the script is imported.
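A compact sketch of what a possible `functions.py` could look like, assuming pandas is installed. The `if __name__ == "__main__":` guard is what makes `main()` run for `python functions.py` but not for `import functions`. `main()` also widens the pandas display with `pd.option_context`, so the terminal output lists every column instead of eliding some with `...`. `_check_date` is a hypothetical helper name, and unique letters are counted case-sensitively, which reproduces the provided solution (8 unique consonants).

```python
"""functions.py: compact sketch of the exam module (assumes pandas is installed)."""
import re
from datetime import datetime

import pandas as pd

VOWEL_RE = re.compile(r"[aeiou]", re.IGNORECASE)
CONSONANT_RE = re.compile(r"[b-df-hj-np-tv-z]", re.IGNORECASE)
WORD_RE = re.compile(r"\w+")
COLUMNS = [
    "text", "number_of_words", "number_of_vowels", "number_of_consonants",
    "number_of_unique_vowels", "number_of_unique_consonants",
]


def _check_date(value):
    """Raise TypeError/ValueError unless `value` is a real 'YYYY-MM-DD' date string."""
    if not isinstance(value, str):
        raise TypeError("The input must be a string.")
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        raise ValueError("The input must be in the format 'YYYY-MM-DD'.") from None
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):  # strptime accepts '2025-1-1'
        raise ValueError("The input must be in the format 'YYYY-MM-DD'.")


def calendar(start, end):
    _check_date(start)
    _check_date(end)
    # Hourly index from start 00:00 through end 23:00
    index = pd.date_range(
        pd.Timestamp(start), pd.Timestamp(end) + pd.Timedelta(hours=23),
        freq=pd.offsets.Hour(1),
    )
    df = pd.DataFrame(index=index)
    df["day"] = index.day
    df["month"] = index.month
    df["year"] = index.year
    df["day_of_week"] = index.dayofweek
    df["is_weekend"] = index.dayofweek >= 5
    df["quarter"] = [f"{ts.year}-Q{ts.quarter}" for ts in index]
    return df


def analyze_text(texts):
    if not isinstance(texts, list):
        raise TypeError("The input must be a list.")
    if not all(isinstance(t, str) for t in texts):
        raise ValueError("All elements in the list must be strings.")
    rows = []
    for text in texts:
        vowels, consonants = VOWEL_RE.findall(text), CONSONANT_RE.findall(text)
        rows.append([text, len(WORD_RE.findall(text)), len(vowels), len(consonants),
                     len(set(vowels)), len(set(consonants))])
    return pd.DataFrame(rows, columns=COLUMNS)


def main():
    # Widen the display so the terminal output is not truncated with '...'
    with pd.option_context("display.max_columns", None, "display.width", 200):
        print(calendar("2025-01-01", "2025-01-02"))
        print(analyze_text(["Check, check, check, is this thing on?"]))


# Runs main() for `python functions.py`, but not for `import functions`
if __name__ == "__main__":
    main()
```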

## Exercise 6

Run the `functions.py` script in a terminal and paste the output in the cell below.

               day  month  year  day_of_week  is_weekend quarter
DatetimeIndex
2025-01-01       1      1  2025            2       False  2025Q1
2025-01-02       2      1  2025            3       False  2025Q1
                                     text  number_of_words  ...  number_of_unique_vowels  number_of_unique_consonants
0  Check, check, check, is this thing on?               38  ...                        5                           10

[1 rows x 6 columns]

## Exercise 7

Time to build the tests for your functions.

Create a script called `test_functions.py` with the following tests:

For the `calendar()` function:

* Test that `calendar()` returns a DataFrame
* Test that `calendar()` returns the correct number of rows
  * For this test, use the dates '2025-01-01' to '2025-01-02'; it should return 48 rows (2 days × 24 hours)
* Test that `calendar()` returns the correct columns
  * The columns should be 'day', 'month', 'year', 'day_of_week', 'is_weekend', and 'quarter'

For the `analyze_text()` function:

* Test that `analyze_text()` returns a DataFrame
* Test that `analyze_text()` raises a `TypeError` if the input is not a list
* Test that `analyze_text()` raises a `ValueError` if the input list contains elements that are not strings

## Exercise 8

Run the tests in a terminal and paste the output in the cell below.

=================================================== FAILURES =================================================== 
______________________________________________ test_calendar_rows ______________________________________________ 

    def test_calendar_rows():
>       assert calendar('2025-01-01', '2025-01-02').shape == (48, 2)
E       assert (2, 6) == (48, 2)
E
E         At index 0 diff: 2 != 48
E         Use -v to get more diff

test_functions_exam.py:11: AssertionError
=========================================== short test summary info ============================================ 
FAILED test_functions_exam.py::test_calendar_rows - assert (2, 6) == (48, 2)
========================================= 1 failed, 5 passed in 2.04s ==========================================