# Practice: Final Exam

* Date and time operations
* Regular expressions
* Error handling
* Unit testing


## Exercise 1

Create a function that receives a text and returns all the substrings that are dates in YYYY-MM-DD format.

Example:

```python
text = "The event will take place on 2025-03-15 at 14:30 and 2025-03-16 at 10:00"

print(find_iso_dates(text))
```

```python
import re


def find_iso_dates(text):
    """Find all YYYY-MM-DD date substrings in a given text."""
    # \b word boundaries keep the match from starting or ending inside a longer run of digits
    iso_pattern = r"\b\d{4}-\d{2}-\d{2}\b"
    return re.findall(iso_pattern, text)


text = "The event will take place on 2025-03-15 at 14:30 and 2025-03-16 at 10:00"

print(find_iso_dates(text))
```

```
['2025-03-15', '2025-03-16']
```
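The pattern `\d{4}-\d{2}-\d{2}` checks only the shape of the string, so it also returns impossible dates such as `2025-13-01`. One possible refinement, sketched below with a hypothetical `find_valid_iso_dates` helper, filters the regex matches through `datetime.strptime`, which raises `ValueError` for dates that do not exist on the calendar.

```python
import re
from datetime import datetime


def find_valid_iso_dates(text):
    """Hypothetical helper: return YYYY-MM-DD substrings that are real calendar dates."""
    candidates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
    valid = []
    for candidate in candidates:
        try:
            # strptime rejects impossible values such as month 13 or February 30
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue
        valid.append(candidate)
    return valid


text = "Valid: 2025-03-15 and 2024-02-29. Invalid: 2025-13-01 and 2025-02-30."
print(find_valid_iso_dates(text))  # ['2025-03-15', '2024-02-29']
```

This keeps the regex as a cheap first pass and lets the standard library decide validity, which also ties Exercise 1 to the error-handling topic.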


## Exercise 2

Create a function that receives a list of strings representing datetimes in ISO 8601 format and returns a pandas Series of datetime objects.

```python
import pandas as pd


def convert_to_datetime(date_strings):
    """Convert a list of ISO 8601 strings to a pandas Series of datetime objects."""
    return pd.Series(pd.to_datetime(date_strings))


# Example usage
date_strings = ["2024-01-01T12:00:00", "2024-01-02T13:00:00", "2024-01-03T14:00:00"]

# Use the function
convert_to_datetime(date_strings)
```

```
0   2024-01-01 12:00:00
1   2024-01-02 13:00:00
2   2024-01-03 14:00:00
dtype: datetime64[ns]
```
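All three example strings share one layout. When the inputs are all ISO 8601 but differ in precision (date only, seconds, fractional seconds), `pd.to_datetime` accepts `format="ISO8601"` (pandas 2.0 or later), which parses each element as ISO 8601 instead of reusing one format inferred from the first element. The sketch below assumes pandas 2.x; `convert_iso_to_datetime` is a hypothetical name.

```python
import pandas as pd


def convert_iso_to_datetime(date_strings):
    """Hypothetical variant: parse ISO 8601 strings that differ in precision."""
    # format="ISO8601" (pandas >= 2.0) accepts any ISO 8601 layout per element
    return pd.Series(pd.to_datetime(date_strings, format="ISO8601"))


mixed = ["2024-01-01T12:00:00", "2024-01-02", "2024-01-03T14:00:00.5"]
print(convert_iso_to_datetime(mixed))
```

A date-only string such as `"2024-01-02"` is read as midnight of that day.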

## Exercise 3

Using `try`/`except`, call the previous function with different inputs and catch the errors the conversion may raise.

Trigger as many different types of errors as you can.

```python
convert_to_datetime("2025-01-01")  # a single string also works; no list needed
```

```
0   2025-01-01
dtype: datetime64[ns]
```

```python
# Passing several strings as separate arguments fails: the function expects a single list
convert_to_datetime("2025-01-01", "2025-01-02")
```

```
TypeError: convert_to_datetime() takes 1 positional argument but 2 were given
```

```python
# Fails when a string doesn't match the format inferred from the first element
convert_to_datetime(["2025-01-01", "2025"])
```

```
ValueError: time data "2025" doesn't match format "%Y-%m-%d", at position 1. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
```
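Instead of raising, `pd.to_datetime` accepts `errors="coerce"`, which replaces values it cannot parse with `NaT` (not-a-time). The sketch below wraps this in a hypothetical `convert_to_datetime_lenient` function; whether silently turning bad values into `NaT` is acceptable depends on the data, so treat it as an option alongside explicit error handling, not a replacement for it.

```python
import pandas as pd


def convert_to_datetime_lenient(date_strings):
    """Hypothetical variant: unparseable entries become NaT instead of raising."""
    return pd.Series(pd.to_datetime(date_strings, errors="coerce"))


result = convert_to_datetime_lenient(["2025-01-01", "not a date"])
print(result)
print(result.isna().tolist())  # [False, True]
```

Checking `result.isna()` afterwards shows which inputs failed to parse.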

```python
# A list of integers also "works": each value is read as nanoseconds since 1970-01-01
convert_to_datetime([2, 100, 1000])
```

```
0   1970-01-01 00:00:00.000000002
1   1970-01-01 00:00:00.000000100
2   1970-01-01 00:00:00.000001000
dtype: datetime64[ns]
```
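The integers in the previous cell are read as nanoseconds because that is the default `unit` of `pd.to_datetime` for numeric input. If the numbers are, say, Unix timestamps in seconds, the `unit` parameter changes the interpretation, as in this small sketch:

```python
import pandas as pd

# unit="s" reads each number as seconds since 1970-01-01 instead of nanoseconds
seconds = pd.Series(pd.to_datetime([0, 60, 86400], unit="s"))
print(seconds)
```

Here `60` becomes one minute past the epoch and `86400` becomes exactly one day later.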

```python
# Update the function to handle the errors
def convert_to_datetime(date_strings):
    """Convert ISO 8601 strings to a datetime Series; print the error and return None on failure."""
    try:
        return pd.Series(pd.to_datetime(date_strings))
    except (TypeError, ValueError) as e:
        # Both error types are handled the same way, so one clause covers them
        print(f"Error: {e}")
        return None


convert_to_datetime(["2025-01-01", "2025-01-02", "1"])
```

```
Error: time data "1" doesn't match format "%Y-%m-%d", at position 2. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
```
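The `try` block inside the function cannot catch the `TypeError` from calling it with two arguments, because Python raises that error while binding the arguments, before the function body runs. A minimal sketch of catching it at the call site instead, looping over a few hypothetical inputs with the original (non-catching) version of `convert_to_datetime`:

```python
import pandas as pd


def convert_to_datetime(date_strings):
    """Convert a list of ISO 8601 strings to a pandas Series of datetime objects."""
    return pd.Series(pd.to_datetime(date_strings))


# Hypothetical inputs: each tuple holds the positional arguments for one call
cases = [
    (["2025-01-01", "2025-01-02"],),  # valid list
    ("2025-01-01", "2025-01-02"),  # two arguments instead of one list
    (["2025-01-01", "not a date"],),  # unparseable string
]

errors = []
for args in cases:
    try:
        convert_to_datetime(*args)
    except (TypeError, ValueError) as e:
        # Show the exception class so the different failure modes are visible
        print(f"{type(e).__name__}: {e}")
        errors.append(e)
    else:
        print("ok")
        errors.append(None)
```

The exact exception class for the unparseable string can vary between pandas versions, but it is a `ValueError` or a subclass of it, so the `except` clause still catches it.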


## Exercise 4

Create a function called `create_dataframe` that creates a pandas DataFrame with 2 columns:

* `ds`: a datetime object with the current date and time
* `y`: a random integer between 0 and 100

The DataFrame should have 1000 rows.

```python
import numpy as np
import pandas as pd


def create_dataframe():
    """Create a pandas DataFrame with 2 columns: ds (dates) and y (random integers)."""
    return pd.DataFrame(
        {
            "ds": pd.date_range(start="2024-01-01", periods=1000),
            # randint excludes the upper bound, so values fall in 0 to 99; use 101 to include 100
            "y": np.random.randint(0, 100, size=1000),
        }
    )


create_dataframe()
```

|     | ds         | y   |
|-----|------------|-----|
| 0   | 2024-01-01 | 21  |
| 1   | 2024-01-02 | 95  |
| 2   | 2024-01-03 | 53  |
| 3   | 2024-01-04 | 40  |
| 4   | 2024-01-05 | 14  |
| ... | ...        | ... |
| 995 | 2026-09-22 | 36  |
| 996 | 2026-09-23 | 15  |
| 997 | 2026-09-24 | 31  |
| 998 | 2026-09-25 | 74  |
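The exercise asks for the current date and time in `ds`, while the solution starts from the fixed date 2024-01-01. A sketch of a more literal reading, assuming "current date and time" means daily timestamps starting at the moment the function runs and that 100 should be a possible value; `create_dataframe_from_now` and its `seed` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def create_dataframe_from_now(n_rows=1000, seed=None):
    """Hypothetical variant: daily timestamps starting at the current date and time."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            # Start at the moment the function runs, one row per day
            "ds": pd.date_range(start=pd.Timestamp.now(), periods=n_rows, freq="D"),
            # endpoint=True makes the upper bound inclusive: values from 0 to 100
            "y": rng.integers(0, 100, size=n_rows, endpoint=True),
        }
    )


df = create_dataframe_from_now(seed=42)
print(df.head())
```

Passing a `seed` makes the `y` values reproducible, which is convenient when the DataFrame is used in tests.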


## Exercise 5

Create a test for the `create_dataframe` function in a script called `test_create_dataframe.py`.

It should check the following:

* The result is a pandas DataFrame
* The DataFrame has 2 columns: `ds` and `y`
* The DataFrame has 1000 rows
* The `ds` column has a datetime dtype
* The `y` column has an integer dtype

Use the `pytest` library to write the test, run it with the `pytest` command in the terminal, then copy the output and paste it in the cell below.

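```python
# test_create_dataframe.py
# Assumes create_dataframe lives in functions_fe.py, importable from where pytest runs
import functions_fe
import pandas as pd


def test_create_dataframe():
    df = functions_fe.create_dataframe()
    assert isinstance(df, pd.DataFrame)
    assert df.columns.tolist() == ["ds", "y"]
    assert df.shape == (1000, 2)
    # Check column dtypes rather than one element: the integer width
    # (int32 vs int64) can depend on the platform and NumPy version
    assert pd.api.types.is_datetime64_any_dtype(df["ds"])
    assert pd.api.types.is_integer_dtype(df["y"])
```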
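The same `pytest` approach can cover the earlier exercises. A sketch for a hypothetical `test_exercises.py`, assuming the file defines its own copies of `find_iso_dates` and the original, non-catching `convert_to_datetime` (the error-handling version returns `None` instead of raising). `pytest.mark.parametrize` runs one test per input pair, and `pytest.raises` asserts that a call raises the given exception.

```python
import re

import pandas as pd
import pytest


# Copies of the notebook functions so this file runs on its own
def find_iso_dates(text):
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


def convert_to_datetime(date_strings):
    return pd.Series(pd.to_datetime(date_strings))


@pytest.mark.parametrize(
    "text, expected",
    [
        ("on 2025-03-15 at 14:30", ["2025-03-15"]),
        ("no dates here", []),
        ("2025-03-15 and 2025-03-16", ["2025-03-15", "2025-03-16"]),
    ],
)
def test_find_iso_dates(text, expected):
    assert find_iso_dates(text) == expected


def test_convert_to_datetime_wrong_arity():
    # Two positional arguments instead of one list
    with pytest.raises(TypeError):
        convert_to_datetime("2025-01-01", "2025-01-02")


def test_convert_to_datetime_unparseable():
    # pytest.raises(ValueError) also accepts subclasses of ValueError
    with pytest.raises(ValueError):
        convert_to_datetime(["2025-01-01", "not a date"])
```

Running `pytest` in the folder that contains this file collects all three test functions, with the parametrized one reported once per input pair.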