# Assignment 9: Pandas

Please read the task descriptions carefully and implement **only** what the tasks ask you to implement. Closely following the task descriptions will be beneficial, so keep your divergence in check - the test cases below each input cell are the gold standard. Finally, for this assignment, you do not need any error handling; you can assume that all input to your function will be valid.

As for the other assignments, using `print` is encouraged to test your implementation but is never required. Make sure not to confuse `return` and `print` statements: If your function has to **return** something, use the `return` statement. 
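The difference between the two statements can be sketched with two small, hypothetical helper functions (`add_printing` and `add_returning` are made up for this illustration and are not part of the assignment):

```python
def add_printing(a, b):
    # Only displays the sum; the function itself returns None.
    print(a + b)


def add_returning(a, b):
    # Hands the sum back to the caller so it can be stored or tested.
    return a + b


printed = add_printing(1, 2)    # shows 3 on screen
returned = add_returning(1, 2)  # shows nothing

print(printed)   # None
print(returned)  # 3
```

The test cells work with the value a function returns, so a function that only prints would hand them `None`.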

Try to implement the tasks yourself or in a small team. If you blindly copy a solution from the Internet or other students, you will not learn anything. Rather, make an effort to understand the solution! Furthermore, do not modify the _test cells_ - if you do, you effectively cheat the system, which is not helpful for your learning process.

Some aspects of this assignment require you to **self-study** and do some research beyond the lecture contents - use your favorite search engine to look up documentation, usage examples, and definitions of the mentioned functions. There might be tasks where you have to read and investigate the [Python Standard Library](https://docs.python.org/3/library/) to find the documentation for a function that is used or that you want to use.

This assignment will use the third-party module [pandas](https://pandas.pydata.org/).

In Google Colab and Anaconda, it is already installed. If you see an `ImportError` in the next cell, run `%pip install pandas` to install this module.

---
# Task 0: Loading the `csv` files.

We will operate on a `pd.DataFrame` from the [Pandas](https://pandas.pydata.org/) third-party module.

We will load the files with the following function `load_file()`. Do not modify this function.
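`load_file()` calls `pd.read_csv` with `index_col=0`. As a rough sketch of what that argument does, the following reads a small, made-up CSV text (not one of the assignment files) once with and once without it:

```python
import io

import pandas as pd

# Hypothetical CSV text; the first (unnamed) column holds the row labels.
csv_text = ",name,purchase-id\n0,Alice,3\n1,Bob,7\n"

with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.columns))     # ['name', 'purchase-id']
print(list(without_index.columns))  # ['Unnamed: 0', 'name', 'purchase-id']
```

With `index_col=0`, the first column becomes the index instead of showing up as a regular column named `Unnamed: 0`.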

In [1]:
import pandas as pd

Execute the following cell to check if you uploaded the file correctly.

In [3]:
from pathlib import Path

def load_file(path, msg=True):
    if isinstance(path, str):
        path = Path(path)
    if msg:
        color_success, color_error, reset = "\033[1;42m", "\033[1;41m", "\033[0m"
        tpl_success = f"The {path!s:^13} file was found and can be loaded."
        tpl_error = f"Please upload the {path!s:^13} file."
        empty_line = " " * 60
        msg_success = f"{color_success}{empty_line}\n{tpl_success:^60}\n{empty_line}{reset}"
        msg_error = f"{color_error}{empty_line}\n{tpl_error:^60}\n{empty_line}{reset}"
    if path.exists():
        if msg:
            print(msg_success)
        return pd.read_csv(path, index_col=0)
    if msg:
        print(msg_error)
    raise ValueError("You must upload the file.")


# We load all three DataFrames in the beginning
ORDERS = load_file(Path('orders.csv'))
PERSONS = load_file(Path('persons.csv'))
IRIS = load_file(Path('iris.csv'))

    The  orders.csv   file was found and can be loaded.
    The  persons.csv  file was found and can be loaded.
    The   iris.csv    file was found and can be loaded.



In [32]:
ORDERS.loc[1:10]

Unnamed: 0,products,purchase-id
1,"{'chair': 2, 'apple': 1, 'car': 1, 'smartphone...",3
2,"{'drill': 1, 'wallet': 1, 'chair': 1, 'window'...",7
3,"{'book': 3, 'lamp': 1, 'smartphone': 2, 'door'...",8
4,"{'paper': 1, 'shirt': 1, 'car': 1, 'robot': 1,...",10
5,"{'mask': 3, 'drill': 1, 'door': 1, 'paper': 3,...",11
6,"{'table': 2, 'wallet': 2, 'car': 1, 'smartphon...",13
7,"{'chair': 1, 'drill': 2, 'shirt': 2, 'apple': ...",14
8,"{'mask': 2, 'door': 1, 'window': 2, 'apple': 1...",15
9,"{'door': 2, 'banana': 1, 'smartphone': 2, 'tab...",16
10,"{'robot': 1, 'wallet': 2, 'apple': 1, 'window'...",19


In [36]:
ORDERS.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 69703 entries, 0 to 69702
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   products     69703 non-null  object
 1   purchase-id  69703 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 3.6+ MB


In [27]:
PERSONS.loc[1:10]

Unnamed: 0,name,birthdate,purchase-id
1,Anna Hanna MD,1963-08-10,3
2,Anna Hanna MD,1963-08-10,4
3,Anna Hanna MD,1963-08-10,5
4,Anna Hanna MD,1963-08-10,7
5,Anna Hanna MD,1963-08-10,8
6,Mr. Jason Carter,1920-01-11,9
7,Joseph Davis,1906-04-21,10
8,Jeffrey Hayes,1913-02-15,15
9,Maria Heath,1921-07-01,16
10,Diane Ellis,2008-02-09,19


In [41]:
PERSONS.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 59805 entries, 0 to 59804
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   name         59805 non-null  object
 1   birthdate    59805 non-null  object
 2   purchase-id  59805 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 3.8+ MB


In [39]:
IRIS.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


---
# Task 1

Implement the function `valid_orders()` that returns all orders of the DataFrame `ORDERS` which have a matching person in the DataFrame `PERSONS`.
Your function must return a DataFrame with this information.

Meaning: Return **only** the orders that have a matching person. There will not be any NaN value in the dataframe.

The columns of the resulting dataframe must be: products, purchase-id, name, birthdate.

Do not modify the two dataframes!
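To get a feeling for what "matching" means here, the following sketch merges two tiny, made-up tables (`orders` and `persons` are hypothetical stand-ins, not the real data) on their shared `purchase-id` column:

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
orders = pd.DataFrame({"products": ["a", "b", "c"], "purchase-id": [1, 2, 3]})
persons = pd.DataFrame({"name": ["Ann", "Bob"], "purchase-id": [2, 4]})

# An inner merge keeps only the keys that occur in both tables.
inner = pd.merge(orders, persons, on="purchase-id", how="inner")

print(inner)                      # one row: products 'b', purchase-id 2, name 'Ann'
print(len(orders), len(persons))  # 3 2
```

`pd.merge` returns a new DataFrame, so the two input tables keep their original rows.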

In [14]:
# ⬇️ Add your code below this line ⬇️
### BEGIN SOLUTION

def valid_orders():
    return pd.merge(       # similar to a join, we merge the two dfs
        ORDERS,            # 
        PERSONS,           # 
        on='purchase-id',  # both dfs have the same column purchase-id, our join value
        how='inner'        # we only want orders with a matching person, so an inner join
    )

### END SOLUTION
# ⬆️ Add your code above this line ⬆️

In [30]:
valid_orders().loc[0:10]

Unnamed: 0,products,purchase-id,name,birthdate
0,"{'chair': 2, 'apple': 1, 'car': 1, 'smartphone...",3,Anna Hanna MD,1963-08-10
1,"{'drill': 1, 'wallet': 1, 'chair': 1, 'window'...",7,Anna Hanna MD,1963-08-10
2,"{'book': 3, 'lamp': 1, 'smartphone': 2, 'door'...",8,Anna Hanna MD,1963-08-10
3,"{'paper': 1, 'shirt': 1, 'car': 1, 'robot': 1,...",10,Joseph Davis,1906-04-21
4,"{'mask': 2, 'door': 1, 'window': 2, 'apple': 1...",15,Jeffrey Hayes,1913-02-15
5,"{'door': 2, 'banana': 1, 'smartphone': 2, 'tab...",16,Maria Heath,1921-07-01
6,"{'robot': 1, 'wallet': 2, 'apple': 1, 'window'...",19,Diane Ellis,2008-02-09
7,"{'drill': 2, 'paper': 3, 'lamp': 1, 'mask': 2,...",21,Edward Rodriguez,1989-11-02
8,"{'chair': 1, 'book': 2, 'table': 2, 'car': 1, ...",23,Edward Rodriguez,1989-11-02
9,"{'book': 1, 'robot': 2, 'mask': 1, 'wallet': 2...",29,Rachel Phillips MD,1916-12-30


In [25]:

def valid_orders_outer():
    return pd.merge(       # similar to a join, we merge the two dfs
        ORDERS,            # 
        PERSONS,           # 
        on='purchase-id',  # both dfs have the same column purchase-id, our join value
        how='outer'        # keep unmatched rows from both dfs, so an outer join
    )

valid_orders_outer().head()

Unnamed: 0,products,purchase-id,name,birthdate
0,"{'robot': 1, 'chair': 1, 'apple': 1, 'book': 1}",2,,
1,"{'chair': 2, 'apple': 1, 'car': 1, 'smartphone...",3,Anna Hanna MD,1963-08-10
2,"{'drill': 1, 'wallet': 1, 'chair': 1, 'window'...",7,Anna Hanna MD,1963-08-10
3,"{'book': 3, 'lamp': 1, 'smartphone': 2, 'door'...",8,Anna Hanna MD,1963-08-10
4,"{'paper': 1, 'shirt': 1, 'car': 1, 'robot': 1,...",10,Joseph Davis,1906-04-21


In [19]:
# Test Case
from unittest import TestCase
__ = TestCase()

# Sanity
__.assertTrue('valid_orders' in locals(), msg='You have to call the function `valid_orders`.')

# Check if the dataframes were modified.
_original_orders = load_file('orders.csv', False)
_original_persons = load_file('persons.csv', False)
try:
    pd.testing.assert_frame_equal(ORDERS, _original_orders)
    pd.testing.assert_frame_equal(PERSONS, _original_persons)
except AssertionError:
    __.assertFalse(True, msg='You must not modify the two original Dataframes.')

# reset DF in case it was modified
ORDERS, PERSONS = _original_orders, _original_persons

# Call your function
_student = valid_orders()

__.assertIsInstance(_student, pd.DataFrame, msg="You must return a DataFrame.")
__.assertEqual(41715, len(_student), msg="The resulting DataFrame must have 41,715 entries.")
__.assertEqual(4, len(_student.columns), msg="There must be exactly 4 columns.")
__.assertListEqual(['products', 'purchase-id', 'name', 'birthdate'], list(_student.columns), msg='Your columns differ from the expected value.')

__.assertFalse(_student.isna().any(axis=1).any(), msg='There must not be any NaN values.')

# Check one sample
__.assertEqual(
    {'products': "{'chair': 1, 'mask': 1, 'apple': 2, 'book': 1}", 'purchase-id': 60932, 'name': 'Kevin Williams', 'birthdate': '1951-10-08'},
    _student.loc[25491].to_dict(),
    msg="At index 25491, the row differs."
)

---
# Task 2

Implement the function `all_orders()` that returns all orders of the DataFrame `ORDERS` together with the matching entry in the DataFrame `PERSONS`. In contrast to Task 1, also include the orders that don't have a matching person.

Meaning: Return a dataframe that contains **all** orders and, if found, their respective person. There will be NaN values in the dataframe.

Your function must return a DataFrame with this information.

The columns of the resulting dataframe must be: products, purchase-id, name, birthdate.

Do not modify the two dataframes!
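A small sketch on made-up tables (hypothetical stand-ins for the real data) shows where the NaN values come from. The optional `indicator=True` argument of `pd.merge` is used here only for inspection: it adds a `_merge` column, which the task's result must not contain.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
orders = pd.DataFrame({"products": ["a", "b", "c"], "purchase-id": [1, 2, 3]})
persons = pd.DataFrame({"name": ["Ann", "Bob"], "purchase-id": [2, 4]})

# A left merge keeps every row of the left table (orders).
left = pd.merge(orders, persons, on="purchase-id", how="left", indicator=True)

print(left)
print(list(left["_merge"]))  # ['left_only', 'both', 'left_only']
```

Rows marked `left_only` are orders without a person, so their `name` is NaN.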


In [None]:
# ⬇️ Add your code below this line ⬇️
### BEGIN SOLUTION

def all_orders():
    return pd.merge(       # similar to a join, we merge the two dfs
        ORDERS,            # 
        PERSONS,           # 
        on='purchase-id',  # both dfs have the same column purchase-id, our join value
        how='left'         # We want the orders, regardless of a matching value
    )

### END SOLUTION
# ⬆️ Add your code above this line ⬆️

In [None]:
# Test Case
from unittest import TestCase
__ = TestCase()

# Sanity
__.assertTrue('all_orders' in locals(), msg='You have to call the function `all_orders`.')

# Check if the dataframes were modified.
_original_orders = load_file('orders.csv', False)
_original_persons = load_file('persons.csv', False)
try:
    pd.testing.assert_frame_equal(ORDERS, _original_orders)
    pd.testing.assert_frame_equal(PERSONS, _original_persons)
except AssertionError:
    __.assertFalse(True, msg='You must not modify the two original Dataframes.')

# reset DF in case it was modified
ORDERS, PERSONS = _original_orders, _original_persons


_student = all_orders()

__.assertIsInstance(_student, pd.DataFrame, msg="You must return a DataFrame.")
__.assertEqual(69703, len(_student), msg="The resulting DataFrame must have 69703 entries.")
__.assertEqual(4, len(_student.columns), msg="There must be exactly 4 columns.")
__.assertListEqual(['products', 'purchase-id', 'name', 'birthdate'], list(_student.columns), msg='Your columns differ from the expected value.')

# Check one sample
pd.testing.assert_series_equal(
    pd.Series({'products': "{'window': 1, 'car': 1, 'chair': 1, 'banana': 1}", 'purchase-id': 77316, 'name': pd.NA, 'birthdate': pd.NA}, name=53895),
    _student.loc[53895]
)

---
# Task 3

Implement the function `missing_orders()` that returns all entries of the DataFrame `PERSONS` together with the matching order in the DataFrame `ORDERS`. In contrast to Tasks 1 and 2, include the entries from `PERSONS` that don't have a matching order.

Meaning: Return a dataframe with **all** persons and if existent, their orders. There will be NaN values in the dataframe.

Your function must return a DataFrame with this information.

The columns of the resulting dataframe must be: products, purchase-id, name, birthdate.

Do not modify the two dataframes!
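The following sketch, again on made-up tables, illustrates that keeping all rows of the right table is the mirror image of swapping the arguments and keeping all rows of the left table. Only the column order differs.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
orders = pd.DataFrame({"products": ["a", "b", "c"], "purchase-id": [1, 2, 3]})
persons = pd.DataFrame({"name": ["Ann", "Bob"], "purchase-id": [2, 4]})

# Keep every person, whether or not an order exists for them.
right = pd.merge(orders, persons, on="purchase-id", how="right")

# Same rows, but the columns of `persons` now come first.
mirrored = pd.merge(persons, orders, on="purchase-id", how="left")

print(list(right.columns))     # ['products', 'purchase-id', 'name']
print(list(mirrored.columns))  # ['name', 'purchase-id', 'products']
```

Since the task prescribes the column order products, purchase-id, name, birthdate, the order of the arguments matters.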

In [None]:
# ⬇️ Add your code below this line ⬇️
### BEGIN SOLUTION

def missing_orders():
    return pd.merge(       # similar to a join, we merge the two dfs
        ORDERS,            # 
        PERSONS,           # 
        on='purchase-id',  # both dfs have the same column purchase-id, our join value
        how='right'        # we also want the persons that have no order
    )

### END SOLUTION
# ⬆️ Add your code above this line ⬆️

In [None]:
# Test Case
from unittest import TestCase
__ = TestCase()

# Sanity
__.assertTrue('missing_orders' in locals(), msg='You have to call the function `missing_orders`.')

# Check if the dataframes were modified.
_original_orders = load_file('orders.csv', False)
_original_persons = load_file('persons.csv', False)
try:
    pd.testing.assert_frame_equal(ORDERS, _original_orders)
    pd.testing.assert_frame_equal(PERSONS, _original_persons)
except AssertionError:
    __.assertFalse(True, msg='You must not modify the two original Dataframes.')

# reset DF in case it was modified
ORDERS, PERSONS = _original_orders, _original_persons


_student = missing_orders()

__.assertIsInstance(_student, pd.DataFrame, msg="You must return a DataFrame.")
__.assertEqual(59805, len(_student), msg="The resulting DataFrame must have 59,805 entries.")
__.assertEqual(4, len(_student.columns), msg="There must be exactly 4 columns.")
__.assertListEqual(['products', 'purchase-id', 'name', 'birthdate'], list(_student.columns), msg='Your columns differ from the expected value.')

# Check two samples
pd.testing.assert_series_equal(
    pd.Series({'products': "{'table': 1, 'paper': 1, 'wallet': 1, 'robot': 1}", 'purchase-id': 7165, 'name': "Shelly Sullivan", 'birthdate': "1968-06-09"}, name=4269),
    _student.loc[4269]
)
pd.testing.assert_series_equal(
    pd.Series({'products': None, 'purchase-id': 52255, 'name': "Thomas Smith", 'birthdate': "1941-11-14"}, name=31337),
    _student.loc[31337]
)


---
# Task 4

Implement the function `count_items(df)` that adds a new column `product_count` to the given DataFrame `df` of orders.
The column must contain the total number of items in each order. Don't forget to return the DataFrame.

The items per row are encoded in [JSON](https://de.wikipedia.org/wiki/JavaScript_Object_Notation). Use the imported [json](https://docs.python.org/3/library/json.html) module to convert this JSON string into a Python dictionary. Then, sum up the number of items per order.

For the `json` module to understand a string correctly as JSON, all single quotation marks `'` must be replaced by double quotation marks `"` before calling the appropriate function from the `json` module.

_Hint:_ Use the `apply` function of the DataFrame.

_Hint:_ Use a `lambda` function to replace the quotation marks, interpret the JSON string, and sum the number of items. However, you can define your own function as well.
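The individual steps can be tried on a single, made-up string first (the `raw` value and the `toy` DataFrame below are hypothetical examples, not rows of `ORDERS`):

```python
import json

import pandas as pd

raw = "{'chair': 2, 'apple': 1, 'car': 1}"  # hypothetical products entry

# JSON requires double quotes, so the single quotes are replaced first.
parsed = json.loads(raw.replace("'", '"'))
print(parsed)                # {'chair': 2, 'apple': 1, 'car': 1}
print(sum(parsed.values()))  # 4

# The same steps applied to every entry of a column via apply.
toy = pd.DataFrame({"products": [raw, "{'book': 3}"]})
counts = toy["products"].apply(
    lambda s: sum(json.loads(s.replace("'", '"')).values())
)
print(counts.tolist())       # [4, 3]
```

This simple replacement assumes that no product name itself contains a quotation mark.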

In [None]:
import json

def count_items(df):
    # ⬇️ Add your code below this line ⬇️
    ### BEGIN SOLUTION
    
    df['product_count'] = df['products'].apply(
        lambda v:
            sum(                          # 4. sum over all entries
                json.loads(               # 2. interpret the string value as a dict
                    v.replace("'", '"')   # 1. replace ' by "
                ).values()                # 3. we care about the counts
            )
    )
    return df

### END SOLUTION
# ⬆️ Add your code above this line ⬆️

# Quick check of the function
count_items(ORDERS.copy())  # We run it on a copy!

In [None]:
# Test Case
from unittest import TestCase
__ = TestCase()

# Sanity
__.assertTrue('count_items' in locals(), msg='You have to call the function `count_items`.')

# Check if the dataframes were modified.
_original_orders = load_file('orders.csv', False)
try:
    pd.testing.assert_frame_equal(ORDERS, _original_orders)
except AssertionError:
    __.assertFalse(True, msg='You must not modify the original DataFrame.')

_student = count_items(_original_orders.copy())

__.assertIsNotNone(_student, msg="Your function must not return None.")
__.assertIsInstance(_student, pd.DataFrame, msg="You must return a DataFrame.")
__.assertEqual(69703, len(_student), msg="The resulting DataFrame must have 69703 entries.")
__.assertEqual(3, len(_student.columns), msg="There must be exactly 3 columns.")
__.assertListEqual(['products', 'purchase-id', 'product_count'], list(_student.columns), msg='Your columns differ from the expected value.')

# Check two samples
__.assertEqual(
    "products         {'window': 1, 'car': 1, 'chair': 1, 'banana': 1}\npurchase-id                                                 77316\nproduct_count                                                   4",
    _student.loc[53895].to_string()
)
__.assertEqual(
    "products         {'table': 2, 'shirt': 1, 'drill': 1, 'chair': ...\npurchase-id                                                   1576\nproduct_count                                                    7",
    _student.loc[1122].to_string()
)


---
# Task 5: Flowers

Now, we load a different dataset: [IRIS](https://archive.ics.uci.edu/ml/datasets/Iris)

It has 4 *feature* columns: `sepal_length`, `sepal_width`, `petal_length`, and `petal_width`.

And one that gives the *species* of the observed flower: `species`.


In [None]:
IRIS = load_file("iris.csv")

---
## Task 5.1: Mean

Implement the function `iris_mean()` that returns a new dataframe with the `species` as the index and the **average** value columnwise of each of the four feature columns.

The returned dataframe will then have 4 columns and 3 rows.
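As a minimal sketch of grouping and averaging, the following uses a made-up table (`toy`, with hypothetical columns `height`, `width`, and `kind`) instead of the iris data:

```python
import pandas as pd

# Hypothetical data: two numeric columns and one label column.
toy = pd.DataFrame({
    "height": [1.0, 3.0, 10.0, 20.0],
    "width": [2.0, 4.0, 5.0, 7.0],
    "kind": ["a", "a", "b", "b"],
})

# One row per group; the grouping column becomes the index.
means = toy.groupby("kind").mean()

print(means.index.name)           # kind
print(means.loc["a", "height"])   # 2.0
print(means.loc["b", "width"])    # 6.0
```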

In [None]:
# ⬇️ Add your code below this line ⬇️
### BEGIN SOLUTION

def iris_mean():
    return IRIS.groupby("species").mean()  # per species, columnwise mean

### END SOLUTION
# ⬆️ Add your code above this line ⬆️

In [None]:
# Test Case
from unittest import TestCase
__ = TestCase()

# Sanity
__.assertTrue('iris_mean' in locals(), msg='You have to call the function `iris_mean`.')

# reset DF in case it was modified
IRIS = load_file("iris.csv", msg=False)

_student = iris_mean()

__.assertIsInstance(_student, pd.DataFrame, msg="You must return a DataFrame.")
__.assertEqual(3, len(_student), msg="The resulting DataFrame must have 3 rows.")
__.assertEqual(4, len(_student.columns), msg="There must be exactly 4 columns.")
__.assertListEqual(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], list(_student.columns), msg='Your columns differ from the expected value.')

# Check full sample
pd.testing.assert_frame_equal(
    pd.DataFrame(
        [{
            'sepal_length': 5.006,
            'sepal_width': 3.41,
            'petal_length': 1.464,
            'petal_width': 0.244,
        },
        {
            'sepal_length': 5.936,
            'sepal_width': 2.770,
            'petal_length': 4.260,
            'petal_width': 1.326,
        },
        {
            'sepal_length': 6.588,
            'sepal_width': 2.974,
            'petal_length': 5.552,
            'petal_width': 2.026,
        }],
        index=pd.Series(['setosa', 'versicolor', 'virginica'], name='species')
    ),
    _student,
    check_exact=False, rtol=0.5e-2
)


# Check if the dataframes were modified.
try:
    pd.testing.assert_frame_equal(IRIS, load_file("iris.csv", msg=False))
except AssertionError:
    __.assertFalse(True, msg='You must not modify the original Dataframe.')

print("\n\033[37;42;2m  Success! Your code works as intended.  \033[0m\n")

---
## Task 5.2: Other Aggregations

Implement the function `iris_complex_aggregations()` that returns a new dataframe with the `species` as the index and the following columns:

* average of `sepal_length`
* the minimum and maximum of `petal_length` (as multi-column)
* the median of `petal_width`.
* the standard deviation of `sepal_width`.

The returned dataframe will then have 5 columns and 3 rows.

_Hint:_ Use the [`.agg()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html) method.
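What "multi-column" means can be sketched on a made-up table (`toy` and its column names are hypothetical). Passing a dictionary to `.agg()` selects one or several aggregations per column:

```python
import pandas as pd

# Hypothetical data: two numeric columns and one label column.
toy = pd.DataFrame({
    "height": [1.0, 3.0, 10.0, 20.0],
    "width": [2.0, 4.0, 5.0, 7.0],
    "kind": ["a", "a", "b", "b"],
})

# A single name yields one column, a list yields one column per entry.
stats = toy.groupby("kind").agg({"height": "mean", "width": ["min", "max"]})

print(list(stats.columns))
# [('height', 'mean'), ('width', 'min'), ('width', 'max')]
print(stats.loc["a", ("width", "max")])  # 4.0
```

Each resulting column is addressed by a tuple of the original column name and the aggregation name.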

In [None]:
# ⬇️ Add your code below this line ⬇️
### BEGIN SOLUTION

def iris_complex_aggregations():
    return IRIS.groupby("species").agg(
        {
            'sepal_length': "mean",
            'petal_length': ["min", "max"],
            'petal_width': "median",
            'sepal_width': "std"
        }
    )

### END SOLUTION
# ⬆️ Add your code above this line ⬆️

In [None]:
# Test Case
from unittest import TestCase
__ = TestCase()

# Sanity
__.assertTrue('iris_complex_aggregations' in locals(), msg='You have to call the function `iris_complex_aggregations`.')

# reset DF in case it was modified
IRIS = load_file("iris.csv", msg=False)

_student = iris_complex_aggregations()

__.assertIsInstance(_student, pd.DataFrame, msg="You must return a DataFrame.")
__.assertEqual(3, len(_student), msg="The resulting DataFrame must have 3 rows.")
__.assertEqual(5, len(_student.columns), msg="There must be exactly 5 columns.")
__.assertListEqual([('sepal_length', 'mean'), ('petal_length', 'min'), ('petal_length', 'max'), ('petal_width', 'median'), ('sepal_width', 'std')], list(_student.columns), msg='Your columns differ from the expected value.')

# Check full sample
pd.testing.assert_frame_equal(
    pd.DataFrame(
        {('sepal_length', 'mean'): {'setosa': 5.006, 'versicolor': 5.936, 'virginica': 6.588},
         ('petal_length', 'min'): {'setosa': 1.0, 'versicolor': 3.0, 'virginica': 4.5},
         ('petal_length', 'max'): {'setosa': 1.9, 'versicolor': 5.1, 'virginica': 6.9},
         ('petal_width', 'median'): {'setosa': 0.2, 'versicolor': 1.3,'virginica': 2.0},
         ('sepal_width', 'std'): {'setosa': 0.381,'versicolor': 0.314,'virginica': 0.322}},
        index=pd.Series(['setosa', "versicolor", "virginica"], name='species')), 
    _student,
    check_exact=False,
    rtol=1e-2
)

# Check if the dataframes were modified.
try:
    pd.testing.assert_frame_equal(IRIS, load_file('iris.csv', msg=False))
except AssertionError:
    __.assertFalse(True, msg='You must not modify the original Dataframe.')


print("\n\033[37;42;2m  Success! Your code works as intended.  \033[0m\n")

---
# Task 6: Flipping a DataFrame

Implement the function `study_overview()` that operates on the given dataframe `STUDIES`. It must return a new DataFrame that shows which combinations of `course` and `region` exist and whether they are offered as Bachelor or Master studies.

Example:
```python
STUDIES = load_studies()

overview = study_overview()  # <- your function

# We check if there is "Computer Science" in "St.Gallen":
print(overview['Computer Science']['St.Gallen'])
# prints: Bachelor   (meaning yes)

# Or "Marketing" in "Aargau":
print(overview['Marketing']['Aargau'])
# prints: nan   (meaning no)
```
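One way to flip a table from a long into a wide layout is `DataFrame.pivot`. The sketch below uses a made-up table (`long_df` with hypothetical columns `city`, `item`, and `price`) rather than `STUDIES`:

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, item) combination.
long_df = pd.DataFrame({
    "city": ["X", "X", "Y"],
    "item": ["pen", "ink", "pen"],
    "price": ["low", "high", "mid"],
})

# Rows are labelled by city, columns by item, cells hold the price.
wide = long_df.pivot(index="city", columns="item", values="price")

print(list(wide.columns))           # ['ink', 'pen']
print(wide["pen"]["X"])             # low
print(pd.isna(wide["ink"]["Y"]))    # True, because there is no ink in city Y
```

Combinations that do not occur in the long table end up as NaN in the wide one. Note that `pivot` raises a `ValueError` if the same index/column combination occurs more than once.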

In [None]:
def load_studies():
    # All three lists have 13 entries, one per (region, level, course) row.
    regions = ["St.Gallen"] * 5 + ["Thurgau"] * 3 + ["Zürich"] * 2 + ["Aargau"] * 3
    bachelor_master = ["Bachelor"] * 3 + ["Master"] * 3 + ["Bachelor"] * 3 + ["Master"] * 4
    courses = ["Computer Science", "Economy", "Law", "International Affairs", "Marketing", "Economy", "Computer Science", "Law", "Computer Science", "Law", "Computer Science", "Law", "Economy"]
    return pd.DataFrame(zip(regions, bachelor_master, courses), columns=["region", "bachelor/master", "course"])

STUDIES = load_studies()

In [None]:
# ⬇️ Add your code below this line ⬇️
### BEGIN SOLUTION

def study_overview():
    return STUDIES.pivot(index='region', columns='course', values='bachelor/master')

### END SOLUTION
# ⬆️ Add your code above this line ⬆️

In [None]:
# Test Case
from unittest import TestCase
__ = TestCase()

# Sanity
__.assertTrue('study_overview' in locals(), msg='You have to call the function `study_overview`.')

# reset DF in case it was modified
STUDIES = load_studies()

_student = study_overview()

__.assertIsInstance(_student, pd.DataFrame, msg="You must return a DataFrame.")
__.assertEqual(4, len(_student), msg="The resulting DataFrame must have 4 rows.")
__.assertEqual(5, len(_student.columns), msg="There must be exactly 5 columns.")
__.assertListEqual(['Computer Science', 'Economy', 'International Affairs', 'Law', 'Marketing'], list(_student.columns), msg='Your columns differ from the expected value.')

# Check full sample
__.assertEqual('Master', _student['Computer Science']['Aargau'], msg='Wrong value for Computer Science-Aargau')
__.assertEqual('Master', _student['Economy']['Aargau'], msg='Wrong value for Economy-Aargau')
__.assertTrue(pd.isna(_student['International Affairs']['Aargau']), msg='No valid entry for International Affairs-Aargau')
__.assertEqual('Master', _student['Law']['Aargau'], msg='Wrong value for Law-Aargau')
__.assertTrue(pd.isna(_student['Marketing']['Aargau']), msg='No valid entry for Marketing-Aargau')
__.assertEqual('Bachelor', _student['Computer Science']['St.Gallen'], msg='Wrong value for Computer Science-St.Gallen')
__.assertEqual('Bachelor', _student['Economy']['St.Gallen'], msg='Wrong value for Economy-St.Gallen')
__.assertEqual('Master', _student['International Affairs']['St.Gallen'], msg='Wrong value for International Affairs-St.Gallen')
__.assertEqual('Bachelor', _student['Law']['St.Gallen'], msg='Wrong value for Law-St.Gallen')
__.assertEqual('Master', _student['Marketing']['St.Gallen'], msg='Wrong value for Marketing-St.Gallen')
__.assertEqual('Bachelor', _student['Computer Science']['Thurgau'], msg='Wrong value for Computer Science-Thurgau')
__.assertEqual('Master', _student['Economy']['Thurgau'], msg='Wrong value for Economy-Thurgau')
__.assertTrue(pd.isna(_student['International Affairs']['Thurgau']), msg='No valid entry for International Affairs-Thurgau')
__.assertEqual('Bachelor', _student['Law']['Thurgau'], msg='Wrong value for Law-Thurgau')
__.assertTrue(pd.isna(_student['Marketing']['Thurgau']), msg='No valid entry for Marketing-Thurgau')
__.assertEqual('Bachelor', _student['Computer Science']['Zürich'], msg='Wrong value for Computer Science-Zürich')
__.assertTrue(pd.isna(_student['Economy']['Zürich']), msg='No valid entry for Economy-Zürich')
__.assertTrue(pd.isna(_student['International Affairs']['Zürich']), msg='No valid entry for International Affairs-Zürich')
__.assertEqual('Master', _student['Law']['Zürich'], msg='Wrong value for Law-Zürich')
__.assertTrue(pd.isna(_student['Marketing']['Zürich']), msg='No valid entry for Marketing-Zürich')

# Check if the dataframes were modified.
try:
    pd.testing.assert_frame_equal(STUDIES, load_studies())
except AssertionError:
    __.assertFalse(True, msg='You must not modify the original Dataframe.')


print("\n\033[37;42;2m  Success! Your code works as intended.  \033[0m\n")