# COMP0213: Object-Oriented Programming for Robotics and AI - Exercises

## Lab Session on 18 Nov 2025
By Daniel Tozadore

---


# Pandas & Matplotlib Practice Notebook — With Solutions & Method Hints

---

## Data preparation

In [None]:
# === Setup: imports & sample datasets ===
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

np.random.seed(2025)

### Task 1

Considering the code below:

In [17]:
# Customers (100 rows)
n_customers = 100
customer_ids = np.arange(1001, 1001 + n_customers)
first_names = np.array(["Alex","Sam","Chris","Taylor","Jordan","Morgan","Riley","Cameron","Casey","Jamie"]) 
last_names = np.array(["Smith","Jones","Brown","Taylor","Williams","Davies","Evans","Wilson","Thomas","Roberts"]) 
full_names = [f"{np.random.choice(first_names)} {np.random.choice(last_names)}" for _ in range(n_customers)]
ages = np.random.randint(18, 75, size=n_customers)
cities = np.random.choice(["London","Bristol","Manchester","Leeds","Birmingham","Cardiff","Glasgow"], size=n_customers)



customers = pd.DataFrame({
    "customer_id": customer_ids,
    "name": full_names,
    "age": ages,
    "city": cities,
})


* Create the DataFrame "customers" from the arrays/lists customer_ids, full_names, ages, and cities, passed as a dictionary of columns. (Hint: use the pd.DataFrame() constructor)
* Print the dataframe "customers"

Your output should look like:

```
--- customers ---
    customer_id           name  age     city
0         1001   Chris Thomas   50  Cardiff
1         1002  Taylor Taylor   70  Cardiff
2         1003     Alex Evans   23  Cardiff
3         1004   Casey Davies   63    Leeds
4         1005     Sam Thomas   52  Bristol
```

In [None]:
# TODO 
customers = ...

### Task 2 - OOP concepts

Create a class named "Customer_Handler" that contains the following elements: 
* The init method that:
    * Receives a DataFrame as an argument and stores it in a private attribute named "__customers"
    * Keeps track of the number of customers in the private attribute "__n_customers"
* __print_customers(self)__: A method that prints the private DataFrame "__customers"
* __generate_new_customer(self) -> dict__: A method that generates fake data for a new customer (an entire row) and returns it as a dictionary. Use the code above to create the lists.
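Python has no truly private attributes: a name with two leading underscores is *name-mangled* inside the class body to `_ClassName__name`. The toy class below (`Vault` is a hypothetical name, not part of the exercise) is a minimal sketch of why `obj.__secret` fails outside the class while the class's own methods, and the mangled name, still work.

```python
class Vault:
    """Toy class (hypothetical) illustrating name mangling."""

    def __init__(self, secret):
        # Inside the class body, self.__secret is stored as self._Vault__secret
        self.__secret = secret

    def reveal(self):
        # Methods of the same class can use the short name
        return self.__secret


v = Vault(42)
print(v.reveal())          # 42
print(v._Vault__secret)    # 42: the mangled name is still reachable

try:
    v.__secret             # no mangling happens outside a class body
except AttributeError as err:
    print("AttributeError:", err)
```

The same rule explains why a method defined as `__something` cannot be called as `obj.__something(...)` from a notebook cell.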

In [None]:
class Customer_Handler:
    """Class to handle customer data operations."""

    def __init__(self, dataframe):
        # Double-underscore attributes are name-mangled ("private")
        self.__customers = dataframe
        self.__n_customers = len(dataframe)

    def generate_new_customer(self) -> dict:
        """Generate fake data for one new customer and return it as a dict."""
        # Ids continue the sequence that started at 1001 in Task 1
        customer_id = self.__n_customers + 1001
        first_names = np.array(["Alex","Sam","Chris","Taylor","Jordan","Morgan","Riley","Cameron","Casey","Jamie"])
        last_names = np.array(["Smith","Jones","Brown","Taylor","Williams","Davies","Evans","Wilson","Thomas","Roberts"])
        # A single string (not a list) so it fits in one DataFrame cell
        full_name = f"{np.random.choice(first_names)} {np.random.choice(last_names)}"
        age = np.random.randint(18, 75)
        city = np.random.choice(["London","Bristol","Manchester","Leeds","Birmingham","Cardiff","Glasgow"])
        new_customer = {
            "customer_id": customer_id,
            "name": full_name,
            "age": age,
            "city": city,
        }
        return new_customer

    def print_customers(self):
        print(self.__customers)

    


In [None]:
# Testing the class

CusHandler = Customer_Handler(customers)
print("Customers:")
CusHandler.print_customers()

print("New customer data:", CusHandler.generate_new_customer())




### Task 3 - Adding more methods and pandas basics: inserting rows and columns

Copy and paste the code for your class into the cell below and add the following methods:

* __get_customer(self, customer_id)__: A method that returns a DataFrame containing the row of the customer whose customer_id matches the argument.
    

* __add_customer(self) -> new_id__: A method that takes the output of the "generate_new_customer()" method, inserts it into the customers DataFrame, and returns the customer_id of the new customer. For that:
    * Don't forget to update the variable "__n_customers"
    * **Hint:** Use pd.concat([...]) or df.loc[new_index] = {...}

* __check_status(self)__: A method that adds a new column "status" recording whether each customer is active or inactive
    * You create a new column in a DataFrame with dataframe['new_column_name'] = new_values
    * All existing customers are 'active' by default.
    * Think about how to build a pandas Series with the same number of entries as there are customers in the DataFrame (assigning a single scalar also works, since pandas broadcasts it to every row).

* __change_customer_city(self, customer_id, new_city)__: A method that changes the city of the given customer to new_city.
    * You can access the data in different ways: df.iloc[row, column], df.loc[row, column], df.at[row, column], or df.iat[row, column], where:
        * loc: label-based (index labels and column names)
        * iloc: position-based (integer positions)
        * at: label-based access to a single scalar value (a fast loc)
        * iat: position-based access to a single scalar value (a fast iloc)
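A minimal sketch of the pandas operations Task 3 relies on, using a small toy DataFrame (the values and the variable names `df`, `new_row`, `idx` are hypothetical, not the Task 1 data): appending a row with `pd.concat`, adding a constant column, and reading or writing cells with `loc`, `iloc`, `at`, and `iat`.

```python
import pandas as pd

# Small toy DataFrame (hypothetical values, not the Task 1 data)
df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "name": ["Alex Smith", "Sam Jones", "Chris Brown"],
    "city": ["London", "Leeds", "Cardiff"],
})

# Insert a row: wrap the dict in a one-row DataFrame and concatenate.
new_row = {"customer_id": 1004, "name": "Casey Evans", "city": "Bristol"}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

# Add a column: a scalar is broadcast to every row.
df["status"] = "active"

# Find the index label(s) of the row whose customer_id is 1002.
idx = df[df["customer_id"] == 1002].index

# loc: label-based (index labels and column names)
df.loc[idx, "city"] = "Glasgow"

# at: label-based access to a single scalar
print(df.at[3, "name"])   # Casey Evans

# iloc / iat: integer positions (row 0, column 2 is "city")
print(df.iloc[0, 2])      # London
print(df.iat[1, 2])       # Glasgow
```

Inside the class, the same calls would operate on `self.__customers`; remember that `pd.concat` returns a new DataFrame, so its result has to be assigned back.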






In [None]:
# TODO: Your code here

class Customer_Handler:
    """Class to handle customer data operations."""
    
    

Testing your code: 

In [None]:
CusHandler = Customer_Handler(customers)

new_id = CusHandler.add_customer()
print("Adding a customer:", new_id)
print("Checking new entries:", CusHandler.get_customer(new_id))


CusHandler.check_status()
print("Print dataframe (it should display the 'status' column):")
CusHandler.print_customers()


---

### Task 4 - Accessing, filtering, and deleting rows and columns
---


* __delete_customer(self, customer_id)__: A method that deletes the row of the given customer from the DataFrame.
    * You should do it in 2 steps:
        * First, find the index of the row that has the value you are looking for: df[df['customer_id'] == customer_id].index
        * Then use df.drop(<<index(es) of the row(s) you want to delete>>). drop() returns a new DataFrame, so assign the result back (or pass inplace=True).


* __change_customer_status(self, customer_id, status)__: A method that, given a customer id and a status, changes the customer's status to the given new status.
    * You should do it in 2 steps:
        * First, find the index of the row that has the value you are looking for: df[df['customer_id'] == customer_id].index
        * Then change the value with an accessor method (iloc, loc, at, iat). E.g.: df.loc[index, 'column_name'] = new_value (iloc only accepts integer positions, not column names)

* __change_city_status(self, city, status)__: A method that, given a city and a status, changes the status of all customers in that city to the given status.



* __get_customers_by_city(self, city_name)__: A method that returns a subset of the customers DataFrame containing only the customers from the given city.
    * Remember to call .copy() so the caller gets an independent copy and cannot modify the private DataFrame through it.


* __get_ids_and_names(self)__: A method that returns a subset (a copy) of the DataFrame with only the customer ids and names
    * There are two different ways of doing it: 
        * Using DataFrame.drop(columns=[...]), specifying the columns (by name) that you want to drop; or
        * Selecting the subset of columns (by name) that you want to keep. Note the double brackets: new_df = old_df[['column_name1', 'column_name2', ...]].copy()
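The steps above can be sketched on a toy DataFrame with the same column names as `customers` (the values are hypothetical). It shows deleting a row by its index label, updating all rows that match a condition with a boolean mask, filtering rows into a copy, and the two ways of keeping only some columns.

```python
import pandas as pd

# Toy data (hypothetical), same column names as the customers DataFrame
df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1004],
    "name": ["Alex Smith", "Sam Jones", "Chris Brown", "Casey Evans"],
    "age": [34, 51, 27, 45],
    "city": ["London", "Leeds", "London", "Cardiff"],
    "status": ["active"] * 4,
})

# Delete: find the index label(s) first, then drop them.
# drop() returns a new DataFrame, so reassign the result.
to_delete = df[df["customer_id"] == 1002].index
df = df.drop(to_delete)

# Update every row matching a condition in one go with a boolean mask.
df.loc[df["city"] == "London", "status"] = "inactive"

# Filter rows by city and return an independent copy.
london = df[df["city"] == "London"].copy()

# Keep only some columns: note the double brackets (a list of names).
ids_and_names = df[["customer_id", "name"]].copy()
# Equivalent, by dropping the columns you do not want:
ids_and_names_alt = df.drop(columns=["age", "city", "status"])

print(london)
print(ids_and_names)
```

The boolean-mask form of `loc` covers both `change_customer_status` (mask on `customer_id`) and `change_city_status` (mask on `city`) without a separate index lookup.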



In [None]:
# TODO: Your code here

class Customer_Handler:
    """Class to handle customer data operations."""
    

Testing your code: 

In [None]:
CusHandler = Customer_Handler(customers)

new_id = CusHandler.add_customer()

print(f'Deleting the customer we just created. What a short life for {CusHandler.get_customer(new_id)["name"].iloc[0]}!')

CusHandler.delete_customer(new_id)
print("Checking deleted entry:", CusHandler.get_customer(new_id))
# Have you implemented exception handling here?? WHY NOT????


# Changing the status:
typed_id = int(input("Enter a customer ID to change their status: "))
CusHandler.change_customer_status(typed_id, 'inactive')
print("Checking new status:", CusHandler.get_customer(typed_id))


CusHandler.change_city_status('London', 'inactive')
print("Checking new status for London customers:\n", CusHandler.get_customers_by_city('London'))


# Finally
print("Getting all customer IDs and names:\n", CusHandler.get_ids_and_names())

---
### Task 5 - Sorting and aggregating
---

* __return_aged_sorted(self)__: A method that returns a copy of the customers DataFrame sorted by age
    * Use the method df.sort_values('column_name')
    * If you want, you can renumber the rows of the result from 0 with 'df.reset_index(drop=True)'

* __get_customers_age_average(self)__: A method that returns the average age of the customers using the function '.mean()' 


* __count_customer_per_city(self, city)__: A method that returns the number of customers per city using '.groupby()' together with an aggregation such as '.sum()' (over a helper column of ones), '.size()', or '.count()'
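A minimal sketch of the Task 5 operations on a toy DataFrame (hypothetical values): sorting with `sort_values` and renumbering with `reset_index(drop=True)`, a column mean, and counting rows per group both with `size()` and with `sum()` over a helper column of ones (the helper column `n` is an illustrative name).

```python
import pandas as pd

# Toy data (hypothetical values)
df = pd.DataFrame({
    "name": ["Alex Smith", "Sam Jones", "Chris Brown", "Casey Evans"],
    "age": [34, 51, 27, 45],
    "city": ["London", "Leeds", "London", "Cardiff"],
})

# Sort by age (returns a new DataFrame); reset_index(drop=True)
# renumbers the rows 0..n-1 instead of keeping the old labels.
by_age = df.sort_values("age").reset_index(drop=True)

# Mean of a numeric column
avg_age = df["age"].mean()

# Customers per city: size() counts the rows in each group.
per_city = df.groupby("city").size()

# The same count with sum(), using a helper column of ones.
per_city_sum = df.assign(n=1).groupby("city")["n"].sum()

print(by_age)
print(avg_age)               # 39.25
print(per_city["London"])    # 2
```

If `count_customer_per_city` keeps its `city` parameter, it could return `per_city[city]` (or `per_city.get(city, 0)` for a city with no customers).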



In [None]:
# TODO: Your code here

class Customer_Handler:
    """Class to handle customer data operations."""
    

Testing your code: 


---
## Part 2 — Subsetting (Extra exercises)
---

### Data preparation

In [18]:
# Transactions (1,200 rows) across 2025
n_tx = 1200
start_date = pd.Timestamp("2025-01-01")
dates = start_date + pd.to_timedelta(np.random.randint(0, 365, size=n_tx), unit='D')
amount = np.round(np.random.normal(loc=20.0, scale=15.0, size=n_tx), 2)
category = np.random.choice(["groceries","transport","utilities","rent","entertainment","other"], size=n_tx, p=[0.25,0.2,0.15,0.1,0.2,0.1])
status = np.random.choice(["cleared","pending","failed"], size=n_tx, p=[0.8,0.18,0.02])

transactions = pd.DataFrame({
    "tx_id": np.arange(1, n_tx+1),
    "customer_id": np.random.choice(customer_ids, size=n_tx),
    "date": dates.normalize(),
    "amount": amount,
    "category": category,
    "status": status,
})



In [21]:
transactions

|      | tx_id | customer_id | date       | amount | category      | status  |
|------|-------|-------------|------------|--------|---------------|---------|
| 0    | 1     | 1078        | 2025-02-06 | 23.38  | utilities     | cleared |
| 1    | 2     | 1088        | 2025-03-13 | 30.43  | transport     | cleared |
| 2    | 3     | 1096        | 2025-02-22 | 13.83  | groceries     | cleared |
| 3    | 4     | 1005        | 2025-10-19 | 11.60  | utilities     | cleared |
| 4    | 5     | 1008        | 2025-08-06 | -3.26  | utilities     | pending |
| ...  | ...   | ...         | ...        | ...    | ...           | ...     |
| 1195 | 1196  | 1083        | 2025-02-06 | 13.05  | entertainment | cleared |
| 1196 | 1197  | 1070        | 2025-11-09 | 43.27  | transport     | cleared |
| 1197 | 1198  | 1099        | 2025-05-21 | 24.69  | groceries     | cleared |
| 1198 | 1199  | 1072        | 2025-10-31 | 6.66   | utilities     | cleared |


### 1. Basic filter: X == Y

> **Hint:** Use boolean indexing: df[df['col'] == value]

In [None]:
# TODO
# Filter transactions where status == 'pending'.

#### Solution

In [None]:
transactions[transactions['status']=='pending'].head()

### 2. Multiple conditions

> **Hint:** Combine conditions with & (and) and | (or), wrapping each condition in parentheses

In [None]:
# TODO
# Filter groceries AND amount > 0.

#### Solution

In [None]:
transactions[(transactions['category']=='groceries') & (transactions['amount']>0)]

### 3. isin filter

> **Hint:** Use Series.isin([...])

In [None]:
# TODO
# Filter categories in ['rent','utilities'] using .isin().

#### Solution

In [None]:
transactions[transactions['category'].isin(['rent','utilities'])]

### 4. between

> **Hint:** Use Series.between(low, high, inclusive='both')

In [None]:
# TODO
# Customers aged between 30 and 50 inclusive; show name and age.

#### Solution

In [None]:
customers.loc[customers['age'].between(30,50), ['name','age']].head()

### 5. query method

> **Hint:** Use DataFrame.query('expr')

In [None]:
# TODO
# amount < 0 and status == 'cleared'.

#### Solution

In [None]:
transactions.query('amount < 0 and status == "cleared"')

### 6. Top-N after filtering

> **Hint:** Use DataFrame.nlargest(k, column)

In [None]:
# TODO
# Among entertainment transactions, return top 5 by amount desc.

#### Solution

In [None]:
transactions.loc[transactions['category']=='entertainment'].nlargest(5, 'amount')

### 7. Boolean mask reuse

> **Hint:** Build mask with .abs() and reuse it

In [None]:
# TODO
# Create mask where |amount| > 30 and compute mean of those amounts.

#### Solution

In [None]:
mask = transactions['amount'].abs() > 30
transactions.loc[mask, 'amount'].mean()

### 8. groupby sum

> **Hint:** Use df.groupby('col', as_index=False)['value'].sum()

In [None]:
# TODO
# Total amount per category sorted desc.

#### Solution

In [None]:
transactions.groupby('category', as_index=False)['amount'].sum().sort_values('amount', ascending=False)

### 9. groupby mean per customer

> **Hint:** Use groupby()+.mean() with a pre-filter

In [None]:
# TODO
# Mean amount per customer for cleared transactions only.

#### Solution

In [None]:
transactions.loc[transactions['status']=='cleared'].groupby('customer_id', as_index=False)['amount'].mean().rename(columns={'amount':'mean_amount'})

### 10. aggregate multiple metrics

> **Hint:** Use groupby().agg(count='count', total='sum', mean='mean')

In [None]:
# TODO
# For each category compute count, sum, mean of amount.

#### Solution

In [None]:
transactions.groupby('category')['amount'].agg(count='count', total='sum', mean='mean').reset_index()

---
## Part 3 — Pandas Methods (Extra)
---

Data preparation

In [None]:
# Sales (Products x Months)
np.random.seed(20251)
products = pd.DataFrame({
    "product_id": np.arange(2001, 2011),
    "product": [f"Product_{i:02d}" for i in range(1,11)],
    "category": np.random.choice(["A","B","C"], size=10)
})
months = pd.date_range("2025-01-01","2025-12-01", freq="MS")
sales_records = []
for _, p in products.iterrows():
    for m in months:
        sales_records.append({
            "product_id": p["product_id"],
            "month": m,
            "units": int(np.random.gamma(shape=5, scale=20)),
            "price": np.round(np.random.uniform(5, 50), 2)
        })
sales = pd.DataFrame(sales_records)


### 1. Sorting

> **Hint:** Use DataFrame.sort_values([...], ascending=[...])

In [None]:
# TODO
# Sort customers by city asc, age desc.

#### Solution

In [None]:
customers.sort_values(['city','age'], ascending=[True, False]).head()

### 2. fillna

> **Hint:** Use Series.fillna(value) and random index selection

In [None]:
# TODO
# Inject ~5% NaN in amount then compare mean before/after fillna(0).

#### Solution

In [None]:
import numpy as np
np.random.seed(0)
tx_nan = transactions.copy()
idx = np.random.choice(len(tx_nan), size=int(0.05*len(tx_nan)), replace=False)
mean_before = tx_nan['amount'].mean()
tx_nan.loc[tx_nan.index[idx], 'amount'] = np.nan
mean_after_fill = tx_nan['amount'].fillna(0).mean()
mean_before, mean_after_fill

### 3. dropna

> **Hint:** Use DataFrame.dropna(subset=[...])

In [None]:
# TODO
# From the NaN-injected copy, drop rows with NaN in amount and show shape.

#### Solution

In [None]:
tx_drop = tx_nan.dropna(subset=['amount'])
tx_drop.shape

### 4. drop_duplicates

> **Hint:** Use DataFrame.drop_duplicates(keep=...)

In [None]:
# TODO
# Create tiny df with duplicates; drop with different keep options.

#### Solution

In [None]:
small_dup = pd.DataFrame({'a':[1,1,2,2,2],'b':[3,3,4,4,5]})
no_dups_keep_first = small_dup.drop_duplicates(keep='first')
no_dups_keep_last = small_dup.drop_duplicates(keep='last')
no_dups_keep_none = small_dup.drop_duplicates(keep=False)
no_dups_keep_first, no_dups_keep_last, no_dups_keep_none

### 5. apply (column-wise)

> **Hint:** Use Series.apply(lambda x: ...)

In [None]:
# TODO
# New column 'sign': credit if amount>0 else debit.
# Use Series.apply with a custom function; it is called once for each value in the column.

#### Solution

In [None]:
transactions.assign(sign=transactions['amount'].apply(lambda x: 'credit' if x>0 else 'debit')).head()

### 6. apply (row-wise)

> **Hint:** Use DataFrame.apply(func, axis=1)

In [None]:
# TODO
# Build label like 'cleared:groceries' or 'pending:rent'.
# Same idea, but now row-wise (axis=1).

#### Solution

In [None]:
transactions.assign(label=transactions.apply(lambda r: r['status'] + ':' + r['category'], axis=1)).head()

### 7. Summarizing with describe

> **Hint:** Use Series.describe()

In [None]:
# TODO
# Describe the distribution of amount.

#### Solution

In [None]:
transactions['amount'].describe()

---
## Part 4 — Matplotlib: Handling the X and Y Axes (5 exercises)
---
Data Preparation

In [None]:
# Time series demo (daily)
ts = pd.DataFrame({
    "date": pd.date_range("2025-01-01","2025-12-31", freq="D"),
    "y": np.cumsum(np.random.normal(0, 1, 365))
})

# print("Datasets ready: customers, transactions, sales, ts")
# for name, df in [("customers", customers.head()), ("transactions", transactions.head()), ("sales", sales.head()), ("ts", ts.head())]:
#     print("\n---", name, "---\n", df)

### 1. Basic line plot

> **Hint:** Use plt.plot(x, y), plt.title(), plt.xlabel(), plt.ylabel()

In [None]:
# TODO
# Plot ts as a line with labels and title.
plt.figure(figsize=(8,3))
# TODO: your code here

plt.show()

#### Solution

In [None]:
plt.figure(figsize=(8,3))
plt.plot(ts['date'], ts['y'], label='y')
plt.title('Daily Series y')
plt.xlabel('Date'); plt.ylabel('y')
plt.tight_layout(); plt.show()

### 2. Date formatting

> **Hint:** Use matplotlib.dates MonthLocator, DateFormatter; fig.autofmt_xdate()

In [None]:
# TODO
# Format ticks as 'Mon YYYY' and rotate labels.

#### Solution

In [None]:
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(ts['date'], ts['y'])
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
fig.autofmt_xdate()
ax.set_title('Date formatting example')
plt.tight_layout(); plt.show()

### 3. Axis limits

> **Hint:** Use ax.set_xlim() and ax.set_ylim() with quantiles

In [None]:
# TODO
# First half of the year and y-limits at 5th/95th percentiles.

#### Solution

In [None]:
midpoint = ts['date'].min() + pd.Timedelta(days=182)
lo, hi = ts['y'].quantile([0.05, 0.95])
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(ts['date'], ts['y'])
ax.set_xlim(ts['date'].min(), midpoint)
ax.set_ylim(lo, hi)
ax.set_title('First half with percentile y-limits')
plt.tight_layout(); plt.show()

### 4. Custom ticks

> **Hint:** Use ax.xaxis.set_major_locator(MonthLocator) and set_major_formatter()

In [None]:
# TODO
# Set monthly ticks labelled by month abbreviation.

#### Solution

In [None]:
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(ts['date'], ts['y'])
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
plt.tight_layout(); plt.show()

### 5. Grid and legend

> **Hint:** Use plt.grid(alpha=...) and plt.legend()

In [None]:
# TODO
# Add grid and legend to your line plot.

#### Solution

In [None]:
plt.figure(figsize=(8,3))
plt.plot(ts['date'], ts['y'], label='y', color='tab:blue')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout(); plt.show()

---
# That's it! 

### Thank you for your effort. 

---
Dani.