### Problem Overview:

You are given a list of dictionaries. Each dictionary represents a sales transaction and contains three keys: 'id', 'product', and 'price'. 'id' represents the id of a transaction, 'product' represents the name of the product, and 'price' is the price of the product.

Your task is to write a function that filters out transactions with a price greater than a given threshold and returns a new list of dictionaries with the remaining transactions.

The function should return a new list of dictionaries. Each dictionary should contain the 'id', 'product', and 'price' for each transaction.

### Libraries Needed:

None

### Inputs:

The function will take the following inputs:

1. A list of dictionaries 'transactions'. Each dictionary contains three keys: 'id', 'product', and 'price'. 'id' is a string that represents the id of a transaction, 'product' is a string that represents the name of the product, and 'price' is a float that represents the price of the product.
2. A float 'threshold'. This represents the maximum price for a transaction to be included in the output.


### Expected Outputs:

The function should return a list of dictionaries. Each dictionary should contain three keys: 'id', 'product', and 'price'. The list should only include transactions where the 'price' is less than or equal to 'threshold'.

### Data:

In [1]:
transactions = [
    {"id": "t1", "product": "apple", "price": 1.0},
    {"id": "t2", "product": "banana", "price": 0.5},
    {"id": "t3", "product": "cherry", "price": 2.0},
]

In [14]:
def filter_under_price(transactions, thresh_price):
    filtered_transactions = list(filter(lambda x: x["price"] <= thresh_price, transactions))
    return filtered_transactions

In [15]:
filter_under_price(transactions, 1)

[{'id': 't1', 'product': 'apple', 'price': 1.0},
 {'id': 't2', 'product': 'banana', 'price': 0.5}]
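
The same filter can also be written as a list comprehension. The sketch below additionally copies each dictionary, so the returned list shares no objects with the input. The name `filter_under_price_copy` is hypothetical and is not part of the solution above; whether copies are needed depends on how strictly "a new list of dictionaries" is read.

```python
def filter_under_price_copy(transactions, thresh_price):
    # Keep transactions priced at or below the threshold.
    # dict(t) makes a shallow copy, so the input dictionaries are not shared.
    return [dict(t) for t in transactions if t["price"] <= thresh_price]


transactions = [
    {"id": "t1", "product": "apple", "price": 1.0},
    {"id": "t2", "product": "banana", "price": 0.5},
    {"id": "t3", "product": "cherry", "price": 2.0},
]

result = filter_under_price_copy(transactions, 1)
print(result)
```

Editing an entry of `result` afterwards leaves `transactions` unchanged, which is not the case for the `filter`-based version.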

#### If this were SQL

- Assuming table = transactions and columns = id, product, and price:

```sql
SELECT *
FROM transactions
WHERE price <= 1
```

**Problem Overview:**

You are given a list of dictionaries. Each dictionary represents a customer and contains three keys: 'id', 'age', and 'purchases'. 'id' represents the id of a customer, 'age' represents the age of the customer, and 'purchases' is a list of prices for each purchase made by the customer.

Your task is to write a function that calculates the average purchase price for each customer who is above a given age and returns a new list of dictionaries with the calculated averages.

The function should return a new list of dictionaries. Each dictionary should contain the 'id' and 'average_purchase' for each customer.

**Libraries Needed:**

None

**Inputs:**

The function will take the following inputs:

1. A list of dictionaries 'customers'. Each dictionary contains three keys: 'id', 'age', and 'purchases'. 'id' is a string that represents the id of a customer, 'age' is an integer that represents the age of the customer, and 'purchases' is a list of floats that represent the prices of the purchases.
2. An integer 'age_limit'. Only customers strictly older than this age are included in the output.

**Expected Outputs:**

The function should return a list of dictionaries. Each dictionary should contain two keys: 'id' and 'average_purchase'. The list should only include customers whose age is greater than 'age_limit'. 'average_purchase' should be the average purchase price rounded to 2 decimal places.

**Data:**

```python
customers = [
    {"id": "c1", "age": 25, "purchases": [10.0, 20.0, 30.0]},
    {"id": "c2", "age": 30, "purchases": [15.0, 25.0, 35.0]},
    {"id": "c3", "age": 35, "purchases": [20.0, 30.0, 40.0]},
]
```

**Encrypted Solution:**

Here's the solution, encrypted with a Caesar cipher with a shift of 3 to the right:

```python
ghilqk_fdofoxodwh_dyhudjhfxvw_rphuvfxvw_rphuv, djh_olplw:
    ghilqk_fdofoxodwh_dyhudjh_sxufkdvhfxvw_rphu:
        dyhudjh_sxufkdvh = urxqg{vxp{fxvw_rphu{'sxufkdvhv'}}/ohq{fxvw_rphu{'sxufkdvhv'}}, 2)
        uhwxuq {'lg': fxvw_rphu{'lg'}, 'dyhudjh_sxufkdvh': dyhudjh_sxufkdvh}
    
    uhvxowv = olvw{pdskh{fdofoxodwh_dyhudjh_sxufkdvh, ilowhu{odpegd{fxvw_rphu: fxvw_rphu{'djh'} juhdwhu wkdq djh_olplw, fxvw_rphuv}})}
    uhwxuq uhvxowv
```

The shift of 3 letters to the right was applied to every letter of the solution, but not to special characters, digits or whitespaces.
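
A minimal sketch of such a cipher, assuming only ASCII letters are shifted and everything else passes through unchanged, as described above. The helper name `caesar_shift` is hypothetical. A negative shift undoes a positive one.

```python
import string


def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions; leave other characters as they are."""
    shifted = []
    for ch in text:
        if ch in string.ascii_lowercase:
            shifted.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            # Digits, punctuation, underscores and whitespace are not shifted.
            shifted.append(ch)
    return "".join(shifted)


# Shifting 3 to the left decodes text that was shifted 3 to the right.
print(caesar_shift("uhwxuq uhvxowv", -3))  # return results
```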

In [2]:
import pandas as pd

In [3]:
customers = [
    {"id": "c1", "age": 25, "purchases": [10.0, 20.0, 30.0]},
    {"id": "c2", "age": 30, "purchases": [15.0, 25.0, 35.0]},
    {"id": "c3", "age": 35, "purchases": [20.0, 30.0, 40.0]},
]

In [20]:
def mean_of_list(list_of_float):
    return sum(list_of_float) / len(list_of_float)

In [35]:
def average_price_per_aged_customer(customer_list, age_limit):
    # Keep customers strictly older than age_limit; round the mean to 2 decimals.
    return [{"id": customer['id'],
             'average_purchase': round(mean_of_list(customer['purchases']), 2)
             } for customer in customer_list if customer['age'] > age_limit]

print(average_price_per_aged_customer(customers, 0))
print(average_price_per_aged_customer(customers, 30))
print(average_price_per_aged_customer(customers, 35))
print(average_price_per_aged_customer(customers, 40))

[{'id': 'c1', 'average_purchase': 20.0}, {'id': 'c2', 'average_purchase': 25.0}, {'id': 'c3', 'average_purchase': 30.0}]
[{'id': 'c3', 'average_purchase': 30.0}]
[]
[]
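
One edge case the sample data does not exercise: `mean_of_list` raises `ZeroDivisionError` when a customer has an empty 'purchases' list. The sketch below shows one possible way to handle it, assuming such customers should simply be skipped (the task statement does not say). The function name `average_purchase_skip_empty` and the sample data are hypothetical. The key name, the strict age comparison, and the rounding follow the task statement.

```python
from statistics import fmean


def average_purchase_skip_empty(customers, age_limit):
    # An empty list is falsy, so customers without purchases are filtered out
    # before fmean() is called.
    return [
        {"id": c["id"], "average_purchase": round(fmean(c["purchases"]), 2)}
        for c in customers
        if c["age"] > age_limit and c["purchases"]
    ]


# Hypothetical data: c2 has no purchases.
sample_customers = [
    {"id": "c1", "age": 25, "purchases": [10.0, 20.0, 30.0]},
    {"id": "c2", "age": 30, "purchases": []},
    {"id": "c3", "age": 35, "purchases": [20.0, 30.0, 45.5]},
]

print(average_purchase_skip_empty(sample_customers, 20))
```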


**If this were SQL**

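- Assuming table = customers with one row per purchase and columns = id, age, and price (a hypothetical flattened layout, since the 'purchases' list has no direct column equivalent):

```sql
SELECT id, ROUND(AVG(price), 2) AS average_purchase
FROM customers
WHERE age > 25
GROUP BY id;
```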

Problem Overview:

You are given a dataset of products with their names, categories, and prices. However, some of the product names contain extra spaces and some prices are missing (represented as None). Your task is to write a function that will clean the dataset by:

1. Removing leading and trailing whitespace from product names.
2. Replacing any missing prices (None) with the average price of products in the same category.

The function should return a cleaned list of dictionaries with the updated product names and prices.

Inputs:

The function will take the following input:

A list of dictionaries 'products'. Each dictionary contains three keys: 'name', 'category', and 'price'. 'name' is a string that represents the name of the product, 'category' is a string that represents the category of the product, and 'price' is a float that represents the price of the product (or None if the price is missing).

Expected Outputs:

The function should return a list of dictionaries. Each dictionary should contain three keys: 'name', 'category', and 'price'. The 'name' should have leading and trailing whitespaces removed, and any 'price' that was None should be replaced with the average price of other products in the same category, rounded to 2 decimal places.
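
To make the averaging rule concrete: reading "other products in the same category" as the products whose price is known, a missing entry does not count toward the denominator. A tiny illustration with made-up prices for a single category:

```python
from statistics import mean

# Hypothetical prices for one category; None marks a missing price.
category_prices = [1.0, None, 2.0]

# Average only the known prices: (1.0 + 2.0) / 2, not (1.0 + 2.0) / 3.
known_prices = [p for p in category_prices if p is not None]
fill_value = round(mean(known_prices), 2)

print(fill_value)  # 1.5
```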

Data:

In [36]:
products = [
    {"name": " apple ", "category": "fruit", "price": 1.0},
    {"name": "banana", "category": "fruit", "price": None},
    {"name": "cherry  ", "category": "fruit", "price": 2.0},
    {"name": " lettuce", "category": "vegetable", "price": 1.5},
    {"name": "carrot", "category": "vegetable", "price": None},
]


In [37]:
def clean_products(products):
    # Accumulate [total, count] per category from products that have a price.
    # Missing prices are skipped so they do not drag the average down.
    category_totals = {}
    for product in products:
        price = product['price']
        if price is None:
            continue
        totals = category_totals.setdefault(product['category'], [0.0, 0])
        totals[0] += price
        totals[1] += 1

    # Average of the known prices in each category.
    category_average = {
        category: total / count
        for category, (total, count) in category_totals.items()
    }

    # Build new dictionaries so the input list is left unmodified.
    cleaned = []
    for product in products:
        price = product['price']
        if price is None:
            # Stays None when no product in the category has a known price.
            average = category_average.get(product['category'])
            price = round(average, 2) if average is not None else None
        cleaned.append({
            'name': product['name'].strip(),
            'category': product['category'],
            'price': price,
        })
    return cleaned


In [38]:

clean_products(products)

[{'name': 'apple', 'category': 'fruit', 'price': 1.0},
 {'name': 'banana', 'category': 'fruit', 'price': 1.5},
 {'name': 'cherry', 'category': 'fruit', 'price': 2.0},
 {'name': 'lettuce', 'category': 'vegetable', 'price': 1.5},
 {'name': 'carrot', 'category': 'vegetable', 'price': 1.5}]

Objective:
Background:

You are a Data Scientist at a retail company and you have been given a dataset containing transaction data. This data includes the transaction date, product ID, quantity, and price. Unfortunately, some of the date values are missing and are marked as None. You need to impute these missing dates using the median date of the entire dataset.

Question:

Write a function that takes the transaction data and imputes the missing dates with the median date. Make sure to convert the dates from string to Python datetime objects and then back to string in the 'YYYY-MM-DD' format after imputation.

Libraries Needed:

from datetime import datetime

Inputs:

A list of dictionaries named transactions. Each dictionary contains:

- 'date': a string representing the date of the transaction in 'YYYY-MM-DD' format or None if the date is missing.
- 'product_id': a string representing the product ID.
- 'quantity': an integer representing the quantity of the product.
- 'price': a float representing the price of the product.

Expected Outputs:

The function should return a list of dictionaries with the same structure as the input, but with all missing dates imputed with the median date of the dataset.

Data:

transactions = [
    {"date": "2022-07-01", "product_id": "P1", "quantity": 10, "price": 5.5},
    {"date": None, "product_id": "P2", "quantity": 8, "price": 6.0},
    {"date": "2022-07-03", "product_id": "P3", "quantity": 15, "price": 7.0},
    {"date": "2022-07-02", "product_id": "P4", "quantity": 12, "price": 6.5},
    {"date": None, "product_id": "P5", "quantity": 9, "price": 5.0},
]
Solution:

In [5]:
import pandas as pd
import numpy as np

# Read in the data
transactions = [
    {"date": "2022-07-01", "product_id": "P1", "quantity": 10, "price": 5.5},
    {"date": None, "product_id": "P2", "quantity": 8, "price": 6.0},
    {"date": "2022-07-03", "product_id": "P3", "quantity": 15, "price": 7.0},
    {"date": "2022-07-02", "product_id": "P4", "quantity": 12, "price": 6.5},
    {"date": None, "product_id": "P5", "quantity": 9, "price": 5.0},
]
df = pd.DataFrame(transactions)

In [7]:
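def impute_missing_dates(df):
    df = df.copy()  # leave the caller's DataFrame untouched
    # Parse the 'YYYY-MM-DD' strings; None becomes NaT.
    df['date'] = pd.to_datetime(df['date'])
    # median() skips NaT values by default.
    median_date = df['date'].median()
    # Fill the gaps, then convert back to 'YYYY-MM-DD' strings.
    df['date'] = df['date'].fillna(median_date).dt.strftime('%Y-%m-%d')
    return df

impute_missing_dates(df)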

         date product_id  quantity  price
0  2022-07-01         P1        10    5.5
1  2022-07-02         P2         8    6.0
2  2022-07-03         P3        15    7.0
3  2022-07-02         P4        12    6.5
4  2022-07-02         P5         9    5.0
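
The question asks for `datetime` objects and a list of dictionaries, whereas the solution above works on a pandas DataFrame. Below is a minimal standard-library sketch of the same idea. The function name `impute_missing_dates_stdlib` is hypothetical, and for an even number of known dates it assumes the median is the midpoint of the two middle dates.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def impute_missing_dates_stdlib(transactions):
    # Parse and sort the known dates; None entries are ignored.
    known = sorted(
        datetime.strptime(t["date"], DATE_FORMAT)
        for t in transactions
        if t["date"] is not None
    )
    if not known:
        # Nothing to compute a median from; return copies unchanged.
        return [dict(t) for t in transactions]

    mid = len(known) // 2
    if len(known) % 2:
        median = known[mid]
    else:
        # Assumption: midpoint of the two middle dates for an even count.
        lower, upper = known[mid - 1], known[mid]
        median = lower + (upper - lower) / 2
    median_str = median.strftime(DATE_FORMAT)

    # Build new dictionaries; only missing dates are replaced.
    return [
        {**t, "date": t["date"] if t["date"] is not None else median_str}
        for t in transactions
    ]


transactions = [
    {"date": "2022-07-01", "product_id": "P1", "quantity": 10, "price": 5.5},
    {"date": None, "product_id": "P2", "quantity": 8, "price": 6.0},
    {"date": "2022-07-03", "product_id": "P3", "quantity": 15, "price": 7.0},
    {"date": "2022-07-02", "product_id": "P4", "quantity": 12, "price": 6.5},
    {"date": None, "product_id": "P5", "quantity": 9, "price": 5.0},
]

imputed = impute_missing_dates_stdlib(transactions)
print([t["date"] for t in imputed])
```

On the sample data the three known dates are 2022-07-01, 2022-07-02 and 2022-07-03, so both missing entries receive 2022-07-02.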
