# Practice session on pandas

Reminder: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

In [1]:
import numpy as np
import pandas as pd

### Task 0: Read data
Read the `LEGO.csv` file into a DataFrame called `LEGO_all`.
Then create a DataFrame `LEGO` that contains only the current models.
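
The general pattern can be sketched on made-up data. Everything about the data here is an assumption: the column names, the values, and in particular the idea that "current" means the rows carrying the most recent value of a hypothetical `date` column. An in-memory string stands in for the file so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for LEGO.csv; columns and values are invented for illustration.
csv_text = """name,theme,price,date
Polizeistation,City,59.99,2023-01-01
Polizeiauto,City,9.99,2024-01-01
Feuerwehr,City,19.99,2024-01-01
"""

# With the real file this would be pd.read_csv("LEGO.csv", ...).
LEGO_all = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Assumption: "current" means the most recent date in the data.
latest = LEGO_all["date"].max()
LEGO = LEGO_all[LEGO_all["date"] == latest].copy()
```

The `.copy()` makes `LEGO` an independent DataFrame, so adding columns to it later does not trigger a `SettingWithCopyWarning`.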

### Task 1: Identify specific models

Your nephew's birthday is coming up. Since he loves playing police, you want to find all current models that contain the word "Polizei" in the name and cost at most 20€.

(Note that the name is of type `str`. A reference for the str-specific functions can be found here:
https://pandas.pydata.org/docs/reference/series.html#string-handling. Since the name should **contain** a specific word, you can probably guess the right function.)
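
As a hedged illustration of string filtering combined with a numeric condition, here is a small sketch on invented rows. Only the column names `name` and `price` follow the text; the values and the variable names (`toy`, `has_word`, `cheap`, `hits`) are made up.

```python
import pandas as pd

# Invented rows for illustration.
toy = pd.DataFrame({
    "name": ["Polizeiauto", "Feuerwehrauto", "Große Polizeistation", "Polizeihubschrauber"],
    "price": [9.99, 9.99, 59.99, 20.00],
})

# Series.str.contains returns a boolean Series;
# regex=False treats the pattern as a literal substring.
has_word = toy["name"].str.contains("Polizei", regex=False)
cheap = toy["price"] <= 20

# Boolean masks are combined element-wise with &.
hits = toy[has_word & cheap]
```

The match is case-sensitive by default; passing `case=False` would also match "polizei".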

### Task 2: Merchandising

Some products are merely merchandise.
Think of a way to identify these rows and add a new boolean column "merch" to `LEGO_all` that indicates whether the row is a merchandising product.

Does LEGO have more or fewer merchandising items in its range today than in the past?
Build a DataFrame that shows for each date the total number of merchandising products and the number of real LEGO models.
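
One possible way to build such a table, sketched on invented data: it assumes the boolean `merch` column already exists and that a hypothetical `date` column identifies the snapshot. How to fill `merch` is deliberately left open.

```python
import pandas as pd

# Invented rows; `date` is an assumed column name.
toy = pd.DataFrame({
    "date": ["2023", "2023", "2023", "2024", "2024"],
    "merch": [True, False, False, True, True],
})

# Count rows per (date, merch) pair, then pivot the merch level into columns.
# fill_value=0 covers dates where one of the two kinds does not occur.
counts = (toy.groupby(["date", "merch"])
             .size()
             .unstack(fill_value=0)
             .rename(columns={True: "merch", False: "models"}))
```

`pd.crosstab(toy["date"], toy["merch"])` should give the same counts in a single call.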

### Task 3: Pricing policy

How often do the different prices occur in the current models?
Build a DataFrame that shows the absolute frequency of every price, sorted by frequency.

Also add a column to your table that shows the relative frequency (i.e. the percentage share) of each price.
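
A minimal sketch of a frequency table on invented prices. The names `prices`, `freq`, `count` and `share_pct` are made up for illustration.

```python
import pandas as pd

# Invented prices for illustration.
prices = pd.Series([9.99, 19.99, 9.99, 29.99, 9.99, 19.99], name="price")

# value_counts sorts by frequency in descending order by default.
# The explicit rename keeps the column name the same across pandas versions.
freq = prices.value_counts().rename("count").to_frame()

# Relative frequency as a percentage of all rows.
freq["share_pct"] = freq["count"] / freq["count"].sum() * 100
```

`value_counts(normalize=True)` returns the shares directly (as fractions rather than percentages) if the absolute counts are not needed.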

### Interlude

So far, we have seen split-apply-combine only in a relatively simple form:
in most cases, a single column was selected after grouping and an aggregation function was applied to it.
In principle, however, you can apply any function to each group's sub-DataFrame.

In [None]:
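def some_stats(groupdf):
    """Summarise one group's sub-DataFrame as a Series (one output row per group)."""
    rows = len(groupdf)                         # number of sets in the group
    unique_prices = groupdf["price"].nunique()  # number of distinct prices
    total_price = groupdf["price"].sum()        # sum of all prices
    return pd.Series(
        [rows, unique_prices, total_price],
        index=["num_sets", "unique_prices", "total_price"],
    )

# include_groups=False (pandas >= 2.2) keeps the grouping column "theme"
# out of the sub-DataFrames passed to some_stats.
(LEGO.groupby("theme")
    .apply(some_stats, include_groups=False)
    .sort_values("total_price", ascending=False)
    .head(5))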

### Task 4

Which models are permanently in the product range?

Which models are brand new to the product range?
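
One possible reading of these questions, sketched on invented data: a model is "permanent" if it appears at every date in the data, and "brand new" if its first appearance is the most recent date. Both definitions are assumptions, as are the `date` column and the use of `name` to identify a model.

```python
import pandas as pd

# Invented rows: model A appears at every date, D only at the latest one.
toy = pd.DataFrame({
    "name": ["A", "B", "A", "C", "A", "D"],
    "date": ["2022", "2022", "2023", "2023", "2024", "2024"],
})

n_dates = toy["date"].nunique()

# Per model: number of distinct dates and the first date it appears.
per_model = toy.groupby("name")["date"].agg(["nunique", "min"])

# Permanent: present at every date. Brand new: first seen at the latest date.
permanent = per_model.index[per_model["nunique"] == n_dates].tolist()
brand_new = per_model.index[per_model["min"] == toy["date"].max()].tolist()
```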