# Data Science Ex 03 - Grouping, Missing Values & Basic Plotting

03.03.2021, Lukas Kretschmar (lukas.kretschmar@ost.ch)

## Let's have some Fun with Data and Visualization!

In this exercise, you are going to see how to group data, how to remove rows that have missing values, and how to replace missing values instead.
You will also learn how to visualize data with plots.

In [None]:
import numpy as np
import pandas as pd

## Introduction

In [None]:
chPopulation = pd.read_csv("Demo_CH_2018.csv", sep=";")
chPopulation.info()

#### Grouping

Having seen some basic aggregation functions, we now go a step further and look at some more complex applications.
Aggregations over the whole `DataFrame` usually take away too much detail.
Thus, we need a way to aggregate only parts of the `DataFrame`.

Basically, we want to execute the following steps:
- split (taking chunks apart)
- apply (using functions on chunks)
- combine (putting the results together)
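To make the three steps concrete, here is a minimal sketch that performs them by hand on a small made-up `DataFrame` (the columns `Lang` and `Pop` are hypothetical, not taken from the demo file) and compares the result with a single `groupby()` call.

```python
import pandas as pd

# Hypothetical toy data (not from Demo_CH_2018.csv)
df = pd.DataFrame({
    "Lang": ["de", "fr", "de", "it", "fr"],
    "Pop": [100, 50, 300, 80, 70],
})

# split: one sub-DataFrame per language
chunks = {lang: part for lang, part in df.groupby("Lang")}

# apply: compute the population sum of each chunk
sums = {lang: part["Pop"].sum() for lang, part in chunks.items()}

# combine: collect the per-chunk results into one Series
manual = pd.Series(sums, name="Pop")

# groupby() performs all three steps in one call
builtin = df.groupby("Lang")["Pop"].sum()

print(manual)
print(builtin)
```

Both Series contain the same values; the built-in version additionally names its index `Lang`.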

Splitting `DataFrames` is done by using the `groupby()` function.
The result is a `DataFrameGroupBy` object, which does not show any data by itself.
We first have to apply another function on the group.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

In [None]:
languages = chPopulation.groupby("Lang")
languages

In [None]:
languages.count()

In [None]:
languages.describe()

Having a group, we can now apply functions on it (as seen above).
Usually, the *apply* and *combine* steps are executed in one go.
For example, you can decompose the `count()` function to *apply* (return a value of 1 for every entry) and *combine* (add the results of every entry together).

If we want to apply some specific functions on our groups, we have the following possibilities:
- `aggregate()` applies one or more aggregation functions to each group and returns one value per group and column
- `filter()` keeps only the groups for which a predicate returns `True` and drops the other groups entirely
- `transform()` maps the values of each group to new values of the same shape; unlike `aggregate()`, it does not reduce them
- `apply()` calls an arbitrary function on each group (passed as a sub-`DataFrame`) and combines the results
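The main difference between these four methods is the shape of what they return. The following sketch uses made-up data (columns `Lang` and `Pop` are hypothetical) to show it side by side.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Lang": ["de", "de", "fr", "fr", "it"],
    "Pop": [100, 300, 50, 70, 80],
})
groups = df.groupby("Lang")

# aggregate(): one value per group
agg = groups["Pop"].aggregate("median")

# filter(): keeps whole groups whose predicate is True (rows keep their original index)
big = groups.filter(lambda g: g["Pop"].median() > 75)

# transform(): one value per original row, aligned with df's index
centered = groups["Pop"].transform(lambda s: s - s.median())

# apply(): arbitrary function per group; here it happens to return a scalar per group
spread = groups["Pop"].apply(lambda s: s.max() - s.min())

print(agg, big, centered, spread, sep="\n\n")
```

Here `aggregate()` and `apply()` return one value per group, `filter()` returns a subset of the original rows, and `transform()` keeps the original length.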

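Most of these methods (`aggregate()`, `transform()`, and `apply()`) also exist on every `DataFrame`, so you can use them without creating groups first.
Note that `DataFrame.filter()` behaves differently: it selects rows or columns by their labels, not by a predicate.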

Let's say, we want some basic stats on our groups.
We could do this by using `aggregate()`.

In [None]:
languages[["Jan 2018", "Dec 2018"]].aggregate(["min", "median", "max"]) # string names select the built-in pandas aggregations
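If you want readable column names for the results, `agg()` also accepts named aggregations of the form `newName=(column, function)`. A minimal sketch on made-up data (the columns `Jan` and `Dec` stand in for the monthly columns of the demo file):

```python
import pandas as pd

# Hypothetical toy data with two monthly columns
df = pd.DataFrame({
    "Lang": ["de", "de", "fr"],
    "Jan": [100, 300, 50],
    "Dec": [110, 310, 55],
})

# Named aggregation: new_column=(source_column, aggregation)
stats = df.groupby("Lang").agg(
    janMin=("Jan", "min"),
    decMax=("Dec", "max"),
    decMedian=("Dec", "median"),
)
print(stats)
```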

Or what if we only want the rows of those groups whose median is above a certain value?
Then we could use the `filter()` function.

In [None]:
languages.filter(lambda g : g["Dec 2018"].median() > 200000)

With `transform()`, we can calculate how many more people are living in a canton than the median of the group the canton is part of.

In [None]:
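numericCols = list(chPopulation.select_dtypes("number").columns) # arithmetic needs numbers, so text columns such as "Canton" are excluded
groupedMedian = languages[numericCols].transform(lambda col : col - col.median())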
groupedMedian.head(5)

In [None]:
groupedMedian = pd.merge(groupedMedian, chPopulation[["Canton", "Lang"]], left_index=True, right_index=True)
groupedMedian.head(5)

And if the functions presented above do not have enough flexibility, we could rely on `apply()`.
Since `apply()` receives each group as a whole `DataFrame`, it can even add new columns or change existing ones.

For example, we can calculate the difference in population compared to the smallest canton of each group and add this information to the `DataFrame` in one go.

In [None]:
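def calcDiffToSmallest(df):
    df = df.copy() # work on a copy instead of modifying the group in place
    df["Diff"] = df["Dec 2018"] - df["Dec 2018"].min() # difference to the smallest canton of the group
    return df

# group_keys=False keeps the original index, so "Lang" stays a regular column we can sort by
chPopulation.groupby("Lang", group_keys=False).apply(calcDiffToSmallest).sort_values(["Lang", "Diff"], ascending=[True, False])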

We are aware that using these functions isn't the simplest task, but it's just a matter of knowing what each of them can do and gaining some experience.
So don't worry if you are now a bit puzzled, during the remainder of this course you will get pretty familiar with them.

### Checking the Quality of Data

So far, we have seen some simple dummy `DataFrames` and loaded a bit more complex structures.
But they never had any values missing.
When working with real data, for example log files of a machine, you will encounter many missing values or values that just don't make sense (e.g. a sensor gone rogue).
In this section, we'll show you how you can spot such values and how you can get rid of them.

Usually, missing values are indicated in three different ways:
- Entry is empty
- NaN/NA indicates a missing value
- A specific value indicates a missing value (e.g. -1 when the valid values have to be > 0)

If Pandas encounters such values, it will handle them as follows:
- Empty entries become `np.nan` (text columns may also contain `None`; Pandas favors `np.nan` by default)
- Common markers such as `NaN` or `NA` become `np.nan`; markers Pandas does not recognize turn the column into text
- Specific values cannot be detected as they are valid values. Here, knowledge of the data scientist is needed.
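To see these rules without the demo file, here is a minimal sketch that reads a made-up CSV from memory. It also uses the `read_csv()` parameter `na_values`, which lets you declare a specific value such as `-1` as missing while reading, provided you know that convention upfront.

```python
import io
import pandas as pd

# Hypothetical CSV content mimicking the three kinds of missing values
csv = io.StringIO(
    "Empty;NaN;Specific\n"
    "1;1;1\n"
    ";NaN;-1\n"
    "3;3;3\n"
)

# Default: the empty field and "NaN" become np.nan, while -1 stays a valid number
default = pd.read_csv(csv, sep=";")

# na_values tells Pandas which additional values mean "missing" (here only in column "Specific")
csv.seek(0)
withSentinel = pd.read_csv(csv, sep=";", na_values={"Specific": [-1]})

print(default.dtypes, withSentinel.dtypes, sep="\n\n")
```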

In [None]:
missing = pd.read_csv("./Demo_Missing.csv", sep=";")
missing

In [None]:
missing.info()

Looking at the structure, you see that Pandas handles missing values pretty smoothly.
Although all but the first column contain integers, Pandas reads the columns with missing values as floating point numbers, because `NaN` is a special floating point value.
Only the last column, which marks missing values with `-1`, is interpreted as integer.
So reading files with missing values is a piece of cake for Pandas.

#### Detecting missing Values

Now that we have our data as `DataFrame`, we can check the columns with the following functions:
- `isna()` or `isnull()`
- `notna()` or `notnull()` (opposite of `isna()`/`isnull()`)

If we apply `isna()` on a `DataFrame`, we get a `DataFrame` containing booleans.
For a first check, we can use `sum()` on this result to get a first impression of how bad it is.
Since `True` counts as 1, we get the number of missing values per column.

In [None]:
missing.isna()

In [None]:
missing.isna().sum()

Calling `sum()` twice gives the total number of missing values.

In [None]:
missing.isna().sum().sum()

Or we can get the share of missing values per column (a fraction between 0 and 1, since the mean of booleans is the proportion of `True` values).

In [None]:
missing.isna().mean()

#### Why we want to detect missing Values

The problem with missing values is pretty simple.
They mess up our functions.
For example, if a specific value indicates the absence of a value, an aggregation will return a wrong result.

In [None]:
missing["Specific"].min()

In [None]:
missing["Specific"].sum()

In [None]:
specific = missing["Specific"]
specific[specific >= 0].sum()

Luckily, `NaN` values are handled correctly with the built-in functions.
But if we define our own functions, we could run into trouble.
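The built-in reductions skip `NaN` because their `skipna` parameter defaults to `True`. The sketch below shows what happens when it is turned off, and one way a hand-written function could protect itself. `nanSafeSum` is a hypothetical helper, not part of Pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Built-in reductions skip NaN by default (skipna=True) ...
print(s.sum())               # 4.0
# ... but propagate it when asked not to skip
print(s.sum(skipna=False))   # nan

# Hypothetical NaN-safe version of a hand-written sum
def nanSafeSum(series):
    total = 0.0
    for v in series.dropna():  # drop missing values before looping
        total += v
    return total

print(nanSafeSum(s))         # 4.0
```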

In [None]:
print(f"min: {missing['NaN'].min()}")
print(f"max: {missing['NaN'].max()}")
print(f"sum: {missing['NaN'].sum()}")

In [None]:
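def mySum(series):
    total = 0 # avoid the name "sum", which would shadow Python's built-in function
    for v in series:
        print(v)
        total += v # total = total + v; a single NaN turns the result into NaN
    return total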

missing["NaN"].aggregate(mySum)

In [None]:
nan = missing["NaN"]
nan[nan.notna()].aggregate(mySum)

#### Removing missing Values

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

The simplest approach to dealing with missing values is to remove every row that is incomplete.
Pandas offers the `dropna()` method that does exactly this.

Please note: Calling this function returns a new object with the rows dropped but does not change the original object.
Thus, when getting rid of rows, make sure that you assign the newly created `DataFrame` to a variable so you can use it later.
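`dropna()` can also be less radical. A minimal sketch on made-up data, using its `subset` parameter (only look at certain columns) and `thresh` parameter (keep rows with at least this many non-missing values):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": [1.0, 2.0, np.nan, np.nan],
    "C": ["x", "y", "z", None],
})

# Only consider column "A" when deciding which rows to drop
onlyA = df.dropna(subset=["A"])

# Keep rows with at least 2 non-missing values
atLeastTwo = df.dropna(thresh=2)

# Assign the result to keep it; df itself is unchanged
cleaned = df.dropna()
```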

In [None]:
missing.dropna()

In [None]:
missing

As with aggregation functions, we can pass `axis=1` to drop all columns that contain `NaN` values instead.

In [None]:
missing.dropna(axis=1)

Since these methods are rather radical, you can also remove specific rows or columns by their label with `drop()`.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

In [None]:
missing.drop(0) # removing row with index 0

In [None]:
missing.drop([1,3]) # removing rows with index 1 and 3

In [None]:
missing.drop("NaN", axis=1) # removing column "NaN"

We can also use a boolean mask to select the rows we want to keep.

In [None]:
missing[missing["NaN"].notna()]

Keep in mind that removing data can distort the outcome of your analysis, since the "healthy" entries in a dropped row or column may contain valuable information.
On the other hand, if fixing certain rows or columns would take too much effort, it can be simpler to ignore them completely.

#### Filling missing Values

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

Besides removing, we can also fill holes in our dataset.
Several strategies exist to accomplish that.

- Setting a fixed value
- Taking the value above
- Taking the value below 

The method that offers these strategies in Pandas is called `fillna()`.
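Here is a minimal sketch of the three strategies on a small made-up `Series`, using `fillna()` for the fixed value and the dedicated `ffill()`/`bfill()` methods for taking the value from above or below:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

fixed = s.fillna(0)    # fixed value:        [0, 1, 0, 0, 4, 0]
forward = s.ffill()    # value from above:   [nan, 1, 1, 1, 4, 4]
backward = s.bfill()   # value from below:   [1, 1, 4, 4, 4, nan]
```

Note that a gap at the very start (forward fill) or the very end (backward fill) has no neighbor to copy from and stays empty; chaining `s.ffill().bfill()` is one way to close both ends.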

In [None]:
missing.fillna(0)

As you see, everywhere a value was missing, it got replaced by `0`.

If we don't want every value to be the same, we can provide a dictionary specifying which values to take per column.

In [None]:
missing.fillna({"Text": "Zero", "NaN Text": 32})

Or we could take values based on the existing values.

In [None]:
missing.fillna({"Empty" : missing["Empty"].mean(), "NaN" : missing["NaN"].min()})

The other two strategies, taking values from above or below, can be accomplished with `ffill()` and `bfill()` (older code often uses `fillna(method="ffill")`, which is deprecated in recent Pandas versions).

In [None]:
missing.ffill() # forward fill takes the value from above

In [None]:
missing.bfill() # backward fill takes the value from below

These methods can also use the values of the same row.

In [None]:
missing.ffill(axis=1)

What remains is the case of `-1` in the last column.
There is no simple method to resolve this, since Pandas cannot assume by default that `-1` represents the absence of a value.
Thus, we have to replace the value by hand.

In [None]:
copy = missing.copy()
copy["Specific"] = copy["Specific"].replace(-1, 43)
copy
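Another option is to first turn the sentinel into a real `NaN`, so that all the tools shown above apply. A minimal sketch on a made-up column where `-1` means "missing", filled here with the median of the valid values (the choice of median is just an example):

```python
import numpy as np
import pandas as pd

specific = pd.Series([5, -1, 7, -1, 9])  # hypothetical column where -1 means "missing"

# Step 1: declare the sentinel as missing (the column becomes float because of NaN)
asNaN = specific.replace(-1, np.nan)

# Step 2: fill it with any strategy, e.g. the median of the valid values
filled = asNaN.fillna(asNaN.median())
```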

#### Advanced Filling

So far, we have used either a value or a dictionary to replace missing values.
For more specific cases, the following approaches can be more powerful:
- `transform()` with a custom method
- `fillna(Series)`
- `fillna()` in combination with `groupby()` and `apply()`

Since we can guess what the missing values should be, we can fix the `DataFrame` by calculating them.
With the `transform()` method, we can replace each `Series` with a new `Series` in which the missing values are calculated.

Please note: This solution is tailored to the given `DataFrame`; you cannot assume that it handles other datasets.
But it shows a possible pattern to apply.

In [None]:
def fillNa(col):
    if not pd.api.types.is_numeric_dtype(col):                    # checking if we have a number
        return col                                                # if the Series is not numeric (e.g. object), we return it unchanged
    values = []
    for i, v in col.items():                                      # going through every item in the Series
        value = getValueFor(v, i, col)                            # calculating the value
        values.append(value)                                      # adding the value to the list
    return pd.Series(values, name=col.name, index=col.index)      # building a new Series and returning it

def getValueFor(v, i, col):
    # i is used as a position below, so this assumes the default RangeIndex (0, 1, 2, ...)
    if not np.isnan(v) and v >= 0:                                # not NaN (as in most cases) and >= 0 (handling our special case -1)
        return v                                                  # so we just return the value
    # if NaN or negative, calculate the mean of the neighbors
    print(f"NaN @ {col.name}[{i}] -> Taking values from {max(i - 1, 0)} & {min(i + 1, len(col)-1)}")
    above = col.iloc[max(i - 1, 0)]                               # getting the value above the current index i
    below = col.iloc[min(i + 1, len(col)-1)]                      # getting the value below the current index i
    value = np.mean([above, below])                               # calculating the mean of both values
    print(f"Mean: {value}")
    return value

missing.transform(fillNa)
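For the common case of a single `NaN` between two valid values, Pandas has a built-in alternative: `interpolate()`, whose default linear method fills such a gap with the mean of its neighbors. A minimal sketch on made-up data; unlike `fillNa` above, it does not treat `-1` as missing and leaves a leading `NaN` unfilled.

```python
import numpy as np
import pandas as pd

col = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 8.0])

# Linear interpolation: each single gap becomes the mean of its two neighbors
filled = col.interpolate()
print(filled.tolist())  # [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
```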

Another approach, passing a `Series` to `fillna()`, is pretty straightforward.
If a value is missing, the corresponding value of the `Series` with the same index is taken.

Please note: Here, index means the `Series` index and not the position (0 to n-1).
Thus, the order of the provided `Series` doesn't matter.

In [None]:
missingValues = pd.Series([np.nan, 10, 20, np.nan, 40, np.nan])
print(missingValues)
print()

fillin = pd.Series(list(range(6)))
print(fillin)
print()

missingValues.fillna(fillin)

In [None]:
infill = fillin[::-1] # reverse order
missingValues.fillna(infill)

As you can see, the values were replaced the same way, even though the `infill` `Series` is ordered differently.

Using `fillna()` in combination with `groupby()` and `apply()` is a slightly more complicated approach, but more powerful.
This approach comes in handy when we want to fill in an aggregated value, but the aggregation only considers parts of the whole dataset.

Let's say we have a list of students, with their current semester and age.

In [None]:
students = pd.read_csv("./Demo_Students.csv")
students.head(5)

In [None]:
students.isna().mean()

As you see, some age information is missing.

We can now group the students by their semester and get the mean age per semester.

In [None]:
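semesters = students.groupby("Semester", group_keys=False)["Age"] # group_keys=False keeps the original index, so results of apply() align with students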
semesters.mean()

Then we can apply a function to the students of each semester.
The call to `fillna()` now takes the mean value per group (the same as setting one fixed value per group).

In [None]:
students["Age"] = semesters.apply(lambda g : g.fillna(np.round(g.mean())))
students.isna().mean()
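The same result can also be reached without `apply()`: `transform("mean")` returns the group mean for every row, aligned with the original index, and `fillna()` with a `Series` fills each gap from the same index. A minimal sketch on a made-up stand-in for the student data (named `demo` so it does not overwrite `students`):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Demo_Students.csv
demo = pd.DataFrame({
    "Semester": [1, 1, 1, 3, 3],
    "Age": [20.0, np.nan, 22.0, 24.0, np.nan],
})

# One group mean per row, aligned with demo's index
groupMean = demo.groupby("Semester")["Age"].transform("mean")

# Each missing age is filled with the rounded mean of its semester
demo["Age"] = demo["Age"].fillna(np.round(groupMean))
```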

### Plotting Basics

Until now, we've only worked with showing numbers or tables.
But data science, especially when communicating results, is a visual task.
Visualization also helps us understand the data we are given.

Working with Python, Jupyter Notebooks and Pandas, we'll use the Matplotlib module.

In [None]:
import matplotlib.pyplot as plt

And since we want to use plots within this notebook (and every notebook during this course), we also need to apply the following magic command.

In [None]:
%matplotlib inline

This command makes the notebook render plots inline, directly below the cell that creates them.

Since this module is modeled after MATLAB's plotting interface, the plots can sometimes look a bit, let's say, spartan.
Thus, we recommend using Seaborn to improve the default styling.
But it's not mandatory to use it.

In [None]:
import seaborn as sns
sns.set_theme() # applies Seaborn's default theme (older code uses the alias sns.set())

#### "Plots do not mind correlation"

As you will see, `plot()` takes two arrays.
It does not care whether the values are related by a function: the arrays just have to be of the same length, and the values are connected in the order of their position within the arrays.
Thus, the following code is totally valid.

In [None]:
rng = np.random.RandomState(42)
plt.plot(np.arange(0,20,1), rng.randint(-10, 10, 20))

You'll see during this course that this freedom is pretty cool when working with plots.
Thus, we don't mind, either.

#### Drawing Plots

Reference: https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html

The `pyplot` module comes with two interfaces.
A MATLAB-style interface and an object-oriented-style interface.
You can easily spot the difference: the former only calls functions on `plt`, while the latter calls methods on the objects those functions return.

In [None]:
x = np.linspace(0, 10, 100)

In [None]:
# MATLAB-style
plt.figure()

plt.subplot(2, 1, 1) # Select top panel (2 rows, 1 column, 1st panel)
plt.plot(x, np.sin(x))

plt.subplot(2, 1, 2) # Select bottom panel (2 rows, 1 column, 2nd panel)
plt.plot(x, np.cos(x))

In [None]:
# OO-style
fig, ax = plt.subplots(2) # Getting the figure and an array of axes (aka panels)
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))

**We recommend using the OO-style plot, since the interface is much cleaner and we don't have to rely on side effects.
Within this course, all examples will be using this style.**

If we want to plot multiple lines, we can simply call the plot method multiple times.

In [None]:
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))

Or we can arrange plots in a two-dimensional grid.

In [None]:
fig, ax = plt.subplots(2,2, figsize=(14, 7))
ax[0,0].plot(x, np.sin(x))
ax[0,1].plot(x, x)
ax[1,0].plot(x, np.cos(x))
ax[1,1].plot(x, -x)

With `xlim` and `ylim`, we can set the visible range of each axis.

In [None]:
fig, ax = plt.subplots()
ax.set(xlim=(2, 6))
ax.plot(x, np.sin(x))

In [None]:
fig, ax = plt.subplots()
ax.set(ylim=(-10, 20))
ax.plot(x, x + 1)
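The same limits can also be set with the dedicated setters `set_xlim()` and `set_ylim()`, and read back with `get_xlim()` and `get_ylim()`, which is handy when you want to check or reuse them. A minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlim(2, 6)        # same as ax.set(xlim=(2, 6))
ax.set_ylim(-1.5, 1.5)   # same as ax.set(ylim=(-1.5, 1.5))
print(ax.get_xlim(), ax.get_ylim())
```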

#### Styling Plots

We can color lines by using the `color` parameter.

In [None]:
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color="green")
ax.plot(x, np.sin(x + 1), color="r")
ax.plot(x, np.sin(x + 2), color="#123456") # Hex color
ax.plot(x, np.sin(x + 3), color=".5") # Grayscale

We can also change how a line is drawn by using the `linestyle` parameter.
Here, you can use specific keywords or characters.

In [None]:
fig, ax = plt.subplots()
ax.plot(x, x, linestyle="solid")
ax.plot(x, x + 1, linestyle="dashed")
ax.plot(x, x + 2, linestyle="dashdot")
ax.plot(x, x + 3, linestyle="dotted")

In [None]:
fig, ax = plt.subplots()
ax.plot(x, x, linestyle="-") # solid
ax.plot(x, x + 1, linestyle="--") # dashed
ax.plot(x, x + 2, linestyle="-.") # dashdot
ax.plot(x, x + 3, linestyle=":") # dotted

And for the lazy ones of you who want to combine color and linestyle, it goes like this:

In [None]:
fig, ax = plt.subplots()
ax.plot(x, x, "-r")
ax.plot(x, x + 1, "--g")
ax.plot(x, x + 2, "-.k")
ax.plot(x, x + 3, ":c")

#### Labeling Plots

To prevent confusion, it's a good idea to set labels on plots and figures.

In [None]:
fig, ax = plt.subplots(2, constrained_layout=True) # with constrained_layout we prevent overlap of labels
fig.suptitle("This figure shows sin(x) and cos(x)")
ax[0].plot(x, np.sin(x))
ax[0].set(title="y = sin(x)", xlabel="x", ylabel="sin(x)")

ax[1].plot(x, np.cos(x))
ax[1].set(title="y = cos(x)", xlabel="x", ylabel="cos(x)")

And since both plots use the same values for the horizontal axis, we can also share them with `sharex` (there is also a `sharey`).

In [None]:
fig, ax = plt.subplots(2, sharex=True)
fig.suptitle("This figure shows sin(x) and cos(x)")
ax[0].plot(x, np.sin(x))
ax[0].set(title="y = sin(x)", ylabel="sin(x)")

ax[1].plot(x, np.cos(x))
ax[1].set(title="y = cos(x)", xlabel="x", ylabel="cos(x)")

With only one line, setting a title on the plot is often enough.
But as soon as multiple lines are shown in one figure, a legend comes in handy.
To get one, we simply define a label per line and enable the legend.

In [None]:
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()

As you see, the legend is placed in a pretty good spot.
This is because its default location is `best`, which picks the spot that overlaps the data the least.
But we can also specify where we want the legend.
Valid locations combine `upper`, `lower`, or `center` with `left`, `right`, or `center` (e.g. `upper left`, `lower center`, `center right`), where the middle of the plot is simply called `center`.
Further, `best` and `right` are allowed as well.

In [None]:
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.legend(loc="lower right")

And we can even create multiple legends.
This is a bit trickier, but also no rocket science.

In [None]:
fig, ax = plt.subplots()
sinLine = ax.plot(x, np.sin(x), label="sin(x)")
cosLine = ax.plot(x, np.cos(x), label="cos(x)")

sinLeg = ax.legend(handles=sinLine, loc="lower right")
ax.add_artist(sinLeg) # keep the first legend; the next legend() call would otherwise replace it
ax.legend(handles=cosLine, loc="upper left")

And we can change the appearance of the legend.

In [None]:
fig, ax = plt.subplots(2)
ax[0].plot(x, np.sin(x), label="sin(x)")
ax[0].legend(loc="center", frameon=False)

ax[1].plot(x, np.cos(x), label="cos(x)")
ax[1].legend(loc="center", shadow=True, framealpha=.5, borderpad=1.5, fancybox=True)

#### Scatter Plots

Now, not everything is a line.
Chances are that we frequently have to deal just with points.

Building on what we already know, scatter plots can be drawn quite easily.
We just use a specific style.

In [None]:
x = np.linspace(0, 10, 30) # we reduce the number of points to increase the distance between our points
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), "o")

And we can even combine the two styles.

In [None]:
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), "-or") # - for the line, o for the points, and r for the color

Or we can draw the two styles on top of each other, but keep in mind that the order of `plot()`-calls matters.

In [None]:
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), "ob")
ax.plot(x, np.sin(x), "-r", linewidth=3)
ax.set(title="Line on top of Points")

In [None]:
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), "-r", linewidth=3)
ax.plot(x, np.sin(x), "ob")
ax.set(title="Points on top of the Line")

There are many possible ways of drawing points.
The shape of a point is called its `marker`.

In [None]:
rng = np.random.RandomState(42)
fig, ax = plt.subplots(figsize=(20,5))
for marker in list("o.,x+v^<>sd"):
    ax.plot(rng.rand(5), rng.rand(5), marker, markersize=rng.randint(8,16), label=f" = {marker}")
ax.legend()

Now, using the `plot()` method is pretty simple.
But sometimes, a generic method isn't the right choice.
Maybe we need some more flexibility in drawing our points.

If this is the case, Matplotlib has you covered - with the `scatter()` method.
The simplest usage looks like a call of the `plot()` method.

In [None]:
fig, ax = plt.subplots()
ax.scatter(x, np.sin(x)) # we can omit the marker since scatter() will draw points

`scatter()` shows its power when we want to encode more information in the points.
So we can change the transparency `alpha`, color `c` and size `s` of every point.

In [None]:
rng = np.random.RandomState(42)
px = rng.randn(50)
py = rng.randn(50)
colors = px * py # The color depends on the location
sizes = abs(px * py) * 1000 # The size depends on the location

fig, ax = plt.subplots(1,2, figsize=(20,5))
ax[0].scatter(px, py, c=colors, s=sizes, alpha=.5)
fig.colorbar(mappable=ax[0].collections[0], ax=ax[0]) # ax.collections[0] is the collection created by scatter(); the colorbar reads its colormap and value range

# Use other colors (viridis is Matplotlib's default colormap, so we pick a different one)
ax[1].scatter(px, py, marker="v", c=colors, s=sizes, alpha=.8, cmap="plasma")
fig.colorbar(mappable=ax[1].collections[0], ax=ax[1])
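Instead of looking the scatter plot up via `ax.collections[0]`, you can also keep the object that `scatter()` returns (a `PathCollection`) and hand it directly to `fig.colorbar()`. A minimal sketch with made-up random data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
px = rng.randn(20)
py = rng.randn(20)

fig, ax = plt.subplots()
# scatter() returns a PathCollection; keeping it avoids digging through ax.collections
points = ax.scatter(px, py, c=px, s=np.abs(py) * 200, cmap="plasma", alpha=0.6)
fig.colorbar(points, ax=ax)  # the colorbar uses the collection's colormap and value range
```

Here the color encodes the `x` value and the size encodes the distance from the horizontal axis.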

You can find more colormaps under https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html - but depending on using seaborn or not, they might look a bit smoother than shown on the page.

**A note on performance:** Since `scatter()` renders each point individually, it can result in bad performance when drawing large datasets.
In this case, and if you can do without different point sizes and/or colors, use `plot()`, which draws every point as a copy of the same marker.

## Exercises

### Ex01 - Missing Values

In the following exercise, we'll work with data from the file **Ex03_01_Data.csv**.
So, at first, load the file into a `DataFrame` and show the first 5 lines to check if the file was loaded successfully.
This file contains video game sales and ratings.

As you see, some values are missing.
Show the percentage of missing values per column.

Be radical, drop all rows with missing values.
How many entries are left?

Drop all columns that have missing values.

Drop all games that do not have a name.

We saw that many games were not scored by critics, thus drop all the games that have no value in the column *Critic_Score*.
And check if all entries of the result have a score.

Drop the *Developer* column.

Drop all games with a *User_Score* lower than 8.

Let's assume that zero sales means missing data.
Drop all games where the European market data is missing.

On the other hand, since many values are missing, drop all columns where more than 40% of the values are missing.

#### Solutions

In [None]:
# %load ./Ex03_01_Sol.py

### Ex02 - Replacing Values

Load the dataset **Ex03_02_Data.csv** and create a copy of it, since you will modify the `DataFrame`.
The goal of this exercise is to get a complete data set (no values are missing) at the end.

Check which columns are missing values.

As you can see, scores for many games are missing.
But let's start with something easy.

Since we have publishers for all games, let's set them also as developers where they are not available.

For the critic count, we just take the mean of all critic counts.

Next, we will fill the user score.
Let's assume it is the median user score plus one hundredth of the game's global sales.

#### Challenges!

For the year of release, we will take the platform into account.
Set the years to the median year per platform where the release year is not given.

Let's do the same for the critic score.
We assume that the score is the median per platform, genre and publisher.
Since for some games, no score can be calculated, we then take only genre and publisher and for the remaining without a score, we just take the genre.

Check that no values are missing anymore.

#### Solutions

In [None]:
# %load ./Ex03_02_Sol.py

### Ex03 - Line Plots

Plot a horizontal line from `x=[0, 10]` at `y=2`.

Plot a vertical line at `x=5` from `y=[-1, 7.5]`.
And the line should be dotted and red.

Plot the `cos()` from $\pi$ to 5$\pi$.
The curve should be green.

Plot a line that connects 50 random numbers in the range of `x=[-250, 250]` and `y=[-250, 250]`.
`x` and `y` are independent of each other.
The line should be styled as dash-dot and black.

Plot `y=x^3` with `x=[0, 20]` but limit the view to `y=[100, 7000]`.

#### Solutions

In [None]:
# %load ./Ex03_03_Sol.py

### Ex04 - Scatter Plots

Plot 50 random points between `x=[-50, 50]` and `y=[-50, 50]`.

Now draw the points as triangles.

Now the size of each triangle is bound to its location with smallest points at `[-50, -50]` having `size=10` and largest at `[50, 50]` having `size=100`.

Now the color of each triangle is bound to its `x` value.
You can choose the colormap individually.
Sizes are fixed at `150`.
And don't forget to plot the colorbar.

#### Challenge!

Now the symbols are bound to each point's location.
- `x & y < 0` use a blue square with `alpha=.25`
- `x & y >= 0` use a red rhombus with `alpha=.75`
- `x < 0 & y >= 0` use a green point with `alpha=.5`
- `x >= 0 & y < 0` use a black x

#### Solutions

In [None]:
# %load ./Ex03_04_Sol.py