## SOIL Python Tutorial


Evan Thomas (evan.thomas@epfl.ch)

24.11.23

# This is a markdown block

## We
### can
#### illustrate ideas in these

* Put unordered lists
1. Ordered lists

`code snippets`

Images... ![SOIL](https://avatars.githubusercontent.com/u/130440995?s=200&v=4)

And links -> [Markdown cheatsheet](https://www.markdownguide.org/cheat-sheet/)

Note: Markdown is the same language used in a repository's README.md file, which is shown on the repository's main page.

[SOIL: lab-codes](https://github.com/LabSOIL/lab-codes)

**You can switch between `Code` and `Markdown` (... and `Raw`) with the toggle in the main toolbar ^^**

In [None]:
# This is a code block
# In these, you can execute code!
print("Hello, I am code...")

## Download data 
Let's begin with some `.csv` data.

[Cattle - Evolution of the cow population in Switzerland: *Number of registered and living cows in Switzerland*](https://opendata.swiss/en/dataset/rinder-entwicklung-der-kuhpopulation-in-der-schweiz)

Move the file into the same folder as this Jupyter notebook.

Note: Here's a trick to find where we are located. The `os` library interfaces with the computer's OS (Operating System), and can help us do things that we can normally do in the **Terminal**/**Command Prompt**.

In [None]:
import os
print(os.getcwd())  # get CWD: current working directory 
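
Once we know the working directory, we can also check that the downloaded file is really there before trying to load it. This is a minimal sketch, assuming the file was saved as `cow-CH.csv` (the name used in the cells below). `find_csv` is a hypothetical helper written for this tutorial, not part of any library.

```python
from pathlib import Path


def find_csv(filename, folder=None):
    """Return the full path to `filename` in `folder`, or None if it is missing.

    `folder` defaults to the current working directory.
    """
    base = Path(folder) if folder is not None else Path.cwd()
    candidate = base / filename
    return candidate if candidate.is_file() else None


# Assumed file name; change it if your download is called something else
path = find_csv("cow-CH.csv")
if path is None:
    print("cow-CH.csv not found in", Path.cwd())
else:
    print("Found:", path)
```

If the file is not found, move it into the folder printed above, or pass the folder that contains it as the second argument.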

In [None]:
import pandas as pd  # "as pd" is optional, but it is the shorthand most online pandas examples use

# For a quick look we can read the file directly, but the result is not assigned to a variable
# (i.e. we can't do anything more with it)
pd.read_csv('cow-CH.csv')

In [None]:
pd.set_option('display.max_rows', None)

In [None]:
# This file separates values with semicolons instead of commas (other files may use tabs),
# so we have to tell pandas which separator to use
df = pd.read_csv('cow-CH.csv', sep=';', header=1, index_col=['Year', 'Month'])
df
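
To see what `sep`, `header` and `index_col` each do, here is a small sketch on an in-memory sample. The sample text is made up for illustration (it is not the real dataset). It assumes the layout implied by the call above: one title line, then a semicolon-separated header row with `Year` and `Month` columns.

```python
import io

import pandas as pd

# Made-up sample mimicking the assumed layout. The numbers are NOT real data.
sample = io.StringIO(
    "Illustrative title line\n"
    "Year;Month;Total\n"
    "2021;1;100\n"
    "2021;2;110\n"
    "2022;1;120\n"
)

# sep=";"     -> split columns on semicolons
# header=1    -> use the second line (position 1) as column names, ignore the line before it
# index_col   -> use Year and Month together as the row index
demo = pd.read_csv(sample, sep=";", header=1, index_col=["Year", "Month"])
print(demo)
print(demo.index.names)
```

Rows can then be looked up by a (year, month) pair, for example `demo.loc[(2021, 2)]`.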

In [None]:
df[4:7]

In [None]:
df.plot()

In [None]:
df

In [None]:
# Aggregate data by a column
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
df_by_year = df.groupby("Year").sum()
df_by_year
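
`groupby("Year")` works here even though `Year` is part of the index rather than a regular column, because pandas accepts index level names as group keys. A minimal sketch on made-up numbers (not the real dataset):

```python
import pandas as pd

# Made-up values with the same (Year, Month) index structure as above
index = pd.MultiIndex.from_tuples(
    [(2021, 1), (2021, 2), (2022, 1)], names=["Year", "Month"]
)
demo = pd.DataFrame({"Total": [100, 110, 120]}, index=index)

# Group by the index level called "Year" and add up the months in each year
by_year = demo.groupby("Year").sum()

# Equivalent, and more explicit about grouping on an index level
same = demo.groupby(level="Year").sum()

print(by_year)
```

Other aggregations such as `.mean()` or `.max()` can be used in place of `.sum()`.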

In [None]:
# Divide all values by 1000
df_by_year /= 1000  # Same as: df_by_year = df_by_year / 1000 (the same works with other operators: +, -, *)
df_by_year

In [None]:
df_by_year.describe()

In [None]:
df_by_year.boxplot()

In [None]:
# Maybe we don't want 2023: the data is updated monthly, so that year is incomplete and shows up as an outlier
df_complete_years = df_by_year[df_by_year.index <= 2022]
df_complete_years
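
The filter above is a boolean mask: the comparison gives one `True`/`False` per row, and indexing with it keeps only the `True` rows. A small sketch with made-up values:

```python
import pandas as pd

# Made-up yearly totals; 2023 stands in for an incomplete year
demo = pd.DataFrame(
    {"Total": [1.0, 2.0, 0.5]},
    index=pd.Index([2021, 2022, 2023], name="Year"),
)

mask = demo.index <= 2022   # one True/False value per row
complete = demo[mask]       # keep only the rows where the mask is True

print(mask)
print(complete)
```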

In [None]:
df_complete_years.plot.box()

In [None]:
df_complete_years.plot.bar(y='Total')

In [None]:
df_by_year.plot.pie(y='Total')

In [None]:
df_by_year.to_numpy()

#### Stylising the line chart

[pandas.DataFrame.plot API reference](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)

In [None]:
pd.options.plotting.backend = "matplotlib"
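# With the matplotlib backend, .plot() returns an Axes object
# (named "ax" here so it is not confused with the usual matplotlib.pyplot alias "plt")
ax = df_complete_years.plot(
    title="Number of registered and living cows in Switzerland",
    xlabel="Period (Year)",
    ylabel="Total of cows (Thousands)"
)
fig = ax.get_figure()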
fig.savefig('RegisteredCowsPerYearSwitzerland.pdf')

In [None]:
import plotly.express as px

# Use the Plotly backend in pandas
pd.options.plotting.backend = "plotly"

# Plot the yearly totals as an interactive line chart using Plotly
fig = df_complete_years.plot(
    title="Number of registered and living cows in Switzerland",
    labels={
        'value': 'Total of cows (Thousands)',
        'Year': "Year",
        }
)
# Show the plot
fig.show()

**Note**: Cell outputs are saved together with the notebook. Unless you want to keep them, it's best to clear all outputs before saving, especially if you are uploading the notebook to git.

`Kernel` -> `Restart Kernel and Clear Outputs of All Cells...`
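
Behind the scenes, a `.ipynb` file is JSON in which every code cell stores its `outputs` and `execution_count`. The sketch below shows roughly what clearing outputs amounts to. `strip_outputs` is a hypothetical helper written for illustration and the notebook here is a tiny in-memory stand-in. In practice the menu entry above is the simpler route.

```python
import json


def strip_outputs(notebook):
    """Clear outputs and execution counts from a notebook loaded as a dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# Tiny in-memory stand-in for the contents of a .ipynb file
raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
            "source": ["print('hi')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

cleaned = strip_outputs(json.loads(raw))
print(json.dumps(cleaned, indent=1))
```

Markdown cells are left untouched, since they have no outputs to clear.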

In [3]:
print("This is terrible work, please redo")

This is terrible work, please redo
