# Meise Crop Source Investigations Tutorials

Welcome to the MCSI tutorials and workshop space. This directory contains several notebooks with examples of how to process data in light of the European Union Deforestation Regulation (EUDR).

## Coffee bean weight distribution

We will start at the beginning. How much does one coffee bean weigh, and how does this vary between coffee varieties or species? To answer these questions with Python, the first step is to load the data into memory. For small datasets like these, all you need is pandas, a Python library inspired by the R `data.frame`.

In [None]:
# Import the pandas package
import pandas as pd
# Read in the csv (comma separated values) document
bean_sizes = pd.read_csv('data/md_coffee_beans.csv', decimal=',')
# Display the data set
bean_sizes

In [None]:
# Read in the csv with sample information
samples = pd.read_csv('data/coffee_samples.csv', decimal=',', index_col=0)
# Display the data set
samples

In [None]:
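# Import seaborn for statistical plots
import seaborn as sns
# Histogram of the caffeinated bean weights with a kernel density estimate
# overlaid; kde=True scales the curve to match the histogram counts
ax = sns.histplot(bean_sizes["Caffeinated"], kde=True)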

In [None]:
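# Import pingouin for statistical tests
import pingouin as pg
# Two-sample t-test comparing the mean bean weight of two samples
pg.ttest(bean_sizes["Swiss water"], bean_sizes["Caffeinated"])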

In [None]:
# Stack dataframe for group comparisons
bean_sizes_stacked = bean_sizes.stack().reset_index().drop('level_0', axis=1).rename(
    {'level_1':'name',0:'weight'},axis=1
).join(samples, on='name')
bean_sizes_stacked
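
To see what the stacking step does, here is a minimal sketch on a made-up table. The sample names, weights, and metadata below are hypothetical and are not taken from the tutorial data; the only assumption is that the real files follow the same layout, with one column per sample in the measurement table and one row per sample in the metadata table.

```python
import pandas as pd

# Hypothetical toy data: one column per sample, one row per measured bean
toy_weights = pd.DataFrame({"sample_a": [0.15, 0.17], "sample_b": [0.12, 0.14]})
# Hypothetical sample metadata, indexed by sample name
toy_samples = pd.DataFrame({"Roasted": ["yes", "no"]}, index=["sample_a", "sample_b"])

toy_long = (
    toy_weights.stack()               # wide -> long: one row per (bean, sample) pair
    .reset_index()                    # index levels become columns level_0 and level_1
    .drop("level_0", axis=1)          # the original row number is not needed
    .rename({"level_1": "name", 0: "weight"}, axis=1)
    .join(toy_samples, on="name")     # attach the metadata of each sample
)
print(toy_long)
```

Each row of the result holds one weight together with the name and metadata of the sample it came from, which is the long format that grouping and `hue`-based plotting expect.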

In [None]:
# Group the weights by decaffeination method and draw one boxplot per group
bean_sizes_grouped = bean_sizes_stacked.groupby('Decaffeination')
bean_sizes_grouped.boxplot()

In [None]:
# Compare the weight distributions of roasted and unroasted beans
sns.kdeplot(bean_sizes_stacked, x='weight', hue='Roasted')

In [None]:
# One-way ANOVA: does the mean bean weight differ between regions?
pg.anova(data=bean_sizes_stacked, dv='weight', between='Region', detailed=True)

In [None]:
# Test whether the caffeinated bean weights are normally distributed
print(pg.normality(bean_sizes["Caffeinated"]))

In [None]:
bean_sizes.stack().reset_index()

In [None]:
pg.anova?