# Mattermost Lunch Channel History

- Data Source: [Mattermost API](https://api.mattermost.com/), [CCTB instance](https://cctb-intern.biologie.uni-wuerzburg.de/)
- Tasks:
	- Part I - June 2024: retrieving chat history data through the Mattermost API
	- Part II - September 2024: analyzing messages in the lunch channel
	- Part III - September 2024: specific tasks
- Language: [Python](https://www.python.org/)

## Select one of the following tasks

> General comment: your estimate in the first step of each task does not need to be perfect; settle for a heuristic that is good enough.

### Task A - most crowded day of the week
- estimate the total number of people having lunch (or coffee) at the CCTB/mensa for each day → on which day did the most people go to lunch?
- plot the number of people per day over time (also try to summarize by week/month/year)
- plot a boxplot for the number of people per day of the week → what is the most crowded day of the week?
- make the same plot as above, but separately for every month/year → is there a shift in day of the week preference?
- perform a statistical test for the hypothesis: "Mondays and Fridays are less crowded than Tuesday to Thursday"
- discuss caveats of the data and methods used
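
One possible starting point for Task A is sketched below. It is an illustration under assumptions, not a reference solution: the column names `create_at` (Unix epoch in milliseconds) and `user_id` are assumed and may not match `messages.csv`, the "distinct users who posted that day" count is only one of many possible heuristics, and a simple one-sided permutation test on the mean stands in for whichever statistical test you eventually choose. The helper names `people_per_day` and `permutation_test_less` are hypothetical.

```python
import numpy as np
import pandas as pd


def people_per_day(messages: pd.DataFrame) -> pd.Series:
    """Heuristic: number of distinct users posting in the channel per calendar day."""
    # Assumed columns: `create_at` (Unix epoch in ms) and `user_id`.
    day = pd.to_datetime(messages["create_at"], unit="ms").dt.normalize()
    return messages.groupby(day)["user_id"].nunique()


def permutation_test_less(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(a) < mean(b); returns a p-value."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        diff = shuffled[: a.size].mean() - shuffled[a.size:].mean()
        hits += int(diff <= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Tiny made-up example, for illustration only (not real channel data).
stamps = ["2024-06-03 11:40", "2024-06-03 11:45", "2024-06-04 11:30",
          "2024-06-04 11:31", "2024-06-04 11:50", "2024-06-05 12:00"]
toy = pd.DataFrame({
    "create_at": [pd.Timestamp(s).value // 10**6 for s in stamps],  # ns -> ms
    "user_id": ["a", "a", "a", "b", "c", "b"],
})
counts = people_per_day(toy)

weekday = counts.index.dayofweek  # Monday=0 ... Sunday=6
edge_days = counts[weekday.isin([0, 4])]    # Mondays and Fridays
mid_days = counts[weekday.isin([1, 2, 3])]  # Tuesday to Thursday
p_value = permutation_test_less(edge_days, mid_days)
```

With only three toy days the p-value carries no meaning; the point is the shape of the pipeline. On the real data, people who join lunch without posting are invisible to this heuristic, which is one of the caveats worth discussing.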

### Task B - lunch time
- estimate the time of lunch/coffee for each day → when was the most popular time?
    - try to consider proposed times ("mensa at 12?", "11:15?")
    - direct calls ("mensa?", "now")
    - relative times ("lunch in 5min", "mensa in half an hour?")
- plot the lunch time over the years (also try to summarize by week/month/year) → is there a trend (gradual shift or break point(s)) in lunch time?
- plot a boxplot for the lunch time per day of the week → is there a difference in lunch time per day of the week?
- make the same plot as above, but separately for every month/year → is the pattern above consistent over the year(s)?
- perform a statistical test for the hypothesis: "Since 2022, lunch time is later during semester break (April, May, August, September) than during the lecture period"
- discuss caveats of the data and methods used
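
The three kinds of time hints listed above (proposed times, direct calls, relative times) could be handled with a few regular expressions, as in the rough sketch below. Everything here is an assumption made for illustration: the function name `estimate_lunch_time`, the keyword list, the 10:00 to 15:59 plausibility window, and the order in which the patterns are tried. Real messages will need more patterns (German phrasing, typos, replies that shift a proposed time).

```python
import re
from datetime import datetime, timedelta

# Relative times: "lunch in 5min", "mensa in half an hour?"
RELATIVE_MIN = re.compile(r"\bin\s+(\d+)\s*min", re.IGNORECASE)
HALF_HOUR = re.compile(r"\bin\s+(?:a\s+)?half\s+an\s+hour", re.IGNORECASE)
# Proposed clock times: "mensa at 12?", "11:15?"
CLOCK = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\b")
# Direct calls: "mensa?", "now"
DIRECT = re.compile(r"^\s*(?:mensa|lunch|coffee)\s*\?\s*$|\bnow\b", re.IGNORECASE)


def estimate_lunch_time(text, posted_at):
    """Return a guessed lunch time for one message, or None if no hint is found."""
    match = RELATIVE_MIN.search(text)
    if match:
        return posted_at + timedelta(minutes=int(match.group(1)))
    if HALF_HOUR.search(text):
        return posted_at + timedelta(minutes=30)
    for match in CLOCK.finditer(text):
        hour, minute = int(match.group(1)), int(match.group(2) or 0)
        # Assumed plausibility window to skip numbers that are not lunch times.
        if 10 <= hour <= 15 and minute < 60:
            return posted_at.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if DIRECT.search(text):
        return posted_at  # "now": use the time the message was posted
    return None


posted = datetime(2024, 6, 3, 11, 40)
examples = ["mensa at 12?", "11:15?", "mensa?", "now",
            "lunch in 5min", "mensa in half an hour?", "thanks!"]
guesses = {text: estimate_lunch_time(text, posted) for text in examples}
```

Relative patterns are tried first so that the "5" in "in 5min" is not mistaken for a clock time. Taking the first hint per message is a simplification; aggregating several messages per day (for example, using the last proposed time) is a separate design decision.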

### Task C - your own idea
If you have other ideas, feel free to follow them, but create a plan similar to those for Tasks A and B above before you start.

In [3]:
import numpy as np

In [4]:
np.random.seed(42)
" → ".join(np.random.permutation("Dominik Magdalena Felix Joel Robin".split()))

'Magdalena → Robin → Felix → Dominik → Joel'

## 1. Data loading

Load files:
- `messages.csv`
- `reactions.csv`
- `files.csv`

In [5]:
import pandas as pd
import matplotlib.pyplot as plt

In [6]:
messages = pd.read_csv('messages.csv')
reactions = pd.read_csv('reactions.csv')
files = pd.read_csv('files.csv')
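
After loading, it may help to convert timestamp columns to proper datetimes once, so that later grouping by day or weekday is straightforward. The sketch below assumes a column named `create_at` holding Unix epoch milliseconds, which is how the Mattermost API reports post creation times; whether the exported CSV files keep that name and unit is an assumption to verify against the actual files. The helper `load_table` is hypothetical, and the demo uses a made-up in-memory CSV rather than the real data.

```python
import io

import pandas as pd


def load_table(source, time_columns=("create_at",)):
    """Read a CSV and convert the given epoch-millisecond columns to UTC datetimes."""
    table = pd.read_csv(source)
    for column in time_columns:
        # Skip silently if the (assumed) column is not present in this file.
        if column in table.columns:
            table[column] = pd.to_datetime(table[column], unit="ms", utc=True)
    return table


# Made-up one-row CSV standing in for messages.csv, for illustration only.
demo_csv = io.StringIO("create_at,user_id\n1717414800000,a\n")
demo = load_table(demo_csv)
```

With the real files this would be called as `load_table('messages.csv')`. Timestamps come out in UTC, so a conversion to local time (for example with `.dt.tz_convert("Europe/Berlin")`) is needed before reasoning about lunch times.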