# 5-1-2: Analysis documentation and open science

In [None]:
# https://github.com/cca-cce/osm-cca/blob/main/jnb/aicasc/module05_ethical_aspects_of_ai_aided_content_analysis_10pct.ipynb
# https://nbviewer.org/github/cca-cce/osm-cca/blob/main/jnb/aicasc/module05_ethical_aspects_of_ai_aided_content_analysis_10pct.ipynb


Open science empowers researchers to produce **reproducible and transparent workflows** by storing data, code, and results in accessible, shareable formats. By using **cloud platforms and version control systems**, teams can collaborate efficiently across institutions and time zones. Framing analyses to support **peer review and replication** not only strengthens research integrity but also accelerates discovery. Embracing open science means aligning **computational practices with core values of openness, accountability, and collaboration**.


## What are Jupyter notebooks?

Jupyter Notebooks are powerful tools that support **open, reproducible, and collaborative scientific work**:

* 🧪 **Combine code and narrative** in one document to explain methods alongside results.
* 📊 **Visualize data interactively** using plots and widgets to explore findings in real time.
* 🔁 **Enable reproducibility** by sharing executable workflows with embedded code and outputs.
* 🌐 **Collaborate easily** via platforms like GitHub, Google Colab, or [JupyterLab](https://jupyter.org/try-jupyter/lab/).
* 📄 **Export to multiple formats** (HTML, PDF, Markdown) for publication or peer review.

In [None]:
# https://seaborn.pydata.org/examples/multiple_regression.html

## Notebook research example

**_Introduction_**

This study uses the **Palmer Penguins dataset** to investigate variation in bill morphology across species. By visualizing the relationship between **bill length** and **bill depth**, we aim to identify potential structural differences that may reflect ecological divergence.

**_Research Question_**

How does the relationship between bill length and bill depth differ across penguin species?

**_Data analysis_**


In [None]:
import seaborn as sns
sns.set_theme()

# Load the penguins dataset
penguins = sns.load_dataset("penguins")

# Plot bill_length_mm as a function of bill_depth_mm
g = sns.lmplot(
    data=penguins,
    x="bill_length_mm", y="bill_depth_mm", hue="species",
    height=5
)

# Use more informative axis labels than are provided by default
g.set_axis_labels("Snoot length (mm)", "Snoot depth (mm)")

The code begins with `import seaborn as sns` and `sns.set_theme()`, which applies a consistent visual style to all subsequent plots. `sns.load_dataset("penguins")` then loads the Palmer Penguins dataset. The dataset contains some missing values, so it is worth checking for them before plotting.
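The missing-value check mentioned above can be sketched with pandas. The small table here is a made-up stand-in for the penguins data: the column names match the dataset, but the values are invented for illustration so the example runs without downloading anything.

```python
import pandas as pd

# Toy stand-in for the penguins table (values are invented, not real measurements)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [40.0, None, 46.0, 50.0],
    "bill_depth_mm": [18.0, None, 14.0, None],
})

# Count missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Keep only rows where both plotted variables are present
plot_columns = ["bill_length_mm", "bill_depth_mm"]
complete = toy.dropna(subset=plot_columns)
print(f"{len(toy) - len(complete)} of {len(toy)} rows have a missing bill measurement")
```

The same two calls, `isna().sum()` and `dropna(subset=...)`, should apply unchanged to the DataFrame returned by `sns.load_dataset("penguins")`.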

The comment above the plot, `# Plot bill_length_mm as a function of bill_depth_mm`, reverses the roles of the two variables. The call puts `bill_length_mm` on the x-axis and `bill_depth_mm` on the y-axis, so the plot shows bill depth as a function of bill length, and the comment should be updated to say so.

The line `sns.lmplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species", height=5)` is well constructed and makes good use of `hue` to distinguish between species in the regression plot. The regression lines help illustrate group differences in morphology.

Finally, `g.set_axis_labels("Snoot length (mm)", "Snoot depth (mm)")` improves readability, though the use of "Snoot" is informal. For scientific or professional use, it's better to stick with "Bill length" and "Bill depth".

Overall, the code is effective for exploratory data visualization, but it would benefit from clearer commenting and more precise axis labeling.
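To put numbers on the research question, the per-species trend lines can also be estimated directly. The sketch below fits an ordinary least-squares line for each group with `numpy.polyfit`. The helper name `slopes_by_group` and the toy values are assumptions made for illustration, not part of the original notebook or of seaborn.

```python
import numpy as np
import pandas as pd

def slopes_by_group(df, x, y, group):
    """Fit y = slope * x + intercept separately for each value of `group`."""
    rows = {}
    # Drop rows that lack any value the fit needs
    clean = df.dropna(subset=[x, y, group])
    for name, part in clean.groupby(group):
        slope, intercept = np.polyfit(part[x].to_numpy(), part[y].to_numpy(), deg=1)
        rows[name] = {"slope": slope, "intercept": intercept, "n": len(part)}
    return pd.DataFrame.from_dict(rows, orient="index")

# Invented values with the same column names as the penguins dataset
toy = pd.DataFrame({
    "species": ["A", "A", "A", "B", "B", "B", "B"],
    "bill_length_mm": [40.0, 42.0, 44.0, 46.0, 48.0, 50.0, None],
    "bill_depth_mm": [18.0, 19.0, 20.0, 15.0, 14.5, 14.0, 15.0],
})

result = slopes_by_group(toy, "bill_length_mm", "bill_depth_mm", "species")
print(result)
```

A slope that differs in sign or size between species is the numerical counterpart of regression lines that diverge in the plot. On real data the estimates would also need confidence intervals before any conclusion is drawn.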


## Download Jupyter Notebook

In [None]:
# use File -> Download -> .ipynb
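A downloaded `.ipynb` file is a plain JSON document whose `cells` list holds the markdown and code cells, which is part of what makes notebooks easy to share and inspect. The sketch below reads such a file with the standard library only. The helper `count_cell_types` and the hand-written minimal notebook are illustrative assumptions; a real notebook carries more metadata.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

def count_cell_types(path):
    """Return how many cells of each type an .ipynb file contains."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    return Counter(cell["cell_type"] for cell in notebook["cells"])

# Minimal hand-written notebook in the nbformat 4 layout, used as a stand-in
minimal_notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Write the stand-in to a temporary file, then read it back like a download
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.ipynb"
    path.write_text(json.dumps(minimal_notebook), encoding="utf-8")
    counts = count_cell_types(path)

print(dict(counts))
```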

# 5-1-3: Communicating AI-aided content analysis

In [None]:
# https://seaborn.pydata.org/examples/grouped_boxplot.html

In [None]:
import seaborn as sns
sns.set_theme(style="ticks", palette="pastel")

# Load the example tips dataset
tips = sns.load_dataset("tips")

# Draw a nested boxplot to show bills by day and smoker status
sns.boxplot(x="day", y="total_bill",
            hue="smoker", palette=["m", "g"],
            data=tips)
sns.despine(offset=10, trim=True)

## Upload Jupyter Notebook

In [None]:
# use Left menu -> Files -> Upload to session storage

## Install Quarto publishing system

In [None]:
# https://quarto.org/

In [None]:
#!curl -s https://api.github.com/repos/quarto-dev/quarto-cli/releases/latest | jq -r '.assets[] | select(.name | test("linux-amd64\\.deb$")) | .browser_download_url' | head -n 1 | xargs -n 1 curl -LO


In [None]:
#!sudo dpkg -i quarto-*-linux-amd64.deb


In [None]:
#!quarto --version
#!jupyter --version


In [None]:
#!quarto install tinytex

## Convert Notebook to publication

In [None]:
#!quarto convert /content/module05_ethical_aspects_of_ai_aided_content_analysis_10pct.ipynb

```
---
title: "Your Document Title"
author: "Your Name"
date: today
format:
  pdf:
    papersize: a4
    toc: true
    number-sections: false
  html:
    embed-resources: true
    toc: true
    number-sections: false
  docx:
    toc: true
    number-sections: false
---
```

In [None]:
#!quarto render /content/module05_ethical_aspects_of_ai_aided_content_analysis_10pct.qmd

In [None]:
#!quarto render module05_ethical_aspects_of_ai_aided_content_analysis_10pct.ipynb --to pdf
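Rendering to several formats can also be scripted from Python instead of typing one shell command per format. The sketch below builds one `quarto render <file> --to <format>` command for each output format and runs them only when Quarto and the source file are both present. The function names and the file name `notebook.qmd` are placeholders chosen for illustration.

```python
import shutil
import subprocess
from pathlib import Path

def build_render_commands(source, formats=("pdf", "html", "docx")):
    """Return one `quarto render <source> --to <format>` argument list per format."""
    return [["quarto", "render", str(source), "--to", fmt] for fmt in formats]

def render_all(source, formats=("pdf", "html", "docx")):
    """Run each render command and collect exit codes; skip if prerequisites are missing."""
    if shutil.which("quarto") is None or not Path(source).exists():
        return []  # Quarto is not installed or the file is absent, so do nothing
    return [
        subprocess.run(cmd, check=False).returncode
        for cmd in build_render_commands(source, formats)
    ]

# Show the commands without running them ("notebook.qmd" is a placeholder name)
commands = build_render_commands("notebook.qmd")
for cmd in commands:
    print(" ".join(cmd))
```

Keeping command construction separate from execution lets the commands be checked before anything is rendered.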


In [None]:
#!rm -rf module05*