dataWave: the data processing notebook

Metousiosis | Transitioning from a raw Excel export to a refined synthesized dataset that powers the dataWave web app backend.

Waves breaking on a Lee Shore at Margate (W. Turner, 1840). Tate Modern, London. (Photography: Betty Saunders)

This notebook details the data-processing pipeline that produces the data behind this contribution to the Pacific Dataviz Challenge 2025. If you're curious about the submission:

Live demo.

Features

Data structuring: organizes records into logical groups using sentinel flags
Multiple layers of data cleaning
Data reshaping: converts wide-format data into long format
Reciprocal transformation: applies a reciprocal conversion to values that exhibit inverse relationships
Value scaling: normalizes raw values for consistency and easy comparability across groups
Data quality checks
Interactive analytics visualization
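
To make the features above more concrete, here is a minimal pandas sketch of how such steps could fit together. It is not the notebook's actual code: the sample table, the column names (label, group, indicator, country, value), the LOWER_IS_BETTER set, the choice of "a row with no values" as the sentinel flag, and min-max as the scaling method are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format extract: one column per country.
# Assumption: a group header row carries no values, which acts as the sentinel flag.
raw = pd.DataFrame({
    "label": ["Health", "mortality_rate", "life_expectancy"],
    "A": [np.nan, 10.0, 70.0],
    "B": [np.nan, 20.0, 75.0],
    "C": [np.nan, 40.0, 80.0],
})
countries = ["A", "B", "C"]

# 1. Sentinel flags: mark header rows, propagate their label down as the group name.
is_header = raw[countries].isna().all(axis=1)
raw["group"] = raw["label"].where(is_header).ffill()
data = raw.loc[~is_header].rename(columns={"label": "indicator"})

# 2. Reshaping: wide format (one column per country) to long format (one row per value).
long = data.melt(
    id_vars=["group", "indicator"],
    value_vars=countries,
    var_name="country",
    value_name="value",
)

# 3. Reciprocal transformation for indicators where a lower raw value is better
#    (hypothetical set). Zeros become NaN to avoid division by zero.
LOWER_IS_BETTER = {"mortality_rate"}
inverse = long["indicator"].isin(LOWER_IS_BETTER)
safe_value = long["value"].replace(0, np.nan)
long["adjusted"] = np.where(inverse, 1.0 / safe_value, long["value"])


def min_max(series: pd.Series) -> pd.Series:
    """Scale a series to the [0, 1] range; constant series map to 0."""
    span = series.max() - series.min()
    return (series - series.min()) / span if span else series * 0.0


# 4. Value scaling: normalize within each indicator so groups are comparable.
long["scaled"] = long.groupby("indicator")["adjusted"].transform(min_max)

# 5. Basic quality checks: values stay in range, no duplicated observations.
assert long["scaled"].between(0, 1).all()
assert not long.duplicated(["indicator", "country"]).any()

print(long[["group", "indicator", "country", "scaled"]])
```

After the reciprocal step, a higher adjusted value means "better" for every indicator, so the scaled columns can be compared directly across groups.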

Requirements

Python 3.10+
JupyterLab or Jupyter Notebook, or an IDE with the Jupyter extension
Recommended: virtual environment (venv or conda)

Installation

Clone the repository:

git clone https://github.com/brooks-code/dataWave-data-processing-notebook.git
cd dataWave-data-processing-notebook

Create and activate a virtual environment (venv example):

python -m venv .venv
# macOS / Linux
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

Install dependencies:

pip install -r requirements.txt

or just run:

pip install jupyterlab notebook pandas numpy ipywidgets

If you already have Jupyter available on your system, you can also install the dependencies straight from the notebook by uncommenting and running this cell:

# Uncomment this line
#%pip install pandas numpy ipywidgets
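
Whichever installation route you choose, a quick sanity check can confirm that the environment matches the requirements listed above. The snippet below is an optional sketch, not part of the notebook; the helper name missing_packages is hypothetical.

```python
import sys
from importlib.util import find_spec

# Packages named in the install commands of this README.
REQUIRED = ["pandas", "numpy", "ipywidgets"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


if sys.version_info < (3, 10):
    print("Warning: Python 3.10+ is required, found", sys.version.split()[0])

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```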

Usage

Start Jupyter from the terminal:

# JupyterLab
jupyter lab

# or classic Notebook
jupyter notebook

In the browser, open the notebook file (e.g., dataWave_processing.ipynb).
Run cells in order:

Use Kernel -> Restart & Run All to execute the entire notebook from a clean state.

Contributing

Fork the repository.
Create a feature branch (git checkout -b feature/your-feature).
Commit your changes (git commit -m "Add …").
Push and open a Pull Request.

Please verify that:

All new features are documented in this README.

Acknowledgements

The Pacific Dataviz team! 非常感谢你 (fēi cháng gǎn xiè nǐ, "thank you very much").

License

This project is released into the public domain under the Unlicense. See the LICENSE file for details. The source dataset remains the property of its original owner and is subject to the license that owner provides.
