Metousiosis | Transitioning from a raw Excel export to a refined synthesized dataset that powers the dataWave web app backend.
Waves breaking on a Lee Shore at Margate (W. Turner, 1840). Tate Modern, London. (Photography: Betty Saunders)
This notebook details the data-processing pipeline that produces the data behind this contribution to the Pacific Dataviz Challenge 2025. If you're curious about the submission:
- Live demo.
Contents - click to expand
- Data structuring with logical groups by using sentinel flags
- Multiple layers of data cleaning
- Data reshaping: convert wide-format data into long format
- Reciprocal transformation: apply reciprocal conversion to values exhibiting inverse relationships
- Value scaling: normalize raw values for consistency and easy comparability across groups
- Data quality checks
- Interactive analytics visualization
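To make the reshaping, reciprocal and scaling steps listed above more concrete, here is a minimal sketch using pandas. It is not the notebook's actual code: the column names (`country`, `indicator`, `value`, `scaled`), the sample figures, the `INVERSE_INDICATORS` set and the helper functions are all hypothetical, and min-max scaling is only one plausible reading of "value scaling". The sentinel-flag grouping and cleaning layers are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format table: one row per country, one column per indicator.
wide = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "literacy_rate": [90.0, 80.0, 70.0],     # higher is better
        "infant_mortality": [10.0, 20.0, 40.0],  # lower is better
    }
)

# Assumption: indicators where a lower raw value means a better outcome.
INVERSE_INDICATORS = {"infant_mortality"}


def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape wide to long: one row per (country, indicator) pair."""
    return df.melt(id_vars="country", var_name="indicator", value_name="value")


def apply_reciprocal(df: pd.DataFrame) -> pd.DataFrame:
    """Replace value with 1/value for inverse indicators (zeros become NaN)."""
    out = df.copy()
    mask = out["indicator"].isin(INVERSE_INDICATORS)
    safe = out.loc[mask, "value"].replace(0, np.nan)
    out.loc[mask, "value"] = 1.0 / safe
    return out


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Scale values to [0, 1] within each indicator group."""
    out = df.copy()
    grouped = out.groupby("indicator")["value"]
    lo, hi = grouped.transform("min"), grouped.transform("max")
    span = (hi - lo).replace(0, np.nan)  # a constant group yields NaN
    out["scaled"] = (out["value"] - lo) / span
    return out


long_df = min_max_scale(apply_reciprocal(to_long(wide)))

# Basic quality checks: expected row count and scaled values within bounds.
assert len(long_df) == len(wide) * (wide.shape[1] - 1)
assert long_df["scaled"].dropna().between(0, 1).all()
```

After the reciprocal step, a higher value means a better outcome for every indicator, so the scaled scores can be compared across groups in the same direction.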
- Python 3.10+
- jupyterlab/notebook or an IDE with the jupyter extension
- Recommended: virtual environment (venv or conda)
- Clone the repository:
git clone https://github.com/brooks-code/dataWave-data-processing-notebook.git
cd dataWave-data-processing-notebook
- Create and activate a virtual environment (venv example):
python -m venv .venv
# macOS / Linux
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
- Install dependencies:
pip install -r requirements.txt
or just run:
pip install jupyterlab notebook pandas numpy ipywidgets
If you already have Jupyter notebooks available on your system, you can also install the dependencies straight from the notebook by uncommenting and running this cell:
# Uncomment this line
#%pip install pandas numpy ipywidgets
- Start Jupyter, from the terminal:
# JupyterLab
jupyter lab
# or classic Notebook
jupyter notebook
- In the browser, open the notebook file (e.g., dataWave_processing.ipynb).
- Run cells in order: use Kernel -> Restart & Run All to execute the entire notebook from a clean state.
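Before running the whole notebook, it can help to confirm that the environment matches the requirements listed earlier (Python 3.10+ and the installed packages). The following is a small sketch of such a check; the helper names `python_ok` and `missing_packages` are illustrative and not part of the repository.

```python
import importlib.util
import sys

# Requirements as stated in this README.
REQUIRED_PYTHON = (3, 10)
REQUIRED_PACKAGES = ("pandas", "numpy", "ipywidgets")


def python_ok(version=None, minimum=REQUIRED_PYTHON) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def missing_packages(names=REQUIRED_PACKAGES) -> list[str]:
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages() or "none")
```

If any package is reported as missing, re-run the install step in the active virtual environment before restarting the kernel.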
- Fork the repository.
- Create a feature branch (git checkout -b feature/your-feature).
- Commit your changes (git commit -m "Add …").
- Push and open a Pull Request.
Please verify that:
- All new features are documented in this README.
The Pacific Dataviz team! 非常感谢你 (fēi cháng gǎn xiè nǐ, "thank you very much").
This project is released into the public domain under the Unlicense. See the LICENSE file for details. The source dataset remains the property of its original owner and is subject to the license that owner provides.