This repository contains a Python script (execute.py) for processing data, an example data file (data.csv), and a GitHub Actions workflow for continuous integration and deployment.
- execute.py: A Python script that reads data.csv, processes it using Pandas, and outputs the result to result.json.
- data.csv: The input data file, converted from data.xlsx.
- .github/workflows/ci.yml: GitHub Actions workflow definition for linting, execution, and deployment.
- index.html: A single-file responsive HTML application demonstrating a simple web page.
- LICENSE: The MIT License for this project.
To set up and run the data processing script locally, follow these steps:
- Clone the repository: `git clone <repository-url>`, then `cd <repository-name>`.
- Create a virtual environment (recommended): `python -m venv venv`, then activate it with `source venv/bin/activate` (on Windows: `venv\Scripts\activate`).
- Install dependencies: this project requires pandas and ruff (for linting). Run `pip install pandas==2.3.0 ruff`. Note: Python 3.11+ is required.
- Run the data processing script: `python execute.py`. This generates a result.json file in the project root directory containing the processed data.
- Run the linter (optional): `ruff check .`
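Before running the script, it can help to confirm that the interpreter and pandas match the versions stated in the setup steps. The following is a minimal sketch of a hypothetical helper (for example saved as check_env.py); the file name and the `check_environment` function are illustrative and not part of this repository. It only reports problems and does not install anything.

```python
import sys
from importlib import metadata

# Versions taken from this README: Python 3.11+ and the pinned pandas==2.3.0.
REQUIRED_PYTHON = (3, 11)
REQUIRED_PANDAS = "2.3.0"


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means the setup looks OK."""
    problems = []
    if sys.version_info < REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    try:
        installed = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        problems.append("pandas is not installed")
    else:
        # The install step pins an exact version, so compare exactly.
        if installed != REQUIRED_PANDAS:
            problems.append(f"pandas {REQUIRED_PANDAS} expected, found {installed}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK.")
```

Run it inside the activated virtual environment so it inspects the same interpreter and packages that `python execute.py` will use.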
The execute.py script performs the following actions:
- Reads data.csv into a Pandas DataFrame.
- Converts the 'Value' column to a numeric type, coercing unparseable entries to NaN and filling NaN with 0 to ensure robust processing.
- Groups the data by 'Category' and calculates the sum of 'Value' for each category.
- Outputs the aggregated data as a JSON file named result.json.
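The processing steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual execute.py: the function names `aggregate_by_category` and `main` are hypothetical, and the output shape (a JSON object mapping each category to its summed value) is an assumption, since the exact layout of result.json is not specified here.

```python
import json
from pathlib import Path

import pandas as pd


def aggregate_by_category(df: pd.DataFrame) -> dict[str, float]:
    """Sum 'Value' per 'Category', treating non-numeric entries as 0."""
    # Unparseable values become NaN, then NaN is replaced with 0.
    values = pd.to_numeric(df["Value"], errors="coerce").fillna(0)
    totals = values.groupby(df["Category"]).sum()
    # Convert NumPy scalars to plain Python floats so json.dump can serialize them.
    return {str(cat): float(total) for cat, total in totals.items()}


def main(src: Path = Path("data.csv"), dst: Path = Path("result.json")) -> None:
    """Read the CSV, aggregate it, and write the result as JSON."""
    df = pd.read_csv(src)
    result = aggregate_by_category(df)
    with dst.open("w", encoding="utf-8") as fh:
        json.dump(result, fh, indent=2)
```

To use this as a script, call `main()` from an `if __name__ == "__main__":` guard. Keeping the aggregation in its own function makes it testable on an in-memory DataFrame without touching data.csv.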
The original execute.py could fail at runtime when the 'Value' column contained non-numeric data or when an expected column was missing. The fix consists of:
- Column Existence Check: Explicitly checking for the presence of 'Category' and 'Value' columns before proceeding.
- Robust Type Conversion: Using `pd.to_numeric(df['Value'], errors='coerce').fillna(0)` to handle non-numeric values in the 'Value' column gracefully: they are converted to NaN, and NaN is then replaced with 0, so data type errors no longer crash the script.
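The two parts of the fix can be isolated into small helpers. This is a sketch under stated assumptions: `validate_columns` and `clean_values` are hypothetical names, and raising `ValueError` with a descriptive message is one possible design, not necessarily what execute.py does.

```python
import pandas as pd

# Columns the processing step depends on, per this README.
REQUIRED_COLUMNS = ("Category", "Value")


def validate_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> None:
    """Raise a clear error instead of a bare KeyError later in processing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"data.csv is missing required column(s): {', '.join(missing)}")


def clean_values(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'Value' coerced to numbers; unparseable entries become 0."""
    out = df.copy()
    out["Value"] = pd.to_numeric(out["Value"], errors="coerce").fillna(0)
    return out
```

Checking columns up front turns an opaque `KeyError` into a message that names the missing column. Note that filling with 0 also hides bad rows silently; if that matters, counting the NaN values before `fillna(0)` and logging the count keeps the coercion visible.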
The .github/workflows/ci.yml file defines a GitHub Actions workflow that automatically runs on every push to the main branch.
Workflow Steps:
- Checkout Repository: Fetches the code from the repository.
- Set up Python 3.11: Configures the environment with Python 3.11.
- Install Dependencies: Installs ruff and pandas==2.3.0.
- Run Ruff Linter: Executes `ruff check .` to lint the Python code; the linting results appear in the CI log.
- Execute script and generate result.json: Runs `python execute.py > result.json` to process the data and create the output file.
- Setup Pages: Configures the GitHub Pages environment.
- Upload artifact: Uploads result.json as a GitHub Pages artifact.
- Deploy to GitHub Pages: Deploys the result.json artifact to GitHub Pages, making it accessible at https://<your-username>.github.io/<repository-name>/result.json.
This ensures that result.json is always up-to-date with the latest data processing logic and inputs, and is published for easy access, without being committed directly into the repository.
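Once deployed, the published file can be consumed from any Python client. The sketch below assumes the URL pattern given above and that result.json contains a JSON object; `pages_result_url` and `fetch_result` are hypothetical helper names. `fetch_result` needs network access and a successful deployment.

```python
import json
from urllib.request import urlopen


def pages_result_url(owner: str, repo: str) -> str:
    """Build the GitHub Pages URL for result.json using the pattern in this README."""
    return f"https://{owner}.github.io/{repo}/result.json"


def fetch_result(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Download and parse the published result.json."""
    with urlopen(pages_result_url(owner, repo), timeout=timeout) as resp:
        # json.load reads the response body; bytes input is accepted.
        return json.load(resp)
```

For example, `fetch_result("<your-username>", "<repository-name>")` returns the parsed data after replacing the placeholders with real values.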