Data Processing and CI/CD Pipeline

This repository contains a Python script (execute.py) for processing data, an example data file (data.csv), and a GitHub Actions workflow for continuous integration and deployment.

Project Structure

  • execute.py: A Python script that reads data.csv, processes it using Pandas, and outputs the result to result.json.
  • data.csv: The input data file, converted from data.xlsx.
  • .github/workflows/ci.yml: GitHub Actions workflow definition for linting, execution, and deployment.
  • index.html: A single-file, responsive HTML page included as a simple demo.
  • LICENSE: The MIT License for this project.

Setup and Local Execution

To set up and run the data processing script locally, follow these steps:

  1. Clone the repository:

    git clone <repository-url>
    cd <repository-name>
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: `venv\Scripts\activate`
  3. Install dependencies: This project requires pandas and ruff (for linting).

    pip install pandas==2.3.0 ruff

    Note: Python 3.11+ is required.

  4. Run the data processing script:

    python execute.py

    This will generate a result.json file in the project root directory, containing the processed data.

  5. Run the linter (optional):

    ruff check .

execute.py Details

The execute.py script performs the following actions (a sketch follows the list):

  1. Reads data.csv into a Pandas DataFrame.
  2. Converts the 'Value' column to numeric types, coercing errors to NaN and filling NaN with 0 to ensure robust processing.
  3. Groups the data by 'Category' and calculates the sum of 'Value' for each category.
  4. Outputs the aggregated data as a JSON file named result.json.
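
Based on this description, a minimal sketch of the script's core flow might look as follows (variable names and the exact output layout are assumptions, not the actual contents of execute.py):

    import pandas as pd

    # 1. Read the input data into a DataFrame.
    df = pd.read_csv("data.csv")

    # Guard against missing columns before processing (see the fix described below).
    missing = {"Category", "Value"} - set(df.columns)
    if missing:
        raise ValueError(f"data.csv is missing expected columns: {sorted(missing)}")

    # 2. Coerce 'Value' to numeric; unparsable entries become NaN and are replaced with 0.
    df["Value"] = pd.to_numeric(df["Value"], errors="coerce").fillna(0)

    # 3. Sum 'Value' per 'Category'.
    totals = df.groupby("Category")["Value"].sum()

    # 4. Write the aggregated result as JSON.
    totals.to_json("result.json", indent=2)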

Non-Trivial Error Fix

The original execute.py might have encountered issues with non-numeric data in the 'Value' column or missing expected columns, leading to runtime errors. The fix implemented involves:

  • Column Existence Check: Explicitly checking for the presence of 'Category' and 'Value' columns before proceeding.
  • Robust Type Conversion: Using pd.to_numeric(df['Value'], errors='coerce').fillna(0) to gracefully handle non-numeric values in the 'Value' column by converting them to NaN and then replacing NaN with 0, preventing script crashes due to data type errors.
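
For a quick feel of what the conversion does, here is the same pattern applied to a few made-up values:

    import pandas as pd

    values = pd.Series(["10", "n/a", "2.5"])
    cleaned = pd.to_numeric(values, errors="coerce").fillna(0)
    # "n/a" cannot be parsed as a number, becomes NaN, and is then replaced with 0:
    # 0    10.0
    # 1     0.0
    # 2     2.5
    # dtype: float64
    print(cleaned)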

GitHub Actions CI/CD Workflow

The .github/workflows/ci.yml file defines a GitHub Actions workflow that automatically runs on every push to the main branch.

Workflow Steps:

  1. Checkout Repository: Fetches the code from the repository.
  2. Set up Python 3.11: Configures the environment with Python 3.11.
  3. Install Dependencies: Installs ruff and pandas==2.3.0.
  4. Run Ruff Linter: Executes ruff check . to lint the Python code. This step will show linting results in the CI log.
  5. Execute script and generate result.json: Runs python execute.py > result.json to process the data and create the output file.
  6. Setup Pages: Configures the GitHub Pages environment.
  7. Upload artifact: Uploads result.json as a GitHub Pages artifact.
  8. Deploy to GitHub Pages: Deploys the result.json artifact to GitHub Pages, making it accessible at https://<your-username>.github.io/<repository-name>/result.json.

This keeps result.json up to date with the latest data processing logic and inputs and publishes it for easy access, without committing the file directly to the repository.
