This project provides a backend system for uploading, processing, and classifying BC3 construction project files using a machine learning model. It includes a FastAPI server for file handling and an ML classification service.
- Upload and convert BC3 files to JSON format
- Maintain a registry of uploaded files with metadata
- Classify text nodes in the JSON using a pre-trained ML model
- Serve a simple web frontend to edit classifications for retraining
Start Main Backend (File Processing):

```bash
poetry run backend
# or
poetry run start
# or
poetry run python -m backend.main
```

Server runs on http://localhost:8005
Install Dependencies:

```bash
poetry install
```
This is a dual-backend file processing and ML classification system:

- Main Backend (`backend/main.py`): FastAPI server for BC3 file upload, conversion, ML categorization, and management
- Frontend (`frontend/`): Static HTML/JS/CSS interface
- Tools (`tools/`): BC3 file converter and processing utilities
File Processing Workflow:

- Upload `.bc3` files via `/uploadfile/` with metadata (project_name, localization, email, year)
- Files are assigned sequential codes (C00001, C00002, etc.)
- BC3 files are converted to JSON using `tools/bc3_converter.py`
- Registry maintained in `data/uploads/records.json`
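The sequential-code assignment could work roughly as follows. This is a minimal sketch, not the project's actual code: the function name and the assumption that registry entries carry a `code` field are illustrative.

```python
import json
from pathlib import Path


def next_code(records_path: Path) -> str:
    """Return the next sequential code (C00001, C00002, ...) based on the registry.

    Assumes the registry is a JSON list of records with a "code" field.
    """
    if records_path.exists():
        records = json.loads(records_path.read_text(encoding="utf-8"))
    else:
        records = []
    # Extract the numeric part of existing codes such as "C00042"
    numbers = [int(r["code"][1:]) for r in records if r.get("code", "").startswith("C")]
    return f"C{(max(numbers, default=0) + 1):05d}"
```

With an empty or missing registry this yields `C00001`; otherwise it increments the highest existing code.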
ML Classification Workflow:

- Post-process converted JSON files via `/records/{code}/ml`
- PARTIDA nodes get ML predictions from a Joblib pipeline
- Results stored in the `data/categorized/` directory with `_prediction` objects
- Supports manual labeling via `/records/{code}/label`
- `data/uploads/` - Original .bc3 files and registry
- `data/processed/` - Converted JSON files
- `data/categorized/` - ML-enriched JSON files
- `data/models/` - ML model artifacts
- `frontend/` - Static web interface
- `tools/` - BC3 conversion utilities
Allowed Localizations:
- NAVARRA, PAIS VASCO, ZARAGOZA, CASTILLA Y LEON, MADRID
ML Model Path:
Set the `ML_JOBLIB_MODEL` environment variable (default: `../data/models/linear_ovr_tfidf.joblib`)
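The path resolution presumably looks like this; a minimal sketch, assuming the documented variable name and default (the actual loader code may differ):

```python
import os

# Default from the README; override with the ML_JOBLIB_MODEL env var.
DEFAULT_MODEL = "../data/models/linear_ovr_tfidf.joblib"


def resolve_model_path() -> str:
    """Return the model path from ML_JOBLIB_MODEL, or the documented default."""
    return os.environ.get("ML_JOBLIB_MODEL", DEFAULT_MODEL)


# Loading is lazy, so the server can start even if the model file is missing:
#   import joblib
#   model = joblib.load(resolve_model_path())
```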
POST /uploadfile/?name={name}

- Upload and process files (especially .bc3 files)
- Parameters: `name` (query parameter) - name for the processed file
- Body: `multipart/form-data` with file upload
- Response: JSON with processing status
GET /files/

- List all uploaded and processed files
- Response:

```json
{
  "uploaded_files": ["file1.bc3", "file2.txt"],
  "processed_files": ["file1.json", "file2.json"]
}
```
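The handler for this response shape might simply scan the two storage directories. A sketch under that assumption (directory arguments are illustrative, not the project's actual signature):

```python
from pathlib import Path


def list_files(uploads_dir: Path, processed_dir: Path) -> dict:
    """Assemble the GET /files/ response from the two storage directories."""
    return {
        "uploaded_files": sorted(p.name for p in uploads_dir.iterdir() if p.is_file()),
        "processed_files": sorted(p.name for p in processed_dir.glob("*.json")),
    }
```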
GET /
- Serves main HTML interface
GET /calc.html
- Serves calculator interface
GET /

- Check server status and model loading state
- Response:

```json
{
  "status": "ok",
  "model_loaded": true,
  "model_path": "/path/to/model_bundle.pkl"
}
```
POST /predict

- Classify Spanish text using the loaded ML model
- Request Body:

```json
{
  "text": "Main text to classify",
  "descriptive": "Optional additional descriptive text"
}
```

- Response:

```json
{
  "label": "predicted_category"
}
```

- Error Responses:
  - `503`: Model not loaded
  - `500`: Prediction failed
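The error contract above can be expressed as a plain function, independent of the web framework. This is a sketch of the documented behavior, not the actual handler; the text-joining rule for `descriptive` is an assumption.

```python
from typing import Optional, Tuple


def predict_response(model, text: str, descriptive: Optional[str] = None) -> Tuple[int, dict]:
    """Mirror the documented /predict contract: 503 when no model is loaded,
    500 when prediction fails, 200 with a label otherwise."""
    if model is None:
        return 503, {"detail": "Model not loaded"}
    try:
        # Assumption: descriptive text is appended to the main text before predicting.
        joined = f"{text} {descriptive}" if descriptive else text
        label = model.predict([joined])[0]
    except Exception as exc:
        return 500, {"detail": f"Prediction failed: {exc}"}
    return 200, {"label": str(label)}
```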
```bash
# Health check
curl http://localhost:8001/

# Text prediction
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Texto en español para clasificar", "descriptive": "descripción adicional"}'
```
This project supports a second processing step where a text classification model enriches the converted JSON with predictions. The result is stored separately and the record is marked as processed by ML.
- Upload step:
  - Upload a `.bc3` file via `/upload.html` with required metadata.
  - The backend converts it to JSON using `tools/bc3_converter.py` and stores it under `processed/Cxxxxx.json`.
  - A registry entry is stored in `uploads/records.json`, including metadata and `ml_processed: false` initially.
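A registry entry might look like the following. Only the metadata fields and `ml_processed` are documented above; the exact shape and values here are illustrative.

```json
{
  "code": "C00001",
  "project_name": "Reforma vivienda",
  "localization": "NAVARRA",
  "email": "user@example.com",
  "year": 2024,
  "ml_processed": false
}
```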
- ML step:
  - Trigger per file from the main page using the "Process ML" button, or via the API: `POST /records/{code}/ml`.
  - The backend loads a Joblib pipeline and traverses the processed JSON. For nodes with `concept_type == "PARTIDA"`, it calls the classifier (optionally with `descriptive_text`) and inserts a `_prediction` object:

    ```json
    {
      "predicted_label": "...",
      "predicted_proba": 0.93,
      "topk_labels": ["...", "..."],
      "topk_probas": [0.93, 0.04, 0.03]
    }
    ```

  - The enriched JSON is saved under `categorized/Cxxxxx.json`.
  - The registry is updated to set `ml_processed: true`, `ml_processed_at`, and `categorized_filename`.
  - You can re-run ML at any time via the "Reprocess" button or `POST /records/{code}/ml`.
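The traversal described above might look like this. A sketch, not the project's actual code: node keys such as `text` are assumed, and `classify` stands in for the real pipeline call returning top-k labels and probabilities.

```python
def enrich(node, classify):
    """Recursively walk a converted-JSON tree and attach a _prediction
    object to every node whose concept_type is PARTIDA."""
    if isinstance(node, dict):
        if node.get("concept_type") == "PARTIDA":
            labels, probas = classify(node.get("text", ""), node.get("descriptive_text"))
            node["_prediction"] = {
                "predicted_label": labels[0],
                "predicted_proba": probas[0],
                "topk_labels": labels,
                "topk_probas": probas,
            }
        for value in list(node.values()):
            enrich(value, classify)
    elif isinstance(node, list):
        for item in node:
            enrich(item, classify)
    return node
```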
- Predict endpoint:
  - `POST /predict` body: `{ "text": string, "descriptive"?: string, "topk"?: number }`.
  - Response includes the top-1 label/probability and top-k lists as shown above.
- Model status endpoint:
  - `GET /ml/status` returns `{ model_path, loaded, error? }` and attempts a lazy load of the model.
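The lazy-load-and-report behavior could be sketched like this (illustrative only; the actual service's state handling may differ):

```python
_model = None
_load_error = None


def ml_status(model_path: str) -> dict:
    """Attempt a lazy load of the Joblib model and report the documented
    { model_path, loaded, error? } shape."""
    global _model, _load_error
    if _model is None and _load_error is None:
        try:
            import joblib
            _model = joblib.load(model_path)
        except Exception as exc:
            _load_error = str(exc)
    status = {"model_path": model_path, "loaded": _model is not None}
    if _load_error:
        status["error"] = _load_error
    return status
```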
- Configuring the model:
  - Set the env var `ML_JOBLIB_MODEL` to point to your Joblib pipeline (e.g. TF-IDF + Linear SVM/LogReg). The default path is `../data/models/linear_ovr_tfidf.joblib`.
- Folders used:
  - `uploads/` - original uploads and the `uploads/records.json` registry
  - `processed/` - converted JSON from `.bc3`
  - `categorized/` - ML-enriched JSON
- Run ML for a file with code `C00001`:

```bash
curl -X POST http://localhost:8005/records/C00001/ml
```

- Use the predict endpoint directly:

```bash
curl -X POST http://localhost:8005/predict \
  -H 'Content-Type: application/json' \
  -d '{"text": "Hormigón HA-25 en zapata...", "topk": 3}'
```

- Check model status:

```bash
curl http://localhost:8005/ml/status
```
See `temp/predict_linear.py` for a standalone prediction script using a Joblib model, and `temp/json_ml_cat.py` as a reference for walking a JSON file and calling a predict endpoint.