**Sustainable AI Energy Estimator & Prompt Optimizer
Project Report**

**1. Overview**

This project aims to estimate the energy consumption of running prompts on large language models (LLMs) and to recommend lower-energy alternatives that are semantically equivalent to the original prompt. The system combines natural language processing, supervised-learning energy prediction, anomaly detection, and prompt optimization to support sustainable AI practices.

**2. Application Features**

- Accepts user input: prompt text, number of model layers, known training time, and expected FLOPs per hour.
- Estimates energy consumption (in kWh) for executing the given prompt.
- Recommends semantically equivalent prompts with lower expected energy use.
- Flags anomalies, i.e., cases of unusually high energy consumption.
- Provides a user-friendly GUI built with Streamlit.

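The inputs listed above could be captured in a small validated record before feature extraction. The sketch below is illustrative only: the class and field names are hypothetical, and `baseline_energy_kwh` is a naive analytic cross-check (total FLOPs divided by an assumed hardware efficiency in FLOPs per joule), not the project's supervised predictor. It relies only on the conversion 1 kWh = 3.6 MJ. For example, 10^15 FLOPs/hour for 2 hours at an assumed 10^10 FLOPs/J gives 2 × 10^5 J, or about 0.056 kWh.

```python
from dataclasses import dataclass

# Joules per kilowatt-hour (1 kWh = 3.6 MJ).
JOULES_PER_KWH = 3.6e6


@dataclass(frozen=True)
class PromptJob:
    """User inputs collected by the GUI (field names are hypothetical)."""
    prompt: str
    num_layers: int
    training_time_hours: float
    flops_per_hour: float

    def __post_init__(self) -> None:
        # Reject inputs that cannot produce a meaningful estimate.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.num_layers <= 0:
            raise ValueError("num_layers must be positive")
        if self.training_time_hours < 0 or self.flops_per_hour < 0:
            raise ValueError("time and FLOPs/hour must be non-negative")


def baseline_energy_kwh(job: PromptJob, flops_per_joule: float) -> float:
    """Naive analytic estimate: total FLOPs divided by hardware efficiency.

    flops_per_joule is an assumed hardware efficiency figure,
    not a value taken from this project.
    """
    total_flops = job.flops_per_hour * job.training_time_hours
    joules = total_flops / flops_per_joule
    return joules / JOULES_PER_KWH
```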
**3. System Architecture**

The diagram below shows the high-level system components and the data flow between them:


```text
[User Input: Prompt + Model Parameters]
         ↓
[NLP Module]
   → Token Count
   → Complexity Score
         ↓
[Energy Prediction Model]
   (Supervised Learning: RF / NN)
         ↓
[Anomaly Detection Module]
   (Isolation Forest / One-Class SVM)
   → Flags Unusual Consumption
         ↓
[Prompt Optimizer]
   → Uses Embeddings + Paraphrasing
   → Suggests Low-Energy Equivalent Prompt
         ↓
[Output Layer]
   → Predicted Energy (kWh)
   → Optimized Prompt Suggestion
         ↓
[Streamlit or Flask GUI]
   → Visualizes Results
   → Accepts Next Prompt
```

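The NLP stage at the top of the diagram produces a token count and a complexity score. A minimal sketch of both follows, assuming a regex tokenizer (a model's own subword tokenizer would give different counts) and a hypothetical complexity heuristic that averages normalized word length and lexical diversity; the report does not specify the actual scoring formula.

```python
import re

# Words and standalone punctuation marks: a rough stand-in for a real
# subword tokenizer, which would produce different counts.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def token_count(prompt: str) -> int:
    """Approximate token count using a regex tokenizer."""
    return len(_TOKEN_RE.findall(prompt))


def complexity_score(prompt: str) -> float:
    """Hypothetical heuristic in [0, 1]; higher means more complex.

    Averages two normalized signals:
      - average word length, capped at 10 characters
      - lexical diversity (unique words / total words)
    """
    words = re.findall(r"\w+", prompt.lower())
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    length_signal = min(avg_len, 10.0) / 10.0
    diversity = len(set(words)) / len(words)
    return round((length_signal + diversity) / 2, 4)
```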
Component Descriptions:

| Module | Functionality | Technologies / Libraries |
|---|---|---|
| User Interface | Input collection, result visualization | Streamlit |
| NLP Module | Prompt parsing, complexity scoring, simplification | sentence-transformers, T5/GPT-2, OpenAI embeddings |
| Energy Prediction Engine | Predict energy use from model parameters and prompt features | scikit-learn (Random Forest) |
| Anomaly Detection | Flag abnormal energy consumption patterns | Isolation Forest |
| Optimization & Recommendation | Generate and rank low-energy prompt alternatives | Fine-tuned LLMs, semantic similarity search |
| Data Logging | Record inputs, predictions, and recommendations | SQLite |
    
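The Energy Prediction Engine and Anomaly Detection rows can be sketched with scikit-learn's `RandomForestRegressor` and `IsolationForest`. Everything about the data here is an assumption: the feature order, the synthetic target, and the 2% contamination rate are invented for illustration. The detector is fit on feature rows joined with their energy values, so that unusual (input, energy) combinations are flagged.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic training data, made up for illustration only.
n = 500
X = np.column_stack([
    rng.integers(5, 500, n),       # token_count
    rng.uniform(0.0, 1.0, n),      # complexity_score
    rng.integers(6, 96, n),        # num_layers
    rng.uniform(0.1, 10.0, n),     # training_time_hours
    rng.uniform(0.1, 5.0, n),      # flops_per_hour, in units of 1e15
])
# Hypothetical target (kWh): grows with tokens x layers and with compute, plus noise.
y = 1e-5 * X[:, 0] * X[:, 2] + 0.01 * X[:, 3] * X[:, 4] + rng.normal(0, 0.01, n)

# Energy prediction engine: Random Forest regressor.
predictor = RandomForestRegressor(n_estimators=200, random_state=0)
predictor.fit(X, y)

# Anomaly detector: fit on [features, energy]; contamination is an assumed rate.
detector = IsolationForest(contamination=0.02, random_state=0)
detector.fit(np.column_stack([X, y]))


def estimate(features):
    """Return (predicted_kwh, is_anomalous) for a single feature row."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    kwh = float(predictor.predict(row)[0])
    # IsolationForest.predict returns -1 for outliers and 1 for inliers.
    flag = detector.predict(np.column_stack([row, [[kwh]]]))[0] == -1
    return kwh, bool(flag)
```

With real logged data, the detector would be fit on measured rather than synthetic energy values; at inference time `estimate` feeds it the predicted value.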

**4. Folder Structure**

The project follows a modular and scalable folder organization:


```text
SustainableAI_Project/
├── data/                         # Raw, processed, and synthetic datasets
│   ├── raw/
│   ├── processed/
│   └── synthetic/
│
├── model/                        # Trained models (.pt, .pkl, .h5)
│   ├── energy_predictor/
│   ├── nlp_transformer/
│   ├── anomaly_detector/
│   └── prompt_optimizer/
│
├── src/                          # Source code
│   ├── gui/                      # Frontend app code (Streamlit/Tkinter)
│   ├── nlp/                      # NLP components (parser, simplifier)
│   ├── prediction/               # Energy estimation logic
│   ├── optimization/             # Prompt optimization engine
│   ├── anomaly/                  # Anomaly detection
│   └── utils/                    # Utilities (logging, config)
│
├── reports/                      # Reports, logs, visualizations
├── documentation/                # Architecture and user docs
├── notebooks/                    # Experimental Jupyter notebooks
├── tests/                        # Unit and integration tests
├── requirements.txt              # Python dependencies
├── README.md                     # Project overview
└── run.sh                        # Optional run script
```
    

**5. Progress & Implementation Details**

- Frontend GUI: Developed with Streamlit; supports prompt and parameter input and displays results clearly.
- NLP Module: Token counting and basic complexity scoring implemented; integration with sentence-transformers is in progress.
- Energy Prediction: Baseline Linear Regression and Random Forest models trained on sample data; results verified.
- Anomaly Detection: Prototype Isolation Forest model trained; anomaly flags tested.
- Prompt Optimization: Initial implementation uses sentence embeddings to measure semantic similarity; fine-tuning of a paraphrasing model is planned.
- Logging: SQLite backend set up; prompts and results are logged.

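The SQLite logging described above needs only Python's built-in `sqlite3` module. The table name and columns below are a hypothetical schema, since the report does not document the actual one; an in-memory database is used by default so the sketch runs without touching disk.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the project's actual table layout is not documented.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prompt_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at    TEXT    NOT NULL,
    prompt        TEXT    NOT NULL,
    predicted_kwh REAL    NOT NULL,
    is_anomaly    INTEGER NOT NULL,
    suggestion    TEXT
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_result(conn, prompt, predicted_kwh, is_anomaly, suggestion=None) -> int:
    """Insert one prediction record and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO prompt_log "
            "(created_at, prompt, predicted_kwh, is_anomaly, suggestion) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), prompt,
             predicted_kwh, int(is_anomaly), suggestion),
        )
    return cur.lastrowid
```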
**6. Next Steps**

- Fine-tune a prompt simplification model (T5 or GPT-2) to improve paraphrasing quality.
- Improve the energy prediction model by training it on more diverse data.
- Fully integrate anomaly detection into the GUI to provide user feedback.
- Improve the recommendation engine to rank alternative prompts by both energy use and semantic closeness.
- Prepare comprehensive unit and integration tests.
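
The ranking step in the fourth item could combine both criteria as sketched below: keep candidates whose embedding cosine similarity to the original meets a threshold and whose predicted energy is lower, then sort by predicted energy and break ties by similarity. The 0.85 threshold and the function names are assumptions; `embed` and `predict_kwh` are passed in as callables so that any embedding model or trained predictor can be plugged in.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_alternatives(
    original: str,
    candidates: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    predict_kwh: Callable[[str], float],
    min_similarity: float = 0.85,  # assumed threshold
) -> list[tuple[str, float, float]]:
    """Return (candidate, predicted_kwh, similarity) tuples for candidates that
    are semantically close to the original AND cheaper, cheapest first."""
    ref_vec = embed(original)
    ref_kwh = predict_kwh(original)
    kept = []
    for cand in candidates:
        sim = cosine_similarity(ref_vec, embed(cand))
        kwh = predict_kwh(cand)
        if sim >= min_similarity and kwh < ref_kwh:
            kept.append((cand, kwh, sim))
    # Lowest energy first; higher similarity wins ties.
    return sorted(kept, key=lambda t: (t[1], -t[2]))
```

In this project, `embed` could wrap the `encode` method of a sentence-transformers model, and `predict_kwh` could run the trained Random Forest on the prompt's extracted features.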