A manufacturing analytics dashboard for chemical process optimization. Built with Streamlit to help you understand what's driving yield, catch outliers, and make smarter operational decisions.
Author: Hridesh Singh Chauhan
Purpose: Portfolio project showcasing data science and operational analytics for manufacturing processes.
This dashboard helps you analyze manufacturing process data and answer questions like:
- Which runs are outliers and why?
- What process parameters drive yield?
- Where should we focus improvement efforts?
- What happens if we change temperature by 5°C?
Outlier Detection
- Finds runs with low yield using Isolation Forest
- Shows you what makes outliers different
- Configurable threshold (default: any run below 70% yield)
Predictive Modeling
- Predicts yield based on process parameters
- Uses Random Forest with cross-validation
- Shows which parameters matter most
Strategic Insights
- Cost-yield tradeoff analysis (is higher yield worth the energy cost?)
- Improvement prioritization (where to focus first)
- What-if scenarios (test changes before implementing)
Time Series Analysis
- See how parameters change over time
- Spot trends and patterns
- Identify when outliers occurred
Realistic Energy Modeling
- Energy prices vary by time of day and season
- Ambient temperature affects cooling needs
- More realistic cost calculations
You'll need Python 3.8+. Install the dependencies:
cd "Operational Intelligence Dashboard"
pip install -r requirements.txt
Then run it:
streamlit run dashboard.py
Once the app loads, you'll see:
- KPIs at the top - Average yield, downtime, energy costs, and outlier rate
- Sidebar controls - Adjust simulation settings and outlier thresholds
- Analysis tabs - Dive into different views of your data
- Check out the KPIs to get a quick sense of overall performance
- Scroll down to see time series of your process parameters
- Click on "Strategic Insights" to see improvement opportunities
- Try the "What-If Simulator" to test different scenarios
Four main metrics at the top:
- Average Yield - How it compares to the top 10% of runs
- Downtime Percentage - Whether it's getting better or worse over time
- Total Energy Cost - Along with the average cost per run
- Outliers Detected - Percentage of runs flagged as problematic
Interactive charts showing how each parameter changes over time. You can:
- Switch which parameter is on the X-axis
- See outlier markers on yield plots
- View all your process parameters
Scatter plots to explore how parameters relate to each other. Pick any two parameters and see:
- How they correlate
- Where outliers cluster
- Relationships that might not be obvious
The Strategic Insights section has three tabs:
Cost-Yield Tradeoffs
See the relationship between yield and energy costs. Are you getting good bang for your buck? Which runs are the most efficient?
Improvement Prioritization
Answers: "Where should I focus first?" Shows you:
- Which parameters will give the biggest yield boost
- How hard it would be to improve each one
- Top 3 priorities with specific recommendations
What-If Simulator
Test changes before making them. Adjust sliders for process parameters and see:
- Predicted yield change
- Estimated energy cost impact
- Whether it's worth it
A detailed table of all outlier runs with:
- All their parameter values
- What makes them outliers
- Sortable and filterable columns
The project is organized into a few main files:
Operational Intelligence Dashboard/
├── dashboard.py # The main Streamlit app
├── data_preprocessing.py # Generates synthetic data and calculates metrics
├── model.py # ML models for prediction and outlier detection
├── external_data.py # Simulates energy prices and ambient temperature
└── requirements.txt # What you need to install
dashboard.py - The main interface. Handles all the visualizations and user interactions.
data_preprocessing.py - Creates realistic synthetic data with proper relationships between parameters and yield. Also calculates business metrics.
model.py - Contains the machine learning models:
- Random Forest for yield prediction
- Isolation Forest for outlier detection
- Feature importance calculations
external_data.py - Simulates external factors like energy prices (higher during peak hours, seasonal variation) and ambient temperature.
- Launch the app
- Play with the sidebar settings (try changing the number of samples)
- Adjust the outlier threshold if needed
- Scroll through the different sections
- Go to "Strategic Insights"
- Check "Improvement Prioritization" - this shows you exactly where to focus
- Look at the top 3 priorities and their recommendations
- Use the "What-If Simulator" to test those changes
- Open "What-If Simulator"
- The sliders start at average values (this is your baseline)
- Move a slider (e.g., increase temperature)
- Click "Run Scenario Analysis"
- See predicted yield, energy cost, and whether it's a good idea
The simulator will tell you if your scenario matches the baseline (useful for verifying it's working correctly).
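To make the baseline comparison concrete, here is a minimal sketch of what a scenario check could look like. It is not the code in dashboard.py: `predict_yield` is a made-up linear stand-in for the trained model, and `run_scenario`, the parameter names, and the baseline values are all hypothetical.

```python
import math

# Hypothetical baseline: the average value of each process parameter.
baseline = {
    "temperature": 180.0,
    "pressure": 5.0,
    "catalyst_concentration": 2.0,
    "flow_rate": 10.0,
}


def predict_yield(params):
    """Stand-in for the trained model: a made-up linear response, for illustration only."""
    return (
        40.0
        + 0.2 * params["temperature"]
        + 1.0 * params["pressure"]
        + 2.0 * params["catalyst_concentration"]
        - 0.1 * params["flow_rate"]
    )


def run_scenario(baseline, scenario, predict=predict_yield):
    """Compare a scenario against the baseline and report the predicted yield change."""
    matches = all(math.isclose(baseline[k], scenario[k]) for k in baseline)
    change = predict(scenario) - predict(baseline)
    return {"matches_baseline": matches, "yield_change": change}


# Sliders left untouched: the scenario is the baseline.
unchanged = run_scenario(baseline, dict(baseline))

# Temperature raised by 5 degrees, everything else left alone.
warmer = run_scenario(baseline, {**baseline, "temperature": baseline["temperature"] + 5.0})
```

If the sliders are untouched, `matches_baseline` is true and the predicted change is zero, which is the sanity check described above.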
The app generates synthetic data that mimics real manufacturing processes. It includes:
Process Parameters:
- Temperature, Pressure, Catalyst Concentration, Flow Rate
External Factors:
- Energy prices (varies by time of day and season)
- Ambient temperature (affects cooling needs)
Calculated:
- Energy consumption (based on process and ambient conditions)
- Energy cost (consumption × time-varying price)
- Yield (realistic relationships with process parameters)
The data generation models things like:
- Optimal temperature ranges
- Diminishing returns at high parameter values
- Interactions between parameters
- Realistic yield distributions
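As a rough illustration of those ideas, the sketch below builds a yield function with an optimal temperature, diminishing returns on catalyst concentration, and a temperature-pressure interaction. The function name `synthetic_yield` and every constant in it are assumptions chosen for the example, not the values used in data_preprocessing.py.

```python
import numpy as np


def synthetic_yield(temperature, pressure, catalyst, noise_std=0.0, rng=None):
    """Illustrative yield (%) with an optimum, saturation, and an interaction term."""
    temperature = np.asarray(temperature, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    catalyst = np.asarray(catalyst, dtype=float)

    # Optimal temperature range: the effect peaks at 180 and falls off on both sides.
    temp_effect = np.exp(-((temperature - 180.0) / 15.0) ** 2)

    # Diminishing returns: each extra unit of catalyst adds less than the one before.
    catalyst_effect = 1.0 - np.exp(-catalyst / 2.0)

    # Interaction: pressure only helps when the temperature is near its optimum.
    interaction = temp_effect * (pressure / 10.0)

    y = 40.0 + 30.0 * temp_effect + 20.0 * catalyst_effect + 5.0 * interaction

    if noise_std > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        y = y + rng.normal(0.0, noise_std, size=y.shape)

    # Keep the result a valid percentage.
    return np.clip(y, 0.0, 100.0)


at_optimum = float(synthetic_yield(180.0, 5.0, 2.0))
too_hot = float(synthetic_yield(210.0, 5.0, 2.0))
```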
Machine Learning:
- Isolation Forest finds outliers without needing labeled data
- Random Forest predicts yield and shows feature importance
- Cross-validation ensures the metrics are reliable
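For readers who want to see the Random Forest side in code, here is a small sketch using scikit-learn. The data is made up for the example (yield driven mostly by temperature), and the hyperparameters are assumptions; model.py may be set up differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_runs = 400
feature_names = ["temperature", "pressure", "catalyst_concentration", "flow_rate"]

# Hypothetical features scaled to [0, 1]; yield depends mostly on temperature here.
X = rng.uniform(0.0, 1.0, size=(n_runs, len(feature_names)))
y = 60.0 + 30.0 * X[:, 0] + 5.0 * X[:, 1] + rng.normal(0.0, 1.0, n_runs)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# 5-fold cross-validation: each fold is scored on data the model did not train on.
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2")

# Fit on all runs to read off feature importances (they sum to 1).
model.fit(X, y)
importances = dict(zip(feature_names, model.feature_importances_))
top_feature = max(importances, key=importances.get)
```

Reporting the mean and spread of `cv_r2` instead of a single train-set score is what makes the metric trustworthy.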
Energy Modeling:
- Energy prices are higher during peak hours (9 AM - 5 PM)
- Summer months see price increases
- Ambient temperature affects how much cooling you need
- More realistic than assuming fixed prices
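A time-varying price like this can be sketched in a few lines. The peak window (9 AM - 5 PM) comes from the description above; the multipliers, the base price, and the choice of June through August as "summer" are assumptions for illustration, not the values in external_data.py.

```python
from datetime import datetime

BASE_PRICE = 0.10        # hypothetical base price per kWh
PEAK_MULTIPLIER = 1.5    # hypothetical uplift during peak hours
SUMMER_MULTIPLIER = 1.2  # hypothetical uplift during summer months
SUMMER_MONTHS = (6, 7, 8)  # assumption: northern-hemisphere summer


def energy_price(ts, base_price=BASE_PRICE):
    """Price per kWh at a given timestamp."""
    price = base_price
    if 9 <= ts.hour < 17:          # peak hours: 9 AM up to 5 PM
        price *= PEAK_MULTIPLIER
    if ts.month in SUMMER_MONTHS:  # seasonal increase
        price *= SUMMER_MULTIPLIER
    return price


def energy_cost(consumption_kwh, ts):
    """Cost of a run: consumption multiplied by the price at that time."""
    return consumption_kwh * energy_price(ts)


night_winter = energy_price(datetime(2025, 1, 15, 3))
peak_summer = energy_price(datetime(2025, 7, 15, 12))
```

The same run costs more at midday in July than at 3 AM in January, which is why costs differ between otherwise similar runs.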
Strategic Analysis:
- Looks at actual data to find improvement opportunities
- Considers both impact (how much yield gain) and feasibility (how big is the gap)
- Prioritizes based on what's achievable, not just what's important
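One possible way to combine impact and feasibility into a ranking is sketched below. The scoring formula is an assumption made up for this example: the gap is how far the best runs sit from the average run (in standard deviations), impact is feature importance times that gap, and feasibility shrinks as the gap grows. The app's actual scoring may differ.

```python
import numpy as np


def prioritize(X, y, names, importances, top_fraction=0.1, top_n=3):
    """Rank parameters by a hypothetical impact x feasibility score."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Best runs: the top fraction by yield.
    best_runs = X[y >= np.quantile(y, 1.0 - top_fraction)]

    ranked = []
    for j, name in enumerate(names):
        # Gap between the best runs and the average run, in standard deviations.
        gap = abs(best_runs[:, j].mean() - X[:, j].mean()) / X[:, j].std()
        impact = importances[j] * gap    # bigger gap on an important feature = more gain
        feasibility = 1.0 / (1.0 + gap)  # bigger gap = harder to close
        ranked.append((name, impact * feasibility, gap))

    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked[:top_n]


rng = np.random.default_rng(1)
names = ["temperature", "pressure", "catalyst_concentration", "flow_rate"]
X = rng.uniform(0.0, 1.0, size=(300, 4))
y = 60.0 + 30.0 * X[:, 0] + rng.normal(0.0, 1.0, 300)  # made-up yield
importances = [0.7, 0.15, 0.1, 0.05]                   # hypothetical importances

top3 = prioritize(X, y, names, importances)
```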
Simulation Settings (in the sidebar):
- Number of samples - How many runs to generate (default: 1000)
- Process parameter ranges - Adjust temperature, pressure, etc.
- Energy unit price - Base price for energy calculations
Outlier Detection (in the sidebar):
- Outlier Yield % - Below this threshold, runs are considered outliers (default: 70%)
- For synthetic data, the contamination rate is calculated automatically from this threshold
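Here is a minimal sketch of how a yield threshold could be turned into an Isolation Forest contamination rate. The data, the feature choice, and the clipping bounds are assumptions for the example; only the 70% default comes from the settings above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
n_runs = 500

# Hypothetical synthetic runs: two process parameters and a yield (%).
temperature = rng.normal(180.0, 10.0, n_runs)
pressure = rng.normal(5.0, 0.5, n_runs)
yield_pct = np.clip(
    85.0 - 0.05 * (temperature - 180.0) ** 2 + rng.normal(0.0, 4.0, n_runs),
    0.0,
    100.0,
)

yield_threshold = 70.0  # the sidebar default

# Contamination = share of runs below the threshold, clipped to the range
# IsolationForest accepts for a numeric value, (0, 0.5].
below_threshold = float(np.mean(yield_pct < yield_threshold))
contamination = float(np.clip(below_threshold, 0.01, 0.5))

features = np.column_stack([temperature, pressure, yield_pct])
detector = IsolationForest(contamination=contamination, random_state=42)
labels = detector.fit_predict(features)  # -1 = outlier, 1 = inlier

outlier_rate = float(np.mean(labels == -1))
```

Tying contamination to the threshold keeps the share of flagged runs roughly in line with the share of low-yield runs, without needing labeled data.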
"No suitable feature columns found"
- Make sure the data generation completed. Check the sidebar settings.
"Yield column not found"
- The preprocessing should create this automatically. Try refreshing or check the logs.
Model predictions seem off
- Check that you have enough samples (try increasing in sidebar)
- The model is trained on the synthetic data, so predictions should match those patterns
Energy costs don't make sense
- Energy costs depend on consumption and time-varying prices
- Check that external data integration completed (you should see energy_price and ambient_temperature columns)
- The costs vary by time of day and season, so they won't all be the same
What-if simulator shows different values than baseline
- The sliders start at average values, so if you haven't moved them, it should show baseline
- Make sure you click "Run Scenario Analysis" to see results
- The app uses synthetic data generation - no CSV upload needed
- Outlier detection is based on yield threshold (you can change it)
- Energy costs use realistic time-varying prices
- Strategic insights are calculated from your actual data
- The what-if simulator starts at average values so you can compare
Last Updated: Nov 11th, 2025