# Evaluating Log Reduction Targets (LRTs) and Their Impact on Water Reuse Systems


## Data set selection

> Source:
>
> - U.S. Environmental Protection Agency (EPA), Office of Research and Development
> - Dataset link: https://catalog.data.gov/dataset/balancing-human-health-protection-with-sustainable-design-in-water-reuse-how-do-log-reduct

> Fields:
>
> - Treatment type
> - Pathogen type
> - Log Reduction Target (LRT) level
> - Cost (capital, operational, total)
> - Energy consumption
> - Environmental impact metrics (e.g., GHG emissions)
> - Scenario conditions
> - System configuration
> 

> License:
>
> - Public Use – free for analysis and academic purposes.
> - (License information is provided on the EPA dataset page.)

### Data set selection rationale

> Why did you select this data set?
>
> - I selected this dataset because it focuses on water reuse systems, human health protection, and environmental sustainability, topics that are relevant to modern infrastructure design and public safety. The tables contain structured numeric data suitable for statistical analysis and visualisation. The dataset is also clean, well organized, and published by a reliable scientific source (U.S. EPA), which makes it well suited to learning how to analyse real-world engineering and environmental data.

### Questions to be answered

> Using statistical analysis and visualization, what questions would you like to be able to answer about this dataset?
> This could include questions such as:
>
> - How do different Log Reduction Targets (LRTs) affect total system cost?
> - Is there a relationship between LRT level and environmental impact (e.g., energy use, emissions)?
> - Which treatment methods are the most cost-effective while still meeting required LRT standards?
> - How do operational costs compare with capital costs across different LRTs?
> - Are certain pathogen categories associated with higher system costs or stricter reduction targets?
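
A minimal sketch of how the first two questions could be approached with pandas: average cost per LRT level, and the correlation between LRT level and an environmental metric. The column names (`lrt_level`, `total_cost`, `energy_kwh`) and the small table are hypothetical placeholders, not values or field names taken from the EPA dataset; they would need to be matched to the real file once it is loaded.

```python
import pandas as pd

# Hypothetical placeholder rows; NOT values from the EPA dataset.
# Column names are assumptions and must be matched to the real file.
toy = pd.DataFrame({
    "lrt_level": [4, 4, 6, 6, 8, 8],
    "total_cost": [10.0, 12.0, 15.0, 17.0, 22.0, 24.0],
    "energy_kwh": [1.0, 1.2, 1.6, 1.8, 2.5, 2.7],
})


def mean_cost_by_lrt(df: pd.DataFrame) -> pd.Series:
    """Average total cost for each LRT level."""
    return df.groupby("lrt_level")["total_cost"].mean()


def lrt_correlation(df: pd.DataFrame, column: str) -> float:
    """Pearson correlation between LRT level and another numeric column."""
    return df["lrt_level"].corr(df[column])


cost_by_lrt = mean_cost_by_lrt(toy)
energy_corr = lrt_correlation(toy, "energy_kwh")

print(cost_by_lrt)
print(f"LRT vs. energy correlation: {energy_corr:.2f}")
```

A correlation near 1 would suggest that stricter targets tend to come with higher energy use, although it says nothing about causation and depends on how scenarios are defined in the real data.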


### Visualization ideas

> Provide a few examples of what you plan to visualize to answer the questions you posed in the previous section. In this project, you will be producing 6-8 visualizations. You will also be producing an interactive chart using Plotly.
> Think about what those visualizations could be: what variables are used in the charts? What insights do you hope to gain from them?
>
> For this project, I plan to create 6–8 visualizations, including one interactive chart (using Plotly).
> Here are the visualisations and what they will show:
>
> Bar Chart – Average total system cost by LRT level
> - Variables: LRT level (x), cost (y)
> - Insight: Which LRT levels are most expensive?
>
> Box Plot – Distribution of costs across different treatment methods
> - Insight: Which treatment methods have consistent costs, and which are highly variable?
>
> Line Chart – Environmental impact vs. LRT level
> - Insight: Does requiring stricter treatment dramatically increase environmental burden?
>
> Scatter Plot – Relationship between energy consumption and total cost
> - Insight: Are high-cost systems always high-energy systems?
>
> Stacked Bar Chart – Comparison of capital vs. operational cost across scenarios
> - Insight: Break down the cost structure for different system designs.
>
> Histogram – Distribution of LRT values in the dataset
> - Insight: Which LRT levels are most common?
>
> Heatmap (optional) – Correlation between numerical variables
> - Insight: Which variables move together? This helps identify the strongest relationships.
>
> Plotly interactive visual – LRT vs. cost vs. environmental impact (3D)
> - Insight: A dynamic chart that allows stakeholders to explore trade-offs.
>
> These visualisations will help me demonstrate clear patterns in cost, environmental impact, and treatment effectiveness.
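
As an illustration of the stacked bar chart idea listed above, the following sketch plots mean capital and operational cost per LRT level with pandas and matplotlib. The column names (`lrt_level`, `capital_cost`, `operational_cost`) and all numbers are hypothetical placeholders rather than data from the EPA tables, and grouping by LRT level instead of by scenario is an assumption made to keep the example small.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical placeholder rows; NOT values from the EPA dataset.
# Column names are assumptions and must be matched to the real file.
toy = pd.DataFrame({
    "lrt_level": [4, 4, 6, 6, 8, 8],
    "capital_cost": [6.0, 7.0, 9.0, 10.0, 13.0, 14.0],
    "operational_cost": [4.0, 5.0, 6.0, 7.0, 9.0, 10.0],
})


def plot_cost_by_lrt(df: pd.DataFrame):
    """Stacked bar chart of mean capital and operational cost per LRT level."""
    cost_columns = ["capital_cost", "operational_cost"]
    means = df.groupby("lrt_level")[cost_columns].mean()

    fig, ax = plt.subplots(figsize=(6, 4))
    means.plot(kind="bar", stacked=True, ax=ax)
    ax.set_xlabel("LRT level")
    ax.set_ylabel("Mean cost (units as given in the dataset)")
    ax.set_title("Capital vs. operational cost by LRT level (toy data)")
    fig.tight_layout()
    return fig, means


fig, means = plot_cost_by_lrt(toy)
```

In a notebook the figure renders inline; in a script, `plt.show()` would be needed. The same `groupby` result could feed the plain bar chart of total cost by summing the two columns.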



```python
# 🚀 Import the libraries used for analysis and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```