

---


# **Data Science and Machine Learning for Environmental Engineering**


---



---
# **Module 6**
---

# Before we start

![Data Science Workflow](https://www.tutorialspoint.com/google_colab/images/saving_google_drive.jpg)

---

# Introduction to Data Analysis
---
## Types of Data Analysis Questions
1. **Descriptive** – What happened?
2. **Exploratory** – Are there patterns in the data?
3. **Inferential** – What can we infer about a larger population?
4. **Predictive** – What will happen in the future?
5. **Causal** – What happens if we change something?
6. **Mechanistic** – How exactly does one variable affect another?



---

# Descriptive Analysis

## Objective: Describe a dataset
- Usually the first type of analysis performed on a dataset  
- Commonly applied to **census data**  
- **Description** (what the data show) is distinct from **interpretation** (what the data mean)  
- Descriptions alone **cannot** be generalized without statistical modeling  

📌 Example: Descriptive analysis of environmental pollution levels.



In [1]:
import pandas as pd

# Example: water quality monitoring dataset (Montevideo open data portal)
data = 'https://ckan-data.montevideo.gub.uy/dataset/92121547-888b-417e-b0e1-2899f707a4fa/resource/ecb68681-9eb2-45b5-97af-dcf95f4e7a53/download/monitoreo_de_calidad_de_cuerpos_de_agua.csv'

# Load the CSV and drop every row that has at least one missing value
df = pd.read_csv(data).dropna()
df

| | fecha | estacion_muestreo | nombre_arroyo | temperatura | ph | ce | od | dbo | dqo | sst | amonio_n_nh4 | fosforo_total | nitrogeno_total | cromo | plomo | coliformes_fecales | nh3 | isca | tensoactivos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-01-28 | LP1 | Las Piedras | 27.0 | 8.02 | 1003.0 | 8.93 | 6.0 | 32.0 | 25.0 | 6.83 | 1.91 | 17.60 | 0.005 | 0.010 | 2400.0 | 0.438 | 66.0 | 0.25 |
| 1 | 2025-01-28 | LP2 | Las Piedras | 25.4 | 7.85 | 989.0 | 4.61 | 4.0 | 26.0 | 25.0 | 6.70 | 1.68 | 12.60 | 0.005 | 0.010 | 3000.0 | 0.267 | 59.0 | 0.19 |
| 2 | 2025-01-28 | LP3 | Las Piedras | 25.0 | 7.77 | 1296.0 | 0.83 | 17.0 | 88.0 | 25.0 | 22.36 | 2.62 | 44.70 | 0.005 | 0.010 | 78000.0 | 0.725 | 42.0 | 1.05 |
| 3 | 2025-01-28 | LP4 | Las Piedras | 25.3 | 7.95 | 1004.0 | 4.39 | 4.0 | 45.0 | 130.0 | 7.89 | 1.89 | 15.00 | 0.005 | 0.010 | 5100.0 | 0.389 | 41.0 | 0.26 |
| 4 | 2025-01-28 | LP5 | Las Piedras | 24.4 | 7.92 | 1217.0 | 5.00 | 6.0 | 33.0 | 26.0 | 7.01 | 1.81 | 16.10 | 0.005 | 0.010 | 1800.0 | 0.304 | 58.0 | 0.18 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 482 | 2018-01-03 | M1 | Miguelete | 22.4 | 7.55 | 1270.0 | 2.25 | 7.0 | 46.0 | 25.0 | 21.32 | 1.76 | 61.91 | 0.005 | 0.009 | 330.0 | 0.352 | 49.0 | 0.36 |
| 483 | 2018-01-03 | M2 | Miguelete | 22.0 | 7.57 | 1030.0 | 2.70 | 6.0 | 36.0 | 25.0 | 12.57 | 3.92 | 29.35 | 0.005 | 0.005 | 2000.0 | 0.211 | 55.0 | 0.38 |
| 485 | 2018-01-03 | M5 | Miguelete | 24.6 | 7.55 | 800.0 | 5.09 | 13.0 | 37.0 | 25.0 | 5.73 | 1.94 | 20.01 | 0.005 | 0.012 | 310000.0 | 0.110 | 60.0 | 0.79 |
| 486 | 2018-01-03 | M6 | Miguelete | 25.6 | 7.43 | 730.0 | 5.25 | 6.0 | 39.0 | 25.0 | 6.10 | 1.65 | 15.73 | 0.005 | 0.005 | 730.0 | 0.096 | 57.0 | 0.49 |
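
A descriptive summary of such a table could look like the sketch below. To stay self-contained it does not download the file; instead it rebuilds a tiny DataFrame from four of the rows displayed above, keeping only a few columns. The names `sample`, `summary` and `by_stream` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Four rows copied from the output above (two per stream), subset of columns.
sample = pd.DataFrame({
    "nombre_arroyo": ["Las Piedras", "Las Piedras", "Miguelete", "Miguelete"],
    "temperatura": [27.0, 25.4, 22.4, 22.0],
    "od": [8.93, 4.61, 2.25, 2.70],
    "dbo": [6.0, 4.0, 7.0, 6.0],
})

# Overall summary statistics: count, mean, std, min, quartiles, max.
summary = sample.describe()
print(summary)

# Mean and median per stream for the numeric columns.
by_stream = (
    sample.groupby("nombre_arroyo")[["temperatura", "od", "dbo"]]
    .agg(["mean", "median"])
)
print(by_stream)
```

These numbers describe only the rows that were summarized. Whether the two streams differ in general is an inferential question and would need a statistical model.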


---

# Exploratory Analysis

## Objective: Discover unknown relationships
- Helps in **identifying new patterns** in the data  
- Useful for defining **future studies**  
- **Exploratory analysis alone** should not be used for generalization or prediction  
- **Correlation ≠ Causation**  

📌 Example: Finding unexpected relationships between **air quality** and **urban vegetation**.
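
Why correlation does not imply causation can be illustrated with a small simulation. The sketch below uses purely synthetic data (not the course dataset): a hidden factor `z` drives both `x` and `y`, while `x` has no effect on `y`. All variable names, the sample size and the noise level are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: z is a common cause of x and y; x does not influence y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)
synthetic = pd.DataFrame({"x": x, "y": y, "z": z})

# x and y are strongly correlated even though neither causes the other.
r_xy = synthetic["x"].corr(synthetic["y"])


def residual(v, z):
    """Return v with the least-squares linear effect of z removed."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)


# Correlation between x and y once the common cause z is accounted for.
r_xy_given_z = np.corrcoef(residual(x, z), residual(y, z))[0, 1]

print(f"corr(x, y) = {r_xy:.2f}; after removing z = {r_xy_given_z:.2f}")
```

The association between `x` and `y` largely disappears once `z` is accounted for. In real data the common cause is often unmeasured, which is one reason an exploratory correlation should be treated as a hypothesis for a future study rather than as a conclusion.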



In [7]:
df[['temperatura','od','dbo','nitrogeno_total']].corr()

| | temperatura | od | dbo | nitrogeno_total |
|---|---|---|---|---|
| temperatura | 1.0 | -0.334854 | 0.140053 | 0.266249 |
| od | -0.334854 | 1.0 | -0.384407 | -0.322178 |
| dbo | 0.140053 | -0.384407 | 1.0 | 0.478273 |
| nitrogeno_total | 0.266249 | -0.322178 | 0.478273 | 1.0 |
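
`DataFrame.corr()` computes the Pearson (linear) correlation by default. For variables that span several orders of magnitude, such as `coliformes_fecales`, a rank-based measure like Spearman's can be a useful complement. The sketch below is an illustration rather than part of the original notebook: it rebuilds a small DataFrame from the nine rows displayed earlier (three columns only), and the names `sample`, `pearson` and `spearman` are assumptions.

```python
import pandas as pd

# Values copied from the nine rows displayed in the loading step.
sample = pd.DataFrame({
    "od": [8.93, 4.61, 0.83, 4.39, 5.00, 2.25, 2.70, 5.09, 5.25],
    "dbo": [6.0, 4.0, 17.0, 4.0, 6.0, 7.0, 6.0, 13.0, 6.0],
    "coliformes_fecales": [2400.0, 3000.0, 78000.0, 5100.0, 1800.0,
                           330.0, 2000.0, 310000.0, 730.0],
})

# Pearson: strength of the linear relationship (sensitive to extreme values).
pearson = sample.corr()

# Spearman: Pearson correlation computed on ranks, so it captures
# monotonic relationships and is less affected by extreme values.
spearman = sample.corr(method="spearman")

print(pearson.round(3))
print(spearman.round(3))
```

Nine rows are far too few to draw conclusions. The point is only that the choice of coefficient is itself an exploratory decision, and that neither coefficient says anything about causation.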
