#### 📘 Azure AI Foundry — Python Environment for ETL

#### 📖 Title

Configure Python Environment for ETL (Extract, Transform, Load)

---

#### 📌 Purpose

Python is the **glue** of your ETL pipeline.
With the right libraries, you can:

* **Extract** data from multiple sources (SQL Server, PostgreSQL, CSV, Excel).
* **Transform** data using `pandas` (cleaning, joins, feature engineering).
* **Load** prepared datasets into Azure AI Foundry for machine learning and analytics.
* **Visualize** results directly in Jupyter or reports.

This document sets up a consistent **Python ETL environment** so every script and notebook in this project runs against the same set of packages.

---


#### 1. Confirm Python & Conda

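Run in **PowerShell**. The Anaconda PowerShell Prompt works out of the box; in a regular PowerShell window, run `conda init powershell` once first so the `conda` command is available.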

```powershell
python --version
conda --version
```

✅ Expected:

* Python 3.12.7
* `conda` followed by a version number, confirming conda is installed (Anaconda distribution).
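You can run the same version check from inside Python, which is handy as the first cell of a notebook. This is a minimal sketch that assumes 3.12 is the minimum version you want to enforce. `check_python` is a hypothetical helper name and is not part of any library.

```python
import sys


def check_python(minimum: tuple = (3, 12)) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    # The first token of sys.version is the version number, e.g. 3.12.7
    print("Python:", sys.version.split()[0])
    print("Meets 3.12 minimum:", check_python())
```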

---


#### 2. Activate Base Environment

We’re keeping everything in the **base** environment for this project, so you don’t have to juggle multiple environments.

```powershell
conda activate base
```
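`conda activate` records the environment name in the `CONDA_DEFAULT_ENV` variable, which lets a script warn you when it is not running in **base**. This is a minimal sketch. `active_conda_env` is a hypothetical helper. The variable may also be unset when Python starts without conda being activated, for example from an IDE that points straight at `python.exe`.

```python
import os
from typing import Optional


def active_conda_env() -> Optional[str]:
    """Return the name of the activated conda environment, or None if none is active."""
    # `conda activate` sets CONDA_DEFAULT_ENV (e.g. "base") in the shell's environment
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    env = active_conda_env()
    if env != "base":
        print(f"Warning: expected the 'base' environment, found {env!r}")
    else:
        print("Running in the conda base environment")
```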

---

#### 3. Install Core ETL Packages

Run in **PowerShell (inside base)**:

```powershell
pip install pandas sqlalchemy pyodbc psycopg2-binary openpyxl scikit-learn
```

📌 Explanation of each:

* **pandas** → Data cleaning, transformations, DataFrame handling.
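* **sqlalchemy** → One connection interface for many databases; `pandas.read_sql` and `DataFrame.to_sql` accept a SQLAlchemy engine.
* **pyodbc** → Python ODBC bridge used to reach SQL Server; it also needs an ODBC driver for SQL Server (for example, Microsoft's ODBC Driver 18 for SQL Server) installed on the machine.
* **psycopg2-binary** → PostgreSQL driver as a prebuilt wheel (imported as `psycopg2`).
* **openpyxl** → Reads and writes Excel `.xlsx` files; pandas uses it for `read_excel` / `to_excel`.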
* **scikit-learn** → Future ML tasks (feature scaling, regression, classification).
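`sqlalchemy`, `pyodbc` and `psycopg2` come together in the connection URL, where the `dialect+driver` prefix tells SQLAlchemy which driver to load. This minimal sketch builds both URL strings using only the standard library. The hosts, database names, credentials, and the `ODBC Driver 18 for SQL Server` driver name are placeholder assumptions, so substitute your own. The password is percent-encoded because characters such as `@` or `/` would otherwise break the URL. SQLAlchemy's own `URL.create()` is an alternative that handles this escaping for you.

```python
from urllib.parse import quote_plus


def mssql_url(user: str, password: str, host: str, database: str,
              driver: str = "ODBC Driver 18 for SQL Server") -> str:
    """SQLAlchemy URL for SQL Server through pyodbc (dialect+driver = mssql+pyodbc)."""
    # The ODBC driver name goes in the query string; quote_plus turns spaces into '+'
    return (f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}/{database}?driver={quote_plus(driver)}")


def postgres_url(user: str, password: str, host: str, database: str,
                 port: int = 5432) -> str:
    """SQLAlchemy URL for PostgreSQL through psycopg2 (dialect+driver = postgresql+psycopg2)."""
    return (f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


if __name__ == "__main__":
    # Placeholder values; pass a result to sqlalchemy.create_engine(...)
    print(mssql_url("etl_user", "p@ss/word", "localhost", "my_db"))
    print(postgres_url("etl_user", "p@ss/word", "localhost", "my_db"))
```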

---

#### 4. Install Visualization Packages (Optional)

For plotting and dashboarding, install:

```powershell
pip install matplotlib seaborn plotly
```

📌 Explanation of each:

* **matplotlib** → Core plotting library (line, bar, scatter).
* **seaborn** → Statistical plots (heatmaps, distributions).
* **plotly** → Interactive charts for Jupyter + dashboards.
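Outside Jupyter there is often no display, for example in a scheduled ETL script. In that case matplotlib can render a chart straight to PNG with the non-interactive `Agg` backend. This is a minimal sketch. The chart labels and the sample counts are made-up illustration values, and `bar_chart_png` is a hypothetical helper.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders without opening a window
import matplotlib.pyplot as plt


def bar_chart_png(labels: list, values: list) -> bytes:
    """Render a simple bar chart and return it as PNG bytes."""
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(labels, values)
    ax.set_ylabel("Rows")
    ax.set_title("Rows extracted per source")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure's memory in long-running scripts
    return buf.getvalue()


if __name__ == "__main__":
    png = bar_chart_png(["SQL Server", "PostgreSQL"], [120, 80])
    print("PNG bytes:", len(png))
```

To write straight to disk instead, call `fig.savefig("chart.png")` with a file path of your choice.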

---


#### 5. Verify Installation

📂 Save as:

```
C:\Users\massa\Desktop\Python\Reference\AzureAI-Foundry\scripts\test_python_env.py
```

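```python
import pandas as pd
import sqlalchemy
import pyodbc
import psycopg2      # installed by the psycopg2-binary package
import openpyxl
import sklearn       # installed by the scikit-learn package

# Optional visualization packages (step 4): remove these imports if you skipped that step
import matplotlib
import seaborn
import plotly

print("✅ Python ETL Environment Ready")
print("pandas:", pd.__version__)
print("sqlalchemy:", sqlalchemy.__version__)
print("pyodbc:", pyodbc.version)  # pyodbc exposes its version as `version`
print("psycopg2:", psycopg2.__version__)
print("openpyxl:", openpyxl.__version__)
print("scikit-learn:", sklearn.__version__)
print("matplotlib:", matplotlib.__version__)
print("seaborn:", seaborn.__version__)
print("plotly:", plotly.__version__)
```

Run:

```powershell
conda activate base
cd "C:\Users\massa\Desktop\Python\Reference\AzureAI-Foundry"
python scripts\test_python_env.py
```

✅ Expected output:

```
✅ Python ETL Environment Ready
pandas: 2.x.x
sqlalchemy: 2.x.x
pyodbc: 5.x.x
psycopg2: 2.x.x
openpyxl: 3.x.x
scikit-learn: 1.x.x
matplotlib: 3.x.x
seaborn: 0.x.x
plotly: 5.x.x
```

Exact versions depend on your Anaconda release and on when you ran `pip install`. For example, `psycopg2` appends its build flags in parentheses after the version number. What matters is that every import succeeds without an `ImportError`.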
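The test script imports every package, so one missing library (for example, if you skipped the optional step 4) stops it at the first `ImportError`. This complementary sketch reads installed versions from package metadata by their **pip** names, using the standard library's `importlib.metadata`, and reports every missing package in a single pass. `installed_versions` and `PACKAGES` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# pip distribution names from steps 3 and 4. Some differ from the import name:
# psycopg2-binary is imported as psycopg2, scikit-learn as sklearn.
PACKAGES = [
    "pandas", "sqlalchemy", "pyodbc", "psycopg2-binary", "openpyxl",
    "scikit-learn", "matplotlib", "seaborn", "plotly",
]


def installed_versions(names: list) -> Dict[str, Optional[str]]:
    """Map each pip distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)  # reads package metadata; does not import it
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:16} {ver or 'NOT INSTALLED'}")
```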

---


#### 6. Toolchain Diagram

```
SQL Server (AI)        PostgreSQL (Local)
     │ pyodbc                  │ psycopg2
     └────── Extract ──────────┘
               │
          Python ETL (pandas, sqlalchemy, pyodbc, psycopg2)
               │
        Transform + Feature Engineering
               │
      Visualize (matplotlib, seaborn, plotly)
               │
          Load to Azure AI Foundry
```
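You can try the extract → transform → load flow from the diagram end to end without either database server. This minimal sketch makes two stand-in assumptions. An in-memory SQLite database (Python's built-in `sqlite3`) plays the role of SQL Server/PostgreSQL, and a CSV string plays the role of the Foundry upload. The `orders` table, its columns, and the helper names are hypothetical. With a real server, you would pass a SQLAlchemy engine to `pd.read_sql` instead of the sqlite3 connection.

```python
import io
import sqlite3

import pandas as pd


def extract(conn: sqlite3.Connection) -> pd.DataFrame:
    """Extract: pull raw rows with SQL (pd.read_sql also accepts a SQLAlchemy engine)."""
    return pd.read_sql("SELECT region, amount FROM orders", conn)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: drop rows with missing amounts, then total the amount per region."""
    clean = df.dropna(subset=["amount"])
    # groupby sorts by the key by default, so regions come out in alphabetical order
    return clean.groupby("region", as_index=False)["amount"].sum()


def load(df: pd.DataFrame) -> str:
    """Load (stand-in): serialize to CSV, the kind of file you might upload to Foundry."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return buf.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", None)],
    )
    print(load(transform(extract(conn))))
    conn.close()
```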

---


#### 📊 Summary

* Python **base environment** configured with all ETL + visualization libraries.
* Verified all packages with a test script.
* Established Python as the **bridge** between local databases and Azure AI Foundry.
* Visualization libraries installed for **in-notebook plots and dashboards**.

📂 Save as:
`14_python_environment_for_etl.md`

