#### 📘 Azure AI Foundry: Python Environment for ETL

#### 📖 Title

Configure Python Environment for ETL (Extract, Transform, Load)

---

#### 📌 Purpose

Python is the **glue** of your ETL pipeline.
With the right libraries, you can:

* **Extract** data from multiple sources (SQL Server, PostgreSQL, CSV, Excel).
* **Transform** data using `pandas` (cleaning, joins, feature engineering).
* **Load** prepared datasets into Azure AI Foundry for machine learning and analytics.
* **Visualize** results directly in Jupyter or reports.

This document sets up a consistent **Python ETL environment** so all scripts and notebooks in this project run smoothly.
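
To make the three stages concrete, here is a minimal sketch of an extract, transform, load pass using `pandas`. The sample data, the column names, and the function names `extract`, `transform`, and `load` are all hypothetical, and an in-memory buffer stands in for the real destination, so treat it as an illustration of the flow rather than the project's actual pipeline.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV export.
RAW_CSV = """order_id,customer,amount
1,alice,10.5
2,bob,
3,alice,4.5
"""


def extract(source):
    """Extract: read CSV text (or a file path) into a DataFrame."""
    return pd.read_csv(source)


def transform(df):
    """Transform: drop rows with no amount, then total the amount per customer."""
    cleaned = df.dropna(subset=["amount"])
    return cleaned.groupby("customer", as_index=False)["amount"].sum()


def load(df, target):
    """Load: write the prepared data as CSV to a path or file-like object."""
    df.to_csv(target, index=False)


if __name__ == "__main__":
    prepared = transform(extract(io.StringIO(RAW_CSV)))
    buffer = io.StringIO()  # stands in for the real destination
    load(prepared, buffer)
    print(buffer.getvalue())
```

Keeping each stage in its own function makes it easy to swap the CSV source for a database query later without touching the transform step.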

---


#### 1. Confirm Python & Conda

Run in **PowerShell**:

```powershell
python --version
conda --version
```

✅ Expected:

* Python 3.12.7
* Conda installed (Anaconda distribution).
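
The same check can be done from inside Python, which is handy at the top of a notebook. This is a small sketch using only the standard library; the helper names are hypothetical, and `shutil.which` only finds `conda` if an executable is on `PATH`, which may not hold in every shell setup.

```python
import shutil
import sys

# Assumption: the project targets the Python 3.12 series named above.
EXPECTED_MAJOR_MINOR = (3, 12)


def python_matches(expected=EXPECTED_MAJOR_MINOR, actual=None):
    """Return True when the interpreter's (major, minor) equals `expected`."""
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


def conda_on_path():
    """Return the path to the conda executable, or None if it is not on PATH."""
    return shutil.which("conda")


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Matches 3.12:", python_matches())
    print("conda:", conda_on_path() or "not found on PATH")
```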

---


#### 2. Activate Base Environment

We're keeping everything in **base** for this project (so you don't have to juggle multiple envs).

```powershell
conda activate base
```
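
If you want to confirm the activation took effect, one option is to read the `CONDA_DEFAULT_ENV` variable that `conda activate` sets. The sketch below assumes that variable is present in your shell session; the function name `active_conda_env` is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is detected.

    `conda activate` exports CONDA_DEFAULT_ENV; it is "base" for the base env.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    env = active_conda_env()
    print("Active conda env:", env or "none detected")
    print("Interpreter prefix:", sys.prefix)
    if env != "base":
        print("Run `conda activate base` before installing packages.")
```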

---

#### 3. Install Core ETL Packages

Run in **PowerShell (inside base)**:

```powershell
pip install pandas sqlalchemy pyodbc psycopg2-binary openpyxl scikit-learn
```

📌 Explanation of each:

* **pandas** → Data cleaning, transformations, DataFrame handling.
* **sqlalchemy** → Unified way to connect Python to databases.
* **pyodbc** → Required for SQL Server (ODBC driver).
* **psycopg2-binary** → Required for PostgreSQL.
* **openpyxl** → Excel file support.
* **scikit-learn** → Future ML tasks (feature scaling, regression, classification).
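
Note that the name you `pip install` is not always the name you `import` (`psycopg2-binary` vs `psycopg2`, `scikit-learn` vs `sklearn`). As a rough sketch, the standard library's `importlib.metadata` can report which of the pip names are installed without importing anything. The `CORE_PACKAGES` mapping and the `installed_versions` helper are hypothetical names for this example.

```python
from importlib import metadata

# pip distribution name -> import name, for the packages installed above.
CORE_PACKAGES = {
    "pandas": "pandas",
    "sqlalchemy": "sqlalchemy",
    "pyodbc": "pyodbc",
    "psycopg2-binary": "psycopg2",
    "openpyxl": "openpyxl",
    "scikit-learn": "sklearn",
}


def installed_versions(distributions):
    """Map each pip distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for dist, version in installed_versions(CORE_PACKAGES).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{dist} (import {CORE_PACKAGES[dist]}): {status}")
```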

---

#### 4. Install Visualization Packages (Optional)

For plotting and dashboarding, install:

```powershell
pip install matplotlib seaborn plotly
```

📌 Explanation of each:

* **matplotlib** → Core plotting library (line, bar, scatter).
* **seaborn** → Statistical plots (heatmaps, distributions).
* **plotly** → Interactive charts for Jupyter + dashboards.

---


#### 5. Verify Installation

📂 Save as:

```
C:\Users\massa\Desktop\Python\Reference\AzureAI-Foundry\scripts\test_python_env.py
```

```python
# Import every package installed in steps 3 and 4; any failure raises ImportError.
import pandas as pd
import sqlalchemy
import pyodbc
import psycopg2
import openpyxl
import sklearn
import matplotlib
import seaborn
import plotly

print("✅ Python ETL Environment Ready")
print("pandas:", pd.__version__)
print("sqlalchemy:", sqlalchemy.__version__)
print("pyodbc:", pyodbc.version)  # pyodbc exposes `version`
print("psycopg2:", psycopg2.__version__)
print("openpyxl:", openpyxl.__version__)
print("scikit-learn:", sklearn.__version__)
print("matplotlib:", matplotlib.__version__)
print("seaborn:", seaborn.__version__)
print("plotly:", plotly.__version__)
```

Run:

```powershell
conda activate base
cd "C:\Users\massa\Desktop\Python\Reference\AzureAI-Foundry"
python scripts\test_python_env.py
```

✅ Expected output:

```
✅ Python ETL Environment Ready
pandas: 2.x.x
sqlalchemy: 2.x.x
pyodbc: 5.x.x
psycopg2: 2.x.x
openpyxl: 3.x.x
scikit-learn: 1.x.x
matplotlib: 3.x.x
seaborn: 0.x.x
plotly: 5.x.x
```
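
A script made of plain `import` statements stops at the first missing package, so you only learn about one problem per run. If you skipped the optional visualization step, a more forgiving variant may be useful. The sketch below tries each import separately and reports everything in one pass; the `REQUIRED` and `OPTIONAL` lists and the helper names are assumptions for this example, not part of any library.

```python
import importlib

# Import names (not pip names) for the packages installed in steps 3 and 4.
REQUIRED = ["pandas", "sqlalchemy", "pyodbc", "psycopg2", "openpyxl", "sklearn"]
OPTIONAL = ["matplotlib", "seaborn", "plotly"]


def try_import(module_name):
    """Return (True, version) if the module imports, else (False, error text)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # diagnostic only: any import failure counts as unavailable
        return False, f"{type(exc).__name__}: {exc}"
    # Most packages expose __version__; pyodbc exposes `version`.
    version = getattr(module, "__version__", None) or getattr(module, "version", "unknown")
    return True, str(version)


def missing_modules(module_names):
    """Return the subset of module_names that cannot be imported."""
    return [name for name in module_names if not try_import(name)[0]]


if __name__ == "__main__":
    for name in REQUIRED + OPTIONAL:
        ok, detail = try_import(name)
        label = "OK     " if ok else "MISSING"
        print(f"{label} {name}: {detail}")
    missing = missing_modules(REQUIRED)
    print("Core packages ready" if not missing else f"Install first: {missing}")
```

Unlike a metadata lookup, this actually imports each module, so it also surfaces a package that is installed but broken (for example, a database driver whose native library is missing).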

---


#### 6. Toolchain Diagram

```
SQL Server (AI)        PostgreSQL (Local)
     │                         │
     └─ Extract via drivers ───┘
               │
          Python ETL (pandas, sqlalchemy, pyodbc, psycopg2)
               │
        Transform + Feature Engineering
               │
      Visualize (matplotlib, seaborn, plotly)
               │
          Load to Azure AI Foundry
```
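
The extract step at the top of the diagram starts with a connection URL for each database. Here is a hedged sketch of how those URLs can be built in the SQLAlchemy URL format. The helper names, the credentials, host, and database values are placeholders, and the driver name `ODBC Driver 18 for SQL Server` is an assumption that must match whatever ODBC driver is installed on your machine.

```python
from urllib.parse import quote_plus


def sqlserver_url(user, password, host, database,
                  driver="ODBC Driver 18 for SQL Server"):
    """Build a SQLAlchemy URL for SQL Server via pyodbc (driver name is an assumption)."""
    return (
        f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?driver={quote_plus(driver)}"
    )


def postgres_url(user, password, host, database, port=5432):
    """Build a SQLAlchemy URL for PostgreSQL via psycopg2."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Placeholder values only; never hard-code real credentials.
    print(sqlserver_url("etl_user", "p@ss word", "localhost", "demo_db"))
    print(postgres_url("etl_user", "p@ss word", "localhost", "demo_db"))
```

Each URL can then be passed to `sqlalchemy.create_engine(...)`, and the resulting engine to `pandas.read_sql(...)`, to pull a query result into a DataFrame. URL-encoding the user and password with `quote_plus` keeps characters such as `@` from breaking the URL.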

---


#### 📊 Summary

* Python **base environment** configured with all ETL + visualization libraries.
* Verified all packages with a test script.
* Established Python as the **bridge** between local databases and Azure AI Foundry.
* Visualization libraries installed for **in-notebook plots and dashboards**.

📂 Save as:
`14_python_environment_for_etl.md`

