# Part 1: UV - Modern Python Project Management

**Workshop Overview:** Learn how to use `uv` as a fast, modern replacement for traditional Python environment and dependency management tools.

## 1. The Problem: Why Do We Need `uv`?

For years, Python developers have relied on a combination of tools to manage projects:

* **`venv`**: To create isolated environments
* **`pip`**: To install packages
* **`pip-tools`**: To pin dependencies and create `requirements.txt`

### The Pain Points:

* **Slow Performance:** `pip`'s dependency resolution can be very slow, especially in complex projects
* **Tool Complexity:** You need to learn and manage multiple tools and files (`venv`, `pip`, `requirements.txt`, `setup.py`)
* **Environment Inconsistency:** It can be difficult to ensure every team member has exactly the same package versions
* **Deployment Challenges:** Replicating a local environment on different systems (servers, Docker containers, HPC clusters) is often complex and error-prone

### The `uv` Solution:
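`uv` is an extremely fast Python package installer and resolver, written in Rust. Its `uv pip` interface is designed as a **drop-in replacement** for common `pip` and `pip-tools` commands, and its project commands (`uv init`, `uv add`, `uv sync`) cover environment and dependency management in one tool.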

| Feature             | Traditional Way (`pip` + `venv`) | The `uv` Way                        |
| ------------------- | -------------------------------- | ----------------------------------- |
| **Speed** | Slow dependency resolution             | ⚡️ **10-100x faster** resolution                   |
| **Tools** | Multiple separate tools  | 📦 **All-in-One** solution                       |
| **Project Standard**| `requirements.txt` (legacy)               | `pyproject.toml` (**Modern Standard**)  |
| **Reproducibility** | Manual version management | 🔒 **Automatic lock files** |

## 2. Live Demo: Setting Up Our Project

### Step 0: Get Started

Before we begin, you'll need to get the project files. Choose one of the two methods below to prepare your environment.

#### Option 1: Clone with Git (Recommended)

This is the best way to get the latest version of the project and is a common practice in software development.

First, clone the repository and navigate into the project directory:

```bash
git clone https://github.com/Isongzhe/dataset-catalog.git
cd dataset-catalog
```

Once inside the folder, back up the existing configuration files so you can practice creating them from scratch:

```bash
mv pyproject.toml pyproject.toml.bak
mv uv.lock uv.lock.bak
```

#### Option 2: Copy from Local Storage

If you've already downloaded the project files to a local drive or NAS, this method is a fast way to get started.

Make sure you're in an empty directory. Then, copy the project's contents into your current folder:

```bash
cp -r /home/NAS/homes/sungche-10024/workshop/dataset-catalog/* .
```

After copying the files, back up the existing `uv` configuration files:

```bash
mv pyproject.toml pyproject.toml.bak
mv uv.lock uv.lock.bak
```

### Step 1: Initialize the Project

First, we'll initialize our project. This creates the project definition file (`pyproject.toml`); the virtual environment itself (`.venv`) is created automatically the first time you run `uv add`, `uv sync`, or `uv run`.

**Command to run in terminal:**

```bash
uv init --python 3.11 --name dataset-catalog
```

**What this does:**
- Creates a `README.md`, `.gitignore`, and `.python-version` file
- Creates a `pyproject.toml` file (modern, standardized way to define a Python project)
- Sets up the project structure following modern Python standards

**Alternative for existing projects:**
```bash
uv init --no-readme --name dataset-catalog
```

### Step 2: Add & Manage Dependencies

Now we'll add and manage our project's packages. The main command is `uv add`, which is a powerful, all-in-one tool for dependency management.

> ⚡️ **Speed Comparison:** `uv` uses an advanced package resolver written in Rust that is often **10-100x faster** than `pip`. It resolves all dependencies, downloads packages, and installs them in seconds!

-----

#### **Adding Packages**

Let's install all the packages we need for this workshop. We'll install specific versions and demonstrate different installation patterns.

**Commands to run in terminal:**

```bash
# Add packages with version constraints
uv add xarray "zarr<3.0" 

# Add additional dependencies for this workshop
uv add intake-xarray intake-parquet dask h5netcdf jinja2
```
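
To illustrate what a constraint such as `"zarr<3.0"` means, the sketch below checks candidate versions against an upper bound. It is a deliberately simplified model: it only handles plain numeric versions, whereas real resolvers follow the full PEP 440 rules (pre-releases, post-releases, and so on). The function names and the candidate versions are hypothetical, chosen for illustration.

```python
def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn '2.18' into (2, 18, 0). Handles plain numeric versions only."""
    parts = [int(part) for part in text.split(".")]
    # Pad with zeros so that '3.0' and '3.0.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies_upper_bound(candidate: str, upper_bound: str) -> bool:
    """Return True when candidate < upper_bound, as in the constraint 'zarr<3.0'."""
    return parse_version(candidate) < parse_version(upper_bound)


# Hypothetical candidate versions, for illustration only.
for candidate in ["2.9.0", "2.18.3", "3.0.0", "3.1.2"]:
    verdict = "allowed" if satisfies_upper_bound(candidate, "3.0") else "rejected"
    print(f"zarr {candidate}: {verdict}")
```

With this constraint, the resolver is free to pick any 2.x release but will never select a 3.x release.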

#### **Removing Packages**

If you add a package by mistake, `uv` makes removal clean and simple.

**Command to run in terminal:**

```bash
uv remove <package-name>
```

#### **Common Commands Reference**

Here's a quick summary of the basic dependency commands in `uv` and their traditional equivalents:

| Command | Action | Traditional (Multi-Step) Equivalent |
| :--- | :--- | :--- |
| `uv add <pkg>` | Adds the latest version of a package | `pip install <pkg>` then manually add to `requirements.txt` |
| `uv add "<pkg><version>"` | Adds a package with version constraint | `pip install "<pkg><version>"` then manual file updates |
| `uv remove <pkg>` | Removes a package cleanly | `pip uninstall <pkg>` then manual file cleanup |
| `uv sync` | Installs exact versions from lock file | `pip install -r requirements.txt` (but less reliable) |

### Step 3: Making the Environment Available in Jupyter

Our environment is ready, but Jupyter doesn't know about it yet. We need to install `ipykernel` and register it as a Jupyter "kernel".

**First, add `ipykernel` as a development dependency:**

```bash
uv add --dev ipykernel
```
> This command adds `ipykernel` to the `dev` dependency group in `pyproject.toml` (the `[dependency-groups]` table in recent `uv` versions). This is a best practice because `ipykernel` is a development tool, not a core dependency of our project.

**Then, register the kernel using `uv run`:**

```bash
uv run ipython kernel install --user --env VIRTUAL_ENV $(pwd)/.venv --name="dataset-catalog-workshop"
```

**After registration:**
- Restart Jupyter Lab/Notebook
- Select "dataset-catalog-workshop" (the name passed to `--name` above) as your kernel from the kernel dropdown
- You should now have access to all installed packages

**Further Reading:** 

This process follows the standard method for connecting virtual environments to Jupyter, as recommended in the **[official `uv` documentation on Jupyter Integration](https://docs.astral.sh/uv/guides/integration/jupyter/#using-jupyter-within-a-project)**.

## 3. Verification: Testing Our Environment

Now let's verify that our environment is working correctly and all libraries are available.

> 🔄 **Important:** Make sure you've changed your notebook kernel to "dataset-catalog-workshop" via `Kernel > Change kernel` in the notebook interface.

```python
import pandas as pd
import xarray as xr
import intake
import dask
import sys

print("✅ All libraries loaded successfully!")
print("-" * 30)
print(f"Pandas version: {pd.__version__}")
print(f"Xarray version: {xr.__version__}")
print(f"Intake version: {intake.__version__}")
print(f"Dask version: {dask.__version__}")
print("-" * 30)
print(f"Running from Python executable at: {sys.executable}")
print(f"Python version: {sys.version}")
print("-" * 30)
print("🎉 Environment setup complete! Ready for the workshop.")
```

```text
✅ All libraries loaded successfully!
------------------------------
Pandas version: 2.3.1
Xarray version: 2025.8.0
Intake version: 2.0.8
Dask version: 2025.7.0
------------------------------
Running from Python executable at: /home/sungche/dataset-catalog/.venv/bin/python
Python version: 3.11.13 (main, Jun 26 2025, 21:19:53) [Clang 20.1.4 ]
------------------------------
🎉 Environment setup complete! Ready for the workshop.
```
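
As an extra check that the notebook kernel really runs inside the project environment rather than a system Python, a small sketch like the following can help. It relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment; the `.venv` directory name is the `uv` default and is treated here as an assumption.

```python
import sys
from pathlib import Path


def running_in_virtualenv() -> bool:
    """True when the interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


env_dir = Path(sys.prefix)
print(f"Inside a virtual environment: {running_in_virtualenv()}")
print(f"Environment directory: {env_dir}")
# Assumes the default uv environment directory name, '.venv'.
print(f"Looks like a uv project environment: {env_dir.name == '.venv'}")
```

If the first line prints `False`, the notebook is most likely still attached to a different kernel.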


### Additional Verification Tests

Let's also verify that our key data science libraries are working properly:

```python
# Test Pandas
print("Testing Pandas...")
df_pandas = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo']
})
print(f"Created DataFrame with shape: {df_pandas.shape}")
print(df_pandas.head())
```

```text
Testing Pandas...
Created DataFrame with shape: (3, 3)
      name  age      city
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Tokyo
```


```python
# Test Xarray
import numpy as np

print("Testing Xarray...")

# Create a simple 3D array (time, lat, lon) filled with random values
data = np.random.rand(10, 5, 8)
coords = {
    'time': pd.date_range('2023-01-01', periods=10),
    'lat': np.linspace(-90, 90, 5),
    'lon': np.linspace(-180, 180, 8)
}
da = xr.DataArray(data, coords=coords, dims=['time', 'lat', 'lon'], name='temperature')
print(f"Created DataArray with shape: {da.shape}")
print(da)
```

```text
Testing Xarray...
Created DataArray with shape: (10, 5, 8)
<xarray.DataArray 'temperature' (time: 10, lat: 5, lon: 8)> Size: 3kB
array([[[0.22346567, 0.42507899, 0.87154802, 0.61154616, 0.9364686 ,
         0.8611716 , 0.36821849, 0.54035126],
        [0.12247514, 0.05833557, 0.83826056, 0.241185  , 0.7602889 ,
         0.21085674, 0.91653239, 0.78295684],
        [0.39672825, 0.50790339, 0.2658263 , 0.57536602, 0.98529021,
         0.46852209, 0.82207303, 0.73590505],
        [0.75470253, 0.67638644, 0.3764746 , 0.15675023, 0.71697019,
         0.13609279, 0.69116073, 0.52801706],
        [0.23131365, 0.16682917, 0.16527334, 0.58417078, 0.64426829,
         0.60535252, 0.71375071, 0.67478242]],

       [[0.90277955, 0.8320828 , 0.07455613, 0.8328408 , 0.72185153,
         0.59623708, 0.78192752, 0.13035124],
        [0.2536184 , 0.015503  , 0.39260139, 0.75361769, 0.54781059,
         0.79969561, 0.02305845, 0.05632728],
        [0.15260089, 0.05760107, 0.8416598 , 0.86724957, 0.92826
...
```

## 4. Recap: What We Gained with `uv`

### Key Benefits Achieved

* **⚡️ Speed:** We set up a complex environment in seconds, not minutes
* **📦 Unified Tooling:** We used a single tool (`uv`) instead of juggling `venv`, `pip`, and `pip-tools`
* **🔄 Reproducibility:** Our dependencies are declared in `pyproject.toml` and pinned to exact versions in `uv.lock`, so any team member can recreate the same environment with `uv sync`
* **🎯 Modern Standards:** We're using `pyproject.toml`, the standard Python project file (introduced by PEP 518, with project metadata defined in PEP 621)

### What `uv` Accomplished for Us

1. **Environment Creation:** Created an isolated Python environment (`.venv`)
2. **Lightning-Fast Resolution:** Resolved complex dependency trees in seconds
3. **Package Installation:** Downloaded and installed all packages efficiently
4. **Automatic Lock Files:** Created exact version specifications in `uv.lock`
5. **Project Definition:** Set up proper `pyproject.toml` configuration

### Traditional vs Modern Workflow Comparison

| Task | Traditional Way | The `uv` Way | Time Saved |
|------|----------------|--------------|-------------|
| Create environment | `python -m venv .venv` | `uv venv` | ~Same |
| Activate & run | `source .venv/bin/activate` + `python main.py` | `uv run main.py` | **One command, no activation step** |
| Install packages | `pip install pandas xarray ...` | `uv add pandas xarray ...` | **10-100x faster** |
| Lock dependencies | `pip freeze > requirements.txt` | **Automatic** in `uv.lock` | **Manual → Automatic** |
| Reproduce environment | `pip install -r requirements.txt` | `uv sync` | **More reliable** |

## 5. Useful `uv` Commands Reference


Here are some essential `uv` commands you'll use in your daily workflow:

### Project Management
```bash
uv init                    # Initialize a new project
uv add <package>          # Add a new dependency
uv remove <package>       # Remove a dependency
uv sync                   # Install dependencies from lock file
```

### Environment Management
```bash
uv venv                   # Create virtual environment only
uv run <command>          # Run command in the environment
uv run python script.py  # Run Python script with dependencies
uv run --with jupyter jupyter lab  # Start Jupyter Lab (jupyter is added on the fly)
```

### Information Commands
```bash
uv tree                   # Show dependency tree
uv pip list               # List installed packages
uv pip show <package>     # Show package information
```

### Advanced Usage
```bash
uv add <package> --dev     # Add a development dependency
uv python list             # List available and installed Python versions
uv python install <version> # Install a specific Python version
```


**Further Reading:** [Overview of `uv` features and commands](https://docs.astral.sh/uv/getting-started/features/)