- Install `uv`: https://docs.astral.sh/uv/getting-started/installation/#__tabbed_1_1
- Overview of `uv` features and common commands: https://docs.astral.sh/uv/getting-started/features/

## 1. The Problem: Why Do We Need `uv`?



For years, Python developers have relied on a combination of tools to manage projects:

* **`venv`**: To create isolated environments.
* **`pip`**: To install packages.
* **`pip-tools`**: To pin dependencies and create `requirements.txt`.

### The Pain Points:

* **Slow:** `pip`'s dependency resolution can be very slow, especially in complex projects.
* **Juggling Tools:** You need to learn and manage multiple tools and files (`venv`, `pip`, `requirements.txt`, `setup.py`).
* **Inconsistent Environments:** It can be difficult to ensure every team member has the exact same package versions.

* **Difficult Migration:** Replicating a local environment on a different system, such as a server, a Docker container, or an HPC cluster, is often complex. Manually managed `requirements.txt` files are brittle and can lead to errors during deployment.

### The `uv` Solution:
`uv` is an extremely fast Python package installer and resolver, written in Rust. It's designed to be an all-in-one, drop-in replacement for the tools mentioned above.

| Feature             | Traditional Way (`pip` + `venv`) | The `uv` Way                        |
| ------------------- | -------------------------------- | ----------------------------------- |
| **Speed** | Slow                             | ⚡️ Blazingly Fast                   |
| **Tools** | Multiple (pip, venv, pip-tools)  | 📦 All-in-One                       |
| **Project Standard**| `requirements.txt`               | `pyproject.toml` (Modern Standard)  |

## 2. Live Demo: Setting Up Our Project

### Step 0: Get Started

Before we begin, you'll need to get the project files. Choose one of the two methods below to prepare your environment.

#### Option 1: Clone with Git (Recommended)

This is the best way to get the latest version of the project and is a common practice in software development.

First, clone the repository and navigate into the project directory:

```bash
git clone https://github.com/Isongzhe/dataset-catalog.git
cd dataset-catalog
```

Once inside the folder, back up the existing configuration files so you can practice creating them from scratch:

```bash
mv pyproject.toml pyproject.toml.bak
mv uv.lock uv.lock.bak
```

#### Option 2: Copy from Local Storage

If you've already downloaded the project files to a local drive or NAS, this method is a fast way to get started.

Make sure you're in an empty directory. Then, copy the project's contents into your current folder:

```bash
cp -r /home/NAS/homes/sungche-10024/workshop/dataset-catalog/* .
```

After copying the files, back up the existing `uv` configuration files:

```bash
mv pyproject.toml pyproject.toml.bak
mv uv.lock uv.lock.bak
```

### Step 1: Initialize the Project

First, we'll initialize our project. This creates the project definition file (`pyproject.toml`); `uv` then creates the virtual environment (`.venv`) automatically the first time you run `uv add`, `uv sync`, or `uv run`.

**Command to run in terminal:**

```bash
uv init --python <version> --name <project-name>
```

**What this does:**
- Creates `README.md`, `.gitignore`, and `.python-version` files
- Creates a `pyproject.toml` file (the modern, standardized way to define a Python project)


### Step 2: Add & Manage Dependencies

Now we will add and manage our project's packages. The main command is `uv add`, which records the requirement in `pyproject.toml`, updates `uv.lock`, and installs the package into the project's `.venv` in one step.

> ⚡️ **Speed Comparison:** `uv` uses an advanced package resolver written in Rust that is often **10-100x faster** than `pip`. It resolves all dependencies, downloads packages, and installs them in seconds!

-----

#### **Adding Packages**

Let's install the packages we need. A single command can mix unconstrained packages (which resolve to the latest compatible version) with version-constrained ones such as `"zarr<3.0"`. Replace `<package-name>` with any additional package you want.

**Command to run in terminal:**

```bash
uv add xarray "zarr<3.0" <package-name>
```

Other dependencies used in this workshop:
```bash
uv add intake-xarray intake-parquet dask h5netcdf jinja2
```
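To make the requirement syntax concrete, here is a minimal Python sketch that splits the strings passed to `uv add` above into a package name, optional extras, and a version constraint. The regular expression and the `split_requirement` helper are illustrative assumptions, not part of `uv`, and they cover only simple requirement strings rather than the full PEP 508 grammar.

```python
import re

# Requirement strings as passed to `uv add` in this section.
REQUESTED = [
    "xarray", "zarr<3.0", "intake-xarray", "intake-parquet",
    "dask", "h5netcdf", "jinja2",
]

# Simplified pattern (an assumption, not the full PEP 508 grammar):
# package name, optional [extras], then the remaining version constraint.
REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(\[[^\]]*\])?\s*(.*)$")


def split_requirement(requirement: str) -> tuple[str, str, str]:
    """Split a requirement string into (name, extras, constraint)."""
    match = REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"Unrecognized requirement: {requirement!r}")
    name, extras, constraint = match.groups()
    # Missing parts are returned as empty strings.
    return name, extras or "", constraint.strip()


for item in REQUESTED:
    name, extras, constraint = split_requirement(item)
    print(f"{name:<15} extras={extras or '-':<12} constraint={constraint or 'any version'}")
```

Only `zarr` carries an explicit constraint here; for the other packages the resolver is free to pick the newest version compatible with the rest of the project.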


#### **Removing Packages**

What if you add a package by mistake? `uv` makes removal just as clean and simple.

**Command to run in terminal:**

```bash
uv remove <package-name>
```

#### **Common Commands**

Here is a quick summary of the basic dependency commands in `uv` and their traditional equivalents.

| Command | Action | Traditional (Multi-Step) Equivalent |
| :--- | :--- | :--- |
| `uv add <pkg>` | Adds the latest version of a package. | `pip install <pkg>` then manually add to `requirements.txt` / `pyproject.toml`. |
| `uv add "<pkg><constraint>"` | Adds a package with a version constraint, e.g. `uv add "zarr<3.0"`. | `pip install "<pkg><constraint>"` then manually add to `requirements.txt` / `pyproject.toml`. |
| `uv remove <pkg>` | Removes a package. | `pip uninstall <pkg>` then manually remove from `requirements.txt` / `pyproject.toml`. |
| `uv sync` | Installs all packages listed in the lock file to match the project's state exactly. | `pip install -r requirements.txt` |

### Step 3: Making the Environment Available in Jupyter

Our environment is ready, but Jupyter doesn't know about it yet. We need to install `ipykernel` and then register it as a Jupyter "kernel".

**First, add `ipykernel` as a development dependency:**

```bash
uv add --dev ipykernel
```
> This command adds `ipykernel` to the `dev` dependency group (in recent `uv` versions, the `[dependency-groups]` table in `pyproject.toml`). This is a best practice because `ipykernel` is a tool for our development process, not a core dependency of the project itself.


**Then, register the kernel using `uv run`:**

```bash
uv run ipython kernel install --user --env VIRTUAL_ENV $(pwd)/.venv --name=<kernel-name>
```


**Further Reading:** 

This process is the standard method for connecting a virtual environment to Jupyter, as recommended in the **[official `uv` documentation on Jupyter Integration](https://docs.astral.sh/uv/guides/integration/jupyter/#using-jupyter-within-a-project)**.

## 3. Verification: Testing Our Environment

Now let's verify that our environment is working correctly and all libraries are available.

> 🔄 **Important:** Make sure you've switched your notebook to the kernel you registered in Step 3 ("modern-data-workflow" in this workshop) via `Kernel > Change kernel` in the notebook interface.

```python
import pandas as pd
import xarray as xr
import intake
import dask
import sys

print("✅ All libraries loaded successfully!")
print("-" * 30)
print(f"Pandas version: {pd.__version__}")
print(f"Xarray version: {xr.__version__}")
print(f"Intake version: {intake.__version__}")
print(f"Dask version: {dask.__version__}")
print("-" * 30)
print(f"Running from Python executable at: {sys.executable}")
print(f"Python version: {sys.version}")
print("-" * 30)
print("🎉 Environment setup complete! Ready for the workshop.")
```

```text
✅ All libraries loaded successfully!
------------------------------
Pandas version: 2.3.1
Xarray version: 2025.8.0
Intake version: 2.0.8
Dask version: 2025.7.0
------------------------------
Running from Python executable at: /home/NAS/homes/sungche-10024/workshop/dataset-catalog/.venv/bin/python
Python version: 3.11.13 (main, Jun 26 2025, 21:19:53) [Clang 20.1.4 ]
------------------------------
🎉 Environment setup complete! Ready for the workshop.
```
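
A quick way to double-check the interpreter path reported above is to ask Python whether it is running inside a virtual environment. This is a minimal sketch using only the standard library; `in_virtualenv` and `is_project_venv` are hypothetical helper names, and the second check assumes the environment lives in `.venv` under the current working directory.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def is_project_venv(prefix, project_dir, venv_name=".venv") -> bool:
    """True when `prefix` is exactly `<project_dir>/<venv_name>`."""
    return Path(prefix).resolve() == (Path(project_dir) / venv_name).resolve()


print(f"Inside a virtual environment: {in_virtualenv()}")
print(f"Environment is ./.venv:       {is_project_venv(sys.prefix, Path.cwd())}")
```

If the first line prints `False` in the notebook, the kernel is most likely still pointing at a different interpreter than the project's `.venv`.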


### Additional Verification Tests

Let's also verify that our key data science libraries are working properly:

```python
# Test Pandas
print("Testing Pandas...")
df_pandas = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo']
})
print(f"Created DataFrame with shape: {df_pandas.shape}")
print(df_pandas.head())
```

```text
Testing Pandas...
Created DataFrame with shape: (3, 3)
      name  age      city
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Tokyo
```


```python
# Test Xarray
print("Testing Xarray...")
import numpy as np

# Create a simple 3D array (time, lat, lon)
data = np.random.rand(10, 5, 8)
coords = {
    'time': pd.date_range('2023-01-01', periods=10),
    'lat': np.linspace(-90, 90, 5),
    'lon': np.linspace(-180, 180, 8)
}
da = xr.DataArray(data, coords=coords, dims=['time', 'lat', 'lon'], name='temperature')
print(f"Created DataArray with shape: {da.shape}")
print(da)
```

```text
Testing Xarray...
Created DataArray with shape: (10, 5, 8)
<xarray.DataArray 'temperature' (time: 10, lat: 5, lon: 8)> Size: 3kB
array([[[0.91989529, 0.77009264, 0.17186946, 0.30644491, 0.90954196,
         0.98477563, 0.73067798, 0.08011099],
        [0.49331844, 0.5022603 , 0.1162546 , 0.365627  , 0.35561235,
         0.24169421, 0.17839048, 0.2321403 ],
        [0.99748857, 0.20443742, 0.42629079, 0.50778914, 0.56204698,
         0.64404535, 0.54593008, 0.30813897],
        [0.22666878, 0.77469908, 0.4308849 , 0.56449528, 0.56825295,
         0.77098414, 0.16593902, 0.72316592],
        [0.59620473, 0.26608022, 0.96084171, 0.97900203, 0.28337268,
         0.05369823, 0.05751729, 0.57698662]],

       [[0.51381116, 0.8167426 , 0.20355046, 0.96403599, 0.00649609,
         0.57327039, 0.44447663, 0.37651989],
        [0.61270128, 0.76053223, 0.10451188, 0.29880618, 0.14827027,
         0.96773942, 0.89200144, 0.34983347],
        [0.38525159, 0.77857457, 0.45395781, 0.3694412 , 0.53283
        ...
```

## 4. Recap: What We Gained with `uv`


### Key Benefits

* **⚡️ Speed:** We set up a complex environment in seconds, not minutes.
* **📦 All-in-One:** We used a single tool (`uv`) instead of juggling `venv`, `pip`, and `pip-tools`.
* **🔄 Reproducibility:** Our dependencies are declared in `pyproject.toml` and pinned to exact versions in `uv.lock`, so any team member can create an identical environment with a simple `uv sync`.
* **🎯 Modern Standards:** We're using `pyproject.toml` which is the modern Python packaging standard.

### What `uv` Did for Us

1. **Environment Creation:** Created an isolated Python environment (`.venv`)
2. **Dependency Resolution:** Resolved complex dependency trees ultra-fast
3. **Package Installation:** Downloaded and installed all packages
4. **Lock File Generation:** Recorded the exact resolved versions in `uv.lock`
5. **Project Definition:** Set up proper `pyproject.toml` configuration

### Traditional vs Modern Workflow

| Task | Traditional Way | The `uv` Way |
|------|----------------|--------------|
| Create environment | `python -m venv .venv` | `uv venv`|
| Run script | `source .venv/bin/activate` + `python main.py` | `uv run main.py`|
| Install packages | `pip install pandas polars ...` | `uv add pandas polars ...` |
| Lock dependencies | `pip freeze > requirements.txt` | Automatic in `uv.lock` |
| Reproduce environment | `pip install -r requirements.txt` | `uv sync` |

## 5. Useful `uv` Commands Reference


Here are some essential `uv` commands you'll use in your daily workflow:

### Project Management
```bash
uv init                    # Initialize a new project
uv add <package>          # Add a new dependency
uv remove <package>       # Remove a dependency
uv sync                   # Install dependencies from lock file
```

### Environment Management
```bash
uv venv                   # Create virtual environment only
uv run <command>          # Run command in the environment
uv run python script.py  # Run Python script with dependencies
uv run --with jupyter jupyter lab  # Start Jupyter Lab with access to the project's dependencies
```

### Information Commands
```bash
uv tree                   # Show dependency tree
uv pip list               # List installed packages
uv pip show <package>     # Show package information
```

### Advanced Usage
```bash
uv add <package> --dev     # Add a development dependency
uv python list             # List available and installed Python versions
uv python install <version> # Install a specific Python version
```

