[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MLMI2-CSSI/foundry/blob/main/examples/01_quickstart/quickstart.ipynb)

---

# Foundry Quickstart

**Time:** 5 minutes  
**Prerequisites:** Python basics  
**What you'll learn:** Search, load, and use a materials science dataset

---

## Installation

```bash
pip install foundry-ml
```

## 1. Connect to Foundry

In [1]:
from foundry import Foundry

# Create a Foundry client (uses HTTPS download by default)
# For Google Colab or cloud environments, add: no_browser=True, no_local_server=True
f = Foundry()
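
The comment above mentions two extra arguments for Colab and cloud environments. A minimal sketch of choosing them automatically is shown below. The helper name `foundry_client_kwargs` is hypothetical, and the detection relies on the assumption that a Colab runtime has already imported the `google.colab` package; only the two argument names come from this notebook.

```python
import sys


def foundry_client_kwargs(in_colab=None):
    """Return keyword arguments for Foundry() suited to the environment.

    Hypothetical helper: pass in_colab explicitly to override detection.
    """
    if in_colab is None:
        # Assumption: Colab runtimes have google.colab loaded at startup.
        in_colab = "google.colab" in sys.modules
    if in_colab:
        # Argument names taken from the comment in the cell above.
        return {"no_browser": True, "no_local_server": True}
    return {}


kwargs = foundry_client_kwargs()
print(kwargs)
# Usage (requires foundry-ml to be installed):
# f = Foundry(**kwargs)
```

Keeping the decision in one small function makes it easy to override when the detection guesses wrong, for example on a headless server that is not Colab.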

## 2. Search for Datasets

Search by keyword; you don't need to know the exact dataset name.

In [2]:
# Search for band gap datasets (a key property in materials science)
results = f.search("band gap", limit=5)
results

| | dataset_name | title | year | DOI |
|---|---|---|---|---|
| 0 | foundry_g4mp2_solvation_v1.2 | DFT Estimates of Solvation Energy in Multiple ... | 2022 | 10.18126/jos5-wj65 |
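
Because the search result behaves like a pandas DataFrame (step 3 indexes it with `.iloc`), ordinary pandas filtering can narrow it down. The sketch below uses a stand-in frame containing only the row recorded above. The column names are assumed from that output, the `filter_results` helper is hypothetical, and the real frame also carries a `FoundryDataset` column that is omitted here.

```python
import pandas as pd

# Stand-in for the frame returned by f.search(); the single row is the
# one recorded in the output above.
results = pd.DataFrame(
    {
        "dataset_name": ["foundry_g4mp2_solvation_v1.2"],
        "title": ["DFT Estimates of Solvation Energy in Multiple ..."],
        "year": [2022],
        "DOI": ["10.18126/jos5-wj65"],
    }
)


def filter_results(df, keyword=None, min_year=None):
    """Keep rows whose title contains `keyword` and whose year >= `min_year`."""
    mask = pd.Series(True, index=df.index)
    if keyword is not None:
        # Plain substring match, case-insensitive.
        mask &= df["title"].str.contains(keyword, case=False, regex=False)
    if min_year is not None:
        mask &= df["year"].astype(int) >= min_year
    return df[mask]


recent_solvation = filter_results(results, keyword="solvation", min_year=2020)
print(recent_solvation[["dataset_name", "year"]])
```

Filtering on the title is worth doing before loading anything, since a keyword search can return datasets whose titles do not mention the search term at all.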


## 3. Load a Dataset

In [3]:
# Get the first result
dataset = results.iloc[0].FoundryDataset

# Load the data
data = dataset.get_as_dict()

# Most datasets have train/test splits with (inputs, targets)
print(f"Available splits: {list(data.keys())}")

TransferAPIError: 502 ExternalError.DirListingFailed.ConnectFailed
Endpoint: globuspublish#mdf-publications (82f1b5c6-6e9b-11e5-ba47-22000b92c6ec)
Server: 141.142.218.119:443
Message: Could not connect to server (globus_xio: System error in connect: No route to host)

The recorded run failed here because the Globus endpoint was unreachable, so the remaining cells (`In [None]`) have no output.
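
A connection failure like the one recorded above may be transient, so one option is to retry the load a few times before giving up. The following is a minimal sketch, not part of Foundry: `load_with_retry` is a hypothetical helper, and it catches `Exception` broadly because this notebook does not show which exception classes are safe to retry.

```python
import time


def load_with_retry(loader, attempts=3, delay_seconds=2.0):
    """Call `loader()` up to `attempts` times, pausing between failures."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except Exception as error:  # broad on purpose; narrow it if you can
            last_error = error
            print(f"Attempt {attempt}/{attempts} failed: {type(error).__name__}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Chain the last failure so the original traceback stays visible.
    raise RuntimeError(f"Loading failed after {attempts} attempts") from last_error


# Usage with the dataset object from the cell above:
# data = load_with_retry(dataset.get_as_dict)
```

Retrying only helps with temporary outages. If every attempt fails, the chained `RuntimeError` still exposes the underlying error for diagnosis.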

In [None]:
# Extract training data
X_train, y_train = data['train']

print(f"Input features: {type(X_train)}")
print(f"Targets: {type(y_train)}")
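
To inspect every split at once rather than only `train`, a small helper can walk the dictionary. This sketch assumes the structure described in the comment above, namely a dict that maps each split name to an `(inputs, targets)` pair. The `describe_splits` name and the toy values are invented for illustration.

```python
def describe_splits(data):
    """Return {split_name: (inputs_type, targets_type)} for (inputs, targets) pairs."""
    summary = {}
    for split_name, (inputs, targets) in data.items():
        summary[split_name] = (type(inputs).__name__, type(targets).__name__)
    return summary


# Toy stand-in with the assumed shape; feature names and values are invented.
toy_data = {
    "train": ({"feature_a": [1.0, 2.0]}, {"target": [0.1, 0.2]}),
    "test": ({"feature_a": [3.0]}, {"target": [0.3]}),
}

print(describe_splits(toy_data))
# {'train': ('dict', 'dict'), 'test': ('dict', 'dict')}
```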

## 4. Use the Data

Foundry data works with any ML framework.

In [None]:
# Example: prepare the data for a simple sklearn model
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
import pandas as pd

# Convert to arrays if needed; inputs and targets are checked independently
if isinstance(X_train, dict):
    X_train_df = pd.DataFrame(X_train)
else:
    X_train_df = X_train

if isinstance(y_train, dict):
    y_train_arr = list(y_train.values())[0]  # Use the first target
else:
    y_train_arr = y_train

print(f"Training samples: {len(X_train_df)}")
print(f"Features: {X_train_df.shape[1] if hasattr(X_train_df, 'shape') else 'N/A'}")
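
The cell above imports `RandomForestRegressor` and `mean_absolute_error` but stops before fitting anything. The sketch below shows how those two pieces might be used. It runs on synthetic data rather than a Foundry dataset, so the feature matrix, the target formula, and the hyperparameters are all invented for illustration. Real Foundry inputs may need cleaning or featurization first.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 200 samples, 4 numeric features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out a quarter of the samples for validation.
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_fit, y_fit)

# Compare against a baseline that always predicts the training mean.
mae = mean_absolute_error(y_val, model.predict(X_val))
baseline_mae = mean_absolute_error(y_val, np.full_like(y_val, y_fit.mean()))
print(f"Validation MAE: {mae:.3f} (mean-predictor baseline: {baseline_mae:.3f})")
```

With real data, `X_train_df` and `y_train_arr` from the cell above would take the place of `X` and `y`, provided the features are numeric.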

## 5. Get Citation

Always cite datasets you use in publications!

In [None]:
citation = dataset.get_citation()
print(citation)

## Summary

```python
from foundry import Foundry

f = Foundry()  # HTTPS download by default
results = f.search("band gap")
dataset = results.iloc[0].FoundryDataset
X, y = dataset.get_as_dict()['train']
```

**Next:** See `02_working_with_data.ipynb` for advanced data handling.