# Hello Foundry! 🚀

Welcome to Foundry-ML! This notebook will show you how to:

1. **Search** for materials science datasets
2. **Load** a dataset into Python
3. **Explore** the data

No domain expertise required - just Python basics!

## Step 1: Install Foundry

If you haven't already, install Foundry-ML:

In [None]:
# Uncomment the line below to install
# !pip install foundry-ml
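
Before installing, it can help to check whether the package is already importable. The sketch below uses only the standard library. `is_installed` is a hypothetical helper name, and `foundry` is the import name used later in this notebook.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("foundry"):
    print("Foundry is already installed.")
else:
    print("Foundry not found. Run: pip install foundry-ml")
```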

## Step 2: Import and Connect

First, import Foundry and create a client. If you're running this in Google Colab or another cloud environment, pass `no_browser=True` and `no_local_server=True`, as in the commented line below.

In [None]:
from foundry import Foundry

# Create a Foundry client
# For cloud environments (Colab, etc.), use:
# f = Foundry(no_browser=True, no_local_server=True)
f = Foundry()
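
If the same notebook runs both locally and in Colab, the choice of flags can be made at runtime. This is a minimal sketch: `foundry_client_kwargs` is a hypothetical helper, and checking `sys.modules` for `google.colab` is a common heuristic rather than an official test. The two flags are the ones from the commented line in the cell above.

```python
import sys


def foundry_client_kwargs(in_cloud: bool) -> dict:
    """Pick constructor arguments based on whether a local browser is available."""
    if in_cloud:
        # Flags taken from the commented line in the cell above.
        return {"no_browser": True, "no_local_server": True}
    return {}


# Heuristic (an assumption): the google.colab module is loaded inside Colab.
in_colab = "google.colab" in sys.modules
kwargs = foundry_client_kwargs(in_colab)
print(kwargs)
# f = Foundry(**kwargs)  # uncomment once Foundry has been imported
```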

## Step 3: Search for Datasets

Let's search for datasets. You can search by keyword - no need to know the exact name!

In [None]:
# Search for datasets related to "band gap" (a property in materials science)
results = f.search("band gap", limit=5)

# Display the results - Foundry shows a nice table in notebooks!
results

## Step 4: Get a Dataset

Pick a dataset from the search results. You can access it by name or DOI.

In [None]:
# Get the first dataset from our search results
dataset = results.iloc[0].FoundryDataset

# Display dataset info
dataset

## Step 5: Understand the Schema

Before loading data, let's see what fields it contains:

In [None]:
# Get the schema - what columns/fields are in this dataset?
schema = dataset.get_schema()

print(f"Dataset: {schema['name']}")
print(f"Data Type: {schema['data_type']}")
print(f"\nSplits: {[s['name'] for s in schema['splits']]}")
print("\nFields:")
for field in schema['fields']:
    print(f"  - {field['name']} ({field['role']}): {field['description'] or 'No description'}")
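
A schema is often easier to read once its fields are grouped by role. The sketch below works on a hand-written dictionary that mirrors the keys used in the cell above (`name`, `data_type`, `splits`, `fields`). The dataset name, field names, and the role values `input` and `target` are illustrative assumptions, not taken from a real Foundry dataset.

```python
from collections import defaultdict

# Hypothetical schema for illustration; a real one comes from dataset.get_schema().
example_schema = {
    "name": "example_dataset",
    "data_type": "tabular",
    "splits": [{"name": "train"}],
    "fields": [
        {"name": "composition", "role": "input", "description": "Chemical formula"},
        {"name": "band_gap", "role": "target", "description": None},
    ],
}


def fields_by_role(schema: dict) -> dict:
    """Map each role to the list of field names that have that role."""
    grouped = defaultdict(list)
    for field in schema["fields"]:
        grouped[field["role"]].append(field["name"])
    return dict(grouped)


print(fields_by_role(example_schema))
```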

## Step 6: Load the Data

Now let's load the actual data. Foundry handles downloading and caching automatically!

In [None]:
# Load data as a dictionary
data = dataset.get_as_dict()

# See what we got
print(f"Data keys: {list(data.keys())}")

In [None]:
# For ML datasets, data is typically split into inputs (X) and targets (y)
# Let's explore the training split
if 'train' in data:
    train_data = data['train']
    print(f"Training data type: {type(train_data)}")
    
    # If it's a tuple of (inputs, targets)
    if isinstance(train_data, tuple) and len(train_data) == 2:
        X, y = train_data
        print(f"\nInputs (X): {type(X)}")
        print(f"Targets (y): {type(y)}")
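
The same check can be applied to every split at once. This sketch uses a made-up dictionary as a stand-in for the result of `dataset.get_as_dict()`. `summarize_split` is a hypothetical helper, and the sample values are placeholders, not real measurements.

```python
def summarize_split(split) -> str:
    """Describe one split: an (inputs, targets) pair or some other object."""
    if isinstance(split, tuple) and len(split) == 2:
        inputs, targets = split
        return f"(X: {type(inputs).__name__}, y: {type(targets).__name__})"
    return type(split).__name__


# Hypothetical stand-in for the dictionary returned by dataset.get_as_dict().
example_data = {
    "train": ({"composition": ["A", "B"]}, {"band_gap": [0.0, 1.0]}),
    "test": [],
}

for name, split in example_data.items():
    print(f"{name}: {summarize_split(split)}")
```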

## Step 7: Use with Your Favorite ML Framework

Foundry can return a dataset as a PyTorch or TensorFlow dataset object, ready to pass to a `DataLoader` or `model.fit`.

In [None]:
# For PyTorch users:
# torch_dataset = dataset.get_as_torch(split='train')
# from torch.utils.data import DataLoader
# loader = DataLoader(torch_dataset, batch_size=32)

# For TensorFlow users:
# tf_dataset = dataset.get_as_tensorflow(split='train')
# model.fit(tf_dataset)

print("Foundry works with PyTorch and TensorFlow out of the box!")
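
For readers new to these frameworks, `batch_size=32` means the data is consumed in chunks of 32 samples. The framework-free sketch below shows only that chunking idea. It is not how `DataLoader` is implemented and leaves out shuffling, collation, and parallel loading. `iter_batches` is a hypothetical helper.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


samples = list(range(70))  # placeholder samples
sizes = [len(batch) for batch in iter_batches(samples, 32)]
print(sizes)  # [32, 32, 6]
```

The last batch is smaller because 70 is not a multiple of 32.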

## Step 8: Get the Citation

When you use a dataset in research, cite it properly!

In [None]:
# Get BibTeX citation
citation = dataset.get_citation()
print(citation)

## 🎉 That's It!

You've just:
- Connected to Foundry
- Searched for datasets
- Explored the schema
- Loaded data into Python
- Retrieved a proper citation

### Next Steps

- Explore more datasets with `f.list()`
- Check out other examples in the `/examples` folder
- Use the CLI: `foundry search "your query"`
- Read the docs: https://github.com/MLMI2-CSSI/foundry

Happy researching! 🔬