### Selection Exercise 1

This example uses the same source as before, but this time we're going to acquire the data, make multiple selections, and have a look at what each selection contains.

All code cells in this notebook can be run directly: select the cell, then click the play icon on the ribbon above.

![](./images/example1_spreadsheet.png)

In [None]:
# Acquisition

from databaker.framework import *
from tutorialResources.scraper import Scraper

scraper = Scraper("https://www.fake-website.com/example1")
scraper

------
*Once you've run the above cell.....*

You'll see some output. This is an example of metadata acquired by the scraper. The current practice is for any metadata acquired by the scraping process to be output here as a convenience, so the data engineer can see what they've got to work with. On many occasions you'll need to expand on this initial metadata.

For now, we're just concerned with the source dataset (i.e. observations and dimensions) and how to select them from a spreadsheet with databaker, so be aware of, but otherwise ignore, the above metadata output.

------

In [None]:
# Selection

# Get the sheets from the scraped distribution as a list of databaker-style "tabs"
# A "tab" created this way is a "bag of cells", that is to say a python "bag" object holding a list of "cell" objects
tabs = scraper.distribution.as_databaker()

# Filter out the tabs we don't want with a python list comprehension
# NOTE - databaker "tabs" always have a .name property - it's a good one to remember
tabs = [x for x in tabs if x.name != "sheet1"]

# Typically, we'd iterate the tabs in the following fashion
for tab in tabs:
    
    # do....stuff! (usually)
    pass


# ----
# Instead, for this example we'll do it without the loop - just remember that's the exception rather than the rule

tab = tabs[0] # get our one tab

# Create a cellbag with the cells that represent the observations
observations = tab.excel_ref("C5").expand(DOWN).expand(RIGHT).is_not_blank()
    
# Create a cellbag with the cells that represent the "Assets" dimension
assets = tab.excel_ref("C3").expand(RIGHT).is_not_blank()
    
# Create a cellbag with the cells that represent the "Group" dimension
group = tab.excel_ref("A5").expand(DOWN).is_not_blank()
    
# Create a cellbag with the cells that represent the "Name" dimension
name = tab.excel_ref("B5").fill(DOWN).is_not_blank()


-----
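
The selections above use both `.expand(DOWN)` and `.fill(DOWN)`. In databaker, `expand` keeps the starting cell(s) in the selection, while `fill` selects only the cells beyond them. The sketch below is a toy model in plain Python to illustrate that difference; it is not databaker's implementation, and the sheet contents and the helper names (`fill_down`, `expand_down`, `is_not_blank`) are made up for illustration.

```python
# Toy model only: NOT databaker's implementation, just an illustration
# of how fill(DOWN) and expand(DOWN) differ.

# A hypothetical slice of a sheet: column B, rows 5-8, keyed by Excel reference.
column_b = {"B5": "Alpha", "B6": "Beta", "B7": "", "B8": "Gamma"}


def fill_down(sheet, start):
    """Cells strictly below `start` in the same column (start cell excluded)."""
    col, row = start[0], int(start[1:])
    return {ref: val for ref, val in sheet.items()
            if ref[0] == col and int(ref[1:]) > row}


def expand_down(sheet, start):
    """The start cell plus everything fill_down() would select."""
    return {start: sheet[start], **fill_down(sheet, start)}


def is_not_blank(cells):
    """Drop cells whose value is empty or whitespace only."""
    return {ref: val for ref, val in cells.items() if val.strip() != ""}


filled = is_not_blank(fill_down(column_b, "B5"))
expanded = is_not_blank(expand_down(column_b, "B5"))

print(sorted(filled))    # ['B6', 'B8']
print(sorted(expanded))  # ['B5', 'B6', 'B8']
```

If a selection seems to be missing its first cell, checking whether `fill` was used where `expand` was intended is a reasonable first step.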

## Print the contents of each of the cellbags

We're just going to use the fact that Jupyter, by default, prints any lone variable that finishes a code cell. Otherwise you'd need to call Python's print() function.

**NOTE**: if you print an individual cellbag object, it will always print its "cells" in the form `<EXCEL_REF, VALUE>`.
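
As a rough sketch of why this works: Jupyter displays the `repr()` of the last expression in a cell, so any object that defines `__repr__` controls how it is shown. The `ToyCell` class below is hypothetical and only loosely modelled on the form described above; the exact text databaker produces may differ.

```python
# Hypothetical stand-in for a databaker cell, for illustration only.
class ToyCell:
    def __init__(self, ref, value):
        self.ref = ref
        self.value = value

    def __repr__(self):
        # Show the Excel reference and the value, e.g. <C5 1.0>
        return f"<{self.ref} {self.value!r}>"


cells = [ToyCell("C5", 1.0), ToyCell("C6", 2.5)]

# Ending a notebook cell with `cells` would display this same text.
shown = repr(cells)
print(shown)  # [<C5 1.0>, <C6 2.5>]
```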

In [None]:
observations

In [None]:
assets

In [None]:
group

In [None]:
name