In [1]:
%%html
<!-- Improve the styling of the Notebook. -->
<link href="https://fonts.googleapis.com/css2?family=Source+Code+Pro&family=Source+Sans+3&family=Source+Serif+4:opsz@8..60&display=swap" rel="stylesheet">
<style>
   div.jp-MarkdownOutput p { font-family: 'Source Serif 4', serif; width: 50em; }
   div.jp-MarkdownOutput h1,h2,h3,h4,h5,h6 { font-family: 'Source Sans 3', sans-serif; }
   div.cm-line { font-family: 'Source Code Pro', monospace; }
</style>

In [2]:
import hail as hl
hl.init()  # Not necessary, but sometimes you need to configure Hail by passing arguments to hl.init

SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/Users/dking/miniconda3/lib/python3.10/site-packages/pyspark/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.2
SparkUI available at http://wm28c-761.broadinstitute.org:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.120-f00f916faf78
LOGGING: writing to /Users/dking/projects/ww2023/notebooks/hail-20230829-1555-0.2.120-f00f916faf78.log


# Importing a TSV File as a Hail Table

[`hl.import_table`](https://hail.is/docs/0.2/methods/impex.html#hail.methods.import_table) imports tab-separated files by default, but supports many kinds of delimiters. Hail can also import many other kinds of files, such as VCF, PLINK, UCSC BED, BGEN, and GEN. See the [Import section](https://hail.is/docs/0.2/methods/impex.html#import) of the docs for details.

In [3]:
ht = hl.import_table('data/sample_table.tsv', impute=True, min_partitions=2)
ht

2023-08-29 15:55:53.284 Hail: INFO: Reading table to impute column types
2023-08-29 15:55:56.328 Hail: INFO: Finished type imputation
  Loading field 'name' as type str (imputed)
  Loading field 'age' as type int32 (imputed)
  Loading field 'freckles' as type str (imputed)


<hail.table.Table at 0x28719ceb0>

The printed form of a table is the inscrutable `<hail.table.Table ...>` because Hail has not yet run anything. The table is just a recipe with one step: import a TSV. To execute the recipe, we must explicitly request it with an _action_. We can use the action [`Table.show`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.show) (you can click on that) to see the first few rows:

In [4]:
ht.show(n=3)

name,age,freckles
str,int32,str
"""Alice""",25,"""Yes"""
"""Bob""",35,"""No"""
"""Charlie""",28,"""No"""


# Describing a Table

We can also use [`Table.describe`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.describe), which is not an action. It lists all the fields the recipe will produce without executing the recipe.

In [5]:
ht.describe()

----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'name': str 
    'age': int32 
    'freckles': str 
----------------------------------------
Key: []
----------------------------------------
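
The difference between describing a recipe and executing it can be modeled in plain Python. This is only a toy sketch: `ToyTable`, `load_rows`, and `times_executed` are hypothetical names invented for illustration and are not part of Hail, and Hail's real machinery is far more involved.

```python
class ToyTable:
    """A toy 'recipe': it knows its fields up front but loads rows only on demand."""

    def __init__(self, load_rows, fields):
        self._load_rows = load_rows   # function that produces the rows when called
        self._fields = fields         # field names and types, known without running
        self.times_executed = 0       # how many times the recipe has actually run

    def describe(self):
        # Not an action: inspects metadata only, never calls load_rows.
        return dict(self._fields)

    def show(self, n=10):
        # Action: runs the recipe, then prints and returns the first n rows.
        self.times_executed += 1
        rows = self._load_rows()[:n]
        for row in rows:
            print(row)
        return rows


def load_rows():
    # Stands in for reading the TSV file.
    return [("Alice", 25, "Yes"), ("Bob", 35, "No"), ("Charlie", 28, "No")]


toy = ToyTable(load_rows, {"name": "str", "age": "int32", "freckles": "str"})
fields = toy.describe()
executed_after_describe = toy.times_executed   # describing did not run anything
first_two = toy.show(n=2)
executed_after_show = toy.times_executed       # showing ran the recipe once
```

In this model, `describe` answers from metadata alone, while every call to `show` runs the recipe again from the start.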


Tables can have "keys". If a table has a key, then Hail ensures the table is sorted by its key. Keys are important for combining two tables or combining a table and a matrix table.

# Filtering to Certain Rows

[`Table.filter`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.filter) creates a new recipe that both imports the table _and_ keeps only certain rows:

In [6]:
ht.filter(ht.age > 30).show(n=3)

name,age,freckles
str,int32,str
"""Bob""",35,"""No"""
"""David""",40,"""No"""
"""Zoe""",31,"""Yes"""


Notice that the above command did not modify `ht`. Run `ht.show()` again to verify that:

In [7]:
ht.show(n=3)

name,age,freckles
str,int32,str
"""Alice""",25,"""Yes"""
"""Bob""",35,"""No"""
"""Charlie""",28,"""No"""


### Exercise

Filter to people who have freckles.

In [8]:
ht.filter(ht.freckles == "Yes").show()

name,age,freckles
str,int32,str
"""Alice""",25,"""Yes"""
"""Eve""",22,"""Yes"""
"""Zoe""",31,"""Yes"""
"""Bella""",33,"""Yes"""
"""Dana""",27,"""Yes"""
"""Yara""",23,"""Yes"""
"""Brooke""",26,"""Yes"""
"""Lena""",38,"""Yes"""
"""Nina""",29,"""Yes"""
"""Paige""",24,"""Yes"""


### Exercise

Filter to people who have freckles _and_ are older than thirty. Hint: use [`hl.all`](https://hail.is/docs/0.2/functions/collections.html#hail.expr.functions.all).

In [9]:
ht.filter(
    hl.all(
        ht.freckles == "Yes",
        ht.age > 30,
    )
).show()

name,age,freckles
str,int32,str
"""Zoe""",31,"""Yes"""
"""Bella""",33,"""Yes"""
"""Lena""",38,"""Yes"""
"""Samantha""",35,"""Yes"""
"""Grace""",31,"""Yes"""
"""Mia""",39,"""Yes"""
"""Olivia""",32,"""Yes"""
"""Wendy""",42,"""Yes"""
"""Alice""",35,"""Yes"""
"""Charlotte""",38,"""Yes"""


# Head and Tail of the Dataset

[`Table.head`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.head) and [`Table.tail`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.tail) filter the dataset to the first few or last few rows.

In [10]:
ht.head(5).show()

name,age,freckles
str,int32,str
"""Alice""",25,"""Yes"""
"""Bob""",35,"""No"""
"""Charlie""",28,"""No"""
"""David""",40,"""No"""
"""Eve""",22,"""Yes"""


In [11]:
ht.tail(5).show()

name,age,freckles
str,int32,str
"""Amina""",40,"""No"""
"""Adedeji""",24,"""Yes"""
"""Hiroko""",21,"""No"""
"""Kazi""",25,"""Yes"""
"""Ezio""",23,"""No"""


# Adding New Fields with Annotate

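Usually we build up one big recipe by repeatedly reassigning the same variable. Each operation returns a new table rather than modifying the old one, so we assign the result back to `ht`. Let's do that and add a new field using [`Table.annotate`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.annotate):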

In [12]:
ht = ht.annotate(is_twenty_something = hl.all(ht.age >= 20, ht.age < 30))

In [13]:
ht.show(n=3)

name,age,freckles,is_twenty_something
str,int32,str,bool
"""Alice""",25,"""Yes""",True
"""Bob""",35,"""No""",False
"""Charlie""",28,"""No""",True


We can also convert the `freckles` field into a Boolean field with [`hl.case`](https://hail.is/docs/0.2/functions/core.html#hail.expr.functions.case), which is one of the many [core language functions](https://hail.is/docs/0.2/functions/core.html#hail.expr.functions.case) in Hail's standard library.

In [14]:
ht = ht.annotate(has_freckles = (
    hl.case()
    .when(ht.freckles == "Yes", True)
    .when(ht.freckles == "No", False)
    .or_error(hl.format("Expected \"Yes\" or \"No\" for the field \"freckles\" but found: %s", ht.freckles))
))

In [15]:
ht.show()

name,age,freckles,is_twenty_something,has_freckles
str,int32,str,bool,bool
"""Alice""",25,"""Yes""",True,True
"""Bob""",35,"""No""",False,False
"""Charlie""",28,"""No""",True,False
"""David""",40,"""No""",False,False
"""Eve""",22,"""Yes""",True,True
"""Zoe""",31,"""Yes""",False,True
"""Aaron""",29,"""No""",True,False
"""Bella""",33,"""Yes""",False,True
"""Chris""",41,"""No""",False,False
"""Dana""",27,"""Yes""",True,True


There are two ways to remove the old `freckles` field: [`Table.select`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.select) and [`Table.drop`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.drop):

In [16]:
ht.select('name', 'age', 'is_twenty_something', 'has_freckles').show(n=3)

name,age,is_twenty_something,has_freckles
str,int32,bool,bool
"""Alice""",25,True,True
"""Bob""",35,False,False
"""Charlie""",28,True,False


In [17]:
ht.drop('freckles').show(n=3)

name,age,is_twenty_something,has_freckles
str,int32,bool,bool
"""Alice""",25,True,True
"""Bob""",35,False,False
"""Charlie""",28,True,False


### Exercise

Add a field that describes the person: "Freckled twenty-something", "Freckled thirty-something", "Unfreckled twenty-something", etc.

In [19]:
freckles_description = hl.if_else(
    ht.has_freckles,
    "Freckled",
    "Unfreckled"
)

age_group = (
    hl.case()
    .when(hl.all(ht.age >= 10, ht.age < 20), "teenager.")
    .when(hl.all(ht.age >= 20, ht.age < 30), "twenty-something.")
    .when(hl.all(ht.age >= 30, ht.age < 40), "thirty-something.")
    .when(hl.all(ht.age >= 40, ht.age < 50), "forty-something.")
    .or_error(hl.format("Unknown age: %s", ht.age))
)

ht.annotate(
    description = freckles_description + " " + age_group
).show()

name,age,freckles,is_twenty_something,has_freckles,description
str,int32,str,bool,bool,str
"""Alice""",25,"""Yes""",True,True,"""Freckled twenty-something."""
"""Bob""",35,"""No""",False,False,"""Unfreckled thirty-something."""
"""Charlie""",28,"""No""",True,False,"""Unfreckled twenty-something."""
"""David""",40,"""No""",False,False,"""Unfreckled forty-something."""
"""Eve""",22,"""Yes""",True,True,"""Freckled twenty-something."""
"""Zoe""",31,"""Yes""",False,True,"""Freckled thirty-something."""
"""Aaron""",29,"""No""",True,False,"""Unfreckled twenty-something."""
"""Bella""",33,"""Yes""",False,True,"""Freckled thirty-something."""
"""Chris""",41,"""No""",False,False,"""Unfreckled forty-something."""
"""Dana""",27,"""Yes""",True,True,"""Freckled twenty-something."""


### Exercise

Add a field named "decades" indicating how many full decades this person has been alive.

In [20]:
ht.annotate(
    decades = ht.age // 10
).show()

name,age,freckles,is_twenty_something,has_freckles,decades
str,int32,str,bool,bool,int32
"""Alice""",25,"""Yes""",True,True,2
"""Bob""",35,"""No""",False,False,3
"""Charlie""",28,"""No""",True,False,2
"""David""",40,"""No""",False,False,4
"""Eve""",22,"""Yes""",True,True,2
"""Zoe""",31,"""Yes""",False,True,3
"""Aaron""",29,"""No""",True,False,2
"""Bella""",33,"""Yes""",False,True,3
"""Chris""",41,"""No""",False,False,4
"""Dana""",27,"""Yes""",True,True,2


# Aggregating a Table to a Single Python Value

Another "action" we can use to execute a Hail table's recipe is [`Table.aggregate`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.aggregate). Let's use the [`hl.agg.mean`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.mean) aggregator from the [`hl.agg`](https://hail.is/docs/0.2/aggregators.html) module.

In [21]:
ht.aggregate(hl.agg.mean(ht.age))

31.140845070422536

Each time we execute an action, the entire table recipe is executed from the beginning. For example, consider how long it takes to execute four aggregations:

In [22]:
%%time
mean_age = ht.aggregate(hl.agg.mean(ht.age))
sum_age = ht.aggregate(hl.agg.sum(ht.age))
max_age = ht.aggregate(hl.agg.max(ht.age))
min_age = ht.aggregate(hl.agg.min(ht.age))

(mean_age, sum_age, max_age, min_age)

CPU times: user 19.7 ms, sys: 6.02 ms, total: 25.7 ms
Wall time: 2.37 s


(31.140845070422536, 6633, 43, 20)

Instead of executing the table's recipe four times, once for each aggregator, we can execute the recipe once and compute all four aggregators in that single pass:

In [23]:
%%time
mean_age, sum_age, max_age, min_age = ht.aggregate(
    (
        hl.agg.mean(ht.age),
        hl.agg.sum(ht.age),
        hl.agg.max(ht.age),
        hl.agg.min(ht.age),
    )
)

(mean_age, sum_age, max_age, min_age)

CPU times: user 13.1 ms, sys: 5.4 ms, total: 18.5 ms
Wall time: 1.06 s


(31.140845070422536, 6633, 43, 20)
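
The saving comes from reading the data once instead of four times. A minimal plain-Python sketch of that idea follows; `four_passes` and `one_pass` are hypothetical helper names, the five ages are the first five rows shown earlier, and this is not a description of how Hail schedules aggregators internally.

```python
def four_passes(ages):
    """Traverse the data once per statistic: mean, sum, max, min."""
    return (sum(ages) / len(ages), sum(ages), max(ages), min(ages))


def one_pass(ages):
    """Traverse the data a single time, updating every statistic as we go."""
    total = 0
    count = 0
    largest = None
    smallest = None
    for age in ages:
        total += age
        count += 1
        largest = age if largest is None else max(largest, age)
        smallest = age if smallest is None else min(smallest, age)
    if count == 0:
        raise ValueError("cannot aggregate an empty collection")
    return (total / count, total, largest, smallest)


sample_ages = [25, 35, 28, 40, 22]   # Alice, Bob, Charlie, David, Eve
print(one_pass(sample_ages))         # same answer as four_passes(sample_ages)
```

Both functions return the same tuple, but `one_pass` touches each row only once, which matters when producing the rows is the expensive part.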

### Exercise

Count the number of people having freckles and the number of people not having freckles.

In [25]:
ht.aggregate(hl.struct(
    n_having_freckles = hl.agg.filter(ht.has_freckles, hl.agg.count()),
    n_not_having_freckles = hl.agg.filter(~ht.has_freckles, hl.agg.count()),
))

Struct(n_having_freckles=108, n_not_having_freckles=105)

In [26]:
ht.aggregate(
    hl.agg.group_by(ht.has_freckles, hl.agg.count())
)

{False: 105, True: 108}

### Exercise

Count the number of people whose names start with "A". Hint: use `ht.field[0]`.

In [27]:
ht.aggregate(
    hl.agg.count_where(ht.name[0] == "A")
)

22

In [28]:
first_letter = ht.name[0]
ht.aggregate(
    hl.agg.count_where(first_letter == "A")
)

22

In [29]:
xx = ht
xx = xx.annotate(first_letter = ht.name[0])
xx.aggregate(
    hl.agg.count_where(xx.first_letter == "A")
)

22

### Exercise

Extra hard: for each letter of the alphabet, count the number of people whose names start with that letter. Hint: [`hl.agg.group_by`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.group_by).

In [30]:
ht.aggregate(
    hl.agg.group_by(ht.name[0], hl.agg.count())
)

{'A': 22,
 'B': 8,
 'C': 12,
 'D': 6,
 'E': 12,
 'F': 4,
 'G': 5,
 'H': 10,
 'I': 7,
 'J': 9,
 'K': 5,
 'L': 14,
 'M': 14,
 'N': 8,
 'O': 7,
 'P': 5,
 'Q': 5,
 'R': 11,
 'S': 16,
 'T': 5,
 'U': 5,
 'V': 3,
 'W': 4,
 'X': 3,
 'Y': 5,
 'Z': 8}

# Aggregating within Groups of Rows to Produce a New Table

Instead of aggregating over the entire table to produce just one value, we can combine groups of rows into new rows by aggregating over each group separately. We use [`Table.group_by`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.group_by) with [`hl.agg.filter`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.filter), [`hl.agg.count`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.count), and [`hl.agg.count_where`](https://hail.is/docs/0.2/aggregators.html#hail.expr.aggregators.count_where).

In [31]:
ht.group_by(
    ht.age
).aggregate(
    count_having_freckles = hl.agg.filter(ht.freckles == "Yes", hl.agg.count()),
    count_not_having_freckles = hl.agg.filter(ht.freckles == "No", hl.agg.count()), 
    count_names_starting_with_A = hl.agg.count_where(ht.name[0] == "A"),
)

<hail.table.Table at 0x16d30c340>

Oops! We forgot to use an action, like [`Table.show`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.show), so nothing happened! Let's try again:

In [32]:
ht2 = ht.group_by(
    ht.age
).aggregate(
    count_having_freckles = hl.agg.filter(ht.freckles == "Yes", hl.agg.count()),
    count_not_having_freckles = hl.agg.filter(ht.freckles == "No", hl.agg.count()), 
    count_names_starting_with_A = hl.agg.count_where(ht.name[0] == "A"),
)
ht2.show(n=3)

2023-08-29 16:19:59.448 Hail: INFO: Ordering unsorted dataset with network shuffle


age,count_having_freckles,count_not_having_freckles,count_names_starting_with_A
int32,int64,int64,int64
20,0,3,1
21,1,2,0
22,7,1,0
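
For intuition, grouped aggregation can be sketched in plain Python as "one set of counters per group". The rows below are a handful of people that appear in outputs elsewhere in this notebook, `group_by_age` is a hypothetical helper, and the sketch says nothing about how Hail distributes the work.

```python
from collections import defaultdict

# (name, age, freckles) for a few people shown elsewhere in this notebook.
rows = [
    ("Alice", 25, "Yes"),
    ("Kazi", 25, "Yes"),
    ("Tom", 25, "No"),
    ("Eve", 22, "Yes"),
    ("Peter", 22, "No"),
    ("Adedeji", 24, "Yes"),
    ("Ryan", 24, "No"),
]


def group_by_age(rows):
    """Return one dictionary of counts per distinct age, sorted by age."""
    groups = defaultdict(lambda: {
        "count_having_freckles": 0,
        "count_not_having_freckles": 0,
        "count_names_starting_with_A": 0,
    })
    for name, age, freckles in rows:
        counts = groups[age]                     # counters for this row's group
        if freckles == "Yes":
            counts["count_having_freckles"] += 1
        elif freckles == "No":
            counts["count_not_having_freckles"] += 1
        if name.startswith("A"):
            counts["count_names_starting_with_A"] += 1
    return dict(sorted(groups.items()))          # sorted by the grouping key


grouped = group_by_age(rows)
for age, counts in grouped.items():
    print(age, counts)
```

Each distinct age becomes one output row, just as each distinct `age` becomes one row of `ht2`.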


Notice that we used a new variable name, `ht2`, so that we can still access the old table, `ht`, which contains all the individual people.

In [33]:
ht.show(n=3)

name,age,freckles,is_twenty_something,has_freckles
str,int32,str,bool,bool
"""Alice""",25,"""Yes""",True,True
"""Bob""",35,"""No""",False,False
"""Charlie""",28,"""No""",True,False


# Plotting Tables

Hail has a new module, [`hail.ggplot`](https://hail.is/docs/0.2/ggplot/index.html#), which implements a grammar of graphics for Hail tables and matrix tables. There is also a [ggplot tutorial](https://hail.is/docs/0.2/tutorials/09-ggplot.html).

In [37]:
from hail.ggplot import *

import plotly
import plotly.io as pio
pio.renderers.default='iframe'

In [38]:
ggplot(ht) + geom_bar(aes(x=ht.age))

In [43]:
ggplot(ht) + geom_bar(aes(x=ht.age, fill=hl.str(hl.if_else(ht.has_freckles, "has freckles", "doesn't"))))

# Writing and Reading Tables in Hail Native Format

Hail has a partitioned, indexed, binary file format for quickly reading and writing datasets. [`Table.write`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.write) is the action which writes a table in Hail native format. We use the ".ht" file extension by convention.

In [46]:
ht.write('output/sample_table.ht')

2023-08-29 16:25:50.099 Hail: INFO: wrote table with 213 rows in 2 partitions to output/sample_table.ht


Writing a Hail table executes the recipe once and saves the results in a file for future use. We recommend writing after importing or after executing computationally intensive pipelines. [`hl.read_table`](https://hail.is/docs/0.2/methods/impex.html#hail.methods.read_table) reads a table in Hail native format. Most operations are faster when starting from a Hail native format table.

In [47]:
ht = hl.read_table('output/sample_table.ht')

# Exporting a Table to a File

Hail tables support export to many file formats including TSV and CSV.

In [48]:
ht.export('output/sample_table.tsv')
ht.export('output/sample_table.csv', delimiter=',')
ht.export('output/sample_table.@sv', delimiter='@')

2023-08-29 16:25:58.548 Hail: INFO: merging 3 files totalling 5.0K...
2023-08-29 16:25:58.595 Hail: INFO: while writing:
    output/sample_table.tsv
  merge time: 42.894ms
2023-08-29 16:25:59.116 Hail: INFO: merging 3 files totalling 5.0K...
2023-08-29 16:25:59.140 Hail: INFO: while writing:
    output/sample_table.csv
  merge time: 22.614ms
2023-08-29 16:25:59.421 Hail: INFO: merging 3 files totalling 5.0K...
2023-08-29 16:25:59.447 Hail: INFO: while writing:
    output/sample_table.@sv
  merge time: 25.775ms


In [49]:
!head output/sample_table.tsv

name	age	freckles	is_twenty_something	has_freckles
Alice	25	Yes	true	true
Bob	35	No	false	false
Charlie	28	No	true	false
David	40	No	false	false
Eve	22	Yes	true	true
Zoe	31	Yes	false	true
Aaron	29	No	true	false
Bella	33	Yes	false	true
Chris	41	No	false	false


In [50]:
!head output/sample_table.csv

name,age,freckles,is_twenty_something,has_freckles
Alice,25,Yes,true,true
Bob,35,No,false,false
Charlie,28,No,true,false
David,40,No,false,false
Eve,22,Yes,true,true
Zoe,31,Yes,false,true
Aaron,29,No,true,false
Bella,33,Yes,false,true
Chris,41,No,false,false


In [51]:
!head output/sample_table.@sv

name@age@freckles@is_twenty_something@has_freckles
Alice@25@Yes@true@true
Bob@35@No@false@false
Charlie@28@No@true@false
David@40@No@false@false
Eve@22@Yes@true@true
Zoe@31@Yes@false@true
Aaron@29@No@true@false
Bella@33@Yes@false@true
Chris@41@No@false@false


For ease of viewing, we did not compress the outputs above. Exporting large tables uncompressed is almost always a mistake. Hail detects the ".bgz" extension and compresses the output using block GZIP. This is almost always faster than exporting an uncompressed text file.

In [52]:
ht.export('output/sample_table.tsv.bgz')

2023-08-29 16:26:18.066 Hail: INFO: merging 3 files totalling 1.5K...
2023-08-29 16:26:18.093 Hail: INFO: while writing:
    output/sample_table.tsv.bgz
  merge time: 26.254ms
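
Block GZIP suits partitioned data because a block-gzipped file is a sequence of independently compressed gzip blocks, and a concatenation of gzip streams is itself a valid gzip stream. The sketch below demonstrates only that concatenation property using Python's standard `gzip` module. It uses ordinary gzip members rather than true block GZIP blocks, and the two byte strings are made-up stand-ins for partitions.

```python
import gzip

# Two made-up "partitions" of a TSV file.
partitions = [
    b"name\tage\nAlice\t25\n",
    b"Bob\t35\n",
]

# Compress each partition on its own, as independent workers could.
compressed_parts = [gzip.compress(part) for part in partitions]

# Merging is plain byte concatenation; nothing is recompressed.
merged = b"".join(compressed_parts)

# Decompressing the merged bytes yields the partitions back to back.
restored = gzip.decompress(merged)
print(restored.decode())
```

Because the pieces never need to be recompressed, compression can happen partition by partition before any merge step.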


The `INFO` output mentions a "merge time". This is a slow, serial operation in which Hail concatenates the partitioned dataset into a single file. Whenever possible, you should use partitioned text files. [`Table.export`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.export) exports a folder of partitions when `parallel` is set to `header_per_shard` or `separate_header`.

In [53]:
ht.export('output/sample_table_partitions_header_per_shard.tsv/', parallel='header_per_shard')

In [54]:
!head output/sample_table_partitions_header_per_shard.tsv/*

==> output/sample_table_partitions_header_per_shard.tsv/_SUCCESS <==

==> output/sample_table_partitions_header_per_shard.tsv/part-0-d4f7a5e1-0fa6-40d4-8050-cb070468aed1 <==
name	age	freckles	is_twenty_something	has_freckles
Alice	25	Yes	true	true
Bob	35	No	false	false
Charlie	28	No	true	false
David	40	No	false	false
Eve	22	Yes	true	true
Zoe	31	Yes	false	true
Aaron	29	No	true	false
Bella	33	Yes	false	true
Chris	41	No	false	false

==> output/sample_table_partitions_header_per_shard.tsv/part-1-930da4f1-e9d6-4f07-911c-5b98fb72ce52 <==
name	age	freckles	is_twenty_something	has_freckles
Katherine	26	Yes	true	true
Liam	38	No	false	false
Mia	37	No	false	false
Noah	34	Yes	false	true
Olivia	26	Yes	true	true
Peter	22	No	true	false
Quinn	40	Yes	false	true
Ryan	24	No	true	false
Sophia	21	Yes	true	true

==> output/sample_table_partitions_header_per_shard.tsv/shard-manifest.txt <==
part-0-d4f7a5e1-0fa6-40d4-8050-cb070468aed1
part-1-930da4f1-e9d6-4f07-911c-5b98fb72ce52


In [55]:
ht.export('output/sample_table_partitions_separate_header.tsv/', parallel='separate_header')

In [56]:
!head output/sample_table_partitions_separate_header.tsv/*

==> output/sample_table_partitions_separate_header.tsv/_SUCCESS <==

==> output/sample_table_partitions_separate_header.tsv/header <==
name	age	freckles	is_twenty_something	has_freckles

==> output/sample_table_partitions_separate_header.tsv/part-0-c8dad996-20a8-434c-a3b4-4f248fd797e5 <==
Alice	25	Yes	true	true
Bob	35	No	false	false
Charlie	28	No	true	false
David	40	No	false	false
Eve	22	Yes	true	true
Zoe	31	Yes	false	true
Aaron	29	No	true	false
Bella	33	Yes	false	true
Chris	41	No	false	false
Dana	27	Yes	true	true

==> output/sample_table_partitions_separate_header.tsv/part-1-574cb0ef-d8a1-49d9-a895-d7147561fecd <==
Katherine	26	Yes	true	true
Liam	38	No	false	false
Mia	37	No	false	false
Noah	34	Yes	false	true
Olivia	26	Yes	true	true
Peter	22	No	true	false
Quinn	40	Yes	false	true
Ryan	24	No	true	false
Sophia	21	Yes	true	true
Tom	25	No	true	false

==> output/sample_table_partitions_separate_header.tsv/shard-manifest.txt <==
header
part-0-c8dad996-20a8-434c-a3b4-4f248fd797e5
part-1-574cb0
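
A folder of shards can be read back without merging it first. The sketch below assumes the `header_per_shard` layout shown above: a `shard-manifest.txt` that lists the part files in order, with every part file starting with the header line and stored uncompressed. `read_sharded_tsv` is a hypothetical helper, and the demo builds a tiny stand-in folder with made-up part names rather than reading the real output.

```python
import csv
import tempfile
from pathlib import Path


def read_sharded_tsv(folder):
    """Read every shard named in shard-manifest.txt into a list of dicts."""
    folder = Path(folder)
    shard_names = (folder / "shard-manifest.txt").read_text().split()
    rows = []
    for shard_name in shard_names:
        with open(folder / shard_name, newline="") as shard:
            # Each shard carries its own header, so DictReader works per file.
            rows.extend(csv.DictReader(shard, delimiter="\t"))
    return rows


# Build a small stand-in folder that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "part-0").write_text("name\tage\nAlice\t25\nBob\t35\n")
    (folder / "part-1").write_text("name\tage\nKatherine\t26\n")
    (folder / "shard-manifest.txt").write_text("part-0\npart-1\n")
    rows = read_sharded_tsv(folder)

for row in rows:
    print(row["name"], row["age"])
```

Note that `csv.DictReader` returns every value as a string, so numeric fields such as `age` would need converting after reading.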

# Collecting a Table to a List or Pandas DataFrame

[`Table.collect`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.collect) collects the distributed, partitioned rows of a table into a single Python list. If the table is large, this will run out of memory.

In [57]:
ht.collect()

[Struct(name='Alice', age=25, freckles='Yes', is_twenty_something=True, has_freckles=True),
 Struct(name='Bob', age=35, freckles='No', is_twenty_something=False, has_freckles=False),
 Struct(name='Charlie', age=28, freckles='No', is_twenty_something=True, has_freckles=False),
 Struct(name='David', age=40, freckles='No', is_twenty_something=False, has_freckles=False),
 Struct(name='Eve', age=22, freckles='Yes', is_twenty_something=True, has_freckles=True),
 Struct(name='Zoe', age=31, freckles='Yes', is_twenty_something=False, has_freckles=True),
 Struct(name='Aaron', age=29, freckles='No', is_twenty_something=True, has_freckles=False),
 Struct(name='Bella', age=33, freckles='Yes', is_twenty_something=False, has_freckles=True),
 Struct(name='Chris', age=41, freckles='No', is_twenty_something=False, has_freckles=False),
 Struct(name='Dana', age=27, freckles='Yes', is_twenty_something=True, has_freckles=True),
 Struct(name='Yara', age=23, freckles='Yes', is_twenty_something=True, has_freck

[`Table.to_pandas`](https://hail.is/docs/0.2/hail.Table.html#hail.Table.to_pandas) collects the values into a Pandas DataFrame. As above, large tables will exceed the memory available on your laptop.

In [58]:
ht.to_pandas()

Unnamed: 0,name,age,freckles,is_twenty_something,has_freckles
0,Alice,25,Yes,True,True
1,Bob,35,No,False,False
2,Charlie,28,No,True,False
3,David,40,No,False,False
4,Eve,22,Yes,True,True
...,...,...,...,...,...
208,Amina,40,No,False,False
209,Adedeji,24,Yes,True,True
210,Hiroko,21,No,True,False
211,Kazi,25,Yes,True,True
