In [1]:
from google.cloud import bigquery

client = bigquery.Client()

DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.

In BigQuery, each dataset is contained in a corresponding project. In this case, our `hacker_news` dataset is contained in the `bigquery-public-data` project. To access the dataset:

* We begin by constructing a reference to the dataset with the `dataset()` method.
* Next, we use the `get_dataset()` method, along with the reference we just constructed, to fetch the dataset.


In [None]:

# Construct a reference to the "hacker_news" dataset
dataset_ref = client.dataset("hacker_news", project="bigquery-public-data")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

You can think of a dataset as a spreadsheet file containing multiple tables, all composed of rows and columns.
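One way to picture the project, dataset, table hierarchy is through the dot-separated IDs BigQuery uses in standard SQL, such as `bigquery-public-data.hacker_news.full` (the `full` table is fetched later in this notebook). The sketch below is a minimal, standard-library-only illustration. `TableId` and `split_table_id` are hypothetical helpers written for this example, not part of the BigQuery client library, and the code assumes the ID has exactly three non-empty parts.

```python
from typing import NamedTuple


class TableId(NamedTuple):
    """Hypothetical container for the three levels of the hierarchy."""
    project: str
    dataset: str
    table: str


def split_table_id(full_id: str) -> TableId:
    """Split a 'project.dataset.table' string into its three parts."""
    parts = full_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {full_id!r}")
    return TableId(*parts)


table_id = split_table_id("bigquery-public-data.hacker_news.full")
print(table_id.project)  # bigquery-public-data
print(table_id.dataset)  # hacker_news
print(table_id.table)    # full
```
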

We use the `list_tables()` method to list the tables in the dataset.

In [None]:
# List all the tables in the "hacker_news" dataset
tables = list(client.list_tables(dataset))

# Print names of all tables in the dataset (there are four!)
for table in tables:  
    print(table.table_id)

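Similar to how we fetched the dataset, we can fetch a single table. Here we fetch the `full` table in the `hacker_news` dataset.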

In [None]:
# Construct a reference to the "full" table
table_ref = dataset_ref.table("full")

# API request - fetch the table
table = client.get_table(table_ref)

![img](../img/schema1.png)

## Table schema

The structure of a table is called its **schema**. We need to understand a table's schema to effectively pull out the data we want.

In [None]:
# Print information on all the columns in the "full" table fetched above
table.schema

Each `SchemaField` tells us about a specific column (which we also refer to as a **field**). In order, the information is:

* The name of the column
* The field type (or datatype) in the column
* The mode of the column ('NULLABLE' means that a column allows NULL values, and is the default)
* A description of the data in that column

The first field has the `SchemaField`:

`SchemaField('by', 'string', 'NULLABLE', "The username of the item's author.", ())`

This tells us that:

* the field (or column) is called `by`,
* the data in this field is strings,
* NULL values are allowed, and
* it contains the usernames corresponding to each item's author.
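The four pieces of information listed above can also be read programmatically, since each `SchemaField` exposes them as the attributes `name`, `field_type`, `mode` and `description`. The sketch below is a hypothetical helper, `describe_schema`, that turns a schema into one readable line per column. So that it runs without BigQuery credentials, it is demonstrated on stand-in objects rather than on the real `table.schema`; the second field is made up for illustration.

```python
from types import SimpleNamespace


def describe_schema(schema):
    """Return one 'name (type, mode): description' line per field.

    Works on any objects exposing name, field_type, mode and description.
    """
    lines = []
    for field in schema:
        description = field.description or "no description"
        lines.append(
            f"{field.name} ({field.field_type}, {field.mode}): {description}"
        )
    return lines


# Stand-in for table.schema (assumption: used only so the sketch is runnable).
# "example_flag" is a hypothetical column, not part of the "full" table.
example_schema = [
    SimpleNamespace(name="by", field_type="STRING", mode="NULLABLE",
                    description="The username of the item's author."),
    SimpleNamespace(name="example_flag", field_type="BOOLEAN", mode="NULLABLE",
                    description=None),
]

for line in describe_schema(example_schema):
    print(line)
```

With a fetched table, the same helper could be called as `describe_schema(table.schema)`.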

In [None]:
# Preview the first five rows of the "full" table
client.list_rows(table, max_results=5).to_dataframe()

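The `list_rows()` method also lets us restrict which columns are returned. Here we keep only the first column (`table.schema[:1]`) and the first five rows: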

In [None]:
client.list_rows(table, selected_fields=table.schema[:1], max_results=5).to_dataframe()
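Slicing `table.schema` selects columns by position. Selecting them by name is often more readable, and one possible way to do that is sketched below. `pick_fields` is a hypothetical helper, not part of the client library, and the schema here is a stand-in so the example runs without credentials; only `by` comes from the table described above, the other column names are invented.

```python
from types import SimpleNamespace


def pick_fields(schema, names):
    """Return the schema fields whose name is in `names`, in schema order."""
    wanted = set(names)
    selected = [field for field in schema if field.name in wanted]
    missing = wanted - {field.name for field in selected}
    if missing:
        raise KeyError(f"fields not in schema: {sorted(missing)}")
    return selected


# Stand-in schema: "by" is from the text above, "col_a" and "col_b" are
# hypothetical names used only for illustration.
example_schema = [SimpleNamespace(name=n) for n in ("by", "col_a", "col_b")]

chosen = pick_fields(example_schema, ["col_b", "by"])
print([field.name for field in chosen])  # ['by', 'col_b']
```

With a real table, the result could be passed to `list_rows()` as `selected_fields=pick_fields(table.schema, ["by"])`.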