## Basics of accessing & examining BigQuery datasets

- 1. Create client object
- 2. Retrieve dataset
- 3. View tables in dataset
- 4. Retrieve table from dataset
- 5. View table schema
- 6. View table rows
- 7. View a specific column

In [1]:
from google.cloud import bigquery
import os

## Set Google credentials - path to the service account JSON key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS']='####.json'

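A missing or mistyped key path usually only surfaces later, when the client makes its first request. The sketch below checks the environment variable up front. `credentials_file_found` is a hypothetical helper written for these notes, not part of any Google library.

```python
import os
from pathlib import Path


def credentials_file_found(environ=None):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file.

    Hypothetical helper: it only checks that the path exists, not that the
    key inside the file is valid.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and Path(path).is_file()


if not credentials_file_found():
    print("GOOGLE_APPLICATION_CREDENTIALS is unset or does not point to a file")
```

Running this before creating the client gives an early, readable warning instead of an authentication error further down the notebook.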
### 1. Create client object 
- The `Client` object plays a central role in retrieving information from BigQuery datasets: every API request below goes through it

In [2]:
## Create Client object
client = bigquery.Client()

### 2. Retrieve dataset

- In BigQuery, each dataset is contained in a corresponding project 
- In this case, the hacker_news dataset is contained in the bigquery-public-data project 
- To access the dataset: 
    - a) construct a reference to the dataset using the `dataset()` method & 
    - b) use the `get_dataset()` method along with the reference created to retrieve the dataset

In [3]:
## Create reference to the hacker_news dataset
data_ref = client.dataset('hacker_news', project='bigquery-public-data')

## API request - fetch the dataset
dataset = client.get_dataset(data_ref)

## Datatype
print(f"Type: {type(dataset)}")

Type: <class 'google.cloud.bigquery.dataset.Dataset'>


### 3. View tables in dataset
- Every dataset is a collection of tables (similar to a spreadsheet file with multiple tables)
- Use the `list_tables()` method to list the tables in the dataset; each item's `table_id` attribute holds the table name

In [4]:
## List all the tables in the hacker_news dataset
tables = list(client.list_tables(dataset))

## Print names of all tables in the dataset
for table in tables: print(table.table_id)

comments
full
full_201510
stories

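The listing above can also be collected into a plain Python list, which makes it easy to confirm that a table exists before requesting it. This is a minimal sketch: `table_ids` and `has_table` are hypothetical helper names, and `client` is assumed to be the `bigquery.Client` created earlier.

```python
def table_ids(client, dataset):
    """Return the IDs of all tables in `dataset` as a sorted list."""
    # list_tables() yields lightweight items that expose a `table_id` attribute
    return sorted(item.table_id for item in client.list_tables(dataset))


def has_table(client, dataset, name):
    """Return True if a table called `name` is listed in `dataset`."""
    return name in table_ids(client, dataset)
```

With the objects from this notebook, `has_table(client, dataset, 'full')` should be true for the listing shown above.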

### 4. Retrieve table from dataset
- Similar to how the dataset was retrieved: 
    - a) construct a reference to the table using the `table()` method & 
    - b) use the `get_table()` method along with the reference created to retrieve the table

In [5]:
## Construct a reference to the table: full
table_ref = data_ref.table('full')

## API request - fetch the table
table = client.get_table(table_ref)

## Datatype
print(f"Type: {type(table)}")

Type: <class 'google.cloud.bigquery.table.Table'>

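Both retrieval steps can also be written with fully qualified string IDs (`project.dataset` and `project.dataset.table`), since `get_dataset()` and `get_table()` accept a string in place of a reference object. This may be worth knowing because the library's reference documentation appears to mark `Client.dataset()` as deprecated in newer releases. The sketch below assumes `client` is a `bigquery.Client`; `qualified_id` and `fetch_dataset_and_table` are hypothetical helpers.

```python
def qualified_id(*parts):
    """Join project, dataset and (optionally) table names with dots."""
    return ".".join(parts)


def fetch_dataset_and_table(client, project, dataset_id, table_id):
    """Fetch a dataset and one of its tables using string IDs only."""
    # Each call below is one API request, as in the cells above
    dataset = client.get_dataset(qualified_id(project, dataset_id))
    table = client.get_table(qualified_id(project, dataset_id, table_id))
    return dataset, table
```

For this notebook the call would be `fetch_dataset_and_table(client, 'bigquery-public-data', 'hacker_news', 'full')`.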

### 5. View table schema

- View the table's schema to effectively pull out the required data


- Each SchemaField reveals:

    - The name of column
    - Its datatype
    - The mode ('NULLABLE': a column allows NULL values, & is the default)
    - A description of the data for that column

In [6]:
## View column information
for col in table.schema: print(col)

SchemaField('title', 'STRING', 'NULLABLE', 'Story title', (), None)
SchemaField('url', 'STRING', 'NULLABLE', 'Story url', (), None)
SchemaField('text', 'STRING', 'NULLABLE', 'Story or comment text', (), None)
SchemaField('dead', 'BOOLEAN', 'NULLABLE', 'Is dead?', (), None)
SchemaField('by', 'STRING', 'NULLABLE', "The username of the item's author.", (), None)
SchemaField('score', 'INTEGER', 'NULLABLE', 'Story score', (), None)
SchemaField('time', 'INTEGER', 'NULLABLE', 'Unix time', (), None)
SchemaField('timestamp', 'TIMESTAMP', 'NULLABLE', 'Timestamp for the unix time', (), None)
SchemaField('type', 'STRING', 'NULLABLE', 'Type of details (comment, comment_ranking, poll, story, job, pollopt)', (), None)
SchemaField('id', 'INTEGER', 'NULLABLE', "The item's unique id.", (), None)
SchemaField('parent', 'INTEGER', 'NULLABLE', 'Parent comment ID', (), None)
SchemaField('descendants', 'INTEGER', 'NULLABLE', 'Number of story or poll descendants', (), None)
SchemaField('ranking', 'INTEGER', 'N

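The raw `SchemaField(...)` output is dense. A minimal sketch of a more readable summary follows, using the `name`, `field_type`, `mode` and `description` attributes of each field. `describe_schema` is a hypothetical helper, and the column widths are arbitrary.

```python
def describe_schema(schema):
    """Return one aligned 'name type mode description' line per column."""
    lines = []
    for field in schema:
        description = field.description or ""  # description may be None
        lines.append(
            f"{field.name:<12} {field.field_type:<10} {field.mode:<9} {description}"
        )
    return lines
```

Calling `print("\n".join(describe_schema(table.schema)))` prints one line per column of the table.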
### 6. View table rows
- Use the `list_rows()` method to view rows of the table


- This returns a `RowIterator` object, which can be converted to a pandas DataFrame with the `to_dataframe()` method

In [7]:
## View the first 3 rows of the 'full' table
client.list_rows(table, max_results=3).to_dataframe()

|  | title | url | text | dead | by | score | time | timestamp | type | id | parent | descendants | ranking | deleted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  | &gt; What about those murderous caravans from ... |  | jevoten |  | 1644614470 | 2022-02-11 21:21:10+00:00 | comment | 30307128 | 30307030 |  |  |  |
| 1 |  |  | &gt;This probably also means people have to be... |  | dTal |  | 1644614472 | 2022-02-11 21:21:12+00:00 | comment | 30307129 | 30306385 |  |  |  |
| 2 |  |  | Interesting. What services to you provide to h... |  | 1123581321 |  | 1644614400 | 2022-02-11 21:20:00+00:00 | comment | 30307121 | 30305959 |  |  |  |

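The `time` column holds Unix time in seconds, and `timestamp` holds the same instant as a UTC timestamp. A quick check with the standard library, using values from the rows above, shows how the two relate. `unix_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def unix_to_utc(seconds):
    """Convert Unix time in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# `time` value of row 0 in the output above
print(unix_to_utc(1644614470))  # 2022-02-11 21:21:10+00:00
```

The printed value matches the `timestamp` shown for the same row.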

### 7. View a specific column

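- The `list_rows()` method can also retrieve specific columns through its `selected_fields` argument, which takes a list of `SchemaField` objects (here `table.schema[2:3]`, a one-element slice holding the `text` column)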

In [8]:
## View first 3 entries of the text column
client.list_rows(table, selected_fields=table.schema[2:3], max_results=3).to_dataframe()

|  | text |
|---|---|
| 0 | &gt; What about those murderous caravans from ... |
| 1 | &gt;This probably also means people have to be... |
| 2 | Interesting. What services to you provide to h... |

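Slicing the schema by position (`table.schema[2:3]`) depends on the column order staying the same. A minimal sketch of selecting fields by name instead is shown below. `fields_named` is a hypothetical helper that works on any sequence of objects with a `name` attribute, such as `table.schema`.

```python
def fields_named(schema, names):
    """Return the schema fields whose names are in `names`, in schema order."""
    wanted = set(names)
    found = [field for field in schema if field.name in wanted]
    missing = wanted - {field.name for field in found}
    if missing:
        # Fail early rather than silently returning fewer columns
        raise KeyError(f"columns not in schema: {sorted(missing)}")
    return found
```

The result can be passed straight to the call used above, for example `client.list_rows(table, selected_fields=fields_named(table.schema, ['text']), max_results=3).to_dataframe()`.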