# Getting Started

## BigQuery Commands

In [1]:
from google.cloud import bigquery

Create a **Client** object, which bundles the configuration needed for API requests.

In [2]:
client = bigquery.Client()

In BigQuery, each dataset is contained in a corresponding project. In this case, our `hacker_news` dataset is contained in the `bigquery-public-data` project.
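
The containment described above (a project holds datasets, and a dataset holds tables) also shows up in how BigQuery spells a fully qualified table name: `project.dataset.table`. The sketch below is plain Python and makes no BigQuery calls. `TableId` and `split_table_id` are hypothetical names written for illustration; they are not part of the `google-cloud-bigquery` library.

```python
from typing import NamedTuple


class TableId(NamedTuple):
    """The three levels of a fully qualified BigQuery table name."""
    project: str
    dataset: str
    table: str


def split_table_id(full_id: str) -> TableId:
    """Split 'project.dataset.table' into its parts (hypothetical helper)."""
    parts = full_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected 'project.dataset.table', got {full_id!r}")
    return TableId(*parts)


# `full` is one of the tables in the hacker_news dataset.
table_id = split_table_id("bigquery-public-data.hacker_news.full")
print(table_id.project)  # bigquery-public-data
print(table_id.dataset)  # hacker_news
print(table_id.table)    # full
```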

To access the dataset:

1. We begin by constructing a reference to the dataset with the `dataset()` method.
2. Next, we use the `get_dataset()` method, along with the reference we just constructed, to fetch the dataset.


In [3]:
# Reference to dataset
dataset_ref = client.dataset("hacker_news", project="bigquery-public-data")

# API request - Fetch the dataset
dataset = client.get_dataset(dataset_ref)

Every dataset is just a collection of tables. 

In [4]:
# List all the tables in the "hacker_news" dataset
tables = list(client.list_tables(dataset))

# Print names of all tables in the dataset
for table in tables:
    print(table.table_id)

comments
full
full_201510
stories


Similar to how we fetched a dataset, we can fetch a table.

In [5]:
# Reference to the 'full' table
table_ref = dataset_ref.table("full")

# API request - fetch the table
table = client.get_table(table_ref)

![image.png](https://i.imgur.com/biYqbUB.png)

---

## Table schema

The structure of a table is called its **schema**.

Each `SchemaField` describes one column (which we also refer to as a **field**). In order, it reports:

1. The `name` of the column;
2. The `field_type` (data type) of the column;
3. The `mode` of the column (`'NULLABLE'`, the default, means that the column allows `NULL` values);
4. A `description` of the data in that column.
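
As a small illustration of the four attributes listed above, the sketch below builds one readable line per column. To stay runnable without BigQuery credentials it uses a stand-in `Field` namedtuple (an assumption made for this example) that mirrors the `SchemaField` attribute names `name`, `field_type`, `mode`, and `description`. `describe_schema` is a hypothetical helper. With a real table you would pass `table.schema` instead of `sample_schema`.

```python
from collections import namedtuple

# Stand-in for bigquery.SchemaField (assumption for this example): it mirrors
# the attribute names name, field_type, mode and description.
Field = namedtuple("Field", ["name", "field_type", "mode", "description"])

# Three columns taken from the schema of the hacker_news `full` table.
sample_schema = [
    Field("title", "STRING", "NULLABLE", "Story title"),
    Field("score", "INTEGER", "NULLABLE", "Story score"),
    Field("timestamp", "TIMESTAMP", "NULLABLE", "Timestamp for the unix time"),
]


def describe_schema(schema):
    """Return one 'name (type, mode): description' string per column."""
    return [
        f"{field.name} ({field.field_type}, {field.mode}): {field.description}"
        for field in schema
    ]


for line in describe_schema(sample_schema):
    print(line)
```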


In [6]:
for schema_field in table.schema:
    print(schema_field)

SchemaField('title', 'STRING', 'NULLABLE', 'Story title', ())
SchemaField('url', 'STRING', 'NULLABLE', 'Story url', ())
SchemaField('text', 'STRING', 'NULLABLE', 'Story or comment text', ())
SchemaField('dead', 'BOOLEAN', 'NULLABLE', 'Is dead?', ())
SchemaField('by', 'STRING', 'NULLABLE', "The username of the item's author.", ())
SchemaField('score', 'INTEGER', 'NULLABLE', 'Story score', ())
SchemaField('time', 'INTEGER', 'NULLABLE', 'Unix time', ())
SchemaField('timestamp', 'TIMESTAMP', 'NULLABLE', 'Timestamp for the unix time', ())
SchemaField('type', 'STRING', 'NULLABLE', 'Type of details (comment, comment_ranking, poll, story, job, pollopt)', ())
SchemaField('id', 'INTEGER', 'NULLABLE', "The item's unique id.", ())
SchemaField('parent', 'INTEGER', 'NULLABLE', 'Parent comment ID', ())
SchemaField('descendants', 'INTEGER', 'NULLABLE', 'Number of story or poll descendants', ())
SchemaField('ranking', 'INTEGER', 'NULLABLE', 'Comment ranking', ())
SchemaField('deleted', 'BOOLEAN', 'NULL

We can use the `list_rows()` method to check just the first five rows of the `full` table to make sure this is right. (Sometimes databases have outdated descriptions, so it's good to check.)

This returns a BigQuery `RowIterator` object that can quickly be converted to a pandas DataFrame with the `to_dataframe()` method.

In [7]:
client.list_rows(table, max_results=5).to_dataframe()

| | title | url | text | dead | by | score | time | timestamp | type | id | parent | descendants | ranking | deleted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | Lol, yes! Safari can sometimes feel like the n... | | osrec | | 1505158465 | 2017-09-11 19:34:25+00:00 | comment | 15221546 | 15221525.0 | | | |
| 1 | Uber Got Off Easy | https://jalopnik.com/uber-got-off-easy-1839948349 | | | lawrenceyan | 2.0 | 1574776920 | 2019-11-26 14:02:00+00:00 | story | 21638297 | | 0.0 | | |
| 2 | | | Also those who review the open sourced code co... | | giancarlostoro | | 1505735829 | 2017-09-18 11:57:09+00:00 | comment | 15275056 | 15274950.0 | | | |
| 3 | | | Microsoft OneDrive for Business is moving in t... | | SREinSF | | 1499058110 | 2017-07-03 05:01:50+00:00 | comment | 14685946 | 14685920.0 | | | |
| 4 | | | maybe, but any regular human being might not w... | | colorincorrect | | 1580281613 | 2020-01-29 07:06:53+00:00 | comment | 22178198 | 22177697.0 | | | |


The `list_rows()` method can also return just the columns we ask for, through its `selected_fields` argument. Here, the slice `table.schema[2:3]` picks out the third field, `text`.

In [8]:
client.list_rows(table, selected_fields=table.schema[2:3], max_results=5).to_dataframe()

| | text |
|---|---|
| 0 | Lol, yes! Safari can sometimes feel like the n... |
| 1 | |
| 2 | Also those who review the open sourced code co... |
| 3 | Microsoft OneDrive for Business is moving in t... |
| 4 | maybe, but any regular human being might not w... |
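
Slicing `table.schema` by position, as in `table.schema[2:3]`, relies on knowing the column order. Selecting fields by name is one alternative. The sketch below uses a stand-in `Field` namedtuple (an assumption, so that it runs without BigQuery access) and a hypothetical helper, `fields_named`, that is not part of the library.

```python
from collections import namedtuple

# Stand-in for bigquery.SchemaField (assumption): only the attributes used here.
Field = namedtuple("Field", ["name", "field_type"])

# The first few columns of the hacker_news `full` table, in schema order.
sample_schema = [
    Field("title", "STRING"),
    Field("url", "STRING"),
    Field("text", "STRING"),
    Field("dead", "BOOLEAN"),
    Field("by", "STRING"),
]


def fields_named(schema, names):
    """Return the schema fields whose name is in `names`, in schema order.

    Raises KeyError if a requested name is not in the schema.
    """
    wanted = set(names)
    missing = wanted - {field.name for field in schema}
    if missing:
        raise KeyError(f"unknown column(s): {sorted(missing)}")
    return [field for field in schema if field.name in wanted]


selected = fields_named(sample_schema, ["text", "by"])
print([field.name for field in selected])  # ['text', 'by']
```

With a real table, the result of `fields_named(table.schema, ["text"])` could be passed as the `selected_fields` argument of `client.list_rows()` in place of the slice.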


---

## Exercise

In [9]:
from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset("chicago_crime", project="bigquery-public-data")
dataset = client.get_dataset(dataset_ref)

### 1) Count tables in the dataset

In [10]:
print(len(list(client.list_tables(dataset))))

1


### 2) Explore the table schema. How many columns in the crime table have TIMESTAMP data?

In [11]:
table_ref = dataset_ref.table("crime")
table = client.get_table(table_ref)

# Count the schema fields whose type is TIMESTAMP
num_timestamp_fields = sum(field.field_type == 'TIMESTAMP' for field in table.schema)
print(num_timestamp_fields)

2
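
The same kind of question can be answered for every data type at once by tallying the `field_type` values. The sketch below does this with `collections.Counter`. It runs on a stand-in `Field` namedtuple and a handful of columns from the `hacker_news` `full` table shown earlier (an assumption, so that no BigQuery access is needed); it does not reproduce the `crime` schema. With a real table, iterate over `table.schema` instead.

```python
from collections import Counter, namedtuple

# Stand-in for bigquery.SchemaField (assumption): only the attributes used here.
Field = namedtuple("Field", ["name", "field_type"])

# A few columns from the hacker_news `full` table, not the crime table.
sample_schema = [
    Field("title", "STRING"),
    Field("dead", "BOOLEAN"),
    Field("score", "INTEGER"),
    Field("time", "INTEGER"),
    Field("timestamp", "TIMESTAMP"),
]

# Tally how many columns there are of each data type.
type_counts = Counter(field.field_type for field in sample_schema)

print(type_counts["TIMESTAMP"])  # 1
print(type_counts["INTEGER"])    # 2
```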


### 3) Create a crime map

If you wanted to create a map with a dot at the location of each crime, what are the names of the two fields you likely need to pull out of the `crime` table to plot the crimes on a map?

`['x_coordinate', 'y_coordinate']` or `['latitude', 'longitude']`