# Intro

SQL is the programming language used with databases, and it is an important skill for any data scientist. In this course, you'll build your SQL skills and apply them using BigQuery, a database system that lets you use SQL on huge datasets.

This lesson covers the basics of connecting to the database and exploring how a dataset is structured. After you have a handle on these basics, we'll come back to build your SQL skills.


# Your First BigQuery Commands
We'll access BigQuery using a Python package called `bq_helper` that puts BigQuery results into Pandas DataFrames. This is valuable if you are familiar with Pandas. In case you aren't, we have a separate [Pandas course](https://www.kaggle.com/learn/pandas).

You can import `bq_helper` in the standard way.

```python
import bq_helper
```

We also need to create a `BigQueryHelper` object that points to a specific dataset.

For now, we will give you the names of the datasets you will connect to. The current example uses a dataset of posts to Hacker News.

```python
# create a helper object for our BigQuery dataset
hacker_news = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                       dataset_name="hacker_news")
```

# Database Schemas

The structure of a dataset is called its **schema**.

We need to understand a database's schema to effectively pull out the data we want (called "querying the database"). The `BigQueryHelper.list_tables()` method lists the tables in the dataset. A table is made up of rows and columns, like a spreadsheet. The database itself can hold multiple tables, much as a spreadsheet file can hold multiple sheets.
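BigQuery needs a network connection and credentials, so as a local stand-in the sketch below uses Python's built-in `sqlite3` module (a different database system, not BigQuery) to illustrate the same idea: one database holding several tables, each made of rows and columns, plus a way to list the table names, roughly what `list_tables()` does for a BigQuery dataset. The table names `posts` and `users` are hypothetical.

```python
import sqlite3


def list_table_names(conn):
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    # sqlite_master is SQLite's built-in catalog of schema objects (tables, indexes, ...).
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# An in-memory database holding two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT)")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT)")

print(list_table_names(conn))  # ['posts', 'users']
```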

```python
# print a list of all the tables in the hacker_news dataset
hacker_news.list_tables()
```

Now that we know what tables are in this dataset, we can explore the columns in individual tables. In this example, we'll look at the table called "full". Note that other datasets have different table names, so you will not always use "full".

```python
# print information on all the columns in the "full" table
# in the hacker_news dataset
hacker_news.table_schema("full")
```

Each SchemaField tells us about a specific column. In order, the information is:

* The name of the column
* The datatype in the column
* [The mode of the column](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#schema.fields.mode) (NULLABLE means that a column allows NULL values, and is the default)
* A description of the data in that column

The first field has the SchemaField:

`SchemaField('by', 'string', 'NULLABLE', "The username of the item's author.",())`

This tells us that:
- the field is called "by"
- the field contains strings
- NULL values are allowed
- it contains the username of the item's author
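To make the positional layout concrete, the sketch below uses a hypothetical `Field` namedtuple that mirrors the order of the values in the printed SchemaField (name, type, mode, description, nested fields). It is only a stand-in for illustration; the real `SchemaField` objects come from the `google-cloud-bigquery` library and expose the same information as attributes such as `name`, `field_type`, `mode`, and `description`.

```python
from collections import namedtuple

# Hypothetical stand-in that mirrors the order of values in the printed SchemaField.
Field = namedtuple("Field", ["name", "field_type", "mode", "description", "fields"])


def describe(field):
    """Turn one schema entry into a readable sentence."""
    # NULLABLE is the mode that permits NULL values in a column.
    nulls = "allows" if field.mode == "NULLABLE" else "does not allow"
    return (f"Column '{field.name}' holds {field.field_type} values, "
            f"{nulls} NULLs: {field.description}")


by_field = Field("by", "string", "NULLABLE", "The username of the item's author.", ())
print(describe(by_field))
```

Running it prints `Column 'by' holds string values, allows NULLs: The username of the item's author.`, which restates the four points listed above.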

We can use the `BigQueryHelper.head()` method to check just the first few rows of the "full" table to make sure this is right. (Databases sometimes have outdated descriptions, so it's good to check.)

```python
# preview the first few rows of the "full" table
hacker_news.head("full")
```

The `BigQueryHelper.head()` method will also let us look at just the information in a specific column. If we want to see the first ten entries in the "by" column, for example, we can do that!

```python
# preview the first ten entries in the "by" column of the "full" table
hacker_news.head("full", selected_columns="by", num_rows=10)
```
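In SQL terms, a preview like this can be written as a `SELECT ... LIMIT` query. As a local-only illustration (again using `sqlite3` rather than BigQuery, with a hypothetical three-row `full` table and made-up usernames), the sketch below selects just the `by` column and caps the number of rows returned. Because `by` is also an SQL keyword, the column name is quoted; SQLite accepts double quotes for this, whereas BigQuery quotes identifiers with backticks. The helper `preview_column` is hypothetical, not part of `bq_helper`.

```python
import sqlite3


def preview_column(conn, table, column, num_rows):
    """Return up to num_rows values from one column of a table."""
    # Identifiers are double-quoted because names like "by" and "full" are SQL keywords.
    # Identifiers can't be bound as parameters, so only pass trusted table/column names.
    query = f'SELECT "{column}" FROM "{table}" LIMIT ?'
    return [value for (value,) in conn.execute(query, (num_rows,)).fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "full" ("by" TEXT, "title" TEXT)')
conn.executemany(
    'INSERT INTO "full" VALUES (?, ?)',
    [("alice", "Post A"), ("bob", "Post B"), ("carol", "Post C")],
)

print(preview_column(conn, "full", "by", 2))
```

Without an `ORDER BY` clause, SQL does not guarantee which rows come back first, so treat a preview as a sample of the column rather than a specific set of rows.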

# Wrap Up
You've seen how to:
- Create a helper object to access your database (`BigQueryHelper`)
- List the tables in your database (`list_tables`)
- Review the schema for any table (`table_schema`)
- Inspect the top few rows in a table (`head`)

You're about to get a chance to try these out. 

Before we go into the coding exercise, a quick disclaimer for those who already know some SQL:

**Each Kaggle user can scan 5TB every 30 days for free.  Once you hit that limit, you'll have to wait for it to reset.**

The commands you've seen so far won't use a meaningful fraction of that limit. But some BigQuery datasets are huge. So, if you already know SQL, wait to run `SELECT` queries until you've seen how to use your allotment effectively. If you are like most people reading this, you don't know how to write these queries yet, so you don't need to worry about this disclaimer.
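A quick back-of-the-envelope calculation shows why dataset size matters. The sketch below computes how many times a table of a given size could be scanned in full within the 5 TB allotment. It assumes decimal units (1 TB = 1,000 GB) and that every query reads the entire table; both are simplifications, so treat the numbers as rough. The helper name and the table sizes in the example are hypothetical.

```python
TB_IN_GB = 1000  # assumption: decimal units; the exact quota accounting may differ


def full_scans_per_quota(table_size_gb, quota_tb=5):
    """How many complete scans of a table fit in the quota (rounded down)."""
    if table_size_gb <= 0:
        raise ValueError("table_size_gb must be positive")
    return int(quota_tb * TB_IN_GB // table_size_gb)


print(full_scans_per_quota(10))    # 500
print(full_scans_per_quota(1500))  # 3
```

A 10 GB table could be scanned hundreds of times, but a table in the terabyte range would exhaust the allotment after only a handful of full scans.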


# Your Turn
Practice the commands you've seen to **[Explore The Structure of a Dataset](#$NEXT_NOTEBOOK_URL$)** using a dataset of crimes in the city of Chicago.