<div style="display: flex; align-items: left;">
    <a href="https://sites.google.com/corp/google.com/genai-solutions/home?authuser=0">
        <img src="https://storage.googleapis.com/miscfilespublic/Linkedin%20Banner%20%E2%80%93%202.png" style="margin-right">
    </a>
</div>

In [1]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# **Open Data QnA: Set up BigQuery Source**

---

This notebook shows how to copy a table from a BigQuery public dataset into your own GCP project.


This is accomplished in the following three steps:  
> i. Create a BigQuery dataset in your GCP project

> ii. Create a table in the above dataset

> iii. Copy data from the public dataset to the dataset on your project


## 🔗 **1. Connect Your Google Cloud Project**
Time to connect your Google Cloud Project to this notebook. 

In [2]:
#@markdown Please fill in the value below with your GCP project ID and then run the cell.
PROJECT_ID = input("Enter the project ID (same as in your SetUpVectorStore) to copy the source data into BigQuery for this solution: ").strip()

# Quick input validation
assert PROJECT_ID, "⚠️ Please provide your Google Cloud Project ID"

# Configure gcloud.
!gcloud config set project {PROJECT_ID}
print(f'Project has been set to {PROJECT_ID}')
# !gcloud auth application-default set-quota-project {PROJECT_ID}

Updated property [core/project].
Project has been set to eight-p-o
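
The assertion above only rejects an empty string. As a slightly stricter check, the sketch below tests the input against the documented format for Google Cloud project IDs (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen). The helper name `is_valid_project_id` is hypothetical, and the pattern deliberately ignores legacy domain-scoped IDs such as `example.com:my-project`. A well-formed ID can still refer to a project that does not exist or that you cannot access.

```python
import re

# Assumed pattern: 6-30 characters, starts with a lowercase letter, contains
# only lowercase letters, digits and hyphens, and does not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id looks like a well-formed GCP project ID."""
    return bool(_PROJECT_ID_RE.fullmatch(project_id))


# Try a few candidates, including the project ID shown in the output above.
for candidate in ["eight-p-o", "", "My_Project", "abc", "ends-with-hyphen-"]:
    print(f"{candidate!r}: {is_valid_project_id(candidate)}")
```

Only the first candidate passes; the others are empty, contain disallowed characters, are too short, or end with a hyphen.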


## 🔐 **2. Authenticate to Google Cloud**
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.

You can do this within Google Colab or using the Application Default Credentials in the Google Cloud CLI.

In [3]:
# Authentication step

"""Colab Auth""" 
# from google.colab import auth
# auth.authenticate_user()


"""Google CLI Auth"""
# !gcloud auth application-default login

"""Jupyter Notebook Auth"""
import google.auth
credentials, project_id = google.auth.default()
# credentials = google.auth.credentials.with_scopes_if_required(credentials)
# authed_http = google.auth.transport.requests.AuthorizedSession(credentials)
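
The cell above leaves the choice of authentication path to manual commenting and uncommenting. One way to make that choice automatically is sketched below. It assumes that Colab kernels have the `google.colab` module already loaded, which is a commonly used detection idiom rather than something this notebook states. The function names `running_in_colab` and `authenticate` are hypothetical.

```python
import sys


def running_in_colab() -> bool:
    """Heuristic: Colab kernels have google.colab already imported."""
    return "google.colab" in sys.modules


def authenticate():
    """Authenticate through Colab when available, then load default credentials.

    Returns the (credentials, project_id) tuple from google.auth.default().
    """
    if running_in_colab():
        from google.colab import auth  # only importable inside Colab

        auth.authenticate_user()

    # Outside Colab this expects Application Default Credentials to exist,
    # for example after running `gcloud auth application-default login`.
    import google.auth

    return google.auth.default()


print("Running in Colab:", running_in_colab())
```

Calling `authenticate()` then replaces the manual selection: `credentials, project_id = authenticate()`.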

## ☁️ **3. Copy a Public Dataset to your GCP Project**

Choose the table from the public dataset that you want to ask questions against, then copy that table to a dataset in your own project.



In [4]:
# Details of source Dataset
BQ_SRC_PROJECT = "bigquery-public-data"
BQ_SRC_DATASET = "fda_food"
BQ_SRC_TABLE = "food_enforcement"
BQ_SRC_REGION = "us"

# Details of destination Dataset
BQ_DST_PROJECT = PROJECT_ID
BQ_DST_DATASET = "fda_food"
BQ_DST_TABLE = "food_enforcement"
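
BigQuery addresses a table by a fully qualified ID of the form `project.dataset.table`, which is what the settings above combine into. The sketch below assembles those IDs as plain strings so they can be checked before any API call is made. `qualified_table_id` is a hypothetical helper, and `my-project` is a placeholder for your own `PROJECT_ID`.

```python
def qualified_table_id(project: str, dataset: str, table: str) -> str:
    """Join the three parts into BigQuery's project.dataset.table form."""
    parts = (project, dataset, table)
    if not all(parts):
        raise ValueError("project, dataset and table must all be non-empty")
    return ".".join(parts)


# Source values are the ones used in this notebook.
src_id = qualified_table_id("bigquery-public-data", "fda_food", "food_enforcement")

# "my-project" is a placeholder for your own PROJECT_ID.
dst_id = qualified_table_id("my-project", "fda_food", "food_enforcement")

print(src_id)  # bigquery-public-data.fda_food.food_enforcement
print(dst_id)  # my-project.fda_food.food_enforcement
```

Strings in this form are accepted by `client.get_table` and `client.copy_table` in place of reference objects.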

In [5]:
def createBQDataset(bq_project_id, dataset_name, dataset_region):
    """Return the dataset ID, creating the dataset first if it does not exist."""
    from google.cloud import bigquery
    import google.api_core.exceptions

    # PROJECT_ID is set in the project-connection cell above
    client = bigquery.Client(project=PROJECT_ID)

    dataset_ref = f"{bq_project_id}.{dataset_name}"

    try:
        client.get_dataset(dataset_ref)
        print("Destination Dataset exists")
    except google.api_core.exceptions.NotFound:
        print("Cannot find the dataset hence creating.......")
        dataset = bigquery.Dataset(dataset_ref)
        dataset.location = dataset_region
        client.create_dataset(dataset)

    return dataset_ref


def createBQTable(bq_project_id, dataset_name, table_name):
    """Return a table reference, creating an empty table first if it does not exist."""
    from google.cloud import bigquery
    import google.api_core.exceptions

    client = bigquery.Client(project=PROJECT_ID)

    # DatasetReference replaces the deprecated client.dataset() helper
    table_ref = bigquery.DatasetReference(bq_project_id, dataset_name).table(table_name)

    try:
        client.get_table(table_ref)
        print("Destination Table exists")
    except google.api_core.exceptions.NotFound:
        print("Cannot find the table hence creating.......")
        table = bigquery.Table(table_ref)
        client.create_table(table)

    return table_ref
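
Both helpers follow the same get-or-create pattern: look the resource up, and create it only when the lookup raises a not-found error. The sketch below isolates that control flow with an in-memory stand-in instead of a real BigQuery client, so it runs without credentials. `FakeStore`, `ResourceNotFound`, and `get_or_create` are hypothetical names introduced for illustration only.

```python
class ResourceNotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound."""


class FakeStore:
    """Minimal in-memory stand-in for a client that can get and create resources."""

    def __init__(self):
        self._resources = {}

    def get(self, name):
        if name not in self._resources:
            raise ResourceNotFound(name)
        return self._resources[name]

    def create(self, name):
        self._resources[name] = {"name": name}
        return self._resources[name]


def get_or_create(store, name):
    """Return (resource, created), where created is True if it had to be made."""
    try:
        return store.get(name), False
    except ResourceNotFound:
        return store.create(name), True


store = FakeStore()
_, created_first = get_or_create(store, "my-project.fda_food")
_, created_second = get_or_create(store, "my-project.fda_food")
print(created_first, created_second)  # True False
```

With the real client, `create_dataset` and `create_table` also accept `exists_ok=True`, which avoids the explicit lookup when you do not need the "exists" message.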



In [6]:
# Create the destination dataset and table, then copy the table data
from google.cloud import bigquery

client = bigquery.Client(project=PROJECT_ID)

# The destination dataset is created in the same region as the source dataset
dst_dataset_ref = createBQDataset(BQ_DST_PROJECT, BQ_DST_DATASET, BQ_SRC_REGION)

dst_table_ref = createBQTable(BQ_DST_PROJECT, BQ_DST_DATASET, BQ_DST_TABLE)

# DatasetReference replaces the deprecated client.dataset() helper
src_table_ref = bigquery.DatasetReference(BQ_SRC_PROJECT, BQ_SRC_DATASET).table(BQ_SRC_TABLE)

# WRITE_TRUNCATE overwrites any data already in the destination table
job_config = bigquery.CopyJobConfig(write_disposition="WRITE_TRUNCATE")

copy_job = client.copy_table(src_table_ref, dst_table_ref, job_config=job_config)

# Wait for the job to complete; result() raises an exception if the job failed
copy_job.result()


Cannot find the dataset hence creating.......
Cannot find the table hence creating.......


CopyJob<project=eight-p-o, location=US, id=acc09426-c15f-4747-aaf5-f423f69fef54>

### If all the above steps executed successfully, the BigQuery public dataset table should now be copied to your GCP project
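
One way to confirm the copy is to compare row counts between the source and destination tables. The sketch below assumes a client object whose `get_table` method returns an object with a `num_rows` attribute, as `google.cloud.bigquery.Client` does. `row_counts_match` is a hypothetical helper, and the stub client with its made-up row counts exists only so the example runs without credentials.

```python
from types import SimpleNamespace


def row_counts_match(client, src_table_id, dst_table_id):
    """Return (match, src_rows, dst_rows) for the two tables."""
    src_rows = client.get_table(src_table_id).num_rows
    dst_rows = client.get_table(dst_table_id).num_rows
    return src_rows == dst_rows, src_rows, dst_rows


class StubClient:
    """Stand-in for bigquery.Client; the row counts here are made up."""

    def __init__(self, row_counts):
        self._row_counts = row_counts

    def get_table(self, table_id):
        return SimpleNamespace(num_rows=self._row_counts[table_id])


stub = StubClient({"src.ds.t": 100, "dst.ds.t": 100})
match, src_rows, dst_rows = row_counts_match(stub, "src.ds.t", "dst.ds.t")
print(match, src_rows, dst_rows)  # True 100 100
```

In this notebook the equivalent call would be `row_counts_match(client, src_table_ref, dst_table_ref)`, using the real `client` and the table references from the copy cell.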