Skip to content

An efficient Python tool to extract, consolidate, and export metadata from all tables in a Google Cloud BigQuery project, providing a comprehensive view of your data landscape.

Notifications You must be signed in to change notification settings

MilhosOU/BigQueryEagleView

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

BigQuery Table Metadata Extraction Script

an-eagle-observing-an-herd-of-animals-while-flying-in-the-sky- (1)

Overview

This script serves a crucial role in understanding the state of data within a Google Cloud BigQuery project. It is designed to systematically explore all the datasets and their associated tables, retrieving key metadata.

Metadata includes but is not limited to:

  • Table and schema names
  • Creation time
  • Last update time
  • Row count
  • Storage size

The script collates all this information and compiles it into a CSV file. This high-level view of your BigQuery data landscape is invaluable for data governance, system optimization, auditing, and other strategic purposes.

How it works

The script uses the Google Cloud BigQuery client to iterate through each dataset within a given project. For each dataset, it runs a query to extract the metadata from the INFORMATION_SCHEMA.TABLES, __TABLES__, and __TABLES_SUMMARY__ views. All extracted data is then compiled into a pandas DataFrame, which is then exported to a CSV file.

Usage

Run the script from the command line, passing in your Google Cloud Project ID as an argument. For example:

python main.py your-project-id

Replace your-project-id with the ID of your actual Google Cloud project. The script will create a CSV file named bigquery_results.csv in the same directory, containing metadata for all tables in the project.

Dependencies

The script requires the following Python packages:

  • os
  • sys
  • pandas
  • tqdm
  • termcolor
  • google.cloud.bigquery

Ensure these are installed in your Python environment before running the script. You can install them with pip:

pip install pandas tqdm termcolor google-cloud-bigquery

Authorization

Before running the script, ensure that you're authenticated with Google Cloud and have the necessary permissions to access the BigQuery project and datasets. The script uses your default Google Cloud credentials. If you encounter a DefaultCredentialsError, follow the Google Cloud documentation to authenticate with a service account.

About

An efficient Python tool to extract, consolidate, and export metadata from all tables in a Google Cloud BigQuery project, providing a comprehensive view of your data landscape.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages