<div align="center">
<h1><img width="30" src="https://madewithml.com/static/images/rounded_logo.png">&nbsp;<a href="https://madewithml.com/">Made With ML</a></h1>
Applied ML · MLOps · Production
<br>
Join 30K+ developers in learning how to responsibly <a href="https://madewithml.com/about/">deliver value</a> with ML.
    <br>
</div>

<br>

<div align="center">
    <a target="_blank" href="https://madewithml.com"><img src="https://img.shields.io/badge/Subscribe-40K-brightgreen"></a>&nbsp;
    <a target="_blank" href="https://github.com/GokuMohandas/MadeWithML"><img src="https://img.shields.io/github/stars/GokuMohandas/MadeWithML.svg?style=social&label=Star"></a>&nbsp;
    <a target="_blank" href="https://www.linkedin.com/in/goku"><img src="https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social"></a>&nbsp;
    <a target="_blank" href="https://twitter.com/GokuMohandas"><img src="https://img.shields.io/twitter/follow/GokuMohandas.svg?label=Follow&style=social"></a>
    <br>
    🔥&nbsp; Among the <a href="https://github.com/GokuMohandas/MadeWithML" target="_blank">top MLOps</a> repositories on GitHub
</div>

<br>
<hr>


# Data stack

This notebook complements the [data stack lesson](https://madewithml.com/courses/mlops/data-stack/), where we learn how to construct a modern data stack for analytics and machine learning applications. The lesson covers all the concepts mentioned here in much more detail and ties them to software engineering best practices for building ML systems, so be sure to check out the [lesson](https://madewithml.com/courses/mlops/data-stack/) if you haven't already.

<div align="left">
<a target="_blank" href="https://madewithml.com/courses/mlops/data-stack/"><img src="https://img.shields.io/badge/📖 Read-lesson-9cf"></a>&nbsp;
<a href="https://github.com/GokuMohandas/data-stack/blob/main/data-stack.ipynb" role="button"><img src="https://img.shields.io/static/v1?label=&amp;message=View%20On%20GitHub&amp;color=586069&amp;logo=github&amp;labelColor=2f363d"></a>&nbsp;
<a href="https://colab.research.google.com/github/GokuMohandas/data-stack/blob/main/data-stack.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</div>

# Set up

Follow the instructions in our [data-stack repository](https://github.com/GokuMohandas/data-stack) to do the following:

1. Create data sources (CSV file) and destinations ([BigQuery](https://cloud.google.com/bigquery) [DWH](https://madewithml.com/courses/mlops/data-stack/#data-warehouse)).
2. Establish connections between data sources and destinations via [Airbyte](https://airbyte.com/).
3. Create a [dbt](https://www.getdbt.com/) repository to [transform](https://madewithml.com/courses/mlops/data-stack/#transform) our data in our DWH.

# Consumers

Once we have our transformed data in our data warehouse, it's ready for downstream [applications](https://madewithml.com/courses/mlops/data-stack/#applications) such as dashboarding, machine learning, etc. In this notebook, we'll establish a connection to our BigQuery DWH and make a call to extract data.

In [None]:
!pip install google-cloud-bigquery==1.21.0 -q

In [None]:
from google.cloud import bigquery
from google.colab import files
from google.oauth2 import service_account

To access our data warehouse (or any cloud service), we need a service account with the appropriate privileges. We already created a service account key JSON file when we set up the DWH destination, so we can upload that same file here. For reference, these are the steps to create a service account and retrieve the JSON file for Google BigQuery:

1. Go to [service accounts](https://console.cloud.google.com/iam-admin/serviceaccounts) under IAM & Admin in our Google Cloud console.
2. Click on our project (ex. `made-with-ml-XXXXXX`).
3. Press **+**`Create Service Account` at the top and give it a name and an `Owner` role.
4. Once the service account is created, we should see it on our [project](https://console.cloud.google.com/iam-admin/serviceaccounts?project=made-with-ml&supportedpurview=project) page. Click on `Actions` > `Manage keys`.
5. Create a new key by pressing `ADD KEY` > `Create new key` > `JSON`. This will download a JSON file to our local system, which we can upload here.

In [None]:
# Upload service account key JSON file
uploaded = files.upload()

In [None]:
!ls

made-with-ml-359923-9df280204d63.json  sample_data


In [None]:
# Replace these with your own values
PROJECT_ID = "made-with-ml-XXXXXX" # REPLACE
SERVICE_ACCOUNT_KEY_JSON = "made-with-ml-XXXXXX-XXXXXXXXXXXX.json" # REPLACE
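
Before establishing the connection, it can help to sanity-check the uploaded key file and confirm which project it belongs to. The sketch below uses only the standard library. The helper name `read_key_metadata` and the list of required fields are assumptions made for illustration (they are not part of the lesson), based on the fields a Google Cloud service account key file normally contains.

```python
import json

# Fields we expect in a service account key file (assumed list, for illustration).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def read_key_metadata(key_path):
    """Return non-secret metadata from a service account key JSON file.

    Hypothetical helper: raises ValueError if the file does not look like
    a service account key.
    """
    with open(key_path) as fp:
        key = json.load(fp)
    if not isinstance(key, dict):
        raise ValueError("Key file must contain a JSON object.")
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in key]
    if missing:
        raise ValueError(f"Key file is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"Expected type 'service_account', got {key['type']!r}")
    # Leave out the private key so nothing sensitive ends up in cell output.
    return {"project_id": key["project_id"], "client_email": key["client_email"]}
```

Calling `read_key_metadata(SERVICE_ACCOUNT_KEY_JSON)["project_id"]` should then give the same value we set for `PROJECT_ID`, which is a quick way to catch a mismatched key file before querying.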

In [None]:
# Establish connection
credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_KEY_JSON)
client = bigquery.Client(credentials=credentials, project=PROJECT_ID)

In [None]:
# Query data
query_job = client.query("""
   SELECT *
   FROM mlops_course.labeled_projects""")
results = query_job.result()
results.to_dataframe().head()

| | id | created_on | title | description | tag |
|---|---|---|---|---|---|
| 0 | 1994.0 | 2020-07-29 04:51:30 | Understanding the Effectivity of Ensembles in ... | The report explores the ideas presented in Dee... | computer-vision |
| 1 | 1506.0 | 2020-06-19 06:26:17 | Using GitHub Actions for MLOps & Data Science | A collection of resources on how to facilitate... | mlops |
| 2 | 807.0 | 2020-05-11 02:25:51 | Introduction to Machine Learning Problem Framing | This course helps you frame machine learning (... | mlops |
| 3 | 1204.0 | 2020-06-05 22:56:38 | Snaked: Classifying Snake Species using Images | Proof of concept that it is possible to identi... | computer-vision |
| 4 | 1706.0 | 2020-07-04 11:05:28 | PokeZoo | A deep learning based web-app developed using ... | computer-vision |
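
The query above downloads every row of the table even though we only display the first five. When we just want a preview, one option is to put a `LIMIT` in the SQL so fewer rows are returned to the notebook. The helper below is a minimal sketch: `build_preview_query` is a hypothetical name, and restricting identifiers to letters, digits and underscores is a deliberately conservative assumption rather than BigQuery's full naming rules.

```python
import re

# Conservative identifier pattern (assumption): letters, digits and underscores only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_preview_query(dataset, table, limit=5):
    """Build a SELECT * query that returns at most `limit` rows."""
    for name in (dataset, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Unsupported identifier: {name!r}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT *\nFROM {dataset}.{table}\nLIMIT {limit}"
```

With the client created earlier, `client.query(build_preview_query("mlops_course", "labeled_projects")).result().to_dataframe()` would return a five-row preview. Note that `LIMIT` only caps the rows returned; it should not be read as a guarantee about how much data the warehouse scans.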


# Orchestration

Be sure to check out our [orchestration lesson](https://madewithml.com/courses/mlops/orchestration/), where we use our data stack to create our DataOps workflows. In that lesson, we programmatically execute all the ELT operations using the data stack we established in this lesson.