<a href="https://colab.research.google.com/github/UlrikeDetective/code/blob/main/BigQuery_Sandbox_and_DataFrames.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## BigQuery Sandbox Setup

In [1]:
# @title Authenticate with Google Account
from google.colab import auth
auth.authenticate_user()

In [3]:
# @title Create a GCP Project
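# Project IDs are globally unique across Google Cloud: if this ID is taken,
# choose your own and use it in the later cells as well.
!gcloud projects create big-query-sandbox-colab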

Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/big-query-sandbox-colab].
Enabling service [cloudapis.googleapis.com] on project [big-query-sandbox-colab]...
Operation "operations/acat.p2-869278308248-5d3ed056-47e5-4aa2-ba58-8ff926151de2" finished successfully.
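Because project IDs are global, `big-query-sandbox-colab` will already be taken for anyone re-running this notebook. The following is a minimal sketch, assuming you want a fresh ID on each run. It appends a random hex suffix to a prefix and checks the result against the documented ID format: 6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter, with no trailing hyphen. The names `make_project_id` and `PROJECT_ID_RE`, the default prefix, and the suffix length are illustrative choices. The regex does not cover every restriction Google applies, such as reserved words.

```python
import re
import secrets

# Documented project ID format: 6-30 characters, lowercase letters, digits and
# hyphens, starting with a letter and not ending with a hyphen. Other rules
# (for example reserved words) are NOT checked here.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

def make_project_id(prefix: str = "bq-sandbox") -> str:
    """Return '<prefix>-<6 random hex chars>', validated against PROJECT_ID_RE."""
    candidate = f"{prefix}-{secrets.token_hex(3)}"  # token_hex(3) -> 6 lowercase hex chars
    if not PROJECT_ID_RE.fullmatch(candidate):
        raise ValueError(f"not a valid project ID: {candidate!r}")
    return candidate

project_id = make_project_id()
print(project_id)
```

In a Colab cell, IPython substitutes Python variables into shell commands, so `!gcloud projects create {project_id}` would create the project. You would then reuse the same ID in the later cells.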




---



## Sample Data Science Workload

In [5]:
# @title Query a BigQuery Public Dataset

%%bigquery --project big-query-sandbox-colab
SELECT
  *
FROM
  `bigquery-public-data.noaa_gsod.gsod2025`
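TABLESAMPLE SYSTEM (0.1 PERCENT)
-- SYSTEM sampling reads randomly chosen storage blocks, so the number of
-- rows returned varies between runs.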


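| | stn | wban | date | year | mo | da | temp | count_temp | dewp | count_dewp | ... | flag_min | prcp | flag_prcp | sndp | fog | rain_drizzle | snow_ice_pellets | hail | thunder | tornado_funnel_cloud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 010014 | 99999 | 2025-01-01 | 2025 | 01 | 01 | 28.4 | 4 | 20.3 | 4 | ... | * | 0.00 | I | 999.9 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 010020 | 99999 | 2025-06-01 | 2025 | 06 | 01 | 28.1 | 4 | 21.4 | 4 | ... | | 0.00 | I | 999.9 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 010020 | 99999 | 2025-03-05 | 2025 | 03 | 05 | 14.4 | 4 | 10.0 | 4 | ... | | 0.00 | I | 999.9 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 010020 | 99999 | 2025-05-02 | 2025 | 05 | 02 | 16.3 | 4 | 11.3 | 4 | ... | | 0.00 | I | 999.9 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 010020 | 99999 | 2025-04-26 | 2025 | 04 | 26 | 7.0 | 4 | 1.8 | 4 | ... | | 0.00 | I | 999.9 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 146667 | A51256 | 00451 | 2025-05-12 | 2025 | 05 | 12 | 66.4 | 24 | 51.0 | 24 | ... | * | 99.99 | | 999.9 | 0 | 1 | 0 | 0 | 0 | 0 |
| 146668 | A51256 | 00451 | 2025-05-15 | 2025 | 05 | 15 | 78.2 | 24 | 66.6 | 24 | ... | * | 0.00 | I | 999.9 | 0 | 0 | 0 | 0 | 0 | 0 |
| 146669 | A51256 | 00451 | 2025-06-20 | 2025 | 06 | 20 | 79.2 | 24 | 66.9 | 24 | ... | * | 99.99 | | 999.9 | 0 | 1 | 0 | 0 | 0 | 0 |
| 146670 | A51256 | 00451 | 2025-01-25 | 2025 | 01 | 25 | 40.5 | 24 | 15.5 | 24 | ... | * | 0.00 | I | 999.9 | 0 | 0 | 0 | 0 | 0 | 0 |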
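Several values in this sample are GSOD missing-value codes rather than measurements. For example, `sndp` is 999.9 on every row shown, and `prcp` is 99.99 on some rows. If left in place, codes like these would later be fed to the model as if they were real readings.

The sketch below cleans a single row. It assumes the sentinel values from NOAA's GSOD format documentation also apply unchanged to the BigQuery copy: 9999.9 for `temp` and `dewp`, 999.9 for `wdsp` and `sndp`, and 99.99 for `prcp`. The `GSOD_MISSING` mapping and the `clean_row` helper are hypothetical, and they cover only the columns this notebook touches.

```python
# Missing-value codes from NOAA's GSOD format documentation (ASSUMED to carry
# over unchanged to the BigQuery copy of the dataset).
GSOD_MISSING = {
    "temp": 9999.9,
    "dewp": 9999.9,
    "wdsp": 999.9,   # stored as STRING in BigQuery, hence the float() below
    "sndp": 999.9,
    "prcp": 99.99,
}

def clean_row(row: dict) -> dict:
    """Return a copy of row with known GSOD sentinel codes replaced by None."""
    cleaned = dict(row)
    for column, sentinel in GSOD_MISSING.items():
        value = cleaned.get(column)
        if value is None:
            continue  # column absent or already missing
        number = float(value)  # accepts floats and numeric strings like '999.9'
        cleaned[column] = None if number == sentinel else number
    return cleaned

# Values taken from the row with index 146667 in the sample above
print(clean_row({"stn": "A51256", "temp": 66.4, "prcp": 99.99, "sndp": 999.9}))
```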


In [6]:
# @title Create BigQuery DataFrame with weather data

import bigframes.pandas as bpd
import pandas as pd

# Set the BigQuery project for your BigFrames session
bpd.options.bigquery.project = 'big-query-sandbox-colab'
bpd.options.bigquery.location = 'US'

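# Read data from BigQuery using a query with a filter condition.
# `gsod202*` is a wildcard table matching the yearly tables from gsod2020 on;
# WBAN 23174 is the Los Angeles International Airport (LAX) station.
weather = bpd.read_gbq("SELECT * FROM `bigquery-public-data.noaa_gsod.gsod202*` WHERE wban = '23174'")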
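The pattern `gsod202*` is a BigQuery wildcard table. It matches every yearly table whose name starts with `gsod202`, from `gsod2020` through the most recent year. To pin the year range explicitly, BigQuery exposes the part of the table name matched by `*` through the `_TABLE_SUFFIX` pseudo-column.

The sketch below builds such a query string. It assumes the yearly tables keep their `gsodYYYY` names, and `build_gsod_query` is a hypothetical helper. Because the query is assembled with string formatting, the five-digit check on `wban` also guards against injecting arbitrary SQL.

```python
import re

def build_gsod_query(wban: str, first_year: int, last_year: int) -> str:
    """Build a wildcard-table query for one station over a range of years."""
    # WBAN identifiers in GSOD are five-digit strings (e.g. '23174', '00451');
    # validating them also keeps arbitrary SQL out of the formatted string.
    if not re.fullmatch(r"[0-9]{5}", wban):
        raise ValueError(f"unexpected WBAN id: {wban!r}")
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    # _TABLE_SUFFIX holds the part of the table name matched by '*', i.e. the year,
    # so a string BETWEEN on four-digit years selects the yearly tables to scan.
    return (
        "SELECT * FROM `bigquery-public-data.noaa_gsod.gsod*` "
        f"WHERE _TABLE_SUFFIX BETWEEN '{first_year:04d}' AND '{last_year:04d}' "
        f"AND wban = '{wban}'"
    )

print(build_gsod_query("23174", 2020, 2024))
```

Passing the result to `bpd.read_gbq(...)` would read 2020 through 2024 for the LAX station only.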

In [7]:
# @title Get summary statistics, computed remotely in BigQuery

description = weather['temp'].describe()
print(description)

count       2061.0
mean     62.574915
std       5.836044
min           46.1
25%           58.0
50%           62.4
75%           66.9
max           86.5
Name: temp, dtype: Float64
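`describe()` reports the non-null count, the mean, the sample standard deviation, the extremes, and the quartiles. To make those fields concrete, here is a pure-Python equivalent over a small list of hypothetical temperatures. For the quartiles it uses linear interpolation between order statistics, which is pandas' default. Treat this only as an illustration: the `describe` function below is hypothetical. BigQuery may also compute percentiles with a different (possibly approximate) algorithm, so figures on large data need not match it digit for digit.

```python
import statistics

def describe(values: list) -> dict:
    """Pure-Python analogue of the fields Series.describe() prints for numbers."""
    data = [v for v in values if v is not None]  # count covers non-null values only
    # method="inclusive" interpolates linearly between order statistics,
    # like pandas' default quantile interpolation.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": float(len(data)),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation (ddof=1)
        "min": min(data),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(data),
    }

# Hypothetical daily mean temperatures in °F, with one missing reading
print(describe([55.0, 60.0, None, 65.0, 70.0, 50.0]))
```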


In [8]:
# @title Train a model to predict whether the daily mean temperature will exceed 65 °F

from bigframes.ml.linear_model import LogisticRegression
from bigframes.ml.model_selection import train_test_split

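# Define features and target.
# Note: wdsp (mean wind speed, knots) is a STRING column in the GSOD tables,
# so BigQuery ML treats it as a categorical (one-hot encoded) feature.
features = weather[['date', 'dewp', 'wdsp']]
# Boolean label; the Series keeps the name 'temp', which is why predictions
# come back in a 'predicted_temp' column.
target = weather['temp'] > 65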

# Split data and train a model. This all runs in BigQuery.
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)

model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test set; score() returns several classification
# metrics, not just accuracy
model.score(X_test, y_test)


| | precision | recall | accuracy | f1_score | log_loss | roc_auc |
|---|---|---|---|---|---|---|
| 0 | 0.72028 | 0.762963 | 0.825243 | 0.741007 | 0.621279 | 0.846344 |
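Four of these metrics (precision, recall, accuracy and F1) derive from the confusion matrix at a single decision threshold. By contrast, `log_loss` and `roc_auc` are computed from the predicted probabilities. The sketch below shows the threshold-based formulas using hypothetical counts, so they do not come from this model. The `f1_score` helper also lets you check that the reported `f1_score` is the harmonic mean of the reported precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Threshold-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1_score": f1_score(precision, recall),
    }

# Hypothetical counts for a 100-row test set
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))

# Consistency check against the precision and recall reported above
print(round(f1_score(0.72028, 0.762963), 6))  # prints 0.741007
```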


In [9]:
# @title Generate sample data to run inference

sample_data = pd.DataFrame({
    'date': pd.to_datetime(['2026-01-01', '2026-07-25', '2026-09-15']).date,
    'dewp': [55.4, 65.1, 67.8],
    # Strings, to match the STRING type of wdsp in the training data
    'wdsp': ['1.5', '8.2', '16.1'],
})

sample_data_bf = bpd.DataFrame(sample_data)

In [10]:
# @title Predict whether the daily mean temperature will exceed 65 °F

# Get predictions from the model trained in BigQuery
predictions = model.predict(sample_data_bf)

print("Predictions for new data (temp > 65°F):")
print(predictions[['date','predicted_temp']])

Predictions for new data (temp > 65°F):


         date  predicted_temp
0  2026-01-01            True
1  2026-07-25            True
2  2026-09-15            True

[3 rows x 2 columns]
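For a binary logistic regression, prediction conceptually works in two steps. Each row becomes a probability by applying the logistic function to a weighted sum of its features. That probability is then compared with a threshold to produce the `True`/`False` label. The sketch below uses made-up weights and ignores the automatic feature preprocessing BigQuery ML performs, such as one-hot encoding the STRING `wdsp` column. The feature name, weight, bias, and 0.5 threshold are all hypothetical, not values read from the trained model.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return e / (1.0 + e)

def predict_row(features: dict, weights: dict, bias: float,
                threshold: float = 0.5) -> tuple:
    """Return (probability, label) for one row of numeric features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    probability = sigmoid(z)
    return probability, probability >= threshold

# Hypothetical model: a higher dew point pushes towards a "warm day" label
WEIGHTS = {"dewp": 0.1}
BIAS = -5.0

print(predict_row({"dewp": 60.0}, WEIGHTS, BIAS))  # z = 1.0
print(predict_row({"dewp": 40.0}, WEIGHTS, BIAS))  # z = -1.0
```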
