# Feature Backfill for Google Trends
**Goal of this notebook**

This notebook backfills the feature groups containing Google Trends data and the flight data. The pipeline:
* Supports backfill
* Produces daily features
* Is point-in-time correct
* Uploads features to Hopsworks Feature Store

**Imports & setup**

In [1]:
# Core
import pandas as pd
import numpy as np
from datetime import datetime, timedelta, date
import asyncio

# Google Trends
from pytrends.request import TrendReq

# Hopsworks
import hopsworks


## Google Trends Feature Backfill

### Feature Config

In [2]:
# Search terms used as predictors
KEYWORDS = [
    "vikings",
    "fika",
    "stockholm",
    "ikea",
    "abba"
]

# Country code for Sweden
COUNTRY = "SE"

### Backfill Dates

In [3]:
# Backfill range (used only in backfill mode)
start_date = date(2019, 12, 31)
end_date   = date(2025, 12, 27)

print(f"Running feature pipeline from {start_date} to {end_date}")

Running feature pipeline from 2019-12-31 to 2025-12-27


### Fetch Google Trends Data

In [4]:
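def fetch_google_trends_daily(keywords, start_date, end_date):
    """Fetch Google Trends interest for `keywords` in consecutive 90-day windows."""
    pytrends = TrendReq(hl="en-US", tz=360)
    all_data = []

    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)

    while start_date < end_date:
        # 90-day inclusive window, clipped to the end of the backfill range
        window_end = min(start_date + pd.Timedelta(days=89), end_date)

        pytrends.build_payload(
            kw_list=keywords,
            timeframe=f"{start_date:%Y-%m-%d} {window_end:%Y-%m-%d}",
            geo=COUNTRY
        )

        df = pytrends.interest_over_time()
        if not df.empty:
            # Drop rows flagged as partial; the isPartial column itself is kept
            df = df[~df["isPartial"]]
            all_data.append(df)

        # The next window starts the day after this one ends
        start_date = window_end + pd.Timedelta(days=1)

    if not all_data:
        raise ValueError("Google Trends returned no data for the requested range")

    return pd.concat(all_data)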
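
The windowing logic inside `fetch_google_trends_daily` can be checked in isolation, without calling Google Trends. The sketch below is a minimal standalone version of that logic; `date_windows` and `max_days` are hypothetical names that do not appear elsewhere in this notebook. Unlike the loop above, it uses `<=`, so a trailing single-day window would also be emitted. For the backfill range configured here, both variants produce the same windows.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=90):
    """Yield (window_start, window_end) pairs that cover [start, end] inclusive."""
    current = start
    while current <= end:
        # Each window spans at most max_days calendar days, clipped to `end`
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current, window_end
        # The next window starts the day after this one ends
        current = window_end + timedelta(days=1)


windows = list(date_windows(date(2019, 12, 31), date(2025, 12, 27)))
print(len(windows), windows[0], windows[-1])
```

Each pair maps directly onto the `timeframe` string passed to `build_payload`, so the list can be inspected before any request is sent.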

### Clean and Resample to Daily Data

In [5]:
# Fetch raw data
raw_trends = fetch_google_trends_daily(KEYWORDS, start_date, end_date)

# Remove partial rows (the fetch function already filters them per window; repeated here as a safeguard)
raw_trends = raw_trends[raw_trends["isPartial"] == False]

# Drop metadata column
raw_trends = raw_trends.drop(columns=["isPartial"])

# Convert to daily frequency using forward-fill
daily_trends = (
    raw_trends
    .resample("D")
    .ffill()
    .reset_index()
)

# Ensure proper datetime + ordering BEFORE rolling
daily_trends["date"] = pd.to_datetime(daily_trends["date"], errors="coerce").dt.normalize()
daily_trends = daily_trends.sort_values("date")

daily_trends


TooManyRequestsError: The request failed: Google returned a response with code 429
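
A 429 response means Google is rate-limiting the client, which is likely when many windows are requested back to back. One common mitigation is to retry with exponentially growing delays. The helper below is a minimal sketch of that idea, not part of pytrends: `call_with_backoff`, its parameters, and the `FakeRateLimit` class are hypothetical names, and the demo uses a fake function so it runs without network access.

```python
import random
import time


def call_with_backoff(fn, retryable=(Exception,), max_attempts=5,
                      base_delay=60.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again with a doubled delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the original error
            # Delay doubles per attempt; up to 1 s of jitter avoids a fixed rhythm
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)


# Demo with a stand-in that fails twice before succeeding (hypothetical)
class FakeRateLimit(Exception):
    pass


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimit("429")
    return "ok"


waits = []  # Record delays instead of actually sleeping
result = call_with_backoff(flaky, retryable=(FakeRateLimit,),
                           base_delay=1.0, sleep=waits.append)
print(result, calls["n"], len(waits))
```

In this notebook, the two pytrends calls for a window could be wrapped in a small function and passed as `fn`, with `retryable` set to the `TooManyRequestsError` class named in the error output (assuming it is importable from `pytrends.exceptions`). The default `base_delay` of 60 seconds is a guess, not a documented limit.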

In [7]:
daily_trends['city'] = "Märsta"

missing_days = pd.to_datetime([
    '2023-12-02',
    '2023-12-03',
    '2023-12-04'
])

daily_trends = daily_trends[~daily_trends['date'].isin(missing_days)].copy()
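
After rows are dropped, it can be useful to confirm which calendar days are absent from the daily series. The sketch below assumes a `date` column like the one in `daily_trends`; `missing_calendar_days` and the `example` frame are hypothetical and for illustration only.

```python
import pandas as pd


def missing_calendar_days(dates):
    """Return the days between min(dates) and max(dates) that do not occur in `dates`."""
    observed = pd.DatetimeIndex(pd.to_datetime(dates)).normalize().unique()
    expected = pd.date_range(observed.min(), observed.max(), freq="D")
    return expected.difference(observed)


# Toy frame with a three-day hole, mirroring the dates removed above
example = pd.DataFrame(
    {"date": pd.to_datetime(["2023-12-01", "2023-12-05", "2023-12-06"])}
)
gaps = missing_calendar_days(example["date"])
print(gaps)
```

Calling `missing_calendar_days(daily_trends["date"])` would list every gap in the backfilled series, which makes it easy to verify that only the intended days were removed.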

In [8]:
daily_trends.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2186 entries, 0 to 2188
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date       2186 non-null   datetime64[ns]
 1   vikings    2186 non-null   int64         
 2   fika       2186 non-null   int64         
 3   stockholm  2186 non-null   int64         
 4   ikea       2186 non-null   int64         
 5   abba       2186 non-null   int64         
 6   city       2186 non-null   object        
dtypes: datetime64[ns](1), int64(5), object(1)
memory usage: 136.6+ KB


### Connect to Hopsworks

In [9]:
project = hopsworks.login()
fs = project.get_feature_store()

2026-01-09 12:03:24,501 INFO: Initializing external client
2026-01-09 12:03:24,502 INFO: Base URL: https://c.app.hopsworks.ai:443
2026-01-09 12:03:26,238 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1286325


### Create/get Feature Group

In [10]:
feature_group = fs.get_or_create_feature_group(
    name="google_trends_daily",
    version=1,
    primary_key=['city'],
    event_time="date",
    description="Daily Google Trends features for Sweden tourism prediction",
    online_enabled=False
)

### Write Features to Hopsworks

In [11]:
feature_group.insert(
    daily_trends,
    write_options={"wait_for_job": True}
)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1286325/fs/1265794/fg/1911247


Uploading Dataframe: 100.00% |█| Rows 2186/2186 | Elapsed Time: 00:02 | Remaini


Launching job: google_trends_daily_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1286325/jobs/named/google_trends_daily_1_offline_fg_materialization/executions
2026-01-09 12:03:53,943 INFO: Waiting for execution to finish. Current state: SUBMITTED. Final status: UNDEFINED
2026-01-09 12:04:00,327 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2026-01-09 12:06:01,231 INFO: Waiting for execution to finish. Current state: AGGREGATING_LOGS. Final status: SUCCEEDED
2026-01-09 12:06:01,425 INFO: Waiting for log aggregation to finish.
2026-01-09 12:06:23,500 INFO: Execution finished successfully.


(Job('google_trends_daily_1_offline_fg_materialization', 'SPARK'), None)