# Fetch Offers Search Tool

## Introduction

This notebook builds a text-based search tool for Fetch's offer data. Users enter a text query naming a category, brand, or retailer, and the tool returns the matching offers ranked by a similarity score.

**REQUIREMENTS**: The notebook loads `categories.csv`, `offer_retailer.csv`, and `brand_category.csv` from the working directory.

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## Data Loading and Exploration

### Load Data


In [2]:
# Load the datasets
categories_df = pd.read_csv("categories.csv")
offer_retailer_df = pd.read_csv("offer_retailer.csv")
brand_category_df = pd.read_csv("brand_category.csv")

### Explore Data


In [3]:
# Display first few rows of each dataset
print(categories_df.head())
print(offer_retailer_df.head())
print(brand_category_df.head())

                            CATEGORY_ID             PRODUCT_CATEGORY  \
0  1f7d2fa7-a1d7-4969-aaf4-1244f232c175              Red Pasta Sauce   
1  3e48a9b3-1ab2-4f2d-867d-4a30828afeab  Alfredo & White Pasta Sauce   
2  09f3decc-aa93-460d-936c-0ddf06b055a3             Cooking & Baking   
3  12a89b18-4c01-4048-94b2-0705e0a45f6b             Packaged Seafood   
4  2caa015a-ca32-4456-a086-621446238783             Feminine Hygeine   

  IS_CHILD_CATEGORY_TO  
0          Pasta Sauce  
1          Pasta Sauce  
2               Pantry  
3               Pantry  
4    Health & Wellness  
                                               OFFER            RETAILER  \
0     Spend $50 on a Full-Priced new Club Membership           SAMS CLUB   
1       Beyond Meat® Plant-Based products, spend $25                 NaN   
2           Good Humor Viennetta Frozen Vanilla Cake                 NaN   
3  Butterball, select varieties, spend $10 at Dil...  DILLONS FOOD STORE   
4  GATORADE® Fast Twitch®, 12-ounce 1

## Data Preprocessing and Feature Engineering

### Text Preprocessing

In [4]:
# Convert text columns to lower case
offer_retailer_df['OFFER'] = offer_retailer_df['OFFER'].str.lower()
categories_df['PRODUCT_CATEGORY'] = categories_df['PRODUCT_CATEGORY'].str.lower()
brand_category_df['BRAND'] = brand_category_df['BRAND'].str.lower()

## NLP Model Development

### TF-IDF Vectorization

In [5]:
# Create a TF-IDF vectorizer
vectorizer = TfidfVectorizer()

# Fit and transform the 'OFFER' text data
tfidf_matrix = vectorizer.fit_transform(offer_retailer_df['OFFER'])

## Search Functionality

### Search by Category, Brand, and Retailer

In [6]:
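def search_offers(query, search_type='CATEGORY'):
    """Rank offers against `query`, then filter them by `search_type`.

    `search_type` is 'CATEGORY', 'BRAND', or 'RETAILER'. Any other value
    skips the filter and returns every offer, ranked.
    """
    # Convert the query to lower case to match the lowercased text
    query = query.lower()

    # Vectorize the query with the vocabulary learned from the offers
    query_vector = vectorizer.transform([query])

    # Cosine similarity between the query and every offer (shape: 1 x n_offers)
    similarity_scores = cosine_similarity(query_vector, tfidf_matrix)

    # Create a DataFrame to hold the results
    result_df = offer_retailer_df.copy()
    result_df['similarity_score'] = similarity_scores[0]

    # BRAND in offer_retailer_df was not lowercased during preprocessing,
    # so lower it here before comparing it with the lowercased query
    offer_brands = result_df['BRAND'].str.lower()

    # Keep only the offers whose category, brand, or retailer equals the query
    if search_type == 'CATEGORY':
        in_category = brand_category_df['BRAND_BELONGS_TO_CATEGORY'].str.lower() == query
        relevant_brands = brand_category_df.loc[in_category, 'BRAND'].str.lower()
        result_df = result_df[offer_brands.isin(relevant_brands)]
    elif search_type == 'BRAND':
        result_df = result_df[offer_brands == query]
    elif search_type == 'RETAILER':
        result_df = result_df[result_df['RETAILER'].str.lower() == query]

    # Sort by similarity score, best match first
    result_df = result_df.sort_values(by='similarity_score', ascending=False)

    return result_df[['OFFER', 'RETAILER', 'BRAND', 'similarity_score']]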
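
The ranking step of `search_offers` can be tried in isolation with the following sketch. It runs on a hypothetical three-row DataFrame whose column names mirror the ones used above. The rows, the brand names, and the `toy_search` helper are assumptions for illustration, and the `search_type` filter is left out to keep the example short.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data; the values are made up for illustration
toy_df = pd.DataFrame({
    "OFFER": [
        "spend $10 on frozen pizza",
        "frozen vanilla cake",
        "spend $50 on a club membership",
    ],
    "RETAILER": ["TARGET", None, "SAMS CLUB"],
    "BRAND": ["PIZZA CO", "CAKE CO", "SAMS CLUB"],
})

toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_df["OFFER"])


def toy_search(query):
    """Rank every toy offer against the query by TF-IDF cosine similarity."""
    query_vector = toy_vectorizer.transform([query.lower()])
    scores = cosine_similarity(query_vector, toy_matrix)[0]
    ranked = toy_df.assign(similarity_score=scores)
    return ranked.sort_values(by="similarity_score", ascending=False)


results = toy_search("frozen pizza")
print(results[["OFFER", "similarity_score"]])
```

The offer containing both query terms ranks first, the offer sharing only "frozen" ranks second, and the offer with no shared terms scores 0.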

## Scoring Mechanism

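The scoring mechanism used here is cosine similarity, which measures the cosine of the angle between two vectors in a multidimensional space. Because TF-IDF weights are never negative, the score ranges from 0 to 1. A score of 1 means the query and offer vectors point in the same direction (the same relative term weights, regardless of length), and a score of 0 means they share no terms.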
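
As a worked illustration of the score, the sketch below computes cosine similarity by hand with NumPy. The `cosine_sim` helper and the three vectors are hypothetical and stand in for TF-IDF rows. The notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])  # same direction as a, twice the length
c = np.array([0.0, 0.0, 3.0])  # no non-zero component in common with a

print(cosine_sim(a, b))  # approximately 1.0: length does not matter
print(cosine_sim(a, c))  # 0.0: no overlap
```

Since only direction matters, a long offer and a short offer with the same mix of terms receive the same score for a given query.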

## Conclusion and Next Steps

This notebook provides a simple text-based search tool for offers, built on TF-IDF vectors and cosine similarity. Future improvements could include using more advanced NLP techniques and deploying the tool as a web application.

