In [2]:
### Notebook Imports

import pandas as pd
import numpy as np
import random
from datetime import datetime, timedelta

from IPython.display import display

### Set Notebook Parameters
pd.set_option('display.max_columns', None)

<div style="text-align: center;">
    <img src="images/feature_store_step.png" alt="Feature Store" style="width: 1000px;"/>
</div>"

### Understanding [BOS] and [EOS] Tokens:

In the context of sequence-to-sequence models and tokenization processes, `[BOS]` and `[EOS]` are special tokens that play a significant role. Let's explore these tokens in more detail:

- `[BOS]` (Beginning of Sequence): This token signifies the start of a sequence. It's used to indicate the initiation point for the model to process or generate a sequence of tokens. When encoding text, `[BOS]` serves as a marker that the subsequent text represents the beginning of the input or target sequence.

- `[EOS]` (End of Sequence): This token marks the conclusion of a sequence. It's employed to denote the endpoint where the model should halt token generation or processing for a particular sequence. In tokenized text, `[EOS]` indicates that the sequence concludes at that specific point.

For instance, in the context of text generation:

- When generating a sequence of words using a sequence-to-sequence model like BERT or BART, `[BOS]` may be used to start and `[EOS]` to conclude the generated text.
- While tokenizing a sentence like "Hello, world," it could be encoded as `[BOS] Hello , world [EOS]`.

These tokens provide vital context to the models, allowing them to comprehend the sequence structure and create coherent text accordingly. In the the below markdown examples, `[BOS]` and `[EOS]` are employed to frame the swipe type information within the sequence for both BERT and BART.

<div style="text-align: center;">
    <img src="images/pairy4.jpg" alt="Feature Store" style="width: 300px;"/>
</div>

## Interaction Features for BERT and BART Models:

When utilizing interaction data for recommendation systems with BERT and BART models, it's important to craft meaningful features from the swipe interactions, aligning with the models' ability to process text sequences. In the context of Pairy's swipe interactions, the following features can be generated:

1. **Swipe Type:**
   - Encode the type of swipe action as tokens in the sequence.
   - For instance, `[BOS] Swipe Right [EOS]`, `[BOS] Swipe Left [EOS]`.

2. **Swipe Content:**
   - Incorporate a concise description of the content associated with the interaction.
   - Example: `[BOS] Liked a fashion post [EOS]`, `[BOS] Disliked a travel post [EOS]`.

3. **Content Details:**
   - Integrate supplementary details such as hashtags or keywords pertaining to the content.
   - Example: `[BOS] Liked a fashion post with #summerfashion [EOS]`.

4. **Timestamp:**
   - Include the timestamp of the interaction to capture temporal patterns.
   - Example: `[BOS] Liked a fashion post on 2023-08-21 [EOS]`.

5. **User Profile:**
   - Feature the user's profile information to provide context to the interaction.
   - Example: `[BOS] User A liked a fashion post [EOS]`.

6. **Influencer/Brand Profile:**
   - Include the profile of the influencer or brand related to the interaction.
   - Example: `[BOS] Liked a post by Influencer X [EOS]`.

7. **Aggregated Interaction Summary:**
   - Aggregate a user's interactions over a period into a summary.
   - Example: `[BOS] User A engaged with 10 fashion posts in the last week [EOS]`.

8. **Combined Features:**
   - Combine multiple features to encapsulate intricate interactions.
   - Example: `[BOS] User A liked a fashion post with #summerfashion by Influencer X on 2023-08-21 [EOS]`.

These interaction features are tokenized using the respective BERT and BART tokenizers. Each tokenized sequence is then input into the models for further processing, allowing the models to grasp the nuances of user interactions and preferences. By incorporating timestamp information, user profiles, and interaction particulars, the models can capture the context and furnish more precise and tailored recommendations.

Remember that the selection of features hinges on the richness of the interaction data and the degree of personalization desired. Experimentation and refinement will assist in identifying the most effective features for your recommendation engine.


## Features for Database Schema

Below is a table outlining various features for a database schema following the BERT and BART format features. The features are categorized based on different aspects of user interactions and preferences. Please note that this table is a comprehensive representation and can be adapted according to our specific recommendation system's needs.

| Category            | Feature               | Description                                            | Example                           |
|---------------------|-----------------------|--------------------------------------------------------|-----------------------------------|
| Interaction         | Swipe Type            | Type of swipe action: right (like) or left (dislike). | Swipe Right                       |
| Interaction         | Swipe Content         | Description of the content associated with the interaction. | Liked a fashion post          |
| Interaction         | Content Details       | Additional details about the content, such as hashtags or keywords. | #summerfashion                    |
| Interaction         | Timestamp             | Timestamp of the interaction to capture temporal patterns. | 2023-08-21 15:30:00               |
| User Profile        | User Gender           | Gender of the user.                                  | Female                            |
| User Profile        | User Age              | Age of the user.                                     | 25                                |
| User Profile        | User Location         | Location or region of the user.                      | New York, USA                     |
| User Profile        | User Interests        | Interests or preferences indicated by the user.       | Fashion, Travel                   |
| Influencer Profile  | Influencer Name       | Name of the influencer or brand.                     | Influencer X                      |
| Influencer Profile  | Influencer Category   | Category or niche of the influencer's content.        | Fashion, Beauty                   |
| Influencer Profile  | Influencer Followers  | Number of followers or subscribers of the influencer. | 100,000                           |
| Influencer Profile  | Influencer Engagement | Level of engagement (likes, comments) on the influencer's content. | High                              |
| User Feedback       | User Ratings          | Ratings or scores given by the user for content.      | 4.5/5                             |
| User Feedback       | User Comments         | Comments or feedback provided by the user on content. | "Love the fashion tips!"          |
| User Feedback       | User Preferences      | User's expressed preferences or requirements for recommendations. | Interested in eco-friendly brands |
| Aggregated Summary  | Weekly Interactions   | Number of interactions by the user in the past week. | 20 interactions                   |
| Aggregated Summary  | Monthly Likes         | Number of likes given by the user in the past month. | 50 likes                          |


### SQL Table: `user_interactions`

This SQL table captures various interaction features, user profiles, influencer profiles, user feedback, and aggregated summaries. The table is designed to store data related to user interactions and preferences for building a recommendation engine. Here's the SQL code to create the table structure:

```sql
CREATE TABLE user_interactions (
    interaction_id INT PRIMARY KEY,
    user_id INT,
    swipe_type VARCHAR(10),
    swipe_content VARCHAR(255),
    content_details VARCHAR(255),
    interaction_timestamp TIMESTAMP,
    user_gender VARCHAR(10),
    user_age INT,
    user_location VARCHAR(100),
    user_interests VARCHAR(255),
    influencer_name VARCHAR(100),
    influencer_category VARCHAR(100),
    influencer_followers INT,
    influencer_engagement VARCHAR(20),
    user_rating DECIMAL(3, 2),
    user_comment TEXT,
    user_preferences VARCHAR(255),
    weekly_interactions INT,
    monthly_likes INT,
    PRIMARY KEY (interaction_id)
);

In [6]:
# Generate 10 realistic dummy interactions
dummy_data = generate_realistic_dummy_interactions(10)
for interaction in dummy_data:
    print(interaction)

{'swipe_type': 'Swipe Left', 'swipe_content': 'Liked a Fashion post', 'content_details': '#Travel', 'interaction_timestamp': datetime.datetime(2023, 8, 28, 3, 15, 52, 358308), 'user_gender': 'Female', 'user_age': 20, 'user_location': 'New York, Japan', 'user_interests': 'Food', 'influencer_name': 'Influencer 51', 'influencer_category': 'Travel', 'influencer_followers': 41894, 'influencer_engagement': 'Low', 'user_rating': 4.02, 'user_comment': 'Awesome!', 'user_preferences': 'Food, Fashion, Tech, Travel', 'weekly_interactions': 19, 'monthly_likes': 52}
{'swipe_type': 'Swipe Right', 'swipe_content': 'Liked a Fashion post', 'content_details': '#Food', 'interaction_timestamp': datetime.datetime(2023, 8, 3, 3, 15, 52, 358358), 'user_gender': 'Female', 'user_age': 39, 'user_location': 'New York, Japan', 'user_interests': 'Fitness, Food', 'influencer_name': 'Influencer 31', 'influencer_category': 'Fashion', 'influencer_followers': 44419, 'influencer_engagement': 'Medium', 'user_rating': 3.58

In [11]:
import random
from datetime import datetime, timedelta

def generate_realistic_dummy_interactions(num_interactions, model):
    """
    Generate realistic dummy interactions for the "Pairy" app.

    Parameters:
    num_interactions (int): The number of interactions to generate.
    model (str): The model type ("BERT" or "BART").

    Returns:
    list: A list of dictionaries representing interactions with various features.
    """
    interactions = []
    swipe_types = ['Swipe Right', 'Swipe Left']
    genders = ['Male', 'Female', 'Other']
    interests = ['Fashion', 'Travel', 'Food', 'Fitness', 'Tech']
    categories = ['Fashion', 'Beauty', 'Fitness', 'Lifestyle', 'Travel']
    engagements = ['Low', 'Medium', 'High']
    
    for interaction_id in range(1, num_interactions + 1):
        user_id = random.randint(1, 100)  # Generate a random user ID
        interaction = {
            'interaction_id': interaction_id,
            'user_id': user_id,
            'swipe_type': f"[BOS] {random.choice(swipe_types)} [EOS]",
            'swipe_content': f"[BOS] Liked a {random.choice(interests)} post [EOS]",
            'content_details': f"[BOS] #{random.choice(interests)} [EOS]",
            'interaction_timestamp': datetime.now() - timedelta(days=random.randint(1, 30)),
            'user_gender': f"[BOS] {random.choice(genders)} [EOS]",
            'user_age': f"[BOS] {random.randint(18, 45)} [EOS]",
            'user_location': f"[BOS] {random.choice(['New York', 'Los Angeles', 'London', 'Tokyo'])}, {random.choice(['USA', 'UK', 'Japan'])} [EOS]",
            'user_interests': f"[BOS] {', '.join(random.sample(interests, random.randint(1, len(interests))))} [EOS]",
            'influencer_name': f"[BOS] Influencer {random.randint(1, 100)} [EOS]",
            'influencer_category': f"[BOS] {random.choice(categories)} [EOS]",
            'influencer_followers': f"[BOS] {random.randint(1000, 50000)} [EOS]",
            'influencer_engagement': f"[BOS] {random.choice(engagements)} [EOS]",
            'user_rating': f"[BOS] {round(random.uniform(3, 5), 2)} [EOS]",
            'user_comment': f"[BOS] {random.choice(['Great content!', 'Awesome!', 'Not my style.'])} [EOS]",
            'user_preferences': f"[BOS] {', '.join(random.sample(interests, random.randint(1, len(interests))))} [EOS]",
            'weekly_interactions': f"[BOS] {random.randint(5, 30)} [EOS]",
            'monthly_likes': f"[BOS] {random.randint(20, 100)} [EOS]"
        }
        interactions.append(interaction)
    
    return interactions


In [12]:
# Generate 10 realistic dummy interactions for BERT
dummy_data_bert = generate_realistic_dummy_interactions(10, "BERT")
print("Dummy Data for BERT:")
for interaction in dummy_data_bert:
    print(interaction)

print("\n")

# Generate 10 realistic dummy interactions for BART
dummy_data_bart = generate_realistic_dummy_interactions(10, "BART")
print("Dummy Data for BART:")
for interaction in dummy_data_bart:
    print(interaction)
    

Dummy Data for BERT:
{'interaction_id': 1, 'user_id': 23, 'swipe_type': '[BOS] Swipe Right [EOS]', 'swipe_content': '[BOS] Liked a Tech post [EOS]', 'content_details': '[BOS] #Tech [EOS]', 'interaction_timestamp': datetime.datetime(2023, 8, 12, 3, 24, 8, 816168), 'user_gender': '[BOS] Female [EOS]', 'user_age': '[BOS] 18 [EOS]', 'user_location': '[BOS] Tokyo, USA [EOS]', 'user_interests': '[BOS] Tech [EOS]', 'influencer_name': '[BOS] Influencer 57 [EOS]', 'influencer_category': '[BOS] Lifestyle [EOS]', 'influencer_followers': '[BOS] 15518 [EOS]', 'influencer_engagement': '[BOS] Low [EOS]', 'user_rating': '[BOS] 3.31 [EOS]', 'user_comment': '[BOS] Great content! [EOS]', 'user_preferences': '[BOS] Travel, Fitness [EOS]', 'weekly_interactions': '[BOS] 24 [EOS]', 'monthly_likes': '[BOS] 43 [EOS]'}
{'interaction_id': 2, 'user_id': 46, 'swipe_type': '[BOS] Swipe Right [EOS]', 'swipe_content': '[BOS] Liked a Travel post [EOS]', 'content_details': '[BOS] #Travel [EOS]', 'interaction_timestamp'

In [14]:
dummy_data_bert

[{'interaction_id': 1,
  'user_id': 23,
  'swipe_type': '[BOS] Swipe Right [EOS]',
  'swipe_content': '[BOS] Liked a Tech post [EOS]',
  'content_details': '[BOS] #Tech [EOS]',
  'interaction_timestamp': datetime.datetime(2023, 8, 12, 3, 24, 8, 816168),
  'user_gender': '[BOS] Female [EOS]',
  'user_age': '[BOS] 18 [EOS]',
  'user_location': '[BOS] Tokyo, USA [EOS]',
  'user_interests': '[BOS] Tech [EOS]',
  'influencer_name': '[BOS] Influencer 57 [EOS]',
  'influencer_category': '[BOS] Lifestyle [EOS]',
  'influencer_followers': '[BOS] 15518 [EOS]',
  'influencer_engagement': '[BOS] Low [EOS]',
  'user_rating': '[BOS] 3.31 [EOS]',
  'user_comment': '[BOS] Great content! [EOS]',
  'user_preferences': '[BOS] Travel, Fitness [EOS]',
  'weekly_interactions': '[BOS] 24 [EOS]',
  'monthly_likes': '[BOS] 43 [EOS]'},
 {'interaction_id': 2,
  'user_id': 46,
  'swipe_type': '[BOS] Swipe Right [EOS]',
  'swipe_content': '[BOS] Liked a Travel post [EOS]',
  'content_details': '[BOS] #Travel [EOS]