# Innovating AI-Driven Marketing Solutions

Hello, I'm Sanchay Thalnerkar, an AI Engineer with a relentless drive to explore the frontiers of artificial intelligence. This notebook is a culmination of cutting-edge AI techniques tailored to revolutionize the marketing and social media landscapes.

In this collaborative workspace, you'll find tools and methods that use AI to generate synthetic marketing scenarios, check them for quality and diversity, and fine-tune a language model on the result. Whether you're here to learn, contribute, or simply explore, this notebook is designed to inspire creativity and push the boundaries of what's possible with AI in marketing.

Let's dive in and create something extraordinary together.

In [None]:
import json
from typing import List , Dict
from together import Together
from dotenv import load_dotenv
import os
import time
import re
from ratelimit import limits , sleep_and_retry
from collections import Counter

# Loading the Together API Key from Google Colab userdata

The API key is read from Colab's `userdata` secrets (stored under the name `together`) so that it is not hard-coded in the notebook.

In [None]:
from google.colab import userdata


TOGETHER_API_KEY = userdata.get('together')


# Initializing the Together API Client

The Together API client is set up with the retrieved API key.
A rate-limited function wraps each API call so that at most one request is sent per second.
The function uses the `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo` model with a temperature of 0.7 to generate completions.

In [None]:
together = Together(api_key = TOGETHER_API_KEY )

@sleep_and_retry
@limits(calls = 1,period = 1)
def rate_limited_api_call(messages):
    return together.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
        messages=messages,
        temperature=0.7,
    )
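
To make the decorator stack above more concrete, here is a minimal standard-library sketch of the behaviour it aims for: at most one call per period, with extra calls sleeping instead of failing. The `min_interval` decorator and `fake_api_call` are hypothetical stand-ins written for illustration. They are not how the `ratelimit` package is implemented, and the period is shortened to 0.2 seconds so the demo runs quickly.

```python
import time
from functools import wraps


def min_interval(period: float):
    """Hypothetical stand-in: allow at most one call per `period` seconds."""
    def decorator(func):
        last_call = [None]  # monotonic timestamp of the previous call

        @wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                wait = period - (time.monotonic() - last_call[0])
                if wait > 0:
                    time.sleep(wait)  # block until the period has passed
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@min_interval(0.2)
def fake_api_call(i: int) -> int:
    """Stub standing in for a real API request."""
    return i * 2


start = time.monotonic()
results = [fake_api_call(i) for i in range(3)]
elapsed = time.monotonic() - start
print(results, f"{elapsed:.2f}s")
```

Three calls take at least two full periods, because the second and third calls each wait for the previous one's period to expire.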


In [None]:
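# Example prompt (illustrative); `messages` must be defined before the call.
messages = [{"role": "user", "content": "Suggest one creative marketing idea for a new chocolate bar."}]
response = rate_limited_api_call(messages)
print(response)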

In [None]:
from rich.console import Console
from rich.markdown import Markdown

console = Console()

In [None]:
formatted = Markdown(response.choices[0].message.content.strip())
console.print(formatted)  # render the reply as Markdown in the cell output

# Executing an API call to generate a response

The helper below wraps `rate_limited_api_call` in a retry loop: it sends the request to the Together model, retries up to `max_retries` times with a 5-second pause after each failure, and returns the model's reply.

In [None]:
def get_model_response(messages: List[Dict[str, str]], max_retries: int = 3, timeout: int = 60) -> str:
    """Call the model with retries and return the reply text (`timeout` is currently unused)."""
    for attempt in range(max_retries):
        try:
            response = rate_limited_api_call(messages)
            # Return plain text so callers can use string methods such as .split().
            return response.choices[0].message.content.strip()
        except Exception as e:
            console.print(f"[bold red]Error in the model response (attempt {attempt + 1}/{max_retries}): {e}[/bold red]")
            if attempt < max_retries - 1:
                time.sleep(5)  # pause before the next attempt
    raise Exception("Failed to get the response after repeated errors")
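
The retry loop can be tried without any network access. The sketch below is a generic, hypothetical version of the same pattern (`call_with_retries` and `flaky_model_call` are names invented for this illustration), run against a stub that fails twice before succeeding.

```python
import time
from typing import Callable


def call_with_retries(func: Callable[[], str], max_retries: int = 3, delay: float = 0.0) -> str:
    """Hypothetical generic form of the retry loop: try, report, pause, try again."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            print(f"Attempt {attempt + 1}/{max_retries} failed: {exc}")
            if attempt < max_retries - 1:
                time.sleep(delay)
    raise RuntimeError("Failed to get the response after repeated errors")


attempts = {"count": 0}


def flaky_model_call() -> str:
    """Stub standing in for the API: fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = call_with_retries(flaky_model_call)
print(result, attempts["count"])
```

With `max_retries=3` the stub succeeds on the last allowed attempt. A stub that fails a third time would hit the final `raise` instead.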


## Generating Multiple Marketing Scenarios

This function, `generate_multiple_marketing_samples`, creates a set number of diverse and innovative marketing scenarios.
Each scenario is formatted with an instruction, input, and response, making it suitable for fine-tuning AI models for marketing tasks.

In [None]:
def generate_multiple_marketing_samples(num_samples: int = 5) -> List[str]:
    system_message = f"""You are an elite marketing AI, tasked with generating {num_samples} exceptional, diverse, and innovative marketing scenarios. Each scenario must be unique and push the boundaries of creative marketing. Your goal is to create a dataset that will revolutionize AI-assisted marketing when used for fine-tuning.

    For each sample, provide:

    1. Instruction: A specific, challenging marketing task that requires creativity and expertise.
    2. Input: Detailed context, including company info, target audience, constraints, and goals.
    3. Response: An outstanding, creative solution that addresses the instruction and input.

    Guidelines:
    - Cover a wide range of industries: tech, fashion, food, finance, entertainment, healthcare, education, travel, etc.
    - Include various marketing channels: social media, email, content marketing, influencer partnerships, guerrilla marketing, AR/VR campaigns, etc.
    - Address different marketing objectives: brand awareness, lead generation, customer retention, product launch, crisis management, etc.
    - Incorporate current trends: sustainability, AI, personalization, interactive content, social causes, etc.
    - Consider various audience segments: Gen Z, millennials, seniors, B2B executives, niche hobbyists, global markets, etc.
    - Include challenging scenarios: limited budgets, tight deadlines, highly competitive markets, rebranding efforts, etc.
    - Explore innovative formats: interactive stories, user-generated content campaigns, multi-platform narratives, gamification, etc.

    Format each sample as follows:
    ### Instruction:
    [Concise, specific marketing task]

    ### Input:
    [Detailed context and requirements]

    ### Response:
    [Creative, comprehensive marketing solution]

    Separate each sample with ---

    Remember: Quality, creativity, and diversity are crucial. Each sample should be a masterclass in innovative marketing that would challenge and inspire even seasoned professionals."""

    user_message = f"Generate {num_samples} cutting-edge, diverse marketing scenarios that showcase the future of AI-assisted marketing creativity. Use the specified format, separating each sample with ---."

    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]

    response = get_model_response(messages)
    return [sample.strip() for sample in response.split("---") if sample.strip()]
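
As a rough illustration of how the requested format is taken apart, the sketch below splits a made-up model output on `---` and then splits each sample on its `###` headers. The `raw_output` text and the `parse_sample` helper are hypothetical and exist only for this example.

```python
from typing import Dict, List, Optional

raw_output = """### Instruction:
Plan a launch campaign.

### Input:
A small coffee brand targeting students.

### Response:
Run a campus pop-up with a referral code.
---
### Instruction:
Write a retention email.

### Input:
A fitness app with falling engagement.

### Response:
Send a personalised progress recap.
"""


def parse_sample(sample: str) -> Optional[Dict[str, str]]:
    """Split one sample on '###'; return its three fields, or None if malformed."""
    parts = sample.split("###")
    if len(parts) != 4:  # text before the first header plus three sections
        return None
    return {
        "instruction": parts[1].replace("Instruction:", "").strip(),
        "input": parts[2].replace("Input:", "").strip(),
        "response": parts[3].replace("Response:", "").strip(),
    }


samples: List[str] = [s.strip() for s in raw_output.split("---") if s.strip()]
parsed = [parse_sample(s) for s in samples]
print(parsed[0])
```

A sample that contains an extra `###` (for example a Markdown heading inside the response) produces more than four parts and is treated as malformed, so some valid-looking output may be skipped.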


## Quality Check for Generated Samples

The `quality_check` function ensures that each generated marketing scenario meets minimum quality standards.
For each of the three fields it requires at least 50 characters and at least 70% unique words, which filters out truncated or highly repetitive samples.

In [None]:
def quality_check(sample: dict) -> bool:
    min_length = 50
    max_repetition = 0.3

    for key in ["instruction", "input", "response"]:
        text = sample[key]
        if len(text) < min_length:
            return False

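        words = re.findall(r"\w+", text.lower())
        if not words:  # punctuation-only text: reject and avoid division by zero
            return False
        unique_words = set(words)
        # Share of distinct words must be at least 1 - max_repetition (70%).
        if len(unique_words) / len(words) < (1 - max_repetition):
            return False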

    return True
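
To see what the word-diversity test measures, the sketch below computes the same unique-word ratio for two made-up strings. `unique_word_ratio` is a hypothetical helper that isolates the calculation and is not part of the notebook.

```python
import re


def unique_word_ratio(text: str) -> float:
    """Fraction of distinct words in `text` (0.0 if it contains no words)."""
    words = re.findall(r"\w+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


varied = "Launch a pop-up event near campus and reward referrals with free samples"
repetitive = "buy now buy now buy now buy now buy now"

varied_ratio = unique_word_ratio(varied)
repetitive_ratio = unique_word_ratio(repetitive)
print(varied_ratio, repetitive_ratio)
```

Against a 0.7 threshold the first string passes and the second fails. Long texts naturally repeat common words such as "the" and "and", so the ratio tends to fall as length grows, and the threshold may need tuning for long responses.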



## Tracking Diversity Across Scenarios

The `track_diversity` function analyzes the diversity of the generated marketing samples by tracking key industry, channel, and objective keywords.
It provides a summary of the dataset's diversity, helping to ensure a wide range of marketing strategies are represented.

In [None]:
industry_keywords = [
    "tech",
    "fashion",
    "food",
    "finance",
    "entertainment",
    "healthcare",
    "education",
    "travel",
]
channel_keywords = [
    "social media",
    "email",
    "content marketing",
    "influencer",
    "guerrilla",
    "AR",
    "VR",
]
objective_keywords = [
    "brand awareness",
    "lead generation",
    "customer retention",
    "product launch",
    "crisis management",
]


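def _matches(keyword: str, text: str) -> bool:
    # `text` is lowercased, so acronyms like "AR"/"VR" are matched as whole
    # lowercase words; a plain substring test would also hit "market" or "share".
    if keyword.isupper():
        return re.search(rf"\b{keyword.lower()}\b", text) is not None
    return keyword in text


def track_diversity(dataset: List[dict]) -> None:
    industries = Counter()
    channels = Counter()
    objectives = Counter()

    for sample in dataset:
        text = " ".join(
            [sample["instruction"], sample["input"], sample["response"]]
        ).lower()
        industries.update(word for word in industry_keywords if _matches(word, text))
        channels.update(word for word in channel_keywords if _matches(word, text))
        objectives.update(word for word in objective_keywords if _matches(word, text))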

    console.print("[bold]Dataset Diversity:")
    console.print(f"Industries: {dict(industries)}")
    console.print(f"Channels: {dict(channels)}")
    console.print(f"Objectives: {dict(objectives)}")
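
Keyword counting depends on how a match is defined. Short acronyms such as `AR` also appear inside unrelated words, so the sketch below shows whole-word, case-insensitive matching combined with `Counter.update`. The `keyword_in_text` helper and the two sample texts are made up for this illustration.

```python
import re
from collections import Counter

channel_keywords = ["social media", "email", "influencer", "AR", "VR"]


def keyword_in_text(keyword: str, text: str) -> bool:
    """Whole-word, case-insensitive match, so 'AR' does not match 'market'."""
    return re.search(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE) is not None


texts = [
    "Run an AR try-on filter promoted through social media.",
    "Send a market research email to existing customers.",
]

channels = Counter()
for text in texts:
    # Each keyword is counted at most once per text.
    channels.update(k for k in channel_keywords if keyword_in_text(k, text))

print(dict(channels))
```

The second text contains the letters "ar" inside "market" and "research" but is not counted under `AR`. The trade-off is that whole-word matching would also stop a stem like "tech" from matching "technology", so it suits acronyms better than stems.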


## Creating a Fine-Tuning Dataset for Marketing & Social Media Content

This function, `create_finetuning_dataset`, is responsible for generating a large dataset for fine-tuning AI models specifically for marketing and social media content.
The process involves making multiple API calls to generate samples, ensuring each sample meets quality standards before being added to the dataset.
The progress is continuously saved, and diversity across different marketing scenarios is tracked and reported.

In [None]:
from rich.panel import Panel  # needed for the banner below


def create_finetuning_dataset(target_samples: int, output_file: str):
    console.print(
        Panel(
            "Creating fine-tuning dataset for [bold]Marketing & Social Media Content[/bold]",
            expand=False,
        )
    )

    dataset = []
    samples_per_call = 50  # Reduced to respect token limits
    calls_made = 0
    max_calls = 200  # Increased to allow for more total samples

    with console.status("[bold green]Generating samples...") as status:
        while len(dataset) < target_samples and calls_made < max_calls:
            calls_made += 1
            status.update(
                f"API call {calls_made} (Dataset size: {len(dataset)}/{target_samples})"
            )

            samples = generate_multiple_marketing_samples(samples_per_call)

            for sample in samples:
                if len(dataset) >= target_samples:
                    break

                parts = sample.split("###")
                if len(parts) == 4:
                    instruction = parts[1].replace("Instruction:", "").strip()
                    input_text = parts[2].replace("Input:", "").strip()
                    response = parts[3].replace("Response:", "").strip()

                    sample_dict = {
                        "instruction": instruction,
                        "input": input_text,
                        "response": response,
                    }

                    if quality_check(sample_dict):
                        dataset.append(sample_dict)
                    else:
                        console.print("[yellow]Skipped low-quality sample")
                else:
                    console.print(
                        f"[yellow]Skipped malformed sample: {sample[:100]}..."
                    )

            # Save progress after each API call
            with open(output_file, "w") as f:
                json.dump(dataset, f, indent=2)

    console.print(
        f"[bold green]Dataset creation complete! Total samples: {len(dataset)}"
    )
    track_diversity(dataset)
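
Because the dataset is rewritten to `output_file` after every API call, an interrupted run still leaves a usable JSON file behind. The sketch below shows one way such a file could be reloaded to resume a run. `load_existing_dataset` is a hypothetical helper that the notebook does not define, and a temporary file stands in for the real output path.

```python
import json
import os
import tempfile
from typing import Dict, List


def load_existing_dataset(path: str) -> List[Dict[str, str]]:
    """Return previously saved samples, or an empty list if no file exists yet."""
    if not os.path.exists(path):
        return []
    with open(path, "r") as f:
        return json.load(f)


# Demonstrate with a temporary file instead of the real output path.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "checkpoint.json")

before = load_existing_dataset(path)  # nothing saved yet
saved = [{"instruction": "a", "input": "b", "response": "c"}]
with open(path, "w") as f:
    json.dump(saved, f, indent=2)
after = load_existing_dataset(path)
print(len(before), len(after))
```

Starting `dataset` from such a helper instead of an empty list would let a restarted run continue from the last saved call, assuming the same output file is reused.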


## Executing the Fine-Tuning Dataset Creation Process

This cell executes the `create_finetuning_dataset` function to generate 1000 samples and saves the results in a JSON file named "marketing_social_media_dataset_v1.json".

In [None]:
create_finetuning_dataset(
        target_samples=1000,
        output_file="marketing_social_media_dataset_v1.json",
    )


In [None]:
%%capture
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

## Loading a Pre-Trained 4-bit Quantized Language Model

This cell loads a pre-trained language model using the `FastLanguageModel` class from the `unsloth` library.
The model is configured with a maximum sequence length of 600 tokens and 4-bit quantization to reduce memory usage.
Unsloth publishes a range of pre-quantized 4-bit models, which are faster to download and help avoid out-of-memory errors.
In this example, the "Meta-Llama-3.1-8B" model is loaded as the base model for fine-tuning.

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 600 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4-bit pre-quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",          # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
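
A back-of-the-envelope calculation shows why 4-bit loading matters on a small GPU. The figures below assume roughly 8 billion parameters and count weight storage only. Real usage is higher, because quantization stores extra scaling constants, some layers may stay in higher precision, and activations and optimizer state are not included.

```python
# Assumption: roughly 8 billion parameters; weights only, no activations.
num_params = 8_000_000_000


def weight_memory_gb(params: int, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10**9 bytes)."""
    return params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(num_params, 16)
int4_gb = weight_memory_gb(num_params, 4)
print(f"16-bit: {fp16_gb} GB, 4-bit: {int4_gb} GB")
```

Under these assumptions the 16-bit weights alone would take most of a 16 GB GPU's memory, while the 4-bit weights leave room for training state.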

## Adding LoRA Adapters

This cell attaches LoRA adapters to the loaded model with `FastLanguageModel.get_peft_model`, so that only a small set of added weights is trained instead of the full model.
The adapters use rank `r = 16` and `lora_alpha = 16`, and target the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the MLP projections (`gate_proj`, `up_proj`, `down_proj`).
Gradient checkpointing is set to `"unsloth"` to reduce VRAM use during training.

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
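
To get a feel for how small the trainable part is, the sketch below counts the LoRA parameters added by `r = 16` on the seven target modules. The layer dimensions are assumptions taken from the published Llama-3.1-8B configuration (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers) and should be checked against the model actually loaded.

```python
# Assumed Llama-3.1-8B dimensions; verify against the loaded model's config.
hidden, intermediate, kv_dim, num_layers = 4096, 14336, 1024, 32
r = 16

# (in_features, out_features) for each targeted projection
module_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# LoRA adds A (r x in) and B (out x r) per module: r * (in + out) parameters.
per_layer = sum(r * (n_in + n_out) for n_in, n_out in module_shapes.values())
total_lora_params = per_layer * num_layers
print(f"{total_lora_params:,} trainable LoRA parameters")
```

Under these assumptions the adapters add about 42 million trainable parameters, roughly half a percent of an 8-billion-parameter model.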

<a name="Data"></a>
### Data Prep
We now load the marketing dataset generated earlier in this notebook (`marketing_social_media_dataset_v1.json`) and format each sample with the prompt template of the [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html): an instruction, an input, and a response. You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["response"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset('json', data_files='/content/marketing_social_media_dataset_v1.json', split='train')
dataset = dataset.map(formatting_prompts_func, batched=True)
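
With `batched=True`, the mapping function receives a dictionary of column lists rather than one row at a time. The sketch below mimics that on a single made-up row, using a shortened template and a placeholder `<eos>` string. In the notebook the real value comes from `tokenizer.eos_token`, and `format_batch` is a hypothetical name for this illustration.

```python
prompt_template = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS = "<eos>"  # placeholder; the notebook uses tokenizer.eos_token


def format_batch(examples):
    """Batched formatter: dict of column lists in, dict with a 'text' list out."""
    texts = [
        prompt_template.format(instruction, context, response) + EOS
        for instruction, context, response in zip(
            examples["instruction"], examples["input"], examples["response"]
        )
    ]
    return {"text": texts}


batch = {
    "instruction": ["Write a tagline."],
    "input": ["A reusable water bottle brand."],
    "response": ["Refill your day, not the landfill."],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```

Each training text ends with the end-of-sequence marker, which is what teaches the model to stop after the response.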

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and remove the `max_steps` limit. We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
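
A quick calculation shows how much of the dataset the 60-step run covers. It assumes a single GPU and a dataset of 1000 samples (the target requested earlier). The saved file may contain fewer samples if generation stopped early.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60
dataset_size = 1000  # assumption: the target requested earlier was reached
num_devices = 1      # assumption: a single Colab GPU

# Sequences contributing to each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
sequences_seen = effective_batch_size * max_steps
epochs_covered = sequences_seen / dataset_size
print(effective_batch_size, sequences_seen, epochs_covered)
```

Under these assumptions the short run sees 480 sequences, a little under half of one epoch, which is why a full run uses `num_train_epochs = 1` instead.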

In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

In [None]:
trainer_stats = trainer.train()

In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Create a marketing campaign to promote the chocolate bar", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
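
The decoded output contains the whole sequence, prompt included, so the generated answer still has to be separated from the template. The sketch below does this on a made-up decoded string. `extract_response` is a hypothetical helper, and `<eos>` is a placeholder for the tokenizer's real end-of-sequence token.

```python
def extract_response(decoded: str, eos_token: str = "<eos>") -> str:
    """Return the text after the last '### Response:' header, minus the EOS token."""
    response = decoded.split("### Response:")[-1]
    return response.replace(eos_token, "").strip()


decoded_example = (
    "### Instruction:\nCreate a marketing campaign.\n\n"
    "### Input:\n\n\n"
    "### Response:\nPartner with local cafes for tasting events.<eos>"
)
answer = extract_response(decoded_example)
print(answer)
```

With `max_new_tokens = 64` the generation may be cut off before the end-of-sequence token appears. In that case the helper simply returns the truncated text.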

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the full output.

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Write a marketing campaign for Zara.", # instruction
        "", # input - optional context, e.g. company details or target audience
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

To load the LoRA adapters we just saved for inference, change `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "Create a marketing campaign to promote the chocolate bar", # instruction
        "Company : Cadbury , target audience : adults/boomers", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)