# Automatic Generation of Bid Phrases Review

Textual advertising is the most popular form of online advertising. To create a textual ad, an advertiser has to write a relevant <i>creative</i> (the textual representation of the ad) that redirects to a landing page. The landing page is where the advertiser introduces the product to users. The advertiser also has to choose relevant bid phrases for the <i>creative</i>. Search engines use these bid phrases to match a query to the ad. Since the advertiser's aim is to reach as many users as possible, choosing many bid phrases is a good strategy. It is also critical that the bid phrases are relevant to the product; otherwise users are unlikely to click on the ad, and those who do click are unlikely to buy the product.


One problem with generating bid phrases is that an ad campaign might cover many products, and for each product the advertiser has to come up with as many relevant bid phrases as possible. Since this is done manually, it costs the advertiser a great deal of time and effort. In this paper, the authors propose an algorithm that, given the landing page of a product, automatically creates a <i>creative</i> and bid phrases related to that <i>creative</i>. Existing tools, called "keyword suggestion tools", generate bid phrases from a given set of seed bid phrases. The problem with these tools is that they are prone to topic drift: when there are very few seed phrases, the tool generates phrases that are no longer relevant to the product. Advertisers therefore have to supply more seed phrases to get more relevant keywords, which creates more manual work. The method proposed in this paper tries to overcome this problem without asking for any seed phrases.

Creating bid phrases from landing pages is difficult because, most of the time, the description text on the landing page does not contain enough information to build bid phrases. The authors find that almost 96% of landing pages have at least one bid phrase that does not occur on the page. They conclude that this is because the language model of landing pages is not the same as the language model of their bid phrases.

## Algorithm

Let $b$ be a bid phrase with words $b_i$, and let $l$ be a landing page with words $l_j$. The probability of a bid phrase given a landing page, $P(b|l)$, follows from Bayes' rule:

\begin{align}
P(b|l) = \frac{P(l|b)P(b)}{P(l)}
\end{align}

where $P(l|b)$ is the probability of generating $l$ given $b$ (the translation model), and $P(b)$ is the language model of bid phrases.

We can calculate $P(l|b)$ like this:

\begin{align}
P(l|b) \propto \prod_{j} \sum_{i} t(l_j|b_i)
\end{align}

where $t(l_j|b_i)$ is the probability of $l_j$ being generated from $b_i$. Since some words appear in more important places, such as the title or a heading of the page, we weight each term with an exponent $w_j$:

\begin{align}
P(l|b) \propto \prod_{j} [\sum_{i} t(l_j|b_i)]^{w_{j}}
\end{align}
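
As a rough illustration of the weighted translation score, the sketch below evaluates it in log space to avoid underflow on long pages. The function name `log_translation_score`, the dictionary layout of the translation table `t`, the per-word weight lookup, and the small `floor` value for unseen word pairs are all assumptions made for this example, not details from the paper.

```python
import math

def log_translation_score(page_words, bid_words, t, weights=None, floor=1e-9):
    """Unnormalized log P(l|b) = sum_j w_j * log(sum_i t(l_j | b_i))."""
    score = 0.0
    for l_j in page_words:
        # Default weight of 1 reduces to the unweighted product (assumption).
        w_j = 1.0 if weights is None else weights.get(l_j, 1.0)
        # Sum translation probabilities over every word of the bid phrase.
        total = sum(t.get((l_j, b_i), 0.0) for b_i in bid_words)
        # The floor keeps log() defined for unseen pairs (assumption).
        score += w_j * math.log(max(total, floor))
    return score

# Toy translation table keyed by (page_word, bid_word); values are made up.
t = {("cheap", "cheap"): 0.6, ("player", "mp3"): 0.3, ("music", "mp3"): 0.4}
page = ["cheap", "music", "player"]
weights = {"cheap": 2.0}  # e.g. a word that appears in the title

score_mp3 = log_translation_score(page, ["cheap", "mp3"], t, weights)
score_car = log_translation_score(page, ["new", "car"], t, weights)
```

With these toy values, the phrase whose words can explain the page text receives the higher score.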

Since most queries are bigrams (such as "cheap mp3" or "new car"), we assume that bid phrases can be modeled with a bigram model.
$P(b)$ is then:

\begin{align}
P(b) = \prod_i P(b_i|b_{i-1})
\end{align}

To eliminate zero probabilities, we smooth the bigram model by interpolating it with the unigram model:

\begin{align}
    \lambda_1 + \lambda_2 = 1, \hspace{40mm}\\
    P(b_i|b_{i-1}) \approx \lambda_1 P(b_i) + \lambda_2 P(b_i|b_{i-1})
\end{align}
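
A minimal sketch of this interpolated bigram model is given below. The start symbol `<s>`, the maximum-likelihood count estimates, and the values chosen for $\lambda_1$ and $\lambda_2$ are assumptions for illustration; the review does not state how the paper estimates them.

```python
from collections import Counter

def train_counts(phrases):
    """Collect unigram and bigram counts from whitespace-tokenized phrases."""
    unigrams, bigrams = Counter(), Counter()
    for phrase in phrases:
        words = ["<s>"] + phrase.split()  # "<s>" marks the phrase start (assumption)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def interpolated_prob(word, prev, unigrams, bigrams, lam1=0.3, lam2=0.7):
    """lam1 * P(word) + lam2 * P(word | prev), with lam1 + lam2 = 1."""
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam1 * p_uni + lam2 * p_bi

def phrase_prob(phrase, unigrams, bigrams):
    """P(b) as a product of smoothed bigram probabilities."""
    prob = 1.0
    words = ["<s>"] + phrase.split()
    for prev, word in zip(words, words[1:]):
        prob *= interpolated_prob(word, prev, unigrams, bigrams)
    return prob

unigrams, bigrams = train_counts(["cheap mp3", "cheap car", "new car"])
p_seen = phrase_prob("cheap mp3", unigrams, bigrams)
p_unseen = phrase_prob("new mp3", unigrams, bigrams)  # bigram never observed
```

The unseen bigram "new mp3" still gets a nonzero probability because the unigram term contributes, which is the point of the smoothing.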

The algorithm chooses the landing-page terms with the largest weights $w_j$ and, for each of them, takes the $n$ most likely translations. From these translations, it forms all possible permutations as candidate phrases.
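
The candidate generation step might look roughly like the sketch below. The parameters `top_k` and `max_len`, the table layout of `t`, and the reading of "all possible permutations" as ordered phrases of up to `max_len` distinct words are assumptions made for this example.

```python
from itertools import permutations

def candidate_phrases(weights, t, top_k=2, n=2, max_len=2):
    """Build candidate bid phrases from the translations of heavy page words."""
    # Pick the top_k page words with the largest weights.
    top_words = sorted(weights, key=weights.get, reverse=True)[:top_k]
    pool = []
    for l_j in top_words:
        # Rank bid words by t(l_j | b_i) for this page word, best first.
        translations = sorted(
            ((p, b_i) for (page_word, b_i), p in t.items() if page_word == l_j),
            reverse=True,
        )
        for _, b_i in translations[:n]:
            if b_i not in pool:  # keep each translation once
                pool.append(b_i)
    # Every ordered selection of 1..max_len distinct words is a candidate.
    return [" ".join(perm)
            for length in range(1, max_len + 1)
            for perm in permutations(pool, length)]

# Toy inputs; all values are made up.
weights = {"player": 3.0, "music": 2.0, "the": 0.1}
t = {("player", "mp3"): 0.5, ("player", "player"): 0.4, ("player", "ipod"): 0.1,
     ("music", "mp3"): 0.6, ("music", "songs"): 0.3}
candidates = candidate_phrases(weights, t)
```

In a full system, each candidate would then be ranked by $P(l|b)P(b)$; that ranking step is omitted here.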

## Experiments

### Data

The data is obtained from the Yahoo! ad corpus. To reduce skew in the data, at most five advertisements are sampled from each domain. These ads have nine bid phrases on average. For preprocessing, stopwords are removed, all characters are converted to lowercase, strings are tokenized, and a weight measure is used to filter out unnecessary words.

The weight of a word is calculated as:

\begin{align}
    w = \frac{weight_{tag} \times f_j}{\log(N_d)}
\end{align}

where $weight_{tag}$ is a predetermined value for the HTML tag in which the word appears. Essential tags such as &lt;title&gt;, &lt;h1&gt;, and &lt;keywords&gt; have a $weight_{tag}$ of 10; all other tags have a value of 1. $f_j$ is the frequency of the word, and $N_d$ is the number of documents that contain the word. Words with a weight $w$ below 0.5 are filtered out.
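
The weighting and filtering step can be sketched as follows. The natural logarithm, the handling of $N_d \le 1$ (where $\log(N_d)$ is zero and the formula is undefined), and the tuple layout of the input are assumptions for this example.

```python
import math

# Tag weights as described in the text: 10 for essential tags, 1 otherwise.
TAG_WEIGHTS = {"title": 10, "h1": 10, "keywords": 10}

def term_weight(tag, freq, doc_freq):
    """w = weight_tag * f_j / log(N_d), using the natural log (assumption)."""
    weight_tag = TAG_WEIGHTS.get(tag, 1)
    if doc_freq <= 1:
        # log(1) = 0 makes the formula undefined; treating such rare words
        # as maximally important is an assumption of this sketch.
        return math.inf
    return weight_tag * freq / math.log(doc_freq)

def keep_terms(terms, threshold=0.5):
    """Drop (word, tag, freq, doc_freq) entries whose weight is below threshold."""
    return [word for word, tag, freq, doc_freq in terms
            if term_weight(tag, freq, doc_freq) >= threshold]

# Toy input; the counts are made up.
terms = [("mp3", "title", 2, 1000), ("page", "p", 1, 1000)]
kept = keep_terms(terms)
```

Here "mp3" survives because of its title weight, while a body word that occurs once in a page and in many documents falls below the threshold.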

### Evaluation

To evaluate the system, the authors obtain "gold standard" bid phrases from advertisers; the name distinguishes them from the candidate bid phrases that the system generates. Since the main problem involves generating phrases, it is not possible to use precision or recall measures, so the authors use two evaluation criteria. The first is normalized edit distance. Edit distance in this context is the number of words that must be removed or added to turn a candidate into a gold standard bid phrase. For normalization, the edit distance is divided by the number of words in the candidate bid phrase.

\begin{align}
    ED(b, b^*) = \frac{\text{number of operations to convert } b \text{ into } b^*}{\text{number of words in } b}
\end{align}

where $b$ is a candidate bid phrase and $b^*$ is a gold standard bid phrase. This is computed against every gold standard bid phrase, and the minimum score is used as the evaluation criterion.
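
A small sketch of this metric is shown below. It counts only word insertions and deletions, matching the description above, and computes them through the longest common subsequence. Whether the paper also counts substitutions as a single operation is not stated here, so that choice is an assumption, as are the function names.

```python
def word_edit_distance(candidate, gold):
    """Number of word insertions and deletions needed to turn candidate into gold."""
    a, b = candidate.split(), gold.split()
    # lcs[i][j] = length of the longest common subsequence of a[:i] and b[:j].
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, wa in enumerate(a, 1):
        for j, wb in enumerate(b, 1):
            if wa == wb:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    # Words outside the common subsequence are deleted from a or inserted from b.
    return len(a) + len(b) - 2 * lcs[len(a)][len(b)]

def normalized_edit_distance(candidate, gold_phrases):
    """Minimum edit distance over all gold phrases, divided by candidate length."""
    n_words = len(candidate.split())
    return min(word_edit_distance(candidate, g) for g in gold_phrases) / n_words

gold = ["mp3 player", "buy ipod"]
score = normalized_edit_distance("cheap mp3 player", gold)
```

For the candidate "cheap mp3 player", one deletion reaches the gold phrase "mp3 player", so the normalized score is one third.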

The second criterion is ROUGE-1. It is similar to Jaccard similarity, but instead of dividing by the number of words in the union, we divide by the total number of words in the gold standard bid phrases $B^*$.

\begin{align}
    \text{ROUGE-1}(b, B^*) = \frac{\sum_{b^* \in B^*} \text{number of words in } b \cap b^*}{\sum_{b^* \in B^*} \text{number of words in } b^*}
\end{align}
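
The ROUGE-1 score as defined above can be sketched in a few lines. Treating each phrase as a set of distinct words and returning 0 for an empty gold set are assumptions of this example.

```python
def rouge_1(candidate, gold_phrases):
    """Overlap between a candidate and all gold phrases, over total gold words."""
    cand = set(candidate.split())
    # Count shared words against each gold phrase separately, then sum.
    overlap = sum(len(cand & set(g.split())) for g in gold_phrases)
    total = sum(len(set(g.split())) for g in gold_phrases)
    return overlap / total if total else 0.0

gold = ["mp3 player", "buy ipod"]
rouge = rouge_1("cheap mp3 player", gold)
```

The candidate shares two words with the first gold phrase and none with the second, out of four gold words in total, giving a score of 0.5.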
<br>

## Discussion

Compared to the CMS, baseline, and discriminative systems, the proposed model gives the best results on both the edit distance and ROUGE-1 measures. The contributions of this paper are the use of a translation model to generate candidate bid phrases and the introduction of two evaluation criteria for benchmarking "keyword suggestion tools". One problem with the evaluation is that the authors did not use cross-validation for testing. Also, choosing the minimum normalized edit distance is a bit strange, since it compares each candidate only against its <b>closest</b> gold standard bid phrase. Using the <i>average</i> or <i>median</i> of the normalized edit distance values instead could give a more accurate evaluation metric.

