# Exploring the generated ranking from my initial dataset

This notebook explores the click model I trained on the first dump of data from Google Analytics. The data covers about 100 queries, each of which received at least 1000 clicks during that time period.

The click model used is the Simplified Dynamic Bayesian Network. The goal of this notebook is to understand why the generated ranking seems worse than the original one.
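As a reminder of what the model estimates, here is a minimal sketch of the standard SDBN counting estimates. It is not the code that produced the dataset below; the function name, the session format and the toy sessions are all made up for illustration. It assumes that everything up to and including the last click was examined, that the last click is the "chosen" result, and that a session with no clicks examined the whole page.

```python
from collections import defaultdict


def sdbn_estimates(sessions):
    """Estimate SDBN parameters from (ranked_results, clicked_results) pairs.

    Hypothetical helper: `sessions` is a list of tuples, where the first item
    is the list of results in displayed order and the second is the set of
    results the user clicked.
    """
    examined = defaultdict(int)
    clicked = defaultdict(int)
    chosen = defaultdict(int)

    for ranked_results, clicks in sessions:
        click_positions = [i for i, r in enumerate(ranked_results) if r in clicks]
        # Assumption: with no clicks, the user examined every result shown.
        last_click = click_positions[-1] if click_positions else len(ranked_results) - 1

        for i, result in enumerate(ranked_results[:last_click + 1]):
            examined[result] += 1
            if result in clicks:
                clicked[result] += 1
                # The final click of the session is treated as the satisfying one.
                if i == last_click:
                    chosen[result] += 1

    estimates = {}
    for result, n_examined in examined.items():
        attractiveness = clicked[result] / n_examined
        satisfyingness = chosen[result] / clicked[result] if clicked[result] else 0.0
        estimates[result] = {
            'attractiveness': attractiveness,
            'satisfyingness': satisfyingness,
            'relevance': attractiveness * satisfyingness,
        }
    return estimates


# Toy sessions, invented for this example
toy_sessions = [
    (['a', 'b', 'c'], {'b'}),
    (['a', 'b', 'c'], {'a', 'c'}),
    (['a', 'b', 'c'], {'a'}),
]
estimates = sdbn_estimates(toy_sessions)
print(estimates)
```

In this toy data, result `c` is examined only once but gets a relevance of 1.0, which is the kind of low-volume effect to watch for in the rest of the notebook.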

First, import pandas and load the model parameters dataset:

In [4]:
import pandas as pd

In [5]:
df = pd.read_csv('data/pyclick-comparison/2018-04-26-model-uncertainty.csv')

In [8]:
df = df.set_index(['search_term_lowercase', 'result'])

My data uses content IDs, so use this data to translate into URLs and titles:

In [25]:
content = pd.read_csv('public_data/content_items.csv').set_index(['content_id'])
content.head()

content_id,base_path,title
a6ac6905-080f-5cba-892b-c258529a07e4,/hmrc-internal-manuals/import-and-export-pipel...,Technical guidance: Control of imported goods
91c1ad27-cb10-508c-a862-20cb22bc485b,/hmrc-internal-manuals/import-and-export-pipel...,Approval: Specimen Letter
f4ac2cda-eff3-5c42-a2ac-2e71a33801a7,/hmrc-internal-manuals/vat-place-of-supply-tra...,Freight transport: definitions
fd0ae676-feb6-5318-a2af-72b6fe489083,/hmrc-internal-manuals/vat-place-of-supply-tra...,Freight transport: Place of Supply from 1 Janu...
4b0fcd96-88f3-5c6f-8f9e-e40e9b45207b,/hmrc-internal-manuals/excise-competent-offici...,Introduction: I want to exchange information ...


Now slice up the data by query, and rank by a conservative relevance estimate: the estimated relevance minus its error.

When I initially tested the dataset, I forgot to incorporate the error, but it doesn't seem to have made a huge difference.

In [50]:
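def ranking(term):
    """Rank the results for one search term by a conservative relevance estimate."""
    # Copy the slice so that adding columns doesn't trigger SettingWithCopyWarning
    results = df.loc[term, :].copy()
    # Lower bound on relevance: penalises results with a large uncertainty
    results['relevance_low'] = results.relevance - results.relevance_error
    # Look up the base_path and title for each content ID
    results = results.merge(content, how='left', left_index=True, right_index=True)
    results = results.set_index('base_path')
    results = results.sort_values('relevance_low', ascending=False)
    # method='min' gives tied results the same (best) rank
    results['rank'] = results.relevance_low.rank(ascending=False, method='min')
    return results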

In [60]:
self_assessment = ranking('self assessment')

# How are the current top results affected by this ranking?

Overall the new ranking is significantly different from the current ranking. The mainstream guide is still ranked highly, but the other top results are different.


## Ranked higher than before

The HMRC guidance and collection of paper forms have gone from bottom of the ranking to the top.


The top 5 results are:

1. [/government/collections/self-assessment-hmrc-manuals](https://www.gov.uk/government/collections/self-assessment-hmrc-manuals) - Internal HMRC guidance (the most detailed guidance there is)
2. [/log-in-register-hmrc-online-services](https://www.gov.uk/log-in-register-hmrc-online-services) - Tax account service via Government Gateway (not the preferred path to the service but does the same thing)
3. [/government/collections/self-assessment-helpsheets-main-self-assessment-tax-return](https://www.gov.uk/government/collections/self-assessment-helpsheets-main-self-assessment-tax-return) - Paper forms (just a list of them)
4. [/self-assessment-tax-returns](https://www.gov.uk/self-assessment-tax-returns) - Mainstream guide, points to preferred start page but no clear call to action
5. [/self-assessment-forms-and-helpsheets](https://www.gov.uk/self-assessment-forms-and-helpsheets) - Guide to the different forms and when you need them. Gives online and paper versions

In [61]:
self_assessment.head()

base_path,chosen,clicked,skipped,clicked_error,skipped_error,chosen_error,cov_clicked_skipped,examined,examined_error,cov_clicked_examined,...,attractiveness_error,cov_chosen_clicked,satisfyingness,satisfyingness_error,cov_attractiveness_satisfyingness,relevance,relevance_error,relevance_low,title,rank
/government/collections/self-assessment-hmrc-manuals,7.0,7.0,2.0,2.645751,1.414214,2.645751,1.924631,9.0,3.584587,8.686729,...,0.124738,6.989711,1.0,0.020493,0.001462,0.777778,0.11636,0.661418,Self Assessment: HMRC manuals,1.0
/log-in-register-hmrc-online-services,1703.0,1849.0,759.0,43.0,27.549955,41.267421,609.358503,2608.0,61.860464,2436.406732,...,0.006835,1771.890803,0.921038,0.001488,6e-06,0.652991,0.005758,0.647233,HMRC services: sign in or register,2.0
/government/collections/self-assessment-helpsheets-main-self-assessment-tax-return,19.0,19.0,11.0,4.358899,3.316625,4.358899,7.436296,30.0,6.698701,26.744565,...,0.058901,18.972072,1.0,0.012439,0.000419,0.633333,0.054779,0.578554,Self Assessment forms and helpsheets: main Sel...,3.0
/self-assessment-tax-returns,2010.0,2297.0,3208.0,47.927028,56.639209,44.833024,1396.307947,5505.0,91.09125,3998.759852,...,0.003654,2145.555242,0.875054,0.001623,3e-06,0.365123,0.002865,0.362258,Self Assessment tax returns,4.0
/self-assessment-forms-and-helpsheets,122.0,150.0,213.0,12.247449,14.59452,11.045361,91.943062,363.0,23.385597,262.338629,...,0.014201,135.078652,0.813333,0.008161,6.6e-05,0.336088,0.010011,0.326077,Self Assessment forms and helpsheets,5.0


The two results that jumped to the top are each based on fewer than 20 clicks (and at most 30 examinations), because they are currently ranked so low that not many people see them.

This introduces a couple of problems.

- Because we ignore clicks on the following page, we assume that these results were "chosen" even when the user could have kept browsing onto the next page. On hitting the bottom of the page, the user is also likely to revisit earlier links, or refine their query, which isn't accounted for in this model.
- Since we have much less data for these results, we have to account for uncertainty in the number of clicks (if we sampled again we might get a very different result)
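The error columns in the table above look consistent with treating each count as a Poisson variable (error = square root of the count) and combining `clicked` and `skipped` with their covariance to get `examined_error`. This is inferred from the numbers, not from the code that produced the dataset, so treat the sketch below as an assumption. The function names are made up.

```python
import math


def count_error(n):
    """Poisson counting error for a raw count (assumed, not confirmed)."""
    return math.sqrt(n)


def examined_error(clicked, skipped, cov_clicked_skipped):
    """Error on examined = clicked + skipped, including the covariance term."""
    variance = clicked + skipped + 2 * cov_clicked_skipped
    return math.sqrt(variance)


# Values for /government/collections/self-assessment-hmrc-manuals
hmrc_clicked_error = count_error(7)                     # table shows 2.645751
hmrc_examined_error = examined_error(7, 2, 1.924631)    # table shows 3.584587

# Values for /log-in-register-hmrc-online-services
login_examined_error = examined_error(1849, 759, 609.358503)  # table shows 61.860464

print(hmrc_clicked_error, hmrc_examined_error, login_examined_error)
```

With only 7 clicks the relative error on the count is almost 40%, compared with about 2% for a result with 1849 clicks.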

However, in this case:

- The error contribution decreased our confidence in low-volume results, but didn't actually affect the new ranking
- The satisfyingness was basically irrelevant: ordering by attractiveness alone gives the same ranking
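A quick way to check both points is to sort by each column and compare the resulting orders. The sketch below only uses the five rows printed above, with the numbers copied from that output. The `attractiveness` column is elided in the printed table, so here it is recomputed as `clicked / examined`, which is an assumption about how the model defines it.

```python
import pandas as pd

# Top five rows for 'self assessment', copied from the output above
top5 = pd.DataFrame(
    {
        'clicked': [7, 1849, 19, 2297, 150],
        'examined': [9, 2608, 30, 5505, 363],
        'relevance': [0.777778, 0.652991, 0.633333, 0.365123, 0.336088],
        'relevance_error': [0.11636, 0.005758, 0.054779, 0.002865, 0.010011],
    },
    index=[
        '/government/collections/self-assessment-hmrc-manuals',
        '/log-in-register-hmrc-online-services',
        '/government/collections/self-assessment-helpsheets-main-self-assessment-tax-return',
        '/self-assessment-tax-returns',
        '/self-assessment-forms-and-helpsheets',
    ],
)

# Assumption: attractiveness is the fraction of examinations that led to a click
top5['attractiveness'] = top5.clicked / top5.examined
top5['relevance_low'] = top5.relevance - top5.relevance_error

# Order of base paths under each candidate ranking column
orderings = {
    column: list(top5.sort_values(column, ascending=False).index)
    for column in ['relevance_low', 'relevance', 'attractiveness']
}
same_order = all(order == orderings['relevance_low'] for order in orderings.values())
print(same_order)
```

The same comparison could be run on the full `self_assessment` frame to check the whole ranking rather than just the top five.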

## Ranked lower than before



The current top result, [/log-in-file-self-assessment-tax-return](https://www.gov.uk/log-in-file-self-assessment-tax-return) does not appear in the top 5:

In [54]:
self_assessment.loc['/log-in-file-self-assessment-tax-return']

chosen                                                                            2814
clicked                                                                           3375
skipped                                                                           5284
clicked_error                                                                  58.0948
skipped_error                                                                  72.6911
chosen_error                                                                   53.0471
cov_clicked_skipped                                                            2172.21
examined                                                                          8659
examined_error                                                                 114.033
cov_clicked_examined                                                           6067.84
attractiveness                                                                0.389768
attractiveness_error                       

The reason is that it has a very low clickthrough rate. Everyone examines it, but fewer than 40% of people click on it (attractiveness is 0.39).

Even worse is current number 3: [/topic/personal-tax/self-assessment](https://www.gov.uk/topic/personal-tax/self-assessment)

In [55]:
self_assessment.loc['/topic/personal-tax/self-assessment']

chosen                                           522
clicked                                          641
skipped                                         2768
clicked_error                                 25.318
skipped_error                                52.6118
chosen_error                                 22.8473
cov_clicked_skipped                          685.166
examined                                        3409
examined_error                               69.1327
cov_clicked_examined                         1603.17
attractiveness                              0.188032
attractiveness_error                      0.00422129
cov_chosen_clicked                           577.598
satisfyingness                              0.814353
satisfyingness_error                      0.00393301
cov_attractiveness_satisfyingness        9.49416e-06
relevance                                   0.153124
relevance_error                           0.00307515
relevance_low                               0.

Less than 20% of people who see this click on it, which makes sense because the snippet is really vague:

#### Self Assessment
> List of information about Self Assessment.

It's a similar story for "Personal tax account: sign in or set up":

In [56]:
self_assessment.loc['/personal-tax-account']

chosen                                                                   161
clicked                                                                  169
skipped                                                                  623
clicked_error                                                             13
skipped_error                                                          24.96
chosen_error                                                         12.6886
cov_clicked_skipped                                                  166.906
examined                                                                 792
examined_error                                                       33.5531
cov_clicked_examined                                                 399.525
attractiveness                                                      0.213384
attractiveness_error                                              0.00890637
cov_chosen_clicked                                                   164.709

I assume that a lot of people don't know what a personal tax account is. If you just skim the titles it's not obvious that this is something you can use to file a self assessment.

# Another example: national minimum wage

In [76]:
min_wage = ranking('national minimum wage')
min_wage.loc[:, ['examined', 'attractiveness', 'attractiveness_error', 'satisfyingness', 'satisfyingness_error', 'relevance', 'relevance_error', 'relevance_low']]

base_path,examined,attractiveness,attractiveness_error,satisfyingness,satisfyingness_error,relevance,relevance_error,relevance_low
/national-minimum-wage-rates,756.0,0.890212,0.014939,0.922734,0.00245,0.821429,0.012664,0.808764
/hmrc-internal-manuals/compliance-operational-guidance/cog14685,1.0,1.0,0.410019,1.0,0.054219,1.0,0.381614,0.618386
/government/publications/pay-and-work-rights-complaints,1.0,1.0,0.410019,1.0,0.054219,1.0,0.381614,0.618386
/government/collections/national-minimum-wage,39.0,0.538462,0.047181,0.809524,0.022124,0.435897,0.032868,0.403029
/guidance/tell-hmrc-if-youve-underpaid-national-minimum-wage-in-the-social-care-sector,2.0,0.5,0.201956,1.0,0.054219,0.5,0.187775,0.312225
/guidance/tell-hmrc-if-youve-underpaid-national-minimum-wage-in-the-social-care-sector.cy,2.0,0.5,0.201956,1.0,0.054219,0.5,0.187775,0.312225
/am-i-getting-minimum-wage,115.0,0.373913,0.024798,0.790698,0.016546,0.295652,0.016853,0.2788
/government/publications/national-minimum-wage-information-for-employers-nmw-fs1,6.0,0.333333,0.10702,1.0,0.038339,0.333333,0.100261,0.233072
/national-minimum-wage,63.0,0.253968,0.03218,0.8125,0.025067,0.206349,0.023104,0.183246
/government/publications/national-minimum-wage-code-of-best-practice-on-service-charges-tips-gratuities-and-cover-charges,4.0,0.25,0.127513,1.0,0.054219,0.25,0.120276,0.129724


For this query:

- A bunch of results are still being ranked highly with only one click (the error seems underestimated)
- The top result is unchanged
- The clickthrough rate on the other results is quite low
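One possible way to penalise one-click results more heavily is a Wilson score lower bound on the clickthrough rate. This is not what the notebook uses; it is a different, standard technique shown only to illustrate how much a single observation should be trusted. The click counts are reconstructed from the table as `attractiveness * examined` (673 of 756 for the top result, 1 of 1 for the one-click results), and the function name is made up.

```python
import math


def wilson_lower_bound(clicks, examined, z=1.96):
    """Lower end of the Wilson score interval for a click proportion.

    z=1.96 corresponds to a 95% interval.
    """
    if examined == 0:
        return 0.0
    p = clicks / examined
    centre = p + z ** 2 / (2 * examined)
    margin = z * math.sqrt(p * (1 - p) / examined + z ** 2 / (4 * examined ** 2))
    return (centre - margin) / (1 + z ** 2 / examined)


# /national-minimum-wage-rates: 673 clicks out of 756 examinations
top_result = wilson_lower_bound(673, 756)
# A result that was examined once and clicked once
one_click = wilson_lower_bound(1, 1)

print(top_result, one_click)
```

Under this bound the one-click results fall well below the top result, and also below `/government/collections/national-minimum-wage` if the same calculation is applied to its counts.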

# Why are the metrics unchanged if the results are different?

The two metrics I have now both measure time saved from avoiding known bad results:

- **Saved clicks** counts clicks that the user wouldn't need to make in the new ranking - if the current chosen result appears higher up than other things the user clicked on before, then with the new ranking they'd stop examining before seeing them.

- **Change in rank** is the difference in rank of the current chosen result - if it now appears higher up in the ranking, the user doesn't have to examine as many results before finding it
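The two definitions can be sketched for a single session as follows. The function names, URLs and rankings are invented for illustration, and the sign convention (old position minus new position, so negative means the chosen result moved down) is an assumption based on how the results are interpreted later in this notebook.

```python
def change_in_rank(chosen, old_ranking, new_ranking):
    """Positive if the chosen result moved up in the new ranking (assumed sign)."""
    return old_ranking.index(chosen) - new_ranking.index(chosen)


def saved_clicks(clicked, chosen, new_ranking):
    """Count earlier clicks that now sit below the chosen result.

    With the new ranking the user would reach the chosen result first and
    stop, so they would never click on these.
    """
    position = {url: i for i, url in enumerate(new_ranking)}
    return sum(
        1
        for url in clicked
        if url != chosen and position[url] > position[chosen]
    )


# Hypothetical session: the user clicked /a, then /b, then settled on /c
old_ranking = ['/a', '/b', '/c', '/d']
new_ranking = ['/c', '/a', '/d', '/b']
clicked = ['/a', '/b', '/c']

rank_change = change_in_rank('/c', old_ranking, new_ranking)
clicks_saved = saved_clicks(clicked, '/c', new_ranking)
print(rank_change, clicks_saved)
```

Here the chosen result moves from third to first place, and both earlier clicks would be avoided.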

## Number of saved clicks

**The limitation of this metric is it only tells half the story.** I measure known bad results that have been removed, but there could also be unknown bad results that have been added in above the chosen result.

### Results

In [92]:
test_set = pd.read_csv('data/pyclick-comparison/2018-04-26-test_set-uncertainty.csv')
test_set = test_set.set_index(['search_term_lowercase', 'id'])
test_set.saved_clicks.sum()

2308

In [88]:
test_set.loc['self assessment'].saved_clicks.value_counts()

0    2402
1     136
2      18
3       1
Name: saved_clicks, dtype: int64

In [87]:
test_set.loc['national minimum wage'].saved_clicks.value_counts()

0    233
1      3
Name: saved_clicks, dtype: int64

In [95]:
test_set.groupby('search_term_lowercase').sum().saved_clicks

search_term_lowercase
[postcode]                  6
apprenticeship              4
apprenticeships             2
ated                        7
attendance allowance       15
blue badge                  3
budgeting loan             45
car tax                    46
carers allowance            5
change address             31
change of address          45
child benefit               9
child tax credit           12
childcare                   8
childcare account           2
cis                         7
companies house             4
contact                     5
contact hmrc                6
contact number              2
corporation tax            35
council tax                19
dart charge                 9
dbs                        65
divorce                    15
driving licence            12
driving test                1
dvla                       10
esa                         2
exchange rates              6
                         ... 
sa100                      46
sa302             

## Change in rank

This is the metric I am trying to optimise. For every user in the test set I have a "chosen" result that I want to place as high as possible. The sum across all sessions is the difference in the number of URLs all the users have to examine to find their chosen results.

**The limitation of this metric is that it ignores the fact that the user could choose a different result under the new ranking, that is better than or as good as the previously chosen result.**

For example, using the training data we might estimate that a result seen by a very small number of people is really relevant. But the metric doesn't even care about this result because so few people chose it.

**It's biased towards improvements in the results that are examined the most, i.e. rankings that are closer to the current ranking**

### Results
This metric has gotten worse with the new ranking.

In [118]:
test_set.change_in_rank.median()

-3.0

This is worse for almost every query. In the truncated output below, only child tax credit has a positive median:

In [119]:
test_set.groupby('search_term_lowercase').median().change_in_rank

search_term_lowercase
[postcode]               -63.0
apprenticeship            -4.0
apprenticeships           -1.0
ated                      -4.0
attendance allowance      -1.0
blue badge                -2.0
budgeting loan            -2.0
car tax                   -4.0
carers allowance          -1.0
change address            -5.0
change of address         -8.0
child benefit             -2.0
child tax credit           1.0
childcare                 -6.0
childcare account         -1.0
cis                       -2.0
companies house           -2.0
contact                  -14.0
contact hmrc              -2.0
contact number            -3.0
corporation tax           -3.5
council tax               -7.0
dart charge               -1.0
dbs                       -2.0
divorce                   -9.0
driving licence           -4.0
driving test              -2.0
dvla                      -2.0
esa                       -2.0
exchange rates            -4.0
                          ... 
sa100            

For self assessment queries only:

In [98]:
test_set.loc['self assessment'].change_in_rank.median()

-2.0

In [117]:
test_set.loc['self assessment'].change_in_rank.value_counts()

-5.0     1002
-2.0      651
 2.0      549
-9.0      225
 4.0       55
-4.0       29
 5.0        9
 16.0       7
 7.0        7
 0.0        6
 6.0        5
-10.0       3
-8.0        2
 10.0       2
-1.0        2
 19.0       1
 11.0       1
 1.0        1
Name: change_in_rank, dtype: int64

The change in rank just depends on the final thing chosen. This is the distribution of that:

In [125]:
worse_sessions = test_set[test_set.change_in_rank < 0]
final_clicks = worse_sessions.loc['self assessment'].final_click_url.value_counts()
final_clicks.to_frame().merge(content, how='left', left_index=True, right_index=True)

Unnamed: 0,final_click_url,base_path,title
02f7b151-bcd4-462b-b5dd-b02445b4417c,2,/pay-tax-debit-credit-card,Pay your tax bill by debit or credit card
1b3d586c-538d-4983-b61a-c3a16c1efddb,1,/self-assessment-ready-reckoner,Budget for your Self Assessment tax bill if yo...
32b54f44-fca1-4480-b13b-ddeb0b0238e1,2,/estimate-self-assessment-penalties,Estimate your penalty for late Self Assessment...
40dab11d-12c3-4c55-a429-74138c6f7132,2,/understand-self-assessment-bill,Understand your Self Assessment tax bill
5ff12f59-7631-11e4-a3cb-005056011aef,2,/government/publications/self-assessment-regis...,Register for Self Assessment
5ff12f59-7631-11e4-a3cb-005056011aef,2,/government/publications/self-assessment-regis...,Cofrestru ar gyfer Hunanasesiad
6a2bf66e-2313-4204-afd5-9940de5e1d66,946,/log-in-file-self-assessment-tax-return,Register for and file your Self Assessment tax...
7beb97b6-75c9-4aa7-86be-a733ab3a21aa,189,/topic/personal-tax/self-assessment,Self Assessment
86f14e34-ba09-4e35-913e-af9e213cff2e,29,/contact-hmrc,Contact HMRC
999dd0f5-41d3-4a63-8bb9-f00a68c129a8,36,/pay-self-assessment-tax-bill,Pay your Self Assessment tax bill


# Different metrics

## Log likelihood of test data

This is one of the evaluation methods used by PyClick

- The model tells you the conditional click probabilities of each document clicked in each session
- Want to maximise the likelihood of all of the clicks in the test set ([PyClick code](https://github.com/markovi/PyClick/blob/master/pyclick/click_models/Evaluation.py#L47))
- $LL(\text{Sessions} \mid \text{Model}) = \sum_{s \in \text{Sessions}} \log P(\text{clicks in } s \mid \text{Model})$
- Optimising for log likelihood is the same as optimising for likelihood, but it's simpler because you can sum all the log probabilities instead of multiplying all the probabilities
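The last point can be illustrated with a few made-up probabilities. This is a minimal sketch, not PyClick's implementation: the function name and the numbers are hypothetical, and each number stands for the probability the model assigns to the clicks observed in one session.

```python
import math


def log_likelihood(session_probabilities):
    """Sum of log probabilities, one probability per session."""
    return sum(math.log(p) for p in session_probabilities)


# Hypothetical per-session probabilities of the observed clicks
probabilities = [0.5, 0.25, 0.8]
ll = log_likelihood(probabilities)
likelihood = math.prod(probabilities)

# Same information, different scale: exp(LL) recovers the likelihood
print(ll, math.exp(ll), likelihood)

# With many sessions the raw product underflows to zero,
# while the sum of logs stays usable
many = [0.5] * 2000
underflowed = math.prod(many)
ll_many = log_likelihood(many)
print(underflowed, ll_many)
```

Because the logarithm is monotonic, a model with a higher log likelihood also has a higher likelihood, so either can be used to compare models.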

## Perplexity
This is an alternative metric used by PyClick.

They reference a paper from [Dupret, Georges E. and Piwowarski, Benjamin](http://www.bpiwowar.net/wp-content/papercite-data/pdf/dupret2008a-user-browsing.pdf).

It's also based on click probabilities given the model is true. I'm going to ignore it for now.