# Qualitative Analysis of Optimal 2c and 3c Contracts

Table of Contents
- [Intro and Data Preparation](#intro-and-data-preparation)
- [Conflict Deal vs. Optimal Contracts](#conflict-deal-vs-optimal-contracts)
- [Characteristics of Contracts](#characteristics-of-contracts)
- [Granularization of Consent and Content Resolutions](#granularization-of-consent-and-content-resolutions)


## Intro and Data Preparation
This section imports the dependencies and loads the CSV file generated by the JS script into a pandas DataFrame. Each row represents one combination of the user's and the site's preferences per issue (not per resolution). For each combination, several contracts were calculated by systematically varying the number of consent resolutions (2, 3, 6) and content resolutions (2/4). The score of each contract is included as a column.

Assumptions
- Preferences for resolutions (e.g., analytics or 2 EUR) remain the same as defined in the descriptive analysis

Overview of variables

| Variable     | Description                                                                                      | Example |
|--------------|--------------------------------------------------------------------------------------------------|---------|
| `user`       | The issue relevancies of a user persona, encoded as L, M, H after preparation                    | HL      |
| `site`       | The issue relevancies of a site persona, encoded as L, M, H after preparation                    | LH      |
| `default`    | The score of the default/conflict deal                                                           | 900     |
| `score_CC`   | The score of the optimal contract. The suffix expresses the number of resolutions per issue. E.g. in the 3c case, 5 Cost options, 3 Consent options and 2 Content options result in a CCC (Cost, Consent, Content) of 532. In this notebook the suffix has two digits (e.g. `score_34`), the first being the number of consent options | 3480    |
| `consent_CC` | The consent granted in contract CC, as a percentage of the consent options (or `ALL` / `REJECTED`) | 33.33   |
| `content_CC` | The agreed content of contract CC                                                                | 70      |


In [286]:
#imports
import pandas as pd

# load and present dataset 
two_c_df = pd.read_csv('./qualitative_2c.csv')

# csv creating script always puts one unnamed last column. removed here
two_c_df.drop(two_c_df.columns[-1], axis=1, inplace=True)
    
# from decimal relevancies to categorical relevancies
def toCategoricalRelevancies(text):
    text = text.replace("0.2", "L") # Issue is underweighted
    text = text.replace("0.5", "M") # Issues have same relevance
    text = text.replace("0.8", "H") # Issue is overweighted
    text = text.replace(" ", "")
    text = text.replace("MM", "M")
    return text

two_c_df['user'] = two_c_df['user'].apply(toCategoricalRelevancies)
two_c_df['site'] = two_c_df['site'].apply(toCategoricalRelevancies) 

# from categorical relevancies to personas
def toUserPersona(value):
    if value == 'M':
        return 'Balanced Brian'
    elif value == 'HL':
        return 'Privacy Priscilla'
    elif value == 'LH':
        return 'Content Connie & Tabloid Terry'

def toSitePersona(value):
    if value == 'M':
        return 'Balanced Brief'
    elif value == 'HL':
        return 'Tabloid Talker & Premium Press'
    elif value == 'LH':
        return 'Affordable News Network'
    
two_c_df['u_persona'] = two_c_df["user"].apply(toUserPersona)
two_c_df['s_persona'] = two_c_df["site"].apply(toSitePersona)

# Personas fit logically, e.g. Balanced Brian and Balanced Brief or Tabloid Terry and Tabloid Talker
two_c_df['persona_fit'] = ((two_c_df['u_persona'] == 'Balanced Brian') & (two_c_df['s_persona'] == 'Balanced Brief')) | \
                    ((two_c_df['u_persona'] == 'Content Connie & Tabloid Terry') & \
                     (two_c_df['s_persona'] == 'Tabloid Talker & Premium Press')) | \
                    ((two_c_df['u_persona'] == 'Privacy Priscilla') & (two_c_df['s_persona'] == 'Affordable News Network'))


# from consent string to percentage
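for name in ['consent_34', 'consent_64', 'consent_67']:
    # assign the result back; inplace=True on a column selection is chained assignment and unreliable in recent pandas
    two_c_df[name] = two_c_df[name].fillna('0')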


consent_options = 6

for name in ['consent_64', 'consent_67']:
    two_c_df[name] = two_c_df[name].apply(lambda value: 0 if value == '0' else(round(len(value.split())/consent_options*100,2)))
    two_c_df[name] = two_c_df[name].apply(lambda value: 'ALL' if value == 100.0 else ('REJECTED' if value == 0.0 else value))

consent_options = 3

for name in ['consent_34']:
    two_c_df[name] = two_c_df[name].apply(lambda value: 0 if value == '0' else(round(len(value.split())/consent_options*100,2)))
    two_c_df[name] = two_c_df[name].apply(lambda value: 'ALL' if value == 100.0 else ('REJECTED' if value == 0.0 else value))

# arrange order of columns
two_c_df = two_c_df[['user', 'site','u_persona', 's_persona', 'persona_fit', 'default', 'score_34', 'score_64', 'score_67', 
                   'consent_34', 'consent_64', 'consent_67', 'content_34', 'content_64', 'content_67']]

# show df
two_c_df

Unnamed: 0,user,site,u_persona,s_persona,persona_fit,default,score_34,score_64,score_67,consent_34,consent_64,consent_67,content_34,content_64,content_67
0,HL,HL,Privacy Priscilla,Tabloid Talker & Premium Press,False,900,3480,3480,3480,33.33,33.33,33.33,70,70,70
1,HL,LH,Privacy Priscilla,Affordable News Network,True,3600,7840,7840,7840,REJECTED,REJECTED,REJECTED,70,70,70
2,HL,M,Privacy Priscilla,Balanced Brief,False,2250,4950,4950,4950,33.33,16.67,16.67,70,70,70
3,LH,HL,Content Connie & Tabloid Terry,Tabloid Talker & Premium Press,True,600,7520,7520,7520,ALL,ALL,ALL,80,80,80
4,LH,LH,Content Connie & Tabloid Terry,Affordable News Network,False,2400,7392,7396,7396,33.33,50.0,50.0,70,70,70
5,LH,M,Content Connie & Tabloid Terry,Balanced Brief,False,1500,7200,7200,7200,ALL,ALL,ALL,70,70,70
6,M,HL,Balanced Brian,Tabloid Talker & Premium Press,False,750,4730,4730,4730,66.67,83.33,83.33,80,80,80
7,M,LH,Balanced Brian,Affordable News Network,False,3000,7600,7600,7600,REJECTED,REJECTED,REJECTED,70,70,70
8,M,M,Balanced Brian,Balanced Brief,True,1875,5250,5250,5250,33.33,16.67,16.67,70,70,70
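
The conversion from consent string to percentage can be looked at in isolation. The following is a minimal sketch, assuming that a consent string lists the granted options separated by whitespace, as the `value.split()` call in the cell above suggests. The helper name `consent_share` and the option names in the example are hypothetical and not taken from the dataset.

```python
def consent_share(consent_string, consent_options):
    """Return 'REJECTED', 'ALL', or the share of granted options in percent."""
    # a missing value is encoded as '0' in the preparation step above
    if not consent_string or consent_string == '0':
        return 'REJECTED'
    granted = len(consent_string.split())  # number of whitespace-separated options
    share = round(granted / consent_options * 100, 2)
    return 'ALL' if share == 100.0 else share

# hypothetical option names, for illustration only
print(consent_share('analytics', 3))            # one of three options granted
print(consent_share('analytics', 6))            # one of six options granted
print(consent_share('analytics marketing', 6))  # two of six options granted
print(consent_share('0', 6))                    # nothing granted
```

The same granted option therefore maps to a different percentage depending on the number of consent options, which is why the `consent_34` column is divided by 3 and the `consent_64` and `consent_67` columns by 6.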


## Conflict Deal vs. Optimal Contracts

Objective: 
- Are all Nash contracts better (higher scores) than the default/conflict deal?

Input Parameters:

- `default_score`: default

- `nash_score`: score_CC

Metrics:

- `gain` = $\frac{\text{nash\_score}}{\text{default\_score}} \cdot 100$ (in %), where `nash_score` is the lowest `score_CC` of a row

In [287]:
# Any contract should be better than the default/conflict deal
score_columns = ['score_34','score_64','score_67']

# Find the minimum value in each row for the selected columns
min_score = two_c_df[score_columns].min(axis=1)

# Minimum Gain Ratio
gain_ratios = min_score / two_c_df['default'] * 100
gain_ratios.name = 'relative gains in %'

# Print descriptive statistics
gain_ratios.describe().astype(int).to_frame()

Unnamed: 0,relative gains in %
count,9
mean,447
std,331
min,217
25%,253
50%,308
75%,480
max,1253


Observations
- Every optimal contract scores higher than the conflict deal (0 EUR, Rejected All, 50% Content)
- The median optimal score is 308% of the default score
- Even the lowest optimal score is 217% of the default score
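
As a worked example of the ratio, the two extreme cases can be recomputed by hand from the dataset shown earlier. This is a small sketch; the function name `relative_gain` is a hypothetical helper, and the input values are copied from rows 1 and 3.

```python
def relative_gain(optimal_score, default_score):
    """Optimal score as a percentage of the default (conflict deal) score."""
    return optimal_score / default_score * 100

# row 1: score 7840, default 3600 (smallest ratio in the dataset)
print(round(relative_gain(7840, 3600), 2))
# row 3: score 7520, default 600 (largest ratio in the dataset)
print(round(relative_gain(7520, 600), 2))
```

Note that the metric is a ratio, not an increase: a value of 217% means the optimal score is about 2.17 times the default score, which corresponds to an increase of about 117%.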

# Characteristics of Contracts
Objective: 
- Distribution of scores
- Consent (**privacy friendliness**)
- Content

Input Parameters

- `consent_34`       
- `consent_64`        
- `consent_67`        
      
Metrics
- count of consent (categorical) and content (numeric) resolutions
- relation of issues and cost/consent



## Scores
- Distribution
- Scores and Personas
- Scores and Relevancies

In [288]:
two_c_df['score_34'].describe().astype(int).to_frame().drop(['count' ,'mean', 'std'])

Unnamed: 0,score_34
min,3480
25%,4950
50%,7200
75%,7520
max,7840


In [289]:
two_c_df.groupby('u_persona')['score_34'].describe().drop(['count','mean', 'std', '25%', '75%'], axis=1)

Unnamed: 0_level_0,min,50%,max
u_persona,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Balanced Brian,4730.0,5250.0,7600.0
Content Connie & Tabloid Terry,7200.0,7392.0,7520.0
Privacy Priscilla,3480.0,4950.0,7840.0


In [290]:
two_c_df.groupby('s_persona')['score_34'].describe().drop(['count','mean', 'std', '25%', '75%'], axis=1)

Unnamed: 0_level_0,min,50%,max
s_persona,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Affordable News Network,7392.0,7600.0,7840.0
Balanced Brief,4950.0,5250.0,7200.0
Tabloid Talker & Premium Press,3480.0,4730.0,7520.0


In [291]:
pivot_personas_scores = two_c_df.pivot_table(index='u_persona', columns='s_persona', values='score_34')

pivot_personas_scores.reset_index(inplace=True)

#print(pivot_personas_scores.to_string(index=False))

pivot_personas_scores

s_persona,u_persona,Affordable News Network,Balanced Brief,Tabloid Talker & Premium Press
0,Balanced Brian,7600,5250,4730
1,Content Connie & Tabloid Terry,7392,7200,7520
2,Privacy Priscilla,7840,4950,3480


In [292]:
# Quartile statistics of score_64, split by persona_fit
persona_fit_true_stats = two_c_df[two_c_df['persona_fit']]['score_64'].describe().drop(['count', 'mean', 'std'])
persona_fit_false_stats = two_c_df[~two_c_df['persona_fit']]['score_64'].describe().drop(['count', 'mean', 'std'])

# Place the two Series side by side as columns
persona_fit_scores = pd.concat([persona_fit_true_stats, persona_fit_false_stats], axis=1)

# Rename the columns
persona_fit_scores.columns = ['persona_fit_True', 'persona_fit_False']

persona_fit_scores

Unnamed: 0,persona_fit_True,persona_fit_False
min,5250.0,3480.0
25%,6385.0,4785.0
50%,7520.0,6075.0
75%,7680.0,7347.0
max,7840.0,7600.0
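
The same comparison can be written more compactly with a single `groupby`. The sketch below is an alternative formulation, not the notebook's own cell; it rebuilds the two relevant columns from the dataset shown earlier so that it runs on its own, and the names `scores` and `fit_stats` are illustrative.

```python
import pandas as pd

# persona_fit and score_64 copied from rows 0 to 8 of the dataset above
scores = pd.DataFrame({
    'persona_fit': [False, True, False, True, False, False, False, False, True],
    'score_64':    [3480, 7840, 4950, 7520, 7396, 7200, 4730, 7600, 5250],
})

# one describe() per persona_fit group, keeping only the quartile columns
fit_stats = scores.groupby('persona_fit')['score_64'].describe()[['min', '25%', '50%', '75%', 'max']]

# transpose so that the statistics are rows, as in the output above
print(fit_stats.T)
```

With only three fitting and six non-fitting combinations, the quartiles should be read as a description of this dataset rather than as a general result.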


## Consent
Does the granted consent vary across personas and persona fit?

In [293]:
consent_counts_df = pd.concat([two_c_df['consent_34'], two_c_df['consent_64'], two_c_df['consent_67']]).value_counts().to_frame()
consent_counts_df = consent_counts_df.reset_index()
consent_counts_df.columns = ['Consent', 'Count']
consent_counts_df['Consent'] = pd.Categorical(consent_counts_df['Consent'], categories=['REJECTED', 16.67, 33.33, 50.0, 66.67, 83.33, 'ALL'], ordered=True)
consent_counts_df = consent_counts_df.sort_values(by='Consent')
consent_counts_df

Unnamed: 0,Consent,Count
1,REJECTED,6
3,16.67,4
0,33.33,6
4,50.0,2
6,66.67,1
5,83.33,2
2,ALL,6


In [294]:
pivot_personas_consent = two_c_df.pivot(index='u_persona', columns='s_persona', values='consent_64')

pivot_personas_consent.reset_index(inplace=True)

#print(pivot_personas_consent.to_string(index=False))

pivot_personas_consent


s_persona,u_persona,Affordable News Network,Balanced Brief,Tabloid Talker & Premium Press
0,Balanced Brian,REJECTED,16.67,83.33
1,Content Connie & Tabloid Terry,50.0,ALL,ALL
2,Privacy Priscilla,REJECTED,16.67,33.33


In [295]:
two_c_df.groupby('consent_64')['persona_fit'].mean()

consent_64
16.67       0.5
33.33       0.0
50.0        0.0
83.33       0.0
ALL         0.5
REJECTED    0.5
Name: persona_fit, dtype: float64

Observations
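- When the personas fit, the contract ends in full consent, rejected consent, or a tradeoff at the lower end (16.67%); the intermediate shares (33.33%, 50.0%, 83.33%) occur only without persona fit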

In [296]:
# Count the consent_64 outcomes, split by persona_fit
persona_fit_true_stats = two_c_df[two_c_df['persona_fit']]['consent_64'].value_counts()
persona_fit_false_stats = two_c_df[~two_c_df['persona_fit']]['consent_64'].value_counts()

# Place the two Series side by side as columns
persona_fit_scores = pd.concat([persona_fit_true_stats, persona_fit_false_stats], axis=1)

# Rename the columns
persona_fit_scores.columns = ['persona_fit_True', 'persona_fit_False']

persona_fit_scores

Unnamed: 0,persona_fit_True,persona_fit_False
REJECTED,1.0,1
ALL,1.0,1
16.67,1.0,1
33.33,,1
50.0,,1
83.33,,1


## Content

In [297]:
content_counts_df = pd.concat([two_c_df['content_34'], two_c_df['content_64'], two_c_df['content_67']]).value_counts().to_frame()
content_counts_df = content_counts_df.reset_index()
content_counts_df.columns = ['Content', 'Count']
content_counts_df

Unnamed: 0,Content,Count
0,70,21
1,80,6


In [298]:
pivot_personas_content = two_c_df.pivot(index='u_persona', columns='s_persona', values='content_64')

pivot_personas_content.reset_index(inplace=True)

#print(pivot_personas_content.to_string(index=False))

pivot_personas_content

s_persona,u_persona,Affordable News Network,Balanced Brief,Tabloid Talker & Premium Press
0,Balanced Brian,70,70,80
1,Content Connie & Tabloid Terry,70,70,80
2,Privacy Priscilla,70,70,70


# Granularization of Consent and Content Resolutions
Objective: 
- Does the granularization of consent and content resolutions lead to better deals compared to binary resolutions?
- Does the granularization of consent and content resolutions lead to different outcomes regarding consent or content?

Input Parameters
- `score_CC`
- `consent_CC`
- `content_CC`

In [299]:
two_c_df[['score_34', 'score_64', 'score_67', 'consent_34','consent_64','consent_67','content_34','content_64','content_67',]]


Unnamed: 0,score_34,score_64,score_67,consent_34,consent_64,consent_67,content_34,content_64,content_67
0,3480,3480,3480,33.33,33.33,33.33,70,70,70
1,7840,7840,7840,REJECTED,REJECTED,REJECTED,70,70,70
2,4950,4950,4950,33.33,16.67,16.67,70,70,70
3,7520,7520,7520,ALL,ALL,ALL,80,80,80
4,7392,7396,7396,33.33,50.0,50.0,70,70,70
5,7200,7200,7200,ALL,ALL,ALL,70,70,70
6,4730,4730,4730,66.67,83.33,83.33,80,80,80
7,7600,7600,7600,REJECTED,REJECTED,REJECTED,70,70,70
8,5250,5250,5250,33.33,16.67,16.67,70,70,70


Observations
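- Within each row, the score is the same for all three contracts (exception: row 4, with 7392 for `score_34` and 7396 for `score_64` and `score_67`)
- Within each row, the agreed content is the same for all three contracts
- Rows with fully rejected or fully accepted consent keep that outcome for all three contracts
- A partial consent tradeoff can shift when more consent options are available (e.g. row 4: 33.33 => 50.0), but it does not have to (e.g. row 0: 33.33 => 33.33)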

Interpretation
- Granularization (more resolutions) does not lead to materially better contract scores; only row 4 improves, from 7392 to 7396
- The number of consent resolutions can change the consent tradeoff
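
The claim about the scores can be checked programmatically. The following sketch rebuilds the three score columns from the table above and flags the rows in which the score depends on the number of resolutions; the variable names `scores`, `changed` and `improvement` are illustrative.

```python
import pandas as pd

# scores copied from rows 0 to 8 of the table above
scores = pd.DataFrame({
    'score_34': [3480, 7840, 4950, 7520, 7392, 7200, 4730, 7600, 5250],
    'score_64': [3480, 7840, 4950, 7520, 7396, 7200, 4730, 7600, 5250],
    'score_67': [3480, 7840, 4950, 7520, 7396, 7200, 4730, 7600, 5250],
})

# rows with more than one distinct score across the three contracts
changed = scores[scores.nunique(axis=1) > 1]

# improvement of the finest contract over the coarsest one, in percent
improvement = (changed['score_67'] / changed['score_34'] - 1) * 100

print(changed.index.tolist())
print(improvement.round(3).tolist())
```

Only row 4 is flagged, and its improvement is well below one percent, which supports reading the scores as practically unchanged.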

In [300]:
# Rows with a partial consent tradeoff (neither ALL nor REJECTED); copy to avoid modifying a slice
consent_diffs_df = two_c_df[~two_c_df['consent_34'].isin(['ALL', 'REJECTED'])][['user', 'site', 'consent_34', 'consent_64']].copy()
consent_diffs_df['diff'] = consent_diffs_df['consent_34'] - consent_diffs_df['consent_64']

# Descriptive statistics per column; astype(int) truncates the decimals (e.g. 33.33 -> 33)
consent_34_stats = consent_diffs_df['consent_34'].astype(int).describe()
consent_64_stats = consent_diffs_df['consent_64'].astype(int).describe()

# Place the two Series side by side as columns
combined_stats_df = pd.concat([consent_34_stats, consent_64_stats], axis=1).drop(['std','count'])

# Transpose so that each consent column becomes a row
combined_stats_df = combined_stats_df.T

combined_stats_df


Unnamed: 0,mean,min,25%,50%,75%,max
consent_34,39.6,33.0,33.0,33.0,33.0,66.0
consent_64,39.6,16.0,16.0,33.0,50.0,83.0


Observations
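- Median and mean remain the same for 3 and 6 consent options
- More consent options widen the range of agreed consent in both directions (minimum 16 instead of 33, maximum 83 instead of 66)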
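
The statistics above are computed after `astype(int)`, which truncates the percentages (33.33 becomes 33). As a cross-check, the sketch below repeats the comparison on the untruncated values of the five partial-consent rows, copied from the table above; the names `partial` and `stats` are illustrative.

```python
import pandas as pd

# partial-consent rows (0, 2, 4, 6, 8) copied from the table above
partial = pd.DataFrame({
    'consent_34': [33.33, 33.33, 33.33, 66.67, 33.33],
    'consent_64': [33.33, 16.67, 50.0, 83.33, 16.67],
})

# keep the decimals instead of truncating to integers
stats = partial.describe().loc[['mean', 'min', '50%', 'max']].T
print(stats.round(2))
```

Without truncation the mean is about 40 for both columns and the median stays at 33.33, so the observations hold for the untruncated values as well.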