# Equity of Attention Experiment

## Set up the working environment

Requirements for your working environment:
- Python >= 3.7
- Package requirements: pandas, numpy, scipy, matplotlib, scikit-learn, tensorflow

If running on Google Colab:
- GDrive storage requirements: ~1GB

IMPORTANT: Set the following variable to "locally" if running on your own hardware, or to "collab" if running on Google Colab. The value "collab" is spelled with a double "l" because the cells in this notebook check for that exact string.

In [1]:
run_location = "locally"
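
As an alternative to setting the variable by hand, the location could be detected automatically. The following is a minimal sketch, not part of the original notebook: it assumes that the `google.colab` package is importable only inside a Colab runtime, and `detect_run_location` is a hypothetical helper name. It returns the same two strings that the cells in this notebook check for.

```python
def detect_run_location():
    """Return "collab" when the google.colab package can be imported, else "locally"."""
    try:
        import google.colab  # noqa: F401  (assumption: only available inside Colab)
    except ImportError:
        return "locally"
    return "collab"


run_location = detect_run_location()
print(run_location)
```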

### Install required packages
If running on Google Colab, the only package that needs to be downloaded is tensorflow-gpu; the rest are already installed.

In [2]:
if run_location == "locally":
    !pip install -r ../requirements.txt



### Setting up GDrive (only on Colab)

In [3]:
if run_location == "collab":
    from google.colab import drive
    drive.mount('/content/gdrive')

In [4]:
if run_location == "collab":
    %cd /content/gdrive/My Drive/

In [5]:
if run_location == "collab":
    !git clone https://github.com/crojascampos/equity-of-attention.git

In [6]:
if run_location == "collab":
    %cd equity-of-attention

In [7]:
if run_location == "collab":
    # Not needed in Google Colab (already installed), but just in case
    ! pip install matplotlib
    ! pip install numpy
    ! pip install pandas
    #! pip install scikit-learn
    #! pip install scipy
    ! pip install mip
    # Needed
    ! pip install tensorflow-gpu

In [8]:
if run_location == "collab":
    %cd ./notebooks

### Import packages

In [9]:
import sys
import os
import math

# Make the parent directory importable so that the `models` package can be found
sys.path.append('..')

In [10]:
import pandas as pd
import numpy as np

In [11]:
from models.ilpbased_eoa import ILPBasedEOA

### Create folders for saving pre-computed results

We define the subfolders in **./data** where the pre-computed results are stored. For each dataset:

- *data/outputs/splits* will include two csv files with the train and test interactions, according to the selected train-test split rule.
- *data/outputs/instances* will include a csv file with the instances to be fed to the model: pairs for point-wise recommenders or triplets for pair-wise recommenders.
- *data/outputs/models* will include an h5 file containing a pre-trained recommender model.
- *data/outputs/predictions* will include a numpy file representing a user-item matrix, where each cell stores the relevance score of an item for a given user.
- *data/outputs/metrics* will include a pickled dictionary with the evaluation metrics computed for a given recommender model.

**N.B.** This strategy lets us work with the intermediate outputs of the pipeline without starting from scratch every time (e.g., to apply a bias treatment as a post-processing step, we only need to load the predictions of a model).
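
To illustrate the reuse described in the note, here is a minimal sketch of saving a user-item score matrix once and reloading it later for a post-processing step. The matrix values, the file name `toy_scores.npy`, and the helper `top_k_items` are made up for illustration; the notebook's real prediction files and their names are not shown here. A temporary directory stands in for *data/outputs/predictions*.

```python
import os
import tempfile

import numpy as np


def top_k_items(scores, k):
    """Return, for each user (row), the indices of the k highest-scored items, best first."""
    # argsort on the negated scores sorts each row in descending order of relevance
    return np.argsort(-scores, axis=1)[:, :k]


# Toy user-item matrix: 2 users x 4 items (values are illustrative only)
scores = np.array([[0.1, 0.9, 0.4, 0.7],
                   [0.8, 0.2, 0.6, 0.3]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_scores.npy")  # hypothetical file name
    np.save(path, scores)      # done once, after the model has produced its predictions
    reloaded = np.load(path)   # done later, without re-running the model

top2 = top_k_items(reloaded, k=2)
print(top2)
```

A post-processing step such as a re-ranking would start from `reloaded` in the same way, leaving the earlier pipeline stages untouched.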

In [12]:
data_path = '../data'
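
The subfolders listed above can be created up front so that later steps can write to them. This is a minimal sketch, not the notebook's own code: `ensure_output_dirs` is a hypothetical helper, and the folder names are taken from the list above.

```python
import os

# Subfolder names as described in the list of pre-computed results
OUTPUT_SUBFOLDERS = ("splits", "instances", "models", "predictions", "metrics")


def ensure_output_dirs(base_path):
    """Create <base_path>/outputs/<name> for every subfolder and return the paths by name."""
    paths = {}
    for name in OUTPUT_SUBFOLDERS:
        folder = os.path.join(base_path, "outputs", name)
        os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
        paths[name] = folder
    return paths
```

Calling `ensure_output_dirs(data_path)` would then create the five folders under `../data/outputs`. Because of `exist_ok=True`, running the cell again is harmless.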

## Step 2: Load and understand the Airbnb dataset

In [13]:
airbnb_city = "Boston"
airbnb_dataset = 'airbnb_' + airbnb_city + '_listings'

In [14]:
data = pd.read_csv(os.path.join(data_path, 'datasets', airbnb_dataset + '.csv'), encoding='utf8')
data.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'description',
       'neighborhood_overview', 'picture_url', 'host_id', 'host_url',
       'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
       'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
       'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'calendar_upd

In [15]:
test_data = data['review_scores_rating'].fillna(0).to_numpy()
test_data

array([4.95, 4.77, 4.79, ..., 0.  , 0.  , 0.  ])
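
Note that `fillna(0)` makes listings without any rating indistinguishable from listings rated 0. The sketch below, on a made-up series rather than the Airbnb data, shows how the number of missing ratings could be checked before filling. The values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for data['review_scores_rating'] (values are illustrative only)
ratings = pd.Series([4.95, 4.77, np.nan, 4.79, np.nan])

n_missing = int(ratings.isna().sum())    # number of listings without a rating
filled = ratings.fillna(0).to_numpy()    # same transformation as in the cell above

print(n_missing)
print(filled)
```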

In [16]:
test = ILPBasedEOA(test_data, 0, 5)

In [17]:
test.prepare(0.5, 5, 100, 1e-7)

In [18]:
test.start(1)

Prefiltering...
[ 778  937 2561  938 2562]
[ 778  937 2561  938 2562    4    0    2    3    1]
[7.780e+02 9.370e+02 2.561e+03 9.380e+02 2.562e+03 4.000e+00 0.000e+00
 2.000e+00 3.000e+00 1.000e+00]
Optimizing... iteration 0, ranking 0
Welcome to the CBC MILP Solver 
Version: devel 
Build Date: Nov 15 2020 

Starting solution of the Linear programming relaxation problem using Primal Simplex

Coin0506I Presolve 200 (0) rows, 10000 (0) columns and 20000 (0) elements
Clp1000I sum of infeasibilities 5.27584e-09 - average 2.63792e-11, 4 fixed columns
Coin0506I Presolve 200 (0) rows, 9996 (-4) columns and 19992 (-8) elements
Clp0000I Optimal - objective value 0
Clp0000I Optimal - objective value 0
Coin0511I After Postsolve, objective 0, infeasibilities - dual 0 (0), primal 0 (0)
Clp0000I Optimal - objective value 0
Clp0000I Optimal - objective value 0
Clp0000I Optimal - objective value 0
Clp0032I Optimal objective 0 - 0 iterations time 0.112, Idiot 0.11

Starting MIP optimization
Cgl0004I pro

In [19]:
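# Print the solved binary variables as a 0/1 grid, with test.pref_n values per row.
# Values >= 0.99 are treated as 1 to absorb solver rounding.
grid = ""
for i, val in enumerate(test.mip_model.vars):
    grid += "1 " if val.x >= 0.99 else "0 "
    if (i + 1) % test.pref_n == 0:
        grid += "\n"
print(grid)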

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
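
Since each of the rows shown contains a single 1, such a grid can be summarized by the column index of that 1. The following sketch uses a small made-up 0/1 matrix rather than the solver output, and `selected_columns` is a hypothetical helper. What rows and columns stand for in `ILPBasedEOA` is not stated in this notebook, so the interpretation is left open.

```python
import numpy as np


def selected_columns(grid):
    """Return the column index of the single 1 in each row of a 0/1 matrix."""
    grid = np.asarray(grid)
    # Guard against rows that do not contain exactly one 1
    if not np.all(grid.sum(axis=1) == 1):
        raise ValueError("every row must contain exactly one 1")
    return np.argmax(grid, axis=1)


# Toy 4x4 assignment matrix (illustrative only)
toy_grid = [[0, 0, 1, 0],
            [1, 0, 0, 0],
            [0, 0, 0, 1],
            [0, 1, 0, 0]]

print(selected_columns(toy_grid))
```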