# Equity of Attention Experiment

## Step 1: Set up the working environment

Requirements for your working environment:
- Python >= 3.7
- Package requirements: pandas, numpy, scipy, matplotlib, scikit-learn, tensorflow

If running on Google Colab:
- GDrive storage requirements: ~1 GB
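Before running the rest of the notebook, it can help to confirm that these requirements are met. The following is a minimal sketch, not part of the original project: `check_environment` and `REQUIRED_MODULES` are hypothetical names, and it only checks that each package is importable (note that scikit-learn is imported as `sklearn`), not which version is installed.

```python
import importlib.util
import sys

# Import names of the packages listed above (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ("pandas", "numpy", "scipy", "matplotlib", "sklearn", "tensorflow")


def check_environment(min_python=(3, 7), modules=REQUIRED_MODULES):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python >= {min_python[0]}.{min_python[1]} required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems


for problem in check_environment():
    print(problem)
```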

IMPORTANT: Set the following variable to "locally" if running on your own hardware, or to "collab" if running on Google Colab.

In [31]:
run_location = "locally"
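As an alternative to editing the variable by hand, the runtime can be inferred automatically. This sketch relies on the common heuristic that the `google.colab` package is only importable inside Colab; `detect_run_location` is a hypothetical helper, and it keeps the notebook's own values `"locally"` and `"collab"`.

```python
def detect_run_location():
    """Return "collab" when running inside Google Colab, "locally" otherwise."""
    try:
        # Heuristic: google.colab is preinstalled on Colab and absent elsewhere.
        import google.colab  # noqa: F401
        return "collab"
    except ImportError:
        return "locally"


# Overrides the manually set value above.
run_location = detect_run_location()
```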

### Install required packages
If on Google Colab, the only package that needs to be installed is tensorflow-gpu; the rest are already available.

In [32]:
if run_location == "locally":
    !pip install -r ../requirements.txt


### Setting up GDrive (only on Colab)

In [33]:
if run_location == "collab":
    from google.colab import drive
    drive.mount('/content/gdrive')

In [35]:
if run_location == "collab":
    %cd /content/gdrive/My Drive/

In [36]:
if run_location == "collab":
    !git clone https://github.com/crojascampos/equity-of-attention.git

In [37]:
if run_location == "collab":
    %cd equity-of-attention

In [38]:
if run_location == "collab":
    # Not needed in Google Colab (already installed), but just in case
    ! pip install matplotlib
    ! pip install numpy
    ! pip install pandas
    ! pip install scikit-learn
    ! pip install scipy
    # Needed
    ! pip install tensorflow-gpu

In [39]:
if run_location == "collab":
    %cd ./notebooks

### Import packages

In [40]:
import sys
import os

sys.path.append(os.path.join('..'))
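Appending the relative path `'..'` only works while the working directory stays at `./notebooks`, and re-running the cell adds a duplicate entry each time. A slightly more robust variant, assuming the parent directory is the repository root, resolves it to an absolute path and appends it only once:

```python
import os
import sys

# Resolve the parent directory (the repository root when running from
# ./notebooks) to an absolute path, so imports keep working after a later %cd.
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))

# Append only once, so re-running the cell does not keep growing sys.path.
if project_root not in sys.path:
    sys.path.append(project_root)
```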

In [41]:
import pandas as pd
import numpy as np  

In [42]:
import matplotlib.pyplot as plt
%matplotlib inline

In [43]:
# Load extra modules here

### Create folders for saving pre-computed results

We define the following subfolders of **./data** to store our pre-computed results. For each dataset:

- *data/outputs/splits* will include two CSV files containing the train and test interactions, according to the selected train-test split rule.
- *data/outputs/instances* will include a CSV file with the instances to be fed to the model: pairs for point-wise recommenders or triplets for pair-wise recommenders.
- *data/outputs/models* will include an HDF5 (.h5) file with a pre-trained recommender model.
- *data/outputs/predictions* will include a NumPy file holding a user-item matrix, where each cell stores the relevance score of an item for a given user.
- *data/outputs/metrics* will include a pickled dictionary with the evaluation metrics computed for a given recommender model.
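The two instance formats can be illustrated with a small sketch. It is not the project's implementation: it assumes integer user and item ids, a training split with hypothetical `user_id` and `item_id` columns, and uniform negative sampling among the items a user has not interacted with; the notebook's actual sampling strategy may differ.

```python
import numpy as np
import pandas as pd


def sample_negative(rng, n_items, seen):
    """Draw an item id in [0, n_items) that is not in `seen`.

    Assumes the user has at least one unseen item; otherwise this loops forever.
    """
    while True:
        item = int(rng.integers(n_items))
        if item not in seen:
            return item


def build_instances(train, n_items, pairwise=False, n_neg=1, seed=42):
    """Build point-wise pairs (user, item, label) or pair-wise triplets (user, pos, neg)."""
    rng = np.random.default_rng(seed)
    seen = train.groupby("user_id")["item_id"].apply(set).to_dict()
    rows = []
    for user, item in zip(train["user_id"], train["item_id"]):
        if not pairwise:
            rows.append((user, item, 1))  # observed interaction
        for _ in range(n_neg):
            neg = sample_negative(rng, n_items, seen[user])
            # Triplet for pair-wise models, negative pair for point-wise ones.
            rows.append((user, item, neg) if pairwise else (user, neg, 0))
    columns = (["user_id", "pos_item_id", "neg_item_id"] if pairwise
               else ["user_id", "item_id", "label"])
    return pd.DataFrame(rows, columns=columns)


train = pd.DataFrame({"user_id": [0, 0, 1], "item_id": [1, 2, 0]})
pairs = build_instances(train, n_items=4)                    # 3 positives + 3 negatives
triplets = build_instances(train, n_items=4, pairwise=True)  # 3 triplets
```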

**N.B.** Keeping each intermediate output of the pipeline on disk lets us experiment with it without starting from scratch every time (e.g., to apply a bias treatment as a post-processing step, we only need to load the predictions of a model).
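A minimal sketch of how the predictions and metrics outputs might be written and read back, using `np.save`/`np.load` for the user-item score matrix and `pickle` for the metrics dictionary. The helper names, file names and the temporary folder standing in for *data/outputs* are all hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np


def save_predictions(path, scores):
    """Store a (n_users, n_items) relevance-score matrix as a .npy file."""
    np.save(path, scores)


def load_predictions(path):
    return np.load(path)


def save_metrics(path, metrics):
    """Pickle a dictionary of evaluation metrics."""
    with open(path, "wb") as f:
        pickle.dump(metrics, f)


def load_metrics(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# A temporary folder stands in for data/outputs.
with tempfile.TemporaryDirectory() as outputs:
    for sub in ("predictions", "metrics"):
        os.makedirs(os.path.join(outputs, sub), exist_ok=True)

    scores = np.array([[0.9, 0.1, 0.4],
                       [0.2, 0.8, 0.5]])  # 2 users x 3 items
    pred_file = os.path.join(outputs, "predictions", "example_model.npy")
    metrics_file = os.path.join(outputs, "metrics", "example_model.pkl")

    save_predictions(pred_file, scores)
    save_metrics(metrics_file, {"precision@2": 0.5})

    # A post-processing step can start from the stored scores alone.
    loaded_scores = load_predictions(pred_file)
    loaded_metrics = load_metrics(metrics_file)
    top2 = np.argsort(-loaded_scores, axis=1)[:, :2]  # item indices, best first
    # top2 → [[0, 2], [1, 2]]
```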

In [44]:
data_path = '../data'

In [45]:
# Create the output subfolders in a platform-independent way;
# exist_ok=True avoids errors when a folder already exists
for subfolder in ["splits", "instances", "models", "predictions", "metrics"]:
    os.makedirs(os.path.join(data_path, 'outputs', subfolder), exist_ok=True)


## Step 2: Load and understand the Airbnb dataset

In [46]:
dataset = 'airbnb_Boston_listings'

In [47]:
data = pd.read_csv(os.path.join(data_path, 'datasets', dataset + '.csv'), encoding='utf8')
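Among the columns listed below is `price`. In Inside Airbnb listing exports this column is usually stored as text such as `"$1,250.00"`; assuming that format here, a small helper can convert it to numbers before any analysis. `parse_price` is a hypothetical name; in the notebook it would be applied as `data['price'] = parse_price(data['price'])`.

```python
import pandas as pd


def parse_price(prices):
    """Convert strings such as "$1,250.00" to floats; unparsable values become NaN."""
    # Strip the currency symbol and thousands separators, then coerce to numbers.
    cleaned = prices.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


example = pd.Series(["$150.00", "$1,250.00", None])
print(parse_price(example).tolist())
```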

In [48]:
data.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'description',
       'neighborhood_overview', 'picture_url', 'host_id', 'host_url',
       'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
       'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
       'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'calendar_upd