This notebook is based on ```./experimentsDictGenerator.ipynb```, but it focuses only on the transfer experiments with noisy source data on the NELL Sports domain. In our latest experiments, we noticed that NELL Sports is sensitive both to the noise added to the source domain and to the hyperparameters of our method. However, our methodology appears to have a flaw: each cross-validation iteration performs only one random split between source and target domains and generates only one noise sample. The results may therefore depend on that particular split and on the noise generated. To reduce this problem, we run new experiments with different `randomSeed` values in order to generate new splits and new noise at each iteration. This significantly increases the number of experiments to run, which is why we initially focus only on NELL Sports. If necessary, we will run the same analyses for the other experiments. To avoid changing what already works in the ```./experimentsDictGenerator.ipynb``` notebook, we perform these preliminary analyses in this new notebook.

In [1]:
import os
import json
import itertools
import numpy as np

import sys
sys.path.append("..")
from utils.experiment import loadDatabase
from utils.utils import getHashFromDict

In [2]:
DATA_PATH = "../data/preprocessed"

# **Transfer with Noisy Source**

In this experiment, we perform transfer learning from a noisy source to a target domain. To control the noise intensity, we build both the target and source sets from the same dataset. This allows us to bypass the challenge of finding a good mapping. Before combining the source and target data, we randomly add, remove, or change the types of the relations in the source.
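
To make the noise step concrete, here is a minimal sketch of one way such a perturbation could work. It is not the implementation used by our pipeline: the function name `perturbFacts`, the representation of facts as `(relation, arguments)` tuples, and the rule of perturbing each fact independently with probability `noiseStrength` are all assumptions made for illustration.

```python
import random

def perturbFacts(facts, noiseStrength, randomSeed):
    """Hypothetical sketch: perturb each fact with probability noiseStrength."""
    rng = random.Random(randomSeed)  # local generator, so the result depends only on the seed
    relations = sorted({relation for relation, _ in facts})
    constants = sorted({constant for _, args in facts for constant in args})

    noisyFacts = []
    for relation, args in facts:
        if rng.random() >= noiseStrength:
            noisyFacts.append((relation, args))  # fact is kept unchanged
            continue
        operation = rng.choice(["add", "remove", "change"])
        if operation == "add":
            # Keep the fact and add a new one with randomly drawn arguments
            noisyFacts.append((relation, args))
            noisyFacts.append((relation, tuple(rng.choice(constants) for _ in args)))
        elif operation == "change":
            # Replace the relation with one drawn at random (possibly the same one)
            noisyFacts.append((rng.choice(relations), args))
        # "remove": the fact is simply dropped
    return noisyFacts

# Illustrative facts only; these are not taken from the NELL Sports data
exampleFacts = [
    ("teamplayssport", ("team_a", "sport_x")),
    ("athleteplaysforteam", ("athlete_1", "team_a")),
    ("teamplaysinleague", ("team_a", "league_k")),
]
noisyFacts = perturbFacts(exampleFacts, noiseStrength=0.5, randomSeed=0)
```

Under these assumptions, a single seed yields a single noise sample, which is why the results of one run can depend on the particular noise that was drawn.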

In [3]:
randomSeeds = list(range(15))

In [4]:
def getExperimentID(experimentDict):
    experimentID = getHashFromDict(experimentDict)
    return experimentID
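
The implementation of `getHashFromDict` lives in `utils.utils` and is not shown in this notebook. As a rough sketch of how a deterministic ID could be derived from a settings dictionary, the snippet below serializes the dictionary with sorted keys and hashes the result. The name `hashFromDictSketch` and the choice of SHA-256 are assumptions, not the project's actual code.

```python
import hashlib
import json

def hashFromDictSketch(experimentDict):
    """Hypothetical stand-in for getHashFromDict: hash a canonical JSON dump."""
    # sort_keys=True makes the serialization independent of key insertion order
    canonical = json.dumps(experimentDict, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same settings give the same ID regardless of key order
idA = hashFromDictSketch({"randomSeed": 0, "noiseStrength": 1e-5})
idB = hashFromDictSketch({"noiseStrength": 1e-5, "randomSeed": 0})

# Changing any value, such as the seed, gives a different ID
idC = hashFromDictSketch({"randomSeed": 1, "noiseStrength": 1e-5})
```

With an ID of this kind, two experiments that differ only in `randomSeed` get distinct IDs, so their results can be stored side by side.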

In [5]:
commonFixedParams = {
    "numberOfClauses": 8,
    "numberOfCycles": 100,
    "maxTreeDepth": 3,
    "nEstimators": 10,
    "nodeSize": 2,
    "negPosRatio": 2,
    "maxFailedNegSamplingRetries": 50,
    "ignoreSTDOUT": True,
    "trainNSplits": 5,
    "trainSourceSplits": 4
}

In [6]:
# Only NELL Sports is handled in these preliminary experiments. We want to check whether randomness affects our latest results.

datasetParams = [
    # # NELL Finances
    # {
    #     "databasePath": f"{DATA_PATH}/nell_finances",
    #     "targetPredicate": None, # Default: companyeconomicsector/2
    #     "resetTargetPredicate": False,      
    #     "useRecursion": True
    # }, 

    # # Yeast
    # {
    #     "databasePath": f"{DATA_PATH}/yeast",
    #     "targetPredicate": None, # Default: proteinclass/2
    #     "resetTargetPredicate": False,
    #     "useRecursion": True
    # },

    # NELL Sports
    {
        "databasePath": f"{DATA_PATH}/nell_sports",
        "targetPredicate": None, # Default: teamplayssport/2
        "resetTargetPredicate": False,
        "useRecursion": True
    },

    # # Cora
    # {
    #     "databasePath": f"{DATA_PATH}/cora",
    #     "targetPredicate": None, # Default: samevenue/2
    #     "resetTargetPredicate": False,
    #     "useRecursion": False
    # },

    # # UWCSE
    # {
    #     "databasePath": f"{DATA_PATH}/uwcse",
    #     "targetPredicate": None, # Default: advisedby/2
    #     "resetTargetPredicate": False,
    #     "useRecursion": False
    # },

    # # Twitter
    # {
    #     "databasePath": f"{DATA_PATH}/twitter",
    #     "targetPredicate": None, # Default: accounttype/2
    #     "resetTargetPredicate": False,       
    #     "useRecursion": True
    # },

    # # IMDB
    # {
    #     "databasePath": f"{DATA_PATH}/imdb",
    #     "targetPredicate": None, # Default: workedunder/2
    #     "resetTargetPredicate": False,
    #     "useRecursion": False
    # },
]

In [7]:
def getNextModelParams():
    utilityAlphaValues = [0, 0.3, 0.6, 1, 1.3]
    utilityAlphaList = [
        {
            "sourceUtilityAlpha": sourceAlpha,
            "targetUtilityAlpha": targetAlpha
        } for sourceAlpha, targetAlpha in itertools.product(utilityAlphaValues, utilityAlphaValues)
    ]
    
    utilityAlphaSetIterList = [{"utilityAlphaSetIter": iteration} for iteration in [1]]

    weightList = [
        {
            "weight": {
                "strategy": "balancedInstanceGroupUniform",
                "parameters": {
                    "balanceStrength": 0 # Equivalent to the scalar weighting scheme
                }
            }
        },
        {
            "weight": {
                "strategy": "balancedInstanceGroupUniform",
                "parameters": {
                    "balanceStrength": 1
                }
            }
        },
        {
            "weight": {
                "strategy": "balancedInstanceGroupUniform",
                "parameters": {
                    "balanceStrength": 0.5
                }
            }
        }
    ]

    noiseStrengthValues = [(1e-5)*(2**i) for i in range(0, 15)]
    noiseStrengthList = [{"noiseStrength": strength} for strength in noiseStrengthValues]

    # Cartesian product of all option lists; each combination is merged into one flat dict
    paramsGrid = [
        {
            **utilityParams,
            **utilityAlphaSetIterParams,
            **weightParams, 
            **noiseStrengthParams
        } for utilityParams, utilityAlphaSetIterParams, weightParams, noiseStrengthParams in itertools.product(
            utilityAlphaList, 
            utilityAlphaSetIterList,
            weightList,
            noiseStrengthList
        )
    ]

    for params in paramsGrid:
        yield params
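
As a quick check on the size of this grid, the arithmetic below recomputes how many configurations `getNextModelParams` yields and how many experiments follow once every random seed is included. The values are copied from the cells above; the variable names are introduced here for illustration only.

```python
utilityAlphaValues = [0, 0.3, 0.6, 1, 1.3]
noiseStrengthValues = [(1e-5) * (2 ** i) for i in range(0, 15)]

nAlphaPairs = len(utilityAlphaValues) ** 2    # source/target alpha pairs: 5 * 5 = 25
nAlphaSetIters = 1                            # only utilityAlphaSetIter = 1
nWeightSettings = 3                           # balanceStrength in {0, 1, 0.5}
nNoiseStrengths = len(noiseStrengthValues)    # 15 values, doubling from 1e-5 up to about 0.164

nConfigsPerSeed = nAlphaPairs * nAlphaSetIters * nWeightSettings * nNoiseStrengths
nSeeds = len(list(range(15)))                 # same as randomSeeds above
totalExperiments = nConfigsPerSeed * nSeeds

print(nConfigsPerSeed)    # 1125
print(totalExperiments)   # 16875
```

With a single dataset enabled, each seed therefore contributes 1125 experiments, which is the cost that motivates restricting this notebook to NELL Sports.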

In [8]:
experimentSettingJSONsBasePath = "./experimentSettingJSONs/noisyTransferLearning-onlyNELLSports"
os.makedirs(experimentSettingJSONsBasePath, exist_ok = True)

for randomSeed in randomSeeds:
    experiments = []
    EXPERIMENTS_BASE_PATH = f"./experiments/noisyTransferLearning-onlyNELLSports/noisyTransferLearning-randomSeed={randomSeed}"
    for params in datasetParams:    
        for paramsGrid in getNextModelParams():
            experimentDict = {
                **commonFixedParams, 
                **params,
                **paramsGrid
            }

            experimentDict.update({
                "path": EXPERIMENTS_BASE_PATH,
                "randomSeed": randomSeed
            })
            experimentID = getExperimentID(experimentDict)
            experimentDict["id"] = experimentID
            
            experiments.append(experimentDict)

    with open(f"{experimentSettingJSONsBasePath}/experiments-noisyTransferLearning-randomSeed={randomSeed}.json", "w") as f:
        json.dump(experiments, f)
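
After the files are written, it can be useful to confirm that each one contains the expected number of experiments and that no two experiments share an ID. The helper below is a sketch for that purpose; `summarizeExperimentFile` is a hypothetical name and is not part of `utils`.

```python
import json

def summarizeExperimentFile(jsonPath):
    """Hypothetical helper: load one settings file and report basic counts."""
    with open(jsonPath) as f:
        experiments = json.load(f)

    ids = [experiment["id"] for experiment in experiments]
    return {
        "nExperiments": len(experiments),
        "nUniqueIDs": len(set(ids)),  # should equal nExperiments if no IDs collide
        "randomSeeds": sorted({experiment["randomSeed"] for experiment in experiments}),
    }
```

For example, calling it on `f"{experimentSettingJSONsBasePath}/experiments-noisyTransferLearning-randomSeed=0.json"` should report equal values for `nExperiments` and `nUniqueIDs`, and a single entry in `randomSeeds`.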