# **Summary**

This script provides an automated pipeline for predicting conductance in twisted bilayer graphene systems using machine learning. The process integrates data normalization, clustering with a Self-Organizing Map (SOM), and regression with Gradient Boosting Regressors (GBR).

The workflow ensures that any new input follows the same transformation pipeline as the training data, maintaining consistency in predictions. The key steps include downloading pre-trained models and scalers, clustering new input data based on learned patterns, selecting the appropriate GBR model for the identified cluster, and producing a final conductance prediction.

By structuring the workflow in an automated and modular way, this approach significantly reduces computational costs compared to traditional simulations while preserving accuracy in conductance predictions.

# User Guide: Running the Conductance Prediction Pipeline

### 📌 User Guide: Running the Conductance Prediction Pipeline

This user guide provides a step-by-step explanation of how to execute the conductance prediction pipeline in this Colab notebook. The notebook is structured into multiple sections, ensuring a seamless process from dependency installation to final predictions.

## 1.  Installing Required Dependencies

*   Before running the main code, all necessary Python libraries must be installed. This is done in Section 1, where essential packages such as MiniSom, numpy, scikit-learn, and pandas are installed.
*   Ensure that all required packages are installed before running the pipeline.


## 2. Downloading and Loading Pre-Trained Models


*   In Section 2, the script downloads pre-trained models and scalers required for predictions. It verifies that all files are correctly retrieved before proceeding.
*   The script checks whether each file is present and successfully loaded.
If any file is missing or fails to load, an error message is displayed for debugging.

## 3. Loading and Initializing the Models

*   This step ensures that all necessary components are in memory before making predictions.

## 4. Making Predictions with a New Input

*   In section 4, users can input a new angle configuration to predict conductance. To perform a prediction, type the angle values in the variable.

* The energy (𝐸) will be in the range of 0.285 to 0.306. Values ​​outside this range can lead to unreliable predictions as the model has not been trained beyond these limits.

*   Angle (𝜃) (twist angle between graphene layers), must be within the range 1.17° to 4°. Any value outside this range produce inaccurate results.  Values ​​outside this range can lead to unreliable predictions as the model has not been trained beyond these limits.

*   Arrival and Exit Pairs were indexed according to the following mapping:


*   The "u" (up) and "d" (down) indicate the layer of the bilayer graphene where the contact is located.
![TBG Device](https://drive.google.com/uc?export=view&id=1mg4eskPu94C0zm0Dwt5EioFTZxFmLDpX)


# **Running the code**


*   Press the "Play" button in every code cell
*   In section 3, user have to type the choosen angle to predict after running the cell. ( It shows up bellow the code cell)

# **Section 1: Environment Setup and Package Installation**

*  If an error message shows up during the installation process, you can safely ignore it. The warning is caused by version conflicts with other packages that are not required for this program.

*   Ignore the red "run" button after running the code cell.

*  Wait for the message "✅ Environment setup complete. Ready to proceed!" to run other cell code.

In [None]:
# 1️⃣ Desinstalar TODOS os pacotes
!pip freeze | xargs pip uninstall -y 2>/dev/null

# 2️⃣ Instalar apenas os pacotes necessários
!pip install numpy==2.0.0 joblib==1.4.2 scikit-learn==1.6.0 minisom==2.3.3 gdown pandas --no-warn-script-location > /dev/null 2>&1

# 3️⃣ Exibir mensagem de conclusão
print("✅ Environment setup complete. Ready to proceed!")

import time
time.sleep(2)

# 3️⃣ Reiniciar o ambiente para garantir que tudo funcione corretamente
import os
os._exit(00)


✅ Environment setup complete. Ready to proceed!


# Section 2: Downloading and Verifying the Required Files


### Downloading and Verifying the Required Files
*  Wait for the print message "✅ Environment setup complete. Ready to proceed!" to run the cell code bellow

*   Retrieves pre-trained models, scalers, and the trained SOM from Google Drive.
*   Ensures all required files are downloaded and successfully loaded.

In [1]:
import gdown
import os
import joblib
import numpy as np
import pandas as pd
import joblib
from minisom import MiniSom


# List of required files from Google Drive (direct download links)
files = {
    "minisom_clusterizer.pkl": "https://drive.google.com/uc?id=1Go6sv941JqSWCQG5ph50shK_kiGRFdHj",
    "scaler_X_standard.pkl": "https://drive.google.com/uc?id=1J1oTumvZBXhSdpw6HtCU7OFLqedOyQrI",
    "scaler_X_minmax.pkl": "https://drive.google.com/uc?id=1Pbx5Ra858LvD7Crj_snJfJTMPfhsK37Y",
    "scaler_y_standard.pkl": "https://drive.google.com/uc?id=1JxghZVBzDyB0pIED4SW0MhpdgtTq5kBI",
    "scaler_y_minmax.pkl": "https://drive.google.com/uc?id=1Iw_VO4a1nqPaEiBCV-UsuwtM10VhWisc",

    # GBR Models for each cluster
    "gbr_model_cluster_0.pkl": "https://drive.google.com/uc?id=1Jf2y8mph0GKJJe6OxvDUlbyFJvVQvEKF",
    "gbr_model_cluster_1.pkl": "https://drive.google.com/uc?id=1nu4DcI73dSbGH32-6_1XZv1Kki8R7ooP",
    "gbr_model_cluster_2.pkl": "https://drive.google.com/uc?id=1c3owJkSuKrVPl1TeyA0G4E5rWbxGVkMA",
    "gbr_model_cluster_3.pkl": "https://drive.google.com/uc?id=1cRE_A06sjSyU7Ftluz7Wzmx_w6QrKNNN",
    "gbr_model_cluster_4.pkl": "https://drive.google.com/uc?id=1qFHJQI85s32WA-vS3lPwtKytJlsHr1eS",
    "gbr_model_cluster_5.pkl": "https://drive.google.com/uc?id=1Zp70J-eATA0c50pBkVKOfjqC_xolOkAU",
    "gbr_model_cluster_6.pkl": "https://drive.google.com/uc?id=19V2RD_dQhCWlpICm647HRoREZ3b9Ni9C",
    "gbr_model_cluster_7.pkl": "https://drive.google.com/uc?id=1H-1DDKV5cdDAQc69gtDh1McSnsNY9Siq",
    "gbr_model_cluster_8.pkl": "https://drive.google.com/uc?id=1AKXrzH3Q8kSxMp6l15zJfLaAxA_6zkkv",
}

# Download all required files from Google Drive
for filename, url in files.items():
    gdown.download(url, filename, quiet=False)

print("✅ All files have been successfully downloaded!")

# List all downloaded files in the current directory
downloaded_files = os.listdir()
print("📂 Downloaded files:")
print(downloaded_files)
print("\n")

# Function to verify file loading
def verify_file_loading(file_name, variable_name):
    if file_name in downloaded_files:
        try:
            obj = joblib.load(file_name)
            print(f"✅ {file_name} successfully loaded into variable {variable_name}")
        except Exception as e:
            print(f"⚠️ Error loading {file_name}: {e}")
    else:
        print(f"❌ File {file_name} NOT found!")

# Verify MiniSom
verify_file_loading("minisom_clusterizer.pkl", "som")

# Verify Normalizers
verify_file_loading("scaler_X_standard.pkl", "scaler_X_standard")
verify_file_loading("scaler_X_minmax.pkl", "scaler_X_minmax")
verify_file_loading("scaler_y_standard.pkl", "scaler_y_standard")
verify_file_loading("scaler_y_minmax.pkl", "scaler_y_minmax")

# Verify GBR Models
for i in range(9):
    verify_file_loading(f"gbr_model_cluster_{i}.pkl", f"gbr_model_cluster_{i}")

print("\n✅ Verification completed! If any errors occur, check the Google Drive file IDs.")


Downloading...
From: https://drive.google.com/uc?id=1Go6sv941JqSWCQG5ph50shK_kiGRFdHj
To: /content/minisom_clusterizer.pkl
100%|██████████| 7.05k/7.05k [00:00<00:00, 8.22MB/s]
Downloading...
From: https://drive.google.com/uc?id=1J1oTumvZBXhSdpw6HtCU7OFLqedOyQrI
To: /content/scaler_X_standard.pkl
100%|██████████| 1.06k/1.06k [00:00<00:00, 2.24MB/s]
Downloading...
From: https://drive.google.com/uc?id=1Pbx5Ra858LvD7Crj_snJfJTMPfhsK37Y
To: /content/scaler_X_minmax.pkl
100%|██████████| 823/823 [00:00<00:00, 2.33MB/s]
Downloading...
From: https://drive.google.com/uc?id=1JxghZVBzDyB0pIED4SW0MhpdgtTq5kBI
To: /content/scaler_y_standard.pkl
100%|██████████| 623/623 [00:00<00:00, 1.87MB/s]
Downloading...
From: https://drive.google.com/uc?id=1Iw_VO4a1nqPaEiBCV-UsuwtM10VhWisc
To: /content/scaler_y_minmax.pkl
100%|██████████| 719/719 [00:00<00:00, 1.51MB/s]
Downloading...
From: https://drive.google.com/uc?id=1Jf2y8mph0GKJJe6OxvDUlbyFJvVQvEKF
To: /content/gbr_model_cluster_0.pkl
100%|██████████| 44.4

✅ All files have been successfully downloaded!
📂 Downloaded files:
['.config', 'gbr_model_cluster_6.pkl', 'gbr_model_cluster_7.pkl', 'gbr_model_cluster_8.pkl', 'gbr_model_cluster_3.pkl', 'scaler_y_standard.pkl', 'scaler_y_minmax.pkl', 'gbr_model_cluster_5.pkl', 'scaler_X_minmax.pkl', 'scaler_X_standard.pkl', 'gbr_model_cluster_1.pkl', 'minisom_clusterizer.pkl', 'gbr_model_cluster_2.pkl', 'gbr_model_cluster_4.pkl', 'gbr_model_cluster_0.pkl', 'sample_data']


✅ minisom_clusterizer.pkl successfully loaded into variable som
✅ scaler_X_standard.pkl successfully loaded into variable scaler_X_standard
✅ scaler_X_minmax.pkl successfully loaded into variable scaler_X_minmax
✅ scaler_y_standard.pkl successfully loaded into variable scaler_y_standard
✅ scaler_y_minmax.pkl successfully loaded into variable scaler_y_minmax
✅ gbr_model_cluster_0.pkl successfully loaded into variable gbr_model_cluster_0
✅ gbr_model_cluster_1.pkl successfully loaded into variable gbr_model_cluster_1
✅ gbr_model_cluste

# Section 3: Loading Models and making Predictions

### Loading Models and Performing Predictions

*   Loads the trained MiniSOM, scalers, and GBR models for each cluster.


### Making a prediction

*   Normalize the input data (energy, angle, contact pairs) using the pre-trained scalers.

*   Identifies the corresponding cluster using the trained SOM and selects the appropriate GBR model for prediction.

*   Predict the conductance using the chosen GBR model.

*   Denormalize the predicted conductance to return a final real-world value.


### *** After running the code cell type the angle configuration bellow the code cell.***

In [1]:
import joblib
import pandas as pd
import numpy as np

# 📌 Carregar MiniSom treinado
som = joblib.load("minisom_clusterizer.pkl")

# 📌 Obter dimensões do SOM
som_x, som_y = som._weights.shape[:2]

# 📌 Carregar escaladores
scaler_X_standard = joblib.load("scaler_X_standard.pkl")
scaler_X_minmax = joblib.load("scaler_X_minmax.pkl")
scaler_y_standard = joblib.load("scaler_y_standard.pkl")
scaler_y_minmax = joblib.load("scaler_y_minmax.pkl")

# 📌 Carregar modelos GBR para cada cluster
models = {i: joblib.load(f"gbr_model_cluster_{i}.pkl") for i in range(9)}

print("✅ Modelos e escaladores carregados!")

# 📌 Função para gerar todos os pares de contato válidos
def generate_contact_pairs():
    return [(i, j) for i in range(1, 9) for j in range(1, 9) if i != j]

# 📌 Função para prever condutância para um ângulo específico
def predict_conductance_for_angle(angle):
    print("🔄 Gerando pares de contato...")

    # 📌 Gerar todas as combinações de ArrivalPair e ExitPair
    contact_pairs = generate_contact_pairs()

    # 📌 Criar DataFrame com todas as combinações possíveis
    data = []
    for arrival, exit_ in contact_pairs:
        # 📌 Criar 206 valores de energia no intervalo [0.285, 0.306]
        energy_values = np.linspace(0.285, 0.306, 206)
        for energy in energy_values:
            data.append([energy, angle, arrival, exit_])

    df = pd.DataFrame(data, columns=["Energy", "Angle", "ArrivalPair", "ExitPair"])

    print(f"\n✅ DataFrame criado com {len(df)} amostras!")  # Deve conter 9476 amostras

    # 📌 Exibir 5 amostras antes da normalização
    print("\n📌 5 Random Samples Before Normalization:")
    print(df.sample(n=5, random_state=42))

    # 📌 Normalizar os dados
    df_standardized = scaler_X_standard.transform(df)
    df_normalized = scaler_X_minmax.transform(df_standardized)

    df_normalized_df = pd.DataFrame(df_normalized, columns=["Energy", "Angle", "ArrivalPair", "ExitPair"])
    print("\n📌 5 Random Samples After Normalization:")
    print(df_normalized_df.sample(n=5, random_state=42))

    # 📌 Clusterizar os dados com o MiniSom treinado
    clusters = [som.winner(sample) for sample in df_normalized]
    cluster_labels = [x[0] * som_y + x[1] for x in clusters]
    df["Cluster"] = cluster_labels

    print("\n📌 5 Random Samples After Clustering:")
    print(df.sample(n=5, random_state=42))

    # 📌 Fazer previsões usando o modelo correspondente a cada cluster
    predicted_conductance = []
    for i, sample in enumerate(df_normalized):
        cluster = cluster_labels[i]
        model = models.get(cluster)

        if model is None:
            predicted_conductance.append(None)
        else:
            # 📌 Fazer a previsão com o modelo GBR correspondente ao cluster
            pred_norm = model.predict(sample.reshape(1, -1))

            # 📌 Desnormalizar a previsão
            pred_std = scaler_y_minmax.inverse_transform(pred_norm.reshape(-1, 1))
            pred_real = scaler_y_standard.inverse_transform(pred_std)

            predicted_conductance.append(pred_real[0][0])

            if i < 5:  # Exibir apenas 5 previsões para não poluir o console
                print(f"\n📌 Sample {i+1}: Cluster {cluster}, GBR Model {cluster} utilizado.")
                print(f"   Entrada para GBR: {sample}")
                print(f"   Predição normalizada: {pred_norm[0]}")
                print(f"   Predição desnormalizada: {pred_real[0][0]}")

    # 📌 Adicionar previsões ao DataFrame
    df["PredictedConductance"] = predicted_conductance

    print("\n📌 DataFrame Final com Previsões:")
    print(df.head(10))  # Exibir as 10 primeiras linhas com previsões

    return df  # Retorna o DataFrame final para inspeção

# 📌 Perguntar ao usuário qual ângulo deseja usar
while True:
    try:
        new_angle = float(input("\n📌 Digite o ângulo para previsão: "))
        break
    except ValueError:
        print("❌ Entrada inválida! Digite um número válido para o ângulo.")

# 📌 Executar previsão sem salvar resultados
final_df = predict_conductance_for_angle(new_angle)

print("\n✅ Processo concluído! O DataFrame com previsões foi gerado.")


✅ Modelos e escaladores carregados!


KeyboardInterrupt: Interrupted by user

# Section 4: Saving the results

### Saving results files to calculate resistance

*   Normalize the input data (energy, angle, contact pairs) using the pre-trained scalers.

*   Identify the cluster of the input using the MiniSom model.

*   Select the appropriate GBR model for the identified cluster.

*   Predict the conductance using the chosen GBR model.

*   Denormalize the predicted conductance to return a final real-world value.


In [None]:
import zipfile
import sys

# 📌 Definir o mapeamento original dos contatos
arrival_mapping = {'1u': 1, '2u': 2, '3u': 3, '4u': 4, '1d': 5, '2d': 6, '3d': 7, '4d': 8}
exit_mapping = arrival_mapping.copy()

# 📌 Criar diretório se não existir
os.makedirs(save_directory, exist_ok=True)

# 📌 Criar mapeamento reverso dos contatos (index -> nome real)
reverse_mapping = {v: k for k, v in arrival_mapping.items()}

# 📌 Iterar sobre cada par ArrivalPair/ExitPair e salvar arquivos individuais
saved_files = []
for (arrival, exit_), group_df in final_df.groupby(["ArrivalPair", "ExitPair"]):
    # 📌 Converter índice de volta para o nome real do contato
    arrival_name = reverse_mapping[arrival]
    exit_name = reverse_mapping[exit_]

    # 📌 Converter ângulo para formato "3p0" (substituir "." por "p")
    angle_str = str(new_angle).replace(".", "p")

    # 📌 Formatar nome do arquivo usando os nomes reais dos contatos
    filename = f"G{arrival_name}{exit_name}_AA50x50_{angle_str}_0T_th.dat"
    output_file = os.path.join(save_directory, filename)

    # 📌 Salvar DataFrame do grupo no arquivo correspondente
    group_df.to_csv(output_file, sep=" ", index=False, header=True)

    print(f"✅ Arquivo salvo: {output_file}")
    saved_files.append(output_file)

# 📌 Se estiver no Google Colab, compactar os arquivos e baixar
if "google.colab" in sys.modules:
    print("\n📦 Criando um arquivo ZIP para download...")

    zip_filename = "/content/predicted_conductances.zip"
    with zipfile.ZipFile(zip_filename, 'w') as zipf:
        for file in saved_files:
            zipf.write(file, os.path.basename(file))  # Adiciona ao ZIP sem os diretórios

    print(f"\n📦 Arquivo ZIP gerado: {zip_filename}")

    # 📌 Baixar automaticamente o arquivo ZIP no Colab
    from google.colab import files
    files.download(zip_filename)
    print("\n📥 O download do arquivo compactado foi iniciado!")

print("\n✅ Processo concluído! Todos os arquivos foram gerados e salvos.")


✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G1u2u_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G1u3u_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G1u4u_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G1u1d_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G1u2d_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G1u3d_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G1u4d_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G2u1u_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G2u3u_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G2u4u_AA50x50_3p0_0T_th.dat
✅ Arquivo salvo: C:\Users\PICHAU\Desktop\Condutâncias preditas/G2u1d_AA50x50_3p0_0T_th.dat

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


📥 O download do arquivo compactado foi iniciado!

✅ Processo concluído! Todos os arquivos foram gerados e salvos.
