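# Training and prediction with an LLM on a dataset of (Q) a pair of two materials + (R) a reason + (A) the mixing/contact hazard
- Q&A: uses a mixing/contact hazard dataset for typical chemical substances, extracted from CRW
- R: generated automatically with GPT-3.5, using the reference data included in the Q&A as a guide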

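# Note: on a free GPU you may run into a CUDA out-of-memory error
- Some measures to save VRAM will probably be needed
- After training, it seems better to reset the runtime once and then reload the LoRA model
- Inference takes a long time (about 15 min per item).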
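
As a rough guide to the VRAM concern above, the sketch below estimates the memory needed for the model weights alone at different bit widths. The helper name `weight_memory_gib` is hypothetical, and the estimate ignores activations, optimizer state, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gib(n_params_billion: float, bits: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored with exactly `bits` bits;
    quantization constants and other buffers are not counted.
    """
    n_bytes = n_params_billion * 1e9 * bits / 8
    return n_bytes / 1024**3


# Compare the two settings used in this notebook for a 7B-parameter model.
for bits in (16, 4):
    print(f"7B weights at {bits}-bit: ~{weight_memory_gib(7, bits):.1f} GiB")
```

Under these assumptions the 16-bit weights of a 7B model already take most of the memory on a 16 GB card, which is one reason the 4-bit setting is the practical choice on a free T4.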

In [1]:
# Install the required libraries
!pip install transformers==4.35.0
!pip install peft==0.5.0
!pip install bitsandbytes==0.41.1
!pip install accelerate==0.23.0
!pip install flash-attn==2.3.1.post1
!pip install datasets==2.14.5



In [2]:
# Log in to Hugging Face
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root

In [3]:
import os
#os.environ["CUDA_VISIBLE_DEVICES"]="1"

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoTokenizer,pipeline
from datasets import Dataset
import copy
from tqdm import tqdm
# Problem setup: use the first N records as test data
n_test=30

In [4]:
# Hyperparameters
# Model name
model_size=7
#model_size=13
#model_size=70
model_name=f"meta-llama/Llama-2-{model_size}b-chat-hf"

# LoRA settings
r=8
lora_alpha=8
#bit=16
bit=4

# Modules that receive a LoRA adapter
target_modules= [
    #"embed_tokens",
    #"lm_head",
    #"q_proj",
    "k_proj",
    "v_proj",
    #"o_proj",
    #"gate_proj",
    #"up_proj",
    #"down_proj",
]

# Training settings
gradient_checkpointing = False
per_device_train_batch_size=1
epochs=1
lr=10**-5

# Whether to fine-tune or not
do_train=True
#do_train=False
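
To get a feel for how small the LoRA adapter configured above is, the following sketch counts its trainable parameters. It assumes the usual LoRA construction (two low-rank matrices of shapes `in_features x r` and `r x out_features` per adapted layer) and assumes a hidden size of 4096 with 32 decoder layers for the 7B model; the helper name is hypothetical.

```python
def lora_param_count(r: int, in_features: int, out_features: int) -> int:
    """Parameters added by one LoRA adapter: A (in x r) plus B (r x out)."""
    return r * (in_features + out_features)


# Assumed architecture values for the 7B model (not stated in this notebook).
hidden_size = 4096
n_layers = 32

# Values taken from the hyperparameter cell above.
r = 8
lora_alpha = 8
n_target_modules = 2  # k_proj and v_proj

per_module = lora_param_count(r, hidden_size, hidden_size)
total = per_module * n_target_modules * n_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"LoRA parameters per module: {per_module:,}")
print(f"LoRA parameters in total:   {total:,}")
print(f"scaling (lora_alpha / r):   {scaling}")
```

Under these assumptions the adapter adds a few million parameters, a tiny fraction of the 7B base weights, and `lora_alpha=8` with `r=8` leaves the update unscaled.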

In [5]:

device_map={"":0}
use_flash_attention_2=False # True is faster, but on a free T4 GPU it must be disabled or an error occurs

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

def init_model(model_name, r, lora_alpha, target_modules, bit=4):
    if bit == 4:
        print("Using 4-bit mode")
        model = AutoModelForCausalLM.from_pretrained(model_name,
                                                     quantization_config=bnb_config,
                                                     device_map=device_map,
                                                     use_flash_attention_2=use_flash_attention_2,
                                                     )
    elif bit == 16:
        print("Using fp16 mode")
        model = AutoModelForCausalLM.from_pretrained(model_name,
                                                     device_map=device_map,
                                                     torch_dtype=torch.float16,
                                                     use_flash_attention_2=use_flash_attention_2,
                                                     )
    else:
        raise ValueError("bit must be 4 or 16")

    if len(target_modules)==0:
        return model
    peft_config = LoraConfig(
        task_type="CAUSAL_LM", inference_mode=False, r=r, lora_alpha=lora_alpha,
        lora_dropout=0.1,
        target_modules=target_modules,
    )
    model = get_peft_model(model, peft_config)
    return model


In [6]:

from peft import AutoPeftModelForCausalLM
lora_path="./outputs/model"

# Load the trained model if one has already been saved
if os.path.exists(lora_path):
    print("load lora model")
    model = AutoPeftModelForCausalLM.from_pretrained(lora_path,
                                                     device_map=device_map,
                                                     quantization_config=bnb_config,
                                                     use_flash_attention_2=use_flash_attention_2,
                                                     )
    do_train=False
else:
    # Initialize the model
    model=init_model(model_name, r, lora_alpha, target_modules, bit=bit)


tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

#pipe = pipeline("text-generation", model=model,
#                tokenizer=tokenizer, max_new_tokens=1000)

load lora model


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

# Dataset generation

In [7]:
from collections import Counter
import random
import json

# Load the JSON file (sample data is used here)
# In practice, use the file path provided by the user
file_path = '/content/mixture_practice_w_reason_numeric_train_20240226_073910.json'

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data containing the predicted results
with open(file_path, 'r', encoding='utf-8') as f:
    data = json.load(f)

# Drop records without predicted_results, and convert any str values that slipped into predicted_results to numbers
data = [
    {**record, 'predicted_results': [int(value) if isinstance(value, str) else value for value in record['predicted_results']]}
    for record in data if 'predicted_results' in record
]


In [8]:
# Load the dataset
import pandas as pd
import numpy as np
import random

df = pd.DataFrame(data)

# Set the random seeds and shuffle the DataFrame
np.random.seed(0)  # also fix the NumPy random seed
df = df.sample(frac=1, random_state=0).reset_index(drop=True)  # pass the seed via random_state (do not omit this: otherwise training data can leak into the test data)

dataset = df.to_dict(orient="records")
random.seed(0)  # fix the seed of the random module
random.shuffle(dataset)  # random.seed(0) is already in effect here

print(df.shape)  # to guard against the leakage mentioned above, display part of the DataFrame and check that the shuffle is reproduced
df[:2]


(1230, 5)


Unnamed: 0,substance1,substance2,result,generated_reason,predicted_results
0,CALCIUM CARBONATE,BENZENE,Compatible,"Hydrocarbons, Aromatic WITH Carbonate Salts: ...","[10, 10, 10]"
1,LIMESTONE,ETHYLENE GLYCOL,Compatible,While there is generally no evidence for hazar...,"[25, 25, 25]"
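
The reason the seed matters can be shown with a small sketch: when the shuffle is seeded, the first `n_test` records are the same on every run, so the held-out set never overlaps with what the model was trained on in a previous session. The function `shuffled_split` and the toy records are hypothetical and only illustrate the idea.

```python
import random


def shuffled_split(records, n_test, seed=0):
    """Shuffle a copy of `records` with a fixed seed; return (test, train)."""
    rng = random.Random(seed)  # local generator, independent of global state
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_test], shuffled[n_test:]


# Toy records standing in for the real dataset.
records = [{"id": i} for i in range(10)]

test_a, train_a = shuffled_split(records, n_test=3)
test_b, train_b = shuffled_split(records, n_test=3)

# Same seed -> same split, and the two parts never overlap.
print(test_a == test_b)
print({r["id"] for r in test_a} & {r["id"] for r in train_a})
```

Using a local `random.Random(seed)` instead of the global seed is a design choice in this sketch: it keeps the split reproducible even if other cells consume random numbers in between.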


In [9]:
# Functions for prediction (modified to handle mixing/contact hazard prediction)
def gen_mixture_text(record, reason="", prediction=""):
    substance1 = record["substance1"]
    substance2 = record["substance2"]
    prompt = f"""
#Problem
##Substance 1: {substance1}
##Substance 2: {substance2}"""
    if reason != "" and prediction != "":
        prompt += f"""
##Reason: {reason}
##Prediction: {prediction}
"""
    else:
        # Test mode
        prompt += """
##Reason:
"""
    return prompt

def generate_question_prompt(dataset, test_id, n_prompt_examples=5):
    train_ids = [i for i in range(len(dataset)) if i != test_id]
    prompt = system_prompt

    # Train prompt
    for _ in range(n_prompt_examples):
        id = random.choice(train_ids)
        prompt += gen_mixture_text(dataset[id],
                                   reason=dataset[id]["generated_reason"],
                                   prediction=dataset[id]["predicted_results"])
        prompt += "\n"

    # Test prompt
    prompt += gen_mixture_text(dataset[test_id])

    return prompt

# In this script, the `prepare_dataset` function may turn out to be unnecessary.
# Instead, the prompt generated by `generate_question_prompt` is fed directly to the model.


In [10]:
import random
system_prompt="""
You are a professional chemist. Predict the level of danger of the following mixture on a scale of 0 to 100 for three times, where 0 indicates no danger at all and 100 represents the highest level of danger.
Consider the quantitative reasons provided and make your prediction based on the mixture's safety or potential hazards. Provide your prediction as a list of three numerical values like [80, 85, 80], reflecting the mixture's level of danger.
"""
def prepare_dataset(context_list, tokenizer):
    data_list = [{"text": i} for i in context_list]
    random.shuffle(data_list)

    # tokenize
    dataset = Dataset.from_dict(
        {"text": [item["text"] for item in data_list[:]]})
    dataset = dataset.map(lambda samples: tokenizer(
        samples['text']), batched=True)

    return dataset


In [11]:

train_text_list=[]
for id in range(len(dataset)):
    prompt=gen_mixture_text(dataset[id],
                                reason=dataset[id]["generated_reason"],
                                prediction=dataset[id]["predicted_results"])
    train_text_list.append(prompt)
tokenized_dataset = prepare_dataset(train_text_list[n_test:], tokenizer)

Map:   0%|          | 0/1200 [00:00<?, ? examples/s]

In [12]:
#check prompt

print("train")
print(prompt)
print("test")
t_prompt=gen_mixture_text(dataset[0])
print(t_prompt)

train

#Problem
##Substance 1: TUNGSTEN
##Substance 2: PLATINUM
##Reason: Metals, Less Reactive WITH Metals, Less Reactive: the interaction of less reactive metals with each other at room temperature does not lead to hazardous reactions. The incandescent reactions of tin with tellurium and platinum with tellurium occur only at higher temperatures. No evidence of hazardous reactions between tungsten and platinum was found in the provided references. Additionally, qualitative data suggests that less reactive metals do not undergo hazardous reactions at room temperature. Therefore, based on the absence of evidence of hazardous interactions and the known behavior of less reactive metals, the combination of tungsten and platinum is deemed compatible with a low hazard score of 10.
##Prediction: [10, 10, 10]

test

#Problem
##Substance 1: POLYVINYL ALCOHOL
##Substance 2: HELIUM
##Reason:



# Model training

In [13]:
import transformers
from datetime import datetime

#train
train_args = transformers.TrainingArguments(
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=1,
        warmup_steps=0,
        num_train_epochs=epochs,
        learning_rate=lr,
        fp16=True,
        logging_steps=100,
        save_total_limit=1,
        output_dir='outputs/'+datetime.now().strftime('%Y%m%d%H%M%S'),
        gradient_checkpointing=gradient_checkpointing,
    )

# trainer
#callbacks = [EarlyStoppingCallback()]
callbacks = []

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_dataset,
    args=train_args,
    callbacks=callbacks,
    data_collator=transformers.DataCollatorForLanguageModeling(
        tokenizer, mlm=False)
)

if do_train:
    training_result = trainer.train()
    print(training_result.training_loss)

    model.save_pretrained(lora_path)
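
As a quick sanity check on the training settings, the sketch below estimates how many optimizer steps and log lines one run should produce. It is an approximation under the assumption of a single GPU and of step counting by integer division of batches by the accumulation steps; the helper name is hypothetical, and the 1200 training examples come from the `Map` progress output above.

```python
import math


def n_optimizer_steps(n_examples, per_device_batch_size,
                      grad_accum_steps, epochs, n_devices=1):
    """Approximate number of optimizer updates for a training run."""
    batches_per_epoch = math.ceil(
        n_examples / (per_device_batch_size * n_devices))
    # Assumption: leftover batches that do not fill an accumulation
    # window are not counted as a separate step.
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * epochs


steps = n_optimizer_steps(n_examples=1200, per_device_batch_size=1,
                          grad_accum_steps=1, epochs=1)
logging_steps = 100
print(f"optimizer steps: {steps}")
print(f"expected log lines: {steps // logging_steps}")
```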

In [None]:
#model.save_pretrained(lora_path)

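# **If you are using a free GPU such as a T4, choose "Restart session" at this point and run the cells again from the top (otherwise memory runs out. After the restart the model is already saved under outputs, so training is not re-run. Re-run all cells.)**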

# Predicting hazard scores with the model

In [14]:

import re
import torch
import gc
from IPython.display import clear_output
model.eval()
@torch.no_grad()  # inference only: skip building the autograd graph to save memory
def gen_text_stop_word(prompt,model,tokenizer,
                       device="cuda:0",
                       stop_words=["#Problem","#Reason"],
                       double_stop_words=["#Prediction"],
                       stream=False,
                       max_tokens=300):
    gc.collect()
    torch.cuda.empty_cache()

    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)

    # Variable that holds the generated text
    generated_text = ""

    # Generate tokens one at a time
    for i in range(max_tokens):
        # Predict the next token (greedy decoding)
        outputs = model(input_ids)
        next_token_logits = outputs.logits[:, -1, :]
        next_token = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)

        # Append the generated token to the current input
        input_ids = torch.cat([input_ids, next_token], dim=-1)

        # Update the generated text
        generated_text = tokenizer.decode(input_ids[0], skip_special_tokens=True)[len(prompt):]

        if stream:
            if i%30==0:
                clear_output()
            print(generated_text)

        # Check for stop words
        if any(stop_word in generated_text for stop_word in stop_words):
            break

        # Check for words that stop generation once they appear two or more times
        stop_flag=False
        for check_word in double_stop_words:
            count=generated_text.count(check_word)
            if count>=2:
                stop_flag=True
                break
        if stop_flag:
            break

    return generated_text

import re

def ask_value(prompt, model, tokenizer):
    res = gen_text_stop_word(prompt, model, tokenizer)
    # res = pipe(prompt)[0]["generated_text"]
    print("----")
    print(res.strip())

    # Regular-expression patterns used to find the list of numbers
    regex_list = [
        r"\[(\d+\.?\d*(?:,\s*\d+\.?\d*)*)\]",  # matches a list of numbers such as [10, 10, 20]
    ]

    values = None
    for reg in regex_list:
        match = re.search(reg, res)
        if match:
            # Extract the number list from the matched group and parse it as a list of floats
            values_str = match.group(1)  # "10, 10, 20"
            values = [float(value) for value in values_str.split(",")]
            break

    return res, values
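
The stopping rule and the value extraction in the cell above can be tried without loading a model. The following is a minimal standalone sketch of the same logic; the function names `should_stop` and `parse_values` are hypothetical, and the sample strings are shortened versions of text in the format the model is prompted to produce.

```python
import re

# Same pattern as in ask_value: a bracketed, comma-separated list of numbers.
LIST_PATTERN = re.compile(r"\[(\d+\.?\d*(?:,\s*\d+\.?\d*)*)\]")


def should_stop(generated_text,
                stop_words=("#Problem", "#Reason"),
                double_stop_words=("#Prediction",)):
    """True if any stop word appears once, or a double-stop word twice."""
    if any(word in generated_text for word in stop_words):
        return True
    return any(generated_text.count(word) >= 2 for word in double_stop_words)


def parse_values(text):
    """Return the first bracketed number list as floats, or None."""
    match = LIST_PATTERN.search(text)
    if match is None:
        return None
    return [float(value) for value in match.group(1).split(",")]


print(should_stop("##Prediction: [40, 40, 40]\n\n##Reason"))
print(parse_values("##Prediction: [85, 85, 85]"))
print(parse_values("##Predic"))  # truncated output: no list to parse
```

When the generation is cut off by `max_tokens` before the list is written, `parse_values` returns `None`, which is the case the retry loop in the next cell is there to handle.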



In [None]:

random.seed(0)
prediction_results={}

# Hyperparameters for prediction
n_prompt_examples=3 # number of worked examples shown in the prompt (few-shot)
n_max_trials=3  # maximum number of retries when no value is returned

res_list=[]
for test_id in tqdm(range(n_test)):
    print(f"problem {test_id+1} / {n_test}")
    for _ in range(n_max_trials):
        try:
            prompt=generate_question_prompt(dataset,test_id,n_prompt_examples=n_prompt_examples)
            reason,value=ask_value(prompt,model,tokenizer)
        except Exception as e:
            print(e)
            continue


        if value is not None:
            record=copy.deepcopy(dataset[test_id])
            record["Test (Predicted reason)"]=reason
            record["Test (Predicted value)"]=value
            res_list.append(record)
            print("actual: ",record["predicted_results"],"predicted: ", record["Test (Predicted value)"],)
            break
prediction_results[n_prompt_examples]=res_list

  0%|          | 0/30 [00:00<?, ?it/s]

promlem 1 / 30


  3%|▎         | 1/30 [02:20<1:07:57, 140.62s/it]

----
##Prediction: [40, 40, 40]

##Reason
actual:  [10, 10, 10] predicted:  [40.0, 40.0, 40.0]
promlem 2 / 30


  7%|▋         | 2/30 [18:55<5:00:03, 642.99s/it]

----
Nitrosulfuric acid, a reactive and unstable compound, is known to react with chlorine, forming hazardous and toxic compounds. Sulfuric acid, when combined with chlorine, can also undergo hazardous reactions, such as the formation of chlorine gas and the production of toxic and corrosive compounds. The combination of nitrosulfuric acid and sulfuric acid with chlorine is expected to exhibit hazardous reactivity, resulting in a refined hazard score of 85, indicating a moderate to high level of danger.
##Prediction: [85, 85, 85]


#Problem
actual:  [95, 95, 95] predicted:  [85.0, 85.0, 85.0]
promlem 3 / 30


 10%|█         | 3/30 [45:45<8:08:01, 1084.52s/it]

----
Methanol, as a polar, non-redox-active compound, is generally compatible with inorganic compounds like calcium carbonate, which lacks redox-active properties. The absence of documented hazardous interactions between methanol and inorganic compounds, such as calcium carbonate, supports the compatibility of these substances. Moreover, the lack of evidence for hazardous reactions between methanol and other non-redox-active compounds, such as silicates, further reinforces the compatibility of methanol with calcium carbonate. Therefore, the combination of methanol and calcium carbonate is predicted to be compatible, with a low hazard score of 10 on the 0-100 scale, indicating minimal risk of hazardous interactions or reactivity under standard handling and storage conditions.
##Prediction: [10, 10, 10]


#Problem
actual:  [10, 10, 10] predicted:  [10.0, 10.0, 10.0]
promlem 4 / 30


 13%|█▎        | 4/30 [1:06:48<8:20:29, 1154.99s/it]

----
Hydrogen and methane mixtures, compressed, are known to be compatible with calcium carbonate, as there is no documented evidence of hazardous reactions or interactions between these substances under standard conditions. Calcium carbonate, a weak base, does not react with hydrogen and methane mixtures, as it does not readily undergo redox reactions with hydrogen or methane. Additionally, the lack of evidence for hazardous reactions between weak bases and hydrogen and methane mixtures supports the compatibility of this mixture. Therefore, the refined hazard score for the mixture of hydrogen and methane, compressed, and calcium carbonate is adjusted to 0, indicating no discernible hazard.
##Prediction: [0, 0, 0]


#Problem
actual:  [5, 5, 5] predicted:  [0.0, 0.0, 0.0]
promlem 5 / 30


 17%|█▋        | 5/30 [1:20:07<7:07:46, 1026.67s/it]

----
Methane, a hydrocarbon gas, is generally considered inert and non-reactive with metals. The lack of evidence for hazardous reactions between methane and metals, including lead, supports the compatibility of these substances. Additionally, the absence of any known reactivity between methane and lead sulfate, a metal sulfate compound, further reinforces the refined hazard score of 10, indicating minimal hazard and safe compatibility.
##Prediction: [10, 10, 10]


#Problem
actual:  [10, 10, 10] predicted:  [10.0, 10.0, 10.0]
promlem 6 / 30


 20%|██        | 6/30 [1:46:55<8:09:48, 1224.51s/it]

----
Sodium chloride is a strong oxidizing agent, while black phosphorus is a highly reactive nonmetal with a tendency to form compounds with metals. However, there is no known evidence or documented case of a hazardous reaction between sodium chloride and black phosphorus. Sodium chloride is known for its ability to oxidize a wide range of organic and inorganic compounds, while black phosphorus is highly reactive and can form compounds with metals. However, the lack of evidence and the stability of both substances under normal conditions suggest that the combination of sodium chloride and black phosphorus is compatible, with a low risk of hazardous interactions or reactivity under standard handling and storage conditions. Therefore, the compatibility score is estimated to be low, around 10-20, indicating minimal hazard.
##Prediction: [20, 15, 15]


#Problem
actual:  [65, 70, 70] predicted:  [20.0, 15.0, 15.0]
promlem 7 / 30


 23%|██▎       | 7/30 [2:01:02<7:02:04, 1101.05s/it]

----
Lithium, a highly reactive metal, is known to react with calcium carbonate, a common mineral, to form lithium carbonate and hydrogen gas. The reaction is considered hazardous due to the potential for gas formation and the reactivity of lithium. The documented evidence of lithium's reactivity with calcium carbonate supports the prediction of a high level of hazard, approaching 100 on the compatibility scale.
##Prediction: [90, 95, 90]


#Problem
actual:  [35, 35, 35] predicted:  [90.0, 95.0, 90.0]
promlem 8 / 30


 27%|██▋       | 8/30 [2:15:29<6:16:19, 1026.34s/it]

----
Methane, a non-reactive gas, is not expected to undergo hazardous reactions with limestone, a non-reactive inorganic compound. The lack of evidence for hazardous interactions between these substances, combined with their inert nature, suggests a low probability of adverse reactions occurring. Therefore, based on the compatibility of methane and limestone, the refined hazard score is set at 10, indicating a low level of hazard.
##Prediction: [10, 10, 10]


#Problem
actual:  [0, 0, 0] predicted:  [10.0, 10.0, 10.0]
promlem 9 / 30


 30%|███       | 9/30 [2:17:26<4:19:44, 742.12s/it] 

----
##Prediction: [0, 0, 0]


#Problem
actual:  [10, 10, 10] predicted:  [0.0, 0.0, 0.0]
promlem 10 / 30


 33%|███▎      | 10/30 [2:19:42<3:05:03, 555.19s/it]

----
##Prediction: [30, 35, 30]


#Problem
actual:  [0, 0, 0] predicted:  [30.0, 35.0, 30.0]
promlem 11 / 30


 37%|███▋      | 11/30 [2:37:30<3:45:28, 712.02s/it]

----
Silica, being a non-reactive inorganic compound, is generally considered safe and inert towards other non-reactive materials. Polypropylene, as a non-reactive polymer, is also considered safe and non-reactive. However, the absence of documented evidence for hazardous interactions between these two substances suggests that they are compatible and pose a low level of hazard. Therefore, the refined hazard score for the mixture of silica and polypropylene is 10, indicating a low level of hazard due to the compatibility of these non-reactive substances.
##Prediction: [10, 10, 10]


#Problem
actual:  [0, 0, 0] predicted:  [10.0, 10.0, 10.0]
promlem 12 / 30


 40%|████      | 12/30 [2:53:12<3:54:32, 781.83s/it]

----
Acetylene is a highly reactive gas that can undergo hazardous reactions with water, leading to the formation of explosive and toxic compounds. The presence of water in the mixture with acetylene is predicted to be hazardous, with a refined hazard score of 80, indicating a high level of danger. The potential for hazardous interactions between acetylene and water, along with the known reactivity of acetylene, supports the prediction of a high level of danger.
##Prediction: [80, 80, 80]


#Problem
actual:  [70, 70, 70] predicted:  [80.0, 80.0, 80.0]
promlem 13 / 30


 43%|████▎     | 13/30 [3:08:37<3:53:49, 825.24s/it]

----
Lead sulfate, a lead salt, is known to react with chlorine, a strong oxidizing agent, to form lead chloride, a highly toxic and corrosive compound. The documented hazardous reactions between lead salts and chlorine, as well as the potential for violent reactions, justify a high hazard score of 90. The compatibility of lead sulfate with chlorine is predicted to be incompatible, with a high likelihood of hazardous reactions occurring under ambient conditions.
##Prediction: [90, 90, 90]


#Problem
actual:  [80, 80, 80] predicted:  [90.0, 90.0, 90.0]
promlem 14 / 30


 47%|████▋     | 14/30 [3:42:31<5:17:28, 1190.52s/it]

----
Sodium chloride, a strong oxidizing agent, is incompatible with carbon monoxide, a highly toxic gas. The combination of these substances poses a significant hazard due to the potential for violent reactions, including the formation of toxic compounds. The documented hazards associated with the interaction of strong oxidizing agents with toxic gases, such as the violent reactions between chlorine and carbon monoxide, further underscore the incompatibility of sodium chloride with carbon monoxide. The lack of evidence for hazardous interactions between sodium chloride and other toxic gases, such as hydrogen cyanide, supports the compatibility of sodium chloride with carbon monoxide. However, the potential for violent reactions between strong oxidizing agents and toxic gases, combined with the documented hazards associated with the interaction of sodium chloride with carbon monoxide, warrants a cautious approach and a refined hazard score closer to the upper end of the scale.
##Predic

 50%|█████     | 15/30 [4:06:30<5:16:22, 1265.48s/it]

----
Lithium, a highly reactive metal, is known to react violently with alcohols, including ethylene glycol, forming toxic and flammable compounds. The reaction between lithium and ethylene glycol is expected to produce a mixture of toxic gases, including carbon monoxide, carbon dioxide, and hydrogen cyanide, with a high likelihood of explosive and hazardous reactions. The documented evidence of violent reactions between lithium and alcohols, including ethylene glycol, underscores the hazardous nature of their combination, warranting a high refined hazard score of 95 to reflect the significant danger and potential for explosive and toxic gas release.
##Prediction: [95, 95, 95]


#Problem
actual:  [95, 95, 95] predicted:  [95.0, 95.0, 95.0]
promlem 16 / 30


 53%|█████▎    | 16/30 [4:43:05<6:00:32, 1545.19s/it]

----
The interaction between hydrogen chloride, anhydrous and calcium carbonate has been evaluated, and no evidence of hazardous reactions was found. The absence of reactivity between halogenated compounds and weak reducing agents, as well as the lack of documented reactions between halogenated compounds and alkaline earth metal carbonates, supports a compatible classification. Additionally, the absence of any specific evidence of hazardous interactions between halogenated compounds and alkaline earth metal oxides, as well as the lack of documented reactions between halogenated compounds and alkaline earth metal hydroxides, further supports a compatible classification. Therefore, the compatibility rating of this mixture is refined to a score of 5 out of 100, indicating a very low hazard level based on the current understanding of the interactions.
##Prediction: [5, 5, 5]


In conclusion, the predictions for the level of danger of the mixtures of the three substances are as follows:

* 

 57%|█████▋    | 17/30 [5:01:27<5:05:52, 1411.71s/it]

----
Nitrogen, a chemically inert gas, does not participate in chemical reactions with non-reactive compounds like methane. Methane, a simple hydrocarbon, lacks functional groups that are known to form hazardous compounds with inert gases. The combination of nitrogen and methane is considered compatible, with no appreciable risk of hazardous interactions or reactivity under normal handling and storage conditions. The refined hazard score for the mixture of nitrogen and methane is 10, indicating a very low level of hazard based on the lack of evidence for any significant chemical interaction or reactivity between these substances.
##Prediction: [10, 10, 10]


#Problem
actual:  [0, 0, 0] predicted:  [10.0, 10.0, 10.0]
promlem 18 / 30


 60%|██████    | 18/30 [5:32:47<5:10:29, 1552.46s/it]

----
Tungsten is a refractory metal with a high melting point and low reactivity, and chromium is a transition metal with a high reactivity due to its ability to form complex compounds. However, there is no known evidence of hazardous interactions between tungsten and chromium under standard conditions. Additionally, the lack of documented cases of incompatibility between refractory metals and transition metals supports the prediction of compatibility between these substances. Furthermore, the principles of Lewis acid-base theory suggest that tungsten can act as a Lewis acid due to its tendency to accept an electron pair, while chromium can act as a Lewis base due to its ability to donate an electron pair. This interaction can be understood in terms of coordination rather than reactivity, further strengthening the prediction of compatibility. Considering these factors, the mixture of tungsten and chromium is assessed to be compatible, with a low likelihood of hazardous interactions occ

 63%|██████▎   | 19/30 [5:35:03<3:26:39, 1127.25s/it]

----
##Prediction: [80, 80, 80]

##Prediction
actual:  [0, 0, 0] predicted:  [80.0, 80.0, 80.0]
promlem 20 / 30


 67%|██████▋   | 20/30 [5:55:30<3:12:51, 1157.17s/it]

----
Lead, a toxic and highly reactive metal, is known to react violently with hydrogen and methane mixtures under high pressure. The interaction between lead and hydrogen has been documented to produce toxic and explosive compounds, such as hydrogen cyanide and hydrogen chloride. The potential for hazardous interactions between lead and methane is further supported by the documented reactivity of methane with other metals, such as aluminum and magnesium, under high-pressure conditions. The combination of lead and methane under high pressure is expected to exhibit hazardous reactivity, warranting a high hazard score on a scale from 0 to 100.
##Prediction: [90, 90, 95]


#Problem
actual:  [85, 85, 85] predicted:  [90.0, 90.0, 95.0]
promlem 21 / 30


 70%|███████   | 21/30 [5:57:46<2:07:35, 850.62s/it] 

----
##Prediction: [80, 80, 80]

##Prediction
actual:  [90, 90, 90] predicted:  [80.0, 80.0, 80.0]
promlem 22 / 30


 73%|███████▎  | 22/30 [6:29:05<2:34:32, 1159.09s/it]

----
Hydrogen peroxide, a stabilized oxidizing agent, has been documented to react with a wide range of organic compounds, including aromatic hydrocarbons like benzene. The hazardous nature of these interactions is attributed to the potential for violent reactions, including explosions, upon contact with strong oxidizing agents. The documented reactivity of hydrogen peroxide with aromatic hydrocarbons, such as toluene and xylene, underscores the potential for hazardous interactions with benzene. The historical evidence from literature supports the incompatibility of hydrogen peroxide with aromatic hydrocarbons, as well as the potential for violent reactions upon contact with strong oxidizing agents. The refined hazard score for the mixture of hydrogen peroxide and benzene is 80, indicating a moderate level of incompatibility and reactivity.
##Prediction: [80, 80, 80]


#Problem
actual:  [95, 95, 95] predicted:  [80.0, 80.0, 80.0]
promlem 23 / 30


 77%|███████▋  | 23/30 [6:53:07<2:25:09, 1244.20s/it]

----
Methanol, a polar and reactive solvent, is known to react with methane, a hydrocarbon gas, under certain conditions. The reaction between methanol and methane can produce a range of compounds, including methyl alcohol, methyl halides, and methyl esters. While the exact reactivity of methanol with methane is not explicitly documented, the potential for the formation of hazardous compounds and the known reactivity of methanol with other hydrocarbons warrant caution. Therefore, the combination of methanol and methane is classified as having a moderate level of hazard, with a refined prediction of 50, denoting a cautious approach due to the potential for hazardous interactions.
##Prediction: [50, 50, 50]


#Problem
actual:  [5, 5, 5] predicted:  [50.0, 50.0, 50.0]
promlem 24 / 30
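
One simple way to summarize the `actual` / `predicted` pairs printed above is a mean absolute error over the per-problem averages. The sketch below is an assumption about how such a score could be computed, not a metric defined by this notebook; it uses only the first five pairs from the log as sample input.

```python
from statistics import mean

# (actual, predicted) pairs copied from the first five problems in the log.
pairs = [
    ([10, 10, 10], [40.0, 40.0, 40.0]),
    ([95, 95, 95], [85.0, 85.0, 85.0]),
    ([10, 10, 10], [10.0, 10.0, 10.0]),
    ([5, 5, 5], [0.0, 0.0, 0.0]),
    ([10, 10, 10], [10.0, 10.0, 10.0]),
]


def mean_absolute_error(pairs):
    """Average each three-value list, then average the absolute differences."""
    return mean(abs(mean(actual) - mean(predicted))
                for actual, predicted in pairs)


mae = mean_absolute_error(pairs)
print(f"MAE over {len(pairs)} problems: {mae}")
```

The same function can be applied to `res_list` by reading `predicted_results` and `Test (Predicted value)` from each record.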


In [None]:
#save
import datetime
import json

study_name="mixture_practice_w_reason_numeric_test"
now=datetime.datetime.now()
now_str=now.strftime("%Y%m%d_%H%M%S")
save_path=f"{study_name}_{now_str}.json"
with open(save_path, 'w', encoding = "utf-8") as f:
    json.dump(prediction_results, f, indent=4, ensure_ascii=False)
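
One detail to keep in mind when reading the saved file back: `prediction_results` is keyed by the integer `n_prompt_examples`, and JSON object keys are always strings, so the key comes back as `"3"` rather than `3`. The sketch below shows the round trip with a made-up record; the substance names are placeholders.

```python
import json

# Made-up stand-in for prediction_results, keyed by n_prompt_examples.
prediction_results = {
    3: [{"substance1": "A", "substance2": "B",
         "Test (Predicted value)": [10.0, 10.0, 10.0]}]
}

# Serialize and parse again, as happens when the file is saved and reloaded.
restored = json.loads(json.dumps(prediction_results, ensure_ascii=False))

print(list(restored.keys()))   # the integer key has become a string
records = restored[str(3)]     # so look it up with str(...)
print(records[0]["Test (Predicted value)"])
```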


In [None]:
from google.colab import files

# Download the file to the local machine
files.download(save_path)