# [STEP4] Generating insights with ChatGPT

Using the results from STEP3 together with meta information such as the data structure, we obtained insights through the OpenAI API.

In [None]:
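# Install Python libraries
# Note: Python 3.10.x is recommended
!python --version
!pip install python-dotenv
# openai.ChatCompletion (used below) was removed in openai 1.0, so pin an earlier release
!pip install --upgrade "openai<1.0"
!pip install "openai[datalib]<1.0"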

!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install plotly
!pip install scikit-learn
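!pip install sqlalchemy
# Assumption: psycopg2, SQLAlchemy's default PostgreSQL driver, is not installed yet
!pip install psycopg2-binary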

!pip install anytree

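## Environment variables
Set the URL and API key for connecting to supabase, and the API key for connecting to the OpenAI API.
Obtain an API key from your own OpenAI account.

https://platform.openai.com/account/api-keys

Ask the administrator for the supabase connection details.

In the example below, the variables are written to a .env file and loaded from Jupyter Notebook.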
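A missing or misnamed entry in the .env file otherwise only shows up later as a confusing connection error. The following is a minimal sketch of a fail-fast check, assuming only the standard library; `require_env` is a hypothetical helper name and the host value is a placeholder, neither is part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value


# Placeholder value for illustration only (not a real host)
os.environ["DB_HOST"] = "db.example.invalid"
db_host = require_env("DB_HOST")
```

In the notebook, such a helper could be called after `load_dotenv()` in place of the bare `os.getenv(...)` calls.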

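Note: if creating a .env file is difficult, or the values cannot be loaded from it,
you can write the values directly in place of each os.getenv("...") call; the code itself will still run.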

In [77]:
# Environment variables
import os
from dotenv import load_dotenv

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text
import urllib.parse
from IPython.display import display
from anytree import Node, RenderTree, AsciiStyle
import openai

load_dotenv()

# Variables for the supabase connection
db_host = os.getenv("DB_HOST")
db_port = os.getenv("DB_PORT")
db_name = os.getenv("DB_NAME")
db_user = os.getenv("DB_USER")
db_pass = os.getenv("DB_PASS")

# OPENAI API KEY
openai_api_key = os.getenv("OPENAI_API_KEY")
openai.api_key = openai_api_key


# Connect to the database
connection_config = {
    'user': db_user,
    'password': urllib.parse.quote_plus(db_pass),
    'host': db_host,
    'port': db_port, 
    'database': db_name
}
engine = create_engine('postgresql://{user}:{password}@{host}:{port}/{database}'.format(**connection_config))


print('Environment variables loaded')

Environment variables loaded


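# Running the analysis

## Analysis 1. Output the projection relationships from major brain regions

### Converting the brain hierarchy to text

Sending the entire brain structure from root as text through the API would exceed the token limit, so we kept only the parts of the tree related to the STEP3 analysis results.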
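The filtering idea can be sketched on a toy hierarchy: keep each selected structure plus all of its ancestors up to the root, and drop everything else. This is an illustrative sketch, not the notebook's exact code; `collect_with_ancestors` is a hypothetical name, the ids are made up, and a parent id of 0 is assumed to mark the root.

```python
def collect_with_ancestors(selected_ids, parent_of):
    """Return the selected ids plus every ancestor up to the root (parent id 0)."""
    keep = set()
    for node_id in selected_ids:
        current = node_id
        # Stop at the root, or as soon as we reach a node whose ancestors are already kept
        while current and current not in keep:
            keep.add(current)
            current = parent_of.get(current, 0)
    return keep


# Toy hierarchy (made-up ids): 1 is the root, 2 and 3 are its children, 4 is under 2, 5 is under 3
toy_parent_of = {1: 0, 2: 1, 3: 1, 4: 2, 5: 3}
kept = collect_with_ancestors([4], toy_parent_of)
```

Here only the branch 1 > 2 > 4 is kept, so the rendered tree stays small enough to send in a prompt.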

In [94]:
# Output of STEP3, Analysis 1

# FRP
selected_structure_name = "FRP(Frontal pole)"
selected_structure_ids= [184, 526322264, 526157192, 526157196, 667, 68] + [672, 767, 962, 656, 1021, 629, 1101, 685, 328, 1020, 783, 320, 943, 844, 996, 262, 599, 630, 907, 440]
selected_structure_list="""
CP(Caudoputamen)                                          2.440266\n
MOs5(Secondary motor area, layer 5)                       0.961629\n
MOs2/3(Secondary motor area, layer 2/3)                   0.715561\n
MOs1(Secondary motor area, layer 1)                       0.510339\n
MOs6a(Secondary motor area, layer 6a)                     0.338765\n
VAL(Ventral anterior-lateral complex of the thalamus)     0.311510\n
AId5(Agranular insular area, dorsal part, layer 5)        0.301772\n
VM(Ventral medial nucleus of the thalamus)                0.287174\n
AId2/3(Agranular insular area, dorsal part, layer 2/3)    0.217278\n
PO(Posterior complex of the thalamus)                     0.169447\n
AId6a(Agranular insular area, dorsal part, layer 6a)      0.156987\n
MOp1(Primary motor area, Layer 1)                         0.124162\n
MOp2/3(Primary motor area, Layer 2/3)                     0.120033\n
MOp6a(Primary motor area, Layer 6a)                       0.111396\n
AId1(Agranular insular area, dorsal part, layer 1)        0.107286\n
RT(Reticular nucleus of the thalamus)                     0.107157\n
CM(Central medial nucleus of the thalamus)                0.105601\n
ORBl5(Orbital area, lateral part, layer 5)                0.105296\n
PCN(Paracentral nucleus)                                  0.103668\n
ORBl6a(Orbital area, lateral part, layer 6a)              0.082529\n
"""

'''
# MOp
selected_structure_name = "MOp(Primary motor area)"
selected_structure_ids= [985, 882, 844, 648, 943, 320] + [672, 767, 962, 484682516, 1020, 6, 1021, 656, 718, 945, 862, 1102, 924, 733, 629, 190, 625, 854, 1128, 931] 
selected_structure_list="""
CP(Caudoputamen)                                                3.104600\n
MOs5(Secondary motor area, layer 5)                             0.412647\n
MOs2/3(Secondary motor area, layer 2/3)                         0.394685\n
ccb(corpus callosum, body)                                      0.341310\n
PO(Posterior complex of the thalamus)                           0.284739\n
int(internal capsule)                                           0.247628\n
MOs6a(Secondary motor area, layer 6a)                           0.244097\n
MOs1(Secondary motor area, layer 1)                             0.207162\n
VPL(Ventral posterolateral nucleus of the thalamus)             0.190668\n
SSp-ul6a(Primary somatosensory area, upper limb, layer 6a)      0.169630\n
SSs6a(Supplemental somatosensory area, layer 6a)                0.167970\n
SSp-m6a(Primary somatosensory area, mouth, layer 6a)            0.166003\n
cpd(cerebal peduncle)                                           0.161042\n
VPM(Ventral posteromedial nucleus of the thalamus)              0.158887\n
VAL(Ventral anterior-lateral complex of the thalamus)           0.145010\n
py(pyramid)                                                     0.134618\n
SSp-ul5(Primary somatosensory area, upper limb, layer 5)        0.120353\n
SSp-ul2/3(Primary somatosensory area, upper limb, layer 2/3)    0.117440\n
SSp-ll5(Primary somatosensory area, lower limb, layer 5)        0.104281\n
PG(Pontine gray)                                                0.095131
"""

# MOs
selected_structure_name = "MOs(Secondary motor area)"
selected_structure_ids= [993, 656, 1085, 1021, 767, 962] + [672, 943, 844, 648, 276, 1020, 484682516, 320, 1108, 458, 629, 6, 685, 1015, 924, 873, 17, 1102, 878, 862]
selected_structure_list="""
CP(Caudoputamen)                                                      3.385755\n
MOp2/3(Primary motor area, Layer 2/3)                                 0.367506\n
MOp6a(Primary motor area, Layer 6a)                                   0.325849\n
MOp5(Primary motor area, Layer 5)                                     0.312323\n
PIR1(Piriform area, molecular layer)                                  0.228673\n
PO(Posterior complex of the thalamus)                                 0.210516\n
ccb(corpus callosum, body)                                            0.183801\n
MOp1(Primary motor area, Layer 1)                                     0.182400\n
ccg(genu of corpus callosum)                                          0.173723\n
OT1(Olfactory tubercle, molecular layer)                              0.168729\n
VAL(Ventral anterior-lateral complex of the thalamus)                 0.160450\n
int(internal capsule)                                                 0.118487\n
VM(Ventral medial nucleus of the thalamus)                            0.108683\n
ACAd5(Anterior cingulate area, dorsal part, layer 5)                  0.101966\n
cpd(cerebal peduncle)                                                 0.099282\n
SSs1(Supplemental somatosensory area, layer 1)                        0.096358\n
SCiw(Superior colliculus, motor related, intermediate white layer)    0.088734\n
SSp-m6a(Primary somatosensory area, mouth, layer 6a)                  0.087226\n
SSp-m1(Primary somatosensory area, mouth, layer 1)                    0.079325\n
SSs6a(Supplemental somatosensory area, layer 6a)                      0.078490
"""
'''

# Fetch all rows from the structures table
sql ="""
SELECT
    id,
    name,
    acronym,
    "st-level",
    "parent-structure-id"
FROM
    structures;"""
with engine.begin() as conn:
    query = text(sql)
    df_structures = pd.read_sql_query(query, conn)

id_parent_dict = df_structures.set_index('id')['parent-structure-id'].to_dict()

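tree_ids = []
for structure_id in selected_structure_ids:
    # Use current_id rather than id to avoid shadowing the built-in id()
    current_id = structure_id
    if current_id not in tree_ids:
        tree_ids.append(current_id)
    # Walk up through the ancestors until the root (parent id 0) and collect them
    while id_parent_dict[current_id] > 0:
        current_id = id_parent_dict[current_id]
        if current_id not in tree_ids:
            tree_ids.append(current_id)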

df_structures.loc[:, 'label'] = df_structures['acronym'] + "(" + df_structures['name'] + ")"
df = df_structures[df_structures['id'].isin(tree_ids)]


# Create a dictionary of anytree Node objects
nodes={}
for index, row in df.iterrows():
    node = Node(row['label'], id=row['id'])
    nodes[row['id']] = node

# Iterate through the DataFrame, set parent for each node
for index, row in df.iterrows():
    if row['parent-structure-id'] > 0:
        nodes[row['id']].parent = nodes[row['parent-structure-id']]

# Find the root node (parent id 0) and render the tree below it as ASCII text
root_node = nodes[df_structures.loc[df_structures['parent-structure-id']==0]['id'].values[0]]
root_render_tree=RenderTree(root_node, style=AsciiStyle()).by_attr()
print(root_render_tree)

root(root)
+-- grey(Basic cell groups and regions)
    |-- BS(Brain stem)
    |   +-- IB(Interbrain)
    |       +-- TH(Thalamus)
    |           |-- DORpm(Thalamus, polymodal association cortex related)
    |           |   |-- ILM(Intralaminar nuclei of the dorsal thalamus)
    |           |   |   |-- PCN(Paracentral nucleus)
    |           |   |   +-- CM(Central medial nucleus of the thalamus)
    |           |   |-- LAT(Lateral group of the dorsal thalamus)
    |           |   |   +-- PO(Posterior complex of the thalamus)
    |           |   +-- RT(Reticular nucleus of the thalamus)
    |           +-- DORsm(Thalamus, sensory-motor cortex related)
    |               +-- VENT(Ventral group of the dorsal thalamus)
    |                   |-- VM(Ventral medial nucleus of the thalamus)
    |                   +-- VAL(Ventral anterior-lateral complex of the thalamus)
    +-- CH(Cerebrum)
        |-- CNU(Cerebral nuclei)
        |   +-- STR(Striatum)
        |       +-- STRd(Striatum do

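### Obtaining insights with the OpenAI Chat Completion API

We called the Chat Completion API with a professor-and-student role setup, in which the student reports the analysis results to the professor.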
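The role setup can be sketched without calling the API: one system message sets the professor role, an optional second system message carries the tree text, and the user message carries the student's question. `build_messages` is a hypothetical helper, and the shortened prompt strings are placeholders rather than the notebook's exact wording.

```python
def build_messages(question, tree_text=None):
    """Assemble chat messages: professor role, optional tree context, then the student's question."""
    messages = [
        {"role": "system", "content": "You are a professor specializing in mouse brain connectivity."}
    ]
    if tree_text is not None:
        # Extra context goes in as a second system message, separated by a rule
        messages.append(
            {"role": "system", "content": "You know the mouse brain structures below.\n----\n" + tree_text}
        )
    messages.append({"role": "user", "content": question})
    return messages


# Two variants that differ only in whether the tree is included
without_tree = build_messages("What does FRP project to?")
with_tree = build_messages("What does FRP project to?", tree_text="root(root)")
```

Keeping message assembly separate from the API call makes it easy to compare prompt variants before any tokens are spent.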

In [96]:
def request_for_insight_on_analysis1(is_sending_tree: bool = True, is_sending_list: bool = True):
    # Build the student's question; include the STEP3 result list only when requested
    if is_sending_list:
        query = """
        I'm your student. I analyzed data and realized that injections to """ + selected_structure_name + """ projected mainly to the brain structures below. Would you give me some insight into the result? I would be glad if you could tell me as many details as you can.
        ----
        """ + selected_structure_list
    else:
        query = """
        I'm your student. I'm studying which mouse brain regions receive projections from injections to """ + selected_structure_name + """. Do you know which parts of the mouse brain """ + selected_structure_name + """ mainly projects to? I would be glad if you could tell me as many details as you can.
        """

    # The system message sets the professor role and describes the experiment
    msg = []
    msg.append({"role": "system", "content": "You are a professor specializing in mouse brain connectivity, and you have a student who is analyzing experimental data and studying the neural connections in the mouse brain. In this experiment, a viral tracer is injected into a specimen that labels the axons by expressing a fluorescent protein, and the labeled axons are visualized using serial two-photon tomography."})
    if is_sending_tree:
        # Optionally pass the filtered brain structure tree as extra context
        msg.append({"role": "system", "content": "You know the mouse brain structures described by the tree diagram below. \n----\n" + root_render_tree})

    msg.append({"role": "user", "content": query})

    # openai.ChatCompletion is only available in openai versions before 1.0
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=msg,
        temperature=0.2
    )
    #print(msg)
    #print(completion)
    return completion.choices[0].message.content

print("Completion A. Without brain structure tree, without analysis list")
print(request_for_insight_on_analysis1(False,False))
print("Completion B. Without brain structure tree, with analysis list")
print(request_for_insight_on_analysis1(False,True))
print("Completion C. With brain structure tree, with analysis list")
print(request_for_insight_on_analysis1(True,True))

Completion A. Without brain structure tree, without analysis list
Hello, student! I'm glad to see you're interested in the projections from the frontal pole (FRP) of the mouse brain. The FRP is a part of the prefrontal cortex, which is involved in various higher cognitive functions such as decision-making, working memory, and attention.

The FRP projects to several brain regions, including but not limited to:

1. Other regions within the prefrontal cortex: The FRP has reciprocal connections with other areas of the prefrontal cortex, such as the medial prefrontal cortex (mPFC) and the orbitofrontal cortex (OFC). These connections help in integrating information across different prefrontal areas.

2. Sensory and motor cortices: The FRP sends projections to the primary and secondary motor cortices (M1 and M2) and the primary somatosensory cortex (S1). These connections are crucial for the planning and execution of goal-directed actions.

3. Basal ganglia: The FRP projects to the striatum, a key component of the basal gang