# [STEP4] Generating insights with ChatGPT

Using the results obtained in STEP3 together with meta-information such as the data structure, we obtained insights through the OpenAI API.

In [None]:
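# Install Python libraries
# Note: Python 3.10.x is recommended
!python --version
!pip install python-dotenv
# The code below uses the legacy openai.ChatCompletion interface, which was removed in openai 1.0
!pip install --upgrade "openai<1.0"
!pip install "openai[datalib]<1.0"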

!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install plotly
!pip install scikit-learn
!pip install sqlalchemy
!pip install psycopg2-binary  # PostgreSQL driver that SQLAlchemy uses by default for postgresql:// URLs

!pip install anytree

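## Environment variables
Set the URL and API key for connecting to Supabase, and the API key for connecting to the OpenAI API.
Obtain an API key from your own OpenAI account.

https://platform.openai.com/account/api-keys

Ask the administrator for the Supabase information.

In the example below, the variables are written to a .env file and loaded in Jupyter Notebook.

Note: If creating a .env file is difficult, or the values cannot be read from it,
you can write the values directly in place of the os.getenv("...") calls; the code will still work.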
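Because `os.getenv` silently returns `None` for an unset variable, a missing entry in .env only surfaces later as a confusing connection error. The following is a minimal sketch of failing early instead. The helper name `require_env` and the `DEMO_*` variable names are hypothetical and are not part of the original notebook.

```python
import os


def require_env(names):
    """Return {name: value} for the given variables, raising if any is unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}


# Demo only: placeholder values under hypothetical names, so real settings are not touched.
os.environ["DEMO_DB_HOST"] = "localhost"
os.environ["DEMO_DB_PORT"] = "5432"

settings = require_env(["DEMO_DB_HOST", "DEMO_DB_PORT"])
print(settings)
```

In the notebook, the same check could be applied to `DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASS`, and `OPENAI_API_KEY` right after `load_dotenv()`.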

In [61]:
# Environment variables
import os
from dotenv import load_dotenv

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text
import urllib.parse
from IPython.display import display
from anytree import Node, RenderTree, AsciiStyle
import openai

load_dotenv()

# Supabase connection variables
db_host = os.getenv("DB_HOST")
db_port = os.getenv("DB_PORT")
db_name = os.getenv("DB_NAME")
db_user = os.getenv("DB_USER")
db_pass = os.getenv("DB_PASS")

# OPENAI API KEY
openai_api_key = os.getenv("OPENAI_API_KEY")
openai.api_key = openai_api_key


# Connect to the database
connection_config = {
    'user': db_user,
    'password': urllib.parse.quote_plus(db_pass),
    'host': db_host,
    'port': db_port, 
    'database': db_name
}
engine = create_engine('postgresql://{user}:{password}@{host}:{port}/{database}'.format(**connection_config))


print('Environment variables loaded')

Environment variables loaded


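# Running the analysis

## Analysis 1. Output projection relationships from coarse brain regions

### Converting the brain's hierarchical structure to text

Sending every brain structure from the root down as text through the API would exceed the token limit, so we filtered the tree down to the branches related to the STEP3 analysis results.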
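The filtering idea can be illustrated on a toy hierarchy: keep each selected node plus every ancestor on its path to the root, and drop everything else. The sketch below is a simplified variant and not the notebook's own code. The function name `collect_with_ancestors` and the toy ids are made up for illustration, and it assumes, as the notebook does, that the root's parent id is 0.

```python
def collect_with_ancestors(selected_ids, parent_of):
    """Return the selected ids plus all of their ancestors, in first-seen order."""
    kept = []
    seen = set()
    for node_id in selected_ids:
        current = node_id
        # Stop at the root (parent id 0) or at a node whose ancestors are already kept.
        while current and current not in seen:
            seen.add(current)
            kept.append(current)
            current = parent_of.get(current, 0)
    return kept


# Toy hierarchy (child -> parent); node 1 is the root and has parent 0.
parent_of = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}

result = collect_with_ancestors([4, 6], parent_of)
print(result)  # [4, 2, 1, 6, 3]
```

Node 5 is dropped because it is neither selected nor an ancestor of a selected node, which is what keeps the rendered tree small enough to fit in the prompt.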

In [62]:
# Output of STEP3, Analysis 1
selected_structure_ids = [672, 767, 962, 656, 1021, 629, 1101, 685, 328, 1020, 783, 320, 943, 844, 996, 262, 599, 630, 907, 440]
selected_structure_list="""
CP(Caudoputamen)\n
MOs5(Secondary motor area, layer 5)\n
MOs2/3(Secondary motor area, layer 2/3)\n
MOs1(Secondary motor area, layer 1)\n
MOs6a(Secondary motor area, layer 6a)\n
VAL(Ventral anterior-lateral complex of the thalamus)\n
AId5(Agranular insular area, dorsal part, layer 5)\n
VM(Ventral medial nucleus of the thalamus)\n
AId2/3(Agranular insular area, dorsal part, layer 2/3)\n
PO(Posterior complex of the thalamus)\n
AId6a(Agranular insular area, dorsal part, layer 6a)\n
MOp1(Primary motor area, Layer 1)\n
MOp2/3(Primary motor area, Layer 2/3)\n
MOp6a(Primary motor area, Layer 6a)\n
AId1(Agranular insular area, dorsal part, layer 1)\n
RT(Reticular nucleus of the thalamus)\n
CM(Central medial nucleus of the thalamus)\n
ORBl5(Orbital area, lateral part, layer 5)\n
PCN(Paracentral nucleus)\n
ORBl6a(Orbital area, lateral part, layer 6a)
"""



# Fetch every row of the structures table
sql ="""
SELECT
    id,
    name,
    acronym,
    "st-level",
    "parent-structure-id"
FROM
    structures;"""
with engine.begin() as conn:
    query = text(sql)
    df_structures = pd.read_sql_query(query, conn)

id_parent_dict = df_structures.set_index('id')['parent-structure-id'].to_dict()

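# Collect each selected structure together with all of its ancestors up to the root
tree_ids = []
for structure_id in selected_structure_ids:
    current_id = structure_id  # avoid shadowing the built-in id()
    if current_id not in tree_ids:
        tree_ids.append(current_id)
    # Walk up the parent chain; the loop ends once the parent id is no longer positive (the root)
    while id_parent_dict[current_id] > 0:
        current_id = id_parent_dict[current_id]
        if current_id not in tree_ids:
            tree_ids.append(current_id)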

df_structures.loc[:, 'label'] = df_structures['acronym'] + "(" + df_structures['name'] + ")"
df = df_structures[df_structures['id'].isin(tree_ids)]


# Create a dictionary of anytree Node objects
nodes={}
for index, row in df.iterrows():
    node = Node(row['label'], id=row['id'])
    nodes[row['id']] = node

# Iterate through the DataFrame, set parent for each node
for index, row in df.iterrows():
    if row['parent-structure-id'] > 0:
        nodes[row['id']].parent = nodes[row['parent-structure-id']]

# Find the root (the row whose parent-structure-id is 0) and render the filtered tree as ASCII text
root_node = nodes[df_structures.loc[df_structures['parent-structure-id']==0]['id'].values[0]]
root_render_tree=RenderTree(root_node, style=AsciiStyle()).by_attr()
print(root_render_tree)

root(root)
+-- grey(Basic cell groups and regions)
    |-- BS(Brain stem)
    |   +-- IB(Interbrain)
    |       +-- TH(Thalamus)
    |           |-- DORpm(Thalamus, polymodal association cortex related)
    |           |   |-- ILM(Intralaminar nuclei of the dorsal thalamus)
    |           |   |   |-- PCN(Paracentral nucleus)
    |           |   |   +-- CM(Central medial nucleus of the thalamus)
    |           |   |-- LAT(Lateral group of the dorsal thalamus)
    |           |   |   +-- PO(Posterior complex of the thalamus)
    |           |   +-- RT(Reticular nucleus of the thalamus)
    |           +-- DORsm(Thalamus, sensory-motor cortex related)
    |               +-- VENT(Ventral group of the dorsal thalamus)
    |                   |-- VM(Ventral medial nucleus of the thalamus)
    |                   +-- VAL(Ventral anterior-lateral complex of the thalamus)
    +-- CH(Cerebrum)
        |-- CNU(Cerebral nuclei)
        |   +-- STR(Striatum)
        |       +-- STRd(Striatum do

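### Obtaining insights with the OpenAI Chat Completion API

Using a professor-and-student role setup, we called the Chat Completion API with a prompt in which the student reports the analysis results to the professor.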
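The role setup amounts to a message list with three parts: a system message that sets the persona, a system message that supplies background knowledge (the filtered tree), and a user message carrying the student's report. The sketch below only assembles such a list and does not call the API. The helper name `build_messages` and the shortened example strings are illustrative assumptions.

```python
def build_messages(persona: str, tree_text: str, question: str) -> list:
    """Assemble a chat prompt: persona, background knowledge, then the student's question."""
    return [
        {"role": "system", "content": persona},
        {"role": "system", "content": "Brain structure tree:\n----\n" + tree_text},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    persona="You are a professor majoring in mouse brain connectivity.",
    tree_text="root(root)\n+-- grey(Basic cell groups and regions)",
    question="I'm your student. Would you give me some insight into this result?",
)

for message in messages:
    print(message["role"], "->", message["content"][:40])
```

Keeping prompt assembly separate from the API call makes it easy to inspect the exact text being sent, and its length, before spending tokens.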

In [67]:
def generate_sql(query: str):
    """Send the student's report to the chat model and return the reply text.

    Despite its name, this function returns a natural-language insight, not SQL.
    """
    msg = [
      {"role": "system", "content": "You are a professor majoring in mouse brain connectivity. And you have a student analyzing experiment data and studying the neural connections in the mouse brain. In this experiment, a viral tracer is injected into a specimen that labels the axons by expressing a fluorescent protein, and the labeled axons are visualized using serial two-photon tomography."},
      {"role": "system", "content": "You know a mouse brain structures like the below described by the tree diagram. \n----\n"+root_render_tree},
      {"role": "user", "content": query}
    ]
    # Legacy (pre-1.0) openai interface
    completion = openai.ChatCompletion.create(
      model="gpt-4",
      messages=msg,
      temperature=0.5
    )
    #print(msg)
    #print(completion)
    return completion.choices[0].message.content


query = """
I'm your student. I analyzed data and realized that injections to FRP(frontal pole) projected mainly to the brain structures below. Would you teach me some insight into the result? I am glad to tell me details as possible as you can.
----
"""+selected_structure_list


sql = generate_sql(query)

print(sql)

The results of your analysis show that the injections to the frontal pole (FRP) project to various brain structures, including the caudoputamen, secondary motor area layers, thalamic nuclei, and agranular insular area layers. Let's go through these structures and their potential roles in the mouse brain.

1. Caudoputamen (CP): The CP is part of the striatum, which is involved in motor and reward functions. The projection from the FRP to the CP suggests a potential role in modulating motor control and decision-making.

2. Secondary motor area (MOs) layers 1, 2/3, 5, and 6a: The MOs is involved in higher-order motor control and planning. Projections from the FRP to these layers suggest a role in integrating cognitive and motor functions for goal-directed behavior.

3. Thalamic nuclei: The projections to the ventral anterior-lateral complex (VAL), ventral medial nucleus (VM), posterior complex (PO), reticular nucleus (RT), central medial nucleus (CM), and paracentral nucleus (PCN) suggest