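# [STEP3,4 - Analysis 2] Output the pathways obtained by tracing projections backward from a specific brain region

As in [Analysis 1], this notebook shows the output of STEP3 through STEP4 for [Analysis 2].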


In [None]:
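# Install Python libraries
# Note: Python 3.10.x is recommended
!python --version
!pip install python-dotenv
# The code below uses the pre-1.0 openai interface (openai.ChatCompletion), so pin the version
!pip install --upgrade "openai<1"
!pip install "openai[datalib]<1"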

!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install plotly
!pip install scikit-learn
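!pip install sqlalchemy
# PostgreSQL driver that SQLAlchemy uses by default for postgresql:// URLs
!pip install psycopg2-binary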

!pip install anytree

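## Environment variables
Set the supabase connection settings (host, port, database name, user, and password) and the API key for the OpenAI API.
Obtain the API key from your own OpenAI account.

https://platform.openai.com/account/api-keys

Ask the administrator for the supabase information.

The example below writes the variables to a .env file and loads them in Jupyter Notebook.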
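
As a minimal sketch of the .env approach described above, a small helper can report missing values early instead of failing later at connection time. The helper name `require_env` is hypothetical and not part of python-dotenv; the variable names are the ones this notebook reads with `os.getenv()`.

```python
import os

# Variable names read later in this notebook via os.getenv().
REQUIRED_VARS = ["DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASS", "OPENAI_API_KEY"]

def require_env(names, environ=os.environ):
    """Return {name: value} for all names; raise if any is missing or empty."""
    missing = [n for n in names if not environ.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: environ[n] for n in names}

# Typical use, after load_dotenv() has run:
# settings = require_env(REQUIRED_VARS)
```

Passing `environ` explicitly makes the check easy to try with a plain dictionary before touching the real environment.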

Note: If you cannot create a .env file, or the values cannot be read from it,
you can write the values directly in place of the os.getenv("...") calls; the code will still work.

In [2]:
# Environment variables
import os
from dotenv import load_dotenv

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text
import urllib.parse
from IPython.display import display
from anytree import Node, RenderTree, AsciiStyle
import openai

load_dotenv()

# Variables for the supabase connection
db_host = os.getenv("DB_HOST")
db_port = os.getenv("DB_PORT")
db_name = os.getenv("DB_NAME")
db_user = os.getenv("DB_USER")
db_pass = os.getenv("DB_PASS")

# OPENAI API KEY
openai_api_key = os.getenv("OPENAI_API_KEY")
openai.api_key = openai_api_key

# Connect to the database
connection_config = {
    'user': db_user,
    'password': urllib.parse.quote_plus(db_pass),
    'host': db_host,
    'port': db_port, 
    'database': db_name
}
engine = create_engine('postgresql://{user}:{password}@{host}:{port}/{database}'.format(**connection_config))


print('Environment variables loaded')

Environment variables loaded


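# Running the analysis
## [STEP3] Output projection relationships for a broad brain region

We analyzed the projections from other brain regions to the Frontal Pole (ID: 184).
The analysis is restricted to brain regions that have no child nodes in the brain structure hierarchy.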
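
As a rough illustration of what "no child nodes" means here, the sketch below finds terminal structures in a toy table. The IDs are hypothetical; the `isin` expression mirrors the one used for `terminal_structure_ids` in the cell below.

```python
import pandas as pd

# Toy hierarchy (hypothetical IDs): 1 is the root, 2 and 3 are its children,
# and 4 is a child of 2.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "parent-structure-id": [0, 1, 1, 2],
})

# A structure is terminal when no other row lists it as a parent.
is_parent = toy["id"].isin(toy["parent-structure-id"])
terminal_ids = toy.loc[~is_parent, "id"].tolist()
print(terminal_ids)
```

In this toy table, structures 3 and 4 are terminal because neither appears in the parent column.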


In [21]:
# Structure ID of the projection target
# Example inputs:
#    Frontal Pole (ID:184)
#    Primary motor area (ID:985)
#    Secondary motor area (ID:993)
input_structure_id = 184
# Keep only the top {rank_num} entries, ordered by normalized projection volume (descending)
rank_num = 20


# Collect all descendants of the target structure
sql ="""
SELECT
    id,
    name,
    acronym,
    "st-level",
    "parent-structure-id"
FROM
    structures;"""
with engine.begin() as conn:
    query = text(sql)
    df_structures = pd.read_sql_query(query, conn)

structure_ids = [input_structure_id]
terminal_structure_ids = df_structures.loc[~df_structures['id'].isin(df_structures['parent-structure-id'])]['id']

def get_children(parent_id):
    return df_structures[df_structures["parent-structure-id"] == parent_id]

def build_hierarchy(parent_id, level=0):
    children = get_children(parent_id)
    hierarchy = {}
    for _, child in children.iterrows():
        child_id = child["id"]
        hierarchy[child_id] = {"level": level, "acronym": child['acronym'], "children": build_hierarchy(child_id, level + 1)}
        structure_ids.append(child_id)
    return hierarchy

hierarchy = build_hierarchy(input_structure_id)


# Projection data
sql ="""
SELECT
    sp.id,
    sp."structure-id" AS "injected-structure-id",
    s.name,
    s.acronym,
    p."normalized-projection-volume",
    p."structure-id"
FROM specimens AS sp
    INNER JOIN projections AS p ON p."experiment-id" = sp."experiment-id"
    INNER JOIN structures AS s ON sp."structure-id" = s.id
WHERE
    p."structure-id" IN :structure_ids
    AND p."is-injection" = false
"""
with engine.begin() as conn:
    query = text(sql)
    df = pd.read_sql_query(query, conn, params={"structure_ids": tuple(structure_ids)})

# Filter structures
df=df[~df['injected-structure-id'].isin(structure_ids)]

# display(df.sort_values(by="normalized-projection-volume", ascending=False))

# Calculate the average volume for each id
df_groupby = df.copy()
df_groupby.loc[:, 'label'] = df_groupby['acronym'] + "(" + df_groupby['name'] + ")"

average_volume = df_groupby.groupby('label')['normalized-projection-volume'].mean()

# Sort the average volume
sorted_average_volume = average_volume.sort_values(ascending=False)

injected_structure_list_ranking = sorted_average_volume.head(rank_num)
display(injected_structure_list_ranking)


# export only ids
average_volume = df_groupby.groupby('injected-structure-id')['normalized-projection-volume'].mean()
sorted_average_volume = average_volume.sort_values(ascending=False)
ids = sorted_average_volume.head(rank_num)
injected_structure_ids = ids.index[:rank_num].tolist()

print("Projection target structure ids")
print(structure_ids)

print("Projection source structure ids")
print(injected_structure_ids)

# Output of STEP3 - Analysis 2
df_selected_structure = df_structures[df_structures['id']==input_structure_id]
selected_structure_name = df_selected_structure['acronym'].values[0] + "(" + df_selected_structure['name'].values[0] + ")"
selected_structure_ids= structure_ids + injected_structure_ids
selected_structure_list=injected_structure_list_ranking.to_csv(sep='\t', index=True, header=False)


label
IO(Inferior olivary complex)                             0.232814
ORBl(Orbital area, lateral part)                         0.112609
NLOT(Nucleus of the lateral olfactory tract)             0.061361
CM(Central medial nucleus of the thalamus)               0.054840
GPe(Globus pallidus, external segment)                   0.053000
CLA(Claustrum)                                           0.050415
SMT(Submedial nucleus of the thalamus)                   0.048041
MOs(Secondary motor area)                                0.047001
AIv(Agranular insular area, ventral part)                0.046315
ORBvl(Orbital area, ventrolateral part)                  0.044013
MD(Mediodorsal nucleus of thalamus)                      0.040891
LAV(Lateral vestibular nucleus)                          0.040658
VM(Ventral medial nucleus of the thalamus)               0.039933
PCN(Paracentral nucleus)                                 0.037771
SLD(Sublaterodorsal nucleus)                             0.037230
TMv(

Projection target structure ids
[184, 526322264, 526157192, 526157196, 667, 68]
Projection source structure ids
[83, 723, 619, 599, 1022, 583, 366, 993, 119, 746, 362, 209, 685, 907, 358, 1, 104, 210, 629, 342]
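
The ranking above comes from averaging the normalized projection volume per injected structure and keeping the largest values. The following is a minimal sketch of that step on hypothetical rows; the helper name `top_sources` and the toy values are illustrative, not taken from the database.

```python
import pandas as pd

# Hypothetical rows: several experiments may share one injected structure.
toy = pd.DataFrame({
    "injected-structure-id": [10, 10, 20, 30],
    "normalized-projection-volume": [0.2, 0.4, 0.1, 0.5],
})

def top_sources(df, n):
    """Mean volume per injected structure, largest first, top n only."""
    mean_volume = df.groupby("injected-structure-id")["normalized-projection-volume"].mean()
    return mean_volume.sort_values(ascending=False).head(n)

ranking = top_sources(toy, 2)
print(ranking.index.tolist())
```

Averaging per structure keeps a region with many experiments from dominating the ranking simply because it has more rows.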


## [STEP3] Build the brain structure tree
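
The cell below keeps only the selected structures and their ancestors before rendering the tree. A minimal sketch of that ancestor walk follows, using a toy parent map with hypothetical IDs. It assumes, as the notebook's root lookup does, that the root's parent ID is 0; the function name `collect_with_ancestors` is illustrative.

```python
def collect_with_ancestors(selected_ids, parent_of):
    """Return the selected IDs plus every ancestor, in first-seen order."""
    kept = []
    for structure_id in selected_ids:
        current = structure_id
        # Stop at the root (parent 0) or at a node whose ancestors are already kept.
        while current and current not in kept:
            kept.append(current)
            current = parent_of.get(current, 0)
    return kept

# Toy parent map: 1 is the root; 2 and 3 are children of 1; 4 is under 2; 5 is under 3.
parent_of = {1: 0, 2: 1, 3: 1, 4: 2, 5: 3}
tree_ids_example = collect_with_ancestors([4, 5], parent_of)
print(tree_ids_example)
```

Restricting the tree to these IDs keeps the rendered diagram small enough to include in a prompt.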

In [23]:
# Load all rows of the structures table
sql ="""
SELECT
    id,
    name,
    acronym,
    "st-level",
    "parent-structure-id"
FROM
    structures;"""
with engine.begin() as conn:
    query = text(sql)
    df_structures = pd.read_sql_query(query, conn)

id_parent_dict = df_structures.set_index('id')['parent-structure-id'].to_dict()

tree_ids = []
for structure_id in selected_structure_ids:
    id = structure_id
    if id not in tree_ids:
        tree_ids.append(id)
    # Add the parent nodes to the filtered_nodes dictionary
    while id_parent_dict[id] > 0:
        id = id_parent_dict[id]
        if id not in tree_ids:
            tree_ids.append(id)

df_structures.loc[:, 'label'] = df_structures['acronym'] + "(" + df_structures['name'] + ")"
df = df_structures[df_structures['id'].isin(tree_ids)]


# Create a dictionary of anytree Node objects
nodes={}
for index, row in df.iterrows():
    node = Node(row['label'], id=row['id'])
    nodes[row['id']] = node

# Iterate through the DataFrame, set parent for each node
for index, row in df.iterrows():
    if row['parent-structure-id'] > 0:
        nodes[row['id']].parent = nodes[row['parent-structure-id']]

# Iterate through child nodes
root_node = nodes[df_structures.loc[df_structures['parent-structure-id']==0]['id'].values[0]]
root_render_tree=RenderTree(root_node, style=AsciiStyle()).by_attr()
print(root_render_tree)

root(root)
+-- grey(Basic cell groups and regions)
    |-- BS(Brain stem)
    |   |-- IB(Interbrain)
    |   |   |-- HY(Hypothalamus)
    |   |   |   +-- MEZ(Hypothalamic medial zone)
    |   |   |       +-- MBO(Mammillary body)
    |   |   |           |-- TM(Tuberomammillary nucleus)
    |   |   |           |   +-- TMv(Tuberomammillary nucleus, ventral part)
    |   |   |           +-- LM(Lateral mammillary nucleus)
    |   |   +-- TH(Thalamus)
    |   |       |-- DORpm(Thalamus, polymodal association cortex related)
    |   |       |   |-- MED(Medial group of the dorsal thalamus)
    |   |       |   |   |-- SMT(Submedial nucleus of the thalamus)
    |   |       |   |   +-- MD(Mediodorsal nucleus of thalamus)
    |   |       |   +-- ILM(Intralaminar nuclei of the dorsal thalamus)
    |   |       |       |-- PCN(Paracentral nucleus)
    |   |       |       +-- CM(Central medial nucleus of the thalamus)
    |   |       +-- DORsm(Thalamus, sensory-motor cortex related)
    |   |         

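## [STEP4] Gain insights using the OpenAI Chat Completion API

Using a professor-student role-play, we call the Chat Completion API with the prompt framed as a student reporting analysis results to a professor.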
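
The role-play is expressed through the message roles: system messages set up the professor persona (and optionally the tree), and the user message carries the student's report. A minimal sketch of assembling such a list, without calling the API, is shown here; `build_messages` and the placeholder strings are hypothetical.

```python
def build_messages(system_prompt, user_query, tree_text=None):
    """Assemble chat messages: persona first, optional tree, then the student's question."""
    messages = [{"role": "system", "content": system_prompt}]
    if tree_text is not None:
        # The tree is sent as an additional system message, separated by a divider.
        messages.append({"role": "system", "content": "Brain structure tree:\n----\n" + tree_text})
    messages.append({"role": "user", "content": user_query})
    return messages

example = build_messages(
    "You are a professor majoring in mouse brain connectivity.",
    "I'm your student. Would you teach me some insight into the result?",
    tree_text="root(root)",
)
print([m["role"] for m in example])
```

Keeping the tree optional makes it straightforward to compare completions with and without that context.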

In [24]:
def request_for_insight_on_analysis2(is_sending_tree: bool = True, is_sending_list: bool = True):
    query="""
      I'm your student. I'm studying what mouse brain regions are projecting to """+selected_structure_name+""" in the injection experiment.  Do you know what kinds of mouse brain parts are projecting mainly to """+selected_structure_name+"""? I am glad to tell me details as possible as you can.
    """
    if is_sending_list:
      query = """
        I'm your student. I analyzed data and realized that the injected brain structures below were projecting to """+selected_structure_name+""". Would you teach me some insight into the result? I am glad to tell me details as possible as you can.
        ----
        """+selected_structure_list
    
    msg=[]
    msg.append({"role": "system", "content": "You are a professor majoring in mouse brain connectivity. And you have a student analyzing experiment data and studying the neural connections in the mouse brain. In this experiment, a viral tracer is injected into a specimen that labels the axons by expressing a fluorescent protein, and the labeled axons are visualized using serial two-photon tomography."})
    if is_sending_tree:
      msg.append({"role": "system", "content": "You know a mouse brain structures like the below described by the tree diagram. \n----\n"+root_render_tree})

    msg.append({"role": "user", "content": query})
    
    completion = openai.ChatCompletion.create(
      model="gpt-4",
      messages=msg,
      temperature=0.2
    )
    #print(msg)
    #print(completion)
    return completion.choices[0].message.content

print("Completion A. No brain structure tree, no analysis list")
print(request_for_insight_on_analysis2(False,False))
print("Completion B. No brain structure tree, with analysis list")
print(request_for_insight_on_analysis2(False,True))
print("Completion C. With brain structure tree, with analysis list")
print(request_for_insight_on_analysis2(True,True))

Completion A. No brain structure tree, no analysis list
Hello! I'm glad to see you're making progress in your research on mouse brain connectivity. The frontal pole (FRP) of the cerebral cortex is an important region involved in higher cognitive functions, decision-making, and social behaviors. In mice, the FRP receives projections from several brain regions, including but not limited to:

1. Medial prefrontal cortex (mPFC): This region is involved in executive functions, decision-making, and social behaviors. It has strong reciprocal connections with the FRP, allowing for communication between these two areas.

2. Orbitofrontal cortex (OFC): The OFC is involved in reward processing, decision-making, and emotion regulation. It projects to the FRP, providing information about the value of stimuli and potential outcomes of actions.

3. Anterior cingulate cortex (ACC): The ACC is involved in error detection, conflict monitoring, and emotional processing. It sends projections to the FRP, which may contribute to t