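# [STEP3,4 - Analysis 1] Output projections from a coarse brain region

This notebook analyzes the projections from a given brain region to other brain regions.
Only target regions that have no child nodes in the brain structure hierarchy are analyzed.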

In [None]:
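# Install the Python libraries
# Note: Python 3.10.x is recommended
!python --version
!pip install python-dotenv
# The openai.ChatCompletion interface used in STEP4 was removed in openai 1.0, so pin an earlier release
!pip install "openai[datalib]<1.0"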

!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install plotly
!pip install scikit-learn
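!pip install sqlalchemy
# PostgreSQL driver used by SQLAlchemy's default postgresql:// dialect (assumed not to be preinstalled)
!pip install psycopg2-binary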

!pip install anytree

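## Environment variables
Set the Supabase database connection settings (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASS) and the OpenAI API key (OPENAI_API_KEY).
Get an API key from your own OpenAI account:

https://platform.openai.com/account/api-keys

Ask the administrator for the Supabase connection details.

The example below writes the variables to a .env file and loads them in the Jupyter Notebook.

Note: if you cannot create a .env file, or the values cannot be read from it,
you can write the values directly in place of the os.getenv("...") calls; the code will still run.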
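As a hedged illustration of the configuration step above, the sketch below reads the six variables and reports any that are missing before a connection is attempted. The helper name `read_settings` is hypothetical and not part of this notebook; the variable names are the ones read in the next cell.

```python
import os

# Variable names read by the notebook's connection cell
REQUIRED_VARS = ["DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASS", "OPENAI_API_KEY"]

def read_settings(environ=None):
    """Return a dict of the required settings, or raise if any are missing.

    `environ` defaults to os.environ; passing a plain dict makes the helper easy to test.
    (Hypothetical helper, for illustration only.)
    """
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `read_settings()` right after `load_dotenv()` would stop with the list of missing names instead of a less descriptive error further down the cell.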

In [2]:
# Environment variables
import os
from dotenv import load_dotenv

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text
import urllib.parse
from IPython.display import display
from anytree import Node, RenderTree, AsciiStyle
import openai

load_dotenv()

# Supabase connection variables
db_host = os.getenv("DB_HOST")
db_port = os.getenv("DB_PORT")
db_name = os.getenv("DB_NAME")
db_user = os.getenv("DB_USER")
db_pass = os.getenv("DB_PASS")

# OPENAI API KEY
openai_api_key = os.getenv("OPENAI_API_KEY")
openai.api_key = openai_api_key

# Connect to the database
connection_config = {
    'user': db_user,
    'password': urllib.parse.quote_plus(db_pass),
    'host': db_host,
    'port': db_port, 
    'database': db_name
}
engine = create_engine('postgresql://{user}:{password}@{host}:{port}/{database}'.format(**connection_config))


print('Environment variables loaded')

Environment variables loaded


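# Run the analysis
## [STEP3] Output projections from a coarse brain region

This step analyzes the projections from the Frontal Pole (ID: 184) to other brain regions.
Only target regions that have no child nodes in the brain structure hierarchy are analyzed.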
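The two hierarchy operations this step relies on, finding structures without children and collecting every descendant of the input structure, can be sketched in plain Python. The small `parent_of` table is made-up toy data, not rows from the `structures` table, and both function names are hypothetical.

```python
def terminal_ids(parent_of):
    """Return the ids that never appear as a parent (leaf structures)."""
    parents = set(parent_of.values())
    return sorted(i for i in parent_of if i not in parents)

def descendants(parent_of, root_id):
    """Return root_id followed by all of its descendants, depth first."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    found, stack = [], [root_id]
    while stack:
        node = stack.pop()
        found.append(node)
        # reversed() keeps siblings in their original order when popped
        stack.extend(reversed(children.get(node, [])))
    return found

# Toy hierarchy (child id -> parent id); 0 marks "no parent"
parent_of = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
print(terminal_ids(parent_of))    # [3, 4, 5]
print(descendants(parent_of, 2))  # [2, 4, 5]
```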


In [4]:
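# Structure id of the injection site (projection source)
# Example inputs:
#    Frontal Pole (ID:184)
#    Primary motor area (ID:985)
#    Secondary motor area (ID:993)
input_structure_id = 184
# Output only the top {rank_num} entries, ordered by normalized projection volume (largest first)
rank_num = 20


# Load all structures so that every descendant of the input structure can be collected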
sql ="""
SELECT
    id,
    name,
    acronym,
    "st-level",
    "parent-structure-id"
FROM
    structures;"""
with engine.begin() as conn:
    query = text(sql)
    df_structures = pd.read_sql_query(query, conn)

structure_ids = [input_structure_id]
terminal_structure_ids = df_structures.loc[~df_structures['id'].isin(df_structures['parent-structure-id'])]['id']

def get_children(parent_id):
    return df_structures[df_structures["parent-structure-id"] == parent_id]

def build_hierarchy(parent_id, level=0):
    children = get_children(parent_id)
    hierarchy = {}
    for _, child in children.iterrows():
        child_id = child["id"]
        hierarchy[child_id] = {"level": level, "acronym": child['acronym'], "children": build_hierarchy(child_id, level + 1)}
        structure_ids.append(child_id)
    return hierarchy

hierarchy = build_hierarchy(input_structure_id)


# Projection data
sql ="""
SELECT
    p.id,
    p."structure-id" AS "projected-structure-id",
    s.name,
    s.acronym,
    p."normalized-projection-volume",
    sp."structure-id"
FROM projections AS p
    INNER JOIN specimens AS sp ON p."experiment-id" = sp."experiment-id"
    INNER JOIN structures AS s ON p."structure-id" = s.id
WHERE
    sp."structure-id" IN :structure_ids
    AND p."is-injection" = false
    AND s."st-level" > 5;
"""
with engine.begin() as conn:
    query = text(sql)
    df = pd.read_sql_query(query, conn, params={"structure_ids": tuple(structure_ids)})


# Keep only terminal structures (no child nodes) and drop the injected structures themselves
df = df[df['projected-structure-id'].isin(terminal_structure_ids)]
df = df[~df['projected-structure-id'].isin(structure_ids)]

# display(df.sort_values(by="normalized-projection-volume", ascending=False))

# Calculate the average volume for each id
df_groupby = df.copy()
df_groupby.loc[:, 'label'] = df_groupby['acronym'] + "(" + df_groupby['name'] + ")"

average_volume = df_groupby.groupby('label')['normalized-projection-volume'].mean()

# Sort the average volume
sorted_average_volume = average_volume.sort_values(ascending=False)

projected_structure_list_ranking = sorted_average_volume.head(rank_num)
display(projected_structure_list_ranking)


# Export only the ids
average_volume = df_groupby.groupby('projected-structure-id')['normalized-projection-volume'].mean()
sorted_average_volume = average_volume.sort_values(ascending=False)
ids = sorted_average_volume.head(rank_num)
# Note: despite its name, this list holds the top-ranked projection target ids
injected_structure_ids = ids.index[:rank_num].tolist()

# structure_ids holds the injected structure and its descendants (projection sources)
print("Projection source structure ids")
print(structure_ids)

print("Projection target structure ids")
print(injected_structure_ids)

# Output of STEP3 - Analysis 1
df_selected_structure = df_structures[df_structures['id']==input_structure_id]
selected_structure_name = df_selected_structure['acronym'].values[0] + "(" + df_selected_structure['name'].values[0] + ")"
selected_structure_ids = structure_ids + injected_structure_ids
selected_structure_list = projected_structure_list_ranking.to_csv(sep='\t', index=True, header=False)


label
CP(Caudoputamen)                                          2.440266
MOs5(Secondary motor area, layer 5)                       0.961629
MOs2/3(Secondary motor area, layer 2/3)                   0.715561
MOs1(Secondary motor area, layer 1)                       0.510339
MOs6a(Secondary motor area, layer 6a)                     0.338765
VAL(Ventral anterior-lateral complex of the thalamus)     0.311510
AId5(Agranular insular area, dorsal part, layer 5)        0.301772
VM(Ventral medial nucleus of the thalamus)                0.287174
AId2/3(Agranular insular area, dorsal part, layer 2/3)    0.217278
PO(Posterior complex of the thalamus)                     0.169447
AId6a(Agranular insular area, dorsal part, layer 6a)      0.156987
MOp1(Primary motor area, Layer 1)                         0.124162
MOp2/3(Primary motor area, Layer 2/3)                     0.120033
MOp6a(Primary motor area, Layer 6a)                       0.111396
AId1(Agranular insular area, dorsal part, layer 1)      

Projection source structure ids
[184, 526322264, 526157192, 526157196, 667, 68]
Projection target structure ids
[672, 767, 962, 656, 1021, 629, 1101, 685, 328, 1020, 783, 320, 943, 844, 996, 262, 599, 630, 907, 440]
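The ranking shown above averages the normalized projection volume over all experiments for each target structure and keeps the largest values. A minimal standard-library sketch of that aggregation follows; the records are invented numbers for illustration only, and `top_targets` is a hypothetical name.

```python
from collections import defaultdict

def top_targets(records, rank_num):
    """Average the volume per target id and return the top rank_num (id, mean) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for target_id, volume in records:
        sums[target_id] += volume
        counts[target_id] += 1
    means = {t: sums[t] / counts[t] for t in sums}
    # Sort by mean volume, largest first
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:rank_num]

# (target structure id, normalized projection volume) from several experiments; toy values
records = [(672, 2.0), (672, 3.0), (767, 1.0), (962, 0.5), (962, 1.0)]
print(top_targets(records, 2))  # [(672, 2.5), (767, 1.0)]
```

Averaging per target, rather than summing, keeps structures that appear in many experiments from dominating the ranking simply because they have more rows.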


## [STEP3] Build the brain structure tree
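The cell below keeps only the selected structures plus all of their ancestors, so that the pruned tree stays connected to the root. That ancestor walk can be sketched on its own as follows; the `parent_of` mapping is toy data and `with_ancestors` is a hypothetical name, not a function defined in this notebook.

```python
def with_ancestors(selected_ids, parent_of):
    """Return the selected ids plus all of their ancestors, in first-seen order."""
    kept = []
    for structure_id in selected_ids:
        current = structure_id
        # Stop at the root (parent id 0) or at a node whose ancestors are already kept
        while current and current not in kept:
            kept.append(current)
            current = parent_of.get(current, 0)
    return kept

# Toy hierarchy (child id -> parent id); 0 marks "no parent"
parent_of = {1: 0, 2: 1, 3: 1, 4: 2, 5: 3}
print(with_ancestors([4, 5], parent_of))  # [4, 2, 1, 5, 3]
```

Stopping at an already-kept node is safe because each walk adds its chain up to the root or up to a kept node, so every kept node already has all of its ancestors in the list.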

In [5]:
# Load all structures
sql ="""
SELECT
    id,
    name,
    acronym,
    "st-level",
    "parent-structure-id"
FROM
    structures;"""
with engine.begin() as conn:
    query = text(sql)
    df_structures = pd.read_sql_query(query, conn)

id_parent_dict = df_structures.set_index('id')['parent-structure-id'].to_dict()

tree_ids = []
for structure_id in selected_structure_ids:
    current_id = structure_id
    if current_id not in tree_ids:
        tree_ids.append(current_id)
    # Walk up the hierarchy and add every ancestor so the tree stays connected to the root
    while id_parent_dict[current_id] > 0:
        current_id = id_parent_dict[current_id]
        if current_id not in tree_ids:
            tree_ids.append(current_id)

df_structures.loc[:, 'label'] = df_structures['acronym'] + "(" + df_structures['name'] + ")"
df = df_structures[df_structures['id'].isin(tree_ids)]


# Create a dictionary of anytree Node objects
nodes={}
for index, row in df.iterrows():
    node = Node(row['label'], id=row['id'])
    nodes[row['id']] = node

# Iterate through the DataFrame, set parent for each node
for index, row in df.iterrows():
    if row['parent-structure-id'] > 0:
        nodes[row['id']].parent = nodes[row['parent-structure-id']]

# Render the tree as text, starting from the root node (the structure whose parent id is 0)
root_node = nodes[df_structures.loc[df_structures['parent-structure-id']==0]['id'].values[0]]
root_render_tree = RenderTree(root_node, style=AsciiStyle()).by_attr()
print(root_render_tree)

root(root)
+-- grey(Basic cell groups and regions)
    |-- BS(Brain stem)
    |   +-- IB(Interbrain)
    |       +-- TH(Thalamus)
    |           |-- DORpm(Thalamus, polymodal association cortex related)
    |           |   |-- ILM(Intralaminar nuclei of the dorsal thalamus)
    |           |   |   |-- PCN(Paracentral nucleus)
    |           |   |   +-- CM(Central medial nucleus of the thalamus)
    |           |   |-- LAT(Lateral group of the dorsal thalamus)
    |           |   |   +-- PO(Posterior complex of the thalamus)
    |           |   +-- RT(Reticular nucleus of the thalamus)
    |           +-- DORsm(Thalamus, sensory-motor cortex related)
    |               +-- VENT(Ventral group of the dorsal thalamus)
    |                   |-- VM(Ventral medial nucleus of the thalamus)
    |                   +-- VAL(Ventral anterior-lateral complex of the thalamus)
    +-- CH(Cerebrum)
        |-- CNU(Cerebral nuclei)
        |   +-- STR(Striatum)
        |       +-- STRd(Striatum do

## [STEP4] Get insights using the OpenAI Chat Completions API

The analysis results are reported to the model through a professor-and-student role-play, using the Chat Completions API.
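In the Chat Completions format, a role-play like this is expressed as a list of role-tagged messages: a system message sets up the professor, an optional second system message supplies the tree as context, and the user message carries the student's question. The sketch below only assembles that list and makes no API call; `build_messages` is a hypothetical helper and the strings are placeholders, not the prompts used in this notebook.

```python
def build_messages(system_prompt, question, tree_text=None):
    """Assemble chat messages: system role first, optional tree context, then the user question."""
    messages = [{"role": "system", "content": system_prompt}]
    if tree_text is not None:
        # Extra context is sent as a second system message
        messages.append({"role": "system", "content": "Brain structure tree:\n----\n" + tree_text})
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_messages("You are a professor.", "What does FRP project to?", tree_text="root\n+-- grey")
print([m["role"] for m in msgs])  # ['system', 'system', 'user']
```

Keeping message assembly separate from the API call makes it straightforward to inspect exactly what each of the three variants sends.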

In [6]:
def request_for_insight_on_analysis1(is_sending_tree: bool = True, is_sending_list: bool = True):
    # Default question: ask what the model already knows, without sending the analysis result
    query = """
      I'm your student. I'm studying which mouse brain regions receive projections from injections into """+selected_structure_name+""". Do you know which mouse brain regions """+selected_structure_name+""" mainly projects to? I would be glad if you could tell me as many details as you can.
    """
    if is_sending_list:
        # Replace the question with one that includes the ranked list from STEP3
        query = """
        I'm your student. I analyzed data and found that injections into """+selected_structure_name+""" projected mainly to the brain structures below. Would you give me some insight into the result? I would be glad if you could tell me as many details as you can.
        ----
        """+selected_structure_list

    msg = []
    msg.append({"role": "system", "content": "You are a professor specializing in mouse brain connectivity. You have a student who analyzes experimental data and studies the neural connections in the mouse brain. In this experiment, a viral tracer that labels axons by expressing a fluorescent protein is injected into a specimen, and the labeled axons are visualized using serial two-photon tomography."})
    if is_sending_tree:
        msg.append({"role": "system", "content": "You know the mouse brain structures described by the tree diagram below. \n----\n"+root_render_tree})

    msg.append({"role": "user", "content": query})

    # Legacy interface: openai.ChatCompletion is available only in openai < 1.0
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=msg,
        temperature=0.2
    )
    # print(msg)
    # print(completion)
    return completion.choices[0].message.content

print("Completion A. Without brain structure tree, without analysis list")
print(request_for_insight_on_analysis1(False,False))
print("Completion B. Without brain structure tree, with analysis list")
print(request_for_insight_on_analysis1(False,True))
print("Completion C. With brain structure tree, with analysis list")
print(request_for_insight_on_analysis1(True,True))

Completion A. Without brain structure tree, without analysis list
Hello, I'm glad to help you with your research on the frontal pole (FRP) projections in the mouse brain. The frontal pole is a part of the prefrontal cortex, which is involved in various higher cognitive functions such as decision-making, working memory, and social behavior.

The FRP has been found to project to several brain regions, including:

1. Other cortical areas: The FRP has strong reciprocal connections with other regions of the prefrontal cortex, such as the medial prefrontal cortex (mPFC) and the orbitofrontal cortex (OFC). Additionally, it projects to other cortical areas like the motor cortex, somatosensory cortex, and the cingulate cortex.

2. Subcortical structures: The FRP sends projections to various subcortical structures, such as the striatum (caudate-putamen and nucleus accumbens), the thalamus (mediodorsal and ventromedial nuclei), the hypothalamus, and the amygdala (basolateral and central nuclei).

3. Brainstem regions: The FRP als