# Machine Learning in Microbiome
## 1. Prerequisite software
+ Python
+ PyTorch
+ SciPy
+ scikit-learn
## 2. Basic structure of neural networks
### Introduction and Layers of neural networks

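Artificial Neural Networks (ANNs) are computational models composed of interconnected nodes.

They are based on the following assumptions:
1. Information is processed in simple ways by many neurons, but is stored across the whole network;
2. Signals are passed between neurons in different layers over links that carry associated weights;
3. Weighted input signals: each input to a neuron is multiplied by the weight of its link to give a weighted input signal;
4. Each neuron applies an activation function to the sum of its weighted input signals to determine its output signal.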
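
Assumptions 3 and 4 can be written as one expression. The symbols are assumed here for illustration: $x_i$ are the inputs to a neuron, $w_i$ the weights on its incoming links, $b$ a bias term, and $f$ the activation function.

$$
y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)
$$

For instance, with $x = (1, 2)$, $w = (0.5, -0.25)$, $b = 0$ and a logistic $f$, the weighted sum is $0.5 - 0.5 = 0$ and the output is $f(0) = 0.5$.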

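Neural networks can recognize hidden patterns, establish correlations between variables, and cluster and analyze raw data. They learn the relationship between inputs and outputs from data, generalize and infer from the training data, reveal hidden relationships and patterns, and make predictions.

A neural network consists of a large number of simple processing elements, called neurons or nodes, organized into layers whose nodes are individually independent but interconnected. In short, a neural network is just a set of interconnected nodes.

Each neuron has an internal state called its activation or activity level, which is determined by an activation function. Input data passes through the activation function inside the neuron to produce an output, and that output is then passed on as input to several neurons in the next layer. A neuron is made up of its inputs, weights, a bias, and an activation function. Each input is associated with one weight.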
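
The description of a neuron above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the logistic activation, and all numbers are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

def layer_output(inputs, weight_rows, biases):
    """One output per neuron; every neuron in the layer sees the same inputs."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical numbers chosen only for illustration.
x = [1.0, 2.0]
hidden = layer_output(x, weight_rows=[[0.5, -0.25], [0.0, 0.0]], biases=[0.0, 0.0])
# The hidden layer's outputs become the inputs of the next neuron.
output = neuron_output(hidden, weights=[1.0, -1.0], bias=0.0)
print(hidden, output)
```

Each hidden neuron here receives a weighted sum of zero, so the example mainly shows how the output of one layer is handed to the next.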

### Train a Neural Network and Delta Learning Rule

In the training set, the correct output for each example is known. During the training phase, the network therefore computes the error of its output, and these errors are used to update the network's weights.

Delta Learning Rule:

The delta rule works by adjusting the weights of the network so that the difference between the network's output and the desired output is minimized. This is done by computing the error between the network's output and the desired output, and then using that error to adjust the weights. The adjustment to the weights is proportional to the error and the input, and is given by the following formula:

Δw = α(d - y)x

where:

+ Δw is the change in the weight
+ α is the learning rate, which controls the step size of the weight update
+ d is the desired output
+ y is the network's output
+ x is the input

The delta rule is used to **train single-layer networks**.
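
As a minimal sketch of the rule, the loop below trains a single linear unit $y = w \cdot x$ by applying $\Delta w = \alpha (d - y) x$ after every example. The function name, the learning rate, and the toy data (generated from $d = 2x_1 - x_2$) are assumptions made for illustration.

```python
def train_delta_rule(samples, alpha=0.1, epochs=200):
    """Train one linear unit y = w . x with the delta rule."""
    n_inputs = len(samples[0][0])
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, d in samples:
            # Network output for the current weights
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Delta rule: each weight moves by alpha * (d - y) * x_i
            w = [wi + alpha * (d - y) * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical data generated by d = 2*x1 - 1*x2
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = train_delta_rule(samples)
print(weights)
```

With these noise-free samples the learned weights approach the generating values 2 and -1; the very first update, starting from zero weights, changes the first weight by $0.1 \times (2 - 0) \times 1 = 0.2$.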

### Generalized delta rule

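The generalized delta rule can be used to train multi-layer networks, including feedforward networks with hidden layers. The weight update keeps the form of the delta learning rule, $\Delta w_{ij} = \alpha \delta_j x_i$, but the error term $\delta_j$ now includes the derivative of the activation function, and for hidden units it is obtained by propagating the output errors backward through the weights (backpropagation).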
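
A minimal sketch of one such update for a network with two inputs, two hidden sigmoid units, and one sigmoid output is shown below. The architecture, the omission of biases, the starting weights, and the training example are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return hidden activations and the network output (biases omitted for brevity)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, y

def backprop_step(x, d, w_hidden, w_out, alpha=0.5):
    """One generalized-delta-rule update for a single training example."""
    h, y = forward(x, w_hidden, w_out)
    # Output unit: error times the sigmoid derivative y * (1 - y)
    delta_out = (d - y) * y * (1.0 - y)
    # Hidden units: the output delta propagated back through the outgoing weight
    delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    new_w_out = [w_out[j] + alpha * delta_out * h[j] for j in range(len(h))]
    new_w_hidden = [[w_hidden[j][i] + alpha * delta_hidden[j] * x[i] for i in range(len(x))]
                    for j in range(len(h))]
    return new_w_hidden, new_w_out

# Hypothetical starting weights and a single example, for illustration only.
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.5, -0.5]
x, d = [1.0, 0.5], 1.0

error_before = abs(d - forward(x, w_hidden, w_out)[1])
for _ in range(200):
    w_hidden, w_out = backprop_step(x, d, w_hidden, w_out)
error_after = abs(d - forward(x, w_hidden, w_out)[1])
print(error_before, error_after)
```

Repeating the step on the same example shrinks the output error, which is the behaviour the rule is meant to produce; a real training run would cycle over many examples.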

### Gradient descent

Gradient descent is used to update the weight values ($w_{ij}$) and thereby optimize the model.

#### Stochastic gradient descent

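1. Draw a training example from the training dataset;
2. Feed the example forward through the neural network;
3. Compute the gradient of the loss with respect to each $w_{ij}$;
4. Update $w_{ij}$ using the gradient from step 3;
5. Repeat steps 1-4 over the training set.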
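
The five steps can be sketched on a one-variable linear model. This is an illustrative sketch only: the model $y = wx + b$, the squared-error loss, the learning rate, and the toy data are assumptions, not part of the notes.

```python
import random

def sgd_linear_fit(data, lr=0.1, epochs=1000, seed=0):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(data)  # work on a copy so the caller's list is not reordered
    w, b = 0.0, 0.0
    for _ in range(epochs):              # step 5: repeat over the training set
        rng.shuffle(data)
        for x, target in data:           # steps 1-2: take one example, feed it forward
            error = (w * x + b) - target
            # step 3: gradient of 0.5 * error**2 with respect to w and b
            grad_w, grad_b = error * x, error
            # step 4: update the weights with that gradient
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated by y = 2x + 1
data = [(x / 4, 2.0 * (x / 4) + 1.0) for x in range(5)]
w, b = sgd_linear_fit(data)
print(w, b)
```

Because the weights change after every single example, the path toward the solution is noisy, but each update is cheap.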

#### Batch gradient descent
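
Batch gradient descent, as commonly defined, averages the gradient over the whole training set and applies one weight update per pass, in contrast to the per-example updates of the stochastic variant. The sketch below assumes a linear model $y = wx + b$, a squared-error loss, and toy data; all names and values are illustrative.

```python
def batch_gd_linear_fit(data, lr=0.5, epochs=1000):
    """Fit y = w*x + b with one averaged-gradient update per pass over the data."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Accumulate the gradient over the whole training set first
        grad_w = sum(((w * x + b) - t) * x for x, t in data) / n
        grad_b = sum(((w * x + b) - t) for x, t in data) / n
        # Then apply a single update for this pass
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated by y = 2x + 1
data = [(x / 4, 2.0 * (x / 4) + 1.0) for x in range(5)]
w, b = batch_gd_linear_fit(data)
print(w, b)
```

Each update uses every example, so the steps are smoother than in the stochastic variant but cost a full pass over the data.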



In [30]:
# Step 1: Import the required libraries
import pandas as pd
from bioservices import KEGG

# Step 2: Initialize the KEGG API
kegg = KEGG()

# Retrieve one entry with get(); the original kegg_obj.get_entry call
# raised "TypeError: 'NoneType' object is not callable"
entry_id = "hsa:10050"
entry_data = kegg.get(entry_id)

# Print the entry data
print(entry_data)

In [17]:
# Step 3: Load the bacteria names from a file
bacteria_names = pd.read_csv("bacteria_names.csv")

# Step 4: Map the bacteria names to KEGG IDs
# kegg_obj = kegg
# bacteria_ids = [kegg_obj.get_entry(bac, "name") for bac in bacteria_names['bacteria']]
# bacteria_ids = [bid.split(":")[1] for bid in bacteria_ids if bid != '']

kegg_obj = kegg


In [18]:
bacteria_names

Unnamed: 0,bacteria
0,Rhodococcus sp. C-2
1,Rhodococcus sp. IEGM 1401
2,Rhodococcus sp. IEGM 1276
3,Rhodococcus sp. JS3074
4,Rhodococcus sp. H36-A4
5,Rhodococcus sp. A5(2022)
6,Rhodococcus sp. JS3073
7,Rhodococcus sp. VUW_JGO2c-H1
8,Rhodococcus sp. VUW_Li1c-G12
9,Rhodococcus sp. 75



In [23]:
# find() searches a KEGG database by keyword; the original get_entry call raised "TypeError: 'NoneType' object is not callable"
kegg_obj.find("genome", "Rhodococcus")

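In [19]:
# Step 4: Map the bacteria names to KEGG organism codes; find() replaces get_entry, which raised the TypeError.
# Assumption: a hit's first line looks like "T00382<TAB>rha, RHOJR, 101510; name"; verify on real output.
hits = [kegg_obj.find("genome", bac) for bac in bacteria_names['bacteria']]
bacteria_ids = [hit.splitlines()[0].split("\t")[1].split(",")[0] for hit in hits if isinstance(hit, str) and "\t" in hit]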

In [None]:

# Step 5: Get the KEGG metabolic pathways for each bacterium
# Assumption: list("pathway", org) returns one "<pathway id><TAB><name>" line per
# pathway, and the last five characters of the id are the organism-independent number.
bacteria_pathways = {}
for bac_id in bacteria_ids:
    listing = kegg_obj.list("pathway", bac_id)
    if isinstance(listing, str):
        bacteria_pathways[bac_id] = {line.split("\t")[0][-5:] for line in listing.splitlines() if line}

# Step 6: Create a 0/1 matrix with bacteria as rows and pathways as columns
all_pathways = sorted(set().union(*bacteria_pathways.values()))
bacteria_pathways_matrix = pd.DataFrame(
    [[int(p in bacteria_pathways[b]) for p in all_pathways] for b in bacteria_pathways],
    index=list(bacteria_pathways), columns=all_pathways)

# Step 7: Plot a heatmap of the matrix
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(20,10))
sns.heatmap(bacteria_pathways_matrix, cmap='viridis')
plt.show()
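
The presence/absence matrix of Step 6 can be checked offline before calling KEGG. The sketch below uses plain Python and toy data; the organism labels and pathway numbers are hypothetical and are not real KEGG results.

```python
def presence_absence(pathways_by_organism):
    """Return (row labels, column labels, 0/1 matrix) from a dict of pathway sets."""
    organisms = sorted(pathways_by_organism)
    pathways = sorted(set().union(*pathways_by_organism.values()))
    matrix = [[1 if p in pathways_by_organism[o] else 0 for p in pathways]
              for o in organisms]
    return organisms, pathways, matrix

# Hypothetical organism labels and pathway numbers, for illustration only.
toy = {"orgA": {"00010", "00020"}, "orgB": {"00020", "00030"}}
rows, cols, matrix = presence_absence(toy)
for label, row in zip(rows, matrix):
    print(label, row)
```

Each row is one organism and each column one pathway, so a 1 marks a pathway that the organism's set contains.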


