### How to use the PubChem REST API
- https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
- Supported formats: XML/JSON/CSV/TXT...

### Basic URL format
- https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON
- `https://pubchem.ncbi.nlm.nih.gov/rest/pug/<input specification>/<operation specification>/[<output specification>][?<operation_options>]`
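
The template above can be sketched as a small helper that joins the path segments. This is a minimal sketch, not part of PubChem's tooling: `build_pug_url` and its parameter names are hypothetical, and it assumes the input specification has the form `<domain>/<namespace>/<identifiers>`, as in the example URL.

```python
def build_pug_url(domain, namespace, identifiers, operation, output="JSON"):
    """Assemble a PUG REST URL from its path segments (hypothetical helper)."""
    base = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
    # Input specification, assumed to be <domain>/<namespace>/<identifiers>.
    input_spec = "/".join([domain, namespace, str(identifiers)])
    # URL segments are always separated by forward slashes.
    return "/".join([base, input_spec, operation, output])


url = build_pug_url("compound", "cid", 2244, "property/MolecularFormula")
print(url)
```

With these arguments the helper reproduces the example URL for CID 2244.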

#### Encoding
Strings that contain special characters, such as SMILES input, may need to be URL-encoded for proper transmission; for example, `smiles=C1C[CH+]1` should be encoded as `smiles=C1C%5BCH%2B%5D1`.
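
The encoding of the SMILES example can be reproduced with the standard library. This is a minimal sketch using `urllib.parse.quote`; passing `safe=""` is a choice made here so that every reserved character, including `/`, is percent-encoded.

```python
from urllib.parse import quote

smiles = "C1C[CH+]1"
# safe="" leaves no reserved character unencoded, so "[", "+" and "]"
# become %5B, %2B and %5D respectively.
encoded = quote(smiles, safe="")
print("smiles=" + encoded)  # smiles=C1C%5BCH%2B%5D1
```

When the SMILES string is passed as a query parameter through `requests.get(url, params={"smiles": smiles})`, `requests` applies this encoding itself.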

In [32]:
import os
import requests
import pandas as pd
import numpy as np
import json

In [23]:
BASE_URL = r'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/{q_from}/{q_value}/property'
q_from = 'cid'
q_to = ','.join(['CanonicalSMILES', 'IsomericSMILES', 'ExactMass'])
q_format = 'JSON'
q_to

'CanonicalSMILES,IsomericSMILES,ExactMass'

In [24]:
q_value = '5282283'
BASE_URL.format(q_from=q_from, q_value=q_value)

'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/5282283/property'

In [40]:
# Join URL segments with '/'; os.path.join would insert backslashes on Windows.
query = '/'.join([BASE_URL.format(q_from=q_from, q_value=q_value), q_to, q_format])
query

'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/5282283/property/CanonicalSMILES,IsomericSMILES,ExactMass/JSON'

In [27]:
r = requests.get(query)

In [28]:
r.status_code

200
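
A status of 200 is not guaranteed on every call, so it may help to check the code before parsing. The sketch below is one possible approach, not PubChem's recommended client: `get_with_retry` is a hypothetical helper, and treating only 5xx responses as transient is an assumption. The `fetch` argument is any callable that returns an object with a `status_code` attribute, for example `lambda u: requests.get(u, timeout=30)`.

```python
import time


def get_with_retry(fetch, url, retries=3, wait=1.0):
    """Call fetch(url) until it returns HTTP 200 or the retries run out."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        response = fetch(url)
        if response.status_code == 200:
            return response
        # Assumption: only 5xx responses are transient. A 4xx response means
        # the request itself is wrong, so retrying would not help.
        if response.status_code < 500:
            break
        if attempt < retries - 1:
            time.sleep(wait * (attempt + 1))  # linear back-off between tries
    raise RuntimeError(f"Request failed with status {response.status_code}: {url}")
```

In the notebook this could be called as `r = get_with_retry(lambda u: requests.get(u, timeout=30), query)`.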

In [34]:
q_result = r.json()  # parse the JSON response body directly
q_result

{'PropertyTable': {'Properties': [{'CID': 5282283,
    'CanonicalSMILES': 'CCCCCCCCCCCCCCCC(=O)OCC(CO)OC(=O)CCCCCCCC=CCCCCCCCC',
    'ExactMass': 594.522,
    'IsomericSMILES': 'CCCCCCCCCCCCCCCC(=O)OC[C@H](CO)OC(=O)CCCCCCC/C=C\\CCCCCCCC'}]}}

In [39]:
q_dic = q_result['PropertyTable']['Properties'][0]
q_dic

{'CID': 5282283,
 'CanonicalSMILES': 'CCCCCCCCCCCCCCCC(=O)OCC(CO)OC(=O)CCCCCCCC=CCCCCCCCC',
 'ExactMass': 594.522,
 'IsomericSMILES': 'CCCCCCCCCCCCCCCC(=O)OC[C@H](CO)OC(=O)CCCCCCC/C=C\\CCCCCCCC'}
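
Indexing with `[0]` keeps only the first record. When a response holds several compounds, the records can be keyed by CID instead. This is a minimal sketch: `properties_by_cid` is a hypothetical helper, and it assumes the `PropertyTable` / `Properties` layout shown in the output above.

```python
def properties_by_cid(payload):
    """Map each CID to its property dict from a property-table response."""
    rows = payload.get("PropertyTable", {}).get("Properties", [])
    # The CID becomes the dictionary key, so it is dropped from each value.
    return {
        row["CID"]: {key: value for key, value in row.items() if key != "CID"}
        for row in rows
    }


# Sample payload copied from the response shown above (one property omitted).
sample = {"PropertyTable": {"Properties": [{
    "CID": 5282283,
    "CanonicalSMILES": "CCCCCCCCCCCCCCCC(=O)OCC(CO)OC(=O)CCCCCCCC=CCCCCCCCC",
    "ExactMass": 594.522,
}]}}
props = properties_by_cid(sample)
print(props[5282283]["ExactMass"])  # 594.522
```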

In [44]:
raw_data = pd.read_csv('lipid_name2pubchemid.txt', sep='\t', header=None)
raw_data

|   | 0 | 1 |
|---|---|---|
| 0 | DG(34:1) | 5282283 |
| 1 | DG(34:1) | 5283470 |
| 2 | DG(34:1) | 5283471 |
| 3 | DG(34:1) | 9543686 |
| 4 | DG(34:1) | 9543691 |
| 5 | DG(34:1) | 9543972 |
| 6 | DG(34:1) | 56936316 |
| 7 | DG(34:1) | 53477956 |
| 8 | DG(34:1) | 53477984 |
| 9 | DG(34:1) | 56936375 |


In [51]:
24778900 in np.unique(raw_data[1])

True
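
To look up properties for every CID in the table, the IDs could be grouped into comma-separated batches and substituted for `q_value` one batch at a time. This is a sketch under two assumptions: that the identifier position of the URL accepts a comma-separated CID list, and that a batch size of 100 is a reasonable default (it is untested here). `cid_batches` is a hypothetical helper.

```python
def cid_batches(cids, batch_size=100):
    """Yield comma-separated CID strings with at most batch_size CIDs each."""
    # De-duplicate and sort so that repeated runs produce the same batches.
    unique = sorted(set(int(cid) for cid in cids))
    for start in range(0, len(unique), batch_size):
        yield ",".join(str(cid) for cid in unique[start:start + batch_size])


cids = [5282283, 5283470, 5283471, 9543686, 5282283]
batches = list(cid_batches(cids, batch_size=2))
print(batches)  # ['5282283,5283470', '5283471,9543686']
```

In the notebook the input would be `raw_data[1]`, and each yielded string would be used as `q_value` when formatting `BASE_URL`.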