## Disclaimer
The repository is still under development and is not ready for use. It has been
made public to share progress among collaborators.

## Current limitations
Currently only the dictionary, the bibliographic data, and the list of reactions are
converted and stored in the database on the MongoDB cluster service. Note that the
specification of the JSON schema is still under consideration, so the data
structure may change without notice.

## Collections
The collections, each of which is comparable to a table in an SQL database, are named as follows.

- dictiondef: dictionary index
- dictionary: dictionary data
- entry: bibliographic data
- data: data body (a simple translation from EXFOR to JSON; datasets are
  separated according to their pointers)
- reaction_index: all reactions

### Collection: dictionary and dictiondef
For the dictionary, please refer to the information
[here](https://github.com/shinokumura/exforparser/tree/main/dictionary).
The examples in this document show how to retrieve the bibliographic data and
the list of reactions.

### Collection: entry
The bibliographic data includes the following information:

- title: string
- entry_number: string
- reference: array
- authors: array
- institutes: array
- facilities: array
- reactions: array
- experimental_conditions: array

Here, ``experimental_conditions`` holds the other metadata from the first
SUBENTRY, and ``reactions`` lists all reactions in the entire ENTRY.
These documents are stored in the collection named ``entry``.


### Collection: data
The data includes the following information. This is a direct and simple
conversion from EXFOR to JSON. The only processing applied is the separation
into datasets according to the pointer.

- id: string (entry number + subentry number + pointer)
- reaction: object
- measurement_condition: object
- common_data: array
- data_table: array

Here, ``measurement_condition`` holds the other metadata from this
particular SUBENTRY, while ``common_data`` and ``data_table`` hold the
tables of the COMMON and DATA blocks, respectively.
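
As a minimal sketch of how the ``id`` string is composed, the helpers below join and split the three parts. The names ``make_dataset_id``, ``split_dataset_id``, and ``DatasetId`` are hypothetical and not part of this repository. The fixed widths (5 characters for the entry, 3 for the subentry) and the ``XX`` placeholder for a missing pointer are assumptions based on the ids that appear elsewhere in this document.

```python
from typing import NamedTuple, Optional


class DatasetId(NamedTuple):
    entry: str      # assumed 5-character entry number, e.g. "C0290"
    subentry: str   # assumed 3-digit subentry number, e.g. "009"
    pointer: str    # pointer, or "XX" when no pointer is given


def make_dataset_id(entry: str, subentry: str, pointer: Optional[str] = None) -> str:
    """Concatenate entry, subentry, and pointer into a dataset id."""
    return f"{entry}{subentry}{pointer or 'XX'}"


def split_dataset_id(dataset_id: str) -> DatasetId:
    """Split a dataset id into its parts, assuming the 5 + 3 + pointer layout."""
    return DatasetId(dataset_id[:5], dataset_id[5:8], dataset_id[8:])


print(make_dataset_id("C0290", "009"))
print(split_dataset_id("411280022"))
```

Splitting by position only works while the widths stay fixed, so it may be safer to read the separate ``entry``, ``subentry``, and ``pointer`` fields where a collection provides them.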


### Collection: reaction_index
The ``reaction_index`` collection consists of the following columns. A column can be ``None``
or ``np.nan`` either because parsing failed or simply because there is no data.
The temporary string ``XX`` is used as the pointer when no pointer is given.
Note that only pointers coded with REACTION are taken into account,
since pointers coded with other identifiers are meaningless in many cases.

```
        id  entry subentry pointer  year       author  min_inc_en  max_inc_en points     target     process            sf4       residual   sf5      sf6   sf7    sf8   sf9
C0290009XX  C0290      009      XX  1981    R.A.Cecil   3.370e-04   3.370e-04      1   13-AL-27  10-NE-20,X         0-NN-1         0-NN-1  None    DA/DE  None   None  None
E1773008XX  E1773      008      XX  2002     T.Wakasa   3.450e+02   3.450e+02      1   20-CA-40         P,X         0-NN-1         0-NN-1  None    DA/DE  None   None  None
411280022   41128      002       2  1993 V.A.Anufriev         NaN         NaN      0  98-CF-250       N,TOT           None           None  None      WID  None   None  None
E2617012XX  E2617      012      XX  2019     T.Murata   3.270e+01   5.040e+01     15    39-Y-89         A,X        39-Y-87        39-Y-87  None      SIG  None   None  None
G0018003XX  G0018      003      XX  2010  Md.S.Rahman   5.000e+01   7.000e+01      3    49-IN-0         G,X  49-IN-111-G/M  49-IN-111-G/M  None  SIG/RAT  None    BRA  None
21909005XX  21909      005      XX  1979   H.Yamamoto   1.450e+01   1.450e+01      1   92-U-238         N,F           MASS          A=110   SEC       FY  None   None  None
E1434007XX  E1434      007      XX  1983  M.Takahashi   5.190e+01   5.190e+01      1  82-PB-208         P,T      82-PB-206      82-PB-206   PAR       DA  None   None  None
D6158002XX  D6158      002      XX  2008   R.Tripathi   7.000e+01   1.000e+02    169    39-Y-89    9-F-19,X           ELEM              C  None       DA  None   None  None
 120970097  12097      009       7  1960   H.B.Moller         NaN         NaN      0  64-GD-155       N,TOT           None           None  None      WID  None  SQ/S0  None
O0920008XX  O0920      008      XX  2001   J.Kuhnhenn   6.660e+01   6.660e+01      1    82-PB-0         P,F    47-AG-110-M    47-AG-110-M   IND      SIG  None   None  None
D0635002XX  D0635      002      XX  2003     W.Krolas   3.500e+02   3.500e+02      1  82-PB-208  28-NI-64,X      ELEM/MASS      47-Ag-110  None      SIG  None   None  None
D0635002XX  D0635      002      XX  2003     W.Krolas   3.500e+02   3.500e+02      1  82-PB-208  28-NI-64,X      ELEM/MASS      82-Pb-199  None      SIG  None   None  None
M06350212   M0635      021       2  2003 V.V.Varlamov   1.980e+01   2.760e+01     27    23-V-51        G,2N        23-V-49        23-V-49  None      SIG  None   None  EVAL
```
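
Because a column can hold either ``None`` or NaN, client code usually has to test for both. The following is a small sketch of one way to do that with the standard library only. ``is_missing`` is a hypothetical helper, and the two rows are abridged from the listing above.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Return True for None or a float NaN, the two 'no data' markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Two rows abridged from the listing above.
rows = [
    {"id": "C0290009XX", "min_inc_en": 3.370e-04, "sf5": None},
    {"id": "411280022", "min_inc_en": float("nan"), "sf5": None},
]

# Keep only the ids of rows that carry an incident energy.
with_energy = [r["id"] for r in rows if not is_missing(r["min_inc_en"])]
print(with_energy)  # ['C0290009XX']
```

When the rows are already in a pandas DataFrame, ``DataFrame.isna()`` treats both markers as missing and can replace the helper.
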
## Examples
There are two ways to retrieve data from MongoDB: (1) via the MongoDB API and (2) via
the MongoClient Python package.

### (1) via API
The first example retrieves data through the API that the MongoDB cluster service supports natively.
With this approach, users can retrieve data either from Python code or with the curl
command. There are two endpoints, ``action/findOne`` and
``action/find``. Please refer to the [MongoDB API
reference](https://www.mongodb.com/docs/atlas/api/data-api-resources/)
for details. First, import the libraries and the URL.

In [None]:
import requests
import json
import pandas as pd
import sys
sys.path.append("../")
from path import MONGOBASE_URI, API_KEY

headers = {
    'Content-Type': 'application/json',
    'Access-Control-Request-Headers': '*',
    'api-key': API_KEY
}

#### Example of the endpoint: POST /action/findOne

In [None]:
# retrieval of dictionary info
url = MONGOBASE_URI + "action/findOne"
payload = json.dumps(
    {
        "collection": "dictionary",
        "database": "exfor",
        "dataSource": "exparser",
        "filter": {"diction_num": "21"}, # METHOD
        "projection": {"_id": 0, "diction_num": 1, "diction_def": 1, "parameters": 1},
    }
)
response = requests.request("POST", url, headers=headers, data=payload)
print(json.dumps(response.json(), indent=1))

In [None]:
# retrieval of one entry's bib data
url = MONGOBASE_URI + "action/findOne"
payload = json.dumps(
    {
        "collection": "entry",
        "database": "exfor",
        "dataSource": "exparser",
        "filter": {"entry_number": "14555"}, # entry number
        "projection": {"_id": 0},
    }
)
response = requests.request("POST", url, headers=headers, data=payload)
print(json.dumps(response.json(), indent=1))
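
The request bodies in this section differ only in the collection, the filter, and the projection. As a sketch, a small helper can build them. ``build_payload`` is a hypothetical name, and the default ``database`` and ``dataSource`` values are simply those used in the surrounding examples.

```python
import json
from typing import Any, Dict, Optional


def build_payload(
    collection: str,
    filter_: Dict[str, Any],
    projection: Optional[Dict[str, int]] = None,
    database: str = "exfor",
    data_source: str = "exparser",
) -> str:
    """Return the JSON request body for action/findOne or action/find."""
    body: Dict[str, Any] = {
        "collection": collection,
        "database": database,
        "dataSource": data_source,
        "filter": filter_,
    }
    # Leave the projection out entirely when the caller wants every field.
    if projection is not None:
        body["projection"] = projection
    return json.dumps(body)


payload = build_payload("entry", {"entry_number": "14555"}, {"_id": 0})
print(payload)
```

The returned string can be passed as ``data=payload`` to ``requests.request`` exactly as in the cells above.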

In [None]:
# retrieval of the data table of one dataset
url = MONGOBASE_URI + "action/findOne"
payload = json.dumps(
    {
        "collection": "data",
        "database": "exfor",
        "dataSource": "exparser",
        "filter": {"id": "C1874013XX"}, # entry number + subentry number + pointer (XX when there is no pointer)
        "projection": {"_id": 0},
    }
)
response = requests.request("POST", url, headers=headers, data=payload)
print(json.dumps(response.json()["document"]["data_table"], indent=1))

In [None]:
import matplotlib.pyplot as plt

url = MONGOBASE_URI + "action/findOne"
payload = json.dumps(
    {
        "collection": "data",
        "database": "exfor",
        "dataSource": "exparser",
        "filter": {"id": "23151007XX"}, # entry number + subentry number + pointer (XX when there is no pointer)
        "projection": {
        "_id":0, 
         "data_table.heads":1, 
         "data_table.units":1, 
         "data_table.data":1
        },
    }
)
response = requests.request("POST", url, headers=headers, data=payload)

d = response.json()["document"]["data_table"]
print(d["units"])
# Map each column heading to its values; "data" is keyed by the column index as a string.
columns = {head: d["data"][str(i)] for i, head in enumerate(d["heads"])}
df = pd.DataFrame(columns)
print(df)

df.plot(x ="EN", y="DATA", kind="scatter")
plt.show()
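
To make the shape of ``data_table`` easier to follow without a live connection, here is a sketch that runs the same heading-to-column mapping on a mock document. The values in ``mock_table`` are hypothetical and ``table_to_columns`` is not part of the repository. Only the layout (``heads``, ``units``, and ``data`` keyed by the column index as a string) follows the cell above.

```python
from typing import Any, Dict, List


def table_to_columns(data_table: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Map each heading to its column; columns are keyed by "0", "1", ..."""
    heads = data_table["heads"]
    return {head: data_table["data"][str(i)] for i, head in enumerate(heads)}


# Hypothetical values, shaped like the projected data_table above.
mock_table = {
    "heads": ["EN", "DATA"],
    "units": ["MEV", "MB"],
    "data": {"0": [1.0, 2.0], "1": [10.5, 20.1]},
}

columns = table_to_columns(mock_table)
print(columns["EN"])
```

Note that a dictionary keeps one value per key, so if two columns share a heading the later column overwrites the earlier one. Such tables would need the headings made unique first.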


#### Example of the endpoint: POST /action/find

In [None]:
url = MONGOBASE_URI + "action/find"
payload = json.dumps(
    {
        "collection": "entry",
        "database": "exfor",
        "dataSource": "exparser",
        "filter": { "entry_number": "E2542"}
    }
)
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)

## store into dataframe
df = pd.DataFrame(list(response.json()["documents"]))
print(df)

In [None]:
url = MONGOBASE_URI + "action/find"
## If the filter matches a very large amount of data, this API returns an empty record and you need to narrow the search or use pymongo.
## e.g. "filter": { "target": {"$eq": "92-U-235"}, "process": {"$eq": "N,F"}} alone is too broad, so the year is added below.
payload = json.dumps(
    {
        "collection": "reaction_index",
        "database": "exfor",
        "dataSource": "exparser",

        "filter": { "target": {"$eq": "92-U-235"}, "process": {"$eq": "N,F"} , "year": {"$eq": "1988"}},
        "projection": {"_id":0, "id":1, "author":1, "year":1, "target":1, "process":1,  "max_inc_en":1, "points":1}
    }
)
response = requests.request("POST", url, headers=headers, data=payload)
a = response.json()
df = pd.DataFrame(a["documents"])
print(df)
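
One possible way to keep each response small is to page through the results with the ``limit``, ``skip``, and ``sort`` fields that the ``action/find`` endpoint accepts. The sketch below only builds the request bodies. ``paged_payloads`` is a hypothetical helper, the page size is arbitrary, and whether paging avoids the empty result on this cluster has not been verified.

```python
import json
from typing import Any, Dict, Iterator


def paged_payloads(base: Dict[str, Any], page_size: int, pages: int) -> Iterator[str]:
    """Yield JSON bodies for action/find, one per page, using limit and skip."""
    for page in range(pages):
        # dict(base, ...) copies the base body, so the original is not modified.
        body = dict(base, limit=page_size, skip=page * page_size)
        yield json.dumps(body)


base = {
    "collection": "reaction_index",
    "database": "exfor",
    "dataSource": "exparser",
    "filter": {"target": {"$eq": "92-U-235"}, "process": {"$eq": "N,F"}},
    "projection": {"_id": 0, "id": 1, "author": 1, "year": 1},
    "sort": {"id": 1},  # a fixed order keeps consecutive pages consistent
}

for payload in paged_payloads(base, page_size=100, pages=3):
    print(json.loads(payload)["skip"])
```

Each body would be posted as in the cell above, stopping once a response returns fewer documents than ``page_size``.
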

### (2) via MongoClient

First, import the packages and establish the connection to the MongoDB cloud service.

In [None]:
import pandas as pd
import plotly.express as px
from pymongo import MongoClient
import json
import sys
sys.path.append("../")
from path import DB_KEY

client = MongoClient("mongodb+srv://" + DB_KEY + "@exparser.b4gi6.mongodb.net/exfor?retryWrites=true&w=majority")
db = client.exfor

#### Example of db.collection.find()

In [None]:
## https://www.mongodb.com/docs/manual/reference/method/db.collection.find/
## https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html?highlight=find()#pymongo.collection.Collection.find
collection = db.reaction_index
for x in collection.find(
    {"target": { "$eq" : "92-U-235"},
     "process": { "$eq" : "N,F"}},
    {"_id":0, "id":1, "author":1, "year":1 }
    # {}
    ):
    print(x)

In [None]:
collection = db.reaction_index
# Fission-yield (FY) datasets for 92-U-238(N,F), loaded straight into a DataFrame
df = pd.DataFrame(
    list(collection.find(
        {"target": {"$eq": "92-U-238"},
         "process": {"$eq": "N,F"},
         "sf6": {"$eq": "FY"}},
        {"_id": 0,
         "entry": 1,
         "subentry": 1,
         "pointer": 1,
         "author": 1,
         "year": 1,
         "target": 1,
         "process": 1,
         "residual": 1}
    ))
)
print(df)
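
The equality filters above all follow the same pattern, and pymongo also offers ``Collection.find_one`` for fetching a single document. The sketch below wraps both. ``reaction_filter`` and ``first_match`` are hypothetical helpers, and the field names are taken from the ``reaction_index`` columns described earlier.

```python
from typing import Any, Dict, Optional


def reaction_filter(target: str, process: str, **extra: str) -> Dict[str, Any]:
    """Build an equality filter on target, process, and any extra columns."""
    fields = {"target": target, "process": process, **extra}
    return {name: {"$eq": value} for name, value in fields.items()}


def first_match(collection: Any, query: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first matching document without _id, or None if none match."""
    return collection.find_one(query, {"_id": 0})


print(reaction_filter("92-U-238", "N,F", sf6="FY"))
# With the connection above:
# first_match(db.reaction_index, reaction_filter("92-U-238", "N,F", sf6="FY"))
```

``find_one`` returns ``None`` when nothing matches, so callers should check the result before indexing into it.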