# Functional Specification

## 1. Introduction
In the chemistry community, density functional theory (DFT) is the main tool for predicting chemical properties. However, it requires expertise in quantum mechanics and substantial computing resources to produce even an approximate prediction. Most of the chemists and materials scientists who are driving chemical and materials discovery have access to neither, and this gap has significantly slowed the pace of technological innovation.

Machine learning (ML) has advanced rapidly and can now make decisions or predictions from sample data rather than from explicit instructions. In this project we experiment with using ML to predict the molecular conformation of organic compounds from the crystallographic data that the chemistry community has accumulated over decades.

With that in mind, the purpose of this package is to provide tools that predict 3D molecular conformations from 2D drawings, for chemists and materials scientists who do not have a strong background in quantum mechanics.

## 2. Use cases

The intended users of this package are mainly chemists, researchers and scientists working on organic crystals and molecular structures. Users are expected to have advanced knowledge of chemistry; programming experience is not required. Knowledge of machine learning helps when fine-tuning the model for a specific need, but it is not necessary for general use of the package.

The user provides a 2D drawing of a molecular structure that contains the X-Y coordinates of the atoms and the connections between them. The 2D input can easily be created with chemistry software or an [online tool](http://www.cheminfo.org/Chemistry/Generate_molfiles/index.html).
The package feeds the input into the ML model and outputs a 3D structure of the molecule through the following steps:

Step 1: Load the package and model
    
    import pandas as pd
    from optimol import data_compile
    from optimol import model
    
    # create model
    model.buil_model()

Step 2: Read the user input
    
    user_input = data_compile.get_df_user('./user.txt')

Step 3: Train the model using the default data set

    data = model.get_csv()
    estimator = model.get_model(data)
    
Step 4: Pass the user input to the model and get the result

    result = model.predict_3d(user_input,estimator)
 

### 2.1 Product Functions
As described in the use case section above, the workflow has four steps. Their functional specifications are given below.

#### In step 1
There are two modules in this package, `data_compile.py` and `model.py`, both under the `optimol` folder. `data_compile.py` contains functions that extract information from downloaded structure data; `model.py` contains functions that train the 3D molecule generator.



#### In step 2 
With the function `data_compile.get_df_user('./id.csv')`, the user can feed a list of molecular information into the module. Here, `id.csv` is the list of molecular information files that you choose to feed into the ML module. We recommend choosing molecules whose features are similar to those of the molecule you are trying to predict. Download both the 2D and 3D structure files from ChemSpider and save them to the folder `./optimol/chemspider_database`.



#### In step 3
`data = model.get_csv()` loads the compiled data gathered in step 2, which is stored in the file `./model.csv`. Please make sure you update the compiled data when you switch to a new structure, as mentioned in step 2.
The meaning of each column in `model.csv` is as follows:

1) The columns `3d_x`, `3d_y` and `3d_z` are the x, y and z coordinates of the atoms.

2) `atom` is the atom type (element), and `periodic_#` is its atomic number.

3) `connect_to` lists the other atoms, identified by row index, that the current atom is connected to; -1 stands for a connection to a hydrogen atom.

4) `bond_1`, `bond_2` and `bond_3` are the numbers of single, double and triple bonds that the atom has.

The atom information of 2D molecules can be interpreted in the same way.
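
To make the column meanings concrete, the sketch below derives comparable per-atom rows from a 2D molfile. It is only an illustration, not the code in `data_compile.py`: the function names `parse_molfile` and `atom_rows`, the column names `2d_x` and `2d_y`, the use of 0-based row indices, and the hand-written formaldehyde molfile are all assumptions made for this example. The parsing relies on the fixed-width layout of the V2000 molfile format.

```python
# Hypothetical illustration; not part of the OptiMol API.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}

# Hand-written V2000 molfile for formaldehyde (H2C=O), for illustration only.
MOLFILE = """\
formaldehyde
  hand-written example

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6000    0.9000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6000   -0.9000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  1  3  1  0
  1  4  1  0
M  END
"""


def parse_molfile(text):
    """Read atoms and bonds from the fixed-width V2000 atom and bond blocks."""
    lines = text.splitlines()
    # Line 4 is the counts line: atom count, then bond count (3 chars each).
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        atoms.append({
            "atom": line[31:34].strip(),
            "2d_x": float(line[0:10]),
            "2d_y": float(line[10:20]),
        })
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Convert 1-based molfile atom numbers to 0-based row indices.
        bonds.append((int(line[0:3]) - 1, int(line[3:6]) - 1, int(line[6:9])))
    return atoms, bonds


def atom_rows(atoms, bonds):
    """Build one row per atom with the columns described above."""
    rows = []
    for atom in atoms:
        row = dict(atom)
        row["periodic_#"] = ATOMIC_NUMBER[atom["atom"]]
        row["connect_to"] = []
        row.update(bond_1=0, bond_2=0, bond_3=0)
        rows.append(row)
    for i, j, order in bonds:
        # Record the bond on both of its atoms.
        for here, there in ((i, j), (j, i)):
            neighbour = -1 if atoms[there]["atom"] == "H" else there
            rows[here]["connect_to"].append(neighbour)
            if order in (1, 2, 3):
                rows[here][f"bond_{order}"] += 1
    return rows


rows = atom_rows(*parse_molfile(MOLFILE))
for row in rows:
    print(row)
```

In this example the carbon row ends up with one double bond, two single bonds, and a `connect_to` list in which both hydrogen neighbours appear as -1.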

`estimator = model.get_model(data)` returns a `MultiOutputRegressor` fitted to the data loaded from `./model.csv`.
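
As a rough illustration of the multi-output idea, the sketch below fits one independent single-target regressor per output column (here `3d_x`, `3d_y` and `3d_z`). It is a pure-Python stand-in written for this document, not the estimator that `model.get_model` builds: the class names `NearestNeighbourRegressor` and `PerTargetRegressor`, the choice of a nearest-neighbour base regressor, the feature layout, and the toy numbers are all assumptions.

```python
import math


class NearestNeighbourRegressor:
    """Single-target regressor: returns the target of the closest training row."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Index of the training row with the smallest Euclidean distance.
            closest = min(range(len(self.X_)),
                          key=lambda i: math.dist(row, self.X_[i]))
            predictions.append(self.y_[closest])
        return predictions


class PerTargetRegressor:
    """Fits one independent copy of a base regressor per output column."""

    def __init__(self, make_regressor):
        self.make_regressor = make_regressor

    def fit(self, X, Y):
        n_targets = len(Y[0])
        self.estimators_ = [
            self.make_regressor().fit(X, [row[t] for row in Y])
            for t in range(n_targets)
        ]
        return self

    def predict(self, X):
        per_target = [est.predict(X) for est in self.estimators_]
        # Transpose per-target lists into one [x, y, z] row per sample.
        return [list(values) for values in zip(*per_target)]


# Toy data, invented for illustration only.
# Features per atom: [periodic_#, bond_1, bond_2, bond_3]
X_train = [[6, 2, 1, 0], [8, 0, 1, 0], [1, 1, 0, 0]]
# Targets per atom: [3d_x, 3d_y, 3d_z]
Y_train = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [-0.6, 0.9, 0.1]]

estimator = PerTargetRegressor(NearestNeighbourRegressor).fit(X_train, Y_train)
prediction = estimator.predict([[8, 0, 1, 0]])
print(prediction)
```

The point of the wrapper is that a regressor which predicts only one number can still be used for three coordinates, at the cost of treating each coordinate independently.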



#### In step 4 
`model.predict_3d(user_input, estimator)` takes the user's 2D input and the predictive model built from the database, and predicts the 3D structure.

* param `user_input`: user input of the 2D information about the molecule
* param `model`: predictive model built from the database
* return `user_output`: 3D information about the molecule
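
The exact layout of the returned 3D information is not specified here. Assuming it can be reduced to a list of element symbols and a matching list of (x, y, z) coordinates, the sketch below writes them as text in the common XYZ format (atom count, comment line, then one atom per line), which many molecular viewers can open. The helper `to_xyz` is hypothetical and not part of OptiMol.

```python
def to_xyz(symbols, coords, comment=""):
    """Format element symbols and (x, y, z) coordinates as XYZ-format text.

    Hypothetical helper for illustration; assumes one coordinate triple
    per element symbol.
    """
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines) + "\n"


# Example values invented for illustration.
xyz_text = to_xyz(["C", "O"], [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)], "demo")
print(xyz_text)
```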

### 2.2 User Characteristics
1) Chemical engineers and materials scientists who want to obtain the approximate 3D structure of a molecule before investing significant effort in synthesizing it.

2) Chemical engineers and materials scientists who do not have the quantum mechanics background needed to use density functional theory (DFT).

3) The user should be able to do the following:

* obtain molecular information from a database
* build an accurate and efficient basic design for experiments from the statistical results





### 2.3 User Objectives 
OptiMol is a package for predicting the molecular conformations of organic compounds. It is currently limited to the four most common elements: C, H, N and O. In the future, we would like to predict the conformation of any organic compound and present it as a clear 3D visualization, so that users can take their research further with this tool.

## 3. Updated Schedule
As mentioned above, our package currently supports only four elements (C, H, N, O). We are working to support more elements so that we can meet our users' expectations. Producing a visual representation of a molecule from its 3D coordinates is another major goal.