# Tutorial: Just Bonds

The Just Bonds representation differs from the Bag of Bonds representation in that it considers only pairs of atoms that are actually bonded, whereas Bag of Bonds considers all interatomic distances. Connectivity is taken from the bond information in the input geometry files (MDL/SDF files); file formats that do not provide this information are currently not supported. The representation is adapted from the literature reference below:

    - DOI: 10.1021/acs.jpclett.5b00831
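
To make the difference between the two representations concrete, here is a minimal sketch in plain Python. It does not use chemreps and is not how chemreps computes its features; the toy molecule, its coordinates, and the helper names `pair_label` and `distance` are all made up for illustration. It only shows which atom pairs each approach would look at.

```python
from itertools import combinations
from math import dist

# Hypothetical toy molecule (an H-O-O-H chain); coordinates are illustrative only.
atoms = [
    ("H", (0.0, 0.9, 0.0)),
    ("O", (0.0, 0.0, 0.0)),
    ("O", (1.4, 0.0, 0.0)),
    ("H", (1.4, -0.9, 0.0)),
]
# Connectivity as a bond block would list it (0-based atom indices here).
bonded_pairs = [(0, 1), (1, 2), (2, 3)]


def pair_label(i, j):
    """Order-independent label for a pair of atoms, e.g. 'HO'."""
    return "".join(sorted((atoms[i][0], atoms[j][0])))


def distance(i, j):
    """Euclidean distance between atoms i and j."""
    return dist(atoms[i][1], atoms[j][1])


# Bag of Bonds style: every pair of atoms is considered.
all_pairs = [(pair_label(i, j), distance(i, j))
             for i, j in combinations(range(len(atoms)), 2)]

# Just Bonds style: only pairs listed in the connectivity are considered.
bond_pairs = [(pair_label(i, j), distance(i, j)) for i, j in bonded_pairs]

print(len(all_pairs), "pairs in total,", len(bond_pairs), "of them bonded")
```

For this four-atom chain there are six atom pairs but only three bonds, so the bonded-only view is half the size.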
    
The first step in making a Just Bonds representation with chemreps is to import BagMaker from chemreps.bagger and the bonds function from chemreps.just_bonds, as shown below.

In [None]:
from chemreps.bagger import BagMaker
from chemreps.just_bonds import bonds
import glob
import pandas as pd

Before we can build a Just Bonds representation, we need to make the bags for our dataset. The dataset that we will be using can be found in the data directory of this repository. If you cloned this repository locally, then you should be able to set the path as '../data/sdf/'. Once we have the path to our dataset, we need to pass it to the BagMaker along with the type of representation we want. In this case we want a Just Bonds representation, so we pass BagMaker the string 'JustBonds'.

Note: For larger datasets this may take a little time to run, because it needs to iterate through the entire dataset to find the proper bag sizes.

In [None]:
dataset = '../data/sdf/'
bags = BagMaker('JustBonds', dataset)

Now that we have made our bags and stored them in the object called bags, we can get our empty bags by calling bags.bags and the sizes of our bags by calling bags.bag_sizes.

In [None]:
bags.bags

In [None]:
bags.bag_sizes
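
As a rough illustration of why the whole dataset has to be scanned before any representation is built, the sketch below computes bag sizes for a toy dataset in plain Python. This is an assumption about the general idea (each bag must be large enough for the molecule with the most bonds of that type), not chemreps' actual implementation; the molecule dictionary and the function name `bag_sizes_for` are hypothetical.

```python
from collections import Counter

# Hypothetical bond labels per molecule, as they might be read from SDF bond blocks.
toy_molecules = {
    "ethane": ["CC"] + ["CH"] * 6,
    "propane": ["CC"] * 2 + ["CH"] * 8,
    "methanol": ["CO", "HO"] + ["CH"] * 3,
}


def bag_sizes_for(molecules):
    """Return, for each bond type, the largest count seen in any one molecule."""
    sizes = Counter()
    for labels in molecules.values():
        for label, count in Counter(labels).items():
            # Keep the maximum so the bag fits every molecule in the dataset.
            sizes[label] = max(sizes[label], count)
    return dict(sizes)


toy_sizes = bag_sizes_for(toy_molecules)
print(toy_sizes)
```

With fixed bag sizes like these, every molecule can be mapped to a vector of the same length, which is what makes the representations comparable across the dataset.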

Once we have the bags and bag sizes for the dataset, we can start making our representations. To make a Just Bonds representation using chemreps, all we need to do is pass bonds the molecule file, bags.bags, and bags.bag_sizes.

In [None]:
mfiles = dataset + 'butane.sdf'
print(mfiles)
rep = bonds(mfiles, bags.bags, bags.bag_sizes)
rep

### Making Representations for Multiple Molecules

Disclaimer: There may be better ways to accomplish the same objective. You are welcome to use your own method, and to submit an issue/PR if you think we should use that method.

To make representations for all the molecules in our directory we are going to need to use glob and pandas. To find out more about these libraries you can go to the [glob documentation](https://docs.python.org/3/library/glob.html) or [10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html). We first create an empty list called row_list in which we will store the filename and the representation for each molecule. Next we loop over all of the files in the directory, using glob to match our pattern (e.g. we want all sdf files from our data/sdf/ directory). In this loop we use the same method as above to make our representations. We store the name of the file and the representation in a dictionary that is then appended to row_list. Once the loop is complete, we store the information in a pandas dataframe.

In [None]:
row_list = []
# Loop over every SDF file in the dataset directory, in sorted order
for fname in sorted(glob.glob(dataset + '*.sdf')):
    print(fname)
    rep = bonds(fname, bags.bags, bags.bag_sizes)
    # Store the filename and its representation as one row
    row_list.append({'Name': fname, 'Rep': rep})

# Build the dataframe with one row per molecule
df = pd.DataFrame(row_list, columns=['Name', 'Rep'])
df
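
The effect of the glob pattern can be checked in isolation. The following sketch uses only the standard library and a temporary directory with made-up filenames; it shows that a bare `*` matches every file while `*.sdf` keeps only the SDF files, and how a shorter molecule name could be derived from each path if the full path is not wanted in the 'Name' column.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty placeholder files (hypothetical names).
    for name in ("butane.sdf", "ethane.sdf", "README.txt"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("")

    # '*' matches every entry in the directory.
    everything = sorted(glob.glob(os.path.join(tmp, "*")))
    # '*.sdf' restricts the match to SDF files.
    sdf_only = sorted(glob.glob(os.path.join(tmp, "*.sdf")))

    # Strip the directory and the extension to get a short molecule name.
    sdf_names = [os.path.basename(path) for path in sdf_only]
    molecule_names = [os.path.splitext(name)[0] for name in sdf_names]

print(len(everything), sdf_names, molecule_names)
```

Using `os.path.join` also avoids having to remember whether the dataset path ends with a slash.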

Once our representation information is stored in the pandas dataframe, we can use numpy in order to make an array of our representations that we can finally pass to our machine learning method.

In [None]:
import numpy as np
reps = np.asarray(df['Rep'])
reps
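
One thing to watch for: when each cell of a column holds an array, converting the column with np.asarray typically gives a 1-D array of dtype object rather than a 2-D numeric matrix, and many machine learning libraries expect the latter. The sketch below uses made-up vectors as a stand-in for the 'Rep' column and assumes every representation has the same length (which shared bag sizes are meant to ensure); under that assumption, np.stack turns the column into a 2-D array.

```python
import numpy as np

# Stand-in for the 'Rep' column: three equal-length vectors (illustrative values).
toy_reps = [
    np.array([1.0, 0.5, 0.0]),
    np.array([2.0, 0.0, 0.0]),
    np.array([1.5, 1.5, 0.5]),
]

# Build a 1-D object array, one vector per element, like a column of arrays.
column = np.empty(len(toy_reps), dtype=object)
for k, rep in enumerate(toy_reps):
    column[k] = rep

# Stack the vectors into a (n_molecules, n_features) numeric matrix.
X = np.stack(column)

print(column.shape, X.shape, X.dtype)
```

With the dataframe from this tutorial, the equivalent call would be along the lines of `np.stack(df['Rep'].to_numpy())`, again assuming all representations share one length.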