**ColabFit Assignment 2a**

Please complete this assignment by running the provided commands and returning this notebook with all cell outputs visible.

Install colabfit-tools

In [None]:
!pip install git+https://github.com/colabfit/colabfit-tools.git@master

Install MongoDB

In [None]:
!conda install -c conda-forge mongodb

Start the MongoDB server by running the command below in a separate terminal:


```
mongod --dbpath <path_to_folder_for_storing_mongo_data>
```



Load all necessary packages/functions

In [None]:
import numpy as np
from colabfit.tools.database import MongoDatabase
from colabfit.tools.configuration import AtomicConfiguration 

Open a connection to the Mongo server

In [None]:
client = MongoDatabase('assignment2_database')

Generate synthetic data

In [None]:
configs = []
for i in range(100):
    # Configuration i holds i+1 atoms at random positions
    ac = AtomicConfiguration(
        positions=np.random.random((i + 1, 3)),
        names=[f'CO_{i}'],
        labels=['lt20' if i < 19 else 'gte20'],  # fewer than 20 atoms vs. 20 or more
    )
    ac.info['potential-energy'] = -np.random.random()
    ac.info['other-property'] = np.random.random()
    configs.append(ac)
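Because configuration `i` is built with `i+1` atoms, the condition `i < 19` gives the `lt20` label to exactly the configurations with 1 to 19 atoms. The plain-Python sketch below recomputes the labels from atom counts alone (no ColabFit objects involved) to confirm the cutoff; `label_for` is a hypothetical helper introduced only for this check.

```python
def label_for(n_atoms: int) -> str:
    """Hypothetical helper: label a configuration by its atom count."""
    return 'lt20' if n_atoms < 20 else 'gte20'

# Configuration i in the loop above holds i + 1 atoms.
atom_counts = [i + 1 for i in range(100)]
labels = [label_for(n) for n in atom_counts]

# The loop's rule ('lt20' if i < 19) must agree with the atom-count rule.
assert labels == ['lt20' if i < 19 else 'gte20' for i in range(100)]

print(labels.count('lt20'), labels.count('gte20'))  # → 19 81
```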

Set up Property Definitions, settings, and maps

"potential-energy" is already defined within ColabFit, so we will load its existing definition.

"other" is a new property, so we need to write a custom Property Definition for it.

In [None]:
#Load the existing 'potential-energy' definition
#(a new implementation of this step is currently being written)

#Create a new definition for 'other'
base_definition = {
    'property-id': 'other',
    'property-title': 'Random property',
    'property-description':
        'Random property to be used as an example. '
        'This property is not yet defined within ColabFit. '
        'Therefore, its Property Definition must be created.',

    'other': {
        'type': 'float',
        'has-unit': True,
        'extent': [],
        'required': True,
        'description':
            'Other random property of the system.'
    },

    'per-atom': {
        'type': 'bool',
        'has-unit': False,
        'extent': [],
        'required': True,
        'description':
            'If False, "other" has NOT been divided '\
            'by the number of atoms in the configuration.'
    },
}

#Register the new definition with the database
client.insert_property_definition(base_definition)
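Every key of a Property Definition other than the `property-*` metadata (here `other` and `per-atom`) describes one field using the same five keys. The sketch below is a hypothetical pre-flight check, not ColabFit's own validator (which performs its own, stricter validation), that reports missing keys before a definition is inserted. The `example` dictionary is made up and deliberately incomplete.

```python
# Hypothetical pre-flight check; ColabFit performs its own, stricter validation.
METADATA_KEYS = {'property-id', 'property-title', 'property-description'}
FIELD_KEYS = {'type', 'has-unit', 'extent', 'required', 'description'}

def missing_field_keys(definition: dict) -> dict:
    """Return {field_name: sorted missing keys} for every incomplete field."""
    problems = {}
    for name, spec in definition.items():
        if name in METADATA_KEYS:
            continue  # metadata entries are plain strings, not field specs
        missing = FIELD_KEYS - set(spec)
        if missing:
            problems[name] = sorted(missing)
    return problems

example = {
    'property-id': 'other',
    'property-title': 'Random property',
    'property-description': 'Random property to be used as an example.',
    'other': {'type': 'float', 'has-unit': True, 'extent': [],
              'required': True, 'description': 'Other random property.'},
    'per-atom': {'type': 'bool', 'has-unit': False, 'extent': []},  # incomplete on purpose
}
print(missing_field_keys(example))  # → {'per-atom': ['description', 'required']}
```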

In [None]:
#Settings
potential_settings = dict()
potential_settings['_method'] = 'Random' #may describe software, instrument, etc. used to find property value
potential_settings['_description'] = 'Generated using np.random.random()'
potential_settings['_files'] = None #scripts or other files to help reproducibility
potential_settings['_labels'] = ['random', 'assignment1'] #labels to aid in querying

other_settings = dict(potential_settings) #same settings here-may not always be the case
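`dict(potential_settings)` makes a shallow copy: the top-level keys are independent, but the `_labels` list is shared by both settings dictionaries. If the two settings are later meant to diverge, `copy.deepcopy` avoids editing both by accident. A standalone illustration with a trimmed-down settings dictionary:

```python
import copy

potential_settings = {'_method': 'Random', '_labels': ['random', 'assignment1']}

shallow = dict(potential_settings)        # new dict, same '_labels' list object
deep = copy.deepcopy(potential_settings)  # new dict and a new '_labels' list

shallow['_labels'].append('shared')       # mutates the list both dicts point to
deep['_labels'].append('independent')     # touches only the deep copy

print(potential_settings['_labels'])  # → ['random', 'assignment1', 'shared']
print(deep['_labels'])                # → ['random', 'assignment1', 'independent']
```

Reassigning a top-level key (for example `shallow['_method'] = 'DFT'`) would not affect the original; only mutation of shared nested objects leaks across.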

In [None]:
#Map
property_map = {
    #Property name (the property-id of its Property Definition)
    'potential-energy': [{
        # Property Definition field: {'field': .info keyword, 'units': ASE-readable units}
        'potential-energy': {'field': 'potential-energy', 'units': 'eV'},
        '_settings': potential_settings
    }],
    'other': [{
        'other': {'field': 'other-property', 'units': 's'},
        '_settings': other_settings
    }]
}
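Conceptually, each entry in a property map tells the loader which `.info` key supplies a Property Definition field and in which units. The sketch below resolves a map against one made-up `info` dictionary using plain Python; `resolve_property` is hypothetical and only illustrates the idea, not ColabFit's internal implementation, which does more work than this.

```python
def resolve_property(info: dict, entry: dict) -> dict:
    """Hypothetical: map Property Definition fields to (value, units) from info."""
    resolved = {}
    for field_name, spec in entry.items():
        if field_name.startswith('_'):  # skip metadata such as '_settings'
            continue
        resolved[field_name] = (info[spec['field']], spec['units'])
    return resolved

info = {'potential-energy': -0.42, 'other-property': 0.17}  # made-up values
property_map = {
    'potential-energy': [{
        'potential-energy': {'field': 'potential-energy', 'units': 'eV'},
        '_settings': {'_method': 'Random'},
    }],
    'other': [{
        'other': {'field': 'other-property', 'units': 's'},
        '_settings': {'_method': 'Random'},
    }],
}

for property_name, entries in property_map.items():
    for entry in entries:  # each list entry is resolved independently
        print(property_name, resolve_property(info, entry))
# potential-energy {'potential-energy': (-0.42, 'eV')}
# other {'other': (0.17, 's')}
```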




Insert Configurations/Property Instances into the Database

In [None]:
ids = client.insert_data(
    configs,
    property_map=property_map,
    verbose=True
)
co_ids, pi_ids = zip(*ids)  #IDs of all Configurations and Property Instances
all_co_ids = list(dict.fromkeys(co_ids))  #unique Configuration IDs, as a list
all_pi_ids = list(pi_ids)
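`zip(*ids)` "unzips" a list of pairs into two tuples. The sketch below uses made-up IDs and assumes `insert_data` returns one `(configuration_id, property_instance_id)` pair per inserted Property Instance, in which case a Configuration with two properties appears twice. Converting to lists matters because tuples have no `.remove` or `.append`.

```python
# Made-up stand-in for the (configuration_id, property_instance_id) pairs.
ids = [('CO_a', 'PI_1'), ('CO_a', 'PI_2'), ('CO_b', 'PI_3')]

co_ids, pi_ids = zip(*ids)   # two tuples
print(co_ids)                # → ('CO_a', 'CO_a', 'CO_b')

all_co_ids = list(dict.fromkeys(co_ids))  # unique IDs, original order preserved
all_pi_ids = list(pi_ids)
print(all_co_ids, all_pi_ids)  # → ['CO_a', 'CO_b'] ['PI_1', 'PI_2', 'PI_3']
```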

Construct ConfigurationSets

In [None]:
# As an example, we will separate the configurations into sets
# based on whether they contain fewer than 20 atoms
lt20_co_ids = client.get_data(
    'configurations', fields='_id',
    query={'_id': {'$in': list(all_co_ids)}, '_labels': {'$in': ['lt20']}},
    ravel=True
).tolist()
lt20_set = set(lt20_co_ids)
gte20_co_ids = [co_id for co_id in all_co_ids if co_id not in lt20_set]

lt20_cs_id = client.insert_configuration_set(
    lt20_co_ids,
    description='Configurations with fewer than 20 atoms'
)
gte20_cs_id = client.insert_configuration_set(
    gte20_co_ids,
    description='Configurations with 20 atoms or more'
)
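The split relies on two details: MongoDB's `$in` operator takes an array and matches a document whose array-valued field shares at least one element with it, and the complement must be built as a new list because `list.remove` deletes a single element in place and returns `None`. The plain-Python sketch below emulates both steps on made-up documents; `matches_in` is a hypothetical helper and nothing here contacts a database.

```python
# Made-up stand-ins for Configuration documents.
docs = [
    {'_id': 'CO_1', '_labels': ['lt20']},
    {'_id': 'CO_2', '_labels': ['gte20']},
    {'_id': 'CO_3', '_labels': ['lt20', 'extra']},
]
all_co_ids = [d['_id'] for d in docs]

def matches_in(field_values, wanted):
    """Emulate {'field': {'$in': wanted}} for an array-valued field."""
    return any(v in wanted for v in field_values)

lt20_co_ids = [d['_id'] for d in docs if matches_in(d['_labels'], ['lt20'])]

lt20_set = set(lt20_co_ids)  # fast membership tests
gte20_co_ids = [i for i in all_co_ids if i not in lt20_set]

print(lt20_co_ids, gte20_co_ids)  # → ['CO_1', 'CO_3'] ['CO_2']
print([1, 2, 3].remove(2))        # → None (remove mutates and returns nothing)
```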



Construct the Dataset

In [None]:
ds_id = client.insert_dataset(
    cs_ids=[lt20_cs_id, gte20_cs_id],
    pr_ids=all_pi_ids,
    name='Assignment2',
    authors=[
        'Name1', 'Name2'
    ],
    links=[
        'https://colabfit.org',
        #Can include links to manuscripts, where data is stored, etc.
    ],
    description=(
        'Dataset for the ColabFit assignment. '
        'Data was randomly generated.'
    ),
    verbose=True,
)

Confirm that the Dataset was added to the Database

In [None]:
client.datasets.find_one({'_id': ds_id})

**ColabFit Assignment 2b**

Please complete this assignment by inserting appropriate code in cells below each instruction, running those cells, and returning this notebook with all cell outputs visible.

The file "---" contains a subset of the data from the Zeo-1 dataset (see link for more information).

1) Open a connection to the Mongo server

2) Load the data in the file "---" using the template below:


```
from ase.io import read
from colabfit.tools.database import load_data

images = load_data(
    file_path='path-to-file',
    file_format='xyz',
    reader=lambda x: read(x, index=':'),  # read every frame in the file
    name_field='name',
    elements=['Si', 'O', 'H', 'Al', 'N', 'Ca', 'Ge', 'Li', 'Na', 'K', 'C', 'F', 'Be', 'Cs', 'Ba'],
    verbose=True
)
```



3) Set up Property maps for all properties present in the data

4) Insert Configurations/Property Instances into the Database

5) Construct ConfigurationSets in any way you see fit

6) Construct the Dataset

7) Query the Database to count the number of items named "---"