# MatSE580 Guest Lecture 2
## Introduction

In this guest lecture we will cover:
1. [Interacting with the database we set up in Lecture 1](#verify-the-connection-to-the-database) and [visualizing the results](#plotting-with-mongodb-charts) - using the [pymongo](https://github.com/mongodb/mongo-python-driver) library and the [MongoDB Charts](https://www.mongodb.com/docs/charts/) service
2. [Using machine learning (ML) tools to predict stability of materials](#pysipfenn) - using [pySIPFENN](https://pysipfenn.readthedocs.io/en/stable/)
3. [Using ML featurization and dimensionality reduction to embed materials in feature space](#featurization) - using [pySIPFENN](https://pysipfenn.readthedocs.io/en/stable/) with [MongoDB Charts](https://www.mongodb.com/docs/charts/) visualization
4. [Using featurization to guide DFT and improve ML models](#transfer-learning-on-small-dft-dataset)

**This notebook assumes that you already followed the instructions in Lecture 1 and you:**
1. Have a conda environment called `580demo` (or other) with all the packages installed, including:
    - `pymatgen`
    - `pymongo`
    - `pysipfenn`

2. Have a MongoDB database called `matse580` with a collection `structures`, for which you:
    - have a username (e.g. `student`)
    - have an API key / password string (e.g. `sk39mIM2f35Iwc`)
    - have whitelisted your IP address or `0.0.0.0/0` (entire internet)
    - know the connection string (URI) to the database (e.g. `mongodb+srv://student:sk39mIM2f35Iwc@cluster0.3wlhaan.mongodb.net/?retryWrites=true&w=majority`)

3. Have populated the database with all Sigma phase end members (see Lecture 1 - Inserting Data)

4. Have installed `pysipfenn` and downloaded all the [pre-trained models](https://zenodo.org/records/7373089) by calling `downloadModels()`, and the download finished successfully. If not, run this one-liner:

        python -c "import pysipfenn; c = pysipfenn.Calculator(); c.downloadModels(); c.loadModels();"

If all of the above are true, you are ready to go!
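
The connection string from item 2 follows a fixed pattern, so it can be assembled from its parts instead of being typed by hand. The sketch below is one possible way to do that using only the standard library. The helper name `build_mongo_uri` and the environment variable names `MONGO_USER` and `MONGO_PASSWORD` are hypothetical and chosen for illustration; the fallback values are the example credentials listed above. Percent-encoding the username and password with `quote_plus` keeps characters such as `@` or `/` from being misread as part of the URI structure.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, options="retryWrites=true&w=majority"):
    """Assemble a mongodb+srv URI, percent-encoding the credentials.

    Hypothetical helper for illustration; it is not part of pymongo.
    """
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/?{options}"


# Assumption: credentials are stored in environment variables with these
# (made-up) names; the fallbacks are the example values from the list above.
user = os.environ.get("MONGO_USER", "student")
password = os.environ.get("MONGO_PASSWORD", "sk39mIM2f35Iwc")

uri = build_mongo_uri(user, password, "cluster0.3wlhaan.mongodb.net")
print(uri)
```

Reading the credentials from the environment also keeps them out of the notebook itself, which matters if the notebook is shared or committed to a repository.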

In [38]:
from pprint import pprint            # pretty printing
from collections import defaultdict  # convenience in the example
import os                            # file handling
from datetime import datetime        # time handling
from zoneinfo import ZoneInfo        # time handling
from pymatgen.core import Structure  # pymatgen

## Verify the connection to the database
pymongo is a Python library that allows us to interact with MongoDB databases in a very intuitive way. Let's start by importing its `MongoClient` class and creating a connection to our database:

In [39]:
from pymongo import MongoClient
uri = 'mongodb+srv://amk7137:kASMuF5au1069Go8@cluster0.3wlhaan.mongodb.net/?retryWrites=true&w=majority'
client = MongoClient(uri)

and see what databases are available:

In [40]:
client.list_database_names()

['matse580', 'admin', 'local']

Now connect to the `structures` collection of the `matse580` database:

In [41]:
collection = client['matse580']['structures']

and verify that the Sigma phase structures we created are there:

In [42]:
print(f'Found: {collection.count_documents({})} structures\n')
pprint(collection.find_one({}, skip=100))

Found: 243 structures

{'POSCAR': 'Cr12 Fe10 Ni8\n'
           '1.0\n'
           '   8.5470480000000002    0.0000000000000000    0.0000000000000000\n'
           '   0.0000000000000000    8.5470480000000002    0.0000000000000000\n'
           '   0.0000000000000000    0.0000000000000000    4.4777139999999997\n'
           'Cr Fe Ni Fe Cr\n'
           '8 2 8 8 4\n'
           'direct\n'
           '   0.7377020000000000    0.0637090000000000    0.0000000000000000 '
           'Cr\n'
           '   0.2622980000000000    0.9362910000000000    0.0000000000000000 '
           'Cr\n'
           '   0.4362910000000000    0.2377020000000000    0.5000000000000000 '
           'Cr\n'
           '   0.7622980000000000    0.5637090000000000    0.5000000000000000 '
           'Cr\n'
           '   0.5637090000000000    0.7622980000000000    0.5000000000000000 '
           'Cr\n'
           '   0.2377020000000000    0.4362910000000000    0.5000000000000000 '
           'Cr\n'
           '   0.0637
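
The element and count lines of the (truncated) POSCAR printed above can be used as a quick sanity check on the document. The sketch below, using only the standard library, tallies the per-site counts into a formula and compares it with the POSCAR title line. It also counts the end members under the assumption, not stated in this notebook, that each of the five site groups is occupied by exactly one of Cr, Fe, or Ni.

```python
from collections import defaultdict
from itertools import product

# Element and count lines copied from the POSCAR printed above.
species_line = "Cr Fe Ni Fe Cr"
counts_line = "8 2 8 8 4"

species = species_line.split()
counts = [int(n) for n in counts_line.split()]

# Sum the site multiplicities for each element.
totals = defaultdict(int)
for element, n in zip(species, counts):
    totals[element] += n

formula = " ".join(f"{el}{totals[el]}" for el in sorted(totals))
print(formula)       # Cr12 Fe10 Ni8, matching the POSCAR title line
print(sum(counts))   # 30 atoms in the cell

# Assumption: an end member assigns one of the three elements to each of
# the five site groups, so there are 3**5 possible assignments.
end_members = list(product(["Cr", "Fe", "Ni"], repeat=len(species)))
print(len(end_members))  # 243, the same as the document count reported above
```

In practice the `POSCAR` string would be passed to `pymatgen` to obtain a full `Structure` object; the tally here only checks that the stored text is internally consistent.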

### Plotting with MongoDB Charts

MongoDB Charts is an associated service that allows us to quickly visualize the data in the database online and share it with others, while keeping the source data secure and private.

***Note for Online Students: At this point we will pause the Jupyter Notebook and switch to the MongoDB Atlas website to set up the database, or skip until next week depending on the available time.** The process is fairly straightforward but feel free to stop by office hours for help!*

You should end up with some neat figures like the one below:

<p align="center">
  <img src="assets/MongoDBChartExample.png" width="500"/>
</p>

If you are interested in seeing a couple more examples, you can visit the dashboard of the [ULTERA Database](https://ultera.org) for high entropy alloys.

## pySIPFENN

### Getting Started

### Predicting all Endmembers

## Featurization

## Transfer Learning on small DFT dataset

## Further Resources