## *Querying, Organizing and Visualizing Materials Data*


**Why?** Access to data associated with materials in electronic form enables engineers, scientists and
students to explore this data, display it graphically, find trends and develop models.

**What?** In this tutorial, we will learn how to query, organize and plot data from the databases associated with the Python libraries [Pymatgen](http://pymatgen.org/) and [Mendeleev](https://mendeleev.readthedocs.io/en/stable/). 

**How to use this?** This tutorial uses Python. Some familiarity with programming is helpful but not required. Run each code cell in order by pressing "Shift + Enter". Feel free to modify the code or change the queries to familiarize yourself with how the code works.


Suggested modifications and exercises are included in <font color=blue> blue</font>.

**Outline:**

1. Query from Pymatgen
2. Processing and Organizing Data
3. Plotting
4. Query from Mendeleev

**Get started:** Press "Shift + Enter" on the code cells to run them!

In [1]:
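# These lines import the libraries and then define a list of element symbols to be used below

# Element is imported from pymatgen.core because newer pymatgen releases no longer expose it at the top level
import pymatgen.core as pymat
import mendeleev as mendel
import pandas as pd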

elements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg',
            'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr',
            'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br',
            'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag',
            'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'Hf', 'Ta', 'W',
            'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'La', 'Ce', 'Pr',
            'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu',
            'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu']

### 2. Processing and Organizing Data

After going through the basics of a query, we will now learn how to organize data in Python lists and dictionaries.

Each entry in a dictionary has a name (in our case, the element) and attributes associated with it. Dictionaries are useful for storing a collection of data values for a particular element. In this example, we will create one to store some of the properties of iron, using queries from both of the libraries we discussed. Note that the specific heat is obtained from Mendeleev, which is another database for accessing properties of elements.
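
A minimal sketch of such a dictionary follows. To keep it runnable without either library installed, the values are typed in by hand (commonly tabulated, approximate values for iron); in the tutorial they would come from queries such as `pymat.Element("Fe").melting_point`. The key names are illustrative choices, not names defined by Pymatgen or Mendeleev.

```python
# Hand-entered reference values for iron (assumption: typed in here,
# not queried), used only to illustrate the dictionary structure.
iron = {
    "symbol": "Fe",
    "atomic_number": 26,
    "melting_point_K": 1811.0,              # melting temperature in kelvin
    "thermal_conductivity_W_per_mK": 80.0,  # approximate, near room temperature
    "specific_heat_J_per_gK": 0.449,        # approximate, near room temperature
}

# Values are retrieved by key rather than by position
print(iron["melting_point_K"])

# New entries can be added after the dictionary is created
iron["crystal_structure"] = "BCC"

# Loop over every stored property
for name, value in iron.items():
    print(f"{name}: {value}")
```

Using the unit as part of the key name is one way to keep track of what each number means once values from different sources are mixed in the same dictionary.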

Another way we can organize data is in lists, which can be very helpful if we want to create plots with our data. Following the examples above, we will now query two specific properties for all elements to get a list of values which will be indexed corresponding to the positions of the elements in the "elements" list in the first cell of the tutorial.
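
The index correspondence described above can be sketched with short stand-in lists. The three symbols and their values are hand-entered for illustration (assumption: not queried), and the `None` is inserted deliberately to mimic an element for which a query returns no value, which may happen when looping over the whole periodic table.

```python
# Stand-in lists (assumption: hand-entered, approximate values)
symbols = ["Fe", "Cu", "W"]
melting_points_K = [1811.0, 1357.77, 3695.0]
thermal_conductivities = [80.0, None, 173.0]  # None deliberately mimics a missing entry

# Position i in every list refers to the same element
i = symbols.index("W")
print(symbols[i], melting_points_K[i])

# zip walks the lists in parallel; rows with a missing value are skipped
complete_rows = [
    (s, t, k)
    for s, t, k in zip(symbols, melting_points_K, thermal_conductivities)
    if t is not None and k is not None
]
print(complete_rows)
```

Because the lists are only linked by position, any filtering has to be applied to all of them together, as the `zip` call does here.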

In [2]:
sample = elements.copy()

melting_point = [] # In this list we will store the melting temperatures
thermal_conductivity = [] # In this list we will store the thermal conductivities

for item in sample:
    melting_point.append(pymat.Element(item).melting_point)
    thermal_conductivity.append(pymat.Element(item).thermal_conductivity)


# We will use the following lists to group elements by their crystal structure at room temperature (RT).
# All elements that are gases or liquids at RT have been removed.

fcc_elements = ["Ag", "Al", "Au", "Ca", "Cu", "Ir", "Ni", "Pb", "Pd", "Pt", "Rh", "Sr", "Th", "Yb"]  # Ca is FCC at RT
bcc_elements = ["Ba", "Cr", "Cs", "Eu", "Fe", "K", "Li", "Mn", "Mo", "Na", "Nb", "P", "Rb", "Ta", "V", "W"]
hcp_elements = ["Be", "Cd", "Co", "Dy", "Er", "Gd", "Hf", "Ho", "Lu", "Mg", "Os", "Re", "Ru", "Sc", "Tb", "Tc", "Ti", "Tl", "Tm", "Y", "Zn", "Zr"]

# Others (solids): "B", "Sb", "Sm", "Bi" and "As" are rhombohedral; "C", "Ce" and "Sn" are allotropic; "Si" and "Ge" are face-centered diamond-cubic; "Pu" is monoclinic;
#                  "S", "I", "U", "Np" and "Ga" are orthorhombic; "Se" and "Te" are hexagonal; "In" and "Pa" are tetragonal; "La", "Pr", "Nd" and "Pm" are double hexagonal close-packed.

### 3. Plotting

Finally, we are going to plot the values in the lists we just created. For this tutorial we will make two versions of the same scatter plot, thermal conductivity vs melting temperature:

-  A simple plot with default formatting
-  A custom plot with points labeled by element and colored by crystal structure

We will be using a Python library called [Plotly](https://plot.ly/python/) to create these plots. This library allows you to create plots that are interactive and highly customizable.

#### Simple Plot

In this first cell we will import the library components we will use and create a simple plot.

In [3]:
import plotly #This is the library import
import plotly.graph_objs as go # This is the graphical object (Think "plt" in Matplotlib if you have used that before)

from plotly.offline import iplot # These lines are necessary to run Plotly in Jupyter Notebooks, but not in a dedicated environment
plotly.offline.init_notebook_mode(connected=True)

# To create a plot, you need a layout and a trace

# The layout gives Plotly the instructions on the background grids, titles in the plot,
# axes names, axes ticks, legends, labels, colors on the figure and general formatting.

layout = go.Layout(title="Thermal Conductivity vs Melting Point", xaxis=dict(title='Melting Point (K)'),
                   yaxis=dict(title='Thermal Conductivity (W m^-1 K^-1)'))

# The trace contains a type of plot (In this case, Scatter, but it can be "Bars, Lines, Pie Charts", etc.), 
# the data we want to visualize and the way ("Mode") we want to represent it.

trace = go.Scatter(x = melting_point, y = thermal_conductivity, mode = 'markers')

# To plot, we create a figure and implement our components in the following way:

data = [trace] # We could include more than just one trace here

fig= go.Figure(data, layout=layout)
iplot(fig)

#### Custom Plots

Now that we know how to make a basic plot, we can start adding more details to end up with something that looks a little bit better. All modifications are explained in the comments, but you can also find that information [here](https://plot.ly/python/axes/).

Before we start our new plot, wouldn't it look better if we could visualize the points with the elements' names and color them according to their crystal structures?

In [4]:
# Here we create a function that takes a value x (the symbol of an element)
# and returns a color depending on which of the crystal-structure lists defined earlier contains it.
# We want to color the data by crystal structure, so this information has to be passed to the plot.

def SetColor_CrystalStr(x):
    if x in fcc_elements:
        return "red" # These are standard CSS colors, but you can also use hexadecimal colors ("#009900") or RGB strings ("rgb(0, 128, 0)")
    elif x in bcc_elements:
        return "blue"
    elif x in hcp_elements:
        return "yellow"
    else:
        return "lightgray"
    
# We will then create a list by passing every element symbol through this function. For that we will use the Python function "map".
# map applies a function to each item of a list.

colors = list(map(SetColor_CrystalStr, sample))

# You can see what this list of generated colors looks like by uncommenting this line

#print(colors)
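
As an alternative sketch of the same idea, the color can be looked up in a dictionary that maps each symbol to its structure. The names `structure_groups`, `structure_colors`, `structure_of` and `color_for` are hypothetical, and the groups are shortened to a few elements so that the block runs on its own.

```python
# Shortened stand-ins for the crystal-structure lists (illustration only)
structure_groups = {
    "FCC": ["Al", "Cu", "Ni"],
    "BCC": ["Fe", "Cr", "W"],
    "HCP": ["Mg", "Ti", "Zn"],
}
structure_colors = {"FCC": "red", "BCC": "blue", "HCP": "yellow"}

# Invert the groups once: symbol -> structure name
structure_of = {
    symbol: structure
    for structure, members in structure_groups.items()
    for symbol in members
}

def color_for(symbol):
    """Return the plot color for a symbol, or light gray if it is in no group."""
    return structure_colors.get(structure_of.get(symbol), "lightgray")

sample_symbols = ["Fe", "Cu", "Si", "Mg"]
colors_demo = [color_for(s) for s in sample_symbols]
print(colors_demo)
```

The list comprehension plays the same role as `list(map(...))`. Keeping colors in one dictionary means a new structure group can be added without editing a chain of `if`/`elif` branches.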

In [5]:
layout0 = go.Layout(hovermode='closest',  # Hover labels show the data point closest to the cursor
    width=600, height=600, showlegend=True,  # Equal width and height give a square plot
    xaxis=dict(title=go.layout.xaxis.Title(text='Melting Point (K)', font=dict(size=24)), zeroline=False, gridwidth=1, tickfont=dict(size=18)),  # Axis title, no zero line, grid line width
    yaxis=dict(title=go.layout.yaxis.Title(text='Thermal Conductivity (W m^-1 K^-1)', font=dict(size=24)), zeroline=False, gridwidth=1, tickfont=dict(size=18)),  # Axis title, no zero line, grid line width
    legend=dict(font=dict(size=24)))  # Legend font size

# Trace

trace0 = go.Scatter(x = melting_point, y = thermal_conductivity, mode = 'markers',
    marker= dict(size= 14, line= dict(width=1), color=colors), # We add a size, a border and our custom colors to the markers
    text= sample, # The text attribute labels each point with its element symbol; this list uses the same indexes as the property lists
    showlegend = False)


# Empty Traces for Legend
legend_plot_FCC = go.Scatter(x=[None], y=[None], mode='markers', marker=dict(size=14,  line= dict(width=1),color='red'), name = 'FCC')
legend_plot_BCC = go.Scatter(x=[None], y=[None], mode='markers', marker=dict(size=14,  line= dict(width=1),color='blue'), name = 'BCC')
legend_plot_HCP = go.Scatter(x=[None], y=[None], mode='markers', marker=dict(size=14,  line= dict(width=1),color='yellow'), name = 'HCP')


data = [trace0, legend_plot_FCC, legend_plot_BCC, legend_plot_HCP]

fig= go.Figure(data, layout=layout0)
iplot(fig)
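
The three legend traces differ only in color and name, so they could also be generated in a loop. The sketch below builds them as plain dictionaries of keyword arguments so that it runs without Plotly installed; `make_legend_trace` is a hypothetical helper, and the assumption is that each dictionary is later expanded with `go.Scatter(**entry)` when the figure is built.

```python
def make_legend_trace(name, color):
    """Build keyword arguments for an empty scatter trace used only as a legend entry."""
    return {
        "x": [None],
        "y": [None],
        "mode": "markers",
        "marker": {"size": 14, "line": {"width": 1}, "color": color},
        "name": name,
    }

# Same structure names and colors as in the custom plot
legend_colors = {"FCC": "red", "BCC": "blue", "HCP": "yellow"}
legend_traces = [make_legend_trace(n, c) for n, c in legend_colors.items()]

for entry in legend_traces:
    print(entry["name"], entry["marker"]["color"])
```

With this approach, adding a fourth structure group only requires one more entry in `legend_colors`.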