# Using doper

The `doper` module provides the `Doper` class, whose `get_dopants` method predicts possible dopants. `Doper` requires one input: a tuple of strings, each naming an ionic species of the host material, for example `("Ti4+", "O2-")`.
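To make the input format concrete, the sketch below splits species strings such as `"Ti4+"` into an element symbol and a signed integer charge. The helper `split_species` and its regular expression are hypothetical, written only for this illustration; they are not part of SMACT, and they assume every string carries an explicit integer charge followed by a sign.

```python
import re

# Hypothetical helper for illustration only; not part of SMACT.
# Assumes the form <element symbol><integer charge><sign>, e.g. "Ti4+" or "O2-".
_SPECIES = re.compile(r"^([A-Z][a-z]?)(\d+)([+-])$")


def split_species(species: str) -> tuple[str, int]:
    """Return (element symbol, signed charge) for a species string."""
    match = _SPECIES.match(species)
    if match is None:
        raise ValueError(f"Unrecognised species string: {species!r}")
    element, magnitude, sign = match.groups()
    charge = int(magnitude) if sign == "+" else -int(magnitude)
    return element, charge


parsed = [split_species(s) for s in ("Ti4+", "O2-")]
print(parsed)  # [('Ti', 4), ('O', -2)]
```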

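`get_dopants` returns a dictionary. Each top-level key (for example `'n-type cation substitutions'`) holds lists of possible dopants, one under `'sorted'` and one per dopant charge, each ordered from highest to lowest probability. Each possible dopant is represented as a list: `['substituted dopant', 'original species', probability, selectivity]`.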

In [7]:
from smact.dopant_prediction.doper import Doper

# Define the material using its ionic species.
# Here, we are creating a material object with titanium in the 4+ oxidation state (Ti4+) and oxygen in the 2- oxidation state (O2-).
material = Doper(("Ti4+", "O2-"))

# Use the `get_dopants` function to predict potential dopants.
# By default, it returns the top 5 p-type and n-type dopants, but this can be adjusted with the `num_dopants` parameter.
# The output will be a dictionary with possible n-type and p-type cation and anion dopants.
dopants = material.get_dopants(num_dopants=5)

# Print the dopant predictions.
# Each entry in these lists has the form ['substituted dopant', 'original species', probability, selectivity].
print(dopants)

{'n-type cation substitutions': {'sorted': [['Ta5+', 'Ti4+', 8.790371775858281e-05, 1.0], ['Nb5+', 'Ti4+', 7.830035204694342e-05, 1.0], ['Sb5+', 'Ti4+', 6.259166355036722e-05, 1.0], ['Ru5+', 'Ti4+', 4.904126561555437e-05, 1.0], ['Re5+', 'Ti4+', 4.546178573532138e-05, 1.0]], '5': [['Ta5+', 'Ti4+', 8.790371775858281e-05, 1.0], ['Nb5+', 'Ti4+', 7.830035204694342e-05, 1.0], ['Sb5+', 'Ti4+', 6.259166355036722e-05, 1.0], ['Ru5+', 'Ti4+', 4.904126561555437e-05, 1.0], ['Re5+', 'Ti4+', 4.546178573532138e-05, 1.0]], '6': [['W6+', 'Ti4+', 3.4638026110457894e-05, 1.0], ['Mo6+', 'Ti4+', 1.6924395455176864e-05, 1.0], ['U6+', 'Ti4+', 1.4299724897106019e-05, 1.0], ['Te6+', 'Ti4+', 1.4299724897106019e-05, 1.0], ['Ir6+', 'Ti4+', 4.061295856299769e-06, 1.0]]}, 'p-type cation substitutions': {'sorted': [['Na1+', 'Ti4+', 0.00010060400812977031, 1.0], ['Zn2+', 'Ti4+', 8.56373996146833e-05, 1.0], ['Mn2+', 'Ti4+', 8.563568688381837e-05, 1.0], ['Mg2+', 'Ti4+', 6.777016806765154e-05, 1.0], ['Fe3+', 'Ti4+', 6.25
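The nested layout of the printed dictionary can be navigated with ordinary indexing. The sketch below hard-codes a short excerpt of the output shown above, so it runs without SMACT installed; the variable names are illustrative, and the fourth field is labelled selectivity on the assumption that it matches the "Selectivity" column of the table view.

```python
# Excerpt copied from the printed output above (first three n-type entries).
dopants = {
    "n-type cation substitutions": {
        "sorted": [
            ["Ta5+", "Ti4+", 8.790371775858281e-05, 1.0],
            ["Nb5+", "Ti4+", 7.830035204694342e-05, 1.0],
            ["Sb5+", "Ti4+", 6.259166355036722e-05, 1.0],
        ]
    }
}

# The 'sorted' list is ordered from highest to lowest probability.
ranked = dopants["n-type cation substitutions"]["sorted"]

# Unpack the top-ranked entry into its four fields.
best_dopant, host, probability, selectivity = ranked[0]
print(f"{best_dopant} on {host}: probability {probability:.3e}")
# Ta5+ on Ti4+: probability 8.790e-05

# Collect only the dopant species, preserving the ranking.
names = [entry[0] for entry in ranked]
print(names)  # ['Ta5+', 'Nb5+', 'Sb5+']
```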

The results can be presented in a table format using the `to_table` attribute.

In [8]:
material.to_table

n-type cation substitutions
sorted
+--------+----------+--------+---------------+---------------+
|   Rank | Dopant   | Host   |   Probability |   Selectivity |
|      1 | Ta5+     | Ti4+   |   8.79037e-05 |             1 |
+--------+----------+--------+---------------+---------------+
|      2 | Nb5+     | Ti4+   |   7.83004e-05 |             1 |
+--------+----------+--------+---------------+---------------+
|      3 | Sb5+     | Ti4+   |   6.25917e-05 |             1 |
+--------+----------+--------+---------------+---------------+
|      4 | Ru5+     | Ti4+   |   4.90413e-05 |             1 |
+--------+----------+--------+---------------+---------------+
|      5 | Re5+     | Ti4+   |   4.54618e-05 |             1 |
+--------+----------+--------+---------------+---------------+

5+
+--------+----------+--------+---------------+---------------+
|   Rank | Dopant   | Host   |   Probability |   Selectivity |
|      1 | Ta5+     | Ti4+   |   8.79037e-05 |      

`Doper` also handles ternary and higher multicomponent systems. The example below uses a quaternary sulfide.

In [9]:
quaternary = Doper(("Cu1+", "Zn2+", "Ge4+", "S2-"))
quaternary.get_dopants()

{'n-type cation substitutions': {'sorted': [['Si4+',
    'Zn2+',
    0.0002388228584563032,
    0.21],
   ['Zn2+', 'Cu1+', 0.00021454984717151132, 0.14],
   ['Hg2+', 'Cu1+', 0.00020520758963053147, 0.55],
   ['Ge4+', 'Zn2+', 0.0001751936790055855, 0.1],
   ['P5+', 'Ge4+', 0.0001357428244856664, 0.69]],
  '2': [['Zn2+', 'Cu1+', 0.00021454984717151132, 0.14],
   ['Hg2+', 'Cu1+', 0.00020520758963053147, 0.55],
   ['Cu2+', 'Cu1+', 9.006774407383325e-05, 0.2],
   ['Mn2+', 'Cu1+', 3.960170207846625e-05, 0.09],
   ['Fe2+', 'Cu1+', 3.333914056567426e-05, 0.1]],
  '3': [['Y3+', 'Zn2+', 0.00010200196717382551, 0.95],
   ['Fe3+', 'Zn2+', 7.56685107087228e-05, 0.66],
   ['In3+', 'Cu1+', 5.763684879119039e-05, 0.51],
   ['Cr3+', 'Zn2+', 5.312098771970144e-05, 0.76],
   ['Al3+', 'Zn2+', 5.105799273313971e-05, 0.37]],
  '4': [['Si4+', 'Zn2+', 0.0002388228584563032, 0.21],
   ['Ge4+', 'Zn2+', 0.0001751936790055855, 0.1],
   ['Ti4+', 'Zn2+', 8.56373996146833e-05, 0.54],
   ['Sn4+', 'Zn2+', 5.1051866141
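In a multicomponent host, the second field of each entry shows which site a dopant is predicted to occupy. As a minimal sketch, the code below groups the `'sorted'` n-type entries from the output above by host species. The entries are hard-coded from that output so the example runs without SMACT; the grouping step is an illustration, not a SMACT feature.

```python
from collections import defaultdict

# 'sorted' n-type cation entries copied from the quaternary output above:
# [dopant, host species, probability, selectivity]
entries = [
    ["Si4+", "Zn2+", 0.0002388228584563032, 0.21],
    ["Zn2+", "Cu1+", 0.00021454984717151132, 0.14],
    ["Hg2+", "Cu1+", 0.00020520758963053147, 0.55],
    ["Ge4+", "Zn2+", 0.0001751936790055855, 0.1],
    ["P5+", "Ge4+", 0.0001357428244856664, 0.69],
]

# Group dopants by the host species they would replace,
# keeping the original ranking within each group.
grouped = defaultdict(list)
for dopant, host, probability, selectivity in entries:
    grouped[host].append(dopant)
by_host = dict(grouped)

for host, host_dopants in by_host.items():
    print(host, "<-", host_dopants)
```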

To plot the results as a heatmap, use the `plot_dopants` method.

The default colormap is 'YlOrRd'. Other colormaps can be selected with the `cmap` parameter; refer to the matplotlib documentation for the available options.

In [None]:
quaternary.plot_dopants(cmap="Reds")

## Alternative metrics

The probability values for ion substitution can be calculated using an adaptation of the algorithm presented in [Hautier, G., Fischer, C., Ehrlacher, V., Jain, A., and Ceder, G. (2011). Data Mined Ionic Substitutions for the Discovery of New Compounds. Inorganic Chemistry, 50(2), 656-663](https://pubs.acs.org/doi/10.1021/ic102031h).
        
We also provide alternative ways of determining the dopant ranking, based on either probability or similarity metrics. One of these is `skipspecies`, a similarity metric built on distributed representations (embeddings) of the ions. The underlying idea is that similar ions should have similar embedding vectors, and this is quantified using the cosine similarity of those vectors.
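For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it with the standard library. The vectors are made-up toy values chosen for easy checking, not real `skipspecies` embeddings, and the function is an illustration rather than SMACT's own implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, not actual ion embeddings).
ion_a = [1.0, 0.0]
ion_b = [1.0, 1.0]   # 45 degrees away from ion_a
ion_c = [0.0, 2.0]   # orthogonal to ion_a

print(round(cosine_similarity(ion_a, ion_b), 4))  # 0.7071
print(cosine_similarity(ion_a, ion_c))            # 0.0
```

Values closer to 1 indicate vectors pointing in nearly the same direction, which is why a higher similarity ranks a candidate dopant above a lower one.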

In [11]:
doper_skipspecies = Doper(
    ("Ti4+", "O2-"), embedding="skipspecies", use_probability=False
)
doper_skipspecies.get_dopants(5)
# Present results in a table
doper_skipspecies.to_table

n-type cation substitutions
sorted
+--------+----------+--------+--------------+---------------+
|   Rank | Dopant   | Host   |   Similarity |   Selectivity |
|      1 | Nb5+     | Ti4+   |     0.82676  |             1 |
+--------+----------+--------+--------------+---------------+
|      2 | Ta5+     | Ti4+   |     0.724365 |             1 |
+--------+----------+--------+--------------+---------------+
|      3 | P5+      | Ti4+   |     0.598709 |             1 |
+--------+----------+--------+--------------+---------------+
|      4 | V5+      | Ti4+   |     0.584022 |             1 |
+--------+----------+--------+--------------+---------------+
|      5 | Pu7+     | Ti4+   |     0.544501 |             1 |
+--------+----------+--------+--------------+---------------+

5+
+--------+----------+--------+--------------+---------------+
|   Rank | Dopant   | Host   |   Similarity |   Selectivity |
|      1 | Nb5+     | Ti4+   |     0.82676  |             1 |
+---