Skip to content

csutton7/nomad_2018_kaggle_dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


The dataset was used for the Nomad 2018 Kaggle competition (https://www.kaggle.com/c/nomad2018-predict-transparent-conductors) is provided here. The training set (train.csv) and test set as uploaded to the kaggle compeition (test.csv) and with labels (labeled_test.csv). 

The training set is 2400 compounds, while the test set is 600 compounds. For each line of the CSV file, the corresponding spatial positions of all of the atoms in the unit cell expressed in Cartesian coordinates are provided as a separate files in the subfolders train/ and test/.

files with spatial information about the material are provided according to the id in the respective csv files as follows: {train/test}/{id}/geometry.xyz. 

The python script "read_xyz.py" uses the id entry in the csv file to print the structure object from  ASE (https://wiki.fysik.dtu.dk/ase/about.html).


The following information has been included for each compound in the training and test sets:

Spacegroup label identifying the symmetry of the material
Total number of Al, Ga, In and O atoms in the unit cell (Ntotal)
Relative compositions of Al, Ga, and In (x, y, z)
Lattice vectors and angles: lv1, lv2, lv3 which are lengths given in units of angstroms (i.e., 10^−10 meters) and α, β, γ, which are angles in degrees between 0° and 360°.

A domain expert will understand the physical meaning of the above information but those with a data mining background may simply use the data as input for their models.


The task for this competition is to predict two target properties:
Formation energy (an important indicator of the stability of a material)
Bandgap energy (an important property for optoelectronic applications)

Releases

No releases published

Packages

No packages published

Languages