# Import materials from files in multiple formats (CIF, POSCAR, etc.)

This notebook uses the ASE Python package to extract structural information from files in multiple formats (CIF, POSCAR, etc., as supported by ASE). Some formats, such as `espresso-in` and `espresso-out`, can be inferred from the file content.

<h2 style="color:green">Usage</h2>

1. Upload files to the `uploads` folder: open (double-click) the folder in the left sidebar, then click "Upload" and select the files, or drag and drop files onto the sidebar.
1. Click "Run" > "Run All Cells" to run all cells.
1. If a format detection error occurs, correct the file extension and try again.

## Methodology

The following happens in the script below:

1. The required packages are installed.
1. Files are read from the `uploads` folder, assuming their extensions represent the format, e.g. `SiO2.poscar`.
1. Structural information is read from files into ASE Atoms objects.
1. ASE Atoms objects are converted to `poscar` format.
1. `poscar` structures are converted to ESSE.
1. The results are passed to the outside runtime.
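
The extension convention in step 2 can be illustrated with a minimal, standard-library sketch. The helper name `material_name_and_format_hint` and its `strip_extension` argument are hypothetical and not part of this notebook or of ASE; the real format detection is done by ASE when the files are read.

```python
from pathlib import Path


def material_name_and_format_hint(file_name: str, strip_extension: bool = False):
    """Hypothetical helper: return (material name, format hint) for an uploaded file."""
    path = Path(file_name)
    # The suffix without the leading dot is what extension-based detection keys on;
    # files without an extension (e.g. "POSCAR") yield no hint.
    format_hint = path.suffix.lstrip(".").lower() or None
    # Mirrors the effect of USE_FILE_NAME_NO_EXTENSION on the material name
    name = path.stem if strip_extension else path.name
    return name, format_hint


print(material_name_and_format_hint("SiO2.poscar"))
# → ('SiO2.poscar', 'poscar')
print(material_name_and_format_hint("SiO2.poscar", strip_extension=True))
# → ('SiO2', 'poscar')
```

A file with a misleading extension (say, a POSCAR saved as `SiO2.txt`) gives a hint that does not match its content, which is why correcting the extension is the suggested fix for detection errors.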

## 1. Set Parameters

In [None]:
# Upload files to this folder
FOLDER_PATH = "./uploads"
# If None, the format is guessed from the file extension (or content)
# If set to a specific ASE format name (e.g. "vasp"), every file is read as that format
ENFORCED_FORMAT = None
# If set to True, the file extension is stripped from the resulting material name
USE_FILE_NAME_NO_EXTENSION = False
# If set to true, the supported formats will be printed below
SHOW_SUPPORTED_FORMATS = False

## 2. Install Packages

In [None]:
import sys
if sys.platform == "emscripten":
    import micropip
    await micropip.install('mat3ra-api-examples', deps=False)
    from utils.jupyterlite import install_packages
    await install_packages("import_materials_from_files.ipynb")

## 3. Data Processing

### 3.1. Read data from files

In [None]:
import os
from pathlib import Path
from ase.io import read

materials = []
unreadable_files = []
file_names = os.listdir(FOLDER_PATH)

for file_name in file_names:
    file_path = os.path.join(FOLDER_PATH, file_name)
    try:
        atoms = read(file_path, format=ENFORCED_FORMAT)

        atoms.info["file_name"] = Path(file_name).stem if USE_FILE_NAME_NO_EXTENSION else file_name
        materials.append(atoms)
        
    except Exception as e:
        # Report which file failed, record it, and move on to the next one
        print(f"Could not read {file_name}: {e}")
        unreadable_files.append(file_name)

### 3.2. Preview the data

In [None]:
print(f"Successfully read {len(materials)} files")
print(f"Unreadable files: {unreadable_files}")


### 3.3. Troubleshoot data

In [None]:
# Set SHOW_SUPPORTED_FORMATS = True (section 1) to list the supported formats and their file extensions
from ase.io.formats import ioformats
import pandas as pd

if len(unreadable_files) > 0 or SHOW_SUPPORTED_FORMATS:
    # Only mention unreadable files when there are any
    if unreadable_files:
        print(f"Unreadable files found: {unreadable_files}. See formats/extensions below.")
    data = [[frmt.name, frmt.extensions, frmt.description] for frmt in ioformats.values()]
    dataframe = pd.DataFrame(data, columns=["Format Name", "File Extensions", "Description"])
    print(dataframe.to_markdown())

### 3.4. Convert to ESSE format

In [None]:
import io
from ase import Atoms
from ase.io import write
from express import ExPrESS

def ase_to_poscar(atoms: Atoms):
    """
    Converts ase.Atoms object to POSCAR format

    Args:
        atoms (ase.Atoms): ase.Atoms object

    Returns:
        str: POSCAR string
    """
    output = io.StringIO()
    write(output, atoms, format="vasp")
    content = output.getvalue()
    output.close()

    return content

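def convert_ase_entry_to_esse(ase_entry):
    """
    Converts ase.Atoms object to a material in ESSE format

    Args:
        ase_entry (ase.Atoms): ase.Atoms object with "file_name" set in its info

    Returns:
        dict: material in ESSE format, named after the source file
    """
    poscar = ase_to_poscar(ase_entry)
    kwargs = {
        "structure_string": poscar,
        "structure_format": "poscar"
    }

    handler = ExPrESS("structure", **kwargs)
    esse = handler.property("material", **kwargs)

    esse["name"] = ase_entry.info["file_name"]

    return esse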

esse_entries = list(map(convert_ase_entry_to_esse, materials))
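
To show what kind of string is passed as `structure_string`, here is a rough, standard-library sketch of a POSCAR (VASP 5 style) layout built in an in-memory buffer. The function `write_minimal_poscar` is hypothetical and written only for illustration; the notebook itself relies on ASE's `write(..., format="vasp")`, whose exact output (number formatting, comment line) differs in detail.

```python
import io


def write_minimal_poscar(name, cell, symbols, frac_coords):
    """Hypothetical helper: serialize a structure to a POSCAR-style string."""
    # POSCAR lists atoms grouped by species, so keep species in first-appearance order
    species = list(dict.fromkeys(symbols))
    with io.StringIO() as buf:
        buf.write(f"{name}\n")   # comment line
        buf.write("1.0\n")       # universal scaling factor
        for vec in cell:         # three lattice vectors, one per line
            buf.write(" ".join(f"{x:.10f}" for x in vec) + "\n")
        buf.write(" ".join(species) + "\n")
        buf.write(" ".join(str(symbols.count(s)) for s in species) + "\n")
        buf.write("Direct\n")    # coordinates below are fractional
        for s in species:
            for sym, xyz in zip(symbols, frac_coords):
                if sym == s:
                    buf.write(" ".join(f"{x:.10f}" for x in xyz) + "\n")
        # Read the buffer before the `with` block closes it
        return buf.getvalue()


poscar = write_minimal_poscar(
    "Si2",
    [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]],
    ["Si", "Si"],
    [[0.0, 0.0, 0.0], [0.25, 0.25, 0.25]],
)
print(poscar)
```

The buffer is used the same way as in `ase_to_poscar`: the text is written to an `io.StringIO` object instead of a file on disk and then retrieved with `getvalue()`.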

### 3.5. Visualize the materials

In [None]:
from utils.visualize import visualize_materials
from mat3ra.made.material import Material

materials = [Material(esse_entry) for esse_entry in esse_entries]

visualize_materials(materials)

## 4. Pass data to the outside runtime

In [None]:
from utils.jupyterlite import set_materials

set_materials(materials)