# Azure Search Schema Generator

# Generating your index schema file 

A schema is necessary for creating an Azure Search Index.

You can create your schema manually (for instance, using *template_index_schema.json*).

This notebook provides you with a simple tool to generate it automatically.

Important: The index schema generator makes the following assumptions about the JSON documents:
- they contain no nested fields;
- each field is either a simple value or a list;
- by default, all allowed values are parsed as strings;
- one of the existing fields must be picked as the key field;
- all JSON documents have the same structure.
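
The requirements above can be checked before running the generator. The following is a minimal sketch, not part of the original notebook: the helper names `check_flat_document` and `same_structure` and the two example documents are hypothetical and exist only for illustration.

```python
def check_flat_document(doc):
    """Return a list of problems that would break the schema generator."""
    problems = []
    for name, value in doc.items():
        if isinstance(value, dict):
            # nested objects are not supported by the generator
            problems.append(f"{name}: nested object")
        elif isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
            # lists may only hold primitive values
            problems.append(f"{name}: list contains non-primitive items")
    return problems


def same_structure(docs):
    """Return True if every document has exactly the same set of field names."""
    field_sets = {frozenset(doc) for doc in docs}
    return len(field_sets) <= 1


# Hypothetical documents, for illustration only
good = {"HotelId": "1", "Tags": ["pool", "view"], "Rating": 3.6}
bad = {"HotelId": "2", "Address": {"City": "New York"}}

print(check_flat_document(good))
print(check_flat_document(bad))
print(same_structure([good, bad]))
```

Running a check like this over every file in *sample_jsons* is one way to catch a nested field or a missing field before the schema is generated from a single randomly picked document.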

### Reading fields from JSON

The sample JSONs are in the folder *sample_jsons*. (You are welcome to replace them with your documents.)

Remark: The provided sample JSONs are based on https://docs.microsoft.com/en-us/azure/search/search-get-started-python.

The notebook randomly picks one of the JSON files and reads the fields from it.

In [1]:
import os
import json
import random

# keep only the JSON documents
arr = [i for i in os.listdir('./sample_jsons') if i.endswith('.json')]

# pick one document at random
picked_file = random.choice(arr)

with open(os.path.join('./sample_jsons', picked_file)) as f:
    source_json = json.load(f)

print(picked_file)

file2.json


Now let's determine the types of the imported fields that are going to be in the index. 

The parser below can import the following JSON types:
- bool
- integral numbers
- floating-point numbers
- strings
- strings that look like dates
- arrays/lists of primitive types

All of the types listed above (except arrays) can be mapped to *Edm.String* as the index target type, and lists can be mapped to *Collection(Edm.String)*. This parser imports only fields that can be assigned one of those types; other fields must be omitted manually.

For more details about the AS index types: https://docs.microsoft.com/en-us/rest/api/searchservice/data-type-map-for-indexers-in-azure-search#bkmk_json_search
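
To make the list of supported types concrete, the sketch below shows which Python type `json.loads` produces for each of them. The document is hypothetical (the field `Rooms` and all values are made up for illustration); the point is only that a date-like string arrives as a plain `str`, so it is treated like any other string.

```python
import json

# Hypothetical document covering each supported JSON type
raw = '''{
  "ParkingIncluded": false,
  "Rooms": 12,
  "Rating": 3.6,
  "HotelName": "Example Hotel",
  "LastRenovationDate": "1970-01-18T00:00:00Z",
  "Tags": ["pool", "air conditioning"]
}'''

doc = json.loads(raw)

# Map each field name to the name of the Python type that json.loads produced
python_types = {name: type(value).__name__ for name, value in doc.items()}
print(python_types)
```

These are the Python types that a type check on the loaded document has to recognize: `bool`, `int`, `float`, and `str` for simple values, and `list` for arrays.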

The fields to import and their AS types: 

In [2]:
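def get_AS_type(field):
    # lists become string collections; every other supported primitive becomes a string
    str_types = (bool, int, float, str)
    if isinstance(field, list):
        return "Collection(Edm.String)"
    if isinstance(field, str_types):
        return "Edm.String"
    # unsupported values (e.g. nested objects or nulls) are flagged with False
    return False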

fields_types = {key: get_AS_type(source_json[key]) for key in source_json.keys()}

fields_types

{'HotelId': 'Edm.String',
 'HotelName': 'Edm.String',
 'Description': 'Edm.String',
 'Description_fr': 'Edm.String',
 'Category': 'Edm.String',
 'Tags': 'Collection(Edm.String)',
 'ParkingIncluded': 'Edm.String',
 'LastRenovationDate': 'Edm.String',
 'Rating': 'Edm.String',
 'StreetAddress': 'Edm.String',
 'City': 'Edm.String',
 'StateProvince': 'Edm.String',
 'PostalCode': 'Edm.String',
 'Country': 'Edm.String'}
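
In the sample output every field received a type. If a document contains an unsupported value (a nested object, or a JSON `null`, which Python loads as `None`), the type check returns `False` for that field, and the field has to be left out of the schema. A minimal sketch of doing this programmatically, assuming a mapping shaped like `fields_types`; the helper name `supported_fields` and the example mapping are hypothetical:

```python
def supported_fields(fields_types):
    """Split a name-to-type mapping into supported fields and skipped field names."""
    kept = {name: as_type for name, as_type in fields_types.items() if as_type}
    skipped = [name for name, as_type in fields_types.items() if not as_type]
    return kept, skipped


# Hypothetical mapping in which 'Address' was a nested object
example_types = {
    "HotelId": "Edm.String",
    "Tags": "Collection(Edm.String)",
    "Address": False,
}

kept, skipped = supported_fields(example_types)
print(kept)
print(skipped)
```

Reporting the skipped names makes it visible which fields were dropped instead of silently writing `False` into the schema as a type.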

### Pick your key field:

In [3]:
key_field = 'HotelId'
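
The key field of an Azure Search index must be of type *Edm.String*, and its values must be unique across documents. The sketch below checks the first condition against a mapping shaped like `fields_types`; the function name `validate_key_field` and the example mapping are hypothetical, and uniqueness of the values is not checked here.

```python
def validate_key_field(key_field, fields_types):
    """Raise if key_field cannot serve as the index key; otherwise return it."""
    if key_field not in fields_types:
        raise KeyError(f"'{key_field}' is not one of the imported fields")
    if fields_types[key_field] != "Edm.String":
        raise ValueError(
            f"'{key_field}' has type {fields_types[key_field]}, but a key must be Edm.String"
        )
    return key_field


# Hypothetical subset of the mapping produced in the previous step
example_types = {"HotelId": "Edm.String", "Tags": "Collection(Edm.String)"}

print(validate_key_field("HotelId", example_types))
```

With this mapping, picking `Tags` would raise a `ValueError`, because a collection cannot be used as a key.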

## Generate your index schema

The index is generated from the template *./assets/template_index_schema.json*.
Essentially, the following code adds the data fields to this template.
(For comparison, the final version that this code generates is already available in *./assets/sample_index_schema.json*.)

Once the index schema is created, you can modify its other parameters directly in the JSON file. (However, it is essential to consult reference materials before making any changes since not all combinations are allowed.)

Name your index and the JSON file that will store its schema:

In [4]:
my_index_name = 'my-first-index'
my_index_schema_file = 'my_index_schema.json'
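
Index names are subject to service-side naming rules, so it can help to check the name before generating the schema. The sketch below encodes rules that are an assumption on my part and should be verified against the official naming-rules documentation: lowercase letters, digits, and dashes only; starts and ends with a letter or digit; no consecutive dashes; at most 128 characters. The function name `is_plausible_index_name` is hypothetical.

```python
import re

# Assumed naming rules (verify against the official documentation):
# lowercase letters, digits, and dashes; a dash may not follow another dash.
_INDEX_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?!-))*")


def is_plausible_index_name(name):
    """Return True if the name satisfies the assumed naming rules."""
    if not 0 < len(name) <= 128:
        return False
    if name.endswith("-"):
        return False
    return _INDEX_NAME.fullmatch(name) is not None


print(is_plausible_index_name("my-first-index"))
print(is_plausible_index_name("My_First_Index"))
```

A passing check does not guarantee that the service accepts the name; it only filters out names that break the rules assumed above.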

Now let's create the schema: 

In [5]:
def create_field(field_sample, field_name, field_type, key):
    """
        Create a field description to be added to the index schema.

        Args:
         field_sample (dict): a field description template
         field_name (str): the name of the added field
         field_type (str): the type of the added field
         key (bool): whether the field is the key field

        Returns:
         dict: a copy of the template filled in for the given field
    """
    sample = dict(field_sample)
    sample['name'] = field_name
    sample['type'] = field_type
    sample['key'] = key
    if not key:
        sample['searchable'] = True
    if 'Collection' in field_type:
        sample['sortable'] = False
    return sample

#read the index schema template
with open('./assets/template_index_schema.json') as f:
    index_schema = json.load(f)

#assign the new name
index_schema['name'] = my_index_name

#get the template for the field description
sample_field = index_schema['fields'][0]
index_schema['fields'] = []

#add the descriptions of the fields to the template and save the modified schema template
try:
    index_schema['fields'] = [create_field(sample_field, key, fields_types[key], key == key_field) for key in fields_types.keys()]
    
    with open('./assets/' + my_index_schema_file, 'w') as f:
        json.dump(index_schema, f)
    
    print('Your index schema is successfully created!')
except Exception as ex:
    print(ex)

Your index schema is successfully created!


Now you should have your new index schema saved in the *assets* folder.
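
Before using the generated file, a quick sanity check can confirm that the schema has unique field names and exactly one key field of type *Edm.String*. This is a minimal sketch under stated assumptions: the function name `check_index_schema` and the example schema are hypothetical, and only the `name`, `type`, and `key` attributes are inspected.

```python
def check_index_schema(schema):
    """Return a list of problems found in an index schema dictionary."""
    problems = []
    fields = schema.get("fields", [])

    # every field name may appear only once
    names = [field.get("name") for field in fields]
    if len(names) != len(set(names)):
        problems.append("duplicate field names")

    # exactly one field must be marked as the key, and it must be a string
    key_fields = [field for field in fields if field.get("key")]
    if len(key_fields) != 1:
        problems.append(f"expected exactly one key field, found {len(key_fields)}")
    elif key_fields[0].get("type") != "Edm.String":
        problems.append("the key field must be of type Edm.String")
    return problems


# Hypothetical schema for illustration; in the notebook you could instead load
# the generated file from the assets folder with json.load and pass the result in.
example_schema = {
    "name": "my-first-index",
    "fields": [
        {"name": "HotelId", "type": "Edm.String", "key": True},
        {"name": "Tags", "type": "Collection(Edm.String)", "key": False},
    ],
}

print(check_index_schema(example_schema))
```

An empty list means that none of the checked conditions failed; it does not validate the other attributes copied from the template.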