### The following are Python scripts to import Excel files into a SQLite database

This function connects to a SQLite database using the path to its file on my computer. SQLAlchemy must be installed, as must openpyxl, which pandas uses to read .xlsx files. Reference: https://stackoverflow.com/questions/47432988/openpyxl-read-out-excel-and-save-into-database.

In [1]:
from sqlalchemy import create_engine
import pandas as pd

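def initialise_database(database_path):
    # Build an SQLAlchemy engine for the SQLite file; the connection only opens on first use
    db = create_engine('sqlite:///%s' % database_path)
    return db

# Save this Jupyter notebook in the same folder as the database file, or else give the full file path
# Note: if no file exists at this path, SQLite creates a new, empty database there
db = initialise_database('PIMMS.db')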
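As a quick check that an engine works, SQLAlchemy's inspect() can list the tables a database contains. The sketch below assumes an in-memory SQLite database (create_engine('sqlite://')) in place of PIMMS.db and a hypothetical table called source_example; passing db to inspect() instead would list the real tables.

```python
from sqlalchemy import create_engine, inspect, text

# Hypothetical stand-in for PIMMS.db: an in-memory SQLite database
engine = create_engine('sqlite://')

with engine.begin() as conn:
    # Create a throwaway (hypothetical) table so there is something to list
    conn.execute(text('CREATE TABLE source_example (id INTEGER)'))

# inspect() returns an Inspector; get_table_names() lists the tables in the database
table_names = inspect(engine).get_table_names()
print(table_names)  # ['source_example']
```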

This function imports each Excel file into its own source table in the PIMMS database. Run it after the cell above. References: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html and https://docs.python.org/3/tutorial/errors.html.

In [2]:
#List of Excel files containing raw metadata from various data sources and systems
#new_source = ['file_name1', 'file_name2']
excel_source_names = ['Aleph_AAC', 'Analysis_AAC', 'Analysis_AMEMM', 'Filemaker_AMEMM', 'IAMS_AAC', 'IAMS_AMEMM','SP_Workflow', 'Twitter_Analytics']


def import_excel(sources):
    for source in sources:
        filename = '../DataSources/excel/%s.xlsx' % source
        try:
            df = pd.read_excel(filename)
            # Name the new table after the file name, prefixed by 'source_' to distinguish it from target tables
            table = 'source_%s' % source
            # If this table already exists in the database, replace it with the new data
            df.to_sql(table, db, if_exists='replace')

            # Alternatively, refuse to overwrite an existing table (pair this with the ValueError handler below)
            # df.to_sql(table, db, if_exists='fail')
        # If the source table already exists (raised only with if_exists='fail'), print an error with the table name
        # except ValueError:
        #     print('The table called ' + table + ' already exists')

        # If the file does not exist, or its name is not spelled exactly this way in the folder, print an error with the filename
        except FileNotFoundError:
            print('The file called ' + filename + ' does not exist')

# Call the import_excel function using the list of Excel files
#import_excel(new_source)
import_excel(excel_source_names)
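The if_exists options behave differently when the table is already there: 'replace' drops the table and writes the new data, while 'fail' (the pandas default) raises a ValueError, which is what the commented-out handler above would catch. A minimal sketch, assuming an in-memory database and a hypothetical source_demo table instead of real Excel data. It also passes index=False; the function above keeps the default index=True, which stores the DataFrame index as an extra column named index.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite database, used purely for illustration
engine = create_engine('sqlite://')
df = pd.DataFrame({'Shelfmark': ['A 1', 'B 2']})

# The first write creates the (hypothetical) source_demo table
df.to_sql('source_demo', engine, if_exists='replace', index=False)
# 'replace' drops the existing table and writes the new data in its place
df.to_sql('source_demo', engine, if_exists='replace', index=False)

try:
    # 'fail' refuses to touch a table that already exists
    df.to_sql('source_demo', engine, if_exists='fail', index=False)
    fail_raised = False
except ValueError as err:
    fail_raised = True
    print(err)

# Still two rows: 'replace' overwrote rather than appended, and 'fail' wrote nothing
row_count = pd.read_sql('SELECT COUNT(*) AS n FROM source_demo', engine)['n'][0]
print(row_count)  # 2
```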


This function imports a batch of Excel spreadsheets into a single table of the SQLite database. The spreadsheets do not need the same number or order of columns: just list the shared columns you want to import in usecols, and make sure every spreadsheet contains all of them. This example uses ingest spreadsheets.

In [None]:
sources_to_combine = ['DAR 643 - Sherborne Missal', 'DAR 693 Ritsumeikan Japanese Maps master ingest log', 'DAR00610_Polonsky-pre1200_All', 'DAR00675_Sultan_Baybars_Quran', 'DAR00685_Western_African_Manuscripts']

def combine_excel(sources):
    for source in sources:
        filename = '../DataSources/excel/%s.xlsx' % source
        try:
            df = pd.read_excel(filename, usecols=['Shelfmark', 'Parent', 'Label', 'System number', 'Volume Enumeration', 'Project', 'DAR Number', 'Source Folder', 'Output Folder', 'Ingest Format', 'JP2 Quality'])
            # All files go into one table, named 'ingest_'
            table = 'ingest_'
            # The first file creates the table; each later file adds its rows to it
            # Every file must contain all the columns listed in usecols
            df.to_sql(table, db, if_exists='append')
        except FileNotFoundError:
            print('The file called ' + filename + ' does not exist')

combine_excel(sources_to_combine)
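Because if_exists='append' matches columns by name, spreadsheets can hold the shared columns in any order. This sketch uses two hypothetical in-memory DataFrames in place of the ingest spreadsheets, keeps only the shared columns of each while preserving each sheet's own column order, and appends both to one table.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')  # in-memory stand-in for PIMMS.db

# Two hypothetical spreadsheets: different column order, and one extra column
sheet_a = pd.DataFrame({'Shelfmark': ['MS 1'], 'Project': ['P1'], 'Notes': ['x']})
sheet_b = pd.DataFrame({'Project': ['P2'], 'Shelfmark': ['MS 2']})

shared_columns = ['Shelfmark', 'Project']

for sheet in (sheet_a, sheet_b):
    # Drop columns that are not shared, keeping the sheet's own column order
    subset = sheet[[c for c in sheet.columns if c in shared_columns]]
    # 'append' creates the table on the first pass, then adds rows;
    # pandas names the columns in its INSERT, so their order does not matter
    subset.to_sql('ingest_', engine, if_exists='append', index=False)

combined = pd.read_sql('SELECT * FROM ingest_ ORDER BY Shelfmark', engine)
print(combined)
```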