### NOTE: Before processing the whole database, run the entire build process on a small subset of the files in raw/outdata to check that everything works
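
One way to set up such a trial run is to copy a handful of the raw files into a separate folder and run the build against that folder instead. The sketch below is an assumption, not part of the Ordnance Survey guidance: the helper name `copy_sample`, the destination folder `raw/outdata_sample` and the sample size of 3 are all hypothetical.

```python
import shutil
from pathlib import Path


def copy_sample(src_dir, dest_dir, n_files=3, pattern="*.csv"):
    """Copy the first n_files matching pattern (sorted by name) into dest_dir."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    sample = sorted(src.glob(pattern))[:n_files]
    for path in sample:
        shutil.copy2(path, dest / path.name)
    # Return the names of the copied files so the caller can see what was sampled
    return [p.name for p in sample]


# Hypothetical usage: the folder names are assumptions, adjust them to your layout
# copy_sample("raw/outdata", "raw/outdata_sample", n_files=3)
```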

This follows the guidance Ordnance Survey provide [here](https://www.ordnancesurvey.co.uk/docs/user-guides/addressbase-products-getting-started-guide.pdf), and uses the Python scripts Ordnance Survey provide [here](https://s3-eu-west-1.amazonaws.com/osproducts/AddressBase/AddressBase_Scripts.zip).

Download and extract the scripts from Ordnance Survey (OS).

In [None]:
from urllib.request import urlopen
from zipfile import ZipFile
from io import BytesIO

link = "https://s3-eu-west-1.amazonaws.com/osproducts/AddressBase/AddressBase_Scripts.zip"
response = urlopen(link, timeout=5)
# Read the archive into memory and extract it without shadowing the zipfile module name
with ZipFile(BytesIO(response.read())) as z:
    z.extractall("AddressBase_Scripts/")
    
# Put the recordsplitter script in the same folder as the AddressBase csvs
import os 
os.rename("AddressBase_Scripts/Code/AddressBasePremium_RecordSplitter.py", "raw/AddressBasePremium_RecordSplitter.py")

The script provided by Ordnance Survey is written in Python 2, so use the `2to3` command-line tool to convert it to a Python 3 script.

In [None]:
%%bash 
2to3 -w raw/AddressBasePremium_RecordSplitter.py >/dev/null
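
`2to3` was removed from the standard Python distribution in Python 3.13, so the cell above may fail on newer installations. A minimal sketch for checking whether the tool is available on the `PATH` before running the conversion; the helper name `has_command` is hypothetical.

```python
import shutil


def has_command(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if not has_command("2to3"):
    print("2to3 not found on PATH; the conversion cell will fail")
```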

In [None]:
import os 
os.chdir("raw")

When the script prompts for input, enter "outdata" (type the quotation marks as well).

In [None]:
%run AddressBasePremium_RecordSplitter.py

The [guide](https://www.ordnancesurvey.co.uk/docs/user-guides/addressbase-products-getting-started-guide.pdf) says

> Check that there are no carriage returns at the end of each .csv output file as this will result in errors being
caused.

The following script double-checks this by removing any trailing newline or carriage return from the end of each output file.

In [None]:
files_list = [ r'ID10_Header_Records.csv',
 r'ID11_Street_Records.csv',
 r'ID15_StreetDesc_Records.csv',
 r'ID21_BLPU_Records.csv',
 r'ID23_XREF_Records.csv',
 r'ID24_LPI_Records.csv',
 r'ID28_DPA_Records.csv',
 r'ID29_Metadata_Records.csv',
 r'ID30_Successor_Records.csv',
 r'ID31_Org_Records.csv',
 r'ID32_Class_Records.csv',
 r'ID99_Trailer_Records.csv']



for fname_str in files_list:
    with open(fname_str, "r+b") as my_file:
        print("processing {}".format(fname_str))

        # Two passes, so that a "\r\n" ending is removed as well as a lone "\n" or "\r"
        for _ in range(2):
            # Move the pointer (similar to a cursor in a text editor) to the end of the file
            my_file.seek(0, os.SEEK_END)
            size = my_file.tell()
            if size == 0:
                break  # empty file, nothing to strip

            # The end of the file is the position after the final byte; step back to just before it
            pos = size - 1
            my_file.seek(pos, os.SEEK_SET)

            # The file is open in binary mode, so read() returns bytes: compare with b"\n", not "\n"
            last_char = my_file.read(1)
            if last_char in (b"\n", b"\r"):
                my_file.truncate(pos)
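
To confirm the result, a read-only check can list any output file that still ends in a newline byte. This is a minimal sketch under stated assumptions: the helper names `ends_with_newline` and `files_with_trailing_newline` are hypothetical and are not part of the Ordnance Survey scripts.

```python
import os


def ends_with_newline(path):
    """Return True if the file's last byte is a line feed or a carriage return."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:  # empty file: there is no last byte to inspect
            return False
        f.seek(-1, os.SEEK_END)
        return f.read(1) in (b"\n", b"\r")


def files_with_trailing_newline(paths):
    """Return the paths that exist and still end in a newline byte."""
    return [p for p in paths if os.path.exists(p) and ends_with_newline(p)]
```

Calling `files_with_trailing_newline(files_list)` from inside the `raw` folder returns an empty list when none of the output files ends in a newline byte.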



We now have a set of CSV files that are ready to be loaded into the database. Each file corresponds to one table in the AddressBase Premium database.