# 1-openTabulate.ipynb

1. Creates JSON source files for OpenTabulate according to a variable map CSV.

2. After generating all the source files, the notebook runs the following commands in a terminal to run OpenTabulate on all the generated .json files:
    ```
    $ cd /home/jovyan/ODBiz/2-OpenTabulate/sources
    $ opentab *
    ```

3. Copies the output files from `2-OpenTabulate/data/output` to `3-Merging/input`. An earlier version of this step, now deprecated, compressed the output files into a .zip file so that they could be easily moved to another server with more RAM.

# 1. json-generator

The cell below creates JSON source files for OpenTabulate according to a variable map CSV.
Modified for use with ODBiz data sources by Skye Chen.
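
The blank-filtering step that the cell below applies to each field group can be sketched in isolation. This is a minimal illustration rather than the notebook's own code: `pick_nonblank` is a hypothetical helper, and the sample row with its column names (`LONG`, `LAT`, `PROV`) is invented for the example.

```python
def pick_nonblank(row, fields):
    """Return {field: row[field]} for every listed field whose value is not blank."""
    return {f: row[f] for f in fields if row[f] != ''}

# Hypothetical variable map row: keys are schema fields, values are the column
# names used by one data source ('' means the source has no such column).
sample_row = {'longitude': 'LONG', 'latitude': 'LAT', 'city': '', 'province': 'PROV'}

geo_dict = pick_nonblank(sample_row, ('longitude', 'latitude'))
address_dict = pick_nonblank(sample_row, ('city', 'province'))

print(geo_dict)
print(address_dict)
```

Like the loops in the cell, this raises `KeyError` if the variable map lacks one of the expected columns, which surfaces a malformed variable map early.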

In [1]:
"""
1. json-generator

Creates JSON source files for opentabulate according to a variable map csv.
Modified for use with ODBiz data sources by Skye Chen
"""

import csv
import json
from tqdm import tqdm

# Define file path name
opentabDir = '/home/jovyan/ODBiz/2-OpenTabulate'

# input variable map file
input_file = csv.DictReader(open(f'{opentabDir}/variablemap.csv', encoding = 'utf-8-sig')) 

# define schema objects (Forced for all features in data source)
metadataFields = ('localfile',)

geoFields = ('longitude', 'latitude')

bizFields = ('business_name',
        'business_sector',
        'business_subsector',
        'business_description',
        'business_id_no',
        'licence_number',
        'licence_type',
        'primary_NAICS',
        'secondary_NAICS',
        'NAICS_descr',
        'alt_econ_act_code',
        'alt_econ_act_descrip',
        'business_website',
        'email',
        'telephone',
        'telephone_extension',
        'toll_free_telephone',
        'fax',
        'total_no_employees',
        'no_full_time',
        'no_part_time',
        'no_seasonal',
        'date_established',
        'indigenous',
        'status')

addressFields = ('full_address',
                'full_address_2',
                'mailing_address',
                'unit',
                'street_no',
                'street_name',
                'street_direction',
                'street_type',
                'city',
                'province',
                'postal_code',
                'country')

for row in (input_file): # Each row is a data source

    filename = f"{row['localfile']}"
    json_filename = filename.replace(".csv", ".json")
    OP = open(f"{opentabDir}/sources/{json_filename}", "w")
    
        
    # Dictionaries for schema json field
    metadata_dict = {}
    geo_dict = {}
    address_dict = {}
    biz_dict = {}

    # Only adding k,v pairs that are not blank
    for f in metadataFields:
        if row[f] != '':
            metadata_dict[f] = row[f]
    for f in geoFields:
        if row[f] != '':
            geo_dict[f] = row[f]
    for f in addressFields:
        if row[f] != '':
            address_dict[f] = row[f]
    for f in bizFields:
        if row[f] != '':
            biz_dict[f] = row[f]
            
    # Define the JSON structure and static fields. The 'force:' prefix on
    # localfile forces that literal value for all features in the data source.
    jsondict = {
        "localfile": filename,
        "schema_groups": ["metadata", "geocoordinates", "address", "biz"],
        "source": row['source_url'],
        "licence": row['licence'],
        "provider": row['provider'],
        "format": {
            "type": "csv",
            "delimiter": ",",
            "quote": "\""
        },   
        "schema": {
            "metadata": {
                'localfile': f'force:{filename}'
            },
            "geocoordinates": geo_dict,
            "biz": biz_dict,
            "address": address_dict
        }   
    }  
    
    # Create the dictionary for the json output and write to file
    json_data = json.dumps(jsondict, indent=4)  # Formatting
    OP.write(json_data)  # Write to json
    print(f'File saved to {OP.name}')
    OP.close()



File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/AB_Banff_Business_Licences.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/AB_Calgary_Business_Licences.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/AB_Chestermere_Businesses.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/AB_Edmonton_Business_Licences.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/AB_Strathcona_Business_Directory.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/BC_Burnaby_Business_Licences.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/BC_Chilliwack_Business_Licences.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/BC_Indigenous_Business_Listings.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/BC_Kelowna_Business_Licence.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/BC_Langley_Business_Licences.json
File saved to /home/jovyan/ODBiz/2-OpenTabulate/sources/BC_Liquor_licences.json
File saved to /

# 2. opentab 

The cell below runs the following commands in a terminal to run OpenTabulate on all the generated .json files:
```
$ cd /home/jovyan/ODBiz/2-OpenTabulate/sources
$ opentab *
```
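
If the same step needs to run from Python rather than a `%%bash` cell, one possible approach is sketched below. The helper names `build_opentab_command` and `run_opentab` are hypothetical; the only thing assumed about `opentab` is what the commands above show, namely that it accepts source file names as arguments.

```python
import glob
import os
import subprocess

def build_opentab_command(sources_dir):
    """Build an argument list equivalent to running `opentab *` inside sources_dir."""
    # glob's '*' skips hidden entries, as the shell does; sort for a stable order.
    names = sorted(os.path.basename(p) for p in glob.glob(os.path.join(sources_dir, '*')))
    return ['opentab', *names]

def run_opentab(sources_dir):
    """Run opentab on every file in sources_dir; raise CalledProcessError on failure."""
    cmd = build_opentab_command(sources_dir)
    return subprocess.run(cmd, cwd=sources_dir, check=True)

# Example call (requires opentab to be installed and on PATH):
# run_opentab('/home/jovyan/ODBiz/2-OpenTabulate/sources')
```

Passing `check=True` makes a non-zero exit status raise an exception, so a failed run stops the notebook instead of passing silently.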


In [2]:
%%bash
cd /home/jovyan/ODBiz/2-OpenTabulate/sources
opentab *

Beginning data processing.
Completed processing in 42.9076714492403 seconds.


# 3. Move Output Files to Merging


In [None]:
import os
import shutil

# Transfer files directly from 2-OpenTabulate/data/output to 3-Merging/input
src = '../2-OpenTabulate/data/output'
dst = '../3-Merging/input'

# Remove any previous copy first, because copytree requires that dst not exist
if os.path.exists(dst):
    shutil.rmtree(dst)
shutil.copytree(src, dst)
print('Directory successfully copied!')
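
The copy step can also be wrapped in a small function that checks the source directory before deleting anything. This is a sketch under that assumption, not part of the original notebook; `replace_dir` is a hypothetical name.

```python
import os
import shutil

def replace_dir(src, dst):
    """Make dst an exact copy of src, discarding whatever dst held before."""
    # Check src first so that a missing source never wipes the existing dst.
    if not os.path.isdir(src):
        raise FileNotFoundError(f'Source directory not found: {src}')
    if os.path.exists(dst):
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst

# Example call with the notebook's paths:
# replace_dir('../2-OpenTabulate/data/output', '../3-Merging/input')
```

Validating `src` before calling `shutil.rmtree` keeps the previous contents of `3-Merging/input` intact if the OpenTabulate output directory is missing.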

# DEPRECATED 3. compressOutputFiles

The cell below compresses the output files to make them easier to export to the server with more RAM.
The compressed file is named `OpenTabCompressedOutput.zip`

In [3]:
# '''
# DEPRECATED
# 3. compressOutputFiles

# Run this cell to compress the output files to make it easier to export to the server with more RAM.
# The compressed file is named `OpenTabCompressedOutput.tar.gz`
# '''

# import shutil

# dir_to_compress = '/home/jovyan/ODBiz/2-OpenTabulate/data/output'
# output_filename = '/home/jovyan/ODBizOpenTabCompressedOutput/OpenTabCompressedOutput'

# outName = shutil.make_archive(output_filename, 'zip', dir_to_compress)
# print(f'Output saved to {outName}')

Output saved to /home/jovyan/ODBizOpenTabCompressedOutput/OpenTabCompressedOutput.zip
