# Copying tables from F or G drive to the esa-ees folder to put the files in production

This notebook contains the necessary code to copy the ESA CSV files as well as the index files into the esa-ees folder. Once the files are in esa-ees, IT will take the files and copy them into the production / test folders.

Part of the code organizes all the CSVs into their respective project folder. The files are then zipped into a zip file to make it easier for users to download the files.

**What's missing?** There needs to be code added to bundle the files together. Personally, I would use a new duplicate folder of the csv files and test how you can bundle the files. Once you've bundled the files correctly, rearrange the code in this notebook as necessary.

In [1]:
#!pip install tqdm

In [2]:
import pandas as pd
import shutil # for copying files
from tqdm import tqdm # progress bar for for-loops
from zipfile import ZipFile # for creating zip files
import os
import glob

In [3]:
# filepath to the English and French index files
ENG_index_filepath = 'F:/Environmental Baseline Data/Version 4 - Final/Indices/ESA_website_ENG.csv'
FRA_index_filepath = 'F:/Environmental Baseline Data/Version 4 - Final/Indices/ESA_website_FRA.csv'

ENG_destination_path = "G:\\ESA_downloads\\esa_files_eng\\"

# Caution: Following code is to the delete files in esa-ees.

Use this code when your are copying new files to esa-ees and you want to avoid creating duplicates with different filenames (or folder names).

**Skip if you don't want to delete files in esa-ees.**

In [4]:
%%time

# df = pd.read_csv(ENG_index_filepath)
# df['Download folder name'] = r'//dweb5/esa-ees/' + df['Download folder name'] + '/*'
# unique_folders = df['Download folder name'].unique()

# for x in tqdm(range(0, len(unique_folders))):
#     folder_name = unique_folders[x]
#     files = glob.glob(folder_name)
#     for f in files:
#         os.remove(f)

Wall time: 0 ns


## Modifying index files with pandas to be able to copy files to esa-ees folder (the destination folder)

In [5]:
%%time

# Loading index file of all tables
df = pd.read_csv(ENG_index_filepath)

# Grabbing project folder names and replacing with folder paths in esa-ees
df['Download Folder Name'] = r'//dweb5/esa-ees/' + df['Download folder name'] + '/'

# Creating new 
df['Csv Url'] = df["CSV Download URL"].str.split('/|_')

# Remove all rows for figures so that we are only moving tables
df = df[df['Content Type'] == 'Table']
df = df.reset_index(drop=True)



Wall time: 3min 10s


### Checking a cell to see what the paths look like:

In [6]:
# df['Csv Path'][0]

In [7]:
# df['Download Folder Name'][0]

In [8]:
# df.head()

# 1. Table Level Downloads

# # Function to create about_me.txt

In [12]:
def create_table_about_me(path, about_me_data):
    f = open(path + "\\about_me.txt", "w+")
    text = "Guideline Limitation"
    text = text +'\r\n\t Please refer to the CER website\'s “Terms and Conditions" for limitations, which apply to this ESA Data Bank.'
    text = text +"\r\n Accuracy"
    text = text +"\r\n\t CSV table data is extracted according to the methods outlined in the Methodology Documentation and may contain some inaccuracies. Extracted data is limited to ESA filings submitted by companies at the start of the public hearing assessment process. ESA-related information that is filed after the initial application milestone is excluded from this collection. In the event of variance, the original PDF in REGDOCS is the main source. Data sources for ESA tables are cited at the discretion of the consultants writing the ESAs and may vary according to ESA. Figures and Tables are available in the language they were submitted in."
    text = text +"\r\n Citation"
    text = text +"\r\n\t To cite this data bank, please use the following"
    text = text +"\r\n\t Canada Energy Regulator (2020): Environmental and Socioeconomic Assessment Data Bank."
    text = text +"\r\n\t To cite individual data tables, please also cite the original ESA PDF in REGDOCS."
    text = text +"\r\n\t By choosing to download files, you understand these limitations."
    text = text +"\r\n Table Title: " + about_me_data[0] 
    text = text +"\r\n Application Name: " + about_me_data[1] 
    text = text +"\r\n Company Name: " + about_me_data[2] 
    text = text +"\r\n File Name: " + about_me_data[3] 
    
    f.write(text)
    f.close()

## Copying the files from the source folder to the destination folder (esa-ees):

In [13]:
df = df.head(25)
df


Unnamed: 0.1,Unnamed: 0,Title,Content Type,Application Name,Application Short Name,Application Filing Date,Company Name,Commodity,File Name,ESA Folder URL,...,ESA Section(s) Topics,CSV Download URL,PDF Page Number,PDF Page Count,PDF Size,PDF Outline,Download folder name,Zipped Project Link,Download Folder Name,Csv Url
0,0,TABLE 3 SUMMARY OF AQUATICS FIELD WORK AND ABO...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,14,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
1,1,TABLE 4 SUMMARY OF WATERCOURSE CROSSINGS ALONG...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,17,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
2,2,TABLE 4 SUMMARY OF WATERCOURSE CROSSINGS ALONG...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,18,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
3,3,TABLE 4 SUMMARY OF WATERCOURSE CROSSINGS ALONG...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,19,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
4,4,TABLE 4 SUMMARY OF WATERCOURSE CROSSINGS ALONG...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,20,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
5,5,TABLE 4 SUMMARY OF WATERCOURSE CROSSINGS ALONG...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,21,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
6,6,TABLE 4 SUMMARY OF WATERCOURSE CROSSINGS ALONG...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,22,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
7,7,TABLE 4 SUMMARY OF WATERCOURSE CROSSINGS ALONG...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,23,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
8,8,TABLE 5 SUMMARY OF WATER QUALITY PARAMETERS AN...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,24,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."
9,9,TABLE 5 SUMMARY OF WATER QUALITY PARAMETERS AN...,Table,Application for North Montney Project,North Montney,2013-11-08,NOVA Gas Transmission Ltd.,Gas,B2-16 ESA_Appendix_G_Part1of4 (A3Q6H2),https://apps.cer-rec.gc.ca/REGDOCS/Item/LoadRe...,...,Water,http://www.cer-rec.gc.ca/esa-ees/nrthmntn/1059...,25,48.0,5.87,No,nrthmntn,http://www.cer-rec.gc.ca/esa-ees/nrthmntn.zip,//dweb5/esa-ees/nrthmntn/,"[http:, , www.cer-rec.gc.ca, esa-ees, nrthmntn..."


In [14]:
# Adding source and destination filepaths to the dataframe
source_folder_path = 'F:/Environmental Baseline Data/Version 4 - Final/all_csvs_cleaned/'
source_filenames = []
dest_filenames = []

exceptions = []
source_file_names = []
source_title_names = []



previous_table_title = ""
for index, row in df.iterrows():
    if previous_table_title != row['Title'] and previous_table_title != "":
        
        #create the folder location 
        path = ENG_destination_path + 'tables\\'+ previous_table_title[:100]
        #path = os.path.realpath(ENG_destination_path + 'tables\\'+ previous_table_title[:25]
        try:
            os.mkdir(path)
        except OSError:
            print ("Creation of the directory %s failed" % path)
        
        #copy the csv files 
        for i in range(len(source_file_names)):
            try: 
            # copying the files into the destination folder
                shutil.copy(source_file_names[i], path +"\\" + source_title_names[i][:100]+"_" +str(i)+".csv")
            except:
                exceptions.append(row['Title'])
                print("exception raised on ", row['Title'])
                pass
        
        source_file_names = []
        source_title_names = []
            
        #create the about_me text file
        create_table_about_me(path, about_me_data)
            
        
    
    else:
        source_file_name = source_folder_path + row['Csv Url'][-3] + '_' + row['Csv Url'][-2] + '_lattice-v_' + row['Csv Url'][-1][0] + '.csv'
        source_file_names.append(source_file_name)
        source_title_names.append(row['Title'])
        
        about_me_data = [row['Title'], row['Application Name'], row['Company Name'], row['File Name']]
    
    previous_table_title = row['Title']

    
#Copy the above code to execute the last row tabel dump


print(len(source_file_names))
print(len(source_title_names))

1
1


In [78]:
# Checking to see if we were able to move every single file
# 'exceptions' will be empty if so
exceptions

[]

## Saving the index files as individual zip files

In [6]:
ENG_website_filename = 'ESA_website_ENG.csv'
FRA_website_filename = 'ESA_website_FRA.csv'

**Note: The next coding block may need to include a duplicate for the French index file. Please contact Angelsea for info on how she would like the files to be saved. This is likely the case so I've duplicated the code for the French file as well. Delete if unnecessary.**

Another thing, you may need to zip some other files as well, I'm unsure.

In [4]:
%%time

# Loading ENG index file
df = pd.read_csv(ENG_index_filepath)

# Removing rows for the figures
df = df[df['Content Type'] == 'Table']

# destination path
dest_path = r'//dweb5/esa-ees/'

df.to_csv(dest_path + ENG_website_filename)

# Create a ZipFile object
zipObj = ZipFile('DLD-ndc.zip', 'w')

# Add multiple files to the zip by using zipObj.write(...) for each file/folder you want to include in the zip folder.
zipObj.write(dest_path + ENG_website_filename)
                 
# Close the Zip File
zipObj.close()     

FileNotFoundError: [Errno 2] File b'F:/Environmental Baseline Data/Version 4 - Final/Indices/ESA_website_ENG.csv' does not exist: b'F:/Environmental Baseline Data/Version 4 - Final/Indices/ESA_website_ENG.csv'

## Create a csv and excel file the download paths for each csv (separated by folder)

In [47]:
# English files

%%time

# Loading ENG index file
df = pd.read_csv(ENG_index_filepath)

# Removing rows for the figures
df = df[df['Content Type'] == 'Table']

unique_folders = df['Download folder name'].unique()
download_folder_path_df = r'//dweb5/esa-ees/' + unique_folders + '/'

exceptions = []

for x in tqdm(range(0, len(unique_folders))):
    try:
        # Get name of the folder of a project
        folder_name = unique_folders[x]
        
        # Filter out all rows that aren't associated with project folder x
        folder_df = df[df['Download folder name'] == folder_name]
        
        # Save download path csv and excel files
        folder_df.to_csv(download_folder_path_df[x] + folder_name + '_ENG.csv')
        folder_df.to_excel(download_folder_path_df[x] + folder_name + '_ENG.xlsx')
    except:
        pass
        exceptions.append(df['Download folder name'].iloc[x])
    

100%|██████████████████████████████████████████████████████████████████████████████████| 38/38 [00:58<00:00,  1.53s/it]


Wall time: 1min


In [3]:
## Same as above, but for the French files

%%time

df = pd.read_csv(FRA_index_filepath)
df = df[df['Type de contenu'] == 'Tableau']
dossiers_uniques = df['Télécharger le nom du dossier'].unique()
download_folder_path_df = r'//dweb5/esa-ees/' + dossiers_uniques + '/'

exceptions = []

for x in tqdm(range(0, len(dossiers_uniques))):
    try:
        # Get name of the folder of a project
        nom_du_dossier = dossiers_uniques[x]
        
        # Filter out all rows that aren't associated with project folder x
        dossier_df = df[df['Télécharger le nom du dossier'] == nom_du_dossier]
        
        # Save download path csv and excel files
        dossier_df.to_csv(download_folder_path_df[x] + nom_du_dossier + '_FRA.csv')
        dossier_df.to_excel(download_folder_path_df[x] + nom_du_dossier + '_FRA.xlsx')
    except:
        pass
        exceptions.append(df['Télécharger le nom du dossier'].iloc[x])

100%|██████████████████████████████████████████████████████████████████████████████████| 37/37 [00:58<00:00,  1.59s/it]

Wall time: 59.8 s





## Create a zip file of each individual project folder

In [None]:
%%time

df = pd.read_csv(ENG_index_filepath)
unique_folders = df['Download folder name'].unique()
folder_paths = r'//dweb5/esa-ees/' + unique_folders

exceptions = []

for x in tqdm(range(0, len(unique_folders))):
    try:
        shutil.make_archive(folder_paths[x], 'zip', folder_paths[x])
    except:
        pass
        exceptions.append(folder_paths[x])