# Copying tables from the F or G drive to the esa-ees folder to put the files into production

This notebook contains the necessary code to copy the ESA CSV files as well as the index files into the esa-ees folder. Once the files are in esa-ees, IT will take the files and copy them into the production / test folders.

Part of the code organizes the CSVs into their respective project folders. The files are then zipped to make them easier for users to download.

**What's missing?** Code still needs to be added to bundle the files together. I would suggest working on a duplicate folder of the CSV files to test how the files can be bundled. Once the bundling works correctly, rearrange the code in this notebook as needed.

In [1]:
# !pip install tqdm

In [8]:
import pandas as pd
import shutil # for copying files
from tqdm import tqdm # progress bar for for-loops
from zipfile import ZipFile # for creating zip files
import os
import glob

In [9]:
# filepath to the English and French index files
ENG_index_filepath = 'F:/Environmental Baseline Data/Version 4 - Final/Indices/ESA_website_ENG.csv'
FRA_index_filepath = 'F:/Environmental Baseline Data/Version 4 - Final/Indices/ESA_website_FRA.csv'

# Caution: The following code deletes files in esa-ees.

Use this code when you are copying new files to esa-ees and want to avoid creating duplicates with different filenames (or folder names).

**Skip if you don't want to delete files in esa-ees.**
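Because this step is destructive, it may be safer to preview what would be deleted before anything is removed. The commented-out cell below calls `os.remove` on everything `glob` finds. A minimal sketch of a dry-run variant follows; `clear_folders` and its `dry_run` flag are hypothetical names, not part of this notebook, and the folder paths are assumed to be built the same way as elsewhere here (`'//dweb5/esa-ees/' + folder name`):

```python
import glob
import os


def clear_folders(folder_paths, dry_run=True):
    """Delete the files (not subfolders) directly inside each folder.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the list of files that would be removed, so it can be reviewed first.
    """
    targets = []
    for folder in folder_paths:
        for path in glob.glob(os.path.join(folder, '*')):
            # Skip subfolders: os.remove only works on files
            if os.path.isfile(path):
                targets.append(path)
    if not dry_run:
        for path in targets:
            os.remove(path)
    return targets
```

Calling it once with the default `dry_run=True` lists the files in each project folder without touching them; only a second call with `dry_run=False` deletes them.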

In [22]:
%%time

# df = pd.read_csv(ENG_index_filepath)
# df['Download folder name'] = r'//dweb5/esa-ees/' + df['Download folder name'] + '/*'
# unique_folders = df['Download folder name'].unique()

# for folder_pattern in tqdm(unique_folders):
#     for f in glob.glob(folder_pattern):
#         # Only delete files; os.remove raises an error on subfolders
#         if os.path.isfile(f):
#             os.remove(f)

100%|██████████████████████████████████████████████████████████████████████████████████| 38/38 [16:16<00:00, 25.70s/it]


Wall time: 16min 18s


## Modifying the index file with pandas to build the file paths for copying to the esa-ees folder (the destination folder)

In [None]:
%%time

# Loading index file of all tables
df = pd.read_csv(ENG_index_filepath)

# Grabbing project folder names and replacing with folder paths in esa-ees
df['Download folder name'] = r'//dweb5/esa-ees/' + df['Download folder name'] + '/'

# Splitting each CSV download URL on '/' and '_' into a list of its parts
df['csv_url'] = df["CSV Download URL"].str.split('/|_')

# Remove all rows for figures so that we are only moving tables
df = df[df['Content Type'] == 'Table']
df = df.reset_index(drop=True)

# Adding source and destination filepaths to the dataframe
source_folder_path = 'F:/Environmental Baseline Data/Version 4 - Final/all_csvs_cleaned/'
source_filenames = []
dest_filenames = []
for parts, dest_folder in tqdm(zip(df['csv_url'], df['Download folder name']), total=len(df)):
    # e.g. '1059614_14' from the last URL segments
    base_name = parts[-3] + '_' + parts[-2]
    # Only the first character of the last URL segment is used
    suffix = parts[-1][0]
    source_filenames.append(source_folder_path + base_name + '_lattice-v_' + suffix + '.csv')
    dest_filenames.append(dest_folder + base_name + '_' + suffix + '.csv')

df['csv_path'] = source_filenames
df['Download folder name'] = dest_filenames

### Checking the first row to see what the paths look like:

In [4]:
df['csv_path'][0]

'F:/Environmental Baseline Data/Version 4 - Final/all_csvs_cleaned/1059614_14_lattice-v_1.csv'

In [5]:
df['Download folder name'][0]

'//dweb5/esa-ees/nrthmntn/1059614_14_1.csv'
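To make the filename mapping easier to test on its own, here is a minimal sketch that reproduces it for a single URL. The real values in the `CSV Download URL` column are not shown in this notebook, so the URL below is a hypothetical example chosen to produce the two paths printed above; `build_paths` is likewise a hypothetical helper name. `re.split` with the pattern `'/|_'` performs the same split as the pandas `str.split` call:

```python
import re


def build_paths(csv_url, dest_folder, source_folder):
    """Rebuild the source and destination file paths derived from one CSV URL."""
    # Same split as df["CSV Download URL"].str.split('/|_') in the cell above
    parts = re.split(r'/|_', csv_url)
    base_name = parts[-3] + '_' + parts[-2]
    # Only the first character of the last segment is kept
    suffix = parts[-1][0]
    source = source_folder + base_name + '_lattice-v_' + suffix + '.csv'
    dest = dest_folder + base_name + '_' + suffix + '.csv'
    return source, dest


src, dst = build_paths(
    'https://example.com/downloads/1059614_14_1.csv',  # hypothetical URL
    '//dweb5/esa-ees/nrthmntn/',
    'F:/Environmental Baseline Data/Version 4 - Final/all_csvs_cleaned/',
)
print(src)
print(dst)
```

Because only the first character of the last segment is kept, a last segment such as `12.csv` would be reduced to `1`. If the real URLs can contain multi-digit numbers there, the mapping would need to change.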

## Copying the files from the source folder to the destination folder (esa-ees):

In [46]:
%%time

exceptions = []

for src, dst in tqdm(zip(df['csv_path'], df['Download folder name']), total=len(df)):
    try:
        # copying the files into the destination folder
        shutil.copy(src, dst)
    except OSError:
        # record the source file that could not be copied
        exceptions.append(src)

100%|████████████████████████████████████████████████████████████████████████████| 28891/28891 [43:24<00:00, 11.09it/s]
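The copy loop relies on every project folder already existing in esa-ees and records only the source path when a copy fails. A minimal sketch of a variant that creates missing folders and keeps the error message; `copy_files` is a hypothetical helper name, and it assumes the notebook has permission to create folders on the share:

```python
import os
import shutil


def copy_files(pairs):
    """Copy each (source, destination) pair, creating destination folders as needed.

    Returns a list of (source, error message) tuples for the copies that failed.
    """
    failures = []
    for src, dst in pairs:
        try:
            folder = os.path.dirname(dst)
            # shutil.copy fails if the destination folder does not exist yet
            if folder:
                os.makedirs(folder, exist_ok=True)
            shutil.copy(src, dst)
        except OSError as err:
            failures.append((src, str(err)))
    return failures
```

It could be called as `copy_files(zip(df['csv_path'], df['Download folder name']))`. Keeping the error message makes it easier to tell a missing source file apart from a permissions problem on esa-ees.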


In [78]:
# Checking to see if we were able to move every single file
# 'exceptions' will be empty if so
exceptions

[]
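An empty `exceptions` list means `shutil.copy` did not raise an error. It does not confirm what actually ended up in esa-ees. A quick size comparison can catch missing or truncated copies. The sketch below compares file sizes only, not contents (for a byte-level check, the standard library's `filecmp.cmp(src, dst, shallow=False)` could be used instead); `find_mismatches` is a hypothetical helper name:

```python
import os


def find_mismatches(pairs):
    """Return the (source, destination) pairs whose files are missing or differ in size."""
    mismatches = []
    for src, dst in pairs:
        same = (
            os.path.isfile(src)
            and os.path.isfile(dst)
            and os.path.getsize(src) == os.path.getsize(dst)
        )
        if not same:
            mismatches.append((src, dst))
    return mismatches
```

For example, `find_mismatches(zip(df['csv_path'], df['Download folder name']))` should return an empty list if every copy is complete.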

## Saving the index files as individual zip files

In [6]:
ENG_website_filename = 'ESA_website_ENG.csv'
FRA_website_filename = 'ESA_website_FRA.csv'

**Note: The next code block may need a duplicate for the French index file. Please contact Angelsea to find out how she would like the files to be saved. This is likely the case, so I've duplicated the code for the French file as well. Delete it if unnecessary.**

Also, you may need to zip some other files as well; I'm not sure.

In [32]:
%%time

# Loading ENG index file
df = pd.read_csv(ENG_index_filepath)

# Removing rows for the figures
df = df[df['Content Type'] == 'Table']

# destination path
dest_path = r'//dweb5/esa-ees/'

df.to_csv(dest_path + ENG_website_filename)

# Create the zip file. 'DLD-ndc.zip' is a relative name, so it is written to the
# notebook's current working directory, not to esa-ees.
# Call zipObj.write(...) once for each file you want to include in the zip file.
with ZipFile('DLD-ndc.zip', 'w') as zipObj:
    # arcname stores the file at the root of the archive instead of under its full path
    zipObj.write(dest_path + ENG_website_filename, arcname=ENG_website_filename)

Wall time: 22.3 s
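The note above says the French index file, and possibly other files, may also need to be zipped. A minimal sketch of a helper that puts several files into one archive, each stored under its bare filename. `zip_files` is hypothetical, and whether the French index belongs in the same zip as the English one (or in its own, as the heading's "individual zip files" suggests) is still the open question for Angelsea:

```python
import os
from zipfile import ZIP_DEFLATED, ZipFile


def zip_files(zip_path, file_paths):
    """Write each file into zip_path at the root of the archive, compressed."""
    with ZipFile(zip_path, 'w', compression=ZIP_DEFLATED) as zf:
        for path in file_paths:
            # arcname drops the folder part so the zip contains just the file names
            zf.write(path, arcname=os.path.basename(path))
```

Usage, assuming the French index has also been saved to `dest_path` (the cell above does not do this yet): `zip_files('DLD-ndc.zip', [dest_path + ENG_website_filename, dest_path + FRA_website_filename])`. `ZIP_DEFLATED` is a deliberate change: by default `ZipFile` stores files uncompressed (`ZIP_STORED`), while deflate usually makes text files like CSVs much smaller.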


## Create CSV and Excel files listing the download paths of the CSVs (one pair per project folder)

In [47]:
%%time

# English files

# Loading ENG index file
df = pd.read_csv(ENG_index_filepath)

# Removing rows for the figures
df = df[df['Content Type'] == 'Table']

unique_folders = df['Download folder name'].unique()
download_folder_path_df = r'//dweb5/esa-ees/' + unique_folders + '/'

exceptions = []

for x in tqdm(range(len(unique_folders))):
    # Get name of the folder of a project
    folder_name = unique_folders[x]
    try:
        # Filter out all rows that aren't associated with project folder x
        folder_df = df[df['Download folder name'] == folder_name]

        # Save download path csv and excel files
        folder_df.to_csv(download_folder_path_df[x] + folder_name + '_ENG.csv')
        folder_df.to_excel(download_folder_path_df[x] + folder_name + '_ENG.xlsx')
    except OSError:
        # Record the project folder (not a row of df) whose files could not be saved
        exceptions.append(folder_name)

100%|██████████████████████████████████████████████████████████████████████████████████| 38/38 [00:58<00:00,  1.53s/it]


Wall time: 1min
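The loop above re-filters the whole DataFrame once per project folder. `DataFrame.groupby` performs the same split in a single pass, so a minimal sketch swapping in `groupby` for the `unique()` + filter pattern follows. `write_folder_files` is a hypothetical helper. It writes only the CSV (an equivalent `to_excel` call would need an Excel engine such as openpyxl installed) and assumes each project folder already exists under `root`. By default `groupby` skips rows with a missing folder name:

```python
import os

import pandas as pd


def write_folder_files(df, folder_col, root, suffix):
    """Write one CSV per project folder listing that folder's rows.

    Each file is saved as <root>/<folder>/<folder><suffix>.csv.
    Returns the folder names whose file could not be written.
    """
    failed = []
    # groupby visits each folder once instead of re-filtering df per folder
    for folder_name, folder_df in df.groupby(folder_col):
        try:
            folder_df.to_csv(os.path.join(root, folder_name, folder_name + suffix + '.csv'))
        except OSError:
            failed.append(folder_name)
    return failed
```

Because the folder column and suffix are parameters, the same helper could serve both languages: `write_folder_files(df, 'Download folder name', '//dweb5/esa-ees', '_ENG')` for English, and the French column `'Télécharger le nom du dossier'` with `'_FRA'` for French.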


In [3]:
%%time

## Same as above, but for the French files

df = pd.read_csv(FRA_index_filepath)
df = df[df['Type de contenu'] == 'Tableau']
dossiers_uniques = df['Télécharger le nom du dossier'].unique()
download_folder_path_df = r'//dweb5/esa-ees/' + dossiers_uniques + '/'

exceptions = []

for x in tqdm(range(len(dossiers_uniques))):
    # Get name of the folder of a project
    nom_du_dossier = dossiers_uniques[x]
    try:
        # Filter out all rows that aren't associated with project folder x
        dossier_df = df[df['Télécharger le nom du dossier'] == nom_du_dossier]

        # Save download path csv and excel files
        dossier_df.to_csv(download_folder_path_df[x] + nom_du_dossier + '_FRA.csv')
        dossier_df.to_excel(download_folder_path_df[x] + nom_du_dossier + '_FRA.xlsx')
    except OSError:
        # Record the project folder (not a row of df) whose files could not be saved
        exceptions.append(nom_du_dossier)

100%|██████████████████████████████████████████████████████████████████████████████████| 37/37 [00:58<00:00,  1.59s/it]

Wall time: 59.8 s
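The progress bars report 38 project folders for the English index and 37 for the French one. The notebook does not say whether that difference is expected. A set comparison shows which folder names appear in only one index. This is a minimal sketch that assumes both index files use identical folder-name strings (both are used as folder names under esa-ees); `compare_folders` is a hypothetical helper name:

```python
import pandas as pd


def compare_folders(eng_df, fra_df):
    """Return (folders only in the English index, folders only in the French index)."""
    eng = set(
        eng_df.loc[eng_df['Content Type'] == 'Table', 'Download folder name'].dropna()
    )
    fra = set(
        fra_df.loc[fra_df['Type de contenu'] == 'Tableau', 'Télécharger le nom du dossier'].dropna()
    )
    return sorted(eng - fra), sorted(fra - eng)
```

Usage: `only_eng, only_fra = compare_folders(pd.read_csv(ENG_index_filepath), pd.read_csv(FRA_index_filepath))`.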





## Create a zip file of each individual project folder

In [None]:
%%time

df = pd.read_csv(ENG_index_filepath)
unique_folders = df['Download folder name'].unique()
folder_paths = r'//dweb5/esa-ees/' + unique_folders

exceptions = []

for folder_path in tqdm(folder_paths):
    try:
        # Writes <folder_path>.zip next to the folder, containing the folder's contents
        shutil.make_archive(folder_path, 'zip', folder_path)
    except OSError:
        exceptions.append(folder_path)
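The `exceptions` list only catches archives that failed outright. Before handing the zips to IT, it may be worth opening each one to confirm it is readable and not empty. A minimal sketch using the standard library's `ZipFile.testzip()`; `check_zips` is a hypothetical helper name, and it assumes the zips sit next to their folders, which is where `shutil.make_archive` writes them:

```python
from zipfile import BadZipFile, ZipFile


def check_zips(zip_paths):
    """Return {zip path: None if the archive looks fine, else a short problem description}."""
    results = {}
    for path in zip_paths:
        try:
            with ZipFile(path) as zf:
                # testzip() reads every member and returns the first bad name, or None
                bad_member = zf.testzip()
                if bad_member is not None:
                    results[path] = 'corrupt member: ' + bad_member
                elif not zf.namelist():
                    results[path] = 'empty archive'
                else:
                    results[path] = None
        except (OSError, BadZipFile) as err:
            # Missing file or not a valid zip
            results[path] = str(err)
    return results
```

For example, `check_zips([p + '.zip' for p in folder_paths])` returns `None` for every archive that opened cleanly and contains at least one entry.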