### Merging Personalized Dataframes Back into the General Dataframe ###

The purpose of this notebook is to merge personalized dataframes into a shared dataframe. Edits made by individuals on their own copy of the shared dataframe are saved to their personalized dataframes. These personalized dataframes can then be used to start this process anew. Begin by importing the relevant libraries and defining a helper function that merges the duplicated columns of the intermediate merged dataframes.

In [None]:
import pandas as pd
import numpy as np

def tagMerge(compTag, umbTag, df, clean):
    """Merge several existing tag columns (compTag) into one common tag column (umbTag)."""
    tagMat = [
        [df.at[ind, tag] for tag in compTag]  # collect this row's value for each component tag
        for ind in df.index  # one inner list per row
    ]
    # Each merged value is the least common multiple of the row's component tags;
    # np.lcm requires integer inputs.
    df[umbTag] = [np.lcm.reduce(tagRow) for tagRow in tagMat]
    if clean:  # are we purging all of the old columns?
        # Never drop the umbrella tag itself, which would undo the merge;
        # build a new list so the caller's compTag is left unmodified.
        df.drop(columns=[tag for tag in compTag if tag != umbTag], inplace=True)
    return df
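
To make the effect of `tagMerge` concrete, here is a minimal sketch on a toy dataframe. It assumes that tag values are positive integers (the data and the function name `tag_merge_vectorized` are hypothetical, not part of the notebook), and it computes the same row-wise least common multiple with a single vectorized call rather than a nested list comprehension.

```python
import numpy as np
import pandas as pd

def tag_merge_vectorized(comp_tags, umb_tag, df, clean=True):
    """Hypothetical vectorized variant of tagMerge: row-wise lcm of comp_tags."""
    out = df.copy()  # work on a copy so the input dataframe is untouched
    # Reduce across columns (axis=1) so each row yields one merged value.
    out[umb_tag] = np.lcm.reduce(out[list(comp_tags)].to_numpy(), axis=1)
    if clean:
        # Drop the component columns, but keep the umbrella column itself.
        out = out.drop(columns=[t for t in comp_tags if t != umb_tag])
    return out

# Toy data (assumed): two copies of the same tag column with integer values.
toy = pd.DataFrame({'text': ['a', 'b', 'c'], 'tag': [1, 2, 1], 'tag_p': [2, 3, 1]})
result = tag_merge_vectorized(['tag', 'tag_p'], 'tag', toy)
print(result)
```

Note that the least common multiple of 0 and any integer is 0, so a 0 in either copy would zero out the merged value; whether that matters depends on how the tags are encoded.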

#### Setting Workspace and Loading Files ####

This block reads in the name of the family of files to be merged together. Adjust the first line of this block as necessary when merging different data frames. The block then loads in both the general and the personalized data frames of each user.

In [None]:
workingFile = 'example' # which file are we working on today

users = []
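with open("users.txt", "r") as f:  # the file is closed automatically on exit
    for line in f:  # read through every user
        read = line.split()
        if read:  # skip blank lines
            users.append(read[0])  # the user name is the first token on the line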

mainDF = pd.read_csv(f'{workingFile}.csv') # load in general use dataframe
persDFs = [pd.read_csv(f'{workingFile}{user}.csv') for user in users] # Load in personalized dataframes as a list

#### Merging Dataframes ####

The merge proceeds by performing an outer merge between the general dataframe and each of the personalized dataframes in turn. Each merge produces a redundant `_p`-suffixed copy of every tag column, so the `tagMerge` function is used to reduce the dataframe back down to the columns that it had initially. `tagMerge` combines each pair of columns by taking the row-wise least common multiple of their values.

In [None]:
for df in persDFs:  # iterate over each personalized dataframe in the list
    # outer-merge this personalized dataframe with the main dataframe; its overlapping columns get the '_p' suffix
    scraDF = mainDF.merge(df, how='outer', on=['text', 'id', 'index', 'section'], suffixes=(None, '_p'))
    for tag in mainDF.columns:  # merge together tags that already existed in the main dataframe
        if tag in ['text', 'completion', 'id', 'index', 'section']:  # skip non-tag columns
            continue
        scraDF = tagMerge([tag, f'{tag}_p'], tag, scraDF, True)
    mainDF = scraDF  # move the finished scratch df onto the main df before the next iteration
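
The merge-then-collapse step can be sketched on two tiny frames. One caveat worth illustrating: an outer merge leaves `NaN` wherever a row exists in only one frame, which turns the tag column into floats, and `np.lcm` only accepts integers. The sketch below assumes tag values are positive integers and fills the gaps with 1 (which leaves the least common multiple unchanged); the data, the shortened key list, and the fill strategy are assumptions for illustration, not the notebook's own handling.

```python
import numpy as np
import pandas as pd

keys = ['id', 'text']  # shortened key list for the illustration

main = pd.DataFrame({'id': [1, 2], 'text': ['a', 'b'], 'tagA': [2, 1]})
pers = pd.DataFrame({'id': [2, 3], 'text': ['b', 'c'], 'tagA': [3, 5]})

# Outer merge keeps rows from both frames; the right-hand tag copy gets '_p'.
merged = main.merge(pers, how='outer', on=keys, suffixes=(None, '_p'))

# Rows found in only one frame hold NaN in the other frame's tag column.
# Fill with 1 (neutral for lcm of positive integers) and cast back to int.
pair = merged[['tagA', 'tagA_p']].fillna(1).astype(int)

# Collapse the two copies into one column and drop the redundant copy.
merged['tagA'] = np.lcm.reduce(pair.to_numpy(), axis=1)
merged = merged.drop(columns=['tagA_p'])
print(merged.sort_values('id'))
```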

#### Saving Final Dataframe ####

Once the personalized dataframes have all been merged together, we can re-save the main dataframe to its `.csv` file. This ends our use of the code merging method. From this point, you are cleared to use `qualCoder.ipynb` where you may make changes to the main dataframe before saving it over your personalized copy of the dataframe.

In [None]:
mainDF.to_csv(f'{workingFile}.csv', index = False)

### MANUAL OVERWRITE ###

**WARNING:** Only use this block when you wish to work on the main dataset and you know for a fact that the datasets of your fellow coders are outdated. In this circumstance, merging with them would undo some of your work, so with their consent, you will replace their outdated dataframes with your own. This is mainly useful when we wish to begin pruning our tags rather than expanding them. My code is designed to be inclusive of all contributions, so when one user is attempting to discard tags that they consider useless, merging will only undo their work because the other users will still have those tags written.


When you do this, be sure to `git add` all three personalized dataframes when you push to GitHub. If you do, then the next time someone pulls from GitHub, they will receive the updated overwrite of their own dataframe. If you don't add their files, they will continue working on their outdated file and you risk merging ineffectively during a future session.

In [None]:
query = input("Are you sure you wish to overwrite all other data frames with your personalized version? (y/n) ")  # seek verification
if query.lower() == 'y':  # acquired
    print(users)
    useNo = input("\nPlease enter integer value for your position in the user list (starting from 0): ")
    # input() returns a string, so convert it before checking the range
    if useNo.strip().isdecimal() and int(useNo) in range(len(users)):  # verify that correct int was entered
        copy_df = pd.read_csv(f'{workingFile}{users[int(useNo)]}.csv')  # load in core copier
        for user in users:  # look at each user
            copy_df.to_csv(f'{workingFile}{user}.csv', index=False)  # paste core copier over each dataframe
        # The original referenced an undefined `codeDF` here; this assumes the
        # main .csv should also be overwritten with the copied version.
        copy_df.to_csv(f'{workingFile}.csv', index=False)
        print("\nData frames have been saved.")
    else:  # missed input
        print("\nImproper integer input. Please try again.")
else:  # denied
    print("\nData frames were not overwritten.")  # state that save did not occur
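
Because `input()` always returns a string, the entered position has to be converted to an integer before it can be compared against the length of the user list. A minimal sketch of that check as a standalone helper follows; the name `parse_user_index` is hypothetical and not part of the notebook.

```python
def parse_user_index(raw, n_users):
    """Return raw as an int index in [0, n_users), or None if it is not valid."""
    try:
        idx = int(raw.strip())  # raises ValueError for non-numeric text
    except ValueError:
        return None
    # Reject negative positions and positions past the end of the list.
    return idx if 0 <= idx < n_users else None

# Example with an assumed three-person user list.
example_users = ['alice', 'bob', 'carol']
choice = parse_user_index(' 1 ', len(example_users))
if choice is not None:
    print(example_users[choice])
```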