# Do Your Own Analysis

1. First, work through the Setup notebook and a few of the analysis notebooks.
1. Perform new analyses on the CVA and Token dataframes.
1. Rename this notebook and save your changes to your gDrive.
1. Add plenty of comments if you want others to understand your work and collaborate.
1. Finally, contribute back to this repository by sharing your insights.

# Setup Environment

## Set Notebook Parameters

In [None]:
# use gDrive if you previously saved train_data, etc.
# otherwise, use pre-generated data from repos (Default)
USE_GDRIVE = False

# save analysis plots if customized
SAVE_PLOT = False

## Import various packages


In [None]:
import pandas as pd
import numpy as np

import os       # os.path is reached as os.path; the name `path` is reused below for the data folder
from time import strftime, localtime
from google.colab import drive

## Clone CVA-SBERT GitHub or mount gDrive

In [None]:
if USE_GDRIVE:
    drive.mount('/content/drive')               # mount YOUR gDrive

    # Path to data -- change for YOUR specific Analysis folder
    path = '/content/drive/MyDrive/CVA-SBERT/Analysis-20221203-190207' ### CHANGE!!!

else:
    !git clone https://github.com/Hackathorn/CVA-SBERT  # clone the repository

    # Path to data in repository
    path = '/content/CVA-SBERT/data/SetUp_Data'

path

Cloning into 'CVA-SBERT'...
remote: Enumerating objects: 197, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 197 (delta 21), reused 19 (delta 12), pack-reused 162
Receiving objects: 100% (197/197), 87.56 MiB | 23.53 MiB/s, done.
Resolving deltas: 100% (122/122), done.
/content/CVA-SBERT


Load the dataframes and create a working dataframe named simply `df`

In [None]:
# load previous dataframes from SetUp notebook
CVA_df = pd.read_pickle(path + '/CVA_df.pkl')
token_df = pd.read_pickle(path + '/token_df.pkl')

#### select/rename/transform data into the working df to be analyzed
df = CVA_df     # ...or a filtered subset, e.g. CVA_df[CVA_df.Target == 1]
df

Unnamed: 0,DataId,SourceId,Target,Definition,Item,Cos_Sim,Euc_Sim
0,0,2978,1,People whose past behavior is consistent with ...,Have any of your current or previous partners ...,0.183756,1.277688
1,1,1056,0,Facilitation from work to school.,I enjoy being a student on this campus.,0.292208,1.189783
2,3,1015,0,Employees' sense of belongingness at work.,Helps others when it is clear their workload i...,0.322255,1.164255
3,4,2988,0,How attracted members were to the crew and the...,Managers rate each crew (low performance/high ...,0.446235,1.052393
4,7,3130,0,Things Manny didn't do.,Did Manny tear the book while he was reading it?,0.565577,0.932119
...,...,...,...,...,...,...,...
23030,28070,12341,0,The extent to which reputations were observabl...,The project required close working relationshi...,0.213506,1.254188
23031,28071,12822,1,How characteristic each of the attractiveness ...,Wise.,0.147961,1.305403
23032,28072,3350,1,Participants' explanations for why the seller ...,The buyer is persuasive,0.569600,0.927793
23033,28074,2361,1,Newcomers' belief that good alternative work e...,To what extent have other co-workers influence...,0.447036,1.051631


RESULTS...
- some insights from above results??? 
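
As one possible first analysis, the sketch below compares the similarity scores of the two `Target` classes. To stay self-contained it rebuilds a small stand-in frame from the nine rows displayed above (text columns omitted); in the notebook you would run the same `groupby` on `df` directly. The names `sample_df` and `summary` are illustrative and not part of the repository, and nine rows are far too few to support any conclusion.

```python
import pandas as pd

# Stand-in for `df`: only the nine rows visible in the output above
sample_df = pd.DataFrame({
    "DataId": [0, 1, 3, 4, 7, 28070, 28071, 28072, 28074],
    "Target": [1, 0, 0, 0, 0, 0, 1, 1, 1],
    "Cos_Sim": [0.183756, 0.292208, 0.322255, 0.446235, 0.565577,
                0.213506, 0.147961, 0.569600, 0.447036],
    "Euc_Sim": [1.277688, 1.189783, 1.164255, 1.052393, 0.932119,
                1.254188, 1.305403, 0.927793, 1.051631],
})

# Row count, mean and standard deviation of each score, per Target class
summary = (
    sample_df
    .groupby("Target")[["Cos_Sim", "Euc_Sim"]]
    .agg(["count", "mean", "std"])
)
print(summary.round(3))
```

On the full dataframe, replacing `sample_df` with `df` gives the same table over all rows, which is a quick way to see whether the scores separate the classes at all.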

# Next step in your analysis

## Next piece of this step

In [None]:
# some code

RESULTS...
- Some insights........
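
One concrete check that could fill a step like this: in the rows shown earlier, `Euc_Sim` falls as `Cos_Sim` rises. If the embeddings were unit-normalized (an assumption, since the SetUp notebook is not shown here), Euclidean distance $d$ and cosine similarity $c$ would satisfy $d = \sqrt{2 - 2c}$. The sketch below tests that relation on the first five displayed rows; the variable names are illustrative.

```python
import numpy as np

# First five rows of the displayed dataframe
cos_sim = np.array([0.183756, 0.292208, 0.322255, 0.446235, 0.565577])
euc_sim = np.array([1.277688, 1.189783, 1.164255, 1.052393, 0.932119])

# Distance implied by cosine similarity for unit-length vectors (assumption)
predicted = np.sqrt(2.0 - 2.0 * cos_sim)

# Largest gap between the implied and the stored values
max_abs_diff = np.max(np.abs(predicted - euc_sim))
print(max_abs_diff)
```

A gap no larger than the rounding of the displayed values would suggest the two columns carry the same information for these rows, so an analysis may only need one of them. Confirm this on the full `df` before relying on it.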

# Save analysis results to your gDrive - OPTIONAL

Mount gDrive and create timestamped Experiment Folder

In [None]:
drive.mount('/content/drive')   # ignore warning if already mounted

BASE_PATH = '/content/drive/MyDrive/CVA-SBERT/'
EXP_PATH = BASE_PATH + 'Analysis-' + strftime("%Y%m%d-%H%M%S", localtime())

# `path` now holds the data-folder string, so call os directly;
# makedirs also creates BASE_PATH if it is missing
os.makedirs(EXP_PATH, exist_ok=True)

Save dataframes or other results to Experiment Folder

In [None]:
# save initial two dataframes
CVA_df.to_pickle(EXP_PATH + '/CVA_df.pkl')
token_df.to_pickle(EXP_PATH + '/token_df.pkl')

# ...or save other results here, such as plots
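
A minimal sketch of how the `SAVE_PLOT` flag might gate saving a figure. It assumes matplotlib is available (it is preinstalled on Colab), uses a temporary directory as a stand-in for `EXP_PATH`, and plots only the `Cos_Sim` values visible in the output earlier; the file name `cos_sim_hist.png` is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Colab
import matplotlib.pyplot as plt

SAVE_PLOT = True               # set in the notebook parameters
exp_dir = tempfile.mkdtemp()   # stand-in for EXP_PATH

# Cos_Sim values from the nine displayed rows; use df["Cos_Sim"] in the notebook
cos_sim = [0.183756, 0.292208, 0.322255, 0.446235, 0.565577,
           0.213506, 0.147961, 0.569600, 0.447036]

fig, ax = plt.subplots()
ax.hist(cos_sim, bins=5)
ax.set_xlabel("Cos_Sim")
ax.set_ylabel("Count")

# Write the figure only when the flag is set
plot_file = os.path.join(exp_dir, "cos_sim_hist.png")
if SAVE_PLOT:
    fig.savefig(plot_file, dpi=150, bbox_inches="tight")
plt.close(fig)
```

In the notebook, replace `exp_dir` with `EXP_PATH` so the plot lands beside the pickled dataframes.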