# Given a CSV file, update its columns to fit the expected format of `gc_event_dataframe`

> ### This tutorial and tool take a CSV file containing external data and make it easy to load into the gc_log_analysis tool. Given a CSV filename, the code below restructures the data to match the `gc_event_dataframe` and writes a new CSV file. That file can later be imported with a single line of code (as seen in the bottom cell) for easy analysis.

## Populate the following variable fields, then run all cells
- `old_csv_filename` : the path to the file whose format you would like to fix.
- `my_column_transformations` : transformation functions for the data, one per column. Write None to leave a column untransformed. The length of this list must match the number of columns in the original dataset.
- `my_column_names` : the updated column names you would like to use, as strings. Write None to leave a column out of the new file. The length of this list must match the number of columns in the original dataset.
- `populate_columns` : if there is a column you would like to fill with a single value, add a tuple to this list. The first element of the tuple is the column name, and the second is the constant value assigned to every row. The list can have any length, but each tuple must have exactly two elements.
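The length requirements above are easy to get wrong, so a quick check before running the cells can help. The following is a minimal sketch: `validate_settings` is a hypothetical helper written for illustration and is not part of the gc_log_analysis tool, and the column count of 2 is assumed for the example.

```python
def validate_settings(n_columns, column_names, column_transformations, populate_columns):
    """Return a list of problems found in the settings (empty if none).

    Hypothetical helper for illustration; not part of the analysis tool.
    """
    problems = []
    if len(column_names) != n_columns:
        problems.append(f"expected {n_columns} column names, got {len(column_names)}")
    if len(column_transformations) != n_columns:
        problems.append(
            f"expected {n_columns} transformations, got {len(column_transformations)}"
        )
    for transformation in column_transformations:
        if transformation is not None and not callable(transformation):
            problems.append("each transformation must be a function or None")
    for entry in populate_columns:
        if not (isinstance(entry, tuple) and len(entry) == 2):
            problems.append(f"populate_columns entry {entry!r} must be a (column, value) tuple")
    return problems


# Example: settings for a CSV with two columns.
problems = validate_settings(
    n_columns=2,
    column_names=["TimeFromStart_seconds", None],
    column_transformations=[lambda value: int(value) / 1000, None],
    populate_columns=[("EventType", "My-unique-event-type")],
)
print(problems)  # prints [] because the settings are consistent
```

Returning a list of messages rather than raising on the first problem lets all mistakes be fixed in one pass.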


In [51]:
# Current CSV file name:
old_csv_filename = "just-in-time.csv" 
my_column_names = ["TimeFromStart_seconds", "Duration_miliseconds"] # How to rename the columns. Choose None to not use the data in that column
my_column_transformations = [lambda value: int(value) / 1000, None] # Applied to each element in the column
populate_columns = [("EventType", "My-unique-event-type")] # list of ('A', B): Sets all rows in Column 'A' to value B
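
To see what these settings do before touching a real file, the same renaming, transformation, and populating steps can be tried on a small in-memory table. This is a sketch under assumptions: the sample values and the original column names `start_ms` and `pause_ms` are made up, and the loop illustrates the idea rather than reproducing the notebook's own code.

```python
import pandas as pd

# Made-up stand-in for the contents of the original CSV.
old_df = pd.DataFrame({"start_ms": [1500, 3000], "pause_ms": [12.5, 8.0]})

my_column_names = ["TimeFromStart_seconds", "Duration_miliseconds"]
my_column_transformations = [lambda value: int(value) / 1000, None]
populate_columns = [("EventType", "My-unique-event-type")]

new_df = pd.DataFrame()
for index, old_name in enumerate(old_df.columns):
    new_name = my_column_names[index]
    if new_name is None:
        continue  # None means this column is left out
    column_data = old_df[old_name]
    transformation = my_column_transformations[index]
    if transformation is not None:
        column_data = column_data.apply(transformation)
    new_df[new_name] = column_data

for column, value in populate_columns:
    new_df[column] = value  # the same value in every row

print(new_df)
```

Here the first column is converted from milliseconds to seconds, the second is copied unchanged under its new name, and `EventType` is filled with one constant.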

### The cell below contains the code that creates your CSV; you do not need to inspect it

In [60]:
# Get the column names
import sys
import pandas as pd 
sys.path.append("../src")
from read_log_file import columnNames

def create_formatted_csv(output_csv_filename):
    global old_csv_filename, my_column_names, populate_columns, my_column_transformations
    # Create a blank dataframe, and add the needed columns to it.
    df = pd.DataFrame()
    for column in columnNames():
        df[column] = ""
    
    # Gather data from columns of the original csv
    old_df = pd.read_csv(old_csv_filename)
    
    # Apply any transformations to each row in each column
    for index, transformation in enumerate(my_column_transformations):
        if transformation:
            # Assign by column name so the column's dtype may change (e.g. int to float)
            column = old_df.columns[index]
            old_df[column] = old_df[column].apply(transformation)

    # Populate the new array with data from the old, under the column names
    # DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent
    for index, (column, column_data) in enumerate(old_df.items()):
        if my_column_names[index]:
            df[my_column_names[index]] = column_data

    # Populate columns with the same value in the new dataframe
    for column, value in populate_columns:
        df[column] = value  # A scalar is broadcast to every row

    # Write the new CSV file
    df.to_csv(output_csv_filename, index=False)
    return df 
    

In [61]:
# Run the function
df = create_formatted_csv("example_out.csv")

## To load the new CSV file in the future, try this
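
A minimal sketch, assuming plain pandas is enough for loading: a file written by `create_formatted_csv` is an ordinary CSV, so `pd.read_csv` can read it back. Whether the gc_log_analysis tool offers its own loader is not covered here. So that the snippet runs on its own, it first writes a tiny stand-in file with made-up values to a temporary directory; with a real file, only the `pd.read_csv` line is needed.

```python
import os
import tempfile

import pandas as pd

# Write a tiny stand-in for a file produced by create_formatted_csv.
# The values are made up for illustration.
path = os.path.join(tempfile.mkdtemp(), "example_out.csv")
pd.DataFrame(
    {
        "TimeFromStart_seconds": [1.5, 3.0],
        "EventType": ["My-unique-event-type"] * 2,
    }
).to_csv(path, index=False)

# The single line that loads the formatted CSV for analysis.
gc_event_dataframe = pd.read_csv(path)
print(gc_event_dataframe.head())
```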