# Merging OpenSesame logfiles

> **Warning** During this tutorial we are going to work with files. You will learn to create, copy, move, and delete files! Make sure you run your scripts in a location where you are allowed to edit files and that does not contain other files. Be careful not to accidentally delete or move other files (copying is safer than moving or deleting!). Make backups of important files on your computer before continuing. The Google Colab sandbox is a good, more protected environment for working with files.

## Introduction
OpenSesame creates a separate logfile for each participant you run. In this tutorial we show a way to merge these logfiles, assuming they all have exactly the same format (i.e., they have identical headers). This is not necessarily the case if you run slightly different versions of the experiment for different participants! Note that during the data wrangling tutorial in a later session, you will learn about a different, more flexible way to merge logfiles.

## Step 1. Get OpenSesame data from OSF
Let's use some Python code to download existing OpenSesame data of a Stroop task from [OSF](https://osf.io/7ma4t/). The code below creates a folder called `tutorial_data` in your current working directory. Run it.

In [6]:
import requests
import shutil
import os

# recursively remove the tutorial_data folder and its content;
# uncomment the next line if you want to start again with a clean folder
#shutil.rmtree('tutorial_data')

# create the main directory
if not os.path.exists('tutorial_data'):
    os.makedirs('tutorial_data')

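# download the zip file
url = 'https://osf.io/download/3d9er/'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop here if the download failed
# write the downloaded bytes to disk; the with-block closes the file for us
with open('./tutorial_data/data_pilot.zip', 'wb') as zip_file:
    zip_file.write(r.content)

# extract the zip file
import zipfile
with zipfile.ZipFile('./tutorial_data/data_pilot.zip', 'r') as zip_ref:
    zip_ref.extractall('./tutorial_data/')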


Check whether the data is properly stored and extracted. Open a csv file to see its content.
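
If you prefer to do this check from Python, the sketch below lists the extracted folder and prints the first lines of one file. It assumes the zip file was extracted to `./tutorial_data/data/` (the folder used in Step 2); `preview_file` is a helper name introduced here for illustration, not part of the tutorial code.

```python
import os

def preview_file(path, n_lines=3):
    """Return the first n_lines lines of a text file, without line endings."""
    lines = []
    with open(path, encoding='utf-8', errors='replace') as fhand:
        for line in fhand:
            if len(lines) >= n_lines:
                break
            lines.append(line.rstrip('\n'))
    return lines

# Assumption: this is where the extraction step stored the logfiles
data_folder = './tutorial_data/data/'
if os.path.isdir(data_folder):
    basenames = sorted(os.listdir(data_folder))
    print(len(basenames), 'entries found')
    # preview the first entry that is a regular file
    for basename in basenames:
        path = os.path.join(data_folder, basename)
        if os.path.isfile(path):
            for line in preview_file(path):
                print(line)
            break
else:
    print('Folder not found:', data_folder)
```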

## Step 2. Create a loop that finds all files in a particular directory
Let's now create a loop that prints all the filenames found by the `os.listdir` function and counts the number of files found.

In [7]:
# Change this to the folder that contains the .csv files
SRC_FOLDER = './tutorial_data/data/'

filecount = 0
for basename in os.listdir(SRC_FOLDER):
    path = os.path.join(SRC_FOLDER, basename)
    print('Reading ',path)
    filecount = filecount + 1
print('Number of files considered for merge:',filecount)


Reading  ./tutorial_data/data/CI_RSI2000_test.csv
Reading  ./tutorial_data/data/subject-0_CI.csv
Reading  ./tutorial_data/data/subject-11_CI.csv
Reading  ./tutorial_data/data/subject-12_IC.csv
Reading  ./tutorial_data/data/subject-13_CI.csv
Reading  ./tutorial_data/data/subject-14_IC.csv
Reading  ./tutorial_data/data/subject-1_IC.csv
Reading  ./tutorial_data/data/subject-3_CI.csv
Reading  ./tutorial_data/data/subject-4_IC.csv
Reading  ./tutorial_data/data/subject-5_CI.csv
Reading  ./tutorial_data/data/subject-6.csv
Reading  ./tutorial_data/data/subject-7_IC.csv
Reading  ./tutorial_data/data/subject-9_IC.csv
Number of files considered for merge: 13
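
Note that `os.listdir` returns every entry in the folder, including subfolders and files that are not logfiles. A minimal sketch of a stricter variant, assuming that only regular files ending in `.csv` should be considered; `list_csv_files` is a hypothetical helper name:

```python
import os

def list_csv_files(folder):
    """Return the sorted paths of all regular .csv files in folder."""
    paths = []
    for basename in sorted(os.listdir(folder)):
        path = os.path.join(folder, basename)
        # skip subfolders and files with another extension
        if os.path.isfile(path) and basename.lower().endswith('.csv'):
            paths.append(path)
    return paths

# Assumption: same folder as in the loop above
SRC_FOLDER = './tutorial_data/data/'
if os.path.isdir(SRC_FOLDER):
    csv_paths = list_csv_files(SRC_FOLDER)
    for path in csv_paths:
        print('Found', path)
    print('Number of csv files:', len(csv_paths))
```

Sorting makes the order reproducible, because `os.listdir` returns the entries in arbitrary order.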


## Step 3. Merge all files into one csv file
As a next step, we are going to read each file line by line and write every line to a new merged file.

To create the new merged file use the command:

In [8]:
fout = open('./tutorial_data/merged.csv', 'w')

To read in a file line by line and save each line to the merged file use this loop:

In [14]:
fhand = open(path)
for line in fhand:
    fout.write(line)
fhand.close()

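Note that this snippet only works while `fout` is still open: running it after `fout.close()` has been called raises `ValueError: I/O operation on closed file.`

Insert these code snippets at the right locations in the code we just created, and close the merged file with `fout.close()` after the loop: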


In [10]:
# Change this to the folder that contains the .csv files
SRC_FOLDER = './tutorial_data/data/'
# ... your code here

filecount = 0
for basename in os.listdir(SRC_FOLDER):
    path = os.path.join(SRC_FOLDER, basename)
    print('Reading ',path)
    # ... your code here
    filecount = filecount + 1
print('Number of files considered for merge:',filecount)

Reading  ./tutorial_data/data/CI_RSI2000_test.csv
Reading  ./tutorial_data/data/subject-0_CI.csv
Reading  ./tutorial_data/data/subject-11_CI.csv
Reading  ./tutorial_data/data/subject-12_IC.csv
Reading  ./tutorial_data/data/subject-13_CI.csv
Reading  ./tutorial_data/data/subject-14_IC.csv
Reading  ./tutorial_data/data/subject-1_IC.csv
Reading  ./tutorial_data/data/subject-3_CI.csv
Reading  ./tutorial_data/data/subject-4_IC.csv
Reading  ./tutorial_data/data/subject-5_CI.csv
Reading  ./tutorial_data/data/subject-6.csv
Reading  ./tutorial_data/data/subject-7_IC.csv
Reading  ./tutorial_data/data/subject-9_IC.csv
Number of files considered for merge: 13


Run the code and check whether a merged file is created.
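
One way to check the result from Python is to count the lines of the merged file. This is a minimal sketch, assuming the merged file is stored at `./tutorial_data/merged.csv` and that `fout.close()` has been called (until the file is closed, part of the output may still sit in a buffer). `count_lines` is a helper name introduced here for illustration.

```python
import os

def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, encoding='utf-8', errors='replace') as fhand:
        return sum(1 for _ in fhand)

# Assumption: location of the merged file created in this step
merged_path = './tutorial_data/merged.csv'
if os.path.exists(merged_path):
    print('Lines in merged file:', count_lines(merged_path))
else:
    print('No merged file found at', merged_path)
```

Because Step 3 copies every line, including each header, the count should equal the sum of the line counts of the source files, provided every source file ends with a newline.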

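## Step 4. Compare headers
The merge in Step 3 copies every line of every file, so the header row is repeated for each participant and files with a different format end up in the merged file as well. The code below writes the header only once (taken from the first file), skips the test file `CI_RSI2000_test.csv`, and merges a file only if its header is identical to the header of the first file.
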
In [16]:
# Change this to the folder that contains the .csv files
SRC_FOLDER = './tutorial_data/data/'
# This test file is not participant data, so it is skipped
SKIP_FILE = 'CI_RSI2000_test.csv'

fout = open('./tutorial_data/merged.csv', 'w')

filecount = 0        # number of files read
filecountmerged = 0  # number of files merged in addition to the first (reference) file
for basename in os.listdir(SRC_FOLDER):
    path = os.path.join(SRC_FOLDER, basename)
    if basename != SKIP_FILE:
        print('Reading {}'.format(path))
        fhand = open(path)
        linecount = 0
        for line in fhand:
            if linecount == 0:
                if filecount == 0:
                    # header of the first file: use it as the reference and write it once
                    refheader = line
                    fout.write(line)
                    writethisfile = True
                else:
                    # merge this file only if its header matches the reference header
                    if line == refheader:
                        writethisfile = True
                        filecountmerged = filecountmerged + 1
                    else:
                        writethisfile = False
            else:
                # data line: copy it only if the header of this file was accepted
                if writethisfile:
                    fout.write(line)
            linecount = linecount + 1
        print('Line Count:', linecount)
        fhand.close()
        filecount = filecount + 1
fout.close()
print('Number of files considered for merge:',filecount,". Merged: ",filecountmerged)


Reading ./tutorial_data/data/subject-9_IC.csv
Line Count: 277
Number of files considered for merge: 1 . Merged:  0
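
The code above silently skips files whose header differs from the first one. To find out which files those are, a sketch like the following could help. It uses the `csv` module to parse the header row, so it compares column names rather than raw lines, which may differ slightly from the comparison in the tutorial code. `read_header` and `find_header_mismatches` are hypothetical helper names.

```python
import csv
import os

def read_header(path):
    """Return the first row of a csv file as a list of column names."""
    with open(path, newline='', encoding='utf-8') as fhand:
        return next(csv.reader(fhand), [])

def find_header_mismatches(paths):
    """Return the paths whose header differs from the header of paths[0]."""
    if not paths:
        return []
    reference = read_header(paths[0])
    return [path for path in paths[1:] if read_header(path) != reference]

# Assumption: same folder as in the tutorial code
SRC_FOLDER = './tutorial_data/data/'
if os.path.isdir(SRC_FOLDER):
    paths = [os.path.join(SRC_FOLDER, name)
             for name in sorted(os.listdir(SRC_FOLDER))
             if name.lower().endswith('.csv')]
    for path in find_header_mismatches(paths):
        print('Header differs from first file:', path)
```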



## Exercises

### Exercise 1. Manually change content of merged file

Create a script that opens the merged file created with the code in the tutorial above. Replace every occurrence of the word "neutral" with "neu" and save the edited file under a new name.


In [12]:
# your code here
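
As a hint, the string method `str.replace` does the replacement for a single line. The line below is made up for illustration and is not taken from the actual data:

```python
# a made-up example line; the real logfiles contain different columns
line = 'subject-1,neutral,correct,512\n'

# str.replace returns a new string; the original line is left unchanged
edited = line.replace('neutral', 'neu')
print(edited, end='')
```

Keep in mind that `str.replace` replaces every matching substring, so a longer word that contains "neutral" would be changed as well.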


### Exercise 2. Copy renamed files to another location

Run the following code to create a folder structure with 10 dummy text files, one in each of 10 folders. Assume that the text files contain data belonging to ten participants.

In [13]:
import os
import shutil

# recursively remove folder tutorial_data2 and its content
#shutil.rmtree("tutorial_data2")

# create the tutorial_data2 directory
if not os.path.exists('tutorial_data2'):
    os.makedirs('tutorial_data2')

# create the subdirectories
for i in range(1, 11):
    directory_name = os.path.join('tutorial_data2', str(i))
    if not os.path.exists(directory_name):
        os.makedirs(directory_name)

# create the text files
for i in range(1, 11):
    directory_name = os.path.join('tutorial_data2', str(i))
    file_name = os.path.join(directory_name, 'file.txt')
    with open(file_name, 'w') as f:
        for j in range(1,100):
            f.write('Hello world. \t This is another column with line number ' + str(j) + '\n')


Now create a new script that copies each of the text files just created into the main folder (`tutorial_data2`). Rename the copies so that the participant number (1..10) is stored in the file name, in this format: `file_pp1.txt`, `file_pp2.txt`, etc.
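
As a hint, the sketch below shows the copy-and-rename step for a single file with `shutil.copy`. It works in a temporary directory with a made-up participant folder `7`, so it does not touch `tutorial_data2`; extending it to a loop over all ten folders is left as the exercise.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # build one made-up participant folder containing file.txt
    src_dir = os.path.join(tmp, '7')
    os.makedirs(src_dir)
    src = os.path.join(src_dir, 'file.txt')
    with open(src, 'w') as f:
        f.write('example\n')

    # the folder name doubles as the participant number
    participant = os.path.basename(src_dir)
    dst = os.path.join(tmp, 'file_pp{}.txt'.format(participant))
    shutil.copy(src, dst)  # copy: the original file stays in place

    copied_names = sorted(os.listdir(tmp))

print(copied_names)
```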


### Exercise 3. Add information as a new column to the textfile

Create a script that opens the text files you created in Exercise 2 and adds the name of the file as a first column (assume the data is tab-delimited), so that each file consists of three columns.

### Exercise 4. Merge the new textfiles to a single textfile

Create a script that merges all files created in Exercise 3 into a single text file. Start the text file with a header indicating file name, column 2, and column 3 separated by tabs.

Open the tab-delimited text file in a spreadsheet program and check whether it opens properly in 3-column format.
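
The same check can be sketched in Python by counting the columns in each row. The file name `tutorial_data2/merged.txt` below is an assumption (use whatever name you chose in your own script), and `column_counts` is a hypothetical helper name.

```python
import csv
import os
from collections import Counter

def column_counts(path, delimiter='\t'):
    """Return a dict mapping number of columns to number of rows with that many columns."""
    counts = Counter()
    with open(path, newline='', encoding='utf-8') as fhand:
        for row in csv.reader(fhand, delimiter=delimiter):
            counts[len(row)] += 1
    return dict(counts)

# Assumption: name and location of the merged file from Exercise 4
merged_txt = os.path.join('tutorial_data2', 'merged.txt')
if os.path.exists(merged_txt):
    print(column_counts(merged_txt))
else:
    print('No merged file found at', merged_txt)
```

If every row has three columns, the dictionary has a single key, `3`.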