# Building `PELIC_compiled.csv`

<br>

**Author:** Ben Naismith (bnaismith@pitt.edu)  
**Date:** 9 June 2020

<br>

This notebook is a tutorial for creating `PELIC_compiled.csv` from the PELIC corpus files in the [`corpus_files`](https://github.com/ELI-Data-Mining-Group/PELIC-dataset/tree/master/corpus_files) folder. The final csv file is also available in the current repository.

<br>

**Notebook contents:**
- [Reading in necessary files](#Reading-in-necessary-files)
- [Compiling dataframe](#Compiling-dataframe)
- [Writing out `PELIC_compiled`](#Writing-out-PELIC_compiled)
- [`PELIC_compiled` mini demonstration](#PELIC_compiled-mini-demonstration)

In [1]:
# Import necessary modules
import pandas as pd
import pickle as pkl
from ast import literal_eval

## Reading in necessary files

The three necessary csv files are found in the [`corpus_files`](https://github.com/ELI-Data-Mining-Group/PELIC-dataset/tree/master/corpus_files) folder.

- [answer.csv](https://github.com/ELI-Data-Mining-Group/PELIC_dataset/tree/master/corpus_files/answer.csv)
- [course.csv](https://github.com/ELI-Data-Mining-Group/PELIC_dataset/tree/master/corpus_files/course.csv)
- [student_information.csv](https://github.com/ELI-Data-Mining-Group/PELIC_dataset/tree/master/corpus_files/student_information.csv)

In [2]:
# Read in answer.csv

answer_df = pd.read_csv("../corpus_files/answer.csv", index_col = 'answer_id')  # answer_id is unique
answer_df.info()
answer_df.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 46230 entries, 1 to 48420
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   question_id  46230 non-null  int64 
 1   anon_id      46230 non-null  object
 2   course_id    46230 non-null  int64 
 3   version      46230 non-null  int64 
 4   text_len     46230 non-null  int64 
 5   text         46230 non-null  object
 6   tokens       46230 non-null  object
 7   tok_lem_POS  46230 non-null  object
dtypes: int64(4), object(4)
memory usage: 3.2+ MB


answer_id,question_id,anon_id,course_id,version,text_len,text,tokens,tok_lem_POS
1,5,eq0,149,1,177,I met my friend Nife while I was studying in a...,"['I', 'met', 'my', 'friend', 'Nife', 'while', ...","[('I', 'I', 'PRP'), ('met', 'meet', 'VBD'), ('..."
2,5,am8,149,1,137,"Ten years ago, I met a women on the train betw...","['Ten', 'years', 'ago', ',', 'I', 'met', 'a', ...","[('Ten', 'ten', 'CD'), ('years', 'year', 'NNS'..."
3,12,dk5,115,1,63,In my country we usually don't use tea bags. F...,"['In', 'my', 'country', 'we', 'usually', 'do',...","[('In', 'in', 'IN'), ('my', 'my', 'PRP$'), ('c..."
4,13,dk5,115,1,6,I organized the instructions by time.,"['I', 'organized', 'the', 'instructions', 'by'...","[('I', 'I', 'PRP'), ('organized', 'organize', ..."
5,12,ad1,115,1,59,"First, prepare a port, loose tea, and cup.\nSe...","['First', ',', 'prepare', 'a', 'port', ',', 'l...","[('First', 'first', 'RB'), (',', ',', ','), ('..."


In [3]:
# Because lists are read in as strings, these need to be converted back to lists

answer_df.tokens = answer_df.tokens.apply(literal_eval)
answer_df.tok_lem_POS = answer_df.tok_lem_POS.apply(literal_eval)
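
To illustrate why this conversion is needed, here is a minimal sketch using two made-up strings (not taken from the corpus) in the form that `pd.read_csv` returns for a list-valued cell. It shows `ast.literal_eval` turning each string back into Python objects.

```python
from ast import literal_eval

# Hypothetical cell values, written the way a list column looks after pd.read_csv
raw_tokens = "['I', 'met', 'my', 'friend']"
raw_tagged = "[('I', 'I', 'PRP'), ('met', 'meet', 'VBD')]"

tokens = literal_eval(raw_tokens)  # list of str
tagged = literal_eval(raw_tagged)  # list of (token, lemma, POS) tuples

# Before conversion the cell is a str; afterwards it is a list
print(type(raw_tokens).__name__, '->', type(tokens).__name__)
print(tokens[1], tagged[1][1])
```

Unlike `eval`, `literal_eval` only accepts Python literals, so it is a reasonable choice for parsing stored lists and tuples.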

In [4]:
# Read in course.csv

course_df = pd.read_csv("../corpus_files/course.csv", index_col='course_id')
course_df.info()
print(course_df['level_id'].value_counts())
course_df.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1066 entries, 1 to 1123
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   class_id  1066 non-null   object
 1   level_id  1066 non-null   int64 
 2   semester  1066 non-null   object
 3   section   1066 non-null   object
dtypes: int64(1), object(3)
memory usage: 41.6+ KB
4    402
5    305
3    273
2     86
Name: level_id, dtype: int64


course_id,class_id,level_id,semester,section
1,r,2,2006_spring,A
2,r,3,2006_spring,B
3,r,4,2006_spring,M
4,r,4,2006_spring,P
5,r,4,2006_spring,Q


In [5]:
# Read in student_information.csv

sinfo_df = pd.read_csv("../corpus_files/student_information.csv", index_col = 'anon_id')
sinfo_df.info()
sinfo_df.fillna('', inplace=True) # Replace all NaN with empty strings
sinfo_df.head()

<class 'pandas.core.frame.DataFrame'>
Index: 1313 entries, ez9 to gg7
Data columns (total 20 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   gender                      1313 non-null   object 
 1   birth_year                  913 non-null    float64
 2   native_language             1313 non-null   object 
 3   language_used_at_home       912 non-null    object 
 4   non_native_language_1       859 non-null    object 
 5   yrs_of_study_lang1          863 non-null    object 
 6   study_in_classroom_lang1    863 non-null    object 
 7   ways_of_study_lang1         863 non-null    object 
 8   non_native_language_2       309 non-null    object 
 9   yrs_of_study_lang2          312 non-null    object 
 10  study_in_classroom_lang2    863 non-null    object 
 11  ways_of_study_lang2         311 non-null    object 
 12  non_native_language_3       55 non-null     object 
 13  yrs_of_study_lang3          59 non-nu

anon_id,gender,birth_year,native_language,language_used_at_home,non_native_language_1,yrs_of_study_lang1,study_in_classroom_lang1,ways_of_study_lang1,non_native_language_2,yrs_of_study_lang2,study_in_classroom_lang2,ways_of_study_lang2,non_native_language_3,yrs_of_study_lang3,study_in_classroom_lang3,ways_of_study_lang3,course_history,yrs_of_english_learning,yrs_in_english_environment,age
ez9,Male,1978,Arabic,Arabic,English,more than 5 years,yes,Studied grammar;Worked in pairs/groups;Studied...,Turkish,less than 1 year,no,Studied by myself,,,no,,6;12;18;24;30,1-2 years,3-5 years,27
gm3,Male,1980,Arabic,Arabic,English,more than 5 years,yes,Studied grammar;Had a native-speaker teacher;S...,,,no,,,,no,,6;12;24;30;38,1-2 years,more than 5 years,25
fg5,Male,1938,Nepali,Nepali,English,more than 5 years,yes,Studied grammar;Worked in pairs/groups;Had a n...,French,less than 1 year,yes,Studied grammar;Worked in pairs/groups;Had a n...,Hindi,more than 5 years,no,Studied by myself,18;24,more than 5 years,more than 5 years,66
ce5,Female,1984,Korean,Korean,English,more than 5 years,yes,Studied grammar;Worked in pairs/groups;Had a n...,German,1-2 years,yes,Studied grammar;Studied vocabulary;Listened to...,,,no,,6;12;24;30;38;56,more than 5 years,3-5 years,21
fi7,Female,1982,Korean,Korean;Japanese,English,more than 5 years,yes,Studied grammar;Had a native-speaker teacher;S...,Japanese,less than 1 year,yes,Studied grammar;Studied vocabulary;Listened to...,French,1-2 years,yes,Studied grammar;Studied vocabulary;Listened to...,6;12;24;30;38,less than 1 year,none,23


## Compiling dataframe
The `answer_df`, `course_df`, and `sinfo_df` dataframes created in the previous section are now compiled into a single dataframe called `pelic_df`. Where necessary, functions are created to pull information from the different dataframes.  
Each row represents one text and is accompanied by relevant information about the author and class.
- `answer_df` (basis of pelic_df)
- `course_df` (class type and level information)
- `sinfo_df` (L1 and gender information)

In [6]:
# Start with answer.csv, the primary source of texts and their info.

pelic_df = answer_df.copy()
pelic_df.head()

answer_id,question_id,anon_id,course_id,version,text_len,text,tokens,tok_lem_POS
1,5,eq0,149,1,177,I met my friend Nife while I was studying in a...,"[I, met, my, friend, Nife, while, I, was, stud...","[(I, I, PRP), (met, meet, VBD), (my, my, PRP$)..."
2,5,am8,149,1,137,"Ten years ago, I met a women on the train betw...","[Ten, years, ago, ,, I, met, a, women, on, the...","[(Ten, ten, CD), (years, year, NNS), (ago, ago..."
3,12,dk5,115,1,63,In my country we usually don't use tea bags. F...,"[In, my, country, we, usually, do, n't, use, t...","[(In, in, IN), (my, my, PRP$), (country, count..."
4,13,dk5,115,1,6,I organized the instructions by time.,"[I, organized, the, instructions, by, time, .]","[(I, I, PRP), (organized, organize, VBD), (the..."
5,12,ad1,115,1,59,"First, prepare a port, loose tea, and cup.\nSe...","[First, ,, prepare, a, port, ,, loose, tea, ,,...","[(First, first, RB), (,, ,, ,), (prepare, prep..."


#### Add to pelic_df the native language of the author of each text
The L1 information is found in `sinfo_df`.

In [7]:
# Create a function to return the L1 based on the anon_id in sinfo_df

def get_native_lang(idstr):
    if idstr in sinfo_df.index:
        return sinfo_df.loc[idstr, 'native_language']
    else: return 'Unknown'

# Test the function
print(sinfo_df.loc['eq0','native_language'])
print(get_native_lang('eq0'))

Arabic
Arabic


In [8]:
# Create a new 'L1' (first language) column using the get_native_lang function

pelic_df['L1'] = pelic_df['anon_id'].apply(get_native_lang)
pelic_df.head()

answer_id,question_id,anon_id,course_id,version,text_len,text,tokens,tok_lem_POS,L1
1,5,eq0,149,1,177,I met my friend Nife while I was studying in a...,"[I, met, my, friend, Nife, while, I, was, stud...","[(I, I, PRP), (met, meet, VBD), (my, my, PRP$)...",Arabic
2,5,am8,149,1,137,"Ten years ago, I met a women on the train betw...","[Ten, years, ago, ,, I, met, a, women, on, the...","[(Ten, ten, CD), (years, year, NNS), (ago, ago...",Thai
3,12,dk5,115,1,63,In my country we usually don't use tea bags. F...,"[In, my, country, we, usually, do, n't, use, t...","[(In, in, IN), (my, my, PRP$), (country, count...",Turkish
4,13,dk5,115,1,6,I organized the instructions by time.,"[I, organized, the, instructions, by, time, .]","[(I, I, PRP), (organized, organize, VBD), (the...",Turkish
5,12,ad1,115,1,59,"First, prepare a port, loose tea, and cup.\nSe...","[First, ,, prepare, a, port, ,, loose, tea, ,,...","[(First, first, RB), (,, ,, ,), (prepare, prep...",Korean
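
The same lookup can also be written without a helper function. The sketch below is an alternative, not the notebook's method: it uses `Series.map` on two toy dataframes (`toy_sinfo` and `toy_texts` are hypothetical stand-ins with illustrative values) and assumes, as in `sinfo_df`, that `anon_id` values in the index are unique.

```python
import pandas as pd

# Toy stand-ins for sinfo_df and pelic_df (illustrative values only)
toy_sinfo = pd.DataFrame(
    {'native_language': ['Arabic', 'Thai']},
    index=pd.Index(['eq0', 'am8'], name='anon_id'))
toy_texts = pd.DataFrame({'anon_id': ['eq0', 'am8', 'zz9']})

# map() looks each anon_id up in the index of the native_language Series;
# ids with no match come back as NaN and are then labelled 'Unknown'
toy_texts['L1'] = (toy_texts['anon_id']
                   .map(toy_sinfo['native_language'])
                   .fillna('Unknown'))
print(toy_texts)
```

One difference to keep in mind: this version would also label a student whose `native_language` is itself missing as 'Unknown', whereas the function above returns whatever value is stored in `sinfo_df`.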


#### Add to pelic_df the level of the student at the time the text was written
The level information is found in `course_df`.

In [9]:
# Create a general function to return information based on the course_id in course_df

def get_course_info(idstr, columnname):
    if idstr in course_df.index:
        return course_df.loc[idstr, columnname]
    else: return 'Unknown'

# Test the function
print(course_df.loc[1,'class_id']) 
print(get_course_info(1, 'class_id'))
print(get_course_info(1, 'level_id'))

r
r
2


In [10]:
# Create a new 'class_id' column and 'level_id' column in pelic_df using the 'get_course_info' function and lambda

pelic_df['class_id'] = pelic_df['course_id'].apply(lambda x: get_course_info(x, 'class_id'))
pelic_df['level_id'] = pelic_df['course_id'].apply(lambda x: get_course_info(x, 'level_id'))
pelic_df.head(5)

answer_id,question_id,anon_id,course_id,version,text_len,text,tokens,tok_lem_POS,L1,class_id,level_id
1,5,eq0,149,1,177,I met my friend Nife while I was studying in a...,"[I, met, my, friend, Nife, while, I, was, stud...","[(I, I, PRP), (met, meet, VBD), (my, my, PRP$)...",Arabic,g,4
2,5,am8,149,1,137,"Ten years ago, I met a women on the train betw...","[Ten, years, ago, ,, I, met, a, women, on, the...","[(Ten, ten, CD), (years, year, NNS), (ago, ago...",Thai,g,4
3,12,dk5,115,1,63,In my country we usually don't use tea bags. F...,"[In, my, country, we, usually, do, n't, use, t...","[(In, in, IN), (my, my, PRP$), (country, count...",Turkish,w,4
4,13,dk5,115,1,6,I organized the instructions by time.,"[I, organized, the, instructions, by, time, .]","[(I, I, PRP), (organized, organize, VBD), (the...",Turkish,w,4
5,12,ad1,115,1,59,"First, prepare a port, loose tea, and cup.\nSe...","[First, ,, prepare, a, port, ,, loose, tea, ,,...","[(First, first, RB), (,, ,, ,), (prepare, prep...",Korean,w,4


**Note:** If desired, the same code could be used to add a _semester_ or _section_ column from `course_df`, for example to focus on development over time.
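
When several course columns are wanted at once, a join on `course_id` is one possible alternative to repeated `apply` calls. The following is a minimal sketch on toy data, not the notebook's method: `toy_course` and `toy_texts` are hypothetical stand-ins, and the semester values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for course_df and pelic_df (semester values are invented)
toy_course = pd.DataFrame(
    {'class_id': ['g', 'w'],
     'level_id': [4, 4],
     'semester': ['2006_fall', '2006_fall']},
    index=pd.Index([149, 115], name='course_id'))
toy_texts = pd.DataFrame(
    {'course_id': [149, 115, 115]},
    index=pd.Index([1, 3, 4], name='answer_id'))

# join(on=...) matches the course_id column against toy_course's index;
# the default left join keeps every text and preserves the answer_id index
joined = toy_texts.join(toy_course[['class_id', 'level_id', 'semester']],
                        on='course_id')
print(joined)
```

Note that a join leaves unmatched rows as NaN rather than 'Unknown', so a `fillna` step would be needed to reproduce the behaviour of `get_course_info` exactly.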

#### Add to pelic_df the gender of the author of each text (when known)
The gender information is found in `sinfo_df`.

In [11]:
# Create a general function to return information based on the anon_id in sinfo_df

def get_user_info(idstr, columnname):
    if idstr in sinfo_df.index:
        return sinfo_df.loc[idstr, columnname]
    else: return 'Unknown'

# Test the function
print(sinfo_df.loc['eq0','gender'])
print(get_user_info('eq0', 'gender'))

Male
Male


In [12]:
# Create a new 'gender' column in pelic_df using the 'get_user_info' function and lambda

pelic_df['gender'] = pelic_df['anon_id'].apply(lambda x: get_user_info(x, 'gender'))
pelic_df.head(5)

answer_id,question_id,anon_id,course_id,version,text_len,text,tokens,tok_lem_POS,L1,class_id,level_id,gender
1,5,eq0,149,1,177,I met my friend Nife while I was studying in a...,"[I, met, my, friend, Nife, while, I, was, stud...","[(I, I, PRP), (met, meet, VBD), (my, my, PRP$)...",Arabic,g,4,Male
2,5,am8,149,1,137,"Ten years ago, I met a women on the train betw...","[Ten, years, ago, ,, I, met, a, women, on, the...","[(Ten, ten, CD), (years, year, NNS), (ago, ago...",Thai,g,4,Female
3,12,dk5,115,1,63,In my country we usually don't use tea bags. F...,"[In, my, country, we, usually, do, n't, use, t...","[(In, in, IN), (my, my, PRP$), (country, count...",Turkish,w,4,Female
4,13,dk5,115,1,6,I organized the instructions by time.,"[I, organized, the, instructions, by, time, .]","[(I, I, PRP), (organized, organize, VBD), (the...",Turkish,w,4,Female
5,12,ad1,115,1,59,"First, prepare a port, loose tea, and cup.\nSe...","[First, ,, prepare, a, port, ,, loose, tea, ,,...","[(First, first, RB), (,, ,, ,), (prepare, prep...",Korean,w,4,Female


In [13]:
# Reorder columns to show learner info, course info, then text info

pelic_df = pelic_df[['anon_id','L1','gender','course_id','level_id','class_id','question_id','version','text_len','text',
                    'tokens','tok_lem_POS']]
pelic_df.head()

answer_id,anon_id,L1,gender,course_id,level_id,class_id,question_id,version,text_len,text,tokens,tok_lem_POS
1,eq0,Arabic,Male,149,4,g,5,1,177,I met my friend Nife while I was studying in a...,"[I, met, my, friend, Nife, while, I, was, stud...","[(I, I, PRP), (met, meet, VBD), (my, my, PRP$)..."
2,am8,Thai,Female,149,4,g,5,1,137,"Ten years ago, I met a women on the train betw...","[Ten, years, ago, ,, I, met, a, women, on, the...","[(Ten, ten, CD), (years, year, NNS), (ago, ago..."
3,dk5,Turkish,Female,115,4,w,12,1,63,In my country we usually don't use tea bags. F...,"[In, my, country, we, usually, do, n't, use, t...","[(In, in, IN), (my, my, PRP$), (country, count..."
4,dk5,Turkish,Female,115,4,w,13,1,6,I organized the instructions by time.,"[I, organized, the, instructions, by, time, .]","[(I, I, PRP), (organized, organize, VBD), (the..."
5,ad1,Korean,Female,115,4,w,12,1,59,"First, prepare a port, loose tea, and cup.\nSe...","[First, ,, prepare, a, port, ,, loose, tea, ,,...","[(First, first, RB), (,, ,, ,), (prepare, prep..."


## Writing out `PELIC_compiled`
The compiled dataframe is saved as a csv file. Optional code for writing a pickle file is also included.

In [14]:
# Write out PELIC_compiled as a csv file

pelic_df.to_csv('../PELIC_compiled.csv',index=True, header=True)

In [15]:
# Option to write out as pickle file

pelic_df.to_pickle('../pelic_compiled.pkl')
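
When the csv file is read back in later, the list columns will again arrive as strings, as they did for `answer.csv`. The sketch below shows one way to handle this at load time with the `converters` argument of `pd.read_csv`. It uses a tiny hypothetical dataframe and an in-memory buffer in place of the real file so that it runs on its own.

```python
import io
from ast import literal_eval

import pandas as pd

# Tiny stand-in for pelic_df; the buffer replaces the real file path
toy = pd.DataFrame(
    {'text': ['I met my friend.'],
     'tokens': [['I', 'met', 'my', 'friend', '.']]},
    index=pd.Index([1], name='answer_id'))

buffer = io.StringIO()
toy.to_csv(buffer)   # the list is written out as its string representation
buffer.seek(0)

# converters applies literal_eval to the named column while parsing
reloaded = pd.read_csv(buffer, index_col='answer_id',
                       converters={'tokens': literal_eval})
print(reloaded.loc[1, 'tokens'])
```

A pickle file does not need this step, because it stores the Python objects directly.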

## `PELIC_compiled` mini demonstration
The following short example shows how `PELIC_compiled` can be used to apply filters to find a subset of texts, in this case texts by speakers with the following characteristics:
- Korean L1
- Female
- Level 5  

In [16]:
# Use the .loc indexer with combined boolean conditions to create a subset of PELIC

subset = pelic_df.loc[(pelic_df.L1 == 'Korean') & (pelic_df.gender == 'Female') & (pelic_df.level_id == 5)]
subset

answer_id,anon_id,L1,gender,course_id,level_id,class_id,question_id,version,text_len,text,tokens,tok_lem_POS
132,at8,Korean,Female,118,5,w,17,1,93,my friend is ANON_NAME_0. she is a my ELI frie...,"[my, friend, is, ANON_NAME_0, ., she, is, a, m...","[(my, my, PRP$), (friend, friend, NN), (is, be..."
134,at8,Korean,Female,118,5,w,17,2,104,my friend is ANON_NAME_0. she is a my ELI frie...,"[my, friend, is, ANON_NAME_0, ., she, is, a, m...","[(my, my, PRP$), (friend, friend, NN), (is, be..."
135,at8,Korean,Female,118,5,w,17,3,104,my friend is ANON_NAME_0. she is a my ELI frie...,"[my, friend, is, ANON_NAME_0, ., she, is, a, m...","[(my, my, PRP$), (friend, friend, NN), (is, be..."
145,fn2,Korean,Female,117,5,w,4,1,271,"1. It has been said, ""Not all learning takes p...","[1, ., It, has, been, said, ,, ``, Not, all, l...","[(1, 1, CD), (., ., .), (It, It, PRP), (has, h..."
152,dj0,Korean,Female,117,5,w,4,1,299,There are many qualities of a good neighbor in...,"[There, are, many, qualities, of, a, good, nei...","[(There, there, EX), (are, be, VBP), (many, ma..."
...,...,...,...,...,...,...,...,...,...,...,...,...
47786,ga1,Korean,Female,1065,5,w,6065,1,101,1. North Korea and South Korea are dissimilar ...,"[1, ., North, Korea, and, South, Korea, are, d...","[(1, 1, CD), (., ., .), (North, North, NNP), (..."
47991,ec1,Korean,Female,1060,5,w,6081,1,23,Some people think that businesses should do a...,"[Some, people, think, that, businesses, should...","[(Some, some, DT), (people, people, NNS), (thi..."
48028,ga1,Korean,Female,1065,5,w,6083,1,82,1. People drink coffee when they are tired. Li...,"[1, ., People, drink, coffee, when, they, are,...","[(1, 1, CD), (., ., .), (People, People, NNS),..."
48191,ec1,Korean,Female,1060,5,w,6102,1,155,1. Introduction\n Thesis statement: with the h...,"[1, ., Introduction, Thesis, statement, :, wit...","[(1, 1, CD), (., ., .), (Introduction, introdu..."


Here we see that there are 2092 texts matching these criteria.  
<br>
We may also want to see how many students created these texts and how many texts they wrote on average.

In [17]:
print('There are',len(set(subset.anon_id)), 'students in this subset of PELIC.')
print('On average, these students produced',round(len(subset)/len(set(subset.anon_id)),1),'texts each.')

There are 67 students in this subset of PELIC.
On average, these students produced 31.2 texts each.
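
For a per-student breakdown rather than a single average, `groupby` is one option. This is a minimal sketch on a toy dataframe (`toy_subset` is a hypothetical stand-in holding a handful of illustrative rows), not a computation on the full subset.

```python
import pandas as pd

# Toy stand-in for the subset dataframe (a handful of illustrative rows)
toy_subset = pd.DataFrame(
    {'anon_id': ['at8', 'at8', 'at8', 'fn2', 'dj0'],
     'text_len': [93, 104, 104, 271, 299]})

# Number of texts written by each student
texts_per_student = toy_subset.groupby('anon_id').size()

# Same summary figures as above, using nunique() instead of len(set(...))
n_students = toy_subset['anon_id'].nunique()
mean_texts = round(len(toy_subset) / n_students, 1)

print(texts_per_student)
print(n_students, mean_texts)
```

Looking at the per-student counts can reveal whether a few prolific writers account for most of the texts, which a mean alone would hide.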


We can also check the average length of these texts.

In [18]:
print('The mean text length of this subset is', round(subset.text_len.mean(),1),'words.')

The mean text length of this subset is 94.8 words.


And we can see in which types of classes they wrote them.

In [19]:
print(subset.class_id.value_counts())

r    721
g    679
w    564
l    115
s     13
Name: class_id, dtype: int64


For more detailed tutorials, please see the [tutorials folder](https://github.com/ELI-Data-Mining-Group/PELIC-dataset/tree/master/tutorials) and the description in the [`README.md`](https://github.com/ELI-Data-Mining-Group/PELIC-dataset/blob/master/README.md). For example, you may wish to create concordances of particular linguistic items in this subset, as described in the [`PELIC_concordancing_tutorial`](https://github.com/ELI-Data-Mining-Group/PELIC-dataset/blob/master/tutorials/PELIC_concordancing_tutorial.ipynb). There is also an [`exploratory_data_analysis`](https://github.com/ELI-Data-Mining-Group/PELIC-dataset/blob/master/tutorials/exploratory_data_analysis.ipynb) tutorial, which shows how to probe `PELIC_compiled.csv` for statistics relating to PELIC's composition.

[Back to top](#Building-PELIC_compiled.csv)