Project Plan

Summary

This project will analyze writing samples from ESL learners to identify how the syntax of their writing differs by native language (L1) and proficiency level. The goal is to determine how quantitative measures of syntactic complexity differ between more and less advanced learners, and whether they also differ between learners with different L1s.
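Mean length of sentence (MLS) is one concrete example of a quantitative measure of syntactic complexity. It is the number of words in a text divided by its number of sentences. The sketch below computes a naive version. The regex sentence splitter and word counter are my own simplifying assumptions for illustration. A parser-based tool will segment and count more reliably.

```python
import re


def mean_length_of_sentence(text: str) -> float:
    """Average number of words per sentence (naive MLS sketch)."""
    # Naive assumption: a sentence ends at ., !, or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    if not sentences:
        return 0.0
    # Count word-like tokens (letters, digits, apostrophes) in each sentence.
    word_counts = [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


sample = "I live in Pittsburgh. I study English because I want to go to university."
print(mean_length_of_sentence(sample))
```

Here the two sentences contain 4 and 10 words, so MLS is 14 / 2 = 7.0. In principle, higher-proficiency writing would tend to produce longer sentences, which is the kind of difference the project aims to measure.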

Data

The project will use the PELIC data set (Pittsburgh English Language Institute Corpus). PELIC is a large corpus of writing samples from ESL learners who participated in the University of Pittsburgh's Intensive English Program. The data has already been organized and tagged, so most of the remaining work will involve statistical and linguistic analysis rather than data cleaning.
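Because the comparisons are by L1 and proficiency level, a natural first step is to bucket the samples by those two variables. The sketch below uses only the standard library. The column names `L1`, `level_id`, and `text` are assumptions and should be checked against the actual PELIC files. The in-memory rows are made-up stand-ins for real data.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical column names; verify them against the PELIC release being used.
L1_COL = "L1"
LEVEL_COL = "level_id"
TEXT_COL = "text"


def load_rows(path: str) -> List[dict]:
    """Read a PELIC-style CSV into a list of dicts, one per writing sample."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def group_by_l1_and_level(rows: Iterable[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Bucket writing samples by (native language, proficiency level)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row in rows:
        groups[(row[L1_COL], row[LEVEL_COL])].append(row[TEXT_COL])
    return dict(groups)


# Made-up rows standing in for load_rows("<path to PELIC csv>").
rows = [
    {"L1": "Arabic", "level_id": "3", "text": "I like my city."},
    {"L1": "Arabic", "level_id": "3", "text": "My teacher is kind."},
    {"L1": "Korean", "level_id": "5", "text": "Although it rained, we went out."},
]
groups = group_by_l1_and_level(rows)
for (l1, level), texts in sorted(groups.items()):
    print(l1, level, len(texts))
```

In a notebook, pandas `groupby` would be the more usual choice. The standard-library version only keeps the sketch free of dependencies.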

I also plan to use the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC) to obtain numerical measures of syntactic complexity for each writing sample. These measures will likely be merged with the rest of the PELIC data for later analysis.

Analysis

The end goal is to test whether differences in quantitative measures of syntactic complexity between learners with different L1s (and at different proficiency levels) are statistically significant. Exploratory data analysis, followed by linguistic analysis (part-of-speech tagging, etc.), will be needed to prepare the writing samples for further statistical analysis. Research on syntactic parsing methods and second language acquisition (SLA) will also be necessary, beginning with resources provided by Dr. Alan Juffs. If time and interest allow, the project may also include predictive analysis using machine learning methods.
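Comparing more than two L1 groups on a single measure is a one-way design. One option is a permutation test on the one-way ANOVA F statistic. It asks how often randomly reassigning the group labels yields an F at least as large as the observed one, and it avoids ANOVA's normality assumption. This is only one possible choice, not necessarily the test the project will settle on. `scipy.stats.f_oneway` and `scipy.stats.kruskal` are standard alternatives. The scores below are made-up MLS values used purely for illustration.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F: between-group mean square over within-group mean square.

    Assumes at least two groups and more observations than groups.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        # Every group is internally constant: F is unbounded if the means differ.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups_by_l1: Dict[str, List[float]],
                        n_permutations: int = 5000, seed: int = 0) -> float:
    """Share of label shufflings whose F is at least as large as the observed F."""
    groups = list(groups_by_l1.values())
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-cut the shuffled pool into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up mean-length-of-sentence scores, purely for illustration.
scores = {
    "Arabic": [8.1, 9.0, 7.6, 8.4, 8.8],
    "Chinese": [10.2, 11.0, 9.7, 10.5, 10.9],
    "Korean": [9.1, 9.6, 8.9, 9.8, 9.3],
}
print(f_statistic(list(scores.values())))
print(permutation_p_value(scores, n_permutations=2000))
```

The same functions apply unchanged to proficiency level by keying the dictionary on level instead of L1. With many measures from TAASSC, some correction for multiple comparisons would also be worth considering.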

Presentation

The analysis will almost certainly be presented through plots and statistical tests in Jupyter notebooks.