This graduate-level course for in-service secondary school English teachers introduces corpus linguistics alongside a practical introduction to Natural Language Processing (NLP) with basic Python coding. Participants use these computational techniques to study large electronic text collections (corpora) and to examine patterns of language use. They gain a deeper understanding of language variation and of how it can inform teaching practice and the development of teaching materials and activities. The course balances theoretical instruction with practical application, focusing on the analysis and interpretation of English corpus data and on essential NLP techniques in Python.
| 💾 Syllabus | 👭 Padlet: in-class activity | 📗 Python basics manual | 🌳 Class log |
Week | Date | Key topic(s) | Description | Code page | Assignments |
---|---|---|---|---|---|
W01 | Mar6 | Introduction | Course overview, syllabus; What is corpus linguistics? | survey | |
W02 | Mar13 | Python basics #1 | Online Corpora: COCA, BNC, Types of corpora; NLTK | CL01,CL02,🔸nltk,📗 | |
W03 | Mar20 | Python basics #2 | Data types, NLTK (section 1) | 🔸nltk, NLTK01 | |
W04 | Mar27 | Project #1 | NLTK, 🔸Word cloud, 🔸Word Frequency list | 🔸nltk | |
W05 | Apr3 | Lexical analysis | Type vs. token, lemmatization | Code, 🔸nltk | Assign01 (Apr17) |
W06 | (Apr10) | Keywords | Text analysis, Words in context, concordance, collocations | NgramCode | |
W07 | Apr17 | Lexical diversity | Type-Token-Ratio (TTR) and other lexical diversity measures | Reading123, wordlist-stopwords, code | Assign1 Presentation (15 mins) |
W08 | Apr24 | Lexical diversity measures | Midterm discussion | LD-practice, N-gramCode | |
W09 | (May1) | Midterm | (take-home) | ||
W10 | May8 | Readability, Topic-modeling | Readability measures, NLP preprocessing, topic-modeling | Intro, Readability, App | sampletext, RE, ArticleUse |
W11 | (May15) | Sentiment Analysis | Data collection | Individual project submission | |
W12 | May22 | Clustering Analysis | Data collection, (Clustering Analysis), Sentiment Analysis | Code | |
W13 | May29 | Project #2 | Idea brainstorming, individual project discussions, samples | TEDdata | |
W14 | June5 | Project #3 | Individual project discussions, samples | | |
W15 | June12 | Final project | Presentations | | |
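
The type vs. token distinction (W05) and the Type-Token Ratio (W07) listed in the schedule can be illustrated with a minimal sketch. It uses only the Python standard library; the sample sentence, the regex-based `tokenize` helper, and the `type_token_ratio` function are illustrative assumptions rather than course code, and class sessions may rely on NLTK's tokenizers instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Return distinct word types divided by total tokens (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample text, chosen only for illustration.
sample = "The cat sat on the mat. The dog sat on the log."

tokens = tokenize(sample)   # every running word counts as one token
freq = Counter(tokens)      # each distinct word form is one type
ttr = type_token_ratio(tokens)

print("tokens:", len(tokens))
print("types:", len(freq))
print("TTR:", round(ttr, 2))
print("most frequent:", freq.most_common(3))
```

Because TTR tends to fall as texts get longer, comparisons are most meaningful between samples of similar length.
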
1. Data Types and Variables: ➡️details
- Understanding different types of data (quantitative vs. qualitative) and variables (discrete vs. continuous).
- Definition of frequency data.
- Differentiating between absolute frequency, relative frequency, and cumulative frequency.
- Techniques for collecting frequency data.
- Organizing data into tables and charts.
- Measures of central tendency (mean, median, mode) specifically for frequency data.
- Measures of variability (range, variance, standard deviation).
- Histograms, bar charts, and pie charts for frequency data.
- Understanding and interpreting these graphical representations.
- Basic probability concepts and rules.
- Probability distributions relevant to frequency data (e.g., binomial, Poisson).
- Concepts of population vs. sample.
- Understanding sampling distributions and the central limit theorem.
- Concepts of null and alternative hypotheses.
- Tests of significance (e.g., Chi-square test) for frequency data.
- Understanding the relationship between variables.
- Linear regression analysis pertinent to frequency data.
- Analyzing and interpreting statistical results.
- Effective communication of findings.
- Ethical issues in data collection, analysis, and reporting.
- Data privacy and confidentiality.
- Multivariate analysis.
- Time-series analysis and its relevance to frequency data.
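
As a rough illustration of the absolute, relative, and cumulative frequencies named in the outline above, the following sketch builds a small frequency table. The list of observed articles and the `frequency_table` helper are hypothetical, introduced here only for illustration; they are not taken from the course materials.

```python
from collections import Counter


def frequency_table(observations):
    """Return rows of (item, absolute, relative, cumulative), most frequent first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows = []
    running = 0
    for item, absolute in counts.most_common():
        running += absolute  # cumulative frequency: running total of absolute counts
        rows.append((item, absolute, absolute / total, running))
    return rows


# Hypothetical observations: articles found in a short learner text.
articles = ["the", "a", "the", "an", "the", "a", "the", "the", "a", "the"]

table = frequency_table(articles)
for item, absolute, relative, cumulative in table:
    print(f"{item:<4} abs={absolute:<3} rel={relative:.2f} cum={cumulative}")
```

Absolute frequency is the raw count, relative frequency divides that count by the total number of observations so that samples of different sizes can be compared, and the last cumulative value always equals the total.
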