# What is computational linguistics?

## Computational linguistics in science

Computational linguistics is the scientific study of language from a computational perspective. 

Computational linguists are interested in providing computational models of various kinds of linguistic phenomena. 


## Computational linguistics in technology

Computational linguists also develop working components of natural language systems. 


For example:

1.  Speech recognizer  
    <img src="https://github.com/chong-zhang-linguistics/lectures/blob/master/Cortana.jpg?raw=True" alt="alt text" width="450" height="450">

2.  Text-to-speech synthesizer  
    “Read Out Loud” in Adobe Reader 6.0 (or later): Menu > View > Read Out Loud  
    ![alt text](https://github.com/chong-zhang-linguistics/lectures/blob/master/Adobe_Acrobat.png?raw=True)
    
3.  Web search engine  
    <img src="https://github.com/chong-zhang-linguistics/lectures/blob/master/google.png?raw=True" alt="alt text" width="450" height="450">

4.  Machine translator 

    A word-for-word or phrase-for-phrase strategy is not enough. 

    Machine translation engines usually rely on language models, statistical information from corpora, and neural network techniques. 

    Machine translators usually allow customization for specific domains, such as finance, law, or sports. 

### An example: machine translation

1. BLEU (bilingual evaluation understudy) scores
    * BLEU evaluates the quality of a translation from one language to another.
    A machine translation is considered better the closer it is to a human translation.
    BLEU approximates human judgement at the corpus level; it may perform badly when judging individual sentences.

    * Example from Papineni et al. (2002):

        Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.

        Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.

        Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

        Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

        Reference 3: It is the practical guide for the army always to heed the directions of the party.

    * A BLEU score is between 0 and 1. 
    
    * The closer it is to 1, the better the translation. 
    
    * BLEU can easily be calculated using <span style="color:red">**Python**</span>. 
    
    * Precision (the fraction of retrieved instances that are relevant) and recall (the fraction of all relevant instances that are retrieved) are usually paired to evaluate translations. 
    
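    * Example of poor machine translation output with high precision: every word of the candidate appears in a reference, so plain unigram precision is 7/7, yet the candidate is useless as a translation.  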
    
Candidate | the | the | the | the | the | the | the
--- | --- | --- | --- | --- | --- | --- | ---
Reference 1 | the | cat | is | on | the | mat |
Reference 2 | there | is | a | cat | on | the | mat
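
The table above is the case that motivates the *modified* (clipped) n-gram precision used in BLEU (Papineni et al., 2002): a candidate word is credited at most as many times as it occurs in any single reference. The sketch below is a minimal pure-Python illustration of that one ingredient, not a full BLEU implementation (full BLEU also combines several n-gram orders and applies a brevity penalty). The helper names `tokenize`, `ngrams`, and `modified_precision` are made up for this example, and tokenization is simplified to lowercasing, dropping the final period, and splitting on spaces.

```python
from collections import Counter
from fractions import Fraction


def tokenize(sentence):
    """Simplified tokenizer: lowercase, drop a final period, split on spaces."""
    return sentence.lower().rstrip(".").split()


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    total = sum(cand_counts.values())
    if total == 0:
        return Fraction(0)
    # Largest count of each n-gram in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Credit each candidate n-gram at most max_ref_counts[gram] times.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return Fraction(clipped, total)


# The "the the the ..." example from the table.
the_candidate = tokenize("the the the the the the the")
the_refs = [tokenize("The cat is on the mat."),
            tokenize("There is a cat on the mat.")]
ref_words = {w for ref in the_refs for w in ref}
plain = Fraction(sum(w in ref_words for w in the_candidate), len(the_candidate))
print(plain)                                        # 1 (i.e. 7/7)
print(modified_precision(the_candidate, the_refs))  # 2/7

# The two candidates from the Papineni et al. example.
refs = [
    tokenize("It is a guide to action that ensures that the military will forever heed Party commands."),
    tokenize("It is the guiding principle which guarantees the military forces always being under the command of the Party."),
    tokenize("It is the practical guide for the army always to heed the directions of the party."),
]
cand1 = tokenize("It is a guide to action which ensures that the military always obeys the commands of the party.")
cand2 = tokenize("It is to insure the troops forever hearing the activity guidebook that party direct.")
print(modified_precision(cand1, refs))  # 17/18
print(modified_precision(cand2, refs))  # 4/7 (i.e. 8/14)
```

Clipping brings the degenerate candidate down from a perfect score to 2/7, and it ranks Candidate 1 well above Candidate 2, which matches the intuition that Candidate 1 is the better translation.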

2. Sentiment transfer

    * A clear <span style="color:green">**positive**</span> tweet: Amazon employs a successful & cost effective business model that's the future. 

    * A clear <span style="color:red">**negative**</span> tweet: The #AmazonWashingtonPost, sometimes referred to as the guardian of Amazon not paying internet taxes (which they should) is FAKE NEWS!
    
    * A neutral tweet: It's true. I've bought vet supplies on amazon in the past, rather than deal with my gp's office and fee schedules

    Workflow: detect the sentiment in the original text --> run machine translation --> check whether the sentiment is still there.

    Many techniques can improve machine translation quality, for example:

    * Preprocessing
    * Do-not-translate
    
    ![alt text](https://github.com/chong-zhang-linguistics/lectures/blob/master/Cup_and_Handle.gif?raw=True)
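One way to read "do-not-translate" is as a list of terms (for instance a fixed name such as "Cup and Handle", taken here from the image file name above) that should pass through the translation engine unchanged. A simple approach is to swap such terms for placeholders before translation and restore them afterwards. The sketch below assumes that reading; `protect_terms`, `restore_terms`, the `__DNT0__` placeholder format, and the toy word-for-word `toy_translate` (standing in for a real engine) are all hypothetical names made up for this illustration.

```python
def protect_terms(text, terms):
    """Swap each do-not-translate term for a placeholder; return text and mapping."""
    mapping = {}
    for i, term in enumerate(terms):
        if term in text:
            placeholder = f"__DNT{i}__"
            mapping[placeholder] = term
            text = text.replace(term, placeholder)
    return text, mapping


def restore_terms(text, mapping):
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


# Hypothetical stand-in for a real translation engine: a toy English-to-Spanish
# word-for-word lookup that leaves unknown words (and placeholders) untouched.
TOY_DICTIONARY = {"cup": "taza", "and": "y", "handle": "asa"}


def toy_translate(text):
    """Translate word by word using TOY_DICTIONARY (illustration only)."""
    return " ".join(TOY_DICTIONARY.get(word.lower(), word) for word in text.split())


source = "The chart shows a Cup and Handle pattern"
terms = ["Cup and Handle"]

# Without protection, the term is translated literally.
literal = toy_translate(source)

# With protection, the term survives translation unchanged.
protected, mapping = protect_terms(source, terms)
safe = restore_terms(toy_translate(protected), mapping)

print(literal)  # The chart shows a taza y asa pattern
print(safe)     # The chart shows a Cup and Handle pattern
```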


# What is programming?

Programming means writing a (very detailed, step-by-step) sequence of instructions that tells a computer what to do.

Those instructions should be written in a computer programming language.
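
As a small illustration, here is what such a step-by-step sequence can look like in Python. The sentence and the variable names are made up for this example; each line is one instruction, and the computer executes them from top to bottom.

```python
# Step 1: store a sentence as a string.
sentence = "the cat is on the mat"

# Step 2: split the sentence into a list of words.
words = sentence.split()

# Step 3: count how many times the word "the" occurs.
the_count = words.count("the")

# Step 4: show the result.
print(the_count)  # prints 2
```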

## We are going to learn Python
<img src="https://github.com/chong-zhang-linguistics/lectures/blob/master/python.png?raw=True" alt="alt text" width="450" height="450">

* Some of Python’s notable features:  
    1. Uses an elegant syntax, making the programs you write easier to read.
    2. Is an easy-to-use language that makes it simple to get your program working. 
    3. Comes with a large standard library that supports many common programming tasks.


* Applications developed in Python:

<img src="https://github.com/chong-zhang-linguistics/lectures/blob/master/group.png?raw=True" alt="alt text" width="450" height="450">


# Topics 

\#  |Topics 
--- | ---
1 | strings
2 | lists, list comprehension
3 | dictionaries, counters, sets
4 | conditionals
5 | loops
6 | functions
7 | recursive functions and memoization
8 | regular expressions
9 | n-gram models
10| sentiment analysis
11| named entity recognition: rule-based & machine learning
12| working with NLTK
 
## How Python is used to run experiments: PsychoPy
[Stroop effect](http://www.onlinestrooptest.com/stroop_effect_test.php)

## How Python is used to build your own website: Pelican
[MathLing Reading Group](http://complab-stonybrook.github.io/mlrg/)



# Grading of this class

It is not possible to learn programming without doing exercises. You can sit in the classroom and follow me, type in all the commands, and maybe memorize them too. But when you face a novel problem, there is a high chance you will get stuck again. This is because learning programming is mostly about learning a way of thinking, i.e. the way programmers think. Knowing basic Python syntax does not mean you know how to program in Python; you can always look up Python syntax online when you need to. That's why I suggest you do all the homework, and we will discuss your solutions in class too.


We do not have a final exam or project. Participation counts for 20%. The majority of your grade comes from your homework: submitting your homework by the due date (regardless of whether your answers are right or wrong) counts for 30% of your final grade, and the in-class discussion of your homework solutions makes up the remaining 50%. Remember, the main purpose of this class is for you to learn some coding skills in Python that will help with your future work in computational linguistics. 

Component | Percentage
--- | ---
Participation | 20% 
Homework submission | 30% 
Homework presentation | 50% 

# What you need to prepare

Starting next class, we are going to code in Python! To get ready for the coding practice, you need to install Python (if you haven't already) and Jupyter Notebook; see the [instructions](http://jupyter.readthedocs.io/en/latest/install.html).
