### Language of Ambiguous Grammar

Let `G = (T, N, P, S)` where `T = {a, b}`, `N = {S}`, and productions `P` are:

    S → ε
    S → aSbS
    S → bSaS

Show that `G` is ambiguous by giving two derivations with different parse trees for `abab`! [2 points]

First parse tree for `abab`:

![First parse tree for abab](q4_1.drawio.svg)

Second parse tree for `abab`:

![Second parse tree for abab](q4_2.drawio.svg)
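
The ambiguity claim can be cross-checked by enumerating parse trees mechanically. A nonempty string that starts with `a` must use `S → aSbS` at the root, and the `b` written by that production can be any later `b` in the string. The same holds with `a` and `b` exchanged for `S → bSaS`. Each choice splits the rest of the string into the yields of the two inner `S` nodes. The sketch below is a minimal illustration using a hypothetical helper, `parse_trees`. Trees are written in bracket notation, with `S(ε)` marking an application of `S → ε`.

```python
def parse_trees(s: str) -> list[str]:
    """All parse trees of S =>* s in G (alphabet {a, b}), in bracket notation."""
    if s == "":
        return ["S(ε)"]  # only S -> ε derives the empty string
    first = s[0]
    partner = "b" if first == "a" else "a"
    trees = []
    # Root production S -> first S partner S: try every later occurrence of partner.
    for j in range(1, len(s)):
        if s[j] == partner:
            for left in parse_trees(s[1:j]):
                for right in parse_trees(s[j + 1:]):
                    trees.append(f"S({first} {left} {partner} {right})")
    return trees


for tree in parse_trees("abab"):
    print(tree)
# S(a S(ε) b S(a S(ε) b S(ε)))
# S(a S(b S(ε) a S(ε)) b S(ε))
```

The enumeration produces exactly two trees for `abab`. In the first tree, the root's second `S` derives `ab`. In the second tree, the root's first `S` derives `ba`. The corresponding leftmost derivations are `S ⇒ aSbS ⇒ abS ⇒ abaSbS ⇒ ababS ⇒ abab` and `S ⇒ aSbS ⇒ abSaSbS ⇒ abaSbS ⇒ ababS ⇒ abab`.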


What language does `G` generate? Give a formal proof! _Hint:_ Use the notation `a#σ` for the number of occurrences of `a` in `σ`. [6 points]

The language generated by `G` is the set of all strings over `{a, b}` with equally many `a`s and `b`s, in any order. This includes the empty string `ε`:

    L(G) = {σ ∈ {a,b}* | a#σ = b#σ}

*Proof.* Write `L = {σ ∈ {a,b}* | a#σ = b#σ}`. We prove `L(G) = L` by showing inclusion in both directions. By definition, `L(G) = {χ ∈ T* | S ⇒ᐩ χ}`, so the first inclusion

    {χ ∈ T* | S ⇒ᐩ χ} ⊆ {σ ∈ {a,b}* | a#σ = b#σ}

states that every terminal string `χ` derivable from `S` satisfies `a#χ = b#χ`. This is shown by __strong__ induction over the length `n` of the derivation `S ⇒ⁿ χ`.

- _Base._ For `n = 1`, the only one-step derivation from `S` that ends in a terminal string uses the first production, so `χ = ε`. Since `a#ε = b#ε = 0`, we have `ε ∈ L`, and the base case holds.
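- _Step._ Let `n > 1` and let `χ ∈ T*` with `S ⇒ⁿ χ`. We need to show that `χ ∈ L`.
  Induction hypothesis: assume every terminal string derivable from `S` in `1, 2, ..., n-1` steps is in `L`.
  Since `n > 1`, the first step cannot use `S → ε`, because that derivation would end after one step. The first step therefore uses `S → aSbS` or `S → bSaS`.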

    - _Case 1:_ the first step uses `S → aSbS`.  
      Then `S ⇒ aSbS ⇒ⁿ⁻¹ χ` and `χ = aωbτ`, where the first `S` derives `ω` in `k` steps and the second `S` derives `τ` in `n-1-k` steps. Each `S` must be rewritten at least once, so `1 ≤ k ≤ n-2`. Both `ω` and `τ` are therefore derived from `S` in fewer than `n` steps. By the induction hypothesis, `ω, τ ∈ L`, which means `a#ω = b#ω` and `a#τ = b#τ`. Moreover, `a#χ = a#ω + a#τ + 1` and `b#χ = b#ω + b#τ + 1`. Therefore `a#χ = b#χ`, and hence `χ ∈ L`.
      
    - _Case 2:_ the first step uses `S → bSaS`.  
      Then `S ⇒ bSaS ⇒ⁿ⁻¹ χ` and `χ = bωaτ`, where the first `S` derives `ω` in `k` steps and the second `S` derives `τ` in `n-1-k` steps. Each `S` must be rewritten at least once, so `1 ≤ k ≤ n-2`. Both `ω` and `τ` are therefore derived from `S` in fewer than `n` steps. By the induction hypothesis, `ω, τ ∈ L`, which means `b#ω = a#ω` and `b#τ = a#τ`. Moreover, `b#χ = b#ω + b#τ + 1` and `a#χ = a#ω + a#τ + 1`. Therefore `b#χ = a#χ`, and hence `χ ∈ L`.
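
The first inclusion can also be spot-checked mechanically for short strings. This is not a proof. The sketch below uses a hypothetical helper, `derivable_strings`, which explores all leftmost derivations from `S` breadth-first and collects the terminal strings it reaches. It prunes sentential forms that already contain more than `max_len` terminals, because terminals are never erased and such forms cannot lead to a short string. Every collected string should satisfy `a#χ = b#χ`.

```python
from collections import deque

# Right-hand sides of the productions S → ε | aSbS | bSaS ("" encodes ε).
RHS = ("", "aSbS", "bSaS")


def derivable_strings(max_len: int) -> set[str]:
    """Terminal strings of length <= max_len reachable from S by leftmost derivations."""
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        i = form.find("S")  # position of the leftmost nonterminal
        if i == -1:
            words.add(form)  # no nonterminal left: a derived terminal string
            continue
        for rhs in RHS:
            new = form[:i] + rhs + form[i + 1:]
            terminals = len(new) - new.count("S")
            # Terminals are never erased, so forms with too many of them are dead ends.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


words = derivable_strings(6)
print(all(w.count("a") == w.count("b") for w in words))  # True
```

The search terminates because each use of `S → aSbS` or `S → bSaS` adds two terminals and one net `S`. The number of `S` symbols in any explored form is therefore at most `1 + max_len/2`.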
      
The inclusion in the other direction means that every `σ ∈ {a,b}*` such that `a#σ = b#σ` can be derived from `S`:

    {σ ∈ {a,b}* | a#σ = b#σ} ⊆ {χ ∈ T* | S ⇒ᐩ χ}

This is shown by __strong__ induction over the length `n` of the string `σ`.

- _Base._ For `n = 0`, we have `σ = ε`, and `S ⇒ ε` by the first production, so `S ⇒ᐩ ε`.
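- _Step._ Let `n > 0` and let `σ ∈ L` have length `n`. We need to show that `S ⇒ᐩ σ`. Induction hypothesis: assume every string in `L` of length less than `n` can be derived from `S`.
  Since `σ ≠ ε`, it starts with either `a` or `b`.

    - _Case 1:_ `σ` starts with `a`.  
      We first show that `σ = aωbτ` with `ω, τ ∈ L`. For `0 ≤ i ≤ n`, let `d(i) = a#π - b#π`, where `π` is the prefix of `σ` of length `i`. Then `d(0) = 0`, `d(1) = 1`, and `d(n) = 0`, and consecutive values of `d` differ by exactly `1`. Let `j` be the smallest index `j ≥ 1` with `d(j) = 0`. This index exists because `d(n) = 0`, and `j ≥ 2` because `d(1) = 1`. Since `d(i) ≥ 1` for `1 ≤ i < j`, we have `d(j-1) = 1`, so the `j`-th symbol of `σ` is `b`. Let `ω` consist of the symbols at positions `2, ..., j-1`, and let `τ` consist of those at positions `j+1, ..., n`. Then `a#ω - b#ω = d(j-1) - d(1) = 0` and `a#τ - b#τ = d(n) - d(j) = 0`, so `ω, τ ∈ L` and `σ = aωbτ`. As `|ω| + |τ| = n-2`, both `ω` and `τ` have length at most `n-2`. By the induction hypothesis, `S ⇒ᐩ ω` and `S ⇒ᐩ τ`. Using production `S → aSbS`, we conclude `S ⇒ aSbS ⇒ᐩ aωbS ⇒ᐩ aωbτ = σ`.

    - _Case 2:_ `σ` starts with `b`.  
      Exchange the roles of `a` and `b` in Case 1, letting `d(i) = b#π - a#π`. This gives `σ = bωaτ` with `ω, τ ∈ L`, both of length at most `n-2`. By the induction hypothesis, `S ⇒ᐩ ω` and `S ⇒ᐩ τ`. Using production `S → bSaS`, we conclude `S ⇒ bSaS ⇒ᐩ bωaS ⇒ᐩ bωaτ = σ`.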
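
The second inclusion is constructive. To decompose a nonempty `σ ∈ L` as `aωbτ` or `bωaτ`, scan `σ` until the prefix first contains equally many `a`s and `b`s. Recursing on `ω` and `τ` then builds a derivation. The sketch below assumes this cut rule and uses hypothetical helpers `split` and `leftmost_derivation`, named for this illustration only. Because it always rewrites the leftmost `S`, the result is a leftmost derivation.

```python
def split(sigma: str) -> tuple[str, str, str, str]:
    """For nonempty sigma in L, return (x, omega, y, tau) with sigma == x + omega + y + tau,
    {x, y} == {"a", "b"}, and omega, tau in L. Raises ValueError if no balanced prefix exists."""
    first = sigma[0]
    balance = 0  # (# of first symbol) - (# of other symbol) in the prefix read so far
    for j, c in enumerate(sigma):
        balance += 1 if c == first else -1
        if balance == 0:  # the prefix sigma[:j + 1] is balanced for the first time
            return first, sigma[1:j], sigma[j], sigma[j + 1:]
    raise ValueError(f"{sigma!r} is not in L")


def leftmost_derivation(sigma: str) -> list[str]:
    """Sentential forms of a leftmost derivation S =>+ sigma, built recursively."""
    forms = ["S"]

    def expand(w: str) -> None:
        # Rewrite the leftmost S of the latest form so that it eventually yields w.
        current = forms[-1]
        i = current.index("S")
        if w == "":
            forms.append(current[:i] + current[i + 1:])  # S -> ε
            return
        x, omega, y, tau = split(w)
        forms.append(current[:i] + x + "S" + y + "S" + current[i + 1:])  # S -> aSbS or bSaS
        expand(omega)  # the S between x and y yields omega
        expand(tau)    # the S after y yields tau

    expand(sigma)
    return forms


print(" ⇒ ".join(f or "ε" for f in leftmost_derivation("abab")))
# S ⇒ aSbS ⇒ abS ⇒ abaSbS ⇒ ababS ⇒ abab
```

`split` always cuts at the first balanced prefix, so the construction selects one particular parse tree of `σ`. For `abab`, it selects the tree in which the root's first `S` derives `ε`.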

Since both inclusions hold, `L(G) = {σ ∈ {a,b}* | a#σ = b#σ}`.