# Problem 1

In this problem, you will write code to "parse" a restricted form of SQL queries. These exercises are about string processing and regular expressions. There are five (5) exercises, numbered 0-4, which are worth a total of ten (10) points.

In [9]:
from IPython.display import display

import re
import pandas as pd

# Random number generation for generating test cases:
from random import randrange, randint, choice, sample

**Background: SQL review.** Suppose you have two SQL tables, `OneTable` and `AnotherTable`, and you wish to perform an inner join that links a column named `ColA` in `OneTable` with `ColB` in `AnotherTable`. Recall that one simple way to do that in SQL would be:

```SQL
    SELECT * FROM OneTable, AnotherTable
        WHERE OneTable.ColA = AnotherTable.ColB
```

Or, consider the following more complex example. Suppose you have an additional table named `YetAThird` and you wish to extend the inner-join to include matches between its column, `ColC`, and a second column from `AnotherTable` named `ColB2`:

```SQL
    SELECT * FROM OneTable, AnotherTable, YetAThird
        WHERE OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2 = YetAThird.ColC
```
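To see the first join form in action, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The rows and the extra columns (`Name`, `Color`) are made-up toy data, assumed purely for illustration.

```python
import sqlite3

# Hypothetical toy data; table and key column names mirror the SQL example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OneTable (ColA INTEGER, Name TEXT);
    CREATE TABLE AnotherTable (ColB INTEGER, Color TEXT);
    INSERT INTO OneTable VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO AnotherTable VALUES (2, 'red'), (3, 'blue'), (4, 'green');
""")

# The comma-separated FROM list pairs every row of one table with every row
# of the other; the WHERE clause keeps only the pairs whose keys match.
rows = conn.execute("""
    SELECT * FROM OneTable, AnotherTable
        WHERE OneTable.ColA = AnotherTable.ColB
""").fetchall()
conn.close()

print(sorted(rows))
```

Only the keys 2 and 3 appear in both toy tables, so only those two row pairs survive the `WHERE` filter.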

**Exercise 0** (2 points). Suppose you are given a string containing an SQL query in the restricted form,

```SQL
    SELECT * FROM [tbls] WHERE [conds]
```

Implement the function, **`split_simple_join(q)`**, so that it takes a query string **`q`** in the form shown above and **returns a pair of substrings** corresponding to `[tbls]` and `[conds]`.

For example, if

```python
    q == """SELECT * FROM OneTable, AnotherTable, YetAThird
              WHERE OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2=YetAThird.ColC"""
```

then

```python
    split_simple_join(q) == ("OneTable, AnotherTable, YetAThird",
                             "OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2=YetAThird.ColC")
```

**IMPORTANT NOTE!** In this problem, you only need to return the substring between `FROM` and `WHERE` and the one after `WHERE`. You will extract the table names and conditions later on, below.

You should make the following assumptions:

* The input string `q` contains exactly one such query, with no nesting of queries (e.g., no instances of `"SELECT * FROM (SELECT ...)"`). However, the query may (or may not) be a multiline string as shown in the example. (Treat newlines as whitespace.)
* Your function should ignore any leading or trailing whitespace around the SQL keywords, e.g., `SELECT`, `FROM`, and `WHERE`.
* The substring between `SELECT` and `FROM` will be any amount of whitespace, followed by an asterisk (`*`). 
* You should **not** treat the SQL keywords in a case-sensitive way; for example, you would regard `SELECT`, `select`, and `sElEct` as the same. However, do **not** change or ignore the case of the non-SQL keywords.
* The `[tbls]` substring contains only a simple list of table names and no other substrings that might be interpreted as SQL keywords.
* The `[conds]` substring contains only table and column names (e.g., `OneTable.ColA`), the equal sign, the `AND` SQL keyword, and whitespace, but no other SQL keywords or symbols.

> Assuming you are using regular expressions for this problem, recall that you can pass [`re.VERBOSE`](https://docs.python.org/3/library/re.html#re.VERBOSE) when writing a multiline regex pattern.
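As one illustration of the `re.VERBOSE` hint, the sketch below spreads a pattern for the restricted query form over several commented lines. The name `QUERY_PATTERN` and the sample query are assumptions made for this example; this is one possible pattern, not the only way to solve the exercise.

```python
import re

# Illustrative pattern (the name QUERY_PATTERN is hypothetical).
# With re.VERBOSE, whitespace in the pattern is ignored and `#` starts a comment.
QUERY_PATTERN = re.compile(r"""
    ^\s* SELECT \s* \* \s*   # SELECT keyword, then the asterisk
    FROM \s+ (?P<tbls>.*?)   # tbls: shortest run of text before WHERE
    \s+ WHERE \s+            # WHERE keyword with whitespace on both sides
    (?P<conds>.*?) \s* $     # conds: the rest, minus trailing whitespace
""", re.VERBOSE | re.IGNORECASE | re.DOTALL)

m = QUERY_PATTERN.match("select *\n  from A, B\n  where A.x = B.y  ")
print(m.group("tbls"), "|", m.group("conds"))
```

`re.IGNORECASE` handles mixed-case keywords, and `re.DOTALL` lets `.` cross newlines so that multiline queries match.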

In [131]:

def split_simple_join(q):
    assert type(q) is str
    # Group 1 is the text between FROM and WHERE; group 2 is the text after WHERE.
    # Keywords match in any case, and re.DOTALL lets `.` span newlines so that
    # multiline queries work. Whitespace around each substring is excluded.
    pattern = r'^\s*SELECT\s*\*\s*FROM\s+(.*?)\s+WHERE\s+(.*?)\s*$'
    match = re.match(pattern, q, re.IGNORECASE | re.DOTALL)
    assert match is not None, "Query is not in the expected restricted form."
    return match.group(1), match.group(2)

# Demo
q_demo = """SELECT * FROM OneTable, AnotherTable, YetAThird
              WHERE OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2=YetAThird.ColC"""
print(split_simple_join(q_demo))

('OneTable, AnotherTable, YetAThird', 'OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2=YetAThird.ColC')


In [132]:
# Test cell: `split_simple_join_test1`

assert split_simple_join(q_demo) == \
           ('OneTable, AnotherTable, YetAThird',
            'OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2=YetAThird.ColC')
print("\n(Passed!)")


(Passed!)


In [133]:
# Test cell: `split_simple_join_test2`

__SQL = {'SELECT', 'FROM', 'WHERE'} # SQL keywords

# Different character classes
__ALPHA = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
__ALPHA_VARS = __ALPHA + '_'
__NUM = '0123456789'
__ALPHA_NUM_VARS = __ALPHA_VARS + __NUM
__DOT = '.'
__SPACES = ' \t\n'
__EQ = '='
__ALL = __ALPHA_NUM_VARS + __DOT + __SPACES + __EQ

def flip_coin():
    from random import choice
    return choice([True, False])

def rand_str(k_min, k_max, alphabet):
    """
    Returns a random string of `k` letters chosen uniformly
    from `alphabet` with replacement, where `k` is itself chosen
    uniformly at random such that `k_min <= k <= k_max`.
    """
    assert k_max >= k_min >= 0
    k = k_min + randint(0, k_max-k_min)
    return ''.join([choice(alphabet) for _ in range(k)])

def rand_spaces(k_max, k_min=0):
    assert k_max >= k_min >= 0
    return rand_str(k_min, k_max, __SPACES)

def rand_case(s):
    """Randomly changes the case of each letter in a string."""
    return ''.join([c.upper() if flip_coin() else c.lower() for c in s])

def rand_var(k_min=1, k_max=10):
    assert k_max >= k_min >= 1
    s = choice(__ALPHA_VARS)
    s += rand_str(k_min-1, k_max-1, __ALPHA_NUM_VARS)
    if s.upper() in __SQL: # Don't generate keywords -- try again
        return rand_var(k_min, k_max)
    return s

def rand_vars(k_min, k_max):
    V = set()
    dk = randint(0, k_max-k_min)
    for k in range(k_min + dk + 1):
        v = rand_var()
        V.add(v)
    return V

def rand_table(max_cols):
    table = rand_var()
    columns = rand_vars(1, max_cols)
    return (table, columns)

def rand_tables(k_min, k_max, c_max=4):
    assert k_max >= k_min >= 1
    num_tables = k_min + randint(0, k_max-k_min)
    tables = {}
    for _ in range(num_tables):
        table, columns = rand_table(c_max)
        while table in tables:
            table, columns = rand_table(c_max)
        tables = {**tables, **{table: columns}}
    return tables

def rand_field(table_name, col_names):
    return table_name + "." + choice(list(col_names))

def rand_cond(tables):
    assert type(tables) is dict
    assert len(tables) >= 2
    a, b = sample(list(tables.keys()), 2)  # `sample` needs a sequence in Python 3.11+
    a_col = rand_field(a, tables[a])
    b_col = rand_field(b, tables[b])
    return (a_col, b_col)

def rand_cond_set(tables, k_min, k_max):
    k = k_min + randint(0, k_max-k_min)
    conds = set()
    while len(conds) < k:
        (a, b) = rand_cond(tables)
        if ((a, b) not in conds) and ((b, a) not in conds):
            conds.add((a, b))
    return conds

def cond_set_to_substrs(conds, max_wspad=4):
    substrs = []
    for a, b in conds:
        s = "{}{}{}{}{}".format(a,
                                rand_spaces(max_wspad),
                                __EQ,
                                rand_spaces(max_wspad),
                                b)
        substrs.append(s)
    return substrs

def substrs_to_str(substrings, sep=',', max_wspad=4):
    s_final = ''
    for k, s in enumerate(substrings):
        if k > 0:
            s_final += rand_spaces(max_wspad) + sep + rand_spaces(max_wspad)
        s_final += s
    return s_final

def rand_query_ans(max_tables, max_conds, max_wspad=4):
    tables = rand_tables(2, max_tables)
    cond_set = rand_cond_set(tables, 1, 4)
    return tables, cond_set

def pad_1(max_wspad):
    return rand_spaces(max(1, max_wspad), 1)

def form_select(max_wspad=4):
    return pad_1(max_wspad) + rand_case("SELECT") + pad_1(max_wspad) + "*"

def form_from(tables, max_wspad=4):
    from_ans = substrs_to_str(list(tables.keys()), sep=',', max_wspad=max_wspad)
    return pad_1(max_wspad) + rand_case("FROM") + pad_1(max_wspad) + from_ans, from_ans

def form_where(cond_set, max_wspad=4):
    cond_substrs = cond_set_to_substrs(cond_set)
    cond_ans = substrs_to_str(cond_substrs, sep=' AND ', max_wspad=max_wspad)
    return pad_1(max_wspad) + rand_case("WHERE") + pad_1(max_wspad) + cond_ans, cond_ans

def form_query_str(tables, cond_set, max_wspad=4):
    select_clause = form_select(max_wspad)
    from_clause, from_ans = form_from(tables, max_wspad)
    where_clause, cond_ans = form_where(cond_set, max_wspad)
    query = select_clause + from_clause + where_clause
    return query, from_ans, cond_ans

def split_simple_join_battery(num_tests, max_wspad=4):
    for k in range(num_tests):
        tables, cond_set = rand_query_ans(5, 5, max_wspad)
        qstmt, from_ans, where_ans = form_query_str(tables, cond_set, max_wspad)
        print("=== Test Statement {} ===\n'''{}'''\n".format(k, qstmt))
        print("True 'FROM' clause substring: '''{}'''\n".format(from_ans))
        print("True 'WHERE' clause substring: '''{}'''\n".format(where_ans))
    
split_simple_join_battery(5, 3)

print("\n(Passed!)")

=== Test Statement 0 ===
'''	sElECt	*	froM 	EQC ,  Ha_H8wxu_t,	v6_iZnJG,	O	,OsZB	

whErE
 
O.o  	=
 		EQC.GCtCbGiS1    AND EQC.GCtCbGiS1	=		
 v6_iZnJG.x'''

True 'FROM' clause substring: '''EQC ,  Ha_H8wxu_t,	v6_iZnJG,	O	,OsZB'''

True 'WHERE' clause substring: '''O.o  	=
 		EQC.GCtCbGiS1    AND EQC.GCtCbGiS1	=		
 v6_iZnJG.x'''

=== Test Statement 1 ===
'''
		SeLEct	*
	 frOM
 rdGIu7
 ,

wG	 	,W 	
wHerE		 W.eYQKLFQ		=wG.v
 AND 	  W.AgvYP 	
=
rdGIu7.Sn7T
	 AND 
 	wG.v	 = 	 
W.Q		
 AND  wG.jitF			=

	rdGIu7.bI'''

True 'FROM' clause substring: '''rdGIu7
 ,

wG	 	,W'''

True 'WHERE' clause substring: '''W.eYQKLFQ		=wG.v
 AND 	  W.AgvYP 	
=
rdGIu7.Sn7T
	 AND 
 	wG.v	 = 	 
W.Q		
 AND  wG.jitF			=

	rdGIu7.bI'''

=== Test Statement 2 ===
''' 

SelECt 	*	FRom  	inTrs9U ,
	WI_1u		,		YlNXLlqm_  ,beTnH	
wHere
	
YlNXLlqm_.mWlDwY0THd 
	=

 beTnH.Wm_Q		 AND  WI_1u.s8=  
YlNXLlqm_.mWlDwY0THd'''

True 'FROM' clause substring: '''inTrs9U ,
	WI_1u		,		YlNXLlqm_  ,beTnH'''

True 'WHERE' clause substring:

**Variable names.** For this problem, let a valid _variable name_ be a sequence of alphanumeric characters or underscores, where the very first character _cannot_ be a number. For example,

    some_variable
    __another_VariAble
    Yet_a_3rd_var
    _A_CSE_6040_inspired_var
    
are all valid variable names, whereas the following are not.

    123var_is_bad
    0_is_not_good_either
    4goodnessSakeStopItAlready
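The definition can be checked against the names listed above with a short sketch. The helper name `looks_like_var` is hypothetical, and unlike the exercise that follows it does no whitespace handling.

```python
import re

# Hypothetical helper: mirrors the definition, with no whitespace handling.
def looks_like_var(s):
    # First character: a letter or underscore; the rest: letters, digits, underscores.
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", s) is not None

good = ["some_variable", "__another_VariAble", "Yet_a_3rd_var", "_A_CSE_6040_inspired_var"]
bad = ["123var_is_bad", "0_is_not_good_either", "4goodnessSakeStopItAlready"]

assert all(looks_like_var(s) for s in good)
assert not any(looks_like_var(s) for s in bad)

# For ASCII-only strings such as these, str.isidentifier() applies the same rule.
assert all(looks_like_var(s) == s.isidentifier() for s in good + bad)
```

Note that `str.isidentifier()` also accepts non-ASCII letters, so it is only a cross-check here, not a drop-in replacement for the definition.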

**Exercise 1** (2 points). Implement a function, **`is_var(s)`**, that checks whether `s` is a valid variable name. That is, it should return `True` if and only if `s` is valid according to the above definition. Your function should ignore any leading or trailing spaces in `s`.

For example:

```python
    assert is_var("__another_VariAble")
    assert not is_var("0_is_not_good_either")
    assert is_var("   foo")
    assert is_var("_A_CSE_6040_inspired_var   ")
    assert not is_var("#getMe2")
    assert is_var("   Yet_a_3rd_var  ")
    assert not is_var("123var_is_bad")
    assert not is_var("  A.okay")
```

In [104]:
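def is_var(s):
    assert type(s) is str
    # Optional whitespace on either side; the name itself must start with a
    # letter or underscore and continue with letters, digits, or underscores.
    # Whitespace inside the name (e.g., "foo bar") is rejected.
    return re.fullmatch(r'\s*[a-zA-Z_][a-zA-Z0-9_]*\s*', s) is not None

is_var(' 6etxiJ6QfG')

False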

In [105]:
# Test cell, part 1: `is_var_test0`

assert is_var("__another_VariAble")
assert not is_var("0_is_not_good_either")
assert is_var("   foo")
assert is_var("_A_CSE_6040_inspired_var   ")
assert not is_var("#getMe2")
assert is_var("   Yet_a_3rd_var  ")
assert not is_var("123var_is_bad")
assert not is_var("  A.okay")

print("\n(Passed part 1 of 2.)")

(Passed part 1 of 2.)


In [106]:
# Test cell: `is_var_test2`

for v in rand_vars(20, 30):
    ans = flip_coin()
    if not ans:
        v = choice(__NUM) + v
    v = rand_spaces(3) + v + rand_spaces(3)
    your_ans = is_var(v)
    assert your_ans == ans, "is_var('{}') == {} instead of {}.".format(v, your_ans, ans)
    
print("\n(Passed part 2 of 2.)")

(Passed part 2 of 2.)


**Column variables.** A _column variable_ consists of two valid variable names separated by a single period. For example,

    A.okay
    a32X844._387b
    __C__.B3am
    
are all examples of column variables: in each case, the substrings to the left and right of the period are valid variables.
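One way to express this definition is as a single pattern built from the variable-name rule. The sketch below is an illustration only; the names `VAR`, `COL`, and `looks_like_col` are hypothetical.

```python
import re

VAR = r"[A-Za-z_][A-Za-z0-9_]*"                  # one valid variable name
COL = re.compile(r"\s*{0}\.{0}\s*".format(VAR))  # name, a single period, name

def looks_like_col(s):  # hypothetical helper
    # fullmatch requires the whole string to fit, so a second period or
    # whitespace next to the period makes the match fail.
    return COL.fullmatch(s) is not None

for s in ["A.okay", "a32X844._387b", "__C__.B3am", "123.abc", "A.b.c", "A . b"]:
    print(repr(s), looks_like_col(s))
```

Anchoring on the whole string is what rules out inputs such as `A.b.c`, which contain a valid column variable as a prefix but are not one themselves.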

**Exercise 2** (1 point). Implement a function, **`is_col(s)`**, so that it returns `True` if and only if **`s`** is a column variable, per the definition above.

For example:

```python
    assert is_col("A.okay")
    assert is_col("a32X844._387b")
    assert is_col("__C__.B3am")
    assert not is_col("123.abc")
    assert not is_col("abc.123")
```

As with Exercise 1, your function should ignore any leading or trailing spaces.

In [112]:
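def is_col(s):
    assert type(s) is str
    # Exactly one period must separate two valid variable names.
    parts = s.strip().split('.')
    if len(parts) != 2:
        return False
    # `is_var` ignores surrounding whitespace, so also reject whitespace
    # adjacent to the period (e.g., "A . okay").
    return all(p == p.strip() and is_var(p) for p in parts)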

In [113]:
# Test cell: `is_col_test0`

assert is_col("A.okay")
assert is_col("a32X844._387b")
assert is_col("__C__.B3am")
assert not is_col("123.abc")
assert not is_col("abc.123")

print("\n(Passed part 1.)")

(Passed part 1.)


In [114]:
# Test cell: `is_col_test1`

def test_is_col_1():
    a = rand_var()
    assert not is_col(a), "is_col('{}') == {} instead of {}.".format(a, is_col(a), False)
    a_valid = flip_coin()
    if not a_valid:
        a = rand_str(1, 5, __NUM)
    return a, a_valid

for _ in range(20):
    a, a_valid = test_is_col_1()
    b, b_valid = test_is_col_1()
    ans = a_valid and b_valid
    
    c = "{}{}.{}{}".format(rand_spaces(3), a, b, rand_spaces(3))
    your_ans = is_col(c)
    print("==> is_col('{}') == {}".format(c, your_ans))
    assert your_ans == ans

print("\n(Passed part 2.)")

==> is_col(' 
47361.u
') == 0
==> is_col(' MSakuQ.s7YdyeO2LC') == True
==> is_col('

_A.709') == 0
==> is_col('
7.d19Whasyq3	') == 0
==> is_col('M325.FO1raIL7xA ') == True
==> is_col('
 Ujn7kQNS.49	') == 0
==> is_col(' Plxje.8') == 0
==> is_col('


0316.340	') == 0
==> is_col('	71.YVtxz
	') == 0
==> is_col('
EBv7yJN.8') == 0
==> is_col(' 
732.j2tW
 ') == 0
==> is_col('05.p8v9  
') == 0
==> is_col(' 	 19227.p
') == 0

**Equality strings.** An _equality string_ is a string of the form,

    A.x = B.y

where `A.x` and `B.y` are _column variable_ names and `=` is an equals sign. There may be any amount of whitespace---including none---before or after each variable and the equals sign.

**Exercise 3** (2 points). Implement the function, **`extract_eqcols(s)`**, below. Given an input string **`s`**, if it is an equality string, your function should return a pair `(u, v)`, where `u` and `v` are the two column variables in the equality string. For example:

```python
    assert extract_eqcols("F3b._xyz =AB0_.def") == ("F3b._xyz", "AB0_.def")
```

If `s` is not a valid equality string, then your function should return `None`.
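As an alternative view of the same rule, an equality string can be described by one pattern with two capture groups. This is a sketch under the definitions above; `COL`, `EQ`, and `sketch_eqcols` are hypothetical names, not part of the exercise.

```python
import re

# A column variable: two variable names joined by a single period.
COL = r"[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*"
# Two column variables around one equals sign, with optional whitespace anywhere
# except inside a column variable.
EQ = re.compile(r"\s*({0})\s*=\s*({0})\s*".format(COL))

def sketch_eqcols(s):  # hypothetical helper
    m = EQ.fullmatch(s)
    return m.groups() if m else None

print(sketch_eqcols("F3b._xyz =AB0_.def"))
print(sketch_eqcols("0F3b._xyz =AB0_.def"))
```

Because the whole string must match, an input with a second equals sign, such as `A.x = B.y = C.z`, yields `None` rather than a partial result.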

In [119]:
def extract_eqcols(s):
    assert type(s) is str
    # An equality string has exactly one `=` with a column variable on each side.
    sides = [side.strip() for side in s.split('=')]
    if len(sides) == 2 and all(is_col(side) for side in sides):
        return tuple(sides)
    return None
    
    
print(extract_eqcols("F3b._xyz =AB0_.def"))

('F3b._xyz', 'AB0_.def')


In [120]:
# Test cell: `extract_eqcols0`

assert extract_eqcols("F3b._xyz =AB0_.def") == ("F3b._xyz", "AB0_.def")
assert extract_eqcols("0F3b._xyz =AB0_.def") is None

print("\n(Passed part 1 of 2.)")

(Passed part 1 of 2.)


In [121]:
# Test cell: `extract_eqcols1`

for _ in range(5):
    _, cond_set = rand_query_ans(2, 10, 5)
    for a, b in cond_set:
        s = a + rand_spaces(3) + __EQ + rand_spaces(3) + b
        print("==> Processing:\n'''{}'''\n".format(s))
        ans = extract_eqcols(s)
        print("    *** Found: {} ***".format(ans))
        assert ans is not None, "Did not detect an equality string where there was one!"
        assert ans[0] == a and ans[1] == b, "Returned {} instead of ({}, {})".format(ans, a, b)

print("\n(Passed part 2 of 2.)")

==> Processing:
'''_e9rvmQ.IwggKBs
=
  _D9HxxW_E.I'''

    *** Found: ('_e9rvmQ.IwggKBs', '_D9HxxW_E.I') ***
==> Processing:
'''_e9rvmQ.IwggKBs   =_D9HxxW_E.oVu4cIP8'''

    *** Found: ('_e9rvmQ.IwggKBs', '_D9HxxW_E.oVu4cIP8') ***
==> Processing:
'''_e9rvmQ.XB8ijWr =	

_D9HxxW_E.xeHy'''

    *** Found: ('_e9rvmQ.XB8ijWr', '_D9HxxW_E.xeHy') ***
==> Proc

**Exercise 4** (2 points). Given an SQL query in the restricted form described above, write a function that extracts all of the join conditions from the `WHERE` clause. Name this function **`extract_join_conds(q)`**, where `q` is the query string. It should return a list of pairs, where each pair `(a, b)` gives the left- and right-hand sides of one of these conditions.

For example, suppose:

```python
    q == """SELECT * FROM OneTable, AnotherTable, YetAThird
              WHERE OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2=YetAThird.ColC"""
```

Notice that the `WHERE` clause contains two conditions: `OneTable.ColA = AnotherTable.ColB` and `AnotherTable.ColB2=YetAThird.ColC`. Therefore, your function should return a list of two pairs, as follows:

```python
    extract_join_conds(q) == [("OneTable.ColA", "AnotherTable.ColB"),
                              ("AnotherTable.ColB2", "YetAThird.ColC")]
```
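One subtlety when separating the conditions is that `AND` is a keyword to be matched in any case, while the same letters may also occur inside a table or column name. The sketch below uses a made-up `WHERE` clause (the names `BRAND`, `Sales`, and `Region` are assumptions) to contrast a literal split with a whitespace-delimited, case-insensitive one.

```python
import re

# Hypothetical WHERE clause: a table name containing "AND" and a lowercase keyword.
conds = "BRAND.id = Sales.brand_id and Sales.region = Region.id"

# A literal split cuts through `BRAND` and misses the lowercase keyword.
print(conds.split("AND"))

# Requiring whitespace on both sides and ignoring case matches only the keyword.
parts = re.split(r"\s+AND\s+", conds, flags=re.IGNORECASE)
print(parts)
```

Each element of `parts` is then a candidate equality string that can be handed to `extract_eqcols`.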

In [140]:
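def extract_join_conds(q):
    assert type(q) is str
    # Only the WHERE clause matters here.
    _, conds = split_simple_join(q)
    # Split on the AND keyword in any case, requiring whitespace on both sides
    # so that names containing those letters are not cut in the middle.
    return [extract_eqcols(c)
            for c in re.split(r'\s+AND\s+', conds, flags=re.IGNORECASE)]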


print("==> Query:\n\t'{}'\n".format(q_demo))
print("==> Results:\n{}".format(extract_join_conds(q_demo)))

==> Query:
	'SELECT * FROM OneTable, AnotherTable, YetAThird
              WHERE OneTable.ColA = AnotherTable.ColB AND AnotherTable.ColB2=YetAThird.ColC'

==> Results:
[('OneTable.ColA', 'AnotherTable.ColB'), ('AnotherTable.ColB2', 'YetAThird.ColC')]


In [141]:
# Test cell: `extract_join_conds_test`

def test_extract_join_conds_1():
    tables, cond_set = rand_query_ans(5, 5, 0)
    qstmt, _, _ = form_query_str(tables, cond_set, 0)
    qstmt = re.sub("[\n\t]", " ", qstmt)
    print("=== {} ===\n".format(qstmt))
    print("  True solution: {}\n".format(cond_set))
    your_conds = extract_join_conds(qstmt)
    print("  Your solution: {}\n".format(your_conds))
    assert set(your_conds) == cond_set, "*** Mismatch? ***"
    
for _ in range(10):
    test_extract_join_conds_1()
    
print("\n(Passed!)")

===  seLeCt * fROm L1sz,QlqYd,lPFGT,FRRoAdyQ1 whERE L1sz.IQxx_  =FRRoAdyQ1.jneUQ AND L1sz._K_Yz   =   FRRoAdyQ1.xQ_sel ===

  True solution: {('L1sz.IQxx_', 'FRRoAdyQ1.jneUQ'), ('L1sz._K_Yz', 'FRRoAdyQ1.xQ_sel')}

  Your solution: [('L1sz.IQxx_', 'FRRoAdyQ1.jneUQ'), ('L1sz._K_Yz', 'FRRoAdyQ1.xQ_sel')]

===  SeLEct * fRom GnQ2YbI,C6wr_gmava,dlOM8k,bD,_3DGKJw3 WherE C6wr_gmava.TbtqKxyuba   =    GnQ2YbI._Inf7bNN ===

  True solution: {('C6wr_gmava.TbtqKxyuba', 'GnQ2YbI._Inf7bNN')}


**Fin!** This marks the end of this problem. Don't forget to submit it to get credit.