[![pythonista](img/pythonista.png)](https://www.pythonista.io)

# ```JOIN``` types.

```JOIN``` combines two tables based on a condition that links the rows of one to the rows of the other.

```
SELECT <cols> FROM <left table> <TYPE> JOIN <right table> ON <condition>
```

https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html
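
As a rough mental model of the syntax above, a join can be pictured as pairing every left row with every right row and keeping only the pairs for which the condition holds. The sketch below is plain Python and only mirrors those semantics; it is not how Spark plans or executes a join. The function `nested_loop_join` and the sample rows are hypothetical and exist only for this illustration.

```python
# Illustrative only: a naive nested-loop join over lists of dicts.
# This mirrors the semantics of an inner join, not Spark's execution.

def nested_loop_join(left, right, condition):
    """Return every (left_row, right_row) pair for which condition holds."""
    return [
        (lrow, rrow)
        for lrow in left
        for rrow in right
        if condition(lrow, rrow)
    ]

# Hypothetical sample rows, unrelated to the tables used later on.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "score": 10}, {"id": 3, "score": 20}]

# The lambda plays the role of the ON clause.
pairs = nested_loop_join(left, right, lambda l, r: l["id"] == r["id"])
print(pairs)
```

The different join types covered below mostly differ in what happens to the rows that find no partner.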

In [None]:
import pandas as pd
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("JOINS").getOrCreate()
ct = spark.sparkContext

## Illustrative tables.

The temporary views ```zona_1``` and ```zona_2``` are defined below. Each of them has the columns:

* ```animal```
* ```poblacion```


In [None]:
spark.createDataFrame(pd.DataFrame(
    {'animal':['zorro', 
               'conejo',
               'liebre', 
               'halcón'],
     'poblacion':[12,
                  436,
                  315,
                  7]
    })).createOrReplaceTempView('zona_1')

In [None]:
spark.sql('''SELECT * 
             FROM zona_1;''').toPandas()

In [None]:
spark.createDataFrame(pd.DataFrame(
    {'animal':['conejo',
               'jabalí',
               'venado',
               'jaguar',
               'águila',
               'halcón'],
     'poblacion':[2015,
                  450,
                  56,
                  2,
                  30,
                  25]
    })).createOrReplaceTempView('zona_2')

In [None]:
spark.sql('''
            SELECT * 
            FROM zona_2;
            ''').toPandas()

### ```JOIN```

In [None]:
spark.sql('''
            SELECT
                izq.animal, 
                izq.poblacion
            FROM zona_1 AS izq
            JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT 
                izq.animal, 
                izq.poblacion AS pob_izq, 
                der.poblacion AS pob_der
            FROM zona_1 AS izq
            JOIN zona_2 AS der
                ON izq.animal = der.animal;
             ''').toPandas()

### ```INNER JOIN```

In [None]:
spark.sql('''
            SELECT 
                izq.animal, 
                izq.poblacion 
            FROM zona_1 AS izq
            INNER JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT 
                der.animal,
                der.poblacion 
            FROM zona_1 AS izq
            INNER JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()
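
In Spark SQL a bare ```JOIN``` is an inner join, which is why this section and the previous one return the same rows. As a cross-check, the plain-Python sketch below reproduces the rows an inner join on ```animal``` would be expected to keep. It copies the two tables into dictionaries, which assumes each animal appears only once per table, and the variable names are illustrative. Spark does not guarantee any particular row order.

```python
# Illustrative only: the two example tables as {animal: poblacion} dicts.
zona_1 = {"zorro": 12, "conejo": 436, "liebre": 315, "halcón": 7}
zona_2 = {"conejo": 2015, "jabalí": 450, "venado": 56,
          "jaguar": 2, "águila": 30, "halcón": 25}

# Inner join: keep only the animals present in both tables.
inner = [
    (animal, pob_izq, zona_2[animal])
    for animal, pob_izq in zona_1.items()
    if animal in zona_2
]
print(inner)
```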

### ```LEFT JOIN```

In [None]:
spark.sql('''
            SELECT 
                izq.animal,
                izq.poblacion 
            FROM zona_1 AS izq
            LEFT JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT 
                der.animal,
                der.poblacion 
            FROM zona_1 AS izq
            LEFT JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT
                izq.animal,
                izq.poblacion AS pob_izq,
                der.poblacion AS pob_der 
            FROM zona_1 AS izq
            LEFT JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()
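
To make the last query concrete, the following plain-Python sketch builds the rows a left join would be expected to return: every row of ```zona_1``` is kept, and the right-hand population is filled in only where the animal also exists in ```zona_2```. The dictionaries are copies of the tables above and assume one row per animal; `None` stands in for SQL `NULL`. This is a sketch of the semantics, not of Spark's implementation.

```python
# Illustrative only: the two example tables as {animal: poblacion} dicts.
zona_1 = {"zorro": 12, "conejo": 436, "liebre": 315, "halcón": 7}
zona_2 = {"conejo": 2015, "jabalí": 450, "venado": 56,
          "jaguar": 2, "águila": 30, "halcón": 25}

# Left join: every left row survives; dict.get returns None when the
# animal has no match on the right, which plays the role of NULL.
left_join = [
    (animal, pob_izq, zona_2.get(animal))
    for animal, pob_izq in zona_1.items()
]
for row in left_join:
    print(row)
```

A right join is the mirror image: iterate over ```zona_2``` and look each animal up in ```zona_1```.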

### ```RIGHT JOIN```

In [None]:
spark.sql('''
            SELECT
                izq.animal,
                izq.poblacion 
            FROM zona_1 AS izq
            RIGHT JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT
                der.animal,
                der.poblacion 
            FROM zona_1 AS izq
            RIGHT JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT
                der.animal,
                izq.poblacion AS pob_izq, 
                der.poblacion AS pob_der 
            FROM zona_1 AS izq
            RIGHT JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

### ```FULL OUTER JOIN```

In [None]:
spark.sql('''
            SELECT
                izq.animal,
                izq.poblacion 
            FROM zona_1 AS izq
            FULL OUTER JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT 
                der.animal,
                der.poblacion 
            FROM zona_1 AS izq
            FULL OUTER JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
spark.sql('''
            SELECT
                CASE 
                    WHEN izq.animal IS NULL THEN der.animal
                    ELSE izq.animal
                END AS animal, 
                der.poblacion AS pob_der,
                izq.poblacion AS pob_izq
            FROM zona_1 AS izq
            FULL OUTER JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()
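
The `CASE` expression in the last query picks the animal name from whichever side has one, which is what `COALESCE(izq.animal, der.animal)` would also do. The plain-Python sketch below illustrates the same idea: it takes the union of the animals from both tables and looks up each population, leaving `None` (SQL `NULL`) where a side has no row. The dictionaries copy the tables above and assume one row per animal; the row order shown is an artifact of the sketch, not something Spark guarantees.

```python
# Illustrative only: the two example tables as {animal: poblacion} dicts.
zona_1 = {"zorro": 12, "conejo": 436, "liebre": 315, "halcón": 7}
zona_2 = {"conejo": 2015, "jabalí": 450, "venado": 56,
          "jaguar": 2, "águila": 30, "halcón": 25}

# Union of the keys: all left animals, then the right-only ones.
animals = list(zona_1) + [a for a in zona_2 if a not in zona_1]

# Full outer join: columns are (animal, pob_der, pob_izq), as in the query.
full_outer = [(a, zona_2.get(a), zona_1.get(a)) for a in animals]
for row in full_outer:
    print(row)
```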

### ```LEFT SEMI JOIN```

In [None]:
spark.sql('''
            SELECT
                izq.animal,
                izq.poblacion 
            FROM zona_1 AS izq
            LEFT SEMI JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

In [None]:
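# Note: a LEFT SEMI JOIN exposes only the left table's columns, so
# selecting der.* here is expected to raise an AnalysisException.
spark.sql('''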
            SELECT
                der.animal,
                der.poblacion 
            FROM zona_1 AS izq
            LEFT SEMI JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()

### ```LEFT ANTI JOIN```

In [None]:
spark.sql('''
            SELECT 
                izq.animal, 
                izq.poblacion 
            FROM zona_1 AS izq
            LEFT ANTI JOIN zona_2 AS der
                ON izq.animal = der.animal;
            ''').toPandas()
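
Semi and anti joins behave like filters on the left table: the right table only decides which left rows pass, and none of its columns reach the result. The plain-Python sketch below illustrates this with copies of the two tables stored as dictionaries, assuming one row per animal. The variable names are illustrative, and the code mirrors the semantics only, not Spark's execution.

```python
# Illustrative only: the two example tables as {animal: poblacion} dicts.
zona_1 = {"zorro": 12, "conejo": 436, "liebre": 315, "halcón": 7}
zona_2 = {"conejo": 2015, "jabalí": 450, "venado": 56,
          "jaguar": 2, "águila": 30, "halcón": 25}

# LEFT SEMI JOIN: left rows that have at least one match on the right.
semi = [(a, p) for a, p in zona_1.items() if a in zona_2]

# LEFT ANTI JOIN: left rows that have no match on the right.
anti = [(a, p) for a, p in zona_1.items() if a not in zona_2]

print(semi)
print(anti)
```

Together the two results partition ```zona_1```: every left row lands in exactly one of them.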

### ```CROSS JOIN```

In [None]:
spark.sql('''
            SELECT
                izq.animal AS animal_izq, 
                der.animal AS animal_der,
                der.poblacion AS pob_der,
                izq.poblacion AS pob_izq
            FROM zona_1 AS izq
            CROSS JOIN zona_2 AS der;
            ''').toPandas()

In [None]:
# Listing both tables in FROM, separated by a comma, is an implicit cross join.
spark.sql('''
            SELECT
                izq.animal AS animal_izq, 
                der.animal AS animal_der,
                der.poblacion AS pob_der,
                izq.poblacion AS pob_izq
            FROM zona_1 AS izq, zona_2 AS der;
            ''').toPandas()
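
A cross join has no condition, so it pairs every row of the left table with every row of the right one. The plain-Python sketch below illustrates this with `itertools.product` over copies of the two tables; the variable names are illustrative and the code only mirrors the semantics of the queries above.

```python
from itertools import product

# Illustrative only: the two example tables as lists of (animal, poblacion).
zona_1 = [("zorro", 12), ("conejo", 436), ("liebre", 315), ("halcón", 7)]
zona_2 = [("conejo", 2015), ("jabalí", 450), ("venado", 56),
          ("jaguar", 2), ("águila", 30), ("halcón", 25)]

# Cartesian product, with columns ordered as in the queries:
# (animal_izq, animal_der, pob_der, pob_izq).
cross = [
    (a_izq, a_der, p_der, p_izq)
    for (a_izq, p_izq), (a_der, p_der) in product(zona_1, zona_2)
]
print(len(cross))  # number of left rows times number of right rows
```

Because the row count is the product of the two table sizes, cross joins grow quickly and are usually reserved for small tables.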

<p style="text-align: center"><a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/80x15.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.</p>
<p style="text-align: center">&copy; José Luis Chiquete Valdivieso. 2023.</p>