## Tech in the Workplace

We’ve learned about various packages & tools, but how do we know when to use them?

Well, the easy answer is that we are sometimes told which tools to use, especially in entry-level positions, or in positions where we are given sufficient training and a walkthrough of the codebase.

* This is a Python 3 pipeline hosted on a Docker container…
* This is a React web-app that uses TypeScript …
* This is an R script that uses MPI to process data in parallel …

However, sometimes we are placed in positions where we get to decide which tech stack we or our team will use.

* Which programming language do I use to make a computer vision ML model?
* I want this pipeline to be usable by non-programmers. Which programming language should I write it in?

And sometimes, when we are allowed to make these decisions, we might lean towards, or be directed to use, some new, unfamiliar, and hyped piece of software that claims to have novel, never-before-seen features.

Ex: The software tool, DataAnalyzur, makes data-pipelines easy to program, and dedicates 3% of computational power to mine bitcoin on your machine.

Technologists are the most optimistic people; we need to rein in this optimism.

Unless your team already uses a piece of technology that you must learn, do not dedicate time to learning extraneous software tools that look interesting. 

There are millions of software tools out there; our value lies in focusing on the right ones.

Instead, focus on strengthening the essentials: 
* Python or R for programming pipelines
* SQL for database management 

Think of this like learning a language. Learning German, Spanish, Japanese, and Turkish at the same time is almost impossible. 

## 595. Big Countries

Table: World
```
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | int     |
+-------------+---------+
name is the primary key column for this table.
Each row of this table gives information about the name of a country, the continent to which it belongs, its area, the population, and its GDP value.
```

A country is big if:

* it has an area of at least three million (i.e., 3000000 km²), or
* it has a population of at least twenty-five million (i.e., 25000000).

Write an SQL query to report the name, population, and area of the big countries.

Return the result table in any order.

Submit solution here: https://leetcode.com/problems/big-countries/description/

## Basic SQL Queries

To review, we create basic SQL queries using the following structure:
	
			SELECT …
			FROM …
			WHERE … ;

Read it as: from this table (FROM), get these attributes (SELECT) of the rows where the condition is true (WHERE).

To select rows whose text contains a particular sequence of characters, we use patterns.
A pattern describes the strings we want to find, such as every name that starts with a given letter.
Specifically, we use the LIKE keyword within the WHERE clause.


In [None]:
CREATE TABLE bakers(
      baker        varchar(10) primary key
      , fullname   varchar(100)
      , age        int
      , occupation varchar(100)
      , hometown   varchar(100)
);

insert into bakers values('Antony','Antony Amourdoux',30,'Banker','London');
insert into bakers values('Briony','Briony Williams',33 ,'Full-time parent','Bristol');
insert into bakers values('Dan','Dan Beasley-Harling',36 ,'Full-time parent','London');
insert into bakers values('Imelda','Imelda McCarron',33 ,'Countryside recreation officer','County Tyrone');
insert into bakers values('Jon','Jon Jenkins',47 ,'Blood courier','Newport');
insert into bakers values('Karen','Karen Wright',60 ,'In-store sampling assistant','Wakefield');
insert into bakers values('Karin','Karin Right',60 ,'In-store sampling manager','Wakefield');
insert into bakers values('Kim-Joy','Kim-Joy Hewlett',27 ,'Mental health specialist','Leeds');
insert into bakers values('Luke','Luke Thompson',30 ,'Civil servant/house and techno DJ','Sheffield');
insert into bakers values('Manon','Manon Lagrève',26 ,'Software project manager','London');
insert into bakers values('Rahul','Rahul Mandal',30 ,'Research scientist','Rotherham');
insert into bakers values('Ruby','Ruby Bhogal',29 ,'Project manager','London');
insert into bakers values('Terry','Terry Hartill',56 ,'Retired air steward','West Midlands');


## LIKE

Some quick patterns:
* 'K%': Match 'K' followed by zero or more characters of any kind
* 'Karen': Match only 'Karen'
* 'Kar_n': Match 'Kar', then exactly one character of any kind, then 'n'

In [None]:
SELECT *
FROM bakers
WHERE baker LIKE 'K%';
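The three patterns above can be tried out without a database server. The following is a minimal sketch that assumes Python's built-in sqlite3 module and an in-memory table holding only a few of the bakers rows; the helper name like_matches is made up for illustration. One caveat: SQLite's LIKE ignores case for ASCII letters by default, which may differ from the database used in these notes.

```python
import sqlite3

# In-memory SQLite database with a few rows copied from the bakers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bakers (baker TEXT PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO bakers VALUES (?, ?)",
    [("Karen", 60), ("Karin", 60), ("Kim-Joy", 27), ("Luke", 30), ("Rahul", 30)],
)


def like_matches(pattern):
    """Hypothetical helper: baker names matching a LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT baker FROM bakers WHERE baker LIKE ? ORDER BY baker", (pattern,)
    )
    return [name for (name,) in rows]


k_names = like_matches("K%")      # 'K' followed by anything
exact = like_matches("Karen")     # the whole value must be 'Karen'
one_wild = like_matches("Kar_n")  # '_' stands for exactly one character

print(k_names)   # ['Karen', 'Karin', 'Kim-Joy']
print(exact)     # ['Karen']
print(one_wild)  # ['Karen', 'Karin']
```

Note that 'Kar_n' picks up both Karen and Karin, while 'Karen' with no wildcard only matches the exact value.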

## Regex

We gain even more power by using POSIX regular expressions (regex).

Regex patterns are like LIKE patterns, but more expressive.

We replace the LIKE keyword with a regex operator, ~ (case-sensitive) or ~* (case-insensitive):

	SELECT baker
	FROM bakers
	WHERE baker ~* 'KAREN';

A POSIX regex matches anywhere within the string unless it is anchored, whereas a LIKE pattern must match the entire string.

Ex: LIKE pattern 'ab' does NOT match 'abc'
Ex: Regex pattern 'ab' MATCHES 'abc' because 'ab' is a substring

POSIX Regex rules:
> . : matches any single character
> * : matches the preceding character zero or more times
> ^ : the string must start with the pattern that follows

In [None]:
SELECT baker FROM bakers WHERE baker ~ '^K';
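To get a feel for these rules outside the database, here is a small sketch using Python's re module. This is an assumption made for convenience: Python's regex flavor is not POSIX, but the three rules shown here (., *, ^) and the match-anywhere behavior work the same way. The helper regex_matches is hypothetical, and the names are a subset of the bakers table.

```python
import re

# A few baker names copied from the bakers table.
names = ["Karen", "Karin", "Kim-Joy", "Luke", "Rahul"]


def regex_matches(pattern, flags=0):
    """Hypothetical helper: names where the pattern matches anywhere in the string."""
    return [name for name in names if re.search(pattern, name, flags)]


substring = regex_matches("ar")                  # unanchored: found inside the name
starts_with_k = regex_matches("^K")              # ^ anchors the match to the start
lowercase_k = regex_matches("^k")                # case-sensitive, comparable to ~
any_case_k = regex_matches("^k", re.IGNORECASE)  # case-insensitive, comparable to ~*
one_char = regex_matches("Kar.n")                # . stands for exactly one character

# * allows zero or more repetitions of the preceding character.
zero_b = re.search("ab*c", "ac") is not None
many_b = re.search("ab*c", "abbbc") is not None

print(substring)      # ['Karen', 'Karin']
print(starts_with_k)  # ['Karen', 'Karin', 'Kim-Joy']
print(lowercase_k)    # []
print(any_case_k)     # ['Karen', 'Karin', 'Kim-Joy']
print(one_char)       # ['Karen', 'Karin']
print(zero_b, many_b) # True True
```

The empty result for '^k' shows why case sensitivity matters: every name in the table starts with a capital letter.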

## Quotation Marks

You’ll notice that some of the signature dish names in our data contain quotation marks. Inside a SQL string literal, a single quote is written as two single quotes:

'Aberffraw ''Creams'''

In [None]:
create table signatures(
     episodeid int
     , baker   varchar(100)
     , make    varchar(100)
     , foreign key (baker) references bakers(baker)
);

insert into signatures values(1,'Antony','Turmeric and Caraway Goosnargh Cakes');
insert into signatures values(1,'Briony','Apple Cider Empire Biscuits');
insert into signatures values(1,'Dan','Lemon and Strawberry Shrewsburys');
insert into signatures values(1,'Imelda','Cherry and White Chocolate Oatmeal Biscuits');
insert into signatures values(1,'Jon','Aberffraw ''Creams''');
insert into signatures values(1,'Karen','Yorkshire Perkins');
insert into signatures values(1,'Kim-Joy','Orange Blossom York Biscuits');
insert into signatures values(1,'Luke','Yorkshire Gingernuts');
insert into signatures values(1,'Manon','Hazelnut Cornish Shortbread');
insert into signatures values(1,'Rahul','Fennel and Coconut Pitcaithly Bannock');
insert into signatures values(1,'Ruby','Masala Chai Devon Flats');
insert into signatures values(1,'Terry','Lake District Ginger Shortbread');
insert into signatures values(2,'Antony','Cardamom and Coconut Burfi Traybake');
insert into signatures values(2,'Briony','Turron and Orange Traybake');
insert into signatures values(2,'Dan','Black Forest Slice');
insert into signatures values(2,'Jon','Lemon Meringue Traybake');
insert into signatures values(2,'Karen','Almond and Marzipan Traybake with Rhubarb Jam');
insert into signatures values(2,'Kim-Joy','Pandan Chiffon Cake with Palm Sugar Cream');
insert into signatures values(2,'Luke','Lemon and Poppy Seed Traybake');
insert into signatures values(2,'Manon','Rosemary and Honey Traybake');
insert into signatures values(2,'Rahul','Lemon and Cardamom Traybake');
insert into signatures values(2,'Ruby','Boozy Black Forest Traybake');
insert into signatures values(2,'Terry','Rum and Raisin Traybake');
insert into signatures values(3,'Antony','Decadent Breakfast Chelsea Buns');
insert into signatures values(3,'Briony','Balsamic Strawberry Chelsea Buns');
insert into signatures values(3,'Dan','Sticky Spiced Orange Chelsea Buns');
insert into signatures values(3,'Jon','Cardiff City vs Chelsea Buns');
insert into signatures values(3,'Karen','Peak District Christmas Chelsea Buns');
insert into signatures values(3,'Kim-Joy','Pistachio and Cardamom Tangzhong Chelsea Buns');
insert into signatures values(3,'Manon','Apricot, Cranberry and Marzipan Chai Chelsea Buns');
insert into signatures values(3,'Rahul','Mango and Cranberry Chelsea Buns');
insert into signatures values(3,'Ruby','Gujarela Chelsea Buns with Dates, Almonds and Raisins');
insert into signatures values(3,'Terry','Tangy Citrus Sticky Chelsea Buns');
insert into signatures values(4,'Briony','Treacle Tart Roulade');
insert into signatures values(4,'Dan','Florida Roulade');
insert into signatures values(4,'Jon','Mango and Passion Fruit Roulade');
insert into signatures values(4,'Karen','Coffee Cream and Praline Roulade');
insert into signatures values(4,'Kim-Joy','''Sweet Dreams'' Roulade');
insert into signatures values(4,'Manon','Amarene and Kirsch Cherry Roulade');
insert into signatures values(4,'Rahul','Rhubarb and Custard Roulade');
insert into signatures values(4,'Ruby','Pina Colada Roulade');
insert into signatures values(5,'Briony','Honey and Apricot Ginger Cake');
insert into signatures values(5,'Dan','Ginger and Lemon Drip Cake');
insert into signatures values(5,'Jon','Family Christmas Ginger Cake');
insert into signatures values(5,'Karen','Bonfire Night Ginger Cake');
insert into signatures values(5,'Kim-Joy','Stem Ginger Cake with Poached Pears');
insert into signatures values(5,'Manon','Italian Meringue Ginger Cake');
insert into signatures values(5,'Rahul','Bonfire Night Caramel Ginger Cake');
insert into signatures values(5,'Ruby','Jamaican Me Crazy Ginger Cake');
insert into signatures values(5,'Terry','Caramelised Pear and Stem Ginger Cake');
insert into signatures values(6,'Briony','Home Comforts');
insert into signatures values(6,'Dan','Festive Samosas');
insert into signatures values(6,'Jon','A Romantic Dinner For Two, Samosa Style');
insert into signatures values(6,'Kim-Joy','Flavours of India');
insert into signatures values(6,'Manon','Samosas for Mum');
insert into signatures values(6,'Rahul','Paneer Singara and Misti Singara');
insert into signatures values(6,'Ruby','Traditional Samosas');
insert into signatures values(7,'Briony','French Onion Tartlets and Celeriac & Apple Tartlets');
insert into signatures values(7,'Jon','Garlic Mushroom Tartlets and Falafel & Hummus Tartlets');
insert into signatures values(7,'Kim-Joy','Broccoli & Tomato Quiches and Mascarpone Squirrel Tartlets');
insert into signatures values(7,'Manon','Summer & Winter Tartlets');
insert into signatures values(7,'Rahul','Coriander Posto & Veg Tartlets and Ghugni Chaat Tartlets');
insert into signatures values(7,'Ruby','Sage & Butternut Tartlets and ''Cheesy Greens'' Tartlets');
insert into signatures values(8,'Briony','Spanish & West Country Smørrebrød');
insert into signatures values(8,'Kim-Joy','Bumblebee Eggs & Fish Smørrebrød');
insert into signatures values(8,'Manon','Cheese and Fruit Smørrebrød');
insert into signatures values(8,'Rahul','Smoked Salmon & Roasted Vegetable Smørrebrød');
insert into signatures values(8,'Ruby','Post-Gym Smørrebrød');
insert into signatures values(9,'Briony','Mojito Madeleines & Espresso Martini Madeleines');
insert into signatures values(9,'Kim-Joy','Ginger and Lemon Madeleines & Orange Bunny Madeleines');
insert into signatures values(9,'Rahul','Lemon and Raspberry Madeleines & Orange Curd Madeleines');
insert into signatures values(9,'Ruby','Pick Your Own Madeleines');
insert into signatures values(10,'Kim-Joy','Amaretto Diplomat Filled Doughnuts & Lemon Ring Doughnuts');
insert into signatures values(10,'Rahul','Mango Créme Pâtissière Filled Doughnuts & Spiced Orange Ring Doughnuts');
insert into signatures values(10,'Ruby','Dulce De Leche Filled Doughnuts & Raspberry and Cardamom Ring Doughnuts');
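The doubled quotes are only part of the SQL text; the stored value contains a single quote. The sketch below assumes Python's built-in sqlite3 module and a cut-down signatures table (no foreign key). It also shows a second option that is not in the notes above: passing the value as a query parameter so the driver handles the quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signatures (episodeid INTEGER, baker TEXT, make TEXT)")

# Option 1: double each single quote inside the SQL string literal.
conn.execute("INSERT INTO signatures VALUES (1, 'Jon', 'Aberffraw ''Creams''')")

# Option 2 (an addition to the notes): pass the value as a parameter,
# so no manual escaping is needed.
conn.execute(
    "INSERT INTO signatures VALUES (?, ?, ?)",
    (4, "Kim-Joy", "'Sweet Dreams' Roulade"),
)

stored = [
    make
    for (make,) in conn.execute("SELECT make FROM signatures ORDER BY episodeid")
]
print(stored)  # ["Aberffraw 'Creams'", "'Sweet Dreams' Roulade"]
```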

## NULL Values

Just as Python has None, SQL has NULL.

We haven’t spoken much about None or NULL values in the context of Computer Science, so let’s address this concept.

NULL is not 0 or an empty string.

Instead, it is a special marker that indicates that a value is unknown or does not exist yet.

* It could exist, but we don’t know it (Mo’s Phone #)
* It could exist, but it hasn’t been calculated yet (Final DS-Track grade)

Weird things happen when we try to compare NULL values to real values.
> NULL = 5, evaluates to UNKNOWN
> NULL > 5, evaluates to UNKNOWN
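> NULL LIKE 'K%' evaluates to UNKNOWN

This matters whenever our rows may contain NULL values: a WHERE clause keeps only the rows whose condition is TRUE, so rows where the condition is UNKNOWN are dropped.
Furthermore, UNKNOWN behaves differently from TRUE and FALSE when combined with them using AND, OR, and NOT.
To test for NULL, use IS NULL or IS NOT NULL rather than =.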

In [None]:
SELECT *
FROM bakers
WHERE occupation IS NULL ;
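The following sketch makes the UNKNOWN results visible. It assumes Python's built-in sqlite3 module, which returns NULL as None and writes TRUE and FALSE as 1 and 0. The row for 'Mo' with a missing occupation is hypothetical; the bakers data in these notes has no NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparisons against NULL evaluate to NULL (UNKNOWN), returned as None.
comparisons = conn.execute(
    "SELECT NULL = 5, NULL > 5, NULL LIKE 'K%', NULL IS NULL"
).fetchone()

# Three-valued logic: 1 is TRUE, 0 is FALSE, None is UNKNOWN.
logic = conn.execute(
    "SELECT NULL AND 0, NULL AND 1, NULL OR 1, NULL OR 0"
).fetchone()

conn.execute("CREATE TABLE bakers (baker TEXT, occupation TEXT)")
conn.executemany(
    "INSERT INTO bakers VALUES (?, ?)",
    [
        ("Antony", "Banker"),
        ("Jon", "Blood courier"),
        ("Mo", None),  # hypothetical row with an unknown occupation
    ],
)


def bakers_where(condition):
    """Hypothetical helper: baker names for which the condition is TRUE."""
    sql = f"SELECT baker FROM bakers WHERE {condition} ORDER BY baker"
    return [name for (name,) in conn.execute(sql)]


bankers = bakers_where("occupation = 'Banker'")
not_bankers = bakers_where("occupation <> 'Banker'")
unknown = bakers_where("occupation IS NULL")

print(comparisons)  # (None, None, None, 1)
print(logic)        # (0, None, 1, None)
print(bankers, not_bankers, unknown)  # ['Antony'] ['Jon'] ['Mo']
```

Mo appears in neither the "banker" nor the "not a banker" result, because both comparisons are UNKNOWN for that row. Only IS NULL finds it.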

## SQL Operations

Just like Python, SQL comes with built-in operators and functions.
They include:
* String functions: upper, lower, position, substring, trim
* Numerical operators: +, -, *, /, %, ^, !
* Mathematical functions: abs, ceil, floor, log, mod, round, sqrt

In [None]:
SELECT age - 5 as age
FROM bakers;
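A few of these operators and functions can be tried on literal values, with no table involved. This sketch assumes Python's built-in sqlite3 module. SQLite spells some functions differently from the list above (for example substr rather than substring), so only functions that SQLite is known to provide are used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Evaluate string functions, arithmetic operators, and math functions
# directly in the SELECT clause.
row = conn.execute(
    "SELECT upper('Karen'), lower('Karen'), trim('  Karen  '), "
    "substr('Karen', 1, 3), 47 - 5, 47 % 5, abs(-5), round(3.14159, 2)"
).fetchone()

print(row)  # ('KAREN', 'karen', 'Karen', 'Kar', 42, 2, 5, 3.14)
```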

## Multiple Tables

In a DBMS, we very rarely have just one table.

Instead, we have a system of tables.

We often map the relationships between these tables using Entity-Relationship Diagrams (ERDs).

In [None]:
SELECT *
FROM bakers, signatures ;

This query results in something called the “Cartesian product”: every possible pairing of a row from bakers with a row from signatures.

Cartesian product: the set of all possible ordered pairs with the first element from the first set and the second element from the second.

To prevent this explosion of information, let’s specify a WHERE clause that keeps only the pairs where both rows refer to the same baker.

In [None]:
SELECT *
FROM bakers b, signatures s 
WHERE b.baker = s.baker;

For each row in the Cartesian product of b & s,
	display the row only if the baker from table b is the same as the baker from table s.

We now know how to join two tables together via the Cartesian product.
However, it is pretty inefficient.

Consider: this approach pairs every row of bakers with every row of signatures (13 × 75 = 975 rows) just to get the 3 rows for Antony’s dishes.

This is so abysmally inefficient that we should find a much better approach (tomorrow).
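The row counts are easy to check on a small scale. The sketch below assumes Python's built-in sqlite3 module and uses only 3 bakers and 4 signatures copied from the data above, so the product has 3 × 4 = 12 rows instead of 975.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bakers (baker TEXT PRIMARY KEY, hometown TEXT)")
conn.execute("CREATE TABLE signatures (episodeid INTEGER, baker TEXT, make TEXT)")

# A small subset of the data from the notes.
conn.executemany(
    "INSERT INTO bakers VALUES (?, ?)",
    [("Antony", "London"), ("Briony", "Bristol"), ("Dan", "London")],
)
conn.executemany(
    "INSERT INTO signatures VALUES (?, ?, ?)",
    [
        (1, "Antony", "Turmeric and Caraway Goosnargh Cakes"),
        (2, "Antony", "Cardamom and Coconut Burfi Traybake"),
        (1, "Briony", "Apple Cider Empire Biscuits"),
        (1, "Dan", "Lemon and Strawberry Shrewsburys"),
    ],
)

# Cartesian product: every baker row paired with every signature row.
all_pairs = conn.execute("SELECT count(*) FROM bakers, signatures").fetchone()[0]

# Keep only the pairs that refer to the same baker.
matched = conn.execute(
    "SELECT count(*) FROM bakers b, signatures s WHERE b.baker = s.baker"
).fetchone()[0]

# Narrow further to one baker's dishes.
antony_dishes = [
    make
    for (make,) in conn.execute(
        "SELECT s.make FROM bakers b, signatures s "
        "WHERE b.baker = s.baker AND b.baker = 'Antony' "
        "ORDER BY s.episodeid"
    )
]

print(all_pairs, matched)  # 12 4
print(antony_dishes)
```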

## Set & Bag

In addition to creating the Cartesian product, we can also apply set operators to the results of two queries to find the rows they share, the rows in either, or the rows in one but not the other.


Let’s consider how we can use set theory in SQL. Set operations remove duplicate rows; bag operations keep them.
* SET operations: Union, Intersect, Except
* BAG operations: Union All, Intersect All, Except All
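As a rough illustration, the sketch below compares the bakers who made a signature in episode 1 with those who did so in episode 2. It assumes Python's built-in sqlite3 module and a handful of (episodeid, baker) pairs taken from the signatures data. SQLite supports UNION ALL but not INTERSECT ALL or EXCEPT ALL, so only one bag operation is shown. The helper run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signatures (episodeid INTEGER, baker TEXT)")
conn.executemany(
    "INSERT INTO signatures VALUES (?, ?)",
    [(1, "Antony"), (1, "Briony"), (1, "Imelda"), (2, "Antony"), (2, "Briony")],
)

ep1 = "SELECT baker FROM signatures WHERE episodeid = 1"
ep2 = "SELECT baker FROM signatures WHERE episodeid = 2"


def run(sql):
    """Hypothetical helper: run a one-column query and return its values."""
    return [baker for (baker,) in conn.execute(sql)]


union = run(f"{ep1} UNION {ep2} ORDER BY baker")          # set: duplicates removed
union_all = run(f"{ep1} UNION ALL {ep2} ORDER BY baker")  # bag: duplicates kept
intersect = run(f"{ep1} INTERSECT {ep2} ORDER BY baker")  # in both episodes
only_ep1 = run(f"{ep1} EXCEPT {ep2} ORDER BY baker")      # in episode 1 only

print(union)      # ['Antony', 'Briony', 'Imelda']
print(union_all)  # ['Antony', 'Antony', 'Briony', 'Briony', 'Imelda']
print(intersect)  # ['Antony', 'Briony']
print(only_ep1)   # ['Imelda']
```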


## Aggregate Operations

Let’s consider how we can calculate metrics on tables using aggregates.
Aggregate: “a whole formed by combining several (typically disparate) elements.” In SQL, an aggregate function combines the values from many rows into a single result.

We can utilize the following aggregate functions within our SELECT clause:
* count
* min
* max
* avg
* sum
* stddev
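Here is a minimal sketch of these functions applied to the age column. It assumes Python's built-in sqlite3 module and four rows copied from the bakers table. SQLite has no built-in stddev, so that function is left out of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bakers (baker TEXT PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO bakers VALUES (?, ?)",
    [("Antony", 30), ("Briony", 33), ("Dan", 36), ("Karen", 60)],
)

# Each aggregate collapses the four rows into a single value.
n, youngest, oldest, mean_age, total_age = conn.execute(
    "SELECT count(*), min(age), max(age), avg(age), sum(age) FROM bakers"
).fetchone()

print(n, youngest, oldest, mean_age, total_age)  # 4 30 60 39.75 159
```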
