## 136. Single Number

Given a non-empty array of integers nums, every element appears twice except for one. Find that single one.

You must implement a solution with a linear runtime complexity and use only constant extra space.

Example 1:
```
Input: nums = [2,2,1]
Output: 1
```

Example 2:
```
Input: nums = [4,1,2,1,2]
Output: 4
```

Example 3:
```
Input: nums = [1]
Output: 1
```

Submit solution here: https://leetcode.com/problems/single-number/description/
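One well-known way to meet both constraints uses bitwise XOR. This is a sketch, not an official solution. A value XORed with itself is 0, and XOR with 0 leaves a value unchanged, so the pairs cancel out and only the single element remains. The function name `single_number` is our own; LeetCode expects its own method signature when you submit.

```python
from functools import reduce
from operator import xor
from typing import List


def single_number(nums: List[int]) -> int:
    """Return the element that appears exactly once.

    Uses x ^ x == 0 and x ^ 0 == x. XOR is commutative and associative,
    so every pair cancels regardless of order and only the unpaired
    value survives. O(n) time, O(1) extra space.
    """
    return reduce(xor, nums, 0)


print(single_number([2, 2, 1]))        # 1
print(single_number([4, 1, 2, 1, 2]))  # 4
print(single_number([1]))              # 1
```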

## Additional Pandas Functionality

Before we dive deeper into SQL data manipulation, let’s explore a few more pandas features, namely:

* How to fix dataframe formatting.
* How to iterate through a dataframe.
* How to create new columns.

There is a seemingly infinite number of ways to manipulate dataframes. We will explore just a few basics that help us better understand how pandas works.


## Formatting

In our previous dataframe, we noticed some errors in the column names and in the values inside the dataframe:

* The “Substitution” column name had an extraneous trailing space.
* “Netherlands” appeared twice in the “Country” column, once with a leading space.

Leaving these errors in makes your job harder down the line (technical debt). We should handle these errors immediately.

First, let’s fix our “Substitution” column.

```python
import pandas as pd

# load in your dataset located in `data/scorers.csv`
df = pd.read_csv("../data/scorers.csv")

df.columns
```

```
Index(['Country', 'League', 'Club', 'Player Names', 'Matches_Played',
       'Substitution ', 'Mins', 'Goals', 'Shots', 'OnTarget',
       'Shots Per Avg Match', 'On Target Per Avg Match', 'Year'],
      dtype='object')
```

```python
# this raises a KeyError: the column is actually named "Substitution " (trailing space)
df["Substitution"]
```

```
KeyError: 'Substitution'
```

```python
# rename the column to drop the trailing space; inplace=True modifies df directly
df.rename(columns={"Substitution ": "Substitution"}, inplace=True)
```

```python
df["Substitution"]
```

```
0      16.0
1       0.0
2       1.0
3       3.0
4      10.0
       ... 
655     0.0
656     2.0
657     0.0
658     0.0
659    11.0
Name: Substitution, Length: 660, dtype: float64
```
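Renaming works well for one known column. When several column names might carry stray whitespace, you can strip every label at once with `Index.str.strip()`. Here is a minimal sketch on a small hypothetical frame; the `" Goals"` column is invented for illustration:

```python
import pandas as pd

# Hypothetical frame whose column names carry stray whitespace,
# mimicking the "Substitution " problem above.
df = pd.DataFrame({"Country": ["Spain"], "Substitution ": [16.0], " Goals": [11.0]})

# Index.str.strip() returns a new Index with leading/trailing whitespace
# removed from every label; assigning it back replaces all column names.
df.columns = df.columns.str.strip()

print(list(df.columns))  # ['Country', 'Substitution', 'Goals']
```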

Next, let’s deal with the two “Netherlands” entries: one with a leading space and one without.

```python
df.groupby("Country")["Goals"].mean()
```

```
Country
 Netherlands    10.375000
Brazil          10.204301
England         13.480000
France           9.964912
Germany         12.193548
Italy           14.097826
Netherlands      5.800000
Portugal         7.812500
Spain           12.820225
USA             12.972222
Name: Goals, dtype: float64
```

```python
df.groupby("Country")["Goals"].groups.keys()
```

```
dict_keys([' Netherlands', 'Brazil', 'England', 'France', 'Germany', 'Italy', 'Netherlands', 'Portugal ', 'Spain', 'USA'])
```

```python
# replace every cell whose value is exactly " Netherlands"
df.replace(" Netherlands", "Netherlands", inplace=True)

# check countries again
df.groupby("Country")["Goals"].mean()
```

```
Country
Brazil         10.204301
England        13.480000
France          9.964912
Germany        12.193548
Italy          14.097826
Netherlands     7.833333
Portugal        7.812500
Spain          12.820225
USA            12.972222
Name: Goals, dtype: float64
```
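The group keys listed earlier also include `'Portugal '` with a trailing space, which the exact-match `replace` above does not touch. A more general fix is to strip whitespace from every entry in the column with `Series.str.strip()`. Here is a minimal sketch on a small hypothetical sample; the goal values are made up:

```python
import pandas as pd

# Hypothetical sample with the same kinds of stray spaces seen in the
# groupby keys above (" Netherlands" and "Portugal ").
df = pd.DataFrame({
    "Country": [" Netherlands", "Netherlands", "Portugal ", "Portugal"],
    "Goals": [10.0, 6.0, 8.0, 7.0],
})

# Series.str.strip() removes leading and trailing whitespace from each
# string, so every spelling variant collapses into a single group.
df["Country"] = df["Country"].str.strip()

means = df.groupby("Country")["Goals"].mean()
print(means)
```

On the real data, the equivalent line would be `df["Country"] = df["Country"].str.strip()`.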

## Iterating through a DataFrame

Sometimes we build new dataframes from existing ones, and sometimes we must iterate through a dataframe row by row to compute measures on the collected data.

We can do this with the `iterrows()` method, which yields `(index, row)` pairs where each row is a pandas Series, or with `itertuples()`, which yields each row as a named tuple.


```python
for index, row in df.iterrows():
    print(row)
```

```
Country                              Spain
League                             La Liga
Club                                 (BET)
Player Names               Juanmi Callejon
Matches_Played                        19.0
Substitution                          16.0
Mins                                1849.0
Goals                                 11.0
Shots                                 48.0
OnTarget                              20.0
Shots Per Avg Match                   2.47
On Target Per Avg Match               1.03
Year                                  2016
Name: 0, dtype: object
Country                                Spain
League                               La Liga
Club                                   (BAR)
Player Names               Antoine Griezmann
Matches_Played                          36.0
Substitution                             0.0
Mins                                  3129.0
Goals                                   16.0
Shots                                   88.0
OnTarget
```

```python
for index, row in df.iterrows():
    print(row["Mins"])
```

```
1849.0
3129.0
2940.0
2842.0
1745.0
...
2247.0
2507.0
2447.0
```
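Here is a sketch comparing `iterrows()`, `itertuples()`, and a loop-free (vectorized) version on two rows copied from the output above. The "goals per 90 minutes" measure is only an illustrative example and is not part of the original analysis:

```python
import pandas as pd

# Two rows taken from the scorers output above (names, minutes, goals only).
df = pd.DataFrame({
    "Player Names": ["Juanmi Callejon", "Antoine Griezmann"],
    "Mins": [1849.0, 3129.0],
    "Goals": [11.0, 16.0],
})

# iterrows() yields (index, Series) pairs; columns are accessed by label.
per_90_iterrows = []
for index, row in df.iterrows():
    per_90_iterrows.append(row["Goals"] / row["Mins"] * 90)

# itertuples() yields namedtuples and is usually faster than iterrows().
# Column names that are not valid identifiers (e.g. "Player Names") get
# positional names, so attribute access suits simple names like Goals.
per_90_itertuples = []
for row in df.itertuples(index=False):
    per_90_itertuples.append(row.Goals / row.Mins * 90)

# The same measure computed for the whole column at once (vectorized).
df["Goals_Per_90"] = df["Goals"] / df["Mins"] * 90

print(df)
```

For column arithmetic like this, the vectorized form is usually the simplest and fastest. Explicit iteration is mainly useful when each row needs logic that cannot be expressed as column operations.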


## Creating New Columns

Sometimes we also want to create new columns to advance our analysis.

We accomplish this using syntax similar to Python dictionaries, where we simply assign a value to a new key.

```python
diction = {}

diction["Hello"] = "Goodbye"

print(diction)
```

```
{'Hello': 'Goodbye'}
```


Let’s say we want to compute the ratio of goals to shots.

```python
df["Goals_Ratio"] = df["Goals"] / df["Shots"]

df.head()
```

```
,Country,League,Club,Player Names,Matches_Played,Substitution,Mins,Goals,Shots,OnTarget,Shots Per Avg Match,On Target Per Avg Match,Year,Goals_Ratio
0,Spain,La Liga,(BET),Juanmi Callejon,19.0,16.0,1849.0,11.0,48.0,20.0,2.47,1.03,2016,0.229167
1,Spain,La Liga,(BAR),Antoine Griezmann,36.0,0.0,3129.0,16.0,88.0,41.0,2.67,1.24,2016,0.181818
2,Spain,La Liga,(ATL),Luis Suarez,34.0,1.0,2940.0,28.0,120.0,57.0,3.88,1.84,2016,0.233333
3,Spain,La Liga,(CAR),Ruben Castro,32.0,3.0,2842.0,,117.0,42.0,3.91,1.4,2016,
4,Spain,La Liga,(VAL),Kevin Gameiro,21.0,10.0,1745.0,13.0,50.0,23.0,2.72,1.25,2016,0.260000
```
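When you create a ratio column, watch out for zero denominators. pandas does not raise on division by zero in float columns: a player with zero shots would get `NaN` (0/0) or `inf`. The minimal sketch below uses two rows from the output above plus one hypothetical player. It guards the ratio and also adds a boolean column; the names "Hypothetical Player" and `Scored_10_Plus` are invented for illustration:

```python
import numpy as np
import pandas as pd

# Two rows from the output above plus one hypothetical player with no shots.
df = pd.DataFrame({
    "Player Names": ["Juanmi Callejon", "Antoine Griezmann", "Hypothetical Player"],
    "Goals": [11.0, 16.0, 0.0],
    "Shots": [48.0, 88.0, 0.0],
})

# Replacing 0 shots with NaN makes the ratio NaN for that row, instead of
# relying on whatever 0/0 or x/0 happens to produce.
df["Goals_Ratio"] = df["Goals"] / df["Shots"].replace(0, np.nan)

# A new column can also hold booleans computed from a comparison.
df["Scored_10_Plus"] = df["Goals"] >= 10

print(df)
```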


## Database Concepts

Before we explore SQL and its syntax, we should be familiar with a few concepts and terms:
* DBMS (Database Management System) 
* Database
* Relational Database
* Keys
* Set Theory


## DBMS

A DBMS (Database Management System) is software for storing and managing large amounts of data.

A database server is a specific installation of a DBMS, while a database is simply a collection of data, often stored on a server.

Properties of a DBMS:

* Lets users read, write, and update data using a query language
* Lets multiple users manipulate data at the same time without failure
* Stores massive amounts of data
* Supports backups

## Databases

A database stores data according to a set of rules that describe how the data is stored, such as which kinds of data it accepts (strings, numbers, booleans).

* A database schema (or data model) describes which types of data are valid to store.
* A database instance is the actual data that exists inside the database at a given moment.

We will be working with relational databases, which organize data into tables of rows and columns, much like dataframes.
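To make the schema/instance distinction concrete, here is a sketch that uses Python's built-in `sqlite3` module as a lightweight stand-in for a full DBMS (the course itself uses PostgreSQL). The `players` table is hypothetical, with two goal counts taken from the scorers data above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the rules for what can be stored (table name, column names, types).
conn.execute("CREATE TABLE players (name TEXT, goals INTEGER)")

# Instance: the actual rows stored at this moment.
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Luis Suarez", 28), ("Kevin Gameiro", 13)],
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk)
schema = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(players)")]
instance = conn.execute("SELECT name, goals FROM players").fetchall()

print(schema)  # [('name', 'TEXT'), ('goals', 'INTEGER')]
print(instance)
```

Inserting or deleting rows changes the instance, while the schema stays the same until the table definition itself is altered.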

## Keys

A key is a data value (or combination of values) that uniquely identifies a single row of data in a table. In the `bakers` table below, `baker` is the primary key.
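Here is a sketch of a key in action. It again uses `sqlite3` as a stand-in for PostgreSQL, with a cut-down version of the `bakers` table: declaring `baker` as `PRIMARY KEY` makes the database reject a second row with the same key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bakers (baker VARCHAR(10) PRIMARY KEY, age INT)")
conn.execute("INSERT INTO bakers VALUES ('Ruby', 29)")

# A second row with the same key is rejected, because the key must
# identify exactly one row.
try:
    conn.execute("INSERT INTO bakers VALUES ('Ruby', 35)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Looking up by key therefore returns at most one row.
rows = conn.execute("SELECT age FROM bakers WHERE baker = 'Ruby'").fetchall()
print(duplicate_rejected, rows)  # True [(29,)]
```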

## SQL Basics

For the next queries, we will be working on the website: https://onecompiler.com/postgresql/

To create a table, we use the syntax `CREATE TABLE <NAME>(attribute1 datatype1, attribute2 datatype2, ...)`. Copy and paste the following query:

```sql
create table bakers(
      baker        varchar(10) primary key
      , fullname   varchar(100)
      , age        int
      , occupation varchar(100)
      , hometown   varchar(100)
);
```

```sql
insert into bakers values('Antony','Antony Amourdoux',30,'Banker','London');
insert into bakers values('Briony','Briony Williams',33,'Full-time parent','Bristol');
insert into bakers values('Dan','Dan Beasley-Harling',36,'Full-time parent','London');
insert into bakers values('Imelda','Imelda McCarron',33,'Countryside recreation officer','County Tyrone');
insert into bakers values('Jon','Jon Jenkins',47,'Blood courier','Newport');
insert into bakers values('Karen','Karen Wright',60,'In-store sampling assistant','Wakefield');
insert into bakers values('Kim-Joy','Kim-Joy Hewlett',27,'Mental health specialist','Leeds');
insert into bakers values('Luke','Luke Thompson',30,'Civil servant/house and techno DJ','Sheffield');
insert into bakers values('Manon','Manon Lagrève',26,'Software project manager','London');
insert into bakers values('Rahul','Rahul Mandal',30,'Research scientist','Rotherham');
insert into bakers values('Ruby','Ruby Bhogal',29,'Project manager','London');
insert into bakers values('Terry','Terry Hartill',56,'Retired air steward','West Midlands');
```
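If you would rather practice locally than on the website, here is a sketch of the same table in SQLite through Python's `sqlite3` module. SQLite is a swapped-in stand-in for PostgreSQL here. It accepts `VARCHAR(n)` but does not enforce the length, so its behavior can differ from PostgreSQL in edge cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same schema as above; SQLite accepts VARCHAR(n) but does not enforce n.
conn.execute("""
    CREATE TABLE bakers (
          baker      VARCHAR(10) PRIMARY KEY
        , fullname   VARCHAR(100)
        , age        INT
        , occupation VARCHAR(100)
        , hometown   VARCHAR(100)
    )
""")

rows = [
    ("Antony", "Antony Amourdoux", 30, "Banker", "London"),
    ("Briony", "Briony Williams", 33, "Full-time parent", "Bristol"),
    ("Dan", "Dan Beasley-Harling", 36, "Full-time parent", "London"),
    ("Imelda", "Imelda McCarron", 33, "Countryside recreation officer", "County Tyrone"),
    ("Jon", "Jon Jenkins", 47, "Blood courier", "Newport"),
    ("Karen", "Karen Wright", 60, "In-store sampling assistant", "Wakefield"),
    ("Kim-Joy", "Kim-Joy Hewlett", 27, "Mental health specialist", "Leeds"),
    ("Luke", "Luke Thompson", 30, "Civil servant/house and techno DJ", "Sheffield"),
    ("Manon", "Manon Lagrève", 26, "Software project manager", "London"),
    ("Rahul", "Rahul Mandal", 30, "Research scientist", "Rotherham"),
    ("Ruby", "Ruby Bhogal", 29, "Project manager", "London"),
    ("Terry", "Terry Hartill", 56, "Retired air steward", "West Midlands"),
]

# "?" placeholders let sqlite3 quote each value safely.
conn.executemany("INSERT INTO bakers VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM bakers").fetchone()[0]
print(count)  # 12
```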

## Queries

First, we will review how to write queries, that is, how to request data.
The general format is:

```sql
SELECT ...
FROM ...
WHERE ...;
```

Let’s say we want to select the names of bakers who are younger than 40.

```sql
SELECT baker
FROM bakers
WHERE age < 40;
```

We can get more information by listing more attributes:

```sql
SELECT baker, age
FROM bakers
WHERE age < 40;
```

We can combine conditions using `AND` or `OR`.

Let’s say we want to select only the bakers younger than 40 who are from London:

```sql
SELECT baker, age
FROM bakers
WHERE age < 40 AND hometown = 'London';
```
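You can check the queries above locally with a sketch like the following. It uses `sqlite3` instead of PostgreSQL and keeps only the `baker`, `age`, and `hometown` columns. The `OR` query is an extra example that does not appear in the original, and the helper `names` is our own:

```python
import sqlite3

# Only the columns the queries need, with values copied from the inserts above.
ROWS = [
    ("Antony", 30, "London"), ("Briony", 33, "Bristol"), ("Dan", 36, "London"),
    ("Imelda", 33, "County Tyrone"), ("Jon", 47, "Newport"),
    ("Karen", 60, "Wakefield"), ("Kim-Joy", 27, "Leeds"),
    ("Luke", 30, "Sheffield"), ("Manon", 26, "London"),
    ("Rahul", 30, "Rotherham"), ("Ruby", 29, "London"),
    ("Terry", 56, "West Midlands"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bakers (baker TEXT PRIMARY KEY, age INT, hometown TEXT)")
conn.executemany("INSERT INTO bakers VALUES (?, ?, ?)", ROWS)


def names(sql):
    """Run a query whose first column is baker; return the names as a set."""
    return {row[0] for row in conn.execute(sql)}


under_40 = names("SELECT baker, age FROM bakers WHERE age < 40")
# AND: both conditions must hold.
under_40_london = names(
    "SELECT baker, age FROM bakers WHERE age < 40 AND hometown = 'London'"
)
# OR: at least one condition must hold.
over_50_or_london = names(
    "SELECT baker, age FROM bakers WHERE age > 50 OR hometown = 'London'"
)

print(sorted(under_40_london))  # ['Antony', 'Dan', 'Manon', 'Ruby']
```

Returning sets sidesteps row order, which SQL does not guarantee without an `ORDER BY` clause.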