### SQL
#### CROSS JOIN
Combines every row of one table with every row of the other (the Cartesian product), so the result has rows(A) x rows(B) rows.
``` mysql
SELECT * FROM A CROSS JOIN B;
```
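To make the row-pairing concrete, here is a minimal Python sketch of the same idea using `itertools.product`. The two tables (`table_a`, `table_b`) and their columns are hypothetical, invented for illustration only.

```python
from itertools import product

# Two hypothetical tables, each represented as a list of row dictionaries.
table_a = [{"id": 1}, {"id": 2}]
table_b = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]

# A cross join pairs every row of A with every row of B (Cartesian product).
cross_joined = [{**row_a, **row_b} for row_a, row_b in product(table_a, table_b)]

print(len(cross_joined))  # 2 rows x 3 rows = 6 rows
for row in cross_joined:
    print(row)
```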

### Python Basics

#### List comprehensions
A concise way to build lists when the logic involved is not too complex.

In [2]:
list1 = [x * 2 for x in range(10)]
list2 = [x * 2 for x in range(10) if x%2 == 0]
list3 = [x * 2 if x%2 == 0 else x * 3 for x in range(10)]
print(list1)
print(list2)
print(list3)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[0, 4, 8, 12, 16]
[0, 3, 4, 9, 8, 15, 12, 21, 16, 27]
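The difference between `list2` and `list3` is easier to see as plain loops. The sketch below rewrites both; the names `list2_loop` and `list3_loop` are introduced here only for comparison.

```python
# Loop equivalent of list2: the trailing `if` filters which items are kept.
list2_loop = []
for x in range(10):
    if x % 2 == 0:
        list2_loop.append(x * 2)

# Loop equivalent of list3: the `if ... else` placed before `for` chooses
# the value to store, so every item is kept.
list3_loop = []
for x in range(10):
    list3_loop.append(x * 2 if x % 2 == 0 else x * 3)

print(list2_loop)  # [0, 4, 8, 12, 16]
print(list3_loop)  # [0, 3, 4, 9, 8, 15, 12, 21, 16, 27]
```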


#### Lambdas
Anonymous functions, typically passed as arguments to higher-order functions (examples with Pandas come later)

In [3]:
func1 = lambda num: num*2
func2 = lambda num: num*2 if num%2==0 else num*3

print(func1(2))
print(func2(2))
print(func2(3))

4
4
9
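As a small sketch of the higher-order use case, the built-ins `sorted`, `map` and `filter` all accept a function argument, which is where a lambda usually ends up. The sample data below is made up for illustration.

```python
words = ["pandas", "sql", "python", "numpy"]

# sorted() accepts a `key` function; this lambda sorts by word length.
by_length = sorted(words, key=lambda word: len(word))

# map() applies the lambda to each item; filter() keeps items where it is true.
doubled = list(map(lambda num: num * 2, [1, 2, 3]))
evens = list(filter(lambda num: num % 2 == 0, range(6)))

print(by_length)  # ['sql', 'numpy', 'pandas', 'python']
print(doubled)    # [2, 4, 6]
print(evens)      # [0, 2, 4]
```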


#### Pandas

In [24]:
import pandas as pd
import numpy as np

DataFrame $\rightarrow$ a table (matrix) of nRows x nColumns

In [5]:
df = pd.read_csv('datasets/iris/iris.csv')
df.head()

Unnamed: 0,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


Series $\rightarrow$ a one-dimensional labeled array: 1 row x nColumns (or the reverse, nRows x 1 column, as when selecting a single DataFrame column)

In [6]:
series = df.Species
print(series.head())

0    setosa
1    setosa
2    setosa
3    setosa
4    setosa
Name: Species, dtype: object
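The relationship between the two structures can be checked on a tiny in-memory table that does not depend on the CSV file. This is only a sketch: `df_small` and its values are hypothetical, with column names borrowed from the iris dataset. It also shows a lambda passed to `Series.apply`.

```python
import pandas as pd

# Hypothetical miniature table; the column names mimic the iris dataset.
df_small = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 6.3],
    "Species": ["setosa", "setosa", "virginica"],
})

print(df_small.shape)           # (rows, columns) of the DataFrame
species = df_small["Species"]   # selecting one column returns a Series
print(type(species).__name__)
print(species.shape)            # a Series has a single dimension

# A lambda passed to Series.apply is called once per value.
upper_species = species.apply(lambda name: name.upper())
print(upper_species.tolist())
```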


groupby $\rightarrow$ groups the rows of a dataset that share the same value in a column, so an aggregate (here the mean) can be computed per group

In [7]:
grouped = df.groupby('Species')
grouped.mean()

Species,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
setosa,5.006,3.428,1.462,0.246
versicolor,5.936,2.77,4.26,1.326
virginica,6.588,2.974,5.552,2.026
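The grouping step can be verified by hand on a table small enough to compute mentally. A minimal sketch, assuming a made-up `measurements` table rather than the real iris data:

```python
import pandas as pd

# Hypothetical data: two rows per species so the means are easy to check.
measurements = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica", "virginica"],
    "Petal.Length": [1.0, 2.0, 5.0, 6.0],
})

# Rows sharing the same Species value form one group; mean() is computed
# per group, and the group key becomes the index of the result.
mean_by_species = measurements.groupby("Species").mean()
print(mean_by_species)
```

Here setosa averages to 1.5 and virginica to 5.5, one output row per distinct `Species` value.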


pivot $\rightarrow$ spreads the distinct values of a column (a discrete variable) out as new columns

In [46]:
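titanic = pd.read_csv('datasets/titanic/titanic3.csv')
# Keep only the columns needed for this example
titanic = titanic[['sex', 'age', 'ticket']]
# Mean age for each (sex, ticket) pair; reset_index turns the group keys back into columns
mean_age_per_ticket_and_sex = titanic.groupby(['sex', 'ticket']).mean().reset_index()
mean_age_per_ticket_and_sex.age = mean_age_per_ticket_and_sex.age.round(2)
# One row per ticket, one column per sex, cells filled with the mean age
mean_age_per_ticket_and_sex.pivot(columns='sex', index='ticket', values='age').head()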

ticket,female,male
110152,26.33,
110413,28.5,52.0
110465,,47.0
110469,,30.0
110489,,42.0
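The long-to-wide reshaping is easier to follow on three rows. This sketch uses a hypothetical `long_table` with invented ticket labels; combinations that have no row in the input come out as `NaN`, which is why the Titanic output has empty cells.

```python
import pandas as pd

# Hypothetical long-format table: one row per (ticket, sex) pair.
long_table = pd.DataFrame({
    "ticket": ["A1", "A1", "B2"],
    "sex": ["female", "male", "male"],
    "age": [30.0, 40.0, 25.0],
})

# Each distinct value of `sex` becomes a column; `ticket` becomes the index.
wide_table = long_table.pivot(index="ticket", columns="sex", values="age")
print(wide_table)
```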


reset_index $\rightarrow$ replaces the index left over after pivoting or grouping with a sequential one starting at 0, moving the old index (the group keys) back into regular columns

In [48]:
print(titanic.head())
mean_age_per_ticket_and_sex = titanic.groupby(['sex', 'ticket']).mean().reset_index()
print(mean_age_per_ticket_and_sex)

      sex      age  ticket
0  female  29.0000   24160
1    male   0.9167  113781
2  female   2.0000  113781
3    male  30.0000  113781
4  female  25.0000  113781
         sex       ticket        age
0     female       110152  26.333333
1     female       110413  28.500000
2     female       110813  60.000000
3     female       111361  30.000000
4     female       112053  19.000000
...      ...          ...        ...
1079    male   W./C. 6607        NaN
1080    male   W./C. 6608  17.000000
1081    male  W.E.P. 5734  46.000000
1082    male    W/C 14208  30.000000
1083    male    WE/P 5735  70.000000

[1084 rows x 3 columns]
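To see exactly what changes, the index can be inspected before and after the call. A minimal sketch on a hypothetical `ages` table:

```python
import pandas as pd

# Hypothetical data, small enough to check by hand.
ages = pd.DataFrame({
    "sex": ["female", "male", "male"],
    "age": [20.0, 30.0, 50.0],
})

# After groupby + mean, the group keys are the index, not a column.
grouped_mean = ages.groupby("sex").mean()
print(grouped_mean.index.tolist())   # ['female', 'male']

# reset_index moves the keys into a regular column and numbers rows from 0.
flat = grouped_mean.reset_index()
print(flat.columns.tolist())         # ['sex', 'age']
print(flat.index.tolist())           # [0, 1]
```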
