# SQL
#### CROSS JOIN
Combines every row of one table with every row of the other (the Cartesian product).
``` mysql
SELECT * FROM A CROSS JOIN B;
```
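
To see the row count a cross join produces, here is a minimal sketch that runs the same query from Python. It assumes SQLite (through the standard-library `sqlite3` module) instead of MySQL, and the column names `letter` and `num` plus the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database; A and B mirror the table names in the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (letter TEXT)")   # hypothetical column
conn.execute("CREATE TABLE B (num INTEGER)")   # hypothetical column
conn.executemany("INSERT INTO A VALUES (?)", [("x",), ("y",)])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,), (3,)])

# Every row of A is paired with every row of B.
rows = conn.execute("SELECT * FROM A CROSS JOIN B").fetchall()
print(len(rows))  # 2 rows in A times 3 rows in B
conn.close()
```

With 2 rows in `A` and 3 in `B`, the result has 2 x 3 = 6 rows, which is why cross joins on large tables grow quickly.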

# Python Basics

#### List comprehensions
A quick way to build lists when the logic is not too complex.

In [2]:
list1 = [x * 2 for x in range(10)]
list2 = [x * 2 for x in range(10) if x%2 == 0]
list3 = [x * 2 if x%2 == 0 else x * 3 for x in range(10)]
print(list1)
print(list2)
print(list3)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[0, 4, 8, 12, 16]
[0, 3, 4, 9, 8, 15, 12, 21, 16, 27]
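
The difference between `list2` and `list3` above is easy to miss: a trailing `if` filters elements out, while an `if ... else` before the `for` keeps every element and only chooses the value. As a sketch, the two comprehensions can be unrolled into plain loops (the names `filtered` and `mapped` are just illustrative):

```python
# Filter form: [x * 2 for x in range(10) if x % 2 == 0]
filtered = []
for x in range(10):
    if x % 2 == 0:              # odd values are skipped entirely
        filtered.append(x * 2)

# Conditional-expression form: [x * 2 if x % 2 == 0 else x * 3 for x in range(10)]
mapped = []
for x in range(10):
    mapped.append(x * 2 if x % 2 == 0 else x * 3)  # every value is kept

print(filtered)
print(mapped)
```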


#### Lambdas
Anonymous functions, typically passed to higher-order functions (examples with Pandas later on).

In [3]:
func1 = lambda num: num*2
func2 = lambda num: num*2 if num%2==0 else num*3

print(func1(2))
print(func2(2))
print(func2(3))

4
4
9
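
As a small sketch of what "passing a lambda to a higher-order function" looks like with built-ins only, the following uses `sorted`, `map` and `filter`; the sample data is made up.

```python
words = ["pandas", "sql", "python", "numpy"]

# sorted() accepts a function through key=; the lambda returns each word's length
by_length = sorted(words, key=lambda word: len(word))

# map() applies the lambda to every element
doubled = list(map(lambda num: num * 2, [1, 2, 3]))

# filter() keeps the elements for which the lambda returns True
evens = list(filter(lambda num: num % 2 == 0, range(6)))

print(by_length)
print(doubled)
print(evens)
```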


# Pandas

In [4]:
import pandas as pd
import numpy as np

DataFrame $\rightarrow$ a table (matrix) of nRows x nColumns

In [5]:
df = pd.read_csv('datasets/iris/iris.csv')
df.head()

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


Series $\rightarrow$ a one-dimensional labelled array: a single column of a DataFrame (or a single row)

In [6]:
series = df.Species
print(series.head())

0    setosa
1    setosa
2    setosa
3    setosa
4    setosa
Name: Species, dtype: object
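
To make the DataFrame/Series distinction concrete without depending on the iris file, here is a minimal sketch on a hand-made frame (the three rows are toy data, not taken from the dataset). Selecting one column or one row both give back a Series.

```python
import pandas as pd

# Toy data for illustration only
df = pd.DataFrame({
    "Sepal.Length": [5.1, 7.0, 6.3],
    "Species": ["setosa", "versicolor", "virginica"],
})

column = df["Species"]   # same result as df.Species
row = df.iloc[0]         # first row; its index is the column names

print(df.shape)            # (3, 2)
print(column.shape)        # (3,)
print(type(row).__name__)  # Series
```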


groupby $\rightarrow$ groups the rows of a dataset that share the same value in a column

In [7]:
grouped = df.groupby('Species')
grouped.mean()

Species,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
setosa,5.006,3.428,1.462,0.246
versicolor,5.936,2.77,4.26,1.326
virginica,6.588,2.974,5.552,2.026
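
The same split-and-aggregate idea can be checked by hand on a tiny frame. This is a sketch with invented values; it also shows `size()`, which counts the rows per group.

```python
import pandas as pd

# Toy data, small enough to verify the means by hand
toy = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica", "virginica"],
    "Petal.Length": [1.0, 2.0, 5.0, 6.0],
})

grouped = toy.groupby("Species")
means = grouped["Petal.Length"].mean()   # one value per species
counts = grouped.size()                  # number of rows per species

print(means)
print(counts)
```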


pivot $\rightarrow$ turns the distinct values of one column into columns of their own

In [8]:
titanic = pd.read_csv('datasets/titanic/titanic3.csv')
titanic = titanic[['sex', 'age', 'ticket']]
mean_age_per_ticket_and_sex = titanic.groupby(['sex', 'ticket']).mean().reset_index()
mean_age_per_ticket_and_sex.age = round(mean_age_per_ticket_and_sex.age, 2)
mean_age_per_ticket_and_sex.pivot(columns='sex', index='ticket', values='age').head()

ticket,female,male
110152,26.33,
110413,28.5,52.0
110465,,47.0
110469,,30.0
110489,,42.0
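
The empty cells in the output above are combinations that do not exist in the data. A minimal sketch with made-up tickets shows the same effect: `pivot` fills missing (index, column) pairs with `NaN`.

```python
import pandas as pd

# Toy long-format data: one row per (ticket, sex) pair
long_form = pd.DataFrame({
    "ticket": ["A1", "A1", "B2"],
    "sex": ["female", "male", "male"],
    "age": [30.0, 40.0, 25.0],
})

# Each distinct value of 'sex' becomes its own column
wide = long_form.pivot(index="ticket", columns="sex", values="age")
print(wide)
```

Note that `pivot` raises an error if an (index, column) pair appears more than once, which is why the notebook aggregates with `groupby` first.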


reset_index $\rightarrow$ moves the index left behind by a pivot or groupby back into regular columns and restores an ordered index starting at 0

In [10]:
print(titanic.head())
mean_age_per_ticket_and_sex = titanic.groupby(['sex', 'ticket']).mean()
print(mean_age_per_ticket_and_sex.head())
print(mean_age_per_ticket_and_sex.reset_index().head())

      sex      age  ticket
0  female  29.0000   24160
1    male   0.9167  113781
2  female   2.0000  113781
3    male  30.0000  113781
4  female  25.0000  113781
                     age
sex    ticket           
female 110152  26.333333
       110413  28.500000
       110813  60.000000
       111361  30.000000
       112053  19.000000
      sex  ticket        age
0  female  110152  26.333333
1  female  110413  28.500000
2  female  110813  60.000000
3  female  111361  30.000000
4  female  112053  19.000000
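
A compact sketch of the same behaviour on toy data follows. It also shows `as_index=False`, a `groupby` parameter that keeps the grouping key as a column so no reset is needed afterwards.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "sex": ["female", "male", "female"],
    "age": [20.0, 30.0, 40.0],
})

grouped = toy.groupby("sex").mean()   # 'sex' becomes the index
flat = grouped.reset_index()          # 'sex' is a column again, index is 0..n-1

# Alternative: keep the key as a column from the start
direct = toy.groupby("sex", as_index=False).mean()

print(grouped)
print(flat)
```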


merge $\rightarrow$ joins two datasets just like in SQL (outer, inner, right, left...)

In [35]:
# Suppose we have one dataset with all the race data and another with the pigeon names and their positions
pigeon_racing = pd.read_csv('datasets/pigeon-race/pigeon-racing.csv')
positions = pigeon_racing[['Pigeon', 'Pos']]
pigeon_racing.drop('Pos', axis=1, inplace=True)

race_and_positions = pigeon_racing.merge(positions, on='Pigeon')

race_and_positions.head()

,Breeder,Pigeon,Name,Color,Sex,Ent,Arrival,Speed,To Win,Eligible,Pos
0,Texas Outlaws,19633-AU15-FOYS,,BCWF,H,1,42:14.0,172.155,0:00:00,Yes,1
1,Junior Juanich,0402-AU15-JRL,,SIWF,H,1,47:36.0,163.569,0:05:21,Yes,2
2,Jerry Allensworth,0404-AU15-VITA,Perch Potato,BB,H,1,47:41.0,163.442,0:05:27,Yes,3
3,Alias-Alias,2013-AU15-ALIA,,BBSP,H,1,47:43.0,163.392,0:05:28,Yes,4
4,Greg Glazier,5749-AU15-SLI,,BC,H,1,47:44.0,163.366,0:05:30,Yes,5
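
In the example above every pigeon appears in both frames, so the join type makes no difference. The sketch below uses two small invented frames whose keys only partly overlap, to show how the `how` parameter changes the result.

```python
import pandas as pd

# Toy frames: P1 is only in 'birds', P4 is only in 'results'
birds = pd.DataFrame({"Pigeon": ["P1", "P2", "P3"], "Breeder": ["Ann", "Bob", "Cy"]})
results = pd.DataFrame({"Pigeon": ["P2", "P3", "P4"], "Pos": [1, 2, 3]})

inner = birds.merge(results, on="Pigeon", how="inner")  # keys present in both
left = birds.merge(results, on="Pigeon", how="left")    # all keys from 'birds'
outer = birds.merge(results, on="Pigeon", how="outer")  # keys from either frame

print(inner)
print(left)
print(outer)
```

Rows without a match on the other side get `NaN` in the columns that come from it, as with `Pos` for `P1` in the left join.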


concat $\rightarrow$ when a dataset is split across several pieces, we can concatenate them back together

In [51]:
pigeon_racing = pd.read_csv('datasets/pigeon-race/pigeon-racing.csv')
half = len(pigeon_racing) // 2  # split point: first half / second half
pigeon_racing_1 = pigeon_racing.iloc[:half]
pigeon_racing_2 = pigeon_racing.iloc[half:]
print(pigeon_racing_1.head())
print(pigeon_racing_2.head())

pd.concat([pigeon_racing_1, pigeon_racing_2])

   Pos            Breeder           Pigeon          Name Color Sex  Ent  \
0    1      Texas Outlaws  19633-AU15-FOYS           NaN  BCWF   H    1   
1    2     Junior Juanich    0402-AU15-JRL           NaN  SIWF   H    1   
2    3  Jerry Allensworth   0404-AU15-VITA  Perch Potato    BB   H    1   
3    4        Alias-Alias   2013-AU15-ALIA           NaN  BBSP   H    1   
4    5       Greg Glazier    5749-AU15-SLI           NaN    BC   H    1   

   Arrival    Speed   To Win Eligible  
0  42:14.0  172.155  0:00:00      Yes  
1  47:36.0  163.569  0:05:21      Yes  
2  47:41.0  163.442  0:05:27      Yes  
3  47:43.0  163.392  0:05:28      Yes  
4  47:44.0  163.366  0:05:30      Yes  
     Pos              Breeder          Pigeon Name Color Sex  Ent  Arrival  \
200  201      Milner-Mckinsey  2489-AU15-VITA  NaN    BB   H    5  13:46.0   
201  202           T C R Loft    3354-AU15-AA  NaN    BB   H    2  14:12.0   
202  203       American Lofts  3584-AU15-CORP  NaN  OPAL   H    3  14:14.0 

,Pos,Breeder,Pigeon,Name,Color,Sex,Ent,Arrival,Speed,To Win,Eligible
0,1,Texas Outlaws,19633-AU15-FOYS,,BCWF,H,1,42:14.0,172.155,0:00:00,Yes
1,2,Junior Juanich,0402-AU15-JRL,,SIWF,H,1,47:36.0,163.569,0:05:21,Yes
2,3,Jerry Allensworth,0404-AU15-VITA,Perch Potato,BB,H,1,47:41.0,163.442,0:05:27,Yes
3,4,Alias-Alias,2013-AU15-ALIA,,BBSP,H,1,47:43.0,163.392,0:05:28,Yes
4,5,Greg Glazier,5749-AU15-SLI,,BC,H,1,47:44.0,163.366,0:05:30,Yes
...,...,...,...,...,...,...,...,...,...,...,...
395,396,Hutchins/Milner,2496-AU15-VITA,,BB,H,5,13:37.0,90.901,1:31:23,Yes
396,397,Twin200,7799-AU15-VITA,,SIL,H,2,20:25.0,87.817,1:38:10,Yes
397,398,Mayberry Classic,5508-AU15-MAC,,BBSP,H,2,29:42.0,83.929,1:47:28,Yes
398,399,Sierra Ranch Classic,0519-AU15-SIER,,BC,H,6,44:49.0,78.286,2:02:34,Yes
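
The two halves above keep their original row labels, so the concatenated index is already continuous. When the pieces come from separate files, each usually starts at 0 again; a sketch with toy frames shows how `ignore_index=True` renumbers the result.

```python
import pandas as pd

# Toy pieces, each with its own index starting at 0
part_1 = pd.DataFrame({"Pos": [1, 2]})
part_2 = pd.DataFrame({"Pos": [3, 4]})

stacked = pd.concat([part_1, part_2])                        # index: 0, 1, 0, 1
renumbered = pd.concat([part_1, part_2], ignore_index=True)  # index: 0, 1, 2, 3

print(stacked)
print(renumbered)
```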
