# Object-oriented programming from DataFrames (Scala)

## 1. Declaring the case class

We declare the case class with the same fields as the rows of the final DataFrame. The field names must be identical to the column names, because Spark matches columns to fields by name when it converts rows into objects.

In [55]:
case class Flight(DEST_COUNTRY_NAME: String,
                  numberFlights: BigInt)

defined class Flight
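
As a minimal sketch of what the case class provides on its own, independent of Spark, the following plain-Scala example builds instances by hand. The object name `CaseClassDemo` is hypothetical; the values are copied from the first rows of the output printed further down in this notebook.

```scala
// Same definition as in the cell above.
case class Flight(DEST_COUNTRY_NAME: String, numberFlights: BigInt)

// Hypothetical wrapper object, used only to make the sketch runnable on its own.
object CaseClassDemo {
  val us = Flight("United States", BigInt(411352))
  val usAgain = Flight("United States", BigInt(411352))

  // copy builds a new instance and replaces only the named fields.
  val canada = us.copy(DEST_COUNTRY_NAME = "Canada", numberFlights = BigInt(8399))

  def main(args: Array[String]): Unit = {
    println(us.DEST_COUNTRY_NAME) // field access by name
    println(us == usAgain)        // case classes compare by value, not by reference
    println(canada)               // generated toString: Flight(Canada,8399)
  }
}
```

Value-based equality and the generated `toString` are why the collected objects print as `Flight(United States,411352)` later on.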


## 2. Creating a DataFrame from the data and transforming it

### 2.1 Reading the data

In [74]:
val data = spark
  .read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("2015-summary.csv")

data: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
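
With `inferSchema`, Spark has to scan the file to guess the column types. A possible alternative is to pass an explicit schema. The sketch below is only an illustration: the `LongType` for `count` is an assumption (the output above confirms only that the first two columns are strings), the names `ExplicitSchemaDemo`, `flightSchema` and `readFlights` are hypothetical, and the temporary CSV contains made-up rows so that the example runs without `2015-summary.csv`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object ExplicitSchemaDemo {
  // Assumed column types; only the two string columns are confirmed by the notebook output.
  val flightSchema: StructType = StructType(Seq(
    StructField("DEST_COUNTRY_NAME", StringType, nullable = true),
    StructField("ORIGIN_COUNTRY_NAME", StringType, nullable = true),
    StructField("count", LongType, nullable = true)
  ))

  // Reads a CSV file with a header row, using the schema above instead of inference.
  def readFlights(spark: SparkSession, path: String): DataFrame =
    spark.read
      .schema(flightSchema)
      .option("header", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explicit-schema-sketch")
      .getOrCreate()

    // Made-up sample rows, written to a temporary file for illustration only.
    val tmp = Files.createTempFile("flights-sample", ".csv")
    val content = "DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count\nCanada,United States,10\n"
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))

    readFlights(spark, tmp.toString).printSchema()
    spark.stop()
  }
}
```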


### 2.2 Transforming the data

The transformation has three steps:

- Group by the flight's destination country and sum the counts: `.groupBy("<column>")` followed by `.sum("count")`

- Rename the resulting `sum(count)` column: `.withColumnRenamed("<originalName>", "<finalName>")`

- Sort in descending order: `.sort(desc("<column>"))`

In [75]:
import org.apache.spark.sql.functions.desc

val groupedData = data
  .groupBy("DEST_COUNTRY_NAME")
  .sum("count")
  .withColumnRenamed("sum(count)", "numberFlights")
  .sort(desc("numberFlights"))

groupedData.show()

+------------------+-------------+
| DEST_COUNTRY_NAME|numberFlights|
+------------------+-------------+
|     United States|       411352|
|            Canada|         8399|
|            Mexico|         7140|
|    United Kingdom|         2025|
|             Japan|         1548|
|           Germany|         1468|
|Dominican Republic|         1353|
|       South Korea|         1048|
|       The Bahamas|          955|
|            France|          935|
|          Colombia|          873|
|            Brazil|          853|
|       Netherlands|          776|
|             China|          772|
|           Jamaica|          666|
|        Costa Rica|          588|
|       El Salvador|          561|
|            Panama|          510|
|              Cuba|          466|
|             Spain|          420|
+------------------+-------------+
only showing top 20 rows



groupedData: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [DEST_COUNTRY_NAME: string, numberFlights: bigint]
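
The rename step depends on knowing the generated column name `sum(count)`. One possible alternative, sketched here under assumptions, is to name the aggregate directly with `agg(sum(...).as(...))`. The names `AggAliasDemo` and `flightsPerDestination` are hypothetical, and the sample rows are made up for illustration; they are not taken from `2015-summary.csv`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{desc, sum}

object AggAliasDemo {
  // Groups by destination, sums "count" and names the result in a single step.
  def flightsPerDestination(df: DataFrame): DataFrame =
    df.groupBy("DEST_COUNTRY_NAME")
      .agg(sum("count").as("numberFlights"))
      .sort(desc("numberFlights"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-alias-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows with the same column names as the notebook's DataFrame.
    val sample = Seq(
      ("Canada", "United States", 10L),
      ("Canada", "Mexico", 5L),
      ("Mexico", "United States", 7L)
    ).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

    flightsPerDestination(sample).show()
    spark.stop()
  }
}
```

The resulting DataFrame has the same two columns as `groupedData`, so it could be converted to `Flight` objects in the same way.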


## 3. Converting the data into objects

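We convert each row into an object of type `Flight` with `.as[<class>]`, which needs an implicit encoder (provided by `import spark.implicits._`), and then gather the objects into an array with `.collect()`. Because `.collect()` brings every row to the driver, it is only suitable for results that fit in memory, such as this aggregated table: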

In [76]:
val flights = groupedData.as[Flight].collect()

flights: Array[Flight] = Array(Flight(United States,411352), Flight(Canada,8399), Flight(Mexico,7140), Flight(United Kingdom,2025), Flight(Japan,1548), Flight(Germany,1468), Flight(Dominican Republic,1353), Flight(South Korea,1048), Flight(The Bahamas,955), Flight(France,935), Flight(Colombia,873), Flight(Brazil,853), Flight(Netherlands,776), Flight(China,772), Flight(Jamaica,666), Flight(Costa Rica,588), Flight(El Salvador,561), Flight(Panama,510), Flight(Cuba,466), Flight(Spain,420), Flight(Guatemala,397), Flight(Italy,382), Flight(Honduras,362), Flight(Aruba,346), Flight(Ireland,335), Flight(Hong Kong,332), Flight(Australia,329), Flight(Sint Maarten,325), Flight(United Arab Emirates,320), Flight(Cayman Islands,314), Flight(Switzerland,294), Flight(Venezuela,290), Flight(Peru,279), Fl...
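
When the result is larger, it may be preferable to keep working on the typed `Dataset[Flight]` and bring only a few objects to the driver. The following is a minimal sketch under assumptions: `TypedDatasetDemo`, `atLeast` and its `minFlights` parameter are hypothetical names, and the dataset is built in memory from three rows copied from the output above rather than from `groupedData`.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class Flight(DEST_COUNTRY_NAME: String, numberFlights: BigInt)

object TypedDatasetDemo {
  // Typed filter: the lambda receives Flight objects, not untyped rows.
  def atLeast(ds: Dataset[Flight], minFlights: BigInt): Dataset[Flight] =
    ds.filter(f => f.numberFlights >= minFlights)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-dataset-sketch")
      .getOrCreate()
    import spark.implicits._ // provides the encoder for Flight

    val ds = Seq(
      Flight("United States", BigInt(411352)),
      Flight("Canada", BigInt(8399)),
      Flight("Spain", BigInt(420))
    ).toDS()

    // take(n) returns at most n objects to the driver, unlike collect().
    val firstTwo: Array[Flight] = atLeast(ds, BigInt(1000)).take(2)
    firstTwo.foreach(println)
    spark.stop()
  }
}
```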


We can now access each of these objects:

In [77]:
flights(0)

res50: Flight = Flight(United States,411352)


In [78]:
flights(0).DEST_COUNTRY_NAME

res51: String = United States
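
Since the collected elements are ordinary Scala objects, the standard collection methods apply to them as well. The sketch below runs without Spark; it rebuilds only the first five entries of the array printed above, and the names `FlightCollectionsDemo`, `total`, `over5000`, `byCountry` and `busiest` are hypothetical.

```scala
case class Flight(DEST_COUNTRY_NAME: String, numberFlights: BigInt)

object FlightCollectionsDemo {
  // First five entries of the collected array, copied from the output above.
  val flights: Array[Flight] = Array(
    Flight("United States", BigInt(411352)),
    Flight("Canada", BigInt(8399)),
    Flight("Mexico", BigInt(7140)),
    Flight("United Kingdom", BigInt(2025)),
    Flight("Japan", BigInt(1548))
  )

  // Sum of the flight counts across these five objects.
  val total: BigInt = flights.map(_.numberFlights).sum

  // Names of the destinations with more than 5000 flights.
  val over5000: Array[String] =
    flights.filter(_.numberFlights > 5000).map(_.DEST_COUNTRY_NAME)

  // Lookup table from destination name to flight count.
  val byCountry: Map[String, BigInt] =
    flights.map(f => f.DEST_COUNTRY_NAME -> f.numberFlights).toMap

  // Object with the largest flight count.
  val busiest: Flight = flights.maxBy(_.numberFlights)

  def main(args: Array[String]): Unit = {
    println(total)
    println(over5000.mkString(", "))
    println(byCountry("Japan"))
    println(busiest)
  }
}
```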


In [79]:
for (f <- flights)
{
    println(f.DEST_COUNTRY_NAME + " " + f.numberFlights)
}

United States 411352
Canada 8399
Mexico 7140
United Kingdom 2025
Japan 1548
Germany 1468
Dominican Republic 1353
South Korea 1048
The Bahamas 955
France 935
Colombia 873
Brazil 853
Netherlands 776
China 772
Jamaica 666
Costa Rica 588
El Salvador 561
Panama 510
Cuba 466
Spain 420
Guatemala 397
Italy 382
Honduras 362
Aruba 346
Ireland 335
Hong Kong 332
Australia 329
Sint Maarten 325
United Arab Emirates 320
Cayman Islands 314
Switzerland 294
Venezuela 290
Peru 279
Ecuador 268
Taiwan 266
Belgium 259
Turks and Caicos Islands 230
Haiti 226
Trinidad and Tobago 211
Belize 188
Bermuda 183
Iceland 181
Argentina 180
Nicaragua 179
Russia 176
Chile 174
Luxembourg 155
Barbados 154
Denmark 153
Saint Kitts and Nevis 139
Turkey 138
Philippines 134
Israel 134
Portugal 127
Antigua and Barbuda 126
Saint Lucia 123
Norway 121
Sweden 118
New Zealand 111
Qatar 108
British Virgin Islands 107
Curacao 90
Saudi Arabia 83
Federated States of Micronesia 69
Guyana 64
Austria 62
India 61
Paraguay 60
Nigeria 59
Bonai