## Sorting the data

We can sort the data by passing a column to the `sort` method. The default is ascending order.

In [31]:
harryPotters.sort($"# num_pages").show()

+--------------------+----------+-----------+
|               title|      isbn|# num_pages|
+--------------------+----------+-----------+
|Harry Potter und ...|3895849618|         13|
|Unauthorized Harr...|0976540606|        152|
|Harry Potter Boxe...|0439434866|       1820|
|Mapping the World...|1932100598|        195|
|Mugglenet.Com's W...|1569755833|        216|
|Looking for God i...|1414306342|        234|
|Harry Potter Scho...|043932162X|        240|
|Harry Potter and ...|0812694554|        243|
|Harry Potter and ...|158234681X|        250|
|Harry Potter Y La...|0613359607|        254|
|Harry Potter Boxe...|0439682584|       2690|
|Harry Potter y la...|8498380138|        288|
|Harry Potter and ...|0439554934|        320|
|Harry Potter und ...|3551354014|        334|
|Harry Potter Coll...|0439827604|       3342|
|Harry Potter and ...|0439064864|        341|
|Harry Potter et l...|2070541304|        349|
|Harry Potter und ...|3551552096|        351|
|Harry Potter and ...|0439554896| 

## Bring in some more functions

* For descending order we need some help; a common import is `org.apache.spark.sql.functions._`
* It contains a wide range of functions that complement what is in the `DataFrame` API
* Nearly all of the functions of this Scala `object` is 

In [32]:
import org.apache.spark.sql.functions._

import org.apache.spark.sql.functions._


## Sorting in Descending Order

* With `org.apache.spark.sql.functions._` imported, pass `desc("# num_pages")` to `sort` for descending order
* The column is still a string here, so values are compared as text: `96` sorts above `896`

In [37]:
harryPotters.sort(desc("# num_pages")).show()

+--------------------+----------+-----------+
|               title|      isbn|# num_pages|
+--------------------+----------+-----------+
|J.K. Rowling's Ha...|0826452329|         96|
|Harry Potter y la...|8478887423|        896|
|Harry Potter y la...|8478888845|        893|
|Harry Potter and ...|0439358078|        870|
|Harry Potter and ...|0747584664|        768|
|Harry Potter and ...|0439785960|        652|
|Harry Potter and ...|074754624X|        636|
|Harry Potter e il...|888451049X|        627|
|Harry Potter y el...|8478889930|        602|
|Harry Potter and ...|074757362X|        480|
|Harry Potter und ...|355155210X|        448|
|Harry Potter and ...|043965548X|        435|
|Harry Potter and ...|0786222727|        424|
|Ultimate Unoffici...|0972393617|        412|
|Harry Potter ve S...|3570211029|        403|
|Гарри Поттер и фи...|535300308X|        400|
|The Science Of Ha...|0755311515|        374|
|Harry Potter y el...|8478886559|        359|
|Harry Potter ve F...|3570211010| 

## Inappropriate Types

The `# num_pages` column is not stored in a numerical format, which is why the sorts above compare the values as text. We can prove it by calling `printSchema`:

In [38]:
harryPotters.printSchema()

root
 |-- title: string (nullable = true)
 |-- isbn: string (nullable = true)
 |-- # num_pages: string (nullable = true)
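
To see why a string column sorts this way, here is a minimal plain-Scala sketch (no Spark involved) that compares text ordering with numeric ordering. The four sample values are taken from the sorted output earlier in this section; the names `pagesAsText`, `sortedAsText` and `sortedAsNumbers` are only illustrative.

```scala
// Page counts as they appear in the string-typed column
val pagesAsText = Seq("195", "13", "1820", "152")

// Strings compare character by character, so "1820" lands before "195"
val sortedAsText = pagesAsText.sorted

// Converting to Int first gives the numeric order we actually want
val sortedAsNumbers = pagesAsText.map(_.toInt).sorted

println(sortedAsText)    // List(13, 152, 1820, 195)
println(sortedAsNumbers) // List(13, 152, 195, 1820)
```

The first result matches the order Spark produced for the string column, which suggests the fix is to change the column's type rather than the sort call.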



In [None]:
import org.apache.spark.sql.types.IntegerType
val df = harryPotters.withColumn("num_pages", $"# num_pages".cast(IntegerType)).drop("# num_pages")

In [31]:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType
val converted = harryPotters.withColumn("# num_pages", $"# num_pages".cast(IntegerType))
converted.printSchema()

root
 |-- title: string (nullable = true)
 |-- isbn: string (nullable = true)
 |-- # num_pages: integer (nullable = true)



import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType
converted: org.apache.spark.sql.DataFrame = [title: string, isbn: string ... 1 more field]
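
Once the column is an integer, a descending sort should order by numeric value. The sketch below is written to run outside this notebook, so it builds its own local `SparkSession` and a tiny made-up DataFrame. The titles `Book A` to `Book C` and the names `books`, `byText` and `byNumber` are placeholders, not rows or variables from the dataset above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc}
import org.apache.spark.sql.types.IntegerType

// Assumption: a local session for illustration; a notebook normally provides `spark` already
val spark = SparkSession.builder().master("local[*]").appName("sort-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data with page counts stored as strings
val books = Seq(("Book A", "96"), ("Book B", "896"), ("Book C", "1820"))
  .toDF("title", "# num_pages")

// Descending sort on the string column: compared as text, so "96" comes first
val byText = books
  .sort(desc("# num_pages"))
  .select("title").as[String].collect().toSeq

// Cast to IntegerType, then sort: compared as numbers, so 1820 comes first
val byNumber = books
  .withColumn("# num_pages", col("# num_pages").cast(IntegerType))
  .sort(desc("# num_pages"))
  .select("title").as[String].collect().toSeq

println(byText)
println(byNumber)

spark.stop()
```

Applied to the notebook's own data, the equivalent call would presumably be `converted.sort(desc("# num_pages")).show()`.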


## Creating Rows

## Adding Rows

## Dropping Rows

## Taking Rows

## Creating Columns

val total = converted.agg(sum($"# num_pages"))
total.show()

In [40]:
harryPotters.select("title").where($"title".contains("Prisoner")).show(20, false)

+-----------------------------------------------------------+
|title                                                      |
+-----------------------------------------------------------+
|Harry Potter and the Prisoner of Azkaban (Harry Potter  #3)|
|Harry Potter and the Prisoner of Azkaban (Harry Potter  #3)|
+-----------------------------------------------------------+



In [None]:
collect_list($"title")

In [54]:
booksDF.groupBy($"authors").agg(collect_list($"title").alias("titles")).show()

+--------------------+--------------------+
|             authors|              titles|
+--------------------+--------------------+
|Abraham Lincoln-D...|[Speeches and Wri...|
|    Amanda Eyre Ward|    [How to Be Lost]|
|         Ann Beattie|[The Doctor's Hou...|
|         Ann Rinaldi|[A Break with Cha...|
|Charles Dickens-S...|[A Tale of Two Ci...|
|          Dava Sobel|[Galileo's Daught...|
|        Doug Stanton|[In Harm's Way: T...|
|     Eric Klinenberg|[Heat Wave: A Soc...|
|Gayle Lynds-Rober...|[The Altman Code ...|
|Haruki Murakami-U...|    [Naokos Lächeln]|
|          Ian Ogilvy|[Measle and the D...|
|J.E. Austen Leigh...|[A Memoir of Jane...|
|        Jack Meadows|[The Future of th...|
|          James Frey|[A Million Little...|
|Johanna Hurwitz-V...|[Anne Frank: Life...|
|John  Baxter-Mel Bay|[Deluxe Encyclope...|
|Jonathan Swift-YKids|[Gulliver's Travels]|
|     Karen Armstrong|[A History of God...|
|Laura  Jordan-San...|   [Anhelos ocultos]|
|    Laurence Olivier|[Confessio

Lab: What is the average rating of the Harry Potter books?