# Kotlin: Tips Dataset

(short intro about the dataset)

For this example, we will import the following packages:

In [2]:
%use multik
%use dataframe
%use lets-plot

Let's load the "Tips" dataset and show its first 5 rows:

In [3]:
val tips = DataFrame.readCSV("../resources/example-datasets/datasets/tips.csv")
tips.head()

While the dataset is loading, some values may be mapped to the wrong data type (e.g. a badly formatted date can be loaded as a `String`).

With the `schema()` method it's possible to see how values have been parsed.

In [6]:
tips.schema()

total_bill: Double
tip: Double
smoker: Boolean
day: String
time: String
size: Int
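
To make the idea of type inference concrete, here is a minimal sketch in plain Kotlin of how a loader could guess a column's type from its raw strings. `inferColumnType` is a hypothetical helper written for this illustration; it is not the parser that Kotlin DataFrame uses, and the order of the checks is an assumption.

```kotlin
// Hypothetical helper: returns the name of the narrowest type that fits every value.
// It only illustrates the idea; it is not how Kotlin DataFrame parses CSV files.
fun inferColumnType(values: List<String>): String = when {
    values.isEmpty() -> "String"                                  // nothing to infer from
    values.all { it.toIntOrNull() != null } -> "Int"              // every value is a whole number
    values.all { it.toDoubleOrNull() != null } -> "Double"        // every value is numeric
    values.all { it.lowercase() in setOf("true", "false") } -> "Boolean"
    else -> "String"                                              // fallback when nothing else fits
}
```

With this sketch, `inferColumnType(listOf("2", "3", "4"))` gives `"Int"`, while `inferColumnType(listOf("2", "three"))` gives `"String"`: one value that does not fit is enough to push the whole column to the fallback type.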

We can also compute some basic statistics with the `describe()` method:

In [7]:
tips.describe()
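
For readers who want to see what such a summary involves, the sketch below computes a few common descriptive statistics for one numeric column in plain Kotlin. `Summary` and `summarize` are hypothetical names introduced here, and the choice of the sample standard deviation (dividing by `n - 1`) is an assumption; the exact set of statistics reported by `describe()` may differ.

```kotlin
import kotlin.math.sqrt

// Hypothetical container for a handful of descriptive statistics.
data class Summary(
    val count: Int, val mean: Double, val std: Double,
    val min: Double, val median: Double, val max: Double
)

// Computes the statistics for a single numeric column.
fun summarize(values: List<Double>): Summary {
    require(values.size >= 2) { "need at least two values for a sample standard deviation" }
    val sorted = values.sorted()
    val n = sorted.size
    val mean = sorted.sum() / n
    // Sample variance: squared deviations from the mean, divided by n - 1 (assumption)
    val variance = sorted.sumOf { (it - mean) * (it - mean) } / (n - 1)
    // Median: middle element, or the average of the two middle elements
    val median = if (n % 2 == 1) sorted[n / 2] else (sorted[n / 2 - 1] + sorted[n / 2]) / 2
    return Summary(n, mean, sqrt(variance), sorted.first(), median, sorted.last())
}
```

For instance, `summarize(listOf(1.0, 2.0, 3.0, 4.0))` reports a mean and a median of `2.5`, a minimum of `1.0` and a maximum of `4.0`.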

As we can see, some columns are *categorical*, such as `day` and `time` (with 4 and 2 distinct values, respectively), while others are numerical, such as `total_bill` and `tip`.

Let's compute some more domain-specific statistics.

1. Share of smokers and non-smokers, as a fraction of all rows

In [12]:
tips["smoker"].valueCounts()
    .convert { count }.with { it.toDouble() / tips.rowsCount() }
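
The cell above divides each count by the number of rows, so the result is a fraction between 0 and 1; multiplying by 100 turns it into a percentage. The same computation can be sketched in plain Kotlin as follows, where `valueShares` and `valuePercentages` are hypothetical helpers and not part of the DataFrame API.

```kotlin
// Share of each distinct value, as a fraction between 0 and 1.
fun <T> valueShares(values: List<T>): Map<T, Double> =
    values.groupingBy { it }          // group equal values together
        .eachCount()                  // count the occurrences of each value
        .mapValues { (_, n) -> n.toDouble() / values.size }

// The same shares expressed as percentages.
fun <T> valuePercentages(values: List<T>): Map<T, Double> =
    valueShares(values).mapValues { (_, share) -> share * 100 }
```

For a made-up column `listOf(true, false, false, false)`, `valueShares` yields `0.25` for `true` and `0.75` for `false`, and `valuePercentages` yields `25.0` and `75.0`.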

2. The full row for the highest `total_bill`

In [152]:
tips.sortByDesc { total_bill }.head(1)

3. Get the highest bill per person, where `size` is the number of people at the table

In [15]:
tips.map { total_bill / size }.max()

20.275

4. Group `total_bill` into 10 equal-width ranges and count the number of bills in each range (similar to `pandas.cut()`)

In [245]:
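// Largest bill, truncated to an Int
val maxBill = tips.max { total_bill }.toInt()
// Integer bin width; assumes maxBill >= 10, otherwise the width would be 0
val binWidth = maxBill / 10

val billRanges = tips.groupBy { total_bill.map { it.toInt() / binWidth } }
    .count()
    .sortBy { total_bill }
    // Label each bin by its bounds; truncation makes the bins left-closed
    .convert { total_bill }.with { "[${it * binWidth}, ${(it + 1) * binWidth})" }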

billRanges

We can graph those ranges:

In [333]:
ggplot(billRanges.toMap()) +
    geomHistogram(stat = Stat.identity, color = "dark-green", alpha = 0.3, showLegend = false) {
        x = "total_bill"; y = "count"; fill = "count"
    } +
    scaleFillHue() +
    xlab("Total Bill") +
    ggtitle("Total Bills Distribution")
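
A note on the binning in step 4: because it relies on integer division, the largest bills land in an extra group beyond the tenth one. As a hedged alternative, the sketch below shows equal-width binning in plain Kotlin that always produces exactly `bins` groups. `binIndex` and `binCounts` are hypothetical helpers; unlike the default of `pandas.cut()`, the intervals here are left-closed, with the maximum folded into the last bin.

```kotlin
// Index (0 until bins) of the equal-width bin that x falls into, for data spanning [min, max].
fun binIndex(x: Double, min: Double, max: Double, bins: Int): Int {
    require(bins > 0 && max > min) { "need at least one bin and a non-empty range" }
    val width = (max - min) / bins
    // Values equal to max would land in bin number `bins`, so clamp them into the last bin
    return ((x - min) / width).toInt().coerceIn(0, bins - 1)
}

// Number of values per bin, using the data's own minimum and maximum as the range.
fun binCounts(values: List<Double>, bins: Int): List<Int> {
    val min = values.minOf { it }
    val max = values.maxOf { it }
    val counts = IntArray(bins)
    for (x in values) counts[binIndex(x, min, max, bins)]++
    return counts.toList()
}
```

For example, `binCounts(listOf(0.0, 1.0, 5.0, 50.0), 10)` returns `[2, 1, 0, 0, 0, 0, 0, 0, 0, 1]`: the range 0 to 50 is split into ten bins of width 5, and the maximum is counted in the last one.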
    