`Krangl` is an equivalent of pandas for `Kotlin`. See the [documentation](https://krangl.gitbook.io/docs/).

In [24]:

%use krangl(0.10)

# Loading data files


Let's import our data file, mpg.csv, which contains fuel economy data for 234 cars.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year



In [26]:
// Read the file
val df = DataFrame.readCSV("../data/mpg.csv")

// View the first 10 rows
df.print(maxRows=10)

A DataFrame: 234 x 12
          manufacturer        model   displ   year   cyl        trans   drv   cty   hwy   fl
 1    1           audi           a4     1.8   1999     4     auto(l5)     f    18    29    p
 2    2           audi           a4     1.8   1999     4   manual(m5)     f    21    29    p
 3    3           audi           a4       2   2008     4   manual(m6)     f    20    31    p
 4    4           audi           a4       2   2008     4     auto(av)     f    21    30    p
 5    5           audi           a4     2.8   1999     6     auto(l5)     f    16    26    p
 6    6           audi           a4     2.8   1999     6   manual(m5)     f    18    26    p
 7    7           audi           a4     3.1   2008     6     auto(av)     f    18    27    p
 8    8           audi   a4 quattro     1.8   1999     4   manual(m5)     4    18    26    p
 9    9           audi   a4 quattro     1.8   1999     4     auto(l5)     4    16    25    p
10   10           audi   a4 quattro       2   20

In [27]:
// View the number of rows in the DataFrame
df.count()

n
234


In [31]:
// View the column names, their types, and the first few values
df.schema()

DataFrame with 234 observations
              [Int]  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 2...
manufacturer  [Str]  audi, audi, audi, audi, audi, audi, audi, audi, audi, audi, audi, audi, audi, au...
model         [Str]  a4, a4, a4, a4, a4, a4, a4, a4 quattro, a4 quattro, a4 quattro, a4 quattro, a4 q...
displ         [Dbl]  1.8, 1.8, 2, 2, 2.8, 2.8, 3.1, 1.8, 1.8, 2, 2, 2.8, 2.8, 3.1, 3.1, 2.8, 3.1, 4.2...
year          [Int]  1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 2008, 2008, 1999, 1999, 20...
cyl           [Int]  4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,...
trans         [Str]  auto(l5), manual(m5), manual(m6), auto(av), auto(l5), manual(m5), auto(av), manu...
drv           [Str]  f, f, f, f, f, f, f, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, r, r, r, r, r, r, r, r, r,...
cty           [Int]  18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 15, 15, 17, 16, 14, 11, ...
hwy           [Int]  29

In [47]:
// This is how to find the average cty fuel economy across all cars. 
df["cty"].mean(true)

16.858974358974358

In [48]:
// Similarly, this is how to find the average hwy fuel economy across all cars.
df["hwy"].mean(true)

23.44017094017094
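
The `true` passed to `mean` in the two cells above appears to be krangl's `removeNA` flag, which drops missing values before averaging. As a rough illustration of that behaviour (not krangl's actual implementation), here is a plain-Kotlin sketch. The helper `meanIgnoringMissing` and the sample values are made up for this example.

```kotlin
// Hypothetical helper, not part of krangl: average only the non-null entries.
fun meanIgnoringMissing(values: List<Int?>): Double? {
    val present = values.filterNotNull()
    // An empty (or all-null) column has no meaningful average
    return if (present.isEmpty()) null else present.average()
}

// Made-up sample with one missing value
val sampleCty: List<Int?> = listOf(18, 21, null, 20, 21)

// The null is skipped, so the mean is taken over the four remaining values
val sampleMean = meanIgnoringMissing(sampleCty)
```

Evaluating `sampleMean` in a cell shows the average of the four non-null values. Without the null check, an average over a nullable column would not compile in Kotlin, which is why a flag such as `removeNA` is useful.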

In [78]:
// Return the unique values for the number of cylinders the cars in our dataset have.
df["cyl"].values().distinct()

[4, 6, 8, 5]
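
The result above is `[4, 6, 8, 5]` rather than a sorted list because Kotlin's `distinct()` keeps each value at the position where it first occurs. The small sketch below shows this with the standard library only; the sample list is made up and is not read from mpg.csv.

```kotlin
// Made-up sample of cylinder counts, in row order
val sampleCyl = listOf(4, 4, 6, 8, 6, 5, 4)

// distinct() keeps the first occurrence of each value, in encounter order
val uniqueCyl = sampleCyl.distinct()

// Sort explicitly when an ordered list is wanted
val sortedCyl = uniqueCyl.sorted()
```

If a sorted result is preferred in the notebook, appending `.sorted()` to a typed list of values should work the same way.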

In [80]:
// Here's a more complex example where we group the cars by number of cylinders
// and find the average cty mpg for each group.

In [102]:
df.groupBy("cyl").summarize("avg", { it -> it["cty"].mean() })

cyl,avg
4,21.012345679012345
6,16.21518987341772
8,12.571428571428571
5,20.5
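
To make the group-then-summarize step more concrete, the sketch below reproduces the same idea with Kotlin's standard collections. It only illustrates the pattern and does not show how krangl implements it. The `Car` class is introduced for this example, and the data are the `cyl` and `cty` values of the first nine rows printed earlier, so the averages differ from the full-dataset output above.

```kotlin
// Illustrative record type (not a krangl class)
data class Car(val cyl: Int, val cty: Int)

// cyl and cty values from the first nine rows of the printed table
val firstNineCars = listOf(
    Car(4, 18), Car(4, 21), Car(4, 20), Car(4, 21),
    Car(6, 16), Car(6, 18), Car(6, 18),
    Car(4, 18), Car(4, 16)
)

// Step 1: split the rows into groups keyed by cylinder count.
// Step 2: reduce each group to a single number, the mean cty.
val avgCtyByCyl: Map<Int, Double> = firstNineCars
    .groupBy { it.cyl }
    .mapValues { (_, cars) -> cars.map { it.cty }.average() }
```

Each key of `avgCtyByCyl` plays the role of a row in the summarized DataFrame, and each value corresponds to the `avg` column.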


In [103]:
// Return the unique values for the class types in our dataset.
df["class"].values().distinct()

[compact, midsize, suv, 2seater, minivan, pickup, subcompact]

In [104]:
// And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset.
df.groupBy("class").summarize("avg", { it -> it["hwy"].mean() })

class,avg
compact,28.29787234042553
midsize,27.29268292682927
suv,18.12903225806452
2seater,24.8
minivan,22.363636363636363
pickup,16.87878787878788
subcompact,28.142857142857142


The original notebook also explains the basics of `Classes`, `Objects`, and datetime functions, which are straightforward in `Kotlin` and out of scope for this notebook.