# Assignment 2

In this assignment, you will use Clojure to do the following:

1. Read a CSV file from disk.
2. Parse the individual cell values in a line of the file.
3. Parse the individual lines of the file.
4. Construct a record as a hashmap from each data row in the file.
5. Perform a basic data analysis.

## About the dataset

The dataset is obtained from Kaggle: https://www.kaggle.com/datasets/pradeepmanje/bankchurners-set2

It consists of a small collection of bank customer information.

In [None]:
"🔒"
; We will be using several builtin libraries

(require '[clojure.java.io :as io])
(require '[clojure.string :as str])
(require '[clojure.pprint :refer [pprint]])
(require '[clojure.edn :as edn])
(load-file "my.clj")

(def CSV-FILE "my_BankerChurners.csv")

# Load lines

Define a symbol `lines` bound to the sequence of lines read from the file `CSV-FILE`.

You should use `io/reader` and `line-seq` to read the lines.

See:

- https://clojuredocs.org/clojure.java.io/reader
- https://clojuredocs.org/clojure.core/line-seq

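As a minimal sketch of how `io/reader` and `line-seq` fit together, the example below reads from an in-memory string instead of a file. The names `sample-text` and `sample-lines` are hypothetical and exist only for illustration; the use of `with-open` and `doall` is one possible design choice, not a requirement of the assignment.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical in-memory text standing in for a file on disk.
(def sample-text "id,name\n1,\"Ann\"\n2,\"Bob\"")

;; io/reader wraps the source in a BufferedReader, which line-seq needs.
;; line-seq is lazy, so doall forces every line to be read before
;; with-open closes the reader.
(def sample-lines
  (with-open [rdr (io/reader (java.io.StringReader. sample-text))]
    (doall (line-seq rdr))))

(println "Total lines:" (count sample-lines))
```

Because `line-seq` is lazy, consuming its result after the reader has been closed would fail, which is why this sketch forces the sequence inside `with-open`.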

In [None]:
"✍️"
; @workUnit

(def lines ...)

In [None]:
"🔒"
; @check
; @title: last three lines

(println "Total lines:" (count lines))
(show (take-last 3 lines))

# Parsing Value

Each value in the CSV file is either numerical (e.g. `0.819`, `9.4E-5`) or a quoted string (e.g. `"Married"`).
Write a function that will convert the value string into Clojure data.

It's actually really easy because `edn/read-string` can take care of the conversion.

See: https://clojuredocs.org/clojure.edn/read-string

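The following sketch shows what `edn/read-string` returns for a few hypothetical inputs of the kinds described above. The `example-` names are illustrative only.

```clojure
(require '[clojure.edn :as edn])

;; A string of digits is read as an integer.
(def example-int (edn/read-string "100"))

;; Decimal and scientific notation are read as doubles.
(def example-double (edn/read-string "9.4E-5"))

;; Text wrapped in double quotes is read as a Clojure string.
(def example-string (edn/read-string "\"Married\""))

;; Text without quotes is read as a symbol, not a string.
(def example-symbol (edn/read-string "Married"))
```

The last case is worth keeping in mind: only values that are quoted in the input come back as strings.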
In [None]:
"✍️"
; @workUnit

(defn parse-value [x]
    ...)

In [None]:
"🔒"
; @check
; @title: parse-value int

{"100" (parse-value "100"),
 "3.1415" (parse-value "3.1415")
 "10e3" (parse-value "10e3")
 "hello" (parse-value "\"hello\"")}

# Parse line

Each line of the CSV file consists of comma-separated value strings.  Implement a function `(parse-line line)` to
map each line string into a **vector** of parsed values.

Hint:

- `mapv` always returns a vector: https://clojuredocs.org/clojure.core/mapv

- `str/split` splits a string using a regular expression as a separator.
  https://clojuredocs.org/clojure.string/split
  
- Regular expressions in Clojure are part of the language: https://cljs.github.io/api/syntax/regex

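The two hinted functions can be tried out in isolation, as in the sketch below. The line `example-line` is hypothetical and not taken from the dataset, and the plain comma split assumes that no quoted value contains a comma itself.

```clojure
(require '[clojure.string :as str])

;; Hypothetical line, not taken from the dataset.
(def example-line "7,\"red\",2.5")

;; #"," is a regular expression literal; str/split returns a vector
;; of the raw, still unparsed, cell strings.
(def example-cells (str/split example-line #","))
;; => ["7" "\"red\"" "2.5"]

;; mapv applies a function to every element and returns a vector.
(def example-lengths (mapv count example-cells))
;; => [1 5 3]
```

Note that `str/split` without a limit argument drops trailing empty strings, so a line ending in empty cells would yield fewer elements than it has columns.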
In [None]:
"✍️"
; @workUnit

(defn parse-line [line]
    ...)

In [None]:
"🔒"
; @check
; @title: parse first line

(->> lines
    (first)
    (parse-line)
    (show))

In [None]:
"🔒"
; @check
; @title: parse last line

(->> lines
    (last)
    (parse-line)
    (show))

# Make records

Each row should be a hashmap from attributes to their values.  Write a function to convert
each row from a vector to a hashmap.

You are given the attribute names in the vector `columns`.

Hint: consider using the `zipmap` function https://clojuredocs.org/clojure.core/zipmap

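As a small illustration of `zipmap`, the sketch below pairs a hypothetical vector of keys with a hypothetical row of values. Neither `example-columns` nor `example-row` comes from the assignment.

```clojure
;; Hypothetical attribute names and row values, for illustration only.
(def example-columns [:id :name :age])
(def example-row [1 "Ann" 30])

;; zipmap pairs the first key with the first value, the second key
;; with the second value, and so on.
(def example-record (zipmap example-columns example-row))
;; => {:id 1, :name "Ann", :age 30}
```

If the two sequences differ in length, `zipmap` stops at the shorter one.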
In [None]:
"🔒"
(def columns [:id :attrition :age :gender :dependents :education :marital :income :card :months 
              :rel-count :inactive :contacts :limit :balance :open :change :amount :count :change2 :ratio])

In [None]:
"✍️"
; @workUnit

(defn make-record [row]
    ...)

In [None]:
"🔒"
; @check
; @title: make-record of last line

(show (make-record (-> lines last parse-line)))

# Data analysis

We will perform some data analysis using Clojure.  The dataset includes each customer's gender (`:gender`) and credit card limit (`:limit`).
The data analysis should produce a result of the following form:

```
{"F" {:count ___
      :limit ___
      :mean ___}
 "M" {:count ___
      :limit ___
      :mean ___}}
```

The result shows the count, total credit card limit, and mean credit card limit for female and male customers.

You are to perform the analysis from the `lines` in the CSV file with the help of the parsing functions.
The result must be assigned to a symbol called `result`.

<font color="red">Note: all numerical values must be rounded to the nearest integer.</font>

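One possible way to approach a grouped summary of this kind is sketched below on hypothetical records with `:team` and `:score` keys. The names `example-records`, `summarize`, and `example-summary` are illustrative, and `group-by` with `Math/round` is only one of several workable approaches.

```clojure
;; Hypothetical records, unrelated to the bank dataset.
(def example-records
  [{:team "A" :score 10.4}
   {:team "B" :score 7.0}
   {:team "A" :score 5.3}])

;; Summarize one group: count, rounded total, and rounded mean.
(defn summarize [records]
  (let [total (reduce + (map :score records))
        n     (count records)]
    {:count n
     :total (Math/round (double total))
     :mean  (Math/round (double (/ total n)))}))

;; group-by builds a map from each :team value to its records;
;; each group is then replaced by its summary.
(def example-summary
  (into {}
        (for [[team recs] (group-by :team example-records)]
          [team (summarize recs)])))
;; => {"A" {:count 2, :total 16, :mean 8}, "B" {:count 1, :total 7, :mean 7}}
```

Here the mean is computed from the unrounded total, so rounding happens only once for each reported value.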
In [None]:
"✍️"
; @workUnit

(def result ...)

In [None]:
"🔒"
; @check
; @grade: 5
; @title: show data analysis result

(show result)