# Assignment 2

In this assignment, you will be guided to use Clojure to perform the following:

1. Read a CSV file from disk.
2. Parse individual cell values in a line of the file.
3. Parse individual lines of the file.
4. Construct a record as a hashmap from each data row in the file.
5. Perform a basic data analysis.

## About the dataset

The dataset is obtained from Kaggle: https://www.kaggle.com/datasets/pradeepmanje/bankchurners-set2

It consists of a small collection of bank customer information.

In [1]:
"🔒"
; We will be using several builtin libraries

(require '[clojure.java.io :as io])
(require '[clojure.string :as str])
(require '[clojure.pprint :refer [pprint]])
(require '[clojure.edn :as edn])
(load-file "my.clj")

(def CSV-FILE "my_BankerChurners.csv")

#'user/CSV-FILE

# Load lines

Define a symbol `lines` bound to the sequence of lines from the file `CSV-FILE`.

You should use `io/reader` and `line-seq` to read the lines.

See:

- https://clojuredocs.org/clojure.java.io/reader
- https://clojuredocs.org/clojure.core/line-seq
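
As a minimal sketch of how these two functions fit together, the example below reads from an in-memory `java.io.StringReader` instead of a real file (an assumption made purely so the snippet is self-contained). Because `line-seq` is lazy, the lines need to be realized, for example with `vec`, before `with-open` closes the reader.

```clojure
(require '[clojure.java.io :as io])

;; Illustrative only: a StringReader stands in for a file on disk.
(def sample-lines
  (with-open [rdr (io/reader (java.io.StringReader. "a,b\n1,2\n3,4"))]
    ;; vec forces the lazy sequence while the reader is still open
    (vec (line-seq rdr))))

sample-lines
;; => ["a,b" "1,2" "3,4"]
```

Returning a vector also allows indexed access such as `(sample-lines 0)`.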


In [2]:
"✍️"
; @workUnit

(def lines 
    (with-open [rdr (io/reader CSV-FILE)]
        ;; vec realizes the lazy line-seq before the reader is closed
        (vec (line-seq rdr))))



#'user/lines

In [3]:
"🔒"
; @check
; @title: last three lines

(println "Total lines:" (count lines))
(show (take-last 3 lines))

Total lines: 10128
("716506083,\"Attrited Customer\",44,\"F\",1,\"High School\",\"Married\",\"Less than $40K\",\"Blue\",36,5,3,4,5409,0,5409,0.819,10291,60,0.818,0,0.99788,0.00211827"
 "717406983,\"Attrited Customer\",30,\"M\",2,\"Graduate\",\"Unknown\",\"$40K - $60K\",\"Blue\",36,4,3,3,5281,0,5281,0.535,8395,62,0.722,0,0.99671,0.00329379"
 "714337233,\"Attrited Customer\",43,\"F\",2,\"Graduate\",\"Married\",\"Less than $40K\",\"Silver\",25,6,2,4,10388,1961,8427,0.703,10294,61,0.649,0.189,0.99662,0.00337654")


nil

# Parsing Value

Each value in the CSV file is either numerical (e.g. `0.819`, `9.4E-5`) or a quoted string (e.g. `"Married"`).
Write a function that will convert the value string into Clojure data.

This is straightforward because `edn/read-string` handles both numbers and quoted strings.

See: https://clojuredocs.org/clojure.edn/read-string

In [4]:
"✍️"
; @workUnit

(defn parse-value [x]
    (edn/read-string x))

#'user/parse-value

In [5]:
"🔒"
; @check
; @title: parse-value int

{"100" (parse-value "100"),
 "3.1415" (parse-value "3.1415")
 "10e3" (parse-value "10e3")
 "hello" (parse-value "\"hello\"")}

{"100" 100, "3.1415" 3.1415, "10e3" 10000.0, "hello" "hello"}

# Parse line

Each line of the CSV file consists of comma-separated value strings.  Implement a function `(parse-line line)` to
map each line string into a **vector** of parsed values.

Hint:

- `mapv` always returns a vector: https://clojuredocs.org/clojure.core/mapv

- `str/split` splits a string using a regular expression as a separator.
  https://clojuredocs.org/clojure.string/split
  
- Regular expressions in Clojure are part of the language: https://cljs.github.io/api/syntax/regex
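
A small sketch of how the hints combine, using a made-up line rather than one from the dataset. Splitting on every comma assumes that no quoted field contains a comma itself; that holds for the sample lines printed earlier, but it is an assumption about the rest of the file.

```clojure
(require '[clojure.string :as str]
         '[clojure.edn :as edn])

;; Made-up input, for illustration only.
(def sample-line "42,\"Blue\",0.5")

;; str/split takes the string first, then a regex literal such as #","
(str/split sample-line #",")
;; => ["42" "\"Blue\"" "0.5"]

;; mapv applies a function to every piece and returns a vector
(mapv edn/read-string (str/split sample-line #","))
;; => [42 "Blue" 0.5]
```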

In [6]:
"✍️"
; @workUnit

(defn parse-line [line]
    ;; split on commas, then parse each piece with parse-value
    (mapv parse-value (str/split line #",")))

#'user/parse-line

In [7]:
"🔒"
; @check
; @title: parse first line

(->> lines
    (first)
    (parse-line)
    (show))

["CLIENTNUM"
 "Attrition_Flag"
 "Customer_Age"
 "Gender"
 "Dependent_count"
 "Education_Level"
 "Marital_Status"
 "Income_Category"
 "Card_Category"
 "Months_on_book"
 "Total_Relationship_Count"
 "Months_Inactive_12_mon"
 "Contacts_Count_12_mon"
 "Credit_Limit"
 "Total_Revolving_Bal"
 "Avg_Open_To_Buy"
 "Total_Amt_Chng_Q4_Q1"
 "Total_Trans_Amt"
 "Total_Trans_Ct"
 "Total_Ct_Chng_Q4_Q1"
 "Avg_Utilization_Ratio"
 "Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1"
 "Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2"]


nil

In [8]:
"🔒"
; @check
; @title: parse last line

(->> lines
    (last)
    (parse-line)
    (show))

[714337233
 "Attrited Customer"
 43
 "F"
 2
 "Graduate"
 "Married"
 "Less than $40K"
 "Silver"
 25
 6
 2
 4
 10388
 1961
 8427
 0.703
 10294
 61
 0.649
 0.189
 0.99662
 0.00337654]


nil

# Make records

Each row should be a hashmap from attributes to their values.  Write a function to convert
each row from a vector to a hashmap.

You are given the attribute names in the vector `columns`.

Hint: consider using the `zipmap` function https://clojuredocs.org/clojure.core/zipmap
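
`zipmap` pairs keys with values by position and stops when the shorter of the two sequences runs out. A minimal sketch with made-up keys and values:

```clojure
;; Made-up keys and values, for illustration only.
(zipmap [:id :age] [101 44])
;; => {:id 101, :age 44}

;; Extra values beyond the last key are silently dropped.
(zipmap [:id :age] [101 44 "F"])
;; => {:id 101, :age 44}
```

This matters here because `columns` (defined in the next cell) lists 21 names while each parsed row has 23 values, so the two trailing `Naive_Bayes_Classifier_...` fields are left out of the record.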

In [9]:
"🔒"
(def columns [:id :attrition :age :gender :dependents :education :marital :income :card :months 
              :rel-count :inactive :contacts :limit :balance :open :change :amount :count :change2 :ratio])

#'user/columns

In [10]:
"✍️"
; @workUnit

(defn make-record [row]
    (zipmap columns row))

#'user/make-record

In [11]:
"🔒"
; @check
; @title: make-record of last line

(show (make-record (-> lines last parse-line)))

{:age 43,
 :amount 10294,
 :attrition "Attrited Customer",
 :balance 1961,
 :card "Silver",
 :change 0.703,
 :change2 0.649,
 :contacts 4,
 :count 61,
 :dependents 2,
 :education "Graduate",
 :gender "F",
 :id 714337233,
 :inactive 2,
 :income "Less than $40K",
 :limit 10388,
 :marital "Married",
 :months 25,
 :open 8427,
 :ratio 0.189,
 :rel-count 6}


nil

# Data analysis

We will perform some data analysis using Clojure.  Each record includes the customer's gender (`:gender`) and credit card limit (`:limit`).
The analysis should produce a result of the following form:

```
{"F" {:count ___
      :limit ___
      :mean ___}
 "M" {:count ___
      :limit ___
      :mean ___}}
```

The result shows the count, total credit card limit, and mean credit card limit for female and male customers.

Perform the analysis on the `lines` of the CSV file with the help of the parsing functions.
The result must be assigned to a symbol called `result`.

<font color="red">Note: all numerical values must be rounded to the nearest integer.</font>
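
One possible shape for this kind of grouped summary, sketched on made-up records rather than the real data. It uses `group-by` and `reduce` from `clojure.core` and rounds with `Math/round` through Java interop. The names `sample-records`, `summarize`, and `sample-result` are hypothetical, and other approaches (such as `loop`/`recur`) work just as well.

```clojure
;; Hypothetical toy records; not taken from the dataset.
(def sample-records
  [{:gender "F" :limit 1000}
   {:gender "M" :limit 2500.4}
   {:gender "F" :limit 3002}])

;; Summarize one group: count, rounded total, rounded mean.
(defn summarize [records]
  (let [total (reduce + (map :limit records))
        n     (count records)]
    {:count n
     :limit (Math/round (double total))
     :mean  (Math/round (double (/ total n)))}))

;; group-by returns {gender [records...]}; summarize each group.
(def sample-result
  (into {}
        (map (fn [[gender rs]] [gender (summarize rs)]))
        (group-by :gender sample-records)))

sample-result
;; => {"F" {:count 2, :limit 4002, :mean 2001},
;;     "M" {:count 1, :limit 2500, :mean 2500}}
```

Coercing to `double` before `Math/round` avoids problems when the division yields a ratio.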

In [14]:
"✍️"
; @workUnit

;; used for debugging
;; (def fewlines lines);[(lines 1) (lines 2) (lines 3) (lines 4) (lines 5)])

;; since i commented out the fewlines thing above had to replace the use of it later on with just 'lines'

;; clojure.core has no round function (Math/round via Java interop is an alternative),
;; so round half up using int truncation; assumes x is non-negative
(defn round [x]
    (let [y (- x (int x))]
        (cond
            ;; already a whole number
            (= 0 y) x
            (>= y 0.5) (inc (int x))
            :else (int x))))

;; parse a line into a map with the customer's gender (:key) and credit limit
(defn upd [line]
    (let [parsed (make-record (parse-line line))
          gender (:gender parsed)
          limit  (:limit parsed)
          ]
        
         {
          :key gender
          :count 1
          :limit limit
          :mean limit
         }
        )
)

(def result 
    
    ;; loop/recur walks the lines by index, rebinding the accumulators f and m on each pass;
    ;; i starts at 1 to skip the header line
    (loop [f {:count 0 :limit 0 :mean 0}
           m {:count 0 :limit 0 :mean 0}
           c (count lines)
           i 1]
        
        (cond
            
            ;; if we get to the end of the list finish off the hashmap
            (= i c) {  
                     "F" {
                          :count (:count f)
                          :limit (round (:limit f))
                          :mean  (round (/ (:limit f) (:count f)))
                          }

                     "M" {
                          :count (:count m)
                          :limit (round (:limit m))
                          :mean  (round (/ (:limit m) (:count m)))
                          }
                     }
            
            ;; this is pain
            :else   (let [line (upd (lines i))
                          gender (line :key)
                          count_ (line :count)
                          limit  (line :limit)
                          mean   (line :mean)
                          
                          ;; debug line
                          ;; uuu (println line)
                          ]
                        
                        ;; gender check, into ugly stuff
                        (if (= gender "F")
                            
                            ;; thread f through the updates, then into the first position of recur
                            (-> f
                                (update :count + 1)
                                (update :limit + limit)
                                (update :mean + mean)
                                (recur m c (inc i))
                            )
                            
                            ;; the updated map belongs in the second position of recur,
                            ;; so thread m separately and pass the result as an argument
                            (recur f
                                   (-> m
                                       (update :count + 1)
                                       (update :limit + limit)
                                       (update :mean + mean))
                                   c
                                   (inc i))
                        )
                    )
        )
        
    )
)

#'user/result

In [15]:
"🔒"
; @check
; @grade: 5
; @title: show data analysis result

(show result)

{"F" {:count 5358, :limit 26917811, :mean 5024},
 "M" {:count 4769, :limit 60497984, :mean 12686}}


nil