# Wine

### Introduction:

This exercise is an adaptation of the UCI Wine dataset.
Its only purpose is to practice deleting data, here with Julia's DataFrames.jl rather than pandas.

### Step 1. Import the necessary libraries

In [None]:
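# Load environment variables from a local .env file, then activate the
# project environment whose path is stored in ENV_PATH.
using DotEnv
using Pkg
DotEnv.load!()
Pkg.activate(ENV["ENV_PATH"])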

using CSV
using Dates
using Downloads
using Statistics
using DataFrames

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data). 

### Step 3. Assign it to a variable called wine

Assign the column names listed below.

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1. alcohol  
2. malic_acid  
3. alcalinity_of_ash  
4. magnesium  
5. flavanoids  
6. proanthocyanins  
7. hue 

In [2]:
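url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
file = Downloads.download(url)
# Only 7 names are given for the 14 columns in the file, so CSV.jl names the
# remaining ones Column8 through Column14. The labels are corrected in Step 5.
columns = ["alcohol", "malic_acid", "alcalinity_of_ash", "magnesium", "flavanoids", "proanthocyanins", "hue"]
wine = CSV.read(file, DataFrame, header = columns)
first(wine, 3)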



| Row | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue | Column8 | Column9 | Column10 | Column11 | Column12 | Column13 | Column14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Int64 |
| 1 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 2 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 3 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
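The extra `Column8` … `Column14` names above appear because fewer names were supplied than the file has columns. One way to sidestep that, sketched here under the assumption that CSV.jl's `header = false` option is acceptable for this exercise, is to let CSV.jl auto-name every column and rename later. The inline `sample` string is a stand-in for the downloaded file and holds only the three rows shown above.

```julia
using CSV
using DataFrames

# Stand-in for the downloaded file: the three rows shown in the output above.
sample = """
1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
"""

# header = false tells CSV.jl that the first line is data, not column names;
# the columns are then named Column1, Column2, and so on.
raw = CSV.read(IOBuffer(sample), DataFrame; header = false)

println(names(raw))
```

With this approach no name is attached to the wrong column before Step 4 removes columns by position.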


### Step 4. Delete the first, fourth, seventh, ninth, twelfth, thirteenth and fourteenth columns

In [3]:
select!(wine, Not([1, 4, 7, 9, 12, 13, 14]))
first(wine, 5)

| Row | malic_acid | alcalinity_of_ash | flavanoids | proanthocyanins | Column8 | Column10 | Column11 |
|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 |
| 1 | 14.23 | 1.71 | 15.6 | 127 | 3.06 | 2.29 | 5.64 |
| 2 | 13.2 | 1.78 | 11.2 | 100 | 2.76 | 1.28 | 4.38 |
| 3 | 13.16 | 2.36 | 18.6 | 101 | 3.24 | 2.81 | 5.68 |
| 4 | 14.37 | 1.95 | 16.8 | 113 | 3.49 | 2.18 | 7.8 |
| 5 | 13.24 | 2.59 | 21.0 | 118 | 2.69 | 1.82 | 4.32 |
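As a small illustration of the column-dropping call used in this step, the sketch below contrasts `select` (returns a new table) with `select!` (modifies the table in place). The toy DataFrame and its column names are hypothetical and exist only for this example.

```julia
using DataFrames

# Hypothetical toy table; the column names are for illustration only.
df = DataFrame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# select returns a new DataFrame and leaves df untouched.
kept = select(df, Not([1, 3]))

# select! drops the columns from df itself; Not accepts names as well as positions.
select!(df, Not([:a, :c]))

println(names(kept))
println(names(df))
```

Dropping by name is usually easier to audit than dropping by position, since a positional list silently targets different columns if the layout changes.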


### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):
1. alcohol
2. malic_acid
3. alcalinity_of_ash
4. magnesium
5. flavanoids
6. proanthocyanins
7. hue


In [15]:
rename!(wine, [:alcohol, :malic_acid, :alcalinity_of_ash, :magnesium, :flavanoids, :proanthocyanins, :hue]);
first(wine, 5)

| Row | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 |
| 1 | 10.0 | 1.71 | 15.6 | 127 | 3.06 | 2.29 | 5.64 |
| 2 | 14.06 | 2.15 | 17.6 | 121 | 2.51 | 1.25 | 5.05 |
| 3 | 14.1 | 2.16 | 18.0 | 105 | 3.32 | 2.38 | 5.75 |
| 4 | 14.12 | 1.48 | 16.8 | 95 | 2.43 | 1.57 | 5.0 |
| 5 | 13.75 | 1.73 | 16.0 | 89 | 2.76 | 1.81 | 5.6 |


### Step 6. Set the values of the first 3 rows of alcohol to missing

In [16]:
allowmissing!(wine, :alcohol)
wine[1:3, :alcohol] .= missing

3-element view(::Vector{Union{Missing, Float64}}, 1:3) with eltype Union{Missing, Float64}:
 missing
 missing
 missing
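The call to `allowmissing!` matters because a plain `Float64` column cannot hold `missing`. A minimal sketch on a hypothetical one-column table shows how the element type changes before the assignment:

```julia
using DataFrames

df = DataFrame(x = [1.5, 2.5, 3.5])   # hypothetical toy column

before = eltype(df.x)                 # element type before widening

# Widen the column so that it can store missing alongside Float64 values.
allowmissing!(df, :x)
after = eltype(df.x)

# In-place broadcast assignment now succeeds for the selected rows.
df[1:2, :x] .= missing

println(before, " -> ", after)
```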

### Step 7. Now set the values of rows 3 and 4 of magnesium to missing

In [17]:
allowmissing!(wine, :magnesium)
wine[3:4, :magnesium] .= missing

2-element view(::Vector{Union{Missing, Int64}}, 3:4) with eltype Union{Missing, Int64}:
 missing
 missing

### Step 8. Replace the missing values with 10 in alcohol and 100 in magnesium

In [18]:
replace!(wine[!, :alcohol], missing => 10)
replace!(wine[!, :magnesium], missing => 100);
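`replace!` fills the gaps but leaves the column typed as `Union{Missing, Float64}`. If a column that no longer admits `missing` is preferred, broadcasting `coalesce` is one alternative. The sketch below compares the two on hypothetical toy data and is not meant as the exercise's required solution.

```julia
using DataFrames

# Hypothetical toy data with two gaps.
df = DataFrame(x = [missing, 2.5, missing])

# replace! fills in place and keeps the Union{Missing, Float64} element type.
replace!(df.x, missing => 10.0)

df2 = DataFrame(x = [missing, 2.5, missing])

# coalesce. builds a new vector whose element type no longer includes Missing.
df2.x = coalesce.(df2.x, 10.0)

println(eltype(df.x), " vs ", eltype(df2.x))
```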

### Step 9. Count the number of missing values

In [21]:
all_total_missing = 0
for col in names(wine)
    total_missing = sum(ismissing.(wine[!, col]))
    all_total_missing += total_missing
    println("$col total missing value: $total_missing")
end

@show all_total_missing;

alcohol total missing value: 0
malic_acid total missing value: 0
alcalinity_of_ash total missing value: 0
magnesium total missing value: 0
flavanoids total missing value: 0
proanthocyanins total missing value: 0
hue total missing value: 0
all_total_missing = 0
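The same tally can be written more compactly with `count(ismissing, column)`, which avoids allocating a temporary Boolean vector. This is a sketch on a hypothetical toy table:

```julia
using DataFrames

# Hypothetical toy table with three missing entries in total.
df = DataFrame(a = [1, missing, 3],
               b = [missing, missing, 6.0],
               c = ["x", "y", "z"])

# One count per column, keyed by column name.
per_column = Dict(name => count(ismissing, df[!, name]) for name in names(df))

# Grand total across all columns.
total = sum(values(per_column))

println(per_column)
println(total)
```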


### Step 10. Create an array of 10 random integers between 1 and 10

In [22]:
using Random

random_number = rand(1:10, 10)

10-element Vector{Int64}:
  8
 10
  7
  9
  9
  7
  1
 10
  6
 10

### Step 11. Use the random numbers you generated as row indices and assign a missing value to each of those cells.

In [23]:
wine[random_number, :alcohol] .= missing
first(wine, 10)

| Row | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| | Float64? | Float64 | Float64 | Int64? | Float64 | Float64 | Float64 |
| 1 | missing | 1.71 | 15.6 | 127 | 3.06 | 2.29 | 5.64 |
| 2 | 10.0 | 2.15 | 17.6 | 121 | 2.51 | 1.25 | 5.05 |
| 3 | 10.0 | 2.16 | 18.0 | 100 | 3.32 | 2.38 | 5.75 |
| 4 | 14.12 | 1.48 | 16.8 | 100 | 2.43 | 1.57 | 5.0 |
| 5 | 13.75 | 1.73 | 16.0 | 89 | 2.76 | 1.81 | 5.6 |
| 6 | missing | 1.73 | 11.4 | 91 | 3.69 | 2.81 | 5.4 |
| 7 | missing | 1.87 | 12.0 | 102 | 3.64 | 2.96 | 7.5 |
| 8 | missing | 1.81 | 17.2 | 112 | 2.91 | 1.46 | 7.3 |
| 9 | missing | 1.92 | 20.0 | 120 | 3.14 | 1.97 | 6.2 |
| 10 | missing | 1.57 | 20.0 | 115 | 3.4 | 1.72 | 6.6 |
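Because `rand(1:10, 10)` samples with replacement, the ten draws can repeat, so fewer than ten rows end up missing. The sketch below checks this for the draws shown in Step 10. The `randperm` line is an optional alternative from the `Random` standard library for cases where distinct indices are wanted; it is an addition for illustration, not part of the exercise.

```julia
using Random

# The ten draws shown in Step 10.
random_number = [8, 10, 7, 9, 9, 7, 1, 10, 6, 10]

# Repeated draws hit the same row, so only the distinct values matter.
distinct_rows = sort(unique(random_number))
n_distinct = length(distinct_rows)

# Alternative: take the first k entries of a random permutation,
# which are guaranteed to be distinct.
k_distinct = randperm(10)[1:4]

println(distinct_rows)   # → [1, 6, 7, 8, 9, 10]
println(n_distinct)      # → 6
```

The six distinct rows match the six `missing` entries visible in the alcohol column above.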


### Step 12. How many missing values do we have?

In [24]:
for col in names(wine)
    total_miss = sum(ismissing.(wine[!, col]))
    println("$(col): $(total_miss)")
end

alcohol: 6
malic_acid: 0
alcalinity_of_ash: 0
magnesium: 0
flavanoids: 0
proanthocyanins: 0
hue: 0


### Step 13. Delete the rows that contain missing values

In [25]:
dropmissing!(wine)
first(wine, 5)

| Row | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Int64 | Float64 | Float64 | Float64 |
| 1 | 10.0 | 2.15 | 17.6 | 121 | 2.51 | 1.25 | 5.05 |
| 2 | 10.0 | 2.16 | 18.0 | 100 | 3.32 | 2.38 | 5.75 |
| 3 | 14.12 | 1.48 | 16.8 | 100 | 2.43 | 1.57 | 5.0 |
| 4 | 13.75 | 1.73 | 16.0 | 89 | 2.76 | 1.81 | 5.6 |
| 5 | 14.19 | 1.59 | 16.5 | 108 | 3.93 | 1.86 | 8.7 |
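`dropmissing!` with no column argument removes every row that has a missing value in any column. A minimal sketch on hypothetical toy data shows the non-mutating form, the single-column form, and `completecases`, which returns the underlying row mask:

```julia
using DataFrames

# Hypothetical toy table: row 1 lacks b, row 2 lacks a.
df = DataFrame(a = [1, missing, 3, 4], b = [missing, 2.0, 3.0, 4.0])

# Rows with no missing value in any column (df itself is unchanged).
all_complete = dropmissing(df)

# Only column a is checked, so row 1 survives despite its missing b.
a_complete = dropmissing(df, :a)

# Boolean mask of fully populated rows.
mask = completecases(df)

println(nrow(all_complete), " ", nrow(a_complete))
```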


### Step 14. Print only the non-missing values in alcohol

In [26]:
# Boolean mask that is true wherever alcohol holds a value.
mask = .!ismissing.(wine[!, "alcohol"])
wine[mask, :alcohol]

164-element Vector{Float64}:
 10.0
 10.0
 14.12
 13.75
 14.19
 13.64
 14.06
 12.93
 13.71
 12.85
  ⋮
 13.4
 12.2
 12.77
 14.16
 13.71
 13.4
 13.27
 13.17
 14.13
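In the notebook the mask is all `true` at this point, because Step 13 already dropped the rows with missing values. For a column that still contains gaps, `skipmissing` is a common alternative to building a mask by hand. This is a sketch on hypothetical toy values:

```julia
using DataFrames

# Hypothetical toy column that still contains gaps.
df = DataFrame(alcohol = [missing, 13.2, missing, 14.37])

# skipmissing iterates over the present values only; collect materializes them.
present = collect(skipmissing(df.alcohol))

println(present)   # → [13.2, 14.37]
```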

### BONUS: Create your own question and answer it.