# Wine

### Introduction:

This exercise is adapted from the UCI Wine dataset.
Its only purpose is to practice deleting data with pandas.

### Step 1. Import the necessary libraries

In [None]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data).

In [None]:
wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)
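With `header=None`, pandas does not treat the first line of the file as column names. It labels the columns with integers starting at 0 instead, which is why later steps refer to columns by number. Here is a minimal sketch on a tiny in-memory CSV. The three rows are made up for illustration and are not read from the wine file.

```python
import io

import pandas as pd

# Three made-up rows with no header line, standing in for wine.data
csv_text = "1,14.23,1.71\n1,13.20,1.78\n2,12.37,0.94\n"

# header=None: every line is data, and columns are labeled 0, 1, 2, ...
df = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(df.columns))  # [0, 1, 2]
print(df.shape)          # (3, 3)
```

Without `header=None`, the first data row would be used as column names, and the frame would have one row fewer.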

### Step 3. Assign it to a variable called wine

In [None]:
wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)

### Step 4. Delete the first, fourth, seventh, ninth, eleventh, thirteenth and fourteenth columns

In [None]:
wine = wine.drop(columns=[0, 3, 6, 8, 10, 12, 13])
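The step names columns by ordinal position (first, fourth, ...), but `drop(columns=...)` expects labels. Because `header=None` labels the columns 0 to 13, ordinal n corresponds to label n - 1. The sketch below derives the list from the ordinals instead of typing it by hand. It runs on a stand-in frame with the same 14-column shape, and the zeros are placeholder data.

```python
import numpy as np
import pandas as pd

# Stand-in for the headerless wine frame: 14 columns labeled 0..13
df = pd.DataFrame(np.zeros((2, 14)))

# 1-based positions named in the exercise: first, fourth, seventh, ...
ordinals = [1, 4, 7, 9, 11, 13, 14]
labels_to_drop = [n - 1 for n in ordinals]  # [0, 3, 6, 8, 10, 12, 13]

trimmed = df.drop(columns=labels_to_drop)
print(list(trimmed.columns))  # [1, 2, 4, 5, 7, 9, 11]
```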

### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1) alcohol  
2) malic_acid  
3) alcalinity_of_ash  
4) magnesium  
5) flavanoids  
6) proanthocyanins  
7) hue

In [None]:
wine.columns = ['alcohol', 'malic_acid', 'alcalinity_of_ash', 'magnesium', 'flavanoids', 'proanthocyanins', 'hue']

### Step 6. Set the values of the first 3 rows from alcohol as NaN

In [None]:
wine.loc[0:2, 'alcohol'] = np.nan
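`.loc` slices by label and includes both endpoints, so `0:2` covers rows 0, 1 and 2. With the default integer index, those are exactly the first three rows. `.iloc` slices by position and, like a Python slice, stops before the end. Here is a small comparison on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"alcohol": [14.2, 13.2, 13.2, 14.4, 13.2]})

by_label = df.copy()
by_label.loc[0:2, "alcohol"] = np.nan      # labels 0, 1, 2 (end included)

by_position = df.copy()
by_position.iloc[0:2, 0] = np.nan          # positions 0, 1 (end excluded)

print(by_label["alcohol"].isna().sum())     # 3
print(by_position["alcohol"].isna().sum())  # 2
```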

### Step 7. Now set the values of rows 3 and 4 of magnesium as NaN

In [None]:
wine.loc[[3, 4], 'magnesium'] = np.nan

In [None]:
print(wine)

     alcohol  malic_acid  alcalinity_of_ash  magnesium  flavanoids  \
0        NaN        1.71               15.6      127.0        3.06   
1        NaN        1.78               11.2      100.0        2.76   
2        NaN        2.36               18.6      101.0        3.24   
3      14.37        1.95               16.8        NaN        3.49   
4      13.24        2.59               21.0        NaN        2.69   
..       ...         ...                ...        ...         ...   
173    13.71        5.65               20.5       95.0        0.61   
174    13.40        3.91               23.0      102.0        0.75   
175    13.27        4.28               20.0      120.0        0.69   
176    13.17        2.59               20.0      120.0        0.68   
177    14.13        4.10               24.5       96.0        0.76   

     proanthocyanins   hue  
0               2.29  1.04  
1               1.28  1.05  
2               2.81  1.03  
3               2.18  0.86  
4             

### Step 8. Fill the NaN values with 10 in alcohol and 100 in magnesium

In [None]:
# Assign the filled column back; chained inplace fillna stops working under copy-on-write (pandas 3.0)
wine['alcohol'] = wine['alcohol'].fillna(10)
wine['magnesium'] = wine['magnesium'].fillna(100)
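A chained call such as `df[col].fillna(value, inplace=True)` works on an intermediate Series. pandas warns that this Series may be a copy, and in pandas 3.0 such calls no longer modify the frame. pandas' own FutureWarning for this pattern suggests two alternatives: assign the result back to the column, or pass a `{column: value}` dict to `DataFrame.fillna`. The sketch below checks that both give the same result on a small made-up frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "alcohol": [np.nan, 13.2, np.nan],
    "magnesium": [127.0, np.nan, 101.0],
})

# Option 1: assign the filled Series back to each column
a = df.copy()
a["alcohol"] = a["alcohol"].fillna(10)
a["magnesium"] = a["magnesium"].fillna(100)

# Option 2: one DataFrame-level call with a per-column fill value
b = df.fillna({"alcohol": 10, "magnesium": 100})

print(a.equals(b))                 # True
print(int(b.isna().sum().sum()))   # 0
```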


### Step 9. Count the number of missing values

In [None]:

missing_values = wine.isna().sum()
print(missing_values)


alcohol              0
malic_acid           0
alcalinity_of_ash    0
magnesium            0
flavanoids           0
proanthocyanins      0
hue                  0
dtype: int64


In [None]:
missing_values_count = wine.isnull().sum().sum()
print(f"Total number of missing values: {missing_values_count}")

Total number of missing values: 0
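`isna` and `isnull` are aliases, so the two cells above run the same check. Calling `.sum()` once gives one count per column. Calling it a second time adds those counts into a single total. Here is an example on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "alcohol": [np.nan, 13.2, np.nan],
    "hue": [1.04, np.nan, 1.03],
})

per_column = df.isna().sum()        # Series indexed by column name
total = int(df.isna().sum().sum())  # single number for the whole frame

print(per_column["alcohol"], per_column["hue"])  # 2 1
print(total)                                     # 3
print(df.isnull().equals(df.isna()))             # True
```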


### Step 10.  Create an array of 10 random numbers up until 10

In [None]:
# randint's upper bound is exclusive: this draws 10 integers from 0 to 9
random_numbers = np.random.randint(0, 10, size=10)
print(random_numbers)

[5 3 9 4 8 8 9 8 5 4]
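`np.random.randint(0, 10, size=10)` draws integers from 0 to 9 because the upper bound is exclusive. The draws also change on every run, so the NaN counts in the following steps can differ from the outputs shown here. One way to make them repeatable is a seeded NumPy `Generator`, assuming that is acceptable for the exercise. The seed 42 is arbitrary.

```python
import numpy as np

# A seeded Generator produces the same sequence on every run
rng = np.random.default_rng(42)
draws = rng.integers(0, 10, size=10)  # high is exclusive: values 0..9

print(draws.min() >= 0, draws.max() <= 9)  # True True

# Re-creating the generator with the same seed reproduces the draws
again = np.random.default_rng(42).integers(0, 10, size=10)
print(np.array_equal(draws, again))  # True
```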


### Step 11.  Use the random numbers you generated as an index and assign a NaN value to each of those cells.

In [None]:
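row_indices = random_numbers
# One random column position for each random row number
col_indices = np.random.randint(0, len(wine.columns), size=10)

# .iloc with two arrays sets every (row, column) combination, not the element-wise pairs
wine.iloc[row_indices, col_indices] = np.nan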
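When `.iloc` receives one array for rows and another for columns, it selects every combination of the two (an outer product), not the pairs `(row_indices[k], col_indices[k])`. Repeated indices hit the same cells again. That is why Step 12 reports more than 10 missing values. If the intent is one NaN per (row, column) pair, a loop with `.iat` handles it. Here is a sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((5, 3)), columns=["a", "b", "c"])
rows = np.array([0, 2, 4])
cols = np.array([0, 2, 1])

# Outer indexing: all 3 rows x 3 columns are set
outer = df.copy()
outer.iloc[rows, cols] = np.nan
print(int(outer.isna().sum().sum()))  # 9

# Pointwise: only the cells (0, 0), (2, 2) and (4, 1)
pointwise = df.copy()
for r, c in zip(rows, cols):
    pointwise.iat[r, c] = np.nan
print(int(pointwise.isna().sum().sum()))  # 3
```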

### Step 12.  How many missing values do we have?

In [None]:
missing_values_count = wine.isnull().sum().sum()
print(f"Total number of missing values: {missing_values_count}")

Total number of missing values: 30


### Step 13. Delete the rows that contain missing values

In [None]:
# Drop every row that contains at least one NaN
wine = wine.dropna()
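Called with no arguments, `dropna()` removes every row that has at least one NaN (`how='any'` along rows). Two related parameters narrow that behavior. `subset` checks only the listed columns, and `how='all'` drops a row only when every value in it is missing. Here is an example on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "alcohol": [14.2, np.nan, 13.2, np.nan],
    "hue": [1.04, 1.05, np.nan, np.nan],
})

print(len(df.dropna()))                    # 1: only row 0 is complete
print(len(df.dropna(subset=["alcohol"])))  # 2: rows 0 and 2
print(len(df.dropna(how="all")))           # 3: row 3 is entirely NaN
```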

### Step 14. Print only the non-null values in alcohol

In [None]:
print(wine['alcohol'][wine['alcohol'].notnull()])

0      10.00
1      10.00
2      10.00
6      14.39
7      14.06
       ...  
173    13.71
174    13.40
175    13.27
176    13.17
177    14.13
Name: alcohol, Length: 173, dtype: float64
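Filtering with a boolean mask from `notnull()` (alias `notna()`) keeps the original index labels, which is why the output above jumps from 2 to 6. `Series.dropna()` gives the same result more directly. Here is a quick check on a made-up Series:

```python
import numpy as np
import pandas as pd

alcohol = pd.Series([10.0, np.nan, 14.39, np.nan, 14.06], name="alcohol")

masked = alcohol[alcohol.notna()]
dropped = alcohol.dropna()

print(masked.index.tolist())   # [0, 2, 4]
print(masked.equals(dropped))  # True
```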


### Step 15.  Reset the index, so it starts with 0 again

In [None]:
wine.reset_index(drop=True, inplace=True)
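`drop=True` discards the old labels. Without it, `reset_index` keeps them as a new `index` column. Here is a sketch on a frame whose index has gaps, as it would after `dropna` (the values are made up):

```python
import pandas as pd

# Index with gaps, like the wine frame after rows were dropped
df = pd.DataFrame({"alcohol": [10.0, 14.39, 14.06]}, index=[0, 6, 7])

kept = df.reset_index()            # old labels become an 'index' column
clean = df.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns))    # ['index', 'alcohol']
print(clean.index.tolist())  # [0, 1, 2]
```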

### BONUS: Create your own question and answer it.

In [None]:
wine.describe()

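|       | alcohol | malic_acid | alcalinity_of_ash | magnesium | flavanoids | proanthocyanins | hue |
|-------|---------|------------|-------------------|-----------|------------|-----------------|-----|
| count | 173.0 | 173.0 | 173.0 | 173.0 | 173.0 | 173.0 | 173.0 |
| mean | 12.90763 | 2.350173 | 19.578613 | 99.514451 | 1.997168 | 1.580231 | 0.955988 |
| std | 0.879801 | 1.128031 | 3.325755 | 14.351731 | 0.99369 | 0.576693 | 0.23133 |
| min | 10.0 | 0.74 | 10.6 | 70.0 | 0.34 | 0.41 | 0.48 |
| 25% | 12.29 | 1.6 | 17.4 | 88.0 | 1.1 | 1.24 | 0.78 |
| 50% | 12.93 | 1.87 | 19.5 | 98.0 | 2.09 | 1.54 | 0.96 |
| 75% | 13.62 | 3.12 | 21.5 | 107.0 | 2.79 | 1.95 | 1.12 |
| max | 14.75 | 5.8 | 30.0 | 162.0 | 5.08 | 3.58 | 1.71 |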
