# Wine

### Introduction:

This exercise is an adaptation of the UCI Wine dataset.
The only purpose is to practice deleting data with pandas.

### Step 1. Import the necessary libraries

In [270]:
import pandas as pd 
import numpy as np 


### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data). 

### Step 3. Assign it to a variable called wine

In [271]:
wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data')
wine.head()

Unnamed: 0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
0,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
1,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
2,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
3,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735
4,1,14.2,1.76,2.45,15.2,112,3.27,3.39,0.34,1.97,6.75,1.05,2.85,1450
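
The raw file has no header row, so `read_csv` promotes the first record to column labels, which is why the header above reads `1, 14.23, 1.71, ...`. If you would rather keep that record as data, one option is `header=None`. The sketch below uses an in-memory copy of the first two records shown above instead of the URL; `sample`, `with_header` and `wine_no_header` are illustrative names, not part of the exercise.

```python
import io
import pandas as pd

# First two records of the file, copied from the output above
sample = (
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050\n"
)

# Default behaviour: the first record becomes the header
with_header = pd.read_csv(io.StringIO(sample))

# header=None keeps every record as data and labels the columns 0..13
wine_no_header = pd.read_csv(io.StringIO(sample), header=None)

print(with_header.shape)     # (1, 14)
print(wine_no_header.shape)  # (2, 14)
```

The rest of this exercise keeps the default call, so the outputs below have 177 rows.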


### Step 4. Delete the first, fourth, seventh, ninth, eleventh, thirteenth and fourteenth columns

In [272]:
wine.drop(wine.columns[[0,3,6,8,10,12,13]], axis=1,inplace=True)
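
The positions passed to `wine.columns[...]` are zero-based, so `0` is the first column and `3` is the fourth. A minimal sketch on a toy frame (the data and the names `df` and `trimmed` are illustrative), using the non-in-place form so the original frame stays available for comparison:

```python
import pandas as pd

# Toy frame with five columns (illustrative data)
df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], columns=list("abcde"))

# Drop the first and fourth columns by position; df itself is unchanged
trimmed = df.drop(columns=df.columns[[0, 3]])

print(list(trimmed.columns))  # ['b', 'c', 'e']
print(df.shape)               # (2, 5)
```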

### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1) alcohol  
2) malic_acid  
3) alcalinity_of_ash  
4) magnesium  
5) flavanoids  
6) proanthocyanins  
7) hue 

In [273]:
wine.columns = ['alcohol','malic_acid', 'alcalinity_of_ash', 'magnesium','flavanoids','proanthocyanins','hue']
wine

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,13.20,1.78,11.2,100,2.76,1.28,1.05
1,13.16,2.36,18.6,101,3.24,2.81,1.03
2,14.37,1.95,16.8,113,3.49,2.18,0.86
3,13.24,2.59,21.0,118,2.69,1.82,1.04
4,14.20,1.76,15.2,112,3.39,1.97,1.05
...,...,...,...,...,...,...,...
172,13.71,5.65,20.5,95,0.61,1.06,0.64
173,13.40,3.91,23.0,102,0.75,1.41,0.70
174,13.27,4.28,20.0,120,0.69,1.35,0.59
175,13.17,2.59,20.0,120,0.68,1.46,0.60


### Step 6. Set the values of the first 3 rows of alcohol to NaN

In [274]:
# .loc assigns in a single step; the slice is label-based, so 0:2 covers rows 0, 1 and 2
wine.loc[0:2, 'alcohol'] = np.nan
wine

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,,1.78,11.2,100,2.76,1.28,1.05
1,,2.36,18.6,101,3.24,2.81,1.03
2,,1.95,16.8,113,3.49,2.18,0.86
3,13.24,2.59,21.0,118,2.69,1.82,1.04
4,14.20,1.76,15.2,112,3.39,1.97,1.05
...,...,...,...,...,...,...,...
172,13.71,5.65,20.5,95,0.61,1.06,0.64
173,13.40,3.91,23.0,102,0.75,1.41,0.70
174,13.27,4.28,20.0,120,0.69,1.35,0.59
175,13.17,2.59,20.0,120,0.68,1.46,0.60
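
When selecting the first rows, keep in mind that `.loc` slices by label and includes the stop value, while `.iloc` slices by position and excludes it. A small sketch on a toy frame (the data and the names `by_label` and `by_position` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"alcohol": [13.2, 13.16, 14.37, 13.24, 14.2]})

# Label-based: the stop label 2 is included, so rows 0, 1 and 2 are set
by_label = df.copy()
by_label.loc[0:2, "alcohol"] = np.nan

# Position-based: the stop position 3 is excluded, so rows 0, 1 and 2 are set
by_position = df.copy()
by_position.iloc[0:3, by_position.columns.get_loc("alcohol")] = np.nan

print(by_label["alcohol"].isna().sum())  # 3
print(by_label.equals(by_position))      # True
```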


### Step 7. Now set the values of the third and fourth rows of magnesium to NaN

In [275]:
# Labels 2 and 3 are the third and fourth rows; .loc includes the stop label
wine.loc[2:3, 'magnesium'] = np.nan
wine

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,,1.78,11.2,100.0,2.76,1.28,1.05
1,,2.36,18.6,101.0,3.24,2.81,1.03
2,,1.95,16.8,,3.49,2.18,0.86
3,13.24,2.59,21.0,,2.69,1.82,1.04
4,14.20,1.76,15.2,112.0,3.39,1.97,1.05
...,...,...,...,...,...,...,...
172,13.71,5.65,20.5,95.0,0.61,1.06,0.64
173,13.40,3.91,23.0,102.0,0.75,1.41,0.70
174,13.27,4.28,20.0,120.0,0.69,1.35,0.59
175,13.17,2.59,20.0,120.0,0.68,1.46,0.60


### Step 8. Fill the value of NaN with the number 10 in alcohol and 100 in magnesium

In [276]:
wine.fillna({'alcohol': 10, 'magnesium': 100},inplace=True)
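
Passing a dict to `fillna` fills each listed column with its own value and leaves every other column alone. A small sketch on a toy frame (the data and the name `filled` are illustrative), written without `inplace=True` so the result is returned as a new frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "alcohol": [np.nan, 13.24],
    "magnesium": [100.0, np.nan],
    "hue": [np.nan, 1.04],
})

# Only the columns named in the dict are filled; hue keeps its NaN
filled = df.fillna({"alcohol": 10, "magnesium": 100})

print(filled.loc[0, "alcohol"])    # 10.0
print(filled.loc[1, "magnesium"])  # 100.0
print(filled["hue"].isna().sum())  # 1
```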

### Step 9. Count the number of missing values

In [277]:
wine.info()
wine.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 177 entries, 0 to 176
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   alcohol            177 non-null    float64
 1   malic_acid         177 non-null    float64
 2   alcalinity_of_ash  177 non-null    float64
 3   magnesium          177 non-null    float64
 4   flavanoids         177 non-null    float64
 5   proanthocyanins    177 non-null    float64
 6   hue                177 non-null    float64
dtypes: float64(7)
memory usage: 9.8 KB


alcohol              0
malic_acid           0
alcalinity_of_ash    0
magnesium            0
flavanoids           0
proanthocyanins      0
hue                  0
dtype: int64

### Step 10. Create an array of 10 random integers from 0 to 10

In [278]:
np.random.seed(0)
arr = np.random.randint(0,11,size=10)
arr

array([5, 0, 3, 3, 7, 9, 3, 5, 2, 4], dtype=int32)
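
NumPy also offers the newer `Generator` interface for random numbers. The sketch below is an alternative, not what this exercise uses: it draws from a different stream, so the values will not match the array above even with the same seed. The names `rng` and `arr_new` are illustrative.

```python
import numpy as np

# Seeded generator; endpoint=True makes the upper bound 10 inclusive
rng = np.random.default_rng(0)
arr_new = rng.integers(0, 10, size=10, endpoint=True)

print(arr_new.shape)  # (10,)
```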

### Step 11. Use the random numbers you generated as an index and assign a NaN value to each cell.

In [279]:
# Walk through arr in consecutive pairs and set both rows of each pair to NaN
for x, y in zip(arr[::2], arr[1::2]):
    print(x, y)
    wine.loc[[x, y]] = np.nan

5 0
3 3
7 9
3 5
2 4
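
Because the array contains repeated values, fewer than 10 distinct rows are affected. One possible loop-free variant, shown on a toy frame rather than on `wine` (the frame `df` and the name `rows` are illustrative), deduplicates the labels first:

```python
import numpy as np
import pandas as pd

arr = np.array([5, 0, 3, 3, 7, 9, 3, 5, 2, 4])  # values from Step 10
df = pd.DataFrame({"a": np.arange(12.0), "b": np.arange(12.0)})

# Distinct row labels, sorted
rows = np.unique(arr)

# Set every cell of those rows to NaN in one assignment
df.loc[rows] = np.nan

print(rows)                   # [0 2 3 4 5 7 9]
print(df.isna().sum().sum())  # 14
```

Seven distinct rows times two columns gives 14 missing cells here. The same reasoning with seven columns explains the total in the next step.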


### Step 12.  How many missing values do we have?

In [284]:
wine.isna().sum().sum()

np.int64(49)

### Step 13. Delete the rows that contain missing values

In [289]:
wine.dropna(axis=0,inplace=True)
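
By default `dropna` removes a row if any of its cells is missing. The `how="all"` option only removes rows where every cell is missing. A small sketch on a toy frame (the data and the names `any_missing` and `all_missing` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "alcohol": [np.nan, 13.2, np.nan],
    "hue": [np.nan, 1.05, 0.86],
})

# Default how="any": rows 0 and 2 each contain a NaN and are dropped
any_missing = df.dropna()

# how="all": only row 0, where every value is NaN, is dropped
all_missing = df.dropna(how="all")

print(any_missing.index.tolist())  # [1]
print(all_missing.index.tolist())  # [1, 2]
```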


### Step 14. Print only the non-null values in alcohol

In [292]:
wine['alcohol']

1      10.00
6      14.06
8      13.86
10     14.12
11     13.75
       ...  
172    13.71
173    13.40
174    13.27
175    13.17
176    14.13
Name: alcohol, Length: 170, dtype: float64
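
Since the rows with missing values were already dropped in Step 13, the whole column is non-null at this point. On a column that still contains NaN, a boolean mask from `notna()` does the filtering. A minimal sketch on a toy Series (values and the name `non_null` are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 14.06, np.nan, 13.86], name="alcohol")

# Keep only the entries where the mask is True
non_null = s[s.notna()]

print(non_null.index.tolist())  # [1, 3]
```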

### Step 15.  Reset the index, so it starts with 0 again

In [294]:
# reset_index returns a new DataFrame, so assign it back to keep the change
wine = wine.reset_index(drop=True)
wine

Unnamed: 0,alcohol,malic_acid,alcalinity_of_ash,magnesium,flavanoids,proanthocyanins,hue
0,10.00,2.36,18.6,101.0,3.24,2.81,1.03
1,14.06,2.15,17.6,121.0,2.51,1.25,1.06
2,13.86,1.35,16.0,98.0,3.15,1.85,1.01
3,14.12,1.48,16.8,95.0,2.43,1.57,1.17
4,13.75,1.73,16.0,89.0,2.76,1.81,1.15
...,...,...,...,...,...,...,...
165,13.71,5.65,20.5,95.0,0.61,1.06,0.64
166,13.40,3.91,23.0,102.0,0.75,1.41,0.70
167,13.27,4.28,20.0,120.0,0.69,1.35,0.59
168,13.17,2.59,20.0,120.0,0.68,1.46,0.60
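
`reset_index` returns a new frame and leaves the one it was called on unchanged, and `drop=True` discards the old labels instead of keeping them as a column. A small sketch on a toy frame (the data and the name `renumbered` are illustrative):

```python
import pandas as pd

# Index with gaps, as left behind by dropna
df = pd.DataFrame({"alcohol": [10.0, 14.06, 13.86]}, index=[1, 6, 8])

renumbered = df.reset_index(drop=True)

print(df.index.tolist())          # [1, 6, 8]
print(renumbered.index.tolist())  # [0, 1, 2]
print(list(renumbered.columns))   # ['alcohol']
```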


### BONUS: Create your own question and answer it.