# Wine

### Introduction:

This exercise is a adaptation from the UCI Wine dataset.
The only pupose is to practice deleting data with pandas.

### Step 1. Import the necessary libraries

In [1]:
# Step 1. Import the necessary libraries
import pandas as pd
import numpy as np

# Step 2. Import the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data'

# Step 3. Assign it to a variable called wine
wine = pd.read_csv(url, header=None)

# Step 4. Delete the first, fourth, seventh, ninth, eleventh, thirteenth and fourteenth columns
cols_to_drop = [0, 3, 6, 8, 10, 12, 13]
wine.drop(columns=cols_to_drop, inplace=True)

# Step 5. Assign the columns as:
wine.columns = [
    'alcohol',
    'malic_acid',
    'alcalinity_of_ash',
    'magnesium',
    'flavanoids',
    'proanthocyanins',
    'hue'
]

# Step 6. Set the values of the first 3 rows from alcohol as NaN
wine.loc[0:2, 'alcohol'] = np.nan

# Step 7. Set the values of rows 3 and 4 of magnesium as NaN
wine.loc[3:4, 'magnesium'] = np.nan

# Step 8. Fill NaN in alcohol with 10, and in magnesium with 100
wine['alcohol'].fillna(10, inplace=True)
wine['magnesium'].fillna(100, inplace=True)

# Step 9. Count the number of missing values
print("\nMissing values per column:\n", wine.isnull().sum())

# Step 10. Create an array of 10 random integers up to 10 (for indexing)
rand_indices = np.random.randint(0, len(wine), size=10)

# Step 11. Use random indices to set one value to NaN in each row (e.g., in 'hue')
for i in rand_indices:
    wine.loc[i, 'hue'] = np.nan

# Step 12. How many missing values now?
print("\nTotal missing values in dataset:", wine.isnull().sum().sum())

# Step 13. Delete the rows that contain missing values
wine.dropna(inplace=True)

# Step 14. Print only the non-null values in alcohol
print("\nNon-null alcohol values:\n", wine['alcohol'].dropna())

# Step 15. Reset the index so it starts with 0
wine.reset_index(drop=True, inplace=True)

# BONUS: What is the average flavanoids content after cleaning?
avg_flavanoids = wine['flavanoids'].mean()
print(f"\nBONUS: Average Flavanoids = {avg_flavanoids:.2f}")



Missing values per column:
 alcohol              0
malic_acid           0
alcalinity_of_ash    0
magnesium            0
flavanoids           0
proanthocyanins      0
hue                  0
dtype: int64

Total missing values in dataset: 10

Non-null alcohol values:
 0      10.00
1      10.00
2      10.00
3      14.37
4      13.24
       ...  
172    14.16
173    13.71
175    13.27
176    13.17
177    14.13
Name: alcohol, Length: 168, dtype: float64

BONUS: Average Flavanoids = 2.03


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  wine['alcohol'].fillna(10, inplace=True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  wine['magnesium'].fillna(100, inplace=True)


### Step 2. Import the dataset from this [address](https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data). 

### Step 3. Assign it to a variable called wine

### Step 4. Delete the first, fourth, seventh, nineth, eleventh, thirteenth and fourteenth columns

### Step 5. Assign the columns as below:

The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):  
1) alcohol  
2) malic_acid  
3) alcalinity_of_ash  
4) magnesium  
5) flavanoids  
6) proanthocyanins  
7) hue 

### Step 6. Set the values of the first 3 rows from alcohol as NaN

### Step 7. Now set the value of the rows 3 and 4 of magnesium as NaN

### Step 8. Fill the value of NaN with the number 10 in alcohol and 100 in magnesium

### Step 9. Count the number of missing values

### Step 10.  Create an array of 10 random numbers up until 10

### Step 11.  Use random numbers you generated as an index and assign NaN value to each of cell.

### Step 12.  How many missing values do we have?

### Step 13. Delete the rows that contain missing values

### Step 14. Print only the non-null values in alcohol

### Step 15.  Reset the index, so it starts with 0 again

### BONUS: Create your own question and answer it.