<a href="https://colab.research.google.com/github/PashaIanko/Pandas-Micro-Learning-Course/blob/main/1_deleting_columns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Deleting columns in pandas**


**1. Import the pandas library**

In [23]:
import pandas as pd

**2. Create the dataset**

In [24]:
dataset = pd.DataFrame(
    {
        'John': {
            'favorite_fruit': 'apple',
            'favorite_vegetable': 'cucumber',
            'height_cm': 175
        },
        'Anna': {
            'favorite_fruit': 'orange',
            'favorite_vegetable': 'onion',
            'height_cm': 165.5
        },
        'Lorenzo': {
            'favorite_fruit': 'banana',
            'favorite_vegetable': 'tomato',
            'height_cm': 180.2
        }
    }
).T
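
Building the frame from a dict of dicts and then transposing it with `.T` means each intermediate column (one per person) mixes strings and numbers. As a result, every column of the transposed frame keeps the generic `object` dtype, including `height_cm`. A minimal sketch of an alternative, assuming you want `height_cm` to stay numeric, is `pd.DataFrame.from_dict(..., orient='index')`, which treats the outer keys as row labels so each column's dtype is inferred on its own. The name `people` is introduced here only for illustration.

```python
import pandas as pd

# Hypothetical name for the same data as above
people = {
    'John': {'favorite_fruit': 'apple', 'favorite_vegetable': 'cucumber', 'height_cm': 175},
    'Anna': {'favorite_fruit': 'orange', 'favorite_vegetable': 'onion', 'height_cm': 165.5},
    'Lorenzo': {'favorite_fruit': 'banana', 'favorite_vegetable': 'tomato', 'height_cm': 180.2},
}

# Same construction as the notebook: every column ends up as object dtype
transposed = pd.DataFrame(people).T

# orient='index': outer keys become row labels, inner keys become columns,
# and each column's dtype is inferred separately
from_rows = pd.DataFrame.from_dict(people, orient='index')

print(transposed.dtypes)
print(from_rows.dtypes)
```

Both frames hold the same values; the difference only matters once you compute with `height_cm` (for example, `from_rows['height_cm'].mean()`).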

**3. Look at the dataset**

In [25]:
dataset.head()

|         | favorite_fruit | favorite_vegetable | height_cm |
|---------|----------------|--------------------|-----------|
| John    | apple          | cucumber           | 175.0     |
| Anna    | orange         | onion              | 165.5     |
| Lorenzo | banana         | tomato             | 180.2     |


**4. Delete one column and look at the result**

In [26]:
dataset_no_vegetable = dataset.drop(
    ['favorite_vegetable'],
    axis='columns'
)

In [28]:
dataset_no_vegetable.head()

|         | favorite_fruit | height_cm |
|---------|----------------|-----------|
| John    | apple          | 175.0     |
| Anna    | orange         | 165.5     |
| Lorenzo | banana         | 180.2     |
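
The same column can be dropped in a few equivalent ways. A minimal sketch, rebuilding a small copy of the dataset from a plain dict of lists (an assumption made only so the snippet runs on its own): `axis='columns'` and `axis=1` are interchangeable, and the `columns=` keyword avoids the `axis` argument altogether and also accepts a single label instead of a list.

```python
import pandas as pd

# Stand-in copy of the notebook's dataset, built directly row-by-column
dataset = pd.DataFrame(
    {
        'favorite_fruit': ['apple', 'orange', 'banana'],
        'favorite_vegetable': ['cucumber', 'onion', 'tomato'],
        'height_cm': [175, 165.5, 180.2],
    },
    index=['John', 'Anna', 'Lorenzo'],
)

# Three equivalent calls; each returns a new DataFrame and leaves `dataset` as is
by_axis_name = dataset.drop(['favorite_vegetable'], axis='columns')
by_axis_number = dataset.drop(['favorite_vegetable'], axis=1)
by_keyword = dataset.drop(columns='favorite_vegetable')  # single label, no list needed

print(by_keyword.columns.tolist())  # ['favorite_fruit', 'height_cm']
```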


**5. Delete multiple columns and look at the result**

In [29]:
dataset_one_column = dataset.drop(
    ['favorite_fruit', 'height_cm'],
    axis='columns'
)

In [30]:
dataset_one_column.head()

|         | favorite_vegetable |
|---------|--------------------|
| John    | cucumber           |
| Anna    | onion              |
| Lorenzo | tomato             |
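
When several labels are passed, `drop` raises a `KeyError` if any one of them is missing, and nothing is dropped. A minimal sketch, assuming a hypothetical `weight_kg` column that is not in the dataset: `errors='ignore'` drops the labels that exist and skips the rest.

```python
import pandas as pd

# Stand-in copy of the notebook's dataset
dataset = pd.DataFrame(
    {
        'favorite_fruit': ['apple', 'orange', 'banana'],
        'favorite_vegetable': ['cucumber', 'onion', 'tomato'],
        'height_cm': [175, 165.5, 180.2],
    },
    index=['John', 'Anna', 'Lorenzo'],
)

# 'weight_kg' is hypothetical: it does not exist in the dataset
to_delete = ['favorite_fruit', 'height_cm', 'weight_kg']

raised = False
try:
    # Default errors='raise': one missing label aborts the whole call
    dataset.drop(columns=to_delete)
except KeyError as err:
    raised = True
    print(f'KeyError: {err}')

# errors='ignore': drop what exists, silently skip missing labels
dataset_one_column = dataset.drop(columns=to_delete, errors='ignore')
print(dataset_one_column.columns.tolist())  # ['favorite_vegetable']
```

Keeping the default is usually safer, since a typo in a column name then fails loudly instead of being ignored.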


# Extra knowledge

#### With `inplace=False` (the default), `drop` returns a new DataFrame and the original is **not** modified

In [31]:
print(f'Columns before deletion: {dataset.columns.tolist()}')
# drop returns a new DataFrame here, which is discarded; `dataset` itself is unchanged
dataset.drop(['height_cm'], axis='columns', inplace=False)
print(f'Columns after deletion: {dataset.columns.tolist()}')

Columns before deletion: ['favorite_fruit', 'favorite_vegetable', 'height_cm']
Columns after deletion: ['favorite_fruit', 'favorite_vegetable', 'height_cm']


#### With `inplace=True`, the columns **are** deleted from the original DataFrame

In [32]:
print(f'Columns before deletion: {dataset.columns.tolist()}')
# drop modifies `dataset` directly (the call itself returns None)
dataset.drop(['height_cm'], axis='columns', inplace=True)
print(f'Columns after deletion: {dataset.columns.tolist()}')

Columns before deletion: ['favorite_fruit', 'favorite_vegetable', 'height_cm']
Columns after deletion: ['favorite_fruit', 'favorite_vegetable']
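
Note that `inplace=True` also changes what `drop` returns: the DataFrame is modified and the call returns `None`, so writing `df = df.drop(..., inplace=True)` silently replaces `df` with `None`. A minimal sketch of the pitfall and of the reassignment style that avoids it, using a stand-in copy of the dataset (the name `scratch` is hypothetical):

```python
import pandas as pd

# Stand-in copy of the notebook's dataset
dataset = pd.DataFrame(
    {
        'favorite_fruit': ['apple', 'orange', 'banana'],
        'favorite_vegetable': ['cucumber', 'onion', 'tomato'],
        'height_cm': [175, 165.5, 180.2],
    },
    index=['John', 'Anna', 'Lorenzo'],
)

# inplace=True mutates `scratch` and returns None
scratch = dataset.copy()
result = scratch.drop(columns=['height_cm'], inplace=True)
print(result)  # None
print(scratch.columns.tolist())  # ['favorite_fruit', 'favorite_vegetable']

# Alternative: keep inplace=False (the default) and reassign the result
dataset = dataset.drop(columns=['height_cm'])
print(dataset.columns.tolist())  # ['favorite_fruit', 'favorite_vegetable']
```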


# Exercises

## Import libraries

In [33]:
import pandas as pd
import seaborn as sns

## Load the iris dataset

In [34]:
iris_dataset = sns.load_dataset('iris')
iris_dataset.head()

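|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1          | 3.5         | 1.4          | 0.2         | setosa  |
| 1 | 4.9          | 3.0         | 1.4          | 0.2         | setosa  |
| 2 | 4.7          | 3.2         | 1.3          | 0.2         | setosa  |
| 3 | 4.6          | 3.1         | 1.5          | 0.2         | setosa  |
| 4 | 5.0          | 3.6         | 1.4          | 0.2         | setosa  |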


## Exercise 1

- Fill in the body of `delete_sepal_length(df)`. The function must return a new dataset without the `sepal_length` column.

In [35]:
def delete_sepal_length(df):
    # Return a dataset that does not
    # contain the 'sepal_length' column
    # return ...
    pass

In [36]:
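def test_solution(df, expect_columns):
    # The function under test must return a DataFrame (not None)
    if not isinstance(df, pd.DataFrame):
        print(
            f'''
            Mistake: your function returned: {df},
            but it is supposed to return a dataset with
            {expect_columns} columns
            '''
        )
    # Compare column names as plain lists, so a different number
    # of columns is reported as a mistake instead of raising an error
    elif df.columns.tolist() != list(expect_columns):
        print(
            f'''
            Mistake!
            Expect: {expect_columns}
            Your columns: {df.columns.tolist()}
            '''
        )
    else:
        print('WELL DONE! CORRECT!')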

In [37]:
df_new = delete_sepal_length(iris_dataset)
test_solution(df_new, ['sepal_width', 'petal_length', 'petal_width', 'species'])


            Mistake: your function returned: None,
            but it is supposed to return a dataset with
            ['sepal_width', 'petal_length', 'petal_width', 'species'] columns
            


## Solution to exercise 1

In [38]:
def delete_sepal_length(df):
    return df.drop(['sepal_length'], axis='columns')

df_new = delete_sepal_length(iris_dataset)
test_solution(df_new, ['sepal_width', 'petal_length', 'petal_width', 'species'])

WELL DONE! CORRECT!
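
`drop` fits this exercise because it returns a new dataset and leaves the input untouched. pandas also offers `DataFrame.pop` and the `del` statement, but both remove the column from the DataFrame itself, so they only satisfy "return a new dataset" when applied to a copy. A minimal sketch, assuming a tiny stand-in for the iris data so the snippet runs offline:

```python
import pandas as pd

# Tiny stand-in for the iris data (assumption: only a few columns and rows)
df = pd.DataFrame(
    {
        'sepal_length': [5.1, 4.9],
        'sepal_width': [3.5, 3.0],
        'species': ['setosa', 'setosa'],
    }
)

# drop: returns a new DataFrame, df is untouched
without_length = df.drop(columns='sepal_length')

# pop: removes the column in place and returns it as a Series
working_copy = df.copy()
sepal_length = working_copy.pop('sepal_length')

# del: removes the column in place and returns nothing
also_copy = df.copy()
del also_copy['sepal_length']

print(df.columns.tolist())              # ['sepal_length', 'sepal_width', 'species']
print(without_length.columns.tolist())  # ['sepal_width', 'species']
print(sepal_length.tolist())            # [5.1, 4.9]
```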


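## Exercise 2

- Fill in the body of `delete_all_except_species(df)`. The function must return a new dataset that contains only the `species` column.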

In [39]:
def delete_all_except_species(df):
    # Return a dataset that contains
    # only one column: 'species'
    # return ...
    pass

In [40]:
df_only_species = delete_all_except_species(iris_dataset)
test_solution(df_only_species, ['species'])


            Mistake: your function returned: None,
            but it is supposed to return a dataset with
            ['species'] columns
            


## Solution to exercise 2

In [41]:
def delete_all_except_species(df):
    return df.drop(
        # delete every column whose name is not 'species'
        [c for c in df.columns.tolist() if c != 'species'],
        axis='columns'
    )

In [42]:
test_solution(
    delete_all_except_species(iris_dataset), ['species']
)

WELL DONE! CORRECT!
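
The list comprehension in the solution can also be written with `Index.difference`, which returns the labels of `df.columns` that are not in the given list. Alternatively, you can keep the column by selection instead of dropping the others: double brackets return a DataFrame rather than a Series. A minimal sketch, assuming a tiny stand-in for the iris data:

```python
import pandas as pd

# Tiny stand-in for the iris data (assumption: only a few columns and rows)
df = pd.DataFrame(
    {
        'sepal_length': [5.1, 4.9],
        'sepal_width': [3.5, 3.0],
        'species': ['setosa', 'setosa'],
    }
)

# Drop every column except 'species'
only_species_drop = df.drop(columns=df.columns.difference(['species']))

# Keep 'species' by selection; [['species']] gives a one-column DataFrame
only_species_select = df[['species']]

print(only_species_drop.columns.tolist())  # ['species']
```

Both give the same one-column DataFrame; selection is usually shorter when you know which columns to keep, while `drop` is clearer when you know which ones to remove.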
