<a href="https://colab.research.google.com/github/PashaIanko/Pandas-Micro-Learning-Course/blob/main/1_deleting_columns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Deleting columns in pandas**


## All in one

**1. Import the pandas library and create the dataset**

In [119]:
import pandas as pd

In [120]:
dataset = pd.DataFrame(
    {
        'John': {
            'favorite_fruit': 'apple',
            'favorite_vegetable': 'cucumber',
            'height_cm': 175
        },
        'Anna': {
            'favorite_fruit': 'orange',
            'favorite_vegetable': 'onion',
            'height_cm': 165.5
        },
        'Lorenzo': {
            'favorite_fruit': 'banana',
            'favorite_vegetable': 'tomato',
            'height_cm': 180.2
        }
    }
).T

**2. Look at the dataset**

In [121]:
dataset.head()

Unnamed: 0,favorite_fruit,favorite_vegetable,height_cm
John,apple,cucumber,175.0
Anna,orange,onion,165.5
Lorenzo,banana,tomato,180.2


**3. Delete one column and look at the result**

In [122]:
dataset_no_vegetable = dataset.drop(
    ['favorite_vegetable'],
    axis='columns'
)

In [123]:
dataset_no_vegetable.head()

Unnamed: 0,favorite_fruit,height_cm
John,apple,175.0
Anna,orange,165.5
Lorenzo,banana,180.2


**4. Delete multiple columns and look at the result**

In [124]:
dataset_one_column = dataset.drop(
    ['favorite_fruit', 'height_cm'],
    axis='columns'
)

In [125]:
dataset_one_column.head()

Unnamed: 0,favorite_vegetable
John,cucumber
Anna,onion
Lorenzo,tomato


## Step by step

### Import libraries

In [92]:
import pandas as pd
import seaborn as sns

### Create the dataset

In [93]:
dataset = pd.DataFrame(
    {
        'John': {
            'favorite_fruit': 'apple',
            'favorite_vegetable': 'cucumber',
            'height_cm': 175
        },
        'Anna': {
            'favorite_fruit': 'orange',
            'favorite_vegetable': 'onion',
            'height_cm': 165.5
        },
        'Lorenzo': {
            'favorite_fruit': 'banana',
            'favorite_vegetable': 'tomato',
            'height_cm': 180.2
        }
    }
).T

In [94]:
dataset.head()

Unnamed: 0,favorite_fruit,favorite_vegetable,height_cm
John,apple,cucumber,175.0
Anna,orange,onion,165.5
Lorenzo,banana,tomato,180.2


### Delete the column

In [95]:
dataset_no_fruit = dataset.drop(
    ['favorite_fruit'],
    axis='columns'
)

dataset_no_fruit.head()

Unnamed: 0,favorite_vegetable,height_cm
John,cucumber,175.0
Anna,onion,165.5
Lorenzo,tomato,180.2
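
The same deletion can also be written with the `columns=` keyword of `drop`, which avoids passing `axis` altogether. The sketch below uses a hypothetical two-row miniature of the dataset (the name `people` is not part of the tutorial) and checks that both spellings give the same result.

```python
import pandas as pd

# Hypothetical miniature of the tutorial's dataset
people = pd.DataFrame(
    {
        'favorite_fruit': ['apple', 'orange'],
        'favorite_vegetable': ['cucumber', 'onion'],
        'height_cm': [175.0, 165.5],
    },
    index=['John', 'Anna'],
)

# Two equivalent spellings of the same deletion
by_axis = people.drop(['favorite_fruit'], axis='columns')
by_keyword = people.drop(columns=['favorite_fruit'])

print(by_axis.equals(by_keyword))   # True
print(by_keyword.columns.tolist())  # ['favorite_vegetable', 'height_cm']
```

Which spelling to use is mostly a matter of taste; `columns=` makes it obvious at a glance that columns, not rows, are being removed.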


### Delete multiple columns

In [96]:
dataset_only_height = dataset.drop(
    ['favorite_fruit', 'favorite_vegetable'],
    axis='columns'
)
dataset_only_height.head()

Unnamed: 0,height_cm
John,175.0
Anna,165.5
Lorenzo,180.2
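
When a list of columns is passed, `drop` raises a `KeyError` if any of the names does not exist. A minimal sketch of how the `errors='ignore'` option relaxes this, assuming a hypothetical one-row dataset and a made-up column name `shoe_size`:

```python
import pandas as pd

# Hypothetical one-row dataset; 'shoe_size' is deliberately absent
people = pd.DataFrame(
    {'favorite_fruit': ['apple'], 'height_cm': [175.0]},
    index=['John'],
)

# By default, a missing column name raises KeyError
try:
    people.drop(['shoe_size'], axis='columns')
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# With errors='ignore', existing columns are dropped and missing ones are skipped
trimmed = people.drop(
    ['favorite_fruit', 'shoe_size'],
    axis='columns',
    errors='ignore'
)
print(trimmed.columns.tolist())  # ['height_cm']
```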


### Delete columns in place

#### When `inplace=False` (the default) -> the original dataset **will not** change; `drop` returns a new dataset

In [97]:
print(f'Columns before deletion: {dataset.columns.tolist()}')
dataset.drop(['height_cm'], axis='columns', inplace=False)
print(f'Columns after deletion: {dataset.columns.tolist()}')

Columns before deletion: ['favorite_fruit', 'favorite_vegetable', 'height_cm']
Columns after deletion: ['favorite_fruit', 'favorite_vegetable', 'height_cm']
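
The columns look untouched above because the result of `drop` was thrown away. A small sketch, using a hypothetical miniature dataset named `people`, of capturing the returned dataset instead:

```python
import pandas as pd

# Hypothetical miniature of the tutorial's dataset
people = pd.DataFrame(
    {'favorite_fruit': ['apple', 'orange'], 'height_cm': [175.0, 165.5]},
    index=['John', 'Anna'],
)

# inplace=False is the default, so the result must be stored in a variable
without_height = people.drop(['height_cm'], axis='columns')

print(people.columns.tolist())          # ['favorite_fruit', 'height_cm']
print(without_height.columns.tolist())  # ['favorite_fruit']
```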


#### When `inplace=True` -> the columns **will** be deleted from the original dataset!

In [98]:
print(f'Columns before deletion: {dataset.columns.tolist()}')
dataset.drop(['height_cm'], axis='columns', inplace=True)
print(f'Columns after deletion: {dataset.columns.tolist()}')

Columns before deletion: ['favorite_fruit', 'favorite_vegetable', 'height_cm']
Columns after deletion: ['favorite_fruit', 'favorite_vegetable']
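
One pitfall worth knowing: with `inplace=True`, `drop` modifies the dataset and returns `None`, so assigning its result back to a variable loses the data. A minimal sketch with a hypothetical dataset named `people`:

```python
import pandas as pd

# Hypothetical miniature of the tutorial's dataset
people = pd.DataFrame(
    {'favorite_fruit': ['apple', 'orange'], 'height_cm': [175.0, 165.5]},
    index=['John', 'Anna'],
)

# The dataset itself changes; the call returns None
result = people.drop(['height_cm'], axis='columns', inplace=True)

print(result)                   # None
print(people.columns.tolist())  # ['favorite_fruit']
```

So `people = people.drop(..., inplace=True)` would replace the dataset with `None`; use either the assignment or `inplace=True`, not both.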


## Exercises

In [39]:
iris_dataset = sns.load_dataset('iris')
iris_dataset.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


### Exercise 1

- Complete the function `delete_sepal_length()` so that it returns a new dataset without the `sepal_length` column.

In [84]:
def delete_sepal_length(df):
    # RETURN HERE A DATASET
    # return ...
    pass

In [89]:
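def test_solution(df):
    expect_columns = ['sepal_width', 'petal_length', 'petal_width', 'species']
    try:
        # Compare plain lists, so a wrong number of columns is reported
        # as a mismatch instead of raising a length error
        actual_columns = df.columns.tolist()
    except AttributeError:
        # df has no columns, e.g. the function returned None
        print(
            f'''
            Mistake: your function returned: {df},
            but it is supposed to return a dataset without
            'sepal_length' column
            '''
        )
        return
    if actual_columns != expect_columns:
        print(
            f'''
            Mistake!
            Expect: {expect_columns}
            Your columns: {actual_columns}
            '''
        )
    else:
        print('WELL DONE! CORRECT!')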

In [90]:
df_new = delete_sepal_length(iris_dataset)
test_solution(df_new)


            Mistake: your function returned: None,
            but it is supposed to return a dataset without
            'sepal_length' column
            


### Solution to the exercise

In [91]:
def delete_sepal_length(df):
    return df.drop(['sepal_length'], axis='columns')

df_new = delete_sepal_length(iris_dataset)
test_solution(df_new)

WELL DONE! CORRECT!
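
The solution relies on `drop` returning a new dataset, so the dataset passed in stays intact. A sketch that checks this without downloading iris, assuming a hypothetical two-row stand-in (`iris_sample`) built from the first rows of the preview above:

```python
import pandas as pd

def delete_sepal_length(df):
    return df.drop(['sepal_length'], axis='columns')

# Hypothetical two-row stand-in for the iris dataset
iris_sample = pd.DataFrame(
    {
        'sepal_length': [5.1, 4.9],
        'sepal_width': [3.5, 3.0],
        'petal_length': [1.4, 1.4],
        'petal_width': [0.2, 0.2],
        'species': ['setosa', 'setosa'],
    }
)

trimmed = delete_sepal_length(iris_sample)

print(trimmed.columns.tolist())
# ['sepal_width', 'petal_length', 'petal_width', 'species']
print('sepal_length' in iris_sample.columns)  # True: the input is unchanged
```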
