# **Introduction**

In this activity you will practice using Pandas functionality to check for and remove any unwanted data from a dataset. This activity will cover the following topics:

- Removing columns from a DataFrame
- Removing rows from a DataFrame
- Removing rows based on a condition
- Checking for duplicate data

*Create a DataFrame called df from the given CSV file exotic_plants_data.csv, then drop the column Type and assign the result to a new DataFrame called df_no_type.*

```python
import pandas as pd

# Load the dataset into a DataFrame
df = pd.read_csv('exotic_plants_data.csv')

# Preview the first five rows
df.head()
```

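| Unnamed: 0 | Plant Name | Type | Origin | Height (cm) |
| --- | --- | --- | --- | --- |
| 0 | Orchid | Ornamental | Tropical | 30 |
| 1 | Fern | Ground Cover | Tropical | 40 |
| 2 | Bamboo | Grass | Asia | 600 |
| 3 | Cactus | Succulent | America | 60 |
| 4 | Bird of Paradise | Ornamental | Africa | 150 |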

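A column named `Unnamed: 0` usually means the CSV's first header cell was blank, which is what `DataFrame.to_csv()` writes when the index is saved with the data (the default). Assuming that is how `exotic_plants_data.csv` was produced (not confirmed here), the sketch below uses a small hypothetical CSV string to show how `index_col=0` turns that column back into the row index instead of a regular column.

```python
import io

import pandas as pd

# Hypothetical CSV text: the blank first header cell mimics a file saved
# with DataFrame.to_csv(), which writes the index as an unnamed column.
csv_text = """,Plant Name,Type,Origin,Height (cm)
0,Orchid,Ornamental,Tropical,30
1,Fern,Ground Cover,Tropical,40
2,Bamboo,Grass,Asia,600
"""

# Default read: the unnamed first column becomes a regular column "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())
# ['Unnamed: 0', 'Plant Name', 'Type', 'Origin', 'Height (cm)']

# index_col=0 uses the first column as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
# ['Plant Name', 'Type', 'Origin', 'Height (cm)']
```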

```python
# Remove the column named Type; drop() returns a new DataFrame
df_no_type = df.drop(columns='Type')
df_no_type
```

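| Unnamed: 0 | Plant Name | Origin | Height (cm) |
| --- | --- | --- | --- |
| 0 | Orchid | Tropical | 30 |
| 1 | Fern | Tropical | 40 |
| 2 | Bamboo | Asia | 600 |
| 3 | Cactus | America | 60 |
| 4 | Bird of Paradise | Africa | 150 |
| ... | ... | ... | ... |
| 71 | Ficus | Asia | 200 |
| 72 | Columbine | North America | 30 |
| 73 | Jasmine | Asia | 90 |
| 74 | Fuchsia | Central and South America | 40 |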

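`drop(columns=...)` is one of two equivalent spellings for removing columns. The sketch below uses a small hypothetical DataFrame (standing in for the real dataset) to show that `drop(columns=...)` and `drop(..., axis=1)` give the same result, that the original DataFrame is left untouched, and that `errors="ignore"` skips labels that are not present instead of raising a `KeyError`.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for exotic_plants_data.csv.
plants = pd.DataFrame({
    "Plant Name": ["Orchid", "Fern", "Bamboo"],
    "Type": ["Ornamental", "Ground Cover", "Grass"],
    "Origin": ["Tropical", "Tropical", "Asia"],
})

# Two equivalent ways to drop a column; both return a new DataFrame.
by_keyword = plants.drop(columns="Type")
by_axis = plants.drop("Type", axis=1)

# The original keeps all of its columns because inplace defaults to False.
print(plants.columns.tolist())      # ['Plant Name', 'Type', 'Origin']
print(by_keyword.columns.tolist())  # ['Plant Name', 'Origin']

# errors="ignore" silently skips "Not A Column" instead of raising KeyError.
safe = plants.drop(columns=["Type", "Not A Column"], errors="ignore")
```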

*Remove rows at index 57 and 61 from the df DataFrame and assign the result to a new DataFrame called df_dropped_indices.*

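```python
# Drop the rows whose index labels are 57 and 61
df_dropped_indices = df.drop(index=[57, 61])
# Show enough rows to confirm that labels 57 and 61 are gone
df_dropped_indices.head(62)
```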

| Unnamed: 0 | Plant Name | Type | Origin | Height (cm) |
| --- | --- | --- | --- | --- |
| 0 | Orchid | Ornamental | Tropical | 30 |
| 1 | Fern | Ground Cover | Tropical | 40 |
| 2 | Bamboo | Grass | Asia | 600 |
| 3 | Cactus | Succulent | America | 60 |
| 4 | Bird of Paradise | Ornamental | Africa | 150 |
| ... | ... | ... | ... | ... |
| 58 | Maple | Tree | North America | 250 |
| 59 | Orchid | Flower | Asia | 50 |
| 60 | Cactus | Succulent | Americas | 30 |
| 62 | Bamboo | Grass | Asia | 900 |

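`drop(index=[57, 61])` matches index *labels*, not row positions. In `df` the labels happen to equal the positions, but that stops being true once rows have been removed. The sketch below uses a hypothetical DataFrame with labels 10 to 13 to contrast dropping by label with dropping by position, where positions are first translated to labels through `plants.index`.

```python
import pandas as pd

# Hypothetical frame whose index labels do not match row positions,
# as happens after earlier rows have been filtered out.
plants = pd.DataFrame(
    {"Plant Name": ["Orchid", "Fern", "Bamboo", "Cactus"]},
    index=[10, 11, 12, 13],
)

# drop(index=...) looks up labels, so this removes Fern and Cactus.
by_label = plants.drop(index=[11, 13])

# To drop by position, translate positions 0 and 2 into their labels first.
by_position = plants.drop(index=plants.index[[0, 2]])

print(by_label.index.tolist())     # [10, 12]
print(by_position.index.tolist())  # [11, 13]
```

Passing a position such as `1` directly to `drop(index=...)` here would raise a `KeyError`, because no row carries the label `1`.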

*Remove rows where the Origin column is equal to Africa from the df DataFrame and store the result in a new DataFrame called df_no_african_plants.*

```python
# Keep only the rows whose Origin is not 'Africa'
df_no_african_plants = df[df['Origin'] != 'Africa']
df_no_african_plants
```

| Unnamed: 0 | Plant Name | Type | Origin | Height (cm) |
| --- | --- | --- | --- | --- |
| 0 | Orchid | Ornamental | Tropical | 30 |
| 1 | Fern | Ground Cover | Tropical | 40 |
| 2 | Bamboo | Grass | Asia | 600 |
| 3 | Cactus | Succulent | America | 60 |
| 5 | Banana Tree | Tree | Tropical | 300 |
| ... | ... | ... | ... | ... |
| 71 | Ficus | Tree | Asia | 200 |
| 72 | Columbine | Flower | North America | 30 |
| 73 | Jasmine | Shrub | Asia | 90 |
| 74 | Fuchsia | Flower | Central and South America | 40 |

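The comparison `df['Origin'] != 'Africa'` produces a boolean Series, and indexing `df` with it keeps only the rows marked `True` while preserving their original index labels. That is why label 4 (Bird of Paradise, from Africa) is missing from the output. The sketch below uses a hypothetical four-row DataFrame to show the same filter, plus two equivalent alternatives: `DataFrame.query()` with a string expression, and `Series.isin()` combined with `~` to exclude several values at once.

```python
import pandas as pd

# Hypothetical sample rows; only the Origin column matters for the filter.
plants = pd.DataFrame({
    "Plant Name": ["Orchid", "Bird of Paradise", "Bamboo", "Aloe Vera"],
    "Origin": ["Tropical", "Africa", "Asia", "Africa"],
})

# Boolean mask: True where Origin is anything other than "Africa".
mask = plants["Origin"] != "Africa"
filtered = plants[mask]
print(filtered.index.tolist())  # [0, 2]; original labels are kept

# Same result expressed as a query string.
via_query = plants.query("Origin != 'Africa'")

# isin() with ~ excludes every value in the list (here Africa and Asia).
via_isin = plants[~plants["Origin"].isin(["Africa", "Asia"])]
```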

*Check the df DataFrame for any duplicate rows and assign the result to a new DataFrame called df_duplicates.*

```python
# df.duplicated() flags each row that repeats an earlier identical row
df_duplicates = df[df.duplicated()]
df_duplicates
```

| Unnamed: 0 | Plant Name | Type | Origin | Height (cm) |
| --- | --- | --- | --- | --- |
| 6 | Cactus | Succulent | America | 60 |
| 30 | Rafflesia | Flower | Southeast Asia | 20 |
| 47 | Kangaroo Paw | Flower | Australia | 60 |
| 48 | Bougainvillea | Shrub | South America | 400 |
| 49 | Bird of Paradise | Ornamental | Africa | 150 |
| 50 | Venus Flytrap | Carnivorous | North America | 15 |
| 51 | Rose | Flower | Asia | 60 |

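`df.duplicated()` uses `keep="first"` by default, so the first copy of each repeated row is left unflagged and only later copies are returned. That is why label 6 (Cactus) appears in `df_duplicates` while its earlier twin at label 3 does not. The sketch below uses a hypothetical DataFrame with three identical Cactus rows to compare the three `keep` options.

```python
import pandas as pd

# Hypothetical data: rows 0, 2 and 3 are identical.
plants = pd.DataFrame({
    "Plant Name": ["Cactus", "Fern", "Cactus", "Cactus"],
    "Origin": ["America", "Tropical", "America", "America"],
})

# keep="first" (default): later copies are flagged, the first is not.
keep_first = plants.duplicated().tolist()
print(keep_first)  # [False, False, True, True]

# keep="last": every copy except the last one is flagged.
keep_last = plants.duplicated(keep="last").tolist()
print(keep_last)   # [True, False, True, False]

# keep=False: every row that has a copy anywhere is flagged.
keep_none = plants.duplicated(keep=False).tolist()
print(keep_none)   # [True, False, True, True]
```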

*Check the df DataFrame for any duplicate rows based on the Plant Name and Type columns and assign the result to a new DataFrame called df_plant_type_duplicates.*

```python
# Compare only the Plant Name and Type columns when looking for duplicates
df_plant_type_duplicates = df[df.duplicated(subset=['Plant Name', 'Type'])]
df_plant_type_duplicates
```

| Unnamed: 0 | Plant Name | Type | Origin | Height (cm) |
| --- | --- | --- | --- | --- |
| 6 | Cactus | Succulent | America | 60 |
| 22 | Bamboo | Grass | Asia | 500 |
| 30 | Rafflesia | Flower | Southeast Asia | 20 |
| 47 | Kangaroo Paw | Flower | Australia | 60 |
| 48 | Bougainvillea | Shrub | South America | 400 |
| 49 | Bird of Paradise | Ornamental | Africa | 150 |
| 50 | Venus Flytrap | Carnivorous | North America | 15 |
| 51 | Rose | Flower | Asia | 60 |
| 53 | Tulip | Flower | Europe | 30 |
| 55 | Sunflower | Flower | North America | 180 |

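With `subset`, only the listed columns are compared, so rows that differ elsewhere can still count as duplicates. This explains why label 22 (Bamboo, Grass, height 500) shows up here but not in `df_duplicates`: it matches label 2 (Bamboo, Grass, height 600) on Plant Name and Type but not on height. The sketch below reproduces that situation with a small hypothetical DataFrame.

```python
import pandas as pd

# Hypothetical rows mirroring the two Bamboo entries with different heights.
plants = pd.DataFrame({
    "Plant Name": ["Bamboo", "Bamboo", "Orchid"],
    "Type": ["Grass", "Grass", "Flower"],
    "Height (cm)": [600, 500, 50],
})

# Full-row comparison: the heights differ, so nothing is flagged.
full_row = plants.duplicated().tolist()
print(full_row)  # [False, False, False]

# Subset comparison: only Plant Name and Type are checked.
by_subset = plants.duplicated(subset=["Plant Name", "Type"]).tolist()
print(by_subset)  # [False, True, False]
```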

*Create a mask called clean_mask that will clean up any duplicates in the df DataFrame that have the same Plant Name and Origin and only keep the most up-to-date duplicate entry.*

```python
# Flag every row that shares Plant Name and Origin with a later row,
# treating the last occurrence as the most up-to-date entry
clean_mask = df.duplicated(subset=['Plant Name', 'Origin'], keep='last')
# To keep the first occurrence instead of the last one, use keep='first':
# clean_mask = df.duplicated(subset=['Plant Name', 'Origin'], keep='first')

# Invert the mask so that only unflagged rows are kept
cleaned_df = df[~clean_mask]
cleaned_df
```

| Unnamed: 0 | Plant Name | Type | Origin | Height (cm) |
| --- | --- | --- | --- | --- |
| 0 | Orchid | Ornamental | Tropical | 30 |
| 1 | Fern | Ground Cover | Tropical | 40 |
| 5 | Banana Tree | Tree | Tropical | 300 |
| 6 | Cactus | Succulent | America | 60 |
| 7 | Monstera | Ornamental | Tropical | 70 |
| ... | ... | ... | ... | ... |
| 71 | Ficus | Tree | Asia | 200 |
| 72 | Columbine | Flower | North America | 30 |
| 73 | Jasmine | Shrub | Asia | 90 |
| 74 | Fuchsia | Flower | Central and South America | 40 |

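The mask-and-invert pattern above is the same logic that `DataFrame.drop_duplicates()` performs in a single call. The sketch below uses a hypothetical three-row DataFrame to check that both approaches keep the same rows. Note that `keep="last"` only selects the most up-to-date entry if newer records appear later in the file; that ordering is an assumption about the dataset, not something the data itself guarantees.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 share Plant Name and Origin;
# row 2 is assumed to be the newer record.
plants = pd.DataFrame({
    "Plant Name": ["Cactus", "Fern", "Cactus"],
    "Origin": ["America", "Tropical", "America"],
    "Height (cm)": [60, 40, 65],
})
subset = ["Plant Name", "Origin"]

# Mask approach: flag all but the last copy, then keep the unflagged rows.
mask = plants.duplicated(subset=subset, keep="last")
via_mask = plants[~mask]

# drop_duplicates() applies the same rule in one step.
via_method = plants.drop_duplicates(subset=subset, keep="last")

print(via_mask.index.tolist())  # [1, 2]
```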