With a working knowledge of regex, you can combine it with Pandas string methods to clean and manipulate text data more effectively.

This notebook is guided by the scenario below:
##### Scenario:
Suppose we have been given a dataset of vehicle models recorded in a town. The model names are inconsistently formatted, with leading/trailing spaces and mixed letter case.

##### Question 
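Clean the 'model' column by removing leading/trailing spaces and converting all names to a single, consistent case (the cell below uses uppercase via `.str.upper()`; `.str.lower()` works the same way).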

In [15]:
"""
# Sample data: randomly generated, with no reference to a real dataset. Feel free to experiment with data of your own.

Task  >>>>>>>>>>>>> # Clean the 'model' column and convert it to a single case <<<<<<<<<<<<<<<<<

# The data is stored in a dictionary of key-value pairs (column name -> list of values)
# Isolate the column that needs cleaning: df['model']
# .str.strip() removes any leading or trailing spaces from each string
# .str.upper() / .str.lower() converts the characters in each string to upper or lower case
# The methods are chained to perform both operations on a single line
# The result is assigned back to the same column in the DataFrame
"""

import pandas as pd

Data = {
    'model':[' Fortuner (Legender 2.8)', 'vitz', 'BmW(x7 Series)', 'LandROVER(4500cc)', 'vitz(1390cc)'],
    'driveTrain':['Four WHEEL Drive', '2 Wheel Drive(Front wheel)', '2WD (REAR wheel)', '4X4', 'Front Drive']
}
df = pd.DataFrame(Data)

df['model'] = df['model'].str.strip().str.upper()
print(df)



                     model                  driveTrain
0  FORTUNER (LEGENDER 2.8)            Four WHEEL Drive
1                     VITZ  2 Wheel Drive(Front wheel)
2           BMW(X7 SERIES)            2WD (REAR wheel)
3        LANDROVER(4500CC)                         4X4
4             VITZ(1390CC)                 Front Drive
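If lowercase names are preferred at this stage, the same chain works with `.str.lower()` in place of `.str.upper()`. The following is a minimal sketch on a small hypothetical Series (`raw` and `cleaned` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hypothetical sample mirroring a few of the 'model' values used above
raw = pd.Series([' Fortuner (Legender 2.8)', 'vitz', 'BmW(x7 Series)'], name='model')

# Strip surrounding whitespace first, then lowercase every string
cleaned = raw.str.strip().str.lower()

print(cleaned.tolist())
# ['fortuner (legender 2.8)', 'vitz', 'bmw(x7 series)']
```

Note that `.str.strip()` only removes whitespace at the ends of each string; the space inside `'fortuner (legender 2.8)'` is left untouched.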


From the DataFrame, some of the models have the engine capacity defined.

Use regex to extract the engine capacity from the 'model' column and store it in a new column called 'Engine Size'.
Fill the value with "Unknown" if no engine capacity is defined.

In [16]:
"""
# To populate the newly created column, we use str.extract on the model column
# Extraction is done by matching a regular expression pattern against each string
# Regex pattern: (\d+\.\d+|\d+)
# The pattern matches either:
        i) A sequence of digits followed by a decimal point and more digits (e.g. 2.8)
        ii) A sequence of digits (e.g. 4500)
# Only the first match in each string is extracted and assigned to the Engine Size column
# Missing values in the column are filled with "Unknown"

"""

# Raw string (r'...') keeps the backslashes intact; expand=False returns a Series
df['Engine Size'] = df['model'].str.extract(r'(\d+\.\d+|\d+)', expand=False).fillna("Unknown")
print(df)

                     model                  driveTrain Engine Size
0  FORTUNER (LEGENDER 2.8)            Four WHEEL Drive         2.8
1                     VITZ  2 Wheel Drive(Front wheel)     Unknown
2           BMW(X7 SERIES)            2WD (REAR wheel)           7
3        LANDROVER(4500CC)                         4X4        4500
4             VITZ(1390CC)                 Front Drive        1390
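Notice that the pattern above also picks up the `7` in `BMW(X7 SERIES)`, which is part of the model name rather than an engine capacity. One possible refinement, sketched below under the assumption that a capacity is written either as a decimal number (litres) or as digits immediately followed by `CC`, uses a lookahead to skip other digits. The names `models` and `strict_pattern` are illustrative only.

```python
import pandas as pd

# Same model strings as in the cleaned DataFrame above
models = pd.Series([
    'FORTUNER (LEGENDER 2.8)', 'VITZ', 'BMW(X7 SERIES)',
    'LANDROVER(4500CC)', 'VITZ(1390CC)',
])

# Assumption: capacity is a decimal number, or digits directly followed by "CC"
# (?=CC) is a lookahead: it requires "CC" after the digits without capturing it
strict_pattern = r'(\d+\.\d+|\d+(?=CC))'

engine_size = models.str.extract(strict_pattern, expand=False).fillna('Unknown')

print(engine_size.tolist())
# ['2.8', 'Unknown', 'Unknown', '4500', '1390']
```

Whether this stricter rule is appropriate depends on how capacities are written in the real data, so treat it as a starting point rather than a general solution.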




After some cleaning, we need to check which values are unique and how often each one appears.

Perform data aggregation to count the number of occurrences of each unique model.

In [17]:
# More cleaning and model name standardization
"""
# df['model']                          Access the model column as a pandas Series
# .str.extract('([a-zA-Z]+)')          Extract the first run of alphabetic characters (the base model name)
# expand=False                         Ensures the result remains a pandas Series rather than a DataFrame
# fillna('').str.strip().str.lower()   Replace missing values with '', trim spaces, and convert to lowercase

"""

df['model'] = df['model'].str.extract('([a-zA-Z]+)', expand=False).fillna('').str.strip().str.lower()

# Count the number of occurrences of each unique model
model_count = df['model'].value_counts()
print(model_count)

model
vitz         2
fortuner     1
bmw          1
landrover    1
Name: count, dtype: int64
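As a cross-check, the same counts can be obtained with `groupby(...).size()`, and the result can be turned into a regular two-column table for further processing. This is a minimal sketch on a hypothetical DataFrame holding the standardized names from above; `demo`, `grouped`, and `summary` are illustrative names.

```python
import pandas as pd

# Hypothetical frame holding the standardized model names
demo = pd.DataFrame({'model': ['fortuner', 'vitz', 'bmw', 'landrover', 'vitz']})

# value_counts sorts by frequency (most common first)
counts = demo['model'].value_counts()

# groupby + size gives the same numbers, sorted by model name instead
grouped = demo.groupby('model').size()

# Convert the counts into a DataFrame with explicit column names
summary = counts.rename_axis('model').reset_index(name='count')

print(summary)
```

`value_counts()` is the shorter option for a single column, while `groupby` extends more naturally when counting by several columns at once.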
