# Data Science with Python - Pandas Basics
---

# PANDAS BASICS
---

**version: 0.25.2**
  
Pandas is a high-level data manipulation library built on top of the NumPy package. Its two core data structures, Series and DataFrame, make it convenient to load, select, and clean tabular data, which is why it is widely used by data scientists.


## Data Structures

### Series

A Series is a labeled one-dimensional array capable of storing any type of data. Its row labels are collectively called the **index**. The basic way of creating a Series is as follows:


```
    s = pd.Series(data, index = index)
```

The *data* argument can be a dictionary, a list, a NumPy array, or a scalar value.
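
As a rough illustration of the four input types listed above, the sketch below builds one small Series from each. The variable names and sample values are chosen only for this example.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns a default integer index (0, 1, 2, ...)
s_list = pd.Series(['Jetta Variant', 'Passat', 'Crossfox'])

# From a dictionary: the keys become the index labels
s_dict = pd.Series({'Passat': 1991, 'Crossfox': 1990})

# From a NumPy array, with explicit labels passed through `index`
s_array = pd.Series(np.array([44410.0, 5712.0]), index=['Jetta Variant', 'Passat'])

# From a scalar: the value is repeated once for every label in `index`
s_scalar = pd.Series(0, index=['a', 'b', 'c'])

print(s_dict['Passat'])     # 1991
print(s_scalar.tolist())    # [0, 0, 0]
```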

### DataFrames

A DataFrame is a two-dimensional tabular data structure with row and column labels. Like a Series, a DataFrame can store any type of data, and each column can have its own type.


```
    df = pd.DataFrame(data, index = index, columns = columns)
```

The *data* argument can be a dictionary, a list, a NumPy array, a Series, or another DataFrame.
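
A minimal sketch of the constructor with all three arguments, assuming a list of lists as *data*; the two sample rows are taken from the car data used later in this notebook.

```python
import pandas as pd

# Each inner list is one row; `index` labels the rows, `columns` labels the columns
data = [[2003, 88078.64], [1991, 106161.94]]
df = pd.DataFrame(data, index=['Jetta Variant', 'Passat'], columns=['Year', 'Value'])

print(df.shape)                  # (2, 2)
print(df.loc['Passat', 'Year'])  # 1991
```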

**Documentation:** https://pandas.pydata.org/pandas-docs/version/0.25/

## Data Structures

In [1]:
import pandas as pd

### Creating a Series from a List

In [2]:
cars = ['Jetta Variant', 'Passat', 'Crossfox']
cars

['Jetta Variant', 'Passat', 'Crossfox']

In [3]:
pd.Series(cars)

0    Jetta Variant
1           Passat
2         Crossfox
dtype: object

### Creating a DataFrame from a list of dictionaries

In [5]:
data = [
     {'Name': 'Jetta Variant', 'Engine': '4.0 Turbo Engine', 'Year': 2003, 'Mileage': 44410.0, 'Zero_km': False, 'Value': 88078.64},
     {'Name': 'Passat', 'Engine': 'Diesel Engine', 'Year': 1991, 'Mileage': 5712.0, 'Zero_km': False, 'Value': 106161.94},
     {'Name': 'Crossfox', 'Engine': 'V8 Diesel Engine', 'Year': 1990, 'Mileage': 37123.0, 'Zero_km': False, 'Value': 72832.16}
]

In [6]:
dataset = pd.DataFrame(data)

In [7]:
dataset

,Name,Engine,Year,Mileage,Zero_km,Value
0,Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,88078.64
1,Passat,Diesel Engine,1991,5712.0,False,106161.94
2,Crossfox,V8 Diesel Engine,1990,37123.0,False,72832.16


In [12]:
dataset[['Name', 'Mileage', 'Zero_km', 'Value', 'Engine', 'Year']]    # Reorder the columns

,Name,Mileage,Zero_km,Value,Engine,Year
0,Jetta Variant,44410.0,False,88078.64,4.0 Turbo Engine,2003
1,Passat,5712.0,False,106161.94,Diesel Engine,1991
2,Crossfox,37123.0,False,72832.16,V8 Diesel Engine,1990


### Creating a DataFrame from a Dictionary (whose *values* are lists)

In [13]:
data = {
     'Name': ['Jetta Variant', 'Passat', 'Crossfox'],
     'Engine': ['4.0 Turbo Engine', 'Diesel Engine', 'V8 Diesel Engine'],
     'Year': [2003, 1991, 1990],
     'Mileage': [44410.0, 5712.0, 37123.0],
     'Zero_km': [False, False, False],
     'Value': [88078.64, 106161.94, 72832.16]
}

In [14]:
dataset = pd.DataFrame(data)

In [15]:
dataset

,Name,Engine,Year,Mileage,Zero_km,Value
0,Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,88078.64
1,Passat,Diesel Engine,1991,5712.0,False,106161.94
2,Crossfox,V8 Diesel Engine,1990,37123.0,False,72832.16


### Creating a DataFrame from an external file

In [16]:
dataset = pd.read_csv('db.csv', sep = ';', index_col = 0)

In [17]:
dataset

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07
Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.10
...,...,...,...,...,...,...
Phantom 2013,V8 engine,2014,27505.0,False,"['Stability control', 'Autopilot', 'Automatic ...",51759.58
Cadillac Ciel concept,Motor V8,1991,29981.0,False,"['Leather seats', 'Digital panel', 'Rain senso...",51667.06
GLK class,5.0 V8 Bi-Turbo engine,2002,52637.0,False,"['Alloy wheels', 'Traction control', 'Automati...",68934.03
Aston Martin DB5,Diesel Engine,1996,7685.0,False,"['Air conditioning', '4 X 4', 'Automatic trans...",122110.90

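The `sep` and `index_col` arguments can be tried out without the original file. The sketch below assumes a tiny in-memory stand-in for `db.csv` (the `csv_text` string is hypothetical and only mimics the layout) and reads it through `io.StringIO`.

```python
import io
import pandas as pd

# Hypothetical stand-in for db.csv: semicolon-separated, first column holds the car name
csv_text = (
    "Name;Engine;Year;Mileage\n"
    "Passat;Diesel Engine;1991;5712.0\n"
    "DS5;Motor 2.4 Turbo;2019;\n"
)

# sep=';' sets the field delimiter, index_col=0 turns the first column into the row index
cars = pd.read_csv(io.StringIO(csv_text), sep=';', index_col=0)

print(cars.index.name)             # Name
print(cars.loc['DS5', 'Mileage'])  # nan (an empty field is read as a missing value)
```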

## Selections with DataFrames

In [18]:
dataset.head()    # first 5 rows

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07
Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.1


### Selecting columns

In [19]:
dataset['Value']

Name
Jetta Variant             88078.64
Passat                   106161.94
Crossfox                  72832.16
DS5                      124549.07
Aston Martin DB4          92612.10
                           ...    
Phantom 2013              51759.58
Cadillac Ciel concept     51667.06
GLK class                 68934.03
Aston Martin DB5         122110.90
Macan                     90381.47
Name: Value, Length: 258, dtype: float64

In [20]:
type(dataset['Value'])

pandas.core.series.Series

In [21]:
dataset[['Value']]

Name,Value
Jetta Variant,88078.64
Passat,106161.94
Crossfox,72832.16
DS5,124549.07
Aston Martin DB4,92612.10
...,...
Phantom 2013,51759.58
Cadillac Ciel concept,51667.06
GLK class,68934.03
Aston Martin DB5,122110.90


In [22]:
type(dataset[['Value']])

pandas.core.frame.DataFrame
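
The two `type` calls above differ only in the brackets: a single column label returns a Series, while a list of labels returns a DataFrame, even when the list has one element. A small sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Value': [88078.64, 106161.94], 'Year': [2003, 1991]},
    index=['Jetta Variant', 'Passat'],
)

as_series = df['Value']    # single label  -> one-dimensional Series
as_frame = df[['Value']]   # list of labels -> two-dimensional DataFrame

print(as_series.ndim, as_frame.ndim)  # 1 2
print(as_frame.shape)                 # (2, 1)
```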

### Selecting rows - [ i : j ]

**Note:** Indexing starts at zero. In a *slice*, the row at position *i* is **included** and the row at position *j* is **not included** in the result.

In [23]:
dataset[0:3]

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16

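The half-open rule in the note applies to positional slices. Label slices with *.loc* behave differently: the end label is included. The sketch below contrasts the two on a small made-up frame.

```python
import pandas as pd

df = pd.DataFrame(
    {'Year': [2003, 1991, 1990, 2019]},
    index=['Jetta Variant', 'Passat', 'Crossfox', 'DS5'],
)

# Positional slice: positions 0 and 1 are returned, position 2 is excluded
by_position = df[0:2]

# Label slice: both the start label and the end label are returned
by_label = df.loc['Jetta Variant':'Crossfox']

print(len(by_position))  # 2
print(len(by_label))     # 3
```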

### Using *.loc* for selections

**Note:** *.loc* selects a group of rows and columns by label or with a boolean array.

In [24]:
dataset.loc['Passat']

Engine                                             Diesel Engine
Year                                                        1991
Mileage                                                   5712.0
Zero_km                                                    False
Accessories    ['Multimedia Center', 'Panoramic Roof', 'ABS B...
Value                                                  106161.94
Name: Passat, dtype: object

In [25]:
dataset.loc[['Passat', 'DS5']]

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07


In [26]:
dataset.loc[['Passat', 'DS5'], ['Engine', 'Value']]

Name,Engine,Value
Passat,Diesel Engine,106161.94
DS5,Motor 2.4 Turbo,124549.07


In [27]:
dataset.loc[:, ['Engine', 'Value']]

Name,Engine,Value
Jetta Variant,4.0 Turbo Engine,88078.64
Passat,Diesel Engine,106161.94
Crossfox,V8 diesel engine,72832.16
DS5,Motor 2.4 Turbo,124549.07
Aston Martin DB4,2.4 Turbo Engine,92612.10
...,...,...
Phantom 2013,V8 engine,51759.58
Cadillac Ciel concept,Motor V8,51667.06
GLK class,5.0 V8 Bi-Turbo engine,68934.03
Aston Martin DB5,Diesel Engine,122110.90

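The cells above select by label only. As a sketch of the boolean case, the example below passes a boolean Series as the row selector and a list of labels as the column selector; the three-row frame is made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Engine': ['Diesel Engine', 'V8 diesel engine', 'Motor 2.4 Turbo'],
        'Zero_km': [False, False, True],
        'Value': [106161.94, 72832.16, 124549.07],
    },
    index=['Passat', 'Crossfox', 'DS5'],
)

# One True/False entry per row; rows marked True are kept
mask = df['Zero_km']

new_cars = df.loc[mask, ['Engine', 'Value']]
print(new_cars.index.tolist())  # ['DS5']
```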

### Using *.iloc* for selections

**Note:** *.iloc* selects by integer position, i.e. by where a row or column sits in the DataFrame rather than by its label.

In [28]:
dataset.head()

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07
Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.1


In [29]:
dataset.iloc[[1]]

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94


In [30]:
dataset.iloc[1:4]

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07


In [32]:
dataset.iloc[1:4, [0, 5, 2]]

Name,Engine,Value,Mileage
Passat,Diesel Engine,106161.94,5712.0
Crossfox,V8 diesel engine,72832.16,37123.0
DS5,Motor 2.4 Turbo,124549.07,


In [33]:
dataset.iloc[[1, 42, 22], [0, 5, 2]]

Name,Engine,Value,Mileage
Passat,Diesel Engine,106161.94,5712.0
Optima,Motor 1.8 16v,86641.34,
Lamborghini Obvious,V6 Diesel Engine,133529.84,98079.0


In [34]:
dataset.iloc[:, [0, 5, 2]]

Name,Engine,Value,Mileage
Jetta Variant,4.0 Turbo Engine,88078.64,44410.0
Passat,Diesel Engine,106161.94,5712.0
Crossfox,V8 diesel engine,72832.16,37123.0
DS5,Motor 2.4 Turbo,124549.07,
Aston Martin DB4,2.4 Turbo Engine,92612.10,25757.0
...,...,...,...
Phantom 2013,V8 engine,51759.58,27505.0
Cadillac Ciel concept,Motor V8,51667.06,29981.0
GLK class,5.0 V8 Bi-Turbo engine,68934.03,52637.0
Aston Martin DB5,Diesel Engine,122110.90,7685.0

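To make the difference between position and label concrete, the sketch below fetches the same row once with *.iloc* and once with *.loc*, and then picks rows and columns by position in a custom order. The data is a made-up three-row excerpt.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Engine': ['4.0 Turbo Engine', 'Diesel Engine', 'V8 diesel engine'],
        'Value': [88078.64, 106161.94, 72832.16],
    },
    index=['Jetta Variant', 'Passat', 'Crossfox'],
)

row_by_position = df.iloc[1]      # second row, whatever its label is
row_by_label = df.loc['Passat']   # row labeled 'Passat', wherever it sits

# Rows 0 and 2, columns 1 and 0: the result follows the order given in the lists
subset = df.iloc[[0, 2], [1, 0]]

print(row_by_position.equals(row_by_label))  # True
print(list(subset.columns))                  # ['Value', 'Engine']
```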

## Queries with DataFrames

In [35]:
dataset.head()

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07
Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.1


In [None]:
dataset.Engine

Name
Jetta Variant                  4.0 Turbo Engine
Passat                            Diesel Engine
Crossfox                       V8 diesel engine
DS5                             Motor 2.4 Turbo
Aston Martin DB4               2.4 Turbo Engine
                                     ...
Phantom 2013                          V8 engine
Cadillac Ciel concept                  Motor V8
GLK class                5.0 V8 Bi-Turbo engine
Aston Martin DB5                  Diesel Engine
Macan                          V6 Diesel Engine
Name: Engine, Length: 258, dtype: object

In [36]:
select = dataset.Engine == 'Diesel Engine'

In [37]:
type(select)        # boolean series: dataset.Engine == 'Diesel Engine'

pandas.core.series.Series

In [38]:
dataset[select]

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Effa Hafei Pickup Trunk,Diesel Engine,1991,102959.0,False,"['Stability control', 'Digital panel', 'Electr...",125684.65
Sorento,Diesel Engine,2019,,True,"['Rain sensor', 'Parking camera', 'Parking sen...",81399.35
New Fiesta Hatch,Diesel Engine,2017,118895.0,False,"['Parking sensor', 'Power locks', 'Autopilot',...",66007.16
Kangoo Express,Diesel Engine,2007,29132.0,False,"['Leather seats', 'Automatic gearshift', 'Auto...",146716.91
Fit,Diesel Engine,2013,44329.0,False,"['ABS Brakes', 'Parking Camera', 'Automatic Tr...",77836.23
Symbol,Diesel Engine,2016,117714.0,False,"['4 X 4', 'Autopilot', 'Twilight sensor', 'Par...",133030.6
A4 Sedan,Diesel Engine,2002,30511.0,False,"['Parking camera', '4 X 4', 'Power locks', 'Ai...",96369.04
A4 Avant,Diesel Engine,2014,17357.0,False,"['Panoramic roof', '4 X 4', 'Leather seats', '...",138946.88
Silver Shadow,Diesel Engine,2015,99052.0,False,"['4 X 4', 'Multimedia Center', 'Air conditioni...",143568.22


In [41]:
dataset[(dataset.Engine == 'Diesel Engine') & (dataset.Zero_km == True)]

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Sorento,Diesel Engine,2019,,True,"['Rain sensor', 'Parking camera', 'Parking sen...",81399.35
Camry,Diesel Engine,2019,,True,"['Power locks', 'Alloy wheels', 'Twilight sens...",138597.27
Aston Martin Virage,Diesel Engine,2019,,True,"['Power Locks', 'Traction Control', 'Parking C...",97290.18
7 Series Sedan,Diesel Engine,2019,,True,"['Power windows', 'Power locks', 'Alloy wheels...",67539.79


In [42]:
(dataset.Engine == 'Diesel Engine') & (dataset.Zero_km == True)       # boolean series  (& / |)

Name
Jetta Variant            False
Passat                   False
Crossfox                 False
DS5                      False
Aston Martin DB4         False
                         ...  
Phantom 2013             False
Cadillac Ciel concept    False
GLK class                False
Aston Martin DB5         False
Macan                    False
Length: 258, dtype: bool

### Using the query method

In [None]:
dataset.query('Engine == "Diesel Engine" and Zero_km == True')      # inside query, both & / | and and / or are accepted

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Sorento,Diesel Engine,2019,,True,"['Rain sensor', 'Parking camera', 'Parking sen...",81399.35
Cielo Hatch,Diesel Engine,2019,,True,"['Digital panel', 'Multimedia center', '...",145197.7
Camry,Diesel Engine,2019,,True,"['Power locks', 'Alloy wheels', 'Twilight sens...",138597.27
Aston Martin Virage,Diesel Engine,2019,,True,"['Power Locks', 'Traction Control', 'Parking C...",97290.18
7 Series Sedan,Diesel Engine,2019,,True,"['Power windows', 'Power locks', 'Alloy wheels...",67539.79

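The boolean-mask filter and the `query` call express the same condition. As a quick check under that assumption, the sketch below runs both on a small made-up frame and compares the results.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Engine': ['Diesel Engine', 'Diesel Engine', 'V8 engine'],
        'Zero_km': [False, True, True],
    },
    index=['Passat', 'Sorento', 'Series 1 M'],
)

# Boolean mask: each comparison needs its own parentheses, combined with & (and) or | (or)
with_mask = df[(df.Engine == 'Diesel Engine') & (df.Zero_km == True)]

# query: the same condition written as a string, using column names directly
with_query = df.query('Engine == "Diesel Engine" and Zero_km == True')

print(with_mask.equals(with_query))  # True
print(with_query.index.tolist())     # ['Sorento']
```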

## Iterating with DataFrames

In [43]:
dataset.head()

Name,Engine,Year,Mileage,Zero_km,Accessories,Value
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07
Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.1


In [44]:
for item in dataset:    # iterating over a DataFrame yields its column labels
  print(item)

Engine
Year
Mileage
Zero_km
Accessories
Value


In [45]:
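# Average kilometres per year of use, taking 2019 as the reference year
for index, row in dataset.iterrows():
  age = 2019 - row['Year']
  if age != 0:
    dataset.loc[index, 'Km_avg'] = row['Mileage'] / age
  else:
    dataset.loc[index, 'Km_avg'] = 0    # cars from 2019: avoid dividing by zero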
    
dataset

Name,Engine,Year,Mileage,Zero_km,Accessories,Value,Km_avg
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64,2775.625000
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94,204.000000
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16,1280.103448
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07,0.000000
Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.10,1981.307692
...,...,...,...,...,...,...,...
Phantom 2013,V8 engine,2014,27505.0,False,"['Stability control', 'Autopilot', 'Automatic ...",51759.58,5501.000000
Cadillac Ciel concept,Motor V8,1991,29981.0,False,"['Leather seats', 'Digital panel', 'Rain senso...",51667.06,1070.750000
GLK class,5.0 V8 Bi-Turbo engine,2002,52637.0,False,"['Alloy wheels', 'Traction control', 'Automati...",68934.03,3096.294118
Aston Martin DB5,Diesel Engine,1996,7685.0,False,"['Air conditioning', '4 X 4', 'Automatic trans...",122110.90,334.130435

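Row-by-row loops with `iterrows` tend to be slow on larger tables. One possible alternative, sketched below on a made-up three-row frame, computes the same `Km_avg` column with whole-column operations: divide everywhere, then use `where` to put 0 in the rows whose age is zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Year': [2003, 1991, 2019], 'Mileage': [44410.0, 5712.0, np.nan]},
    index=['Jetta Variant', 'Passat', 'DS5'],
)

# Age of each car relative to the reference year 2019
age = 2019 - df['Year']

# Keep the ratio where age != 0; everywhere else substitute 0
df['Km_avg'] = (df['Mileage'] / age).where(age != 0, 0)

print(df['Km_avg'].tolist())  # [2775.625, 204.0, 0.0]
```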

## Data processing

In [46]:
dataset.head()

Name,Engine,Year,Mileage,Zero_km,Accessories,Value,Km_avg
Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64,2775.625
Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94,204.0
Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16,1280.103448
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07,0.0
Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.1,1981.307692


In [47]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Index: 258 entries, Jetta Variant to Macan
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Engine       258 non-null    object 
 1   Year         258 non-null    int64  
 2   Mileage      197 non-null    float64
 3   Zero_km      258 non-null    bool   
 4   Accessories  258 non-null    object 
 5   Value        258 non-null    float64
 6   Km_avg       258 non-null    float64
dtypes: bool(1), float64(3), int64(1), object(2)
memory usage: 22.5+ KB


In [48]:
dataset.Mileage.isna()

Name
Jetta Variant            False
Passat                   False
Crossfox                 False
DS5                       True
Aston Martin DB4         False
                         ...  
Phantom 2013             False
Cadillac Ciel concept    False
GLK class                False
Aston Martin DB5         False
Macan                    False
Name: Mileage, Length: 258, dtype: bool

In [49]:
dataset[dataset.Mileage.isna()]

Name,Engine,Year,Mileage,Zero_km,Accessories,Value,Km_avg
DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07,0.0
A5,4.0 Turbo Engine,2019,,True,"['Automatic transmission', 'Parking camera', '...",56445.20,0.0
J5,V6 engine,2019,,True,"['Twilight sensor', 'Digital panel', 'Alloy wh...",53183.38,0.0
A3,Motor 1.0 8v,2019,,True,"['4 X 4', 'Autopilot', 'Multimedia Center', 'P...",88552.39,0.0
Series 1 M,V8 engine,2019,,True,"['Stability control', 'Multimedia center', 'Di...",94564.40,0.0
...,...,...,...,...,...,...,...
Lamborghini Reventón,Motor 4.0 Turbo,2019,,True,"['Traction control', 'Air conditioning', 'Mult...",67664.86,0.0
Benni Mini,V8 engine,2019,,True,"['Twilight sensor', 'Automatic transmission', ...",126247.84,0.0
Uno,Motor Diesel V6,2019,,True,"['Multimedia center', 'Twilight sensor', 'Stab...",128852.21,0.0
Santa Fe,Motor 3.0 32v,2019,,True,"['Electric locks', 'Air conditioning', '4 X 4'...",129415.33,0.0


In [50]:
dataset.fillna(0, inplace = True)     # Without inplace = True, fillna returns a new DataFrame and leaves dataset unchanged
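
A short sketch of what happens without `inplace = True`, using a made-up two-row frame: `fillna` hands back a filled copy and the original keeps its missing value. Passing a dictionary restricts the fill to the named columns, which can be safer than filling every column with 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mileage': [5712.0, np.nan]}, index=['Passat', 'DS5'])

# Returns a new DataFrame; df itself is not modified
filled = df.fillna(0)
still_missing = int(df['Mileage'].isna().sum())

# Column-specific fill: only 'Mileage' is touched
per_column = df.fillna({'Mileage': 0})

print(still_missing)               # 1
print(filled['Mileage'].tolist())  # [5712.0, 0.0]
```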

In [51]:
dataset.query("Zero_km == True")

Name,Engine,Year,Mileage,Zero_km,Accessories,Value,Km_avg
DS5,Motor 2.4 Turbo,2019,0.0,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07,0.0
A5,4.0 Turbo Engine,2019,0.0,True,"['Automatic transmission', 'Parking camera', '...",56445.20,0.0
J5,V6 engine,2019,0.0,True,"['Twilight sensor', 'Digital panel', 'Alloy wh...",53183.38,0.0
A3,Motor 1.0 8v,2019,0.0,True,"['4 X 4', 'Autopilot', 'Multimedia Center', 'P...",88552.39,0.0
Series 1 M,V8 engine,2019,0.0,True,"['Stability control', 'Multimedia center', 'Di...",94564.40,0.0
...,...,...,...,...,...,...,...
Lamborghini Reventón,Motor 4.0 Turbo,2019,0.0,True,"['Traction control', 'Air conditioning', 'Mult...",67664.86,0.0
Benni Mini,V8 engine,2019,0.0,True,"['Twilight sensor', 'Automatic transmission', ...",126247.84,0.0
Uno,Motor Diesel V6,2019,0.0,True,"['Multimedia center', 'Twilight sensor', 'Stab...",128852.21,0.0
Santa Fe,Motor 3.0 32v,2019,0.0,True,"['Electric locks', 'Air conditioning', '4 X 4'...",129415.33,0.0


In [52]:
dataset = pd.read_csv('db.csv', sep = ';')    # Reload the original file, this time keeping the default integer index

In [53]:
dataset

,Name,Engine,Year,Mileage,Zero_km,Accessories,Value
0,Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
1,Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
2,Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
3,DS5,Motor 2.4 Turbo,2019,,True,"['Power locks', '4 X 4', 'Power windows', 'Twi...",124549.07
4,Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.10
...,...,...,...,...,...,...,...
253,Phantom 2013,V8 engine,2014,27505.0,False,"['Stability control', 'Autopilot', 'Automatic ...",51759.58
254,Cadillac Ciel concept,Motor V8,1991,29981.0,False,"['Leather seats', 'Digital panel', 'Rain senso...",51667.06
255,GLK class,5.0 V8 Bi-Turbo engine,2002,52637.0,False,"['Alloy wheels', 'Traction control', 'Automati...",68934.03
256,Aston Martin DB5,Diesel Engine,1996,7685.0,False,"['Air conditioning', '4 X 4', 'Automatic trans...",122110.90


In [54]:
dataset.dropna(subset = ['Mileage'], inplace = True)

In [55]:
dataset

,Name,Engine,Year,Mileage,Zero_km,Accessories,Value
0,Jetta Variant,4.0 Turbo Engine,2003,44410.0,False,"['Alloy wheels', 'Power locks', 'Autopilot', '...",88078.64
1,Passat,Diesel Engine,1991,5712.0,False,"['Multimedia Center', 'Panoramic Roof', 'ABS B...",106161.94
2,Crossfox,V8 diesel engine,1990,37123.0,False,"['Autopilot', 'Stability control', 'Twilight s...",72832.16
4,Aston Martin DB4,2.4 Turbo Engine,2006,25757.0,False,"['Alloy wheels', '4 X 4', 'Multimedia center',...",92612.10
5,Palio Weekend,Motor 1.8 16v,2012,10728.0,False,"['Parking sensor', 'Panoramic roof', 'Twilight...",97497.73
...,...,...,...,...,...,...,...
253,Phantom 2013,V8 engine,2014,27505.0,False,"['Stability control', 'Autopilot', 'Automatic ...",51759.58
254,Cadillac Ciel concept,Motor V8,1991,29981.0,False,"['Leather seats', 'Digital panel', 'Rain senso...",51667.06
255,GLK class,5.0 V8 Bi-Turbo engine,2002,52637.0,False,"['Alloy wheels', 'Traction control', 'Automati...",68934.03
256,Aston Martin DB5,Diesel Engine,1996,7685.0,False,"['Air conditioning', '4 X 4', 'Automatic trans...",122110.90

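In the output above, index 3 is missing because `dropna` removes rows without renumbering the rest. The sketch below reproduces this on a made-up three-row frame and shows `reset_index(drop=True)` as one way to get a contiguous index again.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Passat', 'DS5', 'Aston Martin DB4'],
    'Mileage': [5712.0, np.nan, 25757.0],
})

# subset limits the check to the listed columns; rows with NaN there are dropped
cleaned = df.dropna(subset=['Mileage'])
print(cleaned.index.tolist())     # [0, 2]

# Discard the old labels and renumber from 0
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```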