# Pandas Exploring Data by Filtering and Grouping

## Let's recall the cars dataset
The dataset contains information on 10,000 cars. There are 9 columns:
- Make (Car brand, example: Ford)
- Model (The Model of the Car, example: Focus)
- Year (The year in which the car was built, example: 2012)
- Variant (The car model version showing the PS, example: 1.6 Trendline)
- Kms (The kilometers the car has been driven, example: 90000)
- Price (The offered price for the car, example: 10000)
- Doors (How many doors the car has, example: 4)
- Kind (Type of car, example: Pick-Up)
- Location (Where the car is located, example: Buenos Aires)



## Recall Dataset

In [2]:
import pandas as pd

In [3]:
#reading data
cars = pd.read_csv("https://raw.githubusercontent.com/juliandnl/redi_ss20/master/cars.csv")

In [4]:
cars.head()

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
0,Volkswagen,Vento,2012,2.5 Luxury 170cv,99950,360000,4.0,Sedán,Córdoba
1,Ford,Ranger,2012,2.3 Cd Xl Plus 4x2,140000,320000,2.0,Pick-Up,Entre Ríos
2,Volkswagen,Fox,2011,1.6 Trendline,132000,209980,5.0,Hatchback,Bs.as. G.b.a. Sur
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
4,Volkswagen,Gol,2013,1.4 Power 83cv 3 p,107000,146000,3.0,Hatchback,Córdoba


In [5]:
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Make      10000 non-null  object 
 1   Model     10000 non-null  object 
 2   Year      10000 non-null  int64  
 3   Variant   10000 non-null  object 
 4   Kms       10000 non-null  int64  
 5   Price     10000 non-null  int64  
 6   Doors     10000 non-null  float64
 7   Kind      10000 non-null  object 
 8   Location  10000 non-null  object 
dtypes: float64(1), int64(3), object(5)
memory usage: 703.2+ KB


### The cars dataset consists of:

*   9 columns and 10,000 rows
*   `Doors` is a float (a floating point number), while `Year` is an integer

**💡 When you import data into a Pandas DataFrame, Pandas tries to infer the data type of each column by default. Columns that contain text are marked as `object` dtype, as in the `cars.info()` output above.** 💡
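
As a small illustration of this type inference, the sketch below reads a tiny made-up CSV string (not the real cars file) and checks the inferred dtypes. The names `csv_text` and `sample` are only for this example. Because the door counts are written as `2.0` and `5.0`, they are read as floats, and they can be converted to integers explicitly when every value is a whole number.

```python
import io

import pandas as pd

# A tiny, made-up CSV that mimics three columns of the cars file
csv_text = """Make,Year,Doors
Ford,2012,2.0
Volkswagen,2011,5.0
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.dtypes)

# Remember what pandas inferred before we change anything
inferred_year = str(sample["Year"].dtype)
inferred_doors = str(sample["Doors"].dtype)

# All door counts are whole numbers, so an explicit conversion is safe here
sample["Doors"] = sample["Doors"].astype("int64")
```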

In [6]:
cars.nunique(0)
# number of unique values per column

Make           5
Model         62
Year           8
Variant      592
Kms         1201
Price        948
Doors          4
Kind          10
Location      28
dtype: int64


## Task

**Familiarize yourself with the dataset**: When you first start to work with a dataset, it's crucial to explore and understand data.

1. Checkars is particularly interested in cars of *the make Ford produced in or after 2016*. How many Ford cars from 2016 or later are in the dataset?
2. We would like to analyze cars based on *price and km by brands*. How expensive are the car makes in general? Which car make is the cheapest one? Which car make has on average the most kms?
3. What are the *average prices by brands and by the number of doors* that cars have?
4. How many *unique models* does each brand have?
5. Which kind of car is the *most expensive* and which is the *most popular*?

Bonus:
6. What are the *min, max and average production years* for Ford cars?


## 1. How many Fords from 2016 or later are in the dataset?

During exploration, you will often want to look at a subset of the data. For example, you can select just 4 columns: Make, Model, Year and Location.



In [7]:
cars[['Make', 'Model', 'Year', 'Location']].head(5)

Unnamed: 0,Make,Model,Year,Location
0,Volkswagen,Vento,2012,Córdoba
1,Ford,Ranger,2012,Entre Ríos
2,Volkswagen,Fox,2011,Bs.as. G.b.a. Sur
3,Ford,Ranger,2017,Neuquén
4,Volkswagen,Gol,2013,Córdoba


There are several ways to filter data from a dataframe in Pandas. We'll cover three of them:

1. Boolean logic and logical operators
2. Querying with `query()`
3. Retrieving data through indexing with `.loc[]`


## 1. Logical Operators


In [14]:
# 1. Get all cars made in 2016 or later
# Returns True for every row where Year >= 2016
cars['Year'] >= 2016

0       False
1       False
2       False
3        True
4       False
        ...  
9995     True
9996    False
9997    False
9998    False
9999    False
Name: Year, Length: 10000, dtype: bool

In [15]:
# Returns the rows of cars (with all columns) where the condition cars['Year'] >= 2016 is True
cars[cars['Year'] >= 2016]

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
29,Volkswagen,Saveiro,2017,1.6 Gp Ce 101cv Safety + Pack High,20000,370000,2.0,Pick-Up,Córdoba
35,Ford,Focus III,2016,2.0 Sedan Se Plus Mt,15022,550000,4.0,Sedán,Bs.as. G.b.a. Sur
49,Ford,Focus III,2016,1.6 S,39000,454800,5.0,Hatchback,Bs.as. G.b.a. Sur
...,...,...,...,...,...,...,...,...,...
9985,Volkswagen,Gol Trend,2016,1.6 Trendline 101cv,54100,270000,5.0,Hatchback,Buenos Aires Interior
9990,Ford,Focus III,2016,2.0 Se Plus Mt,48000,450000,5.0,Hatchback,Bs.as. G.b.a. Sur
9991,Ford,Ranger,2016,3.2 Cd 4x2 Xlt At Tdci 200cv,59000,790000,4.0,Pick-Up,Bs.as. G.b.a. Oeste
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal


In [17]:
# 2. Get only Ford cars
cars[cars['Make'] == 'Ford']

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
1,Ford,Ranger,2012,2.3 Cd Xl Plus 4x2,140000,320000,2.0,Pick-Up,Entre Ríos
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
6,Ford,Ka,2012,1.0 Fly Viral 63cv,95243,142000,3.0,Hatchback,Bs.as. G.b.a. Sur
7,Ford,Ka,2012,1.6 Fly Viral 95cv,110000,148000,3.0,Hatchback,Buenos Aires Interior
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
...,...,...,...,...,...,...,...,...,...
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal
9995,Ford,Focus III,2016,2.0 Se,67000,399000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9997,Ford,Fiesta Kinetic Design,2012,1.6 Design 120cv Titanium,89000,250000,5.0,Hatchback,Tucumán
9998,Ford,Fiesta Kinetic Design,2013,1.6 Design 120cv Titanium,76000,295000,5.0,Hatchback,Buenos Aires Interior


In [20]:
# Combine Ford & >= 2016
cars[(cars['Make'] == 'Ford') & (cars['Year'] >= 2016)]

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
35,Ford,Focus III,2016,2.0 Sedan Se Plus Mt,15022,550000,4.0,Sedán,Bs.as. G.b.a. Sur
49,Ford,Focus III,2016,1.6 S,39000,454800,5.0,Hatchback,Bs.as. G.b.a. Sur
57,Ford,Mondeo,2018,2.0 Se Ecoboost At 240cv,15000,1200000,4.0,Sedán,Entre Ríos
...,...,...,...,...,...,...,...,...,...
9949,Ford,Focus III,2017,1.6 S,10000,530000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9990,Ford,Focus III,2016,2.0 Se Plus Mt,48000,450000,5.0,Hatchback,Bs.as. G.b.a. Sur
9991,Ford,Ranger,2016,3.2 Cd 4x2 Xlt At Tdci 200cv,59000,790000,4.0,Pick-Up,Bs.as. G.b.a. Oeste
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal


In [31]:
# Conditions can also be combined with OR (|)
cars[(cars['Make'] == 'Volkswagen') | (cars['Year'] >= 2015)]

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
0,Volkswagen,Vento,2012,2.5 Luxury 170cv,99950,360000,4.0,Sedán,Córdoba
2,Volkswagen,Fox,2011,1.6 Trendline,132000,209980,5.0,Hatchback,Bs.as. G.b.a. Sur
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
4,Volkswagen,Gol,2013,1.4 Power 83cv 3 p,107000,146000,3.0,Hatchback,Córdoba
5,Volkswagen,Amarok,2014,2.0 Cd Tdi 180cv 4x4 Highline C34,115000,790000,4.0,Pick-Up,Buenos Aires Interior
...,...,...,...,...,...,...,...,...,...
9992,Volkswagen,Suran,2011,1.6 Imotion Trendline 11b,80000,190000,5.0,Monovolumen,Bs.as. G.b.a. Norte
9993,Ford,Ecosport,2015,2.0 Titanium 143cv 4x2,50000,490000,5.0,SUV,Córdoba
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal
9995,Ford,Focus III,2016,2.0 Se,67000,399000,5.0,Hatchback,Bs.as. G.b.a. Oeste
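
The filters above put each condition in parentheses. That is needed because `&` and `|` bind more tightly than comparisons such as `==` and `>=`, and because Python's plain `and` / `or` do not work element-wise on a Series. A minimal sketch on a made-up five-row frame (`demo` and the mask names are only for illustration) shows AND, OR and negation with named masks:

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Make": ["Ford", "Ford", "Volkswagen", "Volkswagen", "Honda"],
    "Year": [2017, 2012, 2016, 2012, 2018],
})

# Storing each condition in a variable avoids nested parentheses
is_ford = demo["Make"] == "Ford"
is_recent = demo["Year"] >= 2016

ford_and_recent = demo[is_ford & is_recent]  # both conditions hold
ford_or_recent = demo[is_ford | is_recent]   # at least one condition holds
not_ford = demo[~is_ford]                    # negation of a condition

print(ford_and_recent)
```

Naming the masks also makes it easy to reuse the same condition in several filters.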


## 2. `query()`
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html


In [32]:
cars.query('Make == "Ford"')

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
1,Ford,Ranger,2012,2.3 Cd Xl Plus 4x2,140000,320000,2.0,Pick-Up,Entre Ríos
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
6,Ford,Ka,2012,1.0 Fly Viral 63cv,95243,142000,3.0,Hatchback,Bs.as. G.b.a. Sur
7,Ford,Ka,2012,1.6 Fly Viral 95cv,110000,148000,3.0,Hatchback,Buenos Aires Interior
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
...,...,...,...,...,...,...,...,...,...
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal
9995,Ford,Focus III,2016,2.0 Se,67000,399000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9997,Ford,Fiesta Kinetic Design,2012,1.6 Design 120cv Titanium,89000,250000,5.0,Hatchback,Tucumán
9998,Ford,Fiesta Kinetic Design,2013,1.6 Design 120cv Titanium,76000,295000,5.0,Hatchback,Buenos Aires Interior


In [33]:
cars.query('Make == "Ford" & Year >=2016')

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
35,Ford,Focus III,2016,2.0 Sedan Se Plus Mt,15022,550000,4.0,Sedán,Bs.as. G.b.a. Sur
49,Ford,Focus III,2016,1.6 S,39000,454800,5.0,Hatchback,Bs.as. G.b.a. Sur
57,Ford,Mondeo,2018,2.0 Se Ecoboost At 240cv,15000,1200000,4.0,Sedán,Entre Ríos
...,...,...,...,...,...,...,...,...,...
9949,Ford,Focus III,2017,1.6 S,10000,530000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9990,Ford,Focus III,2016,2.0 Se Plus Mt,48000,450000,5.0,Hatchback,Bs.as. G.b.a. Sur
9991,Ford,Ranger,2016,3.2 Cd 4x2 Xlt At Tdci 200cv,59000,790000,4.0,Pick-Up,Bs.as. G.b.a. Oeste
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal
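
Inside a `query()` string, the words `and` / `or` can be used instead of `&` / `|`, and a Python variable can be referenced by prefixing it with `@`. The sketch below uses a made-up frame; `demo` and `min_year` are hypothetical names for this example only.

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Make": ["Ford", "Ford", "Volkswagen", "Volkswagen", "Honda"],
    "Year": [2017, 2012, 2016, 2012, 2018],
})

min_year = 2016  # a normal Python variable, referenced as @min_year below

recent_fords = demo.query('Make == "Ford" and Year >= @min_year')
print(recent_fords)
```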


## 3. Indexing with `.loc[]`
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html




In [34]:
cars.loc[cars['Year'] >= 2016]


Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
29,Volkswagen,Saveiro,2017,1.6 Gp Ce 101cv Safety + Pack High,20000,370000,2.0,Pick-Up,Córdoba
35,Ford,Focus III,2016,2.0 Sedan Se Plus Mt,15022,550000,4.0,Sedán,Bs.as. G.b.a. Sur
49,Ford,Focus III,2016,1.6 S,39000,454800,5.0,Hatchback,Bs.as. G.b.a. Sur
...,...,...,...,...,...,...,...,...,...
9985,Volkswagen,Gol Trend,2016,1.6 Trendline 101cv,54100,270000,5.0,Hatchback,Buenos Aires Interior
9990,Ford,Focus III,2016,2.0 Se Plus Mt,48000,450000,5.0,Hatchback,Bs.as. G.b.a. Sur
9991,Ford,Ranger,2016,3.2 Cd 4x2 Xlt At Tdci 200cv,59000,790000,4.0,Pick-Up,Bs.as. G.b.a. Oeste
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal


In [15]:
# To answer our question:
cars.loc[(cars['Make'] == 'Ford') & (cars['Year'] >=2016)]

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
35,Ford,Focus III,2016,2.0 Sedan Se Plus Mt,15022,550000,4.0,Sedán,Bs.as. G.b.a. Sur
49,Ford,Focus III,2016,1.6 S,39000,454800,5.0,Hatchback,Bs.as. G.b.a. Sur
57,Ford,Mondeo,2018,2.0 Se Ecoboost At 240cv,15000,1200000,4.0,Sedán,Entre Ríos
...,...,...,...,...,...,...,...,...,...
9949,Ford,Focus III,2017,1.6 S,10000,530000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9990,Ford,Focus III,2016,2.0 Se Plus Mt,48000,450000,5.0,Hatchback,Bs.as. G.b.a. Sur
9991,Ford,Ranger,2016,3.2 Cd 4x2 Xlt At Tdci 200cv,59000,790000,4.0,Pick-Up,Bs.as. G.b.a. Oeste
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal


In [16]:
cars.loc[(cars['Year'] <=2018) & (cars['Year'] >=2016)]

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
8,Ford,Focus III,2017,1.6 S,17500,367000,5.0,Hatchback,Buenos Aires Interior
29,Volkswagen,Saveiro,2017,1.6 Gp Ce 101cv Safety + Pack High,20000,370000,2.0,Pick-Up,Córdoba
35,Ford,Focus III,2016,2.0 Sedan Se Plus Mt,15022,550000,4.0,Sedán,Bs.as. G.b.a. Sur
49,Ford,Focus III,2016,1.6 S,39000,454800,5.0,Hatchback,Bs.as. G.b.a. Sur
...,...,...,...,...,...,...,...,...,...
9985,Volkswagen,Gol Trend,2016,1.6 Trendline 101cv,54100,270000,5.0,Hatchback,Buenos Aires Interior
9990,Ford,Focus III,2016,2.0 Se Plus Mt,48000,450000,5.0,Hatchback,Bs.as. G.b.a. Sur
9991,Ford,Ranger,2016,3.2 Cd 4x2 Xlt At Tdci 200cv,59000,790000,4.0,Pick-Up,Bs.as. G.b.a. Oeste
9994,Ford,Focus III,2016,2.0 Se,18000,370000,5.0,Hatchback,Capital Federal


In [17]:
# Careful: with OR this condition is True for every year, so all rows are returned
cars.loc[(cars['Year'] <=2018) | (cars['Year'] >=2016)]

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
0,Volkswagen,Vento,2012,2.5 Luxury 170cv,99950,360000,4.0,Sedán,Córdoba
1,Ford,Ranger,2012,2.3 Cd Xl Plus 4x2,140000,320000,2.0,Pick-Up,Entre Ríos
2,Volkswagen,Fox,2011,1.6 Trendline,132000,209980,5.0,Hatchback,Bs.as. G.b.a. Sur
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
4,Volkswagen,Gol,2013,1.4 Power 83cv 3 p,107000,146000,3.0,Hatchback,Córdoba
...,...,...,...,...,...,...,...,...,...
9995,Ford,Focus III,2016,2.0 Se,67000,399000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9996,Volkswagen,Bora,2012,1.9 Trendline I 100cv,120000,240000,4.0,Sedán,Buenos Aires Interior
9997,Ford,Fiesta Kinetic Design,2012,1.6 Design 120cv Titanium,89000,250000,5.0,Hatchback,Tucumán
9998,Ford,Fiesta Kinetic Design,2013,1.6 Design 120cv Titanium,76000,295000,5.0,Hatchback,Buenos Aires Interior
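
Unlike plain bracket filtering, `.loc[]` also accepts a column selection as a second argument, so rows and columns can be chosen in one step. The sketch below uses a made-up frame (`demo` is a hypothetical name) and also shows `Series.between()`, which expresses a range such as 2016 to 2018 (both ends included) without combining two conditions.

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Make": ["Ford", "Ford", "Volkswagen", "Volkswagen", "Honda"],
    "Year": [2017, 2012, 2016, 2012, 2018],
    "Price": [798000, 142000, 270000, 360000, 500000],
})

# Rows from 2016 or later, but only the Make and Price columns
recent = demo.loc[demo["Year"] >= 2016, ["Make", "Price"]]

# Rows with 2016 <= Year <= 2018
in_range = demo.loc[demo["Year"].between(2016, 2018)]

print(recent)
```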


### Conclusion: 
There are 631 Ford cars produced in or after 2016 in our dataset.
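
The filtered tables show the matching rows, and the count itself can be read in a few equivalent ways. A minimal sketch on a made-up frame (so the number differs from the real dataset; `demo` and `mask` are hypothetical names):

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Make": ["Ford", "Ford", "Ford", "Volkswagen"],
    "Year": [2017, 2012, 2016, 2016],
})

mask = (demo["Make"] == "Ford") & (demo["Year"] >= 2016)

n_by_sum = int(mask.sum())        # True counts as 1, False as 0
n_by_len = len(demo[mask])        # number of rows in the filtered frame
n_by_shape = demo[mask].shape[0]  # first entry of (rows, columns)

print(n_by_sum, n_by_len, n_by_shape)
```

On the real data, the same three expressions applied to `cars` would give the count reported in the conclusion.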

## 2. Let's analyze cars based on price and kms by brand
 

1. How expensive are the car makes in general?
2. Which car make is the cheapest one? 
3. Which car make has on average the most Kms?

The `groupby` function is one of the most used functions of the pandas library. In addition to filtering, grouping data is very useful during exploration.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

In [20]:
# Mean price over all cars & brands:
cars.Price.mean()

349545.4545

In [22]:
# Mean price per car brand:
print(cars.groupby('Make')[['Price']].mean())
print(cars.groupby('Make')[['Price']].mean().round(1).sort_values('Price', ascending= False))

                       Price
Make                        
Chrysler       366845.833333
Ford           353041.877764
Honda          381520.603448
Mercedes Benz  512749.515625
Volkswagen     335441.534119
                  Price
Make                   
Mercedes Benz  512749.5
Honda          381520.6
Chrysler       366845.8
Ford           353041.9
Volkswagen     335441.5


We can also aggregate more than one column per group. For example, what are the average kms for each make?

In [23]:
cars.groupby('Make')[['Price','Kms']].mean().round(1)

Unnamed: 0_level_0,Price,Kms
Make,Unnamed: 1_level_1,Unnamed: 2_level_1
Chrysler,366845.8,79594.0
Ford,353041.9,69897.5
Honda,381520.6,81619.7
Mercedes Benz,512749.5,67358.7
Volkswagen,335441.5,77660.2


### Conclusion:
- Mercedes Benz is the most expensive car make on average.
- Volkswagen is the cheapest car make on average.
- Honda cars are the most used: they have the highest average kms.
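
Instead of reading the answers off the table by eye, `idxmin()` and `idxmax()` return the group label with the smallest or largest value. A minimal sketch on made-up numbers (the frame `demo` and its values are hypothetical, so the resulting makes are not the ones from the real dataset):

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Make": ["Ford", "Ford", "Volkswagen", "Volkswagen", "Honda"],
    "Price": [300000, 400000, 200000, 300000, 380000],
    "Kms": [70000, 60000, 80000, 70000, 90000],
})

means = demo.groupby("Make")[["Price", "Kms"]].mean()

cheapest_make = means["Price"].idxmin()        # make with the lowest mean price
most_expensive_make = means["Price"].idxmax()  # make with the highest mean price
most_driven_make = means["Kms"].idxmax()       # make with the highest mean kms

print(cheapest_make, most_expensive_make, most_driven_make)
```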

## 3. What are the average prices by brands and by the number of doors that cars have?

In [24]:
cars.groupby(['Make','Doors'])[['Price']].mean().round(2)

Unnamed: 0_level_0,Unnamed: 1_level_0,Price
Make,Doors,Unnamed: 2_level_1
Chrysler,5.0,366845.83
Ford,2.0,481503.91
Ford,3.0,158703.58
Ford,4.0,455984.9
Ford,5.0,331894.57
Honda,3.0,26000.0
Honda,4.0,323781.2
Honda,5.0,431716.49
Mercedes Benz,2.0,574315.37
Mercedes Benz,4.0,497913.05


### Conclusion:
- For Volkswagen, the most expensive cars have 4 doors.
- For Mercedes, the most expensive cars have 2 doors.
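
A result grouped by two columns can be easier to compare when one of the grouping levels is turned into columns with `unstack()`. The sketch below uses made-up prices (`demo`, `wide` and `top_doors` are hypothetical names); combinations that do not occur in the data show up as `NaN`.

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Make": ["Ford", "Ford", "Ford", "Volkswagen", "Volkswagen"],
    "Doors": [2.0, 4.0, 4.0, 3.0, 5.0],
    "Price": [480000, 450000, 470000, 150000, 300000],
})

by_make_doors = demo.groupby(["Make", "Doors"])["Price"].mean()

# One row per make, one column per door count
wide = by_make_doors.unstack("Doors")

# For each make, the door count with the highest mean price
top_doors = wide.idxmax(axis=1)

print(wide)
print(top_doors)
```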

## 4. How many unique models does each brand have?

In [26]:
cars.groupby('Make')[['Model']].nunique()

Unnamed: 0_level_0,Model
Make,Unnamed: 1_level_1
Chrysler,2
Ford,11
Honda,6
Mercedes Benz,15
Volkswagen,28


### Conclusion
- Volkswagen has the most variety: 28 different models.
- Chrysler has the least variety: only 2 models.

## 5. Which kind of car is the most expensive and which is the most popular?

The `agg()` method lets you apply one or more aggregations to specific columns.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.agg.html

In [29]:
print(cars.groupby('Kind')[['Price']].agg( ['mean','count']).round(1))
print(cars.groupby('Kind')[['Price']].mean().round(1))

                Price      
                 mean count
Kind                       
Cabriolet     79272.6    11
Coupé        543092.2    90
Furgón       702054.7    53
Hatchback    276554.8  4056
Minivan      650000.0     1
Monovolumen  270500.2   705
Pick-Up      545194.6  1547
Rural        330247.1    59
SUV          380708.0  1305
Sedán        338569.6  2173
                Price
Kind                 
Cabriolet     79272.6
Coupé        543092.2
Furgón       702054.7
Hatchback    276554.8
Minivan      650000.0
Monovolumen  270500.2
Pick-Up      545194.6
Rural        330247.1
SUV          380708.0
Sedán        338569.6


### Conclusion
- The most popular kind is Hatchback, followed by Sedán.
- The most expensive kind on average is Furgón, while the cheapest one is Cabriolet.
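
With several aggregations, the two-level column header in the output can be awkward to work with. One option is named aggregation, where each output column gets its own name. The sketch below uses made-up rows; `demo`, `mean_price` and `n_cars` are names chosen for this example.

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Kind": ["Hatchback", "Hatchback", "Hatchback", "Sedán", "Sedán", "Furgón"],
    "Price": [200000, 300000, 250000, 350000, 330000, 700000],
})

summary = (
    demo.groupby("Kind")
    .agg(mean_price=("Price", "mean"), n_cars=("Price", "count"))
    .sort_values("n_cars", ascending=False)  # most popular kind first
)

most_popular = summary["n_cars"].idxmax()
most_expensive = summary["mean_price"].idxmax()

print(summary)
```

Keeping the count next to the mean is useful when judging the averages: in the real output above, the Minivan mean comes from a single car and the Cabriolet mean from 11 cars.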

## BONUS: What are the min, max and average production years for Ford cars?

In [30]:
ford_cars = cars[cars['Make'] == 'Ford']
ford_cars['Year'].agg(['mean','min','max']).round(1)

mean    2013.5
min     2011.0
max     2018.0
Name: Year, dtype: float64

In [None]:
# The same in one line, without the intermediate variable:
#cars[cars['Make'] == 'Ford']['Year'].agg(['mean','min','max']).round(1)
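
If the same statistics are wanted for every make rather than only Ford, grouping first avoids writing one filter per make. A minimal sketch on made-up years (`demo` and `year_stats` are hypothetical names):

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Make": ["Ford", "Ford", "Ford", "Volkswagen", "Volkswagen"],
    "Year": [2011, 2014, 2018, 2012, 2016],
})

# One row per make, one column per statistic
year_stats = demo.groupby("Make")["Year"].agg(["min", "max", "mean"]).round(1)

print(year_stats)
```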

## Further Materials

Useful aggregation functions:
- `mean()`
- `median()`
- `var()`
- `std()`
- `min()`
- `max()`
- `count()`
- `sum()`
- `nunique()`
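
These functions can be called directly on a column or passed by name to `agg()`. The sketch below uses a made-up price column with one very expensive outlier to show why it can be worth looking at `median()` next to `mean()`; the name `prices` and the values are only for illustration.

```python
import pandas as pd

# Made-up prices with one outlier
prices = pd.Series([100, 200, 300, 400, 5000], name="Price")

mean_price = prices.mean()      # pulled upwards by the outlier
median_price = prices.median()  # the middle value, unaffected by the outlier
n_unique = prices.nunique()

# Several aggregations at once, passed by name
summary = prices.agg(["mean", "median", "min", "max", "count"])

print(summary)
```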

### Sorting Data

Let's say we would like to see the least used cars (in terms of kms).

We can use `sort_values()` for this purpose. (`sort_values()` sorts in ascending order by default.)



In [31]:
cars.sort_values('Kms').head(5)

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
5665,Volkswagen,Saveiro,2017,1.6 Gp Cs 101cv Safety,5094,320000,2.0,Pick-Up,Bs.as. G.b.a. Sur
8673,Ford,Ecosport,2017,1.6 S 110cv 4x2,5100,390000,5.0,SUV,Bs.as. G.b.a. Sur
8828,Volkswagen,Gol Trend,2017,1.6 Trendline 101cv 3p,5500,330000,3.0,Hatchback,Bs.as. G.b.a. Sur
2461,Ford,Focus III,2017,2.0 Se,5500,480000,5.0,Hatchback,Capital Federal
7910,Volkswagen,Saveiro,2017,1.6 Gp Cs 101cv Safety,5600,360000,2.0,Pick-Up,Bs.as. Costa Atlántica


To see the most used cars instead, we can reverse the order by passing `ascending=False` to `sort_values()`.

In [32]:
cars.sort_values('Kms', ascending=False).head(5)

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
5204,Ford,Ranger,2011,3.0 Cd Xl 4x2,218000,300000,2.0,Pick-Up,Bs.as. G.b.a. Sur
5793,Ford,Ranger,2011,3.0 Cd Superduty 4x4,215000,385000,2.0,Pick-Up,Bs.as. G.b.a. Sur
7369,Ford,Ranger,2011,3.0 Cd Xl Plus 4x2,215000,310000,2.0,Pick-Up,Córdoba
4029,Volkswagen,Amarok,2011,2.0 Cd Tdi 4x2 Trendline Ll17 1t8,214000,400000,4.0,Pick-Up,Chubut
2049,Volkswagen,Amarok,2011,2.0 Cd Tdi 4x4 Highline Pack 1hp,209898,550000,4.0,Pick-Up,Bs.as. G.b.a. Norte


**NOTE**: The `sort_values()` method does not overwrite the object. `cars` remains unaltered after calling it.

If you want to keep the sorted result, there are 2 ways to do it:


1.   Assign the result of `cars.sort_values()` to a variable (to `cars` itself to overwrite it, or to a new name as below)




In [33]:
sorted_cars = cars.sort_values('Kms')
sorted_cars.head(5)

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
5665,Volkswagen,Saveiro,2017,1.6 Gp Cs 101cv Safety,5094,320000,2.0,Pick-Up,Bs.as. G.b.a. Sur
8673,Ford,Ecosport,2017,1.6 S 110cv 4x2,5100,390000,5.0,SUV,Bs.as. G.b.a. Sur
8828,Volkswagen,Gol Trend,2017,1.6 Trendline 101cv 3p,5500,330000,3.0,Hatchback,Bs.as. G.b.a. Sur
2461,Ford,Focus III,2017,2.0 Se,5500,480000,5.0,Hatchback,Capital Federal
7910,Volkswagen,Saveiro,2017,1.6 Gp Cs 101cv Safety,5600,360000,2.0,Pick-Up,Bs.as. Costa Atlántica


2.   Use the `inplace` parameter of `.sort_values()`

In [34]:
cars.sort_values('Kms', ascending=False, inplace = True)
cars.head(5)

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
5204,Ford,Ranger,2011,3.0 Cd Xl 4x2,218000,300000,2.0,Pick-Up,Bs.as. G.b.a. Sur
5793,Ford,Ranger,2011,3.0 Cd Superduty 4x4,215000,385000,2.0,Pick-Up,Bs.as. G.b.a. Sur
7369,Ford,Ranger,2011,3.0 Cd Xl Plus 4x2,215000,310000,2.0,Pick-Up,Córdoba
4029,Volkswagen,Amarok,2011,2.0 Cd Tdi 4x2 Trendline Ll17 1t8,214000,400000,4.0,Pick-Up,Chubut
2049,Volkswagen,Amarok,2011,2.0 Cd Tdi 4x4 Highline Pack 1hp,209898,550000,4.0,Pick-Up,Bs.as. G.b.a. Norte
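
The two approaches should not be mixed. According to the pandas documentation, a call with `inplace=True` returns `None`, so writing `cars = cars.sort_values(..., inplace=True)` would replace the DataFrame with `None`. A minimal sketch on a made-up frame (`demo`, `untouched` and `result` are hypothetical names):

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({"Kms": [30, 10, 20]})
untouched = demo.copy()

# Way 1: the method returns a new, sorted frame; the original keeps its order
sorted_demo = untouched.sort_values("Kms")

# Way 2: the frame itself is reordered; the return value is not the sorted frame
result = demo.sort_values("Kms", inplace=True)

print(untouched["Kms"].tolist())
print(demo["Kms"].tolist())
print(result)
```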


Another way to sort data in Pandas is `sort_index()`. As its name suggests, this method sorts the rows by their index. Since `cars` was sorted by `Kms` in place above, sorting by index brings back the original row order.

In [35]:
cars.sort_index()

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
0,Volkswagen,Vento,2012,2.5 Luxury 170cv,99950,360000,4.0,Sedán,Córdoba
1,Ford,Ranger,2012,2.3 Cd Xl Plus 4x2,140000,320000,2.0,Pick-Up,Entre Ríos
2,Volkswagen,Fox,2011,1.6 Trendline,132000,209980,5.0,Hatchback,Bs.as. G.b.a. Sur
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
4,Volkswagen,Gol,2013,1.4 Power 83cv 3 p,107000,146000,3.0,Hatchback,Córdoba
...,...,...,...,...,...,...,...,...,...
9995,Ford,Focus III,2016,2.0 Se,67000,399000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9996,Volkswagen,Bora,2012,1.9 Trendline I 100cv,120000,240000,4.0,Sedán,Buenos Aires Interior
9997,Ford,Fiesta Kinetic Design,2012,1.6 Design 120cv Titanium,89000,250000,5.0,Hatchback,Tucumán
9998,Ford,Fiesta Kinetic Design,2013,1.6 Design 120cv Titanium,76000,295000,5.0,Hatchback,Buenos Aires Interior


In [36]:
cars.sort_index(inplace= True) 
#cars.sort_index(ascending = False)
cars

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
0,Volkswagen,Vento,2012,2.5 Luxury 170cv,99950,360000,4.0,Sedán,Córdoba
1,Ford,Ranger,2012,2.3 Cd Xl Plus 4x2,140000,320000,2.0,Pick-Up,Entre Ríos
2,Volkswagen,Fox,2011,1.6 Trendline,132000,209980,5.0,Hatchback,Bs.as. G.b.a. Sur
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
4,Volkswagen,Gol,2013,1.4 Power 83cv 3 p,107000,146000,3.0,Hatchback,Córdoba
...,...,...,...,...,...,...,...,...,...
9995,Ford,Focus III,2016,2.0 Se,67000,399000,5.0,Hatchback,Bs.as. G.b.a. Oeste
9996,Volkswagen,Bora,2012,1.9 Trendline I 100cv,120000,240000,4.0,Sedán,Buenos Aires Interior
9997,Ford,Fiesta Kinetic Design,2012,1.6 Design 120cv Titanium,89000,250000,5.0,Hatchback,Tucumán
9998,Ford,Fiesta Kinetic Design,2013,1.6 Design 120cv Titanium,76000,295000,5.0,Hatchback,Buenos Aires Interior
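
Sorting by value keeps each row's original index label, which is why `sort_index()` can undo it. The sketch below shows this on a made-up three-row frame (`demo`, `by_price`, `restored` and `reversed_rows` are hypothetical names), together with the `ascending=False` variant.

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({"Model": ["A", "B", "C"], "Price": [300, 100, 200]})

# Sorting by value reorders the rows, but each row keeps its index label
by_price = demo.sort_values("Price", ascending=False)

# Sorting by index brings the rows back into their original order
restored = by_price.sort_index()

# The index can also be sorted in descending order
reversed_rows = demo.sort_index(ascending=False)

print(by_price.index.tolist())
print(restored.index.tolist())
```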


### Exercise
Find the **10 cars** with the highest prices. Please use `sort_values()` and `head()`.
While finding the ten most expensive cars, overwrite the dataset, and then return to the original sorting by using `sort_index()`.

In [37]:
cars.head()

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
0,Volkswagen,Vento,2012,2.5 Luxury 170cv,99950,360000,4.0,Sedán,Córdoba
1,Ford,Ranger,2012,2.3 Cd Xl Plus 4x2,140000,320000,2.0,Pick-Up,Entre Ríos
2,Volkswagen,Fox,2011,1.6 Trendline,132000,209980,5.0,Hatchback,Bs.as. G.b.a. Sur
3,Ford,Ranger,2017,3.2 Cd Xls Tdci 200cv Automática,13000,798000,4.0,Pick-Up,Neuquén
4,Volkswagen,Gol,2013,1.4 Power 83cv 3 p,107000,146000,3.0,Hatchback,Córdoba


In [38]:
cars.sort_values('Price', ascending=False).head(10)

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
8007,Mercedes Benz,Clase GLC,2017,2.0 Glc250 300 4matic Atomático,18000,2800000,5.0,SUV,Bs.as. G.b.a. Sur
2632,Ford,Mustang,2018,5.0 Gt 421cv,7000,2350000,2.0,Coupé,Bs.as. G.b.a. Norte
9520,Mercedes Benz,Clase E,2014,3.5 E350 Avantgarde Sport B.eff At,34000,1725000,4.0,Sedán,Córdoba
5404,Mercedes Benz,ML,2013,3.5 Ml350 4matic Sport B.efficiency,72000,1499990,5.0,SUV,Capital Federal
2032,Mercedes Benz,Clase E,2014,3.5 E350 Avantgarde Sport B.eff At,59000,1390000,4.0,Sedán,Capital Federal
7897,Volkswagen,Tiguan Allspace,2018,1.4 Tsi Trendline 150cv Dsg,10000,1310000,5.0,SUV,Bs.as. G.b.a. Oeste
5831,Mercedes Benz,ML,2011,3.5 Ml350 4matic Sport Facelift,31700,1200000,5.0,SUV,Neuquén
57,Ford,Mondeo,2018,2.0 Se Ecoboost At 240cv,15000,1200000,4.0,Sedán,Entre Ríos
6434,Mercedes Benz,Clase A,2014,2.0 A 250 At Sport B.efficiency,48255,1170000,5.0,Hatchback,Bs.as. G.b.a. Norte
9187,Mercedes Benz,Clase A,2014,2.0 A 250 At Sport B.efficiency,48455,1170000,5.0,Hatchback,Bs.as. G.b.a. Oeste


In [39]:
cars.sort_values('Price', ascending=False, inplace = True) 
# or: cars = cars.sort_values('Price', ascending=False)
# (do not assign the .head(10) result to cars, or only 10 rows would remain)
cars.head(10)

Unnamed: 0,Make,Model,Year,Variant,Kms,Price,Doors,Kind,Location
8007,Mercedes Benz,Clase GLC,2017,2.0 Glc250 300 4matic Atomático,18000,2800000,5.0,SUV,Bs.as. G.b.a. Sur
2632,Ford,Mustang,2018,5.0 Gt 421cv,7000,2350000,2.0,Coupé,Bs.as. G.b.a. Norte
9520,Mercedes Benz,Clase E,2014,3.5 E350 Avantgarde Sport B.eff At,34000,1725000,4.0,Sedán,Córdoba
5404,Mercedes Benz,ML,2013,3.5 Ml350 4matic Sport B.efficiency,72000,1499990,5.0,SUV,Capital Federal
2032,Mercedes Benz,Clase E,2014,3.5 E350 Avantgarde Sport B.eff At,59000,1390000,4.0,Sedán,Capital Federal
7897,Volkswagen,Tiguan Allspace,2018,1.4 Tsi Trendline 150cv Dsg,10000,1310000,5.0,SUV,Bs.as. G.b.a. Oeste
5831,Mercedes Benz,ML,2011,3.5 Ml350 4matic Sport Facelift,31700,1200000,5.0,SUV,Neuquén
57,Ford,Mondeo,2018,2.0 Se Ecoboost At 240cv,15000,1200000,4.0,Sedán,Entre Ríos
6434,Mercedes Benz,Clase A,2014,2.0 A 250 At Sport B.efficiency,48255,1170000,5.0,Hatchback,Bs.as. G.b.a. Norte
9187,Mercedes Benz,Clase A,2014,2.0 A 250 At Sport B.efficiency,48455,1170000,5.0,Hatchback,Bs.as. G.b.a. Oeste
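
The last step of the exercise, returning to the original order, can be sketched as follows on a made-up frame (`demo` and `work` are hypothetical names). As an alternative that never reorders the data, `nlargest()` returns the top rows directly.

```python
import pandas as pd

# Made-up rows, not taken from the real dataset
demo = pd.DataFrame({
    "Model": ["A", "B", "C", "D", "E"],
    "Price": [300, 500, 100, 400, 200],
})

# Route from the exercise: sort in place, read the top rows, restore the order
work = demo.copy()
work.sort_values("Price", ascending=False, inplace=True)
top3_models = work.head(3)["Model"].tolist()
work.sort_index(inplace=True)  # back to the original row order

# Alternative: nlargest() leaves the frame itself untouched
top3 = demo.nlargest(3, "Price")

print(top3_models)
print(top3)
```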
