---
# Data Science and Artificial Intelligence Practicum
## Module 2.2. Data Analysis: Pandas
---

## 2.2.5 - Arithmetics, Sorting, Ranking

In [1]:
import pandas as pd
import numpy as np

#### Arithmetic operators

- `add`, `radd` - Addition (+)
- `sub`, `rsub` - Subtraction (-)
- `mul`, `rmul` - Multiplication (*)
- `div`, `rdiv` - Floating-point division (/)
- `floordiv`, `rfloordiv` - Floor division (//)
- `mod`, `rmod` - Modulo (%)
- `pow`, `rpow` - Exponential power (\*\*)

All of these methods accept a `fill_value` argument, which substitutes a value for data that is missing in one of the inputs.
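The `r`-prefixed methods are the reflected versions of the same operators: the operands are swapped, so `s.rsub(x)` computes `x - s`. A minimal sketch, using a small made-up Series `s` that is not part of the notebook's data:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
s = pd.Series([1.0, 2.0, 4.0], index=['a', 'b', 'c'])

plain = s.sub(10)       # same as s - 10
reflected = s.rsub(10)  # same as 10 - s
inverse = s.rdiv(1)     # same as 1 / s

print(plain.tolist())      # [-9.0, -8.0, -6.0]
print(reflected.tolist())  # [9.0, 8.0, 6.0]
print(inverse.tolist())    # [1.0, 0.5, 0.25]
```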

### Arithmetics with `Series`

In [2]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

In [3]:
s1

a    7.3
c   -2.5
d    3.4
e    1.5
dtype: float64

In [4]:
s2

a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64

The two objects are aligned on their index labels. If a label is present in only one of them, the result for that label is `NaN`.

In [5]:
s1 + s2

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
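If the `NaN` values are not wanted, the method form with `fill_value` treats a label that is missing on one side as the given value. A short sketch with the same `s1` and `s2`; the variable name `total` is only for illustration:

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# Labels present on only one side are paired with 0 instead of NaN
total = s1.add(s2, fill_value=0)
print(total)
```

With `fill_value=0`, the labels `d`, `f` and `g` keep the value from the Series that contains them.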

### Arithmetics with `DataFrame`

In [6]:
df1 = pd.DataFrame(np.arange(9.).reshape(3,3), columns=list('bcd'), index=['Olma','Anor','Uzum'])
df2 = pd.DataFrame(np.arange(12.).reshape(4,3), columns=list('abc'), index=['Olma','Anor','Qovun','Anjir'])

In [7]:
df1

Unnamed: 0,b,c,d
Olma,0.0,1.0,2.0
Anor,3.0,4.0,5.0
Uzum,6.0,7.0,8.0


In [8]:
df2

Unnamed: 0,a,b,c
Olma,0.0,1.0,2.0
Anor,3.0,4.0,5.0
Qovun,6.0,7.0,8.0
Anjir,9.0,10.0,11.0


In [9]:
df1 + df2

Unnamed: 0,a,b,c,d
Anjir,,,,
Anor,,7.0,9.0,
Olma,,1.0,3.0,
Qovun,,,,
Uzum,,,,


#### `DataFrame.add`
Get Addition of dataframe and other, element-wise (binary operator add).

In [10]:
df1.add(df2)

Unnamed: 0,a,b,c,d
Anjir,,,,
Anor,,7.0,9.0,
Olma,,1.0,3.0,
Qovun,,,,
Uzum,,,,


##### `fill_value` : *float or None, default None*
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.


In [11]:
df1.add(df2, fill_value=0)

Unnamed: 0,a,b,c,d
Anjir,9.0,10.0,11.0,
Anor,3.0,7.0,9.0,5.0
Olma,0.0,1.0,3.0,2.0
Qovun,6.0,7.0,8.0,
Uzum,,6.0,7.0,8.0
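The cells that are still empty are the ones that exist in neither frame. One way to check this, sketched with the same `df1` and `df2` (the names `result` and `still_missing` are illustrative):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9.).reshape(3, 3), columns=list('bcd'),
                   index=['Olma', 'Anor', 'Uzum'])
df2 = pd.DataFrame(np.arange(12.).reshape(4, 3), columns=list('abc'),
                   index=['Olma', 'Anor', 'Qovun', 'Anjir'])

result = df1.add(df2, fill_value=0)

# Cells missing from both frames stay NaN even with fill_value
still_missing = result.isna()
print(int(still_missing.sum().sum()))         # 3
# df2 has no 'Uzum' row and df1 has no 'a' column
print(bool(still_missing.loc['Uzum', 'a']))   # True
```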


#### `DataFrame.sub`
Get Subtraction of dataframe and other, element-wise (binary operator sub).

In [12]:
df1.sub(2)  # subtract 2 from each df element

Unnamed: 0,b,c,d
Olma,-2.0,-1.0,0.0
Anor,1.0,2.0,3.0
Uzum,4.0,5.0,6.0


In [13]:
df2.sub(df1)  # subtract df1 from df2

Unnamed: 0,a,b,c,d
Anjir,,,,
Anor,,1.0,1.0,
Olma,,1.0,1.0,
Qovun,,,,
Uzum,,,,


#### `DataFrame.rsub`
Get Subtraction of dataframe and other, element-wise, with the operands reversed (binary operator rsub): `df.rsub(other)` computes `other - df`.

In [14]:
df1.rsub(df2, fill_value=0) # subtract df1 from df2

Unnamed: 0,a,b,c,d
Anjir,9.0,10.0,11.0,
Anor,3.0,1.0,1.0,-5.0
Olma,0.0,1.0,1.0,-2.0
Qovun,6.0,7.0,8.0,
Uzum,,-6.0,-7.0,-8.0
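Since `rsub` only swaps the operands, the same result should be obtainable by calling `sub` on the other frame. A small sketch that compares the two calls (the names `left` and `right` are illustrative):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9.).reshape(3, 3), columns=list('bcd'),
                   index=['Olma', 'Anor', 'Uzum'])
df2 = pd.DataFrame(np.arange(12.).reshape(4, 3), columns=list('abc'),
                   index=['Olma', 'Anor', 'Qovun', 'Anjir'])

left = df1.rsub(df2, fill_value=0)   # df2 - df1, called on df1
right = df2.sub(df1, fill_value=0)   # df2 - df1, called on df2

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(left, right, check_like=True)
print(left.loc['Uzum'].tolist())
```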


### Arithmetics with `DataFrame` and `Series`

In [15]:
df = pd.DataFrame(np.arange(1,13.).reshape(3,4), index=list('abc'), columns=['Olma','Anor','Qovun','Anjir'])
df

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,1.0,2.0,3.0,4.0
b,5.0,6.0,7.0,8.0
c,9.0,10.0,11.0,12.0


#### add `Series` to `DataFrame` by columns

In [16]:
obj = pd.Series([4, 3, 2, 1], index=df.columns)
obj

Olma     4
Anor     3
Qovun    2
Anjir    1
dtype: int64

In [17]:
df + obj

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,5.0,5.0,5.0,5.0
b,9.0,9.0,9.0,9.0
c,13.0,13.0,13.0,13.0


#### add `Series` to `DataFrame` by rows

In [18]:
obj0 = pd.Series([4, 6, 8], index=df.index)
obj0

a    4
b    6
c    8
dtype: int64

In [21]:
df.add(obj0, axis='rows')

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,5.0,6.0,7.0,8.0
b,11.0,12.0,13.0,14.0
c,17.0,18.0,19.0,20.0


In [22]:
df.add(obj0)  # default axis='columns'

Unnamed: 0,Anjir,Anor,Olma,Qovun,a,b,c
a,,,,,,,
b,,,,,,,
c,,,,,,,
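The all-`NaN` result above comes from label matching: with the default axis, the Series labels `a`, `b`, `c` are compared with the column names, none of them match, and the result contains the union of both label sets. A sketch that contrasts the two calls (the names `by_columns` and `by_rows` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13.).reshape(3, 4), index=list('abc'),
                  columns=['Olma', 'Anor', 'Qovun', 'Anjir'])
obj0 = pd.Series([4, 6, 8], index=df.index)

by_columns = df.add(obj0)             # Series labels matched against column names
by_rows = df.add(obj0, axis='index')  # Series labels matched against row labels

print(by_columns.columns.tolist())          # union of column names and Series labels
print(bool(by_columns.isna().all().all()))  # True: no label matched
print(by_rows.loc['b'].tolist())            # [11.0, 12.0, 13.0, 14.0]
```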


In [23]:
df - obj['Olma']

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,-3.0,-2.0,-1.0,0.0
b,1.0,2.0,3.0,4.0
c,5.0,6.0,7.0,8.0


In [24]:
df.sub(obj['Anor'])

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,-2.0,-1.0,0.0,1.0
b,2.0,3.0,4.0,5.0
c,6.0,7.0,8.0,9.0


### Applying functions to `DataFrame`

In [31]:
df = pd.DataFrame(np.random.randn(3,4),
                  index=list('abc'),
                  columns=['Olma','Anor','Qovun','Anjir']
                  )
df

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,2.104853,-1.915365,0.783849,-0.627147
b,1.345325,-0.722568,0.76859,-0.508942
c,-1.015794,0.17049,-0.024809,-1.087766


#### apply function to each element in `df`

In [32]:
np.abs(df)

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,2.104853,1.915365,0.783849,0.627147
b,1.345325,0.722568,0.76859,0.508942
c,1.015794,0.17049,0.024809,1.087766


In [33]:
np.round(df)

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,2.0,-2.0,1.0,-1.0
b,1.0,-1.0,1.0,-1.0
c,-1.0,0.0,-0.0,-1.0


#### by `DataFrame` methods

In [35]:
df.mean()

Olma     0.811461
Anor    -0.822481
Qovun    0.509210
Anjir   -0.741285
dtype: float64

In [36]:
df.abs()

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,2.104853,1.915365,0.783849,0.627147
b,1.345325,0.722568,0.76859,0.508942
c,1.015794,0.17049,0.024809,1.087766


#### `apply` - apply a function along an axis of the DataFrame

In [39]:
df.apply(np.mean)  # mean of each column

Olma     0.811461
Anor    -0.822481
Qovun    0.509210
Anjir   -0.741285
dtype: float64

In [46]:
df.apply(np.sum, axis='columns')

a    0.346189
b    0.882405
c   -1.957879
dtype: float64

In [48]:
# subtract min element from max
func = lambda x: x.max() - x.min()

In [60]:
df.apply(func, axis='rows')

Olma     3.120646
Anor     2.085855
Qovun    0.808658
Anjir    0.578823
dtype: float64

In [61]:
def foo(x):
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

In [64]:
df.apply(foo)

Unnamed: 0,Olma,Anor,Qovun,Anjir
min,-1.015794,-1.915365,-0.024809,-1.087766
max,2.104853,0.17049,0.783849,-0.508942
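For common reductions such as min and max, `DataFrame.agg` with a list of function names gives a table of the same shape without a helper function. A sketch under the assumption of freshly generated random data; the seed is arbitrary, so the values differ from the output above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((3, 4)), index=list('abc'),
                  columns=['Olma', 'Anor', 'Qovun', 'Anjir'])

def foo(x):
    # Return several statistics per column as a labelled Series
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

via_apply = df.apply(foo)
via_agg = df.agg(['min', 'max'])  # one row per function name

print(via_agg.index.tolist())  # ['min', 'max']
```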


#### `applymap` - apply a function to a DataFrame elementwise

In [66]:
df = pd.DataFrame(np.arange(1,13).reshape(3,4), index=list('abc'), columns=['Olma','Anor','Qovun','Anjir'])
df

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,1,2,3,4
b,5,6,7,8
c,9,10,11,12


In [67]:
df.applymap(lambda x: x**2)

Unnamed: 0,Olma,Anor,Qovun,Anjir
a,1,4,9,16
b,25,36,49,64
c,81,100,121,144
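In pandas 2.1 and later, `applymap` is deprecated in favor of `DataFrame.map`. The sketch below picks whichever method is available; the `hasattr` check is just one possible way to stay compatible, and the names `elementwise`, `squared` and `vectorized` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13).reshape(3, 4), index=list('abc'),
                  columns=['Olma', 'Anor', 'Qovun', 'Anjir'])

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
elementwise = df.map if hasattr(df, 'map') else df.applymap
squared = elementwise(lambda x: x ** 2)

# For simple arithmetic, a vectorized expression gives the same values
vectorized = df ** 2
print(bool((squared == vectorized).all().all()))  # True
```

For plain arithmetic like this, the vectorized form is usually the simpler choice; elementwise mapping is more useful for functions that have no vectorized equivalent.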


### Sorting

In [68]:
obj = pd.Series([7, 1, 4, 9], index=['d', 'a', 'b', 'c'])
obj

d    7
a    1
b    4
c    9
dtype: int64

#### `Series.sort_index`

In [69]:
# sort Series by index labels
obj.sort_index()

a    1
b    4
c    9
d    7
dtype: int64

#### `Series.sort_values`

In [70]:
# sort by the values
obj.sort_values()

a    1
b    4
d    7
c    9
dtype: int64
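`sort_values` also accepts `ascending` and `na_position`. A brief sketch on a made-up variant of `obj` that contains one missing value (the `NaN` is added here for illustration and is not in the notebook's data):

```python
import numpy as np
import pandas as pd

# Hypothetical variant of obj with one missing value
obj = pd.Series([7, np.nan, 4, 9], index=['d', 'a', 'b', 'c'])

desc = obj.sort_values(ascending=False)            # largest first, NaN last by default
nan_first = obj.sort_values(na_position='first')   # NaN first, then ascending

print(desc.index.tolist())       # ['c', 'd', 'b', 'a']
print(nan_first.index.tolist())  # ['a', 'b', 'd', 'c']
```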

#### `DataFrame.sort_index`

In [71]:
df = pd.read_csv('https://raw.githubusercontent.com/anvarnarz/praktikum_datasets/main/usa_cars.csv', index_col=0)
df.head()

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition
0,6300,toyota,cruiser,2008,clean vehicle,274117,black,jtezu11f88k007763,159348797,new jersey,usa,10 days left
1,2899,ford,se,2011,clean vehicle,190552,silver,2fmdk3gc4bbb02217,166951262,tennessee,usa,6 days left
2,5350,dodge,mpv,2018,clean vehicle,39590,silver,3c4pdcgg5jt346413,167655728,georgia,usa,2 days left
3,25000,ford,door,2014,clean vehicle,64146,blue,1ftfw1et4efc23745,167753855,virginia,usa,22 hours left
4,27700,chevrolet,1500,2018,clean vehicle,6654,red,3gcpcrec2jg473991,167763266,florida,usa,22 hours left


In [72]:
df.sort_index()

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition
0,6300,toyota,cruiser,2008,clean vehicle,274117,black,jtezu11f88k007763,159348797,new jersey,usa,10 days left
1,2899,ford,se,2011,clean vehicle,190552,silver,2fmdk3gc4bbb02217,166951262,tennessee,usa,6 days left
2,5350,dodge,mpv,2018,clean vehicle,39590,silver,3c4pdcgg5jt346413,167655728,georgia,usa,2 days left
3,25000,ford,door,2014,clean vehicle,64146,blue,1ftfw1et4efc23745,167753855,virginia,usa,22 hours left
4,27700,chevrolet,1500,2018,clean vehicle,6654,red,3gcpcrec2jg473991,167763266,florida,usa,22 hours left
...,...,...,...,...,...,...,...,...,...,...,...,...
2494,7800,nissan,versa,2019,clean vehicle,23609,red,3n1cn7ap9kl880319,167722715,california,usa,1 days left
2495,9200,nissan,versa,2018,clean vehicle,34553,silver,3n1cn7ap5jl884088,167762225,florida,usa,21 hours left
2496,9200,nissan,versa,2018,clean vehicle,31594,silver,3n1cn7ap9jl884191,167762226,florida,usa,21 hours left
2497,9200,nissan,versa,2018,clean vehicle,32557,black,3n1cn7ap3jl883263,167762227,florida,usa,2 days left


In [73]:
df.sort_index(axis=1)  # sort by column names

Unnamed: 0,brand,color,condition,country,lot,mileage,model,price,state,title_status,vin,year
0,toyota,black,10 days left,usa,159348797,274117,cruiser,6300,new jersey,clean vehicle,jtezu11f88k007763,2008
1,ford,silver,6 days left,usa,166951262,190552,se,2899,tennessee,clean vehicle,2fmdk3gc4bbb02217,2011
2,dodge,silver,2 days left,usa,167655728,39590,mpv,5350,georgia,clean vehicle,3c4pdcgg5jt346413,2018
3,ford,blue,22 hours left,usa,167753855,64146,door,25000,virginia,clean vehicle,1ftfw1et4efc23745,2014
4,chevrolet,red,22 hours left,usa,167763266,6654,1500,27700,florida,clean vehicle,3gcpcrec2jg473991,2018
...,...,...,...,...,...,...,...,...,...,...,...,...
2494,nissan,red,1 days left,usa,167722715,23609,versa,7800,california,clean vehicle,3n1cn7ap9kl880319,2019
2495,nissan,silver,21 hours left,usa,167762225,34553,versa,9200,florida,clean vehicle,3n1cn7ap5jl884088,2018
2496,nissan,silver,21 hours left,usa,167762226,31594,versa,9200,florida,clean vehicle,3n1cn7ap9jl884191,2018
2497,nissan,black,2 days left,usa,167762227,32557,versa,9200,florida,clean vehicle,3n1cn7ap3jl883263,2018


In [76]:
df.sort_index(ascending=False)  # reverse order

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition
2498,9200,nissan,versa,2018,clean vehicle,31371,silver,3n1cn7ap4jl884311,167762228,florida,usa,21 hours left
2497,9200,nissan,versa,2018,clean vehicle,32557,black,3n1cn7ap3jl883263,167762227,florida,usa,2 days left
2496,9200,nissan,versa,2018,clean vehicle,31594,silver,3n1cn7ap9jl884191,167762226,florida,usa,21 hours left
2495,9200,nissan,versa,2018,clean vehicle,34553,silver,3n1cn7ap5jl884088,167762225,florida,usa,21 hours left
2494,7800,nissan,versa,2019,clean vehicle,23609,red,3n1cn7ap9kl880319,167722715,california,usa,1 days left
...,...,...,...,...,...,...,...,...,...,...,...,...
4,27700,chevrolet,1500,2018,clean vehicle,6654,red,3gcpcrec2jg473991,167763266,florida,usa,22 hours left
3,25000,ford,door,2014,clean vehicle,64146,blue,1ftfw1et4efc23745,167753855,virginia,usa,22 hours left
2,5350,dodge,mpv,2018,clean vehicle,39590,silver,3c4pdcgg5jt346413,167655728,georgia,usa,2 days left
1,2899,ford,se,2011,clean vehicle,190552,silver,2fmdk3gc4bbb02217,166951262,tennessee,usa,6 days left


#### `DataFrame.sort_values`

In [79]:
df.sort_values(by='price', ascending=False)

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition
502,84900,mercedes-benz,sl-class,2017,clean vehicle,25302,silver,wddjk7ea3hf044968,167607883,florida,usa,2 days left
1340,74000,ford,drw,2019,clean vehicle,10536,no_color,1ft8w4dt6ked32656,167780682,illinois,usa,2 days left
1336,70000,ford,drw,2019,clean vehicle,9643,no_color,1ft8w3dt3kee48276,167780680,illinois,usa,2 days left
277,67000,dodge,challenger,2019,clean vehicle,10944,blue,2c3cdzl97kh518237,167759490,ohio,usa,21 hours left
1215,65500,ford,srw,2019,clean vehicle,6500,black,1ft7w2bt0kec44818,167718954,indiana,usa,21 hours left
...,...,...,...,...,...,...,...,...,...,...,...,...
141,0,dodge,van,2008,salvage insurance,177948,orange,2d8hn44h88r669549,167756157,utah,usa,2 days left
391,0,cadillac,coupe,2000,salvage insurance,105169,white,1g6el12y9yu148063,167651218,virginia,usa,9 days left
285,0,ford,door,2000,salvage insurance,124969,black,1fafp34p7yw270338,167251902,oklahoma,usa,17 hours left
290,0,mazda,door,2009,salvage insurance,117541,gray,jm3er293590215768,167543177,indiana,usa,16 hours left
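`by` can also be a list of columns, with one `ascending` flag per column. A minimal sketch on a small hand-made frame `cars` (hypothetical rows, so that the example runs without downloading the dataset):

```python
import pandas as pd

# Hypothetical rows, for illustration only
cars = pd.DataFrame({
    'brand': ['ford', 'nissan', 'ford', 'dodge'],
    'year': [2019, 2018, 2014, 2019],
    'price': [74000, 9200, 25000, 67000],
})

# Newest cars first; within the same year, cheapest first
ordered = cars.sort_values(by=['year', 'price'], ascending=[False, True])
print(ordered['price'].tolist())  # [67000, 74000, 9200, 25000]
```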


### Ranking
`DataFrame.rank` - compute numerical data ranks (1 through n) along axis
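How ties are ranked depends on the `method` argument. By default, tied values share the average of their positions, which is why fractional ranks such as 4.5 can appear. A small sketch on a made-up Series `s` (not from the cars dataset):

```python
import pandas as pd

# Hypothetical data with two pairs of tied values
s = pd.Series([7, -5, 7, 4, 2, 0, 4])

avg = s.rank()                   # ties get the average position (default)
low = s.rank(method='min')       # ties get the lowest position
first = s.rank(method='first')   # ties ranked in order of appearance
dense = s.rank(method='dense')   # like 'min', but without gaps between groups

print(avg.tolist())    # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(low.tolist())    # [6.0, 1.0, 6.0, 4.0, 3.0, 2.0, 4.0]
print(first.tolist())  # [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]
print(dense.tolist())  # [5.0, 1.0, 5.0, 4.0, 3.0, 2.0, 4.0]
```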

In [87]:
# rank by year
df['rating'] = df['year'].rank()
df

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition,rating
0,6300,toyota,cruiser,2008,clean vehicle,274117,black,jtezu11f88k007763,159348797,new jersey,usa,10 days left,70.5
1,2899,ford,se,2011,clean vehicle,190552,silver,2fmdk3gc4bbb02217,166951262,tennessee,usa,6 days left,115.0
2,5350,dodge,mpv,2018,clean vehicle,39590,silver,3c4pdcgg5jt346413,167655728,georgia,usa,2 days left,1362.0
3,25000,ford,door,2014,clean vehicle,64146,blue,1ftfw1et4efc23745,167753855,virginia,usa,22 hours left,336.5
4,27700,chevrolet,1500,2018,clean vehicle,6654,red,3gcpcrec2jg473991,167763266,florida,usa,22 hours left,1362.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2494,7800,nissan,versa,2019,clean vehicle,23609,red,3n1cn7ap9kl880319,167722715,california,usa,1 days left,2005.5
2495,9200,nissan,versa,2018,clean vehicle,34553,silver,3n1cn7ap5jl884088,167762225,florida,usa,21 hours left,1362.0
2496,9200,nissan,versa,2018,clean vehicle,31594,silver,3n1cn7ap9jl884191,167762226,florida,usa,21 hours left,1362.0
2497,9200,nissan,versa,2018,clean vehicle,32557,black,3n1cn7ap3jl883263,167762227,florida,usa,2 days left,1362.0


In [89]:
df.sort_values("year")

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition,rating
32,29800,chevrolet,camaro,1973,clean vehicle,46226,red,1q87t3n166389,167763370,pennsylvania,usa,22 hours left,1.0
405,25,ford,door,1984,salvage insurance,41577,white,2ftcf15y9eca14589,167611661,arkansas,usa,17 hours left,2.0
545,0,gmc,door,1993,salvage insurance,0,light blue,1gkfk16k5pj701631,167358601,colorado,usa,18 hours left,3.0
362,25,ford,pickup,1994,salvage insurance,206162,white,1ftdf15y2rnb12612,167361489,georgia,usa,2 days left,4.5
322,0,ford,chassis,1994,salvage insurance,0,green,1fdee14n7rha47894,167359174,california,usa,19 hours left,4.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...
426,55600,lexus,gx,2020,clean vehicle,8186,silver,jtjam7bx4l5251250,167605747,florida,usa,2 days left,2475.5
806,28300,chevrolet,colorado,2020,clean vehicle,13886,silver,1gcgtcen7l1101496,167805497,indiana,usa,2 days left,2475.5
2322,15000,nissan,altima,2020,clean vehicle,14320,blue,1n4bl4cv4lc134145,167656212,ohio,usa,2 days left,2475.5
2344,20100,nissan,rogue,2020,clean vehicle,10875,blue,jn8at2mvxlw103235,167614842,minnesota,usa,14 hours left,2475.5


In [91]:
# rank by mileage
df['rating'] = df['mileage'].rank()
df.sort_values('mileage')

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition,rating
545,0,gmc,door,1993,salvage insurance,0,light blue,1gkfk16k5pj701631,167358601,colorado,usa,18 hours left,3.5
504,100,peterbilt,truck,2012,salvage insurance,0,blue,1xp4d49x1cd144875,167529787,florida,usa,17 hours left,3.5
1619,650,ford,door,2017,salvage insurance,0,black,1fadp3k21hl268441,167651911,california,usa,2 days left,3.5
309,0,chevrolet,door,2004,salvage insurance,0,maroon,3gnek12t74g240524,167418651,wyoming,usa,18 hours left,3.5
1236,4200,ford,door,2013,clean vehicle,0,no_color,1fadp3j23dl155179,167773673,pennsylvania,usa,2 days left,3.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...
531,2000,chevrolet,pickup,2003,clean vehicle,507985,red,1gcgc24u53z128586,167781223,wisconsin,usa,21 hours left,2495.0
490,475,peterbilt,truck,2012,salvage insurance,902041,gold,1xp4d49x9cd123630,167529786,florida,usa,17 hours left,2496.0
516,0,peterbilt,truck,2009,salvage insurance,982486,blue,1xp7d49x09d784257,167529788,florida,usa,17 hours left,2497.0
1827,3200,ford,door,2013,clean vehicle,999999,silver,1fadp3k21dl266148,167727773,south carolina,usa,21 hours left,2498.0


In [94]:
# rank by price
df['rating'] = df['price'].rank(method='min')
df.sort_values('price')

Unnamed: 0,price,brand,model,year,title_status,mileage,color,vin,lot,state,country,condition,rating
410,0,chevrolet,door,1995,salvage insurance,274706,green,2gcec19h8s1195266,167425634,arizona,usa,2 days left,1.0
330,0,ford,door,1996,salvage insurance,296860,green,1falp62w5th144314,167359712,california,usa,19 hours left,1.0
331,0,ford,door,2006,salvage insurance,203158,red,1fmzk04136ga07119,167610991,illinois,usa,17 hours left,1.0
339,0,ford,door,2002,salvage insurance,214800,black,3fafp37372r151014,167360232,south carolina,usa,2 days left,1.0
496,0,ford,pickup,1996,salvage insurance,252588,red,1ftef15n0tlc14455,167357804,oklahoma,usa,17 hours left,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1215,65500,ford,srw,2019,clean vehicle,6500,black,1ft7w2bt0kec44818,167718954,indiana,usa,21 hours left,2495.0
277,67000,dodge,challenger,2019,clean vehicle,10944,blue,2c3cdzl97kh518237,167759490,ohio,usa,21 hours left,2496.0
1336,70000,ford,drw,2019,clean vehicle,9643,no_color,1ft8w3dt3kee48276,167780680,illinois,usa,2 days left,2497.0
1340,74000,ford,drw,2019,clean vehicle,10536,no_color,1ft8w4dt6ked32656,167780682,illinois,usa,2 days left,2498.0
