# Aggregation
- Summarizing large amounts of data into fewer, more digestible metrics
- [Link to lecture 1](https://ithogskolan.sharepoint.com/:v:/s/AI23/Ec_HjIGFe5xJm0dGV7je9EQB3jYE2BBvfQBtL-Bz1Q_cbg?e=3gasuX)
- [Link to lecture 2](https://ithogskolan.sharepoint.com/:v:/s/AI23/ESQAaytQDEdCrqnNh9q2wbUBbYsRAtUTB-XVaMllXLdmow?e=nTi2dv)
- [Link to lecture 3](https://ithogskolan.sharepoint.com/:v:/s/AI23/ETC6OAgm1oFDryo-PmNv448B3XtErYmjnmPu9GqXaQ8j5A?e=okmi4e)
- [Link to lecture 4]()

In [1]:
import numpy as np
import pandas as pd

### Built-in aggregation methods in Pandas

In [30]:
numbers = pd.Series(np.random.randint(low=1, high=100, size=5))
numbers[3] = np.nan
numbers

0    98.0
1    35.0
2    60.0
3     NaN
4    10.0
dtype: float64

In [31]:
# Some of Pandas' built-in aggregation methods:
print(f'{numbers.min() = }')
print(f'{numbers.max() = }')
print(f'{numbers.sum() = }')
print(f'{numbers.mean() = }')
print(f'{numbers.count() = }') # excludes NaN
print(f'{numbers.median() = }')
print(f'{numbers.mode() = }') # returns a Series, since several values can tie for most frequent
print(f'{numbers.mode()[0] = }') # the first (smallest) mode
print(f'{numbers.size = }') # size includes NaN, unlike count


numbers.min() = 10.0
numbers.max() = 98.0
numbers.sum() = 203.0
numbers.mean() = 50.75
numbers.count() = 4
numbers.median() = 47.5
numbers.mode() = 0    10.0
1    35.0
2    60.0
3    98.0
dtype: float64
numbers.mode()[0] = 10.0
numbers.size = 5
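
The difference between `count` and `size`, and the way NaN is skipped by default, can be sketched on a small hand-made Series. The values below are hypothetical and chosen only to keep the arithmetic easy to check; they are not from the lecture.

```python
import numpy as np
import pandas as pd

# Hand-made example values (hypothetical, not from the lecture data)
s = pd.Series([10.0, 20.0, np.nan, 30.0])

n_valid = s.count()                 # counts non-NaN values only
n_total = s.size                    # counts every element, NaN included
mean_skip = s.mean()                # NaN is skipped by default (skipna=True)
mean_strict = s.mean(skipna=False)  # NaN propagates when skipna=False

print(n_valid, n_total, mean_skip, mean_strict)
```

Passing `skipna=False` is one way to notice missing data early, since the result becomes NaN as soon as any value is missing.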


### When run on a DataFrame (multiple Series), they return a single value for each Series, forming a new Series.
- That is, each column of the DataFrame is aggregated, and the results are returned together as one Series.

In [75]:
numbers_df = pd.DataFrame(np.random.randint(low=1, high=100, size=[5, 5]), columns=['A','B','C','D','E'])
numbers_df.loc[[0, 3],['B', 'E']] = np.nan
numbers_df

Unnamed: 0,A,B,C,D,E
0,12,,66,91,
1,76,48.0,49,90,30.0
2,52,4.0,66,55,85.0
3,23,,2,68,
4,77,63.0,12,68,80.0


In [76]:
print(f'{numbers_df.min() = }') # default axis="index": the minimum of each column
print()
print(f'{numbers_df.min(axis="index") = }') # same as above
print()
print(f'{numbers_df.min(axis="columns") = }') # returns a Series with the lowest value in each ROW
print()
print(f'{numbers_df.min(axis="columns").min() = }') # min of the mins, i.e. the smallest value in the whole DataFrame
print()
print(f'{numbers_df.min(axis="columns").median() = }')
print()
print(f'{numbers_df.min(axis="index").median() = }')

numbers_df.min() = A    12.0
B     4.0
C     2.0
D    55.0
E    30.0
dtype: float64

numbers_df.min(axis="index") = A    12.0
B     4.0
C     2.0
D    55.0
E    30.0
dtype: float64

numbers_df.min(axis="columns") = 0    12.0
1    30.0
2     4.0
3     2.0
4    12.0
dtype: float64

numbers_df.min(axis="columns").min() = 2.0

numbers_df.min(axis="columns").median() = 12.0

numbers_df.min(axis="index").median() = 12.0
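
Because the DataFrame above is random, the effect of `axis` may be easier to follow on fixed numbers. This is a minimal sketch with a hypothetical two-column DataFrame:

```python
import pandas as pd

# Hypothetical fixed values so the results can be checked by eye
df = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 6]})

col_min = df.min(axis="index")    # one value per column: A -> 1, B -> 2
row_min = df.min(axis="columns")  # one value per row: 1, 2, 3
overall_min = df.min().min()      # smallest value in the whole DataFrame: 1

print(col_min, row_min, overall_min, sep="\n")
```

One way to remember it: `axis` names the direction that gets collapsed, so `axis="index"` collapses the rows and leaves one result per column.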


In [68]:
# Count null values in each row:
numbers_df.isna().sum(axis="columns")

0    2
1    0
2    0
3    2
4    0
dtype: int64

In [70]:
# Count null values in each column (the default axis):
numbers_df.isna().sum()

0    0
1    2
2    0
3    0
4    2
dtype: int64

In [71]:
numbers_df.isna()

Unnamed: 0,0,1,2,3,4
0,False,True,False,False,True
1,False,False,False,False,False
2,False,False,False,False,False
3,False,True,False,False,True
4,False,False,False,False,False
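
Since `isna()` returns booleans and `True` counts as 1 when summed, the same pattern gives null counts per column, per row, or in total. A small sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with three missing values
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, np.nan, 6.0]})

nulls_per_column = df.isna().sum()             # A -> 1, B -> 2
nulls_per_row = df.isna().sum(axis="columns")  # 1, 2, 0
total_nulls = df.isna().sum().sum()            # 3

print(nulls_per_column, nulls_per_row, total_nulls, sep="\n")
```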


## Working with real data
- Lecture 2

In [34]:
autos = pd.read_json("../Data/autos_json.json")
autos.tail(3)

Unnamed: 0,aspiration,body-style,bore,city-mpg,compression-ratio,curb-weight,drive-wheels,engine-location,engine-size,engine-type,...,make,normalized-losses,num-of-cylinders,num-of-doors,peak-rpm,price,stroke,symboling,wheel-base,width
202,std,sedan,3.58,18,8.8,3012,rwd,front,173,ohcv,...,volvo,95.0,six,four,5500.0,21485.0,2.87,-1,109.1,68.9
203,turbo,sedan,3.01,26,23.0,3217,rwd,front,145,ohc,...,volvo,95.0,six,four,4800.0,22470.0,3.4,-1,109.1,68.9
204,turbo,sedan,3.78,19,9.5,3062,rwd,front,141,ohc,...,volvo,95.0,four,four,5400.0,22625.0,3.15,-1,109.1,68.9


In [36]:
# average price
autos['price'].mean()

13207.129353233831

In [38]:
autos[['length','width','height']].head(3) # a list of column names can be passed in

Unnamed: 0,length,width,height
0,168.8,64.1,48.8
1,168.8,64.1,48.8
2,171.2,65.5,52.4


In [42]:
# Running mean() on multiple columns (a DataFrame) returns a Series of means.
autos[['length','width','height']].head(3).mean() # first three rows only
autos[['length','width','height']].mean() # all cars (only this last expression is displayed)

length    174.049268
width      65.907805
height     53.724878
dtype: float64

In [43]:
# mean for Volvo only, using a boolean mask
autos[autos['make'] == 'volvo'][['length','width','height']].mean()

length    188.800000
width      67.963636
height     56.236364
dtype: float64

In [44]:
# mean for Volvo only, using query
autos.query("make =='volvo'")[['length','width','height']].mean()

length    188.800000
width      67.963636
height     56.236364
dtype: float64

In [46]:
# query with several conditions (backticks are needed around column names that contain a hyphen)
autos.query("make =='volvo' & `body-style` == 'sedan'")[['length','width','height']].mean()

length    188.8000
width      68.2500
height     55.7625
dtype: float64
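
The boolean mask and `query` approaches should select the same rows. A minimal sketch on a hypothetical miniature of the data (the `cars` DataFrame and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature of the autos data
cars = pd.DataFrame({
    "make": ["volvo", "volvo", "saab", "volvo"],
    "body-style": ["sedan", "wagon", "sedan", "sedan"],
    "length": [188.0, 190.0, 186.0, 189.0],
})

# Boolean mask: each condition needs its own parentheses when combined with &
mask = (cars["make"] == "volvo") & (cars["body-style"] == "sedan")
by_mask = cars[mask]["length"].mean()

# query: the same filter written as a string
by_query = cars.query("make == 'volvo' & `body-style` == 'sedan'")["length"].mean()

print(by_mask, by_query)
```

Which form to use is mostly a matter of taste; `query` tends to read better with many conditions, while masks are easier to build up programmatically.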

### Multiple aggregation

In [49]:
autos[['length','width','height']].agg('min') # pass the name as a string; passing the built-in min raises a FutureWarning in recent pandas versions

length    141.1
width      60.3
height     47.8
dtype: float64

In [50]:
autos[['length','width','height']].agg(['min','max','mean'])

Unnamed: 0,length,width,height
min,141.1,60.3,47.8
max,208.1,72.3,59.8
mean,174.049268,65.907805,53.724878


In [55]:
# How to create a new DataFrame containing only the numeric columns (float or int),
# using a list comprehension:
cols = [col for col in autos.columns if autos[col].dtype in ["int64", "float64"]] # a list of all numeric columns
# see Fredrik's lecture for the rest
autos[cols]

Unnamed: 0,bore,city-mpg,compression-ratio,curb-weight,engine-size,height,highway-mpg,horsepower,length,normalized-losses,peak-rpm,price,stroke,symboling,wheel-base,width
0,3.47,21,9.0,2548,130,48.8,27,111.0,168.8,,5000.0,13495.0,2.68,3,88.6,64.1
1,3.47,21,9.0,2548,130,48.8,27,111.0,168.8,,5000.0,16500.0,2.68,3,88.6,64.1
2,2.68,19,9.0,2823,152,52.4,26,154.0,171.2,,5000.0,16500.0,3.47,1,94.5,65.5
3,3.19,24,10.0,2337,109,54.3,30,102.0,176.6,164.0,5500.0,13950.0,3.40,2,99.8,66.2
4,3.19,18,8.0,2824,136,54.3,22,115.0,176.6,164.0,5500.0,17450.0,3.40,2,99.4,66.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200,3.78,23,9.5,2952,141,55.5,28,114.0,188.8,95.0,5400.0,16845.0,3.15,-1,109.1,68.9
201,3.78,19,8.7,3049,141,55.5,25,160.0,188.8,95.0,5300.0,19045.0,3.15,-1,109.1,68.8
202,3.58,18,8.8,3012,173,55.5,23,134.0,188.8,95.0,5500.0,21485.0,2.87,-1,109.1,68.9
203,3.01,26,23.0,3217,145,55.5,27,106.0,188.8,95.0,4800.0,22470.0,3.40,-1,109.1,68.9
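
As an alternative to the list comprehension above, pandas has `DataFrame.select_dtypes`, which picks columns by dtype directly. A minimal sketch on a hypothetical three-column DataFrame:

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns
df = pd.DataFrame({
    "make": ["volvo", "saab"],
    "price": [21485.0, 15223.0],
    "doors": [4, 2],
})

# include="number" keeps every numeric dtype (int and float of any width)
numeric = df.select_dtypes(include="number")
numeric_cols = numeric.columns.tolist()

print(numeric_cols)
```

This also catches dtypes such as `int32` or `float32`, which a comparison against only `"int64"` and `"float64"` would miss.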


In [54]:
autos.describe() # summary metrics for all numeric columns

Unnamed: 0,bore,city-mpg,compression-ratio,curb-weight,engine-size,height,highway-mpg,horsepower,length,normalized-losses,peak-rpm,price,stroke,symboling,wheel-base,width
count,201.0,205.0,205.0,205.0,205.0,205.0,205.0,203.0,205.0,164.0,203.0,201.0,201.0,205.0,205.0,205.0
mean,3.329751,25.219512,10.142537,2555.565854,126.907317,53.724878,30.75122,104.256158,174.049268,122.0,5125.369458,13207.129353,3.255423,0.834146,98.756585,65.907805
std,0.273539,6.542142,3.97204,520.680204,41.642693,2.443522,6.886443,39.714369,12.337289,35.442168,479.33456,7947.066342,0.316717,1.245307,6.021776,2.145204
min,2.54,13.0,7.0,1488.0,61.0,47.8,16.0,48.0,141.1,65.0,4150.0,5118.0,2.07,-2.0,86.6,60.3
25%,3.15,19.0,8.6,2145.0,97.0,52.0,25.0,70.0,166.3,94.0,4800.0,7775.0,3.11,0.0,94.5,64.1
50%,3.31,24.0,9.0,2414.0,120.0,54.1,30.0,95.0,173.2,115.0,5200.0,10295.0,3.29,1.0,97.0,65.5
75%,3.59,30.0,9.4,2935.0,141.0,55.5,34.0,116.0,183.1,150.0,5500.0,16500.0,3.41,2.0,102.4,66.9
max,3.94,49.0,23.0,4066.0,326.0,59.8,54.0,288.0,208.1,256.0,6600.0,45400.0,4.17,3.0,120.9,72.3


### Split-Apply-Combine, aka Group-By
- Lecture 3
- Select a feature to use as key
- Split the dataset into one group for each unique key value
- Apply aggregation to each group ('sum' etc.)
- Combine aggregated data into a new dataset  
![image](https://nicholasvadivelu.com/assets/images/posts/groupby/split-apply-combine.svg)
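
The three steps can be sketched by hand and compared with `groupby`. This is only an illustration of the idea on hypothetical data, not how pandas implements it internally:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi", "bmw"],
    "price": [10.0, 30.0, 20.0, 50.0],
})

# Split: one sub-DataFrame per unique key value
groups = {key: df[df["make"] == key] for key in df["make"].unique()}

# Apply: aggregate each group on its own
means = {key: group["price"].mean() for key, group in groups.items()}

# Combine: gather the per-group results into a new Series
manual = pd.Series(means, name="price").sort_index()

# groupby performs the same three steps in one call
grouped = df.groupby("make")["price"].mean()

print(manual, grouped, sep="\n")
```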

### Group by
- Use Pandas' .groupby() method to select a key and split the data into groups.
- This creates a new DataFrameGroupBy object containing the grouped DataFrames.
- In plain terms: groupby does the split, and the apply and combine steps are then performed on the DataFrameGroupBy object.

In [56]:
makes = autos.groupby("make") # 'make' is the key, i.e. the feature we group by

In [57]:
type(makes) # shows that it is a DataFrameGroupBy object

pandas.core.groupby.generic.DataFrameGroupBy

In [58]:
len(makes) # the DataFrameGroupBy object contains 22 groups (one DataFrame per make)

22

In [59]:
# verify that this matches the number of unique makes:
len(autos['make'].unique())

22

In [None]:
# another handy method to know: value_counts()
autos['make'].value_counts()

In [61]:
makes.groups # returns a dict mapping each key to the row indexes of its group in the original data

{'alfa-romero': [0, 1, 2], 'audi': [3, 4, 5, 6, 7, 8, 9], 'bmw': [10, 11, 12, 13, 14, 15, 16, 17], 'chevrolet': [18, 19, 20], 'dodge': [21, 22, 23, 24, 25, 26, 27, 28, 29], 'honda': [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], 'isuzu': [43, 44, 45, 46], 'jaguar': [47, 48, 49], 'mazda': [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66], 'mercedes-benz': [67, 68, 69, 70, 71, 72, 73, 74], 'mercury': [75], 'mitsubishi': [76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88], 'nissan': [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106], 'peugot': [107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117], 'plymouth': [118, 119, 120, 121, 122, 123, 124], 'porsche': [125, 126, 127, 128, 129], 'renault': [130, 131], 'saab': [132, 133, 134, 135, 136, 137], 'subaru': [138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149], 'toyota': [150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 17

In [63]:
autos.loc[makes.groups['jaguar']] # lookup on the original DataFrame

Unnamed: 0,aspiration,body-style,bore,city-mpg,compression-ratio,curb-weight,drive-wheels,engine-location,engine-size,engine-type,...,make,normalized-losses,num-of-cylinders,num-of-doors,peak-rpm,price,stroke,symboling,wheel-base,width
47,std,sedan,3.63,15,8.1,4066,rwd,front,258,dohc,...,jaguar,145.0,six,four,4750.0,32250.0,4.17,0,113.0,69.6
48,std,sedan,3.63,15,8.1,4066,rwd,front,258,dohc,...,jaguar,,six,four,4750.0,35550.0,4.17,0,113.0,69.6
49,std,sedan,3.54,13,11.5,3950,rwd,front,326,ohcv,...,jaguar,,twelve,two,5000.0,36000.0,2.76,0,102.0,70.6


In [64]:
makes.get_group('jaguar') # lookup on the DataFrameGroupBy object

Unnamed: 0,aspiration,body-style,bore,city-mpg,compression-ratio,curb-weight,drive-wheels,engine-location,engine-size,engine-type,...,make,normalized-losses,num-of-cylinders,num-of-doors,peak-rpm,price,stroke,symboling,wheel-base,width
47,std,sedan,3.63,15,8.1,4066,rwd,front,258,dohc,...,jaguar,145.0,six,four,4750.0,32250.0,4.17,0,113.0,69.6
48,std,sedan,3.63,15,8.1,4066,rwd,front,258,dohc,...,jaguar,,six,four,4750.0,35550.0,4.17,0,113.0,69.6
49,std,sedan,3.54,13,11.5,3950,rwd,front,326,ohcv,...,jaguar,,twelve,two,5000.0,36000.0,2.76,0,102.0,70.6
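
Besides `get_group`, a GroupBy object can be iterated over, yielding the key and the matching sub-DataFrame for each group. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "price": [10.0, 30.0, 20.0],
})

sizes = {}
for key, group in df.groupby("make"):
    # key is the group label, group is a regular DataFrame
    sizes[key] = len(group)

# The built-in size() gives the same row counts without a loop
size_series = df.groupby("make").size()

print(sizes)
```

Looping like this is mostly useful for inspection; for actual aggregation the built-in methods are usually both shorter and faster.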


### Apply and combine
- It's possible to access a single group, as shown above with Jaguar.
- However, most of the time we would rather apply aggregation functions to each group individually and combine the results into a new dataset.

In [77]:
makes.count() # returns a new DataFrame

Unnamed: 0_level_0,aspiration,body-style,bore,city-mpg,compression-ratio,curb-weight,drive-wheels,engine-location,engine-size,engine-type,...,length,normalized-losses,num-of-cylinders,num-of-doors,peak-rpm,price,stroke,symboling,wheel-base,width
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
alfa-romero,3,3,3,3,3,3,3,3,3,3,...,3,0,3,3,3,3,3,3,3,3
audi,7,7,7,7,7,7,7,7,7,7,...,7,4,7,7,7,6,7,7,7,7
bmw,8,8,8,8,8,8,8,8,8,8,...,8,4,8,8,8,8,8,8,8,8
chevrolet,3,3,3,3,3,3,3,3,3,3,...,3,3,3,3,3,3,3,3,3,3
dodge,9,9,9,9,9,9,9,9,9,9,...,9,9,9,8,9,9,9,9,9,9
honda,13,13,13,13,13,13,13,13,13,13,...,13,13,13,13,13,13,13,13,13,13
isuzu,4,4,4,4,4,4,4,4,4,4,...,4,0,4,4,4,2,4,4,4,4
jaguar,3,3,3,3,3,3,3,3,3,3,...,3,1,3,3,3,3,3,3,3,3
mazda,17,17,13,17,17,17,17,17,17,17,...,17,15,17,16,17,17,13,17,17,17
mercedes-benz,8,8,8,8,8,8,8,8,8,8,...,8,5,8,8,8,8,8,8,8,8


In [83]:
#makes.mean() # raises an error, because mean cannot be computed on 'object' (non-numeric) columns
makes[['width','length','height']] # still returns a DataFrameGroupBy object
makes[['width','length','height']].mean().head(3) # returns a DataFrame
#makes['price'].mean() # returns the average price for each make

Unnamed: 0_level_0,width,length,height
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
alfa-romero,64.566667,169.6,50.0
audi,68.714286,183.828571,54.428571
bmw,66.475,184.5,54.825
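
Instead of listing the numeric columns by hand, the groupby `mean` method accepts `numeric_only=True`, which skips non-numeric columns. A minimal sketch on hypothetical data (the `fuel` column is made up for illustration):

```python
import pandas as pd

# Hypothetical data with one non-numeric column besides the key
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "fuel": ["gas", "gas", "diesel"],
    "price": [10.0, 30.0, 20.0],
})

# numeric_only=True skips non-numeric columns such as "fuel"
means = df.groupby("make").mean(numeric_only=True)

print(means)
```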


### SeriesGroupBy
- Indexing a DataFrameGroupBy object with a single column returns a SeriesGroupBy object.
- This mirrors how selecting a single column from a DataFrame gives a Series.

In [90]:
sgb = makes['price'] # selecting one column from the makes DataFrameGroupBy object gives a SeriesGroupBy object
type(sgb)
sgb.groups
sgb.get_group('jaguar')
sgb.mean().head(3)

make
alfa-romero    15498.333333
audi           17859.166667
bmw            26118.750000
Name: price, dtype: float64

In [92]:
# can also be written as:
autos.groupby('make')['price'].mean().head(3)

make
alfa-romero    15498.333333
audi           17859.166667
bmw            26118.750000
Name: price, dtype: float64

In [93]:
# store the result in a Series
result = autos.groupby('make')['price'].mean().head(3)
type(result)

pandas.core.series.Series

In [95]:
# can also be written as a method chain, one step per line
(
    autos
        .groupby('make')['price']
        .mean()
        .head(3)
)

make
alfa-romero    15498.333333
audi           17859.166667
bmw            26118.750000
Name: price, dtype: float64

In [115]:
# see the lecture for details
autos.groupby('make')[['width','length','height']].mean().head()

Unnamed: 0_level_0,width,length,height
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
alfa-romero,64.566667,169.6,50.0
audi,68.714286,183.828571,54.428571
bmw,66.475,184.5,54.825
chevrolet,62.5,151.933333,52.4
dodge,64.166667,160.988889,51.644444


### Multiple aggregation on SeriesGroupBy
- Use Pandas .agg() method on SeriesGroupBy to do multiple aggregation on a single feature.

In [98]:
sgb.min().head()

make
alfa-romero    13495.0
audi           13950.0
bmw            16430.0
Name: price, dtype: float64

In [100]:
sgb.agg(["min","mean","max"]).head() # returns a DataFrame

Unnamed: 0_level_0,min,mean,max
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
alfa-romero,13495.0,15498.333333,16500.0
audi,13950.0,17859.166667,23875.0
bmw,16430.0,26118.75,41315.0


In [103]:
# can be written on a single line
autos.groupby('make')['price'].agg(['min','mean','max']).head()

Unnamed: 0_level_0,min,mean,max
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
alfa-romero,13495.0,15498.333333,16500.0
audi,13950.0,17859.166667,23875.0
bmw,16430.0,26118.75,41315.0


In [106]:
# describe() on a SeriesGroupBy
autos.groupby('make')['price'].describe().head()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
alfa-romero,3.0,15498.333333,1734.937559,13495.0,14997.5,16500.0,16500.0,16500.0
audi,6.0,17859.166667,3452.379493,13950.0,15800.0,17580.0,18617.5,23875.0
bmw,8.0,26118.75,9263.832033,16430.0,19958.75,22835.0,32290.0,41315.0


### Multiple aggregation on DataFrameGroupBy
- Using Pandas' .agg() method on a DataFrameGroupBy to apply multiple aggregations to multiple features returns a DataFrame with multi-index columns.

In [108]:
makes[['length','width','height']].agg(['min','mean','max']).head()

Unnamed: 0_level_0,length,length,length,width,width,width,height,height,height
Unnamed: 0_level_1,min,mean,max,min,mean,max,min,mean,max
make,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2
alfa-romero,168.8,169.6,171.2,64.1,64.566667,65.5,48.8,50.0,52.4
audi,176.6,183.828571,192.7,66.2,68.714286,71.4,52.0,54.428571,55.9
bmw,176.8,184.5,197.0,64.8,66.475,70.9,53.7,54.825,56.3
chevrolet,141.1,151.933333,158.8,60.3,62.5,63.6,52.0,52.4,53.2
dodge,157.3,160.988889,174.6,63.8,64.166667,66.3,50.2,51.644444,59.8
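
Multi-index columns can be awkward to work with at first. The sketch below, on hypothetical data, shows one way to select from them with tuples and one way to flatten them into single-level names (the `length_min` style naming is just a convention chosen here):

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "length": [176.0, 184.0, 192.0],
    "width": [66.0, 64.0, 71.0],
})

stats = df.groupby("make")[["length", "width"]].agg(["min", "max"])

# Select one cell with a (column, statistic) tuple
audi_max_length = stats.loc["audi", ("length", "max")]

# Select every statistic for a single feature
length_stats = stats["length"]

# Flatten the two column levels into single names such as "length_min"
flat = stats.copy()
flat.columns = ["_".join(pair) for pair in stats.columns]

print(flat)
```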


### Custom columns aggregation

In [110]:
autos.groupby('make').agg({'price': 'mean','horsepower':'max'}).head()

Unnamed: 0_level_0,price,horsepower
make,Unnamed: 1_level_1,Unnamed: 2_level_1
alfa-romero,15498.333333,154.0
audi,17859.166667,160.0
bmw,26118.75,182.0
chevrolet,6007.0,70.0
dodge,7875.444444,145.0


In [113]:
autos.groupby('make').agg(
    average_price = pd.NamedAgg(column='price', aggfunc='mean'),
    min_horsepower = pd.NamedAgg(column='horsepower', aggfunc='min'),
    max_horsepower = pd.NamedAgg(column='horsepower', aggfunc='max')
    ).head()

Unnamed: 0_level_0,average_price,min_horsepower,max_horsepower
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
alfa-romero,15498.333333,111.0,154.0
audi,17859.166667,102.0,160.0
bmw,26118.75,101.0,182.0
chevrolet,6007.0,48.0,70.0
dodge,7875.444444,68.0,145.0
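
Named aggregation also accepts plain `(column, aggfunc)` tuples in place of `pd.NamedAgg`, which is a little shorter to type. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "price": [10.0, 30.0, 20.0],
    "horsepower": [100, 180, 160],
})

# Each keyword becomes an output column: name=(input column, aggregation)
summary = df.groupby("make").agg(
    average_price=("price", "mean"),
    max_horsepower=("horsepower", "max"),
)

print(summary)
```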


### Writing a custom function instead of using the built-in ones
- Lecture 4

In [118]:
def list_unique(x):
    return ", ".join(x.unique())

autos.groupby('make').agg(
    average_price = pd.NamedAgg(column='price', aggfunc='mean'),
    min_horsepower = pd.NamedAgg(column='horsepower', aggfunc='min'),
    max_horsepower = pd.NamedAgg(column='horsepower', aggfunc='max'),
    body_styles = pd.NamedAgg(column='body-style', aggfunc=list_unique)
    ).head()

Unnamed: 0_level_0,average_price,min_horsepower,max_horsepower,body_styles
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
alfa-romero,15498.333333,111.0,154.0,"convertible, hatchback"
audi,17859.166667,102.0,160.0,"sedan, wagon, hatchback"
bmw,26118.75,101.0,182.0,sedan
chevrolet,6007.0,48.0,70.0,"hatchback, sedan"
dodge,7875.444444,68.0,145.0,"hatchback, sedan, wagon"


In [121]:
def list_unique(x):
    return ", ".join(x.apply(str).unique())

autos.groupby('make').agg(
    average_price = pd.NamedAgg(column='price', aggfunc='mean'),
    min_horsepower = pd.NamedAgg(column='horsepower', aggfunc='min'),
    max_horsepower = pd.NamedAgg(column='horsepower', aggfunc='max'),
    length = pd.NamedAgg(column='length', aggfunc=list_unique)
    ).sort_values(by="average_price", ascending=False) # sort by price, from high to low

Unnamed: 0_level_0,average_price,min_horsepower,max_horsepower,length
make,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
jaguar,34600.0,176.0,262.0,"199.6, 191.7"
mercedes-benz,33647.0,123.0,184.0,"190.9, 187.5, 202.6, 180.3, 208.1, 199.2"
porsche,31400.5,143.0,288.0,"168.9, 175.7"
bmw,26118.75,101.0,182.0,"176.8, 189.0, 193.8, 197.0"
volvo,18063.181818,106.0,162.0,188.8
audi,17859.166667,102.0,160.0,"176.6, 177.3, 192.7, 178.2"
mercury,16503.0,175.0,175.0,178.4
alfa-romero,15498.333333,111.0,154.0,"168.8, 171.2"
peugot,15489.090909,95.0,142.0,"186.7, 198.9"
saab,15223.333333,110.0,160.0,186.6
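
Any function that takes a group's Series and returns a single value can be used as an aggregation. The sketch below uses hypothetical data and a made-up helper, `value_range`, next to a `list_unique` variant; the string conversion is there because `str.join` only accepts strings.

```python
import pandas as pd

def value_range(x):
    """Difference between the largest and smallest value in a group."""
    return x.max() - x.min()

def list_unique(x):
    # join() only accepts strings, so convert the values first
    return ", ".join(x.astype(str).unique())

# Hypothetical data
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi"],
    "price": [10.0, 30.0, 20.0],
    "length": [176.0, 184.0, 192.0],
})

summary = df.groupby("make").agg(
    price_range=("price", value_range),
    lengths=("length", list_unique),
)

print(summary)
```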


Plain Python code.

In [123]:
# list.sort()
mylist = [3,5,2,4,6]
mylist.sort() # sorts the list in place; sort is a method on the list itself
mylist

[2, 3, 4, 5, 6]

In [124]:
# sorted(list)
mylist = [3,5,2,4,6]
mylistsorted = sorted(mylist)
mylistsorted

[2, 3, 4, 5, 6]
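
The practical difference between the two is that `sorted()` returns a new list, while `list.sort()` changes the list in place and returns `None`. A short sketch:

```python
numbers = [3, 5, 2]

new_list = sorted(numbers)     # returns a new sorted list
unchanged = list(numbers)      # numbers is still [3, 5, 2] at this point

return_value = numbers.sort()  # sorts in place and returns None

print(new_list, unchanged, return_value, numbers)
```

A common slip is writing `mylist = mylist.sort()`, which replaces the list with `None`.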

The key argument takes a function that is applied to each item; the items are then sorted by the values it returns.

In [133]:
def my_sorting_func(person):
    #return person["firstname"]
    return len(person["firstname"])

mylist = [
    {"firstname": "Fredrik", "LastName":"Johansson","Age":42},
     {"firstname": "Anna", "LastName":"Karlsson","Age":38},
     {"firstname": "Anders", "LastName":"Svensson","Age":24}
]

sorted(mylist, key=my_sorting_func, reverse=True)

[{'firstname': 'Fredrik', 'LastName': 'Johansson', 'Age': 42},
 {'firstname': 'Anders', 'LastName': 'Svensson', 'Age': 24},
 {'firstname': 'Anna', 'LastName': 'Karlsson', 'Age': 38}]

### Lambda is another way to write a function
- The code below gives the same result as above
- A lambda is a so-called anonymous function: it has no name

In [134]:
mylist = [
    {"firstname": "Fredrik", "LastName":"Johansson","Age":42},
     {"firstname": "Anna", "LastName":"Karlsson","Age":38},
     {"firstname": "Anders", "LastName":"Svensson","Age":24}
]

sorted(mylist, key=lambda person: len(person['firstname']), reverse=True)

[{'firstname': 'Fredrik', 'LastName': 'Johansson', 'Age': 42},
 {'firstname': 'Anders', 'LastName': 'Svensson', 'Age': 24},
 {'firstname': 'Anna', 'LastName': 'Karlsson', 'Age': 38}]
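
A key function can also return a tuple, in which case items are compared element by element. This sketch sorts by name length first and by age second; the list is hypothetical and "Olle" is a made-up entry added to create a tie on name length.

```python
# Hypothetical list; "Olle" is added to create a tie on name length
people = [
    {"firstname": "Anna", "Age": 38},
    {"firstname": "Olle", "Age": 24},
    {"firstname": "Fredrik", "Age": 42},
]

# Sort by name length first, then by age for names of equal length
by_len_then_age = sorted(people, key=lambda p: (len(p["firstname"]), p["Age"]))
names = [p["firstname"] for p in by_len_then_age]

print(names)
```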

Fredrik covers more on this in his lecture.