### This notebook covers:

* Advanced indexing with binary ops
* Sorting, lookups and reordering
* Pruning duplicates, NAs, columns and rows; reindexing
* Custom transformations: same-shape and agg
* Advanced points on views, copies and memory

### Revision:

* Boolean masking with
    - isin(a_list), between(x,y, inclusive='neither') methods
    - binary operators like &, | , ^, ~
* 2D Indexing
    - df['a':'b'][['c','d']]
    - df.loc[[a,b],[c,d]]
* Fancy indexing:
    - df.lookup([a,b,f],[c,d,e])  (deprecated in pandas 1.2, removed in 2.0)
    - df.loc[[a,b],[c,d]]
* Sorting index/column:
    - df.sort_values(by=column_name, ascending=True)
    - df.sort_index(inplace=True, axis=0/1)
* Transpose of df:
    - df.T
    - df.swapaxes(1,0)
* reordering:
    - df.reindex(index=[a,s,v,ss], columns=[fd,sl,iu])
    
* Identifying duplicates:
    - df.duplicated(subset=[a,b,c],keep='first')   - first/last/False
* removing duplicates:
    - df.drop_duplicates(subset=[a,c],keep='first')  - first/last/False
* removing specific rows/columns:
    - df.drop(labels=[a,c,g], axis=0/1)
    - df.drop(index=[a,v,c], columns=[w,r,t])
    - df.pop('a')
    - with reindex()
* Handling NaNs:
    - identifying NaNs: df[df.isna().values], np.count_nonzero(df.isna())
    - dropna(how='any', axis=0)
    - fillna('value') / fillna({'a': 5, 'b': 44}) / ffill(axis=0/1), bfill(axis=0/1)  (fillna(method='ffill'/'bfill') is deprecated in recent pandas)
* aggregate() = agg():
    - df.agg('mean', numeric_only=True, axis=0)   # can do min, max, sum
    - df.select_dtypes(np.number).agg(['mean', 'min','max'])
    - df.aggregate({'age':'mean', 'market_value': ['min','max']})
* transform() - applies a function and returns a result with the same shape as the input
    - df.select_dtypes(include=object).transform(random_case)
    - df.select_dtypes(np.number).transform(lambda x: x*0.91)
    - df.select_dtypes(np.number).transform([np.mean, np.sqrt], axis=0)
* apply() - can work as agg or transform
    - df.select_dtypes(np.number).apply(['mean', min], args=(), raw=False, by_row='compat', axis = 0, result_type=None)
    - df.apply(round_floats)
* applymap() - elementwise operations (renamed to DataFrame.map() in pandas 2.1)
    - agg, transform and apply receive whole columns or rows, so they can use NumPy's vectorized routines.
    - applymap calls a Python function once per element, which is slower; use it only when per-element logic is required.
    - df.applymap(log_and_transform) # e.g. logging every 100th element
* Setting df element value:
    - df[column][row] = value: chained assignment; this may raise SettingWithCopyWarning and might not update df
    - df.loc[row, column] = value, df.iloc[row,column] = value, df.at[row,column] = value, df.iat[row, column] = value
    - pd.options.mode.chained_assignment = 'warn' (default); set it to None to silence the warning
* View vs Copy:
    - Chained indexing (df[...][...]) may operate on a copy; a single loc/iloc/at/iat call reads and writes the original DataFrame.
* Adding Dataframe Columns:
    - df['col_name'] = [a,b,...,c]
    - df.insert(0, 'col_name', list_of_values)
    - df.assign(col_name=list_of_values, col_name2= list_of_values2)
* Adding Dataframe Rows:
    - df.loc['row'] = [a,b,...c]
    - df.append(pd.Series([a,b,...,c]))  # series/df; removed in pandas 2.0, use pd.concat
    - pd.concat([df, other_df], axis=0)
    - pd.concat([df, series.to_frame().T], axis=0)
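
`concat` is a top-level pandas function rather than a DataFrame method, and `DataFrame.append` is no longer available in pandas 2.x. The row-adding options listed above can be sketched as follows, using a small made-up frame (`squad`, `other` and `new_row` are illustrative names, not part of the dataset):

```python
import pandas as pd

# Hypothetical toy data, not the players dataset.
squad = pd.DataFrame({"name": ["A", "B"], "age": [21, 30]})

# 1) Label-based enlargement: assigning to a new label adds a row in place.
squad.loc[2] = ["C", 25]

# 2) pd.concat with another DataFrame (returns a new object).
other = pd.DataFrame({"name": ["D"], "age": [28]})
combined = pd.concat([squad, other], axis=0, ignore_index=True)

# 3) pd.concat with a Series: turn it into a one-row frame first.
new_row = pd.Series({"name": "E", "age": 19})
combined = pd.concat([combined, new_row.to_frame().T], axis=0, ignore_index=True)

print(combined)
```

Note that a Series holding mixed types is of object dtype, so the columns of the combined frame may end up as object dtype after step 3.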

### New dataset

In [3]:
import pandas as pd
data_url = 'https://andybek.com/pandas-soccer'

players = pd.read_csv(data_url)

In [4]:
players.info(verbose=False,memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 465 entries, 0 to 464
Columns: 17 entries, name to new_signing
dtypes: float64(2), int64(10), object(5)
memory usage: 190.7 KB


In [5]:
print(players.dtypes.value_counts())
players.info()

int64      10
object      5
float64     2
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 465 entries, 0 to 464
Data columns (total 17 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   name          465 non-null    object 
 1   club          465 non-null    object 
 2   age           465 non-null    int64  
 3   position      464 non-null    object 
 4   position_cat  465 non-null    int64  
 5   market_value  462 non-null    float64
 6   page_views    465 non-null    int64  
 7   fpl_value     465 non-null    float64
 8   fpl_sel       465 non-null    object 
 9   fpl_points    465 non-null    int64  
 10  region        465 non-null    int64  
 11  nationality   465 non-null    object 
 12  new_foreign   465 non-null    int64  
 13  age_cat       465 non-null    int64  
 14  club_id       465 non-null    int64  
 15  big_club      465 non-null    int64  
 16  new_signing   465 non-null    int64  
dtypes: float64(2), int64(10), object(5)

In [6]:
players.head()

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Alexis Sanchez,Arsenal,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
1,Mesut Ozil,Arsenal,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0
2,Petr Cech,Arsenal,35,GK,4,7.0,1529,5.5,5.90%,134,2,Czech Republic,0,6,1,1,0
3,Theo Walcott,Arsenal,28,RW,1,20.0,2393,7.5,1.50%,122,1,England,0,4,1,1,0
4,Laurent Koscielny,Arsenal,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0


In [7]:
players.shape

(465, 17)

### Boolean masking with other approaches

In [8]:
players[players.market_value>40]

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Alexis Sanchez,Arsenal,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
1,Mesut Ozil,Arsenal,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0
96,Eden Hazard,Chelsea,26,LW,1,75.0,4220,10.5,2.30%,224,2,Belgium,0,3,5,1,0
97,Diego Costa,Chelsea,28,CF,1,50.0,4454,10.0,3.00%,196,2,Spain,0,4,5,1,0
108,N%27Golo Kante,Chelsea,26,DM,2,50.0,4042,5.0,13.80%,83,2,France,0,3,5,1,1
218,Philippe Coutinho,Liverpool,25,AM,1,45.0,2958,9.0,30.80%,171,3,Brazil,0,3,10,1,0
244,Kevin De Bruyne,Manchester+City,26,AM,1,65.0,2252,10.0,17.50%,199,2,Belgium,0,3,11,1,0
245,Sergio Aguero,Manchester+City,29,CF,1,65.0,4046,11.5,9.70%,175,3,Argentina,0,4,11,1,0
246,Raheem Sterling,Manchester+City,22,LW,1,45.0,2074,8.0,3.80%,149,1,England,0,2,11,1,0
264,Romelu Lukaku,Manchester+United,24,CF,1,50.0,3727,11.5,45.00%,221,2,Belgium,0,2,12,1,0


In [9]:
# Find the defenders, i.e. the back players (CB, LB, RB).
print(players.position.unique())

['LW' 'AM' 'GK' 'RW' 'CB' 'RB' 'CF' 'LB' 'DM' 'RM' 'CM' nan 'SS' 'LM']


In [10]:
players[players.position.isin(['CB','LB','RB'])]

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
4,Laurent Koscielny,Arsenal,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0
5,Hector Bellerin,Arsenal,22,RB,3,30.0,1675,6.0,13.70%,119,2,Spain,0,2,1,1,0
7,Nacho Monreal,Arsenal,31,LB,3,13.0,555,5.5,4.70%,115,2,Spain,0,4,1,1,0
8,Shkodran Mustafi,Arsenal,25,CB,3,30.0,1877,5.5,4.00%,90,2,Germany,0,3,1,1,1
17,Gabriel Paulista,Arsenal,26,CB,3,13.0,552,5.0,0.10%,45,3,Brazil,0,3,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
455,Aaron Cresswell,West+Ham,27,LB,3,12.0,380,5.0,1.30%,60,1,England,0,3,20,0,0
458,Angelo Ogbonna,West+Ham,29,CB,3,9.0,247,4.5,1.10%,45,2,Italy,0,4,20,0,0
459,Pablo Zabaleta,West+Ham,32,RB,3,7.0,698,5.0,2.70%,45,3,Argentina,0,5,20,0,0
461,Arthur Masuaku,West+Ham,23,LB,3,7.0,199,4.5,0.20%,34,4,Congo DR,0,2,20,0,1


In [14]:
# Players with market values between 40 and 50.
players[players.market_value.between(40,50, inclusive='neither')]  # both, neither, left, right

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
218,Philippe Coutinho,Liverpool,25,AM,1,45.0,2958,9.0,30.80%,171,3,Brazil,0,3,10,1,0
246,Raheem Sterling,Manchester+City,22,LW,1,45.0,2074,8.0,3.80%,149,1,England,0,2,11,1,0
380,Dele Alli,Tottenham,21,CM,2,45.0,4626,9.5,38.60%,225,1,England,0,1,17,1,0
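
The four `inclusive` options differ only in how they treat the two bounds. A minimal sketch on made-up values that sit exactly on and between the bounds (the Series `values` is hypothetical):

```python
import pandas as pd

# Hypothetical values chosen to sit on and between the bounds.
values = pd.Series([40.0, 45.0, 50.0, 55.0])

both = values.between(40, 50, inclusive="both")        # 40 <= x <= 50
neither = values.between(40, 50, inclusive="neither")  # 40 <  x <  50
left = values.between(40, 50, inclusive="left")        # 40 <= x <  50
right = values.between(40, 50, inclusive="right")      # 40 <  x <= 50

# between() is shorthand for two comparisons joined with &.
manual = (values > 40) & (values < 50)
print(neither.equals(manual))
```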


In [15]:
players.age.le(25).equals(players.age<=25)

True

### Binary OR | , AND & , XOR ^ , NOT ~

In [16]:
# When boolean Series are combined, they are aligned on index labels, so the order of the rows does not matter.

In [20]:
# OR | : 
print(True | True, True | False, False| True, False|False)
# AND & :
print(True & True, True & False, False & True, False & False)
# XOR ^:
print(True ^ True, True ^ False, False ^ True, False ^ False)
# NOT ~:
print( ~True, ~False)
print(~pd.Series([True,False]))

True True True False
True False False False
False True True False
-2 -1
0    False
1     True
dtype: bool


In [21]:
# 00000000 = 0,  00000001 = 1, 00000010 = 2, 00000011 = 3
# 11111111 = -1, 11111110 = -2, 11111101 = -3, 11111100 = -4
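
The bit patterns in the comment above are 8-bit two's complement, which is why `~True` printed `-2`: Python treats `True` as the integer 1 and inverts its bits. A small sketch of the difference between `~` on plain Python integers and the element-wise NOT that pandas and NumPy apply to boolean data (the variable `mask` is illustrative):

```python
import numpy as np
import pandas as pd

# Python's bool is a subclass of int, so ~ acts on the integer value.
# In two's complement, ~x == -x - 1.
assert ~1 == -2 and ~0 == -1

# Logical negation of a plain Python bool uses `not`.
print(not True, not False)

# NumPy and pandas booleans define ~ as element-wise logical NOT.
mask = pd.Series([True, False])
print((~mask).tolist())
print(~np.array([True, False]))
```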

In [29]:
players[(players.position =='LB') &
        (players.age<=25) &
       (players.market_value >= 10) &
       ~(players.club.isin(['Arsenal','Tottenham']))]

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
236,Alberto Moreno,Liverpool,25,LB,3,10.0,397,4.5,0.30%,8,2,Spain,0,3,10,1,0
281,Luke Shaw,Manchester+United,22,LB,3,20.0,947,5.0,0.40%,45,1,England,0,2,12,1,0


In [32]:
# Arsenal right-backs and Chelsea goalkeepers

arsenal_rb = (players.club == 'Arsenal') & (players.position == 'RB')
chelsea_gk = (players.club == 'Chelsea') & (players.position == 'GK')
players[arsenal_rb | chelsea_gk]

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
5,Hector Bellerin,Arsenal,22,RB,3,30.0,1675,6.0,13.70%,119,2,Spain,0,2,1,1,0
27,Carl Jenkinson,Arsenal,25,RB,3,5.0,561,4.5,0.40%,2,1,England,0,3,1,1,0
102,Thibaut Courtois,Chelsea,25,GK,4,40.0,1260,5.5,18.50%,141,2,Belgium,0,3,5,1,0
109,Willy Caballero,Chelsea,35,GK,4,1.5,542,5.0,0.20%,64,3,Argentina,0,6,5,1,0


In [38]:
# Challenge:
# Find the players who meet the criteria below:
# 1. They have English nationality.
# 2. Their market value is more than twice the league's average market value.
# 3. They either have more than 4000 page views or are new signings, but not both.

avg_market_value = players.market_value.mean()
print(players.new_signing.unique())
print(players.region.unique())

players[(players.nationality == 'England') &
        (players.market_value > 2*avg_market_value) &
        ((players.page_views> 4000) ^ (players.new_signing == 1))
       ]

[0 1]
[3 2 1 4]


Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
256,John Stones,Manchester+City,23,CB,3,35.0,1078,5.5,2.30%,59,1,England,0,2,11,1,1
380,Dele Alli,Tottenham,21,CM,2,45.0,4626,9.5,38.60%,225,1,England,0,1,17,1,0
381,Harry Kane,Tottenham,23,CF,1,60.0,4161,12.5,35.10%,224,1,England,0,2,17,1,0
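
The "but not both" condition in the challenge is exactly what `^` expresses. A minimal sketch on a made-up four-row frame that covers every combination (the frame `df` and the mask names are hypothetical):

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the dataset.
df = pd.DataFrame({
    "page_views": [4500, 4500, 1000, 1000],
    "new_signing": [1, 0, 1, 0],
})

popular = df.page_views > 4000
signed = df.new_signing == 1

# XOR: exactly one of the two conditions holds.
xor_mask = popular ^ signed

# The same mask spelled out with OR, AND and NOT.
spelled_out = (popular | signed) & ~(popular & signed)

print(xor_mask.tolist())
```

The parentheses around each comparison matter: `&`, `|` and `^` bind more tightly than `>` or `==` in Python, so leaving them out changes what gets compared.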


### 2D Indexing

In [44]:
# Chelsea players aged 23 or under
chelsea_under23 = (players.club=='Chelsea') & (players.age<=23)

In [45]:
players[chelsea_under23]

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
110,Michy Batshuayi,Chelsea,23,CF,1,25.0,1162,8.5,1.60%,48,2,Belgium,0,2,5,1,1
111,Kurt Zouma,Chelsea,22,CB,3,15.0,723,5.5,0.80%,15,2,France,0,2,5,1,0
112,Kenedy,Chelsea,21,LB,3,7.0,566,5.0,0.10%,3,3,Brazil,0,1,5,1,0
115,Tiemoue Bakayoko,Chelsea,22,DM,2,16.0,1011,5.0,1.60%,0,2,France,1,2,5,1,0


In [46]:
players.loc[chelsea_under23, ['position','market_value']]

Unnamed: 0,position,market_value
110,CF,25.0
111,CB,15.0
112,LB,7.0
115,DM,16.0


In [53]:
p_cols = [x for x in players.columns if x.startswith('p')]  # list of matching column labels
p_cols = players.columns.str.startswith('p')  # equivalent boolean mask; this is the one used below
players.loc[chelsea_under23, p_cols]

Unnamed: 0,position,position_cat,page_views
110,CF,1,1162
111,CB,3,723
112,LB,3,566
115,DM,2,1011


In [55]:
print(chelsea_under23.shape, players.shape, p_cols.shape)

(465,) (465, 17) (17,)


In [57]:
players[chelsea_under23]['position']  # chained [] makes two indexing calls, so it is slower than a single loc

110    CF
111    CB
112    LB
115    DM
Name: position, dtype: object
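
Chained `[]` and a single `.loc` call return the same values when reading, but only the `.loc` form is reliable for assignment. A minimal sketch on a made-up frame (`df` and `mask` are hypothetical stand-ins for `players` and `chelsea_under23`):

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the dataset.
df = pd.DataFrame({
    "club": ["Chelsea", "Chelsea", "Arsenal"],
    "age": [22, 30, 21],
    "position": ["CB", "GK", "LW"],
})
mask = (df.club == "Chelsea") & (df.age <= 23)

chained = df[mask]["position"]     # two indexing calls, intermediate frame
single = df.loc[mask, "position"]  # one call, rows and column together
print(chained.equals(single))

# For assignment, use the single .loc call so the original frame is updated.
df.loc[mask, "position"] = "DEF"
print(df)
```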

### Fancy Indexing - indexing based on lists of index labels and columns

In [60]:
players.loc[[1,132],['position','market_value']]

Unnamed: 0,position,market_value
1,AM,50.0
132,CF,6.0


In [61]:
players.lookup([1,132],['position','market_value'])


array(['AM', 6.0], dtype=object)

In [63]:
players.lookup([1,5,6],['position','market_value','region'])


array(['AM', 30.0, 2], dtype=object)
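
Since `DataFrame.lookup` is deprecated and no longer present in pandas 2.x, the same pairwise label lookup can be written with existing indexing tools. This is one possible sketch, not the only replacement; the frame `df` and the names `rows`, `cols`, `pairs` and `picked` are illustrative:

```python
import pandas as pd

# Hypothetical toy frame with non-contiguous row labels.
df = pd.DataFrame(
    {"position": ["AM", "RB", "CF"], "market_value": [50.0, 30.0, 6.0]},
    index=[1, 5, 132],
)

rows = [1, 132]
cols = ["position", "market_value"]

# Pair each row label with the column label at the same position.
pairs = [df.at[r, c] for r, c in zip(rows, cols)]

# NumPy version: translate labels to integer positions, then index once.
row_pos = df.index.get_indexer(rows)
col_pos = df.columns.get_indexer(cols)
picked = df.to_numpy()[row_pos, col_pos]

print(pairs, picked)
```

Unlike `df.loc[rows, cols]`, which returns the full cross product of rows and columns, both versions return one value per (row, column) pair.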

### Sorting by Index/Column

In [65]:
players.sort_values(by='market_value',ascending=False)

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
96,Eden Hazard,Chelsea,26,LW,1,75.00,4220,10.5,2.30%,224,2,Belgium,0,3,5,1,0
267,Paul Pogba,Manchester+United,24,CM,2,75.00,7435,8.0,19.50%,115,2,France,0,2,12,1,1
0,Alexis Sanchez,Arsenal,28,LW,1,65.00,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
244,Kevin De Bruyne,Manchester+City,26,AM,1,65.00,2252,10.0,17.50%,199,2,Belgium,0,3,11,1,0
245,Sergio Aguero,Manchester+City,29,CF,1,65.00,4046,11.5,9.70%,175,3,Argentina,0,4,11,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
287,Joel Castro Pereira,Manchester+United,21,GK,4,0.10,395,4.0,1.00%,6,2,Portugal,0,1,12,1,0
113,Eduardo Carvalho,Chelsea,34,LW,1,0.05,467,5.0,0.10%,0,2,Portugal,0,6,5,1,1
30,Granit Xhaka,Arsenal,24,,2,,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
192,Steve Mounie,Huddersfield,22,CF,1,,56,6.0,0.60%,0,2,Benin,0,2,8,0,0
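
In the output above the two rows with a missing `market_value` sit at the bottom even though the sort is descending. `sort_values` places NaN last by default, and the `na_position` parameter controls this. A minimal sketch on made-up data (the frame `df` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value.
df = pd.DataFrame({"name": ["A", "B", "C"], "market_value": [5.0, np.nan, 75.0]})

# Default: NaN rows go last, regardless of sort direction.
last = df.sort_values(by="market_value", ascending=False)

# na_position="first" moves them to the top instead.
first = df.sort_values(by="market_value", ascending=False, na_position="first")

print(last["name"].tolist(), first["name"].tolist())
```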


In [67]:
players.set_index('name',inplace=True)
players.head()

name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
Alexis Sanchez,Arsenal,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
Mesut Ozil,Arsenal,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0
Petr Cech,Arsenal,35,GK,4,7.0,1529,5.5,5.90%,134,2,Czech Republic,0,6,1,1,0
Theo Walcott,Arsenal,28,RW,1,20.0,2393,7.5,1.50%,122,1,England,0,4,1,1,0
Laurent Koscielny,Arsenal,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0


In [68]:
players.sort_index(inplace=True)
players.head()

name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
Aaron Cresswell,West+Ham,27,LB,3,12.0,380,5.0,1.30%,60,1,England,0,3,20,0,0
Aaron Lennon,Everton,30,RW,1,5.0,504,5.5,0.20%,22,1,England,0,4,7,0,0
Aaron Mooy,Huddersfield,26,CM,2,5.0,588,5.5,2.50%,0,4,Australia,0,3,8,0,0
Aaron Ramsey,Arsenal,26,CM,2,35.0,1040,7.0,5.10%,56,1,Wales,0,3,1,1,0
Abdoulaye Doucoure,Watford,24,CM,2,6.0,124,5.0,0.00%,38,2,France,0,2,18,0,0


In [70]:
players.sort_index(axis=1).head()

name,age,age_cat,big_club,club,club_id,fpl_points,fpl_sel,fpl_value,market_value,nationality,new_foreign,new_signing,page_views,position,position_cat,region
Aaron Cresswell,27,3,0,West+Ham,20,60,1.30%,5.0,12.0,England,0,0,380,LB,3,1
Aaron Lennon,30,4,0,Everton,7,22,0.20%,5.5,5.0,England,0,0,504,RW,1,1
Aaron Mooy,26,3,0,Huddersfield,8,0,2.50%,5.5,5.0,Australia,0,0,588,CM,2,4
Aaron Ramsey,26,3,1,Arsenal,1,56,5.10%,7.0,35.0,Wales,0,0,1040,CM,2,1
Abdoulaye Doucoure,24,2,0,Watford,18,38,0.00%,5.0,6.0,France,0,0,124,CM,2,2


In [72]:
players.reset_index(inplace=True)

### sorting vs reordering - reindex()

In [73]:
players_lite = players.iloc[:4,:4]
players_lite

Unnamed: 0,name,club,age,position
0,Aaron Cresswell,West+Ham,27,LB
1,Aaron Lennon,Everton,30,RW
2,Aaron Mooy,Huddersfield,26,CM
3,Aaron Ramsey,Arsenal,26,CM


In [74]:
# row order: [3,1,2,0], column order: [age, name, position, club]
players_lite.reindex(index=[3,1,2,0],columns=['age','name','position','club'])

Unnamed: 0,age,name,position,club
3,26,Aaron Ramsey,CM,Arsenal
1,30,Aaron Lennon,RW,Everton
2,26,Aaron Mooy,CM,Huddersfield
0,27,Aaron Cresswell,LB,West+Ham


In [76]:
players.reindex(index=[3,1,2,0])

Unnamed: 0,name,club,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
3,Aaron Ramsey,Arsenal,26,CM,2,35.0,1040,7.0,5.10%,56,1,Wales,0,3,1,1,0
1,Aaron Lennon,Everton,30,RW,1,5.0,504,5.5,0.20%,22,1,England,0,4,7,0,0
2,Aaron Mooy,Huddersfield,26,CM,2,5.0,588,5.5,2.50%,0,4,Australia,0,3,8,0,0
0,Aaron Cresswell,West+Ham,27,LB,3,12.0,380,5.0,1.30%,60,1,England,0,3,20,0,0


In [77]:
players.reindex(index=[3,1,2,0]).sort_index(axis=1)

Unnamed: 0,age,age_cat,big_club,club,club_id,fpl_points,fpl_sel,fpl_value,market_value,name,nationality,new_foreign,new_signing,page_views,position,position_cat,region
3,26,3,1,Arsenal,1,56,5.10%,7.0,35.0,Aaron Ramsey,Wales,0,0,1040,CM,2,1
1,30,4,0,Everton,7,22,0.20%,5.5,5.0,Aaron Lennon,England,0,0,504,RW,1,1
2,26,3,0,Huddersfield,8,0,2.50%,5.5,5.0,Aaron Mooy,Australia,0,0,588,CM,2,4
0,27,3,0,West+Ham,20,60,1.30%,5.0,12.0,Aaron Cresswell,England,0,0,380,LB,3,1


In [80]:
players.reindex(index=[3,1,2,0],columns=sorted(players.columns)[:6])  # or players.columns.sort_values()

Unnamed: 0,age,age_cat,big_club,club,club_id,fpl_points
3,26,3,1,Arsenal,1,56
1,30,4,0,Everton,7,22
2,26,3,0,Huddersfield,8,0
0,27,3,0,West+Ham,20,60


In [86]:
# Avoid the approach below for sorting columns (it transposes the whole frame twice)
players_lite.swapaxes(1,0)
players_lite.T.sort_index().T

Unnamed: 0,age,club,name,position
0,27,West+Ham,Aaron Cresswell,LB
1,30,Everton,Aaron Lennon,RW
2,26,Huddersfield,Aaron Mooy,CM
3,26,Arsenal,Aaron Ramsey,CM
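
One reason to avoid the transpose round trip, beyond the extra work, is that transposing a frame with mixed column types forces everything to object dtype. A small sketch comparing it with `sort_index(axis=1)` on a made-up frame (`df`, `round_trip` and `direct` are illustrative names):

```python
import pandas as pd

# Hypothetical toy frame mixing text and integer columns.
df = pd.DataFrame({"name": ["A", "B"], "age": [27, 30], "club": ["X", "Y"]})

# Round trip through a transpose: all columns come back as object dtype.
round_trip = df.T.sort_index().T

# Sorting the column labels directly keeps each column's dtype.
direct = df.sort_index(axis=1)

print(round_trip.dtypes)
print(direct.dtypes)
```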


In [96]:
# challenge

#1 Sort the players df by age in ascending order. Who is the youngest?
print(players.sort_values(by='age').head(2))

#2 Set the club column as the index and sort it alphabetically.
players.set_index('club', inplace=True)
players.sort_index(inplace=True)
players.head()


          club           name  age position  position_cat  market_value  \
231  Liverpool   Ben Woodburn   17       LW             1           1.5   
437  West+Brom  Jonathan Leko   18       RW             1           1.5   

     page_views  fpl_value fpl_sel  fpl_points  region nationality  \
231        1241        4.5   0.10%           5       1       Wales   
437         169        4.5   0.20%          12       1     England   

     new_foreign  age_cat  club_id  big_club  new_signing  
231            0        1       10         1            0  
437            0        1       19         0            0  


club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0
Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0
Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
Arsenal,Laurent Koscielny,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0
Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0


In [97]:
#3 Sort the df by club (ascending) and market value (descending); 'club' is the index name at this point.
players.sort_values(by=['club','market_value'], ascending=[True,False])

club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0
Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0
Arsenal,Granit Xhaka,24,DM,2,35.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
Arsenal,Granit Xhaka,24,DM,2,35.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
West+Ham,Edimilson Fernandes,21,CM,2,5.0,288,4.5,0.40%,38,2,Switzerland,0,1,20,0,1
West+Ham,Sam Byram,23,RB,3,4.5,198,4.5,0.30%,29,1,England,0,2,20,0,0
West+Ham,Darren Randolph,30,GK,4,2.5,459,4.5,0.40%,69,2,Ireland,0,4,20,0,0
West+Ham,James Collins,33,CB,3,2.0,187,4.5,0.90%,69,2,Wales,0,5,20,0,0


### Identifying duplicates

In [101]:
players.reset_index(inplace=True)
players.duplicated()

0      False
1      False
2      False
3      False
4      False
       ...  
460    False
461    False
462    False
463    False
464    False
Length: 465, dtype: bool

In [102]:
players[players.duplicated()]

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
14,Arsenal,Granit Xhaka,24,DM,2,35.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
17,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
24,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0


In [104]:
# defining duplicates based on specific columns
players[players.duplicated(subset=['club','age','position','market_value'])]

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
14,Arsenal,Granit Xhaka,24,DM,2,35.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
17,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
24,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
65,Brighton+and+Hove,Shane Duffy,25,CB,3,5.0,243,4.5,0.60%,0,2,Ireland,0,3,3,0,0
254,Manchester+City,Fernandinho,32,DM,2,18.0,595,5.0,0.80%,78,3,Brazil,0,5,11,1,0
266,Manchester+United,Marcos Rojo,27,CB,3,18.0,1063,5.5,0.10%,77,3,Argentina,0,3,12,1,0
301,Newcastle+United,Lascelles,27,CB,3,5.0,400,4.5,3.60%,0,1,England,0,3,13,0,0


In [108]:
# By default, the first occurrence is treated as the original and later ones as duplicates.
players.sort_values(by=['club','name']).head(7)


Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
20,Arsenal,Aaron Ramsey,26,CM,2,35.0,1040,7.0,5.10%,56,1,Wales,0,3,1,1,0
18,Arsenal,Alex Iwobi,21,LW,1,10.0,1812,5.5,1.00%,89,4,Nigeria,0,1,1,1,0
16,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
17,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
24,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0


In [110]:
# To change this, use the keep parameter: 'first', 'last' or False.
print(players[players.duplicated(subset=['club','age','position','market_value'],keep='last')])
players[players.duplicated(subset=['club','age','position','market_value'],keep=False)]  # every member of a duplicate group is shown

                  club                     name  age position  position_cat  \
6              Arsenal             Granit Xhaka   24       DM             2   
16             Arsenal  Alex Oxlade-Chamberlain   23       RM             2   
17             Arsenal  Alex Oxlade-Chamberlain   23       RM             2   
63   Brighton+and+Hove               Lewis Dunk   25       CB             3   
251    Manchester+City                 Fernando   32       DM             2   
265  Manchester+United           Chris Smalling   27       CB             3   
295   Newcastle+United             Ciaran Clark   27       CB             3   

     market_value  page_views  fpl_value fpl_sel  fpl_points  region  \
6            35.0        1815        5.5   2.00%          85       2   
16           22.0        1519        6.0   1.80%          83       1   
17           22.0        1519        6.0   1.80%          83       1   
63            5.0         140        4.5   4.10%           0       1   
251    

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
6,Arsenal,Granit Xhaka,24,DM,2,35.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
14,Arsenal,Granit Xhaka,24,DM,2,35.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
16,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
17,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
24,Arsenal,Alex Oxlade-Chamberlain,23,RM,2,22.0,1519,6.0,1.80%,83,1,England,0,2,1,1,0
63,Brighton+and+Hove,Lewis Dunk,25,CB,3,5.0,140,4.5,4.10%,0,1,England,0,3,3,0,0
65,Brighton+and+Hove,Shane Duffy,25,CB,3,5.0,243,4.5,0.60%,0,2,Ireland,0,3,3,0,0
251,Manchester+City,Fernando,32,DM,2,18.0,338,4.5,0.40%,18,3,Brazil,0,5,11,1,0
254,Manchester+City,Fernandinho,32,DM,2,18.0,595,5.0,0.80%,78,3,Brazil,0,5,11,1,0
265,Manchester+United,Chris Smalling,27,CB,3,18.0,834,5.5,1.30%,52,1,England,0,3,12,1,0
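
The effect of `keep` is easier to see on a tiny frame where the duplicate group is known in advance. A minimal sketch with made-up rows (the frame `df` is hypothetical):

```python
import pandas as pd

# Hypothetical toy frame: rows 0, 2 and 3 share the same club and age.
df = pd.DataFrame({
    "club": ["Arsenal", "Chelsea", "Arsenal", "Arsenal"],
    "age": [23, 25, 23, 23],
})

first = df.duplicated(subset=["club", "age"], keep="first")  # flags later repeats
last = df.duplicated(subset=["club", "age"], keep="last")    # flags earlier repeats
every = df.duplicated(subset=["club", "age"], keep=False)    # flags all members

print(first.tolist(), last.tolist(), every.tolist())
```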


### Removing Duplicates

In [111]:
players.market_value.mean()

11.125649350649349

In [114]:
players_u = players.drop_duplicates(keep='first')

In [115]:
players_u.market_value.mean()

11.026252723311545
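
The mean moved from about 11.13 to about 11.03 once the repeated rows were dropped, which shows how duplicates can skew aggregates. The same effect on a made-up frame, with `ignore_index=True` added as an optional extra to renumber the surviving rows (`df` and `unique` are illustrative names):

```python
import pandas as pd

# Hypothetical toy frame with one fully repeated row.
df = pd.DataFrame({"name": ["A", "B", "B"], "market_value": [10.0, 40.0, 40.0]})

# Keep the first occurrence of each row and renumber the index 0..n-1.
unique = df.drop_duplicates(keep="first", ignore_index=True)

print(df["market_value"].mean(), unique["market_value"].mean())
```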

### Removing specific rows/columns from df

In [119]:
players.drop(labels=17,axis=0)  # or players.drop(index=17) / players.drop(index=[17,29,30])

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
3,Arsenal,Laurent Koscielny,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
460,West+Ham,Mark Noble,30,CM,2,7.0,425,5.5,0.10%,71,1,England,0,4,20,0,0
461,West+Ham,Michail Antonio,27,RW,1,18.0,1142,7.5,0.50%,132,1,England,0,3,20,0,0
462,West+Ham,Robert Snodgrass,29,RW,1,8.0,1210,6.0,6.50%,133,2,Scotland,0,4,20,0,0
463,West+Ham,Ashley Fletcher,21,CF,1,1.0,412,4.5,5.90%,16,1,England,0,1,20,0,1


In [120]:
players.drop(labels=['name','club'],axis=1)

Unnamed: 0,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0
1,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0
2,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
3,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0
4,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
460,30,CM,2,7.0,425,5.5,0.10%,71,1,England,0,4,20,0,0
461,27,RW,1,18.0,1142,7.5,0.50%,132,1,England,0,3,20,0,0
462,29,RW,1,8.0,1210,6.0,6.50%,133,2,Scotland,0,4,20,0,0
463,21,CF,1,1.0,412,4.5,5.90%,16,1,England,0,1,20,0,1


### pop()

In [122]:
# pop removes a single column at a time, e.g. players.pop('age')
# It returns the removed column as a Series.
# pop modifies the existing df (in-place change).

In [126]:
players_u.pop('age')

0      28
1      26
2      28
3      31
4      28
       ..
460    30
461    27
462    29
463    21
464    27
Name: age, Length: 462, dtype: int64

In [127]:
players_u.head()

Unnamed: 0,name,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,David Ospina,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0
1,Alexandre Lacazette,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0
2,Alexis Sanchez,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
3,Laurent Koscielny,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0
4,Mesut Ozil,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0
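
Because `pop` mutates the frame it is called on, working on an explicit copy makes it clear that the source frame is left alone. A minimal sketch, assuming a made-up frame (`df`, `working` and `ages` are illustrative names):

```python
import pandas as pd

# Hypothetical toy frame.
df = pd.DataFrame({"name": ["A", "B"], "age": [28, 26], "club": ["X", "Y"]})

working = df.copy()        # independent copy; df stays untouched
ages = working.pop("age")  # returns the column as a Series, removes it in place

print(ages.tolist(), list(working.columns), list(df.columns))
```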


### reindex() to remove unwanted rows/columns

In [130]:
unwanted_rows = [2,4,6,8]
unwanted_columns = ['position','region','age_cat']
# Sets are unordered, so the surviving columns may come back in arbitrary order.
players.reindex(index=set(players.index).difference(unwanted_rows),
               columns=set(players.columns).difference(unwanted_columns))

Unnamed: 0,nationality,page_views,club_id,big_club,position_cat,new_signing,new_foreign,fpl_sel,club,age,fpl_points,market_value,name,fpl_value
0,Colombia,544,1,1,4,0,0,0.20%,Arsenal,28,2,7.0,David Ospina,5.0
1,France,1183,1,1,1,0,1,26.50%,Arsenal,26,0,40.0,Alexandre Lacazette,10.5
3,France,912,1,1,3,0,0,0.70%,Arsenal,31,121,22.0,Laurent Koscielny,6.0
5,Spain,943,1,1,2,0,0,0.10%,Arsenal,32,38,12.0,Santi Cazorla,7.0
7,Spain,2055,1,1,1,1,0,0.10%,Arsenal,28,20,15.0,Lucas Perez,7.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
460,England,425,20,0,2,0,0,0.10%,West+Ham,30,71,7.0,Mark Noble,5.5
461,England,1142,20,0,1,0,0,0.50%,West+Ham,27,132,18.0,Michail Antonio,7.5
462,Scotland,1210,20,0,1,0,0,6.50%,West+Ham,29,133,8.0,Robert Snodgrass,6.0
463,England,412,20,0,1,1,0,5.90%,West+Ham,21,16,1.0,Ashley Fletcher,4.5
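
Python sets are unordered, which is presumably why the columns in the output above come back shuffled. A sketch of two order-preserving alternatives on a made-up frame, assuming the goal is simply to remove the unwanted labels (`df`, `dropped` and `reindexed` are illustrative names):

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the dataset.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "position": ["GK", "CB", "LW"],
    "age": [28, 26, 31],
    "region": [3, 2, 2],
})
unwanted_rows = [1]
unwanted_columns = ["position", "region"]

# drop() removes the labels and leaves everything else in its original order.
dropped = df.drop(index=unwanted_rows, columns=unwanted_columns)

# reindex() with ordered lists gives the same result.
keep_rows = [i for i in df.index if i not in unwanted_rows]
keep_cols = [c for c in df.columns if c not in unwanted_columns]
reindexed = df.reindex(index=keep_rows, columns=keep_cols)

print(dropped.equals(reindexed))
```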


### Null values in DataFrame

In [131]:
players.isna()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
460,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
461,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
462,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
463,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [132]:
import numpy as np
np.count_nonzero(players.isna())

4

In [133]:
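players[players.isna()]  # a boolean DataFrame masks cell by cell: only the NaN cells are kept, so everything displays as NaN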

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,,,,,,,,,,,,,,,,,
1,,,,,,,,,,,,,,,,,
2,,,,,,,,,,,,,,,,,
3,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
460,,,,,,,,,,,,,,,,,
461,,,,,,,,,,,,,,,,,
462,,,,,,,,,,,,,,,,,
463,,,,,,,,,,,,,,,,,


In [137]:
players[players.isna().values]

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
9,Arsenal,Granit Xhaka,24,,2,,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
9,Arsenal,Granit Xhaka,24,,2,,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
190,Huddersfield,Steve Mounie,22,CF,1,,56,6.0,0.60%,0,2,Benin,0,2,8,0,0
194,Leicester+City,Kasper Schmeichel,30,GK,4,,1601,5.0,2.40%,109,2,Denmark,0,4,9,0,0
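
Row 9 shows up twice in the output above, most likely because the 2-D boolean array contributes one row position per missing cell and that row has two missing cells. The sketch below uses a small made-up frame (`df` and its values are assumptions, not the `players` data) to show where the repetition comes from and how a 1-D row mask with `any(axis=1)` lists each affected row once.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `players`: row 1 has two missing cells, row 2 has one.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "position": ["GK", np.nan, "CF"],
    "market_value": [7.0, np.nan, np.nan],
})

mask_2d = df.isna().values

# Row positions of every True cell: a row with two NaNs is listed twice.
row_positions = mask_2d.nonzero()[0]

# One boolean per row, so each row with any NaN is selected exactly once.
rows_with_na = df[df.isna().any(axis=1)]

# Number of missing cells per column.
na_per_column = df.isna().sum()
```

`df.isna().sum().sum()` gives the same total as `np.count_nonzero(df.isna())`.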


### dropping and filling NaNs

In [135]:
players.fillna('some meaningful values').loc[[9,190,194]]

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
9,Arsenal,Granit Xhaka,24,some meaningful values,2,some meaningful values,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
190,Huddersfield,Steve Mounie,22,CF,1,some meaningful values,56,6.0,0.60%,0,2,Benin,0,2,8,0,0
194,Leicester+City,Kasper Schmeichel,30,GK,4,some meaningful values,1601,5.0,2.40%,109,2,Denmark,0,4,9,0,0


In [139]:
players.fillna({'position': 'GK', 'market_value': 30}).loc[[9,190,194]]  # you can use mean values

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
9,Arsenal,Granit Xhaka,24,GK,2,30.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
190,Huddersfield,Steve Mounie,22,CF,1,30.0,56,6.0,0.60%,0,2,Benin,0,2,8,0,0
194,Leicester+City,Kasper Schmeichel,30,GK,4,30.0,1601,5.0,2.40%,109,2,Denmark,0,4,9,0,0


In [147]:
players.dropna().loc[[9,190,194]]


KeyError: "None of [Int64Index([9, 190, 194], dtype='int64')] are in the [index]"

In [148]:
players.dropna(axis=1).loc[[9,194,190]]

Unnamed: 0,club,name,age,position_cat,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
9,Arsenal,Granit Xhaka,24,2,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
194,Leicester+City,Kasper Schmeichel,30,4,1601,5.0,2.40%,109,2,Denmark,0,4,9,0,0
190,Huddersfield,Steve Mounie,22,1,56,6.0,0.60%,0,2,Benin,0,2,8,0,0


### Methods and axes with fillna()

In [149]:
players[players.isna().values]

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
9,Arsenal,Granit Xhaka,24,,2,,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
9,Arsenal,Granit Xhaka,24,,2,,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
190,Huddersfield,Steve Mounie,22,CF,1,,56,6.0,0.60%,0,2,Benin,0,2,8,0,0
194,Leicester+City,Kasper Schmeichel,30,GK,4,,1601,5.0,2.40%,109,2,Denmark,0,4,9,0,0


In [156]:
players.fillna(method='ffill').loc[[8,9,189,190,193,194]] # ffill/pad = forward fill, bfill/backfill = back fill

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
8,Arsenal,Kieran Gibbs,27,LB,3,10.0,489,5.0,0.50%,45,1,England,0,3,1,1,0
9,Arsenal,Granit Xhaka,24,LB,2,10.0,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
189,Huddersfield,Chris Lowe,28,LB,3,1.5,84,4.5,0.70%,0,2,Germany,0,4,8,0,0
190,Huddersfield,Steve Mounie,22,CF,1,1.5,56,6.0,0.60%,0,2,Benin,0,2,8,0,0
193,Leicester+City,Riyad Mahrez,26,RW,1,30.0,1753,8.5,1.70%,120,4,Algeria,0,3,9,0,0
194,Leicester+City,Kasper Schmeichel,30,GK,4,30.0,1601,5.0,2.40%,109,2,Denmark,0,4,9,0,0


In [155]:
players.fillna(method='ffill', axis=1).loc[[9,190,194]]  # axis=0 by default

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
9,Arsenal,Granit Xhaka,24,24,2,2,1815,5.5,2.00%,85,2,Switzerland,0,2,1,1,0
190,Huddersfield,Steve Mounie,22,CF,1,1,56,6.0,0.60%,0,2,Benin,0,2,8,0,0
194,Leicester+City,Kasper Schmeichel,30,GK,4,4,1601,5.0,2.40%,109,2,Denmark,0,4,9,0,0
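
The `method=` argument of `fillna()` is deprecated since pandas 2.1; `ffill()` and `bfill()` do the same job and exist in older versions too. The sketch below uses a small made-up numeric frame (an assumption, not the `players` data) to show both directions and both axes. Filling along `axis=1` copies the value from the column to the left, which is why `position` became `24` (the age) in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with gaps.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

down = df.ffill()           # forward fill: take the value from the row above
up = df.bfill()             # back fill: take the value from the row below
across = df.ffill(axis=1)   # forward fill along columns: take the value from the left
```

A gap with nothing before it (forward fill) or after it (back fill) stays NaN, so a second pass with a constant is often still needed.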


In [165]:
# challenge

#1 remove rows 2, 10, 21 and the market_value column without modifying players; store the result in df2
df2 = players.drop(index=[2,10,21],columns='market_value')
df2.head()

#2 does the nationality column contain any NaN? how many unique nationalities does it contain?
print(df2.nationality.unique().size)
df2.nationality[df2.nationality.isna()]

#3 isolate players with a unique age-position combination within each club; don't include the club column in the result
df2.drop_duplicates(subset=['club','age','position'],keep='first').loc[:,['age','position']]

61


Unnamed: 0,age,position
0,28,GK
1,26,CF
3,31,CB
4,28,AM
5,32,CM
...,...,...
459,24,AM
460,30,CM
462,29,RW
463,21,CF


### Calculating aggregates with agg()

In [181]:
players.agg('mean', numeric_only=True, axis=0)   # can also do min, max, sum

age              26.776344
position_cat      2.178495
market_value     11.125649
page_views      771.546237
fpl_value         5.450538
fpl_points       57.544086
region            1.989247
new_foreign       0.034409
age_cat           3.195699
club_id          10.253763
big_club          0.309677
new_signing       0.144086
dtype: float64

In [168]:
players.big_club.mean()

0.3096774193548387

In [172]:
players.select_dtypes(np.number).agg('min')

age             17.00
position_cat     1.00
market_value     0.05
page_views       3.00
fpl_value        4.00
fpl_points       0.00
region           1.00
new_foreign      0.00
age_cat          1.00
club_id          1.00
big_club         0.00
new_signing      0.00
dtype: float64

In [182]:
players.select_dtypes(np.number).aggregate(['min','max','mean'])  # agg = aggregate

Unnamed: 0,age,position_cat,market_value,page_views,fpl_value,fpl_points,region,new_foreign,age_cat,club_id,big_club,new_signing
min,17.0,1.0,0.05,3.0,4.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
max,38.0,4.0,75.0,7664.0,12.5,264.0,4.0,1.0,6.0,20.0,1.0,1.0
mean,26.776344,2.178495,11.125649,771.546237,5.450538,57.544086,1.989247,0.034409,3.195699,10.253763,0.309677,0.144086


In [183]:
players.agg({'age': ['min', 'max'], 'market_value': 'mean'})

Unnamed: 0,age,market_value
min,17.0,
max,38.0,
mean,,11.125649
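
The NaNs in the table above are not missing data: when different columns get different functions, the result has one row per function, and every column/function pair that was not requested is left empty. A minimal sketch on a made-up frame (the values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for two numeric columns of `players`.
df = pd.DataFrame({"age": [28, 26, 31], "market_value": [7.0, 40.0, 22.0]})

# One row per requested function; pairs that were not requested become NaN.
summary = df.agg({"age": ["min", "max"], "market_value": ["mean"]})

# Aggregating a single column avoids the padding entirely.
age_range = df["age"].agg(["min", "max"])
```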


### Same-shape transforms

In [175]:
players.head(3)

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0


In [177]:
# usdeur = 0.91
players.loc[:, ['market_value', 'fpl_value']].transform(lambda x: x*0.91)

Unnamed: 0,market_value,fpl_value
0,6.37,4.550
1,36.40,9.555
2,59.15,10.920
3,20.02,5.460
4,45.50,8.645
...,...,...
460,6.37,5.005
461,16.38,6.825
462,7.28,5.460
463,0.91,4.095


In [178]:
players.loc[:, ['market_value', 'fpl_value']] * 0.91

Unnamed: 0,market_value,fpl_value
0,6.37,4.550
1,36.40,9.555
2,59.15,10.920
3,20.02,5.460
4,45.50,8.645
...,...,...
460,6.37,5.005
461,16.38,6.825
462,7.28,5.460
463,0.91,4.095


In [179]:
# to use string methods on series values
players.name.str.upper()

0             DAVID OSPINA
1      ALEXANDRE LACAZETTE
2           ALEXIS SANCHEZ
3        LAURENT KOSCIELNY
4               MESUT OZIL
              ...         
460             MARK NOBLE
461        MICHAIL ANTONIO
462       ROBERT SNODGRASS
463        ASHLEY FLETCHER
464        AARON CRESSWELL
Name: name, Length: 465, dtype: object

In [180]:
# Useful when we want to do specific things at run time.
import random
def random_case(x):
    funcs = [x.str.upper, x.str.lower, x.str.title, x.str.swapcase]
    return random.choice(funcs)()
for i in range(2):
    print(players.select_dtypes(include=object).transform(random_case).head())

      club                 name position fpl_sel nationality
0  aRSENAL         DAVID OSPINA       GK   0.20%    cOLOMBIA
1  aRSENAL  ALEXANDRE LACAZETTE       CF  26.50%      fRANCE
2  aRSENAL       ALEXIS SANCHEZ       LW  17.10%       cHILE
3  aRSENAL    LAURENT KOSCIELNY       CB   0.70%      fRANCE
4  aRSENAL           MESUT OZIL       AM   5.60%     gERMANY
      club                 name position fpl_sel nationality
0  aRSENAL         david ospina       Gk   0.20%    colombia
1  aRSENAL  alexandre lacazette       Cf  26.50%      france
2  aRSENAL       alexis sanchez       Lw  17.10%       chile
3  aRSENAL    laurent koscielny       Cb   0.70%      france
4  aRSENAL           mesut ozil       Am   5.60%     germany


In [190]:
players.select_dtypes(np.number).transform([np.exp, np.sqrt])

Unnamed: 0_level_0,age,age,position_cat,position_cat,market_value,market_value,page_views,page_views,fpl_value,fpl_value,...,new_foreign,new_foreign,age_cat,age_cat,club_id,club_id,big_club,big_club,new_signing,new_signing
Unnamed: 0_level_1,exp,sqrt,exp,sqrt,exp,sqrt,exp,sqrt,exp,sqrt,...,exp,sqrt,exp,sqrt,exp,sqrt,exp,sqrt,exp,sqrt
0,1.446257e+12,5.291503,54.598150,2.000000,1.096633e+03,2.645751,1.803841e+236,23.323808,148.413159,2.236068,...,1.000000,0.0,54.598150,2.000000,2.718282e+00,1.000000,2.718282,1.0,1.000000,0.0
1,1.957296e+11,5.099020,2.718282,1.000000,2.353853e+17,6.324555,inf,34.394767,36315.502674,3.240370,...,2.718282,1.0,20.085537,1.732051,2.718282e+00,1.000000,2.718282,1.0,1.000000,0.0
2,1.446257e+12,5.291503,2.718282,1.000000,1.694889e+28,8.062258,inf,65.795137,162754.791419,3.464102,...,1.000000,0.0,54.598150,2.000000,2.718282e+00,1.000000,2.718282,1.0,1.000000,0.0
3,2.904885e+13,5.567764,20.085537,1.732051,3.584913e+09,4.690416,inf,30.199338,403.428793,2.449490,...,1.000000,0.0,54.598150,2.000000,2.718282e+00,1.000000,2.718282,1.0,1.000000,0.0
4,1.446257e+12,5.291503,2.718282,1.000000,5.184706e+21,7.071068,inf,66.294796,13359.726830,3.082207,...,1.000000,0.0,54.598150,2.000000,2.718282e+00,1.000000,2.718282,1.0,1.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
460,1.068647e+13,5.477226,7.389056,1.414214,1.096633e+03,2.645751,3.759714e+184,20.615528,244.691932,2.345208,...,1.000000,0.0,54.598150,2.000000,4.851652e+08,4.472136,1.000000,0.0,1.000000,0.0
461,5.320482e+11,5.196152,2.718282,1.000000,6.565997e+07,4.242641,inf,33.793490,1808.042414,2.738613,...,1.000000,0.0,20.085537,1.732051,4.851652e+08,4.472136,1.000000,0.0,1.000000,0.0
462,3.931334e+12,5.385165,2.718282,1.000000,2.980958e+03,2.828427,inf,34.785054,403.428793,2.449490,...,1.000000,0.0,54.598150,2.000000,4.851652e+08,4.472136,1.000000,0.0,1.000000,0.0
463,1.318816e+09,4.582576,2.718282,1.000000,2.718282e+00,1.000000,8.498192e+178,20.297783,90.017131,2.121320,...,1.000000,0.0,2.718282,1.000000,4.851652e+08,4.472136,1.000000,0.0,2.718282,1.0
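
"Same shape" is enforced: `transform()` accepts functions that return one value per input value and rejects functions that reduce a column to a single number. A small sketch on made-up data (the frame is an assumption), catching the `ValueError` that pandas raises for a reducing function:

```python
import pandas as pd

# Hypothetical stand-in for two numeric columns of `players`.
df = pd.DataFrame({"market_value": [7.0, 40.0, 65.0], "fpl_value": [5.0, 10.5, 12.0]})

# Element-preserving function: the result has the same index and columns.
scaled = df.transform(lambda col: col * 0.91)

# Reducing function: transform() refuses it, agg() is the right tool.
try:
    df.transform(lambda col: col.mean())
    reducer_accepted = True
except ValueError:
    reducer_accepted = False

means = df.agg("mean")
```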


### More Flexibility with apply()
* apply() can act as agg or transform

In [194]:
def round_floats(x):
    if x.dtype == float:
        return round(x)
    return x
players.apply(round_floats).head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.0,26.50%,0,2,France,1,3,1,1,0
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
3,Arsenal,Laurent Koscielny,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,10.0,5.60%,167,2,Germany,0,4,1,1,0


In [196]:
players.transform(round_floats).head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.0,26.50%,0,2,France,1,3,1,1,0
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0
3,Arsenal,Laurent Koscielny,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,10.0,5.60%,167,2,Germany,0,4,1,1,0


In [199]:
players.select_dtypes(np.number).apply('mean', axis=0)

age              26.776344
position_cat      2.178495
market_value     11.125649
page_views      771.546237
fpl_value         5.450538
fpl_points       57.544086
region            1.989247
new_foreign       0.034409
age_cat           3.195699
club_id          10.253763
big_club          0.309677
new_signing       0.144086
dtype: float64

In [221]:
players.select_dtypes(np.number).apply('mean', axis=1, result_type='expand')

0       49.916667
1      105.708333
2      392.333333
3       91.916667
4      388.208333
          ...    
460     47.125000
461    112.625000
462    117.750000
463     39.875000
464     42.583333
Length: 465, dtype: float64

In [204]:
players.loc[460,[dtype != object for dtype in players.dtypes]].mean()

47.125

In [216]:
players.select_dtypes(np.number).apply(['mean', min], args=(), raw=False, by_row='compat', axis = 0, result_type=None)
# raw=True passes each row/column as an ndarray | raw=False passes each row/column as a Series
# by_row='compat'/False - only has an effect when func is a list or dict of functions
# result_type=None/'expand'/'reduce'/'broadcast' - only has an effect when axis=1

Unnamed: 0,age,position_cat,market_value,page_views,fpl_value,fpl_points,region,new_foreign,age_cat,club_id,big_club,new_signing
mean,26.776344,2.178495,11.125649,771.546237,5.450538,57.544086,1.989247,0.034409,3.195699,10.253763,0.309677,0.144086
min,17.0,1.0,0.05,3.0,4.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0
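
Whether `apply()` behaves like `agg()` or like `transform()` depends only on what the function returns. The sketch below, on a made-up two-column frame (an assumption), shows a reducing function per column, a reducing function per row with `axis=1`, and a shape-preserving function.

```python
import pandas as pd

# Hypothetical stand-in for two numeric columns of `players`.
df = pd.DataFrame({"market_value": [7.0, 40.0], "fpl_value": [5.0, 10.5]})

# Function returns a scalar per column: apply() acts like agg().
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1 hands each row to the function; the result is indexed by row label.
row_ratio = df.apply(lambda row: row["market_value"] / row["fpl_value"], axis=1)

# Function returns a Series of the same length: apply() acts like transform().
doubled = df.apply(lambda col: col * 2)
```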


### Elementwise operations with applymap()

In [222]:
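# column-wise ops: agg(), transform(), apply() pass a whole row/column (Series) to the function,
# so arithmetic inside it runs as vectorized NumPy operations (which may use SIMD: single instruction, multiple data)
# element-wise (non-vectorized): applymap() calls the function once per cell; it is deprecated since pandas 2.1 in favor of DataFrame.map()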

In [225]:
inflation = 1.02
mini_df = players.loc[:, ['market_value','fpl_value']]
mini_df.head()

Unnamed: 0,market_value,fpl_value
0,7.0,5.0
1,40.0,10.5
2,65.0,12.0
3,22.0,6.0
4,50.0,9.5


In [227]:
(mini_df*inflation).head()

Unnamed: 0,market_value,fpl_value
0,7.14,5.1
1,40.8,10.71
2,66.3,12.24
3,22.44,6.12
4,51.0,9.69


In [230]:
from datetime import datetime
counter = 0

def log_and_transform(x):
    global counter
    counter +=1
    if counter%100==0:
        print(f'It is {datetime.now()} and I just transformed {counter}th value.')
    return x*inflation
              

mini_df.apply(log_and_transform)

Unnamed: 0,market_value,fpl_value
0,7.14,5.10
1,40.80,10.71
2,66.30,12.24
3,22.44,6.12
4,51.00,9.69
...,...,...
460,7.14,5.61
461,18.36,7.65
462,8.16,6.12
463,1.02,4.59


In [231]:
mini_df.applymap(log_and_transform)

It is 2024-08-08 03:14:33.419884 and I just transformed 100th value.
It is 2024-08-08 03:14:33.421324 and I just transformed 200th value.
It is 2024-08-08 03:14:33.421324 and I just transformed 300th value.
It is 2024-08-08 03:14:33.421324 and I just transformed 400th value.
It is 2024-08-08 03:14:33.422391 and I just transformed 500th value.
It is 2024-08-08 03:14:33.422391 and I just transformed 600th value.
It is 2024-08-08 03:14:33.422391 and I just transformed 700th value.
It is 2024-08-08 03:14:33.422391 and I just transformed 800th value.
It is 2024-08-08 03:14:33.422391 and I just transformed 900th value.


Unnamed: 0,market_value,fpl_value
0,7.14,5.10
1,40.80,10.71
2,66.30,12.24
3,22.44,6.12
4,51.00,9.69
...,...,...
460,7.14,5.61
461,18.36,7.65
462,8.16,6.12
463,1.02,4.59
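
The log lines above appear only for `applymap()` because it calls the function once per cell (930 cells here), while `apply()` calls it once per column, so the counter never reaches 100. The sketch below counts calls on a made-up frame (an assumption). Since `applymap()` is deprecated from pandas 2.1 on, it picks `DataFrame.map` when that method exists and falls back to `applymap` otherwise.

```python
import pandas as pd

# Hypothetical stand-in for `mini_df`.
df = pd.DataFrame({"market_value": [7.0, 40.0, 65.0], "fpl_value": [5.0, 10.5, 12.0]})
calls = {"n": 0}

def scale(x):
    """Multiply by a fixed factor and count how often we are called."""
    calls["n"] += 1
    return x * 1.02

# Column-wise: `x` is a whole Series, so there is roughly one call per column.
by_column = df.apply(scale)
column_calls = calls["n"]

# Element-wise: `x` is a single scalar, so there is one call per cell.
calls["n"] = 0
elementwise = df.map if hasattr(df, "map") else df.applymap
by_element = elementwise(scale)
element_calls = calls["n"]
```

Both results hold the same numbers; only the number of Python-level calls differs, which is where the speed gap comes from.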


In [239]:
# Challenge
#1: create a standalone function that categorizes popularity by page views:
def categorize_popularity(x):
    if x <220:
        return 'relatively unknown'
    elif x<600:
        return 'kind of popular'
    elif x< 2000:
        return 'popular'
    else:
        return 'super-popular'

In [240]:
# apply a function to players page views column
players.page_views.apply(categorize_popularity)

0      kind of popular
1              popular
2        super-popular
3              popular
4        super-popular
            ...       
460    kind of popular
461            popular
462            popular
463    kind of popular
464    kind of popular
Name: page_views, Length: 465, dtype: object

In [241]:
# Add a column to players 'popularity' with result from 2nd step
players['popularity'] = players.page_views.apply(categorize_popularity)

In [242]:
players.head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular
3,Arsenal,Laurent Koscielny,31,CB,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular


In [245]:
#4 how many super-popular players? 
players.popularity.isin(['super-popular']).sum()

37
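
As an alternative to calling `categorize_popularity` once per value, the same thresholds can be expressed as bins for `pd.cut`, which labels the whole column in one call. This is a sketch on made-up page-view numbers (an assumption); `right=False` makes each bin include its lower edge, matching the `<` comparisons in the function.

```python
import numpy as np
import pandas as pd

# Hypothetical page-view values, chosen to hit every threshold.
page_views = pd.Series([100, 220, 599, 600, 1999, 2000, 4329])

def categorize_popularity(x):
    if x < 220:
        return 'relatively unknown'
    elif x < 600:
        return 'kind of popular'
    elif x < 2000:
        return 'popular'
    else:
        return 'super-popular'

labels = ['relatively unknown', 'kind of popular', 'popular', 'super-popular']

# Bins are [-inf, 220), [220, 600), [600, 2000), [2000, inf).
binned = pd.cut(page_views, bins=[-np.inf, 220, 600, 2000, np.inf],
                labels=labels, right=False)

super_popular = (binned == 'super-popular').sum()
```

The result is a categorical column, which also keeps the labels in their natural order for sorting.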

### Setting dataframe values

In [250]:
players.head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular
3,Arsenal,Laurent Koscielny,31,LW,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular


In [260]:
%%timeit
players.loc[3,'position'] = 'GK'
# players.head()

72.8 µs ± 7.8 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [262]:
# at and iat should be preferred for single-value indexing
players.iloc[3,3] = 'LW'
players.at[3,'position'] = 'GK'
players.iat[3,3] = 'LW'
players.head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular
3,Arsenal,Laurent Koscielny,31,LW,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular


In [263]:
%%timeit
players.at[3,'position'] = 'GK'
# players.head()

7.89 µs ± 611 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
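
The timings above suggest `at` is considerably faster than `loc` for a single cell, presumably because it skips the logic `loc` needs for slices, lists and masks. A minimal sketch on a made-up frame (an assumption) showing that `at` addresses a cell by labels and `iat` by integer positions:

```python
import pandas as pd

# Hypothetical frame with non-default row labels to make label vs position visible.
df = pd.DataFrame({"name": ["A", "B"], "position": ["GK", "CF"]}, index=[10, 11])

df.at[11, "position"] = "LW"   # label-based: row label 11, column 'position'
df.iat[0, 1] = "CM"            # position-based: first row, second column

first_position = df.at[10, "position"]
```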


### SettingWithCopy Warning

In [266]:
players['position'][3] = 'LW'
# pandas doesn't guarantee whether this modifies the original df or only a temporary copy of it.

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  players['position'][3] = 'LW'


In [267]:
players.head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular
3,Arsenal,Laurent Koscielny,31,LW,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular


In [269]:
pd.options.mode.chained_assignment = 'warn'  # default

In [270]:
# set it to None to turn off SettingWithCopyWarning, or to 'raise' to turn the warning into an error

### View vs Copy

In [271]:
# copy = a copy of the data
# view = a window into the data

In [272]:
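# 2-point rule:
# 1. indexing often returns a copy, so writing through a chained expression may not reach the original, but
# 2. assigning through a single loc/iloc/at/iat call on the DataFrame itself always modifies the original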

In [275]:
players.loc[:3,'position']= ['CM', 'CM', 'CM', 'CM']
players.head()
# we don't get warning as we are using loc

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity
0,Arsenal,David Ospina,28,CM,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular
1,Arsenal,Alexandre Lacazette,26,CM,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular
2,Arsenal,Alexis Sanchez,28,CM,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular
3,Arsenal,Laurent Koscielny,31,CM,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular


In [276]:
players['position'].loc[:3] = ['GK', 'CF', 'LW', 'CM']  
# we get a warning because players['position'] may be a view or a copy, so pandas can't tell whether the loc assignment reaches players

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  players['position'].loc[:3] = ['GK', 'CF', 'LW', 'CM']


In [277]:
players.head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular
3,Arsenal,Laurent Koscielny,31,CM,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular
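
A version-independent way to stay clear of the warning is to do every write in a single `loc` call on the frame itself, and to call `.copy()` explicitly when an independent subset is wanted. The sketch below uses a made-up frame (an assumption). With Copy-on-Write enabled in newer pandas versions, chained assignment never updates the original, so the single-call form is the one to rely on.

```python
import pandas as pd

# Hypothetical stand-in for `players`.
df = pd.DataFrame({"name": ["A", "B", "C"], "position": ["GK", "CF", "LW"]})

# One indexing call, rows and column together: always writes to df.
df.loc[:1, "position"] = "CM"   # label slices are inclusive, so rows 0 and 1

# Explicit copy: changes to `subset` are never meant to reach df.
subset = df.loc[df["position"] == "CM"].copy()
subset["position"] = "AM"
```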


### Adding dataframe columns

In [278]:
players.popularity

0      kind of popular
1              popular
2        super-popular
3              popular
4        super-popular
            ...       
460    kind of popular
461            popular
462            popular
463    kind of popular
464    kind of popular
Name: popularity, Length: 465, dtype: object

In [279]:
'MVtoFPL' in players, 'name' in players

(False, True)

In [280]:
players['MVtoFPL'] = 1.0

In [281]:
players.head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity,MVtoFPL
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular,1.0
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular,1.0
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular,1.0
3,Arsenal,Laurent Koscielny,31,CM,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular,1.0
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular,1.0


In [282]:
players['MVtoFPL'] = players['market_value']/players['fpl_value']

In [283]:
players.head()

Unnamed: 0,club,name,age,position,position_cat,market_value,page_views,fpl_value,fpl_sel,fpl_points,region,nationality,new_foreign,age_cat,club_id,big_club,new_signing,popularity,MVtoFPL
0,Arsenal,David Ospina,28,GK,4,7.0,544,5.0,0.20%,2,3,Colombia,0,4,1,1,0,kind of popular,1.4
1,Arsenal,Alexandre Lacazette,26,CF,1,40.0,1183,10.5,26.50%,0,2,France,1,3,1,1,0,popular,3.809524
2,Arsenal,Alexis Sanchez,28,LW,1,65.0,4329,12.0,17.10%,264,3,Chile,0,4,1,1,0,super-popular,5.416667
3,Arsenal,Laurent Koscielny,31,CM,3,22.0,912,6.0,0.70%,121,2,France,0,4,1,1,0,popular,3.666667
4,Arsenal,Mesut Ozil,28,AM,1,50.0,4395,9.5,5.60%,167,2,Germany,0,4,1,1,0,super-popular,5.263158


In [284]:
df_mini = players.iloc[:4,1:5]

In [285]:
df_mini

Unnamed: 0,name,age,position,position_cat
0,David Ospina,28,GK,4
1,Alexandre Lacazette,26,CF,1
2,Alexis Sanchez,28,LW,1
3,Laurent Koscielny,31,CM,3


In [286]:
# insert()
names = pd.Series(['David Ospina', 'Alex', 'Laurent', 'Rony'])
df_mini.insert(0,'nickname', names)

In [287]:
df_mini

Unnamed: 0,nickname,name,age,position,position_cat
0,David Ospina,David Ospina,28,GK,4
1,Alex,Alexandre Lacazette,26,CF,1
2,Laurent,Alexis Sanchez,28,LW,1
3,Rony,Laurent Koscielny,31,CM,3


In [289]:
# assign() = columns passed as keyword args, returns new copy
df_mini.assign(career_goals=[34,54,22,31], nationality=['American','England', 'China', 'Canada'])

Unnamed: 0,nickname,name,age,position,position_cat,career_goals,nationality
0,David Ospina,David Ospina,28,GK,4,34,American
1,Alex,Alexandre Lacazette,26,CF,1,54,England
2,Laurent,Alexis Sanchez,28,LW,1,22,China
3,Rony,Laurent Koscielny,31,CM,3,31,Canada
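
One thing to watch with `insert()` (and plain column assignment): a Series is aligned on its index, not on position. It worked above because `df_mini` and `names` both use labels 0 to 3. The sketch below, on a made-up frame with other row labels (an assumption), shows the mismatch and two positional alternatives.

```python
import pandas as pd

# Hypothetical frame whose row labels are not 0..n-1, like a sampled subset.
df = pd.DataFrame({"name": ["A", "B"]}, index=[283, 169])
nick = pd.Series(["a", "b"])   # default index 0, 1

# Series labels 0 and 1 don't exist in df, so the new column is all NaN.
aligned = df.copy()
aligned.insert(0, "nickname", nick)

# Passing plain values (array or list) assigns by position instead.
positional = df.copy()
positional.insert(0, "nickname", nick.to_numpy())

# assign() returns a new frame and leaves `positional` untouched.
extended = positional.assign(age=[28, 26])
```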


### Adding rows to dataframe

In [290]:
df_mini

Unnamed: 0,nickname,name,age,position,position_cat
0,David Ospina,David Ospina,28,GK,4
1,Alex,Alexandre Lacazette,26,CF,1
2,Laurent,Alexis Sanchez,28,LW,1
3,Rony,Laurent Koscielny,31,CM,3


In [303]:
cristiano = pd.Series({
    'nickname': 'Christiano',
    'name': 'Christiano Ronaldo',
    'age': 32,
    'position': 'LW',
    'position_cat': 3
},name=4)
df_mini.append(cristiano)
# df_mini.append([series1, series2, series3])
# note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead



Unnamed: 0,nickname,name,age,position,position_cat
0,David Ospina,David Ospina,28,GK,4
1,Alex,Alexandre Lacazette,26,CF,1
2,Laurent,Alexis Sanchez,28,LW,1
3,Rony,Laurent Koscielny,31,CM,3
4,Christiano,Christiano Ronaldo,32,LW,3


In [304]:
other_players = pd.DataFrame({
    'nickname': ['Alex two', 'Rony2'],
    'name': ['Alex2', 'Rony2'],
    'age': [23,24],
    'position': ['GK','LW'],
    'position_cat': [3,5]
}, index=[5,6])
df_mini.append(other_players)



Unnamed: 0,nickname,name,age,position,position_cat
0,David Ospina,David Ospina,28,GK,4
1,Alex,Alexandre Lacazette,26,CF,1
2,Laurent,Alexis Sanchez,28,LW,1
3,Rony,Laurent Koscielny,31,CM,3
5,Alex two,Alex2,23,GK,3
6,Rony2,Rony2,24,LW,5


In [306]:
# setting with enlargement
df_mini.loc[9]= 'some_vlaue'
df_mini

Unnamed: 0,nickname,name,age,position,position_cat
0,David Ospina,David Ospina,28,GK,4
1,Alex,Alexandre Lacazette,26,CF,1
2,Laurent,Alexis Sanchez,28,LW,1
3,Rony,Laurent Koscielny,31,CM,3
9,some_vlaue,some_vlaue,some_vlaue,some_vlaue,some_vlaue


In [307]:
# adding rows to a dataframe one at a time is inefficient: each addition builds a new, larger object and copies the data

In [309]:
pd.concat([df_mini,other_players], axis=0)

Unnamed: 0,nickname,name,age,position,position_cat
0,David Ospina,David Ospina,28,GK,4
1,Alex,Alexandre Lacazette,26,CF,1
2,Laurent,Alexis Sanchez,28,LW,1
3,Rony,Laurent Koscielny,31,CM,3
9,some_vlaue,some_vlaue,some_vlaue,some_vlaue,some_vlaue
5,Alex two,Alex2,23,GK,3
6,Rony2,Rony2,24,LW,5


In [311]:
pd.concat([df_mini,cristiano.to_frame().T], axis=0)

Unnamed: 0,nickname,name,age,position,position_cat
0,David Ospina,David Ospina,28,GK,4
1,Alex,Alexandre Lacazette,26,CF,1
2,Laurent,Alexis Sanchez,28,LW,1
3,Rony,Laurent Koscielny,31,CM,3
9,some_vlaue,some_vlaue,some_vlaue,some_vlaue,some_vlaue
4,Christiano,Christiano Ronaldo,32,LW,3
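
Because each row addition copies data, a common pattern is to collect the new rows first and concatenate once. The sketch below uses made-up rows (an assumption). It also shows a side effect of the `Series.to_frame().T` route used above: a Series holding strings and numbers has object dtype, so the numeric column of the combined frame becomes object as well.

```python
import pandas as pd

# Hypothetical stand-in for `df_mini`.
df = pd.DataFrame({"name": ["A", "B"], "age": [28, 26]})

# Collect rows as dicts, build one frame, concatenate once.
new_rows = [{"name": "C", "age": 32}, {"name": "D", "age": 23}]
combined = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

# Single row via a Series: works, but every column of the new row is object dtype.
one_row = pd.Series({"name": "C", "age": 32}).to_frame().T
via_series = pd.concat([df, one_row], ignore_index=True)
```

`ignore_index=True` renumbers the rows 0 to n-1; leave it out to keep the original labels.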


In [317]:
# challenge:
# 1 from players, select 4x4 dataframe as df_random
df_random = players.sample(4).sample(4, axis=1)
df_random

Unnamed: 0,region,page_views,position,nationality
283,2,1640,RB,Italy
169,2,56,GK,Denmark
50,1,504,RW,England
271,3,849,GK,Argentina


In [321]:
%%timeit
# 2 extend df by 1) vertically by adding row 2) horizontally by adding column
df_random.loc[300] = [3, 288, 'GK', 'Italy']
df_random

207 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [324]:
%%timeit
df_random['name'] = ['Rony', 'Alex', 'Tom', 'Gary', 'Monty']
df_random

56.1 µs ± 2.49 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [362]:
df_random.drop('name',axis=1, inplace=True)

In [360]:
df_random.insert(4,'name', ['Rony', 'Alex', 'Tom', 'Gary', 'Monty'])

In [367]:
df_random

Unnamed: 0,region,page_views,position,nationality
283,2,1640,RB,Italy
169,2,56,GK,Denmark
50,1,504,RW,England
271,3,849,GK,Argentina
300,3,288,GK,Italy


In [366]:
%%timeit
pd.concat([df_random, pd.Series({'region':2, 'page_views': 589, 'position': 'LW', 'nationality': 'India'}).to_frame().T], axis=0)

885 µs ± 123 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
