**Series Created from Dictionaries**

In [1]:
import numpy as np
import pandas as pd

In [8]:
d = {
    'A': 10,
    'C': 90,
    'D': 66,
    'B': 30
}

temp = pd.Series(d)
print(temp) # Keys keep their insertion order (pandas >= 0.23 on Python >= 3.6); they are not sorted

A    10
C    90
D    66
B    30
dtype: int64
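A minimal sketch, assuming pandas 0.23 or later on Python 3.6+: the Series keeps the dict's insertion order, and `sort_index()` produces a label-sorted copy when that is what you want. The name `sorted_temp` is just illustrative.

```python
import pandas as pd

d = {'A': 10, 'C': 90, 'D': 66, 'B': 30}
temp = pd.Series(d)

# Labels follow the dict's insertion order, not alphabetical order
print(list(temp.index))          # ['A', 'C', 'D', 'B']

# Sort explicitly when sorted labels are needed (returns a new Series)
sorted_temp = temp.sort_index()
print(list(sorted_temp.index))   # ['A', 'B', 'C', 'D']
```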


In [14]:
# To control the order (and which labels appear), pass an explicit index
indx = ['A', 'C', 'D', 'B', 'E']
temp = pd.Series(d, index = indx)

temp # Labels not found in the dict get NaN, which forces the dtype to float64

A    10.0
C    90.0
D    66.0
B    30.0
E     NaN
dtype: float64
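A short sketch of what the explicit index does in both directions: extra labels become NaN, and dict keys missing from the index are simply left out. `fillna` plus `astype` is one way to get back to integers; the names `filled` and `subset` are illustrative.

```python
import pandas as pd

d = {'A': 10, 'C': 90, 'D': 66, 'B': 30}
temp = pd.Series(d, index=['A', 'C', 'D', 'B', 'E'])

print(temp.dtype)          # float64, because NaN is a float
print(temp.isna()['E'])    # True: 'E' has no value in the dict

# Replace the missing value and restore an integer dtype
filled = temp.fillna(0).astype('int64')
print(filled['E'])         # 0

# Dict keys that are not in the index are dropped
subset = pd.Series(d, index=['A', 'B'])
print(len(subset))         # 2
```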

**Behaviour of Addition - the same applies to other binary operators**

In [2]:
a = {
    'A': 10,
    'B': 30, # Not present in b -> NaN
    'C': 100
}
b = {
    'A': 90,
    'C': np.nan, # NaN on either side makes the sum NaN
    'D': 66, # Not present in a -> NaN
}

sr1 = pd.Series(a)
sr2 = pd.Series(b)

sr1 + sr2

A    100.0
B      NaN
C      NaN
D      NaN
dtype: float64
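The `+` operator aligns on the union of labels and has no way to fill gaps. The method form `add` accepts `fill_value`, which (per the pandas docs) replaces a missing operand, whether the label is absent or the value is NaN, before adding; the result is NaN only where both sides are missing. A minimal sketch with the same two dicts:

```python
import numpy as np
import pandas as pd

sr1 = pd.Series({'A': 10, 'B': 30, 'C': 100})
sr2 = pd.Series({'A': 90, 'C': np.nan, 'D': 66})

# Missing operands are treated as 0 before adding
result = sr1.add(sr2, fill_value=0)
print(result)
# A    100.0
# B     30.0
# C    100.0
# D     66.0
# dtype: float64
```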

**index.name and name attributes**

In [23]:
temp.index.name = 'index of temp' # Name of the Series' index
temp.name = 'name of temp' # Name of the Series itself
# These are metadata labels; Series.name becomes the column name if the Series is turned into a DataFrame
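A quick check of where the two names end up when the Series is converted with `to_frame()`: the Series name becomes the column label and the index name is carried over. The small Series here is illustrative.

```python
import pandas as pd

temp = pd.Series([10, 90], index=['A', 'C'])
temp.index.name = 'index of temp'
temp.name = 'name of temp'

frame = temp.to_frame()
print(frame.columns.tolist())   # ['name of temp']
print(frame.index.name)         # index of temp
```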

**DataFrame Created from Dictionaries**

In [29]:
data = {
    'state': ['Maharashtra', 'Delhi', 'Karnatak'],
    'year': [2013, 2034, 1934],
    'pop': [1.3, 4.2, 9.0]
}

df = pd.DataFrame(data) # Note: the data given in the book is wrong
df
# Columns appear in the same order as the dictionary keys

         state  year  pop
0  Maharashtra  2013  1.3
1        Delhi  2034  4.2
2     Karnatak  1934  9.0
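The `columns=` argument of `pd.DataFrame` selects and orders the columns taken from the dict; a name that is not a key produces an all-NaN column, and keys that are not listed are dropped. A minimal sketch (the `debt` column is hypothetical):

```python
import pandas as pd

data = {
    'state': ['Maharashtra', 'Delhi', 'Karnatak'],
    'year': [2013, 2034, 1934],
    'pop': [1.3, 4.2, 9.0],
}

# 'pop' is left out; 'debt' is not a key, so it is filled with NaN
df = pd.DataFrame(data, columns=['year', 'state', 'debt'])
print(df.columns.tolist())      # ['year', 'state', 'debt']
print(df['debt'].isna().all())  # True
```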


**Series view**

In [36]:
temp_sr = df['pop']
temp_sr[1] = 100.0
df # The change shows up in df too: df['pop'] returned a view (pandas < 3.0 without Copy-on-Write; with Copy-on-Write df stays unchanged)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  temp_sr[1] = 100.0


         state  year    pop
0  Maharashtra  2013    1.3
1        Delhi  2034  100.0
2     Karnatak  1934    9.0
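Because view-versus-copy behaviour depends on the pandas version (and on Copy-on-Write), a safer pattern is to write through the DataFrame with `.loc`, and to call `.copy()` when a Series should be independent. A minimal sketch of both; `pop_copy` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['Maharashtra', 'Delhi', 'Karnatak'],
    'year': [2013, 2034, 1934],
    'pop': [1.3, 4.2, 9.0],
})

# Write through the DataFrame itself: no chained assignment, no warning
df.loc[1, 'pop'] = 100.0

# Take an explicit copy when the Series should not affect df
pop_copy = df['pop'].copy()
pop_copy[0] = -1.0
print(df.loc[0, 'pop'])   # 1.3, df is unchanged
```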


**Nested List**

In [64]:
lst = [
    [[1,2,3], [2,3,4]],
    [[4,5,6], [5,6,7]],
    [[7,8,9], [8,9,0]]
]
pd.DataFrame(lst) # Outer list -> rows, middle lists -> columns; the innermost lists are stored as single cells

           0          1
0  [1, 2, 3]  [2, 3, 4]
1  [4, 5, 6]  [5, 6, 7]
2  [7, 8, 9]  [8, 9, 0]
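The nesting depth decides what ends up in each cell. A sketch contrasting two levels (scalar cells) with three levels (list cells, object dtype), using small illustrative lists:

```python
import pandas as pd

# Two levels: each inner list is one row of scalar cells
flat = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(flat.shape)           # (2, 3)

# Three levels: the innermost lists are stored as objects in the cells
nested = pd.DataFrame([[[1, 2, 3], [2, 3, 4]], [[4, 5, 6], [5, 6, 7]]])
print(nested.shape)         # (2, 2)
print(nested.iloc[0, 0])    # [1, 2, 3]
```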


**Nested Dictionaries**

In [39]:
data = {
    'state': {1:'Maharashtra', 4:'Delhi', 3:'Karnatak'},
    'year': {4:2013, 1:2034, 3:1934}
}
df = pd.DataFrame(data) # The index is not sorted; here it follows the key order of the first inner dict
df.index.name = 'Optimus'
df
# Outer keys become columns; inner keys become the row index

               state  year
Optimus
1        Maharashtra  2034
4              Delhi  2013
3           Karnatak  1934
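Inner keys are matched by label, so each value stays paired with its key even though the two inner dicts list them in different orders. A sketch of that, plus `sort_index()` for a sorted index and `.T` to swap the roles of outer and inner keys:

```python
import pandas as pd

data = {
    'state': {1: 'Maharashtra', 4: 'Delhi', 3: 'Karnatak'},
    'year': {4: 2013, 1: 2034, 3: 1934},
}
df = pd.DataFrame(data)

# Aligned by label: Delhi stays paired with 2013
print(df.loc[4, 'state'], df.loc[4, 'year'])   # Delhi 2013

# Sort the row labels explicitly if sorted order is wanted
print(df.sort_index().index.tolist())          # [1, 3, 4]

# Transposing: outer keys become the row labels
print(df.T.index.tolist())                     # ['state', 'year']
```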


In [58]:
df.reset_index() # Returns a new DataFrame; df itself is not modified
df.iloc[0].values
# Each column has its own dtype (str, int), but a row mixes them, so the NumPy array falls back to dtype object

array(['Maharashtra', 2034], dtype=object)
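A small sketch of both points above, using an illustrative two-row frame: `reset_index()` only takes effect if you assign its result, and a column keeps its dtype while a mixed row becomes `object`.

```python
import pandas as pd

df = pd.DataFrame(
    {'state': ['Maharashtra', 'Delhi'], 'year': [2034, 2013]},
    index=[1, 4],
)

# reset_index() returns a new DataFrame; assign it to keep the result
flat = df.reset_index()
print(flat.columns.tolist())     # ['index', 'state', 'year']
print(df.columns.tolist())       # ['state', 'year'] (df is unchanged)

# A single column keeps its own dtype...
print(df['year'].dtype)          # int64
# ...but a row mixing strings and integers falls back to object
print(df.iloc[0].values.dtype)   # object
```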

In [62]:
# Assigning a list to df.index relabels the rows position by position (the data is not reordered); its length must equal df.shape[0], otherwise pandas raises a ValueError
df.index = [1, 3, 4]
df

         state  year
1  Maharashtra  2034
3        Delhi  2013
4     Karnatak  1934
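A sketch of both behaviours: the new labels replace the old ones by position (so Delhi is now labelled 3), and a list of the wrong length is rejected with a `ValueError`, leaving the index as it was. The flag `raised` exists only to make the outcome checkable.

```python
import pandas as pd

df = pd.DataFrame({'state': ['Maharashtra', 'Delhi', 'Karnatak'],
                   'year': [2034, 2013, 1934]}, index=[1, 4, 3])

# Labels are replaced position by position; rows are not reordered
df.index = [1, 3, 4]
print(df.loc[3, 'state'])   # Delhi

# A list of the wrong length is rejected
raised = False
try:
    df.index = [1, 2]
except ValueError as err:
    raised = True
    print('ValueError:', err)
```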


**Reindexing - conforms the data to a new set of labels and can fill in the missing values**

In [68]:
df.reindex([1,2,3,4,5], method = 'ffill')

         state  year
1  Maharashtra  2034
2  Maharashtra  2034
3        Delhi  2013
4     Karnatak  1934
5     Karnatak  1934
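For comparison, a sketch of `reindex` without a fill method (new labels get NaN) and with `method='bfill'`, which takes the next valid row instead of the previous one. Like `ffill`, this needs a monotonic index, which `[1, 3, 4]` is.

```python
import pandas as pd

df = pd.DataFrame({'state': ['Maharashtra', 'Delhi', 'Karnatak'],
                   'year': [2034, 2013, 1934]}, index=[1, 3, 4])

# Without a method, the new labels 2 and 5 get NaN
plain = df.reindex([1, 2, 3, 4, 5])
print(plain.loc[2].isna().all())   # True

# bfill copies the next valid row backwards
back = df.reindex([1, 2, 3, 4, 5], method='bfill')
print(back.loc[2, 'state'])        # Delhi
print(back.loc[5].isna().all())    # True: there is no row after label 4
```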


In [78]:
df.reindex(columns=['state', 'year', 'Time'], fill_value=100)

         state  year  Time
1  Maharashtra  2034   100
3        Delhi  2013   100
4     Karnatak  1934   100


**drop method - removes rows (index labels) or columns**

In [82]:
df.drop([1, 3], axis=0) # Returns a new DataFrame; pass inplace=True to modify df itself

      state  year
4  Karnatak  1934


In [84]:
df.drop(['state'], axis=1)

   year
1  2034
3  2013
4  1934
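`drop` also accepts the keyword forms `index=` and `columns=`, which read more clearly than `axis=0` / `axis=1`. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'state': ['Maharashtra', 'Delhi', 'Karnatak'],
                   'year': [2034, 2013, 1934]}, index=[1, 3, 4])

# Same as drop([1, 3], axis=0) and drop(['state'], axis=1)
rows_dropped = df.drop(index=[1, 3])
cols_dropped = df.drop(columns=['state'])

print(rows_dropped.index.tolist())    # [4]
print(cols_dropped.columns.tolist())  # ['year']
print(df.shape)                       # (3, 2): df itself is unchanged
```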


**Slicing with labels is inclusive**

In [95]:
df['num'] = [300, 5, 67]
df.loc[:, 'state':'year'] # Label slices include both ends: 'state' through 'year'

         state  year
1  Maharashtra  2034
3        Delhi  2013
4     Karnatak  1934
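The contrast with positional slicing is worth seeing side by side: `.loc` slices include the end label, while `.iloc` slices exclude the end position, like Python lists. A sketch with the same frame:

```python
import pandas as pd

df = pd.DataFrame({'state': ['Maharashtra', 'Delhi', 'Karnatak'],
                   'year': [2034, 2013, 1934],
                   'num': [300, 5, 67]}, index=[1, 3, 4])

# Label slice: both ends included
print(df.loc[:, 'state':'year'].columns.tolist())  # ['state', 'year']

# Position slice: end excluded
print(df.iloc[:, 0:1].columns.tolist())            # ['state']

# Row label slices are inclusive too
print(df.loc[1:3].index.tolist())                  # [1, 3]
```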


In [107]:
df.loc[:, 'num'].where(df.loc[:, 'num'] > 10, 'Hero') # Keep values where the condition is True, replace the rest with 'Hero'

1     300
3    Hero
4      67
Name: num, dtype: object
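`mask` is the mirror image of `where`: it replaces values where the condition is True. A sketch on the `num` column as a standalone Series; mixing integers with a string again yields an object Series.

```python
import pandas as pd

num = pd.Series([300, 5, 67], index=[1, 3, 4], name='num')

kept = num.where(num > 10, 'Hero')   # replace where the condition is False
masked = num.mask(num > 10, 'Hero')  # replace where the condition is True

print(kept.tolist())    # 300 and 67 kept, 5 replaced by 'Hero'
print(masked.tolist())  # 300 and 67 replaced by 'Hero', 5 kept
```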

**Arithmetic Operations on DataFrames**

In [127]:
temp1 = pd.DataFrame(np.random.randint(0, 10, size=(3,3)))
temp2 = pd.DataFrame(np.random.randint(0, 10, size=(3,4)))

In [135]:
temp1 + temp2 # The + operator cannot fill missing values: column 3 exists only in temp2, so it becomes NaN

    0   1  2    3
0  11  14  8  NaN
1   6  11  4  NaN
2  13  11  8  NaN


In [136]:
addition_arr = temp1.add(temp2, fill_value=100) # Missing temp1 entries are treated as 100 before adding
addition_arr

    0   1  2      3
0  11  14  8  104.0
1   6  11  4  109.0
2  13  11  8  109.0


In [137]:
divn_arr = temp1.rdiv(temp2, fill_value=100) # Reverse division: computes temp2 / temp1; x / 0 gives inf
divn_arr

          0         1         2     3
0  0.222222  1.000000  0.333333  0.04
1  2.000000  0.571429       inf  0.09
2  1.600000  0.222222  0.000000  0.09
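Since the cells above use random data, here is the same idea with small hypothetical frames of fixed values, so the `fill_value` and `rdiv` results can be checked by hand:

```python
import pandas as pd

# Hypothetical fixed frames instead of random ones
temp1 = pd.DataFrame([[2, 4], [6, 8]])         # columns 0, 1
temp2 = pd.DataFrame([[1, 1, 5], [2, 2, 10]])  # columns 0, 1, 2

# Operator form: column 2 exists only in temp2, so the sum is NaN there
plain = temp1 + temp2
print(plain[2].isna().all())                    # True

# add(): the missing temp1 values are treated as 100 before adding
added = temp1.add(temp2, fill_value=100)
print(added[2].tolist())                        # [105.0, 110.0]

# rdiv() is reversed division, so this computes temp2 / temp1
divided = temp1.rdiv(temp2, fill_value=100)
print(divided[2].tolist())                      # [0.05, 0.1]
```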


**Apply Method**
On a DataFrame, apply passes a whole column (axis=0, the default) or a whole row (axis=1) to the function; on a Series it passes one element at a time.

In [2]:
# On a Series: the function receives each element
sr = pd.Series([1,2,3,4,5])
sr.apply(lambda x: x**2)

0     1
1     4
2     9
3    16
4    25
dtype: int64

In [14]:
# On a DataFrame: the function receives a whole column (axis=0) or row (axis=1)
temp = [
    [1,2,3],
    [4,5,6],
    # ['A', 'B', 'C'],
]
df = pd.DataFrame(temp)
print(df.apply(lambda x: x ** 2))
df.apply(lambda x: x.max() - x.min(), axis = 1)

    0   1   2
0   1   4   9
1  16  25  36


0    2
1    2
dtype: int64
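A side-by-side sketch of the two axes on the same 2x3 frame: the column-wise range has one value per column, the row-wise range one value per row. `axis='index'` is an alias for `axis=0`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# axis=0 (default): the function receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range.tolist())   # [3, 3, 3]

# axis=1: the function receives each row as a Series
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)
print(row_range.tolist())   # [2, 2]

# axis='index' means the same as axis=0
same = df.apply(lambda col: col.max() - col.min(), axis='index')
```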

In [9]:
df.mean()

0    2.5
1    3.5
2    4.5
dtype: float64

In [16]:
df.apply(lambda x: 1 / x.mean(), axis='index')
# axis='index' is the same as axis=0: each column is passed in, which is what aggregations like mean() and std() usually need

0    0.400000
1    0.285714
2    0.222222
dtype: float64

In [4]:
# Returning multiple values
df.apply(lambda x: pd.Series([x.max(), x.min(), x.mean(), x.std()], index=['max', 'min', 'mean', 'std']))

            0        1        2
max   4.00000  5.00000  6.00000
min   1.00000  2.00000  3.00000
mean  2.50000  3.50000  4.50000
std   2.12132  2.12132  2.12132
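For a fixed set of summary statistics, `DataFrame.agg` with a list of function names builds the same table without a lambda. A sketch comparing the two on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

via_apply = df.apply(lambda x: pd.Series([x.max(), x.min(), x.mean(), x.std()],
                                         index=['max', 'min', 'mean', 'std']))

# agg() with a list of names: one row per statistic, in the order given
via_agg = df.agg(['max', 'min', 'mean', 'std'])

print(np.allclose(via_apply.to_numpy(dtype=float),
                  via_agg.to_numpy(dtype=float)))   # True
```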


**Rank Method**

In [23]:
pd.Series([0, 10, 3, 2, 20]).rank()

0    1.0
1    4.0
2    3.0
3    2.0
4    5.0
dtype: float64
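The example above has no ties. When values repeat, the default `method='average'` gives tied values the mean of their ranks; other methods break ties differently. A sketch with an illustrative Series:

```python
import pandas as pd

s = pd.Series([7, 2, 7, 1])

print(s.rank().tolist())                 # [3.5, 2.0, 3.5, 1.0]  ties share the average
print(s.rank(method='min').tolist())     # [3.0, 2.0, 3.0, 1.0]  ties take the lowest rank
print(s.rank(method='first').tolist())   # [3.0, 2.0, 4.0, 1.0]  ties broken by position
print(s.rank(ascending=False).tolist())  # [1.5, 3.0, 1.5, 4.0]  largest value ranked first
```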

**Finding where the max and min are located**

*argmin / argmax (Series) - return the integer positions of the min and max*


*idxmin / idxmax - return the index labels of the min and max*
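On a Series with string labels the difference is easy to see: `argmax` gives a position, `idxmax` gives the label at that position. A sketch with illustrative values:

```python
import pandas as pd

s = pd.Series([4, 90, -3], index=['a', 'b', 'c'])

print(s.argmax())              # 1  (integer position of the max)
print(s.idxmax())              # b  (index label of the max)
print(s.argmin(), s.idxmin())  # 2 c
```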

In [17]:
df = pd.DataFrame([
    [0, np.nan, 10, -3, 4, 90, np.nan],
    [0, np.nan, 10, -3, np.nan, 90, np.nan],
    [0, np.nan, 10, -3, 4, np.nan, np.nan],
], index=np.array(['a', 'b', 'c']))
df

   0    1   2   3    4     5    6
a  0  NaN  10  -3  4.0  90.0  NaN
b  0  NaN  10  -3  NaN  90.0  NaN
c  0  NaN  10  -3  4.0   NaN  NaN


In [18]:
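df.idxmax() # Columns 1 and 6 are all NaN, so they give NaN; newer pandas versions warn about (and may eventually reject) all-NaN columns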


0      a
1    NaN
2      a
3      a
4      a
5      a
6    NaN
dtype: object

In [33]:
df.idxmin() # NaN is skipped (skipna=True by default); all-NaN columns still give NaN


0      a
1    NaN
2      a
3      a
4      a
5      a
6    NaN
dtype: object

In [39]:
temp_df = df.dropna(axis=1) # Drop every column that contains at least one NaN
temp_df

   0   2   3
a  0  10  -3
b  0  10  -3
c  0  10  -3
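`dropna` has a few knobs besides `axis`: `how='all'` drops only entirely-NaN columns (or rows), and `thresh=k` keeps only those with at least k non-NaN values. A sketch on the same frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [0, np.nan, 10, -3, 4, 90, np.nan],
    [0, np.nan, 10, -3, np.nan, 90, np.nan],
    [0, np.nan, 10, -3, 4, np.nan, np.nan],
], index=['a', 'b', 'c'])

# how='any' (default): drop a column if it contains at least one NaN
print(df.dropna(axis=1).columns.tolist())             # [0, 2, 3]

# how='all': drop only the columns that are entirely NaN
print(df.dropna(axis=1, how='all').columns.tolist())  # [0, 2, 3, 4, 5]

# thresh=5 on rows (axis=0 is the default): only row 'a' has 5 non-NaN values
print(df.dropna(thresh=5).index.tolist())             # ['a']
```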
