There are four functions for manipulating the index of a DataFrame:

* <strong>reindex()</strong>: It conforms the DataFrame to a new set of row or column labels. Existing labels are reordered to match, and labels that were not in the original are filled with NaN.

* <strong>set_index()</strong>: It is used to set one or more columns as the index.

* <strong>reset_index()</strong>: It is used to reset the index to the default integer index, either moving the old index into a column or dropping it.

* <strong>sort_index()</strong>: It is used to sort the rows by their index labels.

In [1]:
import numpy as np
import pandas as pd

#  I. reindex() function

In [2]:
index = ['Firefox','Chrome','Safari','IE10','Konqueror']

df = pd.DataFrame({'http_status':[200,200,404,404,301],
                    'response_time':[0.04,0.02,0.07,0.08,1.0]},index=index)

df

Unnamed: 0,http_status,response_time
Firefox,200,0.04
Chrome,200,0.02
Safari,404,0.07
IE10,404,0.08
Konqueror,301,1.0


In [3]:
new_index = ['Safari','Iceweasel','Comodo Dragon','IE10','Chrome']

df.reindex(new_index)

Unnamed: 0,http_status,response_time
Safari,404.0,0.07
Iceweasel,,
Comodo Dragon,,
IE10,404.0,0.08
Chrome,200.0,0.02
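
The NaN rows above appear because 'Iceweasel' and 'Comodo Dragon' are not in the original index. As a minimal sketch (the three-row frame and the `user_agent` column name are illustrative assumptions, not part of the data above), `fill_value` supplies a default for missing labels, and `columns=` applies the same idea to column labels:

```python
import pandas as pd

# Small illustrative frame in the same shape as the browser data above
df = pd.DataFrame(
    {"http_status": [200, 200, 404], "response_time": [0.04, 0.02, 0.07]},
    index=["Firefox", "Chrome", "Safari"],
)

# Labels missing from the original index are filled with 0 instead of NaN
filled = df.reindex(["Safari", "Iceweasel", "Chrome"], fill_value=0)

# reindex can also conform the column labels;
# "user_agent" (hypothetical) does not exist yet, so it is all NaN
cols = df.reindex(columns=["http_status", "user_agent"])

print(filled)
print(cols)
```

Note that `reindex()` returns a new DataFrame; `df` itself is left unchanged.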


# II. reset_index() function

In [4]:
df = pd.DataFrame([('bird',389.0),
                    ('bird',24.0),
                    ('mammal',80.5),
                    ('mammal',np.nan)],index=['falcon','parrot','lion','monkey'],columns=('class','max_speed'))

df

Unnamed: 0,class,max_speed
falcon,bird,389.0
parrot,bird,24.0
lion,mammal,80.5
monkey,mammal,


In [5]:
# Convert the index into a column

df.reset_index()

Unnamed: 0,index,class,max_speed
0,falcon,bird,389.0
1,parrot,bird,24.0
2,lion,mammal,80.5
3,monkey,mammal,


In [6]:
# Drop the old index instead of keeping it as a column

df.reset_index(drop=True)

Unnamed: 0,class,max_speed
0,bird,389.0
1,bird,24.0
2,mammal,80.5
3,mammal,
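
In the output of In [5] the new column is called `index` because the original index had no name. A small sketch, assuming we give the index the name `animal` (an illustrative choice), shows that a named index keeps its name when it becomes a column:

```python
import pandas as pd

df = pd.DataFrame(
    {"max_speed": [389.0, 24.0, 80.5]},
    index=pd.Index(["falcon", "parrot", "lion"], name="animal"),
)

# The named index becomes a column called "animal" rather than "index"
flat = df.reset_index()
print(flat.columns.tolist())

# reset_index() returns a new DataFrame; df still has its original index
print(df.index.name)
```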


#  III. sort_index() function

In [7]:
df = pd.DataFrame({'month':[1,4,7,10],
                   'year':[2012,2014,2013,2014],
                   'Sale':[55,40,84,31]})

df

Unnamed: 0,month,year,Sale
0,1,2012,55
1,4,2014,40
2,7,2013,84
3,10,2014,31


In [8]:
# For comparison: sort_values() orders the rows by the data, not by the index
df['Sale'].sort_values()

3    31
1    40
0    55
2    84
Name: Sale, dtype: int64

In [9]:
# sort_index() orders the rows by index label (already sorted here, so nothing moves)
df['Sale'].sort_index()

0    55
1    40
2    84
3    31
Name: Sale, dtype: int64
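
Because the default index 0..3 is already in order, the call above returns the rows unchanged. The effect is easier to see with shuffled labels. This is a minimal sketch; the index labels `[3, 1, 4, 2]` are made up for illustration:

```python
import pandas as pd

# Same Sale values as above, but with index labels that are out of order
sales = pd.Series([55, 40, 84, 31], index=[3, 1, 4, 2], name="Sale")

by_label = sales.sort_index()                       # labels 1, 2, 3, 4
by_label_desc = sales.sort_index(ascending=False)   # labels 4, 3, 2, 1
by_value = sales.sort_values()                      # values 31, 40, 55, 84

print(by_label)
print(by_label_desc)
print(by_value)
```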

# IV. set_index() function 

In [10]:
df = pd.DataFrame({'month':[1,4,7,10],
                   'year':[2012,2014,2013,2014],
                   'Sale':[55,40,84,31]})

df

Unnamed: 0,month,year,Sale
0,1,2012,55
1,4,2014,40
2,7,2013,84
3,10,2014,31


In [11]:
#set month column as index

df.set_index('month')

month,year,Sale
1,2012,55
4,2014,40
7,2013,84
10,2014,31
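
`set_index()` also accepts a list of columns, which produces a MultiIndex, and a `drop` flag that controls whether the column is removed from the data. A short sketch using the same sales frame:

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 4, 7, 10],
                   "year": [2012, 2014, 2013, 2014],
                   "Sale": [55, 40, 84, 31]})

# Two columns become a two-level MultiIndex (year, month)
multi = df.set_index(["year", "month"])
print(multi.loc[(2013, 7), "Sale"])

# drop=False keeps "month" as a regular column as well as the index
kept = df.set_index("month", drop=False)
print(kept.columns.tolist())
```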


#  replace() function

It is used to replace given values with new ones. The result is returned as a new object, so it must be assigned back to keep the change.

In [12]:
df = pd.DataFrame({'month':[1,4,7,10],
                   'year':[2012,2014,2013,2014],
                   'Sale':[55,40,84,31]})

df

Unnamed: 0,month,year,Sale
0,1,2012,55
1,4,2014,40
2,7,2013,84
3,10,2014,31


In [13]:
# Replace every 7 in the month column with 6
df['month'] = df['month'].replace(7, 6)

In [14]:
df

Unnamed: 0,month,year,Sale
0,1,2012,55
1,4,2014,40
2,6,2013,84
3,10,2014,31
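
Besides a single old/new pair, `replace()` accepts a dictionary or a list of values to replace. A minimal sketch on the same month values (the replacement numbers are arbitrary):

```python
import pandas as pd

months = pd.Series([1, 4, 7, 10], name="month")

# A dict maps each old value to its own new value
mapped = months.replace({7: 6, 10: 12})

# A list of old values can all be replaced by one new value
zeroed = months.replace([1, 4], 0)

print(mapped.tolist())
print(zeroed.tolist())
```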


#  droplevel() function

It is used to remove one or more levels from a MultiIndex.

In [15]:
df = pd.DataFrame([
    [1,2,3,4],
    [5,6,7,8],
    [9,10,11,12]
]).set_index([0,1]).rename_axis(['a','b'])

df

a,b,2,3
1,2,3,4
5,6,7,8
9,10,11,12


In [16]:
df.droplevel('a')

b,2,3
2,3,4
6,7,8
10,11,12
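
A level can be identified by position as well as by name. The sketch below rebuilds the same frame and checks that dropping level `0` gives the same result as dropping level `'a'`:

```python
import pandas as pd

df = pd.DataFrame([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
]).set_index([0, 1]).rename_axis(['a', 'b'])

# Level 0 is the outermost level, which is named 'a' here
by_position = df.droplevel(0)
by_name = df.droplevel('a')

print(by_position.index.tolist())
print(by_position.equals(by_name))
```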


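# split() function

It is used to split each string in a column around a delimiter. It is called through the `.str` accessor and returns a list of pieces for each row.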

In [17]:
data = pd.read_csv('bigmart.csv')

In [18]:
data.head()

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales
0,FDA15,9.3,Low Fat,0.016047,Dairy,249.8092,OUT049,1999,Medium,Tier 1,Supermarket Type1,3735.138
1,DRC01,5.92,Regular,0.019278,Soft Drinks,48.2692,OUT018,2009,Medium,Tier 3,Supermarket Type2,443.4228
2,FDN15,17.5,Low Fat,0.01676,Meat,141.618,OUT049,1999,Medium,Tier 1,Supermarket Type1,2097.27
3,FDX07,19.2,Regular,0.0,Fruits and Vegetables,182.095,OUT010,1998,,Tier 3,Grocery Store,732.38
4,NCD19,8.93,Low Fat,0.0,Household,53.8614,OUT013,1987,High,Tier 3,Supermarket Type1,994.7052


In [19]:
# These values contain no '|', so each string becomes a one-element list
data['Item_Fat_Content'].str.split('|')

0       [Low Fat]
1       [Regular]
2       [Low Fat]
3       [Regular]
4       [Low Fat]
          ...    
8518    [Low Fat]
8519    [Regular]
8520    [Low Fat]
8521    [Regular]
8522    [Low Fat]
Name: Item_Fat_Content, Length: 8523, dtype: object

In [20]:
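# Split each value on spaces, e.g. 'Low Fat' -> ['Low', 'Fat']
data['y'] = data['Item_Fat_Content'].str.split(' ')

# Keep only the last word of each list
data['x'] = data['y'].apply(lambda words: words[-1])

data['x']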

0           Fat
1       Regular
2           Fat
3       Regular
4           Fat
         ...   
8518        Fat
8519    Regular
8520        Fat
8521    Regular
8522        Fat
Name: x, Length: 8523, dtype: object
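
The two steps above can also be written with the `.str` accessor alone, and `expand=True` spreads the pieces into separate columns. This is a small sketch on a three-row Series standing in for `Item_Fat_Content`, since `bigmart.csv` is not available here:

```python
import pandas as pd

# Stand-in for data['Item_Fat_Content'] (illustrative values only)
fat = pd.Series(["Low Fat", "Regular", "Low Fat"], name="Item_Fat_Content")

# expand=True returns a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values
parts = fat.str.split(" ", expand=True)

# .str[-1] picks the last element of each list, like the lambda above
last_word = fat.str.split(" ").str[-1]

print(parts)
print(last_word.tolist())
```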

# strip() function

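It removes leading and trailing characters from each string: whitespace by default, or any of the characters passed as an argument. Characters in the middle of the string are not affected.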

In [21]:
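# Remove any ')' and then any '(' from both ends of each string
# (these values contain no parentheses, so the output is unchanged)
data['x'] = data['x'].str.strip(')')
data['x'] = data['x'].str.strip('(')
data['x']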

0           Fat
1       Regular
2           Fat
3       Regular
4           Fat
         ...   
8518        Fat
8519    Regular
8520        Fat
8521    Regular
8522        Fat
Name: x, Length: 8523, dtype: object
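
Since the column above has nothing to strip, here is a minimal sketch on made-up strings that do have padding and parentheses, to show the difference between the default and an explicit character set:

```python
import pandas as pd

# Illustrative values, not taken from bigmart.csv
raw = pd.Series(["  Fat ", "(Regular)", "Fat"])

# Default: strip whitespace from both ends
no_space = raw.str.strip()

# Explicit set: strip any '(' or ')' from both ends; whitespace is left alone
no_parens = raw.str.strip("()")

print(no_space.tolist())
print(no_parens.tolist())
```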

### Stack Function
* Stack the prescribed level(s) from columns to index.

* Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

    * if the columns have a single level, the output is a Series;

    * if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
    
![image.png](attachment:image.png)

In [22]:
# Let's create a DataFrame
df = pd.DataFrame([[0, 1], [2, 3]],
                    index=['cat', 'dog'],
                    columns=['weight', 'height'])

# Let's print the data
df

Unnamed: 0,weight,height
cat,0,1
dog,2,3


In [23]:
df.stack()

cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64
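
The example above has single-level columns, so the result is a Series. As a sketch of the multi-level case described earlier, assume the columns carry a second level with units (`kg` and `pounds` are illustrative labels); stacking the inner level then returns a DataFrame:

```python
import pandas as pd

# Two-level columns: outer level 'weight', inner level the unit
columns = pd.MultiIndex.from_tuples([("weight", "kg"), ("weight", "pounds")])
df = pd.DataFrame([[1, 2], [3, 4]], index=["cat", "dog"], columns=columns)

# The innermost column level moves into the index;
# the remaining level 'weight' stays as the only column
stacked = df.stack()

print(type(stacked).__name__)
print(stacked)
```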

### Unstack Function

* Pivot a level of the (necessarily hierarchical) index labels.

* Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

* If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).

![image.png](attachment:image.png)

In [24]:
df.unstack()

weight  cat    0
        dog    2
height  cat    1
        dog    3
dtype: int64

In [25]:
s = df.stack()
s.unstack()

Unnamed: 0,weight,height
cat,0,1
dog,2,3


In [26]:
s.unstack(level=0)

Unnamed: 0,cat,dog
weight,0,2
height,1,3
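
When some index combinations are missing, `unstack()` leaves gaps in the resulting table. A minimal sketch, assuming a Series where 'dog' has no height entry and using `-1` as an arbitrary placeholder:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("cat", "weight"), ("cat", "height"), ("dog", "weight")]
)
s = pd.Series([0, 1, 2], index=idx)

# The (dog, height) cell does not exist in s, so it gets the fill value
wide = s.unstack(fill_value=-1)

print(wide)
```

Without `fill_value`, the missing cell would be NaN and the column would be converted to float.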


### The Melt Function
* This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

![image.png](attachment:image.PNG)

In [27]:
# Let's create a DataFrame
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})

# Let's print the DataFrame
df

Unnamed: 0,A,B,C
0,a,1,2
1,b,3,4
2,c,5,6


In [28]:
# Let's melt the data: keep A as the identifier and unpivot only column B
df.melt(id_vars=['A'], value_vars=['B'])

Unnamed: 0,A,variable,value
0,a,B,1
1,b,B,3
2,c,B,5


In [29]:
# Create a DataFrame
df = pd.DataFrame({'Name': {0:'Ritika',1:'shyam',2:'neil'},
                  'Course': {0:'Masters',1:'Graduate',2:'Masters'},
                  'Age': {0:22,1:20,2:24}})
df

Unnamed: 0,Name,Course,Age
0,Ritika,Masters,22
1,shyam,Graduate,20
2,neil,Masters,24


In [30]:
df.melt(id_vars=['Name'], value_vars=['Course','Age'])

Unnamed: 0,Name,variable,value
0,Ritika,Course,Masters
1,shyam,Course,Graduate
2,neil,Course,Masters
3,Ritika,Age,22
4,shyam,Age,20
5,neil,Age,24
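
The default column names 'variable' and 'value' can be changed with `var_name` and `value_name`. In this sketch the names `field` and `entry` are arbitrary choices, and `value_vars` is omitted so that every non-identifier column is melted:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ritika", "shyam"],
                   "Course": ["Masters", "Graduate"],
                   "Age": [22, 20]})

# All columns other than Name are unpivoted; output columns get custom names
long_df = df.melt(id_vars=["Name"], var_name="field", value_name="entry")

print(long_df)
```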


### The Explode Function

* Transform each element of a list-like to a row, replicating index values.

![image.png](attachment:image1.png)

In [31]:
# Let's create a DataFrame
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
df

Unnamed: 0,A,B
0,"[1, 2, 3]",1
1,foo,1
2,[],1
3,"[3, 4]",1


In [32]:
# Let's explode column A: each list element gets its own row, and an empty list becomes NaN
df.explode('A')

Unnamed: 0,A,B
0,1,1
0,2,1
0,3,1
1,foo,1
2,,1
3,3,1
3,4,1
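
The repeated index values (0, 0, 0, 1, ...) in the output above show which original row each element came from. If a fresh 0..n-1 index is preferred, `ignore_index=True` can be passed. This is a sketch that assumes pandas 1.1 or later, where that parameter is available:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})

# Same explode as above, but the result is renumbered from 0
flat = df.explode('A', ignore_index=True)

print(flat)
```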


### Squeeze Function
* Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.

* This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.

![image.png](attachment:image2.png)

In [33]:
# Let's create a DataFrame
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# Let's print the DataFrame
df

Unnamed: 0,a,b
0,1,2
1,3,4


In [34]:
# Selecting with a list that contains one column name returns a single-column DataFrame:
df_a = df[['a']]
df_a

Unnamed: 0,a
0,1
1,3


In [35]:
# Let's squeeze df_a: a single-column DataFrame is squeezed to a Series (not a scalar)
df_a.squeeze()

0    1
1    3
Name: a, dtype: int64
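
The other cases from the description can be sketched on the same frame: a single cell squeezes to a scalar, a single row squeezes to a Series, and `axis` limits which dimension is squeezed. The variable names here are illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# 1x1 DataFrame -> scalar
one_cell = df.loc[[0], ['a']]
scalar = one_cell.squeeze()

# Single row -> Series indexed by the column names
row = df.loc[[0]].squeeze()

# Squeeze only the column axis: the 1x1 frame becomes a length-1 Series
col_only = one_cell.squeeze(axis='columns')

# More than one row and one column -> returned unchanged
unchanged = df.squeeze()

print(scalar)
print(row)
print(col_only)
```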