In [1]:
import pandas as pd
import numpy as np

# 1. Working with Pandas Series

Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). The axis labels are collectively called index. Labels need not be unique but must be a hashable type. The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.
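As a minimal sketch of the points above (the values and labels here are made up for illustration), the following shows a Series with a repeated label, accessed once by label and once by integer position:

```python
import pandas as pd

# Labels may repeat; here the label "a" appears twice.
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

by_label = s.loc["a"]      # label-based: returns every element labelled "a"
by_position = s.iloc[1]    # position-based: the second element

print(by_label)
print(by_position)
```

Using .loc and .iloc makes it explicit whether a key is meant as a label or as a position.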

In [2]:
pd.__version__

'1.5.3'

Series through list

In [3]:
lst = [1,2,3,4,5]

print(pd.Series(lst))

0    1
1    2
2    3
3    4
4    5
dtype: int64


Series through Numpy array

In [4]:
arr = np.array([1,2,3,4,5])

print(pd.Series(arr))

0    1
1    2
2    3
3    4
4    5
dtype: int64


Providing a custom index

In [5]:
pd.Series(data = ['Rasheed', 'Sharjeel', 'Mateen', 'Aarfath'], index = [1,2,3,4])

1     Rasheed
2    Sharjeel
3      Mateen
4     Aarfath
dtype: object

Series through Dictionary values.

In [6]:
calories_burn = {'Monday' : 200, "Tuesday": 100, 'Wednesday': 150, 'Thursday': 300, 'Friday': 250, 'Saturday': 200, 'Sunday': 150}
print(pd.Series(calories_burn))

Monday       200
Tuesday      100
Wednesday    150
Thursday     300
Friday       250
Saturday     200
Sunday       150
dtype: int64


Using repeat function along with creating a Series

The Pandas Series.repeat function repeats the elements of a Series. It returns a new Series where each element of the current Series is repeated consecutively a given number of times. The index labels are repeated along with the values.

In [7]:
pd.Series(10).repeat(5)

0    10
0    10
0    10
0    10
0    10
dtype: int64

We can use reset_index() to renumber the index. Without drop = True, the old index is kept as a column named index, so the result is a DataFrame:

In [8]:
pd.Series(10).repeat(5).reset_index()

Unnamed: 0,index,0
0,0,10
1,0,10
2,0,10
3,0,10
4,0,10


This code indicates:
    
• 10 should be repeated 2 times,

• 11 should be repeated 3 times, and

• 12 should be repeated 4 times

In [9]:
s = pd.Series([10,11,12]).repeat([2,3,4]).reset_index(drop = True)
print(s)

0    10
1    10
2    11
3    11
4    11
5    12
6    12
7    12
8    12
dtype: int64


Accessing elements

In [10]:
s[7]

12

s[7] works because 7 is one of the index labels (0 to 8). A label that is not in the index, such as s[50], raises a KeyError because elements are accessed through the index labels we provided.

In [11]:
s

0    10
1    10
2    11
3    11
4    11
5    12
6    12
7    12
8    12
dtype: int64

Slicing a range of elements (from start to end - 1)

In [12]:
s[3:7]

3    11
4    11
5    12
6    12
dtype: int64

### b) Aggregate function on pandas Series

The Pandas Series.aggregate() function (alias agg) aggregates the given Series using one or more operations.

In [13]:
sr = pd.Series([1,2,3,4,5,6,7,8])
sr.agg([min, max, sum])

min     1
max     8
sum    36
dtype: int64
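agg also accepts function names as strings, which avoids passing Python built-ins directly. A small sketch, reusing the same values as above:

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# A list of names returns a Series indexed by those names.
summary = sr.agg(["min", "max", "sum", "mean"])
print(summary)

# A single name returns a scalar.
total = sr.agg("sum")
print(total)
```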

### c) Series absolute function

Pandas Series.abs() method is used to get the absolute numeric value of each element in Series/DataFrame.

In [14]:
sr = pd.Series([1,-2,-3,-4,5,6,-7,8])

print(sr.abs())

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: int64


### d) Appending Series

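The Pandas Series.append() function is used to concatenate two or more Series objects. It was deprecated in pandas 1.4 and removed in pandas 2.0 in favor of pd.concat(), so the cell below runs on the version used here (1.5.3) but emits a FutureWarning.

Syntax: Series.append(to_append, ignore_index=False, verify_integrity=False)

Parameters:

• to_append: Series or list/tuple of Series

• ignore_index: If True, do not use the index labels.

• verify_integrity: If True, raise an exception on creating an index with duplicates.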

In [15]:
sr1 = pd.Series([1,2,3,4,5,6,7,8])
sr2 = pd.Series([1,-2,-3,-4,5,6,-7,8])

s3 = sr1.append(sr2)
s3


0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
0    1
1   -2
2   -3
3   -4
4    5
5    6
6   -7
7    8
dtype: int64

To make the index accurate:

In [16]:
s3.reset_index(drop = True)

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     1
9    -2
10   -3
11   -4
12    5
13    6
14   -7
15    8
dtype: int64
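Because Series.append is not available in pandas 2.0 and later, here is a sketch of the same operation with pd.concat. The short Series below are illustrative, not the ones from the cells above:

```python
import pandas as pd

sr1 = pd.Series([1, 2, 3])
sr2 = pd.Series([4, 5])

# ignore_index=True renumbers the result 0..n-1,
# so no separate reset_index(drop=True) call is needed.
combined = pd.concat([sr1, sr2], ignore_index=True)
print(combined)
```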

### e) Astype function

Pandas astype() is one of the most important methods. It is used to change the data type of a Series. When a DataFrame is created from a CSV file, the data type of each column is inferred automatically, and the inferred type is often not the one the column should have.

In [17]:
sr1

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: int64

The dtype shown above is int64, and each individual element is a numpy.int64:

In [18]:
type(sr1[2])

numpy.int64

After astype('float') the dtype is float64, and after astype('str') it is shown as object:

In [19]:
print(sr1.astype('float'))
print(sr1.astype('str'))

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    8.0
dtype: float64
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
dtype: object
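When a text column contains entries that are not valid numbers, astype raises an error. A common alternative is pd.to_numeric with errors="coerce". The sketch below uses a made-up column to show the idea:

```python
import pandas as pd

# Hypothetical column that was read from a file as text.
raw = pd.Series(["1", "2", "three", "4"])

# astype(int) would raise ValueError on "three"; to_numeric with
# errors="coerce" turns unparseable entries into NaN instead.
numbers = pd.to_numeric(raw, errors="coerce")
print(numbers)
print(numbers.dtype)
```

The result is float64 because NaN cannot be stored in a plain int64 column.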


### f) Between Function

The Pandas between() method is used on a Series to check which values lie between the first and second arguments. Both boundaries are included by default.

In [20]:
sr = pd.Series([1,2,34,45,28,11,23,7,9,35])
print(sr.between(15,50))

0    False
1    False
2     True
3     True
4     True
5    False
6     True
7    False
8    False
9     True
dtype: bool
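The boolean result of between can be used as a mask, and the inclusive parameter controls whether the boundaries count. A brief sketch with illustrative values:

```python
import pandas as pd

sr = pd.Series([15, 20, 50, 60])

both = sr.between(15, 50)                          # default: both ends included
neither = sr.between(15, 50, inclusive="neither")  # both ends excluded

# Use the boolean mask to keep only the matching values.
selected = sr[both]
print(selected)
```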


### g) String functions can be used to extract or modify text in a Series

Upper and Lower Function

Len function

Strip Function

Split Function

Contains Function

Replace Function

Count Function

Startswith and Endswith Function

Find Function

In [21]:
ser = pd.Series(['Abdul Rasheed', 'Abdul Mateen', 'Data Science', 'Aafath Khan', 'Aadam Khan'])

Upper and Lower Function

In [22]:
print(ser.str.upper())
print(ser.str.lower())

0    ABDUL RASHEED
1     ABDUL MATEEN
2     DATA SCIENCE
3      AAFATH KHAN
4       AADAM KHAN
dtype: object
0    abdul rasheed
1     abdul mateen
2     data science
3      aafath khan
4       aadam khan
dtype: object


Length Function

In [23]:
for i in ser:
    print(len(i))

13
12
12
11
10
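The loop above calls Python's len on each element. The same lengths can be obtained without a loop through the .str accessor, as in this short sketch:

```python
import pandas as pd

ser = pd.Series(['Abdul Rasheed', 'Abdul Mateen', 'Data Science'])

# str.len() returns a Series of lengths, aligned with the original index.
lengths = ser.str.len()
print(lengths)
```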


Strip Function

In [24]:
ser = pd.Series(['  Abdul Rasheed', 'Abdul Mateen   ', '  Data Science', 'Aafath Khan', '  Aadam Khan  '])
for i in ser:
    print(i, len(i))

  Abdul Rasheed 15
Abdul Mateen    15
  Data Science 14
Aafath Khan 11
  Aadam Khan   14


In [25]:
ser = ser.str.strip()
for i in ser:
    print(i, len(i))

Abdul Rasheed 13
Abdul Mateen 12
Data Science 12
Aafath Khan 11
Aadam Khan 10


Split Function

In [26]:
ser.str.split()[1]

['Abdul', 'Mateen']

In [27]:
print(pd.Series(['10/3/1983', '10/3/1994', '29/10/1998']).str.split('/')[0])
print(pd.Series(['10/3/1983', '10/3/1994', '29/10/1998']).str.split('/')[1])
print(pd.Series(['10/3/1983', '10/3/1994', '29/10/1998']).str.split('/')[2])

['10', '3', '1983']
['10', '3', '1994']
['29', '10', '1998']
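Instead of indexing into the lists one row at a time, split can spread the pieces into columns with expand=True. In this sketch the column names day, month, and year are an assumption about what the pieces mean:

```python
import pandas as pd

dates = pd.Series(['10/3/1983', '10/3/1994', '29/10/1998'])

# expand=True returns a DataFrame with one column per piece.
parts = dates.str.split('/', expand=True)
parts.columns = ['day', 'month', 'year']  # assumed meaning of each piece
print(parts)
```

The pieces are still strings; astype can convert them to integers if needed.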


Contains Function

In [28]:
ser = pd.Series(['Abdul Rasheed', 'Abdul Mateen', 'Data Science', 'Aafath Khan', 'Aadam Khan'])

ser.str.contains('A')

0     True
1     True
2    False
3     True
4     True
dtype: bool

Replace Function

In [29]:
ser = pd.Series(['Abdul Rasheed', 'Abdul Mateen', 'Data Science', 'Aafath Khan', 'Aadam Khan'])

ser.str.replace(" ", "_")

0    Abdul_Rasheed
1     Abdul_Mateen
2     Data_Science
3      Aafath_Khan
4       Aadam_Khan
dtype: object

Count Function

In [30]:
ser.str.count('a')

0    1
1    1
2    2
3    3
4    3
dtype: int64

Startswith and endswith

In [31]:
ser = pd.Series(['Abdul Rasheed', 'Abdul Mateen', 'Data Science', 'Aafath Khan', 'Aadam Khan'])
print(ser.str.endswith('d'))
print(ser.str.startswith('A'))

0     True
1    False
2    False
3    False
4    False
dtype: bool
0     True
1     True
2    False
3     True
4     True
dtype: bool


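Find Function

find returns the position of the first occurrence of the substring in each element, or -1 when the substring is absent. The search string 'Addul' below does not occur in any element, so every result is -1.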

In [32]:

ser.str.find('Addul')

0   -1
1   -1
2   -1
3   -1
4   -1
dtype: int64

### h) Converting a Series to List

Pandas tolist() (also available as to_list()) is used to convert a Series to a Python list. Before the conversion, the object is of type pandas.core.series.Series.

In [33]:
ser.to_list()

['Abdul Rasheed', 'Abdul Mateen', 'Data Science', 'Aafath Khan', 'Aadam Khan']

# 2. Detailed Coding Implementations on Pandas DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns), i.e., data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.

### a) Creating Data Frames

In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage such as a SQL database, a CSV file, or an Excel file.
A Pandas DataFrame can also be created from lists, a dictionary, a list of dictionaries, etc. Here are some of the ways to create a DataFrame:

#### Creating a dataframe using List:

DataFrame can be created using a single list or a list of lists.

In [34]:
lst = ['Machine Learning', 'Python', 'SQL', 'Tableu', 'Data Science', 'Numpy', 'Pandas', 'Matplotlibs']
df = pd.DataFrame(lst)
df

Unnamed: 0,0
0,Machine Learning
1,Python
2,SQL
3,Tableu
4,Data Science
5,Numpy
6,Pandas
7,Matplotlibs


In [35]:
lst = [['Machine Learning', 40], ['Python', 20], ['SQL', 10], ['Tableu',7], ['Data Science', 100], ['Numpy',8], ['Pandas',10],['Matplotlibs',5]]
pd.DataFrame(lst) 

Unnamed: 0,0,1
0,Machine Learning,40
1,Python,20
2,SQL,10
3,Tableu,7
4,Data Science,100
5,Numpy,8
6,Pandas,10
7,Matplotlibs,5


#### Creating DataFrame from dict of ndarrays/lists:

To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length. If an index is passed, its length should be equal to the length of the arrays. If no index is passed, then by default the index will be range(n), where n is the array length.

In [36]:
data = {'Name':['Rasheed', 'Mateen', 'Aarfath', 'Aadam'], 'Age':[24, 26,14,6]}
pd.DataFrame(data)

Unnamed: 0,Name,Age
0,Rasheed,24
1,Mateen,26
2,Aarfath,14
3,Aadam,6
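To illustrate the two rules above, this sketch passes an explicit index (the labels r1 and r2 are hypothetical) and then shows that arrays of different lengths are rejected:

```python
import pandas as pd

data = {'Name': ['Rasheed', 'Mateen'], 'Age': [24, 26]}

# The index must have the same length as the arrays.
df = pd.DataFrame(data, index=['r1', 'r2'])  # hypothetical row labels
print(df)

# Arrays of different lengths raise a ValueError.
try:
    pd.DataFrame({'Name': ['Rasheed', 'Mateen'], 'Age': [24]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```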


#### Basic operations on rows and columns

We can perform basic operations on rows and columns such as selecting, deleting, adding, and renaming.

Column Selection: To select columns in a Pandas DataFrame, we access them by their column names. Passing a list of names returns a DataFrame with just those columns.

In [37]:
data = {'Name'        :['Rasheed', 'Mateen', 'Aarfath', 'Aadam'],
        'Age'         :[24, 26,14,6],
        'Address'     :['Dallas', 'Hyderabad', 'Hyderabad', 'Hyderabad'],
        'Qualification':['MBA & IT', 'B.com', '9th grade', '1st grade']
       }
df = pd.DataFrame(data)
df[['Name', "Qualification"]]

Unnamed: 0,Name,Qualification
0,Rasheed,MBA & IT
1,Mateen,B.com
2,Aarfath,9th grade
3,Aadam,1st grade


### b) Slicing in DataFrames Using iloc and loc

Pandas provides many ways to select data, and loc and iloc are two of them. They are used to slice data from a Pandas DataFrame, make it convenient to select rows and columns, and can filter the data according to conditions.

In [38]:
data = { 'One'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'Four'  : pd.Series([1000,2000,3000,4000,5000,6000])
    
}
df = pd.DataFrame(data)
df

Unnamed: 0,One,two,three,Four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


#### Basic loc Operations

The loc indexer is a label-based data-selecting method, which means that we pass the names (labels) of the rows or columns we want to select. Unlike iloc, a loc slice includes the last element of the range passed to it. loc also accepts a boolean Series as a mask, whereas iloc accepts only a plain boolean array or list. Many operations can be performed using loc, for example:

In [39]:
df.loc[1:4, 'two':'Four']

Unnamed: 0,two,three,Four
1,20,200,2000
2,30,300,3000
3,40,400,4000
4,50,500,5000


#### Basic iloc Operations

The iloc indexer is an integer-position-based selecting method, which means that we pass integer positions to select specific rows/columns.
Unlike loc, an iloc slice does not include the last element of the range passed to it. iloc also does not accept a boolean Series as a mask, although a plain boolean array or list works.

In [40]:
df.iloc[0:3,0:3]

Unnamed: 0,One,two,three
0,1,10,100
1,2,20,200
2,3,30,300


• You can see that position 3 of both the rows and the columns has not been included here, so 0 was inclusive but 3 is exclusive in the case of iloc

Let's see another example

In [41]:
df.iloc[:,2:3]

Unnamed: 0,three
0,100
1,200
2,300
3,400
4,500
5,600


Selecting Specific Rows and Columns

In [42]:
df.iloc[[0,3],[1,3]]

Unnamed: 0,two,Four
0,10,1000
3,40,4000
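The differences between loc and iloc can be seen side by side in a small sketch. The frame below is a shortened, illustrative version of the one used above:

```python
import pandas as pd

df = pd.DataFrame({'One': [1, 2, 3, 4], 'two': [10, 20, 30, 40]})

# Label slice: both ends are included, so rows 1 and 2 are returned.
by_label = df.loc[1:2, 'One':'two']

# Position slice: the end is excluded, so only row 1 is returned.
by_position = df.iloc[1:2, 0:2]

# A boolean Series works directly with loc; iloc needs a plain array.
mask = df['two'] > 20
with_loc = df.loc[mask]
with_iloc = df.iloc[mask.to_numpy()]

print(by_label)
print(by_position)
print(with_loc)
```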


### c) Slicing Using Conditions

Conditions are typically used with loc: the condition produces a boolean Series, and loc keeps the rows where it is True.

In [43]:
df['two'] > 20

0    False
1    False
2     True
3     True
4     True
5     True
Name: two, dtype: bool

In [44]:
df.loc[df['two']>20, ['three', 'Four']]

Unnamed: 0,three,Four
2,300,3000
3,400,4000
4,500,5000
5,600,6000


• So we could extract only those rows for which the value in column 'two' is more than 20

• For the columns, we have used a comma (,) followed by a list to extract specific columns, which are 'three' and 'Four'

Let's see another example

In [45]:
df.loc[df['two'] < 30, ['three', 'Four']]

Unnamed: 0,three,Four
0,100,1000
1,200,2000


In [46]:
df

Unnamed: 0,One,two,three,Four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


### c) Column Addition in DataFrame

We can add a column in many ways. Let us discuss three ways to add a column here:

• Using List

• Using Pandas Series

• Using an existing Column(we can modify that column in the way we want and that modified part can also be displayed)

In [47]:
l = [10000, 20000, 30000, 40000, 50000, 60000]

In [48]:
df['five'] = l

In [49]:
df

Unnamed: 0,One,two,three,Four,five
0,1,10,100,1000,10000
1,2,20,200,2000,20000
2,3,30,300,3000,30000
3,4,40,400,4000,40000
4,5,50,500,5000,50000
5,6,60,600,6000,60000


In [50]:
sr = pd.Series([11,22,33,44,55,66])
df['six'] = sr
df

Unnamed: 0,One,two,three,Four,five,six
0,1,10,100,1000,10000,11
1,2,20,200,2000,20000,22
2,3,30,300,3000,30000,33
3,4,40,400,4000,40000,44
4,5,50,500,5000,50000,55
5,6,60,600,6000,60000,66


Using an existing Column

Now we can see that the column 'seven' holds all the values of the column 'One' incremented by 10

In [51]:
df['seven'] = df['One'] + 10
df

Unnamed: 0,One,two,three,Four,five,six,seven
0,1,10,100,1000,10000,11,11
1,2,20,200,2000,20000,22,12
2,3,30,300,3000,30000,33,13
3,4,40,400,4000,40000,44,14
4,5,50,500,5000,50000,55,15
5,6,60,600,6000,60000,66,16


### d) Column Deletion in DataFrames

In [52]:
df

Unnamed: 0,One,two,three,Four,five,six,seven
0,1,10,100,1000,10000,11,11
1,2,20,200,2000,20000,22,12
2,3,30,300,3000,30000,33,13
3,4,40,400,4000,40000,44,14
4,5,50,500,5000,50000,55,15
5,6,60,600,6000,60000,66,16


Using del

• You can see that the column which had the name 'seven' has been deleted

In [53]:
del df['seven']
df

Unnamed: 0,One,two,three,Four,five,six
0,1,10,100,1000,10000,11
1,2,20,200,2000,20000,22
2,3,30,300,3000,30000,33
3,4,40,400,4000,40000,44
4,5,50,500,5000,50000,55
5,6,60,600,6000,60000,66


Using pop

• You can see that the column 'six' has also been deleted from our DataFrame

In [54]:
df.pop('six')
df

Unnamed: 0,One,two,three,Four,five
0,1,10,100,1000,10000
1,2,20,200,2000,20000
2,3,30,300,3000,30000
3,4,40,400,4000,40000
4,5,50,500,5000,50000
5,6,60,600,6000,60000
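Unlike del, pop hands back the removed column, and drop(columns=...) leaves the original untouched. A minimal sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# pop removes 'c' from df in place and returns it as a Series.
popped = df.pop('c')

# drop returns a new DataFrame; df itself still has 'a' and 'b'.
trimmed = df.drop(columns=['b'])

print(popped)
print(df)
print(trimmed)
```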


### e) Addition of rows

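In pandas versions before 2.0, you can add rows to a DataFrame with the append method: create a new DataFrame with the desired row values and append it to the original DataFrame. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0 in favor of pd.concat(), so the cell below emits a FutureWarning on the version used here. Here's an example of adding the rows of one DataFrame to another: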

In [55]:
df1 = pd.DataFrame([[1,2], [3,4]], columns = ['a', 'b'])
df2 = pd.DataFrame([[5,6], [7,8]], columns = ['a', 'b'])
df1.append(df2).reset_index(drop = True)


Unnamed: 0,a,b
0,1,2
1,3,4
2,5,6
3,7,8
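On pandas 2.0 and later, the same rows can be combined with pd.concat. A sketch using the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True renumbers the rows 0..3 in one step.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```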


### f) Pandas drop function

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas provides data analysts with a way to delete rows or columns from a DataFrame using the .drop method. Rows can be removed by index label and columns by column name.

Syntax: DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Parameters:

• labels: String or list of strings referring to row or column names.

• axis: int or string value, 0 or 'index' for rows and 1 or 'columns' for columns.

• index or columns: Single label or list. These are an alternative to labels with axis and cannot be combined with it.

• level: Used to specify the level when the DataFrame has a multi-level index.

• inplace: Makes changes in the original DataFrame if True.

• errors: With errors = 'ignore', labels that do not exist are skipped and the rest are dropped.

Return type: DataFrame with the dropped values removed (None when inplace = True)

In [56]:
data = { 'One'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'Four'  : pd.Series([1000,2000,3000,4000,5000,6000])
    
}
df = pd.DataFrame(data)
df

Unnamed: 0,One,two,three,Four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


axis =0 => Rows (row wise)

axis =1 => Columns (column wise)

In [57]:
# drop() returns a new DataFrame; df itself is not changed:
#df.drop([2,3], axis = 0)
# to keep the change, assign the result back:
#df = df.drop([2,3], axis = 0)
#df
# or modify df in place:
df.drop([0,5], axis = 0, inplace = True)

In [58]:
df

Unnamed: 0,One,two,three,Four
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000


In [59]:
df.drop(['two', 'three'], axis = 1, inplace = True)

In [60]:
df

Unnamed: 0,One,Four
1,2,2000
2,3,3000
3,4,4000
4,5,5000
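The index and columns keywords, and errors='ignore', can be sketched as follows. The frame is a shortened illustration and the label 'missing' is deliberately absent:

```python
import pandas as pd

df = pd.DataFrame({'One': [1, 2, 3], 'two': [10, 20, 30]})

# index= and columns= say what is being dropped without needing axis.
no_first_row = df.drop(index=0)
no_two = df.drop(columns='two')

# With errors='ignore', a label that does not exist is skipped
# instead of raising a KeyError.
unchanged = df.drop(columns='missing', errors='ignore')

print(no_first_row)
print(no_two)
```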


### g) Transposing a DataFrame

The .T attribute in a Pandas DataFrame is used to transpose the DataFrame, i.e., to flip the rows and columns.

The result of transposing a dataframe is a new dataframe with the original rows as columns and the original columns as rows

Here's an example to illustrate the use of the .T attribute:

In [61]:
data = { 'One'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'Four'  : pd.Series([1000,2000,3000,4000,5000,6000])
    
}
df = pd.DataFrame(data)
df

Unnamed: 0,One,two,three,Four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


In [62]:
df.T

Unnamed: 0,0,1,2,3,4,5
One,1,2,3,4,5,6
two,10,20,30,40,50,60
three,100,200,300,400,500,600
Four,1000,2000,3000,4000,5000,6000


### h) A set of more DataFrame Functionalities

In [63]:
df

Unnamed: 0,One,two,three,Four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


1. axes attribute

The .axes attribute in a Pandas DataFrame returns a list with the row and column labels of the DataFrame. The first element of the list is the row labels (index), and the second element is the column labels.

In [64]:
df.axes

[RangeIndex(start=0, stop=6, step=1),
 Index(['One', 'two', 'three', 'Four'], dtype='object')]

2. ndim attribute

The .ndim attribute in a Pandas DataFrame returns the number of dimensions of the dataframe, which is always 2 for a DataFrame (row-and-column format).

In [65]:
df.ndim

2

3. dtypes attribute

The .dtypes attribute in a Pandas DataFrame returns the data types of the columns in the DataFrame. The result is a Series with the column names as index and the data types of the columns as values.

In [66]:
df.dtypes

One      int64
two      int64
three    int64
Four     int64
dtype: object

4. shape attribute

The .shape attribute in a Pandas DataFrame returns the dimensions (number of rows, number of columns) of the DataFrame as a tuple.

In [67]:
df.shape

(6, 4)

5. head() function

The .head() method in a Pandas DataFrame returns the first n rows (by default, n=5) of the DataFrame. This method is useful for quickly examining the first few rows of a large DataFrame to get a sense of its structure and content.

In [68]:
data = {'Name'        :['Rasheed', 'Mateen', 'Aarfath', 'Aadam', 'Imran', 'Faisal', 'Omer', 'Asna', 'Afia', 'Aqsa', 'Aafan'],
        'Age'         :[24, 26,14,6,26,28,13,13,11,14,4],
        'Address'     :['Dallas', 'Hyderabad', 'Hyderabad', 'Hyderabad','Singapore', 'Hyderabad', 'Hyderabad', 'Hyderabad','Hyderabad','Hyderabad','Hyderabad'],
        'Qualification':['MBA & IT', 'B.com', '10th grade', '1st grade', 'MSC', 'Intermediate', '8th grade', '6th grade', '4th grade', '10th grade', 'UKG' ]
       }
df = pd.DataFrame(data)
df.head(7)

Unnamed: 0,Name,Age,Address,Qualification
0,Rasheed,24,Dallas,MBA & IT
1,Mateen,26,Hyderabad,B.com
2,Aarfath,14,Hyderabad,10th grade
3,Aadam,6,Hyderabad,1st grade
4,Imran,26,Singapore,MSC
5,Faisal,28,Hyderabad,Intermediate
6,Omer,13,Hyderabad,8th grade


• By default it will display first 5 rows

• We can mention the number of starting rows we want to see

• We will see this function more often later on; the DataFrame is so small at this point that something like df.head(20) would simply return all of its rows

6. tail() function

The .tail() method in a Pandas DataFrame returns the last n rows (by default, n=5) of the DataFrame. This method is useful for quickly examining the last few rows of a large DataFrame to get a sense of its structure and content.

In [69]:
df.tail(8)

Unnamed: 0,Name,Age,Address,Qualification
3,Aadam,6,Hyderabad,1st grade
4,Imran,26,Singapore,MSC
5,Faisal,28,Hyderabad,Intermediate
6,Omer,13,Hyderabad,8th grade
7,Asna,13,Hyderabad,6th grade
8,Afia,11,Hyderabad,4th grade
9,Aqsa,14,Hyderabad,10th grade
10,Aafan,4,Hyderabad,UKG


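7. empty attribute

The .empty attribute in a Pandas DataFrame returns a Boolean value indicating whether the DataFrame is empty or not. A DataFrame is considered empty if any of its axes has length 0, i.e., it has no rows or no columns. A DataFrame that contains only NaN values is not empty.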

In [70]:
df.empty

False
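A short sketch of what counts as empty, using two made-up frames:

```python
import pandas as pd

# Columns are defined, but there are no rows: this frame is empty.
no_rows = pd.DataFrame(columns=['a', 'b'])

# One row holding only NaN still counts as data: not empty.
all_nan = pd.DataFrame({'a': [float('nan')]})

print(no_rows.empty)
print(all_nan.empty)
```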

### i) Statistical or Mathematical Functions

Sum,
Mean,
Median,
Mode,
Variance,
Min,
Max,
Standard Deviation

In [71]:
data = { 'One'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'Four'  : pd.Series([1000,2000,3000,4000,5000,6000])
    
}
df = pd.DataFrame(data)
df

Unnamed: 0,One,two,three,Four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


1. sum

In [72]:
df.sum()

One         21
two        210
three     2100
Four     21000
dtype: int64

2. mean()

In [73]:
df.mean()

One         3.5
two        35.0
three     350.0
Four     3500.0
dtype: float64

3. median()

In [74]:
df.median()

One         3.5
two        35.0
three     350.0
Four     3500.0
dtype: float64

4. mode()

In [75]:
d = pd.DataFrame({'A':[1,2,3,4,4,4,5,7,8], 'B': [10,13,13,17,18,28,28,13,30]})
print(d['A'].mode())
print(d['B'].mode())

0    4
Name: A, dtype: int64
0    13
Name: B, dtype: int64


5. var()

In [76]:
d = pd.DataFrame({'A':[1,2,3,4,4,4,5,7,8], 'B': [10,13,13,17,18,28,28,13,30]})
print(d['A'].var())
print(d['B'].var())
print(d.var())

4.944444444444444
59.61111111111111
A     4.944444
B    59.611111
dtype: float64


6. min()

In [77]:
d.min()

A     1
B    10
dtype: int64

7. max()

In [78]:
d.max()

A     8
B    30
dtype: int64

8. std()

In [79]:
d.std()

A    2.223611
B    7.720823
dtype: float64
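pandas computes the sample variance and standard deviation (ddof = 1) by default, while NumPy defaults to the population versions (ddof = 0). The sketch below, on illustrative values, shows the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

sample_std = s.std()               # pandas default: ddof=1
population_std = s.std(ddof=0)     # same formula as NumPy's default
numpy_std = np.std(s.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std, numpy_std)
```

This is why the two libraries can report slightly different values for the same data.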

### j) Describe function

The describe method in a Pandas DataFrame returns descriptive statistics of the data in the DataFrame. It provides a quick summary of the central tendency, dispersion, and shape of the distribution of a set of numerical data.

By default, describe computes descriptive statistics for all numerical columns in the DataFrame, which is why the non-numeric column 'five' is missing from the output below. To describe a specific column, select it first (for example df['one'].describe()); to include non-numeric columns, pass include = 'all'.

In [80]:
data = { 'one'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'four'  : pd.Series([1000,2000,3000,4000,5000,6000]),
         'five'  : pd.Series(['A', 'B', 'C', 'D', 'E', 'F'])
    
}
        
df = pd.DataFrame(data)
df.describe()

Unnamed: 0,one,two,three,four
count,6.0,6.0,6.0,6.0
mean,3.5,35.0,350.0,3500.0
std,1.870829,18.708287,187.082869,1870.828693
min,1.0,10.0,100.0,1000.0
25%,2.25,22.5,225.0,2250.0
50%,3.5,35.0,350.0,3500.0
75%,4.75,47.5,475.0,4750.0
max,6.0,60.0,600.0,6000.0


In [81]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   one     6 non-null      int64 
 1   two     6 non-null      int64 
 2   three   6 non-null      int64 
 3   four    6 non-null      int64 
 4   five    6 non-null      object
dtypes: int64(4), object(1)
memory usage: 372.0+ bytes
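As a sketch of describing a single column and of including non-numeric columns, using a reduced two-column version of the frame above:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4, 5, 6],
                   'five': ['A', 'B', 'C', 'D', 'E', 'F']})

# Select the column first, then describe it.
one_stats = df['one'].describe()

# include='all' adds non-numeric columns, with rows such as
# unique, top, and freq that apply to them.
all_stats = df.describe(include='all')

print(one_stats)
print(all_stats)
```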


### k) Pipe Functions

1. Pipe Function

The pipe() method in a Pandas DataFrame allows you to apply a function to the DataFrame, similar to the way the apply method works. The difference is that pipe allows you to chain multiple operations together by passing the output of one function to the input of the next function.

In [82]:
data = { 'one'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'four'  : pd.Series([1000,2000,3000,4000,5000,6000])
         
    
}
def add_(i,j):
    return i + j
def sub_(i,j):
    return i - j
df = pd.DataFrame(data)
df

Unnamed: 0,one,two,three,four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


Example 1

In [83]:
df.pipe(add_, 13)

Unnamed: 0,one,two,three,four
0,14,23,113,1013
1,15,33,213,2013
2,16,43,313,3013
3,17,53,413,4013
4,18,63,513,5013
5,19,73,613,6013


Example 2

In [84]:
def mean_(i):
    # use the frame passed in by pipe rather than the global df
    return i.mean()
    
def sq(i):
    x = i**2
    return x
print(df.pipe(mean_))
df.pipe(sq)

one         3.5
two        35.0
three     350.0
four     3500.0
dtype: float64


Unnamed: 0,one,two,three,four
0,1,100,10000,1000000
1,4,400,40000,4000000
2,9,900,90000,9000000
3,16,1600,160000,16000000
4,25,2500,250000,25000000
5,36,3600,360000,36000000
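The chaining that pipe enables can be sketched like this. The helper functions mirror the ones defined above, and the one-column frame is illustrative:

```python
import pandas as pd

def add_(frame, n):
    # add a constant to every element
    return frame + n

def sq(frame):
    # square every element
    return frame ** 2

df = pd.DataFrame({'one': [1, 2, 3]})

# Each pipe call passes its result to the next, reading left to right.
result = df.pipe(add_, 1).pipe(sq)

# The equivalent nested call reads inside out.
nested = sq(add_(df, 1))

print(result)
```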


2. Apply function

The apply() method in a Pandas DataFrame applies a function along an axis of the DataFrame: to each column by default (axis = 0) or to each row (axis = 1). The function can be a NumPy function, a built-in Python function, or a user-defined function.

In [85]:
data = { 'one'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'four'  : pd.Series([1000,2000,3000,4000,5000,6000])
         
    
}
df = pd.DataFrame(data)
df

Unnamed: 0,one,two,three,four
0,1,10,100,1000
1,2,20,200,2000
2,3,30,300,3000
3,4,40,400,4000
4,5,50,500,5000
5,6,60,600,6000


In [86]:
print(df.apply(np.mean))
print(df.apply(np.median))
print(df.apply(np.max))
print(df.apply(np.min))
print(df.apply(np.var))
print(df.apply(np.std))

one         3.5
two        35.0
three     350.0
four     3500.0
dtype: float64
one         3.5
two        35.0
three     350.0
four     3500.0
dtype: float64
one         6
two        60
three     600
four     6000
dtype: int64
one         1
two        10
three     100
four     1000
dtype: int64
one      2.916667e+00
two      2.916667e+02
three    2.916667e+04
four     2.916667e+06
dtype: float64
one         1.707825
two        17.078251
three     170.782513
four     1707.825128
dtype: float64


In [87]:
print(df.apply(lambda x: x.max() - x.min()))

one         5
two        50
three     500
four     5000
dtype: int64


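3. Applymap function

The applymap() method in a Pandas DataFrame applies a function to each element of the DataFrame. The function can be either a built-in Python function or a user-defined function. For a vectorized operation such as x*2, apply gives the same result, as the cell below shows. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.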

In [88]:
df.apply(lambda x: x*2)

Unnamed: 0,one,two,three,four
0,2,20,200,2000
1,4,40,400,4000
2,6,60,600,6000
3,8,80,800,8000
4,10,100,1000,10000
5,12,120,1200,12000


applymap and apply are both functions in the pandas library used for applying a function to the elements of a pandas DataFrame or Series.

applymap is used to apply a function to every element of a Dataframe. It returns a new DataFrame where each element has been modified by the input function.

apply is used to apply a function along any axis of a Dataframe or Series. It returns either a Series or a DataFrame, depending on the axis along which the function is applied and the return value of the function. Unlike applymap, apply can take into account the context of the data, such as the row or column label.

So, applymap is meant for element-wise operations while apply can be used for both element-wise and row/column-wise operations.

In [89]:
d = {'A':[1.6,3.7,4.9,5.2],
     'B':[2.5,6.4,7.8,8.1]}
df= pd.DataFrame(d)
df1 = df.applymap(np.int64)
print(df1)

df2 = df.apply(lambda row: row.mean(), axis = 1)
print(df2)

   A  B
0  1  2
1  3  6
2  4  7
3  5  8
0    2.05
1    5.05
2    6.35
3    6.65
dtype: float64


### l) Reindex Function

The reindex function in Pandas conforms a DataFrame to a new set of row labels and/or column labels. It can be used to reorder rows or columns or to align data from multiple DataFrames. The function takes a list or an array of labels and, optionally, a fill value for labels that were not present in the original (these are NaN by default). The reindexing can be done along either the row axis (0) or the column axis (1). The reindexed DataFrame is returned.

In [90]:
#Example for rows
data = { 'one'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'four'  : pd.Series([1000,2000,3000,4000,5000,6000])
         
    
}
df = pd.DataFrame(data)
print(df)
print(df.reindex([1,5,0,4,1,2]))

   one  two  three  four
0    1   10    100  1000
1    2   20    200  2000
2    3   30    300  3000
3    4   40    400  4000
4    5   50    500  5000
5    6   60    600  6000
   one  two  three  four
1    2   20    200  2000
5    6   60    600  6000
0    1   10    100  1000
4    5   50    500  5000
1    2   20    200  2000
2    3   30    300  3000


In [91]:
#Example for columns
data = {'Name'        :['Rasheed', 'Mateen', 'Aarfath', 'Aadam'],
        'Age'         :[24, 26,14,6],
        'Address'     :['Dallas', 'Hyderabad', 'Hyderabad', 'Hyderabad'],
        'Qualification':['MBA & IT', 'B.com', '9th grade', '1st grade']
       }
df = pd.DataFrame(data)
df.reindex(columns = ['Name', 'Qualification', 'Age', 'Address', 'Name'])

Unnamed: 0,Name,Qualification,Age,Address,Name.1
0,Rasheed,MBA & IT,24,Dallas,Rasheed
1,Mateen,B.com,26,Hyderabad,Mateen
2,Aarfath,9th grade,14,Hyderabad,Aarfath
3,Aadam,1st grade,6,Hyderabad,Aadam


### m) Renaming Columns in Pandas DataFrame

The rename function in Pandas is used to change the row labels and/or column labels of a DataFrame. It can be used to update the names of one or multiple rows or columns by passing a dictionary of new names as its argument. The dictionary should have the old names as keys and the new names as values

In [92]:
data = { 'one'   : pd.Series([1,2,3,4,5,6]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'four'  : pd.Series([1000,2000,3000,4000,5000,6000])
         
    
}
df = pd.DataFrame(data)
df.rename(columns = {'one': 'ONE', 'two': 'TWO', 'three': 'THREE', 'four':'FOUR'}, 
          inplace = True, index ={0:'First', 1:'a', 2:'b', 3:'c', 4:'d', 5:'e',})

In [93]:
df

Unnamed: 0,ONE,TWO,THREE,FOUR
First,1,10,100,1000
a,2,20,200,2000
b,3,30,300,3000
c,4,40,400,4000
d,5,50,500,5000
e,6,60,600,6000


### n) Sorting in Pandas DataFrame

Pandas provides several methods to sort a DataFrame based on one or more columns.

In [94]:
data = { 'one'   : pd.Series([13,20,93,41,57,26]),
         'two'   : pd.Series([10,20,30,40,50,60]),
         'three' : pd.Series([100,200,300,400,500,600]),
         'four'  : pd.Series([1000,2000,3000,4000,5000,6000])
         
    
}
df = pd.DataFrame(data)
df

Unnamed: 0,one,two,three,four
0,13,10,100,1000
1,20,20,200,2000
2,93,30,300,3000
3,41,40,400,4000
4,57,50,500,5000
5,26,60,600,6000


Sort with respect to a specific column

In [95]:
df.sort_values(by='one')

Unnamed: 0,one,two,three,four
0,13,10,100,1000
1,20,20,200,2000
5,26,60,600,6000
3,41,40,400,4000
4,57,50,500,5000
2,93,30,300,3000


Sort in a specific order (descending)

In [96]:
df.sort_values(by = 'one', ascending = False)

Unnamed: 0,one,two,three,four
2,93,30,300,3000
4,57,50,500,5000
3,41,40,400,4000
5,26,60,600,6000
1,20,20,200,2000
0,13,10,100,1000


Sort based on multiple columns (the second column is only used to break ties in the first)

In [97]:
df.sort_values(by= ['one', 'three'])

Unnamed: 0,one,two,three,four
0,13,10,100,1000
1,20,20,200,2000
5,26,60,600,6000
3,41,40,400,4000
4,57,50,500,5000
2,93,30,300,3000


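The kind parameter of sort_values selects the sorting algorithm. It is only applied when sorting on a single column, and the default is quicksort. The options include:

• quicksort

• mergesort

• heapsort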

In [98]:
df.sort_values(by= 'one', kind = 'heapsort')

Unnamed: 0,one,two,three,four
0,13,10,100,1000
1,20,20,200,2000
5,26,60,600,6000
3,41,40,400,4000
4,57,50,500,5000
2,93,30,300,3000
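
In the DataFrame above every value in column 'one' is distinct, so the second sort key never comes into play. The following sketch uses a made-up `scores` frame (an assumption, not part of the dataset) in which the first key repeats, and also shows a per-column sort direction and the ignore_index option.

```python
import pandas as pd

# Hypothetical data: the 'team' values repeat, so 'points' decides the order within a team
scores = pd.DataFrame({
    "team": ["B", "A", "B", "A"],
    "points": [10, 30, 20, 40],
})

# Sort by team ascending, then by points descending within each team
ordered = scores.sort_values(by=["team", "points"], ascending=[True, False])

# ignore_index=True renumbers the rows 0..n-1 after sorting
renumbered = scores.sort_values(by="points", ascending=False, ignore_index=True)

print(ordered)
print(renumbered)
```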


### o) Groupby Functions

The groupby function in pandas splits a DataFrame into groups based on the values of one or more columns. It returns a DataFrameGroupBy object, which does not hold a result by itself but provides methods to aggregate, filter and transform the grouped data.

In [99]:
cricket = {'Team':['India','India','Australia','Australia','SA', 'SA', 'SA', 'SA', 'NZ', 'NZ', 'NZ', 'India'],
           'Rank': [2, 3, 1,2, 3,4,11,1,2, 4, 1, 21],
           'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
           'Points': [876,801, 891, 815, 776, 784,834,824,758,691,883,782]}
df = pd.DataFrame(cricket)
df

Unnamed: 0,Team,Rank,Year,Points
0,India,2,2014,876
1,India,3,2015,801
2,Australia,1,2014,891
3,Australia,2,2015,815
4,SA,3,2014,776
5,SA,4,2015,784
6,SA,11,2016,834
7,SA,1,2017,824
8,NZ,2,2016,758
9,NZ,4,2014,691
10,NZ,1,2015,883
11,India,21,2017,782


In [100]:
df.groupby(by='Team').groups

{'Australia': [2, 3], 'India': [0, 1, 11], 'NZ': [8, 9, 10], 'SA': [4, 5, 6, 7]}

• Australia is present at index 2 and 3
• India is present at index 0, 1 and 11, and so on

To select the rows for a specific team in a specific year, group by both columns and pass the (Team, Year) pair to get_group

In [101]:
df.groupby(['Team', 'Year']).get_group(('Australia', 2014))

Unnamed: 0,Team,Rank,Year,Points
2,Australia,1,2014,891


If the requested group is not present in the data, get_group raises a KeyError
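
A minimal sketch of handling a missing group, using a small made-up `matches` frame (the names and the fallback to None are assumptions for illustration):

```python
import pandas as pd

# Hypothetical subset of the cricket data
matches = pd.DataFrame({
    "Team": ["India", "Australia", "India"],
    "Year": [2014, 2014, 2015],
    "Points": [876, 891, 801],
})
grouped = matches.groupby(["Team", "Year"])

key = ("Australia", 2020)  # a (Team, Year) pair that does not occur in the data
try:
    result = grouped.get_group(key)
except KeyError:
    # get_group raises KeyError for an unknown group, so fall back to None
    result = None
    print(f"No rows found for {key}")
```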

Adding some statistical computation on top of groupby

In [102]:
df.groupby(by = 'Team')['Points'].sum()

Team
Australia    1706
India        2459
NZ           2332
SA           3218
Name: Points, dtype: int64

• This shows the total Points for each team, listed in alphabetical order of the team name

Let us sort the totals in descending order so that the team with the most points comes first

In [103]:
df.groupby(by = 'Team')['Points'].sum().sort_values(ascending = False)

Team
SA           3218
India        2459
NZ           2332
Australia    1706
Name: Points, dtype: int64

Computing several statistics of Points for each team

In [104]:
groups = df.groupby(['Team'])
stat = groups["Points"].agg([np.sum, np.mean, np.median, np.max, np.min, np.var, np.std])
ranking_stat = stat.sort_values(by ='sum', ascending = False)
ranking_stat

Team,sum,mean,median,amax,amin,var,std
SA,3218,804.5,804.0,834,776,827.666667,28.769196
India,2459,819.666667,801.0,876,782,2470.333333,49.702448
NZ,2332,777.333333,758.0,883,691,9496.333333,97.449132
Australia,1706,853.0,853.0,891,815,2888.0,53.740115
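
The same kind of summary can also be requested with string names instead of NumPy functions. This is only a sketch on a made-up `cricket_demo` frame; note that with strings the columns are called max and min rather than amax and amin, and that newer pandas releases may warn when NumPy functions are passed to agg. The `total` and `best` column names in the second call are arbitrary choices.

```python
import pandas as pd

# Hypothetical subset of the cricket data
cricket_demo = pd.DataFrame({
    "Team": ["India", "India", "SA", "SA"],
    "Points": [876, 801, 776, 784],
})

# A list of strings gives one output column per statistic
stats = cricket_demo.groupby("Team")["Points"].agg(["sum", "mean", "max", "min"])

# Keyword arguments (named aggregation) let us choose the output column names
named = cricket_demo.groupby("Team")["Points"].agg(total="sum", best="max")

print(stats)
print(named)
```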


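Using the filter function along with groupby: the function is called once per group, and only the rows of groups for which it returns True are kept. Here we keep the teams that have exactly 4 rows.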

In [105]:
df.groupby('Team').filter(lambda x: len(x) == 4)

Unnamed: 0,Team,Rank,Year,Points
4,SA,3,2014,776
5,SA,4,2015,784
6,SA,11,2016,834
7,SA,1,2017,824
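
The condition passed to filter can be any test on the group, not only its size. As a sketch on a made-up `cricket_demo` frame (the 780 threshold is an arbitrary assumption), the following keeps the rows of teams whose average Points is above that threshold:

```python
import pandas as pd

# Hypothetical subset of the cricket data
cricket_demo = pd.DataFrame({
    "Team": ["India", "India", "SA", "SA", "NZ"],
    "Points": [876, 801, 776, 784, 691],
})

# The lambda receives each team's rows as a DataFrame and must return True or False
strong = cricket_demo.groupby("Team").filter(lambda g: g["Points"].mean() > 780)

print(strong)
```

The result contains the original rows (with their original index), not one row per group.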


# 3. Working with csv files and basic Data Analysis Using Pandas

### a) Reading csv

Reading csv files from local system

In [106]:
df = pd.read_csv('Football.csv')
df

Unnamed: 0,Country,League,Club,Player Names,Matches_Played,Substitution,Mins,Goals,xG,xG Per Avg Match,Shots,OnTarget,Shots Per Avg Match,On Target Per Avg Match,Year
0,Spain,La Liga,(BET),Juanmi Callejon,19,16,1849,11,6.62,0.34,48,20,2.47,1.03,2016
1,Spain,La Liga,(BAR),Antoine Griezmann,36,0,3129,16,11.86,0.36,88,41,2.67,1.24,2016
2,Spain,La Liga,(ATL),Luis Suarez,34,1,2940,28,23.21,0.75,120,57,3.88,1.84,2016
3,Spain,La Liga,(CAR),Ruben Castro,32,3,2842,13,14.06,0.47,117,42,3.91,1.40,2016
4,Spain,La Liga,(VAL),Kevin Gameiro,21,10,1745,13,10.65,0.58,50,23,2.72,1.25,2016
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
655,Netherlands,Eredivisie,(UTR),Gyrano Kerk,24,0,2155,10,7.49,0.33,50,18,2.20,0.79,2020
656,Netherlands,Eredivisie,(AJA),Quincy Promes,18,2,1573,12,9.77,0.59,56,30,3.38,1.81,2020
657,Netherlands,Eredivisie,(PSV),Denzel Dumfries,25,0,2363,7,5.72,0.23,45,14,1.81,0.56,2020
658,Netherlands,Eredivisie,,Cyriel Dessers,26,0,2461,15,14.51,0.56,84,43,3.24,1.66,2020


In [107]:
link = 'https://github.com/AshishJangra27/In-One-Go/tree/main/Pandas'
df = pd.read_csv(link)
df.head()

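ParserError: Error tokenizing data. C error: Expected 1 fields in line 33, saw 6

• The call fails because the link points to a GitHub directory page, which returns HTML rather than CSV data. read_csv needs the URL of the raw csv file itself.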


### b) Pandas Info Function

Pandas DataFrame.info function prints a concise summary of the DataFrame: the index range, the column names, the number of non-null values and dtype of each column, and the memory usage. It comes in handy for a quick overview of the dataset during exploratory analysis.

Syntax: DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)

In [None]:
df.info()
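
As a small sketch of the buf parameter from the syntax above, the summary can be written into a text buffer instead of being printed directly. The `people` frame is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical frame with one missing name
people = pd.DataFrame({"Name": ["Rasheed", "Mateen", None], "Age": [24, 26, 14]})

buffer = io.StringIO()
people.info(buf=buffer)       # write the summary into the buffer instead of stdout
summary = buffer.getvalue()   # the summary as an ordinary string

print(summary)
```

The Non-Null Count column of the summary is a quick way to spot columns with missing values.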

### c) isnull() function to check if there are nan values present

In [None]:
df.isnull()

The result is a DataFrame of booleans with the same shape as df: True where a value is missing and False otherwise

If we chain the sum function to it, we get the number of null values present in each column

In [None]:
df.isnull().sum()
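
A minimal sketch of the same idea on a tiny made-up frame (`demo` is an assumption, not the football data), which also shows the share of missing values per column and the overall total:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Club and two missing Goals
demo = pd.DataFrame({"Club": ["(BAR)", None, "(ATL)"], "Goals": [16, np.nan, np.nan]})

null_counts = demo.isnull().sum()    # missing values per column
null_share = demo.isnull().mean()    # fraction of missing values per column
total_nulls = demo.isnull().sum().sum()  # missing values in the whole frame

print(null_counts)
print(null_share)
print(total_nulls)
```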

### d) Quantile function to get the specific percentile value

Let us first check the 80th percentile value of each numerical column using the describe function

In [None]:
df.describe(percentiles = [0.8])

The 80% row of the output shows that the 80th percentile value of Mins is 2915.80

#### Let us use the quantile function to get the exact value now

In [None]:
df['Mins'].quantile(0.8)

This returns the same value

To get the 99 percentile value we can write

In [None]:
df['Mins'].quantile(0.99)

• This function is important because it can be used to treat outliers during the EDA step of a Data Science project

### e) Copy function

If we simply assign:

de = df

then de and df are two names for the same DataFrame, so a change made through de also affects df. To avoid this we use the copy function, which creates a completely new object that does not affect the original.

In [None]:
de = df.copy()
df.head(3)

In [None]:
de['Year+100'] = de['Year'] + 100
de.head(3)

In [None]:
df.head(3)

• The new column is not present here
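
The difference between plain assignment and copy can be sketched side by side. The `original`, `alias` and `independent` names and the `Flag` column are made up for illustration.

```python
import pandas as pd

original = pd.DataFrame({"Year": [2016, 2017]})

alias = original               # a second name for the same object
independent = original.copy()  # a new object with its own data

alias["Year+100"] = alias["Year"] + 100  # also visible through `original`
independent["Flag"] = True               # not visible through `original`

print(alias is original)
print(original.columns.tolist())
print(independent.columns.tolist())
```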

### f) Value Counts function

Pandas Series.value_counts function returns a Series containing the counts of unique values. The result is in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.

Syntax: Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

In [None]:
df['Player Names'].value_counts()
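
As a sketch of the normalize and dropna parameters from the syntax above, using a small made-up `clubs` Series:

```python
import pandas as pd

# Hypothetical club codes with one missing entry
clubs = pd.Series(["(BAR)", "(ATL)", "(BAR)", None, "(BAR)"])

counts = clubs.value_counts()                 # counts, most frequent first, NA excluded
shares = clubs.value_counts(normalize=True)   # relative frequencies instead of counts
with_na = clubs.value_counts(dropna=False)    # also counts the missing entry

print(counts)
print(shares)
print(with_na)
```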

### g) Unique and Nunique Function

While analyzing data, we often want to see the unique values in a particular column, which can be done using the Pandas unique function

In [None]:
df['Player Names'].unique()

If we only need to know how many unique values a column has, the Pandas nunique function returns that count

In [None]:
df['Player Names'].nunique()
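
The two functions treat missing values differently, which the following sketch on a made-up `players` Series illustrates:

```python
import pandas as pd

# Hypothetical player names with one repeat and one missing entry
players = pd.Series(["Luis Suarez", "Kevin Gameiro", "Luis Suarez", None])

distinct = players.unique()                 # array of distinct values, missing value included
n_distinct = players.nunique()              # number of distinct values, missing values ignored
n_with_na = players.nunique(dropna=False)   # counts the missing value as well

print(distinct)
print(n_distinct, n_with_na)
```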

### h) dropna() function

Sometimes a csv file has null values, which are displayed as NaN in the DataFrame. The Pandas dropna method allows the user to drop rows or columns that contain null values in different ways.

Syntax:
    
DataFrameName.dropna(axis=0,inplace=False)

axis: axis takes int or string value for rows/columns. Input can be 0 or 1 for Integer and 'index' or 'columns' for String.

In [None]:
df.isnull().sum()

• The counts show which columns contain null values and how many; in the rows displayed earlier, for example, the Club value of row 658 is missing

In [None]:
#for rows
df.dropna(inplace = True, axis = 0)

#for columns
df.dropna(inplace = True, axis = 1)

In [None]:
df
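
Besides axis, dropna accepts the how and subset parameters, which give finer control over what is removed. The sketch below works on a made-up `demo` frame and returns new objects instead of using inplace = True.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in two columns
demo = pd.DataFrame({
    "Club": ["(BAR)", None, "(ATL)"],
    "Goals": [16, np.nan, np.nan],
    "Year": [2016, 2016, 2017],
})

rows_any = demo.dropna()                 # drop rows with at least one missing value
rows_all = demo.dropna(how="all")        # drop rows only if every value is missing
by_club = demo.dropna(subset=["Club"])   # only look at the Club column
cols = demo.dropna(axis="columns")       # drop columns that contain missing values

print(rows_any)
print(by_club)
print(cols)
```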

### i) Fillna Function

Pandas Series.fillna() function is used to fill NA/NaN values with a specified value or method.

If we want to fill the null values with something instead of removing them, we can use the fillna function.
Here we fill a numerical column with its (rounded) mean and a categorical column with a fixed placeholder value; the mode of the column is another common choice for categorical data.

In [None]:
df = pd.read_csv('Football.csv')
df

In [None]:
#Numerical columns
mis = round(df['Goals'].mean())
print(mis)
df['Goals'] = df['Goals'].fillna(mis)

Because the result is assigned back to df['Goals'], the filled values are stored in our DataFrame. Calling fillna without assigning the result (and without inplace=True) would leave df unchanged.

Categorical Values

In [None]:
df['Club'] = df['Club'].fillna('Abdul Rasheed')
df
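
A minimal sketch of filling a categorical column with its mode instead of a fixed placeholder, on a made-up `demo` frame (the data is an assumption, not the football dataset):

```python
import pandas as pd

# Hypothetical frame with one missing Club and one missing Goals value
demo = pd.DataFrame({
    "Club": ["(BAR)", None, "(BAR)", "(ATL)"],
    "Goals": [16.0, None, 28.0, 13.0],
})

# mode() returns a Series because there can be ties; [0] takes the first mode
most_common = demo["Club"].mode()[0]
demo["Club"] = demo["Club"].fillna(most_common)

# Numerical column: fill with the rounded mean of the known values
demo["Goals"] = demo["Goals"].fillna(round(demo["Goals"].mean()))

print(demo)
```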

### j) sample function

Pandas sample returns a random sample of rows (or of columns, with axis=1) from the DataFrame it is called on.
Syntax:

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

In [None]:
df.sample(10)
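
As a sketch of the frac and random_state parameters from the syntax above (the `demo` frame and the seed values are arbitrary assumptions): fixing random_state makes the sample reproducible, and frac requests a fraction of the rows instead of a fixed number.

```python
import pandas as pd

# Hypothetical frame with ten rows
demo = pd.DataFrame({"value": range(10)})

first = demo.sample(n=3, random_state=42)    # 3 random rows
second = demo.sample(n=3, random_state=42)   # same seed, so the same 3 rows
half = demo.sample(frac=0.5, random_state=0) # 50% of the rows

print(first)
print(first.equals(second))
print(len(half))
```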

### k) to_csv function

Pandas Series.to_csv() function writes the given Series object to a comma-separated values (csv) file/format. DataFrame.to_csv does the same for a DataFrame.

Syntax: Series.to_csv(*args, **kwargs)

In [None]:
data = {'Name'        :['Rasheed', 'Mateen', 'Aarfath', 'Aadam', 'Imran', 'Faisal', 'Omer', 'Asna', 'Afia', 'Aqsa', 'Aafan'],
        'Age'         :[24, 26,14,6,26,28,13,13,11,14,4],
        'Address'     :['Dallas', 'Hyderabad', 'Hyderabad', 'Hyderabad','Singapore', 'Hyderabad', 'Hyderabad', 'Hyderabad','Hyderabad','Hyderabad','Hyderabad'],
        'Qualification':['MBA & IT', 'B.com', '10th grade', '1st grade', 'MSC', 'Intermediate', '8th grade', '6th grade', '4th grade', '10th grade', 'UKG' ]
       }
df = pd.DataFrame(data)
df.to_csv('Details.csv', index = False)

In [None]:
d = pd.read_csv('Details.csv')
d
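
The effect of index = False can be sketched without touching the file system: when no path is given, to_csv returns the csv text as a string. The `people` frame is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row frame
people = pd.DataFrame({"Name": ["Rasheed", "Mateen"], "Age": [24, 26]})

csv_text = people.to_csv(index=False)   # no path given, so the csv is returned as a string
with_index = people.to_csv()            # default index=True adds an unnamed first column

# Reading the text back gives a frame equal to the one we started with
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(with_index)
```

Writing with the default index = True and reading the file back is what produces the extra "Unnamed: 0" column, which is why index = False is used above.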

# 4. A detailed Pandas Profile report

The pandas-profiling library in Python provides a class named ProfileReport, which generates a basic report on the input DataFrame.

The report consists of the following:

• DataFrame overview
• Each attribute on which the DataFrame is defined
• Correlations between attributes (Pearson correlation and Spearman correlation)
• A sample of the DataFrame

In [None]:
!pip install pandas_profiling


In [None]:
conda install -c conda-forge pandas-profiling

In [None]:
conda install -c "conda-forge/label/cf201901" pandas-profiling

In [108]:
import pandas_profiling as pp
import matplotlib

In [109]:
df = pd.read_csv('Football.csv')
df.head()

Unnamed: 0,Country,League,Club,Player Names,Matches_Played,Substitution,Mins,Goals,xG,xG Per Avg Match,Shots,OnTarget,Shots Per Avg Match,On Target Per Avg Match,Year
0,Spain,La Liga,(BET),Juanmi Callejon,19,16,1849,11,6.62,0.34,48,20,2.47,1.03,2016
1,Spain,La Liga,(BAR),Antoine Griezmann,36,0,3129,16,11.86,0.36,88,41,2.67,1.24,2016
2,Spain,La Liga,(ATL),Luis Suarez,34,1,2940,28,23.21,0.75,120,57,3.88,1.84,2016
3,Spain,La Liga,(CAR),Ruben Castro,32,3,2842,13,14.06,0.47,117,42,3.91,1.4,2016
4,Spain,La Liga,(VAL),Kevin Gameiro,21,10,1745,13,10.65,0.58,50,23,2.72,1.25,2016


In [111]:
report = pp.ProfileReport(df)
report

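TypeError: concat() got an unexpected keyword argument 'join_axes'

• This error occurs because the installed pandas_profiling release calls pd.concat with the join_axes argument, which was removed in pandas 1.0, so it is not compatible with the pandas 1.5.3 used here. Upgrading to a release that supports current pandas (the project is now published as ydata-profiling) should resolve it.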