Here are five commonly used functions of the Pandas library, each with an example execution:

1. 'read_csv': This function is used to read data from a CSV file and create a DataFrame.

In [6]:
import pandas as pd

# Read the CSV file and create a DataFrame
data = pd.read_csv('players_data.csv')

# Display the first few rows of the DataFrame
print(data.head())


  Rk         Player Pos Age   Tm   G  GS    MP   FG  FGA  ...   FT%  ORB  DRB  \
0  1     Quincy Acy  PF  24  NYK  68  22  1287  152  331  ...  .784   79  222   
1  2   Jordan Adams  SG  20  MEM  30   0   248   35   86  ...  .609    9   19   
2  3   Steven Adams   C  21  OKC  70  67  1771  217  399  ...  .502  199  324   
3  4    Jeff Adrien  PF  28  MIN  17   0   215   19   44  ...  .579   23   54   
4  5  Arron Afflalo  SG  29  TOT  78  72  2502  375  884  ...  .843   27  220   

   TRB  AST STL BLK  TOV   PF   PTS  
0  301   68  27  22   60  147   398  
1   28   16  16   7   14   24    94  
2  523   66  38  86   99  222   537  
3   77   15   4   9    9   30    60  
4  247  129  41   7  116  167  1035  

[5 rows x 30 columns]
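
Because players_data.csv is not included here, the following is a minimal sketch that reads a small in-memory CSV instead. The column names and values are made up for illustration. It shows two commonly used read_csv parameters, usecols and nrows.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Player,Pos,PTS\nA,PF,398\nB,SG,94\nC,C,537\n"

# read_csv accepts a file path or any file-like object;
# usecols limits the columns read and nrows limits the rows read
sample = pd.read_csv(io.StringIO(csv_text), usecols=["Player", "PTS"], nrows=2)
print(sample)
```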


2. 'head': This method returns the first n rows of a DataFrame (5 by default), which makes it useful for a quick look at the data.

In [7]:
data=pd.DataFrame({'Name':['Abhi','Sagar','Himanshu','Rahul'],
                   'Age':[26,26,25,24]
})
print(data.head(2))

    Name  Age
0   Abhi   26
1  Sagar   26


3. 'info': This method prints a summary of the DataFrame, including the number of non-null values and the data type of each column.

In [8]:
# info() prints the summary itself and returns None,
# so wrapping it in print() would add a stray 'None' line
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 192.0+ bytes


4. 'groupby': This method groups rows by the values in one or more columns so that an aggregation can be applied to each group.

In [10]:
data=pd.DataFrame({'Name':['Abhi','Sagar','Himanshu','Rahul'],
                   'Age':[26,26,25,24],
                   'Salary':[25000,48000,10000,5000]
})
# Group the data by 'Name' and calculate the average salary
grouped_data=data.groupby('Name')['Salary'].mean()
print(grouped_data)

Name
Abhi        25000.0
Himanshu    10000.0
Rahul        5000.0
Sagar       48000.0
Name: Salary, dtype: float64
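
In the example above every name is unique, so each group contains a single row and the mean is simply that row's salary. A minimal sketch with a made-up 'Dept' column (an assumption, not part of the original data) shows grouping with repeated keys:

```python
import pandas as pd

# 'Dept' is a hypothetical column added for illustration
staff = pd.DataFrame({
    'Name': ['Abhi', 'Sagar', 'Himanshu', 'Rahul'],
    'Dept': ['IT', 'IT', 'HR', 'HR'],
    'Salary': [25000, 48000, 10000, 5000],
})

# Compute several aggregations per department in one call
dept_stats = staff.groupby('Dept')['Salary'].agg(['mean', 'max', 'count'])
print(dept_stats)
```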


5. 'merge': This function is used to merge two DataFrames based on a common column.

In [11]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['John', 'Alice', 'Bob']})

df2 = pd.DataFrame({'ID': [2, 3, 4],
                    'Age': [25, 30, 35]})

# Merge the DataFrames based on the 'ID' column
merged_data = pd.merge(df1, df2, on='ID')

# Display the merged DataFrame
print(merged_data)


   ID   Name  Age
0   2  Alice   25
1   3    Bob   30
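
pd.merge performs an inner join by default, which is why IDs 1 and 4 are missing from the result above. As a short sketch, the how parameter can be used to keep unmatched rows:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John', 'Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})

# Keep every ID from either frame; missing values become NaN
outer = pd.merge(df1, df2, on='ID', how='outer')

# Keep every row of df1, matched or not
left = pd.merge(df1, df2, on='ID', how='left')

print(outer)
print(left)
```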


To give a DataFrame a new index that starts at 1 and increases by 2 for each row, you can use the reset_index() method in Pandas along with some basic Python logic to generate the new index values. Here's a Python function that re-indexes the DataFrame as requested:

In [1]:
import pandas as pd

def reindex_dataframe(df):
    df = df.reset_index(drop=True)
    df.index = df.index.map(lambda x: x * 2 + 1)
    return df


In this function, reset_index(drop=True) resets the existing index of the DataFrame, and index.map(lambda x: x * 2 + 1) assigns the new index values by multiplying the current index by 2 and adding 1 to each value.

Here's an example usage of the function:

In [3]:
# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Re-indexing the DataFrame
df_reindexed = reindex_dataframe(df)

print(df_reindexed)


   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9


The DataFrame df_reindexed now has a new index starting from 1 and incrementing by 2 for each row.
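
As an alternative sketch (not the approach used above), the same index can be assigned directly from a range, which avoids the per-element lambda. The helper name reindex_with_range is hypothetical.

```python
import pandas as pd

def reindex_with_range(df):
    # Work on a copy so the caller's DataFrame is left unchanged
    out = df.copy()
    # Odd numbers 1, 3, 5, ... with one value per row
    out.index = pd.RangeIndex(start=1, stop=2 * len(out) + 1, step=2)
    return out

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(reindex_with_range(df))
```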

You can use the head method in Pandas to extract the first three values from the 'Values' column of the DataFrame and then calculate their sum using the sum method. No explicit loop over the rows is needed. Here's an example of a Python function that calculates the sum of the first three values in the 'Values' column:


In [6]:
import pandas as pd
def calculate_sum(df):
    values=df['Values'].head(3)
    sum_of_values=values.sum()
    print('Sum of first three values:',sum_of_values)

In this function, df['Values'].head(3) retrieves the first three values from the 'Values' column of the DataFrame. Then, values.sum() calculates the sum of those three values.

Here's an example usage of the function:

In [7]:
#Sample dataframe
df=pd.DataFrame({'Values':[10,20,30,40,50]})
calculate_sum(df)

Sum of first three values: 60


To create a new column 'Word_Count' in a Pandas DataFrame that contains the number of words in each row of the 'Text' column, you can use the apply method along with a lambda function. Here's an example of a Python function that achieves this:

In [10]:
import pandas as pd
def count_words(df):
    df['Word_Count']=df['Text'].apply(lambda x:len(str(x).split()))
    return df

In this function, df['Text'].apply(lambda x: len(str(x).split())) applies a lambda function to each element of the 'Text' column. The lambda function converts each element to a string (str(x)), splits it into words using the default whitespace delimiter (split()), and then calculates the length of the resulting list of words (len()).

The new column 'Word_Count' is assigned the calculated word counts for each row using df['Word_Count'] = ....

Here's an example usage of the function:

In [11]:
df=pd.DataFrame({'Text':['Hello world','My name is Abhishek Mishra','PW skills course is amazing']})
df_with_word_count=count_words(df)
print(df_with_word_count)

                          Text  Word_Count
0                  Hello world           2
1   My name is Abhishek Mishra           5
2  PW skills course is amazing           5
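
A possible alternative, sketched here under the assumption that 'Text' holds strings, is the vectorized str accessor. Unlike the str(x) approach, which turns a missing value into a string such as 'nan' and counts it as one word, this version leaves missing values as missing.

```python
import pandas as pd

# Sample data with one missing entry, made up for illustration
df = pd.DataFrame({'Text': ['Hello world', None, 'PW skills course is amazing']})

# split() with no arguments splits on any run of whitespace;
# str.len() then counts the items in each resulting list
df['Word_Count'] = df['Text'].str.split().str.len()
print(df)
```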


DataFrame.size is an attribute in Pandas that returns the total number of elements in the DataFrame. It is calculated by multiplying the number of rows by the number of columns. In other words, DataFrame.size gives the total count of all cells or elements in the DataFrame.

On the other hand, DataFrame.shape is also an attribute that returns a tuple representing the dimensions of the DataFrame. It provides two values: the number of rows and the number of columns in the DataFrame, respectively. The shape is expressed as (number_of_rows, number_of_columns).

To summarize:

DataFrame.size gives the total number of elements (cells) in the DataFrame.

DataFrame.shape gives the dimensions of the DataFrame in terms of the number of rows and columns.

Let's consider an example to illustrate the difference:

In [12]:
import pandas as pd
df=pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
print("Dataframe size:",df.size)
print("Dataframe shape:",df.shape)

Dataframe size: 9
Dataframe shape: (3, 3)


The DataFrame df has 9 elements (cells) in total, so df.size returns 9. The shape of the DataFrame is (3, 3), indicating that it has 3 rows and 3 columns.
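
The relationship between the two attributes can be checked directly. The sketch below uses a non-square DataFrame (made up for illustration) so that rows and columns are easy to tell apart:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

rows, cols = df.shape            # shape unpacks into (rows, columns)
print(df.size == rows * cols)    # size is rows multiplied by columns
print(len(df) == rows)           # len() gives the row count only
```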

In Pandas, you can use the pd.read_excel() function to read an Excel file into a DataFrame. This function allows you to read data from an Excel file and store it in a tabular format in a Pandas DataFrame.

Here's the basic syntax of pd.read_excel():

import pandas as pd

df = pd.read_excel('file_path.xlsx')

In the above code, you need to replace 'file_path.xlsx' with the actual file path of the Excel file you want to read. If the Excel file is in the same directory as your Python script or notebook, you can simply provide the file name, like 'example.xlsx'. Otherwise, you need to specify the full file path.

The pd.read_excel() function accepts various options, such as sheet_name to select a sheet (if the Excel file contains multiple sheets), skiprows to skip leading rows, usecols to select columns, and index_col to set the index. Reading .xlsx files also requires an Excel engine such as openpyxl to be installed. You can explore the pandas documentation for additional parameters and details on how to use pd.read_excel() based on your specific requirements.


To extract the username part from email addresses in the 'Email' column of a Pandas DataFrame and store it in a new column 'Username', you can use the str.split() method along with the str.get() method. Here's an example of a Python function that achieves this:

In [13]:
import pandas as pd
def extract_username(df):
    df['Username']=df['Email'].str.split('@').str.get(0)
    return df

In this function, df['Email'].str.split('@') splits each email address in the 'Email' column by the '@' symbol, creating a list of two parts: the username and the domain. Then, str.get(0) extracts the first element (the username) from each list.

The new column 'Username' is assigned the extracted usernames using df['Username'] = ....

Here's an example usage of the function:

In [14]:
# Sample DataFrame
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})

# Extracting usernames and creating the 'Username' column
df_with_username = extract_username(df)

print(df_with_username)

                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
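
As a hedged variation, passing expand=True to str.split returns the parts as separate columns, so the domain can be kept as well. The 'Domain' column name is an assumption for illustration, and n=1 limits the split to the first '@'.

```python
import pandas as pd

df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})

# expand=True yields a DataFrame with one column per part
parts = df['Email'].str.split('@', n=1, expand=True)
df['Username'] = parts[0]
df['Domain'] = parts[1]
print(df)
```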


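To select the rows of a Pandas DataFrame where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10, you can use boolean indexing. Here's an example of a Python function that achieves this:

In [20]: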
import pandas as pd
def select_row(df):
    selected_rows=df[(df['A']>5) & (df['B']<10)]
    return selected_rows

In this function, df[(df['A'] > 5) & (df['B'] < 10)] creates a boolean mask by checking two conditions: df['A'] > 5 and df['B'] < 10. The & operator is used to combine the conditions with the logical AND operation.

The resulting boolean mask is used to select the rows from the original DataFrame where both conditions are True, creating a new DataFrame containing only the selected rows.

Here's an example usage of the function with the provided sample DataFrame:

In [21]:
df=pd.DataFrame({'A':[3,8,6,2,9],'B':[5,2,9,3,1],'C':[1,7,4,5,2]})
selected_df=select_row(df)
print(selected_df)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2


The function selects the rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The resulting DataFrame selected_df contains only the selected rows.
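
The same filter can also be written with DataFrame.query, shown here as an alternative sketch rather than a replacement for the mask-based version:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Column names are referenced directly inside the expression string
selected = df.query('A > 5 and B < 10')
print(selected)
```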

To calculate the mean, median, and standard deviation of the values in the 'Values' column of a Pandas DataFrame, you can use the mean(), median(), and std() methods of a Pandas Series. Here's an example of a Python function that calculates these statistics:

In [25]:
import pandas as pd
def calculate_statistics(df):
    mean_value=df['Values'].mean()
    median_value=df['Values'].median()
    std_value=df['Values'].std()
    return mean_value,median_value,std_value

In this function, df['Values'].mean() calculates the mean value of the 'Values' column, df['Values'].median() calculates the median value, and df['Values'].std() calculates the standard deviation.

The calculated statistics are stored in separate variables (mean_value, median_value, and std_value).

Here's an example usage of the function:

In [26]:
# Sample DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Calculating statistics
mean, median, std = calculate_statistics(df)

print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
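
Note that the std() method in Pandas computes the sample standard deviation (ddof=1) by default, which is why the result above is about 15.81 rather than the population value. A brief sketch comparing the two:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # divides by n - 1 (default ddof=1)
population_std = values.std(ddof=0)  # divides by n
print(sample_std, population_std)
```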


To calculate the moving average of the 'Sales' column in a Pandas DataFrame over a window of 7 days, including the current day, you can use the rolling() function along with the mean() function. Here's an example of a Python function that creates a new column 'MovingAverage' with the desired calculation:

In [6]:
import pandas as pd
def calculate_moving_average(df):
    df['MovingAverage']=df['Sales'].rolling(window=7,min_periods=1).mean()
    return df

In this function, df['Sales'].rolling(window=7, min_periods=1) creates a rolling window of size 7 on the 'Sales' column. The min_periods=1 argument allows a result as soon as one observation is available, so the first six rows are averaged over the days seen so far instead of being NaN.

The mean() function is then applied to the rolling window to calculate the average for each window.

The new column 'MovingAverage' is assigned the calculated moving averages using df['MovingAverage'] = ....

Here's an example usage of the function:

In [7]:
# Sample DataFrame
df=pd.DataFrame({'Date':pd.date_range(start='2023-07-03',periods=10),
                 'Sales':[10,20,30,40,50,60,70,80,90,100]
})
# Calculating the moving average
df_with_moving_average=calculate_moving_average(df)
print(df_with_moving_average)

        Date  Sales  MovingAverage
0 2023-07-03     10           10.0
1 2023-07-04     20           15.0
2 2023-07-05     30           20.0
3 2023-07-06     40           25.0
4 2023-07-07     50           30.0
5 2023-07-08     60           35.0
6 2023-07-09     70           40.0
7 2023-07-10     80           50.0
8 2023-07-11     90           60.0
9 2023-07-12    100           70.0
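
rolling(window=7) counts rows, so it corresponds to 7 days only when there is exactly one row per day. If dates can be missing, a time-based window is one option. The sketch below assumes a datetime column named 'Date' sorted in ascending order, and the data is made up for illustration.

```python
import pandas as pd

# Hypothetical data with a gap: 2023-07-05 is missing
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-07-03', '2023-07-04', '2023-07-06']),
    'Sales': [10, 20, 40],
})

# '7D' covers the 7 days ending at each row's date,
# however many rows fall inside that span
df['MovingAverage'] = df.rolling('7D', on='Date')['Sales'].mean()
print(df)
```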


To create a new column 'Weekday' in a Pandas DataFrame that contains the weekday names corresponding to the dates in the 'Date' column, you can use the dt accessor and the strftime() method. Here's an example of a Python function that achieves this:

In [9]:
import pandas as pd
def add_weekday_column(df):
    df['Weekday']=df['Date'].dt.strftime('%A')
    return df

In this function, df['Date'].dt.strftime('%A') uses the dt accessor to access the datetime properties of the 'Date' column and then applies the strftime('%A') method to format the dates as weekday names.

The %A format code is used to represent the full weekday name, such as 'Sunday', 'Monday', etc.

The new column 'Weekday' is assigned the formatted weekday names using df['Weekday'] = ....

Here's an example usage of the function:

In [10]:
# Sample DataFrame
df = pd.DataFrame({'Date': pd.date_range(start='2023-01-01', periods=5)})

# Adding the 'Weekday' column
df_with_weekday = add_weekday_column(df)

print(df_with_weekday)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
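
Pandas also provides dt.day_name(), which returns the same weekday names without a format string. A minimal sketch, in which the 'WeekdayNumber' column name is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.date_range(start='2023-01-01', periods=5)})

df['Weekday'] = df['Date'].dt.day_name()        # full name, e.g. 'Sunday'
df['WeekdayNumber'] = df['Date'].dt.dayofweek   # Monday=0 ... Sunday=6
print(df)
```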


To select all rows from a Pandas DataFrame where the date in the 'Date' column falls between '2023-01-01' and '2023-01-31', you can use boolean indexing along with the pd.to_datetime() function. Here's an example of a Python function that achieves this:

In [13]:
import pandas as pd
def select_rows_between_dates(df):
    start_date=pd.to_datetime('2023-01-01')
    end_date=pd.to_datetime('2023-01-31')
    selected_rows=df[(df['Date']>=start_date) & (df['Date']<=end_date)]
    return selected_rows

In this function, pd.to_datetime('2023-01-01') and pd.to_datetime('2023-01-31') convert the start and end date strings into Pandas Timestamp objects.

The boolean indexing df[(df['Date'] >= start_date) & (df['Date'] <= end_date)] selects the rows where the 'Date' falls between the start and end dates, with both endpoints included.

The resulting DataFrame selected_rows contains only the selected rows.

Here's an example usage of the function:

In [14]:
# Sample DataFrame
df = pd.DataFrame({'Date': pd.date_range(start='2023-01-01', end='2023-02-28')})

# Selecting rows between dates
selected_df = select_rows_between_dates(df)

print(selected_df)


         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31
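
An equivalent and slightly shorter form uses Series.between, which is inclusive on both ends by default. The helper below is a sketch with hypothetical start and end parameters so that the range is not hard-coded:

```python
import pandas as pd

def select_rows_in_range(df, start, end):
    # between() includes both endpoints by default
    mask = df['Date'].between(pd.to_datetime(start), pd.to_datetime(end))
    return df[mask]

df = pd.DataFrame({'Date': pd.date_range(start='2023-01-01', end='2023-02-28')})
january = select_rows_in_range(df, '2023-01-01', '2023-01-31')
print(len(january))
```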


To use the basic functions of Pandas, the library that must be imported is pandas itself. The pandas library provides data structures and data analysis tools for handling and manipulating structured data, including the DataFrame object that is widely used for data manipulation.

To import the pandas library in Python, you can use the following import statement:

import pandas as pd

The pd alias is a commonly used convention to refer to the pandas library in Python code. It allows you to use the Pandas functions and classes by prefixing them with pd, such as pd.DataFrame() or pd.read_csv().

Once you have imported the pandas library, you can start using its functions and classes to work with tabular data in a DataFrame format and perform various data manipulation and analysis tasks.